How to Test Vercel AI SDK Apps Before They Hit Production

HelpMeTest

13 May 2026 — 5 min read

You built an AI app with the Vercel AI SDK. Streaming works. The chat UI looks right in development. You deploy to Vercel and it works in your quick manual test.

Then a user hits a path where the streaming response stalls, your tool call returns data the model doesn't know how to parse, and the UI freezes. You can't reproduce it in dev. The error doesn't show up in logs. You have no idea it happened until a user reports it.

This is the Vercel AI SDK testing challenge: AI-powered Next.js apps fail in ways that standard React testing doesn't catch.

What Makes Vercel AI SDK Apps Hard to Test

The Vercel AI SDK is the de-facto standard for building AI into Next.js and TypeScript apps. It abstracts streaming, tool calls, and multi-step interactions into clean hooks and server actions. That abstraction is powerful — and it creates a testing gap.

Standard React component tests don't cover:

Streaming behavior: does your UI handle partial responses correctly? What happens when the stream stalls?
Tool call roundtrips: does your useChat hook handle tool calls correctly and send their results back to the model?
Multi-step generateText: does your server-side pipeline reach the right terminal state across multiple model calls?
Provider-specific behavior: do edge cases that pass with OpenAI still behave the same after you switch to Anthropic?
Rate limit and error handling: what does your UI actually show when the API returns a 429?

Layer 1: Unit Testing AI Functions

For server-side functions built with generateText or generateObject, start by mocking the model:

import { generateText } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';

test('classifies support ticket severity correctly', async () => {
  const model = new MockLanguageModelV1({
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'stop',
      usage: { promptTokens: 10, completionTokens: 5 },
      text: JSON.stringify({ severity: 'high', category: 'billing' }),
    }),
  });

  const result = await generateText({
    model,
    prompt: 'Ticket: I was charged twice for my subscription',
  });

  const parsed = JSON.parse(result.text);
  expect(parsed.severity).toBe('high');
  expect(parsed.category).toBe('billing');
});

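The Vercel AI SDK exports MockLanguageModelV1 from its ai/test entry point for exactly this purpose (the class name applies to AI SDK 4.x; check the testing docs for the name in your version). Use it instead of hitting real model APIs in unit tests.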
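
The test above mostly confirms that the mock returns what it was told to return. The logic worth unit testing is usually your own handling of the model's text, including replies that are not valid JSON. A minimal sketch, assuming a hypothetical parseClassification helper (not part of the SDK) that wraps the JSON.parse call:

```typescript
// Hypothetical helper, not part of the Vercel AI SDK: turns raw model text
// into a typed classification, or null when the text is unusable.
type Severity = 'low' | 'medium' | 'high' | 'critical';

interface Classification {
  severity: Severity;
  category: string;
}

const SEVERITIES: readonly Severity[] = ['low', 'medium', 'high', 'critical'];

function parseClassification(raw: string): Classification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model replied with prose or truncated JSON
  }
  if (typeof data !== 'object' || data === null) return null;

  const { severity, category } = data as Record<string, unknown>;
  if (typeof severity !== 'string' || typeof category !== 'string') return null;

  // Reject severities outside the allowed set, e.g. "urgent".
  const known = SEVERITIES.find((s) => s === severity);
  if (known === undefined) return null;

  return { severity: known, category };
}
```

In a test, pass the mock's text through this helper and assert on null for malformed replies as well as on the parsed fields for valid ones.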

Layer 2: Testing `generateObject` Schema Validation

generateObject gives you structured outputs with Zod schema validation. Test both schema conformance and value correctness:

import { generateObject } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';
import { z } from 'zod';

const TicketSchema = z.object({
  severity: z.enum(['low', 'medium', 'high', 'critical']),
  assignee: z.string(),
  requiresHuman: z.boolean(),
});

test('critical billing issue requires human escalation', async () => {
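  const model = new MockLanguageModelV1({
    // The mock has no default object generation mode, so set one explicitly
    defaultObjectGenerationMode: 'json',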
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'stop',
      usage: { promptTokens: 15, completionTokens: 8 },
      text: JSON.stringify({
        severity: 'critical',
        assignee: 'billing-team',
        requiresHuman: true,
      }),
    }),
  });

  const { object } = await generateObject({
    model,
    schema: TicketSchema,
    prompt: 'My account was locked and I have an active paid subscription',
  });

  expect(object.severity).toBe('critical');
  expect(object.requiresHuman).toBe(true);
});

Test the edge cases your Zod schema doesn't catch — values that are technically valid but semantically wrong for your use case.
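
One way to cover that gap is a small semantic check that runs after schema validation. The sketch below is dependency-free, and both rules in it are illustrative assumptions rather than anything defined by Zod or the SDK; substitute your own business constraints.

```typescript
// Shape mirrors the TicketSchema above; written as a plain interface so the
// sketch runs without Zod.
interface Ticket {
  severity: 'low' | 'medium' | 'high' | 'critical';
  assignee: string;
  requiresHuman: boolean;
}

// Hypothetical list of teams a ticket may be routed to.
const KNOWN_ASSIGNEES = ['billing-team', 'support-team', 'security-team'];

// Returns human-readable problems; an empty array means the ticket passed.
function checkTicketSemantics(ticket: Ticket): string[] {
  const problems: string[] = [];

  // Assumed rule: critical tickets must always be escalated to a person.
  if (ticket.severity === 'critical' && !ticket.requiresHuman) {
    problems.push('critical ticket must require a human');
  }

  // Assumed rule: z.string() accepts any string, including a team that
  // does not exist.
  if (KNOWN_ASSIGNEES.indexOf(ticket.assignee) === -1) {
    problems.push(`unknown assignee: ${ticket.assignee}`);
  }

  return problems;
}
```

A test can then assert that checkTicketSemantics(object) returns an empty array for the generated object, in addition to the field-level expectations.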

Layer 3: Testing Tool Calls

Tool calls in Vercel AI SDK apps are where most production bugs hide. The model decides when and how to call your tools — test that explicitly.

import { generateText, tool } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';
import { z } from 'zod';

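const toolCallLog: string[] = [];

// Reset the shared log so one test's calls never leak into the next
beforeEach(() => {
  toolCallLog.length = 0;
});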

const getOrderStatus = tool({
  description: 'Get the status of an order',
  parameters: z.object({ orderId: z.string() }),
  execute: async ({ orderId }) => {
    toolCallLog.push(`getOrderStatus:${orderId}`);
    return { status: 'shipped', estimatedDelivery: '2026-05-15' };
  },
});

test('agent calls correct tool for order status query', async () => {
  const model = new MockLanguageModelV1({
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'tool-calls',
      usage: { promptTokens: 20, completionTokens: 10 },
      toolCalls: [{
        toolCallType: 'function',
        toolCallId: 'call_1',
        toolName: 'getOrderStatus',
        args: JSON.stringify({ orderId: '8823' }),
      }],
    }),
  });

  await generateText({
    model,
    tools: { getOrderStatus },
    prompt: 'Where is order 8823?',
  });

  expect(toolCallLog).toContain('getOrderStatus:8823');
});

Test both correct tool selection and incorrect cases — inputs where the model might ambiguously pick between tools.
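
Multi-step pipelines deserve the same treatment: script the model's replies step by step and assert on the terminal state. The following is a dependency-free sketch of that idea. The scriptedModel factory, the runSteps loop, and its step cap are hand-rolled stand-ins for illustration, not the SDK's implementation; with the real SDK you would instead have the mock's doGenerate return a different result on each call.

```typescript
// One scripted model reply: either a tool call or the final text.
type Step =
  | { finishReason: 'tool-calls'; toolName: string; args: Record<string, string> }
  | { finishReason: 'stop'; text: string };

type ScriptedModel = () => Promise<Step>;
type ToolFn = (args: Record<string, string>) => Promise<unknown>;

// Builds a fake model that replays the given steps in order.
function scriptedModel(steps: Step[]): ScriptedModel {
  let index = 0;
  return async () => {
    if (index >= steps.length) throw new Error('script exhausted');
    return steps[index++];
  };
}

interface RunResult {
  text: string | null; // null when the step cap was hit before a final answer
  toolCalls: string[];
  stepsUsed: number;
}

// Calls the model until it returns final text or the step cap is reached.
async function runSteps(
  model: ScriptedModel,
  tools: Record<string, ToolFn>,
  maxSteps: number,
): Promise<RunResult> {
  const toolCalls: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const reply = await model();
    if (reply.finishReason === 'stop') {
      return { text: reply.text, toolCalls, stepsUsed: step };
    }
    const toolFn = tools[reply.toolName];
    if (!toolFn) throw new Error(`unknown tool: ${reply.toolName}`);
    await toolFn(reply.args);
    toolCalls.push(reply.toolName);
  }
  return { text: null, toolCalls, stepsUsed: maxSteps };
}
```

A test built on this asserts that text is non-null and that toolCalls lists the expected tools in order. A null text means the pipeline ran out of steps without reaching a final answer, which is a terminal state worth its own test.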

Layer 4: Testing Streaming with `useChat`

The useChat hook manages streaming chat in your React UI. Testing streaming behavior requires a test environment that simulates the SDK's stream protocol.

Use MSW (Mock Service Worker) to intercept the chat API route and return simulated stream chunks:

import { render, screen, fireEvent, waitFor } from '@testing-library/react';
import { http, HttpResponse } from 'msw';
import { setupServer } from 'msw/node';
import ChatComponent from './ChatComponent';

const server = setupServer(
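  http.post('/api/chat', () => {
    // Simulate the AI SDK 4.x data stream protocol (type-prefixed lines)
    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      start(controller) {
        // Response bodies must be bytes, so encode each chunk
        controller.enqueue(encoder.encode('0:"Hello, "\n'));
        controller.enqueue(encoder.encode('0:"how can I help you today?"\n'));
        controller.enqueue(encoder.encode('d:{"finishReason":"stop"}\n'));
        controller.close();
      },
    });
    return new HttpResponse(stream, {
      headers: {
        'Content-Type': 'text/plain; charset=utf-8',
        'x-vercel-ai-data-stream': 'v1',
      },
    });
  })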
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test('chat renders streaming response correctly', async () => {
  render(<ChatComponent />);
  
  fireEvent.change(screen.getByRole('textbox'), { target: { value: 'Hello' } });
  // getByRole('form') only matches a <form> that has an accessible name (e.g. aria-label)
  fireEvent.submit(screen.getByRole('form'));
  
  await waitFor(() => {
    expect(screen.getByText(/how can I help you today/)).toBeInTheDocument();
  });
});
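
The handler above covers a stream that finishes. For the stall case, one approach is to open a stream that emits a chunk and then never closes, and assert that your code gives up after an idle timeout. A minimal sketch, assuming Node 18+ (global ReadableStream, TextEncoder, and TextDecoder); readWithIdleTimeout, stalledStream, and the timeout value are hypothetical names for illustration, not SDK APIs:

```typescript
// Reads a byte stream to completion, but rejects if no chunk arrives
// within idleMs. Hypothetical helper for illustration.
async function readWithIdleTimeout(
  stream: ReadableStream<Uint8Array>,
  idleMs: number,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error('stream stalled')), idleMs);
    });
    try {
      const { done, value } = await Promise.race([reader.read(), timeout]);
      if (done) return text + decoder.decode();
      text += decoder.decode(value, { stream: true });
    } catch (err) {
      // Release the underlying source; ignore errors from an already-broken stream.
      await reader.cancel().catch(() => undefined);
      throw err;
    } finally {
      clearTimeout(timer); // the idle timer restarts on every chunk
    }
  }
}

// A stream that sends one chunk and then stalls forever (it never closes).
function stalledStream(): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(new TextEncoder().encode('0:"Hello, "\n'));
    },
  });
}
```

In a component test, the same stalledStream can be returned from the MSW handler, with an assertion that the UI shows a timeout or retry state instead of waiting indefinitely.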

Layer 5: Testing Error States

Test what your UI shows when the AI API fails. Most apps have a happy path test — almost none have error state tests.

test('shows error message when API returns 429', async () => {
  server.use(
    http.post('/api/chat', () => {
      return HttpResponse.json(
        { error: 'Rate limit exceeded' },
        { status: 429 }
      );
    })
  );

  render(<ChatComponent />);
  fireEvent.change(screen.getByRole('textbox'), { target: { value: 'Hello' } });
  fireEvent.submit(screen.getByRole('form'));

  await waitFor(() => {
    expect(screen.getByRole('alert')).toHaveTextContent(/try again/i);
  });
});

Your users will hit rate limits, network timeouts, and model errors. Test what they see when that happens.
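
The 429 test above only passes if something maps HTTP failures to user-facing copy. A minimal sketch of that mapping, assuming Node 18+ for the global Response; describeChatError and its messages are hypothetical, and the Retry-After header is used only when the server sends it as a number of seconds:

```typescript
// Hypothetical helper: converts a failed chat API response into the message
// a user should see. Returns null for successful responses.
function describeChatError(response: Response): string | null {
  if (response.ok) return null;

  if (response.status === 429) {
    // Retry-After may be absent or an HTTP date; only use it when it is a
    // plain positive number of seconds.
    const retryAfter = Number(response.headers.get('retry-after'));
    return Number.isFinite(retryAfter) && retryAfter > 0
      ? `Too many requests. Please try again in ${retryAfter} seconds.`
      : 'Too many requests. Please try again in a moment.';
  }

  if (response.status >= 500) {
    return 'The assistant is temporarily unavailable. Please try again.';
  }

  return 'Something went wrong with your request.';
}
```

The component can render the returned string inside an element with role="alert", which is what the test above looks for. Keeping the mapping in a pure function means the 429, 500, and fallback branches can each be unit tested without rendering anything.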

What Code-Level Tests Miss

Unit and component tests verify your code logic. They don't verify behavior in production conditions.

Vercel AI SDK apps fail in production for reasons outside your test suite:

Model provider changes — you switch from OpenAI to Anthropic, or OpenAI updates a model version. Tool call formats and output patterns shift.
Prompt drift — your system prompt worked with last month's model. This month's model interprets one instruction differently, causing a 15% rate of wrong tool selections.
Real user inputs — users phrase requests in ways your test fixtures don't cover. Edge cases you didn't imagine cause unexpected tool selections.
Latency degradation — the model API slows down under load. Streaming stalls. Your UI timeout isn't set correctly and users see a hung interface.

Monitoring Vercel AI SDK Apps in Production

Once your app is live, you need ongoing behavioral tests that run against the real deployment.

HelpMeTest lets you write natural language behavioral tests against your deployed app and run them on a schedule:

Test: chat responds to product question with relevant answer
Go to https://yourapp.vercel.app
Click chat widget
Type "What are your pricing plans?"
Then: response mentions pricing within 10 seconds
And: response contains a price or plan name
And: no error message is shown

Tests run every hour. If your app's behavior shifts after a model update, a prompt change, or a dependency bump, you find out before your users do.

Free tier: 10 tests, unlimited health checks. Try HelpMeTest →

Vercel AI SDK Testing Checklist

Before shipping any Vercel AI SDK app:

Unit tests for all generateText / generateObject functions using MockLanguageModelV1
Schema validation tests — correct values, not just correct types
Tool selection tests — verify the right tool is called for each input type
Streaming UI tests with MSW interceptors
Error state tests — 429, 500, network timeout
Multi-step pipeline tests — does the full chain reach the right terminal state?
Rate limit handling — does the UI recover gracefully?
Production behavioral monitoring for drift after model or prompt changes

The streaming works in dev. The question is whether it works correctly in production, for the inputs your users actually send.
