How to Test Vercel AI SDK Apps Before They Hit Production
You built an AI app with the Vercel AI SDK. Streaming works. The chat UI looks right in development. You deploy to Vercel and it works in your quick manual test.
Then a user hits a path where the streaming response stalls, your tool call returns data the model doesn't know how to parse, and the UI freezes. You can't reproduce it in dev. The error doesn't show up in logs. You have no idea it happened until a user reports it.
This is the Vercel AI SDK testing challenge: AI-powered Next.js apps fail in ways that standard React testing doesn't catch.
What Makes Vercel AI SDK Apps Hard to Test
The Vercel AI SDK is the de-facto standard for building AI into Next.js and TypeScript apps. It abstracts streaming, tool calls, and multi-step interactions into clean hooks and server actions. That abstraction is powerful — and it creates a testing gap.
Standard React component tests don't cover:
- Streaming behavior: does your UI handle partial responses correctly? What happens when the stream stalls?
- Tool call roundtrips: does your useChat hook correctly handle tool calls and send their results back to the model?
- Multi-step generateText: does your server-side pipeline reach the right terminal state across multiple model calls?
- Provider-specific behavior: your tests pass with OpenAI, but after switching to Anthropic, edge cases behave differently.
- Rate limit and error handling: what does your UI actually show when the API returns a 429?
Layer 1: Unit Testing AI Functions
For server-side functions built with generateText or generateObject, start by mocking the model:
import { generateText } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';

test('classifies support ticket severity correctly', async () => {
  // Mock model that always returns a fixed JSON classification
  const model = new MockLanguageModelV1({
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'stop',
      usage: { promptTokens: 10, completionTokens: 5 },
      text: JSON.stringify({ severity: 'high', category: 'billing' }),
    }),
  });

  const result = await generateText({
    model,
    prompt: 'Ticket: I was charged twice for my subscription',
  });

  const parsed = JSON.parse(result.text);
  expect(parsed.severity).toBe('high');
  expect(parsed.category).toBe('billing');
});
The Vercel AI SDK ships MockLanguageModelV1 in its ai/test entry point precisely for this. Use it instead of hitting real model APIs in unit tests.
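The test above passes result.text straight to JSON.parse, which throws if the model returns prose or truncated JSON. A minimal sketch of a defensive parser you could unit test alongside it; parseClassification and its result shape are hypothetical names for illustration, not part of the AI SDK:

```typescript
// Hypothetical helper (not part of the AI SDK): turn raw model text into a
// typed result without throwing on malformed output.
type Classification = { severity: string; category: string };

type ParseResult =
  | { ok: true; value: Classification }
  | { ok: false; reason: string };

function parseClassification(text: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return { ok: false, reason: 'not valid JSON' };
  }
  if (typeof data !== 'object' || data === null) {
    return { ok: false, reason: 'not a JSON object' };
  }
  const record = data as Record<string, unknown>;
  if (typeof record.severity !== 'string' || typeof record.category !== 'string') {
    return { ok: false, reason: 'missing severity or category' };
  }
  return { ok: true, value: { severity: record.severity, category: record.category } };
}
```

Feeding the mock a non-JSON text value and asserting on the ok: false branch gives you a test for the failure path as well as the happy path.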
Layer 2: Testing generateObject Schema Validation
generateObject gives you structured outputs with Zod schema validation. Test both schema conformance and value correctness:
import { generateObject } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';
import { z } from 'zod';

const TicketSchema = z.object({
  severity: z.enum(['low', 'medium', 'high', 'critical']),
  assignee: z.string(),
  requiresHuman: z.boolean(),
});

test('critical billing issue requires human escalation', async () => {
  const model = new MockLanguageModelV1({
    // generateObject needs an object generation mode from the model;
    // the mock does not set one unless you pass it here
    defaultObjectGenerationMode: 'json',
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'stop',
      usage: { promptTokens: 15, completionTokens: 8 },
      text: JSON.stringify({
        severity: 'critical',
        assignee: 'billing-team',
        requiresHuman: true,
      }),
    }),
  });

  const { object } = await generateObject({
    model,
    schema: TicketSchema,
    prompt: 'My account was locked and I have an active paid subscription',
  });

  expect(object.severity).toBe('critical');
  expect(object.requiresHuman).toBe(true);
});
Test the edge cases your Zod schema doesn't catch: values that are technically valid but semantically wrong for your use case.
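One way to make "semantically wrong" testable is to write the business rules as a plain function and run it on the validated object. A minimal sketch, assuming two hypothetical rules (critical tickets must require a human, and the assignee must not be blank); the Ticket type mirrors TicketSchema:

```typescript
// Mirrors the TicketSchema shape from the example above.
type Ticket = {
  severity: 'low' | 'medium' | 'high' | 'critical';
  assignee: string;
  requiresHuman: boolean;
};

// Hypothetical business rules: returns one message per violated rule,
// or an empty array when the ticket passes all of them.
function findSemanticIssues(ticket: Ticket): string[] {
  const issues: string[] = [];
  if (ticket.severity === 'critical' && !ticket.requiresHuman) {
    issues.push('critical tickets must require a human');
  }
  if (ticket.assignee.trim() === '') {
    issues.push('assignee must not be empty');
  }
  return issues;
}
```

Both failing inputs here pass the Zod schema, which is exactly why they need their own assertions.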
Layer 3: Testing Tool Calls
Tool calls in Vercel AI SDK apps are where most production bugs hide. The model decides when and how to call your tools — test that explicitly.
import { generateText, tool } from 'ai';
import { MockLanguageModelV1 } from 'ai/test';
import { z } from 'zod';

// Records every tool execution so the test can assert on it
const toolCallLog: string[] = [];

const getOrderStatus = tool({
  description: 'Get the status of an order',
  parameters: z.object({ orderId: z.string() }),
  execute: async ({ orderId }) => {
    toolCallLog.push(`getOrderStatus:${orderId}`);
    return { status: 'shipped', estimatedDelivery: '2026-05-15' };
  },
});

test('agent calls correct tool for order status query', async () => {
  // Mock model that responds with a single tool call
  const model = new MockLanguageModelV1({
    doGenerate: async () => ({
      rawCall: { rawPrompt: null, rawSettings: {} },
      finishReason: 'tool-calls',
      usage: { promptTokens: 20, completionTokens: 10 },
      toolCalls: [{
        toolCallType: 'function',
        toolCallId: 'call_1',
        toolName: 'getOrderStatus',
        args: JSON.stringify({ orderId: '8823' }),
      }],
    }),
  });

  await generateText({
    model,
    tools: { getOrderStatus },
    prompt: 'Where is order 8823?',
  });

  expect(toolCallLog).toContain('getOrderStatus:8823');
});
Test both correct tool selection and incorrect cases: inputs where the model might ambiguously pick between tools.
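Once you have more than one tool, a shared call log makes "which tool ran, with what arguments" easy to assert. A minimal sketch of a wrapper that generalizes the toolCallLog idea above; recordCalls, RecordedCall, and the cancelOrder tool are hypothetical names for illustration, not SDK APIs:

```typescript
// Hypothetical test helper (not part of the AI SDK): wrap a tool's execute
// function so every call is recorded with its tool name and arguments.
type RecordedCall = { toolName: string; args: unknown };

function recordCalls<A, R>(
  toolName: string,
  execute: (args: A) => Promise<R>,
  log: RecordedCall[],
): (args: A) => Promise<R> {
  return async (args: A) => {
    log.push({ toolName, args });
    return execute(args);
  };
}

// Two wrapped execute functions sharing one log, so a test can assert
// which of the two was actually invoked.
const calls: RecordedCall[] = [];
const getOrderStatus = recordCalls(
  'getOrderStatus',
  async ({ orderId }: { orderId: string }) => ({ orderId, status: 'shipped' }),
  calls,
);
const cancelOrder = recordCalls(
  'cancelOrder',
  async ({ orderId }: { orderId: string }) => ({ orderId, cancelled: true }),
  calls,
);
```

In a real test, each wrapped function would presumably be passed as the execute option of tool(...), and the assertion would check that calls contains only the expected tool name.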
Layer 4: Testing Streaming with useChat
The useChat hook manages streaming chat in your React UI. Testing streaming behavior requires a test environment that simulates the SDK's stream protocol.
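The stream format used in this section is line-oriented: each text part is a line of the form 0:<JSON string>, followed by a d:<JSON> finish part. A minimal sketch of helpers that build those lines and encode them as bytes; toStreamLines and toByteStream are hypothetical names, and the format is assumed from this article's handler, so check it against the protocol docs for your SDK version:

```typescript
// Hypothetical helper: build the line-oriented chunks used by the handler in
// this section. Format assumed from the article: 0:<JSON string> for text
// parts, then a d:<JSON> finish part.
function toStreamLines(textParts: string[], finishReason = 'stop'): string[] {
  const lines = textParts.map((part) => `0:${JSON.stringify(part)}\n`);
  lines.push(`d:${JSON.stringify({ finishReason })}\n`);
  return lines;
}

// Encode the lines as bytes, since Response bodies carry bytes rather than
// JavaScript strings.
function toByteStream(lines: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const line of lines) controller.enqueue(encoder.encode(line));
      controller.close();
    },
  });
}
```

Using JSON.stringify for each part keeps quotes and newlines in the text correctly escaped, which hand-written chunk strings tend to get wrong.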
Use MSW (Mock Service Worker) to intercept the chat API route and return simulated stream chunks:
import { render, screen, fireEvent, waitFor } from '@testing-library/react';
import { http, HttpResponse } from 'msw';
import { setupServer } from 'msw/node';
import ChatComponent from './ChatComponent';

// Response body streams carry bytes, so each chunk is encoded before enqueueing
const encoder = new TextEncoder();

const server = setupServer(
  http.post('/api/chat', () => {
    // Simulate Vercel AI SDK streaming format
    const stream = new ReadableStream({
      start(controller) {
        controller.enqueue(encoder.encode('0:"Hello, "\n'));
        controller.enqueue(encoder.encode('0:"how can I help you today?"\n'));
        controller.enqueue(encoder.encode('d:{"finishReason":"stop"}\n'));
        controller.close();
      },
    });
    return new HttpResponse(stream, {
      headers: { 'Content-Type': 'text/event-stream' },
    });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test('chat renders streaming response correctly', async () => {
  render(<ChatComponent />);
  fireEvent.change(screen.getByRole('textbox'), { target: { value: 'Hello' } });
  // getByRole('form') only matches a <form> with an accessible name (e.g. aria-label)
  fireEvent.submit(screen.getByRole('form'));

  // toBeInTheDocument comes from @testing-library/jest-dom, assumed to be loaded in test setup
  await waitFor(() => {
    expect(screen.getByText(/how can I help you today/)).toBeInTheDocument();
  });
});
Layer 5: Testing Error States
Test what your UI shows when the AI API fails. Most apps have a happy path test — almost none have error state tests.
test('shows error message when API returns 429', async () => {
  // Override the default handler for this test only
  server.use(
    http.post('/api/chat', () => {
      return HttpResponse.json(
        { error: 'Rate limit exceeded' },
        { status: 429 }
      );
    })
  );

  render(<ChatComponent />);
  fireEvent.change(screen.getByRole('textbox'), { target: { value: 'Hello' } });
  fireEvent.submit(screen.getByRole('form'));

  await waitFor(() => {
    expect(screen.getByRole('alert')).toHaveTextContent(/try again/i);
  });
});
Your users will hit rate limits, network timeouts, and model errors. Test what they see when that happens.
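The error copy itself is easier to test when it lives in a pure function instead of inside the component. A minimal sketch, assuming a hypothetical toUiError mapping; the messages and the retryable flag are illustrative choices, not SDK behavior:

```typescript
// Hypothetical mapping from HTTP status to user-facing copy; the wording and
// the retryable flag are assumptions for illustration.
type UiError = { message: string; retryable: boolean };

function toUiError(status: number): UiError {
  if (status === 429) {
    return { message: 'Too many requests. Please try again in a moment.', retryable: true };
  }
  if (status >= 500) {
    return { message: 'The assistant is unavailable. Please try again.', retryable: true };
  }
  return { message: 'Something went wrong with this request.', retryable: false };
}
```

A table of status codes against this function covers 429 and 500 in milliseconds, leaving the component test to verify only that the message reaches the alert element.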
What Code-Level Tests Miss
Unit and component tests verify your code logic. They don't verify behavior in production conditions.
Vercel AI SDK apps fail in production for reasons outside your test suite:
- Model provider changes — you switch from OpenAI to Anthropic, or OpenAI updates a model version. Tool call formats and output patterns shift.
- Prompt drift — your system prompt worked with last month's model. This month's model interprets one instruction differently, causing a 15% rate of wrong tool selections.
- Real user inputs — users phrase requests in ways your test fixtures don't cover. Edge cases you didn't imagine cause unexpected tool selections.
- Latency degradation — the model API slows down under load. Streaming stalls. Your UI timeout isn't set correctly and users see a hung interface.
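The latency case is the one you can partly guard against in code: fail fast when the gap between chunks gets too long instead of leaving the UI hanging. A minimal sketch of such a guard over any async iterable of chunks; withStallTimeout and maxGapMs are hypothetical names, and this is not how the SDK itself handles timeouts:

```typescript
// Hypothetical guard: re-yield chunks from a stream, but throw if the wait
// for the next chunk exceeds maxGapMs.
async function* withStallTimeout<T>(
  source: AsyncIterable<T>,
  maxGapMs: number,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const stalled = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`stream stalled for ${maxGapMs}ms`)),
        maxGapMs,
      );
    });
    // Whichever settles first wins; the timer is cleared either way so it
    // does not keep running while the consumer handles the chunk.
    const result = await Promise.race([iterator.next(), stalled]).finally(() =>
      clearTimeout(timer),
    );
    if (result.done) return;
    yield result.value;
  }
}
```

Note that this sketch only stops waiting; it does not cancel the underlying request, which would need an AbortController or similar wired in separately.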
Monitoring Vercel AI SDK Apps in Production
Once your app is live, you need ongoing behavioral tests that run against the real deployment.
HelpMeTest lets you write natural language behavioral tests against your deployed app and run them on a schedule:
Test: chat responds to product question with relevant answer
Go to https://yourapp.vercel.app
Click chat widget
Type "What are your pricing plans?"
Then: response mentions pricing within 10 seconds
And: response contains a price or plan name
And: no error message is shown
Tests run every hour. If your app's behavior shifts after a model update, a prompt change, or a dependency bump, you find out before your users do.
Free tier: 10 tests, unlimited health checks. Try HelpMeTest →
Vercel AI SDK Testing Checklist
Before shipping any Vercel AI SDK app:
- Unit tests for all generateText/generateObject functions using MockLanguageModelV1
- Schema validation tests: correct values, not just correct types
- Tool selection tests: verify the right tool is called for each input type
- Streaming UI tests with MSW interceptors
- Error state tests: 429, 500, network timeout
- Multi-step pipeline tests: does the full chain reach the right terminal state?
- Rate limit handling: does the UI recover gracefully?
- Production behavioral monitoring for drift after model or prompt changes
The streaming works in dev. The question is whether it works correctly in production, for the inputs your users actually send.