How to Test MCP Tool Implementations: Schema, Handlers, and Error Propagation

HelpMeTest

18 May 2026 — 6 min read

Your MCP tool works in Claude Desktop. You've called it manually a dozen times. It returns results. You're confident.

Then a user sends an unexpected input format, or calls two tools in sequence with shared state, or the downstream API your tool depends on returns a 503. And your tool either hangs, returns malformed output, or crashes the server process entirely.

The problem isn't that MCP tools are unreliable. The problem is that most developers test the happy path and ship. Testing MCP tool implementations properly — schemas, handlers, and error paths — is a different discipline than testing the function that backs the tool.

Here's how to do it systematically.

What "Testing an MCP Tool" Actually Means

An MCP tool has four distinct layers, each of which can fail independently:

1. The schema: the JSON Schema definition of inputs. Does it correctly describe what the handler expects?
2. The handler: the function that receives validated inputs and produces outputs. Is the logic correct?
3. The protocol wrapper: is the handler output serialized into a valid MCP CallToolResult? Is isError set correctly?
4. The MCP client interaction: does the tool behave correctly when called via the MCP protocol, including edge cases like missing params, extra params, or concurrent calls?

Most developers only test layer 2. All four need coverage.

Testing Input Schema Validation

Your tool's input schema is a contract. When Claude or any MCP client calls your tool, it constructs a call that matches what your schema advertises. If the schema is wrong — too permissive, too strict, or simply incorrect — callers get confused and your tool breaks.

Test 1: Required fields, types, and ranges.

import { describe, it, expect } from 'vitest';
import Ajv from 'ajv';

const schema = {
  type: 'object',
  properties: {
    query: { type: 'string' },
    limit: { type: 'number', minimum: 1, maximum: 100 }
  },
  required: ['query']
};

describe('search tool schema', () => {
  const ajv = new Ajv();
  const validate = ajv.compile(schema);

  it('accepts valid input', () => {
    expect(validate({ query: 'test', limit: 10 })).toBe(true);
  });

  it('accepts input without optional fields', () => {
    expect(validate({ query: 'test' })).toBe(true);
  });

  it('rejects missing required field', () => {
    expect(validate({ limit: 10 })).toBe(false);
    expect(validate.errors?.[0].params).toMatchObject({ missingProperty: 'query' });
  });

  it('rejects wrong type for limit', () => {
    expect(validate({ query: 'test', limit: 'ten' })).toBe(false);
  });

  it('rejects limit below minimum', () => {
    expect(validate({ query: 'test', limit: 0 })).toBe(false);
  });
});

The goal isn't just "does validation pass?" — it's "do the error messages make sense?" When an agent receives a validation error, it uses the error message to retry. Useless error messages cause agents to loop.
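One way to make validation failures actionable is to turn the validator's error objects into a sentence that names the offending parameter. The sketch below is a hypothetical helper, not part of Ajv or the MCP SDK. It assumes error objects shaped like the ones Ajv v8 reports (`instancePath`, `keyword`, `message`, `params`); older Ajv versions use different field names.

```typescript
// Minimal subset of the error shape Ajv v8 reports (assumed here, not imported).
interface ValidationIssue {
  instancePath: string;             // e.g. "/limit"; empty string for the root object
  keyword: string;                  // e.g. "required", "type", "minimum"
  message?: string;                 // e.g. "must be >= 1"
  params: Record<string, unknown>;  // e.g. { missingProperty: "query" }
}

// Hypothetical helper: builds one agent-readable sentence per validation issue.
function formatValidationErrors(issues: ValidationIssue[]): string {
  if (issues.length === 0) return 'Input is valid.';
  const lines = issues.map((issue) => {
    if (issue.keyword === 'required') {
      return `Missing required parameter "${String(issue.params.missingProperty)}".`;
    }
    // "/limit" -> "limit"; fall back to "input" for root-level problems.
    const field = issue.instancePath.replace(/^\//, '').replace(/\//g, '.') || 'input';
    return `Parameter "${field}" ${issue.message ?? 'is invalid'}.`;
  });
  return lines.join(' ');
}
```

A test can then assert that the message names the parameter the agent has to fix, rather than only checking that validation returned false.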

Test 2: Edge cases in string fields.

import { searchHandler } from '../tools/search';

it('handles empty string query', async () => {
  const result = await searchHandler({ query: '', limit: 10 });
  // Empty string is valid per schema, but what does the handler do?
  // It should either return empty results or return an error, not throw.
  expect(Array.isArray(result.content)).toBe(true);
  if (result.isError) {
    // An error result should tell the agent what went wrong.
    expect(result.content[0].text.length).toBeGreaterThan(0);
  }
});

it('handles very long query string', async () => {
  const longQuery = 'a'.repeat(10000);
  const result = await searchHandler({ query: longQuery, limit: 10 });
  // Should not crash the server: the call resolves with a well-formed result.
  expect(Array.isArray(result.content)).toBe(true);
});

The schema may accept these — your handler needs to handle them gracefully.
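As a rough illustration of handling such inputs gracefully, a handler can run a small guard before it touches any downstream API. Everything in this sketch is an assumption made for the example: the `checkQuery` name, the `ToolResult` stand-in type, and the 1000-character cap.

```typescript
// Local stand-in for the MCP tool result shape used throughout this article.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Assumed limit for illustration; pick a value that suits the downstream API.
const MAX_QUERY_LENGTH = 1000;

// Hypothetical guard: returns an error result for unusable queries, or null
// when the query is fine and the handler should carry on.
function checkQuery(query: string, maxLength: number = MAX_QUERY_LENGTH): ToolResult | null {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    return {
      content: [{ type: 'text', text: 'Parameter "query" must not be empty.' }],
      isError: true,
    };
  }
  if (trimmed.length > maxLength) {
    return {
      content: [{
        type: 'text',
        text: `Parameter "query" must be at most ${maxLength} characters (got ${trimmed.length}).`,
      }],
      isError: true,
    };
  }
  return null;
}
```

Whether an empty query should be an error or an empty result set is a design decision; the point is that the choice is explicit and covered by a test.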

Testing Handler Correctness

Handler testing is unit testing with one extra constraint: the output must be a valid MCP tool result, not just any return value.

A valid MCP CallToolResult looks like:

{
  content: [
    { type: 'text', text: 'result string' }
  ],
  isError: false
}
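Because every handler test ends up checking this shape, it can help to pull the check into one reusable function. The following is a hypothetical test helper, not an SDK function, and it covers only what this article relies on: a content array of typed items, string text on text items, and an optional boolean isError.

```typescript
// Hypothetical structural check for a tool result.
function isValidToolResult(value: unknown): boolean {
  if (typeof value !== 'object' || value === null) return false;
  const result = value as { content?: unknown; isError?: unknown };
  if (!Array.isArray(result.content)) return false;
  // isError may be omitted, but if present it must be a boolean.
  if (result.isError !== undefined && typeof result.isError !== 'boolean') return false;
  return result.content.every((item: unknown) => {
    if (typeof item !== 'object' || item === null) return false;
    const entry = item as { type?: unknown; text?: unknown };
    if (typeof entry.type !== 'string') return false;
    // Text items must carry a string payload; other item types are not inspected here.
    return entry.type !== 'text' || typeof entry.text === 'string';
  });
}
```

In a vitest suite this would typically be used as `expect(isValidToolResult(result)).toBe(true)` alongside the more specific assertions.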

Test your handler in isolation:

import { searchHandler } from '../tools/search';

describe('search handler', () => {
  it('returns text content for valid query', async () => {
    const result = await searchHandler({ query: 'Node.js testing', limit: 5 });
    
    expect(result.isError).toBe(false);
    expect(result.content).toHaveLength(1);
    expect(result.content[0].type).toBe('text');
    expect(typeof result.content[0].text).toBe('string');
    expect(result.content[0].text.length).toBeGreaterThan(0);
  });

  it('returns structured results matching the limit', async () => {
    const result = await searchHandler({ query: 'testing', limit: 3 });
    const parsed = JSON.parse(result.content[0].text);
    expect(parsed.results).toHaveLength(3);
  });
});

Test with mocked dependencies:

If your handler calls an external API, mock it. This keeps tests fast and deterministic.

import { vi, describe, it, expect, beforeEach } from 'vitest';
import { searchHandler } from '../tools/search';
import * as apiModule from '../lib/search-api';

describe('search handler with mocked API', () => {
  beforeEach(() => {
    vi.spyOn(apiModule, 'fetchSearchResults').mockResolvedValue({
      results: [
        { title: 'Result 1', url: 'https://example.com/1' },
        { title: 'Result 2', url: 'https://example.com/2' }
      ]
    });
  });

  it('calls the API with correct parameters', async () => {
    await searchHandler({ query: 'testing frameworks', limit: 10 });
    expect(apiModule.fetchSearchResults).toHaveBeenCalledWith('testing frameworks', 10);
  });

  it('formats API response into MCP content', async () => {
    const result = await searchHandler({ query: 'test', limit: 2 });
    const text = result.content[0].text;
    expect(text).toContain('Result 1');
    expect(text).toContain('https://example.com/1');
  });
});

Testing Error Propagation

Error handling is where most MCP tools fail. The MCP spec provides isError: true precisely for this, but many implementations either throw unhandled exceptions or swallow errors silently.

Test 1: Downstream API failures.

describe('error propagation', () => {
  it('returns isError when API call fails', async () => {
    vi.spyOn(apiModule, 'fetchSearchResults').mockRejectedValue(
      new Error('API rate limit exceeded')
    );

    const result = await searchHandler({ query: 'test', limit: 5 });
    
    expect(result.isError).toBe(true);
    expect(result.content[0].type).toBe('text');
    expect(result.content[0].text).toContain('rate limit');
    // The error message should be human-readable for the agent
  });

  it('returns isError on API timeout', async () => {
    vi.spyOn(apiModule, 'fetchSearchResults').mockImplementation(
      () => new Promise((_, reject) => setTimeout(() => reject(new Error('Timeout')), 100))
    );

    const result = await searchHandler({ query: 'test', limit: 5 });
    expect(result.isError).toBe(true);
  });

  it('never throws — always returns a result object', async () => {
    vi.spyOn(apiModule, 'fetchSearchResults').mockRejectedValue(
      new Error('Unexpected fatal error')
    );

    // This should not throw — it should return isError: true
    await expect(searchHandler({ query: 'test', limit: 5 })).resolves.toBeDefined();
  });
});

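The critical invariant: handlers must never throw. Depending on the SDK and how the server is wired, a thrown exception may surface as a JSON-RPC protocol error, be wrapped in a generic error result, or take down the server process. In each case you lose control over the message the agent sees, which makes recovery harder.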
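One way to enforce this invariant is to wrap each handler once instead of repeating try/catch blocks. The sketch below is an assumption about how such a wrapper could look; `withErrorResult`, `toErrorResult`, and the `ToolResult` stand-in type are hypothetical names, not SDK exports.

```typescript
// Local stand-in for the MCP tool result shape used throughout this article.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Converts any thrown value into an error result the agent can read.
function toErrorResult(error: unknown): ToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return { content: [{ type: 'text', text: `Tool failed: ${message}` }], isError: true };
}

// Hypothetical wrapper: the returned handler always resolves with a result,
// even when the wrapped handler throws synchronously or rejects.
function withErrorResult<Args>(
  handler: (args: Args) => Promise<ToolResult>,
): (args: Args) => Promise<ToolResult> {
  return async (args: Args) => {
    try {
      return await handler(args);
    } catch (error) {
      return toErrorResult(error);
    }
  };
}
```

Passing the raw error message through keeps the example short. A production handler may want to map internal errors to safer, more specific messages before returning them to the agent.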

Test 2: Partial failure scenarios.

it('handles partial API results gracefully', async () => {
  vi.spyOn(apiModule, 'fetchSearchResults').mockResolvedValue({
    results: null, // Unexpected null instead of array
    error: 'partial result'
  });

  const result = await searchHandler({ query: 'test', limit: 5 });
  // Should handle null results without crashing
  expect(result).toBeDefined();
  expect(result.content[0]).toBeDefined();
});
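A handler passes this kind of test only if it checks the response shape before using it. Here is a minimal sketch of such a check, assuming the response shape used in the mocks above (`results` with `title` and `url`, plus an optional `error` string); `formatSearchResponse` is a hypothetical name.

```typescript
// Shape the search API is assumed to return; `results` may be missing or null.
interface SearchResponse {
  results?: Array<{ title: string; url: string }> | null;
  error?: string;
}

// Local stand-in for the MCP tool result shape used throughout this article.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

// Hypothetical formatter: never dereferences `results` without checking it first.
function formatSearchResponse(response: SearchResponse): ToolResult {
  if (!Array.isArray(response.results)) {
    const reason = response.error ?? 'the search API returned no result list';
    return { content: [{ type: 'text', text: `Search failed: ${reason}` }], isError: true };
  }
  const text = JSON.stringify({ results: response.results });
  return { content: [{ type: 'text', text }], isError: false };
}
```

Treating a null result list as an error is one option. Returning an empty result set with a warning is another, as long as the test pins down which behavior is intended.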

Mocking the MCP Client

When testing tool behavior at the protocol level, without a real Claude or Cursor client, you need a stand-in client. The @modelcontextprotocol/sdk provides a Client class you can use in tests, paired with an in-memory transport.

import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { createServer } from '../server';

describe('MCP tool via protocol', () => {
  let client: Client;
  let cleanup: () => Promise<void>;

  beforeEach(async () => {
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    
    const server = createServer();
    await server.connect(serverTransport);

    client = new Client({ name: 'test-client', version: '1.0' });
    await client.connect(clientTransport);

    cleanup = async () => {
      await client.close();
      await server.close();
    };
  });

  afterEach(async () => {
    await cleanup();
  });

  it('lists available tools', async () => {
    const { tools } = await client.listTools();
    const toolNames = tools.map(t => t.name);
    expect(toolNames).toContain('search');
  });

  it('calls search tool via protocol', async () => {
    const result = await client.callTool({
      name: 'search',
      arguments: { query: 'MCP testing', limit: 5 }
    });

    expect(result.isError).toBeFalsy(); // isError is optional and may be omitted on success
    expect(result.content).toHaveLength(1);
  });

  it('handles protocol-level missing required params', async () => {
    const result = await client.callTool({
      name: 'search',
      arguments: { limit: 5 } // Missing required 'query'
    });

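    // Server should return an error result, not crash.
    // Depending on the SDK version and server setup, invalid arguments may instead
    // reject the call with a protocol error; adjust the assertion to match.
    expect(result.isError).toBe(true);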
  });
});

The InMemoryTransport is the key: it gives you a real MCP protocol connection without needing stdio or network, making tests fast and reproducible.
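For orientation, the messages that cross the transport are JSON-RPC 2.0 objects. The sketch below builds the request a client sends for a tool call. `buildToolCallRequest` is a hypothetical helper for illustration only, since the SDK's Client constructs these messages for you.

```typescript
// JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical builder showing the wire shape of a tool call.
function buildToolCallRequest(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

const request = buildToolCallRequest(1, 'search', { query: 'MCP testing', limit: 5 });
console.log(JSON.stringify(request));
```

The matching response carries the same `id` and a `result` object with the `content` and `isError` fields described earlier.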

Testing Multiple Tools Together

Real-world MCP servers have multiple tools that may share state (database connections, rate limiters, caches). Test interactions between them.

// Assumes the same client/server setup (beforeEach/afterEach) as the protocol tests above.
describe('multi-tool interactions', () => {
  it('cache is shared across tool calls', async () => {
    // First call populates cache
    const result1 = await client.callTool({
      name: 'search',
      arguments: { query: 'shared state test', limit: 5 }
    });

    // Second call should hit cache (faster, same result)
    const start = Date.now();
    const result2 = await client.callTool({
      name: 'search',
      arguments: { query: 'shared state test', limit: 5 }
    });
    const elapsed = Date.now() - start;

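    // Cache hit should be fast. Wall-clock thresholds can be flaky on loaded CI runners,
    // so treat 50 ms as a tunable value rather than a guarantee.
    expect(elapsed).toBeLessThan(50);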
    expect(result1.content[0].text).toBe(result2.content[0].text);
  });

  it('concurrent tool calls do not corrupt shared state', async () => {
    const calls = Array.from({ length: 10 }, (_, i) =>
      client.callTool({
        name: 'search',
        arguments: { query: `query-${i}`, limit: 1 }
      })
    );

    const results = await Promise.all(calls);
    results.forEach(r => expect(r.isError).toBeFalsy());
  });
});
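A wall-clock threshold can misfire on a busy CI machine. An alternative is to count how often the underlying lookup actually runs. The sketch below is a hypothetical cache wrapper, not the cache of any particular server. Because it stores whatever the lookup returns, it also deduplicates concurrent calls when that value is a promise.

```typescript
// Hypothetical cache wrapper that records how many times the wrapped lookup ran.
function createCountingCache<V>(lookup: (key: string) => V) {
  const cache = new Map<string, V>();
  let missCount = 0;
  return {
    read(key: string): V {
      if (cache.has(key)) return cache.get(key) as V;
      missCount += 1;
      // The value may be a promise; it is cached as-is, so a rejected promise
      // would be cached too. A real cache would evict entries on failure.
      const value = lookup(key);
      cache.set(key, value);
      return value;
    },
    // Number of times `lookup` was actually invoked.
    get misses(): number {
      return missCount;
    },
  };
}
```

With a wrapper like this, the cache test can assert that `misses` is 1 after two identical calls, which holds regardless of how slow the machine is.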

Running MCP Tool Tests in CI

MCP tool tests run as regular Node.js unit tests — no special infrastructure needed.

# .github/workflows/test.yml
- name: Run MCP tool tests
  run: npm test
  env:
    NODE_ENV: test
    # Mock API keys for tests
    SEARCH_API_KEY: test-key

Keep a separation between unit tests (pure handler logic, no server) and integration tests (InMemoryTransport, full protocol). Unit tests run on every commit; integration tests run on PRs.

What to Test Versus What to Skip

Test:

Every required parameter is actually required
Every optional parameter has a sane default behavior
Error paths return isError: true with a useful message
Handlers never throw unhandled exceptions
Output is valid MCP content structure

Skip:

Testing the MCP SDK itself (assume it works)
Testing that Claude parses your tool description correctly (you can't control that)
Exact string matching on tool descriptions (they change)
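For the optional-parameter item in the checklist, one approach is to resolve defaults in a single function and test that function directly. The default of 10 below is an assumption for illustration; the schema earlier in this article only constrains limit to the range 1 to 100. `resolveSearchArgs` is a hypothetical name.

```typescript
interface SearchArgs {
  query: string;
  limit?: number;
}

// Assumed default; the schema only constrains limit to the range 1..100.
const DEFAULT_LIMIT = 10;

// Hypothetical helper: fills in optional parameters so the rest of the
// handler never sees `undefined`.
function resolveSearchArgs(args: SearchArgs): Required<SearchArgs> {
  return { query: args.query, limit: args.limit ?? DEFAULT_LIMIT };
}
```

Using `??` rather than `||` means only a missing limit triggers the default, so an out-of-range value such as 0 is left for schema validation to reject instead of being silently replaced.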

Using HelpMeTest for MCP Tool Tests

If your MCP server is deployed and accessible, you can use HelpMeTest to run tool tests on a schedule and get alerted when tool behavior drifts. Define a test scenario that calls your tool via a real MCP client and validates the output — then run it every 15 minutes against your production server.

This catches regressions that unit tests can't: API dependency failures, schema drift after deployment, rate limit exhaustion. The test runs whether you're watching or not.

MCP tools are contracts. Test them like contracts.
