MCP Server Testing Patterns: From Unit to End-to-End

Basic MCP server tests cover tool invocation and resource reading. Advanced patterns go further: property-based testing to discover unexpected inputs, fuzz testing for protocol robustness, multi-client concurrency testing, transport-layer validation, and performance benchmarking. This guide covers those advanced patterns.

A previous guide covered the basics of testing MCP servers: unit testing tools, resources, and prompts with in-memory transports. The patterns here build on that foundation and target the bugs those basic tests miss.

Property-Based Testing for Tool Schemas

Unit tests with hand-picked examples miss entire classes of inputs. Property-based testing generates thousands of inputs automatically and verifies that your tool satisfies declared properties for all of them.

Using Hypothesis (Python):

from hypothesis import HealthCheck, given, settings, strategies as st
from hypothesis.strategies import SearchStrategy
import pytest

# mcp_client is assumed to be a pytest fixture wrapping your server. Hypothesis
# reuses a function-scoped fixture across generated examples and flags that with
# a health check, so either make the fixture session-scoped or suppress the check.
FIXTURE_OK = [HealthCheck.function_scoped_fixture]

def valid_path_strategy() -> SearchStrategy[str]:
    """Generate valid-looking file system paths."""
    components = st.lists(
        # Letters and digits only. Newer Hypothesis releases spell this parameter `categories`.
        st.text(alphabet=st.characters(whitelist_categories=("Lu", "Ll", "Nd")), min_size=1, max_size=20),
        min_size=1, max_size=5
    )
    return components.map(lambda parts: "/" + "/".join(parts))

@settings(max_examples=200, deadline=5000, suppress_health_check=FIXTURE_OK)
@given(path=valid_path_strategy())
def test_list_files_never_crashes(mcp_client, path):
    """list_files must never raise an exception, only return error content."""
    result = mcp_client.call_tool_sync("list_files", {"path": path})
    
    # Tool must always return a result, never raise
    assert result is not None
    assert hasattr(result, "content")
    assert hasattr(result, "isError")
    assert isinstance(result.isError, bool)
    
    # Content must always be a non-empty list
    assert isinstance(result.content, list)
    assert len(result.content) >= 1
    
    # Each content item must declare a known type. Attribute access matches the
    # result.content / result.isError access above.
    for item in result.content:
        assert getattr(item, "type", None) in ("text", "image", "resource")

@settings(max_examples=100, suppress_health_check=FIXTURE_OK)
@given(query=st.text(min_size=0, max_size=500))
def test_search_handles_all_text_inputs(mcp_client, query):
    """Search tool must handle any text input without crashing."""
    result = mcp_client.call_tool_sync("search", {"query": query})
    assert result is not None
    # Contract assumed here: empty or whitespace-only queries return an error
    # result. Every other query returns a content list.
    if not query.strip():
        assert result.isError
    else:
        assert isinstance(result.content, list)

Property-based tests catch the off-by-one errors, Unicode edge cases, and empty input failures that manual test case selection misses.
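To show the underlying idea without any dependency, here is a minimal stdlib-only sketch. It swaps Hypothesis for a seeded `random` generator, so it has no shrinking or example database, and it runs against a hypothetical `list_files_handler` stand-in rather than a real MCP tool. The invariant is the same as in the tests above: every input yields a well-formed result, never an exception.

```python
import random
import string


def list_files_handler(path: str) -> dict:
    """Hypothetical tool handler: every failure becomes an error result."""
    try:
        if not path or not path.startswith("/"):
            raise ValueError("path must be absolute")
        if "\x00" in path:
            raise ValueError("path contains a NUL byte")
        # A real handler would read the file system here.
        return {"isError": False, "content": [{"type": "text", "text": f"listed {path}"}]}
    except Exception as exc:
        return {"isError": True, "content": [{"type": "text", "text": str(exc)}]}


def random_path(rng: random.Random) -> str:
    """Short strings mixing ASCII, separators, a NUL byte, and non-ASCII text."""
    alphabet = string.ascii_letters + string.digits + "/. \x00" + "éß漢字🙂"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 30)))


def check_result_shape(result: dict) -> None:
    """The property: results are always structurally valid."""
    assert isinstance(result["isError"], bool)
    assert isinstance(result["content"], list) and len(result["content"]) >= 1
    for item in result["content"]:
        assert item["type"] in ("text", "image", "resource")


def run_property(seed: int = 0, examples: int = 500) -> int:
    """Check the property on `examples` generated inputs; return how many ran."""
    rng = random.Random(seed)
    for _ in range(examples):
        check_result_shape(list_files_handler(random_path(rng)))
    return examples
```

A fixed seed keeps failures reproducible, which is the part of Hypothesis's behavior this sketch imitates. Shrinking a failing input to a minimal case is the part it does not.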

Fuzz Testing for Protocol Robustness

Your MCP server must handle malformed JSON-RPC messages gracefully. Fuzz testing sends malformed requests and verifies the server doesn't crash:

// fuzz/mcp-protocol-fuzz.ts
import { spawn, ChildProcess } from "child_process";
import { randomBytes } from "crypto";

function generateMalformedRequest(): string {
  const mutations = [
    // Missing required fields
    () => JSON.stringify({ jsonrpc: "2.0", id: 1 }), // Missing method
    () => JSON.stringify({ jsonrpc: "2.0", method: "tools/call" }), // No id: JSON-RPC treats this as a notification, which must get no response
    
    // Wrong types
    () => JSON.stringify({ jsonrpc: "2.0", id: { nested: true }, method: "tools/list" }), // id must be a string or number
    () => JSON.stringify({ jsonrpc: "2.0", id: 1, method: 42 }), // method should be string
    
    // Invalid JSON, or valid JSON that is not a request object
    () => `{broken json: ${randomBytes(20).toString("hex")}`,
    () => randomBytes(100).toString("utf8"),
    () => "",
    () => "null",
    
    // Oversized requests
    () => JSON.stringify({ 
      jsonrpc: "2.0", id: 1, method: "tools/call",
      params: { name: "a".repeat(100000) }
    }),
    
    // SQL/command injection in tool arguments
    () => JSON.stringify({
      jsonrpc: "2.0", id: 1, method: "tools/call",
      params: { name: "list_files", arguments: { path: "; rm -rf /" } }
    }),
    () => JSON.stringify({
      jsonrpc: "2.0", id: 1, method: "tools/call",
      params: { name: "query_db", arguments: { sql: "'; DROP TABLE users; --" } }
    }),
  ];
  
  return mutations[Math.floor(Math.random() * mutations.length)]();
}

async function fuzzServer(serverCommand: string[], iterations: number): Promise<void> {
  let crashCount = 0;
  
  for (let i = 0; i < iterations; i++) {
    const server: ChildProcess = spawn(serverCommand[0], serverCommand.slice(1), {
      stdio: ["pipe", "pipe", "pipe"]
    });
    
    const malformed = generateMalformedRequest();
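    const crashed = await new Promise<boolean>((resolve) => {
      // No exit within 2s means the server is still running, which counts as survival
      let timedOut = false;
      const timeout = setTimeout(() => { timedOut = true; resolve(false); }, 2000);
      
      server.on("exit", (code, signal) => {
        if (timedOut) return; // exit caused by our own server.kill() after the timeout
        clearTimeout(timeout);
        // stdin is closed below, so a healthy stdio server is expected to exit with code 0.
        // Any signal or non-zero exit code (Node exits with 1 on an uncaught exception)
        // is treated as a crash.
        if (signal !== null || code !== 0) {
          console.error(`CRASH (code=${code}, signal=${signal}) on input: ${malformed.slice(0, 100)}`);
          resolve(true);
        } else {
          resolve(false);
        }
      });
      
      // Ignore EPIPE if the server dies before reading its input
      server.stdin!.on("error", () => {});
      server.stdin!.write(malformed + "\n");
      server.stdin!.end();
    });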
    
    if (crashed) crashCount++;
    server.kill();
  }
  
  if (crashCount > 0) {
    throw new Error(`Server crashed ${crashCount}/${iterations} times during fuzz testing`);
  }
  
  console.log(`✓ Fuzz testing passed: 0 crashes in ${iterations} iterations`);
}

// Run it
fuzzServer(["node", "dist/server.js"], 500).catch(err => {
  console.error(err);
  process.exit(1);
});

Fuzz testing is particularly important for MCP servers exposed to untrusted AI agent inputs — an AI can be prompted to construct unusual tool arguments.
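Surviving malformed input usually means answering it with a structured JSON-RPC error instead of an exception. The sketch below is one way a server's input layer could classify a line, using the standard JSON-RPC 2.0 codes -32700 (Parse error) and -32600 (Invalid Request). The function names are illustrative, not part of any SDK, and the id rule (string or integer) follows MCP's restriction on request ids.

```python
import json

PARSE_ERROR = -32700      # JSON-RPC 2.0: invalid JSON was received
INVALID_REQUEST = -32600  # JSON-RPC 2.0: the JSON is not a valid request object


def error_response(code, message, request_id=None):
    """Build a JSON-RPC error object; id stays None when it cannot be determined."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def is_valid_id(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (str, int)) and not isinstance(value, bool)


def validate_line(line):
    """Return (message, None) for a well-formed message, else (None, error)."""
    try:
        msg = json.loads(line)
    except (ValueError, RecursionError):
        # ValueError covers json.JSONDecodeError; RecursionError covers absurd nesting
        return None, error_response(PARSE_ERROR, "Parse error")
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        return None, error_response(INVALID_REQUEST, "Invalid Request")
    if "id" in msg and not is_valid_id(msg["id"]):
        return None, error_response(INVALID_REQUEST, "Invalid Request")
    if not isinstance(msg.get("method"), str):
        return None, error_response(INVALID_REQUEST, "Invalid Request", msg.get("id"))
    return msg, None
```

A fuzz harness can then assert on the error code for each mutation class, which is a stronger check than "the process is still alive".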

Multi-Client Concurrency Testing

Multiple AI agents may connect to your MCP server simultaneously. Test that concurrent access is safe:

// tests/concurrency.test.ts
import { createTestServer } from "./helpers";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";

async function createClientPair(serverFactory: () => any) {
  const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
  const client = new Client({ name: "test-client", version: "1.0.0" }, { capabilities: {} });
  const server = serverFactory();
  await server.connect(serverTransport);
  await client.connect(clientTransport);
  return client;
}

describe("Concurrent access", () => {
  it("handles 10 simultaneous tool calls without data corruption", async () => {
    const serverFactory = createTestServer;
    
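    // Create 10 clients. createClientPair builds one server instance per client,
    // so this test assumes createTestServer instances share one backing store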
    const clients = await Promise.all(
      Array.from({ length: 10 }, () => createClientPair(serverFactory))
    );
    
    // Each client creates a record concurrently
    const results = await Promise.all(
      clients.map((client, i) =>
        client.callTool({
          name: "create_record",
          arguments: { name: `Record-${i}`, value: i * 100 }
        })
      )
    );
    
    // All must succeed
    expect(results.every(r => !r.isError)).toBe(true);
    
    // All IDs must be unique (no data corruption)
    const ids = results.map(r => JSON.parse(r.content[0].text).id);
    const uniqueIds = new Set(ids);
    expect(uniqueIds.size).toBe(10);
    
    await Promise.all(clients.map(c => c.close()));
  });

  it("isolates sessions — one client cannot see another's state", async () => {
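    // Each pair gets its own server instance, so this only exercises isolation
    // if createTestServer keeps session state in storage shared across instances
    const [client1, client2] = await Promise.all([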
      createClientPair(createTestServer),
      createClientPair(createTestServer)
    ]);
    
    // Client 1 stores a secret value
    await client1.callTool({
      name: "set_session_value",
      arguments: { key: "secret", value: "client1-private-data" }
    });
    
    // Client 2 tries to read client 1's session data
    const result = await client2.callTool({
      name: "get_session_value",
      arguments: { key: "secret" }
    });
    
    // Must not find client 1's data
    const value = JSON.parse(result.content[0].text).value;
    expect(value).toBeNull();
    
    await client1.close();
    await client2.close();
  });
});

Session isolation is critical for multi-tenant MCP servers — an AI agent for user A must never access user B's data.
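The two properties tested above, unique IDs under concurrent writes and no cross-session reads, come down to how shared state is guarded and keyed. The following is a minimal sketch of that server-side design, not the SDK's session mechanism. The `RecordStore` and `SessionState` names are hypothetical, and threads stand in for concurrent clients.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RecordStore:
    """Shared store: a lock makes ID allocation atomic across clients."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 1
        self._records = {}

    def create(self, name, value):
        with self._lock:
            record_id = self._next_id
            self._next_id += 1
            self._records[record_id] = {"name": name, "value": value}
        return record_id


class SessionState:
    """Per-session key/value data, always looked up by session id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def set(self, session_id, key, value):
        with self._lock:
            self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id, key):
        with self._lock:
            return self._data.get(session_id, {}).get(key)


def create_concurrently(store, count, workers=10):
    """Create `count` records from `workers` threads and return their IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: store.create(f"Record-{i}", i * 100), range(count)))
```

Because every read goes through `get(session_id, key)`, a handler cannot reach another session's data unless it is handed the wrong session id. That narrows the isolation test to a single question: is the id derived from the connection rather than from tool arguments?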

Transport Layer Testing

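MCP defines two standard transports: stdio and an HTTP-based transport. The examples below use the HTTP+SSE client, which spec revision 2025-03-26 superseded with Streamable HTTP, although SSE endpoints remain common on existing servers. Test every transport your server exposes: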

// tests/transport.test.ts
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

describe("Stdio transport", () => {
  it("initializes and lists tools via stdio", async () => {
    // StdioClientTransport spawns the server process itself from command + args
    const transport = new StdioClientTransport({
      command: "node",
      args: ["dist/server.js"]
    });
    
    const client = new Client({ name: "test", version: "1.0.0" }, { capabilities: {} });
    await client.connect(transport);
    
    const { tools } = await client.listTools();
    expect(tools.length).toBeGreaterThan(0);
    
    // Closing the client closes the transport, which stops the child process
    await client.close();
  });
  
  it("detects a killed server instead of hanging", async () => {
    const transport = new StdioClientTransport({
      command: "node",
      args: ["dist/server.js"]
    });
    const client = new Client({ name: "test", version: "1.0.0" }, { capabilities: {} });
    await client.connect(transport);
    
    // Kill the server mid-session. This assumes an SDK version that exposes
    // the child's pid on the transport.
    const pid = transport.pid;
    expect(pid).not.toBeNull();
    process.kill(pid!, "SIGTERM");
    
    // Client should detect the disconnect, not hang
    await expect(client.listTools()).rejects.toThrow();
    
    await client.close();
  });
});

describe("SSE transport", () => {
  it("lists tools via SSE transport", async () => {
    // Server must be running at localhost:3001 for this test
    const transport = new SSEClientTransport(new URL("http://localhost:3001/sse"));
    const client = new Client({ name: "test", version: "1.0.0" }, { capabilities: {} });
    
    await client.connect(transport);
    const { tools } = await client.listTools();
    
    expect(tools.length).toBeGreaterThan(0);
    await client.close();
  }, 10000);
});

Testing both transports catches transport-specific bugs. Some tools work perfectly over stdio but fail over SSE due to encoding issues or buffering differences.
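Buffering bugs on stdio often come from treating each read as one message. The stdio transport delimits messages with newlines, but a read can return half a message, several messages, or a chunk that ends in the middle of a multi-byte UTF-8 character. The `LineFramer` class below is an illustrative sketch of a reader that tolerates all three cases by buffering bytes and decoding only complete lines. It is not taken from any SDK.

```python
import json


class LineFramer:
    """Accumulate raw bytes and return complete newline-delimited JSON messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add a chunk; return every message completed by it (possibly none)."""
        self._buffer.extend(chunk)
        messages = []
        while True:
            # Byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
            # so splitting on it before decoding is safe.
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break
            line = bytes(self._buffer[:newline])
            del self._buffer[:newline + 1]
            if line.strip():
                messages.append(json.loads(line.decode("utf-8")))
        return messages


def encode_message(message: dict) -> bytes:
    """Serialize one message. json.dumps escapes newlines inside string values."""
    return json.dumps(message, ensure_ascii=False).encode("utf-8") + b"\n"
```

Feeding the framer one byte at a time is a cheap transport test: it forces every possible split point, including splits inside non-ASCII characters.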

Performance Benchmarking

AI agents may call your tools hundreds of times in a single session. Benchmark performance under load:

// benchmarks/tool-throughput.ts
import { createTestClient } from "../tests/helpers";

async function benchmark(
  name: string,
  fn: () => Promise<unknown>, // client calls resolve to result objects, not void
  iterations: number
): Promise<void> {
  const times: number[] = [];
  
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await fn();
    times.push(performance.now() - start);
  }
  
  times.sort((a, b) => a - b);
  const avg = times.reduce((s, t) => s + t, 0) / times.length;
  const p50 = times[Math.floor(times.length * 0.50)];
  const p95 = times[Math.floor(times.length * 0.95)];
  const p99 = times[Math.floor(times.length * 0.99)];
  
  console.log(`${name}:`);
  console.log(`  avg=${avg.toFixed(1)}ms p50=${p50.toFixed(1)}ms p95=${p95.toFixed(1)}ms p99=${p99.toFixed(1)}ms`);
}

async function main() {
  const client = await createTestClient();
  
  await benchmark(
    "list_tools",
    () => client.listTools(),
    1000
  );
  
  await benchmark(
    "read_resource (small file, 1KB)",
    () => client.readResource({ uri: "files:///tmp/small.txt" }),
    200
  );
  
  await benchmark(
    "tools/call (fast tool, no I/O)",
    () => client.callTool({ name: "calculate", arguments: { expression: "2 + 2" } }),
    500
  );
  
  await benchmark(
    "tools/call (database query)",
    () => client.callTool({ name: "query_users", arguments: { limit: 10 } }),
    100
  );
  
  await client.close();
}

main().catch(console.error);
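The percentile lookup in `benchmark` indexes at `floor(n * fraction)`, which for 100 samples reports the 96th-smallest value as p95. That is a reasonable convention, but it helps to pick one definition and state it. The sketch below shows the nearest-rank definition as an alternative, using integer arithmetic so float rounding cannot shift the rank. The function name and signature are illustrative.

```python
def percentile(sorted_times, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 0..100.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not sorted_times:
        raise ValueError("no samples")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    n = len(sorted_times)
    rank = max(1, (pct * n + 99) // 100)  # ceil(pct * n / 100) without floats
    return sorted_times[rank - 1]
```

With few iterations, high percentiles collapse onto the maximum under either definition, so p99 over 100 samples is effectively the second-worst or worst run.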

Run benchmarks as part of your performance regression testing:

# .github/workflows/performance.yml
- name: Run MCP benchmarks
  run: node dist/benchmarks/tool-throughput.js | tee benchmark-results.txt

- name: Check performance thresholds
  run: |
    # Fail if list_tools p95 exceeds 50ms (stats are printed on the line after the name)
    P95=$(grep -A1 "^list_tools:" benchmark-results.txt | grep -oP 'p95=\K[\d.]+')
    if (( $(echo "$P95 > 50" | bc -l) )); then
      echo "FAIL: list_tools p95=${P95}ms exceeds 50ms threshold"
      exit 1
    fi
    echo "✓ list_tools p95=${P95}ms"
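Once more than one benchmark needs a threshold, a small parser may be easier to maintain than chained `grep` calls. The sketch below assumes the two-line output format printed by `benchmark` above: a name line ending in a colon, then an indented stats line. The function names and the budget dictionary are hypothetical.

```python
import re

P95_PATTERN = re.compile(r"p95=([\d.]+)ms")


def parse_p95(report: str) -> dict:
    """Map each benchmark name to its p95 latency in milliseconds."""
    results = {}
    current = None
    for line in report.splitlines():
        if line.endswith(":") and not line.startswith(" "):
            current = line[:-1]  # name line, e.g. "list_tools:"
        elif current is not None:
            match = P95_PATTERN.search(line)
            if match:
                results[current] = float(match.group(1))
                current = None
    return results


def over_budget(results: dict, budgets: dict) -> dict:
    """Return the benchmarks whose p95 exceeds its budget."""
    return {name: value for name, value in results.items()
            if name in budgets and value > budgets[name]}
```

A CI step could read `benchmark-results.txt`, call `over_budget` with per-tool budgets, and exit non-zero if the returned dictionary is not empty.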

Testing with HelpMeTest's MCP Server

HelpMeTest itself exposes an MCP server that you can integrate into your development workflow. Install it with:

curl -fsSL https://helpmetest.com/install | bash
helpmetest install mcp --claude HELP-your-token-here

This gives Claude Code tools to run tests, create test scenarios, and monitor health checks directly from your coding session. You can write tests for your application using natural language in Claude, and HelpMeTest executes them with Robot Framework and Playwright.

To validate your own MCP server using HelpMeTest's E2E capabilities:

*** Test Cases ***
MCP Server Health Endpoint Responds
    # Use HelpMeTest to check your MCP server's HTTP health endpoint
    ${response}=  Go To  http://localhost:3001/health
    Wait For Elements State  body  visible  timeout=5s
    Get Text  body  contains  ok

Testing Checklist: Advanced MCP Patterns

  • Property-based tests cover tools with text/path/ID inputs
  • Fuzz tests verify server doesn't crash on malformed JSON-RPC
  • Fuzz tests verify injection-safe handling of tool arguments
  • Concurrent client tests verify no data corruption
  • Session isolation tests verify no data leakage between clients
  • Stdio transport tests pass
  • SSE transport tests pass (if applicable)
  • Performance benchmarks established for all tools
  • Performance regression gates in CI
  • Server restart/disconnect handled gracefully by clients

Conclusion

Advanced MCP server testing goes beyond basic tool invocation. Property-based testing discovers input classes that manual examples miss. Fuzz testing builds protocol robustness. Concurrency tests validate safe multi-client access. Transport tests ensure both stdio and SSE work correctly. Performance benchmarks prevent latency regressions. Together, these patterns build MCP servers that AI agents can depend on in production — servers that handle unexpected inputs, survive network issues, and perform consistently under load.
