Mock Data Generation for Testing: Tools, Strategies, and Best Practices

Mock data generation is how you create realistic test data without using real production data. It covers everything from simple in-memory objects to large datasets, API response mocks, and schema-driven generators.

This guide covers the tools, strategies, and decisions behind effective mock data generation for software testing.

What Is Mock Data?

Mock data is synthetic data that resembles real data in structure and format but contains no actual user information. It's distinct from test fixtures (which are specific, predefined values) and from production data exports (which contain real PII).

Mock data serves several purposes:

  • Unit and integration tests: provide realistic inputs without real users
  • Development: populate a dev database so features look realistic
  • Performance tests: generate millions of records to test at scale
  • Demos and screenshots: realistic-looking data without showing real customers
  • Third-party integrations: simulate external API responses

In-Code Generation: Faker Libraries

The most common approach is to generate mock data programmatically with a Faker library.

JavaScript: @faker-js/faker

const { faker } = require('@faker-js/faker');

const mockUser = {
  id: faker.string.uuid(),
  name: faker.person.fullName(),
  email: faker.internet.email(),
  phone: faker.phone.number(),
  address: {
    street: faker.location.streetAddress(),
    city: faker.location.city(),
    country: faker.location.country(),
    zip: faker.location.zipCode(),
  },
  company: faker.company.name(),
  jobTitle: faker.person.jobTitle(),
  avatar: faker.image.avatar(),
  registeredAt: faker.date.past({ years: 2 }),
};

Python: Faker

from faker import Faker

fake = Faker()

mock_user = {
    'id': fake.uuid4(),
    'name': fake.name(),
    'email': fake.email(),
    'phone': fake.phone_number(),
    'address': fake.address(),
    'company': fake.company(),
    'job_title': fake.job(),
    'bio': fake.paragraph(),
    'registered_at': fake.date_time_this_year(),
}

Other Languages

Language | Library
Ruby | ffaker, faker gem
Java | Java Faker, Datafaker
C# | Bogus
Go | gofakeit, go-faker/faker
PHP | FakerPHP/Faker
Rust | fake-rs

Schema-Driven Generation

When you have a JSON Schema, OpenAPI spec, or database schema, you can generate mock data that conforms to it automatically.

JSON Schema Faker (JavaScript)

npm install --save-dev json-schema-faker @faker-js/faker

const jsf = require('json-schema-faker');
const { faker } = require('@faker-js/faker');

jsf.extend('faker', () => faker);

const userSchema = {
  type: 'object',
  required: ['id', 'email', 'name'],
  properties: {
    id: { type: 'string', format: 'uuid' },
    email: { type: 'string', format: 'email' },
    name: { type: 'string', faker: 'person.fullName' },
    age: { type: 'integer', minimum: 18, maximum: 90 },
    role: { type: 'string', enum: ['admin', 'editor', 'user'] },
    score: { type: 'number', minimum: 0, maximum: 100 },
  },
};

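// jsf.resolve() returns a promise. CommonJS has no top-level await,
// so the calls are wrapped in an async function.
async function main() {
  // Generate one object
  const user = await jsf.resolve(userSchema);

  // Generate many
  const users = await Promise.all(
    Array.from({ length: 100 }, () => jsf.resolve(userSchema))
  );

  console.log(user, users.length);
}

main().catch(console.error);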

Best for: Keeping test data in sync with your API contracts automatically.
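To make the idea of schema-driven generation concrete, here is a toy generator written from scratch. It is only a sketch of the principle (walk the schema, emit a value that satisfies each constraint) and is not how json-schema-faker is implemented. The names generateFromSchema and exampleSchema are hypothetical, and only a small subset of JSON Schema is handled.

```javascript
// Toy schema-driven generator: supports object, string, integer, number, and enum only.
// `random` is injectable so a seeded generator can be passed in for reproducible output.
function generateFromSchema(schema, random = Math.random) {
  // enum wins over type: pick one of the allowed values
  if (schema.enum) {
    return schema.enum[Math.floor(random() * schema.enum.length)];
  }
  switch (schema.type) {
    case 'object': {
      const result = {};
      for (const [key, propSchema] of Object.entries(schema.properties || {})) {
        result[key] = generateFromSchema(propSchema, random);
      }
      return result;
    }
    case 'integer': {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? 100;
      // Inclusive range [min, max]
      return min + Math.floor(random() * (max - min + 1));
    }
    case 'number': {
      const min = schema.minimum ?? 0;
      const max = schema.maximum ?? 1;
      return min + random() * (max - min);
    }
    case 'string':
      return 'str_' + Math.floor(random() * 1e6).toString(36);
    default:
      throw new Error(`Unsupported schema type: ${schema.type}`);
  }
}

// Hypothetical schema, reusing constraints from the user schema above
const exampleSchema = {
  type: 'object',
  properties: {
    age: { type: 'integer', minimum: 18, maximum: 90 },
    role: { type: 'string', enum: ['admin', 'editor', 'user'] },
    score: { type: 'number', minimum: 0, maximum: 100 },
  },
};

const generated = generateFromSchema(exampleSchema);
```

Because the values are derived from the schema, tightening a constraint (say, raising minimum to 21) changes the generated data without touching any test.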

OpenAPI Mock Servers

Tools like Prism (by Stoplight) read your OpenAPI spec and serve mock responses:

npm install --save-dev @stoplight/prism-cli

# Start mock server from OpenAPI spec
prism mock ./openapi.yaml

Now your API endpoints return realistic mock data without a real backend:

curl http://localhost:4010/users
# Returns: [{"id": "abc-123", "email": "test@example.com", ...}]

Best for: Frontend development, contract testing, CI without a backend.

API Response Mocking

When your code calls external APIs, you mock those API responses to:

  • Avoid flaky tests due to network issues
  • Control exactly what the API returns
  • Test error conditions that are hard to trigger in real systems
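Before reaching for a library, the three goals above can be illustrated with a hand-rolled fake. The sketch below assumes the client accepts its fetch function as a parameter; getUserWithRetry and scriptedFetch are hypothetical names, and the fake response objects only mimic the ok, status, and json() parts of a real fetch response.

```javascript
// Hypothetical client code: fetchImpl is injected so a test can replace it.
async function getUserWithRetry(id, fetchImpl, maxAttempts = 3) {
  let lastStatus = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchImpl(`https://api.example.com/users/${id}`);
    if (res.ok) return res.json();
    lastStatus = res.status;
    // Retry only on 429 and 5xx; other errors fail immediately.
    if (res.status !== 429 && res.status < 500) break;
  }
  throw new Error(`Request failed with status ${lastStatus}`);
}

// Builds a fake fetch that replays a fixed script of responses, one per call.
// Once the script runs out, the last entry is repeated.
function scriptedFetch(script) {
  const calls = [];
  const fakeFetch = async (url) => {
    calls.push(url);
    const step = script[Math.min(calls.length - 1, script.length - 1)];
    return {
      ok: step.status >= 200 && step.status < 300,
      status: step.status,
      json: async () => step.body,
    };
  };
  return { fakeFetch, calls };
}
```

A test can then script "429, then 200" and assert that exactly two calls were made, a sequence that is hard to trigger on demand against a real API.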

MSW (Mock Service Worker)

MSW intercepts network requests at the browser or Node.js level — no proxy needed.

npm install --save-dev msw

// mocks/handlers.js
const { http, HttpResponse } = require('msw');

const handlers = [
  http.get('https://api.example.com/users/:id', ({ params }) => {
    return HttpResponse.json({
      id: params.id,
      name: 'Alice Johnson',
      email: 'alice@example.com',
      role: 'admin',
    });
  }),
  
  http.post('https://api.example.com/orders', async ({ request }) => {
    const body = await request.json();
    return HttpResponse.json({
      id: 'order-123',
      ...body,
      status: 'pending',
      createdAt: new Date().toISOString(),
    }, { status: 201 });
  }),
  
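  // Simulate an error (403 Forbidden)
  http.delete('https://api.example.com/users/:id', () => {
    return HttpResponse.json(
      { error: 'Forbidden' },
      { status: 403 }
    );
  }),
];

// Export the handlers so the test setup can import them
module.exports = { handlers };

// tests/setup.js (Node.js)
const { setupServer } = require('msw/node');
// Path is relative to tests/; adjust it to your project layout
const { handlers } = require('../mocks/handlers');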

const server = setupServer(...handlers);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

Nock (Node.js HTTP Mocking)

const nock = require('nock');

test('fetches user from API', async () => {
  nock('https://api.example.com')
    .get('/users/123')
    .reply(200, {
      id: '123',
      name: 'Alice',
      email: 'alice@example.com',
    });
  
  const user = await fetchUser('123');
  expect(user.name).toBe('Alice');
});

Python: responses

import responses
import requests

@responses.activate
def test_fetch_user():
    responses.add(
        responses.GET,
        'https://api.example.com/users/123',
        json={'id': '123', 'name': 'Alice', 'email': 'alice@example.com'},
        status=200
    )
    
    user = fetch_user('123')
    assert user['name'] == 'Alice'

Database-Level Mock Data

For integration and end-to-end tests, you need data in the database, not just in-memory objects.

Bulk Generation Scripts

// scripts/generate-test-data.js
const { faker } = require('@faker-js/faker');
const db = require('./db');

async function generateUsers(count) {
  const users = Array.from({ length: count }, () => ({
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    role: faker.helpers.arrayElement(['admin', 'editor', 'user', 'user', 'user']),
    created_at: faker.date.past({ years: 2 }),
  }));

  // Insert in batches of 1000
  for (let i = 0; i < users.length; i += 1000) {
    await db('users').insert(users.slice(i, i + 1000));
    console.log(`Inserted ${Math.min(i + 1000, users.length)}/${count} users`);
  }
}

async function generateOrders(userIds, count) {
  const orders = Array.from({ length: count }, () => ({
    id: faker.string.uuid(),
    user_id: faker.helpers.arrayElement(userIds),
    status: faker.helpers.weightedArrayElement([
      { weight: 50, value: 'delivered' },
      { weight: 20, value: 'shipped' },
      { weight: 15, value: 'processing' },
      { weight: 10, value: 'pending' },
      { weight: 5, value: 'cancelled' },
    ]),
    // Newer Faker versions use fractionDigits in place of the deprecated precision option
    total: faker.number.float({ min: 10, max: 500, fractionDigits: 2 }),
    created_at: faker.date.past({ years: 1 }),
  }));

  for (let i = 0; i < orders.length; i += 1000) {
    await db('orders').insert(orders.slice(i, i + 1000));
  }
}

// Generate 10,000 users and 50,000 orders
generateUsers(10000)
  .then(() => db('users').pluck('id'))
  .then(userIds => generateOrders(userIds, 50000))
  .then(() => { console.log('Done'); process.exit(0); })
  .catch(err => { console.error(err); process.exit(1); });
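The batching loop appears twice in the script above, so it can be pulled into a small helper. This is one possible sketch: chunk and insertInBatches are hypothetical names, and insertBatch stands in for whatever performs the insert (for example, rows => db('users').insert(rows)).

```javascript
// Yields consecutive slices of `items`, each at most `size` long.
function* chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Inserts rows batch by batch and returns how many rows were sent.
// `insertBatch` is a hypothetical async callback supplied by the caller.
async function insertInBatches(rows, size, insertBatch) {
  let inserted = 0;
  for (const batch of chunk(rows, size)) {
    await insertBatch(batch); // sequential on purpose: one batch in flight at a time
    inserted += batch.length;
  }
  return inserted;
}
```

Taking the insert function as a parameter also makes the helper testable without a database: a test can pass a callback that simply records the batch sizes it receives.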

Mockaroo (Cloud-Based Generator)

Mockaroo generates large datasets via a web UI or API:

# Download 1000 rows as JSON via API
curl "https://api.mockaroo.com/api/generate.json?count=1000&key=YOUR_KEY&schema=users"

Or use the REST API with a custom schema:

curl -X POST https://api.mockaroo.com/api/generate.json \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{
    "count": 1000,
    "fields": [
      {"name": "id", "type": "UUID"},
      {"name": "email", "type": "Email Address"},
      {"name": "name", "type": "Full Name"},
      {"name": "country", "type": "Country"},
      {"name": "signup_date", "type": "Date", "min": "01/01/2023", "max": "01/01/2025"}
    ]
  }'

Best for: Large datasets for load testing, one-off data generation needs.

Mocking vs. Real Data

The fundamental question in mock data strategy is when to use mocks versus when to use real (sanitized) data.

Use Mocks When:

  • Unit and integration tests: mocks are fast, controllable, and don't need a database
  • Testing edge cases: hard to produce in real data (payment failures, network timeouts, malformed responses)
  • Testing third-party integrations: you can't control when Stripe returns a card_declined
  • Parallel test runs: fake data avoids shared-state conflicts

Use Real (Sanitized) Data When:

  • Performance and load testing: real data volume and distribution matter
  • Exploratory testing: finding bugs from real-world patterns
  • Regression testing: catching bugs that only appear with real user data patterns
  • After a production incident: reproduce exactly what the system saw

Sanitizing Production Data

If you use production data for tests, sanitize it first:

-- Anonymize emails
UPDATE users SET email = CONCAT('user_', id, '@test.example.com');

-- Scramble names
UPDATE users SET 
  first_name = 'Test',
  last_name = CONCAT('User_', id);

-- Nullify sensitive fields
UPDATE users SET
  phone = NULL,
  ssn = NULL,
  credit_card_last_four = NULL;

-- Replace addresses with fake ones
UPDATE users SET
  address_line1 = '123 Test Street',
  city = 'Testville',
  postal_code = '00000';

Never commit sanitized production data to version control.
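When the data is exported and processed in application code instead of SQL, the same rules can be applied row by row. The sketch below mirrors the column names used in the SQL statements; sanitizeUser and findLeaks are hypothetical helpers, and the leak check is a simple substring search, not a complete PII audit.

```javascript
// Returns a sanitized copy of a user row without mutating the input.
function sanitizeUser(row) {
  return {
    ...row,
    email: `user_${row.id}@test.example.com`,
    first_name: 'Test',
    last_name: `User_${row.id}`,
    phone: null,
    ssn: null,
    credit_card_last_four: null,
    address_line1: '123 Test Street',
    city: 'Testville',
    postal_code: '00000',
  };
}

// Lists the fields whose original value still appears somewhere in the sanitized row,
// for example an email address repeated inside a free-text column.
function findLeaks(original, sanitized, fields) {
  return fields.filter((field) => {
    const needle = original[field] == null ? '' : String(original[field]);
    if (needle === '') return false;
    return Object.values(sanitized).some(
      (value) => typeof value === 'string' && value.includes(needle)
    );
  });
}
```

Running a check like findLeaks over a sample of rows is a cheap way to spot columns the sanitization rules missed, such as notes or comment fields that repeat an email address.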

Generating Consistent Mock Data

When you need the same mock data across multiple runs (reproducibility):

Seeded Faker

const { faker } = require('@faker-js/faker');

// Set a seed for reproducible output
faker.seed(42);

const user1 = { name: faker.person.fullName() }; // Same value on every run with seed 42 (for a given Faker version)
const user2 = { name: faker.person.fullName() }; // Always the same second value

faker.seed(); // Reset to a random seed
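Seeding works because the generator behind the library is deterministic: the same seed always produces the same sequence of numbers, and every generated value is derived from that sequence. The sketch below shows the principle with a small seeded generator (a mulberry32-style function). It is an illustration only and makes no claim about how Faker implements seeding; makeNames and the name lists are hypothetical.

```javascript
// Small deterministic pseudo-random generator (mulberry32 style).
// Returns a function that yields numbers in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Picks one element using the supplied random function.
function pick(random, items) {
  return items[Math.floor(random() * items.length)];
}

// Generates `count` full names; the output depends only on `seed` and `count`.
function makeNames(seed, count) {
  const random = mulberry32(seed);
  const firstNames = ['Alice', 'Bob', 'Carol', 'Dave'];
  const lastNames = ['Johnson', 'Smith', 'Lee', 'Garcia'];
  return Array.from(
    { length: count },
    () => `${pick(random, firstNames)} ${pick(random, lastNames)}`
  );
}
```

Calling makeNames(42, 5) twice returns identical arrays. The same property explains a common surprise with seeded data: adding one extra random call early in a script shifts every value generated after it.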

Static Mock Files

For API responses that should always be the same:

// __mocks__/api-responses/user-123.json
{
  "id": "user-123",
  "name": "Alice Johnson",
  "email": "alice@example.com",
  "role": "admin"
}

// In tests
const mockUser = require('../__mocks__/api-responses/user-123.json');
jest.mock('../src/api', () => ({
  getUser: () => Promise.resolve(mockUser),
}));

Snapshot Testing with Mock Data

Snapshot tests capture the output of a component or function and fail when it changes. Mock data keeps snapshots stable:

test('UserCard renders correctly', () => {
  const user = createUser({
    id: 'fixed-id-123',       // Fixed so snapshot is stable
    name: 'Alice Johnson',    // Fixed
    email: 'alice@example.com', // Fixed
    role: 'admin',
  });
  
  const { container } = render(<UserCard user={user} />);
  expect(container).toMatchSnapshot();
});

Avoid unseeded faker values in snapshot tests: they change on every run, so the snapshot never matches. Use fixed values or a seeded generator instead.
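The createUser helper used in the snapshot test is not defined in this guide. One plausible shape for it, offered as an assumption and not as the original implementation, is a factory with fixed defaults that each test overrides selectively:

```javascript
// Fixed defaults keep output identical across runs, which is what snapshots need.
// DEFAULT_USER and its values are hypothetical.
const DEFAULT_USER = Object.freeze({
  id: 'user-1',
  name: 'Test User',
  email: 'user-1@example.com',
  role: 'user',
});

// Returns a new object each time; overrides replace only the fields a test cares about.
function createUser(overrides = {}) {
  return { ...DEFAULT_USER, ...overrides };
}

const admin = createUser({ role: 'admin' });
```

A test then states only what matters to it, such as the role, and inherits stable values for everything else.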

Common Mistakes

Using Mock Data for Everything

Mocks can mask real integration bugs. Your mock API always returns 200 OK. The real API might return 429 Too Many Requests. Mix mocks (for unit tests) with real integration tests.

Over-Specifying Mock Responses

// Fragile — test breaks if any field changes
server.use(
  http.get('/api/users/1', () => HttpResponse.json({
    id: 1,
    name: 'Alice',
    email: 'alice@example.com',
    createdAt: '2024-01-01T00:00:00.000Z', // breaks on timezone change
    updatedAt: '2024-06-15T08:22:33.441Z', // breaks if format changes
    preferences: { theme: 'dark', language: 'en' }, // irrelevant to most tests
  }))
);

Keep mock responses minimal — only include the fields the test actually needs.

Not Resetting Handlers

MSW handlers set in one test can affect others if you don't reset:

afterEach(() => server.resetHandlers()); // Required
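To see why the reset matters, it helps to model what happens when a test adds a handler at runtime. The class below is a toy model written for this explanation, not MSW's implementation: it assumes only that runtime handlers take priority over the initial ones and that a reset discards them. ToyMockServer and its handler shape are hypothetical.

```javascript
// Toy model of a mock server with initial and runtime handlers.
class ToyMockServer {
  constructor(...initialHandlers) {
    this.initialHandlers = initialHandlers;
    this.runtimeHandlers = [];
  }

  // Runtime handlers are prepended so they win over the initial ones.
  use(...handlers) {
    this.runtimeHandlers.unshift(...handlers);
  }

  // Drops every runtime handler, restoring the initial behavior.
  resetHandlers() {
    this.runtimeHandlers = [];
  }

  // Returns the response of the first matching handler, or a 404.
  handle(method, path) {
    const all = [...this.runtimeHandlers, ...this.initialHandlers];
    const match = all.find((h) => h.method === method && h.path === path);
    return match ? match.respond() : { status: 404 };
  }
}
```

If one test overrides GET /users/1 to return a 500 and nothing resets it, the next test silently receives that 500 as well. Resetting after each test restores the initial handlers.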

Summary

Approach | Tool | Best For
In-code generation | Faker.js, Python Faker | Unit tests, integration tests
Schema-driven | JSON Schema Faker, Prism | API contract testing
API interception | MSW, Nock, responses | Network mocking
Bulk DB generation | Custom scripts + Faker | Load tests, realistic dev data
Cloud generators | Mockaroo, Generatedata | Large one-off datasets
Sanitized prod data | pg_dump + SQL anonymize | Regression, load testing

The right mix for most projects: Faker in factories for unit/integration tests + MSW for API mocking + seeded scripts for CI database setup + sanitized prod data for load tests.

Start with Faker and MSW. Add complexity when you hit problems, not before.
