Faker.js Advanced: Custom Providers, Locales, Seeded Determinism, and Large Datasets

Faker.js (@faker-js/faker) generates realistic fake data for tests and seeding. The basics — faker.person.firstName(), faker.internet.email() — are straightforward. The advanced features are what make Faker useful for complex test data: custom providers for domain-specific data, locales for internationalization testing, seeded randomness for reproducible test runs, and efficient large dataset generation.

Installation

npm install --save-dev @faker-js/faker

Seeded Determinism

By default, Faker uses a random seed on each run. This makes failing tests hard to reproduce — the data that caused the failure is gone. Use faker.seed() to make a test repeatable:

import { faker } from '@faker-js/faker';

beforeEach(() => {
  faker.seed(12345);
});

test('generates the same data on every run', () => {
  const user = {
    name: faker.person.fullName(),
    email: faker.internet.email(),
  };
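  // With a fixed seed these values are identical on every run.
  // The exact strings depend on the installed Faker version, so compare
  // against a snapshot instead of hard-coding a name.
  expect(user).toMatchSnapshot();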
});

With a fixed seed, the same sequence of Faker calls always produces the same output. When a test fails in CI, you can reproduce the exact data locally.
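To see why a fixed seed gives repeatable data, it can help to look at a tiny seeded generator in isolation. The sketch below uses mulberry32 as a stand-in; it is not Faker's internal generator, and the names createSeededRandom and takeValues are made up for this illustration.

```typescript
// Illustrative seeded generator (mulberry32). Faker uses its own internal
// generator; this only demonstrates the "same seed, same sequence" principle.
function createSeededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Scale the 32-bit result into the range [0, 1)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: draw `count` values from a fresh generator
function takeValues(seed: number, count: number): number[] {
  const next = createSeededRandom(seed);
  return Array.from({ length: count }, () => next());
}

const firstRun = takeValues(12345, 3);
const secondRun = takeValues(12345, 3); // same seed: same three values
const otherSeed = takeValues(54321, 3); // different seed: different values
```

Because every value comes from one sequence, adding or removing a Faker call early in a test shifts every value generated after it. That is one more reason to keep seeded tests small and to reseed before each one.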

Seed per test, not globally: Reset the seed in beforeEach so tests don't depend on execution order.

Seed from an environment variable: In CI, log the seed and allow it to be replayed:

const TEST_SEED = process.env.TEST_SEED
  ? parseInt(process.env.TEST_SEED, 10)  // always pass the radix
  : Math.floor(Math.random() * 1_000_000);

console.log(`Faker seed: ${TEST_SEED}`);  // Always log it
faker.seed(TEST_SEED);

// Replay: TEST_SEED=482913 npm test
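A mistyped TEST_SEED (for example TEST_SEED=abc) would parse to NaN and quietly defeat the replay. The sketch below is one possible way to validate the value first. resolveSeed is a hypothetical helper, not part of Faker, and it takes the environment as a parameter so it can be exercised without touching process.env.

```typescript
// Hypothetical helper: read a seed from an env-like object, or pick a random one.
function resolveSeed(env: Record<string, string | undefined>): number {
  const raw = env['TEST_SEED'];
  if (raw !== undefined && raw.trim() !== '') {
    // Accept plain decimal digits only, so "12abc" is rejected instead of read as 12
    if (!/^\d+$/.test(raw.trim())) {
      throw new Error(`TEST_SEED must be a non-negative integer, got "${raw}"`);
    }
    return Number(raw.trim());
  }
  // No seed supplied: choose one at random (the caller should log it)
  return Math.floor(Math.random() * 1_000_000);
}

// Intended usage (assumes Node.js and the faker import from above):
//   const seed = resolveSeed(process.env);
//   console.log(`Faker seed: ${seed}`);
//   faker.seed(seed);
```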

Locales

Faker supports locale-specific data. Use it to test internationalization, character encoding, and locale-aware formatting:

import { faker, fakerDE, fakerJA, fakerAR } from '@faker-js/faker';

// German names and addresses
const germanUser = {
  name: fakerDE.person.fullName(),       // 'Katharina Müller'
  city: fakerDE.location.city(),         // 'München'
  phone: fakerDE.phone.number(),         // '+49 89 123456'
};

// Japanese
const japaneseUser = {
  name: fakerJA.person.fullName(),       // '山田 太郎'
  company: fakerJA.company.name(),
};

// Arabic (right-to-left)
const arabicText = fakerAR.lorem.paragraph();

Available locales include fakerFR, fakerES, fakerPT_BR, fakerZH_CN, fakerKO, and dozens more. Import the specific locale faker from @faker-js/faker.

Testing with Multiple Locales

import { allFakers } from '@faker-js/faker';

describe('User display name', () => {
  const testLocales = ['en', 'de', 'ja', 'ar', 'zh_CN'];

  test.each(testLocales)('renders correctly for locale %s', (locale) => {
    // The cast keeps the lookup type-safe under strict TypeScript
    const localeFaker = allFakers[locale as keyof typeof allFakers];
    const name = localeFaker.person.fullName();
    // renderDisplayName is the application function under test
    const rendered = renderDisplayName(name);
    expect(rendered).toContain(name);
  });
});

Custom Providers

For domain-specific data (medical codes, product SKUs, regulatory IDs), extend Faker:

import { faker, Faker } from '@faker-js/faker';

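// Extend the base faker with domain-specific helpers.
// The spread copies only the instance's own properties (the data modules);
// prototype methods such as seed() are not copied, so keep calling faker.seed().
const medicalFaker = {
  ...faker,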
  medical: {
    icd10Code: () => {
      const letters = faker.string.alpha({ length: 1, casing: 'upper' });
      const numbers = faker.string.numeric(2);
      const decimal = faker.string.numeric(1);
      return `${letters}${numbers}.${decimal}`;
    },

    npiNumber: () => {
      // 10-digit National Provider Identifier
      return faker.string.numeric(10);
    },

    dosageAmount: () => {
      const amounts = [2.5, 5, 10, 20, 25, 50, 100, 200, 500];
      return faker.helpers.arrayElement(amounts);
    },

    medicationName: () => {
      const medications = [
        'Metformin', 'Lisinopril', 'Atorvastatin', 'Amoxicillin',
        'Omeprazole', 'Sertraline', 'Gabapentin', 'Amlodipine'
      ];
      return faker.helpers.arrayElement(medications);
    },
  },
};

// Usage
const prescription = {
  medicationName: medicalFaker.medical.medicationName(),
  dosage: medicalFaker.medical.dosageAmount(),
  diagnosisCode: medicalFaker.medical.icd10Code(),
  prescriberId: medicalFaker.medical.npiNumber(),
};

For TypeScript type safety, create an interface:

interface MedicalFaker extends Faker {
  medical: {
    icd10Code(): string;
    npiNumber(): string;
    dosageAmount(): number;
    medicationName(): string;
  };
}
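The npiNumber helper above returns ten random digits, which is enough for display tests but will fail any validator that checks the NPI check digit. Real NPIs end in a Luhn check digit computed over the prefix 80840 followed by the first nine digits. The sketch below shows that calculation; luhnCheckDigit and npiFromBase are names invented for this example.

```typescript
// Luhn check digit for a string of decimal digits (the payload excludes the check digit).
function luhnCheckDigit(payload: string): number {
  let sum = 0;
  let doubleIt = true; // the rightmost payload digit is doubled first
  for (let i = payload.length - 1; i >= 0; i--) {
    let digit = payload.charCodeAt(i) - 48; // '0' has char code 48
    if (doubleIt) {
      digit *= 2;
      if (digit > 9) digit -= 9;
    }
    sum += digit;
    doubleIt = !doubleIt;
  }
  return (10 - (sum % 10)) % 10;
}

// Hypothetical helper: turn a 9-digit base into a 10-digit NPI with a valid check digit.
function npiFromBase(base: string): string {
  if (!/^\d{9}$/.test(base)) {
    throw new Error('NPI base must be exactly 9 digits');
  }
  // The check digit is computed over the "80840" prefix plus the base
  return base + String(luhnCheckDigit('80840' + base));
}

// Inside the custom provider this could become:
//   npiNumber: () => npiFromBase(faker.string.numeric(9)),
```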

The helpers Module

faker.helpers provides utilities for combining data:

// Random element from array
const status = faker.helpers.arrayElement(['pending', 'active', 'suspended']);

// Random elements from array (multiple)
const permissions = faker.helpers.arrayElements(
  ['read', 'write', 'delete', 'admin'],
  { min: 1, max: 3 }
);

// Shuffle array
const shuffled = faker.helpers.shuffle([1, 2, 3, 4, 5]);

// Generate a string that matches a regular expression
const username = faker.helpers.fromRegExp(/[a-z]{4,8}_[0-9]{2,4}/);
// e.g., 'qkzt_2847' (random letters, not a real name)

// Weighted random selection
const userType = faker.helpers.weightedArrayElement([
  { weight: 70, value: 'free' },
  { weight: 25, value: 'pro' },
  { weight: 5, value: 'enterprise' },
]);
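Weighted selection is easier to reason about once the mechanics are visible. The sketch below is a minimal cumulative-weight picker, not Faker's implementation. pickWeighted and WeightedOption are illustrative names, and the random source is passed in so a test can pin the outcome.

```typescript
interface WeightedOption<T> {
  weight: number;
  value: T;
}

// Illustrative cumulative-weight selection. `random` must return a number in [0, 1).
function pickWeighted<T>(
  options: readonly WeightedOption<T>[],
  random: () => number
): T {
  if (options.length === 0) {
    throw new Error('options must not be empty');
  }
  const total = options.reduce((sum, option) => sum + option.weight, 0);
  // Pick a point on the line [0, total) and walk the options until we pass it
  let threshold = random() * total;
  let last: T | undefined;
  for (const option of options) {
    last = option.value;
    threshold -= option.weight;
    if (threshold < 0) return option.value;
  }
  // Only reachable through floating-point drift; fall back to the last option
  return last as T;
}

const plans: WeightedOption<string>[] = [
  { weight: 70, value: 'free' },
  { weight: 25, value: 'pro' },
  { weight: 5, value: 'enterprise' },
];

const lowRoll = pickWeighted(plans, () => 0.5);   // 50 falls inside the first 70
const midRoll = pickWeighted(plans, () => 0.8);   // 80 falls inside the next 25
const highRoll = pickWeighted(plans, () => 0.99); // 99 falls inside the last 5
```

The weights are relative, so they do not have to sum to 100. Using 14, 5, and 1 gives the same distribution as 70, 25, and 5.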

uniqueArray() Helper

When you need values that don't repeat (database unique constraints, distinct test identifiers), use faker.helpers.uniqueArray():

import { faker } from '@faker-js/faker';

// Generate 10 unique email addresses
const emails = faker.helpers.uniqueArray(
  () => faker.internet.email(),
  10
);

// Unique from a known set
const productCodes = faker.helpers.uniqueArray(
  ['SKU-001', 'SKU-002', 'SKU-003', 'SKU-004', 'SKU-005'],
  3
);

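According to the Faker documentation, uniqueArray does not throw when the pool is exhausted. With an array source it returns at most as many items as the array contains, and with a function source it can return fewer items than requested, so check the length of the result. It also keeps no state between calls, so two separate calls can return overlapping values. For large unique sets, use sequences or combine Faker with a counter.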
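The counter approach can be sketched as a small factory. createSequentialEmails is a hypothetical helper and example.test is a placeholder domain. The local part comes from a callback, so Faker can supply realistic names while the counter guarantees uniqueness.

```typescript
// Hypothetical factory: appends a running counter so every address is unique,
// no matter how many are generated or how often the local part repeats.
function createSequentialEmails(
  makeLocalPart: () => string,
  domain: string = 'example.test'
): () => string {
  let counter = 0;
  return (): string => {
    counter += 1;
    return `${makeLocalPart()}+${counter}@${domain}`;
  };
}

// With Faker, the callback could be:
//   () => faker.person.firstName().toLowerCase()
const nextEmail = createSequentialEmails(() => 'user');
const generated = Array.from({ length: 1000 }, () => nextEmail());
// generated[0] is 'user+1@example.test', generated[999] is 'user+1000@example.test'
```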

Generating Large Datasets

For seeding development databases or performance testing, generate large arrays efficiently:

function generateUsers(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: faker.person.fullName(),
    email: faker.internet.email(),
    company: faker.company.name(),
    phone: faker.phone.number(),
    createdAt: faker.date.between({
      from: '2020-01-01',
      to: '2024-12-31',
    }),
    plan: faker.helpers.weightedArrayElement([
      { weight: 60, value: 'free' },
      { weight: 30, value: 'pro' },
      { weight: 10, value: 'enterprise' },
    ]),
  }));
}

// Generate 50,000 users
const users = generateUsers(50_000);

For truly large datasets (millions of records), stream to avoid memory issues:

import { createWriteStream } from 'fs';
import { once } from 'events';

async function generateLargeCSV(path: string, count: number) {
  const stream = createWriteStream(path);
  stream.write('id,name,email,plan\n');

  for (let i = 0; i < count; i++) {
    const line = [
      i + 1,
      faker.person.fullName().replace(/,/g, ''),  // strip every comma, not just the first
      faker.internet.email(),
      faker.helpers.arrayElement(['free', 'pro', 'enterprise']),
    ].join(',');

    // write() returns false when the internal buffer is full;
    // wait for 'drain' so rows don't pile up in memory
    if (!stream.write(line + '\n')) {
      await once(stream, 'drain');
    }

    // Yield to event loop every 10k rows
    if (i % 10_000 === 0) {
      await new Promise(resolve => setImmediate(resolve));
    }
  }

  // Resolve only after everything has been flushed to disk
  stream.end();
  await once(stream, 'finish');
}
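The loop above strips commas from names to keep each row valid. An alternative that preserves the data is to quote fields following the common RFC 4180 convention. The sketch below shows one way to do that; csvField and csvRow are names invented for this example.

```typescript
// Quote a CSV field when it contains a comma, double quote, or line break.
// Embedded double quotes are escaped by doubling them.
function csvField(value: string | number): string {
  const text = String(value);
  if (!/[",\r\n]/.test(text)) return text;
  return `"${text.replace(/"/g, '""')}"`;
}

// Join already-escaped fields into one line, including the trailing newline.
function csvRow(fields: readonly (string | number)[]): string {
  return fields.map(csvField).join(',') + '\n';
}

const row = csvRow([1, 'Smith, Jr.', 'a@example.test', 'pro']);
// row is '1,"Smith, Jr.",a@example.test,pro\n'
```

In the streaming function, stream.write(csvRow([...])) would then replace the manual join.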

Date Generation

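// Random past date.
// Relative dates are anchored to the current time, so a seeded run only repeats
// exactly if you pass refDate or call faker.setDefaultRefDate() first.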
const pastDate = faker.date.past({ years: 2 });

// Random future date
const futureDate = faker.date.future({ years: 1 });

// Between two dates
const subscriptionDate = faker.date.between({
  from: '2023-01-01',
  to: '2024-01-01',
});

// Recent (within last N days)
const recentActivity = faker.date.recent({ days: 7 });

// Birthdate for age range
const birthdate = faker.date.birthdate({ min: 18, max: 65, mode: 'age' });

Realistic Email Formats

// Based on a person's name
const firstName = faker.person.firstName();
const lastName = faker.person.lastName();
const domain = faker.internet.domainName();

const emails = [
  `${firstName.toLowerCase()}.${lastName.toLowerCase()}@${domain}`,
  `${firstName[0].toLowerCase()}${lastName.toLowerCase()}@${domain}`,
  `${firstName.toLowerCase()}${faker.string.numeric(2)}@${domain}`,
];

const email = faker.helpers.arrayElement(emails);
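Generated names can contain apostrophes, spaces, or accented letters (O'Connor, Müller), which then leak into the address. The sketch below is one way to normalize names before building the local part. toEmailToken and buildEmail are hypothetical helpers, and example.test is a placeholder domain.

```typescript
// Reduce a name to lowercase ASCII letters and digits.
function toEmailToken(name: string): string {
  return name
    .normalize('NFD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '');       // drop apostrophes, spaces, hyphens, etc.
}

// Hypothetical helper: first.last@domain built from normalized tokens
function buildEmail(firstName: string, lastName: string, domain: string): string {
  return `${toEmailToken(firstName)}.${toEmailToken(lastName)}@${domain}`;
}

const normalized = buildEmail('Zoë', "O'Connor", 'example.test');
// normalized is 'zoe.oconnor@example.test'
```

This only suits names written in Latin script. Names from locales such as ja or ar reduce to an empty token, so those cases need a different fallback, for example a generated username.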

Key Points

  • faker.seed(n) makes tests reproducible; reset in beforeEach and log the seed for CI replay
  • Locale-specific fakers (fakerDE, fakerJA, fakerAR) generate culturally accurate data for i18n testing
  • Custom providers extend Faker with domain objects — wrap in a plain object alongside the base faker instance
  • faker.helpers.uniqueArray() generates non-repeating values for unique constraint testing
  • Weighted selection with faker.helpers.weightedArrayElement() reflects real-world distributions
  • Stream large datasets in a for-loop with periodic setImmediate yields to avoid memory exhaustion
  • faker.helpers.fromRegExp() generates strings matching a pattern — useful for structured IDs and codes