Faker.js Advanced: Custom Providers, Locales, Seeded Determinism, and Large Datasets
Faker.js (@faker-js/faker) generates realistic fake data for tests and seeding. The basics — faker.person.firstName(), faker.internet.email() — are straightforward. The advanced features are what make Faker useful for complex test data: custom providers for domain-specific data, locales for internationalization testing, seeded randomness for reproducible test runs, and efficient large dataset generation.
Installation
npm install --save-dev @faker-js/faker
Seeded Determinism
By default, Faker uses a random seed on each run. This makes failing tests hard to reproduce — the data that caused the failure is gone. Use faker.seed() to make a test repeatable:
import { faker } from '@faker-js/faker';
beforeEach(() => {
  faker.seed(12345);
});
test('generates the same data on every run', () => {
  const user = {
    name: faker.person.fullName(),
    email: faker.internet.email(),
  };
  // Identical on every run with seed 12345; the exact value depends on the installed Faker version
  expect(user.name).toBe('Cecilia Abshire');
});
With a fixed seed, the same sequence of Faker calls always produces the same output. When a test fails in CI, you can reproduce the exact data locally.
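The phrase "same sequence of calls" matters: a seeded generator replays one stream of numbers, so adding or removing a call earlier in a test shifts every value that follows. The sketch below illustrates this with a small stand-in generator (mulberry32). It is not the generator Faker uses internally, and the names `mulberry32` and `take` exist only for this illustration.

```typescript
// mulberry32: a tiny seeded PRNG, used here only as a stand-in for illustration.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `count` values from a generator.
function take(next: () => number, count: number): number[] {
  return Array.from({ length: count }, () => next());
}

// Same seed, same calls: the two runs match value for value.
const runA = take(mulberry32(12345), 3);
const runB = take(mulberry32(12345), 3);

// Same seed, but one extra call up front: every later value shifts by one position.
const shifted = mulberry32(12345);
shifted(); // the "extra" call
const runC = take(shifted, 2);

console.log(runA.join() === runB.join()); // true
console.log(runC[0] === runA[1] && runC[1] === runA[2]); // true
```

This is why a hard-coded expected value can break when an unrelated Faker call is added above it, even though the seed has not changed.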
Seed per test, not globally: Reset the seed in beforeEach so tests don't depend on execution order.
Seed from an environment variable: In CI, log the seed and allow it to be replayed:
const TEST_SEED = process.env.TEST_SEED
  ? parseInt(process.env.TEST_SEED, 10) // explicit radix
  : Math.round(Math.random() * 1_000_000);
console.log(`Faker seed: ${TEST_SEED}`); // Always log it
faker.seed(TEST_SEED);
// Replay: TEST_SEED=482913 npm test
Locales
Faker supports locale-specific data. Use it to test internationalization, character encoding, and locale-aware formatting:
import { faker, fakerDE, fakerJA, fakerAR } from '@faker-js/faker';
// German names and addresses
const germanUser = {
  name: fakerDE.person.fullName(), // 'Katharina Müller'
  city: fakerDE.location.city(), // 'München'
  phone: fakerDE.phone.number(), // '+49 89 123456'
};
// Japanese
const japaneseUser = {
  name: fakerJA.person.fullName(), // '山田 太郎'
  company: fakerJA.company.name(),
};
// Arabic (right-to-left)
const arabicText = fakerAR.lorem.paragraph();
Available locales include fakerFR, fakerES, fakerPT_BR, fakerZH_CN, fakerKO, and dozens more. Import the specific locale faker from @faker-js/faker.
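One reason locale data is worth the effort: non-ASCII names and cities take more bytes than characters, which is where length limits and encoding assumptions tend to break. A minimal sketch, using hard-coded sample values of the kind the locale fakers return (the helper name `utf8ByteLength` is hypothetical):

```typescript
// Count the bytes a string occupies once encoded as UTF-8.
function utf8ByteLength(value: string): number {
  return new TextEncoder().encode(value).length;
}

// Sample values of the kind fakerDE and fakerJA produce (hard-coded for this sketch).
const samples = ['Katharina', 'München', '山田 太郎'];

for (const sample of samples) {
  // `length` counts UTF-16 code units; the byte length is what storage limits usually see.
  console.log(sample, sample.length, utf8ByteLength(sample));
}
// Katharina 9 9
// München 7 8
// 山田 太郎 5 13
```

A column or buffer sized for "characters" that is actually measured in bytes passes with English test data and fails with German or Japanese data, which is the kind of defect locale fakers are meant to surface.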
Testing with Multiple Locales
import { allFakers } from '@faker-js/faker';
describe('User display name', () => {
  // `as const` keeps the literal types so allFakers[locale] type-checks
  const testLocales = ['en', 'de', 'ja', 'ar', 'zh_CN'] as const;
  test.each(testLocales)('renders correctly for locale %s', (locale) => {
    const localeFaker = allFakers[locale];
    const name = localeFaker.person.fullName();
    // renderDisplayName is the application function under test
    const rendered = renderDisplayName(name);
    expect(rendered).toContain(name);
    expect(rendered).not.toBeNull();
  });
});
Custom Providers
For domain-specific data (medical codes, product SKUs, regulatory IDs), extend Faker:
import { faker, Faker } from '@faker-js/faker';
// Extend the base faker with domain-specific helpers.
// Object.assign keeps faker's prototype methods (such as seed()), which an
// object spread ({ ...faker }) would drop. It adds `medical` to the shared instance.
const medicalFaker = Object.assign(faker, {
  medical: {
    icd10Code: () => {
      const letters = faker.string.alpha({ length: 1, casing: 'upper' });
      const numbers = faker.string.numeric(2);
      const decimal = faker.string.numeric(1);
      return `${letters}${numbers}.${decimal}`;
    },
    npiNumber: () => {
      // 10-digit National Provider Identifier
      return faker.string.numeric(10);
    },
    dosageAmount: () => {
      const amounts = [2.5, 5, 10, 20, 25, 50, 100, 200, 500];
      return faker.helpers.arrayElement(amounts);
    },
    medicationName: () => {
      const medications = [
        'Metformin', 'Lisinopril', 'Atorvastatin', 'Amoxicillin',
        'Omeprazole', 'Sertraline', 'Gabapentin', 'Amlodipine'
      ];
      return faker.helpers.arrayElement(medications);
    },
  },
});
// Usage
const prescription = {
  medicationName: medicalFaker.medical.medicationName(),
  dosage: medicalFaker.medical.dosageAmount(),
  diagnosisCode: medicalFaker.medical.icd10Code(),
  prescriberId: medicalFaker.medical.npiNumber(),
};
For TypeScript type safety, create an interface:
interface MedicalFaker extends Faker {
  medical: {
    icd10Code(): string;
    npiNumber(): string;
    dosageAmount(): number;
    medicationName(): string;
  };
}
The helpers Module
faker.helpers provides utilities for combining data:
// Random element from array
const status = faker.helpers.arrayElement(['pending', 'active', 'suspended']);
// Random elements from array (multiple)
const permissions = faker.helpers.arrayElements(
  ['read', 'write', 'delete', 'admin'],
  { min: 1, max: 3 }
);
// Shuffle array
const shuffled = faker.helpers.shuffle([1, 2, 3, 4, 5]);
// Generate a string that matches a regular expression
const username = faker.helpers.fromRegExp(/[a-z]{4,8}_[0-9]{2,4}/);
// e.g., 'xkqf_2847'
// Weighted random selection
const userType = faker.helpers.weightedArrayElement([
  { weight: 70, value: 'free' },
  { weight: 25, value: 'pro' },
  { weight: 5, value: 'enterprise' },
]);
uniqueArray() Helper
When you need values that don't repeat (database unique constraints, distinct test identifiers):
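import { faker } from '@faker-js/faker';
// Generate 10 unique email addresses
const emails = faker.helpers.uniqueArray(
  () => faker.internet.email(),
  10
);
// Unique from a known set
const productCodes = faker.helpers.uniqueArray(
  ['SKU-001', 'SKU-002', 'SKU-003', 'SKU-004', 'SKU-005'],
  3
);
uniqueArray keeps no state between calls, and it can return fewer values than requested when the pool is exhausted: with an array source it returns at most as many items as the array contains rather than throwing, so check the length of the result. With a generator function, make sure the pool is comfortably larger than the requested count. For large unique sets, use sequences or combine Faker with a counter.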
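Combining a generated value with a counter can be sketched as follows. The generator is passed in as a callback so the sketch runs on its own; in a real suite it would be something like `() => faker.internet.email()`. The helper name `uniqueEmails` is hypothetical.

```typescript
// Make every address unique by appending a running counter to the local part,
// so uniqueness never depends on the size of the generator's pool.
function uniqueEmails(generate: () => string, count: number): string[] {
  return Array.from({ length: count }, (_, index) => {
    const [local, domain] = generate().split('@');
    return `${local}+${index + 1}@${domain}`;
  });
}

// Stand-in generator that always returns the same address (the worst case).
const addresses = uniqueEmails(() => 'sam@example.test', 3);
console.log(addresses);
// [ 'sam+1@example.test', 'sam+2@example.test', 'sam+3@example.test' ]
```

Because the counter alone guarantees uniqueness, this approach scales to any count, at the cost of addresses that look slightly less natural.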
Generating Large Datasets
For seeding development databases or performance testing, generate large arrays efficiently:
function generateUsers(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    id: i + 1,
    name: faker.person.fullName(),
    email: faker.internet.email(),
    company: faker.company.name(),
    phone: faker.phone.number(),
    createdAt: faker.date.between({
      from: '2020-01-01',
      to: '2024-12-31',
    }),
    plan: faker.helpers.weightedArrayElement([
      { weight: 60, value: 'free' },
      { weight: 30, value: 'pro' },
      { weight: 10, value: 'enterprise' },
    ]),
  }));
}
// Generate 50,000 users
const users = generateUsers(50_000);
For truly large datasets (millions of records), stream to avoid memory issues:
import { createWriteStream } from 'node:fs';
import { once } from 'node:events';
async function generateLargeCSV(path: string, count: number) {
  const stream = createWriteStream(path);
  stream.write('id,name,email,plan\n');
  for (let i = 0; i < count; i++) {
    const line = [
      i + 1,
      faker.person.fullName().replace(/,/g, ''), // strip every comma, not only the first
      faker.internet.email(),
      faker.helpers.arrayElement(['free', 'pro', 'enterprise']),
    ].join(',');
    // write() returns false once the internal buffer is full; wait for 'drain'
    // so unwritten rows do not pile up in memory
    if (!stream.write(line + '\n')) {
      await once(stream, 'drain');
    }
    // Yield to event loop every 10k rows
    if (i % 10_000 === 0) {
      await new Promise(resolve => setImmediate(resolve));
    }
  }
  stream.end();
  await once(stream, 'finish'); // resolve only after every row has been flushed
}
Date Generation
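Relative helpers such as faker.date.past() and faker.date.recent() count from a reference date, which defaults to the current time, so a fixed seed alone does not pin their output; passing a refDate option does. The sketch below shows the idea with a hypothetical `dateInPast` helper and an injected random source. It is an illustration of the principle, not Faker's implementation.

```typescript
const MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000;

// Pick a date up to `years` before `refDate`; `random` returns a number in [0, 1).
function dateInPast(years: number, refDate: Date, random: () => number): Date {
  const offsetMs = Math.floor(random() * years * MS_PER_YEAR);
  return new Date(refDate.getTime() - offsetMs);
}

// With the random source and the reference date both fixed, the result is stable.
const fixedRandom = () => 0.5;
const pinned = new Date('2024-01-01T00:00:00.000Z');
console.log(dateInPast(2, pinned, fixedRandom).toISOString());
// 2023-01-01T00:00:00.000Z

// Same random value, different reference date: a different result.
// This is what happens from run to run when the reference date is "now".
const later = new Date('2024-06-01T00:00:00.000Z');
console.log(dateInPast(2, later, fixedRandom).toISOString());
```

With Faker itself, the equivalent is to pass refDate, for example faker.date.past({ years: 2, refDate: '2024-01-01T00:00:00.000Z' }).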
// Random past date
const pastDate = faker.date.past({ years: 2 });
// Random future date
const futureDate = faker.date.future({ years: 1 });
// Between two dates
const subscriptionDate = faker.date.between({
  from: '2023-01-01',
  to: '2024-01-01',
});
// Recent (within last N days)
const recentActivity = faker.date.recent({ days: 7 });
// Birthdate for age range
const birthdate = faker.date.birthdate({ min: 18, max: 65, mode: 'age' });
Realistic Email Formats
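When an address is assembled from name parts, keep in mind that generated names can contain apostrophes, spaces, or accented letters (O'Connor, Müller), and these end up verbatim in the local part. A minimal sketch of a sanitizing step, with a hypothetical helper name `toLocalPart`:

```typescript
// Reduce a name to lowercase ASCII letters and digits for use in an email local part.
function toLocalPart(name: string): string {
  return name
    .normalize('NFD') // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]/g, ''); // drop apostrophes, spaces, hyphens, and anything else
}

console.log(toLocalPart("O'Connor")); // oconnor
console.log(toLocalPart('Müller')); // muller
console.log(`${toLocalPart('Zoë')}.${toLocalPart('De La Cruz')}@example.test`);
// zoe.delacruz@example.test
```

Names written in non-Latin scripts (for example from fakerJA) reduce to an empty string under this rule, so a real helper would need a fallback for them.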
// Based on a person's name
const firstName = faker.person.firstName();
const lastName = faker.person.lastName();
const domain = faker.internet.domainName();
const emails = [
  `${firstName.toLowerCase()}.${lastName.toLowerCase()}@${domain}`,
  `${firstName[0].toLowerCase()}${lastName.toLowerCase()}@${domain}`,
  `${firstName.toLowerCase()}${faker.string.numeric(2)}@${domain}`,
];
const email = faker.helpers.arrayElement(emails);
Key Points
- faker.seed(n) makes tests reproducible; reset it in beforeEach and log the seed for CI replay
- Locale-specific fakers (fakerDE, fakerJA, fakerAR) generate culturally accurate data for i18n testing
- Custom providers extend Faker with domain objects: group the helpers in a plain object alongside the base faker instance
- faker.helpers.uniqueArray() generates non-repeating values for unique constraint testing
- Weighted selection with faker.helpers.weightedArrayElement() reflects real-world distributions
- Stream large datasets in a for-loop with periodic setImmediate yields to avoid memory exhaustion
- faker.helpers.fromRegExp() generates strings matching a pattern, which is useful for structured IDs and codes