Database Seeding and State Management for Integration Tests
Integration tests live and die by their data. A test that creates an order needs a product to exist. That product needs a category. That category has constraints. Building this state in every test is tedious and produces tests that are tightly coupled to unrelated domain objects. Getting it wrong — letting state leak between tests, or starting tests in an unpredictable state — produces the most infuriating category of test failure: results that depend on what ran before.
There is a principled way to handle this. It involves being deliberate about how you seed data, how you isolate tests from each other's changes, and how you reset state between runs.
The Problem: Test Interdependence Through Shared DB State
Consider this naive test setup:
// test/orders.test.js
describe('Order API', () => {
it('creates an order', async () => {
const product = await db('products').insert({ name: 'Widget', price: 9.99 }).returning('*');
const order = await createOrder({ productId: product[0].id, quantity: 2 });
expect(order.total).toBe(19.98);
});
it('lists orders for a user', async () => {
const orders = await db('orders').where({ userId: 1 });
expect(orders).toHaveLength(1); // FAILS: the previous test created an order too
});
});
The second test assumes a clean database. The first test leaves rows behind. The suite fails when the tests run in this order and passes when they run in the other, and nobody can explain the flakiness because the failure is in the data, not the logic.
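The immediate fix is to have each test create exactly the state it asserts on instead of assuming a clean database. A sketch of the second test rewritten that way (the users-table columns and the email/name values here are illustrative):

```typescript
it('lists orders for a user', async () => {
  // Create this test's own user so rows left by other tests can't interfere
  const [user] = await db('users')
    .insert({ email: 'list-test@example.com', name: 'List Test' })
    .returning('*');
  await db('orders').insert({ userId: user.id, total: 9.99 });

  const orders = await db('orders').where({ userId: user.id });
  expect(orders).toHaveLength(1); // independent of what ran before
});
```

The rest of this article is about making that pattern cheap enough to use everywhere.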
Seeding Strategy 1: Fixtures
Fixtures are static data files that represent a known, reproducible state. Load them before tests, clean up after.
// test/fixtures/products.json
[
{ "id": 1, "name": "Widget", "price": 9.99, "category_id": 1, "stock": 100 },
{ "id": 2, "name": "Gadget", "price": 24.99, "category_id": 1, "stock": 50 },
{ "id": 3, "name": "Doohickey", "price": 4.99, "category_id": 2, "stock": 200 }
]
A fixture loader for Knex:
// test/helpers/fixtures.ts
import knex from '../../src/db/knex';
import productsFixture from '../fixtures/products.json';
import categoriesFixture from '../fixtures/categories.json';
import usersFixture from '../fixtures/users.json';
export async function loadFixtures() {
// Order matters: respect foreign key constraints
await knex('categories').insert(categoriesFixture);
await knex('users').insert(usersFixture);
await knex('products').insert(productsFixture);
}
export async function clearFixtures() {
  // Delete child tables first so foreign keys are satisfied. TRUNCATE is no
  // help here: Postgres refuses to truncate an FK-referenced table unless all
  // related tables go in the same statement, and SET CONSTRAINTS does not
  // apply to TRUNCATE at all.
  await knex('orders').del();
  await knex('products').del();
  await knex('users').del();
  await knex('categories').del();
}
Usage:
beforeAll(async () => await loadFixtures());
afterAll(async () => await clearFixtures());
The downside: fixtures rot. As the schema evolves, fixture files need manual updates. A missing field with a NOT NULL constraint causes all tests to fail with a cryptic database error. For stable reference data (categories, regions, permission levels), fixtures are ideal. For entity data that changes frequently, factories are better.
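Since a stale fixture surfaces only as that cryptic error, it helps if the loader names the table that failed. A sketch of loadFixtures hardened this way (same fixture imports as above):

```typescript
export async function loadFixtures() {
  // Order matters: respect foreign key constraints
  const sets: Array<[string, object[]]> = [
    ['categories', categoriesFixture],
    ['users', usersFixture],
    ['products', productsFixture],
  ];
  for (const [table, rows] of sets) {
    try {
      await knex(table).insert(rows);
    } catch (err) {
      // Point at the offending fixture instead of a bare constraint violation
      throw new Error(`Loading fixture "${table}" failed: ${(err as Error).message}`);
    }
  }
}
```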
Seeding Strategy 2: Factories with Faker.js
Factories generate entities programmatically with sensible defaults that you can override per test. They handle relationships automatically and produce realistic randomized data.
// test/factories/product.factory.ts
import { faker } from '@faker-js/faker';
import knex from '../../src/db/knex';
interface ProductOverrides {
name?: string;
price?: number;
categoryId?: number;
stock?: number;
status?: 'active' | 'draft' | 'archived';
}
export async function createProduct(overrides: ProductOverrides = {}) {
const [product] = await knex('products')
.insert({
name: overrides.name ?? faker.commerce.productName(),
price: overrides.price ?? parseFloat(faker.commerce.price({ min: 1, max: 200 })),
category_id: overrides.categoryId ?? (await createCategory()).id,
stock: overrides.stock ?? faker.number.int({ min: 0, max: 500 }),
status: overrides.status ?? 'active',
created_at: new Date(),
updated_at: new Date(),
})
.returning('*');
return product;
}
export async function createCategory(overrides: { name?: string } = {}) {
  const name = overrides.name ?? faker.commerce.department();
  const [category] = await knex('categories')
    .insert({
      name,
      // Derive the slug from the actual name so the two stay consistent
      slug: faker.helpers.slugify(name).toLowerCase(),
    })
    .returning('*');
  return category;
}
A user factory with a full relationship graph:
// test/factories/user.factory.ts
import { faker } from '@faker-js/faker';
import bcrypt from 'bcrypt';
import knex from '../../src/db/knex';
// Adjust these imports to your project; the paths here are illustrative
import type { User } from '../../src/types';
import { createOrder } from './order.factory';
export async function createUser(
  overrides: Partial<User> & { password?: string; emailVerified?: boolean } = {}
) {
const password = overrides.password ?? 'testpassword123';
const passwordHash = await bcrypt.hash(password, 4); // 4 is bcrypt's minimum cost; keeps test hashing fast
const [user] = await knex('users')
.insert({
email: overrides.email ?? faker.internet.email(),
name: overrides.name ?? faker.person.fullName(),
password_hash: passwordHash,
role: overrides.role ?? 'user',
email_verified: overrides.emailVerified ?? true,
created_at: new Date(),
})
.returning('*');
// Attach the plain password for use in auth tests
return { ...user, _plainPassword: password };
}
export async function createUserWithOrders(count = 3) {
const user = await createUser();
const orders = await Promise.all(
Array.from({ length: count }, () => createOrder({ userId: user.id }))
);
return { user, orders };
}
Tests using factories are self-contained and readable:
it('returns only the requesting user\'s orders', async () => {
const alice = await createUser({ email: 'alice@example.com' });
const bob = await createUser({ email: 'bob@example.com' });
await createOrder({ userId: alice.id });
await createOrder({ userId: alice.id });
await createOrder({ userId: bob.id });
const response = await api.get('/orders').auth(alice.id);
expect(response.body.orders).toHaveLength(2);
expect(response.body.orders.every((o: Order) => o.userId === alice.id)).toBe(true);
});
Transaction Rollback vs Truncation
Transaction Rollback
The cleanest reset strategy: wrap each test in a transaction, roll it back after. No rows persist:
import { Knex } from 'knex';
import knex from '../../src/db/knex';
let trx: Knex.Transaction;
beforeEach(async () => {
  trx = await knex.transaction();
  // Intercept raw queries so they run inside the transaction. This covers
  // only knex.raw; query-builder calls still hit the base connection, so
  // most setups inject `trx` into the code under test instead.
  jest.spyOn(knex, 'raw').mockImplementation((...args) => trx.raw(...args));
});
afterEach(async () => {
  await trx.rollback();
  jest.restoreAllMocks();
});
Speed: rollback is extremely fast; it's a single command regardless of how much data was inserted. A test that inserts 1000 rows rolls back in under a millisecond.
Limitation: doesn't work when the code under test uses afterCommit hooks, spawns background jobs that run their own queries, or uses multiple connections. For those cases, use truncation.
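Monkeypatching is fragile for the same reasons; a more robust pattern is to thread the transaction through explicitly. A sketch, assuming (hypothetically) that data-access functions accept the Knex handle to run on:

```typescript
import type { Knex } from 'knex';

// Production callers pass the shared knex instance; tests pass the
// per-test transaction, so rollback erases everything the code wrote
export function findOrdersByUser(userId: number, db: Knex) {
  return db('orders').where({ user_id: userId });
}
```

In a test, calls become `await findOrdersByUser(user.id, trx)`, and no patching is needed.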
Truncation
Truncation deletes all rows from all test tables between each test. Slower than rollback, but works universally:
// test/helpers/truncate.ts
import knex from '../../src/db/knex';
const TABLES = [
'order_items',
'orders',
'product_variants',
'products',
'categories',
'addresses',
'users',
];
export async function truncateAll() {
  // Postgres refuses to TRUNCATE any FK-referenced table unless every
  // related table appears in the same statement (deferring constraints
  // does not apply to TRUNCATE), so delete in child-first order instead
  for (const table of TABLES) {
    await knex(table).del();
  }
}
For PostgreSQL, a single TRUNCATE table1, table2, table3 RESTART IDENTITY CASCADE statement is faster than per-table deletes:
export async function truncateAll() {
const tableList = TABLES.join(', ');
await knex.raw(`TRUNCATE ${tableList} RESTART IDENTITY CASCADE`);
}
RESTART IDENTITY resets sequences, so auto-increment IDs start from 1 again. This matters if tests assert on specific IDs.
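One way to keep a hard-coded table list from going stale is to discover the tables from the catalog at runtime. A PostgreSQL-specific sketch (excludes Knex's own bookkeeping tables):

```typescript
import knex from '../../src/db/knex';

// Build the reset statement from a table list (pure, so easy to unit test)
export function buildTruncateSql(tables: string[]): string | null {
  if (tables.length === 0) return null;
  return `TRUNCATE ${tables.join(', ')} RESTART IDENTITY CASCADE`;
}

export async function truncateAll() {
  // pg_tables lists every table in the schema, so new migrations
  // are picked up automatically
  const { rows } = await knex.raw(`
    SELECT tablename FROM pg_tables
    WHERE schemaname = 'public'
      AND tablename NOT LIKE 'knex_migrations%'
  `);
  const sql = buildTruncateSql(rows.map((r: { tablename: string }) => r.tablename));
  if (sql) await knex.raw(sql);
}
```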
Knex Seeds
Knex has a built-in seed system for setting up baseline data. Seeds live in a seeds/ directory and are run with knex seed:run:
// seeds/01_categories.js
exports.seed = async function(knex) {
await knex('categories').del();
await knex('categories').insert([
{ id: 1, name: 'Electronics', slug: 'electronics' },
{ id: 2, name: 'Clothing', slug: 'clothing' },
{ id: 3, name: 'Books', slug: 'books' },
]);
};
For tests, separate seed files from production seeds using --knexfile:
# Test seeds
npx knex seed:run --knexfile knexfile.test.js
# Production seeds
npx knex seed:run --knexfile knexfile.js
knexfile.test.js points to the test database and test-specific seed directory:
// knexfile.test.js
module.exports = {
client: 'pg',
connection: process.env.TEST_DATABASE_URL,
seeds: {
directory: './test/seeds',
},
migrations: {
directory: './migrations',
},
};
Prisma db seed
Prisma has native seed support via the prisma.seed configuration in package.json:
{
"prisma": {
"seed": "ts-node prisma/seed.ts"
}
}
// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
import bcrypt from 'bcrypt';
const prisma = new PrismaClient();
async function main() {
// Idempotent: upsert instead of insert
await prisma.category.upsert({
where: { slug: 'electronics' },
update: {},
create: {
name: 'Electronics',
slug: 'electronics',
},
});
await prisma.user.upsert({
where: { email: 'admin@example.com' },
update: {},
create: {
email: 'admin@example.com',
name: 'Admin',
role: 'ADMIN',
password: await bcrypt.hash('admin123', 10),
},
});
}
main()
  .catch((e) => {
    console.error(e);
    process.exitCode = 1;
  })
  .finally(() => prisma.$disconnect());
Run seeds in tests:
// test/setup.ts
import { execSync } from 'child_process';
beforeAll(() => {
execSync('npx prisma db push --accept-data-loss', {
env: { ...process.env, DATABASE_URL: process.env.TEST_DATABASE_URL },
});
execSync('npx prisma db seed', {
env: { ...process.env, DATABASE_URL: process.env.TEST_DATABASE_URL },
});
});
Reset Strategies: CI vs Local
In CI, start fresh. Before the test suite runs, drop and recreate the database, run all migrations, then load seeds. This guarantees a pristine state and catches migration issues:
# .github/workflows/test.yml
- name: Reset test database
run: |
PGPASSWORD=testpass psql -h localhost -U postgres -c \
"DROP DATABASE IF EXISTS testdb; CREATE DATABASE testdb;"
npx knex migrate:latest --knexfile knexfile.test.js
npx knex seed:run --knexfile knexfile.test.js
env:
TEST_DATABASE_URL: postgres://postgres:testpass@localhost/testdb
- name: Run tests
run: npm test
In local development, a full reset is too slow for tight feedback loops. Use truncation between test runs, with migrations applied incrementally. Add a db:reset:test script:
{
"scripts": {
"db:reset:test": "knex migrate:rollback --all --knexfile knexfile.test.js && knex migrate:latest --knexfile knexfile.test.js && knex seed:run --knexfile knexfile.test.js",
"test": "jest --runInBand",
"test:fresh": "npm run db:reset:test && npm test"
}
}Developers run npm test for a fast feedback loop and npm run test:fresh when they suspect database state is corrupted or when switching branches.
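Before any destructive reset, it is worth refusing to run against a connection string that doesn't look like a test database; a mispointed TEST_DATABASE_URL should fail loudly, not truncate real data. A minimal sketch (the "URL contains test" convention is an assumption; adapt it to your naming):

```typescript
// Guard for truncateAll / db:reset:test: throw unless the URL
// matches the test-database naming convention
export function assertTestDatabase(url: string): void {
  if (!/test/i.test(url)) {
    throw new Error(`Refusing to reset non-test database: ${url}`);
  }
}
```

Call it with `process.env.TEST_DATABASE_URL` at the top of `truncateAll` and before the reset script runs migrations.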
The principle to hold onto: in tests, data is code. It should be explicit, reproducible, and cleaned up. Implicit data — left over from a previous test, seeded globally without cleanup, or pulled from a shared environment — is a bug waiting to surface at the worst possible moment.