Database Seeding and State Management for Integration Tests
Integration tests live and die by their data. A test that creates an order needs a product to exist. That product needs a category. That category has constraints. Building this state in every test is tedious and produces tests that are tightly coupled to unrelated domain objects. Getting it wrong — letting state leak between tests, or starting tests in an unpredictable state — produces the most infuriating category of test failure: results that depend on what ran before.
There is a principled way to handle this. It involves being deliberate about how you seed data, how you isolate tests from each other's changes, and how you reset state between runs.
The Problem: Test Interdependence Through Shared DB State
Consider this naive test setup:
// test/orders.test.js
describe('Order API', () => {
it('creates an order', async () => {
const product = await db('products').insert({ name: 'Widget', price: 9.99 }).returning('*');
const order = await createOrder({ productId: product[0].id, quantity: 2 });
expect(order.total).toBe(19.98);
});
it('lists orders for a user', async () => {
const orders = await db('orders').where({ userId: 1 });
expect(orders).toHaveLength(1); // FAILS: the previous test created an order too
});
});
The second test assumes a clean database. The first test leaves rows behind. The suite fails when the tests run in this order and passes when they run in the other, and nobody can explain the flakiness because the failure is in the data, not the logic.
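The immediate fix is to have each test create exactly the state it asserts on instead of assuming a clean database. A sketch of the second test rewritten that way (the users-table columns and the email/name values here are illustrative):

```typescript
it('lists orders for a user', async () => {
  // Create this test's own user so rows left by other tests can't interfere
  const [user] = await db('users')
    .insert({ email: 'list-test@example.com', name: 'List Test' })
    .returning('*');
  await db('orders').insert({ userId: user.id, total: 9.99 });

  const orders = await db('orders').where({ userId: user.id });
  expect(orders).toHaveLength(1); // independent of what ran before
});
```

The rest of this article is about making that pattern cheap enough to use everywhere.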
Seeding Strategy 1: Fixtures
Fixtures are static data files that represent a known, reproducible state. Load them before tests, clean up after.
// test/fixtures/products.json
[
{ "id": 1, "name": "Widget", "price": 9.99, "category_id": 1, "stock": 100 },
{ "id": 2, "name": "Gadget", "price": 24.99, "category_id": 1, "stock": 50 },
{ "id": 3, "name": "Doohickey", "price": 4.99, "category_id": 2, "stock": 200 }
]
A fixture loader for Knex:
// test/helpers/fixtures.ts
import knex from '../../src/db/knex';
import productsFixture from '../fixtures/products.json';
import categoriesFixture from '../fixtures/categories.json';
import usersFixture from '../fixtures/users.json';
export async function loadFixtures() {
// Order matters: respect foreign key constraints
await knex('categories').insert(categoriesFixture);
await knex('users').insert(usersFixture);
await knex('products').insert(productsFixture);
}
export async function clearFixtures() {
  // Delete child tables first so foreign keys are satisfied. TRUNCATE is no
  // help here: Postgres refuses to truncate an FK-referenced table unless all
  // related tables go in the same statement, and SET CONSTRAINTS does not
  // apply to TRUNCATE at all.
  await knex('orders').del();
  await knex('products').del();
  await knex('users').del();
  await knex('categories').del();
}
Usage:
beforeAll(async () => await loadFixtures());
afterAll(async () => await clearFixtures());
The downside: fixtures rot. As the schema evolves, fixture files need manual updates. A missing field with a NOT NULL constraint causes all tests to fail with a cryptic database error. For stable reference data (categories, regions, permission levels), fixtures are ideal. For entity data that changes frequently, factories are better.
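Since a stale fixture surfaces only as that cryptic error, it helps if the loader names the table that failed. A sketch of loadFixtures hardened this way (same fixture imports as above):

```typescript
export async function loadFixtures() {
  // Order matters: respect foreign key constraints
  const sets: Array<[string, object[]]> = [
    ['categories', categoriesFixture],
    ['users', usersFixture],
    ['products', productsFixture],
  ];
  for (const [table, rows] of sets) {
    try {
      await knex(table).insert(rows);
    } catch (err) {
      // Point at the offending fixture instead of a bare constraint violation
      throw new Error(`Loading fixture "${table}" failed: ${(err as Error).message}`);
    }
  }
}
```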
Seeding Strategy 2: Factories with Faker.js
Factories generate entities programmatically with sensible defaults that you can override per test. They handle relationships automatically and produce realistic randomized data.
// test/factories/product.factory.ts
import { faker } from '@faker-js/faker';
import knex from '../../src/db/knex';
interface ProductOverrides {
name?: string;
price?: number;
categoryId?: number;
stock?: number;
status?: 'active' | 'draft' | 'archived';
}
export async function createProduct(overrides: ProductOverrides = {}) {
const [product] = await knex('products')
.insert({
name: overrides.name ?? faker.commerce.productName(),
price: overrides.price ?? parseFloat(faker.commerce.price({ min: 1, max: 200 })),
category_id: overrides.categoryId ?? (await createCategory()).id,
stock: overrides.stock ?? faker.number.int({ min: 0, max: 500 }),
status: overrides.status ?? 'active',
created_at: new Date(),
updated_at: new Date(),
})
.returning('*');
return product;
}
export async function createCategory(overrides: { name?: string } = {}) {
  const name = overrides.name ?? faker.commerce.department();
  const [category] = await knex('categories')
    .insert({
      name,
      // Derive the slug from the actual name so the two stay consistent
      slug: faker.helpers.slugify(name).toLowerCase(),
    })
    .returning('*');
  return category;
}
A user factory with a full relationship graph:
// test/factories/user.factory.ts
import { faker } from '@faker-js/faker';
import bcrypt from 'bcrypt';
import knex from '../../src/db/knex';
// Adjust these imports to your project; the paths here are illustrative
import type { User } from '../../src/types';
import { createOrder } from './order.factory';
export async function createUser(
  overrides: Partial<User> & { password?: string; emailVerified?: boolean } = {}
) {
const password = overrides.password ?? 'testpassword123';
const passwordHash = await bcrypt.hash(password, 4); // 4 is bcrypt's minimum cost; keeps test hashing fast
const [user] = await knex('users')
.insert({
email: overrides.email ?? faker.internet.email(),
name: overrides.name ?? faker.person.fullName(),
password_hash: passwordHash,
role: overrides.role ?? 'user',
email_verified: overrides.emailVerified ?? true,
created_at: new Date(),
})
.returning('*');
// Attach the plain password for use in auth tests
return { ...user, _plainPassword: password };
}
export async function createUserWithOrders(count = 3) {
const user = await createUser();
const orders = await Promise.all(
Array.from({ length: count }, () => createOrder({ userId: user.id }))
);
return { user, orders };
}
Tests using factories are self-contained and readable:
it('returns only the requesting user\'s orders', async () => {
const alice = await createUser({ email: 'alice@example.com' });
const bob = await createUser({ email: 'bob@example.com' });
await createOrder({ userId: alice.id });
await createOrder({ userId: alice.id });
await createOrder({ userId: bob.id });
const response = await api.get('/orders').auth(alice.id);
expect(response.body.orders).toHaveLength(2);
expect(response.body.orders.every((o: Order) => o.userId === alice.id)).toBe(true);
});
Transaction Rollback vs Truncation
Transaction Rollback
The cleanest reset strategy: wrap each test in a transaction, roll it back after. No rows persist:
import { Knex } from 'knex';
import knex from '../../src/db/knex';
let trx: Knex.Transaction;
beforeEach(async () => {
  trx = await knex.transaction();
  // Intercept raw queries so they run inside the transaction. This covers
  // only knex.raw; query-builder calls still hit the base connection, so
  // most setups inject `trx` into the code under test instead.
  jest.spyOn(knex, 'raw').mockImplementation((...args) => trx.raw(...args));
});
afterEach(async () => {
  await trx.rollback();
  jest.restoreAllMocks();
});
Speed: rollback is extremely fast; it's a single command regardless of how much data was inserted. A test that inserts 1000 rows rolls back in under a millisecond.
Limitation: doesn't work when the code under test uses afterCommit hooks, spawns background jobs that run their own queries, or uses multiple connections. For those cases, use truncation.
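Monkeypatching is fragile for the same reasons; a more robust pattern is to thread the transaction through explicitly. A sketch, assuming (hypothetically) that data-access functions accept the Knex handle to run on:

```typescript
import type { Knex } from 'knex';

// Production callers pass the shared knex instance; tests pass the
// per-test transaction, so rollback erases everything the code wrote
export function findOrdersByUser(userId: number, db: Knex) {
  return db('orders').where({ user_id: userId });
}
```

In a test, calls become `await findOrdersByUser(user.id, trx)`, and no patching is needed.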
Truncation
Truncation deletes all rows from all test tables between each test. Slower than rollback, but works universally:
// test/helpers/truncate.ts
import knex from '../../src/db/knex';
const TABLES = [
'order_items',
'orders',
'product_variants',
'products',
'categories',
'addresses',
'users',
];
export async function truncateAll() {
  // Postgres refuses to TRUNCATE any FK-referenced table unless every
  // related table appears in the same statement (deferring constraints
  // does not apply to TRUNCATE), so delete in child-first order instead
  for (const table of TABLES) {
    await knex(table).del();
  }
}
For PostgreSQL, a single TRUNCATE table1, table2, table3 RESTART IDENTITY CASCADE statement is faster than per-table deletes:
export async function truncateAll() {
const tableList = TABLES.join(', ');
await knex.raw(`TRUNCATE ${tableList} RESTART IDENTITY CASCADE`);
}
RESTART IDENTITY resets sequences, so auto-increment IDs start from 1 again. This matters if tests assert on specific IDs.
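One way to keep a hard-coded table list from going stale is to discover the tables from the catalog at runtime. A PostgreSQL-specific sketch (excludes Knex's own bookkeeping tables):

```typescript
import knex from '../../src/db/knex';

// Build the reset statement from a table list (pure, so easy to unit test)
export function buildTruncateSql(tables: string[]): string | null {
  if (tables.length === 0) return null;
  return `TRUNCATE ${tables.join(', ')} RESTART IDENTITY CASCADE`;
}

export async function truncateAll() {
  // pg_tables lists every table in the schema, so new migrations
  // are picked up automatically
  const { rows } = await knex.raw(`
    SELECT tablename FROM pg_tables
    WHERE schemaname = 'public'
      AND tablename NOT LIKE 'knex_migrations%'
  `);
  const sql = buildTruncateSql(rows.map((r: { tablename: string }) => r.tablename));
  if (sql) await knex.raw(sql);
}
```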
Knex Seeds
Knex has a built-in seed system for setting up baseline data. Seeds live in a seeds/ directory and are run with knex seed:run:
// seeds/01_categories.js
exports.seed = async function(knex) {
await knex('categories').del();
await knex('categories').insert([
{ id: 1, name: 'Electronics', slug: 'electronics' },
{ id: 2, name: 'Clothing', slug: 'clothing' },
{ id: 3, name: 'Books', slug: 'books' },
]);
};
For tests, separate seed files from production seeds using --knexfile:
# Test seeds
npx knex seed:run --knexfile knexfile.test.js
# Production seeds
npx knex seed:run --knexfile knexfile.js
knexfile.test.js points to the test database and test-specific seed directory:
// knexfile.test.js
module.exports = {
client: 'pg',
connection: process.env.TEST_DATABASE_URL,
seeds: {
directory: './test/seeds',
},
migrations: {
directory: './migrations',
},
};
Prisma db seed
Prisma has native seed support via the prisma.seed configuration in package.json:
{
"prisma": {
"seed": "ts-node prisma/seed.ts"
}
}
// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
import bcrypt from 'bcrypt';
const prisma = new PrismaClient();
async function main() {
// Idempotent: upsert instead of insert
await prisma.category.upsert({
where: { slug: 'electronics' },
update: {},
create: {
name: 'Electronics',
slug: 'electronics',
},
});
await prisma.user.upsert({
where: { email: 'admin@example.com' },
update: {},
create: {
email: 'admin@example.com',
name: 'Admin',
role: 'ADMIN',
password: await bcrypt.hash('admin123', 10),
},
});
}
main()
  .catch((e) => {
    console.error(e);
    process.exitCode = 1;
  })
  .finally(() => prisma.$disconnect());
Run seeds in tests:
// test/setup.ts
import { execSync } from 'child_process';
beforeAll(() => {
execSync('npx prisma db push --accept-data-loss', {
env: { ...process.env, DATABASE_URL: process.env.TEST_DATABASE_URL },
});
execSync('npx prisma db seed', {
env: { ...process.env, DATABASE_URL: process.env.TEST_DATABASE_URL },
});
});
Reset Strategies: CI vs Local
In CI, start fresh. Before the test suite runs, drop and recreate the database, run all migrations, then load seeds. This guarantees a pristine state and catches migration issues:
# .github/workflows/test.yml
- name: Reset test database
run: |
PGPASSWORD=testpass psql -h localhost -U postgres -c \
"DROP DATABASE IF EXISTS testdb; CREATE DATABASE testdb;"
npx knex migrate:latest --knexfile knexfile.test.js
npx knex seed:run --knexfile knexfile.test.js
env:
TEST_DATABASE_URL: postgres://postgres:testpass@localhost/testdb
- name: Run tests
run: npm test
In local development, a full reset is too slow for tight feedback loops. Use truncation between test runs, with migrations applied incrementally. Add a db:reset:test script:
{
"scripts": {
"db:reset:test": "knex migrate:rollback --all --knexfile knexfile.test.js && knex migrate:latest --knexfile knexfile.test.js && knex seed:run --knexfile knexfile.test.js",
"test": "jest --runInBand",
"test:fresh": "npm run db:reset:test && npm test"
}
}Developers run npm test for a fast feedback loop and npm run test:fresh when they suspect database state is corrupted or when switching branches.
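Before any destructive reset, it is worth refusing to run against a connection string that doesn't look like a test database; a mispointed TEST_DATABASE_URL should fail loudly, not truncate real data. A minimal sketch (the "URL contains test" convention is an assumption; adapt it to your naming):

```typescript
// Guard for truncateAll / db:reset:test: throw unless the URL
// matches the test-database naming convention
export function assertTestDatabase(url: string): void {
  if (!/test/i.test(url)) {
    throw new Error(`Refusing to reset non-test database: ${url}`);
  }
}
```

Call it with `process.env.TEST_DATABASE_URL` at the top of `truncateAll` and before the reset script runs migrations.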
The principle to hold onto: in tests, data is code. It should be explicit, reproducible, and cleaned up. Implicit data — left over from a previous test, seeded globally without cleanup, or pulled from a shared environment — is a bug waiting to surface at the worst possible moment.