Database Seeding for Tests: Strategies, Scripts, and CI Setup

Database Seeding for Tests: Strategies, Scripts, and CI Setup

Database seeding for tests is the process of populating a test database with known data before tests run. Done right, it gives every test a predictable starting state. Done wrong, it becomes the most common cause of flaky tests.

This guide covers seeding strategies, seed script design, CI integration, and how to handle the inevitable state management problems.

Why Seeding Matters

Tests that depend on a database need data. That data has to come from somewhere. Your options:

  1. Each test creates everything it needs (factories/fixtures)
  2. A seed script runs once before all tests
  3. A combination: seed for reference data, factories for test-specific data

The choice depends on what kind of tests you're writing:

  • Unit tests: usually no DB needed at all
  • Integration tests: factories + per-test cleanup
  • End-to-end tests: seed for realistic state, factories for test-specific variations
  • Load/performance tests: large seeded dataset, no per-test cleanup

Designing a Seed Script

A good seed script is:

  • Idempotent: running it twice leaves the same state as running it once
  • Order-aware: respects foreign key constraints (users before orders)
  • Fast: a 30-second seed script blocks your entire CI pipeline
  • Version-controlled: lives in the repo, runs in CI
// test/seed.js (Node.js / Knex)
const db = require('./db');

async function seed() {
  // Truncate in reverse dependency order
  await db.raw(`
    TRUNCATE TABLE 
      order_items, orders, product_reviews,
      products, categories, users
    RESTART IDENTITY CASCADE
  `);

  // Seed reference data
  const categories = await db('categories').insert([
    { id: 1, name: 'Electronics', slug: 'electronics' },
    { id: 2, name: 'Clothing', slug: 'clothing' },
    { id: 3, name: 'Books', slug: 'books' },
  ]).returning('*');

  // Seed users
  const users = await db('users').insert([
    { id: 1, email: 'admin@test.com', role: 'admin', is_active: true },
    { id: 2, email: 'user@test.com', role: 'user', is_active: true },
    { id: 3, email: 'inactive@test.com', role: 'user', is_active: false },
  ]).returning('*');

  // Seed products (depend on categories)
  const products = await db('products').insert([
    { id: 1, name: 'Laptop', price: 999.99, category_id: 1, stock: 50 },
    { id: 2, name: 'Phone', price: 699.99, category_id: 1, stock: 100 },
    { id: 3, name: 'T-Shirt', price: 29.99, category_id: 2, stock: 200 },
  ]).returning('*');

  console.log(`Seeded: ${users.length} users, ${products.length} products`);
  return { categories, users, products };
}

module.exports = { seed };

if (require.main === module) {
  seed()
    .then(() => process.exit(0))
    .catch(err => { console.error(err); process.exit(1); });
}

Python / Django Example

# tests/seed.py
import os
import django

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings.test')
django.setup()

from django.contrib.auth import get_user_model
from myapp.models import Category, Product, Order

User = get_user_model()

def seed():
    # Clear existing data
    Order.objects.all().delete()
    Product.objects.all().delete()
    Category.objects.all().delete()
    User.objects.filter(is_superuser=False).delete()

    # Create reference data
    electronics = Category.objects.create(name='Electronics', slug='electronics')
    clothing = Category.objects.create(name='Clothing', slug='clothing')

    # Create users
    admin = User.objects.create_superuser(
        username='admin',
        email='admin@test.com',
        password='testpass123',
    )
    regular = User.objects.create_user(
        username='user',
        email='user@test.com',
        password='testpass123',
    )

    # Create products
    Product.objects.bulk_create([
        Product(name='Laptop', price=999.99, category=electronics, stock=50),
        Product(name='Phone', price=699.99, category=electronics, stock=100),
        Product(name='T-Shirt', price=29.99, category=clothing, stock=200),
    ])

    print(f'Seeded: 2 users, 3 products')

if __name__ == '__main__':
    seed()

Idempotency Patterns

TRUNCATE + RESTART IDENTITY

The simplest approach for PostgreSQL:

TRUNCATE TABLE users, products, orders RESTART IDENTITY CASCADE;

RESTART IDENTITY resets auto-increment sequences so IDs are predictable. CASCADE handles foreign keys automatically.

INSERT OR REPLACE / ON CONFLICT

When you can't truncate (shared database, production-like environment):

-- PostgreSQL
INSERT INTO users (id, email, role)
VALUES (1, 'admin@test.com', 'admin')
ON CONFLICT (id) DO UPDATE 
SET email = EXCLUDED.email, role = EXCLUDED.role;
// Knex
await db('users')
  .insert({ id: 1, email: 'admin@test.com', role: 'admin' })
  .onConflict('id')
  .merge();

Check Before Insert

For seeds that should be additive, not destructive:

async function seedIfEmpty() {
  const count = await db('categories').count('id as n').first();
  if (count.n > 0) {
    console.log('DB already seeded, skipping');
    return;
  }
  await seed();
}

CI Integration

GitHub Actions

# .github/workflows/test.yml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_DB: test_db
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run migrations
        run: npm run migrate
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/test_db
      
      - name: Seed test database
        run: node test/seed.js
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/test_db
      
      - name: Run tests
        run: npm test
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/test_db

Docker Compose for Local Dev

# docker-compose.test.yml
version: '3.8'

services:
  db:
    image: postgres:15
    environment:
      POSTGRES_DB: test_db
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    ports:
      - '5433:5432'  # Different port to avoid conflicts with dev DB
    
  test:
    build: .
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgres://test:test@db:5432/test_db
    command: sh -c "npm run migrate && node test/seed.js && npm test"
docker-compose -f docker-compose.test.yml run --rm <span class="hljs-built_in">test

Seed Timing: Once vs. Per-Test

Seed Once Per Suite

Seed at the start of the test suite, reset between tests using transaction rollback:

// jest.globalSetup.js
const { seed } = require('./test/seed');
const db = require('./test/db');

module.exports = async () => {
  await db.migrate.latest();
  await seed();
};
// jest.setup.js (runs before each test file)
beforeEach(async () => {
  await db.raw('BEGIN');
});

afterEach(async () => {
  await db.raw('ROLLBACK');
});

This is fast — seed runs once, each test gets a clean slate via rollback.

Seed Per Test (Heavy Isolation)

For tests that commit transactions and can't use rollback:

beforeEach(async () => {
  await db.raw('TRUNCATE TABLE users, products, orders RESTART IDENTITY CASCADE');
  await seed();
});

Slower but guaranteed isolation. Acceptable for integration test suites with <500 tests.

Hybrid Approach

Seed static reference data once. Create dynamic test data per test:

// jest.globalSetup.js — runs once
module.exports = async () => {
  await db.migrate.latest();
  // Only seed lookup/reference tables
  await seedReferenceData(); // categories, roles, countries, etc.
};

// Individual tests use factories for test-specific data
test('admin can delete products', async () => {
  const admin = await UserFactory.create({ role: 'admin' });
  const product = await ProductFactory.create({ category_id: 1 });
  
  await deleteProduct(admin.id, product.id);
  
  const found = await Product.findById(product.id);
  expect(found).toBeNull();
});

Parallel Testing and Seeding

When tests run in parallel (multiple workers), each worker needs its own database to avoid data conflicts.

Per-Worker Databases

// jest.config.js
module.exports = {
  globalSetup: './test/global-setup.js',
  globalTeardown: './test/global-teardown.js',
  testEnvironment: './test/db-environment.js',
};

// test/global-setup.js
module.exports = async () => {
  const workers = process.env.JEST_WORKERS || 4;
  
  for (let i = 1; i <= workers; i++) {
    const dbName = `test_db_${i}`;
    await createDatabase(dbName);
    await runMigrations(dbName);
    await seedDatabase(dbName);
  }
};
// test/db-environment.js
class DatabaseEnvironment extends NodeEnvironment {
  async setup() {
    await super.setup();
    const workerId = process.env.JEST_WORKER_ID;
    this.global.DB_URL = `postgres://test:test@localhost/test_db_${workerId}`;
  }
}

Schema-Based Isolation (PostgreSQL)

Instead of separate databases, use separate schemas:

-- Each worker gets its own schema
CREATE SCHEMA IF NOT EXISTS test_worker_1;
SET search_path TO test_worker_1, public;

Seed Data vs. Test Data

Keep a clear distinction:

Seed data — stable reference data that tests read but don't modify:

  • User roles and permissions
  • Country/currency codes
  • Product categories
  • Configuration tables

Test data — created by factories for specific tests:

  • Individual users with specific states
  • Orders in specific stages
  • Products with specific attributes

Mixing them causes fragility. A test that assumes "there are exactly 3 categories" breaks the moment your seed file changes.

// Bad — test asserts on seeded data count
test('category list shows all categories', async () => {
  const categories = await getCategories();
  expect(categories).toHaveLength(3); // breaks if seed changes
});

// Good — test asserts on behavior, not count
test('category list includes all categories', async () => {
  const seededCount = await db('categories').count('id as n').first();
  const displayed = await getCategories();
  expect(displayed).toHaveLength(seededCount.n);
});

Debugging Seed Failures

When a seed script fails in CI but works locally, the usual causes:

1. Migration not run before seed

# Wrong order
- run: node test/seed.js
- run: npm run migrate  # too late

# Right order
- run: npm run migrate
- run: node test/seed.js

2. Constraint violations from stale data

The database has leftover data from a previous run. Add explicit truncation at the start of your seed:

async function seed() {
  await db.raw('TRUNCATE TABLE ... RESTART IDENTITY CASCADE');
  // now insert
}

3. Sequence not reset

When you hardcode IDs and the sequence is ahead:

SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));

Or use RESTART IDENTITY in your truncate.

4. Timing — database not ready

In CI, the database service takes a few seconds to become available. Add a health check wait:

services:
  postgres:
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-retries 5

Summary

Database seeding for tests works best when:

  1. Migrations run first — seed assumes schema exists
  2. Truncate before insert — idempotent seeds don't accumulate stale data
  3. Reference data is seeded once — roles, categories, codes
  4. Test data is created per test — use factories, not seed scripts
  5. Per-worker databases in parallel — no shared mutable state across workers
  6. Transaction rollback for isolation where possible — faster than truncate/reseed

The goal is a test database that starts in a known state every time, regardless of what previous tests did.

Read more