Database Testing Guide: Pytest, SQLAlchemy, and Beyond
Database bugs are among the hardest to catch and the most expensive to fix. Data corruption, failed migrations, and broken queries don't always manifest in application tests — they often surface in production, after the fact.
This guide covers database testing from the ground up: writing unit tests for database operations, setting up proper test isolation, testing schema migrations, and monitoring production database health.
Why Database Testing Deserves Its Own Strategy
Application code tests verify behavior. Database tests verify data integrity — a fundamentally different concern.
What application tests miss:
- Constraint violations that only appear with real data
- Index performance degradation under load
- Migration failures on databases with existing data
- Cascading deletions that corrupt related records
- Race conditions in concurrent writes
- Null handling in aggregate queries
What can go wrong without database testing:
- A migration runs fine on an empty staging database, fails on production with 10 million rows
- A query returns correct results in unit tests (SQLite) but fails silently in production (PostgreSQL) due to different NULL semantics
- A foreign key constraint that was never enforced in tests gets violated in production
The solution is to test databases the way you test code: with isolation, repeatability, and real assertions.
Setting Up Pytest for Database Testing
Project Structure
myapp/
├── src/
│ ├── models.py # SQLAlchemy models
│ ├── database.py # Engine and session setup
│ └── repositories/
│ ├── user_repo.py
│ └── order_repo.py
├── tests/
│ ├── conftest.py # Shared fixtures
│ ├── test_user_repo.py
│ ├── test_order_repo.py
│ └── test_migrations.py
└── alembic/
└── versions/ # Migration filesCore Fixtures in conftest.py
The most important design decision: use rollback-based isolation instead of creating and dropping databases per test. This is 5–10x faster and more reliable.
# tests/conftest.py
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from src.models import Base
from src.database import get_session
# Use an in-memory SQLite database for unit tests
TEST_DATABASE_URL = "sqlite:///:memory:"
# For integration tests that need PostgreSQL behavior:
# TEST_DATABASE_URL = "postgresql://user:pass@localhost:5432/testdb"
@pytest.fixture(scope="session")
def engine():
"""Create the test database engine once per test session."""
engine = create_engine(TEST_DATABASE_URL)
Base.metadata.create_all(engine)
yield engine
Base.metadata.drop_all(engine)
engine.dispose()
@pytest.fixture(scope="function")
def db_session(engine):
"""
Create a fresh database session for each test, with automatic rollback.
This wraps each test in a transaction that gets rolled back at the end,
keeping tests isolated without the overhead of recreating the schema.
"""
connection = engine.connect()
transaction = connection.begin()
Session = sessionmaker(bind=connection)
session = Session()
yield session
session.close()
transaction.rollback()
connection.close()Every test gets a clean database state. No cleanup code needed. No test pollution.
Why Not Drop and Recreate?
# ❌ Slow — recreates schema for every test
@pytest.fixture
def db_session():
Base.metadata.create_all(engine)
session = Session()
yield session
session.close()
Base.metadata.drop_all(engine)
# ✅ Fast — wraps in transaction, rolls back
@pytest.fixture
def db_session(engine):
connection = engine.connect()
transaction = connection.begin()
session = Session(bind=connection)
yield session
session.close()
transaction.rollback()
connection.close()A test suite that takes 112 minutes with drop/recreate takes 14 minutes with rollback fixtures — an 8x speedup from one architecture change.
Testing Database Models
CRUD Operations
# tests/test_user_repo.py
from src.repositories.user_repo import UserRepository
from src.models import User
def test_create_user(db_session):
repo = UserRepository(db_session)
user = repo.create(email="test@example.com", name="Test User")
assert user.id is not None
assert user.email == "test@example.com"
assert user.created_at is not None
def test_get_user_by_email(db_session):
repo = UserRepository(db_session)
repo.create(email="findme@example.com", name="Find Me")
found = repo.get_by_email("findme@example.com")
assert found is not None
assert found.name == "Find Me"
def test_get_nonexistent_user_returns_none(db_session):
repo = UserRepository(db_session)
result = repo.get_by_email("nobody@example.com")
assert result is None
def test_update_user(db_session):
repo = UserRepository(db_session)
user = repo.create(email="update@example.com", name="Old Name")
updated = repo.update(user.id, name="New Name")
assert updated.name == "New Name"
# Verify the original object was also updated (same session)
assert user.name == "New Name"
def test_delete_user(db_session):
repo = UserRepository(db_session)
user = repo.create(email="delete@example.com", name="Delete Me")
user_id = user.id
repo.delete(user_id)
assert repo.get_by_id(user_id) is NoneTesting Constraints
def test_duplicate_email_raises_integrity_error(db_session):
repo = UserRepository(db_session)
repo.create(email="duplicate@example.com", name="First")
with pytest.raises(IntegrityError):
repo.create(email="duplicate@example.com", name="Second")
def test_email_cannot_be_null(db_session):
with pytest.raises((IntegrityError, ValidationError)):
User(email=None, name="No Email")
db_session.flush()
def test_cascade_delete_removes_orders(db_session):
# Create user with orders
user = User(email="cascade@example.com", name="Cascade Test")
db_session.add(user)
db_session.flush()
order = Order(user_id=user.id, total=99.99)
db_session.add(order)
db_session.flush()
# Delete user
db_session.delete(user)
db_session.flush()
# Orders should be gone too
remaining_orders = db_session.query(Order).filter_by(user_id=user.id).all()
assert len(remaining_orders) == 0Testing Queries and Aggregates
def test_count_active_users(db_session):
repo = UserRepository(db_session)
repo.create(email="active1@example.com", name="Active 1", active=True)
repo.create(email="active2@example.com", name="Active 2", active=True)
repo.create(email="inactive@example.com", name="Inactive", active=False)
count = repo.count_active()
assert count == 2
def test_get_orders_in_date_range(db_session):
# Create orders across different dates
create_order(db_session, date=date(2026, 1, 1), total=10.00)
create_order(db_session, date=date(2026, 3, 15), total=25.00)
create_order(db_session, date=date(2026, 6, 30), total=50.00)
orders = OrderRepository(db_session).get_in_range(
start=date(2026, 1, 1),
end=date(2026, 4, 1)
)
assert len(orders) == 2
assert sum(o.total for o in orders) == 35.00Testing Database Migrations
Migration testing is where most teams have gaps. A migration that works on a clean database often fails on a database with years of production data.
Testing Migration Up/Down
# tests/test_migrations.py
from alembic.config import Config
from alembic import command
from alembic.runtime.migration import MigrationContext
from alembic.script import ScriptDirectory
@pytest.fixture
def migration_engine():
"""Separate engine for migration tests — needs DDL operations."""
engine = create_engine("postgresql://user:pass@localhost:5432/migration_test")
yield engine
engine.dispose()
def test_migration_up_and_down(migration_engine):
"""Verify the latest migration can be applied and rolled back."""
alembic_cfg = Config("alembic.ini")
alembic_cfg.set_main_option("sqlalchemy.url", str(migration_engine.url))
# Apply migration
command.upgrade(alembic_cfg, "head")
# Verify schema is as expected
with migration_engine.connect() as conn:
context = MigrationContext.configure(conn)
current_rev = context.get_current_revision()
assert current_rev is not None
# Roll back
command.downgrade(alembic_cfg, "-1")
# Verify rollback worked
with migration_engine.connect() as conn:
context = MigrationContext.configure(conn)
rolled_back_rev = context.get_current_revision()
assert rolled_back_rev != current_rev
def test_migration_with_existing_data(migration_engine):
"""Verify migration handles existing data correctly."""
# Create state before migration
with migration_engine.connect() as conn:
conn.execute(text("""
INSERT INTO users (email, name)
VALUES ('existing@example.com', 'Existing User')
"""))
conn.commit()
# Apply migration
alembic_cfg = Config("alembic.ini")
command.upgrade(alembic_cfg, "head")
# Verify existing data survived
with migration_engine.connect() as conn:
result = conn.execute(text("SELECT COUNT(*) FROM users")).scalar()
assert result == 1
def test_no_pending_migrations():
"""CI check: fail if there are unapplied migrations."""
alembic_cfg = Config("alembic.ini")
script = ScriptDirectory.from_config(alembic_cfg)
with engine.connect() as conn:
context = MigrationContext.configure(conn)
current_heads = set(context.get_current_heads())
script_heads = set(script.get_heads())
assert current_heads == script_heads, (
f"Unapplied migrations: {script_heads - current_heads}"
)Testing Migrations in CI
Add migration testing to your CI pipeline:
# .github/workflows/migrations.yml
name: Migration Tests
on: [push, pull_request]
jobs:
test-migrations:
services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: testpass
POSTGRES_DB: migration_test
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v3
- name: Test migrations apply cleanly
run: |
alembic upgrade head
python -m pytest tests/test_migrations.py -vSQLite vs PostgreSQL in Tests
SQLite is fast and convenient for unit tests. PostgreSQL is what runs in production. Use both appropriately:
| Use case | Database | Why |
|---|---|---|
| Unit tests (repository logic) | SQLite in-memory | Fast, isolated |
| Integration tests (constraints, transactions) | PostgreSQL | Real behavior |
| Migration tests | PostgreSQL | DDL differences matter |
| Performance tests | PostgreSQL | Explain plans, indexes |
SQLite differences that bite teams:
AUTOINCREMENTbehavior differsSTRICTmode not default- Full-text search works differently
NOW()has different precision- No
RETURNINGclause support (before SQLite 3.35) - NULL comparisons behave differently in some edge cases
For tests that validate business logic: SQLite is fine. For tests that validate data integrity and constraints: use PostgreSQL.
Testing Async Database Code
FastAPI and modern Python apps increasingly use async database access. Testing async code requires async fixtures:
# For SQLAlchemy async
import pytest
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
from sqlalchemy.orm import sessionmaker
@pytest.fixture
async def async_session():
engine = create_async_engine("sqlite+aiosqlite:///:memory:")
async with engine.begin() as conn:
await conn.run_sync(Base.metadata.create_all)
AsyncSessionLocal = sessionmaker(
engine, class_=AsyncSession, expire_on_commit=False
)
async with AsyncSessionLocal() as session:
yield session
await engine.dispose()
@pytest.mark.asyncio
async def test_async_create_user(async_session):
repo = AsyncUserRepository(async_session)
user = await repo.create(email="async@example.com", name="Async User")
assert user.id is not NoneInstall pytest-asyncio and configure it:
# pytest.ini
[pytest]
asyncio_mode = autoMonitoring Database Health in Production
Automated tests tell you your database logic is correct. Monitoring tells you your database is healthy.
Key database health signals to monitor:
Query performance
# Monitor slow query percentage
helpmetest health db-slow-queries 5mConnection pool Monitor connection pool exhaustion — it manifests as application timeouts, not database errors.
Replication lag For read replicas, monitor replication lag. Stale reads cause subtle bugs that are hard to reproduce.
Disk usage Database storage fills up suddenly when data volumes spike. Monitor disk usage trends, not just current state.
Failed jobs Background jobs that write to the database often fail silently. Monitor your job queue for failed runs.
Property-Based Testing for Databases
For complex queries, use property-based testing to verify invariants:
from hypothesis import given, strategies as st
from hypothesis.extra.django import TestCase
class TestOrderCalculations(TestCase):
@given(
quantities=st.lists(st.integers(min_value=1, max_value=100), min_size=1, max_size=10),
prices=st.lists(st.decimals(min_value="0.01", max_value="999.99"), min_size=1, max_size=10)
)
def test_order_total_equals_sum_of_line_items(self, quantities, prices):
# Pair quantities with prices (use shortest list)
line_items = list(zip(quantities, prices))
order = Order.create(line_items=line_items)
expected_total = sum(q * p for q, p in line_items)
assert order.total == pytest.approx(float(expected_total), rel=1e-5)Hypothesis generates hundreds of combinations automatically, finding edge cases that hand-written tests miss.
Summary: Database Testing Checklist
Unit tests (run in CI on every PR):
- CRUD operations for each repository
- Constraint violation tests
- Edge cases (empty results, null values, boundary conditions)
- Business logic in aggregates and queries
Integration tests (run in CI on every PR):
- Tests use PostgreSQL (not just SQLite)
- Concurrent write scenarios
- Transaction rollback behavior
- Foreign key constraint enforcement
Migration tests (run in CI on every PR):
- Latest migration applies cleanly
- Latest migration rolls back cleanly
- No pending migrations check
- Migration with existing data test
Production monitoring:
- Slow query monitoring
- Connection pool health
- Replication lag (if applicable)
- Disk usage trending
The database is the one part of your system that doesn't reset between deployments. Bugs in database logic compound over time. Testing it thoroughly is an investment that pays back every time a migration works correctly and every time a query returns the right result.