Stress Testing Databases Under Concurrency: Connections, Locks, and Bottlenecks
Database layers fail in ways that application layers don't. HTTP servers can scale horizontally almost indefinitely — databases hit hard limits around connections, lock contention, and I/O capacity. Stress testing the database independently from the application reveals bottlenecks that composite API tests often miss.
Why Database Stress Testing Is Different
When you stress test an API endpoint, you're testing the entire stack. A slow response could mean: slow application code, slow database query, lock contention, connection pool exhaustion, or network latency. The signal is blended.
Database stress testing isolates the data layer:
- Connection pool limits: how many concurrent connections does your database handle before new connections queue or fail?
- Lock contention: do concurrent writes block each other? How long?
- Query performance under load: does a query that takes 10ms at 1 connection take 200ms at 100?
- Cache effectiveness: does the buffer pool hold under load, or does cache eviction cause I/O spikes?
Connection Pool Limits
Every database has a maximum concurrent connection limit. PostgreSQL defaults to 100. MySQL defaults to 151. Beyond this limit, new connections queue (if client-side pooling is configured) or fail immediately.
Testing the Limit
Use pgbench (PostgreSQL) to test raw connection limits:
# Initialize test database
pgbench -i -s 50 mydb
<span class="hljs-comment"># Stress test: 200 clients, 10 threads, 60 seconds
pgbench -c 200 -j 10 -T 60 mydbWatch for connection errors as you increase -c past the server's max_connections. The transition from linear throughput degradation to error spikes marks the limit.
For MySQL, use sysbench:
# Install
<span class="hljs-built_in">sudo apt install sysbench
<span class="hljs-comment"># Prepare
sysbench oltp_read_write --db-driver=mysql --mysql-db=testdb \
--mysql-user=root prepare
<span class="hljs-comment"># Run with 300 concurrent threads
sysbench oltp_read_write --db-driver=mysql --mysql-db=testdb \
--mysql-user=root --threads=300 --<span class="hljs-keyword">time=60 runConnection Pooler Stress Testing
If you use PgBouncer, Pgpool-II, or a connection pooler in your application (HikariCP, c3p0, DBCP), stress test the pooler configuration separately:
# Test PgBouncer pool limits
pgbench -c 500 -j 20 -T 60 -h localhost -p 6432 mydbpool_size in PgBouncer limits server-side connections. If pool_size=20 and you send 500 client connections, 480 will queue at PgBouncer. The throughput ceiling is determined by the 20 pooled connections, not the 500 clients.
Identifying Lock Contention
Lock contention appears when concurrent writes block each other. Under stress, short locks become long waits. A table-level lock that holds for 100ms at low concurrency becomes a 10-second bottleneck at 200 concurrent writers.
Simulating Write Contention
Write a concurrent update stress test:
import threading
import psycopg2
import time
def worker(thread_id, iterations=1000):
conn = psycopg2.connect("dbname=testdb user=postgres")
cur = conn.cursor()
for i in range(iterations):
try:
cur.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
conn.commit()
except Exception as e:
conn.rollback()
cur.close()
conn.close()
# Launch 100 concurrent updaters
threads = [threading.Thread(target=worker, args=(i,)) for i in range(100)]
start = time.time()
for t in threads: t.start()
for t in threads: t.join()
print(f"Elapsed: {time.time() - start:.2f}s")This creates a hot row — 100 threads all updating the same row. Under PostgreSQL's row-level locking, transactions queue. The throughput you observe is the maximum write throughput for that update pattern.
Observing Live Lock Contention
During the stress test, query PostgreSQL's lock views:
-- See waiting locks
SELECT
pid,
wait_event_type,
wait_event,
state,
query_start,
query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
-- See blocking relationships
SELECT
blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
blocked.query AS blocked_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';If you see hundreds of rows in pg_stat_activity waiting for locks, your write pattern needs redesign (partitioning, batching, or optimistic concurrency).
Query Performance Under Load
A query that performs well in isolation may degrade significantly under concurrent load due to cache contention, I/O saturation, or resource competition.
Concurrent Query Testing
import concurrent.futures
import psycopg2
import time
import statistics
def run_query(query, params=None):
conn = psycopg2.connect("dbname=testdb user=postgres")
cur = conn.cursor()
start = time.time()
cur.execute(query, params)
cur.fetchall()
elapsed = time.time() - start
cur.close()
conn.close()
return elapsed
# Baseline: single thread
single_times = [run_query("SELECT * FROM orders WHERE user_id = %s", (1,)) for _ in range(100)]
print(f"Single-thread p95: {statistics.quantiles(single_times, n=20)[18]*1000:.0f}ms")
# Stress: 100 concurrent threads
with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
futures = [
executor.submit(run_query, "SELECT * FROM orders WHERE user_id = %s", (i % 1000,))
for i in range(1000)
]
concurrent_times = [f.result() for f in futures]
print(f"100-concurrent p95: {statistics.quantiles(concurrent_times, n=20)[18]*1000:.0f}ms")Compare baseline p95 with concurrent p95. A 5x or greater increase indicates resource contention — usually shared buffer cache misses, sequential scan competition, or I/O saturation.
Database-Specific Stress Scenarios
Index Effectiveness Under Load
-- Before: sequential scan
EXPLAIN ANALYZE SELECT * FROM events WHERE created_at > NOW() - INTERVAL '1 hour';
-- Add index
CREATE INDEX CONCURRENTLY idx_events_created_at ON events(created_at);
-- After: index scan
EXPLAIN ANALYZE SELECT * FROM events WHERE created_at > NOW() - INTERVAL '1 hour';Under concurrent load, the index advantage compounds — sequential scans fight for I/O, while index scans are more selective and buffer-friendly.
Autovacuum Under Write Load
Heavy writes trigger autovacuum. Under stress, autovacuum can compete for I/O, causing latency spikes. Monitor during stress tests:
SELECT
schemaname,
tablename,
last_autovacuum,
n_dead_tup,
n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;High n_dead_tup under active writes indicates autovacuum is falling behind — a risk for long-running stress tests.
Stress Testing Redis Under Concurrency
For caching layers, use redis-benchmark:
# 100 concurrent clients, 1M requests, mixed read/write
redis-benchmark -c 100 -n 1000000 -q
<span class="hljs-comment"># Specific command pattern
redis-benchmark -c 200 -n 500000 -t get,<span class="hljs-built_in">set -r 1000000 -qWatch for latency percentiles. Redis is single-threaded for commands, so throughput scales with command complexity, not concurrency.
Interpreting Results
After database stress tests, look for:
Linear throughput curve: throughput increases proportionally with concurrency up to a point, then plateaus. The plateau is your database's effective limit.
Latency cliff: response time stays flat until a threshold, then increases sharply. This is the point where connections start queuing.
Error rate appearance: when connections are refused or queries time out, you've exceeded the hard limit.
Recovery behavior: reduce concurrency back to baseline — does throughput return immediately, or does it take time? Delayed recovery indicates connection pool exhaustion or lock storms.
Integrating Database Stress Testing
Database stress tests fit best in:
- Pre-release pipelines: run before major releases that include schema changes or new query patterns
- Capacity planning: run periodically to validate that growth projections match actual database headroom
- Post-migration validation: after schema migrations, verify that query plans remain efficient under load
Pair with application-level monitoring via tools like HelpMeTest — functional tests verify correctness, while database stress tests validate the data layer's capacity.
Conclusion
Database layers fail silently and unpredictably under concurrency. Connection pool limits, lock contention, and cache eviction don't appear in unit tests or low-concurrency integration tests — they emerge only when real concurrent load is applied.
Testing the database layer independently, before composite API stress tests, gives you a clear picture of where the limits actually are.