Pumba: Docker Container Chaos Testing Guide
Pumba is a chaos testing tool for Docker containers. It kills containers, pauses them, and injects network failures using Linux tc (traffic control) — directly on your Docker host or inside Docker Compose environments. If you run microservices locally with Docker Compose or test in Docker-based CI environments, Pumba is the fastest way to add chaos testing without Kubernetes.
What Pumba Does
Pumba targets Docker containers by name or label and applies:
Container chaos:
- kill: send a signal to the container (SIGKILL by default)
- stop: graceful stop with timeout
- rm: remove the container
- pause: pause all processes in the container (Docker equivalent of SIGSTOP)
Network chaos (using Linux tc + netem):
- netem delay: add latency to outgoing traffic
- netem loss: randomly drop packets
- netem corrupt: corrupt packets (flip random bits)
- netem rate: limit bandwidth
- netem duplicate: duplicate packets
Stress testing (using stress-ng):
- stress cpu: consume CPU cycles
- stress memory: consume memory
- stress io: stress disk I/O
Installation
Binary (Linux/macOS/Windows):
# macOS
brew install pumba
# Linux
curl -L https://github.com/alexei-led/pumba/releases/latest/download/pumba_linux_amd64 \
  -o /usr/local/bin/pumba
chmod +x /usr/local/bin/pumba
# Verify
pumba --version
Docker (run Pumba as a container, with access to the Docker socket):
docker run -d \
-v /var/run/docker.sock:/var/run/docker.sock \
--name pumba \
gaiaadm/pumba pumba --help
Basic Container Kill
Kill a container by name:
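pumba kill my-web-service
Kill with a specific signal (SIGTERM for graceful shutdown):
pumba kill --signal SIGTERM my-web-service
Kill one randomly chosen container from a group whose names match a pattern:
pumba --random kill "re2:^my-.*"
Pumba uses RE2 regex syntax for container matching. The re2: prefix marks the argument as a regular expression, so re2:^my-.* matches any container whose name starts with my-. Note that --random is a global option and goes before the kill subcommand. To select containers by label rather than by name, use the global --label option (for example --label app=my-app).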
Random Kill on a Schedule
Run kill chaos every 30 seconds against a specific container:
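pumba --interval 30s kill my-web-service
This simulates random instance termination, like AWS Spot Instance interruptions or random pod evictions. Pumba does not restart the container itself, so give the container a restart policy (or let your orchestrator restart it); otherwise only the first kill has any effect. Run this while your integration tests are executing to verify they pass even with container restarts.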
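Scheduled kills only test recovery if the container actually comes back between rounds. The following is a minimal sketch of a check for that, assuming the container has a restart policy; wait_for_running is a hypothetical helper name, and it relies only on docker inspect reporting .State.Running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pumba): poll until a container reports
# State.Running=true, or give up after a number of attempts.
# Usage: wait_for_running <container> [attempts] [delay_seconds]
wait_for_running() {
  local name attempts delay i state
  name="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # docker inspect prints "true" or "false" for .State.Running
    state="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null || true)"
    if [ "$state" = "true" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "container $name did not come back after $attempts attempts" >&2
  return 1
}
```

A test harness could call wait_for_running my-web-service 30 1 after each kill interval and fail the run if it returns non-zero.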
Network Latency
Add 500ms latency to all outgoing traffic from a container:
pumba netem --duration 30s delay \
--time 500 \
my-web-service
Add latency with 100ms jitter (normally distributed):
pumba netem --duration 30s delay \
--time 500 \
--jitter 100 \
--distribution normal \
my-web-service
Distribution options: normal, uniform, pareto, paretonormal. Normal is usually most realistic.
Packet Loss
Drop 30% of outgoing packets randomly:
pumba netem --duration 30s loss \
--percent 30 \
my-web-service
This simulates a flaky network connection. Your application should retry failed requests; if it doesn't, packet loss will cause visible errors.
Correlated packet loss (packets are more likely to be dropped in bursts, like real network issues):
pumba netem --duration 30s loss \
--percent 30 \
--correlation 80 \
my-web-service
--correlation 80 means 80% correlation between consecutive packet drop decisions, so drops cluster together rather than being independent.
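The retry behaviour that packet loss is meant to exercise can be sketched in a few lines of shell. This is an illustrative helper under stated assumptions, not something Pumba provides: retry is a hypothetical name, and the doubling backoff is one common choice among several.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
retry() {
  local max delay attempt
  max="$1"
  delay="$2"
  attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    # Double the wait before the next attempt.
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry 5 1 curl -fsS --max-time 2 http://localhost:8080/health makes up to five attempts, waiting 1, 2, 4, and 8 seconds between them. With bursty (correlated) loss, the growing delay gives the burst time to pass.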
Bandwidth Throttling
Limit outgoing bandwidth to 100 Kbit/s:
pumba netem --duration 60s rate \
--rate 100kbit \
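my-web-service
Test how your service behaves when uploading large files or streaming data over a limited connection. Rate units: bps, kbps, mbps, gbps, kbit, mbit, gbit. These follow tc conventions, where the bit-suffixed units are bits per second and the bps-suffixed units are bytes per second.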
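When choosing a throttle rate, it helps to estimate how long a transfer should take so that test timeouts are set deliberately. The sketch below is plain arithmetic, assuming kbit means 1000 bits per second and ignoring protocol overhead; transfer_seconds is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate transfer time at a throttled rate.
# Usage: transfer_seconds <size_bytes> <rate_bits_per_second>
transfer_seconds() {
  local size_bytes rate_bits
  size_bytes="$1"
  rate_bits="$2"
  # 8 bits per byte; round up so a partial second counts as a full one.
  echo $(( (size_bytes * 8 + rate_bits - 1) / rate_bits ))
}
```

At 100kbit (100000 bits per second), transfer_seconds 1000000 100000 prints 80: a 1 MB upload needs at least 80 seconds, so a 30-second client timeout is expected to fire.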
Targeting Specific Interfaces
Pumba's network chaos applies to all of a container's outgoing traffic by default. You can restrict it to a specific network interface, or to traffic bound for specific IP addresses:
# Only inject chaos on the eth0 interface
pumba netem --interface eth0 --duration 30s delay \
  --time 1000 \
  my-web-service
# Only affect traffic to a specific container (filter by destination IP with --target)
# Assumes my-database is attached to a single Docker network
pumba netem --tc-image gaiadocker/iproute2 --duration 30s \
  --target "$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' my-database)" \
  delay --time 500 \
  my-web-service
Targeting specific IP addresses is critical in local development: you don't want to inject chaos on all outbound traffic, just on connections to specific dependencies.
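Containers started by Docker Compose sit on user-defined networks and can have more than one address, so looking up the destination IP deserves a little care. The sketch below lists every address of a container and turns them into filter options. The helper names are hypothetical, and it assumes Pumba's netem --target option (its destination IP filter) may be repeated once per address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every IP address of a container, one per line,
# across all Docker networks it is attached to.
container_ips() {
  docker inspect \
    -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{"\n"}}{{end}}' \
    "$1"
}

# Hypothetical helper: build one "--target <ip>" option per address.
# Assumption: pumba netem accepts --target more than once.
target_flags() {
  local ip flags
  flags=""
  for ip in $(container_ips "$1"); do
    flags="$flags --target $ip"
  done
  # Strip the leading space left by the loop.
  echo "${flags# }"
}
```

The output of target_flags my-database can then be spliced, unquoted, into a pumba netem command line before the delay or loss subcommand.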
Container Pause
Pause all processes in a container (equivalent to SIGSTOP):
pumba pause --duration 10s my-web-service
The container remains running from Docker's perspective but all processes inside are frozen. This simulates a "zombie" container, one that's alive but not responding. Useful for testing:
- Health check behavior when a service stops responding
- Load balancer removal of unhealthy backends
- Timeout handling in upstream callers
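The behaviours in the list above can be observed from outside the container with a probe that enforces its own time limit. This is a rough sketch with hypothetical helper names; it assumes the service exposes an HTTP endpoint and that curl is available where the probe runs.

```shell
#!/usr/bin/env bash
# Hypothetical probe: succeed only if the URL answers within the time limit.
# Usage: responds_within <seconds> <url>
responds_within() {
  curl --silent --fail --output /dev/null --max-time "$1" "$2"
}

# Hypothetical helper: run N probes and print how many failed.
# While a container is paused, every probe is expected to time out.
# Usage: count_failures <probes> <seconds> <url>
count_failures() {
  local probes limit url failures i
  probes="$1"
  limit="$2"
  url="$3"
  failures=0
  i=0
  while [ "$i" -lt "$probes" ]; do
    responds_within "$limit" "$url" || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}
```

Running count_failures 5 1 http://localhost:8080/health during the pause and again after it ends shows whether callers see timeouts while the service is frozen and whether it recovers afterwards.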
Stress Testing
Run CPU stress inside a container:
pumba stress --duration 60s \
--stressors "--cpu 2 --timeout 60s" \
my-web-service
Pumba runs stress-ng from a helper image (configurable with --stress-image) inside the target container's cgroup, so the target image does not need stress-ng installed. The --stressors string is passed to stress-ng as its arguments.
Memory stress:
pumba stress --duration 60s \
--stressors "--vm 2 --vm-bytes 512m --timeout 60s" \
my-web-service
This allocates 512MB with 2 workers, which is useful for testing what happens when a service approaches its memory limit.
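To approach a memory limit on purpose rather than by guesswork, the allocation size can be derived from the container's configured limit. The sketch below reads .HostConfig.Memory (in bytes, 0 when no limit is set) with docker inspect; vm_bytes_for is a hypothetical helper, and it assumes stress-ng accepts a plain byte count for --vm-bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a byte count equal to a percentage of the
# container's memory limit, or fail if the container has no limit.
# Usage: vm_bytes_for <container> <percent>
vm_bytes_for() {
  local limit
  limit="$(docker inspect -f '{{.HostConfig.Memory}}' "$1")" || return 1
  if [ "$limit" -eq 0 ]; then
    echo "container $1 has no memory limit" >&2
    return 1
  fi
  echo $(( limit * $2 / 100 ))
}
```

For a container limited to 1 GiB, vm_bytes_for my-web-service 80 prints 858993459, which could be used as the --vm-bytes value for a single worker.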
Docker Compose Integration
Pumba works naturally with Docker Compose environments:
# docker-compose.yml
version: '3'
services:
  api:
    image: my-api:latest
    labels:
      role: api
    depends_on:
      - database
      - redis
  database:
    image: postgres:15
    labels:
      role: database
  redis:
    image: redis:7
    labels:
      role: cache
Run chaos against the database service:
# Add 1-second latency to database container
pumba netem --duration 60s delay \
--time 1000 \
api_database_1
Or target by pattern:
# Target any container whose name starts with "api"
pumba netem --duration 60s loss \
--percent 20 \
"re2:api_.*"
Integration Test Pattern
The most valuable use of Pumba: inject chaos during your integration test run.
#!/bin/bash
# run-chaos-tests.sh
# Start the application stack
docker-compose up -d
# Wait for services to be healthy
sleep 10
# Start chaos in the background
pumba --interval 15s kill "re2:myapp_worker_.*" &
PUMBA_PID=$!
# Run your test suite
pytest tests/integration/ --timeout=120
TEST_RESULT=$?
# Stop chaos
kill $PUMBA_PID
# Cleanup
docker-compose down
exit $TEST_RESULT
If your integration tests fail under container chaos, you've found a resilience gap. Fix the application (add retries, handle worker restarts) until the tests pass with chaos running.
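Two parts of the script above are fragile: the fixed sleep 10 and the cleanup that is skipped if the script exits early. A possible hardening, sketched with hypothetical helper names, is to poll for readiness with a timeout and to stop chaos from an EXIT trap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) runs out.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local remaining
  remaining="$1"
  shift
  until "$@"; do
    remaining=$((remaining - 1))
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical cleanup: stop the background chaos process if one was recorded.
# Safe to call more than once.
stop_chaos() {
  if [ -n "${PUMBA_PID:-}" ]; then
    kill "$PUMBA_PID" 2>/dev/null || true
    PUMBA_PID=""
  fi
}
```

In the script, sleep 10 could become wait_until 60 curl -fsS http://localhost:8080/health, and a trap stop_chaos EXIT line placed right after PUMBA_PID=$! ensures the background pumba process is stopped even when the test command aborts the script.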
Chaos as a CI Step
# .github/workflows/integration-tests.yml
name: Integration Tests with Chaos
on: [push]
jobs:
  chaos-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Pumba
        run: |
          curl -L https://github.com/alexei-led/pumba/releases/latest/download/pumba_linux_amd64 \
            -o /usr/local/bin/pumba
          chmod +x /usr/local/bin/pumba
      - name: Start application stack
        run: docker-compose up -d
      - name: Wait for healthy state
        run: |
          until docker-compose exec -T api curl -f http://localhost:8080/health; do
            sleep 2
          done
      - name: Run background chaos
        run: |
          pumba --interval 20s kill "re2:${COMPOSE_PROJECT_NAME}_worker_.*" &
          echo "PUMBA_PID=$!" >> $GITHUB_ENV
      - name: Run integration tests
        run: pytest tests/integration/ --tb=short
      - name: Stop chaos
        if: always()
        run: kill $PUMBA_PID || true
      - name: Collect logs
        if: always()
        run: docker-compose logs > docker-logs.txt
      - name: Cleanup
        if: always()
        run: docker-compose down -v
      - name: Upload logs
        uses: actions/upload-artifact@v3
        if: failure()
        with:
          name: chaos-test-logs
          path: docker-logs.txt
Network Chaos in CI (tc requirements)
Pumba's network chaos uses Linux's tc command. In CI environments, this requires:
- A Linux runner (GitHub Actions ubuntu-latest works)
- The iproute2 package, which provides tc, installed in the target container (or supplied through --tc-image)
- Sufficient privileges (run with --privileged if Pumba itself runs inside Docker)
Pumba runs tc inside the target container's network namespace, so verify tc is available there:
docker exec my-container which tc
# /usr/sbin/tc
docker exec my-container tc --help
If tc isn't available in the target container, use the netem --tc-image option to run tc from a helper image that shares the container's network namespace:
pumba netem --tc-image gaiadocker/iproute2 --duration 30s delay \
  --time 500 \
  my-container
Pumba vs LitmusChaos vs Toxiproxy
| Tool | Environment | Approach | Best For |
|---|---|---|---|
| Pumba | Docker/local | Direct container manipulation | Local dev, Docker Compose CI |
| LitmusChaos | Kubernetes | CRD-based experiments | Kubernetes production |
| Toxiproxy | Any (proxy) | TCP-level network interception | Integration tests, precise control |
Pumba shines in local Docker Compose environments and Docker-based CI. It requires no cluster, no CRDs, no proxy configuration — just the Docker socket and a binary. For teams not yet on Kubernetes, it's the fastest path to meaningful chaos testing.
Toxiproxy is better when you need precise, per-connection control — injecting latency on only the connection between service A and database B, while leaving other connections alone.
LitmusChaos is better when you're running in Kubernetes and want experiments managed as Kubernetes resources with proper RBAC, scheduling, and observability integration.
Pumba's simplicity is its greatest asset. Running pumba kill my-service during your tests takes five minutes to set up and immediately tells you whether your service handles restarts correctly. Start there, then graduate to more complex tools as your resilience engineering practice matures.