Pumba: Docker Container Chaos Testing Guide

Pumba is a chaos testing tool for Docker containers. It kills, stops, and pauses containers, injects network faults using Linux tc (traffic control), and runs stress tests, either directly on your Docker host or against Docker Compose environments. If you run microservices locally with Docker Compose or test in Docker-based CI environments, Pumba lets you add chaos testing without a Kubernetes cluster.

What Pumba Does

Pumba targets Docker containers by name or label and applies:

Container chaos:

  • kill — send a signal to the container (SIGKILL by default)
  • stop — graceful stop with timeout
  • rm — remove the container
  • pause — freeze all processes in the container (Docker uses the cgroup freezer; the effect resembles SIGSTOP, but the processes receive no signal)

Network chaos (using Linux tc + netem):

  • netem delay — add latency to outgoing traffic
  • netem loss — randomly drop packets
  • netem corrupt — corrupt packets (flip random bits)
  • netem rate — limit bandwidth
  • netem duplicate — duplicate packets

Stress testing (using stress-ng):

  • stress --stressors "--cpu N" — consume CPU cycles
  • stress --stressors "--vm N" — consume memory
  • stress --stressors "--io N" — stress disk I/O

Installation

Binary (Linux/macOS/Windows):

# macOS
brew install pumba

# Linux
curl -L https://github.com/alexei-led/pumba/releases/latest/download/pumba_linux_amd64 \
  -o /usr/local/bin/pumba
chmod +x /usr/local/bin/pumba

# Verify
pumba --version

Docker (run Pumba as a container, with access to the Docker socket):

# The image's entrypoint is the pumba binary, so pass only its arguments
docker run -it --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gaiaadm/pumba --help

Basic Container Kill

Kill a container by name:

pumba kill my-web-service

Kill with a specific signal (SIGTERM for graceful shutdown):

pumba kill --signal SIGTERM my-web-service

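Kill a random container from a group (by name pattern):

pumba --random kill "re2:^my-"

Pumba uses RE2 regex syntax for container matching. A target prefixed with re2: is treated as a regular expression and matched against container names, so re2:^my- matches any container whose name starts with my-. Note that --random is a global option and goes before the kill command; without it, every matching container is killed. To select containers by label instead of by name, use the global --label option (for example --label app=my-app).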
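Before pointing a destructive command at a pattern, it can help to preview which names it would select. The sketch below is a hypothetical helper, not part of Pumba. It filters names with grep -E, whose extended regex syntax agrees with RE2 for simple patterns such as anchors and character classes but is not identical in general.

```shell
# preview_targets PATTERN
# Reads container names on stdin (one per line) and prints those that
# match PATTERN. Hypothetical helper; grep -E only approximates RE2.
preview_targets() {
  local pattern=$1
  grep -E -- "$pattern" || true   # "no match" is not an error here
}
```

For example, docker ps --format '{{.Names}}' | preview_targets '^my-' lists the running containers that a target of re2:^my- would be expected to hit.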

Random Kill on a Schedule

Run kill chaos every 30 seconds against a specific container:

pumba --interval 30s kill my-web-service

This simulates random instance termination — like AWS Spot Instance interruptions or random pod evictions. Run this while your integration tests are executing to verify they pass even with container restarts.

Network Latency

Add 500ms latency to all outgoing traffic from a container:

pumba netem --duration 30s delay \
  --time 500 \
  my-web-service

Add latency with 100ms jitter (normally distributed):

pumba netem --duration 30s delay \
  --time 500 \
  --jitter 100 \
  --distribution normal \
  my-web-service

Distribution options: normal, uniform, pareto, paretonormal. Normal is usually most realistic.
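To confirm the delay is actually in effect, one option is to time the same request before and during the experiment. This is a minimal sketch with a hypothetical helper name, and it assumes GNU date for the %N nanosecond format.

```shell
# elapsed_ms CMD [ARGS...]
# Runs CMD, discards its output, and prints the wall-clock time in ms.
# Assumes GNU date; BSD/macOS date does not support %N.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || true   # time the command even if it fails
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

Calling elapsed_ms curl -s followed by your own service URL once before and once while the netem delay is running should show the added latency, because the service's responses are outgoing traffic from the container.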

Packet Loss

Drop 30% of outgoing packets randomly:

pumba netem --duration 30s loss \
  --percent 30 \
  my-web-service

This simulates a flaky network connection. Your application should retry failed requests — if it doesn't, packet loss will cause visible errors.
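What a retry looks like depends on your client library, but the idea can be sketched in shell for scripts and smoke tests that call a service during a loss experiment. The helper name and its arguments are illustrative and not part of Pumba.

```shell
# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS is exhausted, sleeping DELAY
# seconds after the first failure and doubling the wait each time.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      sleep "$delay"
      delay=$((delay * 2))      # exponential backoff
    fi
  done
  return 1                      # every attempt failed
}
```

With 30% loss, something like retry 5 1 curl -fsS --max-time 5 against your service's URL would be expected to succeed within a few attempts, while a single unretried call fails intermittently.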

Correlated packet loss (packets are more likely to be dropped in bursts, like real network issues):

pumba netem --duration 30s loss \
  --percent 30 \
  --correlation 80 \
  my-web-service

--correlation 80 means 80% correlation between consecutive packet drop decisions — drops cluster together rather than being independent.

Bandwidth Throttling

Limit outgoing bandwidth to 100 Kbit/s:

pumba netem --duration 60s rate \
  --rate 100kbit \
  my-web-service

Test how your service behaves when uploading large files or streaming data over a limited connection. Rate options: bps, kbps, mbps, gbps, kbit, mbit, gbit.
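A back-of-the-envelope estimate shows why this matters. The hypothetical helper below computes the ideal transfer time for a payload at a given rate, assuming tc's convention that 1 kbit means 1000 bits per second and ignoring protocol overhead.

```shell
# transfer_seconds BYTES RATE_KBIT
# Prints the minimum whole seconds needed to send BYTES at RATE_KBIT
# (1 kbit = 1000 bit/s, as in tc), rounded up. Ignores TCP/IP overhead.
transfer_seconds() {
  local bytes=$1 kbit=$2
  local bits_per_second=$((kbit * 1000))
  echo $(( (bytes * 8 + bits_per_second - 1) / bits_per_second ))
}
```

At 100kbit, a 1,000,000-byte upload needs at least 80 seconds (transfer_seconds 1000000 100), so any client or proxy timeout shorter than that will fire during the experiment.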

Targeting Specific Interfaces

By default, Pumba's netem commands affect all outgoing traffic on the container's eth0 interface. You can choose a different interface, or restrict the effect to specific destination IPs with --target:

# Only inject chaos on the eth0 interface
pumba netem --interface eth0 --duration 30s delay \
  --time 1000 \
  my-web-service

# Only affect traffic to a specific container (using --target)
# The inspect template assumes my-database is attached to a single network
DB_IP="$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' my-database)"
pumba netem --tc-image gaiaadm/iproute2 --duration 30s --target "$DB_IP" delay \
  --time 500 \
  my-web-service

Targeting specific IP ranges is critical in local development — you don't want to inject chaos on all outbound traffic, just on connections to specific dependencies.

Container Pause

Pause all processes in a container (similar in effect to SIGSTOP, but implemented with the cgroup freezer):

pumba pause --duration 10s my-web-service

Docker reports the container as paused rather than exited, and every process inside is frozen. This simulates a "zombie" container: one that's alive but not responding. Useful for testing:

  • Health check behavior when a service stops responding
  • Load balancer removal of unhealthy backends
  • Timeout handling in upstream callers
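The caller's side of this can be sketched with a bounded probe. The helper below is hypothetical and assumes GNU coreutils timeout, which exits with status 124 when the time limit is reached.

```shell
# check_responds LIMIT CMD [ARGS...]
# Runs CMD with a time limit of LIMIT seconds and classifies the result.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit hit).
check_responds() {
  local limit=$1 status=0
  shift
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  case $status in
    0)   echo "responsive" ;;
    124) echo "unresponsive" ;;   # the command hung past the limit
    *)   echo "failed" ;;         # the command returned an error quickly
  esac
}
```

While pumba pause is active, a probe such as check_responds 2 curl -fsS against the service's health URL would be expected to print unresponsive rather than failed, which is the distinction a health checker or load balancer has to handle.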

Stress Testing

Run CPU stress inside a container:

pumba stress --duration 60s \
  --stressors "--cpu 2 --timeout 60s" \
  my-web-service

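The target image does not need stress-ng installed. Pumba starts a helper container from a stress-ng image (configurable with --stress-image) and places it in the target container's cgroup, so the load counts against the target's CPU and memory limits. The --stressors string is passed to stress-ng as-is.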

Memory stress:

pumba stress --duration 60s \
  --stressors "--vm 2 --vm-bytes 512m --timeout 60s" \
  my-web-service

This allocates 512MB with 2 workers — useful for testing what happens when a service approaches its memory limit.

Docker Compose Integration

Pumba works naturally with Docker Compose environments:

# docker-compose.yml
version: '3'
services:
  api:
    image: my-api:latest
    labels:
      role: api
    depends_on:
      - database
      - redis

  database:
    image: postgres:15
    labels:
      role: database

  redis:
    image: redis:7
    labels:
      role: cache

Run chaos against the database service:

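# Add 1-second latency to traffic leaving the database container.
# The name assumes the Compose project is called "api" and Compose v1 naming;
# Compose v2 would name this container api-database-1.
pumba netem --duration 60s delay \
  --time 1000 \
  api_database_1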

Or target by pattern:

# Target any container whose name starts with "api_"
pumba netem --duration 60s loss \
  --percent 20 \
  "re2:^api_"
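Container names depend on the Compose version: v1 joins project, service, and index with underscores (api_database_1), while v2 uses hyphens (api-database-1). A small hypothetical helper can build a pattern that tolerates both. This is a sketch, and it assumes Pumba matches the pattern against the bare container name.

```shell
# compose_target PROJECT SERVICE
# Prints a Pumba re2: target matching every replica of SERVICE in PROJECT,
# for both Compose v1 (underscore) and v2 (hyphen) container names.
compose_target() {
  local project=$1 service=$2
  printf 're2:^%s[-_]%s[-_][0-9]+$\n' "$project" "$service"
}
```

It can be used inline, for example pumba netem --duration 60s delay --time 1000 "$(compose_target api database)".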

Integration Test Pattern

The most valuable use of Pumba: inject chaos during your integration test run.

#!/bin/bash
# run-chaos-tests.sh

# Start the application stack
docker-compose up -d

# Wait for services to be healthy
sleep 10

# Start chaos in the background
pumba --interval 15s kill "re2:myapp_worker_.*" &
PUMBA_PID=$!

# Run your test suite
pytest tests/integration/ --timeout=120
TEST_RESULT=$?

# Stop chaos
kill $PUMBA_PID

# Cleanup
docker-compose down

exit $TEST_RESULT

If your integration tests fail under container chaos, you've found a resilience gap. Fix the application (add retries, handle worker restarts) until the tests pass with chaos running.
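The fixed sleep 10 in the script is its most fragile part: too short and chaos starts before the stack is ready, too long and every run wastes time. One alternative, sketched here with a hypothetical helper, is to poll a readiness command until it succeeds or a deadline passes.

```shell
# wait_for TIMEOUT CMD [ARGS...]
# Re-runs CMD once per second until it succeeds (returns 0) or TIMEOUT
# seconds have elapsed (returns 1). Uses bash's built-in SECONDS counter.
wait_for() {
  local timeout=$1 deadline
  shift
  deadline=$((SECONDS + timeout))
  until "$@" >/dev/null 2>&1; do
    if ((SECONDS >= deadline)); then
      return 1                  # gave up: the stack never became ready
    fi
    sleep 1
  done
  return 0
}
```

In the script, the sleep could become something like wait_for 60 curl -fsS followed by your health endpoint, with the script exiting early if the call returns non-zero.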

Chaos as a CI Step

# .github/workflows/integration-tests.yml
name: Integration Tests with Chaos
on: [push]

jobs:
  chaos-integration:
    runs-on: ubuntu-latest
    env:
      # Pin the Compose project name so container names are predictable
      COMPOSE_PROJECT_NAME: myapp
    steps:
      - uses: actions/checkout@v4

      - name: Install Pumba
        run: |
          sudo curl -L https://github.com/alexei-led/pumba/releases/latest/download/pumba_linux_amd64 \
            -o /usr/local/bin/pumba
          sudo chmod +x /usr/local/bin/pumba

      - name: Start application stack
        run: docker compose up -d

      - name: Wait for healthy state
        timeout-minutes: 2
        run: |
          until docker compose exec -T api curl -f http://localhost:8080/health; do
            sleep 2
          done

      - name: Run background chaos
        run: |
          # Compose v2 names containers myapp-worker-1; v1 used underscores
          pumba --interval 20s kill "re2:^${COMPOSE_PROJECT_NAME}[-_]worker[-_]" &
          echo "PUMBA_PID=$!" >> "$GITHUB_ENV"

      - name: Run integration tests
        run: pytest tests/integration/ --tb=short

      - name: Stop chaos
        if: always()
        run: kill $PUMBA_PID || true

      - name: Collect logs
        if: always()
        run: docker compose logs > docker-logs.txt

      - name: Cleanup
        if: always()
        run: docker compose down -v

      - name: Upload logs
        uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: chaos-test-logs
          path: docker-logs.txt

Network Chaos in CI (tc requirements)

Pumba's network chaos runs Linux's tc command inside the target container's network namespace. In CI environments, this requires:

  1. A Linux runner (GitHub Actions ubuntu-latest works)
  2. tc (from the iproute2 package) available in the target container image, or a helper image supplied with --tc-image
  3. Sufficient privileges (run with --privileged if Pumba itself runs inside Docker)

Verify tc is available in the target container:

# Prints the path to tc if it is installed in the container
docker exec my-container which tc

If tc isn't available in the target container, use the netem command's --tc-image option to run tc from a helper image that shares the target's network stack:

pumba netem --tc-image gaiaadm/iproute2 --duration 30s delay \
  --time 500 \
  my-container
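The check and the fallback can be combined so that a script only adds --tc-image when it is needed. This is a sketch with a hypothetical helper; the default helper image name is an assumption, so substitute whichever tc-capable image you use.

```shell
# netem_image_args CONTAINER [HELPER_IMAGE]
# Prints nothing if CONTAINER already has tc on its PATH; otherwise prints
# the --tc-image flag with HELPER_IMAGE (default is an assumed image name).
netem_image_args() {
  local container=$1 image=${2:-gaiaadm/iproute2}
  if docker exec "$container" sh -c 'command -v tc' >/dev/null 2>&1; then
    return 0                    # tc exists in the target; no extra flags
  fi
  printf '%s %s\n' "--tc-image" "$image"
}
```

A call such as pumba netem $(netem_image_args my-container) --duration 30s delay --time 500 my-container leaves the substitution unquoted on purpose, so the flag and the image name are passed as two separate arguments.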

Pumba vs LitmusChaos vs Toxiproxy

Tool        | Environment  | Approach                       | Best For
Pumba       | Docker/local | Direct container manipulation  | Local dev, Docker Compose CI
LitmusChaos | Kubernetes   | CRD-based experiments          | Kubernetes production
Toxiproxy   | Any (proxy)  | TCP-level network interception | Integration tests, precise control

Pumba shines in local Docker Compose environments and Docker-based CI. It requires no cluster, no CRDs, no proxy configuration — just the Docker socket and a binary. For teams not yet on Kubernetes, it's the fastest path to meaningful chaos testing.

Toxiproxy is better when you need precise, per-connection control — injecting latency on only the connection between service A and database B, while leaving other connections alone.

LitmusChaos is better when you're running in Kubernetes and want experiments managed as Kubernetes resources with proper RBAC, scheduling, and observability integration.

Pumba's simplicity is its greatest asset. Running pumba kill my-service during your tests takes five minutes to set up and immediately tells you whether your service handles restarts correctly. Start there, then graduate to more complex tools as your resilience engineering practice matures.
