Toxiproxy: Simulating Network Conditions for Testing

Toxiproxy sits between your application and a downstream service and injects network problems: latency, bandwidth limits, dropped connections, slow responses, and timeouts. Unlike infrastructure-level chaos tools, Toxiproxy works at the TCP level and is easy to control programmatically — making it ideal for integration tests that need to verify how your application handles network failures.

What Toxiproxy Simulates

Toxiproxy doesn't destroy servers or kill processes. It intercepts TCP connections and applies "toxics" — transformations that degrade the connection:

  • Latency: Add delay to every packet (simulates slow network)
  • Slow close: Delay the TCP socket from closing (simulates servers that linger instead of closing promptly)
  • Timeout: Close connections after a specified duration
  • Bandwidth: Limit connection throughput (simulates slow network link)
  • Slicer: Cut data into smaller chunks, forcing multiple TCP segments
  • Limit data: Close connection after N bytes
  • Reset peer: Send TCP RST to the client

The critical difference between Toxiproxy and other chaos tools: Toxiproxy is programmable during tests. You can start a test with normal conditions, inject latency mid-test, and remove it — all within a single test run. This enables integration tests that verify real failure scenarios.

Installation

Standalone binary:

# macOS
brew install toxiproxy

# Linux (download binary)
wget https://github.com/Shopify/toxiproxy/releases/latest/download/toxiproxy-server-linux-amd64 \
  -O toxiproxy-server
chmod +x toxiproxy-server
./toxiproxy-server

Docker:

# 8474 is the API port; 3306 and 6379 are forwarded to MySQL and Redis
docker run -d \
  --name toxiproxy \
  -p 8474:8474 \
  -p 3306:3306 \
  -p 6379:6379 \
  ghcr.io/shopify/toxiproxy

Verify the server is running:

curl http://localhost:8474/version
{"version":"2.5.0"}

Creating Your First Proxy

The Toxiproxy API creates named proxies that forward traffic:

# Create a proxy from localhost:3306 -> your-mysql-host:3306
curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql",
    "listen": "0.0.0.0:3306",
    "upstream": "your-mysql-host:3306",
    "enabled": true
  }'

Now configure your application to connect to localhost:3306 instead of your-mysql-host:3306. All traffic routes through Toxiproxy, which forwards it normally until you add toxics.

Adding Toxics

Latency

Add 500ms latency with 100ms jitter to all database queries:

curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "latency",
    "type": "latency",
    "attributes": {
      "latency": 500,
      "jitter": 100
    }
  }'

Remove the toxic when done:

curl -X DELETE http://localhost:8474/proxies/mysql/toxics/latency

Bandwidth Limiting

Simulate a slow network link:

curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "slow-bandwidth",
    "type": "bandwidth",
    "attributes": {
      "rate": 100
    }
  }'

rate: 100 means 100 KB/s. A query returning 1 MB of data will take about 10 seconds to arrive.
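
The transfer-time arithmetic generalizes. A trivial sketch (the helper name is mine, not part of Toxiproxy); rate_kb_s matches the toxic's `rate` attribute:

```python
def transfer_seconds(payload_kb: float, rate_kb_s: float) -> float:
    """Estimate how long a payload takes through a bandwidth toxic."""
    return payload_kb / rate_kb_s

# A 1 MB (1024 KB) result set at rate=100 takes ~10 seconds
print(transfer_seconds(1024, 100))  # → 10.24
```

Use this to pick a rate that makes your test's payloads slow enough to trip timeouts without making the suite take minutes.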

Timeout

Close the connection after a specified duration:

curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "timeout",
    "type": "timeout",
    "attributes": {
      "timeout": 3000
    }
  }'

The toxic stops all data and closes the connection after 3,000 ms. This tests whether your application has appropriate query timeouts configured and handles dropped connections gracefully.

Partial Traffic Impact

Apply toxics to only a percentage of connections using toxicity:

curl -X POST http://localhost:8474/proxies/mysql/toxics \
  -H "Content-Type: application/json" \
  -d '{
    "name": "intermittent-latency",
    "type": "latency",
    "toxicity": 0.5,
    "attributes": {
      "latency": 2000
    }
  }'

toxicity: 0.5 means 50% of connections experience the 2-second latency. The rest are normal. This simulates flaky networks or partially degraded services.
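
Because toxicity is a per-connection probability, the number of affected connections fluctuates around the expected fraction rather than hitting it exactly. A quick sketch of the same coin-flip model (illustrative only, not Toxiproxy's code):

```python
import random

def affected_connections(n: int, toxicity: float, seed: int = 42) -> int:
    # Each new connection independently "rolls" against the toxicity
    # to decide whether the toxic applies to it.
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < toxicity)

# With toxicity 0.5, roughly half of 1000 connections see the latency
print(affected_connections(1000, 0.5))
```

Assertions in tests that use partial toxicity should therefore check rates within a tolerance band, not exact counts.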

CLI Tool

Toxiproxy ships with a CLI for interactive use:

# Install the CLI
brew install toxiproxy-cli

# List proxies
toxiproxy-cli list

# Create a proxy
toxiproxy-cli create redis --listen 127.0.0.1:6380 --upstream redis:6379

# Add a latency toxic
toxiproxy-cli toxic add redis --type latency --attribute latency=500

# List toxics on a proxy
toxiproxy-cli inspect redis

# Remove a toxic
toxiproxy-cli toxic remove redis --toxicName latency

# Disable a proxy entirely (simulate complete outage)
toxiproxy-cli toggle redis

Integration with Tests

The real power of Toxiproxy is programmatic control within tests. The Go client is the most complete, but clients exist for many languages:

Go

package main

import (
    "testing"
    toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)

// queryDatabase and isTimeoutError are helpers assumed to exist in the test package.
func TestDatabaseTimeout(t *testing.T) {
    client := toxiproxy.NewClient("localhost:8474")
    
    // Create proxy
    proxy, err := client.CreateProxy("mysql-test", "localhost:33060", "mysql:3306")
    if err != nil {
        t.Fatal(err)
    }
    defer proxy.Delete()
    
    // Normal behavior — should work
    err = queryDatabase("localhost:33060")
    if err != nil {
        t.Fatal("Expected success:", err)
    }
    
    // Add timeout toxic — query should fail
    _, err = proxy.AddToxic("timeout", "timeout", "downstream", 1.0, toxiproxy.Attributes{
        "timeout": 100,  // 100ms timeout
    })
    if err != nil {
        t.Fatal(err)
    }
    
    // This should now fail with a timeout error
    err = queryDatabase("localhost:33060")
    if err == nil {
        t.Error("Expected timeout error, got none")
    }
    if !isTimeoutError(err) {
        t.Errorf("Expected timeout error, got: %v", err)
    }
}

Python

import requests
import psycopg2
import pytest

TOXIPROXY_API = "http://localhost:8474"

class TestDatabaseResilience:
    def setup_method(self):
        # Create proxy for this test
        response = requests.post(f"{TOXIPROXY_API}/proxies", json={
            "name": "postgres-test",
            "listen": "localhost:5433",
            "upstream": "postgres:5432",
            "enabled": True
        })
        response.raise_for_status()
    
    def teardown_method(self):
        requests.delete(f"{TOXIPROXY_API}/proxies/postgres-test")
    
    def add_latency(self, latency_ms: int):
        requests.post(
            f"{TOXIPROXY_API}/proxies/postgres-test/toxics",
            json={
                "name": "latency",
                "type": "latency",
                "attributes": {"latency": latency_ms}
            }
        )
    
    def remove_latency(self):
        requests.delete(f"{TOXIPROXY_API}/proxies/postgres-test/toxics/latency")
    
    def test_slow_query_timeout(self):
        # Normal connection works
        conn = psycopg2.connect(host="localhost", port=5433, dbname="test")
        conn.close()
        
        # Add 5-second latency
        self.add_latency(5000)
        
        # Connection with 1-second timeout should fail
        with pytest.raises(psycopg2.OperationalError):
            psycopg2.connect(
                host="localhost",
                port=5433,
                dbname="test",
                connect_timeout=1
            )

Node.js

const toxiproxyClient = require('toxiproxy-node-client');

const toxiproxy = new toxiproxyClient.Toxiproxy('http://localhost:8474');

describe('Redis resilience', () => {
  let proxy;

  beforeEach(async () => {
    proxy = await toxiproxy.createProxy({
      name: 'redis-test',
      listen: '127.0.0.1:6380',
      upstream: 'redis:6379'
    });
  });

  afterEach(async () => {
    await proxy.delete();
  });

  test('handles connection reset gracefully', async () => {
    const client = createRedisClient('localhost', 6380);
    
    // Normal set/get works
    await client.set('key', 'value');
    expect(await client.get('key')).toBe('value');
    
    // Add reset peer toxic
    await proxy.addToxic({
      name: 'reset',
      type: 'reset_peer',
      toxicity: 1.0,
      attributes: { timeout: 0 }  // Immediate reset
    });
    
    // Get should throw
    await expect(client.get('key')).rejects.toThrow();
    
    // Remove toxic
    await proxy.removeToxic('reset');
    
    // Should recover
    expect(await client.get('key')).toBe('value');
  });
});

Docker Compose Setup

For integration test environments:

# docker-compose.test.yml
version: '3.8'
services:
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.5.0
    ports:
      - "8474:8474"    # API
      - "5433:5433"    # Postgres proxy
      - "6380:6380"    # Redis proxy
    command: -host 0.0.0.0

  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: testdb
      POSTGRES_PASSWORD: password
    # Note: not exposed directly — connect through toxiproxy

  redis:
    image: redis:7
    # Note: not exposed directly — connect through toxiproxy

  toxiproxy-setup:
    image: curlimages/curl
    depends_on:
      - toxiproxy
    command: |
      sh -c "
        sleep 2
        curl -X POST http://toxiproxy:8474/proxies -H 'Content-Type: application/json' -d '{\"name\":\"postgres\",\"listen\":\"0.0.0.0:5433\",\"upstream\":\"postgres:5432\",\"enabled\":true}'
        curl -X POST http://toxiproxy:8474/proxies -H 'Content-Type: application/json' -d '{\"name\":\"redis\",\"listen\":\"0.0.0.0:6380\",\"upstream\":\"redis:6379\",\"enabled\":true}'
      "

Your application connects to toxiproxy:5433 (Postgres) and toxiproxy:6380 (Redis) instead of the services directly. Tests manipulate Toxiproxy via the API to inject failures.
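
With that compose file running, a test can flip a proxy off around a block of code to simulate a full outage and verify graceful degradation. A minimal stdlib-only sketch: the helper names and the injectable `send` parameter (a seam for testing) are mine; updating a proxy's fields, including `enabled`, via POST /proxies/{name} is the documented API call:

```python
import json
import urllib.request
from contextlib import contextmanager

TOXIPROXY_API = "http://localhost:8474"  # from the host; inside the compose network use toxiproxy:8474

def set_proxy_enabled(name: str, enabled: bool, send=urllib.request.urlopen) -> None:
    # POST /proxies/{name} updates a proxy's fields, including "enabled"
    req = urllib.request.Request(
        f"{TOXIPROXY_API}/proxies/{name}",
        data=json.dumps({"enabled": enabled}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    send(req)

@contextmanager
def outage(proxy: str, send=urllib.request.urlopen):
    # Disable the proxy for the duration of the block, then restore it
    set_proxy_enabled(proxy, False, send=send)
    try:
        yield
    finally:
        set_proxy_enabled(proxy, True, send=send)

# Example usage in a test:
# with outage("redis"):
#     assert app_serves_degraded_response()
```

The `finally` block guarantees the proxy is re-enabled even when the test body raises, so one failing test does not poison the rest of the suite.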

Kubernetes Deployment

In Kubernetes, run Toxiproxy as a sidecar or as a cluster-wide proxy service:

# toxiproxy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: toxiproxy
  namespace: testing
spec:
  replicas: 1
  selector:
    matchLabels:
      app: toxiproxy
  template:
    metadata:
      labels:
        app: toxiproxy
    spec:
      containers:
        - name: toxiproxy
          image: ghcr.io/shopify/toxiproxy:2.5.0
          args: ["-host", "0.0.0.0"]
          ports:
            - containerPort: 8474   # API
            - containerPort: 5432   # DB proxy
          livenessProbe:
            httpGet:
              path: /version
              port: 8474
            initialDelaySeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: toxiproxy
  namespace: testing
spec:
  selector:
    app: toxiproxy
  ports:
    - name: api
      port: 8474
      targetPort: 8474
    - name: db
      port: 5432
      targetPort: 5432

Configure your application to use toxiproxy.testing:5432 as its database host in test environments.

Real Scenarios Toxiproxy Tests

Scenario 1: Slow database under load

Your application handles 1ms database queries fine. But what about 500ms queries? Does your connection pool exhaust? Do request handlers time out correctly? Do users see errors or just slow responses?

# Test: add 500ms latency, run load test, verify error rate stays below 1%
add_latency(500)
results = run_load_test(requests_per_second=100, duration=30)
assert results.error_rate < 0.01

Scenario 2: Downstream service intermittent failures

Your microservice calls a payment service. 20% of the time, that service is slow (flaky network). Does your retry logic handle this? Does it retry with exponential backoff?

add_latency(3000, toxicity=0.2)  # 20% of connections get 3s latency
# Run tests — verify retries work, check success rate, check retry count
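
The add_latency call above is shorthand. A minimal stdlib-only sketch against the Toxiproxy HTTP API (the helper names and the default API address are assumptions; the JSON shape matches the curl examples earlier):

```python
import json
import urllib.request

TOXIPROXY_API = "http://localhost:8474"  # assumed default API address

def latency_toxic_body(latency_ms: int, toxicity: float = 1.0) -> dict:
    # JSON body for a latency toxic; toxicity < 1.0 degrades only
    # that fraction of connections.
    return {
        "name": "latency",
        "type": "latency",
        "toxicity": toxicity,
        "attributes": {"latency": latency_ms},
    }

def add_latency(proxy: str, latency_ms: int, toxicity: float = 1.0) -> None:
    req = urllib.request.Request(
        f"{TOXIPROXY_API}/proxies/{proxy}/toxics",
        data=json.dumps(latency_toxic_body(latency_ms, toxicity)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises urllib.error.HTTPError on failure

# add_latency("payments", 3000, toxicity=0.2)  # 20% of connections get 3s latency
```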

Scenario 3: Complete outage recovery

Simulate a full downstream outage, either by disabling the proxy (toxiproxy-cli toggle redis) or by adding a timeout toxic with timeout: 0, which black-holes all data until the toxic is removed:

curl -X POST http://localhost:8474/proxies/redis/toxics \
  -H "Content-Type: application/json" \
  -d '{"name": "down", "type": "timeout", "attributes": {"timeout": 0}}'

Then remove the toxic (or re-enable the proxy) and verify your application recovers without a restart.
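
Recovery checks like this usually need a polling loop rather than a single call, since reconnection takes a moment after the outage ends. A small generic sketch (the helper name is mine):

```python
import time

def wait_for_recovery(check, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `check` (a zero-arg callable) until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# After removing the toxic, e.g.:
#   assert wait_for_recovery(lambda: redis_ping_ok()), "app did not recover"
```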

Toxiproxy vs Infrastructure-Level Chaos

Toxiproxy operates at the application network level — it intercepts connections between your services. Infrastructure chaos tools (FIS, LitmusChaos) operate at the resource level — they kill VMs or pods.

Use Toxiproxy when:

  • You want precise, reproducible network conditions in integration tests
  • You need programmatic control within test code
  • You're testing retry logic, circuit breakers, or timeout handling
  • You want to test without affecting infrastructure

Use infrastructure chaos when:

  • You want to test actual infrastructure failure recovery
  • You need to verify Kubernetes pod restart behavior
  • You're testing health checks and load balancer behavior
  • You want scheduled chaos that runs continuously

Toxiproxy is the right tool for the integration test layer — where you need network failures to be deterministic and reversible within a test. Infrastructure chaos is right for the resilience verification layer — where you need to validate that production-like failures are handled correctly.
