Toxiproxy: Simulating Network Conditions for Testing
Toxiproxy sits between your application and a downstream service and injects network problems: latency, bandwidth limits, dropped connections, slow responses, and timeouts. Unlike infrastructure-level chaos tools, Toxiproxy works at the TCP level and is easy to control programmatically — making it ideal for integration tests that need to verify how your application handles network failures.
What Toxiproxy Simulates
Toxiproxy doesn't destroy servers or kill processes. It intercepts TCP connections and applies "toxics" — transformations that degrade the connection:
- Latency: Add delay to every packet (simulates slow network)
- Slow close: Delay the TCP socket from closing (simulates a peer that lingers on close)
- Timeout: Close connections after a specified duration
- Bandwidth: Limit connection throughput (simulates slow network link)
- Slicer: Cut data into smaller chunks, forcing multiple TCP segments
- Limit data: Close connection after N bytes
- Reset peer: Send TCP RST to the client
The critical difference between Toxiproxy and other chaos tools: Toxiproxy is programmable during tests. You can start a test with normal conditions, inject latency mid-test, and remove it — all within a single test run. This enables integration tests that verify real failure scenarios.
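That inject-then-remove pattern is worth wrapping in a small helper so a toxic can never leak from one test into the next. A minimal Python sketch, assuming a Toxiproxy server at localhost:8474; `session` is any object with `post()`/`delete()` methods (e.g. a `requests.Session`), injected so tests can stub out HTTP:

```python
from contextlib import contextmanager

TOXIPROXY_API = "http://localhost:8474"  # assumed Toxiproxy API address

@contextmanager
def toxic(session, proxy, name, toxic_type, attributes, toxicity=1.0):
    """Apply a toxic for the duration of a with-block, then remove it.

    `session` is any object exposing post()/delete(), e.g. a
    requests.Session pointed at the Toxiproxy HTTP API.
    """
    resp = session.post(
        f"{TOXIPROXY_API}/proxies/{proxy}/toxics",
        json={"name": name, "type": toxic_type,
              "toxicity": toxicity, "attributes": attributes},
    )
    resp.raise_for_status()
    try:
        yield
    finally:
        # Remove the toxic even if the test body raised
        session.delete(f"{TOXIPROXY_API}/proxies/{proxy}/toxics/{name}")
```

Inside `with toxic(session, "mysql", "lag", "latency", {"latency": 500}):` every connection through the proxy is delayed; the cleanup runs even when the test body fails.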
Installation
Standalone binary:
# macOS
brew install toxiproxy
# Linux (download binary)
wget https://github.com/Shopify/toxiproxy/releases/latest/download/toxiproxy-server-linux-amd64 \
-O toxiproxy-server
chmod +x toxiproxy-server
./toxiproxy-server
Docker:
# 8474 is the API port; 3306 and 6379 forward to MySQL and Redis
docker run -d \
--name toxiproxy \
-p 8474:8474 \
-p 3306:3306 \
-p 6379:6379 \
ghcr.io/shopify/toxiproxy
Verify the server is running:
curl http://localhost:8474/version
{"version":"2.5.0"}
Creating Your First Proxy
The Toxiproxy API creates named proxies that forward traffic:
# Create a proxy from localhost:3306 -> your-mysql-host:3306
curl -X POST http://localhost:8474/proxies \
-H "Content-Type: application/json" \
-d '{
"name": "mysql",
"listen": "0.0.0.0:3306",
"upstream": "your-mysql-host:3306",
"enabled": true
}'
Now configure your application to connect to localhost:3306 instead of your-mysql-host:3306. All traffic routes through Toxiproxy, which forwards it normally until you add toxics.
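One way to make that switch painless is to resolve the database address from the environment, so tests can redirect the application through the proxy without code changes. A hypothetical sketch (the `DB_HOST`/`DB_PORT` variable names are made up for illustration):

```python
import os

def database_address(default_host="your-mysql-host", default_port=3306):
    """Resolve the DB address, letting tests redirect traffic through Toxiproxy."""
    host = os.environ.get("DB_HOST", default_host)
    port = int(os.environ.get("DB_PORT", default_port))
    return host, port
```

In a test environment, setting `DB_HOST=localhost` routes all database traffic through the Toxiproxy listener while production configuration stays untouched.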
Adding Toxics
Latency
Add 500ms latency with 100ms jitter to all database queries:
curl -X POST http://localhost:8474/proxies/mysql/toxics \
-H "Content-Type: application/json" \
-d '{
"name": "latency",
"type": "latency",
"attributes": {
"latency": 500,
"jitter": 100
}
}'
Remove the toxic when done:
curl -X DELETE http://localhost:8474/proxies/mysql/toxics/latency
Bandwidth Limiting
Simulate a slow network link:
curl -X POST http://localhost:8474/proxies/mysql/toxics \
-H "Content-Type: application/json" \
-d '{
"name": "slow-bandwidth",
"type": "bandwidth",
"attributes": {
"rate": 100
}
}'
rate: 100 means 100 KB/s. A query returning 1 MB of data will now take roughly 10 seconds to transfer.
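The arithmetic behind that estimate, as a quick sanity check (size in KB and rate in KB/s, matching the bandwidth toxic's units):

```python
def transfer_seconds(size_kb: float, rate_kb_per_s: float) -> float:
    """Seconds to push size_kb through a link throttled to rate_kb_per_s."""
    return size_kb / rate_kb_per_s

# A 1 MB (1000 KB) result set through a 100 KB/s bandwidth toxic
print(transfer_seconds(1000, 100))  # 10.0 seconds, before any protocol overhead
```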
Timeout
Close the connection after a specified duration:
curl -X POST http://localhost:8474/proxies/mysql/toxics \
-H "Content-Type: application/json" \
-d '{
"name": "timeout",
"type": "timeout",
"attributes": {
"timeout": 3000
}
}'
Connections are closed after 3,000 ms. This tests whether your application has appropriate query timeouts configured and handles dropped connections gracefully.
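Handling this gracefully usually means distinguishing timeouts and dropped connections from other failures. A minimal classifier sketch; the exception taxonomy here is illustrative and not tied to any particular database driver:

```python
import errno
import socket

def is_timeout_error(exc: BaseException) -> bool:
    """True for socket-level timeouts (what the timeout toxic provokes)."""
    return isinstance(exc, (TimeoutError, socket.timeout))

def is_connection_drop(exc: BaseException) -> bool:
    """True for reset or aborted connections."""
    return isinstance(exc, ConnectionResetError) or (
        isinstance(exc, OSError) and exc.errno in (errno.ECONNRESET, errno.EPIPE)
    )
```

A retry wrapper can then retry on drops but surface timeouts to the caller, or vice versa, depending on the idempotency of the operation.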
Partial Traffic Impact
Apply toxics to only a percentage of connections using toxicity:
curl -X POST http://localhost:8474/proxies/mysql/toxics \
-H "Content-Type: application/json" \
-d '{
"name": "intermittent-latency",
"type": "latency",
"toxicity": 0.5,
"attributes": {
"latency": 2000
}
}'
toxicity: 0.5 means 50% of connections experience the 2-second latency; the rest behave normally. This simulates flaky networks or partially degraded services.
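Because toxicity is a per-connection coin flip, the affected share only approaches 50% over many connections. A seeded simulation of that sampling, purely for illustration (Toxiproxy makes this decision server-side):

```python
import random

def simulate_affected(n_connections: int, toxicity: float, seed: int = 0) -> int:
    """Count how many of n_connections a toxic with this toxicity would hit."""
    rng = random.Random(seed)
    return sum(rng.random() < toxicity for _ in range(n_connections))

hit = simulate_affected(1000, 0.5)
print(hit / 1000)  # close to 0.5, but rarely exactly, just like real connections
```

This is worth remembering when asserting on failure counts in tests: with toxicity below 1.0, assert on ranges, not exact numbers.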
CLI Tool
Toxiproxy ships with a CLI for interactive use:
# Install the CLI (it ships in the same package as the server)
brew install toxiproxy
# List proxies
toxiproxy-cli list
# Create a proxy
toxiproxy-cli create redis --listen 127.0.0.1:6380 --upstream redis:6379
# Add a latency toxic
toxiproxy-cli toxic add redis --type latency --attribute latency=500
# List toxics on a proxy
toxiproxy-cli inspect redis
# Remove a toxic
toxiproxy-cli toxic remove redis --toxicName latency
# Disable a proxy entirely (simulate complete outage)
toxiproxy-cli toggle redis
Integration with Tests
The real power of Toxiproxy is programmatic control within tests. The Go client is most complete, but clients exist for many languages:
Go
package main
import (
"testing"
toxiproxy "github.com/Shopify/toxiproxy/v2/client"
)
// queryDatabase and isTimeoutError are test helpers assumed defined elsewhere.
func TestDatabaseTimeout(t *testing.T) {
client := toxiproxy.NewClient("localhost:8474")
// Create proxy
proxy, err := client.CreateProxy("mysql-test", "localhost:33060", "mysql:3306")
if err != nil {
t.Fatal(err)
}
defer proxy.Delete()
// Normal behavior — should work
err = queryDatabase("localhost:33060")
if err != nil {
t.Fatal("Expected success:", err)
}
// Add timeout toxic — query should fail
_, err = proxy.AddToxic("timeout", "timeout", "downstream", 1.0, toxiproxy.Attributes{
"timeout": 100, // 100ms timeout
})
if err != nil {
t.Fatal(err)
}
// This should now fail with a timeout error
err = queryDatabase("localhost:33060")
if err == nil {
t.Error("Expected timeout error, got none")
}
if !isTimeoutError(err) {
t.Errorf("Expected timeout error, got: %v", err)
}
}
Python
import requests
import psycopg2
import pytest
TOXIPROXY_API = "http://localhost:8474"
class TestDatabaseResilience:
def setup_method(self):
# Create proxy for this test
response = requests.post(f"{TOXIPROXY_API}/proxies", json={
"name": "postgres-test",
"listen": "localhost:5433",
"upstream": "postgres:5432",
"enabled": True
})
response.raise_for_status()
def teardown_method(self):
requests.delete(f"{TOXIPROXY_API}/proxies/postgres-test")
def add_latency(self, latency_ms: int):
requests.post(
f"{TOXIPROXY_API}/proxies/postgres-test/toxics",
json={
"name": "latency",
"type": "latency",
"attributes": {"latency": latency_ms}
}
)
def remove_latency(self):
requests.delete(f"{TOXIPROXY_API}/proxies/postgres-test/toxics/latency")
def test_slow_query_timeout(self):
# Normal connection works
conn = psycopg2.connect(host="localhost", port=5433, dbname="test")
conn.close()
# Add 5-second latency
self.add_latency(5000)
# Connection with 1-second timeout should fail
with pytest.raises(psycopg2.OperationalError):
psycopg2.connect(
host="localhost",
port=5433,
dbname="test",
connect_timeout=1
)
Node.js
const toxiproxyClient = require('toxiproxy-node-client');
const toxiproxy = new toxiproxyClient.Toxiproxy('http://localhost:8474');
describe('Redis resilience', () => {
let proxy;
beforeEach(async () => {
proxy = await toxiproxy.createProxy({
name: 'redis-test',
listen: '127.0.0.1:6380',
upstream: 'redis:6379'
});
});
afterEach(async () => {
await proxy.delete();
});
test('handles connection reset gracefully', async () => {
// createRedisClient is a test helper assumed to wrap your Redis client setup
const client = createRedisClient('localhost', 6380);
// Normal set/get works
await client.set('key', 'value');
expect(await client.get('key')).toBe('value');
// Add reset peer toxic
await proxy.addToxic({
name: 'reset',
type: 'reset_peer',
toxicity: 1.0,
attributes: { timeout: 0 } // Immediate reset
});
// Get should throw
await expect(client.get('key')).rejects.toThrow();
// Remove toxic
await proxy.removeToxic('reset');
// Should recover
expect(await client.get('key')).toBe('value');
});
});
Docker Compose Setup
For integration test environments:
# docker-compose.test.yml
version: '3.8'
services:
toxiproxy:
image: ghcr.io/shopify/toxiproxy:2.5.0
ports:
- "8474:8474" # API
- "5433:5433" # Postgres proxy
- "6380:6380" # Redis proxy
command: -host 0.0.0.0
postgres:
image: postgres:15
environment:
POSTGRES_DB: testdb
POSTGRES_PASSWORD: password
# Note: not exposed directly — connect through toxiproxy
redis:
image: redis:7
# Note: not exposed directly — connect through toxiproxy
toxiproxy-setup:
image: curlimages/curl
depends_on:
- toxiproxy
command: |
sh -c "
sleep 2
curl -X POST http://toxiproxy:8474/proxies -H 'Content-Type: application/json' -d '{\"name\":\"postgres\",\"listen\":\"0.0.0.0:5433\",\"upstream\":\"postgres:5432\",\"enabled\":true}'
curl -X POST http://toxiproxy:8474/proxies -H 'Content-Type: application/json' -d '{\"name\":\"redis\",\"listen\":\"0.0.0.0:6380\",\"upstream\":\"redis:6379\",\"enabled\":true}'
"
Your application connects to toxiproxy:5433 (Postgres) and toxiproxy:6380 (Redis) instead of the services directly. Tests manipulate Toxiproxy via the API to inject failures.
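The fixed sleep 2 in the setup container is a race waiting to happen; polling the API's /version endpoint until it answers is more robust. A sketch with an injectable probe so the loop can be tested without a live server:

```python
import time

def wait_for_toxiproxy(probe, attempts: int = 20, delay: float = 0.25) -> bool:
    """Poll until probe() succeeds.

    `probe` should GET the Toxiproxy /version endpoint and return True
    on success; OSError is treated as "not up yet".
    """
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # server not accepting connections yet
        time.sleep(delay)
    return False
```

Run this before creating proxies so test setup fails fast with a clear error instead of racing the container start.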
Kubernetes Deployment
In Kubernetes, run Toxiproxy as a sidecar or as a cluster-wide proxy service:
# toxiproxy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: toxiproxy
namespace: testing
spec:
replicas: 1
selector:
matchLabels:
app: toxiproxy
template:
metadata:
labels:
app: toxiproxy
spec:
containers:
- name: toxiproxy
image: ghcr.io/shopify/toxiproxy:2.5.0
args: ["-host", "0.0.0.0"]
ports:
- containerPort: 8474 # API
- containerPort: 5432 # DB proxy
livenessProbe:
httpGet:
path: /version
port: 8474
initialDelaySeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: toxiproxy
namespace: testing
spec:
selector:
app: toxiproxy
ports:
- name: api
port: 8474
targetPort: 8474
- name: db
port: 5432
targetPort: 5432
Configure your application to use toxiproxy.testing:5432 as its database host in test environments.
Real Scenarios Toxiproxy Tests
Scenario 1: Slow database under load
Your application handles 1ms database queries fine. But what about 500ms queries? Does your connection pool exhaust? Do request handlers time out correctly? Do users see errors or just slow responses?
# Test: add 500ms latency, run load test, verify error rate stays below 1%
add_latency(500)
results = run_load_test(requests_per_second=100, duration=30)
assert results.error_rate < 0.01
Scenario 2: Downstream service intermittent failures
Your microservice calls a payment service. 20% of the time, that service is slow (flaky network). Does your retry logic handle this? Does it retry with exponential backoff?
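The retry logic under test might look like the following sketch: exponential backoff with a capped attempt count. The `sleep` parameter is injectable so unit tests can record delays instead of waiting:

```python
import time

def retry_with_backoff(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

With a 20% latency toxic in place, a Toxiproxy-backed test can assert both that calls eventually succeed and that the retry count stays within expected bounds.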
add_latency(3000, toxicity=0.2) # 20% of connections get 3s latency
# Run tests: verify retries work, check success rate and retry counts
Scenario 3: Complete outage recovery
Simulate a full downstream outage by black-holing traffic. A timeout toxic with timeout: 0 stalls all data indefinitely without closing connections:
curl -X POST http://localhost:8474/proxies/redis/toxics \
-d '{"name": "down", "type": "timeout", "attributes": {"timeout": 0}}'
To refuse connections outright instead, disable the proxy (toxiproxy-cli toggle redis). Either way, remove the toxic or re-enable the proxy afterwards and verify your application recovers without a restart.
Toxiproxy vs Infrastructure-Level Chaos
Toxiproxy operates at the application network level — it intercepts connections between your services. Infrastructure chaos tools (FIS, LitmusChaos) operate at the resource level — they kill VMs or pods.
Use Toxiproxy when:
- You want precise, reproducible network conditions in integration tests
- You need programmatic control within test code
- You're testing retry logic, circuit breakers, or timeout handling
- You want to test without affecting infrastructure
Use infrastructure chaos when:
- You want to test actual infrastructure failure recovery
- You need to verify Kubernetes pod restart behavior
- You're testing health checks and load balancer behavior
- You want scheduled chaos that runs continuously
Toxiproxy is the right tool for the integration test layer — where you need network failures to be deterministic and reversible within a test. Infrastructure chaos is right for the resilience verification layer — where you need to validate that production-like failures are handled correctly.