Envoy Proxy Testing: xDS Config, Filter Chains, and Integration Tests
Envoy Proxy powers many service meshes (Istio, AWS App Mesh) and API gateways (Contour, Emissary-Ingress). It's highly capable and highly complex. Testing Envoy configurations before deploying them prevents outages caused by misconfigured filter chains, broken xDS configs, and subtle routing mistakes.
This guide covers testing Envoy's configuration validation, filter chain behavior, and integration correctness.
Envoy's Configuration Model
Envoy is configured through a hierarchy of objects:
- Listeners — bind to a port and accept connections
- Filter chains — process traffic for a listener; selected based on connection properties
- HTTP connection manager — the main HTTP filter; contains the HTTP filter chain
- HTTP filters — process HTTP requests in order (Router, Rate Limit, JWT Auth, etc.)
- Clusters — define upstream endpoints that Envoy forwards traffic to
- Routes — map request properties to clusters
In production, this config is delivered dynamically via the xDS API from a control plane (for example Istio's istiod, which absorbed the older Pilot component). For testing, you can use a static bootstrap config or a test xDS server.
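To make the hierarchy concrete, here is a minimal sketch that walks a bootstrap config (already parsed into a Python dict) from listeners down to routes and reports any route that points at an undefined cluster. The sample dict and the helper names `iter_routes` and `undefined_route_clusters` are illustrative assumptions, not part of Envoy.

```python
HCM = "envoy.filters.network.http_connection_manager"

# Hand-written sample mirroring the hierarchy above (assumes a static config).
SAMPLE = {
    "static_resources": {
        "listeners": [{
            "name": "listener_0",
            "filter_chains": [{
                "filters": [{
                    "name": HCM,
                    "typed_config": {
                        "http_filters": [{"name": "envoy.filters.http.router"}],
                        "route_config": {
                            "virtual_hosts": [{
                                "name": "local_service",
                                "routes": [
                                    {"match": {"prefix": "/api"}, "route": {"cluster": "api"}},
                                    {"match": {"prefix": "/"}, "route": {"cluster": "web"}},
                                ],
                            }],
                        },
                    },
                }],
            }],
        }],
        "clusters": [{"name": "api"}],
    }
}


def iter_routes(config):
    """Yield (listener_name, route) for every inline HTTP route."""
    for listener in config.get("static_resources", {}).get("listeners", []):
        for chain in listener.get("filter_chains", []):
            for net_filter in chain.get("filters", []):
                if net_filter.get("name") != HCM:
                    continue
                route_config = net_filter.get("typed_config", {}).get("route_config", {})
                for vhost in route_config.get("virtual_hosts", []):
                    for route in vhost.get("routes", []):
                        yield listener.get("name"), route


def undefined_route_clusters(config):
    """Return cluster names referenced by routes but missing from clusters."""
    defined = {c["name"] for c in config.get("static_resources", {}).get("clusters", [])}
    referenced = {
        route["route"]["cluster"]
        for _, route in iter_routes(config)
        if "cluster" in route.get("route", {})
    }
    return sorted(referenced - defined)


print(undefined_route_clusters(SAMPLE))
```

In the sample, the `/` route targets a cluster named `web` that is never defined, which is the kind of mistake this check is meant to surface before deployment.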
Configuration Validation
Static Config Validation
Envoy ships with a --mode validate flag that loads and checks a config without serving any traffic:
# Validate a static config file
docker run --rm \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" \
  envoyproxy/envoy:v1.29.0 \
  envoy --config-path /etc/envoy/envoy.yaml --mode validate
# Expected output for valid config:
# configuration '/etc/envoy/envoy.yaml' OK
# For invalid config:
# [error] …Field 'rate_limit_service' is not set.
Integrate this in CI on every config change:
# .github/workflows/envoy-validate.yml
name: Envoy Config Validation
on:
  pull_request:
    paths:
      - 'envoy/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate all Envoy configs
        run: |
          for config in envoy/*.yaml; do
            echo "Validating $config..."
            docker run --rm \
              -v "$(pwd)/$config:/etc/envoy/envoy.yaml" \
              envoyproxy/envoy:v1.29.0 \
              envoy --config-path /etc/envoy/envoy.yaml --mode validate
            echo "$config: OK"
          done
JSON Schema Validation
For programmatic validation, parse the config in a test suite and assert on its structure. Envoy's protobuf definitions can also be compiled to JSON Schema and checked with the jsonschema package; the tests below stick to plain structural assertions:
import json
import yaml

def load_yaml(path):
    with open(path) as f:
        return yaml.safe_load(f)

def test_envoy_bootstrap_has_required_sections():
    config = load_yaml("envoy/envoy.yaml")
    assert "static_resources" in config or "dynamic_resources" in config
    assert "admin" in config
    if "static_resources" in config:
        assert "listeners" in config["static_resources"]
        assert "clusters" in config["static_resources"]

def test_cluster_health_checks_defined():
    config = load_yaml("envoy/envoy.yaml")
    clusters = config["static_resources"]["clusters"]
    for cluster in clusters:
        # Production clusters should have health checks
        if cluster.get("name") != "local_access_log":  # Exempt utility clusters
            assert "health_checks" in cluster, \
                f"Cluster '{cluster['name']}' missing health_checks"

def test_timeouts_configured():
    config = load_yaml("envoy/envoy.yaml")
    listeners = config["static_resources"]["listeners"]
    for listener in listeners:
        for fc in listener.get("filter_chains", []):
            for f in fc.get("filters", []):
                if f.get("name") == "envoy.filters.network.http_connection_manager":
                    hcm = f["typed_config"]
                    route_config = hcm.get("route_config", {})
                    for vhost in route_config.get("virtual_hosts", []):
                        for route in vhost.get("routes", []):
                            if "route" not in route:
                                continue  # Redirects and direct responses have no upstream timeout
                            timeout = route["route"].get("timeout")
                            assert timeout is not None, \
                                f"Route missing timeout: {route}"

def test_no_wildcard_cors():
    config = load_yaml("envoy/envoy.yaml")
    config_str = json.dumps(config)
    # Coarse heuristic: flag a "*" prefix matcher in a config that also sets CORS origins
    assert '"allow_origin_string_match"' not in config_str or \
        '"prefix": "*"' not in config_str, \
        "Wildcard CORS origin detected; use explicit origins"
Integration Testing
Spin up Envoy with a test upstream and verify behavior end-to-end.
# docker-compose.test.yml
version: "3.8"
services:
  envoy:
    image: envoyproxy/envoy:v1.29.0
    command: envoy -c /etc/envoy/envoy-test.yaml
    ports:
      - "10000:10000"  # Proxy port
      - "9901:9901"    # Admin port
    volumes:
      - ./envoy/envoy-test.yaml:/etc/envoy/envoy-test.yaml
  upstream:
    image: kennethreitz/httpbin
    ports:
      - "8080:80"
# envoy/envoy-test.yaml
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: upstream_service
                            timeout: 30s
  clusters:
    - name: upstream_service
      connect_timeout: 5s
      type: LOGICAL_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: upstream_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: upstream
                      port_value: 80
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
Basic Proxy Tests
import pytest
import httpx
import time

PROXY = "http://localhost:10000"
ADMIN = "http://localhost:9901"

@pytest.fixture(scope="session", autouse=True)
def wait_for_envoy():
    # Poll the admin /ready endpoint for up to ~30 seconds
    for _ in range(30):
        try:
            r = httpx.get(f"{ADMIN}/ready")
            if r.status_code == 200:
                return
        except httpx.ConnectError:
            pass
        time.sleep(1)
    raise RuntimeError("Envoy did not become ready")

def test_proxy_forwards_get_request():
    r = httpx.get(f"{PROXY}/get")
    assert r.status_code == 200
    assert "args" in r.json()

def test_proxy_preserves_request_headers():
    r = httpx.get(
        f"{PROXY}/headers",
        headers={"X-Custom-Header": "test-value"}
    )
    assert r.status_code == 200
    received = r.json()["headers"]
    assert received.get("X-Custom-Header") == "test-value"

def test_proxy_adds_request_id_header():
    """Envoy adds x-request-id to proxied requests by default.

    A Via header is only appended when the HttpConnectionManager `via`
    field is set, which the sample config above does not do.
    """
    r = httpx.get(f"{PROXY}/headers")
    received = {name.lower() for name in r.json()["headers"]}
    assert "x-request-id" in received

def test_unknown_route_returns_404():
    r = httpx.get(f"{PROXY}/this-route-does-not-exist-xyz")
    # Depends on config: a 404 can come from the upstream or from Envoy's own routing
    assert r.status_code in (404, 503)
HTTP Filter Testing
JWT Authentication Filter
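For a symmetric (HS256) test key, the JWKS `k` field holds the shared secret in unpadded base64url, and the token signature is an HMAC-SHA256 over `header.payload`. The stdlib-only sketch below illustrates that relationship for the secret `secret`; the helper names are hypothetical, and a real test suite would normally rely on a JWT library instead.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT and JWKS."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: bytes, kid: str = "test-key") -> str:
    """Build a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_hs256(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


jwks_k = b64url(b"secret")
token = sign_hs256(
    {"sub": "test-user", "iss": "https://test-issuer.example.com"}, b"secret"
)
print(jwks_k)
print(verify_hs256(token, b"secret"), verify_hs256(token, b"other"))
```

This sketch only covers the signature; issuer and expiry checks are separate steps that the filter performs on the decoded payload.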
# Add to http_filters in envoy-test.yaml (before router):
- name: envoy.filters.http.jwt_authn
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
    providers:
      test_provider:
        issuer: "https://test-issuer.example.com"
        local_jwks:
          inline_string: |
            {
              "keys": [{
                "kty": "oct",
                "kid": "test-key",
                "alg": "HS256",
                "k": "c2VjcmV0"
              }]
            }
        forward: true
        payload_in_metadata: "jwt_payload"
    rules:
      - match:
          prefix: "/protected"
        requires:
          provider_name: "test_provider"
      - match:
          prefix: "/public"
        # No requires = public endpoint
import jwt as pyjwt
from datetime import datetime, timedelta, timezone

def make_test_jwt(expired=False, wrong_issuer=False):
    now = datetime.now(timezone.utc)
    payload = {
        "sub": "test-user",
        "iss": "https://evil.example.com" if wrong_issuer else "https://test-issuer.example.com",
        "exp": now + (timedelta(hours=-1) if expired else timedelta(hours=1)),
    }
    # "secret" must match the JWKS "k" value (base64url-encoded) in the filter config
    return pyjwt.encode(payload, "secret", algorithm="HS256")

def test_protected_route_requires_jwt():
    r = httpx.get(f"{PROXY}/protected/resource")
    assert r.status_code == 401

def test_valid_jwt_accesses_protected_route():
    token = make_test_jwt()
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    # httpbin serves no /protected path, so the upstream may answer 404;
    # what matters is that Envoy did not reject the token
    assert r.status_code != 401

def test_expired_jwt_rejected():
    token = make_test_jwt(expired=True)
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    assert r.status_code == 401
    # Envoy returns WWW-Authenticate on JWT failures (httpx header lookup is case-insensitive)
    assert "www-authenticate" in r.headers

def test_wrong_issuer_rejected():
    token = make_test_jwt(wrong_issuer=True)
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    assert r.status_code == 401

def test_public_route_accessible_without_jwt():
    r = httpx.get(f"{PROXY}/public/health")
    # As above, the upstream decides the final status; the filter must not return 401
    assert r.status_code != 401
Rate Limiting Filter
Envoy's rate limiting filter calls an external rate limit service (such as envoyproxy/ratelimit, originally developed at Lyft). Test the integration:
def test_rate_limit_headers_present():
    """Envoy should return rate limit headers from the rate limit service."""
    r = httpx.get(f"{PROXY}/get", headers={"X-User-Id": "test-user"})
    # Rate limit headers vary by configuration:
    assert any(h in r.headers for h in [
        "x-ratelimit-limit",
        "x-envoy-ratelimited",
        "ratelimit-limit"
    ])

def test_rate_limited_request_returns_429():
    """After exhausting limits, Envoy returns 429."""
    headers = {"X-User-Id": "heavy-user"}
    # Make many requests to trigger rate limiting
    for _ in range(200):
        r = httpx.get(f"{PROXY}/get", headers=headers)
        if r.status_code == 429:
            assert "retry-after" in r.headers or "x-envoy-ratelimited" in r.headers
            return
    # If no 429 was returned, check that the rate limit service is actually configured
    pytest.skip("Rate limit not triggered; check rate limit service configuration")
xDS Dynamic Configuration Testing
When using dynamic xDS configuration, test that your control plane delivers valid configs and that Envoy applies them correctly.
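Checks against the admin `/stats` endpoint are easier to write once the text output is parsed into a dict. The following is a minimal sketch under that assumption: `parse_counters` and `xds_problems` are hypothetical helper names, and the sample text is hand-written rather than captured from a running Envoy. It also looks at `update_rejected`, the counter Envoy increments when it refuses a config pushed by the control plane.

```python
def parse_counters(stats_text: str) -> dict:
    """Parse 'name: value' lines from /stats, skipping non-integer values."""
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            continue  # e.g. histogram summaries, which are not plain integers
    return counters


def xds_problems(counters: dict) -> dict:
    """Return non-zero CDS/LDS failure or rejection counters."""
    suffixes = (".update_failure", ".update_rejected")
    return {
        name: value
        for name, value in counters.items()
        if value > 0
        and name.endswith(suffixes)
        and (".cds." in name or ".lds." in name)
    }


# Hand-written sample in the admin text format (assumption, not real output).
SAMPLE_STATS = """\
cluster_manager.cds.update_success: 3
cluster_manager.cds.update_failure: 0
cluster_manager.cds.update_rejected: 1
listener_manager.lds.update_success: 2
listener_manager.lds.update_rejected: 0
cluster.upstream_service.upstream_rq_time: P0(nan,0) P25(nan,0)
"""

counters = parse_counters(SAMPLE_STATS)
print(xds_problems(counters))
```

In a live test, the same helpers would take the text of a request to the admin `/stats` endpoint, and the test would assert that `xds_problems` returns an empty dict.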
# Uses httpx, pytest, and ADMIN from the integration test module above
def test_control_plane_reachable():
    """Envoy should be able to connect to the xDS control plane."""
    admin_stats = httpx.get(f"{ADMIN}/stats")
    stats_text = admin_stats.text
    # Check xDS connection is established (assumes the xDS cluster is named xds_grpc)
    assert "xds_grpc" in stats_text, "No xDS gRPC stats found"
    # Check for update failures
    cds_failures = [
        line for line in stats_text.splitlines()
        if "cds.update_failure" in line
    ]
    assert not any(
        int(line.split(":")[-1].strip()) > 0
        for line in cds_failures
    ), f"CDS update failures detected: {cds_failures}"

def test_lds_config_applied():
    """Listener Discovery Service config should be applied."""
    admin_stats = httpx.get(f"{ADMIN}/stats")
    # lds.update_success should be > 0
    for line in admin_stats.text.splitlines():
        if "lds.update_success" in line:
            count = int(line.split(":")[-1].strip())
            assert count > 0, "LDS updates not applied"
            return
    pytest.fail("lds.update_success stat not found")

def test_cluster_discovery_applied():
    """Check clusters from CDS are loaded."""
    clusters_response = httpx.get(f"{ADMIN}/clusters?format=json")
    clusters = clusters_response.json()
    cluster_names = [c["name"] for c in clusters.get("cluster_statuses", [])]
    assert "upstream_service" in cluster_names, \
        f"Expected cluster not found. Got: {cluster_names}"
Admin API Testing
Envoy's admin interface at port 9901 exposes runtime information and controls. Test it as part of integration.
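Beyond checking that `/config_dump` returns JSON, a test can look inside the dump for specific resources. The sketch below assumes the dump contains a `ClustersConfigDump` section with `static_clusters` and `dynamic_active_clusters` lists; the helper name `cluster_names_from_config_dump` is hypothetical and the sample payload is hand-written and heavily trimmed.

```python
def cluster_names_from_config_dump(dump: dict) -> dict:
    """Collect static and dynamic cluster names from a /config_dump payload."""
    names = {"static": [], "dynamic": []}
    for section in dump.get("configs", []):
        if not section.get("@type", "").endswith("ClustersConfigDump"):
            continue
        for entry in section.get("static_clusters", []):
            names["static"].append(entry["cluster"]["name"])
        for entry in section.get("dynamic_active_clusters", []):
            names["dynamic"].append(entry["cluster"]["name"])
    return names


# Trimmed, hand-written stand-in for a real config dump (assumption).
SAMPLE_DUMP = {
    "configs": [
        {"@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump"},
        {
            "@type": "type.googleapis.com/envoy.admin.v3.ClustersConfigDump",
            "static_clusters": [{"cluster": {"name": "upstream_service"}}],
            "dynamic_active_clusters": [{"cluster": {"name": "orders"}}],
        },
    ]
}

names = cluster_names_from_config_dump(SAMPLE_DUMP)
print(names)
```

Separating static from dynamic names makes it possible to assert that a cluster arrived through the control plane and was not simply left over in the bootstrap file.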
def test_admin_ready_endpoint():
    r = httpx.get(f"{ADMIN}/ready")
    assert r.status_code == 200
    assert r.text.strip() == "LIVE"

def test_admin_health_check():
    # /healthcheck/ok changes server state, so the admin API expects a POST
    r = httpx.post(f"{ADMIN}/healthcheck/ok")
    assert r.status_code == 200

def test_stats_accessible():
    r = httpx.get(f"{ADMIN}/stats")
    assert r.status_code == 200
    stats = r.text
    assert "http.ingress_http.rq_total" in stats

def test_config_dump_readable():
    """Admin config dump should return valid JSON."""
    r = httpx.get(f"{ADMIN}/config_dump")
    assert r.status_code == 200
    config = r.json()
    assert "configs" in config

def test_upstream_cluster_healthy():
    """All configured clusters should have healthy endpoints."""
    r = httpx.get(f"{ADMIN}/clusters?format=json")
    assert r.status_code == 200
    data = r.json()
    for cluster in data.get("cluster_statuses", []):
        for host in cluster.get("host_statuses", []):
            health = host.get("health_status", {})
            eds_health = health.get("eds_health_status", "HEALTHY")
            assert eds_health == "HEALTHY", \
                f"Unhealthy host in cluster '{cluster['name']}': {host['address']}"
Outlier Detection Testing
Envoy can automatically eject unhealthy hosts. Verify this works.
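The test in this section takes a `flaky_upstream` fixture with a `set_response_code` method, which the article does not define. One possible shape for it, sketched with the standard library only, is a small HTTP server whose status code can be switched at runtime. The class name `FlakyUpstream` and the helper `fetch_status` are assumptions for illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FlakyUpstream:
    """Tiny HTTP server whose response status can be changed at runtime."""

    def __init__(self, host="127.0.0.1", port=0):
        self.status = 200
        upstream = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                body = b"{}"
                self.send_response(upstream.status)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep test output quiet

        # Port 0 lets the OS pick a free port; read it back afterwards.
        self._server = ThreadingHTTPServer((host, port), Handler)
        self.port = self._server.server_address[1]
        self._thread = threading.Thread(target=self._server.serve_forever, daemon=True)

    def set_response_code(self, status):
        self.status = status

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._server.shutdown()
        self._server.server_close()


def fetch_status(url):
    """Return the HTTP status for url, ignoring any proxy environment variables."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


upstream = FlakyUpstream().start()
url = f"http://127.0.0.1:{upstream.port}/get"
before = fetch_status(url)
upstream.set_response_code(500)
after = fetch_status(url)
upstream.stop()
print(before, after)
```

To use this with Envoy, the object would be wrapped in a pytest fixture, the test cluster would have to point at its host and port, and that cluster would need an `outlier_detection` block; none of that wiring is shown here.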
def test_outlier_ejection_on_consecutive_errors(flaky_upstream):
    """After consecutive 5xx errors, Envoy ejects the upstream host."""
    # flaky_upstream is a fixture for an upstream whose status code the test controls;
    # the cluster under test also needs outlier_detection configured
    flaky_upstream.set_response_code(500)
    # Send enough failing requests for Envoy to eject the host
    for _ in range(10):
        httpx.get(f"{PROXY}/get")
        time.sleep(0.1)
    # Check admin stats for ejections
    stats = httpx.get(f"{ADMIN}/stats").text
    ejection_stats = [
        line for line in stats.splitlines()
        if "outlier_detection.ejections_active" in line
    ]
    assert any(
        int(line.split(":")[-1].strip()) > 0
        for line in ejection_stats
    ), "No outlier ejections detected despite 5xx responses"
Filter Chain Testing
Filter chain matching determines which set of filters handles a connection. Test that the right chain is selected.
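Alongside live requests, the order of HTTP filters inside a chain can be checked statically from the parsed config. The sketch below assumes the `http_filters` list of an HttpConnectionManager is available as a list of dicts; the helper name `check_http_filter_order` and the ordering pairs are illustrative assumptions. It relies on Envoy's requirement that the router is the last HTTP filter.

```python
ROUTER = "envoy.filters.http.router"


def check_http_filter_order(http_filters, must_precede):
    """Return a list of problems with the order of HTTP filters.

    http_filters: the `http_filters` list from an HttpConnectionManager.
    must_precede: (earlier, later) name pairs that must appear in that order
                  whenever both filters are present.
    """
    names = [f["name"] for f in http_filters]
    problems = []
    if not names or names[-1] != ROUTER:
        problems.append("router filter must be last")
    for earlier, later in must_precede:
        if earlier in names and later in names and names.index(earlier) > names.index(later):
            problems.append(f"{earlier} must come before {later}")
    return problems


# Desired policy for this example: authenticate before rate limiting.
POLICY = [("envoy.filters.http.jwt_authn", "envoy.filters.http.ratelimit")]

good = [
    {"name": "envoy.filters.http.jwt_authn"},
    {"name": "envoy.filters.http.ratelimit"},
    {"name": ROUTER},
]
bad = [
    {"name": "envoy.filters.http.ratelimit"},
    {"name": "envoy.filters.http.jwt_authn"},
    {"name": ROUTER},
]

print(check_http_filter_order(good, POLICY))
print(check_http_filter_order(bad, POLICY))
```

A static check like this fails fast in CI, while the request-based tests confirm that the running proxy actually behaves as the config suggests.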
def test_tls_traffic_matches_tls_chain():
    """TLS connections should be handled by the TLS filter chain."""
    # Connect via HTTPS (if TLS listener configured)
    try:
        r = httpx.get(
            "https://localhost:10443/get",
            verify=False  # Test cert only; not for production
        )
        assert r.status_code == 200
    except httpx.ConnectError:
        pytest.skip("TLS listener not configured in this test environment")

def test_plaintext_traffic_matches_plaintext_chain():
    """Non-TLS connections use the plaintext filter chain."""
    r = httpx.get(f"{PROXY}/get")
    assert r.status_code == 200

def test_filter_chain_order_matters():
    """Verify filter order: auth → rate limit → router."""
    # Without credentials: should fail at auth (401), not rate limit (429)
    r = httpx.get(f"{PROXY}/protected/resource")
    assert r.status_code == 401  # Auth filter runs first
    # Spam requests without auth: should still get 401, not 429
    for _ in range(50):
        r = httpx.get(f"{PROXY}/protected/resource")
        assert r.status_code == 401, \
            "Got non-401; rate limiter may be running before auth filter"
Istio Integration Testing
When Envoy runs as an Istio sidecar, test your VirtualServices and DestinationRules.
# Validate Istio config before applying
istioctl analyze -n your-namespace
# Check proxy config for a specific pod
istioctl proxy-config routes your-pod -n your-namespace --name 80
# Verify listeners
istioctl proxy-config listeners your-pod -n your-namespace
# Check clusters
istioctl proxy-config clusters your-pod -n your-namespace
import subprocess
import json

def test_istio_virtual_service_applied(pod_name, namespace):
    """VirtualService route rules should appear in proxy config."""
    result = subprocess.run(
        ["istioctl", "proxy-config", "routes", pod_name,
         "-n", namespace, "--name", "80", "-o", "json"],
        capture_output=True, text=True
    )
    assert result.returncode == 0
    route_configs = json.loads(result.stdout)
    # The route config itself is named after the port ("80");
    # service hosts appear as virtual host names inside it
    vhost_names = [
        vh.get("name", "")
        for rc in route_configs
        for vh in rc.get("virtualHosts", [])
    ]
    assert any("my-service" in name for name in vhost_names), \
        f"VirtualService not found in proxy routes. Got: {vhost_names}"
Continuous Testing with HelpMeTest
Envoy configurations can drift from what you tested. Use HelpMeTest for continuous validation:
Go To http://envoy-proxy/ready
Status Should Be 200
Go To http://envoy-admin:9901/stats
Page Should Contain http.ingress_http.rq_total
Schedule these checks to run every 5 minutes so you know immediately when Envoy's behavior changes after a config update.