Envoy Proxy Testing: xDS Config, Filter Chains, and Integration Tests

Envoy Proxy powers many service meshes (Istio, AWS App Mesh) and API gateways (Contour, Emissary-Ingress). It's highly capable and highly complex. Testing Envoy configurations before deploying them prevents outages caused by misconfigured filter chains, broken xDS configs, and subtle routing mistakes.

This guide covers three layers of Envoy testing: configuration validation, filter chain behavior, and end-to-end integration tests.

Envoy's Configuration Model

Envoy is configured through a hierarchy of objects:

  • Listeners — bind to a port and accept connections
  • Filter chains — process traffic for a listener; selected based on connection properties
  • HTTP connection manager — the main HTTP filter; contains the HTTP filter chain
  • HTTP filters — process HTTP requests in order (Router, Rate Limit, JWT Auth, etc.)
  • Clusters — define upstream endpoints that Envoy forwards traffic to
  • Routes — map request properties to clusters

In production, this config is usually delivered dynamically via the xDS API from a control plane such as Istio's istiod (formerly Pilot). For testing, you can use a static bootstrap config or a test xDS server.
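
To make the hierarchy concrete, here is a minimal sketch that walks a bootstrap config already parsed into a Python dict (for example with `yaml.safe_load`) and reports which listener, HTTP filters, route prefix, and cluster belong together. The function name `summarize_bootstrap` and the `SAMPLE` dict are illustrative assumptions, not part of Envoy.

```python
HCM = "envoy.filters.network.http_connection_manager"

def summarize_bootstrap(config):
    """Return (listener, http_filters, route_prefix, cluster) for every route."""
    rows = []
    static = config.get("static_resources", {})
    for listener in static.get("listeners", []):
        for chain in listener.get("filter_chains", []):
            for net_filter in chain.get("filters", []):
                if net_filter.get("name") != HCM:
                    continue  # only the HTTP connection manager carries routes
                hcm = net_filter.get("typed_config", {})
                http_filters = [f["name"] for f in hcm.get("http_filters", [])]
                vhosts = hcm.get("route_config", {}).get("virtual_hosts", [])
                for vhost in vhosts:
                    for route in vhost.get("routes", []):
                        rows.append((
                            listener.get("name"),
                            http_filters,
                            route.get("match", {}).get("prefix"),
                            route.get("route", {}).get("cluster"),
                        ))
    return rows

# Hypothetical, heavily trimmed config used only to exercise the function
SAMPLE = {
    "static_resources": {
        "listeners": [{
            "name": "listener_0",
            "filter_chains": [{
                "filters": [{
                    "name": HCM,
                    "typed_config": {
                        "http_filters": [{"name": "envoy.filters.http.router"}],
                        "route_config": {
                            "virtual_hosts": [{
                                "routes": [{
                                    "match": {"prefix": "/"},
                                    "route": {"cluster": "upstream_service"},
                                }],
                            }],
                        },
                    },
                }],
            }],
        }],
        "clusters": [{"name": "upstream_service"}],
    }
}

for row in summarize_bootstrap(SAMPLE):
    print(row)
```

This only handles inline `route_config`; routes delivered through RDS would not appear in a static file.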

Configuration Validation

Static Config Validation

Envoy ships with a --mode validate flag that checks config syntax without starting the proxy:

# Validate a static config file
docker run --rm \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" \
  envoyproxy/envoy:v1.29.0 \
  envoy --config-path /etc/envoy/envoy.yaml --mode validate

# Expected output for valid config:
# configuration '/etc/envoy/envoy.yaml' OK

# For invalid config:
# [error] …Field 'rate_limit_service' is not set.

Integrate this in CI on every config change:

# .github/workflows/envoy-validate.yml
name: Envoy Config Validation

on:
  pull_request:
    paths:
      - 'envoy/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Validate all Envoy configs
        run: |
          for config in envoy/*.yaml; do
            echo "Validating $config..."
            docker run --rm \
              -v "$(pwd)/$config:/etc/envoy/envoy.yaml" \
              envoyproxy/envoy:v1.29.0 \
              envoy --config-path /etc/envoy/envoy.yaml --mode validate
            echo "$config: OK"
          done
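
If you would rather drive validation from a Python test suite than from shell, one option is a thin wrapper around the same docker command. This is a sketch: the helper names `build_validate_command` and `validate_config` are hypothetical, and the image tag is simply the one used above.

```python
import subprocess
from pathlib import Path

ENVOY_IMAGE = "envoyproxy/envoy:v1.29.0"

def build_validate_command(config_path, image=ENVOY_IMAGE):
    """Build the docker argv that runs Envoy in validate mode on one file."""
    host_path = Path(config_path).resolve()  # docker bind mounts need absolute paths
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/etc/envoy/envoy.yaml:ro",
        image,
        "envoy", "--config-path", "/etc/envoy/envoy.yaml", "--mode", "validate",
    ]

def validate_config(config_path, image=ENVOY_IMAGE):
    """Run validation; return (ok, combined stdout and stderr)."""
    result = subprocess.run(
        build_validate_command(config_path, image),
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

A pytest test could then parametrize over every file matching `envoy/*.yaml` and assert that `validate_config` returns `ok`, printing the output on failure.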

JSON Schema Validation

For programmatic validation, you can compile Envoy's protobuf definitions to JSON Schema. The tests below take a lighter approach: they parse the YAML and assert on its structure directly.

import yaml
import json
import pytest

def load_yaml(path):
    with open(path) as f:
        return yaml.safe_load(f)

def test_envoy_bootstrap_has_required_sections():
    config = load_yaml("envoy/envoy.yaml")
    assert "static_resources" in config or "dynamic_resources" in config
    assert "admin" in config
    
    if "static_resources" in config:
        assert "listeners" in config["static_resources"]
        assert "clusters" in config["static_resources"]

def test_cluster_health_checks_defined():
    config = load_yaml("envoy/envoy.yaml")
    clusters = config["static_resources"]["clusters"]
    
    for cluster in clusters:
        # Production clusters should have health checks
        if cluster.get("name") != "local_access_log":  # Exempt utility clusters
            assert "health_checks" in cluster, \
                f"Cluster '{cluster['name']}' missing health_checks"

def test_timeouts_configured():
    config = load_yaml("envoy/envoy.yaml")
    listeners = config["static_resources"]["listeners"]
    
    for listener in listeners:
        for fc in listener.get("filter_chains", []):
            for f in fc.get("filters", []):
                if f.get("name") == "envoy.filters.network.http_connection_manager":
                    hcm = f["typed_config"]
                    route_config = hcm.get("route_config", {})
                    for vhost in route_config.get("virtual_hosts", []):
                        for route in vhost.get("routes", []):
                            timeout = route.get("route", {}).get("timeout")
                            assert timeout is not None, \
                                f"Route missing timeout: {route}"

def test_no_wildcard_cors():
    config = load_yaml("envoy/envoy.yaml")
    config_str = json.dumps(config)
    
    # Detect overly permissive CORS
    assert '"allow_origin_string_match"' not in config_str or \
           '"prefix": "*"' not in config_str, \
        "Wildcard CORS origin detected — use explicit origins"
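
Another structural check in the same spirit is catching routes that point at a cluster nobody defined, a mistake that otherwise surfaces only at request time. The sketch below assumes inline `route_config` in a static bootstrap; the function name `find_undefined_clusters` is hypothetical.

```python
HCM = "envoy.filters.network.http_connection_manager"

def find_undefined_clusters(config):
    """Return sorted cluster names that routes reference but static_resources lacks."""
    static = config.get("static_resources", {})
    defined = {c["name"] for c in static.get("clusters", [])}
    referenced = set()
    for listener in static.get("listeners", []):
        for chain in listener.get("filter_chains", []):
            for net_filter in chain.get("filters", []):
                if net_filter.get("name") != HCM:
                    continue
                route_config = net_filter.get("typed_config", {}).get("route_config", {})
                for vhost in route_config.get("virtual_hosts", []):
                    for route in vhost.get("routes", []):
                        cluster = route.get("route", {}).get("cluster")
                        if cluster:
                            referenced.add(cluster)
    return sorted(referenced - defined)

def make_config(route_cluster, defined_cluster):
    """Build a tiny illustrative config with one route and one cluster."""
    return {
        "static_resources": {
            "listeners": [{
                "filter_chains": [{
                    "filters": [{
                        "name": HCM,
                        "typed_config": {"route_config": {"virtual_hosts": [{
                            "routes": [{"route": {"cluster": route_cluster}}],
                        }]}},
                    }],
                }],
            }],
            "clusters": [{"name": defined_cluster}],
        }
    }

# A typo in the route's cluster name is reported; a matching name is not
print(find_undefined_clusters(make_config("upstream_servce", "upstream_service")))
print(find_undefined_clusters(make_config("upstream_service", "upstream_service")))
```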

Integration Testing

Spin up Envoy with a test upstream and verify behavior end-to-end.

# docker-compose.test.yml
version: "3.8"

services:
  envoy:
    image: envoyproxy/envoy:v1.29.0
    command: envoy -c /etc/envoy/envoy-test.yaml
    ports:
      - "10000:10000"  # Proxy port
      - "9901:9901"    # Admin port
    volumes:
      - ./envoy/envoy-test.yaml:/etc/envoy/envoy-test.yaml

  upstream:
    image: kennethreitz/httpbin
    ports:
      - "8080:80"

# envoy/envoy-test.yaml
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: upstream_service
                            timeout: 30s

  clusters:
    - name: upstream_service
      connect_timeout: 5s
      type: LOGICAL_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: upstream_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: upstream
                      port_value: 80

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901

Basic Proxy Tests

import pytest
import httpx
import time

PROXY = "http://localhost:10000"
ADMIN = "http://localhost:9901"

@pytest.fixture(scope="session", autouse=True)
def wait_for_envoy():
    for _ in range(30):
        try:
            r = httpx.get(f"{ADMIN}/ready")
            if r.status_code == 200:
                return
        except httpx.ConnectError:
            pass
        time.sleep(1)
    raise RuntimeError("Envoy did not become ready")

def test_proxy_forwards_get_request():
    r = httpx.get(f"{PROXY}/get")
    assert r.status_code == 200
    assert "args" in r.json()

def test_proxy_preserves_request_headers():
    r = httpx.get(
        f"{PROXY}/headers",
        headers={"X-Custom-Header": "test-value"}
    )
    assert r.status_code == 200
    received = r.json()["headers"]
    assert received.get("X-Custom-Header") == "test-value"

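def test_proxy_adds_envoy_headers():
    """Envoy's router adds x-envoy-expected-rq-timeout-ms to upstream requests.

    A Via header is only appended when `via` is set on the HTTP connection
    manager, and httpbin hides Via from /headers by default, so this test
    checks a header that the config above actually produces.
    """
    r = httpx.get(f"{PROXY}/headers")
    received = {name.lower() for name in r.json()["headers"]}
    assert "x-envoy-expected-rq-timeout-ms" in received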

def test_unknown_route_returns_404():
    r = httpx.get(f"{PROXY}/this-route-does-not-exist-xyz")
    # Depends on config — might be 404 from upstream or 404 from Envoy routing
    assert r.status_code in (404, 503)

HTTP Filter Testing

JWT Authentication Filter

# Add to http_filters in envoy-test.yaml (before router):
- name: envoy.filters.http.jwt_authn
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
    providers:
      test_provider:
        issuer: "https://test-issuer.example.com"
        local_jwks:
          inline_string: |
            {
              "keys": [{
                "kty": "oct",
                "kid": "test-key",
                "alg": "HS256",
                "k": "c2VjcmV0"
              }]
            }
        forward: true
        payload_in_metadata: "jwt_payload"
    rules:
      - match:
          prefix: "/protected"
        requires:
          provider_name: "test_provider"
      - match:
          prefix: "/public"
        # No requires = public endpoint

import jwt as pyjwt
from datetime import datetime, timedelta, timezone

def make_test_jwt(expired=False, wrong_issuer=False):
    payload = {
        "sub": "test-user",
        "iss": "https://evil.example.com" if wrong_issuer else "https://test-issuer.example.com",
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        "exp": datetime.now(timezone.utc) + (timedelta(hours=-1) if expired else timedelta(hours=1))
    }
    # "secret" must match the base64url-decoded `k` value in the inline JWKS
    return pyjwt.encode(payload, "secret", algorithm="HS256")

def test_protected_route_requires_jwt():
    r = httpx.get(f"{PROXY}/protected/resource")
    assert r.status_code == 401

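def test_valid_jwt_accesses_protected_route():
    # Assumes the upstream serves /protected/resource. httpbin has no such
    # path and answers 404, so point the route at a real handler first.
    token = make_test_jwt()
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    assert r.status_code == 200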

def test_expired_jwt_rejected():
    token = make_test_jwt(expired=True)
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    assert r.status_code == 401
    # Envoy returns WWW-Authenticate on JWT failures (httpx header lookup is case-insensitive)
    assert "www-authenticate" in r.headers

def test_wrong_issuer_rejected():
    token = make_test_jwt(wrong_issuer=True)
    r = httpx.get(
        f"{PROXY}/protected/resource",
        headers={"Authorization": f"Bearer {token}"}
    )
    assert r.status_code == 401

def test_public_route_accessible_without_jwt():
    # Assumes the upstream serves /public/health (httpbin does not)
    r = httpx.get(f"{PROXY}/public/health")
    assert r.status_code == 200
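
The `k` value in the inline JWKS has to be the base64url encoding (without padding) of the same secret the tests sign with, otherwise every token fails verification. A small sketch of that derivation, using only the standard library; the helper name `jwks_for_hmac_secret` is hypothetical:

```python
import base64
import json

def jwks_for_hmac_secret(secret, kid="test-key"):
    """Build a JWKS dict holding one symmetric (oct) HS256 key."""
    # JWK "k" is base64url without '=' padding
    k = base64.urlsafe_b64encode(secret.encode("utf-8")).rstrip(b"=").decode("ascii")
    return {"keys": [{"kty": "oct", "kid": kid, "alg": "HS256", "k": k}]}

# For the signing secret "secret" this yields the "k" used in the filter config
print(json.dumps(jwks_for_hmac_secret("secret"), indent=2))
```

Generating the JWKS from the secret in test setup, rather than pasting both by hand, keeps the two from drifting apart.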

Rate Limiting Filter

Envoy's rate limiting filter calls an external rate limit service (like Lyft's ratelimit). Test the integration:

def test_rate_limit_headers_present():
    """Envoy should return rate limit headers from the rate limit service."""
    r = httpx.get(f"{PROXY}/get", headers={"X-User-Id": "test-user"})
    # Rate limit headers vary by configuration:
    assert any(h in r.headers for h in [
        "x-ratelimit-limit",
        "x-envoy-ratelimited",
        "ratelimit-limit"
    ])

def test_rate_limited_request_returns_429():
    """After exhausting limits, Envoy returns 429 with retry-after."""
    headers = {"X-User-Id": "heavy-user"}
    
    # Make many requests to trigger rate limiting
    for _ in range(200):
        r = httpx.get(f"{PROXY}/get", headers=headers)
        if r.status_code == 429:
            assert "retry-after" in r.headers or "x-envoy-ratelimited" in r.headers
            return
    
    # If no 429 was returned, check that rate limit service is actually configured
    pytest.skip("Rate limit not triggered — check rate limit service configuration")

xDS Dynamic Configuration Testing

When using dynamic xDS configuration, test that your control plane delivers valid configs and that Envoy applies them correctly.

def test_control_plane_reachable():
    """Envoy should be able to connect to the xDS control plane."""
    admin_stats = httpx.get(f"{ADMIN}/stats")
    stats_text = admin_stats.text
    
    # Check xDS connection is established. "xds_grpc" is the name given to the
    # control-plane cluster in the bootstrap; adjust it to match yours.
    assert "xds_grpc" in stats_text, "No xDS gRPC stats found"
    
    # Check for update failures
    cds_failures = [
        line for line in stats_text.splitlines()
        if "cds.update_failure" in line
    ]
    assert not any(
        int(line.split(":")[-1].strip()) > 0
        for line in cds_failures
    ), f"CDS update failures detected: {cds_failures}"

def test_lds_config_applied():
    """Listener Discovery Service config should be applied."""
    admin_stats = httpx.get(f"{ADMIN}/stats")
    
    # lds.update_success should be > 0
    for line in admin_stats.text.splitlines():
        if "lds.update_success" in line:
            count = int(line.split(":")[-1].strip())
            assert count > 0, "LDS updates not applied"
            return
    
    pytest.fail("lds.update_success stat not found")

def test_cluster_discovery_applied():
    """Check clusters from CDS are loaded."""
    clusters_response = httpx.get(f"{ADMIN}/clusters?format=json")
    clusters = clusters_response.json()
    
    cluster_names = [c["name"] for c in clusters.get("cluster_statuses", [])]
    assert "upstream_service" in cluster_names, \
        f"Expected cluster not found. Got: {cluster_names}"
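
The tests above each re-parse the plain-text `/stats` output by splitting lines on a colon. A small helper can do that once and hand back a dict. This is a sketch: `parse_counters` is a hypothetical name, and it deliberately skips lines whose value is not an integer, such as histogram summaries.

```python
def parse_counters(stats_text):
    """Parse 'name: value' lines from Envoy's /stats text into {name: int}."""
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.rpartition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            counters[name] = int(value.strip())
        except ValueError:
            continue  # histograms and other non-integer values
    return counters

# Illustrative input shaped like the admin endpoint's text output
sample = (
    "cluster_manager.cds.update_failure: 0\n"
    "listener_manager.lds.update_success: 2\n"
    "http.ingress_http.downstream_rq_time: P0(nan,0) P25(nan,0)\n"
)
stats = parse_counters(sample)
print(stats["listener_manager.lds.update_success"])
```

With this in place, an assertion such as `stats["cluster_manager.cds.update_failure"] == 0` fails with a `KeyError` when the stat is missing instead of passing silently.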

Admin API Testing

Envoy's admin interface at port 9901 exposes runtime information and controls. Test it as part of integration.

def test_admin_ready_endpoint():
    r = httpx.get(f"{ADMIN}/ready")
    assert r.status_code == 200
    assert r.text.strip() == "LIVE"

def test_admin_health_check():
    # /healthcheck/ok changes server state, so the admin interface requires POST
    r = httpx.post(f"{ADMIN}/healthcheck/ok")
    assert r.status_code == 200

def test_stats_accessible():
    r = httpx.get(f"{ADMIN}/stats")
    assert r.status_code == 200
    stats = r.text
    assert "http.ingress_http.rq_total" in stats

def test_config_dump_readable():
    """Admin config dump should return valid JSON."""
    r = httpx.get(f"{ADMIN}/config_dump")
    assert r.status_code == 200
    config = r.json()
    assert "configs" in config

def test_upstream_cluster_healthy():
    """All configured clusters should have healthy endpoints."""
    r = httpx.get(f"{ADMIN}/clusters?format=json")
    assert r.status_code == 200
    
    data = r.json()
    for cluster in data.get("cluster_statuses", []):
        for host in cluster.get("host_statuses", []):
            health = host.get("health_status", {})
            eds_health = health.get("eds_health_status", "HEALTHY")
            assert eds_health == "HEALTHY", \
                f"Unhealthy host in cluster '{cluster['name']}': {host['address']}"
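
The EDS status is only one of the health signals in the `/clusters?format=json` output; hosts that fail active health checks or are ejected by outlier detection are flagged with separate boolean fields. The sketch below collects all of these in one pass so a failing test can say why a host is unhealthy. The function name `unhealthy_hosts` is hypothetical, and the sample payload is trimmed to the fields the function reads.

```python
FAILURE_FLAGS = ("failed_active_health_check", "failed_outlier_check")

def unhealthy_hosts(clusters_json):
    """Return (cluster_name, reasons) for every host with a failing health signal."""
    problems = []
    for cluster in clusters_json.get("cluster_statuses", []):
        for host in cluster.get("host_statuses", []):
            health = host.get("health_status", {})
            reasons = [flag for flag in FAILURE_FLAGS if health.get(flag)]
            eds = health.get("eds_health_status", "HEALTHY")
            if eds != "HEALTHY":
                reasons.append(f"eds_health_status={eds}")
            if reasons:
                problems.append((cluster.get("name"), reasons))
    return problems

# Illustrative payload: one healthy host, one ejected by outlier detection
sample = {
    "cluster_statuses": [
        {"name": "upstream_service", "host_statuses": [
            {"health_status": {"eds_health_status": "HEALTHY"}},
        ]},
        {"name": "flaky_service", "host_statuses": [
            {"health_status": {"eds_health_status": "HEALTHY",
                               "failed_outlier_check": True}},
        ]},
    ]
}
print(unhealthy_hosts(sample))
```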

Outlier Detection Testing

Envoy can automatically eject unhealthy hosts. Verify this works.

def test_outlier_ejection_on_consecutive_errors(flaky_upstream):
    """After consecutive 5xx errors, Envoy ejects the upstream host."""
    # flaky_upstream is a fixture (not shown) that controls the upstream's
    # response code. The cluster must have outlier_detection configured.
    flaky_upstream.set_response_code(500)
    
    # Send enough failing requests to trip consecutive-5xx ejection
    for _ in range(10):
        httpx.get(f"{PROXY}/get")
        time.sleep(0.1)
    
    # Check admin stats for ejections
    stats = httpx.get(f"{ADMIN}/stats").text
    ejection_stats = [
        line for line in stats.splitlines()
        if "outlier_detection.ejections_active" in line
    ]
    
    assert any(
        int(line.split(":")[-1].strip()) > 0
        for line in ejection_stats
    ), "No outlier ejections detected despite 5xx responses"
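
The test relies on an upstream whose response code can be changed at runtime. One way to get that, sketched here with only the standard library, is a tiny HTTP server with a mutable status code. The class name `FlakyUpstream` and its methods are assumptions made for illustration (only `set_response_code` is implied by the test), and Envoy's cluster would need to point at this server instead of httpbin.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class FlakyUpstream:
    """Minimal upstream that answers every GET with a configurable status code."""

    def __init__(self, host="127.0.0.1", port=0):
        self.response_code = 200
        upstream = self  # captured by the handler class below

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                body = b"{}"
                self.send_response(upstream.response_code)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):
                pass  # keep test output quiet

        # port=0 lets the OS pick a free port
        self._server = ThreadingHTTPServer((host, port), Handler)
        self.port = self._server.server_address[1]
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True
        )

    def set_response_code(self, code):
        self.response_code = code

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._server.shutdown()
        self._server.server_close()
```

A session-scoped pytest fixture could call `FlakyUpstream(port=8081).start()`, yield the instance, and call `stop()` on teardown; the port is an arbitrary example.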

Filter Chain Testing

Filter chain matching determines which set of filters handles a connection. Test that the right chain is selected.

def test_tls_traffic_matches_tls_chain():
    """TLS connections should be handled by the TLS filter chain."""
    # Connect via HTTPS (if TLS listener configured)
    try:
        r = httpx.get(
            "https://localhost:10443/get",
            verify=False  # Test cert — not for production
        )
        assert r.status_code == 200
    except httpx.ConnectError:
        pytest.skip("TLS listener not configured in this test environment")

def test_plaintext_traffic_matches_plaintext_chain():
    """Non-TLS connections use the plaintext filter chain."""
    r = httpx.get(f"{PROXY}/get")
    assert r.status_code == 200

def test_filter_chain_order_matters():
    """Verify filter order: auth → rate limit → router."""
    # Without credentials: should fail at auth (401), not rate limit (429)
    r = httpx.get(f"{PROXY}/protected/resource")
    assert r.status_code == 401  # Auth filter runs first
    
    # Spam requests without auth — should still get 401, not 429
    for _ in range(50):
        r = httpx.get(f"{PROXY}/protected/resource")
        assert r.status_code == 401, \
            "Got non-401 — rate limiter may be running before auth filter"
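
The behavioral test above can be paired with a static check on the `http_filters` list itself, since Envoy runs HTTP filters in the order they are listed. The sketch below assumes you have already extracted the filter names from the HTTP connection manager config; `EXPECTED_ORDER` and `is_in_expected_order` are hypothetical names.

```python
EXPECTED_ORDER = [
    "envoy.filters.http.jwt_authn",
    "envoy.filters.http.ratelimit",
    "envoy.filters.http.router",
]

def is_in_expected_order(filter_names, expected=EXPECTED_ORDER):
    """True if the expected filters that are present appear in the expected relative order."""
    positions = [filter_names.index(name) for name in expected if name in filter_names]
    return positions == sorted(positions)

# Auth before rate limit before router passes; swapping the first two does not
print(is_in_expected_order([
    "envoy.filters.http.jwt_authn",
    "envoy.filters.http.ratelimit",
    "envoy.filters.http.router",
]))
print(is_in_expected_order([
    "envoy.filters.http.ratelimit",
    "envoy.filters.http.jwt_authn",
    "envoy.filters.http.router",
]))
```

Filters not named in `EXPECTED_ORDER` are ignored, so the check tolerates extra filters such as CORS between the ones it cares about.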

Istio Integration Testing

When Envoy runs as an Istio sidecar, test your VirtualServices and DestinationRules.

# Validate Istio config before applying
istioctl analyze -n your-namespace

# Check proxy config for a specific pod
istioctl proxy-config routes your-pod -n your-namespace --name 80

# Verify listeners
istioctl proxy-config listeners your-pod -n your-namespace

# Check clusters
istioctl proxy-config clusters your-pod -n your-namespace

import subprocess
import json

def test_istio_virtual_service_applied(pod_name, namespace):
    """VirtualService route rules should appear in proxy config."""
    result = subprocess.run(
        ["istioctl", "proxy-config", "routes", pod_name,
         "-n", namespace, "--name", "80", "-o", "json"],
        capture_output=True, text=True
    )
    assert result.returncode == 0
    
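    # With --name 80 the route config itself is named "80"; VirtualService
    # rules surface in its virtual hosts, so match on those names.
    routes = json.loads(result.stdout)
    vhost_names = [
        vh.get("name", "")
        for rc in routes
        for vh in rc.get("virtualHosts", [])
    ]
    
    assert any("my-service" in name for name in vhost_names), \
        f"VirtualService not found in proxy routes. Got: {vhost_names}"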

Continuous Testing with HelpMeTest

Envoy configurations can drift from what you tested. Use HelpMeTest for continuous validation:

Go To http://envoy-admin:9901/ready
Status Should Be 200

Go To http://envoy-admin:9901/stats
Page Should Contain http.ingress_http.rq_total

Schedule these checks to run every 5 minutes so you find out within minutes when Envoy's behavior changes after a config update.
