Portkey AI Gateway Testing: Rate Limits, Fallbacks, and Routing

Most LLM applications fail at the infrastructure layer, not the model layer. Your prompt is fine. Your retrieval is fine. But when OpenAI's API goes down at 2 AM, or when your token budget runs out mid-sprint, the application breaks entirely. Portkey is an AI gateway that sits between your application and the LLM providers — handling failover, rate limiting, caching, and routing. Testing those configurations correctly is non-trivial and often skipped entirely.

This guide covers how to test Portkey gateway configurations so you know your fallback chain actually works before production tells you it doesn't.

What Portkey Actually Does

Portkey acts as a proxy in front of OpenAI, Anthropic, Cohere, and 200+ other providers. Key features relevant to testing:

  • Fallbacks — if OpenAI returns a 429 or 5xx, route to Anthropic automatically
  • Load balancing — split traffic across providers by weight
  • Retry logic — retry failed requests with exponential backoff
  • Caching — return cached responses for identical prompts (cost savings)
  • Rate limiting — enforce per-user or per-tenant token budgets
  • Guardrails — block certain input/output patterns before they reach the model

The catch: these configurations live in Portkey's dashboard or in code. If you deploy a broken fallback config, you won't know until a real failure hits.
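One cheap guard against shipping a broken config is a static check that runs before any request is sent. The sketch below is a hypothetical helper, not part of the Portkey SDK. It inspects only the fields used in this guide (strategy.mode, targets, weight, on_status_codes) and does not attempt to cover Portkey's full config schema.

```python
from typing import Any

KNOWN_MODES = {"fallback", "loadbalance"}  # only the modes used in this guide


def lint_gateway_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    strategy = config.get("strategy", {})
    mode = strategy.get("mode")
    targets = config.get("targets", [])

    if mode not in KNOWN_MODES:
        problems.append(f"unknown strategy mode: {mode!r}")
    if not targets:
        problems.append("no targets defined")
    elif mode == "fallback" and len(targets) < 2:
        problems.append("fallback needs at least two targets")
    elif mode == "loadbalance":
        weights = [t.get("weight") for t in targets]
        if any(not isinstance(w, (int, float)) or w < 0 for w in weights):
            problems.append("every loadbalance target needs a non-negative weight")
        elif sum(weights) == 0:
            problems.append("loadbalance weights sum to zero")
    for code in strategy.get("on_status_codes", []):
        if not isinstance(code, int) or not 100 <= code <= 599:
            problems.append(f"invalid HTTP status code: {code!r}")
    return problems
```

A check like this catches typos and empty target lists in milliseconds, but it says nothing about whether the gateway behaves as configured. That is what the live tests in the rest of this guide are for.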

Installation

pip install portkey-ai  # Python
npm install portkey-ai  # TypeScript/Node

Basic Setup

from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="YOUR_OPENAI_VIRTUAL_KEY",  # Maps to your OpenAI key in Portkey
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

The virtual_key concept is important for testing: you can create separate virtual keys for test vs production, with different rate limits and routing configurations.

Testing Fallback Chains

The most critical thing to test is whether your fallback actually fires when the primary provider fails.

Setting Up a Fallback Config

from portkey_ai import Portkey

# The gateway config is passed to the client as a plain dict.
fallback_config = {
    "strategy": {
        "mode": "fallback",
        # Fall back only on rate limits and server-side errors.
        "on_status_codes": [429, 500, 502, 503, 504],
    },
    "targets": [
        {
            "virtual_key": "openai-virtual-key",  # Primary
        },
        {
            "virtual_key": "anthropic-virtual-key",  # Fallback
            "override_params": {
                "model": "claude-3-5-sonnet-20241022",
            },
        },
    ],
}

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    config=fallback_config,
)
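To make the expected behaviour concrete, here is a toy model of fallback routing in plain Python. It is not Portkey's implementation. `route_with_fallback` and the fake targets are hypothetical names, and the model assumes the gateway tries targets in order and moves on only when the status code is listed in on_status_codes.

```python
from typing import Callable

# Each fake target is a function returning (status_code, body).
Target = Callable[[], tuple[int, str]]


def route_with_fallback(
    targets: list[Target], on_status_codes: set[int]
) -> tuple[int, str, int]:
    """Try targets in order; return (status, body, index of the target that answered)."""
    if not targets:
        raise ValueError("at least one target is required")
    for index, call_target in enumerate(targets):
        status, body = call_target()
        # Move to the next target only for the configured trigger codes.
        if status not in on_status_codes:
            return status, body, index
    # Every target hit a trigger code: surface the last response.
    return status, body, index


def rate_limited() -> tuple[int, str]:
    return 429, "rate limited"


def healthy() -> tuple[int, str]:
    return 200, "ok"


def bad_request() -> tuple[int, str]:
    return 400, "bad request"
```

Under this model a 400 from the primary is returned as-is, because 400 is not a trigger code. That is why it is worth testing a status code outside the list as well as the codes inside it.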

Testing the Fallback Fires

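You can't rely on OpenAI actually failing, and because the fallback decision is made inside the gateway, mocking the HTTP client in your test process never exercises it. Instead, point the primary target at a virtual key that is set up to fail with a known status code: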

import pytest
from portkey_ai import Portkey


def test_fallback_fires_on_429():
    """Verify that a 429 from the primary triggers fallback to the secondary."""

    # Patching httpx in this process would not help: the fallback decision is
    # made inside the gateway, not in the client.
    # Assumption: "simulate-429-key" is a virtual key you have set up so that
    # every request through it fails with a 429.
    client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        config={
            "strategy": {"mode": "fallback"},
            "targets": [
                {"virtual_key": "simulate-429-key"},  # Always returns 429
                {
                    "virtual_key": "anthropic-virtual-key",
                    # Without this override the fallback would receive "gpt-4o".
                    "override_params": {"model": "claude-3-5-sonnet-20241022"},
                },
            ],
        },
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "test"}],
        max_tokens=16,  # Keep the test cheap
    )

    assert response.choices[0].message.content is not None
    # The model name in the response shows which provider answered.
    assert "claude" in response.model.lower()

Testing Load Balancing Distribution

For load balancing, verify the distribution is approximately correct over many requests:

def test_load_balance_distribution():
    """Verify the 70/30 split between providers is approximately correct."""

    lb_config = {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"virtual_key": "openai-vk", "weight": 70},
            {
                "virtual_key": "anthropic-vk",
                "weight": 30,
                # Anthropic cannot serve "gpt-4o", so override the model.
                "override_params": {"model": "claude-3-5-sonnet-20241022"},
            },
        ],
    }

    client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=lb_config)

    provider_hits = {"openai": 0, "anthropic": 0}
    n_requests = 100

    for _ in range(n_requests):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,  # Only routing matters here, so keep responses tiny
        )
        # The model name in the response identifies the provider.
        if "claude" in response.model.lower():
            provider_hits["anthropic"] += 1
        else:
            provider_hits["openai"] += 1

    anthropic_pct = provider_hits["anthropic"] / n_requests
    # Allow ±10 percentage points around the 30% target
    assert 0.20 <= anthropic_pct <= 0.40, f"Anthropic hit {anthropic_pct:.0%}, expected ~30%"
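A fixed ±10 point window is tighter than it looks. With 100 requests and a 30% target, the standard deviation of the observed share is about 4.6 points, so a correctly configured gateway would still fail this test in a few percent of runs. The sketch below derives the window from the sample size instead. `share_bounds` is a hypothetical helper, and it assumes each request is routed independently according to the weights, which this guide has not verified for Portkey.

```python
import math


def share_bounds(
    weight: float, total_weight: float, n_requests: int, z: float = 3.0
) -> tuple[float, float]:
    """Acceptance interval for one target's observed share of traffic.

    Uses the normal approximation to the binomial:
    p +/- z * sqrt(p * (1 - p) / n), clamped to [0, 1].
    """
    if n_requests <= 0 or total_weight <= 0:
        raise ValueError("n_requests and total_weight must be positive")
    p = weight / total_weight
    sigma = math.sqrt(p * (1 - p) / n_requests)
    return max(0.0, p - z * sigma), min(1.0, p + z * sigma)
```

In the test, `low, high = share_bounds(30, 100, n_requests)` followed by `assert low <= anthropic_pct <= high` replaces the hard-coded 0.20 and 0.40. Raising n_requests narrows the window, at the price of more paid requests per run.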

Testing Rate Limiting

Portkey can enforce token budgets per user or per API key. Test that requests are blocked when limits are exceeded:

def test_rate_limit_enforced():
    """Verify that rate limits block requests after budget is exhausted."""
    
    # Create a virtual key with a very low token limit for testing
    # (Configure this in Portkey dashboard: rate_limit: 100 tokens/minute)
    test_client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        virtual_key="rate-limited-test-key",  # 100 token/min limit
    )
    
    responses = []
    errors = []
    
    # Send requests until we hit the limit
    for i in range(20):
        try:
            r = test_client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Write a long paragraph about AI."}],
                max_tokens=50,
            )
            responses.append(r)
        except Exception as e:
            errors.append(str(e))
            break
    
    assert len(errors) > 0, "Expected rate limit to trigger, but no errors occurred"
    assert "429" in errors[0] or "rate" in errors[0].lower()
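Matching on the error message is brittle, since the wording can change between SDK versions. The sketch below prefers a numeric status code when the exception exposes one. It assumes the exception may carry a `status_code` attribute, as OpenAI-style SDK errors often do; this guide does not confirm that for the Portkey SDK, so the helper falls back to a string heuristic similar to the one in the test. `is_rate_limit_error` and `FakeStatusError` are hypothetical names.

```python
def is_rate_limit_error(exc: Exception) -> bool:
    """Best-effort check for an HTTP 429 carried by an SDK exception."""
    # Assumption: the exception may expose the HTTP status as `status_code`.
    status = getattr(exc, "status_code", None)
    if status is not None:
        return status == 429
    # Fallback: string heuristic on the error message.
    message = str(exc).lower()
    return "429" in message or "rate limit" in message


class FakeStatusError(Exception):
    """Stand-in for an SDK error, used only to exercise the helper."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code
```

Collecting the exception object instead of `str(e)` in the test loop lets the final assertion become `assert is_rate_limit_error(errors[0])`, which no longer passes by accident on an unrelated error that happens to mention rates.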

Testing Semantic Caching

Portkey's semantic cache returns cached responses for semantically similar (not just identical) queries:

import time


def test_semantic_cache_hit():
    """Verify that semantically similar queries return cached responses."""

    cache_config = {
        "cache": {
            "mode": "semantic",
            "max_age": 3600,  # 1 hour TTL, in seconds
        },
        "targets": [{"virtual_key": "openai-vk"}],
    }

    client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=cache_config)

    # First request populates the cache
    t1 = time.perf_counter()
    r1 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )
    t1_duration = time.perf_counter() - t1

    # Semantically similar query should hit the cache
    t2 = time.perf_counter()
    r2 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me the capital city of France"}],
    )
    t2_duration = time.perf_counter() - t2

    # A cache hit should be significantly faster than the original request
    assert t2_duration < t1_duration * 0.5, f"Cache miss: {t2_duration:.2f}s vs {t1_duration:.2f}s"

    # A cached response carries exactly the same content as the original
    assert r1.choices[0].message.content == r2.choices[0].message.content
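A single timing comparison is sensitive to network jitter. One way to steady it is to repeat the query that should be cached and keep the fastest run. The sketch below is a hypothetical helper built only on the standard library; it is meant for the second, cached query, since the first request can only miss the cache once.

```python
import time
from typing import Callable


def best_of(call: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock duration, in seconds, over `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return min(durations)
```

In the test above, the second request could be wrapped in a small function and timed with `best_of(send_similar_query)`, then compared against `t1_duration` as before. Timing remains indirect evidence of a cache hit, so treat a failure here as a prompt to investigate, not as proof of a broken cache.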

TypeScript Example

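import Portkey from "portkey-ai";

const client = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY!,
  // Keys inside the config object use snake_case, as in the Python examples.
  config: {
    strategy: { mode: "fallback" },
    targets: [
      { virtual_key: process.env.OPENAI_VIRTUAL_KEY! },
      { virtual_key: process.env.ANTHROPIC_VIRTUAL_KEY! },
    ],
  },
});

// Smoke test: with a healthy primary this only proves the config is accepted.
// To see the fallback fire, make the first target a key that always fails.
async function testFallback() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });

  // console.assert only logs on failure, so throw to fail the run.
  if (response.choices[0].message.content === null) {
    throw new Error("Expected a non-null completion");
  }
  console.log("Fallback test passed:", response.model);
}

testFallback().catch((err) => {
  console.error(err);
  process.exit(1);
});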

Testing Guardrails

Portkey guardrails can block unwanted inputs or filter outputs, so both directions deserve a test. The example below covers the input side, using PII detection:

def test_input_guardrail_blocks_pii():
    """Verify that PII in prompts is blocked before reaching the model."""
    
    guardrail_client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        virtual_key="guardrail-enabled-key",  # Configured with PII detection
    )
    
    with pytest.raises(Exception) as exc_info:
        guardrail_client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "My SSN is 123-45-6789, can you help me?",
            }],
        )
    
    assert "guardrail" in str(exc_info.value).lower() or "blocked" in str(exc_info.value).lower()

CI Integration

# .github/workflows/gateway-tests.yml
name: Portkey Gateway Tests

on: [pull_request]

jobs:
  gateway-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install portkey-ai pytest
      - name: Run gateway configuration tests
        env:
          PORTKEY_API_KEY: ${{ secrets.PORTKEY_API_KEY }}
          OPENAI_VIRTUAL_KEY: ${{ secrets.PORTKEY_OPENAI_VK }}
          ANTHROPIC_VIRTUAL_KEY: ${{ secrets.PORTKEY_ANTHROPIC_VK }}
        run: pytest tests/test_gateway.py -v
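The workflow exports three environment variables, while the Python examples in this guide hard-code placeholder strings. A minimal sketch of reading those variables in one place follows. `GatewayEnv` and `load_gateway_env` are hypothetical names; only the variable names come from the workflow above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("PORTKEY_API_KEY", "OPENAI_VIRTUAL_KEY", "ANTHROPIC_VIRTUAL_KEY")


@dataclass(frozen=True)
class GatewayEnv:
    api_key: str
    openai_virtual_key: str
    anthropic_virtual_key: str


def load_gateway_env(environ: Mapping[str, str] = os.environ) -> GatewayEnv:
    """Read the variables set by the workflow; fail with one clear error if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return GatewayEnv(
        api_key=environ["PORTKEY_API_KEY"],
        openai_virtual_key=environ["OPENAI_VIRTUAL_KEY"],
        anthropic_virtual_key=environ["ANTHROPIC_VIRTUAL_KEY"],
    )
```

Tests can then build their client from `load_gateway_env()` instead of literal strings, so a missing secret shows up as one readable error at the start of the run and not as a string of authentication failures.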

Beyond Gateway Testing

Gateway tests verify your routing configuration. They don't verify that your application behaves correctly from the user's perspective. A fallback that routes to Anthropic might still produce wrong answers for your specific prompts. Combine gateway tests with end-to-end application testing using HelpMeTest to cover both layers: the infrastructure and the user experience.

Summary

Testing your Portkey configuration before production is straightforward but requires deliberate setup: create test virtual keys with simulated error modes, write tests that verify fallback fires under specific HTTP status codes, validate load balancing distribution with statistical checks, and test that rate limits actually block requests. Add these to CI and your gateway configuration becomes as reliable as your application code.
