Portkey AI Gateway Testing: Rate Limits, Fallbacks, and Routing
Many LLM applications fail at the infrastructure layer, not the model layer. Your prompt is fine. Your retrieval is fine. But when OpenAI's API goes down at 2 AM, or your token budget runs out mid-sprint, the application breaks anyway. Portkey is an AI gateway that sits between your application and the LLM providers, handling failover, rate limiting, caching, and routing. Testing those configurations correctly is non-trivial, and it is often skipped.
This guide covers how to test Portkey gateway configurations so you know your fallback chain actually works before production tells you it doesn't.
What Portkey Actually Does
Portkey acts as a proxy in front of OpenAI, Anthropic, Cohere, and 200+ other providers. Key features relevant to testing:
- Fallbacks — if OpenAI returns a 429 or 5xx, route to Anthropic automatically
- Load balancing — split traffic across providers by weight
- Retry logic — retry failed requests with exponential backoff
- Caching — return cached responses for identical prompts (cost savings)
- Rate limiting — enforce per-user or per-tenant token budgets
- Guardrails — block certain input/output patterns before they reach the model
The catch: these configurations live in Portkey's dashboard or in code. If you deploy a broken fallback config, you won't know until a real failure hits.
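A broken config only reveals itself during a real failure, so a cheap first line of defense is a static check that runs without calling any provider. This is a minimal sketch. It assumes configs are plain dicts that use the `strategy`/`targets` keys shown throughout this guide. `validate_config`, `KNOWN_MODES`, and the rules themselves are hypothetical pre-deployment checks, not Portkey's own schema validation.

```python
from typing import Any

# Only the two strategy modes used in this guide; extend as needed.
KNOWN_MODES = {"fallback", "loadbalance"}


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a Portkey-style config dict.

    The rules are illustrative pre-deployment checks, not Portkey's schema.
    """
    problems: list[str] = []
    strategy = config.get("strategy", {})
    mode = strategy.get("mode")
    targets = config.get("targets", [])

    if mode not in KNOWN_MODES:
        problems.append(f"unknown strategy mode: {mode!r}")
    if not targets:
        problems.append("config has no targets")

    for i, target in enumerate(targets):
        if not target.get("virtual_key"):
            problems.append(f"target {i} has no virtual_key")
        weight = target.get("weight")
        # This sketch requires an explicit positive weight for load balancing
        if mode == "loadbalance" and not (isinstance(weight, (int, float)) and weight > 0):
            problems.append(f"target {i} needs a positive weight")

    if mode == "fallback":
        if len(targets) < 2:
            problems.append("fallback needs at least two targets")
        for code in strategy.get("on_status_codes", []):
            if not 400 <= code <= 599:
                problems.append(f"on_status_codes contains non-error code {code}")

    return problems
```

Run it over every config your repository ships. A misspelled `"fallbak"` then fails in CI instead of at 2 AM.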
Installation
pip install portkey-ai   # Python
npm install portkey-ai   # TypeScript/Node
Basic Setup
from portkey_ai import Portkey
client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="YOUR_OPENAI_VIRTUAL_KEY",  # Maps to your OpenAI key in Portkey
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
The virtual_key concept is important for testing. You can create separate virtual keys for test and production, give them different budgets and rate limits, and pair each one with its own routing config.
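Separate keys only help if nothing swaps them by accident. This is a small guard, sketched under stated assumptions. It reads the test key from the environment and refuses to run if that key matches the production key. `virtual_key_for_tests`, `ProductionKeyInTestsError`, and the `PROD_<PROVIDER>_VIRTUAL_KEY` variable are hypothetical. Only the `<PROVIDER>_VIRTUAL_KEY` names come from the CI workflow later in this guide.

```python
import os
from typing import Mapping, Optional


class ProductionKeyInTestsError(RuntimeError):
    """Raised when a test run would use a production virtual key."""


def virtual_key_for_tests(provider: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the test virtual key for `provider`, refusing production keys.

    Reads <PROVIDER>_VIRTUAL_KEY and compares it with a hypothetical
    PROD_<PROVIDER>_VIRTUAL_KEY variable holding the production key, if set.
    """
    env = os.environ if environ is None else environ
    name = f"{provider.upper()}_VIRTUAL_KEY"
    key = env.get(name)
    if not key:
        raise KeyError(f"missing environment variable {name}")
    prod_key = env.get(f"PROD_{provider.upper()}_VIRTUAL_KEY")
    if prod_key and key == prod_key:
        raise ProductionKeyInTestsError(
            f"{name} points at the production virtual key; use a test key"
        )
    return key
```

In a pytest suite this would typically back a fixture. Every test then builds its client from `virtual_key_for_tests("openai")` rather than from a hard-coded string.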
Testing Fallback Chains
The most critical thing to test is whether your fallback actually fires when the primary provider fails.
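Before testing against the live gateway, pin down what "fires" should mean. The local model below captures the semantics this guide relies on: targets are tried in order, and an error moves on to the next target only if its status code is in `on_status_codes`. When that list is omitted, any non-2xx response falls back. That is how Portkey documents its default, but treat it as an assumption to verify against your gateway version. `run_fallback` is a hypothetical reference model for writing expectations, not Portkey's implementation.

```python
from typing import Callable, Optional, Sequence, Tuple

# A target is modeled as a zero-argument callable returning (status, body).
Target = Callable[[], Tuple[int, str]]


def run_fallback(
    targets: Sequence[Target],
    on_status_codes: Optional[Sequence[int]] = None,
) -> Tuple[int, str, int]:
    """Try targets in order; return (status, body, index_of_last_target_tried)."""
    result = (0, "", -1)  # returned only if there are no targets
    for index, target in enumerate(targets):
        status, body = target()
        result = (status, body, index)
        if 200 <= status < 300:
            return result  # success: stop here
        if on_status_codes is not None and status not in on_status_codes:
            return result  # error not listed: surface it without falling back
        # otherwise this error triggers the fallback; try the next target
    return result  # every target failed: surface the last error


# Example expectations
rate_limited = lambda: (429, "rate limited")
bad_request = lambda: (400, "bad request")
healthy = lambda: (200, "hello")
assert run_fallback([rate_limited, healthy], [429, 500]) == (200, "hello", 1)
assert run_fallback([bad_request, healthy], [429, 500]) == (400, "bad request", 0)
assert run_fallback([bad_request, healthy]) == (200, "hello", 1)
```

Teams tend to forget the second expectation. With `on_status_codes` set, a 400 from a malformed request should surface immediately rather than fail over, so that negative case deserves a live test of its own.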
Setting Up a Fallback Config
from portkey_ai import Portkey
# The config is passed as a plain dict (a saved config's ID also works)
fallback_config = {
    "strategy": {
        "mode": "fallback",
        # Only these status codes trigger the fallback
        "on_status_codes": [429, 500, 502, 503, 504],
    },
    "targets": [
        {"virtual_key": "openai-virtual-key"},  # Primary
        {
            "virtual_key": "anthropic-virtual-key",  # Fallback
            # Requests name an OpenAI model, so override it for Anthropic
            "override_params": {"model": "claude-3-5-sonnet-20241022"},
        },
    ],
}
client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    config=fallback_config,
)
Testing the Fallback Fires
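You can't rely on OpenAI failing on cue. Instead, make the primary target fail on purpose. One option is a test virtual key that wraps a deliberately invalid provider API key, so every request to it returns an error: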
from portkey_ai import Portkey
def test_fallback_fires_when_primary_fails():
    """Verify that an error from the primary target triggers fallback to the secondary."""
    client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        config={
            # No on_status_codes: fallback triggers on any non-2xx response
            "strategy": {"mode": "fallback"},
            "targets": [
                # Placeholder: a test virtual key that always errors,
                # e.g. one wrapping a deliberately invalid provider key
                {"virtual_key": "always-failing-test-key"},
                {
                    "virtual_key": "anthropic-virtual-key",
                    "override_params": {"model": "claude-3-5-sonnet-20241022"},
                },
            ],
        },
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "test"}],
        max_tokens=16,
    )
    assert response.choices[0].message.content
    # The returned model name shows which target served the request
    assert "claude" in response.model.lower()
Testing Load Balancing Distribution
For load balancing, verify the distribution is approximately correct over many requests:
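First, decide how wide "approximately" should be. With n requests and an expected share p, the observed share has a standard deviation of sqrt(p(1 - p)/n). For n = 100 and p = 0.3 that is about 4.6 percentage points. A fixed ±10-point band is therefore only about 2.2 sigma, and a correctly configured gateway will still fail it in a few percent of runs.

The helpers below make the tolerance explicit. They assume the normal approximation to the binomial is adequate, which is reasonable when n·p and n·(1 - p) are comfortably above 10. The function names are hypothetical.

```python
import math


def proportion_tolerance(p: float, n: int, z: float = 3.0) -> float:
    """Half-width of a z-sigma band around the expected share p after n requests.

    Normal approximation to the binomial: sigma = sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1 - p) / n)


def share_is_plausible(hits: int, n: int, p: float, z: float = 3.0) -> bool:
    """True if the observed share hits/n lies within the z-sigma band around p."""
    return abs(hits / n - p) <= proportion_tolerance(p, n, z)


def requests_needed(p: float, tolerance: float, z: float = 3.0) -> int:
    """Approximate smallest n whose z-sigma band fits within +/- tolerance."""
    return math.ceil(z * z * p * (1 - p) / tolerance ** 2)
```

With p = 0.3 and 100 requests, `proportion_tolerance(0.3, 100)` is about 0.137. A 3-sigma check therefore accepts anything from roughly 16% to 44%. Tightening that to ±10 points at 3 sigma takes about 190 requests (`requests_needed(0.3, 0.10)`).

The gateway test below uses a fixed band. Swapping in `share_is_plausible(provider_hits["anthropic"], n_requests, 0.30)` is a one-line change: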
from portkey_ai import Portkey
def test_load_balance_distribution():
    """Verify 70/30 split between providers is approximately correct."""
    lb_config = {
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"virtual_key": "openai-vk", "weight": 70},
            {
                "virtual_key": "anthropic-vk",
                "weight": 30,
                # Requests name an OpenAI model, so override it for Anthropic
                "override_params": {"model": "claude-3-5-sonnet-20241022"},
            },
        ],
    }
    client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=lb_config)
    provider_hits = {"openai": 0, "anthropic": 0}
    n_requests = 100
    for _ in range(n_requests):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=5,  # keep the test cheap
        )
        # Identify the provider that answered from the returned model name
        if "claude" in response.model.lower():
            provider_hits["anthropic"] += 1
        else:
            provider_hits["openai"] += 1
    anthropic_pct = provider_hits["anthropic"] / n_requests
    # Allow ±10 percentage points around the 30% target
    assert 0.20 <= anthropic_pct <= 0.40, f"Anthropic hit {anthropic_pct:.0%}, expected ~30%"
Testing Rate Limiting
Portkey can enforce token budgets per user or per API key. Test that requests are blocked when limits are exceeded:
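First, work out *when* the block should appear. The test can then assert that it arrives early, not merely at some point within 20 attempts.

The model below makes three assumptions: a fixed one-minute window, a count that includes both prompt and completion tokens, and a rule that lets the request crossing the limit through. Portkey's actual accounting may differ. Treat `FixedWindowBudget` and `first_rejected_request` as hypothetical helpers for reasoning about expectations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedWindowBudget:
    """Toy fixed-window token budget; an assumed model, not Portkey's algorithm.

    A request is admitted while the tokens already used in the current window
    are below the limit, and its full cost is then charged. Under this model
    the request that crosses the limit still succeeds; the next one is refused.
    """

    limit: int
    window_seconds: float = 60.0
    used: int = 0
    window_start: float = 0.0

    def try_consume(self, tokens: int, now: float) -> bool:
        if now - self.window_start >= self.window_seconds:
            # A new window starts: forget earlier usage
            self.window_start = now
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += tokens
        return True


def first_rejected_request(limit: int, tokens_per_request: int, attempts: int) -> Optional[int]:
    """0-based index of the first refused request when all arrive in one window."""
    budget = FixedWindowBudget(limit=limit)
    for i in range(attempts):
        if not budget.try_consume(tokens_per_request, now=0.0):
            return i
    return None
```

Suppose each request in the test below costs roughly 65 tokens: the 50-token `max_tokens` cap plus a prompt estimated at about 15 tokens. Then `first_rejected_request(100, 65, 20)` returns 2, so under this model the third request is the first one refused. A live rejection that arrives much later usually means the limit is not attached to the key you think it is. The live test: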
from portkey_ai import Portkey
def test_rate_limit_enforced():
    """Verify that rate limits block requests after budget is exhausted."""
    # Create a virtual key with a very low token limit for testing
    # (configure this in the Portkey dashboard: rate_limit: 100 tokens/minute)
    test_client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        virtual_key="rate-limited-test-key",  # 100 tokens/min limit
    )
    responses = []
    errors = []
    # Send requests until we hit the limit
    for _ in range(20):
        try:
            r = test_client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Write a long paragraph about AI."}],
                max_tokens=50,
            )
            responses.append(r)
        except Exception as e:
            errors.append(str(e))
            break
    assert len(errors) > 0, "Expected rate limit to trigger, but no errors occurred"
    assert "429" in errors[0] or "rate" in errors[0].lower()
Testing Semantic Caching
Portkey's semantic cache returns cached responses for semantically similar (not just identical) queries:
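Semantic matching cuts both ways. It turns a rephrased question into a cheap cache hit, but a loose similarity threshold can also serve a cached answer to a question that only looks similar.

The toy cache below makes that trade-off concrete. It scores prompts with bag-of-words cosine similarity after dropping a hypothetical stopword list. That is a crude stand-in for the embedding-based matching a real semantic cache uses. `ToySemanticCache`, `STOPWORDS`, and the 0.75 threshold are all illustrative, not Portkey's internals.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical stopword list for this toy; real systems use embeddings instead.
STOPWORDS = {"a", "an", "the", "of", "is", "what", "tell", "me"}


def to_vector(text: str) -> Counter:
    """Bag-of-words vector: a crude stand-in for an embedding model."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Returns the cached response of the most similar prompt above a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((to_vector(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = to_vector(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache(threshold=0.75)
cache.put("What is the capital of France?", "Paris")
print(cache.get("Tell me the capital city of France"))  # Paris (similarity ~0.82)
print(cache.get("What is the capital of Germany?"))      # None (similarity 0.5)
```

The Germany prompt differs from the cached one by a single word. That kind of near-miss is worth adding to a live test as a negative case. The positive-case test against the gateway: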
import time
from portkey_ai import Portkey
def test_semantic_cache_hit():
    """Verify that semantically similar queries return cached responses."""
    cache_config = {
        "cache": {
            "mode": "semantic",
            "max_age": 3600,  # 1 hour TTL, in seconds
        },
        "targets": [{"virtual_key": "openai-vk"}],
    }
    client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=cache_config)
    # First request populates the cache. If the test already ran within
    # max_age, this call may itself be a cache hit and the timing check can fail.
    t1 = time.perf_counter()
    r1 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )
    t1_duration = time.perf_counter() - t1
    # Semantically similar query: should hit the cache
    t2 = time.perf_counter()
    r2 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me the capital city of France"}],
    )
    t2_duration = time.perf_counter() - t2
    # A cache hit should be significantly faster
    assert t2_duration < t1_duration * 0.5, f"Cache miss: {t2_duration:.2f}s vs {t1_duration:.2f}s"
    # A cache hit returns the stored content verbatim
    assert r1.choices[0].message.content == r2.choices[0].message.content
TypeScript Example
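import Portkey from "portkey-ai";
const client = new Portkey({
  apiKey: process.env.PORTKEY_API_KEY!,
  config: {
    strategy: { mode: "fallback" },
    targets: [
      // Config keys follow the gateway's snake_case config schema
      { virtual_key: process.env.OPENAI_VIRTUAL_KEY! },
      {
        virtual_key: process.env.ANTHROPIC_VIRTUAL_KEY!,
        // Requests name an OpenAI model, so override it for Anthropic
        override_params: { model: "claude-3-5-sonnet-20241022" },
      },
    ],
  },
});
// Smoke test: the fallback chain returns a completion, and response.model
// shows which target served it. To force the fallback, point the first
// target at a virtual key that always fails.
async function testFallback(): Promise<void> {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
    max_tokens: 16,
  });
  if (!response.choices[0]?.message?.content) {
    throw new Error("Fallback test failed: empty completion");
  }
  console.log("Fallback test passed, served by:", response.model);
}
testFallback().catch((err) => {
  console.error(err);
  process.exit(1);
});
Testing Guardrails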
Portkey guardrails can block toxic inputs or filter outputs, and both directions need tests. The input direction is the easier one to trigger on purpose.
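A guardrail test is only as good as its fixtures. If a prompt labeled "should be blocked" does not actually contain the PII type the guardrail targets, a failing test says nothing about the gateway.

This is a minimal sketch that assumes dashed US Social Security numbers (ddd-dd-dddd) are the PII type under test. `SSN_PATTERN`, `PII_FIXTURES`, and `mislabeled_fixtures` are hypothetical helpers. The regex is deliberately narrow and does not model Portkey's detector.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately narrow pattern: US SSNs written as ddd-dd-dddd.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Labeled fixtures: (prompt, should_be_blocked)
PII_FIXTURES: List[Tuple[str, bool]] = [
    ("My SSN is 123-45-6789, can you help me?", True),
    ("Ticket 123-45-678 is still open.", False),  # looks similar, but not an SSN
    ("What is the capital of France?", False),
]


def contains_ssn(text: str) -> bool:
    return SSN_PATTERN.search(text) is not None


def mislabeled_fixtures(fixtures: List[Tuple[str, bool]]) -> List[Tuple[str, bool]]:
    """Fixtures whose label disagrees with the local SSN check."""
    return [(prompt, label) for prompt, label in fixtures if contains_ssn(prompt) != label]
```

Run `mislabeled_fixtures(PII_FIXTURES)` in the suite and expect an empty list. A failing gateway test then points at the guardrail, not the data. The negative fixtures also make good controls, since they should pass through unblocked. The gateway test sends a positive fixture and expects a rejection: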
import pytest
from portkey_ai import Portkey
def test_input_guardrail_blocks_pii():
    """Verify that PII in prompts is blocked before reaching the model."""
    guardrail_client = Portkey(
        api_key="YOUR_PORTKEY_API_KEY",
        virtual_key="guardrail-enabled-key",  # Configured with PII detection
    )
    with pytest.raises(Exception) as exc_info:
        guardrail_client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "My SSN is 123-45-6789, can you help me?",
            }],
        )
    message = str(exc_info.value).lower()
    assert "guardrail" in message or "blocked" in message
CI Integration
# .github/workflows/gateway-tests.yml
name: Portkey Gateway Tests
on: [pull_request]
jobs:
  gateway-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install portkey-ai pytest
      - name: Run gateway configuration tests
        env:
          PORTKEY_API_KEY: ${{ secrets.PORTKEY_API_KEY }}
          OPENAI_VIRTUAL_KEY: ${{ secrets.PORTKEY_OPENAI_VK }}
          ANTHROPIC_VIRTUAL_KEY: ${{ secrets.PORTKEY_ANTHROPIC_VK }}
        run: pytest tests/test_gateway.py -v
Beyond Gateway Testing
Gateway tests verify your routing configuration. They don't verify that your application behaves correctly from the user's perspective. A fallback that routes to Anthropic might still produce wrong answers for your specific prompts. Combine gateway tests with end-to-end application testing using HelpMeTest to cover both layers: the infrastructure and the user experience.
Summary
Testing your Portkey configuration before production is straightforward but requires deliberate setup. Create test virtual keys that fail on purpose, and write tests that verify the fallback fires for the status codes you configured. Validate load balancing distribution with statistical checks, and confirm that rate limits actually block requests. Add these to CI and your gateway configuration becomes as reliable as your application code.