Dark Launch Testing: Combining Feature Flags with Shadow Mode
One of the safest ways to deploy new code is to run real production traffic through it before any user sees its responses. Dark launch testing achieves this by combining two techniques: feature flags control which users are opted into new behavior, and shadow mode runs new code against live traffic while serving responses only from the current implementation. Used together, they give you a progressive deployment strategy that reduces the risk of large releases, because regressions surface in shadow comparisons before any user is exposed to them.
Dark Launch vs Feature Flags vs Shadow Mode
These terms are often confused, so let us define them precisely:
Feature flags (feature toggles) control which users see which behavior. With a flag, you can enable a new feature for 5% of users, or for specific beta testers, or for users in a specific region. If a problem emerges, you disable the flag. Users are split: some see the new behavior, some see the old.
Shadow mode runs new code on every request but discards the new code's responses. No user sees the new behavior. The new code is exercised with real traffic, but only for observation and comparison. It is invisible to users.
Dark launch is the combination: new code is deployed and running, shadow mode verifies it works correctly, then feature flags progressively shift real users onto the new behavior as confidence grows.
The progression is:
- Deploy new code in shadow mode (0% real users, 100% shadow traffic)
- Verify shadow mode shows no regressions (error rate matches, responses match, performance is acceptable)
- Enable flag for 1% of real users, keep shadow for the remaining 99%
- Monitor error rates, performance, user experience metrics
- Ramp to 5%, 10%, 50%, 100% as confidence grows
- Turn off shadow mode once 100% is reached and stable
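The split described in these steps can be sketched as a deterministic bucketing function. This is a minimal illustration and not how LaunchDarkly computes rollouts internally: the `bucket_for` and `route` names and the SHA-256 scheme are assumptions made for the example.

```python
import hashlib

def bucket_for(user_key: str, salt: str = "new-checkout-flow") -> int:
    """Map a user key to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100

def route(user_key: str, rollout_percent: int, shadow_enabled: bool = True):
    """Return (serving_version, run_shadow) for one request."""
    if bucket_for(user_key) < rollout_percent:
        # User is in the live rollout: serve v2, nothing left to shadow
        return ("v2", False)
    # Everyone else is served by v1, with v2 optionally run in shadow
    return ("v1", shadow_enabled)
```

Because a user's bucket never changes, a user who is served v2 at 5% stays on v2 at 10% and beyond, which keeps the experience stable as the rollout ramps.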
LaunchDarkly Integration with Shadow Mode
LaunchDarkly is the most widely used feature flag service. Here is how to combine it with traffic mirroring.
Step 1: Create the shadow flag
import ldclient
from ldclient.config import Config

ldclient.set_config(Config("your-sdk-key"))
client = ldclient.get()

# Check if request should use new implementation
def should_use_new_implementation(user_context):
    return client.variation(
        "new-checkout-flow",
        user_context,
        default=False
    )

# Check if request should be shadowed (separate from real rollout)
def should_shadow(user_context):
    return client.variation(
        "shadow-new-checkout-flow",
        user_context,
        default=False
    )

Step 2: Application middleware with shadow execution
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# The event loop keeps only weak references to tasks, so hold strong
# references here to stop a shadow task being garbage-collected mid-run.
_shadow_tasks = set()

@dataclass
class CheckoutResult:
    order_id: str
    total: float
    items: list

async def process_checkout_v1(cart, user):
    """Current implementation"""
    # existing code
    return CheckoutResult(order_id="...", total=cart.total, items=cart.items)

async def process_checkout_v2(cart, user):
    """New implementation under dark launch"""
    # new code
    return CheckoutResult(order_id="...", total=cart.total, items=cart.items)

async def checkout_with_dark_launch(cart, user, ld_context):
    use_new = should_use_new_implementation(ld_context)
    shadow = should_shadow(ld_context)
    if use_new:
        # Real users on new implementation
        return await process_checkout_v2(cart, user)
    # All other users: run v1 for real, optionally shadow v2
    result_v1 = await process_checkout_v1(cart, user)
    if shadow:
        # Fire and forget: run v2 in background, compare, don't affect user
        task = asyncio.create_task(
            shadow_execute(cart, user, result_v1, ld_context)
        )
        _shadow_tasks.add(task)
        task.add_done_callback(_shadow_tasks.discard)
    return result_v1

async def shadow_execute(cart, user, v1_result, ld_context):
    try:
        v2_result = await process_checkout_v2(cart, user)
        compare_results(v1_result, v2_result, ld_context)
    except Exception as e:
        # A shadow failure must never reach the user; log it and move on
        logger.warning("Shadow execution failed: %s", e, extra={
            'user_key': ld_context.get('key'),
            'feature': 'new-checkout-flow'
        })

Step 3: Comparing results and logging divergence
# `logger` is the module logger from Step 2. `metrics` stands for whatever
# metrics client your service already uses; it is not defined in this article.
def compare_results(v1: CheckoutResult, v2: CheckoutResult, context: dict):
    divergences = []
    if abs(v1.total - v2.total) > 0.01:  # float tolerance
        divergences.append({
            'field': 'total',
            'v1': v1.total,
            'v2': v2.total,
            'diff': v2.total - v1.total
        })
    # Compare item sets by ID so ordering differences do not count
    if set(i.id for i in v1.items) != set(i.id for i in v2.items):
        divergences.append({
            'field': 'items',
            'v1_count': len(v1.items),
            'v2_count': len(v2.items)
        })
    if divergences:
        # Log to your observability platform
        logger.warning("Shadow divergence detected", extra={
            'feature': 'new-checkout-flow',
            'divergences': divergences,
            'user_key': context.get('key')
        })
        # Emit metric for dashboarding (tagged with the first divergent field)
        metrics.increment('shadow.divergence', tags={
            'feature': 'new-checkout-flow',
            'field': divergences[0]['field']
        })
    else:
        metrics.increment('shadow.match', tags={
            'feature': 'new-checkout-flow'
        })

Infrastructure-Level Shadow with Feature Flag Coordination
For services where you cannot easily add shadow logic to the application, combine NGINX mirroring with LaunchDarkly at the infrastructure level.
The pattern: use LaunchDarkly's server-side SDK in your proxy layer to decide whether to mirror:
# nginx.conf with Lua for dynamic mirroring based on feature flags
# Using OpenResty + lua-resty-launchdarkly
# (module name and client API as given in this article; check them against
# the LaunchDarkly Lua SDK you actually install)

# http context: create the client once per worker, not once per request
init_worker_by_lua_block {
    local ldclient = require "resty.launchdarkly"
    -- Global on purpose so request handlers in this worker can reuse it
    ld = ldclient.init("your-sdk-key")
}

server {
    location / {
        content_by_lua_block {
            local user_key = ngx.req.get_headers()["X-User-ID"] or "anonymous"
            local ctx = {key = user_key}
            local should_shadow = ld:variation("shadow-v2-service", ctx, false)
            if should_shadow then
                ngx.exec("@with_mirror")
            else
                ngx.exec("@without_mirror")
            end
        }
    }

    location @with_mirror {
        mirror /shadow;
        mirror_request_body on;
        proxy_pass http://production;
    }

    location @without_mirror {
        proxy_pass http://production;
    }

    location = /shadow {
        internal;
        # The candidate's response is ignored; only production answers the user
        proxy_pass http://candidate$request_uri;
        proxy_read_timeout 200ms;
    }
}

This approach mirrors traffic to the candidate service only for requests where the shadow flag evaluates to true. You can start with the shadow flag at 100%, then begin the real rollout to users once shadow results are clean.
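Because NGINX ignores the response to a mirrored subrequest, the comparison has to happen out of band. One option, sketched here under stated assumptions, is to have both services log a digest of each response keyed by a shared request ID and join the two logs afterwards. The record shape (`request_id`, `status`, `body_sha256`) and the `summarize_shadow_logs` name are hypothetical, and propagating the ID (for example in a header populated from NGINX's `$request_id` variable) is left to your configuration.

```python
from collections import Counter

def summarize_shadow_logs(production, candidate):
    """Join per-request log records from both services by request_id.

    Each record is a dict with 'request_id', 'status' and 'body_sha256'.
    Returns a Counter with 'match', 'diverge' and 'missing' counts.
    """
    candidate_by_id = {rec["request_id"]: rec for rec in candidate}
    summary = Counter()
    for prod in production:
        shadow = candidate_by_id.get(prod["request_id"])
        if shadow is None:
            # No candidate record: mirror timed out, was dropped, or was off
            summary["missing"] += 1
        elif (prod["status"], prod["body_sha256"]) == (
            shadow["status"], shadow["body_sha256"]
        ):
            summary["match"] += 1
        else:
            summary["diverge"] += 1
    return summary
```

Tracking `missing` separately matters: a candidate that times out on many mirrored requests can otherwise look healthy simply because it produced nothing to compare.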
Measuring Divergence for Rollout Decisions
The key decision point in a dark launch is: when have you seen enough shadow traffic to be confident? You need quantitative criteria.
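One way to make "enough traffic" concrete is a standard binomial bound: if n independent shadow requests show zero divergences, the one-sided 95% upper confidence bound on the true divergence rate is 1 - 0.05^(1/n), roughly 3/n (the "rule of three"). The sketch below assumes independent requests, which real traffic only approximates, and the function names are illustrative.

```python
import math

def divergence_upper_bound(clean_samples: int, confidence: float = 0.95) -> float:
    """Upper bound on the divergence rate after observing zero divergences."""
    if clean_samples <= 0:
        return 1.0
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_samples)

def samples_needed(max_rate: float, confidence: float = 0.95) -> int:
    """Clean samples required before the bound drops to max_rate or below."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_rate))
```

Under these assumptions, 1,000 clean samples bound the divergence rate only at about 0.3%; showing that it is below 0.1% takes roughly 3,000.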
Set up a Prometheus counter for shadow results:
from prometheus_client import Counter, Histogram

# Counts every compared shadow response, labeled by outcome
shadow_responses = Counter(
    'shadow_responses_total',
    'Total shadow responses',
    ['feature', 'result']  # result: 'match' or 'diverge'
)

shadow_latency = Histogram(
    'shadow_latency_ms',
    'Shadow execution latency',
    ['feature'],
    buckets=[10, 50, 100, 250, 500, 1000, 2500]
)

Your rollout decision rules:
# get_metric_count and get_p99_latency are placeholder helpers that query
# your metrics backend; they are not defined in this article.
def is_safe_to_ramp_up(feature: str, window_minutes: int = 60) -> bool:
    """
    Returns True if it's safe to increase the rollout percentage.
    """
    match_count = get_metric_count(
        f'shadow_responses_total{{feature="{feature}",result="match"}}',
        window_minutes
    )
    diverge_count = get_metric_count(
        f'shadow_responses_total{{feature="{feature}",result="diverge"}}',
        window_minutes
    )
    if match_count + diverge_count < 1000:
        return False  # Not enough data
    diverge_rate = diverge_count / (match_count + diverge_count)
    p99_latency = get_p99_latency(f'shadow_latency_ms{{feature="{feature}"}}')
    return (
        diverge_rate < 0.001 and  # Less than 0.1% divergence
        p99_latency < 500         # P99 latency under 500ms
    )

Automated Progressive Rollout
With these metrics in place, you can automate the rollout progression:
import asyncio

# `launchdarkly` stands for a thin wrapper you write around the LaunchDarkly
# REST API (the server-side SDK only evaluates flags), and `alert_on_call`
# for your paging integration. Neither is defined in this article.
ROLLOUT_STAGES = [0, 1, 5, 10, 25, 50, 75, 100]
STAGE_WAIT_MINUTES = 30

async def progressive_rollout(feature: str, flag_key: str):
    """
    Automatically progress through rollout stages based on
    shadow mode health signals.
    """
    for stage in ROLLOUT_STAGES[1:]:  # Skip 0%, start from 1%
        print(f"Setting {flag_key} rollout to {stage}%")
        await launchdarkly.set_rollout_percentage(flag_key, stage)
        if stage == 100:
            print("Rollout complete. Disabling shadow mode.")
            await launchdarkly.disable_flag(f"shadow-{flag_key}")
            break
        # Wait for data and check health
        print(f"Waiting {STAGE_WAIT_MINUTES} minutes for {stage}% rollout to stabilize...")
        await asyncio.sleep(STAGE_WAIT_MINUTES * 60)
        # Match the metric window to the wait so only this stage's data counts
        if not is_safe_to_ramp_up(feature, window_minutes=STAGE_WAIT_MINUTES):
            print(f"Health check failed at {stage}%. Rolling back.")
            await launchdarkly.set_rollout_percentage(flag_key, 0)
            await alert_on_call(f"Dark launch failed at {stage}% for {feature}")
            break
        print(f"{stage}% rollout healthy. Proceeding to next stage.")

Dark Launch Anti-Patterns
Anti-pattern 1: Skipping shadow entirely
Some teams use feature flags but jump straight to real user rollout without shadow mode. This means real users are your canaries. Shadow mode lets you validate with real traffic before anyone is affected.
Anti-pattern 2: Shadow mode with side effects
If the shadow executes writes (sends emails, charges cards, creates records), the dark launch causes real-world effects. Always ensure shadow code paths have no observable side effects. Use shadow detection flags:
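import logging
import os

logger = logging.getLogger(__name__)

# Process-wide switch: this suits a candidate deployment that receives only
# mirrored traffic. In-process shadowing needs a per-request signal instead.
IS_SHADOW = os.environ.get('SHADOW_MODE') == 'true'

async def send_confirmation_email(user, order):
    if IS_SHADOW:
        logger.debug(f"Shadow: would send email to {user.email}")
        return
    await email_service.send(user.email, ...)

Anti-pattern 3: Ignoring divergence noise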
Not all divergences are bugs. New fields, timestamp differences, and reordered arrays create noise. Establish your noise baseline (divergence rate between two instances of the current code) before interpreting shadow divergence rates.
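A small normalization pass before comparison removes much of this noise. The sketch below is one possible approach, with hypothetical names (`normalize`, `responses_match`, `VOLATILE_FIELDS`) and an assumed JSON-like response shape: it drops fields known to vary between runs and sorts lists so that ordering differences do not count as divergence.

```python
import json

# Assumed field names; adjust to whatever varies in your payloads
VOLATILE_FIELDS = frozenset({"timestamp", "request_id"})

def normalize(value, ignore=VOLATILE_FIELDS):
    """Recursively drop volatile keys and sort lists into a canonical order."""
    if isinstance(value, dict):
        return {k: normalize(v, ignore) for k, v in value.items() if k not in ignore}
    if isinstance(value, list):
        items = [normalize(v, ignore) for v in value]
        # Sort by JSON text so lists of dicts can be ordered consistently
        return sorted(items, key=lambda v: json.dumps(v, sort_keys=True))
    return value

def responses_match(v1, v2, ignore=VOLATILE_FIELDS):
    """Compare two responses after normalization."""
    return normalize(v1, ignore) == normalize(v2, ignore)
```

Sorting every list treats order as insignificant, which is only appropriate where ordering is not part of the response contract. For endpoints that return ranked results, skip the sort for those fields.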
Anti-pattern 4: Rolling back based on percent thresholds without context
A 2% divergence rate might be acceptable if those 2% are the new beta_features field you intentionally added. Understand your divergences before treating them as blockers.
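One way to encode that context is to classify each divergence against a list of fields you expect to differ, and gate the rollout only on the rest. The sketch assumes divergence records shaped like the dicts built in `compare_results` (each with a `field` key); `EXPECTED_FIELDS`, `split_divergences`, and `blocking_rate` are illustrative names.

```python
# Fields the new code changes on purpose (assumed for this example)
EXPECTED_FIELDS = frozenset({"beta_features"})

def split_divergences(divergences, expected=EXPECTED_FIELDS):
    """Partition divergence records into (expected, unexpected) lists."""
    known, unknown = [], []
    for d in divergences:
        (known if d["field"] in expected else unknown).append(d)
    return known, unknown

def blocking_rate(per_request_divergences, expected=EXPECTED_FIELDS):
    """Fraction of requests with at least one unexpected divergence."""
    if not per_request_divergences:
        return 0.0
    blocked = sum(
        1 for divs in per_request_divergences
        if split_divergences(divs, expected)[1]
    )
    return blocked / len(per_request_divergences)
```

Feeding `blocking_rate` into the rollout check instead of the raw divergence rate keeps intentional changes from halting a ramp, while any unexplained field still does.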
Dark launch testing, done with the combination of feature flags and shadow mode, is the closest thing to zero-risk deployment. You test with real traffic, validate behavior before users see it, and have a kill switch the entire way. It requires more infrastructure than a simple deploy, but for significant changes to critical code paths, it is the professional standard.