Dark Launch Testing: Combining Feature Flags with Shadow Mode

HelpMeTest

20 May 2026

The safest way to deploy new code is to run real production traffic through it before any user sees its responses. Dark launch testing does this by combining two techniques: feature flags control which users are served the new behavior, and shadow mode runs the new code against live traffic while responses are still served only from the current implementation. Used together, they form a progressive deployment strategy in which problems surface in the shadow path, or in a small cohort, before they can reach most users.

Dark Launch vs Feature Flags vs Shadow Mode

These terms are often confused, so let us define them precisely:

Feature flags (feature toggles) control which users see which behavior. With a flag, you can enable a new feature for 5% of users, or for specific beta testers, or for users in a specific region. If a problem emerges, you disable the flag. Users are split: some see the new behavior, some see the old.

Shadow mode runs new code on every request but discards the new code's responses. No user sees the new behavior. The new code is exercised with real traffic, but only for observation and comparison. It is invisible to users.

Dark launch is the combination: new code is deployed and running, shadow mode verifies it works correctly, then feature flags progressively shift real users onto the new behavior as confidence grows.

The progression is:

1. Deploy new code in shadow mode (0% real users, 100% shadow traffic)
2. Verify shadow mode shows no regressions (error rate matches, responses match, performance is acceptable)
3. Enable flag for 1% of real users, keep shadow for the remaining 99%
4. Monitor error rates, performance, user experience metrics
5. Ramp to 5%, 10%, 50%, 100% as confidence grows
6. Turn off shadow mode once 100% is reached and stable
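The ramp in the steps above only works if each user lands consistently on the same side of the flag. As a minimal sketch of how that can be achieved, the code below hashes the flag key and user key into a stable bucket from 0 to 99. The function names are hypothetical, and this is an illustration of the idea rather than LaunchDarkly's actual bucketing algorithm.

```python
import hashlib


def rollout_bucket(flag_key: str, user_key: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_key: str, user_key: str, percentage: int) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return rollout_bucket(flag_key, user_key) < percentage


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    enabled = sum(is_enabled("new-checkout-flow", u, 5) for u in users)
    print(f"{enabled / len(users):.1%} of users see the new behavior")
```

Because ramping only raises the threshold, a user who was enabled at 5% stays enabled at 10%, so nobody flips back and forth between implementations during the rollout.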

LaunchDarkly Integration with Shadow Mode

LaunchDarkly is one of the most widely used feature flag services. Here is how to combine it with shadow execution inside the application.

Step 1: Create the shadow flag

import ldclient
from ldclient.config import Config

ldclient.set_config(Config("your-sdk-key"))
client = ldclient.get()

# Check if request should use new implementation
def should_use_new_implementation(user_context):
    return client.variation(
        "new-checkout-flow",
        user_context,
        default=False
    )

# Check if request should be shadowed (separate from real rollout)
def should_shadow(user_context):
    return client.variation(
        "shadow-new-checkout-flow",
        user_context,
        default=False
    )

Step 2: Application middleware with shadow execution

import asyncio
import logging
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger(__name__)

@dataclass
class CheckoutResult:
    order_id: str
    total: float
    items: list

async def process_checkout_v1(cart, user):
    """Current implementation"""
    # existing code
    return CheckoutResult(order_id="...", total=cart.total, items=cart.items)

async def process_checkout_v2(cart, user):
    """New implementation under dark launch"""
    # new code
    return CheckoutResult(order_id="...", total=cart.total, items=cart.items)

# Strong references to in-flight shadow tasks. The event loop keeps only
# weak references, so an unreferenced task can be garbage-collected mid-run.
_shadow_tasks: set = set()

async def checkout_with_dark_launch(cart, user, ld_context):
    use_new = should_use_new_implementation(ld_context)
    shadow = should_shadow(ld_context)

    if use_new:
        # Real users on new implementation
        return await process_checkout_v2(cart, user)

    # All other users: run v1 for real, optionally shadow v2
    result_v1 = await process_checkout_v1(cart, user)

    if shadow:
        # Fire and forget: run v2 in background, compare, don't affect user
        task = asyncio.create_task(
            shadow_execute(cart, user, result_v1, ld_context)
        )
        _shadow_tasks.add(task)
        task.add_done_callback(_shadow_tasks.discard)

    return result_v1

async def shadow_execute(cart, user, v1_result, ld_context):
    try:
        v2_result = await process_checkout_v2(cart, user)
        compare_results(v1_result, v2_result, ld_context)
    except Exception as e:
        logger.warning(f"Shadow execution failed: {e}", extra={
            'user_key': ld_context.get('key'),
            'feature': 'new-checkout-flow'
        })
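One refinement worth considering is a time budget on the shadow path, so that a slow or hung v2 cannot pile up background tasks. The sketch below is a minimal, self-contained illustration built on asyncio.wait_for. The helper run_shadow_with_timeout, the stand-in coroutines, and the 0.05-second budget are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Tuple


async def run_shadow_with_timeout(
    shadow_call: Callable[[], Awaitable[Any]],
    timeout_seconds: float,
) -> Tuple[str, Optional[Any]]:
    """Run the shadow call and report ("ok", result), ("timeout", None),
    or ("error", None). Exceptions never propagate to the caller."""
    try:
        result = await asyncio.wait_for(shadow_call(), timeout=timeout_seconds)
        return ("ok", result)
    except asyncio.TimeoutError:
        return ("timeout", None)
    except Exception:
        return ("error", None)


# Stand-ins for a real v2 implementation (illustrative only)
async def fast_v2() -> dict:
    return {"total": 10.0}


async def slow_v2() -> dict:
    await asyncio.sleep(1.0)
    return {"total": 10.0}


async def failing_v2() -> dict:
    raise RuntimeError("bug in v2")


if __name__ == "__main__":
    for call in (fast_v2, slow_v2, failing_v2):
        status, _ = asyncio.run(run_shadow_with_timeout(call, 0.05))
        print(call.__name__, status)
```

Returning a status string rather than silently swallowing failures makes it easy to count timeouts and errors as separate shadow metrics.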

Step 3: Comparing results and logging divergence

def compare_results(v1: CheckoutResult, v2: CheckoutResult, context: dict):
    divergences = []
    
    if abs(v1.total - v2.total) > 0.01:  # float tolerance
        divergences.append({
            'field': 'total',
            'v1': v1.total,
            'v2': v2.total,
            'diff': v2.total - v1.total
        })
    
    if set(i.id for i in v1.items) != set(i.id for i in v2.items):
        divergences.append({
            'field': 'items',
            'v1_count': len(v1.items),
            'v2_count': len(v2.items)
        })
    
    if divergences:
        # Log to your observability platform
        logger.warning("Shadow divergence detected", extra={
            'feature': 'new-checkout-flow',
            'divergences': divergences,
            'user_key': context.get('key')
        })
        # Emit metric for dashboarding (`metrics` is assumed to be your
        # metrics client, configured elsewhere in the application)
        metrics.increment('shadow.divergence', tags={
            'feature': 'new-checkout-flow',
            'field': divergences[0]['field']
        })
    else:
        metrics.increment('shadow.match', tags={
            'feature': 'new-checkout-flow'
        })

Infrastructure-Level Shadow with Feature Flag Coordination

For services where you cannot easily add shadow logic to the application, combine NGINX mirroring with LaunchDarkly at the infrastructure level.

The pattern: use LaunchDarkly's server-side SDK in your proxy layer to decide whether to mirror:

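# nginx.conf (OpenResty) with Lua for dynamic mirroring based on feature flags.
# The "resty.launchdarkly" module and its init/variation calls are illustrative;
# substitute the LaunchDarkly Lua binding you actually deploy.

# http context: initialize the client once per worker, not per request
init_worker_by_lua_block {
    local ldclient = require "resty.launchdarkly"
    -- Stored on the cached module table so request handlers can reach it
    ldclient.client = ldclient.init("your-sdk-key")
}

server {
    location / {
        content_by_lua_block {
            local ld = require("resty.launchdarkly").client
            local user_key = ngx.req.get_headers()["X-User-ID"] or "anonymous"
            local ctx = {key = user_key}

            local should_shadow = ld:variation("shadow-v2-service", ctx, false)

            if should_shadow then
                ngx.exec("@with_mirror")
            else
                ngx.exec("@without_mirror")
            end
        }
    }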
    
    location @with_mirror {
        mirror /shadow;
        mirror_request_body on;
        proxy_pass http://production;
    }
    
    location @without_mirror {
        proxy_pass http://production;
    }
    
    location = /shadow {
        internal;
        proxy_pass http://candidate$request_uri;
        proxy_read_timeout 200ms;
    }
}

This approach mirrors traffic to the candidate service only for users whose shadow flag is enabled. NGINX ignores the responses to mirrored subrequests, so users only ever receive the production response. You can start with the shadow flag at 100%, then, once shadow results are clean, begin the real rollout to users.

Measuring Divergence for Rollout Decisions

The key decision point in a dark launch is: when have you seen enough shadow traffic to be confident? You need quantitative criteria.

Set up a Prometheus counter for shadow results and a histogram for shadow latency:

from prometheus_client import Counter, Histogram

shadow_matches = Counter(
    'shadow_responses_total',
    'Total shadow responses',
    ['feature', 'result']  # result: 'match' or 'diverge'
)

shadow_latency = Histogram(
    'shadow_latency_ms',
    'Shadow execution latency',
    ['feature'],
    buckets=[10, 50, 100, 250, 500, 1000, 2500]
)

Your rollout decision rules:

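def is_safe_to_ramp_up(feature: str, window_minutes: int = 60) -> bool:
    """
    Returns True if it's safe to increase the rollout percentage.

    get_metric_count and get_p99_latency are placeholders for queries
    against your metrics backend (for example, the Prometheus HTTP API).
    """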
    match_count = get_metric_count(
        f'shadow_responses_total{{feature="{feature}",result="match"}}',
        window_minutes
    )
    diverge_count = get_metric_count(
        f'shadow_responses_total{{feature="{feature}",result="diverge"}}',
        window_minutes
    )
    
    if match_count + diverge_count < 1000:
        return False  # Not enough data
    
    diverge_rate = diverge_count / (match_count + diverge_count)
    p99_latency = get_p99_latency(f'shadow_latency_ms{{feature="{feature}"}}')
    
    return (
        diverge_rate < 0.001 and  # Less than 0.1% divergence
        p99_latency < 500  # P99 latency under 500ms
    )
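A caveat on sample size: when you observe zero divergences in n samples, the 95% upper confidence bound on the true divergence rate is roughly 3/n (the "rule of three"). So 1,000 clean samples bound the rate below about 0.3%, not 0.1%. The sketch below, with hypothetical function names and the same thresholds as the rule above, computes how many clean samples a given threshold needs and returns a three-way verdict that separates "not enough data" from "unhealthy".

```python
import math


def min_clean_samples(max_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that zero divergences in n samples bounds the true
    rate below max_rate: solves (1 - max_rate) ** n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_rate))


def ramp_decision(
    match_count: int,
    diverge_count: int,
    p99_latency_ms: float,
    *,
    min_samples: int = 1000,
    max_diverge_rate: float = 0.001,
    max_p99_ms: float = 500.0,
) -> str:
    """Return "hold" (too little data), "rollback" (unhealthy), or "ramp"."""
    total = match_count + diverge_count
    if total < min_samples:
        return "hold"
    if diverge_count / total >= max_diverge_rate or p99_latency_ms >= max_p99_ms:
        return "rollback"
    return "ramp"


if __name__ == "__main__":
    print(min_clean_samples(0.001))
    print(ramp_decision(match_count=5000, diverge_count=0, p99_latency_ms=120.0))
```

Treating low volume as "hold" rather than as a failure avoids rolling back a healthy feature simply because a quiet hour produced too few shadow comparisons.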

Automated Progressive Rollout

With these metrics in place, you can automate the rollout progression:

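import asyncio

# `launchdarkly` (a thin wrapper around your flag-management API) and
# `alert_on_call` are placeholders for your own integrations.
ROLLOUT_STAGES = [0, 1, 5, 10, 25, 50, 75, 100]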

async def progressive_rollout(feature: str, flag_key: str):
    """
    Automatically progress through rollout stages based on
    shadow mode health signals.
    """
    
    for stage in ROLLOUT_STAGES[1:]:  # Skip 0%, start from 1%
        print(f"Setting {flag_key} rollout to {stage}%")
        await launchdarkly.set_rollout_percentage(flag_key, stage)
        
        if stage == 100:
            print("Rollout complete. Disabling shadow mode.")
            await launchdarkly.disable_flag(f"shadow-{flag_key}")
            break
        
        # Wait for data and check health
        print(f"Waiting 30 minutes for {stage}% rollout to stabilize...")
        await asyncio.sleep(30 * 60)
        
        if not is_safe_to_ramp_up(feature):
            print(f"Health check failed at {stage}%. Rolling back.")
            await launchdarkly.set_rollout_percentage(flag_key, 0)
            await alert_on_call(f"Dark launch failed at {stage}% for {feature}")
            break
        
        print(f"{stage}% rollout healthy. Proceeding to next stage.")
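To exercise this kind of loop without touching a real flag service or waiting 30 minutes per stage, the dependencies can be injected. The following is a minimal sketch under that assumption: run_rollout, its parameters, and the convention that the health check returns "ramp", "hold", or "rollback" are all hypothetical, not part of any LaunchDarkly API.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_rollout(
    stages: List[int],
    set_percentage: Callable[[int], Awaitable[None]],
    check_health: Callable[[], str],
    wait: Callable[[], Awaitable[None]],
    max_holds: int = 3,
) -> int:
    """Walk through the stages and return the final rollout percentage.

    check_health returns "ramp", "hold", or "rollback". A stage is rolled
    back to 0% on "rollback" or after more than max_holds consecutive holds.
    """
    for stage in stages:
        await set_percentage(stage)
        if stage == stages[-1]:
            return stage  # final stage reached, nothing left to ramp to
        holds = 0
        while True:
            await wait()
            verdict = check_health()
            if verdict == "ramp":
                break
            if verdict == "hold" and holds < max_holds:
                holds += 1
                continue
            await set_percentage(0)
            return 0
    return 0


if __name__ == "__main__":
    applied: List[int] = []

    async def fake_set(percentage: int) -> None:
        applied.append(percentage)

    async def no_wait() -> None:
        return None

    final = asyncio.run(run_rollout([1, 5, 10, 100], fake_set, lambda: "ramp", no_wait))
    print(final, applied)
```

In production, set_percentage would call your flag-management wrapper and wait would sleep for the stabilization window. In tests, both can be replaced with fakes as shown.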

Dark Launch Anti-Patterns

Anti-pattern 1: Skipping shadow entirely

Some teams use feature flags but jump straight to real user rollout without shadow mode. This means real users are your canaries. Shadow mode lets you validate with real traffic before anyone is affected.

Anti-pattern 2: Shadow mode with side effects

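If the shadow path executes writes (sends emails, charges cards, creates records), the dark launch causes real-world effects. Always ensure shadow code paths have no externally observable side effects. When the candidate runs as a separate deployment, as in the mirroring setup, a process-wide shadow flag can guard those effects: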

import os
IS_SHADOW = os.environ.get('SHADOW_MODE') == 'true'

async def send_confirmation_email(user, order):
    if IS_SHADOW:
        logger.debug(f"Shadow: would send email to {user.email}")
        return
    await email_service.send(user.email, ...)
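An environment variable applies to the whole process, so it cannot tell real and shadow calls apart when both run in the same application, as in the in-process shadow execution shown earlier. One possible approach for that case is a per-task marker built on contextvars. The sketch below is self-contained, and the names IN_SHADOW, sent_emails, and the simplified checkout functions are hypothetical stand-ins.

```python
import asyncio
import contextvars

# Per-task marker: False on the real path, True inside a shadow task
IN_SHADOW = contextvars.ContextVar("in_shadow", default=False)

sent_emails = []  # stand-in for a real email service


async def send_confirmation_email(address: str) -> None:
    if IN_SHADOW.get():
        return  # shadow path: skip the side effect
    sent_emails.append(address)


async def process_checkout(address: str) -> str:
    await send_confirmation_email(address)
    return "order-created"


async def shadow_execute(address: str) -> None:
    # create_task runs this coroutine in a copy of the caller's context,
    # so setting the variable here does not leak back to the real path.
    IN_SHADOW.set(True)
    await process_checkout(address)


async def handle_request(address: str) -> str:
    result = await process_checkout(address)  # real path, sends the email
    shadow_task = asyncio.create_task(shadow_execute(address))
    await shadow_task  # awaited only to keep this demo deterministic
    return result
```

Running asyncio.run(handle_request("a@example.com")) records one email, from the real path only, even though the checkout logic ran twice.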

Anti-pattern 3: Ignoring divergence noise

Not all divergences are bugs. New fields, timestamp differences, and reordered arrays create noise. Establish your noise baseline (divergence rate between two instances of the current code) before interpreting shadow divergence rates.
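A common way to cut this noise is to normalize both responses before comparing them. The sketch below is one minimal way to do that, assuming JSON-like payloads in which list order is not significant. The function names and the generated_at field are hypothetical.

```python
from typing import Any, Iterable


def normalize(payload: Any, ignore_fields: Iterable[str] = ()) -> Any:
    """Recursively drop ignored keys and sort lists so order does not matter."""
    ignored = set(ignore_fields)
    if isinstance(payload, dict):
        return {
            key: normalize(value, ignored)
            for key, value in sorted(payload.items())
            if key not in ignored
        }
    if isinstance(payload, list):
        # Sort by repr so lists of dicts can be ordered deterministically
        return sorted((normalize(item, ignored) for item in payload), key=repr)
    return payload


def responses_match(a: Any, b: Any, ignore_fields: Iterable[str] = ()) -> bool:
    """Compare two payloads after normalization."""
    return normalize(a, ignore_fields) == normalize(b, ignore_fields)


if __name__ == "__main__":
    v1 = {"total": 10.0, "items": [{"id": 2}, {"id": 1}],
          "generated_at": "2026-05-20T10:00:00Z"}
    v2 = {"total": 10.0, "items": [{"id": 1}, {"id": 2}],
          "generated_at": "2026-05-20T10:00:01Z"}
    print(responses_match(v1, v2, ignore_fields=["generated_at"]))
```

Running the same comparison between two instances of the current code gives the noise baseline: whatever still diverges after normalization is the floor against which the shadow divergence rate should be read.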

Anti-pattern 4: Rolling back based on percent thresholds without context

A 2% divergence rate might be acceptable if those 2% are the new beta_features field you intentionally added. Understand your divergences before treating them as blockers.
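One way to encode that context is to keep an explicit list of fields that are expected to differ and compute the divergence rate from everything else. The sketch below assumes you record, for each shadowed request, the set of fields that diverged. The function name and input shape are hypothetical.

```python
from typing import Iterable, List, Set


def blocking_divergence_rate(
    diverged_fields_per_request: List[Set[str]],
    expected_fields: Iterable[str],
) -> float:
    """Fraction of requests with at least one divergence outside the
    expected fields. An empty set means the request matched."""
    if not diverged_fields_per_request:
        return 0.0
    expected = set(expected_fields)
    blocking = sum(
        1 for fields in diverged_fields_per_request if fields - expected
    )
    return blocking / len(diverged_fields_per_request)


if __name__ == "__main__":
    observations = (
        [set()] * 96
        + [{"beta_features"}] * 2
        + [{"total"}]
        + [{"beta_features", "total"}]
    )
    print(blocking_divergence_rate(observations, expected_fields=["beta_features"]))
```

In this made-up sample, four of 100 requests diverge, but only the two that differ on total count against the rollout threshold.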

Dark launch testing, combining feature flags with shadow mode, is about as close as a team can get to zero-risk deployment. You test with real traffic, validate behavior before users see it, and keep a kill switch the entire way. It requires more infrastructure than a simple deploy, but for significant changes to critical code paths, the added safety justifies the cost.
