Blue-Green Deployment Testing: Smoke Tests, Traffic Cutover & Rollback Validation

Blue-Green Deployment Testing: Smoke Tests, Traffic Cutover & Rollback Validation

Blue-green deployment reduces downtime risk by running two identical production environments — blue (live) and green (new). The switch happens at the router level, and rollback is instant: flip traffic back to blue. The testing challenge is making the cutover decision confidently: is green actually ready?

The Blue-Green Testing Problem

A blue-green cutover is irreversible in the moment — users are immediately on the new version. The window to catch issues is before the cutover, not after. Testing must answer:

  • Does the green deployment start correctly and pass health checks?
  • Does green produce the same responses as blue for critical paths?
  • Does green handle the expected load?
  • If something goes wrong, does rollback actually restore green to working state?

Health Check Validation

Before any traffic reaches green, validate it's genuinely healthy:

#!/bin/bash
<span class="hljs-comment"># validate-green.sh

GREEN_URL=<span class="hljs-string">"${1:-http://green.internal.example.com}"
MAX_RETRIES=30
RETRY_INTERVAL=5

<span class="hljs-built_in">echo <span class="hljs-string">"Validating green deployment at $GREEN_URL"

<span class="hljs-comment"># Phase 1: Basic health check with retries
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 1: Health endpoint"
<span class="hljs-keyword">for i <span class="hljs-keyword">in $(<span class="hljs-built_in">seq 1 <span class="hljs-variable">$MAX_RETRIES); <span class="hljs-keyword">do
  STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$GREEN_URL/health")
  <span class="hljs-keyword">if [ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ]; <span class="hljs-keyword">then
    <span class="hljs-built_in">echo <span class="hljs-string">"  Health check passed on attempt $i"
    <span class="hljs-built_in">break
  <span class="hljs-keyword">fi
  <span class="hljs-built_in">echo <span class="hljs-string">"  Attempt $i: got <span class="hljs-variable">$STATUS, retrying in <span class="hljs-variable">${RETRY_INTERVAL}s..."
  <span class="hljs-built_in">sleep <span class="hljs-variable">$RETRY_INTERVAL
  [ <span class="hljs-variable">$i -eq <span class="hljs-variable">$MAX_RETRIES ] && { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Health check never passed"; <span class="hljs-built_in">exit 1; }
<span class="hljs-keyword">done

<span class="hljs-comment"># Phase 2: Deep health — dependencies reachable
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 2: Dependency health"
DEEP_HEALTH=$(curl -s <span class="hljs-string">"$GREEN_URL/health/deep")
DB_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.database.status')
CACHE_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.cache.status')
QUEUE_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.queue.status')

[ <span class="hljs-string">"$DB_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Database unhealthy: $DB_STATUS"; <span class="hljs-built_in">exit 1; }
[ <span class="hljs-string">"$CACHE_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Cache unhealthy: $CACHE_STATUS"; <span class="hljs-built_in">exit 1; }
[ <span class="hljs-string">"$QUEUE_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Queue unhealthy: $QUEUE_STATUS"; <span class="hljs-built_in">exit 1; }
<span class="hljs-built_in">echo <span class="hljs-string">"  All dependencies healthy"

<span class="hljs-comment"># Phase 3: Version check — is the right version deployed?
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 3: Version validation"
DEPLOYED_VERSION=$(curl -s <span class="hljs-string">"$GREEN_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
EXPECTED_VERSION=<span class="hljs-string">"${EXPECTED_VERSION:-$(git rev-parse --short HEAD)}"
[ <span class="hljs-string">"$DEPLOYED_VERSION" = <span class="hljs-string">"$EXPECTED_VERSION" ] <span class="hljs-pipe">|| {
  <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Wrong version deployed. Expected $EXPECTED_VERSION, got <span class="hljs-variable">$DEPLOYED_VERSION"
  <span class="hljs-built_in">exit 1
}
<span class="hljs-built_in">echo <span class="hljs-string">"  Version $DEPLOYED_VERSION confirmed"

<span class="hljs-built_in">echo <span class="hljs-string">"Green deployment validation PASSED"

Automated Smoke Tests

Smoke tests verify critical paths work on green before the cutover. They're minimal, fast, and focused on the highest-value flows:

// smoke-tests/smoke.test.js
const axios = require('axios');

const GREEN_URL = process.env.GREEN_URL;
const BLUE_URL = process.env.BLUE_URL;

// Auth credentials for smoke test user (created in test environment)
const TEST_EMAIL = process.env.SMOKE_TEST_EMAIL;
const TEST_PASSWORD = process.env.SMOKE_TEST_PASSWORD;

let authToken;

// Setup: authenticate against GREEN
beforeAll(async () => {
  const response = await axios.post(`${GREEN_URL}/api/auth/login`, {
    email: TEST_EMAIL,
    password: TEST_PASSWORD
  });
  authToken = response.data.token;
});

const authHeaders = () => ({ Authorization: `Bearer ${authToken}` });

describe('Critical path smoke tests', () => {
  it('home page loads', async () => {
    const response = await axios.get(`${GREEN_URL}/`);
    expect(response.status).toBe(200);
    expect(response.headers['content-type']).toContain('text/html');
  });

  it('authentication works', async () => {
    // Already tested in beforeAll — verify token is valid
    const response = await axios.get(`${GREEN_URL}/api/me`, {
      headers: authHeaders()
    });
    expect(response.status).toBe(200);
    expect(response.data.email).toBe(TEST_EMAIL);
  });

  it('primary API endpoints respond', async () => {
    const endpoints = [
      '/api/products',
      '/api/categories',
      '/api/user/preferences'
    ];

    for (const endpoint of endpoints) {
      const response = await axios.get(`${GREEN_URL}${endpoint}`, {
        headers: authHeaders()
      });
      expect(response.status).toBe(200);
    }
  });

  it('creates and retrieves a resource (data layer is working)', async () => {
    // Create
    const createResponse = await axios.post(
      `${GREEN_URL}/api/items`,
      { name: `smoke-test-${Date.now()}`, category: 'test' },
      { headers: authHeaders() }
    );
    expect(createResponse.status).toBe(201);
    const itemId = createResponse.data.id;

    // Retrieve
    const getResponse = await axios.get(
      `${GREEN_URL}/api/items/${itemId}`,
      { headers: authHeaders() }
    );
    expect(getResponse.status).toBe(200);
    expect(getResponse.data.id).toBe(itemId);

    // Cleanup
    await axios.delete(
      `${GREEN_URL}/api/items/${itemId}`,
      { headers: authHeaders() }
    );
  });

  it('static assets are served', async () => {
    const response = await axios.get(`${GREEN_URL}/static/main.js`);
    expect(response.status).toBe(200);
    expect(response.headers['content-type']).toContain('javascript');
    // Verify cache headers are present
    expect(response.headers['cache-control']).toBeDefined();
  });
});

describe('Blue/green response parity', () => {
  it('product listing returns same schema', async () => {
    const [blueResponse, greenResponse] = await Promise.all([
      axios.get(`${BLUE_URL}/api/products`, { headers: authHeaders() }),
      axios.get(`${GREEN_URL}/api/products`, { headers: authHeaders() })
    ]);

    expect(blueResponse.status).toBe(greenResponse.status);
    // Same top-level keys in response
    const blueKeys = Object.keys(blueResponse.data).sort();
    const greenKeys = Object.keys(greenResponse.data).sort();
    expect(greenKeys).toEqual(blueKeys);
  });

  it('response time within acceptable range', async () => {
    const measurements = [];
    for (let i = 0; i < 5; i++) {
      const start = Date.now();
      await axios.get(`${GREEN_URL}/api/products`, { headers: authHeaders() });
      measurements.push(Date.now() - start);
    }
    const avg = measurements.reduce((a, b) => a + b) / measurements.length;
    // Green must respond within 200ms average (adjust for your SLA)
    expect(avg).toBeLessThan(200);
  });
});

Traffic Cutover Testing

Test the cutover mechanism itself. This is rarely tested — teams assume the router switch works — until it doesn't.

AWS ALB Target Group Swap

#!/bin/bash
<span class="hljs-comment"># test-cutover.sh

ALB_ARN=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/my-alb/abc123"
LISTENER_ARN=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:listener/app/my-alb/abc123/def456"
BLUE_TG=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/blue-tg/111"
GREEN_TG=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/green-tg/222"
PUBLIC_URL=<span class="hljs-string">"https://api.example.com"

<span class="hljs-comment"># Step 1: Send continuous requests during cutover (should see zero failures)
<span class="hljs-built_in">echo <span class="hljs-string">"Starting continuous request monitor..."
<span class="hljs-built_in">cat > /tmp/monitor.sh << <span class="hljs-string">'MONITOR'
<span class="hljs-comment">#!/bin/bash
PASS=0; FAIL=0
<span class="hljs-keyword">while <span class="hljs-literal">true; <span class="hljs-keyword">do
  STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$1/health")
  [ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ] && ((PASS++)) <span class="hljs-pipe">|| { ((FAIL++)); <span class="hljs-built_in">echo <span class="hljs-string">"$(date): FAIL <span class="hljs-variable">$STATUS"; }
  <span class="hljs-built_in">sleep 0.1
<span class="hljs-keyword">done
MONITOR
<span class="hljs-built_in">chmod +x /tmp/monitor.sh
bash /tmp/monitor.sh <span class="hljs-string">"$PUBLIC_URL" &
MONITOR_PID=$!

<span class="hljs-comment"># Step 2: Perform cutover
<span class="hljs-built_in">echo <span class="hljs-string">"Switching traffic to green..."
aws elbv2 modify-listener \
  --listener-arn <span class="hljs-string">"$LISTENER_ARN" \
  --default-actions Type=forward,TargetGroupArn=<span class="hljs-string">"$GREEN_TG"

<span class="hljs-built_in">echo <span class="hljs-string">"Waiting 30s to observe post-cutover behavior..."
<span class="hljs-built_in">sleep 30

<span class="hljs-comment"># Step 3: Verify traffic is on green
ACTIVE_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Active version after cutover: $ACTIVE_VERSION"

<span class="hljs-comment"># Step 4: Stop monitor and check results
<span class="hljs-built_in">kill <span class="hljs-variable">$MONITOR_PID
<span class="hljs-built_in">wait <span class="hljs-variable">$MONITOR_PID 2>/dev/null

<span class="hljs-built_in">echo <span class="hljs-string">"Done. Check /tmp/monitor output for failures during cutover."

Weighted Traffic Shift Testing (Canary within Blue-Green)

Before full cutover, shift a percentage of traffic:

# 10% to green, 90% stays on blue
aws elbv2 modify-listener \
  --listener-arn <span class="hljs-string">"$LISTENER_ARN" \
  --default-actions <span class="hljs-string">"Type=forward,ForwardConfig={
    TargetGroups=[
      {TargetGroupArn=$BLUE_TG,Weight=90},
      {TargetGroupArn=<span class="hljs-variable">$GREEN_TG,Weight=10}
    ]
  }"

<span class="hljs-comment"># Monitor error rate for 5 minutes
start_time=$(<span class="hljs-built_in">date +%s)
<span class="hljs-keyword">while <span class="hljs-literal">true; <span class="hljs-keyword">do
  elapsed=$(( $(date +%s) - start_time ))
  [ <span class="hljs-variable">$elapsed -gt 300 ] && <span class="hljs-built_in">break

  <span class="hljs-comment"># Check green target group 4xx/5xx counts
  ERROR_COUNT=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_Target_5XX_Count \
    --dimensions Name=TargetGroup,Value=<span class="hljs-string">"${GREEN_TG##*/}" \
    --start-time <span class="hljs-string">"$(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:%S)" \
    --end-time <span class="hljs-string">"$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --period 60 --statistics Sum \
    <span class="hljs-pipe">| jq <span class="hljs-string">'.Datapoints[0].Sum // 0')

  <span class="hljs-built_in">echo <span class="hljs-string">"$(date): Green error count last minute: <span class="hljs-variable">$ERROR_COUNT"

  <span class="hljs-keyword">if [ <span class="hljs-string">"$(echo "$ERROR_COUNT > 5" <span class="hljs-pipe">| bc)" -eq 1 ]; <span class="hljs-keyword">then
    <span class="hljs-built_in">echo <span class="hljs-string">"ERROR THRESHOLD EXCEEDED — rolling back"
    <span class="hljs-comment"># Rollback
    aws elbv2 modify-listener \
      --listener-arn <span class="hljs-string">"$LISTENER_ARN" \
      --default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$BLUE_TG"
    <span class="hljs-built_in">exit 1
  <span class="hljs-keyword">fi

  <span class="hljs-built_in">sleep 30
<span class="hljs-keyword">done

<span class="hljs-built_in">echo <span class="hljs-string">"Canary phase stable — proceed to full cutover"

Rollback Verification

A rollback that has never been tested will fail when you need it most.

#!/bin/bash
<span class="hljs-comment"># test-rollback.sh — run in staging periodically

BLUE_URL=<span class="hljs-string">"http://blue.internal.example.com"
GREEN_URL=<span class="hljs-string">"http://green.internal.example.com"
PUBLIC_URL=<span class="hljs-string">"https://staging.example.com"
LISTENER_ARN=<span class="hljs-string">"${LISTENER_ARN}"
BLUE_TG=<span class="hljs-string">"${BLUE_TG_ARN}"
GREEN_TG=<span class="hljs-string">"${GREEN_TG_ARN}"

<span class="hljs-built_in">echo <span class="hljs-string">"=== Rollback Test ==="

<span class="hljs-comment"># 1. Verify we know current state
INITIAL_ACTIVE=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Initial active version: $INITIAL_ACTIVE"

<span class="hljs-comment"># 2. Switch to green (simulating a deployment)
<span class="hljs-built_in">echo <span class="hljs-string">"Simulating deployment: switching to green..."
aws elbv2 modify-listener \
  --listener-arn <span class="hljs-string">"$LISTENER_ARN" \
  --default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$GREEN_TG"

<span class="hljs-built_in">sleep 5

GREEN_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Green version active: $GREEN_VERSION"

<span class="hljs-comment"># 3. Perform rollback
<span class="hljs-built_in">echo <span class="hljs-string">"Performing rollback..."
ROLLBACK_START=$(<span class="hljs-built_in">date +%s%3N)
aws elbv2 modify-listener \
  --listener-arn <span class="hljs-string">"$LISTENER_ARN" \
  --default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$BLUE_TG"
ROLLBACK_END=$(<span class="hljs-built_in">date +%s%3N)
ROLLBACK_TIME=$((ROLLBACK_END - ROLLBACK_START))

<span class="hljs-built_in">echo <span class="hljs-string">"Rollback completed in ${ROLLBACK_TIME}ms"

<span class="hljs-comment"># 4. Verify rollback restored correct version
<span class="hljs-built_in">sleep 2
RESTORED_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')

<span class="hljs-keyword">if [ <span class="hljs-string">"$RESTORED_VERSION" = <span class="hljs-string">"$INITIAL_ACTIVE" ]; <span class="hljs-keyword">then
  <span class="hljs-built_in">echo <span class="hljs-string">"PASS: Rollback successful. Version restored: $RESTORED_VERSION"
<span class="hljs-keyword">else
  <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Rollback failed. Expected $INITIAL_ACTIVE, got <span class="hljs-variable">$RESTORED_VERSION"
  <span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi

<span class="hljs-comment"># 5. Verify functionality after rollback
STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$PUBLIC_URL/health")
[ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Health check failed after rollback"; <span class="hljs-built_in">exit 1; }
<span class="hljs-built_in">echo <span class="hljs-string">"PASS: Health check passes after rollback"
<span class="hljs-built_in">echo <span class="hljs-string">"Rollback time: ${ROLLBACK_TIME}ms"

Database Migration Testing in Blue-Green

Database migrations are the trickiest part of blue-green. The safest pattern:

Expand phase (before cutover): add new columns, add new tables. Both blue and green must work with the expanded schema.

Contract phase (after cutover is stable): remove old columns. Only green needs to work.

Test the expand phase before cutover:

// test/migration-compatibility.test.js
describe('Database schema compatibility', () => {
  it('new schema columns are nullable (backwards compatible)', async () => {
    // Connect to test DB with expanded schema
    const result = await db.query(`
      SELECT column_name, is_nullable, column_default
      FROM information_schema.columns
      WHERE table_name = 'users'
      AND column_name = 'new_column'
    `);
    
    expect(result.rows[0].is_nullable).toBe('YES');  // Nullable = blue still works
  });

  it('blue application queries still work with new schema', async () => {
    // Run blue's query patterns against the expanded schema
    const result = await db.query(
      'SELECT id, email, name FROM users WHERE id = $1',
      ['test-user-id']
    );
    expect(result.rows).toHaveLength(1);
  });

  it('green application can read and write new columns', async () => {
    await db.query(
      'UPDATE users SET new_column = $1 WHERE id = $2',
      ['new-value', 'test-user-id']
    );
    const result = await db.query(
      'SELECT new_column FROM users WHERE id = $1',
      ['test-user-id']
    );
    expect(result.rows[0].new_column).toBe('new-value');
  });
});

CI/CD Pipeline Integration

# .github/workflows/blue-green-deploy.yml
name: Blue-Green Deployment

on:
  push:
    branches: [main]

jobs:
  deploy-green:
    runs-on: ubuntu-latest
    outputs:
      green_url: ${{ steps.deploy.outputs.green_url }}
    
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to green environment
        id: deploy
        run: |
          # Deploy green (your deployment mechanism)
          ./scripts/deploy-green.sh
          echo "green_url=$GREEN_URL" >> $GITHUB_OUTPUT

  validate-green:
    needs: deploy-green
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4

      - name: Validate green deployment
        run: bash scripts/validate-green.sh ${{ needs.deploy-green.outputs.green_url }}

      - name: Run smoke tests
        env:
          GREEN_URL: ${{ needs.deploy-green.outputs.green_url }}
          BLUE_URL: ${{ vars.BLUE_URL }}
          SMOKE_TEST_EMAIL: ${{ secrets.SMOKE_TEST_EMAIL }}
          SMOKE_TEST_PASSWORD: ${{ secrets.SMOKE_TEST_PASSWORD }}
        run: npx jest smoke-tests/smoke.test.js --forceExit

  cutover:
    needs: validate-green
    runs-on: ubuntu-latest
    environment: production  # Requires manual approval in GitHub
    
    steps:
      - name: Switch traffic to green
        run: |
          aws elbv2 modify-listener \
            --listener-arn "${{ vars.LISTENER_ARN }}" \
            --default-actions "Type=forward,TargetGroupArn=${{ vars.GREEN_TG_ARN }}"

      - name: Post-cutover validation
        run: |
          # Wait for steady state
          sleep 10
          
          # Verify public URL is now on green
          ACTIVE_VERSION=$(curl -s "${{ vars.PUBLIC_URL }}/version" | jq -r '.version')
          EXPECTED_VERSION=$(git rev-parse --short HEAD)
          
          if [ "$ACTIVE_VERSION" != "$EXPECTED_VERSION" ]; then
            echo "Cutover failed — rolling back"
            aws elbv2 modify-listener \
              --listener-arn "${{ vars.LISTENER_ARN }}" \
              --default-actions "Type=forward,TargetGroupArn=${{ vars.BLUE_TG_ARN }}"
            exit 1
          fi
          
          echo "Cutover successful. Version $ACTIVE_VERSION is live."

Read more