Blue-Green Deployment Testing: Smoke Tests, Traffic Cutover & Rollback Validation
Blue-green deployment reduces downtime risk by running two identical production environments — blue (live) and green (new). The switch happens at the router level, and rollback is instant: flip traffic back to blue. The testing challenge is making the cutover decision confidently: is green actually ready?
The Blue-Green Testing Problem
A blue-green cutover is irreversible in the moment — users are immediately on the new version. The window to catch issues is before the cutover, not after. Testing must answer:
- Does the green deployment start correctly and pass health checks?
- Does green produce the same responses as blue for critical paths?
- Does green handle the expected load?
- If something goes wrong, does rollback actually restore green to working state?
Health Check Validation
Before any traffic reaches green, validate it's genuinely healthy:
#!/bin/bash
<span class="hljs-comment"># validate-green.sh
GREEN_URL=<span class="hljs-string">"${1:-http://green.internal.example.com}"
MAX_RETRIES=30
RETRY_INTERVAL=5
<span class="hljs-built_in">echo <span class="hljs-string">"Validating green deployment at $GREEN_URL"
<span class="hljs-comment"># Phase 1: Basic health check with retries
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 1: Health endpoint"
<span class="hljs-keyword">for i <span class="hljs-keyword">in $(<span class="hljs-built_in">seq 1 <span class="hljs-variable">$MAX_RETRIES); <span class="hljs-keyword">do
STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$GREEN_URL/health")
<span class="hljs-keyword">if [ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ]; <span class="hljs-keyword">then
<span class="hljs-built_in">echo <span class="hljs-string">" Health check passed on attempt $i"
<span class="hljs-built_in">break
<span class="hljs-keyword">fi
<span class="hljs-built_in">echo <span class="hljs-string">" Attempt $i: got <span class="hljs-variable">$STATUS, retrying in <span class="hljs-variable">${RETRY_INTERVAL}s..."
<span class="hljs-built_in">sleep <span class="hljs-variable">$RETRY_INTERVAL
[ <span class="hljs-variable">$i -eq <span class="hljs-variable">$MAX_RETRIES ] && { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Health check never passed"; <span class="hljs-built_in">exit 1; }
<span class="hljs-keyword">done
<span class="hljs-comment"># Phase 2: Deep health — dependencies reachable
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 2: Dependency health"
DEEP_HEALTH=$(curl -s <span class="hljs-string">"$GREEN_URL/health/deep")
DB_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.database.status')
CACHE_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.cache.status')
QUEUE_STATUS=$(<span class="hljs-built_in">echo <span class="hljs-string">"$DEEP_HEALTH" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.queue.status')
[ <span class="hljs-string">"$DB_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Database unhealthy: $DB_STATUS"; <span class="hljs-built_in">exit 1; }
[ <span class="hljs-string">"$CACHE_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Cache unhealthy: $CACHE_STATUS"; <span class="hljs-built_in">exit 1; }
[ <span class="hljs-string">"$QUEUE_STATUS" = <span class="hljs-string">"healthy" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Queue unhealthy: $QUEUE_STATUS"; <span class="hljs-built_in">exit 1; }
<span class="hljs-built_in">echo <span class="hljs-string">" All dependencies healthy"
<span class="hljs-comment"># Phase 3: Version check — is the right version deployed?
<span class="hljs-built_in">echo <span class="hljs-string">"Phase 3: Version validation"
DEPLOYED_VERSION=$(curl -s <span class="hljs-string">"$GREEN_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
EXPECTED_VERSION=<span class="hljs-string">"${EXPECTED_VERSION:-$(git rev-parse --short HEAD)}"
[ <span class="hljs-string">"$DEPLOYED_VERSION" = <span class="hljs-string">"$EXPECTED_VERSION" ] <span class="hljs-pipe">|| {
<span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Wrong version deployed. Expected $EXPECTED_VERSION, got <span class="hljs-variable">$DEPLOYED_VERSION"
<span class="hljs-built_in">exit 1
}
<span class="hljs-built_in">echo <span class="hljs-string">" Version $DEPLOYED_VERSION confirmed"
<span class="hljs-built_in">echo <span class="hljs-string">"Green deployment validation PASSED"Automated Smoke Tests
Smoke tests verify critical paths work on green before the cutover. They're minimal, fast, and focused on the highest-value flows:
// smoke-tests/smoke.test.js
const axios = require('axios');
const GREEN_URL = process.env.GREEN_URL;
const BLUE_URL = process.env.BLUE_URL;
// Auth credentials for smoke test user (created in test environment)
const TEST_EMAIL = process.env.SMOKE_TEST_EMAIL;
const TEST_PASSWORD = process.env.SMOKE_TEST_PASSWORD;
let authToken;
// Setup: authenticate against GREEN
beforeAll(async () => {
const response = await axios.post(`${GREEN_URL}/api/auth/login`, {
email: TEST_EMAIL,
password: TEST_PASSWORD
});
authToken = response.data.token;
});
const authHeaders = () => ({ Authorization: `Bearer ${authToken}` });
describe('Critical path smoke tests', () => {
it('home page loads', async () => {
const response = await axios.get(`${GREEN_URL}/`);
expect(response.status).toBe(200);
expect(response.headers['content-type']).toContain('text/html');
});
it('authentication works', async () => {
// Already tested in beforeAll — verify token is valid
const response = await axios.get(`${GREEN_URL}/api/me`, {
headers: authHeaders()
});
expect(response.status).toBe(200);
expect(response.data.email).toBe(TEST_EMAIL);
});
it('primary API endpoints respond', async () => {
const endpoints = [
'/api/products',
'/api/categories',
'/api/user/preferences'
];
for (const endpoint of endpoints) {
const response = await axios.get(`${GREEN_URL}${endpoint}`, {
headers: authHeaders()
});
expect(response.status).toBe(200);
}
});
it('creates and retrieves a resource (data layer is working)', async () => {
// Create
const createResponse = await axios.post(
`${GREEN_URL}/api/items`,
{ name: `smoke-test-${Date.now()}`, category: 'test' },
{ headers: authHeaders() }
);
expect(createResponse.status).toBe(201);
const itemId = createResponse.data.id;
// Retrieve
const getResponse = await axios.get(
`${GREEN_URL}/api/items/${itemId}`,
{ headers: authHeaders() }
);
expect(getResponse.status).toBe(200);
expect(getResponse.data.id).toBe(itemId);
// Cleanup
await axios.delete(
`${GREEN_URL}/api/items/${itemId}`,
{ headers: authHeaders() }
);
});
it('static assets are served', async () => {
const response = await axios.get(`${GREEN_URL}/static/main.js`);
expect(response.status).toBe(200);
expect(response.headers['content-type']).toContain('javascript');
// Verify cache headers are present
expect(response.headers['cache-control']).toBeDefined();
});
});
describe('Blue/green response parity', () => {
it('product listing returns same schema', async () => {
const [blueResponse, greenResponse] = await Promise.all([
axios.get(`${BLUE_URL}/api/products`, { headers: authHeaders() }),
axios.get(`${GREEN_URL}/api/products`, { headers: authHeaders() })
]);
expect(blueResponse.status).toBe(greenResponse.status);
// Same top-level keys in response
const blueKeys = Object.keys(blueResponse.data).sort();
const greenKeys = Object.keys(greenResponse.data).sort();
expect(greenKeys).toEqual(blueKeys);
});
it('response time within acceptable range', async () => {
const measurements = [];
for (let i = 0; i < 5; i++) {
const start = Date.now();
await axios.get(`${GREEN_URL}/api/products`, { headers: authHeaders() });
measurements.push(Date.now() - start);
}
const avg = measurements.reduce((a, b) => a + b) / measurements.length;
// Green must respond within 200ms average (adjust for your SLA)
expect(avg).toBeLessThan(200);
});
});Traffic Cutover Testing
Test the cutover mechanism itself. This is rarely tested — teams assume the router switch works — until it doesn't.
AWS ALB Target Group Swap
#!/bin/bash
<span class="hljs-comment"># test-cutover.sh
ALB_ARN=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/my-alb/abc123"
LISTENER_ARN=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:listener/app/my-alb/abc123/def456"
BLUE_TG=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/blue-tg/111"
GREEN_TG=<span class="hljs-string">"arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/green-tg/222"
PUBLIC_URL=<span class="hljs-string">"https://api.example.com"
<span class="hljs-comment"># Step 1: Send continuous requests during cutover (should see zero failures)
<span class="hljs-built_in">echo <span class="hljs-string">"Starting continuous request monitor..."
<span class="hljs-built_in">cat > /tmp/monitor.sh << <span class="hljs-string">'MONITOR'
<span class="hljs-comment">#!/bin/bash
PASS=0; FAIL=0
<span class="hljs-keyword">while <span class="hljs-literal">true; <span class="hljs-keyword">do
STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$1/health")
[ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ] && ((PASS++)) <span class="hljs-pipe">|| { ((FAIL++)); <span class="hljs-built_in">echo <span class="hljs-string">"$(date): FAIL <span class="hljs-variable">$STATUS"; }
<span class="hljs-built_in">sleep 0.1
<span class="hljs-keyword">done
MONITOR
<span class="hljs-built_in">chmod +x /tmp/monitor.sh
bash /tmp/monitor.sh <span class="hljs-string">"$PUBLIC_URL" &
MONITOR_PID=$!
<span class="hljs-comment"># Step 2: Perform cutover
<span class="hljs-built_in">echo <span class="hljs-string">"Switching traffic to green..."
aws elbv2 modify-listener \
--listener-arn <span class="hljs-string">"$LISTENER_ARN" \
--default-actions Type=forward,TargetGroupArn=<span class="hljs-string">"$GREEN_TG"
<span class="hljs-built_in">echo <span class="hljs-string">"Waiting 30s to observe post-cutover behavior..."
<span class="hljs-built_in">sleep 30
<span class="hljs-comment"># Step 3: Verify traffic is on green
ACTIVE_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Active version after cutover: $ACTIVE_VERSION"
<span class="hljs-comment"># Step 4: Stop monitor and check results
<span class="hljs-built_in">kill <span class="hljs-variable">$MONITOR_PID
<span class="hljs-built_in">wait <span class="hljs-variable">$MONITOR_PID 2>/dev/null
<span class="hljs-built_in">echo <span class="hljs-string">"Done. Check /tmp/monitor output for failures during cutover."Weighted Traffic Shift Testing (Canary within Blue-Green)
Before full cutover, shift a percentage of traffic:
# 10% to green, 90% stays on blue
aws elbv2 modify-listener \
--listener-arn <span class="hljs-string">"$LISTENER_ARN" \
--default-actions <span class="hljs-string">"Type=forward,ForwardConfig={
TargetGroups=[
{TargetGroupArn=$BLUE_TG,Weight=90},
{TargetGroupArn=<span class="hljs-variable">$GREEN_TG,Weight=10}
]
}"
<span class="hljs-comment"># Monitor error rate for 5 minutes
start_time=$(<span class="hljs-built_in">date +%s)
<span class="hljs-keyword">while <span class="hljs-literal">true; <span class="hljs-keyword">do
elapsed=$(( $(date +%s) - start_time ))
[ <span class="hljs-variable">$elapsed -gt 300 ] && <span class="hljs-built_in">break
<span class="hljs-comment"># Check green target group 4xx/5xx counts
ERROR_COUNT=$(aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--dimensions Name=TargetGroup,Value=<span class="hljs-string">"${GREEN_TG##*/}" \
--start-time <span class="hljs-string">"$(date -u -d '1 minute ago' +%Y-%m-%dT%H:%M:%S)" \
--end-time <span class="hljs-string">"$(date -u +%Y-%m-%dT%H:%M:%S)" \
--period 60 --statistics Sum \
<span class="hljs-pipe">| jq <span class="hljs-string">'.Datapoints[0].Sum // 0')
<span class="hljs-built_in">echo <span class="hljs-string">"$(date): Green error count last minute: <span class="hljs-variable">$ERROR_COUNT"
<span class="hljs-keyword">if [ <span class="hljs-string">"$(echo "$ERROR_COUNT > 5" <span class="hljs-pipe">| bc)" -eq 1 ]; <span class="hljs-keyword">then
<span class="hljs-built_in">echo <span class="hljs-string">"ERROR THRESHOLD EXCEEDED — rolling back"
<span class="hljs-comment"># Rollback
aws elbv2 modify-listener \
--listener-arn <span class="hljs-string">"$LISTENER_ARN" \
--default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$BLUE_TG"
<span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi
<span class="hljs-built_in">sleep 30
<span class="hljs-keyword">done
<span class="hljs-built_in">echo <span class="hljs-string">"Canary phase stable — proceed to full cutover"Rollback Verification
A rollback that has never been tested will fail when you need it most.
#!/bin/bash
<span class="hljs-comment"># test-rollback.sh — run in staging periodically
BLUE_URL=<span class="hljs-string">"http://blue.internal.example.com"
GREEN_URL=<span class="hljs-string">"http://green.internal.example.com"
PUBLIC_URL=<span class="hljs-string">"https://staging.example.com"
LISTENER_ARN=<span class="hljs-string">"${LISTENER_ARN}"
BLUE_TG=<span class="hljs-string">"${BLUE_TG_ARN}"
GREEN_TG=<span class="hljs-string">"${GREEN_TG_ARN}"
<span class="hljs-built_in">echo <span class="hljs-string">"=== Rollback Test ==="
<span class="hljs-comment"># 1. Verify we know current state
INITIAL_ACTIVE=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Initial active version: $INITIAL_ACTIVE"
<span class="hljs-comment"># 2. Switch to green (simulating a deployment)
<span class="hljs-built_in">echo <span class="hljs-string">"Simulating deployment: switching to green..."
aws elbv2 modify-listener \
--listener-arn <span class="hljs-string">"$LISTENER_ARN" \
--default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$GREEN_TG"
<span class="hljs-built_in">sleep 5
GREEN_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-built_in">echo <span class="hljs-string">"Green version active: $GREEN_VERSION"
<span class="hljs-comment"># 3. Perform rollback
<span class="hljs-built_in">echo <span class="hljs-string">"Performing rollback..."
ROLLBACK_START=$(<span class="hljs-built_in">date +%s%3N)
aws elbv2 modify-listener \
--listener-arn <span class="hljs-string">"$LISTENER_ARN" \
--default-actions <span class="hljs-string">"Type=forward,TargetGroupArn=$BLUE_TG"
ROLLBACK_END=$(<span class="hljs-built_in">date +%s%3N)
ROLLBACK_TIME=$((ROLLBACK_END - ROLLBACK_START))
<span class="hljs-built_in">echo <span class="hljs-string">"Rollback completed in ${ROLLBACK_TIME}ms"
<span class="hljs-comment"># 4. Verify rollback restored correct version
<span class="hljs-built_in">sleep 2
RESTORED_VERSION=$(curl -s <span class="hljs-string">"$PUBLIC_URL/version" <span class="hljs-pipe">| jq -r <span class="hljs-string">'.version')
<span class="hljs-keyword">if [ <span class="hljs-string">"$RESTORED_VERSION" = <span class="hljs-string">"$INITIAL_ACTIVE" ]; <span class="hljs-keyword">then
<span class="hljs-built_in">echo <span class="hljs-string">"PASS: Rollback successful. Version restored: $RESTORED_VERSION"
<span class="hljs-keyword">else
<span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Rollback failed. Expected $INITIAL_ACTIVE, got <span class="hljs-variable">$RESTORED_VERSION"
<span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi
<span class="hljs-comment"># 5. Verify functionality after rollback
STATUS=$(curl -s -o /dev/null -w <span class="hljs-string">"%{http_code}" <span class="hljs-string">"$PUBLIC_URL/health")
[ <span class="hljs-string">"$STATUS" = <span class="hljs-string">"200" ] <span class="hljs-pipe">|| { <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Health check failed after rollback"; <span class="hljs-built_in">exit 1; }
<span class="hljs-built_in">echo <span class="hljs-string">"PASS: Health check passes after rollback"
<span class="hljs-built_in">echo <span class="hljs-string">"Rollback time: ${ROLLBACK_TIME}ms"Database Migration Testing in Blue-Green
Database migrations are the trickiest part of blue-green. The safest pattern:
Expand phase (before cutover): add new columns, add new tables. Both blue and green must work with the expanded schema.
Contract phase (after cutover is stable): remove old columns. Only green needs to work.
Test the expand phase before cutover:
// test/migration-compatibility.test.js
describe('Database schema compatibility', () => {
it('new schema columns are nullable (backwards compatible)', async () => {
// Connect to test DB with expanded schema
const result = await db.query(`
SELECT column_name, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'users'
AND column_name = 'new_column'
`);
expect(result.rows[0].is_nullable).toBe('YES'); // Nullable = blue still works
});
it('blue application queries still work with new schema', async () => {
// Run blue's query patterns against the expanded schema
const result = await db.query(
'SELECT id, email, name FROM users WHERE id = $1',
['test-user-id']
);
expect(result.rows).toHaveLength(1);
});
it('green application can read and write new columns', async () => {
await db.query(
'UPDATE users SET new_column = $1 WHERE id = $2',
['new-value', 'test-user-id']
);
const result = await db.query(
'SELECT new_column FROM users WHERE id = $1',
['test-user-id']
);
expect(result.rows[0].new_column).toBe('new-value');
});
});CI/CD Pipeline Integration
# .github/workflows/blue-green-deploy.yml
name: Blue-Green Deployment
on:
push:
branches: [main]
jobs:
deploy-green:
runs-on: ubuntu-latest
outputs:
green_url: ${{ steps.deploy.outputs.green_url }}
steps:
- uses: actions/checkout@v4
- name: Deploy to green environment
id: deploy
run: |
# Deploy green (your deployment mechanism)
./scripts/deploy-green.sh
echo "green_url=$GREEN_URL" >> $GITHUB_OUTPUT
validate-green:
needs: deploy-green
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Validate green deployment
run: bash scripts/validate-green.sh ${{ needs.deploy-green.outputs.green_url }}
- name: Run smoke tests
env:
GREEN_URL: ${{ needs.deploy-green.outputs.green_url }}
BLUE_URL: ${{ vars.BLUE_URL }}
SMOKE_TEST_EMAIL: ${{ secrets.SMOKE_TEST_EMAIL }}
SMOKE_TEST_PASSWORD: ${{ secrets.SMOKE_TEST_PASSWORD }}
run: npx jest smoke-tests/smoke.test.js --forceExit
cutover:
needs: validate-green
runs-on: ubuntu-latest
environment: production # Requires manual approval in GitHub
steps:
- name: Switch traffic to green
run: |
aws elbv2 modify-listener \
--listener-arn "${{ vars.LISTENER_ARN }}" \
--default-actions "Type=forward,TargetGroupArn=${{ vars.GREEN_TG_ARN }}"
- name: Post-cutover validation
run: |
# Wait for steady state
sleep 10
# Verify public URL is now on green
ACTIVE_VERSION=$(curl -s "${{ vars.PUBLIC_URL }}/version" | jq -r '.version')
EXPECTED_VERSION=$(git rev-parse --short HEAD)
if [ "$ACTIVE_VERSION" != "$EXPECTED_VERSION" ]; then
echo "Cutover failed — rolling back"
aws elbv2 modify-listener \
--listener-arn "${{ vars.LISTENER_ARN }}" \
--default-actions "Type=forward,TargetGroupArn=${{ vars.BLUE_TG_ARN }}"
exit 1
fi
echo "Cutover successful. Version $ACTIVE_VERSION is live."