Gradual Rollouts and Canary Deployments: Testing Strategies for Progressive Delivery

HelpMeTest

19 May 2026 — 6 min read

Gradual rollouts and canary deployments reduce risk but introduce new testing problems: how do you test that the rollout percentage is correct, that monitoring detects regressions, and that rollback works before you need it in production? This guide covers testing strategies for the full progressive delivery lifecycle.

What Makes Progressive Delivery Hard to Test

Percentage-based feature flags and canary deployments share one challenge: the system behaves differently for different users at the same time. Testing has to cover:

Rollout correctness: does the 5% actually reach 5% of users?
Version coexistence: do the old and new code paths work simultaneously?
Monitoring integration: do your health metrics detect a bad rollout?
Rollback behavior: does disabling the flag or reverting the canary restore normal behavior?
Observability: are variant assignments tracked so you can correlate them with error rates?

Testing Rollout Percentages

The foundation of gradual rollouts is deterministic, evenly distributed assignment. Test this:

// src/rollout-manager.js
const crypto = require('crypto');

class RolloutManager {
  constructor(config) {
    this.config = config; // { featureKey: { percentage: 10, salt: 'abc' } }
  }

  isEnabled(featureKey, userId) {
    const featureConfig = this.config[featureKey];
    if (!featureConfig) return false;
    if (featureConfig.percentage <= 0) return false;
    if (featureConfig.percentage >= 100) return true;

    const input = `${featureKey}:${featureConfig.salt}:${userId}`;
    const hash = crypto.createHash('sha256').update(input).digest('hex');
    const bucket = (parseInt(hash.slice(0, 8), 16) % 100) + 1;

    return bucket <= featureConfig.percentage;
  }

  getActivePercentage(featureKey) {
    return this.config[featureKey]?.percentage ?? 0;
  }
}

module.exports = { RolloutManager };

// test/rollout-manager.test.js
const { RolloutManager } = require('../src/rollout-manager');

const FEATURE_CONFIG = {
  new_checkout: { percentage: 10, salt: 'nc_salt_v1' },
  updated_pricing: { percentage: 50, salt: 'up_salt_v1' },
};

describe('RolloutManager', () => {
  it('assigns deterministically — same user always gets same result', () => {
    const mgr = new RolloutManager(FEATURE_CONFIG);
    const userId = 'stable_user_xyz';

    const results = Array.from({ length: 20 }, () =>
      mgr.isEnabled('new_checkout', userId)
    );

    expect(new Set(results).size).toBe(1); // All identical
  });

  it('reaches approximately the configured percentage', () => {
    const mgr = new RolloutManager(FEATURE_CONFIG);
    let enabled = 0;

    for (let i = 0; i < 10000; i++) {
      if (mgr.isEnabled('new_checkout', `user_${i}`)) enabled++;
    }

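    const rate = enabled / 10000;
    // Expect 10% within ±2 percentage points. toBeCloseTo(0.10, 1) would
    // accept any rate within ±0.05, which is too loose to catch real skew.
    expect(rate).toBeGreaterThan(0.08);
    expect(rate).toBeLessThan(0.12);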
  });

  it('users in 10% rollout are still in 50% rollout', () => {
    // Monotonicity: users included at lower percentage stay included at higher
    // (requires same salt — use feature-specific salts that don't change)
    const mgr10 = new RolloutManager({ feat: { percentage: 10, salt: 'same_salt' } });
    const mgr50 = new RolloutManager({ feat: { percentage: 50, salt: 'same_salt' } });

    for (let i = 0; i < 200; i++) {
      const userId = `user_${i}`;
      const at10 = mgr10.isEnabled('feat', userId);
      const at50 = mgr50.isEnabled('feat', userId);

      if (at10) {
        expect(at50).toBe(true); // Must be in 50% if in 10%
      }
    }
  });

  it('returns false for unknown feature', () => {
    const mgr = new RolloutManager(FEATURE_CONFIG);
    expect(mgr.isEnabled('unknown_feature', 'user_1')).toBe(false);
  });
});
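The tolerance in the percentage test is easier to defend when it is derived from the sample size rather than picked by eye. The sketch below assumes each assignment behaves like an independent coin flip with probability p (a binomial approximation); `rolloutTolerance` is a hypothetical helper and not part of the RolloutManager code above.

```javascript
// Hypothetical helper: half-width of an acceptance band for an observed
// rollout rate, assuming assignments behave like independent Bernoulli(p) trials.
function rolloutTolerance(percentage, sampleSize, sigmas = 4) {
  const p = percentage / 100;
  // Standard error of a sample proportion: sqrt(p * (1 - p) / n)
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  return sigmas * standardError;
}

// 10% rollout measured over 10,000 users: the standard error is 0.003,
// so a 4-sigma band is 0.012 (about ±1.2 percentage points).
const tolerance = rolloutTolerance(10, 10000);
console.log(tolerance.toFixed(4)); // "0.0120"
```

Under that assumption, a band of roughly ±1.2 points at 10,000 users should rarely reject a healthy hash while still flagging obvious skew. Smaller samples need a wider band: the same rollout measured over 1,000 users gives about ±3.8 points.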

Testing Version Coexistence

During a canary deployment, both old and new code run simultaneously. Test that shared state (database, cache, APIs) handles both versions:

// test/version-coexistence.test.js
const { v1Handler } = require('../src/handlers/v1-checkout');
const { v2Handler } = require('../src/handlers/v2-checkout');
const db = require('../src/db');

describe('checkout v1/v2 coexistence', () => {
  beforeEach(async () => {
    await db.orders.clear();
  });

  it('orders created by v1 can be read by v2', async () => {
    // v1 creates an order
    const order = await v1Handler.createOrder({ userId: 'user1', items: [{ sku: 'A', qty: 1 }] });

    // v2 reads that order
    const retrieved = await v2Handler.getOrder(order.id);

    expect(retrieved.id).toBe(order.id);
    expect(retrieved.items).toHaveLength(1);
    expect(retrieved.status).toBe('pending');
  });

  it('orders created by v2 are readable by v1', async () => {
    // v2 creates an order (may have extra fields)
    const order = await v2Handler.createOrder({ userId: 'user2', items: [{ sku: 'B', qty: 2 }], source: 'mobile' });

    // v1 reads without knowing about the new "source" field
    const retrieved = await v1Handler.getOrder(order.id);

    expect(retrieved.id).toBe(order.id);
    expect(retrieved.status).toBe('pending');
    // v1 should handle extra unknown fields gracefully (not throw)
  });

  it('concurrent v1 and v2 writes do not corrupt order state', async () => {
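    // The order must exist before either version can update it
    const { id: orderId } = await v1Handler.createOrder({ userId: 'user3', items: [{ sku: 'C', qty: 1 }] });

    // Simulate concurrent updates from both versions
    await Promise.all([
      v1Handler.updateOrderStatus(orderId, 'processing'),
      v2Handler.addOrderTag(orderId, 'mobile'),
    ]);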

    const order = await db.orders.findById(orderId);
    expect(order.status).toBe('processing');
    expect(order.tags).toContain('mobile');
  });
});
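The "v1 should handle extra unknown fields gracefully" requirement is often met with a tolerant reader: the older version copies only the fields it knows and ignores the rest. The sketch below is one way to do that, not the handlers' actual implementation; `readOrderAsV1` and the field list are hypothetical.

```javascript
// Hypothetical v1 reader: copies only the fields v1 knows about and
// ignores anything a newer writer may have added.
const V1_ORDER_FIELDS = ['id', 'userId', 'items', 'status'];

function readOrderAsV1(record) {
  const order = {};
  for (const field of V1_ORDER_FIELDS) {
    if (!(field in record)) {
      throw new Error(`order is missing required field "${field}"`);
    }
    order[field] = record[field];
  }
  return order;
}

// A record written by v2, carrying a "source" field that v1 has never seen.
const writtenByV2 = {
  id: 'order_1',
  userId: 'user2',
  items: [{ sku: 'B', qty: 2 }],
  status: 'pending',
  source: 'mobile',
};

const seenByV1 = readOrderAsV1(writtenByV2);
console.log(Object.keys(seenByV1)); // [ 'id', 'userId', 'items', 'status' ]
```

One consequence worth testing: if v1 later writes this whole object back, `source` is lost. Updates from the older version are safer when they touch only the fields that version owns.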

Testing Rollback Behavior

A rollback must restore previous behavior completely. Test the rollback path as carefully as the rollout path:

// test/rollback.test.js
const { RolloutManager } = require('../src/rollout-manager');
const { getCheckoutFlow } = require('../src/checkout');

describe('feature flag rollback', () => {
  it('reverts to old behavior when flag is disabled', async () => {
    // Simulate: flag is on
    const mgr = new RolloutManager({ checkout_v2: { percentage: 100, salt: 'x' } });
    const flow = getCheckoutFlow(mgr);
    
    const resultEnabled = await flow.getCheckoutConfig('user_1');
    expect(resultEnabled.version).toBe('v2');

    // Simulate: flag is rolled back to 0%
    const mgrRolledBack = new RolloutManager({ checkout_v2: { percentage: 0, salt: 'x' } });
    const flowRolledBack = getCheckoutFlow(mgrRolledBack);

    const resultDisabled = await flowRolledBack.getCheckoutConfig('user_1');
    expect(resultDisabled.version).toBe('v1');
  });

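  it('all users in 100% rollout return to v1 after rollback', async () => {
    const before = getCheckoutFlow(new RolloutManager({ checkout_v2: { percentage: 100, salt: 'x' } }));
    const after = getCheckoutFlow(new RolloutManager({ checkout_v2: { percentage: 0, salt: 'x' } }));

    for (let i = 0; i < 100; i++) {
      const userId = `user_${i}`;
      // Every user starts on v2 while the flag is at 100%
      expect((await before.getCheckoutConfig(userId)).version).toBe('v2');
      // The same user is back on v1 once the flag is rolled back to 0%
      expect((await after.getCheckoutConfig(userId)).version).toBe('v1');
    }
  });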
});
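The tests above build a fresh RolloutManager for the rolled-back state, so they cannot catch decisions that were cached before the rollback. The sketch below illustrates that failure mode under the assumption that flag decisions are memoized per user; `CachedFlag`, `setPercentage`, and the constant bucket function are hypothetical and not part of the code above.

```javascript
// Hypothetical caching layer in front of a percentage flag. Decisions are
// memoized per user, so a rollback has to drop the cache as well as
// change the percentage.
class CachedFlag {
  constructor(percentage, bucketOf) {
    this.percentage = percentage;
    this.bucketOf = bucketOf; // maps a userId to a bucket in 1..100
    this.decisions = new Map();
  }

  isEnabled(userId) {
    if (!this.decisions.has(userId)) {
      this.decisions.set(userId, this.bucketOf(userId) <= this.percentage);
    }
    return this.decisions.get(userId);
  }

  setPercentage(percentage) {
    this.percentage = percentage;
    this.decisions.clear(); // without this, a cached "true" survives the rollback
  }
}

// Stand-in for the hash-based bucket: every user lands in bucket 42.
const flag = new CachedFlag(100, () => 42);

console.log(flag.isEnabled('user_1')); // true (42 <= 100)
flag.setPercentage(0); // rollback
console.log(flag.isEnabled('user_1')); // false (42 <= 0 fails)
```

If the system caches decisions anywhere (in process, in a session, or at the edge), a rollback test should evaluate the flag before and after the change on the same instance, as the usage above does.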

Testing Canary Error Rate Detection

A canary deployment with automated analysis halts when the canary's error rate exceeds a threshold relative to the baseline. Test the detection logic:

// src/canary-monitor.js
class CanaryMonitor {
  constructor(options = {}) {
    this.baseline = options.baseline; // { errorRate: 0.02 }
    // Use ?? so an explicit 0 is respected instead of falling back to the default
    this.threshold = options.threshold ?? 2.0; // Halt if canary error rate exceeds 2x baseline
    this.minSamples = options.minSamples ?? 100;
    this.window = { errors: 0, requests: 0 };
  }

  record(success) {
    this.window.requests++;
    if (!success) this.window.errors++;
  }

  shouldHalt() {
    if (this.window.requests < this.minSamples) return false;

    const canaryErrorRate = this.window.errors / this.window.requests;
    const baselineErrorRate = this.baseline.errorRate;

    return canaryErrorRate > baselineErrorRate * this.threshold;
  }

  getMetrics() {
    return {
      requests: this.window.requests,
      errors: this.window.errors,
      errorRate: this.window.requests > 0 ? this.window.errors / this.window.requests : 0,
    };
  }
}

module.exports = { CanaryMonitor };

// test/canary-monitor.test.js
const { CanaryMonitor } = require('../src/canary-monitor');

const baseline = { errorRate: 0.02 }; // 2% baseline error rate

describe('CanaryMonitor', () => {
  it('does not halt below minimum sample size', () => {
    const monitor = new CanaryMonitor({ baseline, threshold: 2.0, minSamples: 100 });

    // Send 50 requests with 50% error rate — not enough samples
    for (let i = 0; i < 50; i++) {
      monitor.record(i % 2 === 0); // alternating success/failure
    }

    expect(monitor.shouldHalt()).toBe(false);
  });

  it('does not halt when canary error rate is near baseline', () => {
    const monitor = new CanaryMonitor({ baseline, threshold: 2.0, minSamples: 100 });

    // 7 failures in 200 requests = 3.5% error rate (1.75x baseline, below the 2x threshold)
    for (let i = 0; i < 200; i++) {
      monitor.record(i % 33 !== 0); // fails at i = 0, 33, 66, 99, 132, 165, 198
    }
    }

    expect(monitor.shouldHalt()).toBe(false);
  });

  it('halts when canary error rate exceeds threshold', () => {
    const monitor = new CanaryMonitor({ baseline, threshold: 2.0, minSamples: 100 });

    // 10% error rate with 200 requests (5x baseline, above 2x threshold)
    for (let i = 0; i < 200; i++) {
      monitor.record(i % 10 !== 0); // 10% error rate
    }

    expect(monitor.shouldHalt()).toBe(true);
  });

  it('tracks metrics correctly', () => {
    const monitor = new CanaryMonitor({ baseline, threshold: 2.0, minSamples: 100 });

    for (let i = 0; i < 10; i++) {
      monitor.record(i < 8); // 2 errors, 8 successes
    }

    const metrics = monitor.getMetrics();
    expect(metrics.requests).toBe(10);
    expect(metrics.errors).toBe(2);
    expect(metrics.errorRate).toBeCloseTo(0.2);
  });
});
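The tests above sit comfortably on either side of the halt line. It can also help to pin down the boundaries themselves, since the comparison is a strict "greater than" and the sample-size gate is a strict "less than". The sketch below restates the same decision rule as a pure function so the edge cases are easy to enumerate; the standalone `shouldHalt` function and `settings` object are hypothetical names for illustration.

```javascript
// Same decision rule as CanaryMonitor.shouldHalt, written as a pure
// function so the boundary cases are easy to enumerate.
function shouldHalt({ errors, requests }, { baselineErrorRate, threshold, minSamples }) {
  if (requests < minSamples) return false;
  return errors / requests > baselineErrorRate * threshold;
}

const settings = { baselineErrorRate: 0.02, threshold: 2.0, minSamples: 100 };

// Halt line: 0.02 * 2.0 = 0.04, i.e. more than 8 errors per 200 requests.
console.log(shouldHalt({ errors: 8, requests: 200 }, settings)); // false (exactly 4%)
console.log(shouldHalt({ errors: 9, requests: 200 }, settings)); // true (4.5%)

// Sample-size gate: 99 failing requests are ignored, the 100th trips the halt.
console.log(shouldHalt({ errors: 99, requests: 99 }, settings)); // false
console.log(shouldHalt({ errors: 100, requests: 100 }, settings)); // true
```

The last pair shows a trade-off in this design: a canary that fails every request still serves `minSamples` requests before it halts. Exact-boundary checks like the 4% case also depend on floating-point equality, so for other baselines it may be safer to test one error below and one error above the line.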

Testing Observability — Variant Tracking

For canary analysis to work, you need reliable data about which users got which version. Test the tracking:

// test/variant-tracking.test.js
const { analyticsClient } = require('../src/analytics');
const { checkoutService } = require('../src/checkout-service');

jest.spyOn(analyticsClient, 'track');

beforeEach(() => {
  analyticsClient.track.mockClear();
});

it('tracks variant assignment when user enters checkout', async () => {
  await checkoutService.initCheckout('user_123', { rolloutEnabled: true });

  expect(analyticsClient.track).toHaveBeenCalledWith(
    'Feature Flag Evaluated',
    expect.objectContaining({
      userId: 'user_123',
      featureKey: 'checkout_v2',
      variant: expect.stringMatching(/^(control|treatment)$/),
    })
  );
});

it('logs the same variant on repeated checkout calls', async () => {
  await checkoutService.initCheckout('user_stable', { rolloutEnabled: true });
  await checkoutService.initCheckout('user_stable', { rolloutEnabled: true });

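  const variants = analyticsClient.track.mock.calls
    .filter(([event]) => event === 'Feature Flag Evaluated')
    .map(([, props]) => props.variant);

  // Assumes one event per evaluation; relax this if tracking is deduplicated
  expect(variants).toHaveLength(2);
  expect(new Set(variants).size).toBe(1); // Always same variant for same user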
});
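Once variant assignments are tracked, the correlation with error rates is a simple grouping step. The sketch below shows one way to compute it, assuming each request is logged as an event carrying a `variant` and a `success` flag; that event shape and `errorRateByVariant` are hypothetical, not an analytics API.

```javascript
// Hypothetical event shape: { variant: 'control' | 'treatment', success: boolean }
function errorRateByVariant(events) {
  const totals = {};
  for (const { variant, success } of events) {
    if (!totals[variant]) totals[variant] = { requests: 0, errors: 0 };
    totals[variant].requests++;
    if (!success) totals[variant].errors++;
  }

  const rates = {};
  for (const [variant, { requests, errors }] of Object.entries(totals)) {
    rates[variant] = errors / requests;
  }
  return rates;
}

const events = [
  { variant: 'control', success: true },
  { variant: 'control', success: true },
  { variant: 'control', success: true },
  { variant: 'control', success: false },
  { variant: 'treatment', success: true },
  { variant: 'treatment', success: false },
  { variant: 'treatment', success: true },
  { variant: 'treatment', success: false },
];

console.log(errorRateByVariant(events)); // { control: 0.25, treatment: 0.5 }
```

A missing or inconsistent `variant` on even a small share of events skews both rates, which is why the tracking tests above matter before any canary comparison is trusted.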

Integration With HelpMeTest Monitoring

Beyond automated tests, continuous monitoring during a rollout provides a safety net. HelpMeTest can run scheduled health checks against production while a rollout is active, testing both the control and treatment paths in parallel and alerting the moment error rates diverge between them.

Summary

Test rollout percentage with large samples (10,000 users) to detect distribution skew
Test monotonicity: users enabled at 10% must stay enabled at 50% when the salt is unchanged
Test version coexistence: old and new code must read/write shared state correctly
Test rollback: disabling the flag must fully restore previous behavior for all users
Test canary error rate detection: the halt must fire only above the threshold and only after the minimum sample size is reached
Test variant tracking: analytics events must fire reliably for canary analysis to work
