Feature Flag Testing Best Practices: Writing Tests That Work With Toggles

Feature flags introduce conditional logic that multiplies the number of code paths your application can take. Without a deliberate testing strategy, you end up with tests that only cover one flag state, silent regressions when flags change, and untested code accumulating behind flags.

The Core Problem

Each boolean flag can double the number of code paths. Ten independent flags mean up to 2^10 = 1,024 combinations. You can't test them all, but you must test every combination your code actually branches on, such as the four states below.

// This function has 4 meaningful states
function getCheckoutFlow(gb) {
  const newUi = gb.isOn('new-checkout-ui')          // 2 states
  const expressCheckout = gb.isOn('express-checkout') // 2 states
  
  if (newUi && expressCheckout) return 'new-express'
  if (newUi) return 'new-standard'
  if (expressCheckout) return 'legacy-express'
  return 'legacy-standard'
}

Rule 1: Test Every Variant Explicitly

Don't test "with flag on" and assume "with flag off" works. Write explicit test cases for each meaningful state.

describe('getCheckoutFlow', () => {
  const states = [
    { newUi: false, express: false, expected: 'legacy-standard' },
    { newUi: true,  express: false, expected: 'new-standard' },
    { newUi: false, express: true,  expected: 'legacy-express' },
    { newUi: true,  express: true,  expected: 'new-express' },
  ]
  
  test.each(states)(
    'newUi=$newUi express=$express → $expected',
    ({ newUi, express, expected }) => {
      const gb = createTestGrowthBook({
        forcedFeatureValues: {
          'new-checkout-ui': newUi,
          'express-checkout': express,
        }
      })
      expect(getCheckoutFlow(gb)).toBe(expected)
    }
  )
})
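When the flags a function reads genuinely interact, the state table can be generated rather than typed out, so no combination is forgotten. The sketch below is a hypothetical helper (`flagCombinations` is not part of any library). Each value of a counter from 0 to 2^n - 1 encodes one combination, with bit i holding the state of the i-th flag. It only produces the inputs: the expected result for each state still has to be written down, and 2^n grows quickly, so pass only the flags the code under test actually reads.

```javascript
// Hypothetical helper: every on/off combination of the given boolean flags,
// as an array of objects that test.each() can consume directly
function flagCombinations(keys) {
  const combos = []
  // Each integer 0 .. 2^n - 1 encodes one combination: bit i holds the state of keys[i]
  for (let mask = 0; mask < 2 ** keys.length; mask++) {
    const combo = {}
    keys.forEach((key, i) => {
      combo[key] = (mask & (1 << i)) !== 0
    })
    combos.push(combo)
  }
  return combos
}

const states = flagCombinations(['new-checkout-ui', 'express-checkout'])
console.log(states.length) // 4
console.log(states[1]) // new-checkout-ui on, express-checkout off
```

For the two checkout flags this yields the same four states, in the same order, as the hand-written table above.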

Rule 2: Name Tests by Flag State

Include the flag state in the test name. This makes failures immediately clear:

// Bad — unclear which variant failed
test('checkout renders correctly', ...)

// Good — clear which variant
test('checkout renders correctly [new-ui=ON express=OFF]', ...)
test('checkout renders correctly [new-ui=OFF express=ON]', ...)
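Hand-written suffixes can drift out of sync with the flags a test actually sets. A small hypothetical formatter (`flagLabel` is not a test-runner API) can derive the label from the same object the test passes to its flag overrides:

```javascript
// Hypothetical helper: format a flag-state object as a test-name suffix
function flagLabel(flags) {
  const parts = Object.entries(flags).map(
    ([key, on]) => `${key}=${on ? 'ON' : 'OFF'}`
  )
  return `[${parts.join(' ')}]`
}

const flags = { 'new-ui': true, express: false }
// e.g. test(`checkout renders correctly ${flagLabel(flags)}`, ...)
console.log(`checkout renders correctly ${flagLabel(flags)}`)
// prints "checkout renders correctly [new-ui=ON express=OFF]"
```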

Rule 3: Default to the Production State

Most flags default to false (off) in production until they're fully rolled out. Tests that don't explicitly set a flag should see that same production default, so define the defaults once and build every test client from them.

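// test-flags.js: production defaults in a shared fixture
import { GrowthBook } from '@growthbook/growthbook'

export const PRODUCTION_FLAG_DEFAULTS = {
  'new-checkout-ui': false,
  'express-checkout': false,
  'new-pricing': false,
}

// Test helper used by the other examples: flags a test doesn't mention
// get their production default; explicitly forced values win
export function createTestGrowthBook({ forcedFeatureValues = {} } = {}) {
  const gb = new GrowthBook()
  gb.setForcedFeatures(
    new Map(Object.entries({ ...PRODUCTION_FLAG_DEFAULTS, ...forcedFeatureValues }))
  )
  return gb
}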
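A spread-based merge like the helper above has one gap: an override with a misspelled key (say `'new-chekout-ui'`) forces a flag nobody reads, the real flag stays at its default, and the test quietly exercises the wrong variant. A hypothetical stricter merge, sketched below as a plain function the helper could call before forcing values, rejects any key that has no production default:

```javascript
// Hypothetical stricter merge: every override must name a flag that has a
// production default, so a typo fails the test instead of being ignored
function mergeFlagOverrides(defaults, overrides = {}) {
  const unknown = Object.keys(overrides).filter(
    (key) => !Object.prototype.hasOwnProperty.call(defaults, key)
  )
  if (unknown.length > 0) {
    throw new Error(`Unknown flag override(s): ${unknown.join(', ')}`)
  }
  return { ...defaults, ...overrides }
}

const defaults = { 'new-checkout-ui': false, 'express-checkout': false }

// Known key: merged normally (express-checkout becomes true)
console.log(mergeFlagOverrides(defaults, { 'express-checkout': true }))

// Misspelled key: throws "Unknown flag override(s): new-chekout-ui"
try {
  mergeFlagOverrides(defaults, { 'new-chekout-ui': true })
} catch (err) {
  console.log(err.message)
}
```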

Rule 4: Test the Removal Path

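When a flag is fully rolled out and you're about to remove it, first write a test that asserts the "always on" behavior without forcing the flag. Because "on" is now what production serves, flip the flag's entry in the shared production defaults to true so the test passes before the cleanup; after the flag code is deleted, the same test must still pass. It also catches a later regression where someone reintroduces the flag check with the flag off by default.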

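// Before removing the 'new-checkout-ui' flag
import { screen } from '@testing-library/react'

test('checkout uses new UI (flag cleanup verification)', () => {
  // No flag is forced here: this must pass both before and after the flag code is deleted
  renderCheckout() // app-specific render helper
  expect(screen.getByTestId('checkout-v2')).toBeInTheDocument() // matcher from @testing-library/jest-dom
})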

Rule 5: Test Flag Evaluation Order

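When one flag gates another, evaluate the cheap gate first and skip the rest when it fails. Order matters for performance (an expensive check, such as a remote lookup, should not run needlessly) and for correctness (a dependent flag should not be consulted when its parent is off). Test the order directly with mocks: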

test('does not evaluate expensive flag when cheap gate fails', () => {
  const cheapGate = vi.fn().mockReturnValue(false)
  const expensiveGate = vi.fn()
  
  const result = getFeatureSet({ cheapGate, expensiveGate })
  
  expect(cheapGate).toHaveBeenCalled()
  expect(expensiveGate).not.toHaveBeenCalled()
  expect(result).toBe('default')
})
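The test assumes a `getFeatureSet` that short-circuits. A minimal hypothetical version is sketched below; the `'full'` return value and the gate signatures (zero-argument functions returning a boolean) are assumptions, not part of any SDK. JavaScript's `&&` never evaluates its right operand when the left one is falsy, which is exactly the behavior the test pins down.

```javascript
// Hypothetical implementation matching the test above: the cheap gate is
// checked first, and the expensive gate only runs when the cheap one passes
function getFeatureSet({ cheapGate, expensiveGate }) {
  // && short-circuits: expensiveGate() is never called when cheapGate() is false
  if (cheapGate() && expensiveGate()) {
    return 'full'
  }
  return 'default'
}

let expensiveCalls = 0
const result = getFeatureSet({
  cheapGate: () => false,
  expensiveGate: () => {
    expensiveCalls++
    return true
  },
})
console.log(result, expensiveCalls) // prints "default 0"
```

If someone reorders the condition to `expensiveGate() && cheapGate()`, the return value is unchanged but `expensiveGate` now runs, and the `not.toHaveBeenCalled()` assertion fails. That is why the order is tested separately from the output.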

CI Strategy: Matrix Testing

Use CI matrix builds to run your full test suite against multiple flag configurations:

# .github/workflows/test.yml
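jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let every flag configuration report, not just the first failure
      matrix:
        flag-set:
          - name: all-off
            NEW_CHECKOUT_UI: false
            EXPRESS_CHECKOUT: false
          - name: all-on
            NEW_CHECKOUT_UI: true
            EXPRESS_CHECKOUT: true
          - name: partial
            NEW_CHECKOUT_UI: true
            EXPRESS_CHECKOUT: false

    name: Tests (${{ matrix.flag-set.name }})
    env:
      NODE_ENV: test
      FEATURE_NEW_CHECKOUT_UI: ${{ matrix.flag-set.NEW_CHECKOUT_UI }}
      FEATURE_EXPRESS_CHECKOUT: ${{ matrix.flag-set.EXPRESS_CHECKOUT }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm test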

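In test mode, the application reads those environment variables and turns them into flag overrides. Skip any variable that isn't set; otherwise a local run without the matrix variables would silently force every listed flag off, even one whose production default is on:

// flag-overrides.js (test environment only)
const ENV_VARS = {
  'new-checkout-ui': 'FEATURE_NEW_CHECKOUT_UI',
  'express-checkout': 'FEATURE_EXPRESS_CHECKOUT',
}

export function getTestOverrides() {
  if (process.env.NODE_ENV !== 'test') return {}

  const overrides = {}
  for (const [flag, envVar] of Object.entries(ENV_VARS)) {
    // Only override flags whose variable is actually set
    if (process.env[envVar] !== undefined) {
      overrides[flag] = process.env[envVar] === 'true'
    }
  }
  return overrides
}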
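The matrix should only change tests that leave a flag unset, which depends on merging the layers in the right order. A minimal sketch of that precedence, assuming the values from `getTestOverrides()` sit between the shared production defaults and whatever an individual test forces (`resolveTestFlags` is a hypothetical name):

```javascript
// Hypothetical precedence for test flag values, lowest to highest:
// production defaults < CI matrix overrides < values forced by the individual test
function resolveTestFlags(defaults, matrixOverrides = {}, testOverrides = {}) {
  return { ...defaults, ...matrixOverrides, ...testOverrides }
}

const defaults = { 'new-checkout-ui': false, 'express-checkout': false }
const matrix = { 'new-checkout-ui': true } // e.g. what getTestOverrides() returns in one CI job

// A test that pins nothing runs under whatever the matrix job sets
console.log(resolveTestFlags(defaults, matrix)['new-checkout-ui']) // true

// A test that pins a flag ignores the matrix for that flag
console.log(resolveTestFlags(defaults, matrix, { 'new-checkout-ui': false })['new-checkout-ui']) // false
```

Explicit variant tests such as the `getCheckoutFlow` table therefore stay deterministic in every matrix job, while every other test runs once per configuration.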

Preventing Flag Debt

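Flags that outlive their rollout become liabilities: dead branches that still have to be read, tested, and kept working. Give each flag an owner and a planned removal date in a registry, and let the test suite enforce the dates: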

// flag-registry.js
export const FLAGS = {
  NEW_CHECKOUT_UI: {
    key: 'new-checkout-ui',
    description: 'New single-page checkout flow',
    createdAt: '2024-01-15',
    plannedRemovalAt: '2024-03-01',
    owner: 'checkout-team',
  },
  EXPRESS_CHECKOUT: {
    key: 'express-checkout',
    description: 'One-click checkout for returning customers',
    createdAt: '2024-02-01',
    plannedRemovalAt: '2024-04-01',
    owner: 'checkout-team',
  },
}

// Automated staleness check in tests
test('no flags past their planned removal date', () => {
  const now = new Date()
  const staleFlags = Object.values(FLAGS).filter(f => {
    if (!f.plannedRemovalAt) return false
    return new Date(f.plannedRemovalAt) < now
  })
  
  if (staleFlags.length > 0) {
    const names = staleFlags.map(f => f.key).join(', ')
    throw new Error(`Stale flags need removal: ${names}`)
  }
})
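Because the staleness test reads the real clock, its filtering logic is hard to check on its own. One option, sketched here with a hypothetical `findStaleFlags`, is to move the date comparison into a pure function that takes `now` as a parameter. The suite can then assert `expect(findStaleFlags(FLAGS, new Date())).toEqual([])`, whose failure diff lists the offending keys, while the function itself is tested with fixed dates.

```javascript
// Hypothetical pure version of the staleness check: the clock is passed in
function findStaleFlags(flags, now) {
  return Object.values(flags)
    // Flags without a planned removal date are never reported
    .filter((f) => f.plannedRemovalAt && new Date(f.plannedRemovalAt) < now)
    .map((f) => f.key)
}

const flags = {
  NEW_CHECKOUT_UI: { key: 'new-checkout-ui', plannedRemovalAt: '2024-03-01' },
  EXPRESS_CHECKOUT: { key: 'express-checkout', plannedRemovalAt: '2024-04-01' },
  NO_DATE: { key: 'no-date' },
}

console.log(findStaleFlags(flags, new Date('2024-03-15')))
// prints [ 'new-checkout-ui' ]
```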

Testing Flag Interactions with Database State

Some flags ship before a migration has backfilled existing rows, so the flagged code path must handle records in both the old and the new schema. Test both explicitly:

// legacyOrderData and currentOrderData (hypothetical) are fixtures in the old and new order schema
test.each([
  ['legacy', legacyOrderData],
  ['current', currentOrderData],
])('new checkout flag works with %s order schema', async (_schema, orderData) => {
  const order = await db.orders.create({ ...orderData })

  // The new checkout flow must handle orders written in either format
  const gb = createTestGrowthBook({ forcedFeatureValues: { 'new-checkout-ui': true } })
  const result = await renderCheckoutSummary(order.id, gb)

  expect(result.status).toBe(200)
  expect(result.body.total).toBe(order.total)
})

Summary

Feature flag testing requires explicit coverage of each flag state, not just the happy path. Follow these principles:

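  1. Write test cases for each variant combination that matters
  2. Name tests to include flag state
  3. Establish production defaults in a shared test fixture
  4. Write cleanup tests before removing flags
  5. Test that flag evaluation order skips unnecessary checks
  6. Use CI matrix builds to catch flag interaction regressions
  7. Track flags with owners and removal dates, and fail the build when they go stale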

The goal is that any flag state change — enable, disable, remove — should immediately surface which tests are affected and what behavior is expected.
