Feature Flag Testing Best Practices: Writing Tests That Work With Toggles
Feature flags introduce conditional logic that multiplies the number of code paths your application can take. Without a deliberate testing strategy, you end up with tests that only cover one flag state, silent regressions when flags change, and untested code accumulating behind flags.
The Core Problem
A single boolean flag doubles your code paths. Ten independent boolean flags mean up to 2^10 = 1,024 combinations. You can't test every combination, but you must test every meaningful one.
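To make that growth concrete, the sketch below enumerates every on/off combination for a list of boolean flag keys. The helper name `allFlagCombinations` and the `flag-0` to `flag-9` keys are hypothetical and not part of any flag SDK; the point is only to show why exhaustive coverage is practical for two flags and impractical for ten.

```javascript
// Hypothetical helper: returns every on/off combination for the given flag keys.
// For n boolean flags the result has 2 ** n entries.
function allFlagCombinations(keys) {
  const combos = []
  const total = 2 ** keys.length
  for (let mask = 0; mask < total; mask++) {
    const combo = {}
    keys.forEach((key, i) => {
      // Bit i of the mask decides whether flag i is on in this combination
      combo[key] = Boolean(mask & (1 << i))
    })
    combos.push(combo)
  }
  return combos
}

const twoFlags = allFlagCombinations(['new-checkout-ui', 'express-checkout'])
console.log(twoFlags.length) // 4

const tenFlags = allFlagCombinations(
  Array.from({ length: 10 }, (_, i) => `flag-${i}`)
)
console.log(tenFlags.length) // 1024
```

A generator like this can feed a table-driven test for a small flag set, but for larger sets it is usually better to hand-pick the combinations that matter.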
// This function has 4 meaningful states
function getCheckoutFlow(gb) {
  const newUi = gb.isOn('new-checkout-ui') // 2 states
  const expressCheckout = gb.isOn('express-checkout') // 2 states
  if (newUi && expressCheckout) return 'new-express'
  if (newUi) return 'new-standard'
  if (expressCheckout) return 'legacy-express'
  return 'legacy-standard'
}
Rule 1: Test Every Variant Explicitly
Don't test "with flag on" and assume "with flag off" works. Write explicit test cases for each meaningful state.
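The tests in this article call a `createTestGrowthBook` helper that is never defined. One possible stand-in, sketched here as an assumption and not as the GrowthBook SDK's API, is a small fake that answers `isOn` and `isOff` from a fixed map of forced values and throws on any key the test did not set.

```javascript
// Hypothetical test double, NOT the real GrowthBook client.
// It only implements the isOn/isOff calls used in this article's examples.
function createTestGrowthBook({ forcedFeatureValues = {} } = {}) {
  const values = { ...forcedFeatureValues }
  const lookup = (key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      // Fail loudly so a test never depends on an unset flag by accident
      throw new Error(`Flag "${key}" was not set for this test`)
    }
    return Boolean(values[key])
  }
  return {
    isOn: (key) => lookup(key),
    isOff: (key) => !lookup(key),
  }
}

const gb = createTestGrowthBook({
  forcedFeatureValues: { 'new-checkout-ui': true, 'express-checkout': false },
})
console.log(gb.isOn('new-checkout-ui')) // true
console.log(gb.isOn('express-checkout')) // false
```

Throwing on unset keys is a design choice that forces every test to state its flag state. Returning `false` for unknown keys, to mirror an off-by-default flag, would be an equally reasonable variant.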
describe('getCheckoutFlow', () => {
  const states = [
    { newUi: false, express: false, expected: 'legacy-standard' },
    { newUi: true, express: false, expected: 'new-standard' },
    { newUi: false, express: true, expected: 'legacy-express' },
    { newUi: true, express: true, expected: 'new-express' },
  ]
  test.each(states)(
    'newUi=$newUi express=$express → $expected',
    ({ newUi, express, expected }) => {
      const gb = createTestGrowthBook({
        forcedFeatureValues: {
          'new-checkout-ui': newUi,
          'express-checkout': express,
        }
      })
      expect(getCheckoutFlow(gb)).toBe(expected)
    }
  )
})
Rule 2: Name Tests by Flag State
Include the flag state in the test name. This makes failures immediately clear:
// Bad: unclear which variant failed
test('checkout renders correctly', ...)
// Good: the name states the variant
test('checkout renders correctly [new-ui=ON express=OFF]', ...)
test('checkout renders correctly [new-ui=OFF express=ON]', ...)
Rule 3: Default to the Production State
Most flags default to false (off) in production until they're fully rolled out. Tests that don't explicitly set a flag should run against those same defaults, so that unflagged tests exercise what production users actually see.
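When overrides are merged over a defaults table, a misspelled flag key is silently accepted and the test quietly runs against the default state. A sketch of a stricter merge follows; the `mergeFlagOverrides` helper and the two-flag defaults table are hypothetical and shown only for illustration.

```javascript
// Hypothetical helper: merges per-test overrides over a defaults table and
// rejects keys that the defaults table does not know about.
function mergeFlagOverrides(defaults, overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!Object.prototype.hasOwnProperty.call(defaults, key)) {
      throw new Error(`Unknown flag "${key}" in test overrides`)
    }
  }
  return { ...defaults, ...overrides }
}

// Illustrative defaults table mirroring production (everything off)
const defaults = { 'new-checkout-ui': false, 'express-checkout': false }

const flags = mergeFlagOverrides(defaults, { 'express-checkout': true })
console.log(flags['new-checkout-ui']) // false
console.log(flags['express-checkout']) // true

try {
  mergeFlagOverrides(defaults, { 'expres-checkout': true }) // typo in the key
} catch (err) {
  console.log(err.message) // Unknown flag "expres-checkout" in test overrides
}
```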
// Define production defaults in a shared fixture
export const PRODUCTION_FLAG_DEFAULTS = {
  'new-checkout-ui': false,
  'express-checkout': false,
  'new-pricing': false,
}
// Use in your test helper
export function createTestClient(overrides = {}) {
  return createGrowthBook({
    forcedFeatureValues: {
      ...PRODUCTION_FLAG_DEFAULTS,
      ...overrides,
    }
  })
}
Rule 4: Test the Removal Path
When a flag is fully rolled out and you're removing it, write tests that verify the "always on" behavior before deleting the flag code. This prevents regressions when someone accidentally re-creates the flag as disabled.
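For pure functions such as the `getCheckoutFlow` example from earlier, one way to verify the always-on behavior is to compare the flagged implementation, with the flag forced on, against the flag-free version that will replace it. The sketch below is written under assumptions: `getCheckoutFlowAfterCleanup` and the plain-object `fakeGb` client are hypothetical names introduced for illustration.

```javascript
// Flagged implementation, as it exists before cleanup
function getCheckoutFlow(gb) {
  const newUi = gb.isOn('new-checkout-ui')
  const expressCheckout = gb.isOn('express-checkout')
  if (newUi && expressCheckout) return 'new-express'
  if (newUi) return 'new-standard'
  if (expressCheckout) return 'legacy-express'
  return 'legacy-standard'
}

// Hypothetical replacement once 'new-checkout-ui' is removed:
// the legacy branches are gone and only the remaining flag is read.
function getCheckoutFlowAfterCleanup(gb) {
  return gb.isOn('express-checkout') ? 'new-express' : 'new-standard'
}

// Minimal fake client for illustration; not a real SDK object
const fakeGb = (flags) => ({ isOn: (key) => Boolean(flags[key]) })

// The two implementations must agree for every state of the remaining flag
const mismatches = [false, true].filter((express) => {
  const before = getCheckoutFlow(
    fakeGb({ 'new-checkout-ui': true, 'express-checkout': express })
  )
  const after = getCheckoutFlowAfterCleanup(
    fakeGb({ 'express-checkout': express })
  )
  return before !== after
})
console.log(mismatches.length) // 0
```

Once the comparison passes, the flagged implementation can be deleted and the assertions kept against the flag-free function alone.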
// Before removing 'new-checkout-ui' flag
test('checkout uses new UI (flag cleanup verification)', () => {
  // This test should pass with flag removed
  renderCheckout()
  expect(screen.getByTestId('checkout-v2')).toBeInTheDocument()
})
Rule 5: Test Flag Evaluation Order
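If your application evaluates multiple flags, the order matters for both performance and correctness. A cheap gate should short-circuit before an expensive one is ever evaluated, and a test can pin that behavior down.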
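A minimal sketch of a function with this short-circuit behavior is shown here. The implementation of `getFeatureSet` and the `'full'` and `'basic'` return values are assumptions made for illustration, and the `counted` wrapper is a hand-rolled stand-in for a mock so the example runs without a test framework.

```javascript
// Hypothetical implementation: the expensive gate is only consulted
// when the cheap gate has already passed.
function getFeatureSet({ cheapGate, expensiveGate }) {
  if (!cheapGate()) return 'default'
  return expensiveGate() ? 'full' : 'basic'
}

// Hand-rolled call counter standing in for a mock function
function counted(fn) {
  const wrapper = (...args) => {
    wrapper.calls += 1
    return fn(...args)
  }
  wrapper.calls = 0
  return wrapper
}

const cheapGate = counted(() => false)
const expensiveGate = counted(() => true)

console.log(getFeatureSet({ cheapGate, expensiveGate })) // default
console.log(cheapGate.calls) // 1
console.log(expensiveGate.calls) // 0
```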
// vi is Vitest's mock helper; with Jest, use jest.fn() instead
import { expect, test, vi } from 'vitest'
test('does not evaluate expensive flag when cheap gate fails', () => {
  const cheapGate = vi.fn().mockReturnValue(false)
  const expensiveGate = vi.fn()
  const result = getFeatureSet({ cheapGate, expensiveGate })
  expect(cheapGate).toHaveBeenCalled()
  expect(expensiveGate).not.toHaveBeenCalled()
  expect(result).toBe('default')
})
CI Strategy: Matrix Testing
Use CI matrix builds to run your full test suite against multiple flag configurations:
# .github/workflows/test.yml
jobs:
  test:
    strategy:
      matrix:
        flag-set:
          - name: all-off
            NEW_CHECKOUT_UI: false
            EXPRESS_CHECKOUT: false
          - name: all-on
            NEW_CHECKOUT_UI: true
            EXPRESS_CHECKOUT: true
          - name: partial
            NEW_CHECKOUT_UI: true
            EXPRESS_CHECKOUT: false
    name: Tests (${{ matrix.flag-set.name }})
    runs-on: ubuntu-latest
    env:
      FEATURE_NEW_CHECKOUT_UI: ${{ matrix.flag-set.NEW_CHECKOUT_UI }}
      FEATURE_EXPRESS_CHECKOUT: ${{ matrix.flag-set.EXPRESS_CHECKOUT }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
Your application then reads these environment variables to override flags in test mode:
// flag-overrides.js (test environment only)
export function getTestOverrides() {
  if (process.env.NODE_ENV !== 'test') return {}
  return {
    'new-checkout-ui': process.env.FEATURE_NEW_CHECKOUT_UI === 'true',
    'express-checkout': process.env.FEATURE_EXPRESS_CHECKOUT === 'true',
  }
}
Preventing Flag Debt
Feature flags that outlive their usefulness become liabilities. Track them:
// flag-registry.js
export const FLAGS = {
  NEW_CHECKOUT_UI: {
    key: 'new-checkout-ui',
    description: 'New single-page checkout flow',
    createdAt: '2024-01-15',
    plannedRemovalAt: '2024-03-01',
    owner: 'checkout-team',
  },
  EXPRESS_CHECKOUT: {
    key: 'express-checkout',
    description: 'One-click checkout for returning customers',
    createdAt: '2024-02-01',
    plannedRemovalAt: '2024-04-01',
    owner: 'checkout-team',
  },
}
// Automated staleness check in tests
test('no flags past their planned removal date', () => {
  const now = new Date()
  const staleFlags = Object.values(FLAGS).filter(f => {
    if (!f.plannedRemovalAt) return false
    return new Date(f.plannedRemovalAt) < now
  })
  if (staleFlags.length > 0) {
    const names = staleFlags.map(f => f.key).join(', ')
    throw new Error(`Stale flags need removal: ${names}`)
  }
})
Testing Flag Interactions with Database State
Some flags interact with database schema or migrations. Test these explicitly:
test('new checkout flag works with both old and new order schema', async () => {
  // Create order with old schema format
  const oldOrder = await db.orders.create({ ...legacyOrderData })
  // New checkout flag should still work with legacy data
  const gb = createTestGrowthBook({ forcedFeatureValues: { 'new-checkout-ui': true } })
  const result = await renderCheckoutSummary(oldOrder.id, gb)
  expect(result.status).toBe(200)
  expect(result.body.total).toBe(oldOrder.total)
})
Summary
Feature flag testing requires explicit coverage of each flag state, not just the happy path. Follow these principles:
- Write test cases for each variant combination that matters
- Name tests to include flag state
- Establish production defaults in a shared test fixture
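- Write cleanup tests before removing flags
- Assert evaluation order where a cheap gate should short-circuit an expensive one
- Use CI matrix builds to catch flag interaction regressions
- Track each flag's owner and planned removal date so stale flags fail a test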
The goal is that any flag state change (enable, disable, or remove) immediately surfaces which tests are affected and what behavior is expected.