Feature Flag Testing Best Practices with LaunchDarkly and Unleash

Feature flags (feature toggles) let you deploy code to production without activating it for users. They enable trunk-based development, gradual rollouts, and instant kill switches for problematic features. But they add complexity to testing: every flag creates branching code paths that each need to be tested.

This guide covers best practices for testing with feature flags, using LaunchDarkly and Unleash as reference implementations.

The Testing Challenge Feature Flags Create

A single boolean flag doubles the number of code paths through the section it guards. Ten independent boolean flags create 2^10 = 1024 possible combinations. You can't test all of them, so you need a strategy.
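To make the doubling concrete, here is a minimal sketch that enumerates every on/off assignment for a list of flag keys. The function name `enumerateCombinations` is an illustration only and is not part of either SDK.

```javascript
// Enumerate every on/off assignment for a list of boolean flag keys.
function enumerateCombinations(flagKeys) {
  let combos = [{}];
  for (const key of flagKeys) {
    // Each additional flag doubles the list: one copy with it off, one with it on.
    combos = combos.flatMap((combo) => [
      { ...combo, [key]: false },
      { ...combo, [key]: true },
    ]);
  }
  return combos;
}

console.log(enumerateCombinations(['new-checkout-flow']).length); // 2
console.log(
  enumerateCombinations(['new-checkout-flow', 'dark-mode', 'beta-api']).length
); // 8
```

With ten keys the same call returns 1024 objects, which is the matrix the rest of this guide tries to avoid running.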

The core challenge is that flags create state that lives outside your codebase. Your automated test suite can test specific flag combinations, but production behavior depends on flag values controlled by a remote service. A flag that worked in testing might behave differently when its rollout percentage changes.

Testing Strategy for Feature Flags

Test Each State, Not All Combinations

Test each flag in both states (on and off), but don't try to test all combinations:

  1. Flag off — the control path (what happens without the feature)
  2. Flag on — the treatment path (what happens with the feature)
  3. Critical combinations — if two flags interact (one builds on the other), test those explicit interactions

Don't chase the combinatorial explosion of all possible states. Flags should be designed to be independent. Where two flags do interact, cover that pair explicitly as in item 3, and treat any wider entanglement as a design problem to fix, not a testing problem to solve.
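One way to turn this strategy into a test matrix is sketched below. It assumes flags are booleans with known defaults; `buildTestMatrix` and the case names are hypothetical helpers, not SDK features.

```javascript
// Build the flag states worth testing: a baseline with every flag at its
// default, each flag flipped on its own, and any hand-picked combinations.
function buildTestMatrix(defaults, criticalCombinations = []) {
  const cases = [{ name: 'baseline', flags: { ...defaults } }];

  for (const key of Object.keys(defaults)) {
    const flipped = !defaults[key];
    cases.push({
      name: `${key}: ${flipped ? 'ON' : 'OFF'}`,
      flags: { ...defaults, [key]: flipped },
    });
  }

  for (const combo of criticalCombinations) {
    cases.push({
      name: `combination: ${Object.keys(combo).join(' + ')}`,
      flags: { ...defaults, ...combo },
    });
  }
  return cases;
}

const matrix = buildTestMatrix(
  { 'new-checkout-flow': false, 'dark-mode': false, 'beta-api': false },
  [{ 'new-checkout-flow': true, 'beta-api': true }]
);
console.log(matrix.length); // 5 cases instead of 2^3 = 8
```

The matrix grows linearly with the number of flags, plus one case per interaction you choose to list.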

Override Flags in Tests

The key to testing with feature flags is the ability to override flag values without going to the flag management service. Both LaunchDarkly and Unleash support this.
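The later examples in this guide call a `flags` helper with `set` and `variation` methods. A minimal in-memory sketch of such a helper might look like the following; `createTestFlags` and its methods are assumptions made for illustration, and real projects would usually wrap the vendor SDK behind the same small interface.

```javascript
// A tiny in-memory flag source for tests.
// Overrides set during a test win over defaults; reset() clears them.
function createTestFlags(defaults = {}) {
  const defaultValues = new Map(Object.entries(defaults));
  const overrides = new Map();

  return {
    set(key, value) {
      overrides.set(key, value);
    },
    variation(key, fallback) {
      if (overrides.has(key)) return overrides.get(key);
      if (defaultValues.has(key)) return defaultValues.get(key);
      return fallback; // Unknown flag: behave like an SDK default value
    },
    reset() {
      overrides.clear();
    },
  };
}

const flags = createTestFlags({ 'new-checkout-flow': false });
flags.set('new-checkout-flow', true);
console.log(flags.variation('new-checkout-flow', false)); // true
flags.reset();
console.log(flags.variation('new-checkout-flow', false)); // false
```

Calling `reset()` in an `afterEach` hook keeps one test's overrides from leaking into the next.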

LaunchDarkly — Test Overrides

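For client-side code, inject a stub that implements the part of the client interface your app uses:

// In tests, use a stub in place of the real LDClient.
// Only variation() and allFlags() are stubbed; add other methods as your app needs them.
const flagValues = {
  'new-checkout-flow': true,
  'dark-mode': false,
  'beta-api': false,
};

const mockClient = {
  variation: (key, defaultValue) => flagValues[key] ?? defaultValue,
  allFlags: () => ({ ...flagValues }),
};

// Inject the stub instead of the real SDK client
const app = createApp({ ldClient: mockClient });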

For server-side SDKs (Node.js):

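import { init, integrations } from '@launchdarkly/node-server-sdk';

// TestData is an in-memory data source, so flag values never come from LaunchDarkly
const td = new integrations.TestData();
const client = init('sdk-key', { updateProcessor: td.getFactory(), sendEvents: false });

// Override flags per-test (update() returns a promise, so await it in an async test or hook)
await td.update(td.flag('new-checkout-flow').booleanFlag().variationForAll(true));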

Unleash — Test Overrides

The Unleash Node client can be seeded with in-memory flag definitions for testing:

import { Unleash, InMemoryStorageProvider } from 'unleash-client';

const client = new Unleash({
  url: 'http://localhost',  // Required option; the flag values below come from bootstrap
  appName: 'test',
  customHeaders: {},
  storageProvider: new InMemoryStorageProvider(),
  disableMetrics: true,
  // Seed flag definitions locally instead of relying on a server
  bootstrap: {
    data: [
      { name: 'new-checkout-flow', enabled: true, strategies: [{ name: 'default' }] },
      { name: 'dark-mode', enabled: false, strategies: [] },
    ],
  },
});

For simpler override patterns, use the UNLEASH_TOGGLE_* environment variables:

UNLEASH_TOGGLE_NEW_CHECKOUT_FLOW=true npm test

Separation of Concerns in Tests

Write tests that explicitly state which flag state they assume:

describe('Checkout', () => {
  describe('with new-checkout-flow: ON', () => {
    beforeEach(() => {
      flags.set('new-checkout-flow', true);
    });

    it('shows the new payment step', () => {
      // Test new behavior
    });

    it('uses the new order confirmation email', () => {
      // Test new behavior
    });
  });

  describe('with new-checkout-flow: OFF', () => {
    beforeEach(() => {
      flags.set('new-checkout-flow', false);
    });

    it('uses the legacy checkout flow', () => {
      // Test old behavior (still needed until flag is removed)
    });
  });
});

This makes it clear which tests apply to which code path and makes it easy to delete the "OFF" tests when the flag is removed.

Flag Lifecycle and Testing

Feature flags should have a defined lifecycle to avoid accumulating technical debt.

Phase 1: Development (Flag Off by Default)

The flag is off for everyone in production. Existing tests run with the flag off, which is the stable path.

Test the flag-on path with explicit overrides:

it('shows beta feature when flag is on', () => {
  flags.set('beta-feature', true);
  render(<App />);
  expect(screen.getByText('Beta Feature')).toBeInTheDocument();
});

Phase 2: Rollout (Percentage-Based)

The flag is gradually rolling out to users. At 10%, roughly 90% of users still get the off path.

Your automated tests should still test both paths explicitly. Don't rely on the rollout percentage for test coverage — you need deterministic control.
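The sketch below shows why a percentage is a poor test lever. It assumes a rollout assigns each user a stable bucket by hashing the flag key and user key together. The hash here is a simple FNV-1a-style function chosen for illustration, not the algorithm LaunchDarkly or Unleash actually uses, and the function names are hypothetical.

```javascript
// FNV-1a-style 32-bit hash, used only to get a stable number from a string.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // Convert to an unsigned 32-bit integer
}

// Map a (flag, user) pair to a stable bucket in the range 0..99.
function bucketFor(flagKey, userKey) {
  return fnv1a(`${flagKey}:${userKey}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function isInRollout(flagKey, userKey, percentage) {
  return bucketFor(flagKey, userKey) < percentage;
}

// The same user always lands in the same bucket, but which bucket that is
// depends on the hash, so a test cannot choose its path via the percentage.
console.log(bucketFor('new-checkout-flow', 'test-user-001'));
console.log(isInRollout('new-checkout-flow', 'test-user-001', 10));
```

Because the outcome for a given user is fixed by the hash rather than by the test, explicit overrides or targeting rules are the only reliable way to force a path.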

For end-to-end tests, use user targeting to ensure your test user always sees the flag on or off:

LaunchDarkly targeting rule:

  • "If user key is 'test-user-001', serve TRUE"

Unleash targeting (userId strategy):

{
  "name": "userWithId",
  "parameters": {
    "userIds": "test-user-001"
  }
}

This makes your E2E test environment deterministic regardless of rollout percentage.

Phase 3: Full Rollout (Flag On for All)

The flag is on for everyone, so the off path and its tests are now dead code. This is the technical debt moment: those tests keep passing because the code they exercise still exists, but they cover behavior users can no longer reach.

Remove the dead code and the tests for it as soon as the flag reaches full rollout. Schedule this immediately — don't let it drift.

Phase 4: Flag Removal

Remove the flag from the codebase:

  1. Delete every check of the flag (the SDK call) along with the off-path code it guarded
  2. Delete all tests that explicitly override the flag to off
  3. Clean up the flag in LaunchDarkly/Unleash
  4. The previously "flag on" tests now just become regular tests — remove the flag override from their setup
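A small check can confirm that the removal steps above left nothing behind. This sketch assumes you have already loaded source files into a map of path to contents (for example with your build tooling); `findFlagReferences` and the file paths are hypothetical names used for illustration.

```javascript
// Return the paths of files whose contents still mention the flag key
// as a quoted string literal.
function findFlagReferences(files, flagKey) {
  const literals = [
    "'" + flagKey + "'",
    '"' + flagKey + '"',
    '`' + flagKey + '`',
  ];
  return Object.entries(files)
    .filter(([, source]) => literals.some((literal) => source.includes(literal)))
    .map(([path]) => path)
    .sort();
}

const files = {
  'src/checkout.js': "if (flags.variation('new-checkout-flow', false)) { /* ... */ }",
  'src/theme.js': "flags.variation('dark-mode', false);",
  'test/checkout.test.js': 'flags.set("new-checkout-flow", false);',
};

console.log(findFlagReferences(files, 'new-checkout-flow'));
// [ 'src/checkout.js', 'test/checkout.test.js' ]
```

Run as part of CI after a flag is deleted, an empty result is a reasonable signal that both the code and its test setup are clean.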

Testing Flag Targeting Rules

When flags use targeting rules (% rollout, user attributes, environment), test the rules themselves:

it('enables flag for users in beta program', () => {
  // Set up user with beta attribute
  const user = { key: 'user-123', custom: { beta: true } };
  flags.identify(user);
  
  expect(flags.variation('beta-feature', false)).toBe(true);
});

it('keeps flag disabled for standard users', () => {
  const user = { key: 'user-456', custom: { beta: false } };
  flags.identify(user);
  
  expect(flags.variation('beta-feature', false)).toBe(false);
});

Testing targeting rules catches misconfigured rollout logic before it affects real users.
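If your targeting rules are also available as data (exported from the flag service or mirrored in the repository), they can be exercised without a network call. The rule shape and `evaluateRules` below are assumptions made for this sketch; neither service stores rules in exactly this form.

```javascript
// Return the variation of the first rule that matches the user,
// or the fallback when no rule matches.
function evaluateRules(rules, user, fallback) {
  for (const rule of rules) {
    // 'key' targets the user key itself; anything else is a custom attribute.
    const actual =
      rule.attribute === 'key' ? user.key : user.custom?.[rule.attribute];
    if (rule.values.includes(actual)) return rule.variation;
  }
  return fallback;
}

const betaFeatureRules = [
  { attribute: 'key', values: ['test-user-001'], variation: true },
  { attribute: 'beta', values: [true], variation: true },
];

console.log(
  evaluateRules(betaFeatureRules, { key: 'user-123', custom: { beta: true } }, false)
); // true
console.log(
  evaluateRules(betaFeatureRules, { key: 'user-456', custom: { beta: false } }, false)
); // false
```

This only verifies the copy of the rules you evaluate locally, so it complements the E2E checks against the real service rather than replacing them.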

Integration Test Strategies

Contract Testing for Flag Behavior

Document what each flag state promises and test that contract:

// Flag contract: when 'new-api' is OFF, use v1 endpoint
// When 'new-api' is ON, use v2 endpoint

describe('API endpoint selection', () => {
  it('[new-api: OFF] calls v1 endpoint', async () => {
    flags.set('new-api', false);
    await fetchData();
    expect(apiClient.lastCall.url).toContain('/api/v1/');
  });

  it('[new-api: ON] calls v2 endpoint', async () => {
    flags.set('new-api', true);
    await fetchData();
    expect(apiClient.lastCall.url).toContain('/api/v2/');
  });
});

E2E Tests with Flag State

For Playwright or Cypress E2E tests, set flag state via the management API before each test:

LaunchDarkly:

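// In test setup (launchDarklyApi is your own wrapper around the REST API's semantic patch)
await launchDarklyApi.updateFeatureFlag(
  'project-key',
  'new-checkout-flow',
  // Flag state is per environment, so name the one your E2E suite runs against
  { environmentKey: 'test', instructions: [{ kind: 'turnFlagOff' }] }
);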

Unleash:

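// In test setup
// Unleash v4+ path; 'default' (project) and 'development' (environment) are example names.
// Older servers exposed /api/admin/features/new-checkout-flow/toggle/off instead.
await fetch('http://unleash:4242/api/admin/projects/default/features/new-checkout-flow/environments/development/off', {
  method: 'POST',
  headers: { Authorization: 'admin-token' },
});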

This approach tests against the real flag infrastructure, verifying that the SDK integration works correctly.

Common Anti-Patterns

Long-lived flags — flags that stay in the codebase for months accumulate technical debt and make the test matrix unmanageable. Set a removal date when creating each flag.

Flag stacking — using a flag to control a feature that's already behind another flag. This creates hidden dependencies that are nearly impossible to test exhaustively.

Testing only the happy path with flag on — you need to test both paths. The off path still exists in production until the flag is fully removed.

Using flags as configuration — flags are for deployment risk reduction, not for per-environment configuration values. Use environment variables for that.

No flag inventory — without tracking which flags exist, their purpose, and their removal timeline, flags accumulate silently. Both LaunchDarkly and Unleash provide dashboards for this; use them.
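An inventory can also live next to the code and be checked automatically. The sketch below assumes a hand-maintained list with a removal date per flag; the field names, owners, and dates are invented for illustration.

```javascript
// A hand-maintained inventory: every flag gets an owner and a removal date.
const flagInventory = [
  { key: 'new-checkout-flow', owner: 'payments', removeBy: '2024-03-01' },
  { key: 'dark-mode', owner: 'web', removeBy: '2024-06-01' },
];

// Return the keys of flags whose removal date has passed.
function overdueFlags(inventory, today = new Date()) {
  return inventory
    .filter((flag) => new Date(flag.removeBy) < today)
    .map((flag) => flag.key);
}

console.log(overdueFlags(flagInventory, new Date('2024-03-15')));
// [ 'new-checkout-flow' ]
```

Wiring `overdueFlags` into a CI test that fails on a non-empty result turns the removal date from a good intention into a deadline.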

Flag Hygiene Checklist

When creating a feature flag:

  • Define the purpose and success criteria
  • Set a removal date (typically 2-4 weeks after full rollout)
  • Write tests for both flag states before writing the implementation
  • Add the flag to your team's "flags to remove" backlog
  • Document any inter-flag dependencies

When removing a feature flag:

  • Confirm the flag is at 100% rollout in all environments
  • Remove SDK calls from the codebase
  • Remove tests that override the flag to its off state
  • Delete the flag from LaunchDarkly/Unleash
  • Remove the flag from your test setup/teardown code

Feature flags are a powerful deployment tool, but they make testing harder. The key to managing that complexity is treating flags as temporary code (they always are) and building removal into the process from day one.
