LaunchDarkly Testing Guide: Strategies for Feature Flag Quality

Feature flags without proper testing are a reliability risk. LaunchDarkly lets you ship code faster, but if your flag logic, targeting rules, or SDK integrations aren't validated, a bad flag evaluation can silently break production for a subset of users. This guide covers proven strategies for testing LaunchDarkly integrations end to end.

Why Feature Flag Testing Is Different

Traditional unit and integration tests cover known, static code paths. Feature flags introduce dynamic branching — the same code can behave differently based on user attributes, percentage rollouts, or A/B configurations. That means tests must cover:

  • The default variation when the flag is off
  • The target variation when the flag is on
  • Targeting rules — segments, attributes, and individual user overrides
  • SDK initialization edge cases — what happens if the flag value isn't loaded yet
  • Flag deprecation — ensuring removed flags don't leave orphaned code paths

Setting Up LaunchDarkly in Test Environments

Use Test Mode SDK Clients

LaunchDarkly's SDKs support offline mode and in-memory data sources, making them ideal for unit tests:

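const LaunchDarkly = require('@launchdarkly/node-server-sdk');
const { TestData } = require('@launchdarkly/node-server-sdk/integrations');

// Use the in-memory TestData source instead of a live connection
const td = new TestData();
const client = LaunchDarkly.init('sdk-key-does-not-matter', {
  updateProcessor: td.getFactory(),
  sendEvents: false,
});

// Call from an async function or a test hook such as beforeAll
await client.waitForInitialization();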

The TestData source lets you control flag values programmatically without connecting to LaunchDarkly servers. This is the foundation for deterministic unit tests.

Configuring Flags in Tests

// Serve true to every context, regardless of earlier targeting
td.update(td.flag('my-feature').variationForAll(true));

// Set flag with user targeting
td.update(
  td.flag('my-feature')
    .variationForUser('user-123', true)
    .fallthroughVariation(false)
);

This approach gives complete control over flag state without network dependencies.

Unit Testing Flag Evaluations

Unit tests should verify your code's behavior in both flag states, not just the happy path.

Test Both Variations Explicitly

describe('checkout flow', () => {
  it('shows legacy checkout when flag is off', async () => {
    td.update(td.flag('new-checkout').on(false));
    const result = await renderCheckout(testUser);
    expect(result).toContain('legacy-checkout');
  });

  it('shows new checkout when flag is on', async () => {
    td.update(td.flag('new-checkout').on(true));
    const result = await renderCheckout(testUser);
    expect(result).toContain('new-checkout-v2');
  });
});

Missing either test means half the flag lifecycle is untested.

Test Targeting Rules

If you use percentage rollouts or user attribute targeting, those rules must be tested too:

it('enables beta feature for beta segment users', async () => {
  td.update(
    td.flag('beta-dashboard')
      // Rule: user contexts whose "beta" attribute is true get variation true
      .ifMatch('user', 'beta', true).thenReturn(true)
      .fallthroughVariation(false)
  );

  const betaUser = { kind: 'user', key: 'user-456', beta: true };
  const result = await client.variation('beta-dashboard', betaUser, false);
  expect(result).toBe(true);
});

Integration Testing with Real LaunchDarkly Environments

Unit tests cover logic isolation. Integration tests verify the full SDK-to-LaunchDarkly flow.

Use a Dedicated Test Project

Create a separate LaunchDarkly project (or environment) for automated integration tests. Key practices:

  1. Never run integration tests against production flags — targeting rules may differ
  2. Use environment-specific SDK keys injected via CI secrets
  3. Reset flags to known states before each test run using the LaunchDarkly REST API

# Reset flag via LaunchDarkly REST API before test run
curl -X PATCH \
  "https://app.launchdarkly.com/api/v2/flags/my-project/my-feature" \
  -H "Authorization: $LD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"op": "replace", "path": "/environments/test/on", "value": false}]'

Testing SDK Initialization

Always test what happens when the SDK hasn't fully initialized:

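it('returns default variation before SDK initializes', async () => {
  // Offline mode never loads flag data, so every evaluation uses the fallback
  const uninitializedClient = LaunchDarkly.init('sdk-key', {
    offline: true,
  });
  const user = { kind: 'user', key: 'user-123' };

  // variation() returns a promise; it should resolve to the fallback, not throw
  const result = await uninitializedClient.variation('my-flag', user, 'default');
  expect(result).toBe('default');
});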
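
Offline mode covers the "no flag data" case, but not a client that is still waiting on the network. A minimal sketch of one way to handle that, assuming you wrap evaluation in your own helper: `evaluateWithFallback` and the stub client below are hypothetical names, not part of the SDK. The only SDK methods the helper relies on are `waitForInitialization()` and `variation()`.

```javascript
// Hypothetical helper: waits up to timeoutMs for the SDK, then evaluates.
// If initialization fails or times out, it returns the fallback instead of throwing.
async function evaluateWithFallback(client, flagKey, context, fallback, timeoutMs = 1000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('LaunchDarkly init timed out')), timeoutMs);
  });
  try {
    await Promise.race([client.waitForInitialization(), timeout]);
    return await client.variation(flagKey, context, fallback);
  } catch (err) {
    return fallback;
  } finally {
    clearTimeout(timer);
  }
}

// Stub standing in for a client that never finishes initializing.
const neverReadyClient = {
  waitForInitialization: () => new Promise(() => {}),
  variation: async () => 'from-launchdarkly',
};
```

Passing `neverReadyClient` with a short timeout in a test lets you assert that the UI receives the fallback value rather than hanging.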

End-to-End Testing Feature Flags

End-to-end tests must simulate real user journeys in both flag states. The challenge is that E2E tests run against live environments where flag state may change.

Strategies for Stable E2E Flag Tests

Option 1: User attribute targeting

Create a dedicated test user in LaunchDarkly with a fixed variation. Your E2E tests always authenticate as this user:

Test user key: e2e-test-user
Flag override: new-checkout → always true

Option 2: Environment-level overrides

Use a dedicated staging environment where flags are locked to specific values for the duration of CI runs.

Option 3: HelpMeTest for E2E flag validation

Tools like HelpMeTest let you write natural-language tests that exercise both flag states:

As the test user
Navigate to /checkout
Verify the new checkout form is displayed
Submit the order
Verify the order confirmation shows the updated messaging

This keeps E2E tests readable and maintainable as your flag set grows.

Testing Flag Cleanup

The biggest hidden risk in feature flag systems is stale flags — code that depends on flags that were archived but not cleaned up.

Automated Stale Flag Detection

// Track flag usage in tests
const usedFlags = new Set();
const originalVariation = client.variation.bind(client);
client.variation = (key, user, fallback) => {
  usedFlags.add(key);
  return originalVariation(key, user, fallback);
};

// After test suite, compare with known active flags
afterAll(async () => {
  const activeFlags = await fetchActiveLaunchDarklyFlags();
  const orphanedFlags = [...usedFlags].filter(f => !activeFlags.includes(f));
  if (orphanedFlags.length > 0) {
    console.warn('Stale flag references found:', orphanedFlags);
  }
});
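
The `fetchActiveLaunchDarklyFlags` helper above is left undefined. One possible sketch, assuming Node 18+ (for the global `fetch`), an API access token in `LD_API_KEY`, and the `my-project` project key used elsewhere in this guide. It reads only the first page of results, so a large project would need pagination on top of this.

```javascript
// Sketch of the fetchActiveLaunchDarklyFlags helper referenced above.
// Assumptions: Node 18+ global fetch, API token in LD_API_KEY,
// project key 'my-project'. Only the first page of results is read.
async function fetchActiveLaunchDarklyFlags({
  projectKey = 'my-project',
  apiKey = process.env.LD_API_KEY,
  fetchImpl = fetch, // injectable so tests can avoid real network calls
} = {}) {
  const url = `https://app.launchdarkly.com/api/v2/flags/${encodeURIComponent(projectKey)}`;
  const response = await fetchImpl(url, { headers: { Authorization: apiKey } });
  if (!response.ok) {
    throw new Error(`LaunchDarkly API returned ${response.status}`);
  }
  const body = await response.json();
  // Keep only the keys of flags that are not archived.
  return (body.items || []).filter((flag) => !flag.archived).map((flag) => flag.key);
}
```

The `fetchImpl` parameter is a design choice for testability: the stale-flag check itself can then be unit tested with a canned response.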

Code Search for Archived Flags

Before archiving a flag in LaunchDarkly, search the codebase for all references:

grep -rn "my-old-feature" src/ --include="*.js" --include="*.ts"

Make flag removal a three-step process: ship the code change that removes all references, archive the flag, then delete it once you are sure nothing still evaluates it.

CI/CD Integration Patterns

Environment-Specific Flag Configuration

Structure your CI pipeline to test both flag states:

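# .github/workflows/test.yml
on: push
jobs:
  test-flag-off:
    runs-on: ubuntu-latest
    env:
      LD_FLAGS_OVERRIDE: '{"new-checkout": false}'
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

  test-flag-on:
    runs-on: ubuntu-latest
    env:
      LD_FLAGS_OVERRIDE: '{"new-checkout": true}'
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test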

Implement the override in your application's flag evaluation layer.
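
What that layer might look like is sketched below. `LD_FLAGS_OVERRIDE` is not something the LaunchDarkly SDK reads on its own, so the parsing and lookup are your code; `parseFlagOverrides` and `createFlagEvaluator` are hypothetical names chosen for illustration.

```javascript
// Hypothetical evaluation wrapper: values from LD_FLAGS_OVERRIDE win,
// everything else is delegated to the LaunchDarkly client.
function parseFlagOverrides(raw) {
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return parsed && typeof parsed === 'object' && !Array.isArray(parsed) ? parsed : {};
  } catch (err) {
    // Malformed JSON: ignore the overrides rather than crash the app.
    return {};
  }
}

function createFlagEvaluator(client, rawOverrides = process.env.LD_FLAGS_OVERRIDE) {
  const overrides = parseFlagOverrides(rawOverrides);
  return async function getFlag(flagKey, context, fallback) {
    if (Object.prototype.hasOwnProperty.call(overrides, flagKey)) {
      return overrides[flagKey];
    }
    return client.variation(flagKey, context, fallback);
  };
}
```

When the variable is unset, as it would be in production, every call falls through to the SDK, so the wrapper changes behavior only in the CI jobs that set it.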

Flag Coverage Reports

Track which flags have test coverage using a custom reporter:

// jest.config.js
module.exports = {
  reporters: [
    'default',
    ['./reporters/flag-coverage-reporter.js', { flags: ['new-checkout', 'beta-dashboard'] }]
  ]
};
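
The reporter file itself is not shown above. A minimal sketch of what it could contain, under one loud assumption: a flag counts as covered when its key appears in the full name of at least one passing test, which means your `describe` or `it` titles must mention the flag key. The class name and the `findUncovered` helper are illustrative; `onRunComplete` is the hook Jest calls on custom reporters after the run.

```javascript
// reporters/flag-coverage-reporter.js
// Hypothetical reporter: a flag is "covered" when its key appears
// in the full name of at least one passing test.
class FlagCoverageReporter {
  constructor(globalConfig, options = {}) {
    this.flags = options.flags || [];
  }

  // Pure helper so the matching logic can be tested without running Jest.
  findUncovered(aggregatedResults) {
    const passingNames = aggregatedResults.testResults
      .flatMap((file) => file.testResults)
      .filter((test) => test.status === 'passed')
      .map((test) => test.fullName);
    return this.flags.filter((flag) => !passingNames.some((name) => name.includes(flag)));
  }

  // Jest calls this once after all test files have finished.
  onRunComplete(testContexts, aggregatedResults) {
    const uncovered = this.findUncovered(aggregatedResults);
    if (uncovered.length > 0) {
      console.warn('Flags without a passing test:', uncovered.join(', '));
    }
  }
}

module.exports = FlagCoverageReporter;
```

Name matching is a coarse signal. Recording the keys actually passed to `variation()` during the run would be more precise, at the cost of more plumbing.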

Common Mistakes to Avoid

Skipping the "flag off" test — Teams often only test the new behavior. The fallback path can have bugs too.

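Using production credentials in CI — A production SDK key makes test runs send analytics events to your production environment, and a production API access token lets a misconfigured test modify production flag state.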

Not testing initialization failures — If LaunchDarkly is temporarily unreachable, your default values become the UI. Test them.

Hardcoding user keys in tests — Use constants or factories for test users to avoid conflicts between parallel test runs.

Ignoring multivariate flags — String and number flags need tests for every variation, not just true/false.
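
For the test-user point above, a small factory is usually enough. The sketch below is illustrative: `makeTestContext` is a hypothetical name, and the context shape (`kind`, `key`, plus arbitrary attributes) assumes a context-based version of the SDK.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical factory: every call yields a context with a unique key,
// so parallel test workers never share per-user targeting state.
function makeTestContext(overrides = {}) {
  return {
    kind: 'user',
    key: `test-user-${randomUUID()}`,
    ...overrides,
  };
}
```

A test that needs a specific attribute can pass it in, for example `makeTestContext({ beta: true })`, while still getting a fresh key on every call.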

Monitoring Flags in Production

Testing catches issues before deploy. Monitoring catches what testing misses.

Set up LaunchDarkly's flag health metrics alongside your application monitoring:

  • Alert when a flag's evaluation error rate spikes
  • Track flag evaluation latency — excessive latency can indicate SDK misconfiguration
  • Monitor percentage rollout cohort behavior with analytics events

HelpMeTest's 24/7 monitoring runs your E2E tests on a schedule, alerting you immediately if a flag evaluation breaks a user journey — without waiting for a human to notice.

Summary

Testing LaunchDarkly integrations requires covering both flag states in unit tests, validating targeting rules, isolating integration tests from production flag state, and building E2E tests that can run against deterministic flag configurations. Combine automated testing at every layer with active monitoring in production to catch the issues a deploy causes before they affect users, not after.
