Feature Flag Testing in SaaS: Tenant-Level Flags with LaunchDarkly and Unleash

Feature flags in single-tenant apps are already tricky. In multi-tenant SaaS they are considerably harder. You're not just controlling whether a feature is on or off; you're controlling which tenants see which features, often with different rollout percentages, targeting rules, and override states per tenant.

When this breaks, it breaks in ways that are hard to detect: a feature that should be off for paying customers is on, a beta feature leaks to a tenant who wasn't enrolled, or a flag state set for one tenant bleeds into another tenant's context.

This guide covers how to test tenant-level feature flags correctly.

The Unique Challenge of Tenant-Level Flags

Standard feature flag testing verifies two states: on and off. Tenant-level flag testing must verify:

  • Tenant A's flag state doesn't affect Tenant B (context isolation)
  • Flag evaluation uses the correct tenant identity (context correctness)
  • Overrides for one tenant don't apply globally (override scoping)
  • Rollout percentages are applied per-tenant, not globally (targeting correctness)

Each of these can fail silently in production.

Testing with LaunchDarkly

LaunchDarkly evaluates flags against a context object. In multi-tenant SaaS, that context must include both the user and the tenant:

const ldContext = {
  kind: 'multi',
  user: {
    key: `user-${userId}`,
    email: user.email,
  },
  organization: {
    key: `org-${tenantId}`,
    name: tenant.name,
    plan: tenant.plan,
  },
};

// The Node server SDK's variation() returns a Promise, so await the result.
const flagValue = await ldClient.variation('new-dashboard', ldContext, false);

Mocking LaunchDarkly for Unit Tests

Never call LaunchDarkly's servers in unit tests. Use the SDK's test data source (the TestData integration in the server-side SDKs) or mock the client:

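// launchDarkly.test.js
// evaluateDashboardFlag is the function under test; './dashboardFlag' is a
// placeholder path for wherever your code defines it.
import { evaluateDashboardFlag } from './dashboardFlag';

// Guard against any module initializing a real client during the test run.
// Newer SDK versions are published as '@launchdarkly/node-server-sdk'.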
jest.mock('launchdarkly-node-server-sdk', () => ({
  init: jest.fn(() => ({
    variation: jest.fn(),
    waitForInitialization: jest.fn().mockResolvedValue(true),
  })),
}));

describe('Dashboard feature flag', () => {
  let mockLDClient;

  beforeEach(() => {
    mockLDClient = {
      variation: jest.fn(),
    };
  });

  it('shows new dashboard when flag is enabled for tenant', () => {
    mockLDClient.variation.mockImplementation((flag, context) => {
      if (flag === 'new-dashboard' && context.organization.key === 'org-tenant-a') {
        return true;
      }
      return false;
    });

    const result = evaluateDashboardFlag(mockLDClient, {
      userId: 'user-1',
      tenantId: 'tenant-a',
    });

    expect(result).toBe(true);
  });

  it('hides new dashboard for tenants not in rollout', () => {
    mockLDClient.variation.mockReturnValue(false);

    const result = evaluateDashboardFlag(mockLDClient, {
      userId: 'user-1',
      tenantId: 'tenant-b', // not in rollout
    });

    expect(result).toBe(false);
  });

  it('uses organization key, not user key, for tenant-level flags', () => {
    mockLDClient.variation.mockReturnValue(false);

    evaluateDashboardFlag(mockLDClient, {
      userId: 'user-1',
      tenantId: 'tenant-a',
    });

    const callContext = mockLDClient.variation.mock.calls[0][1];
    expect(callContext.organization.key).toBe('org-tenant-a');
    expect(callContext.kind).toBe('multi'); // must use multi-context
  });
});
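
The tests above call evaluateDashboardFlag without showing it. The following is a minimal sketch of what such a function might look like, assuming it receives the client and the request identity and builds the multi-context itself. The function body, the buildTenantContext helper, and the stub client are illustrative assumptions chosen to match the tests, not the actual implementation. The sketch passes through whatever the client returns, so with the real Node server SDK the caller would await the result.

```javascript
// Hypothetical helper: builds a multi-kind context from request identity.
// The 'user-' / 'org-' key prefixes mirror the context example earlier on.
function buildTenantContext({ userId, tenantId }) {
  if (!userId || !tenantId) {
    // Failing loudly beats evaluating a flag against a half-built context.
    throw new Error('userId and tenantId are both required');
  }
  return {
    kind: 'multi',
    user: { key: `user-${userId}` },
    organization: { key: `org-${tenantId}` },
  };
}

// Hypothetical function under test: returns whatever the client returns.
function evaluateDashboardFlag(ldClient, identity) {
  const context = buildTenantContext(identity);
  // Default to false so a failed evaluation hides the feature.
  return ldClient.variation('new-dashboard', context, false);
}

// Usage with a hand-rolled stub in place of the real client.
const stubClient = {
  calls: [],
  variation(flag, context, defaultValue) {
    this.calls.push({ flag, context, defaultValue });
    return context.organization.key === 'org-tenant-a';
  },
};

const enabledForA = evaluateDashboardFlag(stubClient, { userId: '1', tenantId: 'tenant-a' });
const enabledForB = evaluateDashboardFlag(stubClient, { userId: '1', tenantId: 'tenant-b' });
```

Keeping context construction in one helper means there is a single place to assert that every evaluation carries the organization key.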

Integration Testing Flag States Per Tenant

Integration tests must verify that flag evaluation at the API layer correctly isolates tenant contexts:

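// request is supertest's default export; app is your HTTP app (the path is a
// placeholder). tenantAToken and tenantBToken are assumed to be test fixtures.
import request from 'supertest';
import app from '../app';

describe('API: tenant-level flag isolation', () => {
  beforeAll(async () => {
    // Seed the test environment with known states. ldTestClient is a
    // hypothetical helper around your flag provider's management API; it is
    // not part of the LaunchDarkly SDK.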
    await ldTestClient.updateFlag('new-reporting', {
      variations: [true, false],
      targets: [{ variation: 0, values: ['org-tenant-a'] }],
      fallthrough: { variation: 1 },
    });
  });

  it('tenant A sees the new reporting UI', async () => {
    const response = await request(app)
      .get('/api/reporting/config')
      .set('Authorization', `Bearer ${tenantAToken}`);

    expect(response.body.useNewReporting).toBe(true);
  });

  it('tenant B sees the legacy reporting UI', async () => {
    const response = await request(app)
      .get('/api/reporting/config')
      .set('Authorization', `Bearer ${tenantBToken}`);

    expect(response.body.useNewReporting).toBe(false);
  });

  it('switching between tenants evaluates flags independently', async () => {
    // Simulate the same server process handling both tenants
    const [responseA, responseB] = await Promise.all([
      request(app).get('/api/reporting/config').set('Authorization', `Bearer ${tenantAToken}`),
      request(app).get('/api/reporting/config').set('Authorization', `Bearer ${tenantBToken}`),
    ]);

    expect(responseA.body.useNewReporting).toBe(true);
    expect(responseB.body.useNewReporting).toBe(false);
  });
});
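
The seeded rule above reads as "serve variation 0 to org-tenant-a and variation 1 to everyone else." The sketch below models that resolution as a plain function, which can double as an in-memory fake when no test environment is available. It is a simplification under stated assumptions: the rule shape is copied from the seeding call, resolveFlag is a hypothetical name, and real LaunchDarkly evaluation also handles rules, prerequisites, and percentage rollouts, which this ignores.

```javascript
// Hypothetical in-memory model of individual targets plus a fallthrough.
function resolveFlag(rule, contextKey) {
  // Find the first target list that names this context key explicitly.
  const target = rule.targets.find((t) => t.values.includes(contextKey));
  // No explicit target: use the fallthrough variation.
  const index = target ? target.variation : rule.fallthrough.variation;
  return rule.variations[index];
}

const newReporting = {
  variations: [true, false],
  targets: [{ variation: 0, values: ['org-tenant-a'] }],
  fallthrough: { variation: 1 },
};

const forTenantA = resolveFlag(newReporting, 'org-tenant-a');
const forTenantB = resolveFlag(newReporting, 'org-tenant-b');
```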

Testing with Unleash

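Unleash evaluates flags through activation strategies. Tenant-level targeting typically uses a gradual rollout keyed on a stable ID (the older gradualRolloutUserId strategy, which current Unleash versions replace with flexibleRollout and a stickiness setting) or custom strategies that match on tenant IDs: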

// Setting up tenant context in Unleash
import { initialize } from 'unleash-client';

const unleash = initialize({
  url: 'https://unleash.yourcompany.com/api',
  appName: 'saas-app',
  customHeaders: {
    Authorization: process.env.UNLEASH_API_KEY,
  },
});

function isFlagEnabled(flagName, tenantId, userId) {
  return unleash.isEnabled(flagName, {
    userId: `tenant:${tenantId}:user:${userId}`,
    properties: {
      tenantId,
      tenantPlan: getTenantPlan(tenantId),
    },
  });
}
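
For explicit tenant lists, a custom strategy can compare the context's tenantId against a comma-separated parameter. The sketch below shows only the matching logic as a plain function; the name tenantListMatches and the tenantIds parameter are assumptions made for illustration. In unleash-client, logic like this would typically live in the isEnabled(parameters, context) method of a Strategy subclass registered at initialization.

```javascript
// Hypothetical matching logic for a tenant-list custom strategy.
function tenantListMatches(parameters, context) {
  const tenantId = context && context.properties && context.properties.tenantId;
  if (!tenantId || !parameters || typeof parameters.tenantIds !== 'string') {
    // A missing tenant or a misconfigured strategy never enables the flag.
    return false;
  }
  const allowed = parameters.tenantIds
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);
  return allowed.includes(tenantId);
}

const params = { tenantIds: 'tenant-enterprise-1, tenant-enterprise-2' };
const enterpriseMatch = tenantListMatches(params, {
  properties: { tenantId: 'tenant-enterprise-1' },
});
const starterMatch = tenantListMatches(params, {
  properties: { tenantId: 'tenant-starter-1' },
});
const missingTenant = tenantListMatches(params, { userId: 'user-1' });
```

Returning false when the tenant is missing is a deliberate choice here: a request that lost its tenant identity should fall back to the feature being off rather than on.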

Mocking Unleash in Tests

jest.mock('unleash-client', () => ({
  initialize: jest.fn(() => ({
    isEnabled: jest.fn((flag, context) => {
      // Return predictable values based on context
      const enabledTenants = {
        'beta-feature': ['tenant-enterprise-1', 'tenant-enterprise-2'],
      };
      return enabledTenants[flag]?.includes(context.properties.tenantId) ?? false;
    }),
  })),
}));

it('enables beta feature only for enterprise tenants', () => {
  expect(isFlagEnabled('beta-feature', 'tenant-enterprise-1', 'user-1')).toBe(true);
  expect(isFlagEnabled('beta-feature', 'tenant-starter-1', 'user-2')).toBe(false);
});

Testing Flag Context Isolation

The most dangerous bug: one tenant's flag evaluation bleeds into another tenant's request. This typically happens when tenant context is stored in a module-level variable rather than per-request:

// DANGEROUS: module-level tenant context
let currentTenantId = null;

function setTenant(id) {
  currentTenantId = id; // Race condition: another request overwrites this
}

function isFlagEnabled(flag) {
  return ldClient.variation(flag, { key: currentTenantId }, false);
}

Test for this explicitly:

it('concurrent requests evaluate flags with their own tenant context', async () => {
  const results = await Promise.all([
    // Tenant A has flag ON
    request(app).get('/api/feature-status').set('X-Tenant-ID', 'tenant-a'),
    // Tenant B has flag OFF
    request(app).get('/api/feature-status').set('X-Tenant-ID', 'tenant-b'),
    // Run 10 more to increase race condition chance
    ...Array(10).fill(null).map((_, i) =>
      request(app).get('/api/feature-status')
        .set('X-Tenant-ID', i % 2 === 0 ? 'tenant-a' : 'tenant-b')
    ),
  ]);

  // Requests alternate tenants by position: even indexes are tenant-a,
  // odd indexes are tenant-b (the 10 extra requests keep that pattern).
  for (const [i, response] of results.entries()) {
    const expectedTenant = i % 2 === 0 ? 'tenant-a' : 'tenant-b';
    const expectedFlag = expectedTenant === 'tenant-a';
    expect(response.body.featureEnabled).toBe(expectedFlag);
  }
});
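
One way to avoid the module-level variable is to keep tenant identity in request-scoped storage. The sketch below uses Node's built-in AsyncLocalStorage; the helper names (runWithTenant, currentTenantId, handle) and the stubbed flag lookup are assumptions for illustration, and passing the tenant ID explicitly as a function argument is an equally valid fix.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

const tenantStore = new AsyncLocalStorage();

// Wrap each request's handling so everything it calls sees its own tenant.
function runWithTenant(tenantId, handler) {
  return tenantStore.run({ tenantId }, handler);
}

function currentTenantId() {
  const store = tenantStore.getStore();
  if (!store) throw new Error('No tenant context: call runWithTenant first');
  return store.tenantId;
}

// Stub flag lookup standing in for a real client call (assumption).
function isFlagEnabled(flag) {
  return flag === 'new-dashboard' && currentTenantId() === 'tenant-a';
}

// Simulated request handler: the delay forces the two requests to interleave.
function handle(tenantId, delayMs) {
  return runWithTenant(tenantId, async () => {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    return { tenantId: currentTenantId(), enabled: isFlagEnabled('new-dashboard') };
  });
}

// tenant-b finishes first, yet each result still reflects its own tenant.
const pending = Promise.all([handle('tenant-a', 20), handle('tenant-b', 5)]);
```

Throwing when no store is present turns a silent cross-tenant leak into a loud failure during testing.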

Testing Plan-Based Flag Targeting

Enterprise SaaS often ties flags to subscription plans:

describe('Plan-based flag targeting', () => {
  // Ordered from lowest to highest tier, so array position doubles as rank.
  const plans = ['free', 'starter', 'professional', 'enterprise'];
  const flagRequiresPlan = 'advanced-analytics';
  const minimumPlan = 'professional';

  plans.forEach(plan => {
    // Compare tier rank, not the strings: 'starter' >= 'professional' is true
    // lexicographically, which would mislabel the test.
    const shouldBeEnabled = plans.indexOf(plan) >= plans.indexOf(minimumPlan);

    it(`${plan} plan: flag ${shouldBeEnabled ? 'enabled' : 'disabled'}`, async () => {
      const tenant = await createTestTenantWithPlan(plan);
      const response = await request(app)
        .get('/api/features')
        .set('Authorization', `Bearer ${tenant.token}`);

      expect(response.body.features[flagRequiresPlan]).toBe(shouldBeEnabled);
    });
  });
});
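
Plan tiers are ordered, but their names do not sort alphabetically in tier order, so any "at least this plan" check needs explicit ranks. A small sketch, assuming the four plan names used above; PLAN_ORDER and planAtLeast are hypothetical names:

```javascript
// Lowest to highest tier; the array index is the plan's rank.
const PLAN_ORDER = ['free', 'starter', 'professional', 'enterprise'];

function planAtLeast(plan, minimumPlan) {
  const rank = PLAN_ORDER.indexOf(plan);
  const minimumRank = PLAN_ORDER.indexOf(minimumPlan);
  if (rank === -1 || minimumRank === -1) {
    // An unknown plan is a data bug; do not silently treat it as lowest tier.
    throw new Error(`Unknown plan: ${rank === -1 ? plan : minimumPlan}`);
  }
  return rank >= minimumRank;
}

// Pairs of [plan, whether it meets the 'professional' minimum].
const expectations = PLAN_ORDER.map((plan) => [plan, planAtLeast(plan, 'professional')]);

// String comparison gets this wrong: 's' sorts after 'p'.
const naiveStarterCheck = 'starter' >= 'professional';
```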

CI/CD Integration

In CI, you need deterministic flag states. Use one of:

  1. Hardcode flags off in test environment — simplest, but doesn't test flag-enabled paths
  2. Use a dedicated test flag environment with known flag states
  3. Inject flag overrides via environment variables for specific test runs

# .github/workflows/test.yml
env:
  LAUNCHDARKLY_SDK_KEY: ${{ secrets.LD_TEST_ENV_SDK_KEY }}
  FEATURE_FLAG_OVERRIDES: '{"new-dashboard": true, "beta-export": false}'

// flagService.js
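// Overrides apply to every context, so they bypass tenant targeting. Use them
// to pin code paths in CI, not to test tenant isolation.
function variation(flag, context, defaultValue) {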
  if (process.env.FEATURE_FLAG_OVERRIDES) {
    const overrides = JSON.parse(process.env.FEATURE_FLAG_OVERRIDES);
    if (flag in overrides) return overrides[flag];
  }
  return ldClient.variation(flag, context, defaultValue);
}
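
The override wrapper above parses the environment variable on every call and would throw an unhelpful error on malformed JSON. A hedged variant is to parse and validate once at startup, then close over the result. The names parseFlagOverrides and createVariation, and the stub client, are assumptions for illustration:

```javascript
// Parse once at startup so a typo in CI config fails fast with a clear message.
function parseFlagOverrides(raw) {
  if (!raw) return {};
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`FEATURE_FLAG_OVERRIDES is not valid JSON: ${err.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('FEATURE_FLAG_OVERRIDES must be a JSON object of flag names to values');
  }
  return parsed;
}

// Returns a variation function that checks overrides before the real client.
function createVariation(client, overrides) {
  return function flagVariation(flag, context, defaultValue) {
    // hasOwnProperty avoids matching inherited keys such as 'toString'.
    if (Object.prototype.hasOwnProperty.call(overrides, flag)) {
      return overrides[flag];
    }
    return client.variation(flag, context, defaultValue);
  };
}

// Usage with a stub client that always returns the default value.
const overrides = parseFlagOverrides('{"new-dashboard": true, "beta-export": false}');
const fallbackClient = { variation: (flag, context, defaultValue) => defaultValue };
const flagVariation = createVariation(fallbackClient, overrides);
```

In an application, the raw string would come from process.env.FEATURE_FLAG_OVERRIDES; it is passed in as an argument here so the parsing can be tested on its own.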

Key Takeaways

  • Multi-context evaluation (user + organization) is required for correct tenant-level targeting
  • Mock flag clients in unit tests; use a dedicated test environment for integration tests
  • Test concurrent requests to catch context leakage from module-level state
  • Verify that plan-based flag targeting matches your subscription tiers exactly
  • Use environment variable overrides in CI to get deterministic flag states without mocking
