A/B Testing Guide: How to Test Experiments Without Breaking Your Application

A/B testing (split testing) lets you run controlled experiments — showing variant A to one group and variant B to another — to measure which performs better. From an engineering perspective, the challenge is not the statistics but the testability: experiments must be deterministic, isolated, and not interfere with your automated test suite.

The Core Concepts

Experiment: A controlled change with defined variants (A = control, B = treatment).

Assignment: Deterministically mapping a user/session to a variant. Usually hash-based:

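// Assumes a murmurhash implementation that returns a 32-bit integer,
// for example the default export of the `murmurhash` npm package.
import murmurhash from 'murmurhash'

function assignVariant(userId, experimentKey, variants = ['control', 'treatment']) {
  const hash = murmurhash(`${userId}:${experimentKey}`)
  // >>> 0 keeps the value unsigned, so the modulo can never yield a negative index
  const index = (hash >>> 0) % variants.length
  return variants[index]
}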
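
To try hash-based assignment without installing anything, here is a minimal dependency-free sketch. The `hash32` helper is a stand-in invented for illustration (FNV-1a plus a MurmurHash3-style finalizer), not the murmurhash function used above, and it has not been validated for production bucketing.

```javascript
// Stand-in 32-bit string hash (hypothetical helper, not murmurhash).
function hash32(input) {
  let h = 0x811c9dc5 // FNV-1a offset basis
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i)
    h = Math.imul(h, 0x01000193) // FNV prime, 32-bit multiply
  }
  // Finalizer: fold high bits into low bits so `% n` depends on the whole string
  h ^= h >>> 16
  h = Math.imul(h, 0x85ebca6b)
  h ^= h >>> 13
  h = Math.imul(h, 0xc2b2ae35)
  h ^= h >>> 16
  return h >>> 0 // force unsigned so the modulo below is never negative
}

function assignVariant(userId, experimentKey, variants = ['control', 'treatment']) {
  // The experiment key is part of the hash input, so each experiment buckets users separately
  const index = hash32(`${userId}:${experimentKey}`) % variants.length
  return variants[index]
}
```

Calling `assignVariant('user-123', 'button-color')` twice returns the same variant both times, which is the property the rest of this guide relies on.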

Exposure: The moment a user sees the variant. Record this to calculate denominators.

Conversion: The event you're measuring (click, signup, purchase).
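
To make the link between exposures and denominators concrete, here is a small sketch that computes a per-variant conversion rate from raw events. The event shape (`type`, `userId`, `variant`) is an assumption made for this example, and event ordering is ignored for brevity.

```javascript
// Hypothetical event shape:
//   { type: 'exposure', userId, variant }  or  { type: 'conversion', userId }
function conversionRates(events) {
  const variantByUser = new Map() // userId -> variant of the first exposure
  const convertedUsers = new Set()

  for (const e of events) {
    if (e.type === 'exposure' && !variantByUser.has(e.userId)) {
      variantByUser.set(e.userId, e.variant)
    } else if (e.type === 'conversion') {
      convertedUsers.add(e.userId)
    }
  }

  const stats = {}
  for (const [userId, variant] of variantByUser) {
    if (!stats[variant]) stats[variant] = { exposed: 0, converted: 0, rate: 0 }
    stats[variant].exposed += 1 // denominator: unique exposed users
    if (convertedUsers.has(userId)) stats[variant].converted += 1
  }
  for (const s of Object.values(stats)) s.rate = s.converted / s.exposed
  return stats
}
```

Conversions from users with no recorded exposure are dropped, which is why a missing exposure event silently skews the result.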

Testing Challenges

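Assignment is deterministic per user, but from a test's point of view an experiment still makes behavior variable: the same user action can have different outcomes depending on which variant the test user lands in. Any test that does not control variant assignment can therefore pass or fail depending on the user ID it happens to use.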

Solutions:

  1. Force a specific variant in tests via override API or cookie
  2. Mock the assignment function to always return a known variant
  3. Test each variant independently, with a separate test file or describe block per variant
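
Option 2 is easiest when the assignment function is passed in rather than imported. A minimal sketch, where `getButtonLabel`, the experiment key, and the label strings are all hypothetical:

```javascript
// Hypothetical application code: the assignment function is a parameter,
// so tests can substitute a stub without any mocking library.
function getButtonLabel(userId, assign) {
  const variant = assign(userId, 'checkout-cta')
  return variant === 'treatment' ? 'Buy now' : 'Continue to payment'
}

// Test doubles that ignore their arguments and pin the variant
const alwaysControl = () => 'control'
const alwaysTreatment = () => 'treatment'
```

In a test, `getButtonLabel('any-user', alwaysTreatment)` exercises the treatment path regardless of how real assignment would bucket that user.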

Testing with Forced Variants

Most A/B testing frameworks support forcing a variant via URL parameter, cookie, or API.

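// GrowthBook: the SDK option is forcedVariations (experiment key -> variation index).
// A cookie like this one is an app-level convention: your code has to read it
// and pass the value through to forcedVariations.
document.cookie = 'gb_force_button-color=1'

// Optimizely Web: force a variation via URL, using the variation ID
// ?optimizely_x=123456

// Statsig: local overrides in test (method names vary by SDK and version)
client.overrideGate('new_checkout', true)
client.overrideExperiment('button_color', { color: 'green' })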

Unit Testing Experiment Logic

// experiment.js
// The caller supplies the GrowthBook instance, so this module needs no SDK import.

export function getCheckoutButtonColor(gb) {
  const result = gb.run({ key: 'button-color', variations: ['blue', 'green'] })
  return result.value
}

// experiment.test.js
import { GrowthBook } from '@growthbook/growthbook'
import { getCheckoutButtonColor } from './experiment'

function createGrowthBook(forcedVariations = {}) {
  const gb = new GrowthBook({ forcedVariations })
  gb.setFeatures({ 'button-color': { defaultValue: 'blue' } })
  return gb
}

test('returns blue for control variant', () => {
  const gb = createGrowthBook({ 'button-color': 0 })
  expect(getCheckoutButtonColor(gb)).toBe('blue')
})

test('returns green for treatment variant', () => {
  const gb = createGrowthBook({ 'button-color': 1 })
  expect(getCheckoutButtonColor(gb)).toBe('green')
})

Integration Testing Both Variants

// checkout.test.js
describe('Checkout page', () => {
  describe('control variant (blue button)', () => {
    beforeEach(() => {
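      // Force control variant. The cookie name is an app-level convention:
      // the application must read it and map it to GrowthBook's forcedVariations.
      cy.setCookie('gb_force_button-color', '0')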
    })

    it('shows blue CTA button', () => {
      cy.visit('/checkout')
      cy.get('[data-testid="cta-button"]').should('have.css', 'background-color', 'rgb(0, 0, 255)')
    })

    it('completes purchase flow with blue button', () => {
      cy.visit('/checkout')
      cy.get('[data-testid="cta-button"]').click()
      cy.url().should('include', '/confirmation')
    })
  })

  describe('treatment variant (green button)', () => {
    beforeEach(() => {
      cy.setCookie('gb_force_button-color', '1')
    })

    it('shows green CTA button', () => {
      cy.visit('/checkout')
      // Assumes the app applies the CSS named color 'green', which computes to rgb(0, 128, 0)
      cy.get('[data-testid="cta-button"]').should('have.css', 'background-color', 'rgb(0, 128, 0)')
    })

    it('completes purchase flow with green button', () => {
      cy.visit('/checkout')
      cy.get('[data-testid="cta-button"]').click()
      cy.url().should('include', '/confirmation')
    })
  })
})
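
The Cypress tests above only work if the application turns the `gb_force_<experimentKey>` cookie into a forced variation. One way that glue code might look is sketched below. The function name and cookie prefix are assumptions for this example, not part of GrowthBook.

```javascript
// Hypothetical convention: a cookie named `gb_force_<experimentKey>` holds a variation index.
// Returns an object suitable for GrowthBook's `forcedVariations` option.
function parseForcedVariations(cookieHeader, prefix = 'gb_force_') {
  const forced = {}
  for (const pair of cookieHeader.split(';')) {
    const eq = pair.indexOf('=')
    if (eq === -1) continue // skip fragments without a value
    const name = pair.slice(0, eq).trim()
    const value = pair.slice(eq + 1).trim()
    if (!name.startsWith(prefix)) continue
    const index = Number.parseInt(value, 10)
    // Ignore anything that is not a non-negative integer index
    if (Number.isInteger(index) && index >= 0) {
      forced[name.slice(prefix.length)] = index
    }
  }
  return forced
}
```

The result can be passed as `new GrowthBook({ forcedVariations: parseForcedVariations(document.cookie) })`. It is usually worth enabling this path only in non-production environments so real users cannot pick their own variant.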

Testing Experiment Exposure Tracking

Experiments are useless without exposure tracking. Test that exposures are recorded correctly.

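// renderCheckout stands for whatever renders the checkout and evaluates the
// 'button-color' experiment with the given GrowthBook instance.
test('records exposure when experiment is evaluated', () => {
  const trackMock = vi.fn()
  // GrowthBook does not fire trackingCallback for forced variations, so this
  // test sends a fixed user ID through normal hash-based assignment instead.
  const gb = new GrowthBook({
    attributes: { id: 'user-123' },
    trackingCallback: trackMock
  })

  renderCheckout({ gb })

  expect(trackMock).toHaveBeenCalledWith(
    expect.objectContaining({ key: 'button-color' }),
    expect.objectContaining({ inExperiment: true, value: expect.stringMatching(/^(blue|green)$/) })
  )
})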

Testing Statistical Significance

For server-side A/B tests, validate that your significance calculation is correct:

// stats.js
export function calculateZScore(controlConversions, controlVisitors, treatmentConversions, treatmentVisitors) {
  const p1 = controlConversions / controlVisitors
  const p2 = treatmentConversions / treatmentVisitors
  const p = (controlConversions + treatmentConversions) / (controlVisitors + treatmentVisitors)
  const se = Math.sqrt(p * (1 - p) * (1/controlVisitors + 1/treatmentVisitors))
  return (p2 - p1) / se
}

// stats.test.js
test('calculates z-score for a 5% vs 7% conversion rate', () => {
  // 5% vs 7% conversion rate, 1000 visitors each
  const z = calculateZScore(50, 1000, 70, 1000)
  // z is about 1.88, just short of the 1.96 needed for 95% confidence (two-sided)
  expect(z).toBeCloseTo(1.88, 2)
})

test('calculates correct z-score for non-significant result', () => {
  const z = calculateZScore(50, 100, 52, 100)
  expect(Math.abs(z)).toBeLessThan(1.65)  // below 90% confidence
})
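
Thresholds such as 1.65 and 1.96 are easier to reason about as p-values. The sketch below converts a z-score into a two-sided p-value using the Abramowitz and Stegun polynomial approximation of the error function. The function names are chosen for this example, and the approximation is accurate to roughly seven decimal places, which is enough for test assertions but is not a replacement for a statistics library.

```javascript
// Error function via Abramowitz & Stegun formula 7.1.26 (max error about 1.5e-7)
function erf(x) {
  const sign = x < 0 ? -1 : 1
  const ax = Math.abs(x)
  const t = 1 / (1 + 0.3275911 * ax)
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t
  return sign * (1 - poly * Math.exp(-ax * ax))
}

// Two-sided p-value for a standard normal test statistic
function twoSidedPValue(z) {
  const phi = 0.5 * (1 + erf(Math.abs(z) / Math.SQRT2)) // standard normal CDF at |z|
  return 2 * (1 - phi)
}
```

With this helper, a test can assert that `twoSidedPValue(z)` is below 0.05 instead of comparing `z` against a hard-coded critical value.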

Preventing Experiment Interference

Multiple simultaneous experiments can interfere if their assignments are correlated. Including the experiment key in the hash input is meant to prevent this. As a first check, confirm that each experiment still splits users evenly when both run on the same population:

test('each concurrent experiment splits users roughly 50/50', () => {
  // Run 1000 users through both experiments and count the variants in each
  const results = { 'experiment-a': {}, 'experiment-b': {} }
  
  for (let i = 0; i < 1000; i++) {
    const id = `user-${i}`
    const a = assignVariant(id, 'experiment-a')
    const b = assignVariant(id, 'experiment-b')
    results['experiment-a'][a] = (results['experiment-a'][a] || 0) + 1
    results['experiment-b'][b] = (results['experiment-b'][b] || 0) + 1
  }
  
  // Both experiments should have ~50/50 split
  const aRatio = results['experiment-a']['control'] / 1000
  const bRatio = results['experiment-b']['control'] / 1000
  
  expect(aRatio).toBeCloseTo(0.5, 1)
  expect(bRatio).toBeCloseTo(0.5, 1)
})
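
An even split in each experiment does not by itself show that the two assignments are independent: if every control user in experiment A were also a control user in experiment B, both ratios would still be 50%. A joint (cross-tabulated) check catches that case. The sketch below is one way to write it, with the helper names invented for this example.

```javascript
// Count users per (variant in A, variant in B) cell
function crossTabulate(userIds, assignA, assignB) {
  const counts = {}
  for (const id of userIds) {
    const cell = `${assignA(id)}|${assignB(id)}`
    counts[cell] = (counts[cell] || 0) + 1
  }
  return counts
}

// Largest absolute gap between a cell's observed share and an even share.
// For two 50/50 experiments there are four cells, so each should hold about 25%.
function maxCellDeviation(counts, total, expectedCells) {
  const expectedShare = 1 / expectedCells.length
  let worst = 0
  for (const cell of expectedCells) {
    const share = (counts[cell] || 0) / total
    worst = Math.max(worst, Math.abs(share - expectedShare))
  }
  return worst
}
```

In a real test, `assignA` and `assignB` would wrap `assignVariant` with the two experiment keys, and the assertion would require the deviation to stay under a small tolerance such as 0.05.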

CI/CD Considerations

  • Never run tests that touch experiment code in CI without explicit variant forcing; otherwise they become flaky
  • Add a TEST_VARIANT_OVERRIDE env var or cookie mechanism to force variants
  • Document which variant each test targets at the top of the test file
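
The first two points can be combined into a single guard: resolve variants from an override when one exists, and fail loudly in CI when it does not. The sketch below assumes a comma-separated `key=variant` format for `TEST_VARIANT_OVERRIDE` and a truthy `CI` variable. Both are conventions chosen for this example.

```javascript
// Hypothetical format: TEST_VARIANT_OVERRIDE="button-color=treatment,new-checkout=control"
function parseOverrides(raw = '') {
  const overrides = new Map()
  for (const entry of raw.split(',')) {
    const [key, variant] = entry.split('=').map((s) => s.trim())
    if (key && variant) overrides.set(key, variant)
  }
  return overrides
}

// `assign` is the normal assignment function; `env` is typically process.env
function resolveVariant(experimentKey, userId, assign, env) {
  const overrides = parseOverrides(env.TEST_VARIANT_OVERRIDE)
  if (overrides.has(experimentKey)) return overrides.get(experimentKey)
  if (env.CI) {
    // Refuse to fall back to real assignment in CI, where it would make tests flaky
    throw new Error(`No forced variant for experiment "${experimentKey}" in CI`)
  }
  return assign(userId, experimentKey)
}
```

Passing `env` as a parameter rather than reading `process.env` directly keeps the guard itself easy to unit test.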

Summary

The key engineering principle: A/B tests must be deterministic in your test suite. Force variants explicitly in every test. Test each variant as a separate scenario. Verify exposure tracking. The statistics layer is secondary — the application behavior under each variant is what your automated tests protect.
