Flaky Tests: Root Causes and How to Fix Them
Flaky tests fail intermittently — not because the application is broken, but because the test itself has hidden assumptions about timing, state, or environment. The fix is almost never "just retry it." Understanding the root cause and fixing it permanently is the only path to a reliable test suite.
Key Takeaways
Async timing issues cause most flaky E2E tests. The test proceeds before the application is ready. The fix is explicit waits for application state — not sleep timers.
Shared state is the second biggest cause. Tests that modify shared data and run in parallel will collide. The fix is test isolation: each test creates and owns its own data.
Retrying flaky tests masks the problem. A test that passes on the third try is still flaky — you just shipped a slower CI pipeline. Fix the root cause; use retries only as a temporary bandage.
Network dependency failures are a category on their own. Tests that depend on real external services will fail when those services are slow or unavailable. Use mocks or contract tests instead.
Quarantine flaky tests until they are fixed. A test that sometimes fails is worse than no test — it trains your team to ignore red CI builds.
A flaky test is a test that fails intermittently — not because the application has a bug, but because the test has hidden dependencies on timing, environment, or state that vary between runs. It passes on your machine, fails in CI. It passes when run alone, fails in parallel. It passes Monday, fails Tuesday, passes again Wednesday.
Flaky tests are one of the most corrosive problems in software engineering. They slow down CI, generate false alarms, and gradually train teams to ignore test failures — which means real bugs slip through.
This guide covers every major root cause of test flakiness, how to diagnose it, and how to fix it permanently.
Why Flaky Tests Are a Serious Problem
Before diving into causes and fixes, it is worth being explicit about why flaky tests matter.
They erode trust. When developers learn that a failing test "probably just needs another run," they stop treating test failures as signals. This is exactly backwards from what tests are for.
They slow down CI. Each flaky failure requires investigation time. Some teams institute automatic retries, which makes the problem invisible while making CI slower. A suite with 10% test flakiness may be running 30% more test executions than necessary.
They mask real failures. When CI is always orange because of known flaky tests, real failures blend in. The failing test might be the one that actually found a bug — but nobody is looking anymore.
They compound. As teams learn that test failures often require retries, they add retries everywhere. The flaky tests continue failing, the retries multiply, and the underlying problems are never addressed.
The right response to a flaky test is: quarantine it immediately (to preserve signal), investigate the root cause, and fix it within the sprint.
Root Cause 1: Async Timing Issues
This is the most common cause of flaky E2E tests. The test proceeds with the next action before the application has finished the previous one.
The Pattern
// Flaky: clicks button, immediately checks result before API call completes
cy.get('[data-testid="submit"]').click()
cy.get('.success-message').should('be.visible') // Fails intermittently
The test clicks Submit, which triggers an API call. If the API responds quickly, .success-message appears before the assertion runs. If the API is slow (maybe the CI machine was under load), the assertion runs before the success message appears. The test is timing-dependent.
The Fix: Wait for Application State, Not Time
// Bad fix: arbitrary sleep
cy.get('[data-testid="submit"]').click()
cy.wait(2000) // Fails on slow CI, wastes time on fast machines
cy.get('.success-message').should('be.visible')
// Good fix: wait for the actual state change
cy.get('[data-testid="submit"]').click()
cy.get('.success-message').should('be.visible') // Cypress retries until visible
// Better fix: wait for the network request that drives the state change
cy.intercept('POST', '/api/orders').as('createOrder')
cy.get('[data-testid="submit"]').click()
cy.wait('@createOrder') // Wait for the specific API call to complete
cy.get('.success-message').should('be.visible')
In Playwright:
// Wait for the response that drives the state change
const [response] = await Promise.all([
page.waitForResponse('**/api/orders'),
page.click('[data-testid="submit"]'),
]);
expect(response.status()).toBe(201);
await expect(page.locator('.success-message')).toBeVisible();
The rule: never use sleep() or wait(N ms) to solve timing issues. Instead, wait for a specific application condition that signals the operation is complete — a network response, a DOM state change, a URL change, or a specific element appearing.
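This rule generalizes beyond Cypress and Playwright, which retry assertions for you. In plain Node test code there is no built-in retry, so a small polling helper is the equivalent. This is a minimal sketch (the name waitForCondition is hypothetical, not from any library): poll an application-state predicate until it holds or a deadline passes, instead of sleeping a fixed time.

```typescript
// Hypothetical helper: poll an application-state predicate until it
// holds or a deadline passes -- never a fixed sleep.
async function waitForCondition(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 50 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return; // state reached: stop immediately
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage: wait for a job's status flag instead of sleeping 2 seconds
// await waitForCondition(() => job.status === 'done', { timeoutMs: 10000 });
```

On fast machines it finishes as soon as the state changes; on slow CI it keeps waiting up to the timeout, which is exactly the behavior a fixed sleep cannot give you.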
Common Timing Patterns and Their Fixes
| Timing Problem | Flaky Pattern | Fix |
|---|---|---|
| API call not complete | Assert result immediately after click | Wait for network response |
| Animation not done | Click element mid-animation | Wait for animation class to be removed |
| Page not loaded | Interact before page ready | Wait for specific content or network idle |
| WebSocket message | Assert before message arrives | Intercept and wait for WebSocket event |
| Debounced input | Assert before debounce fires | Wait for debounce delay or resulting API call |
Root Cause 2: Shared Test State
Tests that share state and run in parallel will collide unpredictably.
The Pattern
// Test 1: Creates user with email "test@example.com"
// Test 2: Also creates user with email "test@example.com"
// Result: One of them gets an "email already exists" error — depending on which runs first
Or more subtly:
// Test 1: Navigates to user list, counts 5 users
// Test 2: Creates a user in parallel
// Test 1 re-runs assertion: now 6 users — FAILS
Shared state failures are particularly insidious because they only appear when tests run in parallel, and the failures are not reproducible when you run the failing test in isolation.
The Fix: Test Isolation
Each test should create its own data and clean up after itself. No test should depend on the state left by another test.
// Instead of a shared test user that all tests assume exists:
beforeEach(async () => {
// Create a unique user for this test run
testUser = await db.createUser({
email: `test-${Date.now()}-${Math.random()}@example.com`,
name: 'Test User',
});
});
afterEach(async () => {
// Clean up
await db.deleteUser(testUser.id);
});
Or use database transactions that are rolled back after each test:
// Jest with transaction isolation
beforeEach(() => db.transaction.begin());
afterEach(() => db.transaction.rollback());
For E2E tests, avoid shared test accounts. Instead, create test users via API in beforeEach, run the test, and delete them in afterEach.
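One caveat on the uniqueness scheme above: a Date.now() plus Math.random() suffix can, in principle, still collide when many parallel workers create users in the same millisecond. Node's built-in crypto.randomUUID() is a simpler guarantee. A sketch, with a hypothetical uniqueEmail helper:

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: every call yields a distinct address, safe even
// when many test workers create users at the same instant.
function uniqueEmail(prefix = 'test'): string {
  return `${prefix}-${randomUUID()}@example.com`;
}

// e.g. in beforeEach:
// testUser = await db.createUser({ email: uniqueEmail(), name: 'Test User' });
```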
Parallelism-Specific Patterns
| Shared State Problem | Fix |
|---|---|
| Shared test database | Use per-test database transactions or isolated schemas |
| Shared test user account | Create unique users per test run |
| Shared browser session | Use separate browser contexts per test |
| Shared temp files | Use unique temp directories per test |
| Global counters/IDs | Use random or UUID-based IDs |
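The unique-temp-directory row maps directly onto Node's fs.mkdtemp, which appends a random suffix atomically. A sketch, with hypothetical helper names:

```typescript
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Each test gets its own directory; mkdtemp appends a random suffix,
// so parallel workers never collide on the same path.
function createTestDir(): string {
  return mkdtempSync(join(tmpdir(), 'mytest-'));
}

// In afterEach: remove the whole tree so nothing leaks between runs.
// force: true makes this a no-op if the directory is already gone.
function removeTestDir(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}
```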
Root Cause 3: Order Dependency
Tests that must run in a specific order to pass indicate design problems.
The Pattern
// Test A: Creates a product
// Test B: Edits the product created by Test A
// If Test A fails, Test B fails too. If they run in a different order, both fail.
Test suites designed this way are extremely fragile. Any parallel execution or shuffled ordering breaks everything.
The Fix: Self-Contained Tests
Every test must be independently executable. This means each test sets up its own preconditions:
describe('Product editing', () => {
let product;
beforeEach(async () => {
// Test B no longer depends on Test A — it creates its own product
product = await api.createProduct({ name: 'Test Widget', price: 9.99 });
});
it('can edit product name', async () => {
await page.goto(`/products/${product.id}/edit`);
await page.fill('[name="product-name"]', 'Updated Widget');
await page.click('[data-testid="save"]');
await expect(page.locator('.product-name')).toHaveText('Updated Widget');
});
});
A useful rule: any test in your suite should be runnable on its own with --grep "test name" and produce a meaningful result.
Root Cause 4: External Service Dependencies
Tests that call real external APIs, databases, or third-party services will fail whenever those services are slow, down, or rate-limiting.
The Pattern
// This test calls the real Stripe API — fails when Stripe is slow or test keys are rate-limited
it('processes a payment', async () => {
await page.fill('[name="card-number"]', '4242424242424242')
await page.click('[data-testid="pay"]')
await expect(page.locator('.payment-success')).toBeVisible() // Fails if Stripe responds slowly
})
The Fix: Mock External Dependencies
For unit and integration tests, mock external services with libraries like msw (Mock Service Worker) or test doubles:
// Mock Stripe API in tests
import { setupServer } from 'msw/node';
import { rest } from 'msw';
const server = setupServer(
rest.post('https://api.stripe.com/v1/charges', (req, res, ctx) => {
return res(ctx.json({ id: 'ch_test123', status: 'succeeded' }));
})
);
beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());
For E2E tests, intercept at the network level:
cy.intercept('POST', '**/stripe/charge', { fixture: 'stripe-success.json' })
For critical payment paths where you must test the real integration, use a separate test suite that runs less frequently (nightly, not on every PR) and is not in the critical CI path.
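As an illustration, assuming GitHub Actions and a hypothetical @real-integration tag on those tests, such a suite could be scheduled nightly rather than run on every PR:

```yaml
# .github/workflows/nightly-integration.yml (hypothetical workflow)
name: nightly-integration
on:
  schedule:
    - cron: '0 3 * * *'   # every night at 03:00 UTC, not on every PR
jobs:
  real-integration-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx playwright test --grep "@real-integration"
```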
Root Cause 5: Resource Contention
Tests that compete for the same system resources — ports, files, database connections, or CPU — will fail unpredictably when resources are exhausted.
The Pattern
// Two tests both try to start a server on port 3000
// Whichever starts second fails with "port in use"
Or more subtly: tests that work fine in isolation but fail on resource-constrained CI machines (fewer CPUs, less memory, slower disks).
The Fix: Isolate Resources
Use dynamic port allocation:
// Find a free port instead of hardcoding
import { createServer } from 'net';
import type { AddressInfo } from 'net';
async function getFreePort(): Promise<number> {
return new Promise(resolve => {
const server = createServer();
server.listen(0, () => {
const port = (server.address() as AddressInfo).port;
server.close(() => resolve(port));
});
});
}
const port = await getFreePort();
For test databases, use separate schemas per test worker:
// playwright.config.ts
const workerIndex = process.env.TEST_WORKER_INDEX ?? '0';
process.env.DATABASE_URL = `postgresql://localhost/testdb_worker_${workerIndex}`;
Root Cause 6: Environment-Specific Behavior
Tests that pass in one environment and fail in another are testing environmental assumptions instead of application behavior.
Common Environment Issues
| Problem | Symptom | Fix |
|---|---|---|
| Timezone differences | Date assertions fail in different timezones | Use UTC everywhere in tests; mock Date.now() |
| Locale/language | Text assertions fail with different system locales | Set explicit locale in test config |
| Screen resolution | Element positioning assertions | Use viewport-agnostic selectors; set consistent viewport size |
| File path separators | Path assertions fail on Windows vs. Linux | Use path.join() always; normalize in assertions |
| Floating point | Numeric assertions fail cross-platform | Use toBeCloseTo() not toBe() for floats |
The Fix: Control the Environment
// playwright.config.ts — enforce consistent environment
export default defineConfig({
use: {
viewport: { width: 1280, height: 720 },
locale: 'en-US',
timezoneId: 'America/New_York',
},
});
For date/time flakiness, mock the system clock:
// Freeze time at a known point
await page.addInitScript(() => {
const fakeDate = new Date('2025-01-15T10:00:00Z');
Date.now = () => fakeDate.getTime();
globalThis.Date = class extends Date {
constructor(...args: any[]) {
super(args.length ? args[0] : fakeDate.getTime());
}
};
});
Root Cause 7: Test Data Pollution
Tests that rely on a specific dataset that drifts over time will fail when the data changes.
The Pattern
// This test assumes there are exactly 10 products in the database
it('shows all products', () => {
cy.get('.product-card').should('have.length', 10)
})
This test fails whenever a product is added or removed from the test database — which happens over time as the application evolves.
The Fix: Test Relative State, Not Absolute Values
// Better: create the data you need, test what you expect relative to it
it('shows all products', () => {
const productNames = ['Widget A', 'Widget B', 'Widget C'];
// Set up: create exactly these products
// Run: visit the page
cy.get('.product-card').should('have.length', productNames.length)
productNames.forEach(name => {
cy.contains('.product-card', name).should('exist')
})
})
Or even better — test that the products you care about are present, not that the total count is exactly right.
Root Cause 8: Test Infrastructure Issues
Sometimes tests are flaky not because of the test code or application, but because of the test infrastructure itself.
Common Infrastructure Issues
- CI machine overload: Tests time out because the CI runner is CPU-throttled
- Flaky Docker containers: Containers that do not start cleanly or take variable time to be ready
- Inconsistent network: CI environments with variable network latency that exceeds timeout values
- Memory pressure: Tests that pass on 8GB machines fail on 4GB CI runners
- Browser version mismatches: Tests that pass with Chrome 120 fail with Chrome 119 (or vice versa)
The Fix: Stabilize Infrastructure
- Set explicit timeouts that are generous for CI: timeout: 60000 instead of 10000
- Use Docker healthchecks to verify dependencies are actually ready, not just started
- Pin browser versions in your test configuration
- Monitor CI machine resource usage — if tests consistently time out, the machine needs more resources
# Wait for the database to be ready, not just started
services:
postgres:
image: postgres:16
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
Diagnosing Flaky Tests
When a test starts failing intermittently, use this process:
Step 1: Reproduce Locally
Run the test in a loop to reproduce the failure:
# Run the test 20 times and collect results
for i in {1..20}; do npx playwright test --grep "test name" && echo "PASS" || echo "FAIL"; done
If you cannot reproduce locally, the flakiness is likely environment-specific — run the same loop in CI with extra logging enabled.
Step 2: Identify the Failure Mode
Collect the failure messages across multiple runs. Are they:
- Always the same error (suggests timing — one specific operation is too slow)
- Different errors each time (suggests shared state — different collision each time)
- Only in CI (suggests environment or resource differences)
- Only when tests run in parallel (suggests shared state or resource contention)
Step 3: Add Diagnostic Logging
// Add detailed logging around the flaky section
test('flaky checkout test', async ({ page }) => {
await page.goto('/cart');
// Log page state before the flaky action
console.log('Page URL before click:', page.url());
console.log('Submit button state:', await page.locator('[data-testid="submit"]').isEnabled());
await page.click('[data-testid="submit"]');
// Log what happened after the click
await page.waitForTimeout(100); // Diagnostics only: a fixed pause is fine for logging, never for assertions
console.log('Page URL after click:', page.url());
console.log('Error messages:', await page.locator('.error-message').allTextContents());
});
Step 4: Quarantine While You Fix
Add the test to a quarantine list and exclude it from the main CI gate while you investigate:
// playwright.config.ts — exclude known flaky tests from blocking CI
projects: [
{
name: 'stable-tests',
grepInvert: /\[FLAKY\]/, // run everything except tests tagged [FLAKY]
retries: 0,
},
{
name: 'flaky-tests',
grep: /\[FLAKY\]/,
retries: 3, // Temporary retry while investigating
},
]
Tag the test clearly:
test('[FLAKY] checkout flow - timing issue under investigation', async ({ page }) => {
// ...
});
The Retry Trap
It is tempting to fix flaky tests by adding retries. Most test frameworks support this:
// Playwright
test.describe.configure({ retries: 3 });
// Cypress
// cypress.config.js
{ retries: { runMode: 2, openMode: 0 } }
Retries have a legitimate place — occasional infrastructure blips, network hiccups, and timing edge cases that are genuinely rare. But retries are not a fix for systematic flakiness. A test that fails 20% of the time and retries 3 times will pass on CI most of the time — but the underlying problem is still there. The test is still flaky. Your CI is just slower, and you are shipping code with a test suite that does not actually provide the coverage you think it does.
Use retries as a temporary quarantine measure, not as a permanent solution.
Building a Reliable Test Suite
Preventing flakiness is cheaper than fixing it. These practices will dramatically reduce flakiness in new test suites:
- Always wait for application state, never for time. Ban sleep() and wait(N ms) from your test codebase.
- Isolate test data. Every test creates its own data. No test assumes data created by another test exists.
- Use stable selectors. data-testid attributes are your friends. Invest the five minutes to add them to key elements.
- Mock external dependencies. Real API calls in unit and integration tests are a smell.
- Run tests in random order. --randomize-order will surface order dependencies immediately.
- Monitor flakiness rates. Track which tests have the highest retry rates in CI. The top 10 are your priority.
- Fix flakiness when found. Quarantine immediately; fix within the sprint. Flaky tests that stay in quarantine for months become invisible technical debt.
Frequently Asked Questions
What is a flaky test?
A flaky test is an automated test that sometimes passes and sometimes fails when run against the same code, without any changes to the application or test. Flakiness indicates hidden dependencies on timing, state, environment, or external services.
How do you find flaky tests?
Track CI failure history over time and calculate the failure rate per test. Tests with failure rates above 1-2% on unchanged code are likely flaky. Some CI platforms (CircleCI, GitHub Actions, Buildkite) have built-in flakiness detection.
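If your CI platform lacks built-in detection, the calculation itself is simple. A sketch, with a hypothetical run-record shape rather than any platform's real API:

```typescript
interface RunRecord {
  testName: string;
  passed: boolean;
}

// Aggregate CI history into a failure rate per test; on unchanged code,
// anything above roughly 1-2% is a flakiness suspect.
function failureRates(runs: RunRecord[]): Map<string, number> {
  const totals = new Map<string, { runs: number; failures: number }>();
  for (const r of runs) {
    const t = totals.get(r.testName) ?? { runs: 0, failures: 0 };
    t.runs += 1;
    if (!r.passed) t.failures += 1;
    totals.set(r.testName, t);
  }
  return new Map([...totals].map(([name, t]) => [name, t.failures / t.runs]));
}
```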
Should I use test retries to fix flaky tests?
Retries hide flakiness but do not fix it. Use retries as a temporary measure while investigating and fixing the root cause. Permanent retries slow down your CI and erode confidence in your test suite.
Why do tests pass locally but fail in CI?
Common causes: different environment variables or configuration, different system resources (slower CPU/memory), different browser versions, timing differences from CI machine load, and parallelism that exposes shared state issues. Start by comparing the exact versions and environment variables between local and CI.
What percentage of tests should be flaky?
The target is zero. In practice, complex E2E suites on real applications often have 1-3% flakiness. Above 5% flakiness, teams typically start losing trust in the test suite entirely. Any flakiness is a signal that something needs fixing.
How does HelpMeTest help with flaky tests?
HelpMeTest uses AI-powered test execution that interprets natural language instructions against the current UI state at runtime. This eliminates the most common source of E2E test flakiness — brittle selectors that break when UI changes. Tests that describe intent ("click the checkout button") are inherently more resilient than tests that reference specific DOM attributes. The platform also includes self-healing for cases where elements have moved but the intent is still clear.
Summary
Flaky tests are a symptom, not a root cause. The most common underlying causes are:
- Async timing — test proceeds before application is ready. Fix with explicit state-based waits.
- Shared state — tests collide when run in parallel. Fix with test isolation.
- Order dependency — tests assume other tests have already run. Fix with self-contained setup.
- External services — real API calls fail intermittently. Fix with mocks.
- Resource contention — tests compete for ports, files, or CPU. Fix with dynamic resource allocation.
- Environment differences — behavior varies by timezone, locale, or screen size. Fix by controlling the environment.
- Test data drift — assertions about absolute counts or specific records that change. Fix with relative assertions and isolated data.
- Infrastructure issues — CI machines under load, containers not ready. Fix by stabilizing infrastructure.
The process: quarantine immediately, diagnose systematically, fix the root cause, verify with repeated runs, remove the quarantine tag. Every test in your suite should be a reliable signal. When it is, you can trust your CI.