
Root Causes of Flaky Tests in CI: Why Tests Pass Locally but Fail in CI

HelpMeTest

24 May 2026 — 5 min read

"It works on my machine" is one of the oldest complaints in software development. With tests, the equivalent is "it passes locally but fails in CI." This pattern has specific, identifiable causes — and fixing them systematically makes your test suite dramatically more reliable.

Why CI Is Different from Local

CI environments differ from developer machines in important ways:

Different OS (often Linux in CI vs macOS locally)
No GPU or display — tests that render graphics fail without a virtual display
Resource constraints — less CPU and RAM per job, shared infrastructure
Network restrictions — no access to external services, different DNS
Fresh state every run — no caches, no persistent files, no leftover processes
Parallelism — multiple test jobs running concurrently on shared infrastructure
Time zone — usually UTC; your machine might be local time

Understanding these differences lets you anticipate flakiness before it happens.

Root Cause 1: Resource Starvation

CI machines share resources. Your tests might complete in 200ms locally but take 2000ms in CI when CPU is throttled.

Symptoms: Tests that use fixed timeouts fail intermittently. Database queries time out. Async operations complete too slowly.

How to find it:

# Add timing to your test output
pytest tests/ -v --durations=10

# Compare local vs CI timing
# Local: test_user_creation ... 0.08s
# CI:    test_user_creation ... 2.34s  ← resource starvation

Fix:

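import os
import time

# Bad: hardcoded wait
def wait_for_email():
    time.sleep(2)  # assumes 2s is always enough

# Good: poll until ready, with a configurable timeout
# (email_service stands in for whatever your test is waiting on)
def wait_for_email(timeout=None):
    if timeout is None:
        timeout = float(os.environ.get('TEST_TIMEOUT', '5'))
    deadline = time.monotonic() + timeout  # monotonic ignores wall-clock jumps
    while time.monotonic() < deadline:
        if email_service.has_pending():
            return True
        time.sleep(0.1)
    return False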

Set TEST_TIMEOUT=10 in CI environments where resources are constrained.
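The same deadline-polling idea works for anything a test waits on, not just email. Below is a minimal sketch of a generic helper; the name `wait_until` and its `interval` parameter are illustrative, not from any library. It reads the same `TEST_TIMEOUT` variable and uses `time.monotonic()`, which is unaffected by wall-clock changes.

```python
import os
import time
from typing import Callable, Optional


def wait_until(condition: Callable[[], bool],
               timeout: Optional[float] = None,
               interval: float = 0.1) -> bool:
    """Poll condition() until it returns True or the deadline passes.

    When timeout is None, it comes from the TEST_TIMEOUT environment
    variable (seconds), falling back to 5.
    """
    if timeout is None:
        timeout = float(os.environ.get("TEST_TIMEOUT", "5"))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check, so a condition that became true during the final sleep counts
    return condition()
```

In a test you would call something like `assert wait_until(email_service.has_pending)`, where `email_service` is again your own object. Fast machines return almost immediately; slow CI runners simply poll longer.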

Root Cause 2: No Display / Headless Environment

Selenium or Playwright tests that launch a headed (visible) browser fail in CI without a display server, because the browser has nowhere to draw its window.

Symptom: WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally

Fix for Selenium: start a virtual display (Xvfb) before the tests that need it run:

# GitHub Actions
- name: Start virtual display
  run: |
    Xvfb :99 -screen 0 1920x1080x24 &

- name: Run browser tests
  run: pytest tests/browser/
  env:
    DISPLAY: ":99"

Or use headless mode:

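import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
if os.environ.get('CI'):
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')  # Chrome's sandbox often can't run inside containers
    options.add_argument('--disable-dev-shm-usage')  # /dev/shm is small in Docker; use /tmp instead

driver = webdriver.Chrome(options=options)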

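Playwright launches browsers headless by default, so it only needs a display if you ask for a headed browser. To watch the browser locally but stay headless in CI:

import { chromium } from 'playwright';

const browser = await chromium.launch({ headless: !!process.env.CI });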

Root Cause 3: File System Differences

CI runners (usually Linux) and developer machines (often macOS, sometimes Windows) handle files differently:

Case sensitivity: macOS is case-insensitive by default; Linux is case-sensitive. require('./MyModule') finds mymodule.js on Mac but fails on Linux.
Path separators: Windows uses \, Linux and macOS use /. Hardcoded Windows paths break on Linux.
Line endings: Windows uses \r\n, Linux and macOS use \n. Tests that compare file contents exactly can fail.
Temp directories: /tmp on Linux; macOS usually points $TMPDIR at a per-user folder under /var/folders.

Symptoms: Module not found errors in CI that don't happen locally. String comparison tests fail on file content.

Fix case sensitivity:

# List tracked paths that differ only by case (they collide on case-insensitive file systems)
git ls-files | tr '[:upper:]' '[:lower:]' | sort | uniq -d
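The `git ls-files` check only catches tracked files that collide with each other. It does not catch an import such as `require('./MyModule')` that points at `mymodule.js`, because on macOS that lookup succeeds. One way to catch those is to compare each path component against the real directory listing. This is a minimal sketch, assuming you already have the path you want to check; `exists_with_exact_case` is an illustrative name, and it does not model Node's module resolution (such as appending `.js`).

```python
import os
import tempfile
from pathlib import Path


def exists_with_exact_case(path: str) -> bool:
    """Return True only if every path component matches an on-disk name exactly.

    On a case-insensitive file system, os.path.exists("MyModule.js") is True
    even when the file is named mymodule.js; this check is not fooled.
    """
    p = Path(path)
    parts = p.parts
    if p.is_absolute():
        current = Path(parts[0])  # the root, e.g. "/" or "C:\\"
        parts = parts[1:]
    else:
        current = Path(".")
    for part in parts:
        if part == "..":
            current = current / part
            continue
        try:
            entries = os.listdir(current)
        except OSError:  # missing directory, not a directory, no permission
            return False
        if part not in entries:  # exact, case-sensitive comparison
            return False
        current = current / part
    return True


# Demonstration in a scratch directory (realpath resolves symlinks and short names)
with tempfile.TemporaryDirectory() as scratch:
    scratch = os.path.realpath(scratch)
    open(os.path.join(scratch, "mymodule.js"), "w").close()
    exact = exists_with_exact_case(os.path.join(scratch, "mymodule.js"))
    wrong_case = exists_with_exact_case(os.path.join(scratch, "MyModule.js"))

print(exact, wrong_case)  # True False
```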

Fix temp directory:

import tempfile
import os

# Bad
tmp_file = '/tmp/test_output.txt'

# Good
tmp_file = os.path.join(tempfile.gettempdir(), 'test_output.txt')
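A fixed file name in the shared temp directory can still collide when parallel jobs write the same file. Giving each test its own directory avoids that. Below is a sketch using the standard library's `tempfile.TemporaryDirectory`; `write_report` is a hypothetical stand-in for the code under test. In pytest, the built-in `tmp_path` fixture gives each test a unique directory in the same way.

```python
import os
import tempfile


def write_report(directory: str, text: str) -> str:
    """Hypothetical code under test: writes a report file into directory."""
    path = os.path.join(directory, "test_output.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


# Each run gets a freshly created, uniquely named directory, so parallel jobs
# never share test_output.txt; the directory is deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_path = write_report(tmp_dir, "hello\n")
    with open(report_path, encoding="utf-8") as f:
        report_text = f.read()
```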

Fix line endings:

# Bad: breaks on different OS
assert file_content == "line1\r\nline2\r\n"

# Good: normalize before comparing  
assert file_content.replace('\r\n', '\n') == "line1\nline2\n"
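Where the `\r\n` comes from matters. Python's text mode already translates `\r\n` to `\n` when reading with the default `newline=None`, so the problem usually shows up when content arrives as bytes, is read with `newline=''`, or comes from another process. Below is a small sketch of a helper that also handles a lone `\r`, next to the text-mode behavior; the helper name `normalize_newlines` is illustrative.

```python
import os
import tempfile


def normalize_newlines(text: str) -> str:
    r"""Convert Windows (\r\n) and classic Mac (\r) line endings to \n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines("line1\r\nline2\rline3\n")))  # 'line1\nline2\nline3\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "crlf.txt")
    with open(path, "wb") as f:
        f.write(b"line1\r\nline2\r\n")  # a file with Windows line endings
    # Default text mode (newline=None) translates \r\n to \n on read
    with open(path, encoding="utf-8") as f:
        text_mode = f.read()
    # newline="" returns the line endings untouched
    with open(path, encoding="utf-8", newline="") as f:
        untranslated = f.read()
```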

Root Cause 4: Test Ordering and Isolation

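CI often runs tests in parallel (for speed) or in a different order than your local run. Tests then collide over resources that weren't designed for concurrent access (ports, files, database records) or depend on state that an earlier test happened to leave behind.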

Symptom: Tests pass individually but fail when run together. Failures don't reproduce with serial execution.

Find ordering issues:

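# Run in random order to detect ordering dependencies
# (pytest-randomly shuffles on every run once installed and prints the seed)
pytest tests/
# Replay the previous run's order to reproduce a failure
pytest tests/ --randomly-seed=last

# Run in parallel to detect isolation issues (requires pytest-xdist)
pytest tests/ -n 4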

Fix shared database state:

import pytest

# Bad: tests share data
class TestOrders:
    def setup_method(self):
        self.order = Order.create(customer='alice', total=99.99)

    def test_order_total(self):
        # May also count orders created by tests running in parallel
        assert Order.total_revenue() == 99.99

# Good: each test's writes live in a transaction that is rolled back
# (begin_transaction is a stand-in for your ORM's transaction API)
@pytest.fixture
def db_session():
    with begin_transaction() as txn:
        yield txn
        txn.rollback()  # discard everything the test wrote

def test_order_total(db_session):
    order = Order.create(customer='alice', total=99.99, session=db_session)
    assert Order.total_revenue(session=db_session) == 99.99
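The rollback pattern can be tried end to end with the standard library's `sqlite3` and an in-memory database. This is a sketch, not your ORM: `run_in_rollback` plays the role of the `db_session` fixture, and the table and helper names are illustrative.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode,
# so the BEGIN and ROLLBACK statements below run exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")


def total_revenue(connection):
    row = connection.execute("SELECT COALESCE(SUM(total), 0) FROM orders").fetchone()
    return row[0]


def run_in_rollback(connection, test_body):
    """Run test_body inside a transaction that is always rolled back."""
    connection.execute("BEGIN")
    try:
        test_body(connection)
    finally:
        connection.execute("ROLLBACK")


def order_total_check(connection):
    connection.execute("INSERT INTO orders VALUES ('alice', 99.99)")
    assert total_revenue(connection) == 99.99


run_in_rollback(conn, order_total_check)
run_in_rollback(conn, order_total_check)  # the second run sees no leftover row
print(total_revenue(conn))  # 0
```

Because every run starts and ends with an empty table, the result no longer depends on which tests ran before it.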

Fix port conflicts for parallel tests:

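import socket

import pytest

def get_free_port():
    # Ask the OS for an unused port, then release it for the server to use
    with socket.socket() as s:
        s.bind(('', 0))
        return s.getsockname()[1]

@pytest.fixture
def server():
    port = get_free_port()
    s = Server(port=port)  # Server is your application's server class
    s.start()
    yield s
    s.stop()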
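`get_free_port()` releases the port before the server binds it, so on a busy CI host another worker can grab the same port in between. If your server can bind to port 0 itself and report the port it got, that window disappears. Below is a sketch using the standard library's `http.server` as a stand-in for your server; `PingHandler` is illustrative.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


# Port 0 lets the OS pick a free port; the server keeps that socket open,
# so no other process can take the port before the test uses it.
server = HTTPServer(("127.0.0.1", 0), PingHandler)
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# Bypass any HTTP proxy configured in the CI environment
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
        status, body = resp.status, resp.read()
finally:
    server.shutdown()
    server.server_close()

print(status, body)  # 200 b'ok'
```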

Root Cause 5: Network and External Service Access

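Some CI environments restrict outbound network access for security, and even where they don't, third-party APIs add latency, rate limits, and outages. Tests that call real APIs fail intermittently or outright.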

Symptoms: ConnectionRefusedError, requests.exceptions.ConnectionError, DNS lookup failures.

Fix: Mock all calls to third-party services. Don't call real external APIs from unit or integration tests:

from unittest.mock import patch

import stripe

# Bad: calls the real Stripe API
def test_payment_processing():
    result = stripe.Charge.create(amount=999, currency='usd', source='tok_visa')
    assert result['status'] == 'succeeded'

# Good: mock the API client (process_payment is your code that calls Stripe)
@patch('myapp.payments.stripe.Charge.create')
def test_payment_processing(mock_charge):
    mock_charge.return_value = {'status': 'succeeded', 'id': 'ch_test'}
    result = process_payment(amount=999)
    assert result['status'] == 'succeeded'

For integration tests that need a real database, run it as a service container in CI:

# GitHub Actions
services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_DB: test_db
      POSTGRES_USER: test_user
      POSTGRES_PASSWORD: test_pass
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

Root Cause 6: Time Zone and Clock Issues

CI usually runs in UTC. Local machines run in the developer's local time zone. Tests that use the current time without specifying a time zone can fail.

Symptom: Date-related tests fail in CI but pass locally, or fail only at certain times of day: whenever the UTC date differs from your local date (for example, evenings in the Americas or mornings in Asia and Australia).

Fix:

from datetime import date

# Bad: depends on the machine's time zone
def test_shows_today_orders():
    order = create_order()
    orders = get_orders_for_date(date.today())
    # date.today() uses the local zone; if orders are stored in UTC,
    # the two dates disagree for part of every day
    assert order in orders

# Good: explicit timezone handling
from datetime import datetime, timezone

def test_shows_today_orders():
    now = datetime(2024, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
    order = create_order(created_at=now)
    orders = get_orders_for_date(now.date(), tz=timezone.utc)
    assert order in orders
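To see why a missing time zone bites, compare the calendar date of the same instant in UTC and in a zone behind it. Below is a sketch using a fixed UTC-5 offset as a stand-in for a developer's local zone (real zones with daylight saving time would need `zoneinfo`); `calendar_date` is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for a developer's local zone (an assumption)
LOCAL = timezone(timedelta(hours=-5))


def calendar_date(ts: datetime, tz: timezone):
    """Return the calendar date of an aware timestamp as seen in tz."""
    return ts.astimezone(tz).date()


# 01:30 UTC on 15 March is still 20:30 on 14 March at UTC-5
created_at = datetime(2024, 3, 15, 1, 30, tzinfo=timezone.utc)
print(calendar_date(created_at, timezone.utc))  # 2024-03-15
print(calendar_date(created_at, LOCAL))         # 2024-03-14
```

A query for "today" therefore returns different orders depending on which zone computes the date, which is exactly the gap the explicit `tz=timezone.utc` argument closes.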

Set TZ=UTC in CI configuration and run your tests locally with the same setting to catch these issues early:

TZ=UTC pytest tests/

Root Cause 7: Caching Side Effects

Local environments accumulate cached data: compiled assets, downloaded dependencies, test data. CI starts fresh each run.

Symptom: Tests that verify "first run" behavior always pass in CI but fail locally (or vice versa). Tests that create cache files and expect them to persist fail in CI.

Fix: Explicitly set up and tear down any cache state in tests:

import pytest

@pytest.fixture(autouse=True)
def clear_cache():
    # cache is your application's cache object
    cache.clear()
    yield
    cache.clear()

def test_cache_miss_fetches_from_database():
    # No cache populated yet; the autouse fixture guarantees fresh state
    result = get_user_profile('user123')
    assert db.query_count == 1  # fetched from DB, not cache
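In-process caches need the same treatment, because they survive from one test to the next within a worker process. Below is a sketch with `functools.lru_cache`, whose `cache_clear()` method resets both the cached entries and the hit/miss statistics; `get_user_profile` and the `db_calls` counter are hypothetical stand-ins for your lookup and your database.

```python
from functools import lru_cache

db_calls = 0  # stands in for your database's query counter


@lru_cache(maxsize=None)
def get_user_profile(user_id: str) -> dict:
    """Hypothetical cached lookup; the body simulates one database query."""
    global db_calls
    db_calls += 1
    return {"id": user_id}


def reset_cache_state() -> None:
    """What an autouse fixture would run before each test."""
    global db_calls
    get_user_profile.cache_clear()
    db_calls = 0


# "Test 1": a miss, then a hit
reset_cache_state()
get_user_profile("user123")
get_user_profile("user123")
first_test_calls = db_calls   # 1: the second call came from the cache

# "Test 2": starts from a clean cache, so the first call is a miss again
reset_cache_state()
get_user_profile("user123")
second_test_calls = db_calls  # 1
```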

Diagnosing in Practice

When a test fails in CI but not locally:

Reproduce in CI — run the same test multiple times in CI to confirm it's flaky, not a one-time infra issue
Add logging — wrap the failing assertion with detailed logging about state, timestamps, and context
Check the category — does the failure look like a timing issue? A state pollution issue? A file system issue?
Run locally with CI constraints — use TZ=UTC, run in Docker with limited CPU/memory, use pytest -n 4 for parallel execution
Isolate the test — run just the failing test in CI to see if it fails without other tests running alongside it

# Run the specific failing test in CI mode
docker run --rm -e TZ=UTC -e CI=true \
  --cpus 1 --memory 512m \
  your-test-image pytest tests/path/to/test.py::test_specific_case -v
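For the first step, reproducing the failure, it helps to measure how often a test fails rather than eyeballing a couple of runs. Below is a minimal sketch that reruns a command and counts non-zero exits; `count_failures` is an illustrative name.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(cmd: Sequence[str], runs: int = 20) -> int:
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `count_failures([sys.executable, "-m", "pytest", "tests/path/to/test.py::test_specific_case", "-q"])` runs one test 20 times in a fresh process each time. Running it inside the CI job and again locally tells you whether the failure rate really differs between the two environments. The pytest-repeat plugin's `--count` option is an alternative if you prefer to stay inside pytest.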

Systematic diagnosis beats guessing. The differences between local and CI are finite and knowable — work through them methodically.
