Test Quarantine Strategies: Managing Flaky Tests Without Deleting Them

You've identified a flaky test. You can't fix it right now — it's intermittent, the root cause isn't clear, and you have a release to ship. What do you do?

Deleting the test loses coverage. Skipping it silently means the failure is invisible. Leaving it in the main suite means your CI is unreliable and developers start ignoring failures.

Test quarantine is the systematic approach to handling this situation. Quarantine means isolating flaky tests to a separate pipeline where they can't block releases, while tracking them as technical debt that must be resolved.

The Goals of Quarantine

A good quarantine system:

  1. Removes flaky tests from the main pipeline — they don't block PRs or releases
  2. Keeps them running — you still get signals about failures
  3. Tracks them as debt — there's a visible backlog of quarantined tests to fix
  4. Makes it easy to un-quarantine — once fixed, tests return to the main suite quickly
  5. Alerts on degradation — if quarantined tests get worse, you know

Implementation: Tagging and Separate Jobs

The cleanest approach is marking flaky tests with a tag and running them in a separate CI job.

Python (pytest)

import time

import pytest

# Mark a test as quarantined and link it to its tracking issue.
# xfail is non-strict unless xfail_strict is enabled, so an intermittent
# pass is reported as XPASS and does not fail the run.
@pytest.mark.quarantine(issue="HEL-456")
@pytest.mark.xfail(reason="Flaky - race condition in notification service, HEL-456")
def test_notification_sent_after_purchase():
    checkout(user='alice', item='widget')
    time.sleep(1)
    assert notification_service.sent_count() == 1

# pytest.ini
[pytest]
markers =
    quarantine: Tests quarantined due to flakiness - do not run in main pipeline

# .github/workflows/tests.yml
jobs:
  main-tests:
    steps:
      - run: pytest tests/ -m "not quarantine" --tb=short

  quarantine-tests:
    steps:
      - run: pytest tests/ -m quarantine --tb=long -v
    continue-on-error: true  # Don't fail the workflow
    if: github.ref == 'refs/heads/main'  # Only on main branch

JavaScript (Jest)

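// Simplest option: group quarantined tests in a skipped describe block.
// Note that describe.skip stops them from running at all, so you lose the failure signal.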
describe.skip('QUARANTINED', () => {
  test('processes payment asynchronously', async () => {
    // ...
  });
});

Or use a more structured approach with a custom environment variable, which keeps the tests runnable in a separate job:

const quarantine = process.env.RUN_QUARANTINED ? test : test.skip;

quarantine('payment processed event published', async () => {
  // flaky async test
});

# Separate CI job
- name: Run quarantined tests
  run: RUN_QUARANTINED=true jest --testPathPattern="quarantine"
  continue-on-error: true

Ruby (RSpec)

RSpec.describe PaymentService do
  context '[QUARANTINED]', :quarantine do
    it 'sends confirmation email within 2 seconds' do
      # flaky timing test
    end
  end
end

# Main pipeline
bundle exec rspec --tag ~quarantine

# Quarantine pipeline
bundle exec rspec --tag quarantine

Tracking Quarantined Tests

Quarantine without tracking is just a graveyard. Tests rot there indefinitely unless you make them visible.

Enforce that every quarantined test references an issue:

# Bad: no context
@pytest.mark.quarantine
def test_something_flaky():
    ...

# Good: links to the tracking issue
@pytest.mark.quarantine(reason="Race condition in event processor", issue="HEL-789")
def test_something_flaky():
    ...

Write a custom pytest plugin that enforces this:

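# conftest.py
import pytest

def pytest_collection_modifyitems(items):
    """Abort the run if any quarantined test lacks a tracking issue."""
    for item in items:
        quarantine_mark = item.get_closest_marker('quarantine')
        if quarantine_mark and not quarantine_mark.kwargs.get('issue'):
            # UsageError stops the run with a readable message instead of
            # an internal-error traceback.
            raise pytest.UsageError(
                f"Test {item.nodeid} is quarantined but missing required 'issue' parameter. "
                f"Add @pytest.mark.quarantine(issue='ABC-123')"
            )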
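The hook above only checks that an issue value is present. A stricter version could also validate its format. The following is a minimal sketch, assuming tracker keys look like HEL-789; the pattern and the helper name validate_quarantine_kwargs are illustrative assumptions, not part of pytest. Keeping the check in a plain function makes it easy to unit test and to call from the hook with quarantine_mark.kwargs.

```python
import re

# Assumed tracker key format such as "HEL-789"; adjust to your own tracker.
ISSUE_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*-\d+$")

def validate_quarantine_kwargs(kwargs):
    """Return an error message for invalid quarantine kwargs, or None if valid."""
    issue = kwargs.get("issue")
    if not issue:
        return "missing required 'issue' parameter"
    if not ISSUE_PATTERN.match(str(issue)):
        return f"issue {issue!r} does not look like a tracker key (e.g. 'ABC-123')"
    return None
```

Inside the hook, a non-None return value would be appended to the error raised for item.nodeid.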

Dashboard and Alerts

Track quarantine health over time:

# Generate quarantine report
import json
import subprocess
import sys
from datetime import datetime

MAX_QUARANTINED = 20  # Fail the build above this backlog size

# Collect (without running) every test carrying the quarantine marker
result = subprocess.run(
    ['pytest', '--collect-only', '-q', '-m', 'quarantine', '--no-header'],
    capture_output=True, text=True
)

# In quiet mode each collected test is printed as a node ID containing '::'
quarantined = [line for line in result.stdout.splitlines() if '::' in line]

report = {
    'date': datetime.now().isoformat(),
    'count': len(quarantined),
    'tests': quarantined
}

with open('quarantine-report.json', 'w') as f:
    json.dump(report, f, indent=2)

print(f"Quarantined tests: {len(quarantined)}")
if len(quarantined) > MAX_QUARANTINED:
    print(f"WARNING: Quarantine backlog exceeding {MAX_QUARANTINED} tests")
    sys.exit(1)  # Fail the build if quarantine grows too large

Set a maximum quarantine size (e.g., 10% of total tests). If you exceed it, something is systematically wrong.
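A percentage-based limit like the 10% suggested above could be sketched as follows. The function names and the default max_ratio are assumptions for illustration; the two counts would come from collecting the suite with and without the quarantine marker filter.

```python
def quarantine_ratio(quarantined_count, total_count):
    """Fraction of the suite that is currently quarantined."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return quarantined_count / total_count

def exceeds_quarantine_budget(quarantined_count, total_count, max_ratio=0.10):
    """True when the quarantined share of the suite is above max_ratio."""
    return quarantine_ratio(quarantined_count, total_count) > max_ratio
```

A relative budget scales with the suite, so a fixed cap such as 20 tests does not become too strict for a large project or too lenient for a small one.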

Automated Quarantine Detection

Instead of relying on developers to notice flakiness, detect flaky tests automatically by running the suite repeatedly:

# quarantine-detector.py
import subprocess
import json
from collections import defaultdict

def run_test_suite(times=5):
    """Run the full suite multiple times and collect results.

    Requires the pytest-json-report plugin for the --json-report flags.
    """
    results = defaultdict(list)
    
    for i in range(times):
        result = subprocess.run(
            ['pytest', '--tb=no', '-q', '--json-report', f'--json-report-file=run-{i}.json'],
            capture_output=True
        )
        with open(f'run-{i}.json') as f:
            data = json.load(f)
        for test in data['tests']:
            results[test['nodeid']].append(test['outcome'])
    
    return results

def find_flaky_tests(results, threshold=0.15):
    """Return tests whose failure rate exceeds threshold (a fraction) but that do not always fail."""
    flaky = []
    for test_id, outcomes in results.items():
        failure_rate = outcomes.count('failed') / len(outcomes)
        # Always-failing tests are broken, not flaky, so exclude a rate of 1.0
        if threshold < failure_rate < 1.0:
            flaky.append({
                'test': test_id,
                'failure_rate': failure_rate,
                'outcomes': outcomes
            })
    return flaky

results = run_test_suite(times=10)
flaky = find_flaky_tests(results)

print(f"Found {len(flaky)} flaky tests:")
for test in sorted(flaky, key=lambda t: t['failure_rate'], reverse=True):
    print(f"  {test['failure_rate']:.0%} failure rate: {test['test']}")

Run this weekly and automatically create issues for newly detected flaky tests.
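To create issues only for newly detected tests, the weekly run needs to compare its findings against the tests that are already tracked. A minimal sketch, assuming detected is the list returned by find_flaky_tests and known_ids is a hypothetical collection of node IDs that already have an issue (for example, loaded from the previous run's report):

```python
def newly_flaky(detected, known_ids):
    """Return detected flaky entries whose test ID is not yet tracked."""
    known = set(known_ids)
    return [entry for entry in detected if entry["test"] not in known]
```

Each returned entry can then be passed to whatever issue tracker integration you use; that step is left out here because it depends on the tracker.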

Quarantine SLAs

Quarantine is a holding area, not a permanent home. Enforce time limits:

# Check for stale quarantine marks
import os
import sys
from datetime import datetime, timedelta

QUARANTINE_MAX_AGE_DAYS = 30

def check_quarantine_age():
    """Print every quarantine mark older than the SLA and return how many were found."""
    stale = 0
    for root, dirs, files in os.walk('tests/'):
        for file in files:
            if not file.endswith('.py'):
                continue
            filepath = os.path.join(root, file)
            with open(filepath) as f:
                lines = f.read().split('\n')

            # Parse the quarantine date from a comment next to the mark,
            # e.g. # quarantine date: 2024-01-15
            for i, line in enumerate(lines):
                if 'quarantine' in line and 'date:' in line:
                    date_str = line.split('date:')[1].strip()
                    quarantine_date = datetime.fromisoformat(date_str)
                    age = datetime.now() - quarantine_date

                    if age > timedelta(days=QUARANTINE_MAX_AGE_DAYS):
                        stale += 1
                        print(f"STALE QUARANTINE: {filepath}:{i+1} "
                              f"(quarantined {age.days} days ago)")
    return stale

if check_quarantine_age():
    sys.exit(1)  # Fail the build when any quarantine exceeds the SLA

Schedule this check in CI and fail the build if quarantine is older than your SLA.

Returning Tests from Quarantine

When you fix a flaky test, the return-to-main process should be explicit:

  1. Fix the root cause (not just the symptom)
  2. Remove the quarantine mark
  3. Run the test 20+ times to verify it's stable:

# Run the specific test 20 times, stopping at the first failure
for i in $(seq 1 20); do
  pytest tests/path/to/test.py::test_specific_case -v --tb=short || exit 1
  echo "Run $i complete"
done

  4. Add it back to the main suite via PR
  5. Close the tracking issue
If the test fails during the 20-run verification, it's not fixed — go back to diagnosis.
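How much confidence do 20 clean runs give? The sketch below is a rough estimate that assumes each run fails independently with the same probability, which real flaky tests only approximate. The function names are illustrative.

```python
import math

def prob_all_pass(failure_rate, runs):
    """Probability that a test with this per-run failure rate passes every run."""
    return (1.0 - failure_rate) ** runs

def runs_needed(failure_rate, confidence=0.95):
    """Smallest number of runs that sees at least one failure with the given confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

Under that assumption, a test that still fails 15% of the time needs 19 runs to be caught with 95% confidence, so 20 runs is a reasonable floor. A test that fails only 5% of the time would pass all 20 runs about 36% of the time and would need 59 runs for the same confidence, which is why rarer failures call for more repetitions.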

Quarantine vs Skip vs Delete

Approach                   Signal preserved            Developer sees failure   Tracking
Delete                     No                          N/A                      N/A
Skip (@pytest.mark.skip)   No                          No                       Depends
xfail                      Yes (as expected failure)   No (shown as xfail)      Depends
Quarantine pipeline        Yes                         Yes (different job)      Required

Quarantine is the right choice when:

  • The test covers important behavior
  • The flakiness cause is unclear
  • You need time to investigate without blocking CI

Delete when:

  • The behavior is covered by other tests
  • The test is testing implementation details you're removing
  • The test has been in quarantine for months with no fix progress

Skip when:

  • The feature is temporarily disabled
  • You're in the middle of a large refactor and the test will be rewritten

Quarantine is temporary by definition. If you're not actively working to fix quarantined tests, you're just accumulating technical debt with extra steps.
