mutmut: Practical Mutation Testing for Python Projects

mutmut is the most widely adopted Python mutation testing tool. It works by making small, targeted modifications to your source code — flipping + to -, changing True to False, removing return values — and running your test suite against each modification. If your tests don't fail when the code changes, those tests aren't actually verifying behavior. This guide walks through installing, configuring, and getting real value from mutmut in Python projects.

Key Takeaways

Surviving mutants expose untested logic. When mutmut reports a mutant as "survived," it means that change didn't break any test. That's a gap in your test suite, not a problem with your code.

Start with critical modules. Running mutation testing against an entire codebase is slow. Focus mutmut on business-critical code first — payment logic, auth, data transforms.

Use --use-coverage to skip uncovered lines. Mutation testing on uncovered code is noise. Run pytest with coverage first, then mutmut only mutates lines your tests actually reach.

mutmut html gives you the full picture. The terminal output shows counts. The HTML report shows exactly which lines were mutated and which survived, side by side.

CI thresholds prevent regressions. Set a minimum kill rate in CI. If new code ships with surviving mutants in critical paths, the build should fail.

What mutmut Is and Why It Exists

Code coverage tells you which lines executed during tests. It does not tell you whether your tests actually verify behavior. You can have 100% line coverage with zero meaningful assertions — every line runs, nothing is validated.

Mutation testing takes a different approach. Instead of measuring what code runs, it measures whether tests catch changes to that code. The tool introduces a deliberate bug — a mutation — then runs your test suite. If the tests fail, the mutant was "killed" (good). If the tests still pass, the mutant "survived" (your tests missed it).

mutmut is the de-facto mutation testing tool for Python. It's mature, actively maintained, integrates cleanly with pytest, and produces readable output. Unlike alternatives that require complex setup, mutmut works with your existing test runner and needs no code changes.

The reason mutmut became the standard Python choice comes down to practical usability: it's fast enough for real projects, has sensible defaults, and gives you actionable output rather than raw mutation counts.

Installation

mutmut requires Python 3.7 or later and installs via pip:

pip install mutmut

For projects using pyproject.toml or setup.cfg, add it to your dev dependencies:

# pyproject.toml
[tool.poetry.dev-dependencies]
mutmut = "^2.4"

# Or with pip-tools
# requirements-dev.txt
mutmut>=2.4

mutmut has no mandatory dependencies beyond your existing test runner. If you use pytest (which is the common case), you're ready to run immediately after installation.

Basic Usage

Running Mutations

The simplest invocation runs mutations against all Python files in the project:

mutmut run

mutmut will:

  1. Scan your source files and generate mutants
  2. Run your test suite once to confirm it passes cleanly (the "baseline run")
  3. Apply each mutant one at a time and run the tests
  4. Record whether each mutant was killed or survived
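
The four steps above can be sketched as a toy loop. This is not how mutmut is implemented: it is a minimal illustration using Python's ast module, with a single hypothetical mutation operator (subtraction to addition) and a deliberately weak stand-in test suite (run_tests).

```python
import ast

SOURCE = """
def calculate_discount(price, rate):
    return price * (1 - rate)
"""


def run_tests(namespace):
    """Stand-in for the test suite: True when every check passes."""
    try:
        assert namespace["calculate_discount"](100, 0) == 100
        return True
    except AssertionError:
        return False


class SwapSubToAdd(ast.NodeTransformer):
    """Rewrites the n-th subtraction into an addition (one mutant per n)."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Sub):
            if self.seen == self.target:
                node.op = ast.Add()
            self.seen += 1
        return node


def load(tree):
    """Compile a module AST and return the namespace it defines."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


# Step 1: scan the source and count candidate mutation points.
tree = ast.parse(SOURCE)
n_mutants = sum(
    isinstance(n, ast.BinOp) and isinstance(n.op, ast.Sub) for n in ast.walk(tree)
)

# Step 2: the baseline run on unmodified code must pass.
assert run_tests(load(ast.parse(SOURCE)))

# Steps 3 and 4: apply each mutant in isolation, run the tests, record the outcome.
results = {}
for i in range(n_mutants):
    mutated = SwapSubToAdd(i).visit(ast.parse(SOURCE))
    ast.fix_missing_locations(mutated)
    results[i] = "survived" if run_tests(load(mutated)) else "killed"

print(results)
```

The only check uses a rate of 0, where 1 - rate and 1 + rate are equal, so the single mutant is recorded as survived.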

On a cold start, this will take a while. mutmut caches results in a .mutmut-cache SQLite database so subsequent runs only re-test changed files.

Viewing Results

After the run completes, get a summary:

mutmut results

Output looks like:

To apply a mutant on disk:
    mutmut apply <id>

To show a mutant:
    mutmut show <id>

Survived mutants:
    12, 15, 23, 24, 41

For a specific mutant:

mutmut show 15

This prints the diff showing exactly what change was made:

--- src/pricing.py
+++ src/pricing.py
@@ -14,7 +14,7 @@
 def calculate_discount(price, rate):
-    return price * (1 - rate)
+    return price * (1 + rate)

A mutant that survives here means no test verified that discounts reduce the price. That's a real gap.
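
A test that would kill this mutant only needs a non-zero rate and an exact expected value. The sketch below repeats calculate_discount from the diff; the test names are hypothetical.

```python
def calculate_discount(price, rate):
    return price * (1 - rate)


def test_discount_reduces_price():
    # Under the mutant (1 + rate) the result would be 125.0, so this fails.
    assert calculate_discount(100, 0.25) == 75.0


def test_zero_rate_leaves_price_unchanged():
    # On its own this cannot kill the mutant: 1 - 0 and 1 + 0 are both 1.
    assert calculate_discount(100, 0) == 100
```

The second test is worth keeping, but it shows why a passing test is not automatically a useful one: it holds for both the original and the mutated code.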

Understanding mutmut Output

mutmut classifies each mutant into one of four categories:

Killed — The mutation caused at least one test to fail. This is what you want. Your tests detected the behavior change.

Survived — The mutation caused no test failures. Either the changed code path has no tests, or the tests aren't asserting the right things.

Suspicious — The test suite took longer than usual with this mutant. Might be a slow test, might be an infinite loop introduced by the mutation. Worth investigating.

Timeout — The test suite exceeded the configured timeout. Often happens with mutations that affect loop conditions, turning finite loops into infinite ones.

The metric you care about most is the kill rate: killed / (killed + survived). A kill rate above 80% is a reasonable target for critical modules. When you run with --use-coverage, mutants on lines that no test reaches are never generated, so they stay out of the denominator; without it, they are tested like any other mutant and show up as survivors.
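
As a small worked example of the kill-rate formula, the helper below is a hypothetical function, and the counts passed to it are made up for illustration.

```python
def kill_rate(killed, survived):
    """Return killed / (killed + survived), or None when nothing was scored."""
    total = killed + survived
    return killed / total if total else None


# 45 killed and 5 survived out of 50 scored mutants.
print(kill_rate(45, 5))
```

With these counts the rate is 0.9, which clears an 80% target.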

Configuration

mutmut reads configuration from setup.cfg or pyproject.toml. Common options:

# setup.cfg
[mutmut]
paths_to_mutate=src/
backup=False
runner=python -m pytest
tests_dir=tests/
test_file_pattern=test_*.py

# pyproject.toml
[tool.mutmut]
paths_to_mutate = "src/"
runner = "python -m pytest"
tests_dir = "tests/"
test_file_pattern = "test_*.py"
backup = false

Key configuration options:

  • paths_to_mutate — which directories to mutate. Defaults to everything, which is usually too broad. Set this to your source directory explicitly.
  • runner — the command to run tests. Defaults to python -m pytest. Change if you use a different runner or need specific flags.
  • test_file_pattern — pattern for identifying test files. Defaults to test_*.py.
  • backup — whether to keep backup copies of mutated files. Defaults to False.

Targeting Specific Files and Functions

Run mutations on a single file:

mutmut run --paths-to-mutate src/pricing.py

mutmut filters at the file level, not the function level, so you cannot target a single function directly. The practical approach is to point paths_to_mutate at the specific module you care about right now.

Integrating with pytest

Using Coverage to Focus Mutations

Running mutmut against uncovered code wastes time — if no test reaches a line, no mutation of that line will be caught. Use the --use-coverage flag to skip uncovered lines:

# First generate a coverage report
pytest --cov=src --cov-report=xml

# Then run mutmut using coverage data
mutmut run --use-coverage

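With --use-coverage, mutmut reads the coverage data recorded by the test run and only generates mutants for lines that tests actually hit. In mutmut 2.x this is the .coverage data file that coverage.py writes alongside any report, not the XML report itself, so the pytest command above still produces what mutmut needs. This dramatically reduces mutation count in large codebases and focuses effort on testable code.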
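
The idea behind coverage filtering can be sketched in a few lines. This is an illustration of the concept only, not mutmut's code: the candidate list, the covered-line set, and filter_by_coverage are all hypothetical.

```python
# Hypothetical mutation candidates as (line_number, description) pairs.
candidates = [
    (2, "'>=' -> '>'"),
    (3, "'False' -> 'True'"),
    (4, "'>=' -> '<='"),
    (9, "'+' -> '-'"),
]

# Line numbers the test suite executed, as a coverage tool would report them.
covered_lines = {2, 3, 4}


def filter_by_coverage(candidates, covered_lines):
    """Keep only the candidates that sit on a line the tests reached."""
    return [c for c in candidates if c[0] in covered_lines]


to_run = filter_by_coverage(candidates, covered_lines)
print(f"{len(to_run)} of {len(candidates)} mutants are worth running")
```

The candidate on line 9 is dropped because no test executes that line, so no test could ever kill it.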

Parallel Execution

mutmut supports running mutations in parallel with the --processes flag:

mutmut run --processes 4

This spawns multiple test runner processes, each handling a subset of mutants. Speedup depends on your test suite: if the tests themselves are slow, parallel mutant execution won't help as much. For test suites that finish in under 10 seconds, 4 parallel processes typically cut total mutation testing time by 2-3x. Parallel-execution options differ between mutmut releases, so confirm the flag with mutmut run --help for the version you have installed.

Fast Fail with -x

When debugging surviving mutants, use pytest's -x flag to stop on first failure:

# setup.cfg
[mutmut]
runner = python -m pytest -x

With -x, pytest stops at the first failing test, so a killed mutant is detected without running the rest of the suite. Use this for interactive sessions, not CI.

CI Integration and JUnit Output

mutmut can export results in JUnit XML format for CI systems:

mutmut run
mutmut junitxml > mutation-results.xml

GitHub Actions example:

name: Mutation Testing
on: [push, pull_request]

jobs:
  mutation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements-dev.txt
      - name: Run tests with coverage
        run: pytest --cov=src --cov-report=xml
      - name: Run mutation tests
        run: mutmut run --use-coverage --paths-to-mutate src/critical/
      - name: Export results
        run: mutmut junitxml > mutation-results.xml
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: mutation-results
          path: mutation-results.xml

To enforce a minimum kill rate in CI, add a check step:

# Extract and check kill rate
python - <<'EOF'
import subprocess

# Capture the summary that `mutmut results` prints
result = subprocess.run(['mutmut', 'results'], capture_output=True, text=True)
print(result.stdout)
# Parse results and fail if kill rate below threshold
# Newer versions also offer a CI flag that changes the exit code; see mutmut run --help
EOF

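In mutmut 2.x, mutmut run reports outcomes through its exit code: a run with surviving mutants exits non-zero, so a CI step fails without extra scripting. The --CI flag changes this so that any completed run exits 0 and only fatal errors exit 1, which is useful when a separate threshold check decides whether the build passes. Exit-code behavior may differ between releases, so confirm it with mutmut run --help for the version you install.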
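
One way to enforce a threshold is to score the JUnit XML export rather than parse terminal output. The sketch below rests on an assumption that should be checked against a real export: each mutant appears as one testcase element, and a failure child marks a survivor. The function names and the 0.8 default are hypothetical.

```python
import xml.etree.ElementTree as ET


def kill_rate_from_junit(xml_text):
    """Return (killed, survived, rate) from a JUnit XML string.

    Assumption: one <testcase> per mutant, and a <failure> child marks a
    survivor. Cases with <error> or <skipped> are left out of the score.
    """
    killed = survived = 0
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("failure") is not None:
            survived += 1
        elif case.find("error") is None and case.find("skipped") is None:
            killed += 1
    total = killed + survived
    return killed, survived, (killed / total if total else 1.0)


def check_threshold(xml_text, minimum=0.8):
    """Return a process exit code: 0 when the kill rate meets the minimum."""
    killed, survived, rate = kill_rate_from_junit(xml_text)
    print(f"killed={killed} survived={survived} kill rate={rate:.0%}")
    return 0 if rate >= minimum else 1
```

In a CI step, the contents of mutation-results.xml would be read and passed to check_threshold, and the returned value handed to sys.exit so the job fails when the rate falls below the minimum.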

The HTML Report

The terminal output shows IDs and counts. For a full picture, generate the HTML report:

mutmut html

This creates an html/ directory with one file per mutated source file. Open html/index.html in a browser. Each source file shows:

  • Lines in green where all mutants were killed
  • Lines in red where mutants survived
  • The exact diff for each surviving mutant, inline

The HTML report is the fastest way to understand which parts of your codebase are actually validated by tests. A file with consistent red lines tells you where to write tests next.

Fixing Surviving Mutants

When you find a surviving mutant, the fix is usually one of two things:

Write a test that would catch the mutation. If mutmut changed > to >= and your tests didn't notice, write a test with boundary values that exercises that condition explicitly.

Discover the code is actually dead. Sometimes surviving mutants reveal code that's never reached by any test — and after investigation, never reached in production either. That's code to delete, not test.

Real-World Example

Consider a function that applies a rate limit:

def is_rate_limited(request_count, limit, window_seconds, elapsed):
    if elapsed >= window_seconds:
        return False
    return request_count >= limit

mutmut might generate these mutants:

  • Change >= to > on line 2 (boundary condition on elapsed)
  • Change >= to > on line 4 (boundary condition on request_count)
  • Change False to True on line 3 (logic inversion)
  • Flip return request_count >= limit to return request_count <= limit

If your tests only check the common case (100 requests over 60 seconds hits the limit), several of these will survive. The surviving mutants reveal you haven't tested:

  • What happens exactly at the boundary (request_count == limit)
  • What happens at exactly window_seconds elapsed
  • What the return value is when the window has passed
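
The three gaps above map onto three small tests. The sketch repeats is_rate_limited from the example; the test names and input values are hypothetical.

```python
def is_rate_limited(request_count, limit, window_seconds, elapsed):
    if elapsed >= window_seconds:
        return False
    return request_count >= limit


def test_at_request_limit_is_limited():
    # Kills the mutant that changes ">= limit" to "> limit".
    assert is_rate_limited(100, 100, 60, 30) is True


def test_below_request_limit_is_not_limited():
    # Kills the mutant that changes ">= limit" to "<= limit".
    assert is_rate_limited(99, 100, 60, 30) is False


def test_window_exactly_elapsed_resets_limit():
    # Kills both "elapsed > window_seconds" and "return True":
    # either mutant makes this call return True instead of False.
    assert is_rate_limited(100, 100, 60, 60) is False
```

Each test pins down one boundary, so a mutation at that boundary changes the result and the test fails.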

These are real behavioral questions. Mutation testing surfaces them automatically.

Limitations and When to Use mutmut

Mutation testing is slow. A test suite that runs in 5 seconds will take hours if you have thousands of mutants. This is unavoidable — each mutant requires a full test run.

Practical strategies for managing this:

Don't run mutation testing on every commit. Run it nightly or weekly, or only on the modules you're actively changing.

Limit scope to critical code. Payment processing, authentication, authorization rules, core business calculations — these deserve mutation testing. Configuration parsing and utility functions usually don't.

Use caching. mutmut's .mutmut-cache remembers results. On subsequent runs, only changed code is re-mutated.

Set expectations correctly. A 75-80% kill rate on critical business logic is a reasonable target. Chasing 100% is usually not worth the time, and some mutants are genuinely equivalent (the mutation produces identical behavior despite different code).

Mutation testing is also not a replacement for thinking about what to test. It amplifies your existing test suite — if your tests have systematic blind spots (never test error cases, never test boundaries), mutation testing will reveal that, but you still need to understand why and write the right tests.

Conclusion

mutmut is the practical choice for adding mutation testing to Python projects. Install it, point it at a critical module, run with coverage filtering, and spend an hour fixing the surviving mutants. The result is a test suite that actually catches behavioral regressions — not just a coverage number.

The discipline of reading surviving mutants and asking "why didn't my tests catch this?" consistently produces better tests than writing tests by intuition alone. It's a forcing function for test quality.
