Mutation Testing: Why 100% Code Coverage Doesn't Mean Your Tests Are Good
You've hit 100% code coverage. Every line, every branch, every condition — green. Time to ship with confidence, right?
Not quite. Code coverage tells you which lines were executed during tests. It says nothing about whether your tests would catch a bug if one appeared. Mutation testing does.
The Coverage Lie
Consider this function:
function calculateDiscount(price, isPremium) {
  if (isPremium) {
    return price * 0.8;
  }
  return price;
}
And these tests:
it('applies discount for premium users', () => {
  const result = calculateDiscount(100, true);
  expect(result).toBeDefined(); // just checking it returns something
});
it('handles non-premium users', () => {
  const result = calculateDiscount(100, false);
  expect(result).toBeDefined(); // same non-check on the other branch
});
Coverage tool says: 100%. Both branches executed. Everything's fine.
But nothing here is actually verified. No assertion checks that the discount is 20%, and none checks that non-premium users get full price. A bug that changes 0.8 to 0.9, halving the discount from 20% to 10%, would sail straight through your test suite and into production.
Mutation testing catches this.
What Is Mutation Testing?
Mutation testing works by deliberately introducing small bugs, called mutants, into your source code, then running your test suite against each mutant. If at least one test fails, the mutant is killed. If every test still passes with the bug present, the mutant survives.
Survived mutants are the signal you care about. They represent real gaps in your test suite — places where a developer could introduce exactly that kind of bug and your tests would miss it entirely.
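The kill-or-survive check can be sketched in a few lines of plain JavaScript. This is a hand-rolled illustration, not how any particular tool works: the mutant is written by hand (the 0.8 to 0.9 change mentioned earlier), and weakSuite, strictSuite, and mutantStatus are hypothetical names introduced only for this example.

```javascript
// Original function from the article.
function calculateDiscount(price, isPremium) {
  if (isPremium) {
    return price * 0.8;
  }
  return price;
}

// Hand-written mutant: 0.8 changed to 0.9 (a real tool would generate this).
function mutantDiscount(price, isPremium) {
  if (isPremium) {
    return price * 0.9;
  }
  return price;
}

// Hypothetical "suites": each takes an implementation and returns true
// when every check passes.
const weakSuite = (fn) => fn(100, true) !== undefined;
const strictSuite = (fn) => fn(100, true) === 80 && fn(100, false) === 100;

// A mutant is killed when a suite that passes on the original fails on the mutant.
function mutantStatus(suite, original, mutant) {
  if (!suite(original)) {
    throw new Error('suite must pass on the original code first');
  }
  return suite(mutant) ? 'survived' : 'killed';
}

console.log(mutantStatus(weakSuite, calculateDiscount, mutantDiscount));   // survived
console.log(mutantStatus(strictSuite, calculateDiscount, mutantDiscount)); // killed
```

The weak suite only checks that something came back, so the mutant slips through. The strict suite compares against the expected price and fails as soon as the multiplier changes.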
Types of Mutants
Mutation testing tools generate mutants by applying transformation rules to your code. The most common types:
Arithmetic mutants swap math operators:
- price * 0.8 becomes price / 0.8
- count + 1 becomes count - 1
- total * rate becomes total + rate
Boundary mutants shift comparison operators:
- if (age >= 18) becomes if (age > 18)
- if (count <= limit) becomes if (count < limit)
- if (score > 0) becomes if (score >= 0)
Logical mutants invert boolean operators and values:
- && becomes ||
- !isValid becomes isValid
- true becomes false
Return value mutants change what functions return:
- return true becomes return false
- return result becomes return null
- return count becomes return 0
Conditional mutants replace conditions entirely:
- if (isPremium) becomes if (true) or if (false)
Each of these represents a class of real bugs developers actually write. Mutation testing asks: would your tests notice?
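Boundary mutants in particular show why the choice of test inputs matters. In the sketch below, isAdult and its mutant are hypothetical functions written by hand for illustration. Checks far from the boundary cannot tell the two apart, while a check at exactly 18 can.

```javascript
// Hypothetical function under test.
const isAdult = (age) => age >= 18;

// Boundary mutant: >= shifted to >.
const isAdultMutant = (age) => age > 18;

// Checks expressed as [input, expected] pairs.
const farFromBoundary = [[30, true], [5, false]];
const atBoundary = [[18, true], [17, false]];

// True when the implementation returns the expected value for every pair.
const passes = (fn, cases) =>
  cases.every(([input, expected]) => fn(input) === expected);

console.log(passes(isAdultMutant, farFromBoundary)); // true: the mutant survives
console.log(passes(isAdultMutant, atBoundary));      // false: the mutant is killed
```

Both sets of checks pass on the original isAdult, so only the second set adds any protection against an off-by-one change.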
The Mutation Score Metric
Your mutation score is the percentage of mutants your tests killed:
  mutation score = (killed mutants / total mutants) × 100
A mutation score of 85% means 85% of the deliberate bugs your tool introduced were caught by your tests. The remaining 15% survived: your tests wouldn't catch those real bugs.
Compare this to coverage: a project can have 100% line coverage and a 40% mutation score. The tests execute the code but make no useful assertions about what it does.
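As a quick worked version of the formula, here is a minimal helper. The name mutationScore and the mutant counts are made up for illustration and do not come from any tool's API.

```javascript
// Mutation score as a percentage. Hypothetical helper, not a tool API.
function mutationScore(killed, total) {
  const valid =
    Number.isInteger(killed) &&
    Number.isInteger(total) &&
    total > 0 &&
    killed >= 0 &&
    killed <= total;
  if (!valid) {
    throw new RangeError('expected integers with 0 <= killed <= total and total > 0');
  }
  // Multiply first so whole-number percentages stay exact.
  return (killed * 100) / total;
}

console.log(mutationScore(170, 200)); // 85
console.log(mutationScore(80, 200));  // 40
```

The second call is the kind of result a fully covered but weakly asserted suite can produce: every line ran, yet 120 of 200 injected bugs went unnoticed.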
Rules of thumb for mutation score targets (conventions vary by team and tool):
- Below 60% — test suite has serious gaps, high risk
- 60–80% — acceptable for lower-risk code
- 80–90% — good for most production code
- Above 90% — excellent, appropriate for critical business logic
Don't aim for 100%. Some mutants are equivalent mutants: they differ textually from the original code but behave identically for every input, so no test can kill them. Chasing them wastes time.
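To make the idea of an equivalent mutant concrete, consider this small sketch. The isEmpty function is hypothetical, and the example assumes it is only ever called with arrays.

```javascript
// Hypothetical function under test.
const isEmpty = (list) => list.length === 0;

// Mutant: === replaced with <=. An array's length is never negative,
// so this behaves identically for every array.
const isEmptyMutant = (list) => list.length <= 0;

// Sampling cannot prove equivalence; the real argument is the
// "length is never negative" invariant. The samples just illustrate it.
const samples = [[], [1], [1, 2, 3], new Array(5), ['']];
const sameEverywhere = samples.every((s) => isEmpty(s) === isEmptyMutant(s));

console.log(sameEverywhere); // true
```

No assertion on isEmpty can distinguish the two versions, so a surviving mutant like this one says nothing about the quality of the tests.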
A Concrete Example
Back to the discount function. A mutation testing tool might generate these mutants:
| Mutant | Changed code | Killed? |
|---|---|---|
| 1 | price * 0.8 → price / 0.8 | Depends on assertions |
| 2 | price * 0.8 → price + 0.8 | Depends on assertions |
| 3 | if (isPremium) → if (true) | Depends on non-premium test |
| 4 | return price → return 0 | Depends on non-premium test |
With the weak test above (expect(result).toBeDefined()), all four survive. With a proper test:
it('applies 20% discount for premium users', () => {
  expect(calculateDiscount(100, true)).toBe(80);
});
it('returns full price for non-premium users', () => {
  expect(calculateDiscount(100, false)).toBe(100);
});
Now mutants 1 and 2 are killed (wrong result), mutant 3 is killed (non-premium test fails), mutant 4 is killed (non-premium test gets 0 instead of 100).
That's mutation testing working as intended.
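The four mutants in the table can be replayed without any tooling. The sketch below is a toy, assuming plain text replacement on the function body and new Function to rebuild it; real tools work on the syntax tree. The names build, runMutants, weakSuite, and strongSuite are hypothetical.

```javascript
// Body of calculateDiscount as text, so mutants can be made by string replacement.
const originalBody = `
  if (isPremium) {
    return price * 0.8;
  }
  return price;
`;

// The four mutants from the table, as text replacements.
const mutants = [
  { id: 1, from: 'price * 0.8', to: 'price / 0.8' },
  { id: 2, from: 'price * 0.8', to: 'price + 0.8' },
  { id: 3, from: 'if (isPremium)', to: 'if (true)' },
  { id: 4, from: 'return price;', to: 'return 0;' },
];

// Rebuild a callable function from a (possibly mutated) body.
const build = (body) => new Function('price', 'isPremium', body);

// Stand-ins for the weak and proper tests: true means "all checks pass".
const weakSuite = (fn) => fn(100, true) !== undefined;
const strongSuite = (fn) => fn(100, true) === 80 && fn(100, false) === 100;

function runMutants(suite) {
  return mutants.map(({ id, from, to }) => {
    const mutatedBody = originalBody.replace(from, to);
    if (mutatedBody === originalBody) {
      throw new Error(`mutant ${id} did not apply`);
    }
    return { id, status: suite(build(mutatedBody)) ? 'survived' : 'killed' };
  });
}

console.log(runMutants(weakSuite).map((r) => r.status));
// [ 'survived', 'survived', 'survived', 'survived' ]
console.log(runMutants(strongSuite).map((r) => r.status));
// [ 'killed', 'killed', 'killed', 'killed' ]
```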
When Should You Use Mutation Testing?
Mutation testing is computationally expensive — a large project can take hours to run all mutants. Use it strategically:
Apply it to high-value code first:
- Payment and billing logic
- Authentication and authorization
- Data validation rules
- Core business algorithms
- Security-sensitive functions
Don't run it on everything at once. Start with the code where a missed bug would hurt most, then widen the set of mutated files as you tune your pipeline.
Run it on a schedule, not every commit. Many teams run mutation testing nightly or weekly, not on every pull request. Incremental mutation testing (only mutating changed files) makes PR integration more practical.
Use it to evaluate test quality, not developer performance. A low mutation score reveals test gaps, not bad developers. The goal is to use the findings to write better assertions.
Mutation Testing vs Code Coverage
| Metric | What it measures | What it misses |
|---|---|---|
| Line coverage | Whether code was executed | Whether tests check results |
| Branch coverage | Whether all paths were taken | Whether assertions are meaningful |
| Mutation score | Whether tests catch injected bugs | Bugs outside the tool's mutation rules; skewed by equivalent mutants |
None of these replaces the others. Use coverage to find untested paths. Use mutation testing to verify that tests on those paths actually work.
Getting Started
Several mature tools exist for mutation testing:
- JavaScript/TypeScript: Stryker (stryker-mutator.io)
- Java: PIT (pitest.org)
- Python: mutmut, Cosmic Ray
- C#: Stryker.NET
- Go: go-mutesting
Most integrate with existing test runners. You don't need to rewrite your tests — just add the tool and let it run against what you already have. The surviving mutants will tell you exactly where to improve.
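For a JavaScript project using Stryker, a first configuration might look something like the sketch below. The option names (testRunner, mutate, reporters, thresholds, incremental) follow the StrykerJS configuration reference as understood at the time of writing, so verify them against your installed version. The glob path and the threshold numbers are assumptions chosen to match the score ranges discussed earlier, not recommendations from the tool.

```javascript
// Sketch of a StrykerJS configuration. Paths and numbers are assumptions.
const config = {
  // Assumes the Jest runner plugin is installed alongside Stryker.
  testRunner: 'jest',

  // Start narrow: only mutate the high-value code (hypothetical path).
  mutate: ['src/billing/**/*.js'],

  reporters: ['clear-text', 'html'],

  // Score bands: at or above "high" is good, below "low" is a warning,
  // below "break" fails the run.
  thresholds: { high: 90, low: 80, break: 60 },

  // Reuse results from the previous run where code and tests are unchanged.
  incremental: true,
};

// In a real stryker.config.mjs this object would be the default export:
// export default config;
```

Keeping the mutate glob small at first keeps run times manageable, and the list can grow as the pipeline settles.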
The Takeaway
100% code coverage is a floor, not a ceiling. It means your tests touched the code. Mutation testing tells you whether your tests understood the code — whether they would catch the bugs that matter.
A test suite with meaningful assertions that kills 85% of mutants is worth far more than one with 100% coverage and no real checks. Build tests that notice when things go wrong. Mutation testing is how you prove they do.
Start with your most critical business logic. Run the tool. Fix the survivors. Repeat.