How to Improve Your Mutation Score: From Surviving Mutants to Killed Mutants

You've run your first mutation testing report. Mutation score: 58%. Now what?

Improving mutation score isn't about writing more tests — it's about writing better assertions. Here's a systematic approach to turning surviving mutants into killed ones, starting with the changes that matter most.

Step 1: Understand Why Mutants Survive

Mutants survive for three reasons:

  1. No test covers the code: production runs it, but no test ever executes it
  2. Tests cover the code but don't assert outcomes — tests execute the code and pass regardless of what it returns
  3. The mutant is equivalent — the changed code is semantically identical to the original (more on this later)

Before writing any new tests, categorize your surviving mutants. Open the HTML report and look at the first 10 survivors. Which category do they fall into?

Category 1 requires writing new tests. Category 2 requires strengthening existing assertions. Category 3 can be ignored. Most projects have a mix of all three, but category 2 is the most common — tests that just don't check enough.
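
The triage above can be sketched in a few lines. This is a minimal illustration, assuming you have exported each surviving mutant along with a flag saying whether any test executed its line. The Survivor record and its field names are hypothetical, not a real tool's report format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Survivor:
    # Hypothetical record; real report formats differ per tool.
    location: str
    covered_by_tests: bool
    verified_equivalent: bool = False


def categorize(survivor: Survivor) -> str:
    """Map a surviving mutant to one of the three categories."""
    if survivor.verified_equivalent:
        return "3: equivalent (document and ignore)"
    if not survivor.covered_by_tests:
        return "1: no coverage (write a new test)"
    return "2: weak assertion (strengthen an existing test)"


# Made-up sample data for illustration only.
survivors = [
    Survivor("billing.py:42", covered_by_tests=True),
    Survivor("billing.py:57", covered_by_tests=False),
    Survivor("format.py:10", covered_by_tests=True, verified_equivalent=True),
    Survivor("auth.py:18", covered_by_tests=True),
]

counts = Counter(categorize(s) for s in survivors)
for category, count in sorted(counts.items()):
    print(f"{category}: {count}")
```

A survivor only lands in category 3 after someone has verified it by hand, so the default for covered code is category 2.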

Step 2: Fix Weak Assertions First

The fastest wins come from tests that already execute the right code but make no meaningful assertions. Look for these patterns in your test suite:

Asserting existence instead of value:

// Weak — only checks the function returned something
expect(calculateTotal(items)).toBeDefined();

// Strong — checks the actual result
expect(calculateTotal(items)).toBe(47.50);

Asserting type instead of content:

// Weak
expect(typeof result).toBe('object');

// Strong
expect(result).toEqual({ id: 1, status: 'active', balance: 100 });

Asserting truthiness instead of value:

// Weak: any non-empty string is truthy, so a wrong name still passes
expect(formatName('John', 'Smith')).toBeTruthy();

// Strong
expect(formatName('John', 'Smith')).toBe('John Smith');

Not verifying mock calls with arguments:

// Weak — verifies the method was called but not with what
expect(emailService.send).toHaveBeenCalled();

// Strong: also kills mutants that alter the arguments (e.g. an emptied string or object literal)
expect(emailService.send).toHaveBeenCalledWith({
  to: 'user@example.com',
  subject: 'Welcome',
  template: 'welcome-email'
});

Go through your surviving mutants and find tests that could kill them with just a stronger assertion. These are the cheapest improvements available: the test already exists, you just need to check the output properly.

Step 3: Apply Boundary Value Testing

The single most productive category of surviving mutants is boundary conditions. Mutation tools love shifting < to <= and > to >= because most test suites test the middle of ranges, not the edges.

For every conditional in your codebase, test three values:

  • One value just below the boundary
  • The exact boundary value
  • One value just above the boundary

// Function under test
function getShippingCost(orderTotal) {
  if (orderTotal >= 50) return 0;
  if (orderTotal >= 25) return 4.99;
  return 9.99;
}

// Boundary tests kill >= vs > mutations
describe('getShippingCost', () => {
  // Below first boundary
  it('charges full shipping under $25', () => {
    expect(getShippingCost(24.99)).toBe(9.99);
  });

  // At first boundary — kills >= vs > mutant
  it('charges reduced shipping at exactly $25', () => {
    expect(getShippingCost(25)).toBe(4.99);
  });

  // Above first boundary, below second
  it('charges reduced shipping between $25 and $49.99', () => {
    expect(getShippingCost(30)).toBe(4.99);
  });

  // At second boundary — kills >= vs > mutant
  it('gives free shipping at exactly $50', () => {
    expect(getShippingCost(50)).toBe(0);
  });

  // Above second boundary
  it('gives free shipping over $50', () => {
    expect(getShippingCost(100)).toBe(0);
  });
});

These five tests kill all boundary mutants for this function. Without the $25 and $50 exact tests, the >= to > mutations survive — meaning a real developer could introduce that exact bug undetected.
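
The same idea can be written as a table of cases, which is what a parameterized test boils down to. The sketch below is a Python port of the shipping function with one hand-written mutant. It uses values just below, at, and just above each boundary, and the helper name failing_cases is only illustrative.

```python
def get_shipping_cost(order_total):
    if order_total >= 50:
        return 0
    if order_total >= 25:
        return 4.99
    return 9.99


def get_shipping_cost_mutant(order_total):
    """Hand-written mutant: the first >= became >."""
    if order_total > 50:
        return 0
    if order_total >= 25:
        return 4.99
    return 9.99


# (input, expected) pairs: just below, at, and just above each boundary.
CASES = [
    (24.99, 9.99),
    (25, 4.99),
    (25.01, 4.99),
    (49.99, 4.99),
    (50, 0),
    (50.01, 0),
]


def failing_cases(fn):
    """Return the inputs for which fn disagrees with the expected value."""
    return [total for total, expected in CASES if fn(total) != expected]


print(failing_cases(get_shipping_cost))         # original: nothing fails
print(failing_cases(get_shipping_cost_mutant))  # mutant: only the input 50 fails
```

Only the exact-boundary input exposes the mutant. Every other case returns the same value for both versions.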

Step 4: Test Both Sides of Boolean Logic

Logical mutations (&& to ||, negation removals) survive when you only test one combination of conditions. For a condition with two boolean inputs, you need all four combinations to kill all logical mutants:

from dataclasses import dataclass

@dataclass
class User:
    # Minimal stand-in so the tests run; your real model will differ
    is_active: bool
    role: str

    def has_role(self, role):
        return self.role == role

def can_access_admin(user):
    return user.is_active and user.has_role('admin')

# Test all four combinations
def test_active_admin_can_access():
    user = User(is_active=True, role='admin')
    assert can_access_admin(user) is True

def test_inactive_admin_cannot_access():
    # Kills: is_active removed or changed to True constant
    user = User(is_active=False, role='admin')
    assert can_access_admin(user) is False

def test_active_non_admin_cannot_access():
    # Kills: and changed to or, role check removed
    user = User(is_active=True, role='user')
    assert can_access_admin(user) is False

def test_inactive_non_admin_cannot_access():
    user = User(is_active=False, role='user')
    assert can_access_admin(user) is False

If you only have the first test, the mutation return user.is_active or user.has_role('admin') survives — a serious security bug. All four tests together kill every logical mutation on this function.
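
To check which input combinations kill which logical mutants, you can enumerate the truth table. The sketch below reduces the access check to two booleans. The entries in MUTANTS are hand-written stand-ins for what a tool might generate, and the names are illustrative.

```python
from itertools import product


def original(is_active, is_admin):
    return is_active and is_admin


# Hand-written stand-ins for the mutants a tool might generate.
MUTANTS = {
    "and -> or": lambda is_active, is_admin: is_active or is_admin,
    "drop is_active": lambda is_active, is_admin: is_admin,
    "drop role check": lambda is_active, is_admin: is_active,
    "negate is_active": lambda is_active, is_admin: (not is_active) and is_admin,
}


def killing_inputs(mutant):
    """Input combinations where the mutant's result differs from the original."""
    return [
        combo
        for combo in product([True, False], repeat=2)
        if mutant(*combo) != original(*combo)
    ]


for name, mutant in MUTANTS.items():
    print(f"{name}: killed by {killing_inputs(mutant)}")
```

Three of the four mutants agree with the original on (True, True), so a suite containing only the happy-path test leaves them alive.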

Step 5: Prioritize by Business Risk

You can't kill every mutant in one sprint. Prioritize by business impact:

Tier 1 — Kill immediately:

  • Payment calculations, pricing, discount logic
  • Authentication and authorization checks
  • Data validation that prevents corrupt records
  • Any code handling financial amounts or user permissions

Tier 2 — Kill this quarter:

  • Core domain logic specific to your application
  • State machine transitions
  • API contract enforcement (request validation, response formatting)

Tier 3 — Track but don't block on:

  • Utility functions and helpers
  • Formatting and display logic
  • Logging and instrumentation

Ignore (document why):

  • Equivalent mutants
  • Generated code
  • Configuration loading
  • Third-party integrations tested via contract tests

Sort your surviving mutants by the file they're in, map files to tiers, and start with Tier 1. A 60% mutation score overall matters less than having 95% on your billing module.
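
Mapping files to tiers can be as simple as a prefix lookup. This is a sketch under assumed names: the directory prefixes and the survivor list are hypothetical, so substitute your own layout and report data.

```python
# Hypothetical path prefixes; substitute your own module layout.
TIERS = {
    "src/billing/": 1,
    "src/auth/": 1,
    "src/orders/": 2,
    "src/utils/": 3,
}
DEFAULT_TIER = 3


def tier_for(path):
    """Return the tier of the first matching prefix, or the default."""
    for prefix, tier in TIERS.items():
        if path.startswith(prefix):
            return tier
    return DEFAULT_TIER


# Hypothetical surviving mutants as (file, line) pairs.
survivors = [
    ("src/utils/strings.py", 12),
    ("src/billing/invoice.py", 88),
    ("src/orders/state.py", 40),
    ("src/auth/permissions.py", 7),
]

# Sort by tier first, then by file and line for a stable work queue.
work_queue = sorted(survivors, key=lambda s: (tier_for(s[0]), s[0], s[1]))
for path, line in work_queue:
    print(f"Tier {tier_for(path)}: {path}:{line}")
```

Unmapped files fall into Tier 3 here. Whether that default is right for your codebase is a judgment call.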

Step 6: Recognize Equivalent Mutants

Some mutants are equivalent — the changed code produces identical behavior to the original. Chasing them wastes time.

Common equivalent mutants:

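// Original
for (int i = 0; i < list.size(); i++) { ... }

// Mutant: < → != (generated by tools that swap relational operators)
for (int i = 0; i != list.size(); i++) { ... }
// i starts at 0 and goes up by exactly 1, so it always lands on list.size()
// and stops there, provided the body doesn't change i or the list.
// Net behavior: identical

// Original
const larger = a > b ? a : b;

// Mutant: > → >=
const larger = a >= b ? a : b;
// The two differ only when a === b, and then either branch yields an equal value.
// Net behavior: identical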

When analysis convinces you that a surviving mutant behaves identically to the original, document it and suppress it with a comment or exclusion config. Only suppress survivors you've verified are equivalent, never survivors in general.

A good rule: if you can't write a test that distinguishes the mutant from the original, it's probably equivalent.
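
One way to probe a suspected equivalent mutant is to compare it with the original over a small input domain. The sketch below uses a hypothetical larger function and two hand-written mutants. A search like this can only find a counterexample, so coming up empty is evidence of equivalence rather than proof.

```python
from itertools import product


def larger(a, b):
    return a if a > b else b


def larger_boundary_mutant(a, b):
    """Hand-written mutant: > replaced with >=."""
    return a if a >= b else b


def larger_negated_mutant(a, b):
    """Hand-written mutant: > replaced with <=."""
    return a if a <= b else b


def find_distinguishing_input(original, mutant, values=range(-3, 4)):
    """Return the first (a, b) where the two functions disagree, else None."""
    for a, b in product(values, repeat=2):
        if original(a, b) != mutant(a, b):
            return (a, b)
    return None


print(find_distinguishing_input(larger, larger_boundary_mutant))
print(find_distinguishing_input(larger, larger_negated_mutant))
```

The boundary mutant yields no distinguishing input, which matches the reasoning that both branches return an equal value when a == b. The negated mutant is exposed immediately, and the input the search reports is a ready-made test case.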

Step 7: Track Progress Over Time

Set a baseline mutation score today. Commit to raising it by 5 points per sprint, focused on Tier 1 code. Track:

Sprint 1 baseline: 58% overall, 71% on billing module
Sprint 1 goal: 63% overall, 80% on billing module
Sprint 2 goal: 68% overall, 85% on billing module

Use your mutation tool's threshold configuration to fail CI if the score drops. This prevents new code from being merged with gaps:

  • break: 55 — hard CI failure below this
  • low: 65 — yellow warning
  • high: 80 — green (target state)

Raise the break threshold by 2-3 points each quarter as your test suite improves.
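
The threshold logic can be sketched as follows. This uses the three values from the list above and assumes that scores below low are reported as "red", which is how some tools color their reports. The function names are illustrative, not any particular tool's API.

```python
# Thresholds from the list above.
THRESHOLDS = {"break": 55, "low": 65, "high": 80}


def classify(score, thresholds=THRESHOLDS):
    """Return a status label for a mutation score given in percent."""
    if score < thresholds["break"]:
        return "fail"
    if score < thresholds["low"]:
        return "red"      # assumption: below "low" but above "break"
    if score < thresholds["high"]:
        return "yellow"
    return "green"


def ci_exit_code(score, thresholds=THRESHOLDS):
    """Non-zero exit code only when the score is under the break threshold."""
    return 1 if classify(score, thresholds) == "fail" else 0


for label, score in [
    ("Sprint 1 baseline", 58),
    ("Sprint 1 goal", 63),
    ("Sprint 2 goal", 68),
]:
    print(f"{label}: {score}% -> {classify(score)} (exit {ci_exit_code(score)})")
```

With these numbers the 58% baseline passes CI but sits only three points above the break threshold, so a small regression would fail the build.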

What Good Looks Like

A team that's improved mutation testing effectively has tests that:

  • Assert specific values, not just existence
  • Cover the exact boundary values for every conditional
  • Test both sides of every boolean combination
  • Verify mock calls with specific arguments
  • Use parameterized tests for ranges and edge cases

The mutation score reflects this work directly. An 85% mutation score means your tests detected 85 of every 100 faults the tool injected. Mutants are artificial, so that is strong evidence about real bugs of the same kind, not a guarantee. It is still a concrete, measurable claim about your test suite's effectiveness, and one that code coverage can't make.
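
For reference, one common way to compute the score is killed mutants divided by the mutants that could have been killed. The sketch below assumes that definition. Real tools differ in how they count timeouts and uncovered mutants, and the verified_equivalent parameter is an illustrative addition for survivors you have confirmed by hand.

```python
def mutation_score(killed, survived, verified_equivalent=0):
    """Killed mutants as a percentage of mutants that could be killed.

    verified_equivalent counts survivors confirmed to be equivalent;
    they are removed from the denominator.
    """
    if verified_equivalent > survived:
        raise ValueError("equivalent mutants are a subset of survivors")
    killable = killed + survived - verified_equivalent
    if killable == 0:
        raise ValueError("no killable mutants")
    return 100 * killed / killable


print(mutation_score(killed=85, survived=15))
print(mutation_score(killed=85, survived=15, verified_equivalent=5))
```

Excluding verified equivalent mutants raises the score without any new tests, which is one more reason to document each exclusion.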

Start with your critical path. Kill the high-priority survivors. Raise the threshold. Repeat.
