Golden Master Testing: Characterization Tests for Legacy Code

Golden master testing (also called characterization testing) captures what your code currently does and stores it as a baseline. You then refactor with confidence — if the output changes, the test fails. It's not about proving your code is correct; it's about proving a refactor didn't change behavior. Essential for legacy code without tests.

Key Takeaways

Golden masters test behavior, not correctness. You're capturing current behavior, not specifying correct behavior. Bugs in the original code are captured in the golden master. That's fine — the goal is to detect changes during refactoring, not to fix bugs yet.

Create golden masters before any refactoring. The whole point is safety during change. Create the golden master first, verify the test works, then refactor.

Golden masters must cover representative inputs. A golden master that only tests the happy path won't catch behavioral regressions in error paths. Feed the system a variety of inputs before approving.

Retire golden masters when you have real tests. A golden master is scaffolding. Once you understand the code and have written proper unit tests, the golden master can be deleted. Don't carry it forever.

Non-determinism is your first obstacle. Legacy code often uses timestamps, random numbers, or global state. You must neutralize these before you can create stable golden masters.

What Is Golden Master Testing?

The term "golden master" comes from the music industry — the master recording that all copies are made from. In software testing, the golden master is the captured output of your code that all future runs are compared against.

Michael Feathers described this pattern in "Working Effectively with Legacy Code" as characterization testing: you're characterizing what the code does, not what it should do. The distinction matters:

  • Specification test: asserts what code should do (requirements-driven)
  • Characterization test: asserts what code currently does (behavior-capture)

Both are valid. Characterization tests are specifically for code you're about to change but don't fully understand.
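
A minimal sketch of the difference, using a hypothetical apply_discount function (the function, module path, and values below are illustrative, not taken from a real codebase):

from myapp.pricing import apply_discount  # hypothetical legacy function

# Specification test: asserts what the code SHOULD do, per requirements
def test_save10_gives_ten_percent_off():
    assert apply_discount(100.00, "SAVE10") == 90.00

# Characterization test: asserts what the code CURRENTLY does. We ran it once,
# observed 90.99, and pinned that value so any refactor that changes it fails loudly.
def test_save10_current_behavior():
    assert apply_discount(100.00, "SAVE10") == 90.99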

When to Use Golden Master Testing

The pattern applies when:

  • You need to refactor code with no existing tests
  • You're extracting a module from a monolith
  • You're upgrading a framework or dependency and need behavioral parity
  • You're migrating language versions (Python 2→3, Java 8→17)
  • You inherited code and need to understand what it does before modifying it

Do not use golden masters as a permanent replacement for unit tests. They're scaffolding for refactoring, not a long-term testing strategy.

The Golden Master Workflow

Step 1: Identify the System Under Characterization

Find the highest-level entry point you can test without changing the code:

# Legacy code — this function does something. We don't know exactly what.
def process_order(order_data: dict, user_id: str, discount_code: str = None) -> dict:
    # 300 lines of business logic written in 2019
    ...
    return result

Step 2: Gather Representative Inputs

Run the production system or read production logs to find real inputs. Alternatively, create inputs that cover the main variations:

test_cases = [
    # Happy path
    {"order": {"items": [{"id": "p1", "qty": 2}], "total": 59.98}, "user": "u1"},
    # Discount code
    {"order": {"items": [{"id": "p1", "qty": 1}], "total": 29.99}, "user": "u2", "discount": "SAVE10"},
    # Empty cart
    {"order": {"items": [], "total": 0.0}, "user": "u3"},
    # Large order
    {"order": {"items": [{"id": "p2", "qty": 10}], "total": 599.90}, "user": "u4"},
]

Step 3: Neutralize Non-Determinism

Legacy code often embeds non-deterministic values that would cause golden masters to fail on every run:

# Problem: order IDs generated from timestamp
def process_order(order_data):
    result = {
        "order_id": f"ORD-{int(time.time())}",  # Non-deterministic!
        "processed_at": datetime.now().isoformat(),  # Non-deterministic!
        ...
    }

Fix this by controlling or scrubbing non-deterministic fields:

# Option 1: Freeze time in tests
from freezegun import freeze_time

@freeze_time("2024-01-15 10:00:00")
def test_process_order_golden():
    ...

# Option 2: Scrub after capture
def scrub_output(output: dict) -> dict:
    """Remove/normalize non-deterministic fields before approval."""
    result = output.copy()
    if 'order_id' in result:
        result['order_id'] = '<ORDER_ID>'
    if 'processed_at' in result:
        result['processed_at'] = '<TIMESTAMP>'
    return result
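
Random numbers can be handled the same way. If the legacy code draws from Python's global random module, seeding it before each test makes runs repeatable (a sketch; whether a seed is enough depends on where the code actually gets its randomness):

# Option 3: Pin randomness (works when the code calls the global random module)
import random
import pytest

@pytest.fixture(autouse=True)
def deterministic_randomness():
    """Seed the global RNG so legacy code that calls random.* gives stable output."""
    random.seed(1234)
    yield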

Step 4: Capture the Golden Master

Run the code and capture the output:

import json
import os

def capture_golden_master(test_name: str, actual_output):
    """Save actual output as golden master if none exists."""
    path = f"tests/golden/{test_name}.json"
    os.makedirs(os.path.dirname(path), exist_ok=True)

    scrubbed = scrub_output(actual_output)

    if not os.path.exists(path):
        with open(path, 'w') as f:
            json.dump(scrubbed, f, indent=2, sort_keys=True)
        return True  # Just created — caller should verify manually

    return False

def assert_matches_golden_master(test_name: str, actual_output):
    path = f"tests/golden/{test_name}.json"
    scrubbed = scrub_output(actual_output)

    if not os.path.exists(path):
        with open(path, 'w') as f:
            json.dump(scrubbed, f, indent=2, sort_keys=True)
        raise AssertionError(
            f"Created golden master at {path}. Verify it's correct and rerun."
        )

    with open(path) as f:
        golden = json.load(f)

    assert scrubbed == golden, (
        f"Output changed from golden master. "
        f"If intentional, update {path}."
    )

Step 5: Write the Tests

# tests/test_process_order_golden.py
from freezegun import freeze_time
from myapp.orders import process_order

@freeze_time("2024-01-15 10:00:00")
def test_process_order_standard_happy_path():
    result = process_order(
        order_data={"items": [{"id": "p1", "qty": 2}], "total": 59.98},
        user_id="u1"
    )
    assert_matches_golden_master("process_order_happy_path", result)

@freeze_time("2024-01-15 10:00:00")
def test_process_order_with_discount():
    result = process_order(
        order_data={"items": [{"id": "p1", "qty": 1}], "total": 29.99},
        user_id="u2",
        discount_code="SAVE10"
    )
    assert_matches_golden_master("process_order_discount", result)

@freeze_time("2024-01-15 10:00:00")
def test_process_order_empty_cart():
    result = process_order(
        order_data={"items": [], "total": 0.0},
        user_id="u3"
    )
    assert_matches_golden_master("process_order_empty", result)
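
If the list of representative inputs from Step 2 keeps growing, the individual tests can be collapsed into one parametrized test. A sketch, assuming the Step 4 helpers live in tests/golden_utils.py (that module path is an assumption) and that each case carries a short name to key its golden master file:

import pytest
from freezegun import freeze_time
from myapp.orders import process_order
from tests.golden_utils import assert_matches_golden_master  # assumed location of the Step 4 helper

CASES = [
    ("happy_path", {"items": [{"id": "p1", "qty": 2}], "total": 59.98}, "u1", None),
    ("discount",   {"items": [{"id": "p1", "qty": 1}], "total": 29.99}, "u2", "SAVE10"),
    ("empty",      {"items": [], "total": 0.0},                         "u3", None),
]

@pytest.mark.parametrize("name,order,user,code", CASES, ids=[c[0] for c in CASES])
@freeze_time("2024-01-15 10:00:00")
def test_process_order_golden(name, order, user, code):
    result = process_order(order_data=order, user_id=user, discount_code=code)
    assert_matches_golden_master(f"process_order_{name}", result)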

Step 6: Run, Review, and Approve

First run creates the golden master files and fails. Review each file:

cat tests/golden/process_order_happy_path.json

If the output looks correct for the inputs you provided, commit the golden files. Subsequent runs compare against them.
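
Committing them is nothing special; they live in version control next to the tests (the commit message below is just an example):

git add tests/golden/
git commit -m "Add golden masters for process_order"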

Step 7: Refactor with Confidence

Now refactor. If behavior changes, the test fails with a diff:

AssertionError: Output changed from golden master.
If intentional, update tests/golden/process_order_happy_path.json.

Actual:
{
  "status": "approved",
  "discount_applied": true,     # Was: "discount_applied": false
  ...
}

Is this change intentional? Investigate. If yes, update the golden master. If no, you caught a regression.

Handling Captured Bugs

Golden masters capture current behavior, including bugs. You'll find outputs that are clearly wrong:

{
  "discount_percent": 0.1,   // Field is named "percent" but holds a fraction: 10% stored as 0.1
  "discount_applied": false  // Suspicious: a valid discount code was sent, yet no discount was applied
}

Do not fix the bug in the golden master. Leave it as-is for now. The golden master's job is to ensure your refactoring doesn't break anything. After you have the refactored code under test, fix the bug separately, write a proper unit test for the fix, and update the golden master.

This is the key discipline of the pattern: separate concerns. Refactoring and bug-fixing are different activities.
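
When the bug's turn does come, the fix should arrive with a specification test rather than just an updated golden file. A hypothetical sketch for the discount bug above, assuming the refactoring has by then extracted an apply_discount_code helper (that helper and its return shape are assumptions):

from myapp.orders import apply_discount_code  # hypothetical helper extracted during refactoring

def test_save10_applies_ten_percent_discount():
    """Specification test for the fix: asserts what the code SHOULD do."""
    result = apply_discount_code(total=29.99, code="SAVE10")
    assert result["discount_applied"] is True
    assert result["discount_percent"] == 10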

Using Golden Masters for Framework Migrations

Golden masters are excellent for framework version upgrades. Example: Python 2 to Python 3 migration:

# Same function, both versions
# Capture golden master on Python 2
# Run same tests on Python 3
# Differences reveal behavioral changes in the migration

Or for upgrading a serialization library:

# Before: use pickle for serialization
# After: use orjson for serialization
# Golden master ensures deserialized output is identical
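
A sketch of how that looks for the serialization swap, assuming the payloads are JSON-compatible dictionaries and reusing assert_matches_golden_master from Step 4 (serialize_report and its report shape are illustrative):

import json
import orjson
from tests.golden_utils import assert_matches_golden_master  # assumed location of the Step 4 helper

def serialize_report(report: dict) -> bytes:
    # Old implementation used pickle.dumps(report); the golden master was captured
    # from its round-trip output before the swap, so any drift shows up below
    return orjson.dumps(report, option=orjson.OPT_SORT_KEYS)

def test_report_roundtrip_matches_golden_master():
    report = {"total": 59.98, "items": [{"id": "p1", "qty": 2}]}
    roundtripped = json.loads(serialize_report(report))
    assert_matches_golden_master("report_roundtrip", roundtripped)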

Retiring Golden Masters

Golden masters are scaffolding — retire them once you understand the code:

  1. Write real unit tests that cover the same scenarios
  2. Confirm the unit tests pass
  3. Delete the golden master files
  4. Delete the golden master test file
  5. The codebase ends up better off: specific assertions instead of large blob files

Signs a golden master should be retired:

  • You fully understand what the code does
  • You have unit tests for all important behaviors
  • The golden master file is never reviewed in code review
  • Updating the golden master is becoming automatic without review

Summary

Golden master testing gives you a safety net for refactoring legacy code:

  1. Capture current behavior with representative inputs
  2. Neutralize non-determinism (timestamps, random IDs)
  3. Store captured output as golden master files (committed to source control)
  4. Refactor — test fails if behavior changes
  5. Retire when you have real unit tests

The key discipline: golden masters detect changes, they don't prove correctness. Bugs in the original code are captured as-is. Fix bugs separately, after the refactoring is complete.
