Approval Testing Explained: When to Use It and How It Differs from Assertions

Approval testing (also called snapshot testing or golden master testing) captures the output of your code and stores it as an "approved" baseline file. Future test runs compare actual output against the baseline — if they differ, the test fails. It's the right tool when the expected output is large, complex, or hard to express as hand-written assertions.

Key Takeaways

Approval testing trades "what should it be" for "did it change". You're not asserting a specific value upfront. You're saying: "this output was correct once, and I want to know if it changes."

Use approval tests when the output is too complex to assert by hand. HTML fragments, serialized objects, API response bodies, CSV exports — writing assertions for these is tedious and brittle. Approval testing generates the "expected" file from real output.

The first run always fails (by design). Approval tests fail on first run because there's no baseline yet. You review the output, approve it, and commit the approved file. Subsequent runs compare against that file.

Approval files are part of your test suite. They live in source control alongside your tests. Reviewing a diff in approved files is a code review of your system's output.

Don't use approval tests for simple values. assertEquals(42, result) is clearer than an approval test for a single number. Use approval testing for complex, structured output where the alternative is 20+ assertion lines.

What Is Approval Testing?

Approval testing is a testing pattern where you capture your code's output and store it in a file (the "approved" or "golden" file). Future test runs compare the actual output against the stored file — a mismatch means the test fails.

The term was popularized by Llewellyn Falco, who built ApprovalTests across multiple languages. The pattern itself is older: it's related to golden master testing, snapshot testing (popularized by Jest), and characterization testing.

The key insight: instead of writing assertEquals("expected", actuallyProduced), you let the system generate the expected value from real output, store it, and detect future changes automatically.
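
In code, the difference looks roughly like this. A minimal sketch using the Python approvaltests package; render_invoice is a hypothetical function under test:

from approvaltests import verify

def test_invoice_traditional():
    # You hand-write the expected value before the first run
    assert render_invoice(order_id="o1") == "<html>...300 lines of markup...</html>"

def test_invoice_approval():
    # The first reviewed output becomes the expected value; verify()
    # compares against the stored .approved file on every later run
    verify(render_invoice(order_id="o1"))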

The Approval Testing Workflow

First run:

  1. Test runs your code
  2. Actual output is written to a "received" file
  3. No "approved" file exists yet — test fails
  4. You review the received file: is this output correct?
  5. If yes, rename/copy it to the "approved" file
  6. Commit the approved file
  7. Test now passes

Subsequent runs:

  1. Test runs your code
  2. Actual output is compared to the approved file
  3. No difference → test passes
  4. Difference → test fails with a diff

When behavior changes intentionally:

  1. Make your code change
  2. Test fails — shows you the diff
  3. Review the diff: is this the correct new output?
  4. Approve the new output
  5. Commit the updated approved file
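
With a library, the approve step is a command-line flag rather than a manual copy. A minimal sketch with the syrupy pytest plugin (the snapshot fixture and --snapshot-update flag are syrupy's; generate_user_report is hypothetical):

# test_report.py: syrupy provides the `snapshot` fixture to pytest
def test_user_report(snapshot):
    assert generate_user_report(user_id="u1") == snapshot

# First run: fails because no snapshot exists; create one with
#   pytest --snapshot-update
# Intentional change: review the failing diff, then re-run with
#   --snapshot-update to approve the new output.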

Approval Testing vs. Traditional Assertions

Scenario          | Traditional assertion                | Approval test
Simple value      | assertEquals(42, result) ← use this  | Overkill
Complex object    | 15+ assert statements                | Approve the serialized form ← use this
HTML output       | String comparison nightmare          | Approve the HTML ← use this
API response body | Parse and assert each field          | Approve the JSON ← use this
Large text output | Fragile substring assertions         | Approve the full text ← use this
Changing output   | N/A                                  | Wrong tool; output must be stable

When to Use Approval Testing

Good candidates:

  • Report generation — monthly PDFs, CSV exports, complex formatted output
  • Serialized objects — JSON/XML/YAML representations of complex domain objects
  • HTML rendering — template output, email bodies, generated HTML
  • Legacy code characterization — you don't know what "correct" is; you want to capture current behavior
  • Complex API responses — response bodies with many fields
  • Code generators — the generated code is the output being tested

Poor candidates:

  • Simple values: numbers, booleans, short strings
  • Random or time-dependent output
  • Output that changes meaningfully with every run
  • Scenarios where you know exactly what to assert

The Approved File Is a Test Artifact

A common mistake is treating approved files as throwaway artifacts. They're not — they're specifications.

When a team member changes code that affects approved output, the test diff shows exactly what changed in the system's output. That diff is a code review artifact: "I changed this, and the output changed from X to Y — is that intentional?"

This is why approved files belong in source control, committed alongside tests, reviewed in pull requests.

Approval Testing vs. Snapshot Testing

The terms are often used interchangeably. There are subtle differences by convention:

                  | Approval Testing       | Snapshot Testing
Primary language  | C#, Java, Ruby         | JavaScript, Python, Go
Prominent tools   | ApprovalTests, Verify  | Jest, Syrupy, Cupaloy
Approval workflow | Explicit approve step  | --updateSnapshot flag
File location     | Often alongside test   | Usually __snapshots__/
Diff review       | External diff tool     | CLI diff

Both implement the same fundamental pattern: capture → store → compare → fail on change.

Approval Testing vs. Characterization Testing (Golden Master)

Characterization testing (also called golden master testing) is an older term for the same idea applied specifically to legacy code. You run legacy code, capture its output, store that as the "golden master," and then refactor knowing that any behavioral change will be caught.

Golden master = approval testing applied to legacy code for the purpose of safe refactoring.
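
In practice, a characterization test is often a single approval test fed a broad sweep of inputs, so the approved file pins down as much behavior as possible. A sketch using approvaltests' verify as above; legacy_price is the hypothetical untested function you want to refactor safely:

from approvaltests import verify

def test_characterize_legacy_pricing():
    # Run many input combinations through the legacy code and capture
    # every result in one human-readable blob; approve it once, refactor,
    # and any behavioral change shows up as a diff.
    lines = [
        f"qty={qty} tier={tier} -> {legacy_price(qty, tier)}"
        for qty in (1, 10, 100)
        for tier in ("basic", "pro")
    ]
    verify("\n".join(lines))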

Tools by Language

Language   | Tool                       | Notes
C#         | ApprovalTests.Net, Verify  | Both mature; Verify is more modern
Python     | syrupy, pytest-snapshot    | syrupy is the most popular
Go         | cupaloy, go-snaps          | cupaloy is simpler; go-snaps has more features
JavaScript | Jest snapshots             | Built into Jest
Java       | ApprovalTests              | Same library as C#
Ruby       | Approvals gem              | Less popular than other tools

What Makes a Good Approval Test?

The approved output should be human-readable. If the approved file is a binary blob, reviewing diffs in code review is impossible. Prefer JSON, HTML, plain text, or YAML over binary formats.

The output must be deterministic. Approval tests fail on non-deterministic output: timestamps, random IDs, memory addresses. Stub these out in tests. Some tools (like Verify) have built-in "scrubbing" to replace common non-deterministic values.
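
If a tool's built-in scrubbing doesn't cover your case, a scrubber can be a simple regex pass over the serialized output before comparison. A sketch; the field names are assumptions about your own output, and generate_user_report is the hypothetical function from the workflow example:

import json
import re
from approvaltests import verify

def scrub(text):
    # Replace values that legitimately differ between runs with stable tokens
    text = re.sub(r'"generated_at": "[^"]+"', '"generated_at": "<scrubbed>"', text)
    text = re.sub(r'"session_id": "[^"]+"', '"session_id": "<scrubbed>"', text)
    return text

def test_user_report_scrubbed():
    report = generate_user_report(user_id="u1")
    verify(scrub(json.dumps(report, indent=2, sort_keys=True)))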

Test one thing per approved file. An approved file that tests "all of report rendering" is hard to understand when it fails. Separate approved files for each meaningful scenario.

Keep approved files close to tests. Approved files should live in the same directory as the test file or in a clearly named testdata/ folder. Never put all approved files in one project-level directory.
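
One common layout, assuming received files are written next to the approved ones (as the example in the next section does): commit the approved files, and gitignore the received ones so failed runs don't pollute the repository:

tests/
  test_report.py
  testdata/
    user_report.approved.json    # committed, reviewed in PRs
    user_report.received.json    # gitignored; rewritten on every run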

Practical Example Without a Library

Before reaching for a library, understand what approval testing does:

# Minimal approval testing without a library
import json
import os

def approve(test_name, actual_output):
    approved_path = f"testdata/{test_name}.approved.json"
    received_path = f"testdata/{test_name}.received.json"

    # Always write what we actually got, so there is something to review;
    # sort_keys keeps the file deterministic regardless of dict ordering
    os.makedirs("testdata", exist_ok=True)
    with open(received_path, 'w') as f:
        json.dump(actual_output, f, indent=2, sort_keys=True)

    if not os.path.exists(approved_path):
        raise AssertionError(
            f"No approved file. Review {received_path} and copy to {approved_path}"
        )

    with open(approved_path) as f:
        approved = json.load(f)

    if actual_output != approved:
        raise AssertionError(
            f"Output changed. Diff {received_path} against {approved_path}"
        )

def test_user_report():
    result = generate_user_report(user_id="u1")
    approve("user_report", result)

This is exactly what ApprovalTests, Syrupy, and Cupaloy do — with better diff tooling, reporter plugins, and framework integration.

Common Pitfalls

Non-deterministic output. Timestamps, session IDs, and random values cause tests to fail on every run. Fix: use fixed dates in tests, mock random sources, scrub non-deterministic fields before approving.
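
For time-dependent output, one way to get a fixed date is to inject the clock instead of calling it inline. A sketch; build_report and its now parameter are hypothetical:

from datetime import datetime

def build_report(rows, now=datetime.now):
    # Production code passes nothing; tests pass a frozen clock
    return {"generated_at": now().isoformat(), "rows": rows}

def test_report_is_deterministic():
    frozen = lambda: datetime(2024, 1, 1, 12, 0)
    result = build_report(["a", "b"], now=frozen)
    assert result["generated_at"] == "2024-01-01T12:00:00"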

Too many approvals at once. Approving 200 files at once means you didn't review any of them. Approve in small batches, reviewing each diff.

Committing approved files that contain secrets. Approved JSON files might capture authentication tokens or API keys. Sanitize outputs before approving.

Ignoring approved file diffs in code review. If reviewers skip approved file diffs, the entire value of the pattern is lost. Make approved file diffs a standard part of your review checklist.

Summary

Approval testing answers the question: "did the output of my code change?" It's the right tool when:

  • The output is too complex for hand-written assertions
  • You're characterizing legacy behavior before refactoring
  • You want a living specification of your system's output

It's the wrong tool for:

  • Simple values you can assert directly
  • Non-deterministic output

The approved file is a first-class artifact — commit it, review it, and treat changes to it as carefully as changes to the code that generated it.
