How to Test Jupyter Notebooks: nbmake, testbook, and nbval
Jupyter notebooks are the primary artifact for data science work—model training, data exploration, report generation. Yet most teams treat them as write-only documents. Notebooks break silently when dependencies update, when data shapes change, or when code paths that ran during development are skipped during the final run. Testing notebooks is not complicated; it just requires knowing which tool fits which use case.
This guide covers the three main tools for notebook testing and when to reach for each.
The Problem with Untested Notebooks
A notebook that passes a visual review can still fail in production because:
- Cell execution order: Notebooks developed interactively often contain cells that depend on state created by cells below them. They work in development because variables persist in the kernel, but fail when run top-to-bottom in CI.
- Hidden state: Variables defined in deleted cells still exist in the kernel. The notebook runs fine until the kernel is restarted.
- Output drift: A model evaluation notebook records accuracy of 94%. Six months later, the same notebook runs and produces 87%. Nobody notices because no test checks the output.
- Broken imports: A library update changes an API. The notebook fails on the second cell, but the error is buried in CI logs.
The Three Tools
nbmake: Execution Testing
nbmake is a pytest plugin that discovers and executes notebooks. It verifies that every cell executes without raising an exception. This is the lowest bar—and the most important one to meet first.
```bash
pip install pytest nbmake
pytest --nbmake notebooks/
```

nbmake treats each notebook as a test case: a notebook passes if it executes from top to bottom without error, and fails if any cell raises an exception.
Use nbmake when:
- You want to verify notebooks are not broken by dependency updates
- You want to enforce top-to-bottom executability
- You're starting from zero notebook testing
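Because nbmake runs on top of pytest, large suites can be parallelized with pytest-xdist, which nbmake's documentation supports directly; a minimal sketch:

```bash
pip install pytest-xdist
# Run notebooks in parallel, one pytest worker per CPU core.
pytest --nbmake -n auto notebooks/
```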
testbook: Unit Testing
testbook lets you write pytest tests that inject code into specific cells of a running notebook kernel, call functions defined in the notebook, and assert the results. It's notebook testing with the same granularity as function-level unit tests.
```bash
pip install testbook
```

Use testbook when:
- The notebook contains functions you want to test with different inputs
- You want to test error handling without running the full notebook
- You need to mock external dependencies (databases, APIs)
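A minimal sketch of a testbook test, assuming a notebook `notebooks/analysis.ipynb` that defines a function `normalize` (both names are hypothetical):

```python
# tests/test_analysis_notebook.py
from testbook import testbook

# execute=True runs the whole notebook once before the test body executes.
@testbook("notebooks/analysis.ipynb", execute=True)
def test_normalize(tb):
    # Get a reference to the function defined inside the notebook kernel.
    normalize = tb.ref("normalize")
    # Arguments and return values cross the kernel boundary,
    # so they must be JSON-serializable.
    assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
```

tb.inject() can similarly run arbitrary code in the kernel before your assertions, which is one way to patch out database or API clients.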
nbval: Output Validation
nbval reruns a notebook and compares the output of each cell to the recorded output in the .ipynb file. If the output changes, the test fails.
```bash
pip install pytest nbval
pytest --nbval notebooks/data-validation.ipynb
```

Use nbval when:
- The notebook is a report or data validation step where output consistency matters
- You want to detect regressions in model output or computed values
- You have deterministic computations that should always produce the same result
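Outputs that contain inherently volatile fragments (timestamps, object addresses) can still be validated by normalizing them with a sanitization file, passed to nbval via --sanitize-with. A minimal sketch, where the regex is an assumption about what your outputs contain:

```ini
# sanitize.cfg -- use with: pytest --nbval notebooks/report.ipynb --sanitize-with sanitize.cfg
[regex1]
regex: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
replace: TIMESTAMP
```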
CI Integration
All three tools work with standard pytest and integrate into CI the same way:
```yaml
# .github/workflows/notebook-tests.yml
name: Notebook Tests
on:
  push:
    paths:
      - 'notebooks/**'
      - 'src/**'
      - 'requirements.txt'
jobs:
  test-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt pytest nbmake testbook nbval
      - name: Test notebook execution
        run: pytest --nbmake notebooks/
      - name: Validate notebook outputs
        run: pytest --nbval-lax notebooks/reports/
```

The --nbval-lax flag still fails the run if any cell raises an error, but it only compares outputs for cells explicitly marked with a `# NBVAL_CHECK_OUTPUT` comment, so cells with volatile output don't produce spurious failures.
Choosing the Right Tool
| Goal | Tool |
|---|---|
| Notebooks execute without error | nbmake |
| Functions produce correct output | testbook |
| Cell outputs haven't changed | nbval |
| All of the above | Use all three |
In practice, most teams start with nbmake (cheapest to set up), add nbval to report notebooks where output consistency is business-critical, and use testbook for notebooks that contain non-trivial data processing logic.
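A layered setup then looks something like this (the paths are illustrative):

```bash
pytest --nbmake notebooks/               # every notebook must execute cleanly
pytest --nbval-lax notebooks/reports/    # report outputs must not drift
pytest tests/notebooks/                  # testbook unit tests for notebook functions
```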
Common Pitfalls
Hardcoded file paths: Notebooks often contain paths like /home/username/data/. In CI, these paths don't exist. Note that `__file__` is not defined inside a notebook kernel, so `pathlib.Path(__file__).parent` only works in imported modules; for notebooks, prefer environment variables or paths relative to the working directory.
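One pattern that works in both environments is reading the data location from an environment variable with a local fallback; a sketch (variable and file names are hypothetical):

```python
import os
from pathlib import Path

# CI sets DATA_DIR explicitly; locally we fall back to ./data
# relative to the directory the notebook is launched from.
DATA_DIR = Path(os.environ.get("DATA_DIR", "data"))
sales = DATA_DIR / "sales.csv"
```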
GPU dependencies: Notebooks developed on GPU machines fail in CI because the GPU is not available. Mock GPU calls or run notebook tests on GPU runners if the computation is essential to test.
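For PyTorch notebooks (an assumption; other frameworks have equivalents), a device fallback in the first cell keeps the notebook executable on CPU-only runners:

```python
import torch

# Use the GPU when present, otherwise fall back to CPU so CI can still run.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(8, 16, device=device)  # all later tensors/models should use `device`
```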
Random seed inconsistency: Machine learning notebooks that use random operations need a fixed seed to produce deterministic output for nbval. Always call np.random.seed() and random.seed() at the top of notebooks that will be output-validated.
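In practice that means a cell like this at the very top of the notebook:

```python
import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
# If the notebook uses a framework with its own RNG
# (e.g. torch.manual_seed), seed that here as well.
```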
Timeouts: Long-running notebooks (e.g. model training) should not be part of the CI notebook test suite. Extract the training logic to a separate Python module and test that instead, keeping only the notebook's data loading and evaluation cells under test.
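Slow notebooks can be excluded by path, and a per-cell timeout catches anything that slips through; a sketch assuming a hypothetical notebooks/training/ directory:

```bash
# Cap each cell at 5 minutes and skip the training notebooks entirely.
pytest --nbmake --nbmake-timeout=300 notebooks/ --ignore=notebooks/training
```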
Summary
Three tools, three jobs: nbmake catches broken notebooks, testbook enables unit testing inside notebooks, and nbval guards against output drift. Start with nbmake for execution testing in CI, and add the other tools where the notebook's role in your pipeline demands higher confidence.