Testing Notebooks with nbmake and pytest
nbmake is a pytest plugin that treats Jupyter notebooks as test cases. Install it, point pytest at a directory of notebooks, and every notebook that executes without error passes. Every notebook that raises an exception fails. It is the simplest possible notebook testing setup, and for most teams it is exactly the right starting point.
Installation
pip install pytest nbmake

nbmake works as a pytest plugin; no configuration beyond installation is required.
Basic Usage
Discover and run all notebooks in a directory:
pytest --nbmake notebooks/

Run a specific notebook:

pytest --nbmake notebooks/data-processing.ipynb

Run notebooks matching a pattern:

pytest --nbmake notebooks/ -k "not slow"

pytest's full filtering, parallelism, and reporting work with nbmake. The notebook file path appears in test output like a regular test:
notebooks/data-processing.ipynb PASSED
notebooks/model-evaluation.ipynb PASSED
notebooks/broken-import.ipynb FAILED
FAILURES
========
notebooks/broken-import.ipynb
Cell 3:
ModuleNotFoundError: No module named 'sklearn'

Configuration via pytest.ini
# pytest.ini
[pytest]
addopts = --nbmake
testpaths = notebooks

With this configuration, pytest alone discovers and runs notebooks without additional flags.
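If the project configures pytest through pyproject.toml instead, the equivalent settings go under the standard [tool.pytest.ini_options] table (a minimal sketch):

# pyproject.toml
[tool.pytest.ini_options]
addopts = "--nbmake"
testpaths = ["notebooks"]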
Cell Execution Timeout
Long-running notebooks can stall CI indefinitely if a cell hangs. Set a per-cell timeout:
pytest --nbmake --nbmake-timeout=300 notebooks/

This sets a 5-minute timeout per cell. Cells that exceed the timeout fail the notebook with a timeout error. Set this to the longest reasonable runtime for your most expensive cell.
Per-notebook timeout via metadata: nbmake also reads a timeout from the notebook's own top-level metadata, which overrides the command-line value for that file; alternatively, vary --nbmake-timeout per invocation.
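A sketch of that metadata entry, following the execution.timeout convention described in nbmake's documentation (add it through Jupyter's metadata editor or by editing the raw .ipynb JSON; shown abridged):

{
  "metadata": {
    "execution": {
      "timeout": 600
    }
  }
}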
Parametrized Notebook Testing
nbmake itself executes each notebook exactly as committed; to run a notebook under multiple parameter sets, pair pytest with papermill, which injects parameters into notebooks through its cell tagging convention.
Tag a cell with parameters in Jupyter by selecting the cell, clicking the gear icon in the cell toolbar, and adding the parameters tag:
# This cell has the "parameters" tag in the notebook
data_path = "data/sample.csv"
n_samples = 1000
test_size = 0.2
random_seed = 42

To run the notebook with different parameter sets, drive papermill from pytest; papermill rewrites the tagged cell before execution:
pip install papermill

A pytest fixture can wrap the papermill call so that tests request a notebook run with specific parameters:
# conftest.py
import os
import tempfile

import papermill as pm
import pytest

@pytest.fixture
def run_notebook_with_params():
    def _run(notebook_path, parameters, output_dir=None):
        # Write the executed copy to a throwaway directory so runs don't collide
        output_dir = output_dir or tempfile.mkdtemp()
        output_path = os.path.join(output_dir, os.path.basename(notebook_path))
        pm.execute_notebook(
            notebook_path,
            output_path,
            parameters=parameters,
        )
        return output_path

    return _run

# tests/test_analysis_notebook.py
import pytest

@pytest.mark.parametrize("n_samples,expected_rows", [
    (100, 80),      # 80% train split
    (1000, 800),
    (5000, 4000),
])
def test_analysis_notebook_train_split(run_notebook_with_params, n_samples, expected_rows):
    run_notebook_with_params(
        "notebooks/data-analysis.ipynb",
        {"n_samples": n_samples, "random_seed": 42},
    )
    # If execution completes without error, the test passes.
    # Asserting on expected_rows requires nbval or custom output parsing; see the sketch below.
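For custom parsing, one lightweight option is to read the executed notebook back with nbformat and scrape its cell outputs. A minimal sketch, assuming the notebook prints a marker line such as "train rows: 800" (the printed text and assertion are hypothetical):

# tests/test_analysis_outputs.py
import nbformat

def test_train_split_output(run_notebook_with_params):
    output_path = run_notebook_with_params(
        "notebooks/data-analysis.ipynb",
        {"n_samples": 1000, "random_seed": 42},
    )
    nb = nbformat.read(output_path, as_version=4)
    # Concatenate the stream output of every code cell
    text = "".join(
        out.get("text", "")
        for cell in nb.cells
        if cell.cell_type == "code"
        for out in cell.get("outputs", [])
    )
    assert "train rows: 800" in text  # hypothetical printed marker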
Ignoring Specific Cells
Some cells should not be tested: cells that display documentation, fetch data from external sources, or generate visualizations that cannot run headlessly. Skip cells using tags.
Tag a cell with skip-execution in Jupyter's cell toolbar. nbmake runs notebooks through nbclient, which skips cells carrying this tag, so no nbmake-specific configuration is needed.
The tag lives in the cell's metadata rather than in its source, so the code itself needs no marker:

# This cell is tagged "skip-execution" and is excluded from test execution
import webbrowser
webbrowser.open("http://localhost:8050/dashboard")  # interactive; pointless in headless CI
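For reference, this is roughly how the tag appears in the raw .ipynb JSON (abridged):

{
  "cell_type": "code",
  "metadata": {
    "tags": ["skip-execution"]
  },
  "source": ["import webbrowser\n", "..."]
}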
Handling Notebooks That Require External Data
Notebooks that load from S3, databases, or large local files fail in CI without data. Two strategies:
Strategy 1: Fixture data. Add a small sample dataset to the repository for CI, and update the notebook to check an environment variable for the fixture path:
import os
import pandas as pd
data_path = os.environ.get('TEST_DATA_PATH', 's3://my-bucket/full-data.parquet')
df = pd.read_parquet(data_path)

In CI, set TEST_DATA_PATH=tests/fixtures/sample.parquet.
Strategy 2: Mocking. Override external calls by monkeypatching inside the notebook's kernel. This is harder to maintain from a plain conftest and is better done with testbook, which can patch objects inside the running kernel.
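A minimal sketch of that approach with testbook (pip install testbook), assuming a cell tagged fetch-data that calls requests.get; both the tag and the notebook path here are hypothetical:

# tests/test_fetch_mocked.py
from testbook import testbook

@testbook("notebooks/data-processing.ipynb", execute=False)
def test_fetch_cell_is_mockable(tb):
    # Patch requests.get inside the notebook kernel before running the cell
    with tb.patch("requests.get") as mock_get:
        tb.execute_cell("fetch-data")  # run only the cell tagged fetch-data
        mock_get.assert_called_once()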
Parallel Notebook Execution
pip install pytest-xdist
pytest --nbmake -n auto notebooks/

-n auto starts one worker per CPU core. Notebooks execute in parallel, significantly reducing CI time for large notebook suites. Ensure notebooks don't share global state or write to the same output paths.
GitHub Actions CI Configuration
# .github/workflows/notebook-tests.yml
name: Notebook Tests

on:
  push:
    branches: [main]
  pull_request:
    paths:
      - 'notebooks/**'
      - 'src/**'
      - 'requirements*.txt'

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest nbmake pytest-xdist
      - name: Run notebook tests
        run: |
          pytest --nbmake --nbmake-timeout=120 -n auto notebooks/ \
            --ignore=notebooks/slow/ \
            --junitxml=notebook-report.xml \
            -v
        env:
          TEST_DATA_PATH: tests/fixtures/sample.parquet
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: notebook-test-report-${{ matrix.python-version }}
          path: notebook-report.xml

Notebook Organization for Testability
Structure your notebook directory to make testing decisions clear:
notebooks/
├── analysis/ # Run in every PR
│ ├── data-validation.ipynb
│ └── feature-engineering.ipynb
├── reports/ # Run on schedule (weekly), output-validated
│ ├── monthly-summary.ipynb
│ └── model-performance.ipynb
├── slow/ # Run before release only
│ ├── model-training.ipynb
│ └── full-backtest.ipynb
└── exploratory/ # Not in CI (dev exploration)
    └── scratch.ipynb

Scope CI steps to directories so each tier runs on its own cadence:

# Test analysis notebooks on every PR
- run: pytest --nbmake notebooks/analysis/
# Test reports on schedule
- run: pytest --nbval-lax notebooks/reports/  # output validation uses the separate nbval plugin
  if: github.event_name == 'schedule'
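Note that github.event_name == 'schedule' only ever matches if the workflow also declares a schedule trigger; for the weekly cadence noted above, something like:

on:
  schedule:
    - cron: '0 6 * * 1'  # Mondays at 06:00 UTC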
Reporting Failures
When a notebook fails in CI, the error output shows the failing cell and its traceback:
FAILED notebooks/feature-engineering.ipynb
Cell 5 raised an exception:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[5], line 3
1 df_features = df.copy()
2 df_features['ratio'] = (
----> 3 df_features['numerator'] / df_features['denominator']
4 )
KeyError: 'denominator'

This is enough to diagnose the issue without running the notebook locally: the notebook developer sees exactly which cell failed and what the exception was.
Summary
nbmake adds notebook execution testing with minimal setup: install the plugin, add --nbmake to your pytest invocation, and every notebook in the discovered paths becomes a test case. Configure timeouts to prevent hangs, use -n auto for parallel execution, and organize notebooks into directories based on how frequently you want to test them. The cost is low; the benefit—catching broken notebooks before users run them—is immediate.