Testing Notebooks with nbmake and pytest
nbmake is a pytest plugin that treats Jupyter notebooks as test cases. Install it, point pytest at a directory of notebooks, and every notebook that executes without error passes. Every notebook that raises an exception fails. It is the simplest possible notebook testing setup, and for most teams it is exactly the right starting point.
Installation
pip install pytest nbmake

nbmake works as a pytest plugin; no configuration beyond installation is required.
Basic Usage
Discover and run all notebooks in a directory:
pytest --nbmake notebooks/

Run a specific notebook:

pytest --nbmake notebooks/data-processing.ipynb

Run notebooks matching a pattern:

pytest --nbmake notebooks/ -k "not slow"

pytest's full filtering, parallelism, and reporting work with nbmake. The notebook file path appears in test output like a regular test:
notebooks/data-processing.ipynb PASSED
notebooks/model-evaluation.ipynb PASSED
notebooks/broken-import.ipynb FAILED
FAILURES
========
notebooks/broken-import.ipynb
Cell 3:
ModuleNotFoundError: No module named 'sklearn'

Configuration via pytest.ini
# pytest.ini
[pytest]
addopts = --nbmake
testpaths = notebooks

With this configuration, pytest alone discovers and runs notebooks without additional flags.
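If the project configures pytest through pyproject.toml instead, the equivalent settings go under the standard [tool.pytest.ini_options] table (a minimal sketch):

# pyproject.toml
[tool.pytest.ini_options]
addopts = "--nbmake"
testpaths = ["notebooks"]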
Cell Execution Timeout
Long-running notebooks can stall CI indefinitely if a cell hangs. Set a per-cell timeout:
pytest --nbmake --nbmake-timeout=300 notebooks/

This sets a 5-minute timeout per cell. Cells that exceed the timeout fail the notebook with a timeout error. Set this to the longest reasonable runtime for your most expensive cell.
Per-notebook timeout via metadata: nbmake also reads a timeout from the notebook's own top-level metadata, which overrides the command-line value for that file; alternatively, vary --nbmake-timeout per invocation.
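A sketch of that metadata entry, following the execution.timeout convention described in nbmake's documentation (add it through Jupyter's metadata editor or by editing the raw .ipynb JSON; shown abridged):

{
  "metadata": {
    "execution": {
      "timeout": 600
    }
  }
}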
Parametrized Notebook Testing
nbmake itself executes each notebook exactly as committed; to run a notebook under multiple parameter sets, pair pytest with papermill, which injects parameters into notebooks through its cell tagging convention.
Tag a cell with parameters in Jupyter by selecting the cell, clicking the gear icon in the cell toolbar, and adding the parameters tag:
# This cell has the "parameters" tag in the notebook
data_path = "data/sample.csv"
n_samples = 1000
test_size = 0.2
random_seed = 42

To run the notebook with different parameter sets, drive papermill from pytest; papermill rewrites the tagged cell before execution:
pip install papermill

A pytest fixture can wrap the papermill call so that tests request a notebook run with specific parameters:
# conftest.py
import os
import tempfile

import papermill as pm
import pytest

@pytest.fixture
def run_notebook_with_params():
    def _run(notebook_path, parameters, output_dir=None):
        # Write the executed copy to a throwaway directory so runs don't collide
        output_dir = output_dir or tempfile.mkdtemp()
        output_path = os.path.join(output_dir, os.path.basename(notebook_path))
        pm.execute_notebook(
            notebook_path,
            output_path,
            parameters=parameters,
        )
        return output_path

    return _run

# tests/test_analysis_notebook.py
import pytest

@pytest.mark.parametrize("n_samples,expected_rows", [
    (100, 80),      # 80% train split
    (1000, 800),
    (5000, 4000),
])
def test_analysis_notebook_train_split(run_notebook_with_params, n_samples, expected_rows):
    run_notebook_with_params(
        "notebooks/data-analysis.ipynb",
        {"n_samples": n_samples, "random_seed": 42},
    )
    # If execution completes without error, the test passes.
    # Asserting on expected_rows requires nbval or custom output parsing; see the sketch below.
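For custom parsing, one lightweight option is to read the executed notebook back with nbformat and scrape its cell outputs. A minimal sketch, assuming the notebook prints a marker line such as "train rows: 800" (the printed text and assertion are hypothetical):

# tests/test_analysis_outputs.py
import nbformat

def test_train_split_output(run_notebook_with_params):
    output_path = run_notebook_with_params(
        "notebooks/data-analysis.ipynb",
        {"n_samples": 1000, "random_seed": 42},
    )
    nb = nbformat.read(output_path, as_version=4)
    # Concatenate the stream output of every code cell
    text = "".join(
        out.get("text", "")
        for cell in nb.cells
        if cell.cell_type == "code"
        for out in cell.get("outputs", [])
    )
    assert "train rows: 800" in text  # hypothetical printed marker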
Ignoring Specific Cells
Some cells should not be tested: cells that display documentation, fetch data from external sources, or generate visualizations that cannot run headlessly. Skip cells using tags.
Tag a cell with skip-execution in Jupyter's cell toolbar. nbmake runs notebooks through nbclient, which skips cells carrying this tag, so no nbmake-specific configuration is needed.
The tag lives in the cell's metadata rather than in its source, so the code itself needs no marker:

# This cell is tagged "skip-execution" and is excluded from test execution
import webbrowser
webbrowser.open("http://localhost:8050/dashboard")  # interactive; pointless in headless CI
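For reference, this is roughly how the tag appears in the raw .ipynb JSON (abridged):

{
  "cell_type": "code",
  "metadata": {
    "tags": ["skip-execution"]
  },
  "source": ["import webbrowser\n", "..."]
}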
Handling Notebooks That Require External Data
Notebooks that load from S3, databases, or large local files fail in CI without data. Two strategies:
Strategy 1: Fixture data. Add a small sample dataset to the repository for CI, and update the notebook to check an environment variable for the fixture path:
import os
import pandas as pd
data_path = os.environ.get('TEST_DATA_PATH', 's3://my-bucket/full-data.parquet')
df = pd.read_parquet(data_path)

In CI, set TEST_DATA_PATH=tests/fixtures/sample.parquet.
Strategy 2: Mocking. Override external calls by monkeypatching inside the notebook's kernel. This is harder to maintain from a plain conftest and is better done with testbook, which can patch objects inside the running kernel.
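A minimal sketch of that approach with testbook (pip install testbook), assuming a cell tagged fetch-data that calls requests.get; both the tag and the notebook path here are hypothetical:

# tests/test_fetch_mocked.py
from testbook import testbook

@testbook("notebooks/data-processing.ipynb", execute=False)
def test_fetch_cell_is_mockable(tb):
    # Patch requests.get inside the notebook kernel before running the cell
    with tb.patch("requests.get") as mock_get:
        tb.execute_cell("fetch-data")  # run only the cell tagged fetch-data
        mock_get.assert_called_once()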
Parallel Notebook Execution
pip install pytest-xdist
pytest --nbmake -n auto notebooks/

-n auto starts one worker per CPU core. Notebooks execute in parallel, significantly reducing CI time for large notebook suites. Ensure notebooks don't share global state or write to the same output paths.
GitHub Actions CI Configuration
# .github/workflows/notebook-tests.yml
name: Notebook Tests

on:
  push:
    branches: [main]
  pull_request:
    paths:
      - 'notebooks/**'
      - 'src/**'
      - 'requirements*.txt'

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest nbmake pytest-xdist
      - name: Run notebook tests
        run: |
          pytest --nbmake --nbmake-timeout=120 -n auto notebooks/ \
            --ignore=notebooks/slow/ \
            --junitxml=notebook-report.xml \
            -v
        env:
          TEST_DATA_PATH: tests/fixtures/sample.parquet
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: notebook-test-report-${{ matrix.python-version }}
          path: notebook-report.xml

Notebook Organization for Testability
Structure your notebook directory to make testing decisions clear:
notebooks/
├── analysis/ # Run in every PR
│ ├── data-validation.ipynb
│ └── feature-engineering.ipynb
├── reports/ # Run on schedule (weekly), output-validated
│ ├── monthly-summary.ipynb
│ └── model-performance.ipynb
├── slow/ # Run before release only
│ ├── model-training.ipynb
│ └── full-backtest.ipynb
└── exploratory/ # Not in CI (dev exploration)
    └── scratch.ipynb

Scope CI steps to directories so each tier runs on its own cadence:

# Test analysis notebooks on every PR
- run: pytest --nbmake notebooks/analysis/
# Test reports on schedule
- run: pytest --nbval-lax notebooks/reports/  # output validation uses the separate nbval plugin
  if: github.event_name == 'schedule'
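Note that github.event_name == 'schedule' only ever matches if the workflow also declares a schedule trigger; for the weekly cadence noted above, something like:

on:
  schedule:
    - cron: '0 6 * * 1'  # Mondays at 06:00 UTC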
Reporting Failures
When a notebook fails in CI, the error output shows the failing cell and its traceback:
FAILED notebooks/feature-engineering.ipynb
Cell 5 raised an exception:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[5], line 3
1 df_features = df.copy()
2 df_features['ratio'] = (
----> 3 df_features['numerator'] / df_features['denominator']
4 )
KeyError: 'denominator'

This is enough to diagnose the issue without running the notebook locally: the notebook developer sees exactly which cell failed and what the exception was.
Summary
nbmake adds notebook execution testing with minimal setup: install the plugin, add --nbmake to your pytest invocation, and every notebook in the discovered paths becomes a test case. Configure timeouts to prevent hangs, use -n auto for parallel execution, and organize notebooks into directories based on how frequently you want to test them. The cost is low; the benefit—catching broken notebooks before users run them—is immediate.