Parameterized Notebook Testing with Papermill
Papermill is a tool for parameterizing and executing Jupyter notebooks programmatically. It was designed for production notebook workflows—running the same analysis against different datasets, date ranges, or configuration sets—but it also makes notebooks significantly more testable. By injecting parameters and capturing outputs, you can test a notebook against many input combinations without maintaining separate notebook files for each scenario.
Installing Papermill
pip install papermill

Papermill requires a Jupyter kernel. For Python notebooks:

pip install ipykernel
python -m ipykernel install --user

How Parameterization Works
Papermill uses cell tags to identify the parameters cell. In a Jupyter notebook, tag a cell with parameters:
- Open the notebook in JupyterLab or Jupyter Notebook
- Click the cell
- Open the cell properties panel (wrench icon or Cell Toolbar → Edit Metadata)
- Add the tag parameters
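The tag can also be set without opening the UI. A notebook file is plain JSON, so a small script can add it directly; this is a sketch (the helper name and the assumption that you know which cell index holds your defaults are mine):

```python
import json

def tag_parameters_cell(notebook_path, cell_index=0):
    """Add the "parameters" tag to one cell of a notebook file.

    Assumes cell_index points at the cell holding your default values.
    """
    with open(notebook_path) as f:
        nb = json.load(f)
    tags = nb["cells"][cell_index].setdefault("metadata", {}).setdefault("tags", [])
    if "parameters" not in tags:
        tags.append("parameters")
    with open(notebook_path, "w") as f:
        json.dump(nb, f, indent=1)
```

This is handy in a repository setup script that normalizes all notebooks before tests run.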
The tagged cell looks like this:
# In the notebook — cell tagged "parameters"
data_path = "data/default.csv"
n_estimators = 100
test_size = 0.2
random_seed = 42
output_path = "outputs/results.json"

When papermill runs the notebook with different parameter values, it inserts a new cell immediately after the parameters cell with the injected values, overriding the defaults.
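The injected cell is tagged injected-parameters and contains plain assignments, so every downstream cell sees the new values. For a run that overrides three of the defaults, it looks like this (the values mirror the execution example in the next section):

```python
# In the output notebook, cell tagged "injected-parameters",
# inserted by papermill directly after the "parameters" cell
data_path = "data/test-set.csv"
n_estimators = 200
random_seed = 0
```

Parameters not overridden (here, test_size and output_path) keep the defaults from the original parameters cell.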
Basic Parameterized Execution
import papermill as pm

pm.execute_notebook(
    input_path='notebooks/model-evaluation.ipynb',
    output_path='outputs/model-evaluation-run1.ipynb',
    parameters={
        'data_path': 'data/test-set.csv',
        'n_estimators': 200,
        'random_seed': 0
    }
)

The output notebook contains the executed cells with their outputs, including the injected parameter cell. This output notebook is the test artifact—it captures the full execution trace.
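Papermill also records run metadata (start and end times, duration, whether an exception occurred, and the injected parameters) under the output notebook's metadata.papermill block. A stdlib-only sketch for pulling that summary out (the helper name is mine):

```python
import json

def execution_summary(notebook_path):
    """Return papermill's run metadata from an executed notebook.

    Keys such as "duration", "exception", and "parameters" are written
    by papermill into the output notebook; missing keys come back None.
    """
    with open(notebook_path) as f:
        meta = json.load(f).get("metadata", {}).get("papermill", {})
    return {key: meta.get(key) for key in ("duration", "exception", "parameters")}
```

This is useful for asserting in a test that a notebook stayed under a runtime budget or that a specific parameter set was actually injected.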
Batch Execution Across Parameter Sets
Test a notebook against multiple parameter combinations using a loop:
# tests/test_model_notebook.py
import papermill as pm
import pytest
import json

PARAMETER_SETS = [
    {'n_estimators': 50, 'max_depth': 3, 'label': 'shallow-forest'},
    {'n_estimators': 100, 'max_depth': 5, 'label': 'medium-forest'},
    {'n_estimators': 200, 'max_depth': None, 'label': 'full-forest'},
]

@pytest.mark.parametrize("params", PARAMETER_SETS, ids=[p['label'] for p in PARAMETER_SETS])
def test_model_notebook(params, tmp_path):
    output_path = tmp_path / f"output-{params['label']}.ipynb"
    pm.execute_notebook(
        input_path='notebooks/random-forest-training.ipynb',
        output_path=str(output_path),
        parameters={
            'n_estimators': params['n_estimators'],
            'max_depth': params['max_depth'],
            'data_path': 'tests/fixtures/sample-data.csv',
            'output_metrics_path': str(tmp_path / f"metrics-{params['label']}.json"),
        }
    )

    # Read the metrics written by the notebook
    metrics_path = tmp_path / f"metrics-{params['label']}.json"
    assert metrics_path.exists(), f"Notebook did not write metrics for {params['label']}"
    with open(metrics_path) as f:
        metrics = json.load(f)

    assert metrics['accuracy'] >= 0.7, \
        f"Accuracy {metrics['accuracy']:.3f} below threshold for {params['label']}"
    assert 'f1_score' in metrics
    assert 'precision' in metrics
    assert 'recall' in metrics

Output Validation
For notebooks that produce structured outputs (JSON metrics files, CSV summaries, model files), validate the output after execution:
# notebooks/data-validation.ipynb writes a validation report
def test_data_validation_notebook(tmp_path):
    report_path = tmp_path / 'validation-report.json'
    pm.execute_notebook(
        'notebooks/data-validation.ipynb',
        str(tmp_path / 'output.ipynb'),
        parameters={
            'data_path': 'tests/fixtures/complete-data.csv',
            'report_path': str(report_path),
        }
    )

    with open(report_path) as f:
        report = json.load(f)

    assert report['total_rows'] > 0
    assert report['null_percentage'] < 0.05  # Less than 5% nulls
    assert report['duplicate_rows'] == 0
    assert all(col in report['columns'] for col in ['user_id', 'timestamp', 'event'])

Reading Cell Outputs from Executed Notebooks
Papermill's output notebooks contain the results of each cell execution. Use nbformat to read cell outputs programmatically:
import nbformat

def get_cell_output(notebook_path, cell_index):
    """Extract text output from a specific cell in an executed notebook."""
    nb = nbformat.read(notebook_path, as_version=4)
    cell = nb.cells[cell_index]
    outputs = []
    for output in cell.get('outputs', []):
        if output.get('output_type') == 'stream':
            outputs.append(''.join(output.get('text', [])))
        elif output.get('output_type') in ('execute_result', 'display_data'):
            if 'text/plain' in output.get('data', {}):
                outputs.append(output['data']['text/plain'])
    return '\n'.join(outputs)

def test_model_output_notebook(tmp_path):
    output_nb = tmp_path / 'output.ipynb'
    pm.execute_notebook(
        'notebooks/quick-check.ipynb',
        str(output_nb),
        parameters={'sample_size': 100}
    )

    # Cell 7 prints the model summary
    summary_output = get_cell_output(str(output_nb), 7)
    assert 'accuracy' in summary_output.lower()
    assert 'RandomForest' in summary_output

Notebook-Level Outputs via Scrapbook
Papermill's original pm.record() and pm.display() helpers were deprecated and removed in papermill 2.0. Their replacement lives in the companion scrapbook library (pip install scrapbook): sb.glue() emits structured data from inside a notebook, storing it in the cell's outputs so it can be read back programmatically.
Inside the notebook:

import scrapbook as sb

# Train model
accuracy = 0.923
f1 = 0.891

# Glue metrics for programmatic access
sb.glue('accuracy', accuracy)
sb.glue('f1_score', f1)
sb.glue('n_samples', len(X_train))
sb.glue('feature_count', X_train.shape[1])

Reading glued values from a test:

import papermill as pm
import scrapbook as sb

def test_recorded_metrics(tmp_path):
    output_path = tmp_path / 'output.ipynb'
    pm.execute_notebook(
        'notebooks/training.ipynb',
        str(output_path),
        parameters={'data_path': 'tests/fixtures/train.csv'}
    )

    scraps = sb.read_notebook(str(output_path)).scraps
    assert scraps['accuracy'].data >= 0.85
    assert scraps['f1_score'].data >= 0.80
    assert scraps['n_samples'].data == 800  # 80% of 1000 training samples

Handling Notebook Failures
When a notebook cell raises an exception, papermill raises PapermillExecutionError:
from papermill.exceptions import PapermillExecutionError
def test_notebook_fails_on_invalid_data(tmp_path):
with pytest.raises(PapermillExecutionError) as exc_info:
pm.execute_notebook(
'notebooks/strict-validation.ipynb',
str(tmp_path / 'output.ipynb'),
parameters={'data_path': 'tests/fixtures/invalid-schema.csv'}
)
assert 'SchemaValidationError' in str(exc_info.value)The PapermillExecutionError contains the cell number, cell source, and the original exception. The output notebook is still written even when execution fails—it contains all cells up to the failing one, which is useful for debugging.
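Because the partial output notebook survives, a failing test can point you straight at the offending cell. A stdlib-only sketch (the helper name is mine) that returns the source of every cell carrying an error output:

```python
import json

def failed_cell_sources(notebook_path):
    """Return the source of every cell that recorded an error output.

    Works on the partially executed notebook papermill writes
    even when a run fails.
    """
    with open(notebook_path) as f:
        nb = json.load(f)
    failed = []
    for cell in nb.get("cells", []):
        if any(out.get("output_type") == "error" for out in cell.get("outputs", [])):
            source = cell["source"]
            failed.append("".join(source) if isinstance(source, list) else source)
    return failed
```

Printing this in a test's failure message saves opening the notebook just to find which cell blew up.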
CI Integration with Papermill
# .github/workflows/parameterized-notebooks.yml
name: Parameterized Notebook Tests

on:
  push:
    paths:
      - 'notebooks/**'
      - 'tests/test_*_notebook.py'
      - 'tests/fixtures/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: pip
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest papermill nbformat ipykernel
          python -m ipykernel install --user --name python3
      - name: Run parameterized notebook tests
        run: pytest tests/test_*_notebook.py -v --tb=short
      - name: Upload failed notebooks
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: failed-notebook-outputs
          # pytest's default temp layout on Linux is /tmp/pytest-of-<user>/pytest-<n>/
          path: /tmp/pytest-of-*/pytest-*/**/*.ipynb

Caching Expensive Computations
Papermill doesn't natively support caching, but you can skip notebook execution if the output notebook already exists and is valid:
import os
import nbformat
import papermill as pm

def run_notebook_cached(input_path, output_path, parameters, force=False):
    """Execute notebook only if output is missing or force=True."""
    if not force and os.path.exists(output_path):
        try:
            nb = nbformat.read(output_path, as_version=4)
            metadata = nb.get('metadata', {}).get('papermill', {})
            if not metadata.get('exception', False):
                return  # Output exists and ran without error
        except Exception:
            pass  # Re-run if output is corrupted
    pm.execute_notebook(input_path, output_path, parameters=parameters)

Note that this cache keys only on the output path: if the same notebook runs with varying parameters, encode them in output_path so a stale run is not reused.

Summary
Papermill makes notebooks testable by separating parameters from logic. Tag one cell with parameters, then inject different values in tests using pm.execute_notebook(). Use scrapbook's sb.glue() inside notebooks to emit structured metrics that tests can read back with sb.read_notebook(). Combine papermill with pytest's @pytest.mark.parametrize to run the same notebook against many input scenarios in a single test run. For debugging, papermill writes the output notebook even when execution fails—the trace of what ran before the error is preserved.