Parameterized Notebook Testing with Papermill

Papermill is a tool for parameterizing and executing Jupyter notebooks programmatically. It was designed for production notebook workflows—running the same analysis against different datasets, date ranges, or configuration sets—but it also makes notebooks significantly more testable. By injecting parameters and capturing outputs, you can test a notebook against many input combinations without maintaining separate notebook files for each scenario.

Installing Papermill

pip install papermill

Papermill requires a Jupyter kernel. For Python notebooks:

pip install ipykernel
python -m ipykernel install --user

How Parameterization Works

Papermill uses cell tags to identify the parameters cell. In a Jupyter notebook, tag a cell with parameters:

  1. Open the notebook in JupyterLab or Jupyter Notebook
  2. Click the cell
  3. Open the cell properties panel (wrench icon or Cell Toolbar → Edit Metadata)
  4. Add the tag parameters

The tagged cell looks like this:

# In the notebook — cell tagged "parameters"
data_path = "data/default.csv"
n_estimators = 100
test_size = 0.2
random_seed = 42
output_path = "outputs/results.json"

When papermill runs the notebook with different parameter values, it inserts a new cell immediately after the parameters cell with the injected values, overriding the defaults.
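
For example, if the notebook above is executed with n_estimators=200 and random_seed=0, the injected cell in the output notebook looks roughly like this (papermill tags it injected-parameters); parameters you don't supply keep their defaults from the tagged cell:

# Parameters
n_estimators = 200
random_seed = 0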

Basic Parameterized Execution

import papermill as pm

pm.execute_notebook(
    input_path='notebooks/model-evaluation.ipynb',
    output_path='outputs/model-evaluation-run1.ipynb',
    parameters={
        'data_path': 'data/test-set.csv',
        'n_estimators': 200,
        'random_seed': 0
    }
)

The output notebook contains the executed cells with their outputs, including the injected parameter cell. This output notebook is the test artifact—it captures the full execution trace.
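
The same run can be launched from papermill's command-line interface, which is convenient in shell scripts; each -p flag passes one parameter name and value, making this roughly equivalent to the Python call above:

papermill notebooks/model-evaluation.ipynb outputs/model-evaluation-run1.ipynb \
    -p data_path data/test-set.csv \
    -p n_estimators 200 \
    -p random_seed 0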

Batch Execution Across Parameter Sets

Test a notebook against multiple parameter combinations using a loop:

# tests/test_model_notebook.py
import papermill as pm
import pytest
import json
import os

PARAMETER_SETS = [
    {'n_estimators': 50, 'max_depth': 3, 'label': 'shallow-forest'},
    {'n_estimators': 100, 'max_depth': 5, 'label': 'medium-forest'},
    {'n_estimators': 200, 'max_depth': None, 'label': 'full-forest'},
]

@pytest.mark.parametrize("params", PARAMETER_SETS, ids=[p['label'] for p in PARAMETER_SETS])
def test_model_notebook(params, tmp_path):
    output_path = tmp_path / f"output-{params['label']}.ipynb"

    pm.execute_notebook(
        input_path='notebooks/random-forest-training.ipynb',
        output_path=str(output_path),
        parameters={
            'n_estimators': params['n_estimators'],
            'max_depth': params['max_depth'],
            'data_path': 'tests/fixtures/sample-data.csv',
            'output_metrics_path': str(tmp_path / f"metrics-{params['label']}.json"),
        }
    )

    # Read the metrics written by the notebook
    metrics_path = tmp_path / f"metrics-{params['label']}.json"
    assert metrics_path.exists(), f"Notebook did not write metrics for {params['label']}"

    with open(metrics_path) as f:
        metrics = json.load(f)

    assert metrics['accuracy'] >= 0.7, \
        f"Accuracy {metrics['accuracy']:.3f} below threshold for {params['label']}"
    assert 'f1_score' in metrics
    assert 'precision' in metrics
    assert 'recall' in metrics

Output Validation

For notebooks that produce structured outputs (JSON metrics files, CSV summaries, model files), validate the output after execution:

# notebooks/data-validation.ipynb writes a validation report
def test_data_validation_notebook(tmp_path):
    report_path = tmp_path / 'validation-report.json'

    pm.execute_notebook(
        'notebooks/data-validation.ipynb',
        str(tmp_path / 'output.ipynb'),
        parameters={
            'data_path': 'tests/fixtures/complete-data.csv',
            'report_path': str(report_path),
        }
    )

    with open(report_path) as f:
        report = json.load(f)

    assert report['total_rows'] > 0
    assert report['null_percentage'] < 0.05  # Less than 5% nulls
    assert report['duplicate_rows'] == 0
    assert all(col in report['columns'] for col in ['user_id', 'timestamp', 'event'])
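
The same pattern works for CSV summaries. A minimal sketch using pandas, assuming a hypothetical summary-stats notebook, a summary_path parameter, and metric/value column names:

import pandas as pd
import papermill as pm

def test_summary_notebook(tmp_path):
    summary_path = tmp_path / 'summary-stats.csv'

    pm.execute_notebook(
        'notebooks/summary-stats.ipynb',                  # hypothetical notebook
        str(tmp_path / 'output.ipynb'),
        parameters={'summary_path': str(summary_path)},   # assumed parameter name
    )

    summary = pd.read_csv(summary_path)
    assert not summary.empty
    assert {'metric', 'value'}.issubset(summary.columns)  # assumed column names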

Reading Cell Outputs from Executed Notebooks

Papermill's output notebooks contain the results of each cell execution. Use nbformat to read cell outputs programmatically:

import nbformat

def get_cell_output(notebook_path, cell_index):
    """Extract text output from a specific cell in an executed notebook."""
    nb = nbformat.read(notebook_path, as_version=4)
    cell = nb.cells[cell_index]
    outputs = []
    for output in cell.get('outputs', []):
        if output.get('output_type') == 'stream':
            outputs.append(''.join(output.get('text', [])))
        elif output.get('output_type') in ('execute_result', 'display_data'):
            if 'text/plain' in output.get('data', {}):
                outputs.append(output['data']['text/plain'])
    return '\n'.join(outputs)

def test_model_output_notebook(tmp_path):
    output_nb = tmp_path / 'output.ipynb'
    pm.execute_notebook(
        'notebooks/quick-check.ipynb',
        str(output_nb),
        parameters={'sample_size': 100}
    )

    # Cell 7 prints the model summary
    summary_output = get_cell_output(str(output_nb), 7)
    assert 'accuracy' in summary_output.lower()
    assert 'RandomForest' in summary_output
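
Indexing cells by position is brittle: the injected parameters cell shifts every cell after it by one, and any edit to the notebook moves indices again. A more robust variant, sketched here under the assumption that the summary cell carries a custom tag such as model-summary, looks the cell up by tag instead:

import nbformat

def get_output_by_tag(notebook_path, tag):
    """Extract text output from the first cell carrying the given tag."""
    nb = nbformat.read(notebook_path, as_version=4)
    for index, cell in enumerate(nb.cells):
        if tag in cell.get('metadata', {}).get('tags', []):
            # Reuse the index-based helper above on the matching cell
            return get_cell_output(notebook_path, index)
    raise LookupError(f"No cell tagged {tag!r} in {notebook_path}")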

Notebook-Level Outputs with scrapbook

Earlier papermill releases provided pm.record() and pm.display() to emit structured data from inside notebooks; both were removed in papermill 2.0 in favor of the companion scrapbook library (pip install scrapbook). Scrapbook "glues" named values into the cell outputs of the executed notebook, where tests can read them back programmatically.

Inside the notebook:

import scrapbook as sb

# Train model
accuracy = 0.923
f1 = 0.891

# Glue metrics into the output notebook for programmatic access
sb.glue('accuracy', accuracy)
sb.glue('f1_score', f1)
sb.glue('n_samples', len(X_train))
sb.glue('feature_count', X_train.shape[1])

Reading glued values from a test:

import papermill as pm
import scrapbook as sb

def get_notebook_scraps(notebook_path):
    """Return a dict of every value glued into an executed notebook."""
    nb = sb.read_notebook(notebook_path)
    return nb.scraps.data_dict

def test_recorded_metrics(tmp_path):
    output_path = tmp_path / 'output.ipynb'
    pm.execute_notebook(
        'notebooks/training.ipynb',
        str(output_path),
        parameters={'data_path': 'tests/fixtures/train.csv'}
    )

    outputs = get_notebook_scraps(str(output_path))

    assert outputs['accuracy'] >= 0.85
    assert outputs['f1_score'] >= 0.80
    assert outputs['n_samples'] == 800  # 80% of 1000 training samples

Handling Notebook Failures

When a notebook cell raises an exception, papermill raises PapermillExecutionError:

from papermill.exceptions import PapermillExecutionError

def test_notebook_fails_on_invalid_data(tmp_path):
    with pytest.raises(PapermillExecutionError) as exc_info:
        pm.execute_notebook(
            'notebooks/strict-validation.ipynb',
            str(tmp_path / 'output.ipynb'),
            parameters={'data_path': 'tests/fixtures/invalid-schema.csv'}
        )

    assert 'SchemaValidationError' in str(exc_info.value)

The PapermillExecutionError contains the cell number, cell source, and the original exception. The output notebook is still written even when execution fails—it contains all cells up to the failing one, which is useful for debugging.
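
A minimal sketch that leans on this behavior: assert on the exception, then check that the partially executed notebook was written so it can be opened for debugging (the notebook and fixture paths are the ones assumed above):

def test_failed_run_keeps_partial_output(tmp_path):
    output_nb = tmp_path / 'output.ipynb'

    with pytest.raises(PapermillExecutionError):
        pm.execute_notebook(
            'notebooks/strict-validation.ipynb',
            str(output_nb),
            parameters={'data_path': 'tests/fixtures/invalid-schema.csv'}
        )

    # Papermill still writes the output notebook on failure; every cell up to
    # the failing one is present, which is what you open when debugging.
    assert output_nb.exists()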

CI Integration with Papermill

# .github/workflows/parameterized-notebooks.yml
name: Parameterized Notebook Tests

on:
  push:
    paths:
      - 'notebooks/**'
      - 'tests/test_*_notebook.py'
      - 'tests/fixtures/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: pip

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest papermill nbformat ipykernel
          python -m ipykernel install --user --name python3

      - name: Run parameterized notebook tests
        # --basetemp pins pytest's tmp_path under a known directory so failed
        # notebook outputs can be uploaded as artifacts below
        run: pytest tests/test_*_notebook.py -v --tb=short --basetemp=/tmp/notebook-tests

      - name: Upload failed notebooks
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: failed-notebook-outputs
          path: /tmp/notebook-tests/**/*.ipynb

Caching Expensive Computations

Papermill doesn't natively support caching, but you can skip notebook execution if the output notebook already exists and is valid:

import os

import nbformat
import papermill as pm

def run_notebook_cached(input_path, output_path, parameters, force=False):
    """Execute notebook only if output is missing or force=True."""
    if not force and os.path.exists(output_path):
        try:
            nb = nbformat.read(output_path, as_version=4)
            metadata = nb.get('metadata', {}).get('papermill', {})
            if not metadata.get('exception', False):
                return  # Output exists and ran without error
        except Exception:
            pass  # Re-run if output is corrupted

    pm.execute_notebook(input_path, output_path, parameters=parameters)
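
A hedged usage sketch, with hypothetical notebook and output paths, that forces a re-run only when a FORCE_NOTEBOOKS environment variable is set:

import os

run_notebook_cached(
    'notebooks/feature-engineering.ipynb',      # hypothetical expensive notebook
    'outputs/feature-engineering.ipynb',
    parameters={'data_path': 'data/full.csv'},
    force=os.environ.get('FORCE_NOTEBOOKS') == '1',
)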

Summary

Papermill makes notebooks testable by separating parameters from logic. Tag one cell with parameters, then inject different values in tests using pm.execute_notebook(). Use scrapbook's sb.glue() inside notebooks to emit structured metrics that tests can read back with sb.read_notebook(). Combine papermill with pytest's @pytest.mark.parametrize to run the same notebook against many input scenarios in a single test run. For debugging, papermill writes the output notebook even when execution fails; the trace of what ran before the error is preserved.
