Unit Testing Jupyter Notebooks with testbook

testbook bridges the gap between notebook exploration and software engineering rigor. While nbmake verifies that a notebook executes without error, testbook lets you write pytest test functions that call specific functions defined in a notebook, inject test data, and assert outputs—the same level of testing you'd apply to a Python module.

How testbook Works

testbook connects to a Jupyter kernel, executes specific cells to set up state, then exposes the kernel's namespace to your test code via a proxy. You call notebook functions from your test file as if they were imported from a module.
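Conceptually, the proxy mechanism works by rendering operations on the proxy into source text and evaluating that text in the remote namespace. The sketch below is a deliberate simplification, not testbook's actual implementation; a plain `eval` over a dictionary stands in for the kernel round trip:

```python
# Simplified sketch of a kernel-side reference proxy: calls on the
# proxy become source strings executed in the "kernel" namespace.
class KernelRef:
    def __init__(self, name, run):
        self._name = name
        self._run = run  # callable that executes source in the "kernel"

    def __call__(self, *args):
        source = f"{self._name}({', '.join(map(repr, args))})"
        return self._run(source)

# A dict plays the role of the kernel's namespace here.
namespace = {"add": lambda a, b: a + b}
ref = KernelRef("add", lambda src: eval(src, namespace))
print(ref(2, 3))  # 5
```

The real implementation must also serialize arguments and results across a process boundary, which is why only JSON-friendly values cross it cleanly.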

pip install testbook

The @testbook Decorator

The primary testbook API is the @testbook decorator. It starts a kernel, optionally executes the notebook, and passes your test a tb client for interacting with the kernel:

# tests/test_data_processing.py
from testbook import testbook

@testbook('notebooks/data-processing.ipynb', execute=True)
def test_clean_data(tb):
    # The entire notebook has executed (execute=True), so clean_data
    # and raw_df already exist in the kernel's namespace
    tb.inject("result = clean_data(raw_df)")
    assert tb.value('len(result)') > 0
    assert tb.value("'cleaned_column' in result.columns")

The execute=True parameter runs all cells before the test starts. This is equivalent to "kernel restart and run all" followed by your test assertions.

Calling Notebook Functions

Notebook functions are accessible through tb.ref(), which returns a callable proxy; arguments and return values must be JSON-serializable to cross the kernel boundary:

# notebooks/feature-engineering.ipynb — cell content:
def compute_ratio(numerator, denominator):
    if denominator == 0:
        raise ValueError("Denominator cannot be zero")
    return numerator / denominator

def normalize(values, min_val=None, max_val=None):
    import numpy as np
    arr = np.array(values)
    lo = min_val if min_val is not None else arr.min()
    hi = max_val if max_val is not None else arr.max()
    return ((arr - lo) / (hi - lo)).tolist()
# tests/test_features.py
from testbook import testbook
from testbook.exceptions import TestbookRuntimeError
import pytest

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_compute_ratio_normal(tb):
    compute_ratio = tb.ref('compute_ratio')
    assert compute_ratio(10, 4) == 2.5

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_compute_ratio_zero_denominator(tb):
    compute_ratio = tb.ref('compute_ratio')
    # The kernel-side ValueError surfaces as a TestbookRuntimeError
    with pytest.raises(TestbookRuntimeError):
        compute_ratio(10, 0)

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_normalize_basic(tb):
    normalize = tb.ref('normalize')
    assert normalize([0, 50, 100]) == [0.0, 0.5, 1.0]

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_normalize_with_explicit_range(tb):
    normalize = tb.ref('normalize')
    assert normalize([25, 50, 75], 0, 100) == [0.25, 0.5, 0.75]

Injecting Values into the Kernel

Use tb.inject() to execute arbitrary code in the kernel's context:

from testbook import testbook
import pytest

@testbook('notebooks/model-evaluation.ipynb', execute=False)
def test_accuracy_threshold(tb):
    # Execute only the import cells (first 3 cells)
    tb.execute_cell([0, 1, 2])

    # Inject test data into the kernel's namespace
    tb.inject("""
    import numpy as np
    y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
    y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 1])
    """)

    # Now execute the evaluation cell that uses y_true and y_pred
    tb.execute_cell(5)  # Cell index 5 contains accuracy calculation

    # Wrap in float() so the NumPy scalar is JSON-serializable
    accuracy = tb.value('float(accuracy)')
    assert accuracy == pytest.approx(0.875, rel=1e-3)

execute=False prevents automatic execution, giving you fine-grained control over which cells run.
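The expected value in the example above can be verified standalone: the injected arrays disagree only at index 4, so the accuracy should be 7/8 = 0.875.

```python
# Standalone check of the expected accuracy for the injected arrays.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.875
```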

Executing Specific Cells

@testbook('notebooks/pipeline.ipynb', execute=False)
def test_pipeline_step_by_step(tb):
    # Execute cells by index
    tb.execute_cell(0)  # imports
    tb.execute_cell(1)  # config
    
    # Execute a range of cells
    tb.execute_cell(list(range(2, 6)))  # cells 2-5
    
    # Execute the cell tagged 'setup' (requires cell metadata tags)
    # tb.execute_cell('setup')
    
    # DataFrames don't survive JSON transport, so assert in-kernel
    assert tb.value('processed_df is not None')

Mocking External Calls

Use tb.inject() to replace external dependencies before the cells that use them execute:

@testbook('notebooks/data-loader.ipynb', execute=False)
def test_load_data_with_mock(tb):
    # Execute import cells first
    tb.execute_cell([0, 1])

    # Inject a mock for the S3 download function
    tb.inject("""
    from unittest.mock import MagicMock
    import pandas as pd

    # Create mock data
    mock_df = pd.DataFrame({
        'user_id': [1, 2, 3],
        'revenue': [100.0, 250.0, 75.0]
    })

    # Replace the S3 client factory so no network call is made
    import boto3
    boto3.client = MagicMock()

    # Inject the mock dataframe directly
    raw_data = mock_df
    """)

    # Execute the cell that uses raw_data
    tb.execute_cell(4)  # Processing cell

    # DataFrames don't survive JSON transport, so assert in-kernel
    assert tb.value('len(processed_data)') == 3
    assert tb.value("'revenue' in processed_data.columns")
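The monkey-patch in the injected snippet is ordinary unittest.mock usage; the same move works outside a kernel. Here `FakeBoto` is a stand-in for the real boto3 module, used purely for illustration:

```python
from unittest.mock import MagicMock

# A stand-in object plays the role of the boto3 module; replacing its
# client factory with a MagicMock is the same move the injected code makes.
class FakeBoto:
    def client(self, service):
        raise RuntimeError("would hit the network")

boto_like = FakeBoto()
boto_like.client = MagicMock(return_value="fake-s3-client")

assert boto_like.client("s3") == "fake-s3-client"  # no network call
boto_like.client.assert_called_once_with("s3")
```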

Using Fixtures with testbook

pytest fixtures compose with testbook. A module-scoped fixture can hold a single tb client that several tests share, and ordinary data fixtures can supply values to inject into the kernel:

# conftest.py
import pytest
import pandas as pd
import numpy as np

@pytest.fixture
def sample_dataframe():
    np.random.seed(42)
    return pd.DataFrame({
        'feature_a': np.random.randn(100),
        'feature_b': np.random.randint(0, 10, 100),
        'target': np.random.binomial(1, 0.3, 100)
    })
# tests/test_notebook.py
from testbook import testbook
import pytest

@pytest.fixture(scope='module')
def notebook_tb():
    with testbook('notebooks/model-training.ipynb', execute=False) as tb:
        tb.execute_cell(list(range(5)))  # Execute setup cells
        yield tb

def test_feature_correlation(notebook_tb):
    tb = notebook_tb
    # Verify no features are perfectly correlated. The diagonal of a
    # correlation matrix is always 1.0, so compare only the
    # off-diagonal entries
    tb.inject("""
    import numpy as np
    _corr = df.corr().abs().to_numpy()
    off_diag_max = float(_corr[~np.eye(_corr.shape[0], dtype=bool)].max())
    """)
    assert tb.value('off_diag_max') < 0.99

def test_class_balance(notebook_tb):
    tb = notebook_tb
    # Assumes class_counts is a plain dict of ints (JSON-serializable)
    class_counts = tb.value('class_counts')
    # Verify the minority class is at least 20% of the data
    min_proportion = min(class_counts.values()) / sum(class_counts.values())
    assert min_proportion >= 0.2

Note the scope='module' fixture—this starts one kernel per test module rather than one per test function, significantly speeding up tests against the same notebook.

Testing Error Handling in Notebooks

Notebooks often have try/except blocks for graceful degradation. Test that they work:

@testbook('notebooks/robust-pipeline.ipynb', execute=False)
def test_missing_column_handled_gracefully(tb):
    tb.execute_cell([0, 1, 2])  # Execute setup
    
    # Inject data with a missing expected column
    tb.inject("""
    import pandas as pd
    df = pd.DataFrame({'col_a': [1, 2, 3]})  # Missing 'col_b'
    """)
    
    # Execute the processing cell that handles missing columns
    tb.execute_cell(5)
    
    # Verify the error was handled and a flag was set
    assert tb.value('missing_column_error') is True

    # Verify a default value was used (assert in-kernel, since
    # DataFrames don't survive JSON transport)
    assert tb.value("'col_b' in result_df.columns")
    assert tb.value("bool((result_df['col_b'] == 0).all())")  # Default applied
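The notebook cell being exercised here is not shown; the pattern it would implement looks roughly like the sketch below, written with a plain dict rather than pandas so it stands alone (the variable names mirror the test above):

```python
# Hypothetical notebook-cell logic: degrade gracefully when an
# expected column is missing, set a flag, and apply a default.
df = {"col_a": [1, 2, 3]}  # dict-of-lists stands in for a DataFrame

missing_column_error = False
if "col_b" not in df:
    missing_column_error = True
    df["col_b"] = [0] * len(df["col_a"])  # fall back to a default value

print(missing_column_error, df["col_b"])  # True [0, 0, 0]
```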

Scoping Kernel Execution

For expensive notebooks, use the context manager form to control scope explicitly:

def test_expensive_notebook():
    with testbook('notebooks/heavy-computation.ipynb', execute=False) as tb:
        # Seed randomness before running the expensive cells
        tb.inject("import numpy as np; np.random.seed(42)")
        tb.execute_cell(list(range(10)))  # Setup cells only

        # Test multiple things against the same kernel state
        config = tb.value('config')
        assert config['batch_size'] == 32
        assert config['learning_rate'] == 0.001

        # Test a lightweight computation
        preprocess = tb.ref('preprocess')
        assert preprocess([1, 2, 3]) == [0.33, 0.67, 1.0]

    # Kernel is shut down when the context manager exits

Common testbook Issues

Cell index vs notebook position: Cells are addressed by their zero-based position in the saved .ipynb file, so inserting, deleting, or reordering cells silently shifts every index your tests use. If you rearrange cells, update the indices in your tests, or better, address cells by metadata tags, which survive reordering.
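Tags live in each cell's metadata inside the .ipynb JSON, and a tag always maps back to some positional index. The sketch below uses a minimal hand-built notebook dict (not a real file) to show where that mapping lives; `index_of_tag` is an illustrative helper, not part of testbook:

```python
# A minimal notebook structure (the .ipynb format is JSON).
nb = {
    "cells": [
        {"cell_type": "code", "source": "import pandas as pd",
         "metadata": {"tags": ["setup"]}},
        {"cell_type": "code", "source": "df = load()", "metadata": {}},
    ]
}

def index_of_tag(notebook, tag):
    """Return the index of the first cell carrying the given tag."""
    for i, cell in enumerate(notebook["cells"]):
        if tag in cell.get("metadata", {}).get("tags", []):
            return i
    raise KeyError(tag)

print(index_of_tag(nb, "setup"))  # 0
```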

Kernel startup time: Each test that uses execute=True starts a new kernel and runs the entire notebook. For notebooks with expensive setup (loading large datasets, model initialization), use scope='module' fixtures.

Variable serialization: tb.value() transports results from the kernel as JSON, so only JSON-representable values (numbers, strings, booleans, lists, dicts) come back intact; NumPy arrays, pandas DataFrames, and other complex objects raise a serialization error instead. Either compute a JSON-friendly summary inside the kernel (len(df), df['col'].tolist(), df.to_dict()) or keep the object in the kernel behind tb.ref() and assert on serializable expressions.
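The limits are exactly those of JSON transport; a standalone illustration of what survives the round trip and what does not:

```python
import json

# Only values with a JSON representation survive the kernel-to-test
# round trip that tb.value() performs.
summary = {"rows": 3, "columns": ["user_id", "revenue"]}
assert json.loads(json.dumps(summary)) == summary  # round-trips intact

class Model:  # stand-in for a fitted estimator living in the kernel
    pass

try:
    json.dumps(Model())
except TypeError as exc:
    print("not serializable:", exc)
```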

Kernel crashes: If a cell kills the kernel outright (a segfault in a C extension, an OOM kill), testbook surfaces an error from the underlying execution client (typically nbclient's DeadKernelError) rather than an assertion failure, so these show up as test errors, not failures. Check the kernel's memory usage and whether the notebook requires GPU resources.

Summary

testbook enables unit testing of notebook code at the function level. Use it when notebooks contain functions you want to test with different inputs, when you need to mock external dependencies, or when you want to test error handling. Combine it with nbmake (execution testing) and nbval (output validation) for comprehensive notebook test coverage. The key patterns are: execute=False for surgical cell execution, tb.inject() for mock setup, tb.ref() and tb.value() for calling functions and retrieving results, and module-scoped fixtures for kernel reuse.