# Unit Testing Jupyter Notebooks with testbook
testbook bridges the gap between notebook exploration and software engineering rigor. While nbmake verifies that a notebook executes without error, testbook lets you write pytest test functions that call specific functions defined in a notebook, inject test data, and assert outputs—the same level of testing you'd apply to a Python module.
## How testbook Works
testbook connects to a Jupyter kernel, executes specific cells to set up state, then exposes the kernel's namespace to your test code via a proxy. You call notebook functions from your test file as if they were imported from a module.
Install it from PyPI:

```bash
pip install testbook
```

## The @testbook Decorator

The primary testbook API is the `@testbook` decorator. It starts a kernel, runs the specified notebook up to a point, and provides a `tb` object for interacting with the kernel:
```python
# tests/test_data_processing.py
from testbook import testbook

@testbook('notebooks/data-processing.ipynb', execute=True)
def test_clean_data(tb):
    # The entire notebook has executed (execute=True),
    # so clean_data and raw_df exist in the kernel namespace.
    tb.inject("result = clean_data(raw_df)")
    assert tb.value("len(result)") > 0
    assert tb.value("'cleaned_column' in result.columns")
```

The `execute=True` parameter runs all cells before the test starts. This is equivalent to "restart kernel and run all" followed by your test assertions.
## Calling Notebook Functions

Notebook functions are accessible through `tb.ref()`, which returns a callable proxy; calling the proxy executes the function in the kernel and returns the result:
```python
# notebooks/feature-engineering.ipynb — cell content:
def compute_ratio(numerator, denominator):
    if denominator == 0:
        raise ValueError("Denominator cannot be zero")
    return numerator / denominator

def normalize(values, min_val=None, max_val=None):
    import numpy as np
    arr = np.array(values)
    lo = min_val if min_val is not None else arr.min()
    hi = max_val if max_val is not None else arr.max()
    return ((arr - lo) / (hi - lo)).tolist()
```

```python
# tests/test_features.py
import pytest
from testbook import testbook
from testbook.exceptions import TestbookRuntimeError

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_compute_ratio_normal(tb):
    compute_ratio = tb.ref('compute_ratio')
    assert compute_ratio(10, 4) == 2.5

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_compute_ratio_zero_denominator(tb):
    compute_ratio = tb.ref('compute_ratio')
    # Exceptions raised in the kernel surface as TestbookRuntimeError
    with pytest.raises(TestbookRuntimeError):
        compute_ratio(10, 0)

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_normalize_basic(tb):
    normalize = tb.ref('normalize')
    assert normalize([0, 50, 100]) == [0.0, 0.5, 1.0]

@testbook('notebooks/feature-engineering.ipynb', execute=True)
def test_normalize_with_explicit_range(tb):
    normalize = tb.ref('normalize')
    assert normalize([25, 50, 75], 0, 100) == [0.25, 0.5, 0.75]
```

## Injecting Values into the Kernel
Use `tb.inject()` to execute arbitrary code in the kernel's context, and `tb.value()` to evaluate an expression there and return the result:
```python
import pytest
from testbook import testbook

@testbook('notebooks/model-evaluation.ipynb', execute=False)
def test_accuracy_threshold(tb):
    # Execute only the import cells (first 3 cells)
    tb.execute_cell([0, 1, 2])

    # Inject test data
    tb.inject("""
        import numpy as np
        y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
        y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 1])
    """)

    # Now execute the evaluation cell that uses y_true and y_pred
    tb.execute_cell(5)  # cell index 5 contains the accuracy calculation
    assert tb.value('accuracy') == pytest.approx(0.875, rel=1e-3)
```

`execute=False` prevents automatic execution, giving you fine-grained control over which cells run.
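As a quick sanity check on the threshold above, the injected arrays really do give 0.875: 7 of the 8 predictions agree.

```python
# Recompute the expected accuracy for the injected test data in plain Python
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 1, 0, 0, 0, 0, 1]

matches = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = matches / len(y_true)
print(accuracy)  # 0.875: only index 4 disagrees
```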
## Executing Specific Cells
```python
@testbook('notebooks/pipeline.ipynb', execute=False)
def test_pipeline_step_by_step(tb):
    # Execute cells by index
    tb.execute_cell(0)  # imports
    tb.execute_cell(1)  # config

    # Execute a range of cells
    tb.execute_cell(list(range(2, 6)))  # cells 2-5

    # Execute cells by tag (requires cell metadata tags)
    # tb.execute_cell('setup')

    assert tb.value('processed_df is not None')
```

## Mocking External Calls
Use `tb.inject()` to replace external dependencies before the cells that use them execute (testbook also offers `tb.patch()`, a kernel-side analogue of `unittest.mock.patch`):
```python
@testbook('notebooks/data-loader.ipynb', execute=False)
def test_load_data_with_mock(tb):
    # Execute import cells first
    tb.execute_cell([0, 1])

    # Inject a mock in place of the S3 download
    tb.inject("""
        from unittest.mock import MagicMock
        import pandas as pd

        # Create mock data
        mock_df = pd.DataFrame({
            'user_id': [1, 2, 3],
            'revenue': [100.0, 250.0, 75.0]
        })

        # Stub out boto3 so no network call is made
        import boto3
        boto3.client = MagicMock()

        # Inject the mock dataframe directly
        raw_data = mock_df
    """)

    # Execute the cell that uses raw_data
    tb.execute_cell(4)  # processing cell
    assert tb.value('len(processed_data)') == 3
    assert tb.value("'revenue' in processed_data.columns")
```

## Using Fixtures with testbook
pytest fixtures compose naturally with testbook. Use them to provide shared test data or a shared kernel:
```python
# conftest.py
import pytest
import pandas as pd
import numpy as np

@pytest.fixture
def sample_dataframe():
    np.random.seed(42)
    return pd.DataFrame({
        'feature_a': np.random.randn(100),
        'feature_b': np.random.randint(0, 10, 100),
        'target': np.random.binomial(1, 0.3, 100)
    })
```

```python
# tests/test_notebook.py
import pytest
from testbook import testbook

@pytest.fixture(scope='module')
def notebook_tb():
    with testbook('notebooks/model-training.ipynb', execute=False) as tb:
        tb.execute_cell(list(range(5)))  # execute setup cells
        yield tb

def test_feature_correlation(notebook_tb):
    tb = notebook_tb
    # Evaluate in the kernel: a DataFrame is not serializable, a bool is.
    # Verify no features are perfectly correlated.
    assert tb.value("bool((df.corr().abs() < 0.99).all().all())")

def test_class_balance(notebook_tb):
    tb = notebook_tb
    class_counts = tb.value('class_counts')  # a plain dict round-trips fine
    # Verify the minority class is at least 20% of the data
    min_proportion = min(class_counts.values()) / sum(class_counts.values())
    assert min_proportion >= 0.2
```

Note the `scope='module'` fixture: it starts one kernel per test module rather than one per test function, which significantly speeds up suites that run against the same notebook.
## Testing Error Handling in Notebooks

Notebooks often use try/except blocks for graceful degradation. Test that the fallback paths actually work:
```python
@testbook('notebooks/robust-pipeline.ipynb', execute=False)
def test_missing_column_handled_gracefully(tb):
    tb.execute_cell([0, 1, 2])  # execute setup

    # Inject data with a missing expected column
    tb.inject("""
        import pandas as pd
        df = pd.DataFrame({'col_a': [1, 2, 3]})  # missing 'col_b'
    """)

    # Execute the processing cell that handles missing columns
    tb.execute_cell(5)

    # Verify the error was handled and a flag was set
    assert tb.value('missing_column_error') is True

    # Verify a default value was filled in
    assert tb.value("'col_b' in result_df.columns")
    assert tb.value("bool((result_df['col_b'] == 0).all())")  # default applied
```

## Scoping Kernel Execution
For expensive notebooks, use the context manager form to control scope explicitly:
```python
from testbook import testbook

def test_expensive_notebook():
    with testbook('notebooks/heavy-computation.ipynb', execute=False) as tb:
        # Seed randomness before running the expensive cells
        tb.inject("import numpy as np; np.random.seed(42)")
        tb.execute_cell(list(range(10)))  # setup cells only

        # Test multiple things against the same kernel state
        config = tb.value('config')
        assert config['batch_size'] == 32
        assert config['learning_rate'] == 0.001

        # Test a lightweight computation
        preprocess = tb.ref('preprocess')
        assert preprocess([1, 2, 3]) == [0.33, 0.67, 1.0]
    # The kernel is shut down when the context manager exits
```

## Common testbook Issues
**Cell index vs. notebook position:** cells are indexed by their position in the notebook file at the time it is opened, not by their visual order after reordering. If you rearrange cells, update the indices in your tests, or, if the notebook structure changes frequently, execute cells by metadata tag (`tb.execute_cell('setup')`) instead of by index.
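Since a `.ipynb` file is plain JSON, you can build the tag-to-index mapping yourself when debugging index drift. A minimal sketch (the notebook structure below is illustrative, not read from a real file, where you would use `json.load`):

```python
# Illustrative in-memory notebook: cells live in order under the "cells" key
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["imports"]},
         "source": "import pandas as pd"},
        {"cell_type": "markdown", "metadata": {}, "source": "# Setup"},
        {"cell_type": "code", "metadata": {"tags": ["setup"]},
         "source": "df = pd.DataFrame()"},
    ]
}

# Map each tag to the current index of the cell carrying it
tag_index = {
    tag: i
    for i, cell in enumerate(nb["cells"])
    for tag in cell.get("metadata", {}).get("tags", [])
}
print(tag_index)  # {'imports': 0, 'setup': 2}
```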
**Kernel startup time:** each test that uses `execute=True` starts a fresh kernel and runs the entire notebook. For notebooks with expensive setup (loading large datasets, model initialization), use a `scope='module'` fixture so the kernel is shared across tests.
**Value serialization:** `tb.value()` can only return values that survive a round trip through their text representation: numbers, strings, booleans, lists, dicts. NumPy arrays, pandas DataFrames, and other complex objects raise a serialization error instead. Either reduce the object to a serializable form inside the kernel (e.g. `tb.value('df.shape[0]')`) or work with a `tb.ref()` proxy.
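A rough illustration of the distinction, using `ast.literal_eval` as a stand-in for the round trip (an approximation for intuition, not testbook's exact mechanism):

```python
import ast

def round_trips(obj):
    """True if repr(obj) parses back into an equal Python literal:
    a rough proxy for what can come back from tb.value()."""
    try:
        return ast.literal_eval(repr(obj)) == obj
    except (ValueError, SyntaxError):
        return False

print(round_trips({'users': [1, 2, 3], 'rate': 0.5}))  # True: plain literals
print(round_trips(object()))  # False: repr is '<object object at 0x...>'
```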
**Kernel crashes:** if a cell kills the kernel outright (a segfault in a C extension, an OOM kill), the failure surfaces as a dead-kernel error from the underlying nbclient machinery rather than as an ordinary Python exception, so it shows up as a test error rather than a test failure. Check the kernel's memory usage and whether the notebook requires GPU resources.
## Summary
testbook enables unit testing of notebook code at the function level. Use it when notebooks contain functions you want to test with different inputs, when you need to mock external dependencies, or when you want to test error handling. Combine it with nbmake (execution testing) and nbval (output validation) for comprehensive notebook test coverage. The key patterns are: `execute=False` for surgical cell execution, `tb.inject()` for mock setup, `tb.ref()` for function-level testing, `tb.value()` for assertions on kernel state, and module-scoped fixtures for kernel reuse.