Pandas Testing Best Practices: Testing Data Transformations with pytest
Testing pandas code requires different patterns than testing application code. DataFrames are complex objects—you rarely want to assert on the entire DataFrame at once. Instead, assert on specific columns, row counts, data types, and statistical properties. Use pytest fixtures for reusable test data, pandas.testing for DataFrame comparisons, and Hypothesis for property-based tests that find edge cases you wouldn't think to write.
Key Takeaways
Use pandas.testing.assert_frame_equal for DataFrame comparisons. It handles floating-point tolerance, column order, dtype checking, and produces readable diffs. Don't compare DataFrames with == or assert df1.equals(df2) in tests.
Test transformations on small, explicit DataFrames. A 3-5 row test DataFrame with known values is better than loading a CSV. You can predict the expected output exactly.
Test edge cases explicitly. Empty DataFrames, single-row DataFrames, null values, all-null columns, duplicate rows—these are where pandas transformation bugs hide.
Verify dtypes, not just values. A transformation that returns strings instead of floats will cause downstream failures. Assert column dtypes explicitly.
Property-based testing with Hypothesis finds bugs you'd never think to test. Generate random DataFrames that satisfy your schema and assert that invariants hold regardless of the specific values.
Why Pandas Code Is Hard to Test
Pandas DataFrames are mutable, have complex dtype behavior, and many operations behave differently on edge cases (empty DataFrames, single rows, all-null columns). The common testing mistakes:
- Asserting on the whole DataFrame with
==(comparison returns a DataFrame of booleans, not a bool) - Using production data as test fixtures (tests become slow and environment-dependent)
- Skipping edge case tests (empty DataFrames, nulls, duplicates)
- Not testing dtype preservation (operations that silently convert int64 to float64)
Setting Up for Testing
Install the testing dependencies:
pip install pytest pandas numpy hypothesisRecommended project structure:
src/
transforms/
orders.py
customers.py
tests/
conftest.py
test_orders_transforms.py
test_customers_transforms.pyWriting Testable Pandas Code
First, write your transformation as a pure function that takes and returns DataFrames:
# src/transforms/orders.py
import pandas as pd
import numpy as np
def calculate_order_metrics(df: pd.DataFrame) -> pd.DataFrame:
"""
Calculate derived metrics for orders.
Input columns required: order_id, quantity, unit_price, discount_rate, created_at
Output adds: subtotal, discount_amount, total, order_month, size_category
"""
if df.empty:
return df.assign(
subtotal=pd.Series(dtype="float64"),
discount_amount=pd.Series(dtype="float64"),
total=pd.Series(dtype="float64"),
order_month=pd.Series(dtype="datetime64[ns]"),
size_category=pd.Series(dtype="object")
)
result = df.copy()
result["subtotal"] = result["quantity"] * result["unit_price"]
result["discount_amount"] = result["subtotal"] * result["discount_rate"]
result["total"] = result["subtotal"] - result["discount_amount"]
result["order_month"] = pd.to_datetime(result["created_at"]).dt.to_period("M").dt.to_timestamp()
result["size_category"] = np.where(result["total"] > 1000, "large", "small")
return resultpytest Fixtures for Test DataFrames
Define reusable fixtures in conftest.py:
# tests/conftest.py
import pytest
import pandas as pd
from datetime import datetime
@pytest.fixture
def sample_orders():
return pd.DataFrame({
"order_id": [1, 2, 3],
"quantity": [2, 5, 1],
"unit_price": [100.0, 50.0, 1500.0],
"discount_rate": [0.10, 0.0, 0.05],
"created_at": [
datetime(2025, 1, 15),
datetime(2025, 2, 20),
datetime(2025, 1, 31),
]
})
@pytest.fixture
def empty_orders():
return pd.DataFrame({
"order_id": pd.Series(dtype="int64"),
"quantity": pd.Series(dtype="int64"),
"unit_price": pd.Series(dtype="float64"),
"discount_rate": pd.Series(dtype="float64"),
"created_at": pd.Series(dtype="datetime64[ns]"),
})
@pytest.fixture
def orders_with_nulls():
return pd.DataFrame({
"order_id": [1, 2, 3],
"quantity": [2, None, 1],
"unit_price": [100.0, 50.0, None],
"discount_rate": [0.1, 0.0, 0.0],
"created_at": [datetime(2025, 1, 15), None, datetime(2025, 3, 1)],
})Core Assertion Patterns
Asserting on Specific Columns
# tests/test_orders_transforms.py
import pandas as pd
import pytest
from pandas.testing import assert_series_equal, assert_frame_equal
from src.transforms.orders import calculate_order_metrics
def test_calculates_subtotal(sample_orders):
result = calculate_order_metrics(sample_orders)
expected_subtotals = pd.Series([200.0, 250.0, 1500.0], name="subtotal")
assert_series_equal(result["subtotal"], expected_subtotals)
def test_calculates_discount_amount(sample_orders):
result = calculate_order_metrics(sample_orders)
expected_discounts = pd.Series([20.0, 0.0, 75.0], name="discount_amount")
assert_series_equal(result["discount_amount"], expected_discounts, check_exact=False, rtol=1e-5)
def test_calculates_total_after_discount(sample_orders):
result = calculate_order_metrics(sample_orders)
expected_totals = pd.Series([180.0, 250.0, 1425.0], name="total")
assert_series_equal(result["total"], expected_totals)Asserting on Full DataFrames
When you want to compare the entire result, use assert_frame_equal with tolerance for floats:
def test_full_output_schema(sample_orders):
result = calculate_order_metrics(sample_orders)
expected = pd.DataFrame({
"order_id": [1, 2, 3],
"quantity": [2, 5, 1],
"unit_price": [100.0, 50.0, 1500.0],
"discount_rate": [0.10, 0.0, 0.05],
"created_at": [datetime(2025, 1, 15), datetime(2025, 2, 20), datetime(2025, 1, 31)],
"subtotal": [200.0, 250.0, 1500.0],
"discount_amount": [20.0, 0.0, 75.0],
"total": [180.0, 250.0, 1425.0],
"order_month": pd.to_datetime(["2025-01-01", "2025-02-01", "2025-01-01"]),
"size_category": ["small", "small", "large"],
})
assert_frame_equal(result, expected, check_exact=False, rtol=1e-5)Testing Edge Cases
Empty DataFrame
def test_handles_empty_dataframe(empty_orders):
result = calculate_order_metrics(empty_orders)
assert len(result) == 0
assert "subtotal" in result.columns
assert "total" in result.columns
assert "size_category" in result.columnsSingle Row
def test_handles_single_row():
df = pd.DataFrame({
"order_id": [1],
"quantity": [1],
"unit_price": [100.0],
"discount_rate": [0.0],
"created_at": [datetime(2025, 6, 15)],
})
result = calculate_order_metrics(df)
assert len(result) == 1
assert result["total"].iloc[0] == 100.0
def test_handles_zero_discount():
df = pd.DataFrame({
"order_id": [1],
"quantity": [3],
"unit_price": [50.0],
"discount_rate": [0.0],
"created_at": [datetime(2025, 1, 1)],
})
result = calculate_order_metrics(df)
assert result["total"].iloc[0] == result["subtotal"].iloc[0]
def test_handles_full_discount():
df = pd.DataFrame({
"order_id": [1],
"quantity": [1],
"unit_price": [100.0],
"discount_rate": [1.0],
"created_at": [datetime(2025, 1, 1)],
})
result = calculate_order_metrics(df)
assert result["total"].iloc[0] == 0.0Null Values
def test_nulls_in_quantity_produce_null_total(orders_with_nulls):
result = calculate_order_metrics(orders_with_nulls)
# Row with null quantity should have null total
assert pd.isna(result.loc[result["order_id"] == 2, "total"].iloc[0])
def test_non_null_rows_are_unaffected_by_null_rows(orders_with_nulls):
result = calculate_order_metrics(orders_with_nulls)
# Row 1 has valid data, should calculate correctly
assert result.loc[result["order_id"] == 1, "total"].iloc[0] == 180.0Testing Data Types
dtype bugs are silent and dangerous—they cause failures downstream when code expects specific types:
def test_output_dtypes(sample_orders):
result = calculate_order_metrics(sample_orders)
assert result["order_id"].dtype == "int64"
assert result["subtotal"].dtype == "float64"
assert result["discount_amount"].dtype == "float64"
assert result["total"].dtype == "float64"
assert result["size_category"].dtype == "object"
assert pd.api.types.is_datetime64_any_dtype(result["order_month"])
def test_does_not_mutate_input(sample_orders):
original_cols = list(sample_orders.columns)
original_len = len(sample_orders)
calculate_order_metrics(sample_orders)
assert list(sample_orders.columns) == original_cols
assert len(sample_orders) == original_lenTesting Aggregations
# src/transforms/orders.py
def aggregate_monthly_revenue(df: pd.DataFrame) -> pd.DataFrame:
return (
df.groupby("order_month")
.agg(
total_revenue=("total", "sum"),
order_count=("order_id", "count"),
avg_order_value=("total", "mean"),
)
.reset_index()
.sort_values("order_month")
)def test_monthly_aggregation_row_count(sample_orders):
metrics = calculate_order_metrics(sample_orders)
result = aggregate_monthly_revenue(metrics)
# January and February should be distinct rows
assert len(result) == 2
def test_monthly_aggregation_revenue():
df = pd.DataFrame({
"order_id": [1, 2, 3],
"total": [100.0, 200.0, 300.0],
"order_month": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-02-01"]),
})
result = aggregate_monthly_revenue(df)
jan = result[result["order_month"] == "2025-01-01"]
feb = result[result["order_month"] == "2025-02-01"]
assert jan["total_revenue"].iloc[0] == 300.0
assert feb["total_revenue"].iloc[0] == 300.0
assert jan["order_count"].iloc[0] == 2Property-Based Testing with Hypothesis
Hypothesis generates random DataFrames that satisfy your schema and checks that invariants hold:
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.pandas import data_frames, column
import numpy as np
@given(
data_frames([
column("quantity", dtype=int, elements=st.integers(min_value=1, max_value=1000)),
column("unit_price", dtype=float, elements=st.floats(min_value=0.01, max_value=10000.0)),
column("discount_rate", dtype=float, elements=st.floats(min_value=0.0, max_value=1.0)),
])
)
@settings(max_examples=200)
def test_total_is_always_non_negative(df):
if df.empty:
return
df["order_id"] = range(len(df))
df["created_at"] = pd.Timestamp("2025-01-01")
result = calculate_order_metrics(df)
# Total should always be >= 0 since discount_rate is in [0, 1]
assert (result["total"] >= 0).all()
@given(
data_frames([
column("quantity", dtype=int, elements=st.integers(min_value=1, max_value=100)),
column("unit_price", dtype=float, elements=st.floats(min_value=0.01, max_value=1000.0)),
column("discount_rate", dtype=float, elements=st.just(0.0)), # no discount
])
)
def test_zero_discount_means_total_equals_subtotal(df):
if df.empty:
return
df["order_id"] = range(len(df))
df["created_at"] = pd.Timestamp("2025-01-01")
result = calculate_order_metrics(df)
assert_series_equal(result["total"], result["subtotal"], check_names=False)Testing with Parametrize
Use @pytest.mark.parametrize for table-driven tests of boundary conditions:
@pytest.mark.parametrize("total,expected_category", [
(0.0, "small"),
(999.99, "small"),
(1000.0, "small"),
(1000.01, "large"),
(10000.0, "large"),
])
def test_size_category_boundary(total, expected_category):
df = pd.DataFrame({
"order_id": [1],
"quantity": [1],
"unit_price": [total],
"discount_rate": [0.0],
"created_at": [datetime(2025, 1, 1)],
})
result = calculate_order_metrics(df)
assert result["size_category"].iloc[0] == expected_categoryTesting Joins and Merges
def test_customer_join_preserves_all_orders():
orders = pd.DataFrame({
"order_id": [1, 2, 3],
"customer_id": [101, 102, 103],
"total": [100.0, 200.0, 300.0],
})
customers = pd.DataFrame({
"customer_id": [101, 102], # Customer 103 missing
"customer_name": ["Alice", "Bob"],
})
from src.transforms.orders import enrich_orders_with_customers
result = enrich_orders_with_customers(orders, customers)
# All orders preserved even if customer missing (left join)
assert len(result) == 3
# Order 3 has null customer_name
assert pd.isna(result.loc[result["order_id"] == 3, "customer_name"].iloc[0])Running Tests
# Run all tests
pytest tests/
<span class="hljs-comment"># Run with coverage
pytest tests/ --cov=src/transforms --cov-report=html
<span class="hljs-comment"># Run only fast unit tests
pytest tests/ -m <span class="hljs-string">"not integration"
<span class="hljs-comment"># Run with verbose output and show local variables on failure
pytest tests/ -v --tb=long -lSummary
Effective pandas testing requires three things:
- Pure functions — transformations that take DataFrames and return DataFrames, with no side effects
- Explicit fixtures — small, known-value DataFrames where you can predict every output value
- Edge case coverage — empty DataFrames, null values, single rows, dtype boundaries
Start with the happy path test that verifies expected output for typical input. Add edge cases for empty, null, and boundary values. Add dtype assertions to catch silent type coercion bugs. Add property-based tests with Hypothesis for statistical invariants. Run everything in CI on every push.