Testing

Pandas Testing Best Practices: Testing Data Transformations with pytest

HelpMeTest

16 May 2026 — 6 min read

Testing pandas code requires different patterns than testing application code. DataFrames are complex objects—you rarely want to assert on the entire DataFrame at once. Instead, assert on specific columns, row counts, data types, and statistical properties. Use pytest fixtures for reusable test data, pandas.testing for DataFrame comparisons, and Hypothesis for property-based tests that find edge cases you wouldn't think to write.

Key Takeaways

Use pandas.testing.assert_frame_equal for DataFrame comparisons. It handles floating-point tolerance, column order, dtype checking, and produces readable diffs. Don't compare DataFrames with == or assert df1.equals(df2) in tests.

Test transformations on small, explicit DataFrames. A 3-5 row test DataFrame with known values is better than loading a CSV. You can predict the expected output exactly.

Test edge cases explicitly. Empty DataFrames, single-row DataFrames, null values, all-null columns, duplicate rows—these are where pandas transformation bugs hide.

Verify dtypes, not just values. A transformation that returns strings instead of floats will cause downstream failures. Assert column dtypes explicitly.

Property-based testing with Hypothesis finds bugs you'd never think to test. Generate random DataFrames that satisfy your schema and assert that invariants hold regardless of the specific values.

Why Pandas Code Is Hard to Test

Pandas DataFrames are mutable, have complex dtype behavior, and many operations behave differently on edge cases (empty DataFrames, single rows, all-null columns). The common testing mistakes:

Asserting on the whole DataFrame with == (comparison returns a DataFrame of booleans, not a bool)
Using production data as test fixtures (tests become slow and environment-dependent)
Skipping edge case tests (empty DataFrames, nulls, duplicates)
Not testing dtype preservation (operations that silently convert int64 to float64)

Setting Up for Testing

Install the testing dependencies:

pip install pytest pandas numpy hypothesis

Recommended project structure:

src/
  transforms/
    orders.py
    customers.py
tests/
  conftest.py
  test_orders_transforms.py
  test_customers_transforms.py

Writing Testable Pandas Code

First, write your transformation as a pure function that takes and returns DataFrames:

# src/transforms/orders.py
import pandas as pd
import numpy as np

def calculate_order_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate derived metrics for orders.
    
    Input columns required: order_id, quantity, unit_price, discount_rate, created_at
    Output adds: subtotal, discount_amount, total, order_month, size_category
    """
    if df.empty:
        return df.assign(
            subtotal=pd.Series(dtype="float64"),
            discount_amount=pd.Series(dtype="float64"),
            total=pd.Series(dtype="float64"),
            order_month=pd.Series(dtype="datetime64[ns]"),
            size_category=pd.Series(dtype="object")
        )
    
    result = df.copy()
    result["subtotal"] = result["quantity"] * result["unit_price"]
    result["discount_amount"] = result["subtotal"] * result["discount_rate"]
    result["total"] = result["subtotal"] - result["discount_amount"]
    result["order_month"] = pd.to_datetime(result["created_at"]).dt.to_period("M").dt.to_timestamp()
    result["size_category"] = np.where(result["total"] > 1000, "large", "small")
    
    return result

pytest Fixtures for Test DataFrames

Define reusable fixtures in conftest.py:

# tests/conftest.py
import pytest
import pandas as pd
from datetime import datetime

@pytest.fixture
def sample_orders():
    return pd.DataFrame({
        "order_id": [1, 2, 3],
        "quantity": [2, 5, 1],
        "unit_price": [100.0, 50.0, 1500.0],
        "discount_rate": [0.10, 0.0, 0.05],
        "created_at": [
            datetime(2025, 1, 15),
            datetime(2025, 2, 20),
            datetime(2025, 1, 31),
        ]
    })

@pytest.fixture
def empty_orders():
    return pd.DataFrame({
        "order_id": pd.Series(dtype="int64"),
        "quantity": pd.Series(dtype="int64"),
        "unit_price": pd.Series(dtype="float64"),
        "discount_rate": pd.Series(dtype="float64"),
        "created_at": pd.Series(dtype="datetime64[ns]"),
    })

@pytest.fixture
def orders_with_nulls():
    return pd.DataFrame({
        "order_id": [1, 2, 3],
        "quantity": [2, None, 1],
        "unit_price": [100.0, 50.0, None],
        "discount_rate": [0.1, 0.0, 0.0],
        "created_at": [datetime(2025, 1, 15), None, datetime(2025, 3, 1)],
    })

Core Assertion Patterns

Asserting on Specific Columns

# tests/test_orders_transforms.py
import pandas as pd
import pytest
from pandas.testing import assert_series_equal, assert_frame_equal
from src.transforms.orders import calculate_order_metrics

def test_calculates_subtotal(sample_orders):
    result = calculate_order_metrics(sample_orders)
    
    expected_subtotals = pd.Series([200.0, 250.0, 1500.0], name="subtotal")
    assert_series_equal(result["subtotal"], expected_subtotals)

def test_calculates_discount_amount(sample_orders):
    result = calculate_order_metrics(sample_orders)
    
    expected_discounts = pd.Series([20.0, 0.0, 75.0], name="discount_amount")
    assert_series_equal(result["discount_amount"], expected_discounts, check_exact=False, rtol=1e-5)

def test_calculates_total_after_discount(sample_orders):
    result = calculate_order_metrics(sample_orders)
    
    expected_totals = pd.Series([180.0, 250.0, 1425.0], name="total")
    assert_series_equal(result["total"], expected_totals)

Asserting on Full DataFrames

When you want to compare the entire result, use assert_frame_equal with tolerance for floats:

def test_full_output_schema(sample_orders):
    result = calculate_order_metrics(sample_orders)
    
    expected = pd.DataFrame({
        "order_id": [1, 2, 3],
        "quantity": [2, 5, 1],
        "unit_price": [100.0, 50.0, 1500.0],
        "discount_rate": [0.10, 0.0, 0.05],
        "created_at": [datetime(2025, 1, 15), datetime(2025, 2, 20), datetime(2025, 1, 31)],
        "subtotal": [200.0, 250.0, 1500.0],
        "discount_amount": [20.0, 0.0, 75.0],
        "total": [180.0, 250.0, 1425.0],
        "order_month": pd.to_datetime(["2025-01-01", "2025-02-01", "2025-01-01"]),
        "size_category": ["small", "small", "large"],
    })
    
    assert_frame_equal(result, expected, check_exact=False, rtol=1e-5)

Testing Edge Cases

Empty DataFrame

def test_handles_empty_dataframe(empty_orders):
    result = calculate_order_metrics(empty_orders)
    
    assert len(result) == 0
    assert "subtotal" in result.columns
    assert "total" in result.columns
    assert "size_category" in result.columns

Single Row

def test_handles_single_row():
    df = pd.DataFrame({
        "order_id": [1],
        "quantity": [1],
        "unit_price": [100.0],
        "discount_rate": [0.0],
        "created_at": [datetime(2025, 6, 15)],
    })
    result = calculate_order_metrics(df)
    
    assert len(result) == 1
    assert result["total"].iloc[0] == 100.0

def test_handles_zero_discount():
    df = pd.DataFrame({
        "order_id": [1],
        "quantity": [3],
        "unit_price": [50.0],
        "discount_rate": [0.0],
        "created_at": [datetime(2025, 1, 1)],
    })
    result = calculate_order_metrics(df)
    assert result["total"].iloc[0] == result["subtotal"].iloc[0]

def test_handles_full_discount():
    df = pd.DataFrame({
        "order_id": [1],
        "quantity": [1],
        "unit_price": [100.0],
        "discount_rate": [1.0],
        "created_at": [datetime(2025, 1, 1)],
    })
    result = calculate_order_metrics(df)
    assert result["total"].iloc[0] == 0.0

Null Values

def test_nulls_in_quantity_produce_null_total(orders_with_nulls):
    result = calculate_order_metrics(orders_with_nulls)
    
    # Row with null quantity should have null total
    assert pd.isna(result.loc[result["order_id"] == 2, "total"].iloc[0])

def test_non_null_rows_are_unaffected_by_null_rows(orders_with_nulls):
    result = calculate_order_metrics(orders_with_nulls)
    
    # Row 1 has valid data, should calculate correctly
    assert result.loc[result["order_id"] == 1, "total"].iloc[0] == 180.0

Testing Data Types

dtype bugs are silent and dangerous—they cause failures downstream when code expects specific types:

def test_output_dtypes(sample_orders):
    result = calculate_order_metrics(sample_orders)
    
    assert result["order_id"].dtype == "int64"
    assert result["subtotal"].dtype == "float64"
    assert result["discount_amount"].dtype == "float64"
    assert result["total"].dtype == "float64"
    assert result["size_category"].dtype == "object"
    assert pd.api.types.is_datetime64_any_dtype(result["order_month"])

def test_does_not_mutate_input(sample_orders):
    original_cols = list(sample_orders.columns)
    original_len = len(sample_orders)
    
    calculate_order_metrics(sample_orders)
    
    assert list(sample_orders.columns) == original_cols
    assert len(sample_orders) == original_len

Testing Aggregations

# src/transforms/orders.py
def aggregate_monthly_revenue(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.groupby("order_month")
        .agg(
            total_revenue=("total", "sum"),
            order_count=("order_id", "count"),
            avg_order_value=("total", "mean"),
        )
        .reset_index()
        .sort_values("order_month")
    )

def test_monthly_aggregation_row_count(sample_orders):
    metrics = calculate_order_metrics(sample_orders)
    result = aggregate_monthly_revenue(metrics)
    
    # January and February should be distinct rows
    assert len(result) == 2

def test_monthly_aggregation_revenue():
    df = pd.DataFrame({
        "order_id": [1, 2, 3],
        "total": [100.0, 200.0, 300.0],
        "order_month": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-02-01"]),
    })
    result = aggregate_monthly_revenue(df)
    
    jan = result[result["order_month"] == "2025-01-01"]
    feb = result[result["order_month"] == "2025-02-01"]
    
    assert jan["total_revenue"].iloc[0] == 300.0
    assert feb["total_revenue"].iloc[0] == 300.0
    assert jan["order_count"].iloc[0] == 2

Property-Based Testing with Hypothesis

Hypothesis generates random DataFrames that satisfy your schema and checks that invariants hold:

from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.pandas import data_frames, column
import numpy as np

@given(
    data_frames([
        column("quantity", dtype=int, elements=st.integers(min_value=1, max_value=1000)),
        column("unit_price", dtype=float, elements=st.floats(min_value=0.01, max_value=10000.0)),
        column("discount_rate", dtype=float, elements=st.floats(min_value=0.0, max_value=1.0)),
    ])
)
@settings(max_examples=200)
def test_total_is_always_non_negative(df):
    if df.empty:
        return
    
    df["order_id"] = range(len(df))
    df["created_at"] = pd.Timestamp("2025-01-01")
    
    result = calculate_order_metrics(df)
    
    # Total should always be >= 0 since discount_rate is in [0, 1]
    assert (result["total"] >= 0).all()

@given(
    data_frames([
        column("quantity", dtype=int, elements=st.integers(min_value=1, max_value=100)),
        column("unit_price", dtype=float, elements=st.floats(min_value=0.01, max_value=1000.0)),
        column("discount_rate", dtype=float, elements=st.just(0.0)),  # no discount
    ])
)
def test_zero_discount_means_total_equals_subtotal(df):
    if df.empty:
        return
    
    df["order_id"] = range(len(df))
    df["created_at"] = pd.Timestamp("2025-01-01")
    
    result = calculate_order_metrics(df)
    
    assert_series_equal(result["total"], result["subtotal"], check_names=False)

Testing with Parametrize

Use @pytest.mark.parametrize for table-driven tests of boundary conditions:

@pytest.mark.parametrize("total,expected_category", [
    (0.0, "small"),
    (999.99, "small"),
    (1000.0, "small"),
    (1000.01, "large"),
    (10000.0, "large"),
])
def test_size_category_boundary(total, expected_category):
    df = pd.DataFrame({
        "order_id": [1],
        "quantity": [1],
        "unit_price": [total],
        "discount_rate": [0.0],
        "created_at": [datetime(2025, 1, 1)],
    })
    result = calculate_order_metrics(df)
    assert result["size_category"].iloc[0] == expected_category

Testing Joins and Merges

def test_customer_join_preserves_all_orders():
    orders = pd.DataFrame({
        "order_id": [1, 2, 3],
        "customer_id": [101, 102, 103],
        "total": [100.0, 200.0, 300.0],
    })
    customers = pd.DataFrame({
        "customer_id": [101, 102],  # Customer 103 missing
        "customer_name": ["Alice", "Bob"],
    })
    
    from src.transforms.orders import enrich_orders_with_customers
    result = enrich_orders_with_customers(orders, customers)
    
    # All orders preserved even if customer missing (left join)
    assert len(result) == 3
    # Order 3 has null customer_name
    assert pd.isna(result.loc[result["order_id"] == 3, "customer_name"].iloc[0])

Running Tests

# Run all tests
pytest tests/

<span class="hljs-comment"># Run with coverage
pytest tests/ --cov=src/transforms --cov-report=html

<span class="hljs-comment"># Run only fast unit tests
pytest tests/ -m <span class="hljs-string">"not integration"

<span class="hljs-comment"># Run with verbose output and show local variables on failure
pytest tests/ -v --tb=long -l

Summary

Effective pandas testing requires three things:

Pure functions — transformations that take DataFrames and return DataFrames, with no side effects
Explicit fixtures — small, known-value DataFrames where you can predict every output value
Edge case coverage — empty DataFrames, null values, single rows, dtype boundaries

Start with the happy path test that verifies expected output for typical input. Add edge cases for empty, null, and boundary values. Add dtype assertions to catch silent type coercion bugs. Add property-based tests with Hypothesis for statistical invariants. Run everything in CI on every push.