Testing BigQuery Queries and Data Pipelines with bq-test-kit and dbt

Testing BigQuery Queries and Data Pipelines with bq-test-kit and dbt

BigQuery pipelines are notoriously hard to test because every query costs money and requires live infrastructure. This guide covers three layers of testing: bq-test-kit for Python unit tests against a local emulator, dbt's built-in test framework for data quality assertions, and direct SQL testing patterns for UDFs and partitioned tables — all runnable in CI without a real GCP project.

Key Takeaways

Use the BigQuery emulator for unit tests. Running a local emulator via Docker eliminates query costs and network latency in CI, making it practical to run hundreds of SQL unit tests on every commit.

bq-test-kit lets you test SQL in isolation. It handles dataset/table creation, data injection, and cleanup — you write a query, assert on results, done.

dbt singular tests catch business logic regressions. A singular test is just a SQL file that returns rows on failure — powerful for encoding invariants like "revenue must never be negative."

Test UDFs with known inputs and edge cases. BigQuery UDFs are pure functions; test them with null inputs, boundary values, and Unicode edge cases before deploying.

Data freshness tests prevent silent staleness. A freshness assertion on your most critical tables will catch pipeline failures before your stakeholders do.

Why Testing BigQuery Is Hard

BigQuery is a managed service with no local equivalent built in. Every query hits the network, costs money (or burns free tier quota), and requires IAM permissions. The usual unit-test feedback loop — run fast, iterate fast — breaks down the moment you introduce a cloud warehouse.

The practical answer is a combination of:

  1. A local emulator that speaks the BigQuery API
  2. A Python test harness that injects data and asserts on query results
  3. dbt tests for data quality at the pipeline level

This guide walks through all three.

Setting Up the BigQuery Emulator

The open-source BigQuery emulator from goccy/bigquery-emulator runs as a Docker container and implements most of the BigQuery REST and Storage APIs.

# docker-compose.yml
version: "3.8"
services:
  bigquery-emulator:
    image: ghcr.io/goccy/bigquery-emulator:0.6.3
    ports:
      - "9050:9050"
      - "9060:9060"
    command: ["--project=test-project", "--log-level=error"]

Start it with docker compose up -d bigquery-emulator. The emulator exposes a standard BigQuery HTTP API on port 9050 and a gRPC endpoint on 9060.

To point the Python client at the emulator, set the API endpoint override:

from google.cloud import bigquery
from google.api_core.client_options import ClientOptions

client = bigquery.Client(
    project="test-project",
    client_options=ClientOptions(api_endpoint="http://localhost:9050"),
    credentials=None,  # emulator ignores credentials
)

Unit Testing with bq-test-kit

bq-test-kit is a pytest-friendly library that wraps the BigQuery client and provides a clean DSL for creating datasets, loading fixture data, running queries, and asserting on results.

pip install bq-test-kit[pandas] pytest

Basic Query Test

# tests/test_revenue_query.py
import pytest
from bq_test_kit.bq_test_kit import BQTestKit
from bq_test_kit.bq_test_kit_config import BQTestKitConfig
from bq_test_kit.resource_loaders.package_file_loader import PackageFileLoader
from google.cloud import bigquery
from google.api_core.client_options import ClientOptions

@pytest.fixture(scope="session")
def bq_client():
    return bigquery.Client(
        project="test-project",
        client_options=ClientOptions(api_endpoint="http://localhost:9050"),
        credentials=None,
    )

@pytest.fixture(scope="session")
def bqtk(bq_client):
    return BQTestKit(bq_client=bq_client, bqtk_config=BQTestKitConfig())

def test_daily_revenue_aggregation(bqtk):
    with bqtk.dataset("analytics").clean_and_keep() as ds:
        with ds.table("orders", schema=[
            bigquery.SchemaField("order_id", "STRING"),
            bigquery.SchemaField("order_date", "DATE"),
            bigquery.SchemaField("amount_usd", "FLOAT64"),
            bigquery.SchemaField("status", "STRING"),
        ]).insert([
            {"order_id": "o1", "order_date": "2026-01-15", "amount_usd": 100.0, "status": "completed"},
            {"order_id": "o2", "order_date": "2026-01-15", "amount_usd": 50.0, "status": "completed"},
            {"order_id": "o3", "order_date": "2026-01-15", "amount_usd": 200.0, "status": "refunded"},
            {"order_id": "o4", "order_date": "2026-01-16", "amount_usd": 75.0, "status": "completed"},
        ]) as orders:
            result = bqtk.query("""
                SELECT
                  order_date,
                  SUM(amount_usd) AS total_revenue,
                  COUNT(*) AS order_count
                FROM `test-project.analytics.orders`
                WHERE status = 'completed'
                GROUP BY order_date
                ORDER BY order_date
            """).to_dataframe()

            assert len(result) == 2
            jan15 = result[result["order_date"].astype(str) == "2026-01-15"].iloc[0]
            assert jan15["total_revenue"] == 150.0
            assert jan15["order_count"] == 2

The .clean_and_keep() context manager creates the dataset at entry and deletes it at exit, keeping your emulator clean between test runs.

Testing BigQuery UDFs

User-defined functions are pure transformations — they're ideal for unit testing because you control inputs and can enumerate edge cases.

-- sql/udf_normalize_phone.sql
CREATE OR REPLACE FUNCTION `test-project.utils.normalize_phone`(raw STRING)
RETURNS STRING
LANGUAGE js AS """
  if (!raw) return null;
  const digits = raw.replace(/\\D/g, '');
  if (digits.length === 10) return '+1' + digits;
  if (digits.length === 11 && digits[0] === '1') return '+' + digits;
  return null;
""";
# tests/test_udf_normalize_phone.py
import pytest

@pytest.mark.parametrize("raw, expected", [
    ("555-867-5309", "+15558675309"),
    ("(555) 867-5309", "+15558675309"),
    ("15558675309", "+15558675309"),
    (None, None),
    ("", None),
    ("not-a-phone", None),
    ("1234", None),
])
def test_normalize_phone(bqtk, raw, expected):
    # Create UDF first
    bqtk.query(open("sql/udf_normalize_phone.sql").read()).run()

    raw_literal = f"'{raw}'" if raw is not None else "NULL"
    result = bqtk.query(f"""
        SELECT `test-project.utils.normalize_phone`({raw_literal}) AS normalized
    """).to_dataframe()

    assert result.iloc[0]["normalized"] == expected

Testing Partitioned Table Queries

Partition pruning is a common source of bugs — a missing WHERE event_date BETWEEN ... clause can silently full-scan a petabyte table. Test that your queries actually filter correctly.

def test_partition_filter_applied(bqtk):
    """Ensure the query only reads from the target partition."""
    schema = [
        bigquery.SchemaField("event_id", "STRING"),
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("event_type", "STRING"),
    ]

    with bqtk.dataset("events").clean_and_keep() as ds:
        with ds.table("user_events", schema=schema,
                      partition_field="event_date").insert([
            {"event_id": "e1", "event_date": "2026-01-01", "user_id": "u1", "event_type": "click"},
            {"event_id": "e2", "event_date": "2026-01-02", "user_id": "u2", "event_type": "purchase"},
            {"event_id": "e3", "event_date": "2026-01-03", "user_id": "u3", "event_type": "click"},
        ]) as t:
            result = bqtk.query("""
                SELECT event_id, user_id
                FROM `test-project.events.user_events`
                WHERE event_date = '2026-01-02'
            """).to_dataframe()

            assert len(result) == 1
            assert result.iloc[0]["event_id"] == "e2"

dbt Tests for Data Quality

dbt has two test categories: generic tests (built-in constraints) and singular tests (custom SQL that returns rows on failure).

Generic Tests in schema.yml

# models/schema.yml
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: amount_usd
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"
      - name: status
        tests:
          - accepted_values:
              values: ["pending", "completed", "refunded", "cancelled"]

Run with dbt test --select fct_orders. Each test compiles to a SQL query that returns failing rows — zero rows means the test passes.

Singular Tests for Business Logic

-- tests/assert_revenue_never_negative.sql
-- Returns rows where daily revenue is negative — should always be 0 rows.
SELECT
  order_date,
  SUM(amount_usd) AS total_revenue
FROM {{ ref('fct_orders') }}
WHERE status = 'completed'
GROUP BY order_date
HAVING SUM(amount_usd) < 0
-- tests/assert_no_duplicate_events_per_user_per_day.sql
SELECT
  user_id,
  event_date,
  event_type,
  COUNT(*) AS cnt
FROM {{ ref('fct_user_events') }}
GROUP BY 1, 2, 3
HAVING COUNT(*) > 1

Testing Data Freshness

Add freshness assertions to your sources.yml:

# models/sources.yml
version: 2

sources:
  - name: raw
    database: my_project
    schema: raw_data
    freshness:
      warn_after:
        count: 6
        period: hour
      error_after:
        count: 24
        period: hour
    loaded_at_field: _loaded_at
    tables:
      - name: orders
      - name: events
        freshness:
          error_after:
            count: 2
            period: hour

Run dbt source freshness in CI to catch stale tables before they affect downstream models.

CI Integration

# .github/workflows/bigquery-tests.yml
name: BigQuery Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    services:
      bigquery-emulator:
        image: ghcr.io/goccy/bigquery-emulator:0.6.3
        ports:
          - 9050:9050
        options: >-
          --health-cmd "curl -sf http://localhost:9050/bigquery/v2/projects"
          --health-interval 5s
          --health-retries 10

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements-test.txt
      - run: pytest tests/ -v --tb=short
        env:
          BIGQUERY_EMULATOR_HOST: localhost:9050
          GOOGLE_CLOUD_PROJECT: test-project

For dbt tests in CI, point your profiles.yml at the emulator using the bigquery adapter's http connection type, or use a dedicated GCP test project with a service account scoped to a test dataset.

What to Test vs. What to Skip

Test these:

  • SQL transformation logic — aggregations, joins, window functions with known fixture data
  • UDFs — all inputs including nulls, empty strings, boundary values
  • Partition filter correctness — verify queries respect date ranges
  • dbt model relationships — referential integrity between fact and dimension tables
  • Data freshness thresholds on critical source tables
  • Schema evolution — column renames or type changes that break downstream models

Skip these:

  • BigQuery infrastructure itself — Google tests that
  • Query performance and slot consumption — not a unit test concern
  • IAM and permission boundaries — test in staging with real credentials, not in unit tests
  • Storage billing estimates — irrelevant to correctness testing
  • Auto-detected schema inference on JSON exports — flaky by nature, too much variance

Read more