dbt Testing Guide: Schema Tests, dbt-expectations, and Custom Tests

dbt Testing Guide: Schema Tests, dbt-expectations, and Custom Tests

dbt ships with four built-in tests (not_null, unique, accepted_values, relationships) that cover the basics. For richer assertions—value ranges, regex patterns, statistical checks—add the dbt-expectations package. When neither covers your case, write a custom generic test. This guide covers all three layers plus how to run tests in CI.

Key Takeaways

Built-in tests cover the four most common checks. not_null, unique, accepted_values, and relationships handle the majority of data integrity needs with zero extra packages.

dbt-expectations adds 50+ data quality assertions. expect_column_values_to_be_between, expect_column_mean_to_be_between, expect_column_to_match_regex—these translate Great Expectations patterns into dbt syntax.

Custom generic tests are SQL queries that return failures. Any rows returned by a test query count as failures. This mental model makes writing custom tests straightforward.

Singular tests are SQL files in the tests/ directory. They're one-off checks that don't need to be reusable—useful for complex business logic assertions.

Test severity controls whether failures block the build. Use severity: warn for data quality alerts that shouldn't halt pipeline execution.

Why dbt Tests Matter

Data pipelines are different from application code. A bug in application code raises an exception immediately. A bug in a SQL transformation silently produces wrong numbers that flow into dashboards, reports, and downstream models before anyone notices.

dbt tests are the mechanism to catch these silent failures. They run assertions against your transformed data and fail the pipeline if the data doesn't meet expectations. The result: data quality issues surface at transformation time, not when a stakeholder spots an anomaly in a dashboard.

Built-in dbt Tests

dbt ships with four schema tests that you configure in schema.yml files:

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned', 'completed']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: customer_id

not_null — verifies no nulls in the column. Catches missing data from upstream sources.

unique — verifies no duplicate values. Essential for primary keys and business identifiers.

accepted_values — verifies column values are within a defined set. Catches unexpected status codes, categories, or enum values.

relationships — verifies foreign key integrity between models. Catches joins that would produce unexpected nulls or duplicates.

Run all tests with:

dbt test

Run tests for a specific model:

dbt test --<span class="hljs-keyword">select orders

Run only a specific test type:

dbt test --<span class="hljs-keyword">select test_type:not_null

dbt-expectations Package

The dbt-expectations package ports Great Expectations assertions into dbt's testing framework. Install it by adding to packages.yml:

packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]

Run dbt deps to install.

Value Range Tests

Check that numeric columns stay within expected bounds:

models:
  - name: orders
    columns:
      - name: order_amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000
              strictly: false

Pattern Matching Tests

Validate string formats with regex:

      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"

Statistical Tests

Catch sudden distribution shifts that might indicate data pipeline issues:

      - name: order_amount
        tests:
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 50
              max_value: 500
          - dbt_expectations.expect_column_stdev_to_be_between:
              min_value: 10
              max_value: 200

Row Count Tests

Ensure models have the expected number of rows—useful for catching full table drops or filter bugs:

models:
  - name: orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 1000000

Value Set Tests

Check that a column contains all expected values (not just a subset):

      - name: status
        tests:
          - dbt_expectations.expect_column_distinct_values_to_equal_set:
              value_set: ['placed', 'shipped', 'returned', 'completed']

Custom Generic Tests

When built-in tests and dbt-expectations don't cover your case, write a custom generic test. Create a SQL file in the tests/generic/ directory:

-- tests/generic/assert_column_sum_equals.sql
{% test assert_column_sum_equals(model, column_name, expected_sum, tolerance=0) %}

with calculated as (
    select sum({{ column_name }}) as actual_sum
    from {{ model }}
),
validation as (
    select
        actual_sum,
        {{ expected_sum }} as expected_sum,
        abs(actual_sum - {{ expected_sum }}) as difference
    from calculated
)
select *
from validation
where difference > {{ tolerance }}

{% endtest %}

Use it in schema.yml like any built-in test:

      - name: revenue
        tests:
          - assert_column_sum_equals:
              expected_sum: 1000000
              tolerance: 5000

Generic Test with Multiple Columns

Tests can reference multiple columns. Example: verify that start date is always before end date:

-- tests/generic/assert_column_a_less_than_column_b.sql
{% test assert_column_a_less_than_column_b(model, column_name, column_b) %}

select *
from {{ model }}
where {{ column_name }} >= {{ column_b }}

{% endtest %}

Usage:

      - name: start_date
        tests:
          - assert_column_a_less_than_column_b:
              column_b: end_date

Singular Tests

Singular tests are SQL files in the tests/ directory (not tests/generic/). They're one-off assertions rather than reusable generic tests. Any rows returned count as failures.

-- tests/assert_revenue_reconciles_with_payments.sql
-- Revenue in orders should match payment totals within 1%
with order_revenue as (
    select sum(order_amount) as total_revenue
    from {{ ref('orders') }}
    where status = 'completed'
),
payment_total as (
    select sum(payment_amount) as total_payments
    from {{ ref('payments') }}
    where payment_status = 'settled'
),
comparison as (
    select
        o.total_revenue,
        p.total_payments,
        abs(o.total_revenue - p.total_payments) / o.total_revenue as discrepancy_pct
    from order_revenue o
    cross join payment_total p
)
select *
from comparison
where discrepancy_pct > 0.01

Singular tests are best for complex reconciliation checks, cross-model consistency checks, and business rule validations that are specific to your domain.

Test Severity

By default, failing tests block the pipeline. Use severity to convert failures into warnings instead:

      - name: order_amount
        tests:
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 50
              max_value: 500
              severity: warn

With severity: warn, the test failure appears in output but doesn't fail the dbt test command. Use warnings for data quality monitoring that shouldn't block pipelines—things like statistical drift checks where occasional out-of-bound values are expected.

You can also set a failure threshold:

          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 10000
              config:
                severity: warn
                warn_if: ">10"
                error_if: ">100"

Running Tests in CI

A typical CI workflow runs tests after every model build:

# .github/workflows/dbt-ci.yml
name: dbt CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install dbt
        run: pip install dbt-bigquery  # or dbt-snowflake, dbt-postgres, etc.
      
      - name: Install packages
        run: dbt deps
        working-directory: ./dbt
      
      - name: Build models
        run: dbt build --target ci
        working-directory: ./dbt
        env:
          DBT_PROFILES_DIR: .

dbt build runs dbt run and dbt test together, so models and tests run in dependency order. If a model fails to build, its downstream tests are skipped.

Test Coverage by Source

Run source freshness checks alongside model tests:

dbt source freshness
dbt build --<span class="hljs-keyword">select <span class="hljs-built_in">source:*+

This checks that your raw source tables are being updated within their expected freshness windows before testing models that depend on them.

Organizing Tests by Layer

A practical pattern: apply stricter tests closer to raw data, more business-logic-focused tests on marts.

Staging models — validate source data integrity:

  • not_null on primary keys and required fields
  • accepted_values on status/type columns
  • relationships to verify FK integrity
  • regex tests on email, phone, ID formats

Intermediate models — validate transformation logic:

  • uniqueness on grain keys
  • value ranges that should hold after transformations
  • custom tests for business rule enforcement

Mart models — validate business metrics:

  • row count bounds (marts shouldn't suddenly shrink)
  • statistical range checks on KPI columns
  • reconciliation tests against source of truth

Testing with HelpMeTest

For end-to-end validation—verifying that dashboards built on top of your dbt models display correctly—use HelpMeTest alongside dbt tests. HelpMeTest runs browser-based tests against your BI tools and can catch rendering issues, broken filters, and visual anomalies that SQL tests can't detect.

Pair dbt tests (data layer) with HelpMeTest (presentation layer) for full coverage: dbt ensures the numbers are right, HelpMeTest ensures the dashboards show them correctly.

Summary

dbt's testing framework is layered:

  1. Built-in tests — not_null, unique, accepted_values, relationships for core data integrity
  2. dbt-expectations — 50+ assertions for ranges, patterns, distributions, and row counts
  3. Custom generic tests — reusable SQL assertions for your domain-specific rules
  4. Singular tests — one-off SQL assertions for complex reconciliation checks

Start with built-in tests on every model's primary key and critical columns. Add dbt-expectations for statistical monitoring. Write custom tests for business rules that don't fit the standard patterns. Run everything in CI to catch data quality issues before they reach production.

Read more

ScyllaDB Testing Guide: Cassandra Driver Compatibility, Shard-per-Core Testing & Performance Regression

ScyllaDB Testing Guide: Cassandra Driver Compatibility, Shard-per-Core Testing & Performance Regression

ScyllaDB delivers Cassandra-compatible APIs with a rewritten Seastar-based engine that achieves dramatically higher throughput. Testing ScyllaDB applications requires validating both Cassandra compatibility and ScyllaDB-specific behaviors like shard-per-core data distribution. This guide covers both angles. ScyllaDB Testing Landscape ScyllaDB is a drop-in replacement for Cassandra at the API level—which means

By HelpMeTest