Testing ML Models with Evidently AI: Data Drift and Model Quality Reports

Evidently AI is an open-source framework for evaluating and monitoring machine learning models. It generates reports and test suites that check data drift, model quality, and dataset statistics — making it a core tool for MLOps testing pipelines.

What Evidently Tests

Evidently organizes checks into three categories:

  • Data quality — missing values, duplicates, type mismatches, out-of-range values
  • Data drift — statistical shift between reference and current datasets
  • Model performance — accuracy, precision, recall, RMSE, and custom metrics
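The sketch below shows roughly what the data-quality checks measure. It computes a share of missing cells, a count of duplicated rows, and a count of out-of-range values over plain Python rows. It is not Evidently's implementation, and the row layout, column names, and `age` bounds are hypothetical.

```python
from typing import Any

Row = dict[str, Any]

def share_of_missing(rows: list[Row]) -> float:
    """Fraction of cells that are None across all rows and columns."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    return sum(value is None for value in cells) / len(cells)

def count_duplicated_rows(rows: list[Row]) -> int:
    """Rows identical to an earlier row (the first occurrence is not counted)."""
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates

def count_out_of_range(rows: list[Row], column: str, low: float, high: float) -> int:
    """Non-missing values in `column` that fall outside [low, high]."""
    return sum(
        1 for row in rows
        if row.get(column) is not None and not low <= row[column] <= high
    )

rows = [
    {"age": 34, "income": 52000},
    {"age": 34, "income": 52000},    # exact duplicate of the first row
    {"age": None, "income": 61000},  # missing age
    {"age": 230, "income": 48000},   # implausible age
]
print(share_of_missing(rows))                   # 0.125
print(count_duplicated_rows(rows))              # 1
print(count_out_of_range(rows, "age", 0, 120))  # 1
```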

Installing Evidently

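pip install "evidently<0.7"

The code in this article uses the TestSuite API (evidently.test_suite and evidently.tests). Evidently 0.7 and later make a redesigned Report-based API the default, so pin an earlier release as above to run the examples unchanged.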

Basic Test Suite

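The TestSuite class runs a set of checks on a reference dataset and a current dataset. Each check compares a measured value against the condition you pass (lt means "less than", gte means "greater than or equal to") and reports pass or fail: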

import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestShareOfMissingValues,
    TestNumberOfDuplicatedRows,
    TestShareOfDriftedColumns,
    TestColumnsType,
)

# Load reference (training) and current (production) data
reference = pd.read_csv("reference_data.csv")
current = pd.read_csv("current_data.csv")

suite = TestSuite(tests=[
    TestShareOfMissingValues(lt=0.05),   # under 5% of cells missing
    TestNumberOfDuplicatedRows(lt=10),   # fewer than 10 duplicated rows
    TestShareOfDriftedColumns(lt=0.3),   # under 30% of columns drifted
    TestColumnsType(),                   # column types match the reference
])

suite.run(reference_data=reference, current_data=current)

# Assert all tests passed
result = suite.as_dict()
assert result["summary"]["all_passed"], f"Tests failed: {result}"
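To make the threshold semantics concrete, the plain-Python sketch below mimics the flow: each check compares one measured value against its conditions, and a summary aggregates the outcomes. The `check` and `summarize` helpers and the measured values are hypothetical. The summary mirrors only the `all_passed`, `total_tests`, and `failed_tests` keys, not Evidently's full output.

```python
import operator

# Condition keywords mapped to comparisons of the form: value <op> threshold.
CONDITIONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "eq": operator.eq,
}

def check(name: str, value: float, **condition: float) -> dict:
    """Return a pass/fail record; every supplied condition must hold."""
    passed = all(CONDITIONS[key](value, threshold) for key, threshold in condition.items())
    return {"name": name, "value": value, "status": "SUCCESS" if passed else "FAIL"}

def summarize(results: list[dict]) -> dict:
    """Aggregate individual records into a suite-level summary."""
    failed = sum(r["status"] == "FAIL" for r in results)
    return {"all_passed": failed == 0, "total_tests": len(results), "failed_tests": failed}

results = [
    check("Share of missing values", 0.02, lt=0.05),
    check("Number of duplicated rows", 12, lt=10),
    check("Share of drifted columns", 0.25, lt=0.3),
]
summary = summarize(results)
print(summary)  # {'all_passed': False, 'total_tests': 3, 'failed_tests': 1}
```

The measured value is compared directly against the threshold. As a result, `lt=0.05` only makes sense for a share. Applied to a raw count, it passes only when the count is zero.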

Data Drift Detection

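Evidently runs a statistical test per column. By default it chooses the test from the column type and sample size. Typical choices are the Kolmogorov–Smirnov (KS) test for smaller numerical samples and chi-squared for categorical ones. For larger datasets it uses distance measures such as Wasserstein distance or Jensen–Shannon divergence. Other methods, such as PSI, can be selected instead. Column-level and dataset-level drift checks look like this: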

from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift,
    TestShareOfDriftedColumns,
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="age"),     # drift test on a single column
    TestColumnDrift(column_name="income"),
    TestShareOfDriftedColumns(lt=0.2),      # under 20% of all columns drifted
])

suite.run(reference_data=reference, current_data=current)

# Save HTML report for inspection
suite.save_html("drift_report.html")

# Programmatic result
result = suite.as_dict()
for test in result["tests"]:
    print(f"{test['name']}: {test['status']}")
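The sketch below implements two of the drift statistics mentioned above in plain Python. The first is the two-sample Kolmogorov–Smirnov statistic, the largest gap between the two empirical CDFs. The second is the Population Stability Index over shared bins. It is a simplified illustration, not Evidently's code. Evidently decides drift from p-values or distance thresholds with its own defaults, whereas this sketch flags a column with a hypothetical fixed cut-off on the KS statistic. The sample data and bin edges are made up.

```python
import bisect
import math

def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Largest absolute gap between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # Both ECDFs only change at observed values, so checking those points suffices.
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap

def psi(reference: list[float], current: list[float], edges: list[float]) -> float:
    """Population Stability Index: sum of (cur - ref) * ln(cur / ref) over bin shares."""
    def shares(values: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at a tiny share so the logarithm stays finite.
        return [max(c / len(values), 1e-6) for c in counts]
    ref_shares, cur_shares = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))

reference_cols = {
    "age": [25, 29, 32, 33, 36, 38, 41, 45],
    "income": [40, 42, 45, 47, 50, 52, 55, 58],  # in thousands
}
current_cols = {
    "age": [26, 30, 31, 34, 37, 39, 40, 44],     # similar distribution
    "income": [70, 72, 75, 78, 80, 83, 85, 90],  # shifted upward
}

KS_THRESHOLD = 0.5  # hypothetical cut-off, not an Evidently default
drifted = [
    name for name in reference_cols
    if ks_statistic(reference_cols[name], current_cols[name]) > KS_THRESHOLD
]
share_drifted = len(drifted) / len(reference_cols)
print(drifted, share_drifted)  # ['income'] 0.5
print(psi(reference_cols["income"], current_cols["income"], [50, 60, 70, 80]))
```

The share of flagged columns is the quantity TestShareOfDriftedColumns compares against its threshold. In this sketch, 0.5 would fail a check configured with `lt=0.2`.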

Model Quality Tests

For supervised models, both datasets must contain the ground-truth label and the model's prediction. A ColumnMapping tells Evidently which columns hold them:

from evidently.test_suite import TestSuite
from evidently.tests import (
    TestAccuracyScore,
    TestPrecisionScore,
    TestRecallScore,
    TestF1Score,
)
from evidently.pipeline.column_mapping import ColumnMapping

column_mapping = ColumnMapping(
    target="label",
    prediction="prediction",
)

suite = TestSuite(tests=[
    TestAccuracyScore(gte=0.85),
    TestPrecisionScore(gte=0.80),
    TestRecallScore(gte=0.75),
    TestF1Score(gte=0.80),
])

suite.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
)

assert suite.as_dict()["summary"]["all_passed"]
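For a binary task, these classification metrics reduce to counts of true and false positives and negatives. The plain-Python sketch below computes them and compares them against the same `gte` floors as the suite above. The labels and predictions are made up. Evidently's own handling, for example multiclass averaging or probabilistic outputs, may differ.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predictions = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
# Floors matching the gte thresholds used in the suite.
floors = {"accuracy": 0.85, "precision": 0.80, "recall": 0.75, "f1": 0.80}

metrics = binary_metrics(labels, predictions)
failing = [name for name, floor in floors.items() if metrics[name] < floor]
print(failing)  # ['accuracy']
```

Here precision, recall, and F1 are all 5/6 (about 0.83), but accuracy is 0.8. Only the accuracy floor of 0.85 fails.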

Integrating with pytest

Wrap Evidently test suites inside pytest for CI integration:

import pytest
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns, TestShareOfMissingValues

@pytest.fixture(scope="session")
def datasets():
    reference = pd.read_csv("tests/data/reference.csv")
    current = pd.read_csv("tests/data/current.csv")
    return reference, current

def test_no_significant_drift(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfDriftedColumns(lt=0.3),
    ])
    suite.run(reference_data=reference, current_data=current)
    result = suite.as_dict()
    assert result["summary"]["all_passed"], (
        f"Drift detected: {result['summary']['failed_tests']} tests failed"
    )

def test_data_quality(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfMissingValues(lt=0.05),
    ])
    suite.run(reference_data=reference, current_data=current)
    assert suite.as_dict()["summary"]["all_passed"]
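A bare `all_passed` assertion does not say which check broke. A small helper can walk the `tests` list of `as_dict()` and collect the names of failed checks for the assertion message. The helper names are hypothetical. The uppercase status strings ("SUCCESS", "FAIL", "ERROR") are an assumption to verify against your Evidently version. The sample dict is hand-written and contains only the `name`, `status`, and `summary` fields used in this article.

```python
def failed_test_names(result: dict) -> list[str]:
    """Names of tests reported as failed or errored in an as_dict()-style result."""
    # Assumes uppercase status strings such as "SUCCESS", "FAIL", "ERROR".
    return [
        test["name"]
        for test in result.get("tests", [])
        if test.get("status") in ("FAIL", "ERROR")
    ]

def assert_suite_passed(result: dict) -> None:
    """Fail with the names of the broken checks instead of a bare False."""
    failed = failed_test_names(result)
    assert result["summary"]["all_passed"], "Failed Evidently tests: " + ", ".join(failed)

# Hand-written sample containing only the fields used in this article.
sample = {
    "tests": [
        {"name": "Share of Drifted Columns", "status": "FAIL"},
        {"name": "Share of Missing Values", "status": "SUCCESS"},
    ],
    "summary": {"all_passed": False, "failed_tests": 1},
}
print(failed_test_names(sample))  # ['Share of Drifted Columns']
```

In the pytest tests, the final assertion could then become `assert_suite_passed(suite.as_dict())`.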

Custom Metrics

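Domain-specific checks can be written as a plain function that computes a business metric, bounded by thresholds. The CustomValueTest wrapper below is illustrative. Custom-metric APIs differ between Evidently releases, so confirm the class name and callback signature in the documentation for your installed version.

from evidently.test_suite import TestSuite
from evidently.tests import CustomValueTest  # class name assumed; verify for your version

def revenue_impact_metric(current_data, reference_data, **kwargs):
    # Custom business metric: average predicted revenue in the current data
    return current_data["predicted_revenue"].mean()

suite = TestSuite(tests=[
    CustomValueTest(
        func=revenue_impact_metric,
        title="Average predicted revenue",
        gte=1000,
        lte=10000,
    ),
])
suite.run(reference_data=reference, current_data=current)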
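A custom metric is an ordinary function, so it can be unit-tested without Evidently before it is wired into a suite. The sketch below re-expresses the revenue metric over a list of dicts, standing in for the pandas DataFrame the original receives. It also shows the two-sided `gte`/`lte` range check. The data and helper names are hypothetical.

```python
from statistics import mean

def average_predicted_revenue(rows: list[dict]) -> float:
    """Mean of the predicted_revenue field across rows."""
    return mean(row["predicted_revenue"] for row in rows)

def within_range(value: float, gte: float, lte: float) -> bool:
    """True when gte <= value <= lte, i.e. both bounds hold."""
    return gte <= value <= lte

current_rows = [{"predicted_revenue": v} for v in (850.0, 1200.0, 4300.0, 2650.0)]
value = average_predicted_revenue(current_rows)
print(value, within_range(value, gte=1000, lte=10000))  # 2250.0 True
```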

Running in CI

Add Evidently checks to your CI pipeline with a non-zero exit code on failure:

# run_evidently_tests.py
import sys

import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

suite = TestSuite(tests=[TestShareOfDriftedColumns(lt=0.3)])
suite.run(
    reference_data=pd.read_parquet("data/reference.parquet"),
    current_data=pd.read_parquet("data/current.parquet"),
)
result = suite.as_dict()
if not result["summary"]["all_passed"]:
    print("FAILED:", result["summary"])
    sys.exit(1)  # non-zero exit marks the CI step as failed
print("All checks passed")

# .github/workflows/ml-tests.yml (step excerpt)
- name: Run Evidently quality checks
  run: python run_evidently_tests.py
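Keeping the pass/fail decision in a function that returns an exit status makes the gate itself testable without running Evidently. The `gate_exit_code` helper below is hypothetical. It takes the `summary` dict and returns 0 or 1.

```python
import json

def gate_exit_code(summary: dict) -> int:
    """Map a suite summary to a process exit status: 0 passes the CI step, 1 fails it."""
    if summary.get("all_passed"):
        print("All checks passed")
        return 0
    print("FAILED:", json.dumps(summary))
    return 1

# A failing summary produces exit status 1.
code = gate_exit_code({"all_passed": False, "total_tests": 1, "failed_tests": 1})
print(code)  # 1
```

In the script, the final check would reduce to `sys.exit(gate_exit_code(result["summary"]))`.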

Report vs. Test Suite

Evidently has two modes:

  • Report (class Report): outputs HTML/JSON metrics; use it for exploration and dashboards
  • Test Suite (class TestSuite): outputs pass/fail assertions; use it for CI and automated gates

Use TestSuite for automated pipelines and Report for investigation.

Key Takeaways

  • Use TestSuite for CI gates — it returns structured pass/fail results
  • Set thresholds (lt, gte) based on your model's acceptable degradation range
  • Run drift tests on every deployment, not just periodically
  • Combine Evidently with pytest for integration into existing test infrastructure
