Testing ML Models with Evidently AI: Data Drift and Model Quality Reports
Evidently AI is an open-source framework for evaluating and monitoring machine learning models. It generates reports and test suites that check data drift, model quality, and dataset statistics — making it a core tool for MLOps testing pipelines.
What Evidently Tests
Evidently organizes checks into three categories:
- Data quality — missing values, duplicates, type mismatches, out-of-range values
- Data drift — statistical shift between reference and current datasets
- Model performance — accuracy, precision, recall, RMSE, and custom metrics
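To make the data quality category concrete, here is a minimal plain-Python sketch of two such checks: the share of missing values and the number of duplicated rows. It only illustrates the idea and is not how Evidently computes them; the helper names `share_of_missing` and `count_duplicated_rows` and the sample rows are hypothetical.

```python
from collections import Counter

def share_of_missing(rows):
    """Fraction of cells that are None across all rows (0.0 for empty input)."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row.values() if value is None)
    return missing / total if total else 0.0

def count_duplicated_rows(rows):
    """Number of rows that exactly repeat an earlier row."""
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(n - 1 for n in counts.values())

# Hypothetical sample data: one missing cell and one duplicated row.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 34, "income": 52000},  # exact duplicate of the first row
    {"age": 51, "income": 61000},
]
missing_share = share_of_missing(rows)
duplicates = count_duplicated_rows(rows)
print(missing_share, duplicates)
```

A check then reduces to comparing such a value against a threshold, for example requiring the missing share to stay below 0.05.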
Installing Evidently
pip install evidently
Basic Test Suite
The TestSuite class runs a set of checks and returns pass/fail results:
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestShareOfMissingValues,
    TestNumberOfDuplicatedRows,
    TestShareOfDriftedColumns,
    TestColumnsType,
)
# Load reference (training) and current (production) data
reference = pd.read_csv("reference_data.csv")
current = pd.read_csv("current_data.csv")
suite = TestSuite(tests=[
    TestShareOfMissingValues(lt=0.05),   # less than 5% of values missing
    TestNumberOfDuplicatedRows(lt=10),   # fewer than 10 duplicated rows
    TestShareOfDriftedColumns(lt=0.3),   # less than 30% of columns drifted
    TestColumnsType(),                   # column types match the reference
])
suite.run(reference_data=reference, current_data=current)
# Assert all tests passed
result = suite.as_dict()
assert result["summary"]["all_passed"], f"Tests failed: {result}"
Data Drift Detection
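Evidently detects drift per column by comparing the reference and current distributions with a method such as the Kolmogorov-Smirnov test, the chi-squared test, or the Population Stability Index (PSI). By default it picks the method from the column type and the number of observations. The suite below checks two named columns and the overall share of drifted columns.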
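As an illustration of one of these methods, the following is a minimal plain-Python sketch of the Population Stability Index for a numeric column. The equal-width binning, the bin count, the small epsilon that avoids division by zero, and the function name `psi` are assumptions made for this example, not Evidently's defaults or implementation.

```python
import math

def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; values outside
    that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Replace empty bins with eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

# Hypothetical data: the "current" sample is the reference shifted by 20.
reference_values = [float(x % 50) for x in range(1000)]
shifted_values = [v + 20.0 for v in reference_values]

psi_same = psi(reference_values, reference_values)
psi_shifted = psi(reference_values, shifted_values)
print(psi_same, psi_shifted)
```

Identical samples give a PSI of zero, and the shifted sample gives a clearly larger value. With Evidently, per-column drift checks are declared as tests in a suite: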
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift,
    TestShareOfDriftedColumns,
)
suite = TestSuite(tests=[
    TestColumnDrift(column_name="age"),
    TestColumnDrift(column_name="income"),
    TestShareOfDriftedColumns(lt=0.2),
])
suite.run(reference_data=reference, current_data=current)
# Save HTML report for inspection
suite.save_html("drift_report.html")
# Programmatic result
result = suite.as_dict()
for test in result["tests"]:
    print(f"{test['name']}: {test['status']}")
Model Quality Tests
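For supervised models, the reference and current datasets must both contain the target and prediction columns alongside the features. A ColumnMapping tells Evidently which columns those are.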
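For reference, the four metrics tested in this section can be computed by hand for a binary classifier. This is a plain-Python sketch with made-up labels; the function name `binary_classification_metrics` is hypothetical, and Evidently computes these values internally once the target and prediction columns are mapped.

```python
def binary_classification_metrics(labels, predictions, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels and predictions: 4 TP, 1 FP, 1 FN, 4 TN.
labels      = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
predictions = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_classification_metrics(labels, predictions)
print(metrics)
```

In Evidently, the same metrics are checked against thresholds like this: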
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestAccuracyScore,
    TestPrecisionScore,
    TestRecallScore,
    TestF1Score,
)
from evidently.pipeline.column_mapping import ColumnMapping
column_mapping = ColumnMapping(
    target="label",
    prediction="prediction",
)
suite = TestSuite(tests=[
    TestAccuracyScore(gte=0.85),
    TestPrecisionScore(gte=0.80),
    TestRecallScore(gte=0.75),
    TestF1Score(gte=0.80),
])
suite.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
)
assert suite.as_dict()["summary"]["all_passed"]
Integrating with pytest
Wrap Evidently test suites inside pytest for CI integration:
import pytest
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns, TestShareOfMissingValues
@pytest.fixture(scope="session")
def datasets():
    # Load both datasets once per test session
    reference = pd.read_csv("tests/data/reference.csv")
    current = pd.read_csv("tests/data/current.csv")
    return reference, current
def test_no_significant_drift(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfDriftedColumns(lt=0.3),
    ])
    suite.run(reference_data=reference, current_data=current)
    result = suite.as_dict()
    assert result["summary"]["all_passed"], (
        f"Drift detected: {result['summary']['failed_tests']} tests failed"
    )
def test_data_quality(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfMissingValues(lt=0.05),
    ])
    suite.run(reference_data=reference, current_data=current)
    assert suite.as_dict()["summary"]["all_passed"]
Custom Metrics
Define domain-specific metrics with CustomValueTest:
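from evidently.base_metric import InputData
from evidently.test_suite import TestSuite
from evidently.tests import CustomValueTest
def revenue_impact_metric(data: InputData) -> float:
    # Custom business metric: average predicted revenue in the current data.
    # Assumes the callable receives a single InputData object, as in
    # Evidently's custom metric API; adjust if your version differs.
    return data.current_data["predicted_revenue"].mean()
suite = TestSuite(tests=[
    CustomValueTest(
        func=revenue_impact_metric,
        title="Average predicted revenue",
        gte=1000,
        lte=10000,
    ),
])
Running in CI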
Add Evidently checks to your CI pipeline with a non-zero exit code on failure:
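The gate itself is a small rule: exit with 0 when every test passed and with 1 otherwise. The sketch below applies that rule to a dictionary shaped like the `as_dict()` output used in this article (`summary.all_passed` plus a `tests` list with `name` and `status`). The sample dictionary, the `"SUCCESS"` status string, and the function name `exit_code_for` are assumptions for illustration.

```python
def exit_code_for(result):
    """Return a process exit code for a test-suite result dictionary."""
    if result["summary"]["all_passed"]:
        return 0
    # List every test that did not succeed so the CI log explains the failure.
    for test in result["tests"]:
        if test["status"] != "SUCCESS":  # assumed status value for a passing test
            print(f"FAILED: {test['name']} ({test['status']})")
    return 1

# Hypothetical result with one failing and one passing test.
sample_result = {
    "summary": {"all_passed": False},
    "tests": [
        {"name": "Share of Drifted Columns", "status": "FAIL"},
        {"name": "Column Types", "status": "SUCCESS"},
    ],
}
code = exit_code_for(sample_result)
```

A full script that runs a suite and applies this rule: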
# run_evidently_tests.py
import sys
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns
suite = TestSuite(tests=[TestShareOfDriftedColumns(lt=0.3)])
suite.run(
    reference_data=pd.read_parquet("data/reference.parquet"),
    current_data=pd.read_parquet("data/current.parquet"),
)
result = suite.as_dict()
if not result["summary"]["all_passed"]:
    print("FAILED:", result["summary"])
    sys.exit(1)  # non-zero exit code fails the CI step
print("All checks passed")

# .github/workflows/ml-tests.yml
- name: Run Evidently quality checks
  run: python run_evidently_tests.py
Report vs. Test Suite
Evidently has two modes:
| Mode | Class | Output | Use case |
|---|---|---|---|
| Report | Report | HTML/JSON metrics | Exploration, dashboards |
| Test Suite | TestSuite | Pass/fail assertions | CI, automated gates |
Use TestSuite for automated pipelines and Report for investigation.
Key Takeaways
- Use TestSuite for CI gates: it returns structured pass/fail results
- Set thresholds (lt, gte) based on your model's acceptable degradation range
- Run drift tests on every deployment, not just periodically
- Combine Evidently with pytest for integration into existing test infrastructure
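One way to turn an acceptable degradation range into a threshold is to subtract the tolerated drop from a baseline score measured on held-out data. The sketch below shows only the arithmetic; the baseline values, the tolerances, and the helper name `gte_threshold` are hypothetical.

```python
def gte_threshold(baseline, max_drop):
    """Lowest acceptable score: baseline minus the tolerated drop, floored at 0."""
    return max(round(baseline - max_drop, 4), 0.0)

# Hypothetical baseline scores and the drop the team is willing to accept.
baselines = {"accuracy": 0.91, "recall": 0.84}
tolerated_drop = {"accuracy": 0.05, "recall": 0.08}

thresholds = {
    name: gte_threshold(baselines[name], tolerated_drop[name])
    for name in baselines
}
print(thresholds)
```

The resulting values could then be passed as `gte` arguments, for example `TestAccuracyScore(gte=thresholds["accuracy"])`.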