MLOps

Testing ML Models with Evidently AI: Data Drift and Model Quality Reports

HelpMeTest

20 May 2026

Evidently AI is an open-source Python framework for evaluating and monitoring machine learning models. It generates reports and test suites that check data drift, model quality, and dataset statistics, which makes it a practical building block for MLOps testing pipelines.

What Evidently Tests

Evidently organizes checks into three categories:

Data quality — missing values, duplicates, type mismatches, out-of-range values
Data drift — statistical shift between reference and current datasets
Model performance — accuracy, precision, recall, RMSE, and custom metrics

Installing Evidently

pip install evidently

Basic Test Suite

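The TestSuite class runs a set of checks against a reference dataset and a current dataset and returns pass/fail results. The examples in this article import it from evidently.test_suite; Evidently's package layout has changed between releases, so the import paths may differ depending on the version you have installed: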

import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestShareOfMissingValues,
    TestNumberOfDuplicatedRows,
    TestShareOfDriftedColumns,
    TestColumnsType,
)

# Load reference (training) and current (production) data
reference = pd.read_csv("reference_data.csv")
current = pd.read_csv("current_data.csv")

suite = TestSuite(tests=[
    TestShareOfMissingValues(lt=0.05),   # less than 5% of values missing
    TestNumberOfDuplicatedRows(lt=10),   # fewer than 10 duplicated rows
    TestShareOfDriftedColumns(lt=0.3),   # less than 30% of columns drifted
    TestColumnsType(),                   # column types match the reference
])

suite.run(reference_data=reference, current_data=current)

# Assert all tests passed
result = suite.as_dict()
assert result["summary"]["all_passed"], f"Tests failed: {result}"
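The keyword arguments passed to each test (lt above, gte and lte later in this article) are comparison conditions applied to the value the test computes. The following is a minimal sketch of that idea in plain Python, not Evidently's implementation; the names CONDITIONS and check_condition are hypothetical.

```python
import operator

# Map condition keywords to comparison operators.
CONDITIONS = {
    "lt": operator.lt,    # value <  bound
    "lte": operator.le,   # value <= bound
    "gt": operator.gt,    # value >  bound
    "gte": operator.ge,   # value >= bound
    "eq": operator.eq,    # value == bound
}


def check_condition(value, **conditions):
    """Return True only if `value` satisfies every given condition."""
    unknown = set(conditions) - set(CONDITIONS)
    if unknown:
        raise ValueError(f"Unknown condition(s): {sorted(unknown)}")
    return all(CONDITIONS[name](value, bound) for name, bound in conditions.items())


# A 3% share of missing values satisfies lt=0.05; a 7% share does not.
print(check_condition(0.03, lt=0.05))
print(check_condition(0.07, lt=0.05))
# Two bounds together define an accepted range.
print(check_condition(4200, gte=1000, lte=10000))
```

Note that lt is strict: a value exactly equal to the bound fails, which matters when a threshold sits right at the edge of what you consider acceptable.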

Data Drift Detection

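Evidently detects drift per column by comparing the reference and current distributions with statistical tests and distance measures such as the Kolmogorov-Smirnov test, the chi-squared test, and the Population Stability Index (PSI). The default method depends on the column type and the sample size: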

from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift,
    TestShareOfDriftedColumns,
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="age"),      # drift check for a single column
    TestColumnDrift(column_name="income"),
    TestShareOfDriftedColumns(lt=0.2),       # less than 20% of columns drifted
])

suite.run(reference_data=reference, current_data=current)

# Save HTML report for inspection
suite.save_html("drift_report.html")

# Programmatic result
result = suite.as_dict()
for test in result["tests"]:
    print(f"{test['name']}: {test['status']}")
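To make one of the drift measures named above concrete, here is a minimal sketch of the Population Stability Index in plain Python. It is an illustration under stated assumptions, not Evidently's implementation: the function name psi, the equal-width binning taken from the reference range, and the eps floor for empty bins are all choices made for this example.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to width 1.0
    # when every reference value is identical.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range go into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference_ages = list(range(100))
shifted_ages = [age + 30 for age in reference_ages]

print(psi(reference_ages, reference_ages))  # identical samples: no drift
print(psi(reference_ages, shifted_ages))    # shifted sample: clearly positive
```

The score is zero when both samples fall into the bins in the same proportions and grows as the proportions diverge, so a drift test reduces to comparing this number with a threshold you choose.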

Model Quality Tests

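For supervised models, the datasets must include the true labels and the model's predictions alongside the features. A ColumnMapping tells Evidently which columns hold them: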

from evidently.test_suite import TestSuite
from evidently.tests import (
    TestAccuracyScore,
    TestPrecisionScore,
    TestRecallScore,
    TestF1Score,
)
from evidently.pipeline.column_mapping import ColumnMapping

column_mapping = ColumnMapping(
    target="label",
    prediction="prediction",
)

suite = TestSuite(tests=[
    TestAccuracyScore(gte=0.85),
    TestPrecisionScore(gte=0.80),
    TestRecallScore(gte=0.75),
    TestF1Score(gte=0.80),
])

suite.run(
    reference_data=reference,
    current_data=current,
    column_mapping=column_mapping,
)

assert suite.as_dict()["summary"]["all_passed"]
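It can help to see what the thresholds above are compared against. The following is a minimal sketch that computes the four scores by hand for a binary classifier; it is an illustration, not how Evidently computes them, and the function name classification_scores, the positive argument, and the sample data are hypothetical.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predictions for eight rows.
labels =      [1, 0, 1, 1, 0, 1, 0, 0]
predictions = [1, 0, 1, 0, 0, 1, 1, 0]

scores = classification_scores(labels, predictions)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

With these sample rows every score falls below the gates used in the suite above (for example, accuracy against gte=0.85), so the corresponding tests would fail.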

Integrating with pytest

Wrap Evidently test suites inside pytest for CI integration:

import pytest
import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns, TestShareOfMissingValues

@pytest.fixture(scope="session")
def datasets():
    # Loaded once per test session and shared by the tests below
    reference = pd.read_csv("tests/data/reference.csv")
    current = pd.read_csv("tests/data/current.csv")
    return reference, current

def test_no_significant_drift(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfDriftedColumns(lt=0.3),
    ])
    suite.run(reference_data=reference, current_data=current)
    result = suite.as_dict()
    assert result["summary"]["all_passed"], (
        f"Drift detected: {result['summary']['failed_tests']} tests failed"
    )

def test_data_quality(datasets):
    reference, current = datasets
    suite = TestSuite(tests=[
        TestShareOfMissingValues(lt=0.05),  # less than 5% of values missing
    ])
    suite.run(reference_data=reference, current_data=current)
    assert suite.as_dict()["summary"]["all_passed"]
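A bare assert on all_passed tells you that something failed but not what. The following is a minimal sketch of a helper that names the failing checks in the pytest output. It assumes the result dictionary has the shape read elsewhere in this article (a summary with all_passed, and a tests list with name and status) and that passing tests carry the status string "SUCCESS"; the helper name and the sample dictionary are hypothetical.

```python
def assert_suite_passed(result):
    """Raise AssertionError listing every test that did not succeed."""
    if result["summary"]["all_passed"]:
        return
    # Assumption: a passing test has status "SUCCESS"; anything else is reported.
    failed = [
        f"{test['name']}: {test['status']}"
        for test in result["tests"]
        if test["status"] != "SUCCESS"
    ]
    raise AssertionError("Evidently checks failed:\n" + "\n".join(failed))


# Hypothetical result, shaped like the output of suite.as_dict().
sample = {
    "summary": {"all_passed": False},
    "tests": [
        {"name": "Share of Drifted Columns", "status": "FAIL"},
        {"name": "Column Types", "status": "SUCCESS"},
    ],
}

try:
    assert_suite_passed(sample)
except AssertionError as exc:
    print(exc)
```

Inside a pytest test, calling assert_suite_passed(suite.as_dict()) in place of the bare assert would put the names of the failing checks directly into the failure report.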

Custom Metrics

Define domain-specific metrics with CustomValueTest:

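from evidently.test_suite import TestSuite
from evidently.tests import CustomValueTest

def revenue_impact_metric(data):
    # Custom business metric: average predicted revenue.
    # Assumption: Evidently calls the function with a single input-data object
    # that exposes current_data and reference_data as DataFrames. Check the
    # callback signature and import path for your installed version.
    return data.current_data["predicted_revenue"].mean()

suite = TestSuite(tests=[
    CustomValueTest(
        func=revenue_impact_metric,
        title="Average predicted revenue",
        gte=1000,
        lte=10000,
    ),
])

suite.run(reference_data=reference, current_data=current)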

Running in CI

Add Evidently checks to your CI pipeline with a non-zero exit code on failure:

# run_evidently_tests.py
import sys

import pandas as pd
from evidently.test_suite import TestSuite
from evidently.tests import TestShareOfDriftedColumns

# Fail the gate if 30% or more of the columns have drifted
suite = TestSuite(tests=[TestShareOfDriftedColumns(lt=0.3)])
suite.run(
    reference_data=pd.read_parquet("data/reference.parquet"),
    current_data=pd.read_parquet("data/current.parquet"),
)
result = suite.as_dict()
if not result["summary"]["all_passed"]:
    print("FAILED:", result["summary"])
    sys.exit(1)  # a non-zero exit code fails the CI step
print("All checks passed")

# .github/workflows/ml-tests.yml
- name: Run Evidently quality checks
  run: python run_evidently_tests.py

Report vs. Test Suite

Evidently has two modes:

Mode	Class	Output	Use case
Report	Report	HTML/JSON metrics	Exploration, dashboards
Test Suite	TestSuite	Pass/fail assertions	CI, automated gates

Use TestSuite for automated pipelines and Report for investigation.
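For the investigation side, a Report is built and run much like a TestSuite. The following is a minimal sketch, assuming an Evidently release that still ships the evidently.report and evidently.metric_preset modules alongside the evidently.test_suite module used above; the function name build_drift_report and the default output path are hypothetical.

```python
def build_drift_report(reference, current, html_path="drift_exploration.html"):
    """Run a data drift Report on two pandas DataFrames and save it as HTML."""
    # Imported inside the function so this module still loads where Evidently
    # is not installed. Assumption: these import paths match your version.
    from evidently.metric_preset import DataDriftPreset
    from evidently.report import Report

    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current)
    report.save_html(html_path)  # open in a browser to explore the metrics
    return report.as_dict()      # the same metrics as a plain dictionary
```

A reasonable workflow is to let the TestSuite gate the pipeline and to call a function like this one only when a gate fails, so the HTML report is available for whoever investigates.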

Key Takeaways

Use TestSuite for CI gates — it returns structured pass/fail results
Set thresholds (lt, gte) based on your model's acceptable degradation range
Run drift tests on every deployment, not just periodically
Combine Evidently with pytest for integration into existing test infrastructure