Building a Test Results Dashboard with Grafana and Your CI Data
Most CI pipelines collect test results and then throw them away. The JUnit XML file gets archived, the pipeline moves on, and nobody looks at the trend until something catastrophically breaks.
A Grafana dashboard changes that. By routing test result data into a time-series store, you get visibility into pass rate trends, failure hotspots, and flakiness — without buying a dedicated test analytics platform.
This guide walks through building a practical test results dashboard using Grafana with either Prometheus (push-based metrics) or PostgreSQL (relational storage for detailed test history).
What You'll Build
A dashboard with four key panels:
- Pass rate over time — trending pass/fail ratio across builds
- Test execution count — total tests run per build, broken down by status
- Top failing tests — which specific tests fail most frequently
- Flakiness score — tests that flip between pass and fail across builds
Approach 1: Prometheus + Pushgateway
This approach works well when you want real-time metrics and already use Prometheus for infrastructure monitoring.
Architecture
CI Pipeline → JUnit XML → Parser Script → Pushgateway → Prometheus → Grafana
Step 1: Parse JUnit XML and Push Metrics
Write a script (Python or shell) that runs after your test suite and pushes metrics to the Prometheus Pushgateway.
#!/usr/bin/env python3
# push_test_metrics.py
import xml.etree.ElementTree as ET
import requests
import sys
import os

def parse_junit(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    # Handle both <testsuites> and <testsuite> root elements
    suites = root.findall('testsuite') if root.tag == 'testsuites' else [root]
    metrics = {'tests': 0, 'failures': 0, 'errors': 0, 'skipped': 0, 'time': 0.0}
    for suite in suites:
        metrics['tests'] += int(suite.get('tests', 0))
        metrics['failures'] += int(suite.get('failures', 0))
        metrics['errors'] += int(suite.get('errors', 0))
        metrics['skipped'] += int(suite.get('skipped', 0))
        metrics['time'] += float(suite.get('time', 0))
    return metrics

def push_to_gateway(metrics, gateway_url, job_name, build_id):
    # The job label is set by the grouping key in the push URL,
    # so only the build label goes into the metric bodies.
    labels = f'build="{build_id}"'
    passed = metrics['tests'] - metrics['failures'] - metrics['errors']
    payload = f"""\
# HELP ci_test_total Total number of tests in this run
# TYPE ci_test_total gauge
ci_test_total{{{labels}}} {metrics['tests']}
# HELP ci_test_failures Number of failed tests
# TYPE ci_test_failures gauge
ci_test_failures{{{labels}}} {metrics['failures']}
# HELP ci_test_errors Number of test errors
# TYPE ci_test_errors gauge
ci_test_errors{{{labels}}} {metrics['errors']}
# HELP ci_test_skipped Number of skipped tests
# TYPE ci_test_skipped gauge
ci_test_skipped{{{labels}}} {metrics['skipped']}
# HELP ci_test_pass_rate Pass rate (0-1)
# TYPE ci_test_pass_rate gauge
ci_test_pass_rate{{{labels}}} {passed / max(metrics['tests'], 1):.4f}
# HELP ci_test_duration_seconds Total test duration
# TYPE ci_test_duration_seconds gauge
ci_test_duration_seconds{{{labels}}} {metrics['time']}
"""
    url = f"{gateway_url}/metrics/job/{job_name}"
    response = requests.post(url, data=payload,
                             headers={'Content-Type': 'text/plain'})
    response.raise_for_status()
    print(f"Pushed metrics for {metrics['tests']} tests "
          f"(pass rate: {passed / max(metrics['tests'], 1):.1%})")

if __name__ == '__main__':
    xml_file = sys.argv[1] if len(sys.argv) > 1 else 'test-results.xml'
    gateway = os.getenv('PUSHGATEWAY_URL', 'http://localhost:9091')
    job = os.getenv('JOB_NAME', 'ci-tests')
    build = os.getenv('BUILD_NUMBER', 'local')
    metrics = parse_junit(xml_file)
    push_to_gateway(metrics, gateway, job, build)
Step 2: Add to CI Pipeline
GitHub Actions example:
- name: Run tests
  run: pytest tests/ --junitxml=test-results.xml

- name: Push test metrics to Prometheus
  if: always()  # push metrics even when the test step fails
  env:
    PUSHGATEWAY_URL: ${{ secrets.PUSHGATEWAY_URL }}
    JOB_NAME: ${{ github.workflow }}
    BUILD_NUMBER: ${{ github.run_number }}
  run: python push_test_metrics.py test-results.xml
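On GitLab CI, an equivalent job might look like the sketch below. It assumes PUSHGATEWAY_URL is set as a project CI/CD variable, and uses after_script so metrics get pushed even when tests fail:
# .gitlab-ci.yml (sketch)
test:
  variables:
    JOB_NAME: "$CI_PROJECT_NAME"
    BUILD_NUMBER: "$CI_PIPELINE_IID"
  script:
    - pytest tests/ --junitxml=test-results.xml
  after_script:
    # runs even if the script section fails
    - python push_test_metrics.py test-results.xml
Step 3: Prometheus Configuration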
In your prometheus.yml, add a scrape job for Pushgateway:
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true  # keep the job/build labels set at push time
    static_configs:
      - targets: ['pushgateway:9091']
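To verify that a push actually landed, you can inspect the Pushgateway's metrics endpoint directly (assuming it listens on localhost:9091):
curl -s http://localhost:9091/metrics | grep ci_test_
Approach 2: PostgreSQL Storage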
This approach stores individual test case results and enables more detailed queries — which specific tests are failing, failure rate per test name, and flakiness scores.
Schema
CREATE TABLE test_runs (
    id SERIAL PRIMARY KEY,
    run_id TEXT NOT NULL UNIQUE,  -- must be unique so test_results can reference it
    job_name TEXT NOT NULL,
    build_number TEXT NOT NULL,
    branch TEXT,
    commit_sha TEXT,
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    duration_seconds FLOAT,
    total_tests INT,
    passed INT,
    failed INT,
    skipped INT
);

CREATE TABLE test_results (
    id SERIAL PRIMARY KEY,
    run_id TEXT NOT NULL REFERENCES test_runs(run_id),
    suite_name TEXT,
    test_name TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('passed', 'failed', 'error', 'skipped')),
    duration_seconds FLOAT,
    failure_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_test_results_test_name ON test_results(test_name);
CREATE INDEX idx_test_results_status ON test_results(status);
CREATE INDEX idx_test_results_created_at ON test_results(created_at);
Ingestion Script
#!/usr/bin/env python3
# ingest_junit.py
import xml.etree.ElementTree as ET
import psycopg2
import uuid
import sys
import os

def ingest(xml_path, db_url, job_name, build_number, branch=None, commit=None):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    suites = root.findall('testsuite') if root.tag == 'testsuites' else [root]
    run_id = str(uuid.uuid4())
    total = failures = errors = skipped = 0
    test_records = []
    for suite in suites:
        suite_name = suite.get('name', '')
        total += int(suite.get('tests', 0))
        failures += int(suite.get('failures', 0))
        errors += int(suite.get('errors', 0))
        skipped += int(suite.get('skipped', 0))
        for tc in suite.findall('testcase'):
            status = 'passed'
            failure_msg = None
            if tc.find('failure') is not None:
                status = 'failed'
                failure_msg = tc.find('failure').get('message', '')
            elif tc.find('error') is not None:
                status = 'error'
                failure_msg = tc.find('error').get('message', '')
            elif tc.find('skipped') is not None:
                status = 'skipped'
            test_records.append({
                'run_id': run_id,
                'suite_name': suite_name,
                'test_name': tc.get('name', ''),
                'status': status,
                'duration': float(tc.get('time', 0)),
                'failure_message': failure_msg
            })
    conn = psycopg2.connect(db_url)
    cur = conn.cursor()
    cur.execute("""
        INSERT INTO test_runs (run_id, job_name, build_number, branch, commit_sha, total_tests, passed, failed, skipped)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
    """, (run_id, job_name, build_number, branch, commit,
          total, total - failures - errors, failures + errors, skipped))
    for r in test_records:
        cur.execute("""
            INSERT INTO test_results (run_id, suite_name, test_name, status, duration_seconds, failure_message)
            VALUES (%s, %s, %s, %s, %s, %s)
        """, (r['run_id'], r['suite_name'], r['test_name'], r['status'], r['duration'], r['failure_message']))
    conn.commit()
    cur.close()
    conn.close()
    print(f"Ingested {total} tests (run_id: {run_id})")

if __name__ == '__main__':
    ingest(
        xml_path=sys.argv[1] if len(sys.argv) > 1 else 'test-results.xml',
        db_url=os.getenv('DB_URL', 'postgresql://user:pass@localhost/testdb'),
        job_name=os.getenv('JOB_NAME', 'ci'),
        build_number=os.getenv('BUILD_NUMBER', '0'),
        branch=os.getenv('GIT_BRANCH'),
        commit=os.getenv('GIT_COMMIT')
    )
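Wiring this into CI mirrors Step 2 from the Prometheus approach. A GitHub Actions step might look like the following sketch (storing DB_URL as a repository secret is an assumption):
- name: Ingest test results into PostgreSQL
  if: always()  # record results even when the test step fails
  env:
    DB_URL: ${{ secrets.DB_URL }}
    JOB_NAME: ${{ github.workflow }}
    BUILD_NUMBER: ${{ github.run_number }}
    GIT_BRANCH: ${{ github.ref_name }}
    GIT_COMMIT: ${{ github.sha }}
  run: python ingest_junit.py test-results.xml
Setting Up Grafana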
Install Grafana
# Docker
docker run -d -p 3000:3000 --name grafana grafana/grafana-oss

# Or Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana
Add Data Sources
For Prometheus: Grafana → Configuration → Data Sources → Add → Prometheus → URL: http://prometheus:9090
For PostgreSQL: Grafana → Configuration → Data Sources → Add → PostgreSQL → fill in host/database/user/password
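If you manage Grafana as code, the same data source can be provisioned from a YAML file instead of being clicked together in the UI. A sketch for the PostgreSQL source (the file path, data source name, and credentials are assumptions matching the Docker Compose stack shown later):
# /etc/grafana/provisioning/datasources/testdb.yaml
apiVersion: 1
datasources:
  - name: TestResults
    type: postgres
    url: postgres:5432
    user: grafana
    jsonData:
      database: testdb
      sslmode: disable
    secureJsonData:
      password: grafanapass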
Dashboard Panels
Panel 1: Pass Rate Trend (Prometheus)
# Pass rate as a percentage; errors count against the pass rate too
(1 - (
  sum(ci_test_failures + ci_test_errors) by (job) /
  sum(ci_test_total) by (job)
)) * 100
Visualization: Time series. Set the unit to percent (0-100). Add threshold lines at 90% (orange) and 80% (red).
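Since the push script already exports a precomputed ci_test_pass_rate gauge, an equivalent and simpler query for the same panel is:
avg by (job) (ci_test_pass_rate) * 100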
Panel 2: Pass Rate Trend (PostgreSQL)
SELECT
DATE_TRUNC('day', started_at) as time,
ROUND(AVG(passed::float / NULLIF(total_tests, 0) * 100), 2) as pass_rate
FROM test_runs
WHERE $__timeFilter(started_at)
AND job_name = '$job_name'
GROUP BY DATE_TRUNC('day', started_at)
ORDER BY time
Panel 3: Top Failing Tests (PostgreSQL)
SELECT
test_name,
COUNT(*) FILTER (WHERE status = 'failed') as failures,
COUNT(*) as total_runs,
ROUND(COUNT(*) FILTER (WHERE status = 'failed')::float / COUNT(*) * 100, 1) as failure_rate
FROM test_results
WHERE $__timeFilter(created_at)
GROUP BY test_name
HAVING COUNT(*) FILTER (WHERE status = 'failed') > 0
ORDER BY failures DESC
LIMIT 20
Visualization: Table. Sort by failures descending. Add field overrides to color the failure_rate column red when > 20%.
Panel 4: Flakiness Score (PostgreSQL)
A flaky test changes status between consecutive runs. This query approximates flakiness by counting status transitions:
WITH ranked AS (
SELECT
test_name,
status,
created_at,
LAG(status) OVER (PARTITION BY test_name ORDER BY created_at) as prev_status
FROM test_results
WHERE $__timeFilter(created_at)
),
flips AS (
SELECT
test_name,
COUNT(*) as total_runs,
COUNT(*) FILTER (WHERE status != prev_status AND prev_status IS NOT NULL) as status_changes
FROM ranked
GROUP BY test_name
HAVING COUNT(*) >= 5
)
SELECT
test_name,
total_runs,
status_changes,
ROUND(status_changes::float / total_runs * 100, 1) as flakiness_score
FROM flips
WHERE status_changes > 0
ORDER BY flakiness_score DESC
LIMIT 20
Panel 5: Build Health Summary
SELECT
build_number,
total_tests,
passed,
failed,
ROUND(passed::float / NULLIF(total_tests, 0) * 100, 1) as pass_rate,
TO_CHAR(started_at, 'YYYY-MM-DD HH24:MI') as run_time
FROM test_runs
WHERE $__timeFilter(started_at)
AND job_name = '$job_name'
ORDER BY started_at DESC
LIMIT 50
Adding Template Variables
Template variables let you filter the dashboard by job, branch, or time range.
In Dashboard Settings → Variables, add:
Variable: job_name
- Type: Query
- Query:
SELECT DISTINCT job_name FROM test_runs ORDER BY job_name
- Multi-value: true
Variable: branch
- Type: Query
- Query:
SELECT DISTINCT branch FROM test_runs WHERE branch IS NOT NULL ORDER BY branch
Reference them in panel queries as $job_name and $branch.
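One caveat: with Multi-value enabled, an equality filter such as job_name = '$job_name' breaks as soon as more than one value is selected. Grafana's SQL data sources expect the IN operator for multi-value variables:
WHERE $__timeFilter(started_at)
AND job_name IN ($job_name)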
Alerting
In Grafana, create an alert rule on the pass rate panel:
- Open the pass rate panel → Edit → Alert tab
- Condition:
WHEN last() OF query(A) IS BELOW 85
- Notification channel: Slack, email, or PagerDuty
- Add a message:
Pass rate dropped below 85% on {{ $labels.job }}
Note that dashboard variables such as $job_name are not available in alert rules, so hard-code the job filter in the alert's query and template the message from label values instead.
This gives your team a signal when a build is failing at an unusual rate — without checking dashboards manually.
Complete Stack with Docker Compose
version: '3.8'

services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: grafana
      POSTGRES_PASSWORD: grafanapass
    volumes:
      - pgdata:/var/lib/postgresql/data

  grafana:
    image: grafana/grafana-oss:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafanadata:/var/lib/grafana
    depends_on:
      - postgres

volumes:
  pgdata:
  grafanadata:
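The stack above covers the PostgreSQL approach. If you also want Approach 1 running locally, Pushgateway and Prometheus services can be added under services: using the stock images, a sketch (it assumes the prometheus.yml from Step 3 sits next to the compose file):
  pushgateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
HelpMeTest and Test Dashboards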
HelpMeTest includes built-in test result history and trend data in its dashboard — you get pass rate trends, run history, and test status without building your own pipeline.
For teams that already run Grafana for infrastructure monitoring, the PostgreSQL approach consolidates test observability into the same tool and the same alerting setup. The two options serve different needs: HelpMeTest is the faster path from zero to test visibility, while a custom Grafana setup gives you full control over the data model and co-location with infrastructure metrics.
Summary
A Grafana test dashboard turns your CI data from a one-time artifact into a persistent signal. The key decisions:
- Prometheus: fast setup, real-time metrics, best for aggregate pass rate trends
- PostgreSQL: more setup, but enables per-test analytics, flakiness scoring, and ad-hoc queries
Start with the Prometheus approach if you're already running Prometheus. Move to PostgreSQL when you need per-test granularity. Either way, a dashboard with pass rate trends and a top-failing-tests table will immediately surface patterns that were invisible before.