Building a Test Results Dashboard with Grafana and Your CI Data

Most CI pipelines collect test results and then throw them away. The JUnit XML file gets archived, the pipeline moves on, and nobody looks at the trend until something catastrophically breaks.

A Grafana dashboard changes that. By routing test result data into a time-series store, you get visibility into pass rate trends, failure hotspots, and flakiness — without buying a dedicated test analytics platform.

This guide walks through building a practical test results dashboard using Grafana with either Prometheus (aggregate metrics pushed via the Pushgateway) or PostgreSQL (relational storage for detailed per-test history).

What You'll Build

A dashboard with four key panels:

  1. Pass rate over time — trending pass/fail ratio across builds
  2. Test execution count — total tests run per build, broken down by status
  3. Top failing tests — which specific tests fail most frequently
  4. Flakiness score — tests that flip between pass and fail across builds

Approach 1: Prometheus + Pushgateway

This approach works well when you want real-time metrics and already use Prometheus for infrastructure monitoring.

Architecture

CI Pipeline → JUnit XML → Parser Script → Pushgateway → Prometheus → Grafana

Step 1: Parse JUnit XML and Push Metrics

Write a script (Python or shell) that runs after your test suite and pushes metrics to the Prometheus Pushgateway.

#!/usr/bin/env python3
# push_test_metrics.py

import xml.etree.ElementTree as ET
import requests
import sys
import os

def parse_junit(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    
    # Handle both <testsuites> and <testsuite> root elements
    suites = root.findall('testsuite') if root.tag == 'testsuites' else [root]
    
    metrics = {'tests': 0, 'failures': 0, 'errors': 0, 'skipped': 0, 'time': 0.0}
    for suite in suites:
        metrics['tests'] += int(suite.get('tests', 0))
        metrics['failures'] += int(suite.get('failures', 0))
        metrics['errors'] += int(suite.get('errors', 0))
        metrics['skipped'] += int(suite.get('skipped', 0))
        metrics['time'] += float(suite.get('time', 0))
    
    return metrics

def push_to_gateway(metrics, gateway_url, job_name, build_id):
    # The job name travels in the URL as the Pushgateway grouping key;
    # keep it out of the metric body so it doesn't clash with that key.
    labels = f'build="{build_id}"'
    payload = f"""
# HELP ci_test_total Total number of tests in this run
# TYPE ci_test_total gauge
ci_test_total{{{labels}}} {metrics['tests']}

# HELP ci_test_failures Number of failed tests
# TYPE ci_test_failures gauge
ci_test_failures{{{labels}}} {metrics['failures']}

# HELP ci_test_errors Number of test errors
# TYPE ci_test_errors gauge
ci_test_errors{{{labels}}} {metrics['errors']}

# HELP ci_test_skipped Number of skipped tests
# TYPE ci_test_skipped gauge
ci_test_skipped{{{labels}}} {metrics['skipped']}

# HELP ci_test_pass_rate Pass rate (0-1)
# TYPE ci_test_pass_rate gauge
ci_test_pass_rate{{{labels}}} {(metrics['tests'] - metrics['failures'] - metrics['errors']) / max(metrics['tests'], 1):.4f}

# HELP ci_test_duration_seconds Total test duration
# TYPE ci_test_duration_seconds gauge
ci_test_duration_seconds{{{labels}}} {metrics['time']}
"""
    url = f"{gateway_url}/metrics/job/{job_name}"
    response = requests.post(url, data=payload, 
                            headers={'Content-Type': 'text/plain'})
    response.raise_for_status()
    print(f"Pushed metrics for {metrics['tests']} tests (pass rate: {(metrics['tests'] - metrics['failures'] - metrics['errors']) / max(metrics['tests'], 1):.1%})")

if __name__ == '__main__':
    xml_file = sys.argv[1] if len(sys.argv) > 1 else 'test-results.xml'
    gateway = os.getenv('PUSHGATEWAY_URL', 'http://localhost:9091')
    job = os.getenv('JOB_NAME', 'ci-tests')
    build = os.getenv('BUILD_NUMBER', 'local')
    
    metrics = parse_junit(xml_file)
    push_to_gateway(metrics, gateway, job, build)

Step 2: Add to CI Pipeline

GitHub Actions example:

- name: Run tests
  run: pytest tests/ --junitxml=test-results.xml

- name: Push test metrics to Prometheus
  env:
    PUSHGATEWAY_URL: ${{ secrets.PUSHGATEWAY_URL }}
    JOB_NAME: ${{ github.workflow }}
    BUILD_NUMBER: ${{ github.run_number }}
  run: python push_test_metrics.py test-results.xml
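The equivalent step on GitLab CI might look like the following sketch (job name, stage, and variable wiring are illustrative; PUSHGATEWAY_URL would be set as a masked CI/CD variable):

```yaml
push-test-metrics:
  stage: report
  image: python:3.12-slim
  # Run even when the test job failed, so failed builds are recorded too
  when: always
  variables:
    JOB_NAME: "$CI_PROJECT_NAME"
    BUILD_NUMBER: "$CI_PIPELINE_IID"
  script:
    - pip install requests
    - python push_test_metrics.py test-results.xml
```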

Step 3: Prometheus Configuration

In your prometheus.yml, add a scrape job for Pushgateway:

scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']

Approach 2: PostgreSQL Storage

This approach stores individual test case results and enables more detailed queries — which specific tests are failing, failure rate per test name, and flakiness scores.

Schema

CREATE TABLE test_runs (
    id SERIAL PRIMARY KEY,
    run_id TEXT NOT NULL UNIQUE,
    job_name TEXT NOT NULL,
    build_number TEXT NOT NULL,
    branch TEXT,
    commit_sha TEXT,
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    duration_seconds FLOAT,
    total_tests INT,
    passed INT,
    failed INT,
    skipped INT
);

CREATE TABLE test_results (
    id SERIAL PRIMARY KEY,
    run_id TEXT NOT NULL REFERENCES test_runs(run_id),
    suite_name TEXT,
    test_name TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('passed', 'failed', 'error', 'skipped')),
    duration_seconds FLOAT,
    failure_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_test_results_test_name ON test_results(test_name);
CREATE INDEX idx_test_results_status ON test_results(status);
CREATE INDEX idx_test_results_created_at ON test_results(created_at);

Ingestion Script

#!/usr/bin/env python3
# ingest_junit.py

import xml.etree.ElementTree as ET
import psycopg2
import uuid
import sys
import os

def ingest(xml_path, db_url, job_name, build_number, branch=None, commit=None):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    suites = root.findall('testsuite') if root.tag == 'testsuites' else [root]
    
    run_id = str(uuid.uuid4())
    total = failures = errors = skipped = 0
    test_records = []
    
    for suite in suites:
        suite_name = suite.get('name', '')
        total += int(suite.get('tests', 0))
        failures += int(suite.get('failures', 0))
        errors += int(suite.get('errors', 0))
        skipped += int(suite.get('skipped', 0))
        
        for tc in suite.findall('testcase'):
            status = 'passed'
            failure_msg = None
            
            if tc.find('failure') is not None:
                status = 'failed'
                failure_msg = tc.find('failure').get('message', '')
            elif tc.find('error') is not None:
                status = 'error'
                failure_msg = tc.find('error').get('message', '')
            elif tc.find('skipped') is not None:
                status = 'skipped'
            
            test_records.append({
                'run_id': run_id,
                'suite_name': suite_name,
                'test_name': tc.get('name', ''),
                'status': status,
                'duration': float(tc.get('time', 0)),
                'failure_message': failure_msg
            })
    
    conn = psycopg2.connect(db_url)
    cur = conn.cursor()
    
    cur.execute("""
        INSERT INTO test_runs (run_id, job_name, build_number, branch, commit_sha, total_tests, passed, failed, skipped)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
    """, (run_id, job_name, build_number, branch, commit, total, total - failures - errors, failures + errors, skipped))
    
    for r in test_records:
        cur.execute("""
            INSERT INTO test_results (run_id, suite_name, test_name, status, duration_seconds, failure_message)
            VALUES (%s, %s, %s, %s, %s, %s)
        """, (r['run_id'], r['suite_name'], r['test_name'], r['status'], r['duration'], r['failure_message']))
    
    conn.commit()
    cur.close()
    conn.close()
    print(f"Ingested {total} tests (run_id: {run_id})")

if __name__ == '__main__':
    ingest(
        xml_path=sys.argv[1],
        db_url=os.getenv('DB_URL', 'postgresql://user:pass@localhost/testdb'),
        job_name=os.getenv('JOB_NAME', 'ci'),
        build_number=os.getenv('BUILD_NUMBER', '0'),
        branch=os.getenv('GIT_BRANCH'),
        commit=os.getenv('GIT_COMMIT')
    )
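Hooking this into CI mirrors Step 2 of the Prometheus approach. A GitHub Actions sketch (the secret name TEST_DB_URL is an assumption):

```yaml
- name: Ingest test results into PostgreSQL
  if: always()  # record failing runs as well
  env:
    DB_URL: ${{ secrets.TEST_DB_URL }}
    JOB_NAME: ${{ github.workflow }}
    BUILD_NUMBER: ${{ github.run_number }}
    GIT_BRANCH: ${{ github.ref_name }}
    GIT_COMMIT: ${{ github.sha }}
  run: python ingest_junit.py test-results.xml
```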

Setting Up Grafana

Install Grafana

# Docker
docker run -d -p 3000:3000 --name grafana grafana/grafana-oss

# Or Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana

Add Data Sources

For Prometheus: Grafana → Configuration → Data Sources → Add → Prometheus → URL: http://prometheus:9090

For PostgreSQL: Grafana → Configuration → Data Sources → Add → PostgreSQL → fill in host/database/user/password
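Instead of clicking through the UI, data sources can also be provisioned from a YAML file dropped into /etc/grafana/provisioning/datasources/. A sketch for the PostgreSQL source (the data source name is arbitrary; host and credentials assume the Docker Compose stack shown later in this guide):

```yaml
apiVersion: 1
datasources:
  - name: TestResults
    type: postgres
    url: postgres:5432
    user: grafana
    jsonData:
      database: testdb
      sslmode: disable
    secureJsonData:
      password: grafanapass
```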

Dashboard Panels

Panel 1: Pass Rate Trend (Prometheus)

# Pass rate as percentage
(1 - (
  sum(ci_test_failures + ci_test_errors) by (job) /
  sum(ci_test_total) by (job)
)) * 100

Visualization: Time series. Set unit to percent (0-100). Add threshold lines at 90% (orange) and 80% (red).

Panel 2: Pass Rate Trend (PostgreSQL)

SELECT
  DATE_TRUNC('day', started_at) as time,
  ROUND(AVG(passed::float / NULLIF(total_tests, 0) * 100), 2) as pass_rate
FROM test_runs
WHERE $__timeFilter(started_at)
  AND job_name = '$job_name'
GROUP BY DATE_TRUNC('day', started_at)
ORDER BY time

Panel 3: Top Failing Tests (PostgreSQL)

SELECT
  test_name,
  COUNT(*) FILTER (WHERE status = 'failed') as failures,
  COUNT(*) as total_runs,
  ROUND(COUNT(*) FILTER (WHERE status = 'failed')::float / COUNT(*) * 100, 1) as failure_rate
FROM test_results
WHERE $__timeFilter(created_at)
GROUP BY test_name
HAVING COUNT(*) FILTER (WHERE status = 'failed') > 0
ORDER BY failures DESC
LIMIT 20

Visualization: Table. Sort by failures descending. Add field overrides to color the failure_rate column red when > 20%.
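For intuition, the same aggregation sketched in Python over a handful of (test, status) records (the data is made up; the panel itself runs the SQL above):

```python
from collections import Counter

# (test_name, status) rows as they would come back from test_results
results = [
    ('test_login', 'failed'), ('test_login', 'passed'),
    ('test_search', 'failed'), ('test_search', 'failed'),
    ('test_api', 'passed'),
]

totals = Counter(name for name, _ in results)
failures = Counter(name for name, status in results if status == 'failed')

# Mirror the SQL: failures, total runs, failure rate per test, worst first
top_failing = sorted(
    ((name, failures[name], totals[name],
      round(failures[name] / totals[name] * 100, 1))
     for name in failures),
    key=lambda row: row[1], reverse=True)

print(top_failing)
# [('test_search', 2, 2, 100.0), ('test_login', 1, 2, 50.0)]
```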

Panel 4: Flakiness Score (PostgreSQL)

A flaky test changes status between consecutive runs. This query approximates flakiness by counting status transitions:

WITH ranked AS (
  SELECT
    test_name,
    status,
    created_at,
    LAG(status) OVER (PARTITION BY test_name ORDER BY created_at) as prev_status
  FROM test_results
  WHERE $__timeFilter(created_at)
),
flips AS (
  SELECT
    test_name,
    COUNT(*) as total_runs,
    COUNT(*) FILTER (WHERE status != prev_status AND prev_status IS NOT NULL) as status_changes
  FROM ranked
  GROUP BY test_name
  HAVING COUNT(*) >= 5
)
SELECT
  test_name,
  total_runs,
  status_changes,
  ROUND(status_changes::float / total_runs * 100, 1) as flakiness_score
FROM flips
WHERE status_changes > 0
ORDER BY flakiness_score DESC
LIMIT 20
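The flip-counting idea is easy to sanity-check outside SQL. A minimal Python sketch of the same score (the function name is ours, not part of the pipeline):

```python
def flakiness_score(statuses):
    """Percent of runs whose status differs from the previous run.

    Mirrors the SQL: requires at least 5 runs, divides flips by total runs.
    """
    if len(statuses) < 5:
        return None  # same cutoff as HAVING COUNT(*) >= 5
    flips = sum(1 for prev, cur in zip(statuses, statuses[1:]) if cur != prev)
    return round(flips / len(statuses) * 100, 1)

# A test that alternates every run is maximally flaky; a stable one scores 0
print(flakiness_score(['passed', 'failed', 'passed', 'failed', 'passed']))  # 80.0
print(flakiness_score(['passed'] * 5))                                      # 0.0
```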

Panel 5: Build Health Summary

SELECT
  build_number,
  total_tests,
  passed,
  failed,
  ROUND(passed::float / NULLIF(total_tests, 0) * 100, 1) as pass_rate,
  TO_CHAR(started_at, 'YYYY-MM-DD HH24:MI') as run_time
FROM test_runs
WHERE $__timeFilter(started_at)
  AND job_name = '$job_name'
ORDER BY started_at DESC
LIMIT 50

Adding Template Variables

Template variables let you filter the dashboard by job, branch, or time range.

In Dashboard Settings → Variables, add:

Variable: job_name

  • Type: Query
  • Query: SELECT DISTINCT job_name FROM test_runs ORDER BY job_name
  • Multi-value: true

Variable: branch

  • Type: Query
  • Query: SELECT DISTINCT branch FROM test_runs WHERE branch IS NOT NULL ORDER BY branch

Reference them in panel queries as $job_name and $branch.

Alerting

In Grafana, create an alert rule on the pass rate panel:

  1. Open the pass rate panel → Edit → Alert tab
  2. Condition: WHEN last() OF query(A) IS BELOW 85
  3. Notification channel: Slack, email, or PagerDuty
  4. Add a message: Pass rate dropped below 85% on {{ $labels.job }} (alert rules cannot use dashboard variables like $job_name, only metric labels)

This gives your team a signal when a build is failing at an unusual rate — without checking dashboards manually.

Complete Stack with Docker Compose

version: '3.8'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: grafana
      POSTGRES_PASSWORD: grafanapass
    volumes:
      - pgdata:/var/lib/postgresql/data

  pushgateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"

  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  grafana:
    image: grafana/grafana-oss:latest
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafanadata:/var/lib/grafana
    depends_on:
      - postgres
      - prometheus

volumes:
  pgdata:
  grafanadata:

HelpMeTest and Test Dashboards

HelpMeTest includes built-in test result history and trend data in its dashboard — you get pass rate trends, run history, and test status without building your own pipeline.

For teams that already have Grafana for infrastructure monitoring, the PostgreSQL approach lets you consolidate test observability into the same tool and the same alerting setup. The two options fit different contexts: HelpMeTest is the faster path from zero to test visibility, while a custom Grafana setup gives you full control over the data model and co-location with infrastructure metrics.

Summary

A Grafana test dashboard turns your CI data from a one-time artifact into a persistent signal. The key decisions:

  • Prometheus: fast setup, real-time metrics, best for aggregate pass rate trends
  • PostgreSQL: more setup, but enables per-test analytics, flakiness scoring, and ad-hoc queries

Start with the Prometheus approach if you're already running Prometheus. Move to PostgreSQL when you need per-test granularity. Either way, a dashboard with pass rate trends and a top-failing-tests table will immediately surface patterns that were invisible before.
