Testing Prometheus Alerting Rules: promtool, Unit Tests, and Alert Validation

Prometheus alerting rules can fail silently for years: a typo in a PromQL expression, a label mismatch in Alertmanager routing, or a threshold that never actually fires. promtool provides a native unit testing framework for alert rules—YAML-defined tests that validate expressions against synthetic time-series data without running a live Prometheus instance.

Key Takeaways

Alert rules are code. Test them like code. promtool test rules runs YAML-defined unit tests against your alerting rules without a live Prometheus instance—fits perfectly in CI.

Test both the firing and the non-firing conditions. An alert that fires when it shouldn't is alert fatigue. An alert that never fires when it should is an incident you missed.

Test Alertmanager routing separately. A correct alert rule is useless if Alertmanager routes it to the wrong receiver or silences it with an accidental inhibition rule.

Why Alert Rules Need Tests

Prometheus alerting rules are evaluated continuously against live time-series data. But the rules themselves are just PromQL expressions in YAML files—they are code. And like all code, they break.

Common failure modes:

  • A label name changes in the exporter (e.g., job becomes service) and the alert expression silently stops matching any series
  • A threshold is set in the wrong unit (milliseconds instead of seconds)
  • A for duration is too short, causing the alert to fire on transient spikes and page the team at 3am for nothing
  • An alert is added but the Alertmanager routing tree doesn't have a matching route, so it goes to a catch-all that nobody watches

None of these produce an error. Prometheus happily evaluates the broken expression and either never fires or fires constantly. The promtool test rules command catches these failures before they reach production.
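As a sketch of how silent these failures can be, the following rule is syntactically valid and passes promtool check rules, yet if http_request_duration_seconds is measured in seconds (as its name implies), a threshold of 500 can never be crossed (hypothetical rule, for illustration only):

# rules/latency-alerts.yaml (hypothetical)
groups:
  - name: latency
    rules:
      - alert: SlowRequests
        # Intended threshold: 0.5 seconds. Written as if the unit were
        # milliseconds, so the comparison is valid PromQL but never true.
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 500
        for: 5m
        labels:
          severity: warning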

The promtool Test Format

promtool ships with Prometheus. The test command takes a YAML file that specifies:

  1. The alerting rule file(s) to load
  2. Synthetic time-series input data
  3. The expected alert state at specific points in time

Here is a minimal example for a high error rate alert:

# rules/api-alerts.yaml
groups:
  - name: api
    rules:
      - alert: HighErrorRate
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          runbook: "https://runbooks.example.com/high-error-rate"

# tests/test-api-alerts.yaml
rule_files:
  - ../rules/api-alerts.yaml

evaluation_interval: 1m

tests:
  - name: HighErrorRate fires when error rate exceeds 5%
    interval: 1m
    input_series:
      - series: 'http_requests_total{job="api", status="200"}'
        values: "100 100 100 100 100 100"
      - series: 'http_requests_total{job="api", status="500"}'
        values: "0 2 4 6 8 10"  # increasing errors

    alert_rule_test:
      - eval_time: 2m
        alertname: HighErrorRate
        exp_alerts: []  # still PENDING: the 2m `for` window has not elapsed

      - eval_time: 5m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              job: api
              severity: critical
            exp_annotations:
              summary: "High error rate on api"
              runbook: "https://runbooks.example.com/high-error-rate"

  - name: HighErrorRate does not fire on low error rate
    interval: 1m
    input_series:
      - series: 'http_requests_total{job="api", status="200"}'
        values: "0+1000x5"
      - series: 'http_requests_total{job="api", status="500"}'
        values: "0+1x5"  # ~0.1% error rate

    alert_rule_test:
      - eval_time: 5m
        alertname: HighErrorRate
        exp_alerts: []

Run it:

promtool test rules tests/test-api-alerts.yaml

Output on success:

Unit Testing:  tests/test-api-alerts.yaml
  SUCCESS

On failure, promtool shows the expected vs actual alert state, the expression value at each evaluation step, and which labels didn't match.

Writing Unit Tests for Alert Expressions

Anatomy of input_series Values

The values field uses a compact encoding:

"0 1 2 3 4 5"       # one value per interval
"0+1x5"             # start=0, increment=1, repeat=5 times → 0 1 2 3 4 5
"0 1 _ 3 4"         # _ means "no sample" (gap in data)
"stale"             # explicitly set to stale marker

For counter metrics (like http_requests_total), use cumulative values—the rate() function computes the per-second increase:

input_series:
  - series: 'http_requests_total{job="api", status="500"}'
    values: "0+10x10"  # counter goes 0, 10, 20, ... 100
                       # rate() over 5m window ≈ 10/60 per second
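The same test files can also assert an expression's numeric result directly via promql_expr_test entries, which is useful for sanity-checking the arithmetic before wiring an expression into an alert. A minimal sketch (the series and values are illustrative):

tests:
  - name: counter rate arithmetic
    interval: 1m
    input_series:
      - series: 'http_requests_total{job="api", status="500"}'
        values: "0+30x10"  # +30 per minute → 0.5 per second
    promql_expr_test:
      - expr: rate(http_requests_total{job="api", status="500"}[5m])
        eval_time: 5m
        exp_samples:
          - labels: '{job="api", status="500"}'  # rate() drops the metric name
            value: 0.5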

Testing the for Duration

The for field means the alert expression must remain true continuously for that duration before the alert transitions from PENDING to FIRING. Test both sides:

tests:
  - name: Alert does not fire before for-duration elapses
    interval: 1m
    input_series:
      - series: 'up{job="my-service", instance="10.0.0.1:9090"}'
        values: "0 0 0 0 0"  # down from t=0

    alert_rule_test:
      # Alert rule has `for: 3m`
      - eval_time: 2m
        alertname: InstanceDown
        exp_alerts: []  # PENDING, not FIRING yet

      - eval_time: 4m
        alertname: InstanceDown
        exp_alerts:
          - exp_labels:
              job: my-service
              instance: "10.0.0.1:9090"
              severity: warning
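For reference, a rule consistent with this test might look like the following sketch (the expression is an assumption; the test file references the rule only by name):

groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0  # assumed expression for this test
        for: 3m
        labels:
          severity: warning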

Testing Multi-Label Alerts

When alerts can fire for multiple label combinations, test them explicitly:

tests:
  - name: HighMemoryUsage fires per pod
    interval: 1m
    input_series:
      - series: 'container_memory_usage_bytes{pod="api-1", namespace="production"}'
        values: "0+1073741824x6"  # 1GB increments
      - series: 'container_memory_usage_bytes{pod="api-2", namespace="production"}'
        values: "500000000+0x6"   # stays at 500MB — should not fire

    alert_rule_test:
      - eval_time: 5m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              pod: api-1
              namespace: production
              severity: warning
        # api-2 should NOT appear in exp_alerts
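Again for reference, a rule consistent with this test might be the following sketch (the 2GB threshold is an assumption chosen to sit between the two input series):

groups:
  - name: memory
    rules:
      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes > 2e9  # assumed threshold
        for: 2m
        labels:
          severity: warning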

Testing PromQL Queries with promtool query

promtool query lets you run PromQL against a live Prometheus instance from the CLI—useful for verifying your queries work correctly against real data before encoding them as alert rules:

# Instant query
promtool query instant http://prometheus:9090 \
  'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'

# Range query (last 30 minutes, 1m step)
promtool query range \
  --start=$(date -d '-30 minutes' +%s) \
  --end=$(date +%s) \
  --step=1m \
  http://prometheus:9090 \
  'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))'

Use this during rule development to validate expressions return sensible values before writing unit tests.

Alertmanager Routing Tests

An alert that fires correctly but routes to the wrong receiver is useless. Alertmanager has its own test tool: amtool check-config and amtool config routes test.

First, validate your Alertmanager config parses correctly:

amtool check-config alertmanager.yml

Then test routing for specific alert label sets:

# Test which receiver handles a critical API alert
amtool config routes test \
  --config.file=alertmanager.yml \
  severity=critical \
  team=api

# Expected output (the matched receiver name):
# pagerduty-api-critical

Write a shell-based test suite for your routing rules:

#!/bin/bash
# tests/test-alertmanager-routing.sh
set -e

AMTOOL="amtool config routes test --config.file=alertmanager.yml"

assert_receiver() {
  local labels="$1"
  local expected_receiver="$2"
  local actual
  actual=$($AMTOOL $labels)

  if [ "$actual" = "$expected_receiver" ]; then
    echo "PASS: $labels → $expected_receiver"
  else
    echo "FAIL: $labels → expected $expected_receiver, got $actual"
    exit 1
  fi
}

# Critical API alerts go to PagerDuty
assert_receiver "severity=critical team=api" "pagerduty-api-critical"

# Warning alerts go to Slack
assert_receiver "severity=warning team=api" "slack-api-warnings"

# Database critical alerts go to DB team PagerDuty
assert_receiver "severity=critical team=database" "pagerduty-db-critical"

# Watchdog (always-on) alert goes to deadman switch
assert_receiver "alertname=Watchdog" "deadman-sns"

echo "All routing tests passed."

Validating Alert Labels and Annotations

Alert labels must be consistent—they are how Alertmanager routes and deduplicates. Test them programmatically:

#!/usr/bin/env python3
# tests/validate_alert_rules.py
import yaml, sys, pathlib, re

REQUIRED_LABELS = {'severity'}
SEVERITY_VALUES = {'critical', 'warning', 'info'}
REQUIRED_ANNOTATIONS = {'summary', 'runbook'}
RUNBOOK_PATTERN = re.compile(r'^https://runbooks\.example\.com/')

errors = []

for rule_file in pathlib.Path('rules').glob('*.yaml'):
    data = yaml.safe_load(rule_file.read_text())
    for group in data.get('groups', []):
        for rule in group.get('rules', []):
            if 'alert' not in rule:
                continue  # recording rule, skip

            name = rule['alert']
            labels = rule.get('labels', {})
            annotations = rule.get('annotations', {})

            for required in REQUIRED_LABELS:
                if required not in labels:
                    errors.append(f"{rule_file}:{name}: missing required label '{required}'")

            if labels.get('severity') not in SEVERITY_VALUES:
                errors.append(
                    f"{rule_file}:{name}: severity '{labels.get('severity')}' "
                    f"not in {SEVERITY_VALUES}"
                )

            for required in REQUIRED_ANNOTATIONS:
                if required not in annotations:
                    errors.append(f"{rule_file}:{name}: missing annotation '{required}'")

            runbook = annotations.get('runbook', '')
            if runbook and not RUNBOOK_PATTERN.match(runbook):
                errors.append(f"{rule_file}:{name}: runbook URL format incorrect: {runbook}")

if errors:
    for e in errors:
        print(f"ERROR: {e}", file=sys.stderr)
    sys.exit(1)

print(f"Validated all alert rules in rules/ — no issues found.")

CI Integration

Add all tests to CI so alert regressions are caught before rules are applied to the cluster:

# .github/workflows/prometheus-rules.yaml
name: Validate Prometheus Rules

on:
  pull_request:
    paths:
      - 'rules/**'
      - 'tests/test-*.yaml'
      - 'alertmanager.yml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download promtool
        run: |
          wget -q https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
          tar xzf prometheus-2.51.0.linux-amd64.tar.gz
          sudo mv prometheus-2.51.0.linux-amd64/promtool /usr/local/bin/

      - name: Validate rule syntax
        run: promtool check rules rules/*.yaml

      - name: Run alert unit tests
        run: |
          for test_file in tests/test-*.yaml; do
            echo "Running: $test_file"
            promtool test rules "$test_file"
          done

      - name: Validate alert metadata
        run: python3 tests/validate_alert_rules.py

      - name: Install amtool
        run: |
          wget -q https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
          tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
          sudo mv alertmanager-0.27.0.linux-amd64/amtool /usr/local/bin/

      - name: Validate Alertmanager config
        run: amtool check-config alertmanager.yml

      - name: Test Alertmanager routing
        run: bash tests/test-alertmanager-routing.sh

Every PR that touches alerting rules or Alertmanager config must pass these checks before merge. No more "I thought that route matched" postmortems.

Common Pitfalls

Rate windows shorter than the scrape interval. If your scrape interval is 30s and you write rate(metric[15s]), the window can never contain two samples, so the expression returns no data and the alert can never fire. Test with realistic values.
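You can capture this pitfall in a unit test: with a 1m sample interval, a 15s window never contains two samples, so rate() returns no data. A sketch (the metric name is illustrative; an empty exp_samples asserts that the expression returns nothing):

tests:
  - name: window shorter than sample interval yields no data
    interval: 1m  # samples arrive once per minute
    input_series:
      - series: 'my_metric_total'
        values: "0+10x10"
    promql_expr_test:
      - expr: rate(my_metric_total[15s])
        eval_time: 5m
        exp_samples: []  # a 15s window never holds two samples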

Missing by clause on alerts that should be per-instance. sum(rate(...)) aggregates everything—no label to route on. Test that your exp_labels in unit tests actually have the labels Alertmanager needs.
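The contrast looks like this (using a node_exporter metric for illustration; only the second form produces alerts that carry an instance label for routing):

# Fires as a single alert with no routing labels:
expr: sum(rate(node_network_receive_errs_total[5m])) > 10

# Fires one alert per instance, which Alertmanager can route on:
expr: sum by (instance) (rate(node_network_receive_errs_total[5m])) > 10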

for duration shorter than twice the evaluation interval. A single empty or flapping evaluation resets the PENDING state, so the alert can bounce between PENDING and inactive without ever becoming FIRING. Test with evaluation intervals that reflect your real setup.

