Observability-Driven Testing: Shifting Left with Traces and Metrics

The idea: if your system emits traces and metrics in production, those same signals can power your tests in development. Observability-driven testing (ODT) treats telemetry as a first-class assertion target.

The Problem with Classic Integration Tests

A typical integration test:

test('order placement succeeds', async () => {
  const res = await fetch('/orders', { method: 'POST', body: JSON.stringify(order) });
  expect(res.status).toBe(201);
  const body = await res.json();
  expect(body.orderId).toBeDefined();
});

This test verifies the interface — the HTTP response. It says nothing about:

  • Did inventory actually reserve the stock?
  • Did payment call the right gateway?
  • Were database writes transactional?
  • Is this 100ms slower than last week?

Observability-driven testing adds those assertions.

What ODT Asserts On

Signal          Classic Test           ODT Adds
HTTP response   ✓ status code, body    —
Spans           —                      ✓ all services participated, no errors
Metrics         —                      ✓ counters incremented, latency within SLO
Logs            —                      ✓ no ERROR lines, expected audit events emitted

The ODT Testing Loop

1. Instrument your app (OpenTelemetry SDK)
2. Run a test collector in CI (Jaeger, Prometheus, Loki)
3. Execute the test scenario (HTTP, browser, CLI)
4. Wait for async export
5. Assert on telemetry — not just the API response
6. Fail the test if signals are wrong
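Step 4 is the flakiest part of the loop: fixed-length sleeps either waste time or fire before the exporter has flushed. A small polling helper smooths this out (a minimal sketch; `pollUntil` is a hypothetical name, not part of any library):

```javascript
// test-utils/poll.js
// Repeatedly evaluate an async query until it returns a truthy value,
// or throw once the timeout is exceeded. Avoids fixed-length sleeps.
async function pollUntil(queryFn, { timeoutMs = 5000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await queryFn();
    if (result) return result;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`pollUntil: no result within ${timeoutMs}ms`);
}

module.exports = { pollUntil };
```

Tests can then replace a fixed `await sleep(800)` with `await pollUntil(() => jaeger.findLatestTrace(...))`, waiting only as long as the export actually takes.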

Setting Up ODT in CI

# .github/workflows/integration.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    env:
      COLLECTOR_OTLP_ENABLED: "true"
    ports:
      - 16686:16686
      - 4318:4318

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus-test.yml:/etc/prometheus/prometheus.yml
    ports:
      - 9090:9090

# prometheus-test.yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: app
    static_configs:
      - targets: ['app:9464']

Trace Assertions

const { jaeger } = require('./test-utils/jaeger');

test('order flow — trace must show all services, no errors', async () => {
  const startMs = Date.now();
  
  await placeOrder({ sku: 'A', qty: 1 });
  await sleep(800);
  
  const trace = await jaeger.findLatestTrace('order-service', 'POST /orders', startMs);
  const services = trace.serviceNames();
  
  expect(services).toEqual(
    expect.arrayContaining(['order-service', 'inventory-service', 'payment-service'])
  );
  expect(trace.errorSpans()).toHaveLength(0);
  expect(trace.rootSpanDurationMs()).toBeLessThan(500);
});
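The `jaeger` helper above is assumed; here is a minimal sketch against Jaeger's HTTP query API (`/api/traces`). In the raw response, spans reference service names indirectly through a `processes` map, so the tests' trace-object methods are shown here as plain functions over that shape:

```javascript
// test-utils/jaeger.js -- a sketch, assuming the all-in-one container
// exposes Jaeger's query API on localhost:16686.
const JAEGER_API = 'http://localhost:16686/api';

// Distinct service names participating in a raw Jaeger trace object.
function serviceNames(trace) {
  const ids = new Set(trace.spans.map(span => span.processID));
  return [...ids].map(id => trace.processes[id].serviceName);
}

// Spans Jaeger marked as failed (error=true tag or OTel ERROR status).
function errorSpans(trace) {
  return trace.spans.filter(span =>
    (span.tags || []).some(tag =>
      (tag.key === 'error' && tag.value === true) ||
      (tag.key === 'otel.status_code' && tag.value === 'ERROR')
    )
  );
}

// Fetch the most recent matching trace; Jaeger takes microsecond timestamps.
async function findLatestTrace(service, operation, startMs) {
  const url = `${JAEGER_API}/traces?service=${encodeURIComponent(service)}` +
    `&operation=${encodeURIComponent(operation)}&start=${startMs * 1000}&limit=1`;
  const res = await fetch(url);
  const { data } = await res.json();
  return data[0]; // undefined when no trace has arrived yet
}

module.exports = { jaeger: { findLatestTrace, serviceNames, errorSpans } };
```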

Metrics Assertions with Prometheus

const { prometheus } = require('./test-utils/prometheus');

test('successful order increments order counter', async () => {
  const before = await prometheus.queryInstant('orders_total{status="success"}');
  
  await placeOrder({ sku: 'A', qty: 1 });
  await sleep(1000); // allow scrape interval
  
  const after = await prometheus.queryInstant('orders_total{status="success"}');
  
  expect(after - before).toBe(1);
});

test('payment failure increments failure counter', async () => {
  mockPayment.rejectNext('declined');
  
  const before = await prometheus.queryInstant('payment_failures_total');
  await placeOrder({ sku: 'A', qty: 1 });
  await sleep(1000);
  const after = await prometheus.queryInstant('payment_failures_total');
  
  expect(after - before).toBe(1);
});
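The before/after pattern repeats in every metrics test, so it is worth extracting. A sketch (`counterDelta` is a hypothetical helper; it takes the scenario as a callback so the two reads bracket it, and the query function is injected):

```javascript
// test-utils/metrics.js
// Read a counter, run the scenario, wait out a scrape cycle, read again.
// `queryInstant` is any function resolving a PromQL query to a number.
async function counterDelta(queryInstant, promql, action, settleMs = 1000) {
  const before = await queryInstant(promql);
  await action();                                   // scenario under test
  await new Promise(r => setTimeout(r, settleMs));  // allow a scrape interval
  const after = await queryInstant(promql);
  return after - before;
}

module.exports = { counterDelta };
```

The tests above then collapse to one line each, e.g. `expect(await counterDelta(prometheus.queryInstant, 'orders_total{status="success"}', () => placeOrder({ sku: 'A', qty: 1 }))).toBe(1);`.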

A simple Prometheus query helper:

// test-utils/prometheus.js
const PROM_API = 'http://localhost:9090/api/v1';

async function queryInstant(promql) {
  const url = `${PROM_API}/query?query=${encodeURIComponent(promql)}`;
  const res = await fetch(url);
  const json = await res.json();
  const result = json.data.result[0];
  return result ? parseFloat(result.value[1]) : 0;
}

module.exports = { prometheus: { queryInstant } };

Log Assertions

const { loki } = require('./test-utils/loki');

test('order audit event is logged', async () => {
  const startMs = Date.now();
  await placeOrder({ userId: 'u1', sku: 'A', qty: 1 });
  await sleep(500);
  
  const logs = await loki.queryRange(
    `{service="order-service"} |= "ORDER_PLACED" | json`,
    startMs,
    Date.now()
  );
  
  expect(logs).toHaveLength(1);
  expect(logs[0].fields.userId).toBe('u1');
  expect(logs[0].fields.sku).toBe('A');
});

test('no ERROR logs on successful order', async () => {
  const startMs = Date.now();
  await placeOrder({ userId: 'u1', sku: 'A', qty: 1 });
  await sleep(500);
  
  const errorLogs = await loki.queryRange(
    `{service="order-service"} |= "ERROR"`,
    startMs,
    Date.now()
  );
  
  expect(errorLogs).toHaveLength(0);
});
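The `loki` helper is likewise assumed (and a Loki container, e.g. grafana/loki, would need adding to the CI services alongside Jaeger and Prometheus). A sketch against Loki's `query_range` endpoint, which takes nanosecond timestamps and returns log lines grouped by stream:

```javascript
// test-utils/loki.js -- a sketch, assuming Loki listens on localhost:3100
// (the default port for the single-binary image).
const LOKI_API = 'http://localhost:3100/loki/api/v1';

// Flatten Loki's stream-grouped response into one list of entries,
// parsing each JSON log line into a `fields` object where possible.
function flattenStreams(result) {
  return result.flatMap(stream =>
    stream.values.map(([tsNs, line]) => {
      let fields = {};
      try { fields = JSON.parse(line); } catch { /* not a JSON line */ }
      return { tsNs, line, fields };
    })
  );
}

async function queryRange(logql, startMs, endMs) {
  const params = new URLSearchParams({
    query: logql,
    start: String(startMs * 1e6), // Loki expects nanoseconds
    end: String(endMs * 1e6),
  });
  const res = await fetch(`${LOKI_API}/query_range?${params}`);
  const json = await res.json();
  return flattenStreams(json.data.result);
}

module.exports = { loki: { queryRange, flattenStreams } };
```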

ODT for Performance Regression Detection

This is where ODT pays for itself: catching slowdowns before production.

const LATENCY_SLO_MS = {
  'order-service': 200,
  'inventory-service': 50,
  'payment-service': 300,
};

test('all services within latency SLO', async () => {
  const startMs = Date.now();
  await placeOrder({ sku: 'A', qty: 1 });
  await sleep(800);
  
  const trace = await jaeger.findLatestTrace('order-service', 'POST /orders', startMs);
  const spans = trace.spans();
  
  for (const [service, sloMs] of Object.entries(LATENCY_SLO_MS)) {
    const serviceSpans = spans.filter(s => s.service === service);
    expect(serviceSpans.length).toBeGreaterThan(0); // a missing service must fail, not pass vacuously
    serviceSpans.forEach(span => {
      expect(span.durationMs).toBeLessThan(sloMs);
    });
  }
});
});
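An assertion inside a loop stops at the first failure; a small helper (hypothetical, not from any library) that collects every SLO violation first yields one complete failure message:

```javascript
// Collect every span that exceeds its service's SLO instead of failing
// on the first one. `spans` is [{ service, durationMs }, ...].
function sloViolations(spans, sloByService) {
  return spans
    .filter(span => span.service in sloByService)
    .filter(span => span.durationMs >= sloByService[span.service])
    .map(span =>
      `${span.service}: ${span.durationMs}ms >= ${sloByService[span.service]}ms`
    );
}
```

The test body becomes `expect(sloViolations(trace.spans(), LATENCY_SLO_MS)).toEqual([]);`, and a failure lists every offending span at once.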

Run this test in CI on every PR. When a PR introduces a slow database query, this test catches it before merge.

ODT for Contract Compliance

Verify that semantic conventions are respected:

test('HTTP spans follow OTel semantic conventions', async () => {
  const startMs = Date.now();
  await placeOrder({ sku: 'A', qty: 1 });
  await sleep(800);
  
  const trace = await jaeger.findLatestTrace('order-service', 'POST /orders', startMs);
  const httpSpans = trace.spans().filter(s => s.tags['http.method']);
  
  httpSpans.forEach(span => {
    // Semantic convention: http spans must have these attributes
    expect(span.tags['http.method']).toBeDefined();
    expect(span.tags['http.status_code']).toBeDefined();
    expect(span.tags['http.url'] || span.tags['http.route']).toBeDefined();
    expect(span.tags['net.peer.name'] || span.tags['server.address']).toBeDefined();
  });
});
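The per-attribute expectations generalize into a rule table (a sketch; the rule format is invented here): each rule lists attribute keys of which at least one must be present on the span.

```javascript
// Each rule: a list of attribute keys, at least one of which must be set.
const HTTP_SPAN_RULES = [
  ['http.method'],
  ['http.status_code'],
  ['http.url', 'http.route'],
  ['net.peer.name', 'server.address'],
];

// Returns the rules a span's tags fail to satisfy, as readable strings.
function missingConventions(tags, rules = HTTP_SPAN_RULES) {
  return rules
    .filter(alternatives => !alternatives.some(key => tags[key] !== undefined))
    .map(alternatives => alternatives.join(' | '));
}
```

The forEach above then reduces to `httpSpans.forEach(span => expect(missingConventions(span.tags)).toEqual([]));`, and adding a new convention means adding one rule line.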

Structuring ODT Tests

Separate ODT assertions into their own describe block:

describe('order placement — behavioral', () => {
  test('returns 201 with orderId', async () => { /* ... */ });
  test('returns 400 for invalid sku', async () => { /* ... */ });
});

describe('order placement — observability', () => {
  test('emits complete trace across all services', async () => { /* ... */ });
  test('increments order counter', async () => { /* ... */ });
  test('logs ORDER_PLACED audit event', async () => { /* ... */ });
  test('all spans within latency SLO', async () => { /* ... */ });
});

This keeps classical and observability assertions readable and independently runnable.

Running ODT in HelpMeTest

HelpMeTest E2E scenarios trigger real traces. By pointing HelpMeTest's test environment at your Jaeger + Prometheus stack, you can run behavioral scenarios and telemetry assertions in the same CI pipeline. This validates both "the user can place an order" and "the distributed execution behind that order is correct and fast."

Summary

Observability-driven testing shifts left the signals you'd normally only see in production:

  • Trace assertions — all services participated, no error spans, spans within SLO
  • Metrics assertions — counters incremented correctly, histograms within bounds
  • Log assertions — audit events emitted, no unexpected ERROR lines

The tooling is lightweight: one Jaeger container, one Prometheus container, and a handful of query helper functions. The payoff is enormous: distributed failures caught in CI, before production, before customers.
