Testing with Datadog: APM Traces, Metric Gates, and Synthetic Monitoring

Datadog is not just a dashboard—it's a test surface. This guide covers asserting APM trace structure in tests, using Datadog Synthetics as CI quality gates, enforcing p99 latency thresholds before deployment, and mocking the Datadog agent so unit tests stay fast and isolated.

Key Takeaways

APM traces are assertions. Treat span structure, service names, and error tags as first-class test assertions—not just things to observe after the fact.

Synthetic monitors can gate deploys. Datadog Synthetics has an API that lets you block a CI pipeline until a browser or API test passes against your staging environment.

Metric-based quality gates prevent latency regressions. Query Datadog metrics before merging—if p99 latency exceeds your SLO, the deploy doesn't ship.

Why Datadog Belongs in Your Test Strategy

Most teams treat Datadog as the place you go after something breaks. You open it, you find the problem, you fix it. But that reactive loop has a flaw: by the time you're in Datadog, the breakage has already happened in production.

The better model is to bring Datadog into the pre-deploy phase. Synthetic monitors run against staging. APM trace structure gets asserted in integration tests. Metric gates query the API before you merge. Datadog becomes part of the definition of "done," not the postmortem tool.

This guide covers exactly how to do that across four areas: APM trace assertions, Synthetics as CI gates, metric-based quality gates, and unit-test-friendly mocking of the Datadog agent.

Asserting APM Trace Structure in Tests

When your service instruments with dd-trace, every request produces spans. Those spans have attributes—service name, resource, error flag, HTTP status, custom tags. You can assert all of this in integration tests.

The trick is running the Datadog agent in your test environment (or using a stub endpoint) and capturing what the tracer sends.

Here is a Node.js sketch of a test transport that captures spans in memory; hooking it into dd-trace's span export path varies by tracer version:

// test/helpers/trace-capture.js
// A minimal in-memory collector: tests read captured spans from exporter.spans.
class InMemoryExporter {
  constructor() { this.spans = []; }
  export(spans) { this.spans.push(...spans); }
  clear() { this.spans = []; }
}

const exporter = new InMemoryExporter();

// In your test setup, route the tracer's output to this exporter instead of a
// real agent (for example, by swapping in a no-op writer and forwarding spans
// here).
module.exports = { exporter };

A cleaner approach for most setups is to run a lightweight Datadog agent stub that accepts traces over HTTP and exposes them for inspection:

# test/fixtures/dd_agent_stub.py
from http.server import HTTPServer, BaseHTTPRequestHandler
import json, threading

import msgpack  # most Datadog tracers msgpack-encode trace payloads

received_traces = []

class DDAgentHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        length = int(self.headers['Content-Length'])
        body = self.rfile.read(length)
        # dd-trace clients typically send msgpack (e.g. to /v0.4/traces);
        # fall back to JSON for clients configured to send it.
        try:
            traces = msgpack.unpackb(body, raw=False)
        except Exception:
            traces = json.loads(body)
        received_traces.extend(traces)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'{}')

    # Some tracer versions POST instead of PUT
    do_POST = do_PUT

    def log_message(self, *args):
        pass  # silence request logs in test output

def start_stub(port=8126):
    server = HTTPServer(('127.0.0.1', port), DDAgentHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server

With this stub running, set DD_AGENT_HOST=127.0.0.1 and DD_TRACE_AGENT_PORT=8126 in your test environment (pinning DD_TRACE_API_VERSION=v0.4, where the tracer supports it, keeps each span as a plain dict of service, resource, meta, and error fields). Now your service sends real traces to the stub, and you can assert against received_traces:

def test_checkout_span_has_correct_service_and_resource():
    received_traces.clear()
    client.post('/checkout', json={'cart_id': 'abc123'})

    spans = [s for trace in received_traces for s in trace]
    checkout_span = next(s for s in spans if s['resource'] == 'POST /checkout')

    assert checkout_span['service'] == 'checkout-api'
    assert checkout_span['meta'].get('http.status_code') == '200'
    assert checkout_span['error'] == 0
    assert checkout_span['meta'].get('cart.id') == 'abc123'

This catches regressions where instrumentation silently breaks—service names get renamed during a refactor, custom tags disappear, error flags get set incorrectly.
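
One way to wire the stub into a pytest suite is a session-scoped fixture that starts the server and points the tracer at it before the app under test initializes. A minimal sketch, assuming the stub module above is importable as fixtures.dd_agent_stub from the test directory:

# test/conftest.py
import os
import pytest

from fixtures.dd_agent_stub import start_stub, received_traces

@pytest.fixture(scope="session", autouse=True)
def dd_agent_stub():
    # Point the tracer at the stub before the application under test starts
    os.environ["DD_AGENT_HOST"] = "127.0.0.1"
    os.environ["DD_TRACE_AGENT_PORT"] = "8126"
    server = start_stub(port=8126)
    yield server
    server.shutdown()

@pytest.fixture(autouse=True)
def clean_traces():
    # Every test starts with an empty capture buffer
    received_traces.clear()
    yield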

Using Datadog Synthetics as CI Gates

Datadog Synthetics lets you define browser and API tests through the Datadog UI or as code via Terraform or the API. These tests run against real URLs—ideal for staging environments.

The key CI integration point is the datadog-ci CLI:

npm install -g @datadog/datadog-ci

# Run a specific synthetic test and block until it passes or times out
datadog-ci synthetics run-tests \
  --public-id abc-123-xyz \
  --config ./datadog-ci.json \
  --timeout 120

Your datadog-ci.json configuration:

{
  "apiKey": "${DD_API_KEY}",
  "appKey": "${DD_APP_KEY}",
  "datadogSite": "datadoghq.com",
  "failOnCriticalErrors": true,
  "failOnTimeout": true,
  "locations": ["aws:us-east-1"]
}

In your GitHub Actions workflow, this becomes a blocking step after deploy to staging:

- name: Deploy to staging
  run: ./scripts/deploy.sh staging

- name: Run Datadog Synthetic gates
  env:
    DD_API_KEY: ${{ secrets.DD_API_KEY }}
    DD_APP_KEY: ${{ secrets.DD_APP_KEY }}
  run: |
    datadog-ci synthetics run-tests \
      --public-id ${{ vars.DD_SYNTHETIC_CHECKOUT_TEST_ID }} \
      --public-id ${{ vars.DD_SYNTHETIC_AUTH_TEST_ID }} \
      --timeout 180

- name: Deploy to production
  if: success()
  run: ./scripts/deploy.sh production

If either synthetic test fails, the deploy to production never runs. You have effectively made your Datadog monitor a mandatory CI check.

For defining synthetics as code (so they live in your repository), use the Datadog Terraform provider, or keep a JSON test definition in the repo and push it through the Synthetics API:

{
  "config": {
    "request": {
      "method": "POST",
      "url": "https://staging.example.com/api/checkout",
      "body": "{\"cart_id\": \"test-123\"}",
      "headers": { "Content-Type": "application/json" }
    },
    "assertions": [
      { "type": "statusCode", "operator": "is", "target": 200 },
      { "type": "responseTime", "operator": "lessThan", "target": 500 },
      { "type": "body", "operator": "contains", "target": "order_id" }
    ]
  },
  "name": "Checkout API - Synthetic Gate",
  "type": "api",
  "status": "live"
}

Metric-Based Quality Gates: p99 Latency Before Deploy

Beyond synthetics, you can query live Datadog metrics before allowing a deploy. This is useful for p99 latency SLOs, error rate thresholds, and saturation checks.

The Datadog Metrics API supports time-series queries. Here is a Python script you can run as a CI step:

#!/usr/bin/env python3
# scripts/check_metric_gates.py
import os, sys, time
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.metrics_api import MetricsApi

config = Configuration()
config.api_key['apiKeyAuth'] = os.environ['DD_API_KEY']
config.api_key['appKeyAuth'] = os.environ['DD_APP_KEY']

NOW = int(time.time())
LOOKBACK = 900  # 15 minutes

GATES = [
    {
        "query": "p99:trace.express.request{service:checkout-api}",
        "threshold": 400,  # ms
        "name": "checkout p99 latency",
    },
    {
        "query": "sum:trace.express.request.errors{service:checkout-api}.as_rate()",
        "threshold": 0.01,  # 1% error rate
        "name": "checkout error rate",
    },
]

failed = []

with ApiClient(config) as api_client:
    api = MetricsApi(api_client)
    for gate in GATES:
        result = api.query_metrics(
            _from=NOW - LOOKBACK,
            to=NOW,
            query=gate["query"]
        )
        series = result.series
        if not series or not series[0].pointlist:
            print(f"WARNING: No data for {gate['name']} — skipping gate")
            continue

        # Take the most recent data point; each Point wraps a [timestamp, value] pair
        latest_value = series[0].pointlist[-1].value[1]
        if latest_value > gate["threshold"]:
            failed.append(
                f"GATE FAILED: {gate['name']} = {latest_value:.2f} "
                f"(threshold: {gate['threshold']})"
            )
        else:
            print(f"GATE PASSED: {gate['name']} = {latest_value:.2f}")

if failed:
    for f in failed:
        print(f, file=sys.stderr)
    sys.exit(1)

print("All metric gates passed.")

Add this as a CI step before production deploy:

- name: Check Datadog metric gates
  env:
    DD_API_KEY: ${{ secrets.DD_API_KEY }}
    DD_APP_KEY: ${{ secrets.DD_APP_KEY }}
  run: python3 scripts/check_metric_gates.py

If your p99 on staging is already above 400ms before you even deploy, you know about it before it hits production.
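
One caveat: the script gates on a single trailing data point, which can be noisy. A small variation, reusing the same Point shape returned by the query, averages the last few points instead:

def recent_average(pointlist, n=5):
    # Each Point wraps a [timestamp, value] pair; skip gaps where value is None
    values = [p.value[1] for p in pointlist[-n:] if p.value[1] is not None]
    return sum(values) / len(values) if values else None

# In the gate loop, replace the single-point read with:
# latest_value = recent_average(series[0].pointlist)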

Mocking the Datadog Agent in Unit Tests

Unit tests must be fast and isolated. You do not want to spin up a real Datadog agent or make HTTP calls to localhost:8126 during a Jest or pytest run.

In JavaScript, mock the dd-trace module:

// jest.config.js
module.exports = {
  moduleNameMapper: {
    // keys are regexes; anchor the pattern so only the top-level module is remapped
    '^dd-trace$': '<rootDir>/test/__mocks__/dd-trace.js',
  },
};

// test/__mocks__/dd-trace.js
const spans = [];

const mockSpan = {
  setTag: jest.fn().mockReturnThis(),
  finish: jest.fn(),
  log: jest.fn(),
  context: () => ({ toTraceId: () => 'mock-trace-id' }),
};

// dd-trace exports a tracer whose init() returns the tracer itself, so code
// written as `const tracer = require('dd-trace').init()` keeps working.
const mockTracer = {
  init: jest.fn(() => mockTracer),
  startSpan: jest.fn(() => {
    spans.push(mockSpan);  // record every span started, for assertions
    return mockSpan;
  }),
  scope: () => ({ active: () => mockSpan }),
  inject: jest.fn(),
  extract: jest.fn(),
  _spans: spans,
};

module.exports = mockTracer;

In Python, mock using unittest.mock:

from unittest.mock import patch, MagicMock

# Patch the tracer where the code under test looks it up; if payments.py does
# `from ddtrace import tracer`, patch 'myapp.payments.tracer' instead.
@patch('ddtrace.tracer')
def test_payment_span_tagged_correctly(mock_tracer):
    mock_span = MagicMock()
    # start_span() is used as a context manager, so __enter__ yields our span
    mock_tracer.start_span.return_value.__enter__.return_value = mock_span

    process_payment(amount=100, currency='USD')

    mock_span.set_tag.assert_any_call('payment.amount', 100)
    mock_span.set_tag.assert_any_call('payment.currency', 'USD')
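
The same pattern covers error paths. A hedged sketch, assuming a hypothetical PaymentDeclined exception and a process_payment branch that tags the span before re-raising:

import pytest
from unittest.mock import patch, MagicMock

@patch('ddtrace.tracer')
def test_declined_payment_tags_span_error(mock_tracer):
    mock_span = MagicMock()
    mock_tracer.start_span.return_value.__enter__.return_value = mock_span

    # PaymentDeclined and the card argument are illustrative, not ddtrace APIs
    with pytest.raises(PaymentDeclined):
        process_payment(amount=100, currency='USD', card='declined-card')

    mock_span.set_tag.assert_any_call('error.type', 'PaymentDeclined')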

Datadog CI Visibility for Test Analytics

Datadog CI Visibility ingests your test results and correlates them with deployments and services. It answers questions like: "Which tests are flaky?", "Which tests regressed after this deploy?", "What is the p95 test duration for my suite?"

Enable it in pytest:

pip install ddtrace

DD_ENV=ci DD_SERVICE=my-service \
  pytest tests/ \
  --ddtrace \
  --junit-xml=test-results.xml

For Jest, load the dd-trace CI initializer through NODE_OPTIONS (jest-circus, the default runner in current Jest versions, is the runner dd-trace supports):

npm install --save-dev dd-trace

NODE_OPTIONS="-r dd-trace/ci/init" \
  DD_ENV=ci DD_SERVICE=frontend \
  npx jest

If your runner already emits JUnit XML, datadog-ci junit upload can ship those reports to CI Visibility instead.

With CI Visibility active, you get test flakiness detection, duration baselines, and correlation between test failures and the commits that caused them—all inside Datadog without any extra tooling.

Putting It Together

A complete Datadog-integrated CI pipeline looks like this:

  1. Run unit tests with the Datadog agent mocked (fast, no network)
  2. Deploy to staging with DD_ENV=staging and real agent sidecar
  3. Run integration tests that assert span structure via the agent stub
  4. Run Datadog Synthetics against the staging URL
  5. Query the Metrics API for p99 and error rate gates
  6. If all pass, deploy to production

Each step builds confidence that the observability contract—not just the functional behavior—is correct before code reaches users.


HelpMeTest can monitor your observability stack automatically — sign up free
