Distributed Tracing in Microservices Tests: Finding Failures Across Services
When a test fails in a distributed system, the error message tells you what happened but not where. A 500 response from your API gateway might be caused by a timeout in service C, waiting on a database query in service B, initiated by a bad request from service A. Without tracing, you're debugging with a flashlight in a cave.
Distributed tracing gives you a complete picture of every request as it flows through your system — and it's just as valuable in your test environment as in production.
How Distributed Tracing Works
Every request gets a trace ID when it enters the system. Each service creates a span tagged with that trace ID. Spans record start time, duration, errors, and metadata.
Collect all spans for a trace ID and you see the complete call graph: which services were called, in what order, how long each took, and where errors occurred.
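To make the reassembly step concrete, here is a minimal sketch of how a backend could turn a flat list of spans into a call graph. The `Span` fields and the `build_call_graph` and `render` helpers are hypothetical names chosen for illustration; real backends store more fields and handle clock skew, missing parents, and late-arriving spans.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the span that started the trace
    name: str
    duration_ms: int
    error: Optional[str] = None


def build_call_graph(spans, trace_id):
    """Select the spans of one trace and index them by parent span."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    known_ids = {s.span_id for s in in_trace}
    children = defaultdict(list)
    roots = []
    for span in in_trace:
        if span.parent_id in known_ids:
            children[span.parent_id].append(span)
        else:
            roots.append(span)  # no known parent: treat as an entry point
    return roots, children


def render(spans, children, depth=0):
    """Return one indented line per span, parents before their children."""
    lines = []
    for span in spans:
        status = f"ERROR: {span.error}" if span.error else "ok"
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms}ms) {status}")
        lines.extend(render(children[span.span_id], children, depth + 1))
    return lines
```

A tracing UI renders the same structure as an indented tree: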
Trace: 4f8a2b3c
├── order-service.createOrder (120ms)
│   ├── inventory-service.reserve (45ms) ✓
│   ├── payment-service.charge (68ms) ✗ ERROR: card_declined
│   └── notification-service.send (NEVER CALLED)
OpenTelemetry: The Standard
OpenTelemetry (OTel) is the vendor-neutral standard for distributed tracing. Instrument once, export to any backend.
// Node.js tracing setup
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // An explicit url is used as-is, so it must include the /v1/traces path
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT || 'http://jaeger:4318/v1/traces',
  }),
  instrumentations: [new HttpInstrumentation()],
});
sdk.start();

# Python tracing setup
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.requests import RequestsInstrumentation
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
RequestsInstrumentation().instrument()
Jaeger in Your Test Environment
Run Jaeger in Docker Compose alongside your services:
# docker-compose.test.yml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.51
    ports:
      - "16686:16686"  # UI
      - "4318:4318"    # OTLP HTTP
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
  order-service:
    environment:
      # The per-signal variable is used as-is. SDKs append /v1/traces to the
      # generic OTEL_EXPORTER_OTLP_ENDPOINT, which would double the path here.
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://jaeger:4318/v1/traces
      OTEL_SERVICE_NAME: order-service
  payment-service:
    environment:
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://jaeger:4318/v1/traces
      OTEL_SERVICE_NAME: payment-service
Trace-Based Test Assertions
The most powerful use of tracing in tests is programmatically verifying the trace after a test runs.
Return Trace IDs from Your APIs
// Express middleware: return the trace ID in a response header
const { trace } = require('@opentelemetry/api');

app.use((req, res, next) => {
  const span = trace.getActiveSpan();
  if (span) {
    res.setHeader('X-Trace-Id', span.spanContext().traceId);
  }
  next();
});
Query Traces in Tests
import requests
import time

def test_checkout_calls_payment_and_inventory():
    response = requests.post('http://order-service/checkout', json={
        'customerId': 'abc',
        'items': [{'productId': '123', 'quantity': 1}]
    })
    trace_id = response.headers['X-Trace-Id']
    # Crude wait for span export. Batch processors flush on a timer (5 s by
    # default), so poll or lower OTEL_BSP_SCHEDULE_DELAY if this is flaky.
    time.sleep(0.5)
    trace_data = requests.get(
        f'http://localhost:16686/api/traces/{trace_id}'
    ).json()
    trace = trace_data['data'][0]
    # Jaeger's JSON API links each span to its service through processID
    processes = trace['processes']
    services_called = {
        processes[span['processID']]['serviceName']
        for span in trace['spans']
    }
    assert 'payment-service' in services_called
    assert 'inventory-service' in services_called
Verify Execution Order
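The tests that follow call a `get_trace` helper. A minimal sketch of one is shown here, under a few assumptions: Jaeger's query API is reachable on `localhost:16686`, a trace that has not been stored yet comes back as HTTP 404 or an empty `data` list, and callers want `span['process']` filled in from the trace's `processes` map. The `fetch` parameter and the `min_spans` threshold are hypothetical conveniences, not part of any library.

```python
import json
import time
import urllib.error
import urllib.request

JAEGER_QUERY_URL = "http://localhost:16686"  # assumption: Jaeger UI/query port


def fetch_json(url):
    """GET a URL and decode the JSON body; return None on HTTP 404."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # trace not stored yet
        raise


def get_trace(trace_id, min_spans=1, timeout=10.0, interval=0.2, fetch=fetch_json):
    """Poll Jaeger until the trace has at least min_spans spans, then return it."""
    url = f"{JAEGER_QUERY_URL}/api/traces/{trace_id}"
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(url)
        traces = (body or {}).get("data") or []
        if traces and len(traces[0]["spans"]) >= min_spans:
            trace = traces[0]
            processes = trace.get("processes", {})
            for span in trace["spans"]:
                # Resolve processID so span["process"]["serviceName"] works
                span.setdefault("process", processes.get(span.get("processID"), {}))
            return trace
        if time.monotonic() >= deadline:
            raise TimeoutError(f"trace {trace_id} not available after {timeout}s")
        time.sleep(interval)
```

Polling with a deadline tends to be both faster and less flaky than a fixed sleep. Raising `min_spans` helps when services export their spans at different times and an early poll would otherwise return a partial trace.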
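def test_payment_called_after_inventory_reserved():
    # checkout, order_data and get_trace are project helpers. get_trace is
    # assumed to return one Jaeger trace with span['process'] resolved.
    response = checkout(order_data)
    trace = get_trace(response.headers['X-Trace-Id'])
    # Keeps the last span seen per service, which is enough for this flow
    spans = {s['process']['serviceName']: s for s in trace['spans']}
    # Jaeger reports startTime and duration in microseconds
    inventory_end = (
        spans['inventory-service']['startTime'] +
        spans['inventory-service']['duration']
    )
    payment_start = spans['payment-service']['startTime']
    assert payment_start >= inventory_end, \
        "Payment was called before inventory was reserved"
Verify Cache Hits (No DB Calls)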
def test_cached_response_skips_database():
    requests.get('http://product-service/products/123')  # warm cache
    response = requests.get('http://product-service/products/123')
    trace = get_trace(response.headers['X-Trace-Id'])
    operations = [span['operationName'] for span in trace['spans']]
    # Assumes the DB instrumentation names spans after the SQL statement
    db_calls = [op for op in operations if 'SELECT' in op]
    assert len(db_calls) == 0, f"Unexpected DB queries: {db_calls}"
Custom Business Spans
Auto-instrumentation covers HTTP and DB calls. Add custom spans for business operations:
from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer(__name__)

def process_payment(order_id, amount):
    with tracer.start_as_current_span('process_payment') as span:
        span.set_attribute('order.id', order_id)
        span.set_attribute('payment.amount', amount)
        try:
            result = charge_card(amount)
            span.set_attribute('payment.transaction_id', result['id'])
            return result
        except CardDeclinedException as e:
            span.set_status(StatusCode.ERROR)
            span.record_exception(e)
            span.set_attribute('payment.decline_code', e.code)
            raise
Verify these attributes in tests:
def test_payment_span_records_order_context():
    response = process_order(order_id='ord_123', amount=99.99)
    trace = get_trace(response.trace_id)
    payment_span = next(s for s in trace['spans']
                        if s['operationName'] == 'process_payment')
    tags = {t['key']: t['value'] for t in payment_span['tags']}
    assert tags['order.id'] == 'ord_123'
    assert tags['payment.amount'] == 99.99
Trace Assertion Library
Build a reusable assertion helper for your team:
class TraceAssertions {
  constructor(spans) {
    this.spans = spans;
  }

  static async load(traceId, { waitMs = 500 } = {}) {
    await new Promise(r => setTimeout(r, waitMs));
    const res = await fetch(`http://localhost:16686/api/traces/${traceId}`);
    const { data } = await res.json();
    const { spans, processes } = data[0];
    // Jaeger's JSON API links spans to services by processID; attach the process
    return new TraceAssertions(
      spans.map(s => ({ ...s, process: s.process ?? processes[s.processID] }))
    );
  }

  serviceCalled(name) {
    const called = this.spans.some(s => s.process.serviceName === name);
    if (!called) throw new Error(`Expected ${name} to be called. Called: ${this._services()}`);
    return this;
  }

  serviceNotCalled(name) {
    const called = this.spans.some(s => s.process.serviceName === name);
    if (called) throw new Error(`Expected ${name} NOT to be called`);
    return this;
  }

  noErrors() {
    const errors = this.spans.filter(s => s.tags.some(t => t.key === 'error' && t.value));
    if (errors.length) {
      const names = errors.map(s => `${s.process.serviceName}:${s.operationName}`);
      throw new Error(`Unexpected errors: ${names.join(', ')}`);
    }
    return this;
  }

  _services() {
    return [...new Set(this.spans.map(s => s.process.serviceName))].join(', ');
  }
}

// In tests:
it('checkout flow calls all required services', async () => {
  const response = await checkout(orderData);
  const trace = await TraceAssertions.load(response.headers['x-trace-id']);
  trace
    .serviceCalled('inventory-service')
    .serviceCalled('payment-service')
    .serviceNotCalled('fraud-detection')
    .noErrors();
});
Debugging Failures with Traces
When a test fails, use traces to pinpoint the cause:
- Get the trace ID from the failed request header or test output
- Open Jaeger UI at http://localhost:16686
- Search by trace ID to see the full call graph
- Find error spans (shown in red) and click them for stack traces and context
- Check timing to identify slow spans and unexpected call sequences
This replaces grepping through logs across multiple services.
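The last two steps can also be scripted, which helps when a CI job has no browser at hand. The sketch below assumes a trace dictionary in the shape Jaeger's JSON API returns (spans with `operationName`, `duration` in microseconds, and a `tags` list); `is_error` and `summarize` are hypothetical helper names.

```python
def is_error(span):
    """True if the span carries an error tag set to true."""
    return any(
        tag["key"] == "error" and tag["value"] in (True, "true")
        for tag in span.get("tags", [])
    )


def summarize(trace, top=3):
    """List the failing operations and the slowest spans of one trace."""
    spans = trace["spans"]
    errors = [s["operationName"] for s in spans if is_error(s)]
    slowest = sorted(spans, key=lambda s: s["duration"], reverse=True)[:top]
    return {
        "errors": errors,
        # Convert microseconds to milliseconds for readability
        "slowest": [(s["operationName"], s["duration"] / 1000) for s in slowest],
    }
```

Printing this summary next to the trace link in a failed test's output gives a first hint before anyone opens the UI.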
Log Trace IDs in Test Output
Make trace links available in test reports:
import pytest

@pytest.fixture(autouse=True)
def print_trace_url(request):
    # Tests append trace IDs to request.node.trace_ids as they run
    trace_ids = []
    request.node.trace_ids = trace_ids
    yield
    for tid in trace_ids:
        print(f"\n Trace: http://localhost:16686/trace/{tid}")
When a CI test fails, the output includes a direct Jaeger link. One click shows exactly what happened.
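The fixture only prints IDs that a test has appended to `request.node.trace_ids`. One possible way to do that is sketched below, assuming responses carry the `X-Trace-Id` header shown earlier. `record_trace_id` and `trace_url` are hypothetical helper names, and `node` is whatever object holds the `trace_ids` list (pytest's `request.node` in practice).

```python
JAEGER_UI = "http://localhost:16686"  # assumption: the Jaeger UI address used above


def trace_url(trace_id):
    """Build the Jaeger UI link for a trace ID."""
    return f"{JAEGER_UI}/trace/{trace_id}"


def record_trace_id(node, headers):
    """Append the response's X-Trace-Id (if any) to node.trace_ids and return it."""
    trace_id = headers.get("X-Trace-Id")
    if trace_id:
        # Create the list on first use so the helper also works without the fixture
        if not hasattr(node, "trace_ids"):
            node.trace_ids = []
        node.trace_ids.append(trace_id)
    return trace_id


# In a test (request is pytest's built-in fixture):
#     response = requests.post('http://order-service/checkout', json=order)
#     record_trace_id(request.node, response.headers)
```

Wrapping the HTTP client so that every response is recorded automatically would remove the per-test call, at the cost of a little more plumbing.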
Always Sample in Test Environments
Production typically samples a percentage of traces to manage volume. In tests, sample everything:
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { AlwaysOnSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  sampler: new AlwaysOnSampler(),
});
Distributed tracing turns debugging from log archaeology into visual exploration. For microservices integration tests, it's one of the highest-value tools you can add to your stack.