bpftrace for Tracing Tests: Using bpftrace in CI/Testing Workflows
Modern software testing has a visibility problem. Unit tests tell you whether your code returns the right value. Integration tests confirm your services talk to each other. But neither tells you what the kernel is actually doing while your application runs — which syscalls fire, how long disk I/O blocks, whether your network stack is behaving. bpftrace fills that gap, and integrating it into your CI pipeline unlocks a class of observability that no application-level test framework can provide.
What Is bpftrace?
bpftrace is a high-level tracing language for Linux built on eBPF (extended Berkeley Packet Filter). It lets you attach small programs to kernel events — system calls, kprobes, uprobes, tracepoints — and collect data with almost zero overhead. Think of it as DTrace for Linux, but with a one-liner syntax that actually fits in a shell script.
A simple bpftrace program looks like this:
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s opened %s\n", comm, str(args->filename)); }'
Run that alongside your test suite and you get a live log of every file opened on the machine, including those your application opens. No code changes, no instrumentation libraries, no agent to install.
Why bpftrace Belongs in Your CI Pipeline
Traditional CI observability is limited to what your application explicitly logs. If a test fails because of a file descriptor leak, a DNS timeout, or unexpected disk flushing, your application logs might say nothing useful. bpftrace lets you observe the system your application runs on top of.
Concrete use cases where bpftrace catches what application-level tests miss:
- Syscall regression testing: verify that a code change does not introduce unexpected syscalls (e.g., a library update that starts calling fork() in a hot path)
- Latency profiling in tests: measure p99 latency of read() or write() calls during a load test, not just application-level response times
- File access auditing: confirm that a process only touches the files it should during a test run
- Memory allocation tracing: catch unexpected mmap() calls that signal memory leaks
Setting Up bpftrace in CI
Prerequisites
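bpftrace requires Linux kernel 4.9 or later and, in most setups, root. The finer-grained CAP_BPF and CAP_PERFMON capabilities exist only on kernel 5.8 and later, and older bpftrace releases refuse to run as a non-root user regardless of capabilities. Most CI runners on bare metal or VMs support this. Containers require --privileged or the appropriate capability set; check your CI provider's documentation.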
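A CI job can check the kernel requirement before it tries to trace anything. The following is a minimal sketch, not part of bpftrace: `parse_kernel_release`, `meets_minimum`, and `current_kernel_ok` are hypothetical helper names, and the default minimum of 4.9 is taken from the requirement stated above.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def meets_minimum(release: str, minimum: Tuple[int, int] = (4, 9)) -> bool:
    """Return True when the release is at least the given (major, minor)."""
    return parse_kernel_release(release) >= minimum


def current_kernel_ok(minimum: Tuple[int, int] = (4, 9)) -> bool:
    """Check the kernel of the machine this script runs on."""
    return meets_minimum(platform.release(), minimum)
```

Passing `(5, 8)` as the minimum checks instead for the kernel release that introduced CAP_BPF and CAP_PERFMON, which matters if the job relies on capabilities rather than root.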
Install bpftrace on Ubuntu/Debian:
apt-get install -y bpftrace
On RHEL/CentOS:
dnf install -y bpftrace
A Minimal CI Integration Pattern
Here is a GitHub Actions workflow step that runs bpftrace alongside your test suite and saves the trace output as a build artifact:
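- name: Run tests with bpftrace syscall tracing
  run: |
    # Start bpftrace in the background and write to a file.
    # Assumes the step runs as root, which bpftrace needs.
    bpftrace -e '
      tracepoint:syscalls:sys_enter_openat /comm == "your-app"/ {
        printf("%lld %s %s\n", elapsed, comm, str(args->filename));
      }
    ' > bpftrace-output.txt &
    BPFTRACE_PID=$!
    # Give bpftrace a moment to attach its probes before the tests start
    sleep 2
    # Run your actual test suite; capture the exit code without tripping bash -e
    TEST_EXIT=0
    ./run-tests.sh || TEST_EXIT=$?
    # Stop bpftrace with SIGINT so it flushes its output, then wait for it to exit
    kill -INT "$BPFTRACE_PID" || true
    wait "$BPFTRACE_PID" || true
    exit "$TEST_EXIT"
- name: Upload bpftrace trace
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: bpftrace-syscall-trace
    path: bpftrace-output.txt
This gives you a syscall log for every CI run without changing a line of application code.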
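The saved trace can also gate the build instead of only being archived. The sketch below assumes the three-field line format printed by the workflow's printf (elapsed nanoseconds, comm, filename). The allowlist prefixes and the function names `parse_trace` and `unexpected_files` are hypothetical and would need to match your application.

```python
from typing import Iterable, List, Tuple

# Hypothetical allowlist: path prefixes the application may open during tests.
ALLOWED_PREFIXES = ("/usr/lib/", "/etc/", "/tmp/test-data/")


def parse_trace(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Parse '<elapsed_ns> <comm> <filename>' lines, skipping anything else."""
    events = []
    for line in lines:
        # Split on the first two spaces only, so filenames may contain spaces.
        parts = line.rstrip("\n").split(" ", 2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # e.g. the "Attaching 1 probe..." banner or blank lines
        events.append((int(parts[0]), parts[1], parts[2]))
    return events


def unexpected_files(
    events: Iterable[Tuple[int, str, str]],
    allowed: Tuple[str, ...] = ALLOWED_PREFIXES,
) -> List[str]:
    """Return the sorted, de-duplicated filenames outside the allowlist."""
    return sorted({name for _, _, name in events if not name.startswith(allowed)})
```

A later CI step could open bpftrace-output.txt, pass the file object to `parse_trace`, and fail when `unexpected_files` returns anything. Relative paths passed to openat() appear as-is in the trace, so a prefix allowlist like this one will flag them.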
Writing Useful bpftrace Scripts for Testing
Tracing Test Duration by Syscall Category
This script groups syscall time into categories during a test run — useful for diagnosing slow tests:
#!/usr/bin/env bpftrace
BEGIN {
  printf("Tracing syscalls. Hit Ctrl-C to end.\n");
}
tracepoint:syscalls:sys_enter_read,
tracepoint:syscalls:sys_enter_write,
tracepoint:syscalls:sys_enter_openat {
  @start[tid] = nsecs;
}
tracepoint:syscalls:sys_exit_read,
tracepoint:syscalls:sys_exit_write,
tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
  // Key the histogram by the exit probe name; probe is already a string
  @latency[probe] = hist(nsecs - @start[tid]);
  delete(@start[tid]);
}
END {
  // Drop leftover entries so only @latency is printed on exit
  clear(@start);
}
Run this during a slow integration test and the histogram output will show whether the time goes to reads, writes, or file opens.
Detecting Unexpected Network Connections
Security-sensitive test environments need to verify that applications only make the network connections they are supposed to. This bpftrace script logs every connect() call during a test run:
tracepoint:syscalls:sys_enter_connect {
  printf("CONNECT: pid=%d comm=%s\n", pid, comm);
}
Pipe the output through a grep or awk filter in your CI step to fail the build if unexpected outbound connections appear. As written, the script records the calling process but not the destination address, so the filter can only allowlist by process name.
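The filtering step can also be written in Python when grep becomes unwieldy. This sketch assumes the exact `CONNECT: pid=... comm=...` line format printed by the script above. The allowlist contents and the names `find_violations` and `exit_code` are hypothetical.

```python
import re
from typing import FrozenSet, Iterable, List, Tuple

CONNECT_RE = re.compile(r"^CONNECT: pid=(\d+) comm=(.+)$")

# Hypothetical allowlist of process names permitted to call connect().
ALLOWED_COMMS = frozenset({"your-app", "curl"})


def find_violations(
    lines: Iterable[str], allowed: FrozenSet[str] = ALLOWED_COMMS
) -> List[Tuple[int, str]]:
    """Return (pid, comm) for each CONNECT line whose comm is not allowed."""
    violations = []
    for line in lines:
        m = CONNECT_RE.match(line.strip())
        if m and m.group(2) not in allowed:
            violations.append((int(m.group(1)), m.group(2)))
    return violations


def exit_code(lines: Iterable[str]) -> int:
    """Report violations and return 1 (fail the build) if any exist, else 0."""
    violations = find_violations(lines)
    for pid, comm in violations:
        print(f"unexpected connect() from pid={pid} comm={comm}")
    return 1 if violations else 0
```

In a CI step, the script's entry point could be `sys.exit(exit_code(sys.stdin))` with the bpftrace output piped in.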
Measuring Lock Contention Under Test Load
Mutex contention causes intermittent slowness that is notoriously hard to reproduce. bpftrace can surface it:
// The library path is distro-specific; with glibc 2.34+ the symbol lives in libc.so.6
uprobe:/usr/lib/x86_64-linux-gnu/libpthread.so.0:pthread_mutex_lock {
  @lock_start[tid] = nsecs;
}
uretprobe:/usr/lib/x86_64-linux-gnu/libpthread.so.0:pthread_mutex_lock
/@lock_start[tid]/
{
  @wait_ns = hist(nsecs - @lock_start[tid]);
  delete(@lock_start[tid]);
}
END { print(@wait_ns); }
If this histogram shows long wait times, your parallel tests may be fighting over shared locks, a common source of intermittent slowness and timing-dependent failures.
Integrating bpftrace Output into Test Reports
Raw bpftrace output is text. You can parse it in post-processing to enrich test reports. A Python snippet to parse latency histograms into JSON for a CI dashboard:
import json
import re
import subprocess

# The bpftrace program must exit on its own before the timeout expires
result = subprocess.run(
    ["bpftrace", "-e", "...", "--no-warnings"],
    capture_output=True, text=True, timeout=300
)
# Parse @latency output
lines = result.stdout.splitlines()
data = {}
current_key = None
for line in lines:
    # Map header, e.g. "@latency[key]:"
    m = re.match(r'^@(\w+)\[(.+)\]:', line)
    if m:
        current_key = m.group(2)
        data[current_key] = []
    # Histogram rows start with "[", usually with no leading whitespace
    elif current_key and re.match(r'^\s*\[', line):
        data[current_key].append(line.strip())
print(json.dumps(data, indent=2))
Feed this JSON into your test reporting pipeline to track syscall latency trends across builds.
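Trend tracking usually needs numbers rather than raw histogram rows. The sketch below turns rows such as `[1K, 2K)  96 |@@@|` into numeric buckets and derives a coarse percentile bound. It assumes bpftrace's power-of-two hist() layout with K/M/G/T suffixes meaning powers of 1024. `parse_bucket` and `upper_bound_for_percentile` are hypothetical names, and the result is only as precise as the bucket width.

```python
import re
from typing import List, Optional, Tuple

UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches "[low, high)  count |bars|" and single-value rows like "[1]  count |bars|".
BUCKET_RE = re.compile(
    r"^\[(\d+)([KMGT]?)(?:\]|, (\d+)([KMGT]?)\))\s+(\d+)\s*\|"
)


def parse_bucket(row: str) -> Optional[Tuple[int, int, int]]:
    """Return (low, high_exclusive, count) for a histogram row, else None."""
    m = BUCKET_RE.match(row.strip())
    if m is None:
        return None
    low = int(m.group(1)) * UNITS[m.group(2)]
    if m.group(3) is None:
        high = low + 1  # single-value bucket such as "[1]"
    else:
        high = int(m.group(3)) * UNITS[m.group(4)]
    return low, high, int(m.group(5))


def upper_bound_for_percentile(
    buckets: List[Tuple[int, int, int]], pct: float
) -> Optional[int]:
    """Return the upper edge of the bucket that contains the given percentile."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return None
    target = total * pct / 100
    seen = 0
    for _, high, count in buckets:
        seen += count
        if seen >= target:
            return high
    return buckets[-1][1]
```

Because each bucket spans a factor of two, this gives an upper bound on the percentile rather than an exact value. That is usually enough to spot a regression between builds.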
Combining bpftrace with Higher-Level Test Automation
For teams running tests at scale, bpftrace provides the low-level kernel layer while tools like HelpMeTest handle the higher-level application and API testing. HelpMeTest's AI-powered Robot Framework and Playwright runner orchestrates browser and API tests across your stack, while bpftrace scripts run alongside to capture the kernel-level behavior those tests trigger. You get full-stack observability: what the user sees (captured by HelpMeTest's test runner) and what the kernel does (captured by bpftrace) in the same CI run.
The two tools complement each other cleanly. HelpMeTest's Pro plan ($100/month) handles the test orchestration layer; bpftrace handles the observability layer. Neither duplicates the other.
Gotchas and Best Practices
Kernel version matters. Some tracepoints and helper functions require specific kernel versions. Always test your bpftrace scripts on the same kernel your CI runners use. CI providers sometimes upgrade kernel versions silently.
Permission model. Running bpftrace as root in CI is a security tradeoff. Prefer capability-based approaches (CAP_BPF, CAP_PERFMON) where your CI provider supports it, rather than --privileged containers.
Output volume. bpftrace can generate large output files if you trace high-frequency events (like read() on every file descriptor). Apply comm filters to scope traces to your application process. The kernel truncates comm to 15 characters, so match on the truncated name for longer binaries:
/comm == "myapp"/
Scripting reproducibility. Pin your bpftrace version in CI using a container image or version-locked package. bpftrace language syntax has evolved significantly across versions.
Don't trace in production without sampling. bpftrace's overhead is low but not zero. For production tracing, add sampling:
tracepoint:syscalls:sys_enter_read {
  if (rand % 100 == 0) {
    @samples = count();
  }
}
A Full Example: Catching a File Descriptor Leak in CI
Here is a complete real-world scenario: your integration tests pass, but you suspect a file descriptor leak is building up over long test runs. This bpftrace script tracks open and close calls for your process:
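#!/bin/bash
PID=$(pgrep -o your-app)  # oldest matching process
bpftrace -e "
// Tracepoints expose the syscall return value as args->ret
tracepoint:syscalls:sys_exit_openat /pid == $PID && args->ret >= 0/ {
  @fds[args->ret] = 1;
  @open_count = count();
}
tracepoint:syscalls:sys_enter_close /pid == $PID/ {
  delete(@fds[args->fd]);
  @close_count = count();
}
// Stop tracing after 60 seconds
interval:s:60 { exit(); }
END {
  // len() on a map and count() values in printf may need a recent bpftrace release
  printf(\"Open: %d, Close: %d, Leaked FDs: %d\n\",
    @open_count, @close_count, len(@fds));
}
"
Run this during your test suite. A process legitimately holds some descriptors open at any moment, so a small non-zero value is normal. If Leaked FDs keeps growing between runs, or stays well above what your application normally holds open, you have likely found your bug.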
Conclusion
bpftrace brings kernel-level visibility into testing workflows without requiring application code changes. By running bpftrace scripts alongside your existing CI test suites, you can detect syscall regressions, measure I/O latency histograms, audit file access patterns, and catch resource leaks — all in the same pipeline that runs your unit and integration tests. Start with a single script that traces file opens for your application process, commit it to your CI configuration, and build from there. The kernel tells a different story than your application logs, and bpftrace is how you read it.