Testing Cilium Tetragon Security Policies
Cilium Tetragon brings eBPF-powered runtime security enforcement to Kubernetes. It can block file access, kill processes that violate policy, and generate structured security events at the kernel level — all without modifying your application. But like any security control, Tetragon policies are only as good as the testing behind them. An untested security policy is a liability: you do not know if it actually blocks what you think it blocks, and you do not know if a future Tetragon upgrade will change the behavior.
This post covers how to test Tetragon TracingPolicy resources systematically — from unit-level policy validation through integration tests that verify enforcement in a live cluster.
Understanding Tetragon's Testing Surface
Tetragon has three layers that need testing:
- Policy syntax: does the TracingPolicy YAML parse and load without errors?
- Event generation: does the policy generate events when the matching condition occurs?
- Enforcement: does the policy's action (SIGKILL, override, etc.) actually block the behavior?
Most teams test only layer 1 (it loads, CI is green). Layers 2 and 3 are where real security assurance lives.
The TracingPolicy Under Test
Throughout this post we will test a policy that blocks file writes to /etc/passwd from any process except login:
# policies/block-passwd-write.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-passwd-write
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"
    - index: 1
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/passwd"
      - index: 1
        operator: "Equal"
        values:
        - "2" # MAY_WRITE
      matchBinaries:
      - operator: "NotIn"
        values:
        - "/bin/login"
      matchActions:
      - action: Sigkill
Layer 1: Policy Validation
Before applying a policy to any cluster, validate its schema. Tetragon ships a JSON schema for TracingPolicy; use it in CI:
#!/bin/bash
# scripts/validate-policies.sh
SCHEMA_URL="https://raw.githubusercontent.com/cilium/tetragon/main/vendor/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v1alpha1/crds/cilium.io_tracingpolicies.yaml"
# Install kubeconform if not present
command -v kubeconform >/dev/null || go install github.com/yannh/kubeconform/cmd/kubeconform@latest
for policy_file in policies/*.yaml; do
  echo "Validating: $policy_file"
  # Test the command directly instead of inspecting $? afterwards
  if ! kubeconform \
      -schema-location "$SCHEMA_URL" \
      -strict \
      "$policy_file"; then
    echo "FAIL: $policy_file"
    exit 1
  fi
  echo "PASS: $policy_file"
done
Also validate with kubectl --dry-run:
kubectl apply --dry-run=client -f policies/block-passwd-write.yaml
This catches syntax errors without changing anything in the cluster. kubectl still needs access to a cluster where the TracingPolicy CRD is installed so that it can resolve the kind.
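Schema validation confirms that field names and types are right, but not that the policy is internally consistent. The following is a minimal Python sketch of an extra pre-flight lint, assuming the policy has already been parsed into a dict (the YAML parsing step is omitted because it needs a third-party parser). The function name `lint_tracing_policy` and both lint rules are illustrative assumptions, not part of Tetragon; in particular, the sketch assumes each `matchArgs` index should correspond to an `index` declared under `args`.

```python
def lint_tracing_policy(policy: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if policy.get("kind") != "TracingPolicy":
        problems.append(f"unexpected kind: {policy.get('kind')!r}")
    for k, kprobe in enumerate(policy.get("spec", {}).get("kprobes", [])):
        # Indices declared under args for this hook
        declared = {arg.get("index") for arg in kprobe.get("args", [])}
        for s, selector in enumerate(kprobe.get("selectors", [])):
            where = f"kprobes[{k}].selectors[{s}]"
            # Assumed rule 1: a selector may only filter on declared arguments
            for match in selector.get("matchArgs", []):
                if match.get("index") not in declared:
                    problems.append(
                        f"{where}: matchArgs index {match.get('index')} "
                        "has no matching entry in args"
                    )
            # Assumed rule 2: an enforcement policy should carry an action
            if not selector.get("matchActions"):
                problems.append(f"{where}: no matchActions, selector only observes")
    return problems


# The policy from this post, written as the dict a YAML parser would produce.
POLICY = {
    "apiVersion": "cilium.io/v1alpha1",
    "kind": "TracingPolicy",
    "metadata": {"name": "block-passwd-write"},
    "spec": {"kprobes": [{
        "call": "security_file_permission",
        "syscall": False,
        "args": [{"index": 0, "type": "file"}, {"index": 1, "type": "int"}],
        "selectors": [{
            "matchArgs": [
                {"index": 0, "operator": "Prefix", "values": ["/etc/passwd"]},
                {"index": 1, "operator": "Equal", "values": ["2"]},
            ],
            "matchBinaries": [{"operator": "NotIn", "values": ["/bin/login"]}],
            "matchActions": [{"action": "Sigkill"}],
        }],
    }]},
}

if __name__ == "__main__":
    for problem in lint_tracing_policy(POLICY):
        print("LINT:", problem)
    print("problems found:", len(lint_tracing_policy(POLICY)))
```

A lint like this is cheap enough to run next to the schema check, and it catches mistakes such as a selector that filters on an argument the hook never declared.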
Layer 2: Event Generation Testing
Event testing verifies that Tetragon generates the expected JSON events when a matching operation occurs. Use tetra (the Tetragon CLI) to stream events during a test:
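Once the stream has been captured to a file, counting matching events is a small parsing problem. The sketch below shows one way to do it in Python, assuming one JSON object per line with kprobe events nested under a `process_kprobe` key that carries a `policy_name` field (the same fields the validator later in this post reads). The `SAMPLE` lines are hand-written illustrations, not captured Tetragon output, and `count_policy_events` is a hypothetical helper name.

```python
import json


def count_policy_events(lines, policy_name):
    """Count kprobe events attributed to policy_name in a JSON-lines stream."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # A stream that was killed mid-write can end with a partial line
            continue
        if not isinstance(event, dict):
            continue
        kprobe = event.get("process_kprobe")
        if isinstance(kprobe, dict) and kprobe.get("policy_name") == policy_name:
            count += 1
    return count


# Hand-written sample lines (illustrative only)
SAMPLE = [
    '{"process_kprobe": {"function_name": "security_file_permission", "policy_name": "block-passwd-write"}}',
    '{"process_exec": {"process": {"binary": "/bin/sh"}}}',
    '{"process_kprobe": {"function_name": "security_file_permission", "policy_name": "other-policy"}}',
    '{"process_kprobe": {"function_na',
    '',
]

if __name__ == "__main__":
    print(count_policy_events(SAMPLE, "block-passwd-write"))  # prints 1
```

Parsing each line instead of matching a substring keeps the count from being inflated by events that belong to other policies or by truncated output.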
Test Script Pattern
#!/bin/bash
# tests/test-passwd-write-event.sh
set -e
NAMESPACE="kube-system"
EVENT_TIMEOUT=10
echo "=== Test: write to /etc/passwd generates Tetragon event ==="
# Apply the policy
kubectl apply -f policies/block-passwd-write.yaml
sleep 2 # Wait for policy propagation
# Start event streaming in background
# (confirm the policy filter flag with `tetra getevents --help` for your version)
EVENTS_FILE=$(mktemp)
kubectl exec -n "$NAMESPACE" ds/tetragon -- \
  tetra getevents \
  --output json \
  --filter-by-policy block-passwd-write \
  > "$EVENTS_FILE" &
STREAM_PID=$!
# Trigger the matching condition in a test pod
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: policy-test-writer
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo test >> /etc/passwd; sleep 5"]
  restartPolicy: Never
EOF
# Wait for the pod to attempt the write
sleep "$EVENT_TIMEOUT"
# Stop event streaming
kill "$STREAM_PID" 2>/dev/null || true
# Clean up test pod
kubectl delete pod policy-test-writer --ignore-not-found
# Assert: at least one kprobe event was generated.
# Kprobe events are nested under the "process_kprobe" key.
# grep -c prints 0 and exits non-zero when nothing matches, so use `|| true`;
# `|| echo 0` would append a second "0" and break the numeric comparison.
EVENT_COUNT=$(grep -c '"process_kprobe"' "$EVENTS_FILE" 2>/dev/null || true)
rm -f "$EVENTS_FILE"
if [ "${EVENT_COUNT:-0}" -gt 0 ]; then
  echo "PASS: $EVENT_COUNT event(s) generated for /etc/passwd write attempt"
else
  echo "FAIL: No events generated; policy may not be loaded correctly"
  kubectl delete -f policies/block-passwd-write.yaml
  exit 1
fi
kubectl delete -f policies/block-passwd-write.yaml
Parsing Event Fields
Tetragon events are structured JSON. Assert on specific fields to confirm the event content matches expectations:
#!/usr/bin/env python3
# tests/validate_event.py
import json
import sys
def validate_tetragon_event(event_json_str):
    """Validate that a Tetragon kprobe event matches expectations."""
    event = json.loads(event_json_str)
    # Kprobe events are nested under the "process_kprobe" key
    kprobe = event.get("process_kprobe", {})
    assertions = [
        ("type check", "process_kprobe" in event),
        ("function check", kprobe.get("function_name") == "security_file_permission"),
        ("file path check", "/etc/passwd" in str(kprobe.get("args", []))),
        ("policy name check", kprobe.get("policy_name") == "block-passwd-write"),
        ("action check", "SIGKILL" in str(kprobe.get("action", ""))),
    ]
    failures = []
    for name, result in assertions:
        if result:
            print(f"  PASS: {name}")
        else:
            print(f"  FAIL: {name}")
            failures.append(name)
    return len(failures) == 0
if __name__ == "__main__":
    # Read one JSON event per line from stdin; stop at the first invalid event
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        if validate_tetragon_event(line):
            print("Event validation: PASS")
        else:
            print("Event validation: FAIL")
            sys.exit(1)
Layer 3: Enforcement Testing
Enforcement testing verifies that the policy's action (SIGKILL in our case) actually prevents the operation. This is the most critical test — policy without enforcement is just logging.
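When asserting on enforcement, it helps to distinguish "killed by SIGKILL" from "failed for some other reason". A shell reports a process terminated by a signal as exit status 128 plus the signal number, so SIGKILL (signal 9) appears as 137. The sketch below decodes that convention in Python on a POSIX system; `describe_exit_status` and `write_was_blocked` are hypothetical helper names used only for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a short description."""
    if status == 0:
        return "exited normally"
    if status > 128:
        try:
            # Shells encode "terminated by signal N" as 128 + N
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number; fall through
    return f"exited with error {status}"


def write_was_blocked(status: int) -> bool:
    """True only when the writer was terminated by SIGKILL."""
    return status == 128 + signal.SIGKILL


if __name__ == "__main__":
    for status in (0, 1, 137):
        print(status, "->", describe_exit_status(status))
```

Checking for 137 specifically is stricter than checking for any non-zero status: a write that fails because of a read-only filesystem or a permissions error would otherwise be counted as successful enforcement.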
Enforcement Test Pod
# tests/enforcement-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: enforcement-test
  labels:
    app: enforcement-test
spec:
  containers:
  - name: test
    image: ubuntu:22.04
    command:
    - bash
    - -c
    - |
      echo "Starting enforcement test"
      # Run the write in a child shell. Sigkill terminates the process that
      # performs the write, so the parent must survive to report the result.
      bash -c 'echo "testentry" >> /etc/passwd'
      EXIT_CODE=$?
      # 137 = 128 + 9: the child was terminated by SIGKILL
      if [ "$EXIT_CODE" -eq 137 ]; then
        echo "WRITE_BLOCKED=true"
      else
        echo "WRITE_BLOCKED=false"
        echo "FAIL: write to /etc/passwd was not blocked (exit code $EXIT_CODE)"
        exit 1
      fi
      echo "PASS: enforcement verified"
  restartPolicy: Never
Enforcement Test Runner
#!/bin/bash
# tests/test-enforcement.sh
set -e
echo "=== Test: Tetragon SIGKILL enforcement for /etc/passwd write ==="
kubectl apply -f policies/block-passwd-write.yaml
sleep 3
kubectl apply -f tests/enforcement-test-pod.yaml
# Wait for pod completion (with timeout)
TIMEOUT=60
START=$(date +%s)
while true; do
  PHASE=$(kubectl get pod enforcement-test \
    -o jsonpath='{.status.phase}' 2>/dev/null || echo "Unknown")
  if [ "$PHASE" = "Succeeded" ] || [ "$PHASE" = "Failed" ]; then
    break
  fi
  NOW=$(date +%s)
  if [ $((NOW - START)) -gt "$TIMEOUT" ]; then
    echo "FAIL: pod did not complete within ${TIMEOUT}s"
    kubectl delete pod enforcement-test --ignore-not-found
    kubectl delete -f policies/block-passwd-write.yaml
    exit 1
  fi
  sleep 2
done
# Get logs and check result
LOGS=$(kubectl logs enforcement-test)
echo "$LOGS"
kubectl delete pod enforcement-test --ignore-not-found
kubectl delete -f policies/block-passwd-write.yaml
if echo "$LOGS" | grep -q "PASS: enforcement verified"; then
  echo "=== ENFORCEMENT TEST PASSED ==="
else
  echo "=== ENFORCEMENT TEST FAILED ==="
  exit 1
fi
Testing Policy Allowlists
The policy allows login to write to /etc/passwd. Test that the allowlist works — a false positive that blocks legitimate processes is as bad as a policy that blocks nothing:
#!/bin/bash
# tests/test-allowlist.sh
# Verify that /bin/login is NOT killed when writing to /etc/passwd
echo "=== Test: allowlisted binary is not blocked ==="
kubectl apply -f policies/block-passwd-write.yaml
sleep 3
# Create a pod that simulates login writing to passwd
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: allowlist-test
spec:
  containers:
  - name: test
    image: ubuntu:22.04
    command:
    - bash
    - -c
    - |
      # Overwrite /bin/login with a copy of bash to simulate the allowlisted
      # binary. The copy must live at the allowlisted path: a copy at
      # /tmp/login would not match the policy's matchBinaries entry.
      # (In a real test, use the actual /bin/login binary)
      # On images where /bin is a symlink to /usr/bin, the path Tetragon sees
      # may be /usr/bin/login; adjust the policy allowlist if so.
      cp /bin/bash /bin/login
      /bin/login -c "echo allowtest >> /etc/passwd"
      echo "EXIT_CODE=$?"
  restartPolicy: Never
EOF
sleep 15
LOGS=$(kubectl logs allowlist-test 2>/dev/null || echo "")
kubectl delete pod allowlist-test --ignore-not-found
kubectl delete -f policies/block-passwd-write.yaml
if echo "$LOGS" | grep -q "EXIT_CODE=0"; then
  echo "PASS: allowlisted binary was not blocked"
else
  echo "FAIL: allowlisted binary was incorrectly blocked (false positive)"
  exit 1
fi
CI Integration
Bring all three test layers together in a CI pipeline:
name: Tetragon Policy Tests
on:
  push:
    paths:
      - "policies/**"
      - "tests/**"
jobs:
  validate-policies:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Validate policy schemas
        run: bash scripts/validate-policies.sh
  integration-tests:
    needs: validate-policies
    runs-on: self-hosted # Requires a real k8s cluster
    steps:
      - uses: actions/checkout@v4
      - name: Verify Tetragon is running
        run: |
          kubectl get ds tetragon -n kube-system
          kubectl rollout status ds/tetragon -n kube-system
      - name: Run event generation tests
        run: bash tests/test-passwd-write-event.sh
      - name: Run enforcement tests
        run: bash tests/test-enforcement.sh
      - name: Run allowlist tests
        run: bash tests/test-allowlist.sh
      - name: Run full policy test suite
        run: |
          PASSED=0; FAILED=0
          for test_script in tests/test-*.sh; do
            if bash "$test_script"; then
              PASSED=$((PASSED + 1))
            else
              FAILED=$((FAILED + 1))
            fi
          done
          echo "Results: $PASSED passed, $FAILED failed"
          [ "$FAILED" -eq 0 ]
Testing Tetragon Upgrades
Tetragon versions change the internal BPF programs and event schema. Before upgrading Tetragon in production, run your policy tests against the new version in a staging cluster:
#!/bin/bash
# scripts/test-upgrade.sh
OLD_VERSION=${1:?usage: test-upgrade.sh OLD_VERSION NEW_VERSION}
NEW_VERSION=${2:?usage: test-upgrade.sh OLD_VERSION NEW_VERSION}
echo "Testing Tetragon upgrade: $OLD_VERSION -> $NEW_VERSION"
# Make sure the output directory exists before redirecting into it
mkdir -p results
# Install old version
helm upgrade --install tetragon cilium/tetragon \
  --namespace kube-system \
  --version "$OLD_VERSION" \
  --wait
# Baseline test run
bash tests/run-all-tests.sh > results/baseline.txt
# Upgrade
helm upgrade tetragon cilium/tetragon \
  --namespace kube-system \
  --version "$NEW_VERSION" \
  --wait
kubectl rollout status ds/tetragon -n kube-system
# Post-upgrade test run
bash tests/run-all-tests.sh > results/post-upgrade.txt
# Diff results; exit non-zero on any change so that CI fails
if diff results/baseline.txt results/post-upgrade.txt; then
  echo "PASS: No regressions after upgrade"
else
  echo "FAIL: Test results changed after upgrade; review diff"
  exit 1
fi
End-to-End Security Coverage
Tetragon policy tests verify kernel-level enforcement. They do not verify the rest of your security pipeline: the event ingestion into your SIEM, the alerting rules that fire on those events, or the incident response UI that your security team uses. Testing that full pipeline requires application-level automation.
Teams that use HelpMeTest alongside Tetragon testing cover both layers: Tetragon tests verify that the kernel blocks the right syscalls and generates the right events, while HelpMeTest's AI-powered Robot Framework and Playwright runner exercises the control plane — the dashboards, alert routing, and policy management UI. HelpMeTest's Pro plan at $100/month provides cloud-based test infrastructure without requiring you to stand up and maintain additional tooling.
Policy Test Checklist
Before declaring a Tetragon policy production-ready, verify:
- Schema validation passes (kubeconform, kubectl --dry-run)
- Policy loads without errors in the target cluster
- Matching operations generate events with correct fields
- Policy action (SIGKILL, override) is enforced — not just logged
- Allowlisted binaries are not incorrectly blocked (false positive test)
- Policy survives Tetragon pod restart (state is in kernel, not userspace)
- Policy behavior is consistent across node restarts
- Upgrade test passes from current to next Tetragon version
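For the upgrade item in particular, a raw text diff of two test runs can be noisy, because lines such as event counts may legitimately vary between runs. One alternative is to compare per-test outcomes. The sketch below is a minimal Python version of that idea, assuming each test emits a line of the form `PASS: <name>` or `FAIL: <name>`; that format, the helper names, and the sample data are assumptions made for illustration.

```python
def parse_results(text: str) -> dict[str, str]:
    """Map test name -> 'PASS' or 'FAIL' from lines like 'PASS: <name>'."""
    results = {}
    for line in text.splitlines():
        status, sep, name = line.strip().partition(": ")
        if sep and status in ("PASS", "FAIL"):
            results[name] = status
    return results


def find_regressions(baseline: str, post_upgrade: str) -> list[str]:
    """Return tests that passed before the upgrade but do not pass after it."""
    before = parse_results(baseline)
    after = parse_results(post_upgrade)
    # A test that disappears after the upgrade also counts as a regression
    return sorted(
        name for name, status in before.items()
        if status == "PASS" and after.get(name) != "PASS"
    )


# Illustrative sample output from two runs (not real test logs)
BASELINE = "PASS: event generation\nPASS: enforcement\nPASS: allowlist\n"
POST_UPGRADE = "PASS: event generation\nFAIL: enforcement\nunrelated log line\n"

if __name__ == "__main__":
    print(find_regressions(BASELINE, POST_UPGRADE))  # prints ['allowlist', 'enforcement']
```

Treating a missing result as a regression is a deliberate choice here: a test that silently stops running after an upgrade should fail the check rather than pass unnoticed.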
Conclusion
Testing Tetragon security policies is not optional — an untested policy gives you false confidence at best and a silent security gap at worst. The three-layer testing model (validation, event generation, enforcement) covers the full policy lifecycle: syntax correctness, observability output, and actual behavior. Start with schema validation in CI, add event generation tests against a dev cluster, and add enforcement tests before any policy reaches production. Security policies that survive this gauntlet are policies you can trust.