Tetragon Security Policy Testing: Testing Cilium Tetragon Security Policies

Tetragon Security Policy Testing: Testing Cilium Tetragon Security Policies

Cilium Tetragon brings eBPF-powered runtime security enforcement to Kubernetes. It can block file access, kill processes that violate policy, and generate structured security events at the kernel level — all without modifying your application. But like any security control, Tetragon policies are only as good as the testing behind them. An untested security policy is a liability: you do not know if it actually blocks what you think it blocks, and you do not know if a future Tetragon upgrade will change the behavior.

This post covers how to test Tetragon TracingPolicy resources systematically — from unit-level policy validation through integration tests that verify enforcement in a live cluster.

Understanding Tetragon's Testing Surface

Tetragon has three layers that need testing:

  1. Policy syntax — does the TracingPolicy YAML parse and load without errors?
  2. Event generation — does the policy generate events when the matching condition occurs?
  3. Enforcement — does the policy's action (SIGKILL, override, etc.) actually block the behavior?

Most teams test only layer 1 (it loads, CI is green). Layers 2 and 3 are where real security assurance lives.

The TracingPolicy Under Test

Throughout this post we will test a policy that blocks file writes to /etc/passwd from any process except login:

# policies/block-passwd-write.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-passwd-write
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"
    - index: 1
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/passwd"
      - index: 1
        operator: "Equal"
        values:
        - "2"   # MAY_WRITE
      matchBinaries:
      - operator: "NotIn"
        values:
        - "/bin/login"
      matchActions:
      - action: Sigkill

Layer 1: Policy Validation

Before applying a policy to any cluster, validate its schema. Tetragon ships a JSON schema for TracingPolicy; use it in CI:

#!/bin/bash
<span class="hljs-comment"># scripts/validate-policies.sh

SCHEMA_URL=<span class="hljs-string">"https://raw.githubusercontent.com/cilium/tetragon/main/vendor/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v1alpha1/crds/cilium.io_tracingpolicies.yaml"

<span class="hljs-comment"># Install kubeconform if not present
<span class="hljs-built_in">which kubeconform <span class="hljs-pipe">|| go install github.com/yannh/kubeconform/cmd/kubeconform@latest

<span class="hljs-keyword">for policy_file <span class="hljs-keyword">in policies/*.yaml; <span class="hljs-keyword">do
    <span class="hljs-built_in">echo <span class="hljs-string">"Validating: $policy_file"
    kubeconform \
        -schema-location <span class="hljs-string">"$SCHEMA_URL" \
        -strict \
        <span class="hljs-string">"$policy_file"

    <span class="hljs-keyword">if [ $? -ne 0 ]; <span class="hljs-keyword">then
        <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: $policy_file"
        <span class="hljs-built_in">exit 1
    <span class="hljs-keyword">fi
    <span class="hljs-built_in">echo <span class="hljs-string">"PASS: $policy_file"
<span class="hljs-keyword">done

Also validate with kubectl --dry-run:

kubectl apply --dry-run=client -f policies/block-passwd-write.yaml

This catches syntax errors without touching a cluster.

Layer 2: Event Generation Testing

Event testing verifies that Tetragon generates the expected JSON events when a matching operation occurs. Use tetra (the Tetragon CLI) to stream events during a test:

Test Script Pattern

#!/bin/bash
<span class="hljs-comment"># tests/test-passwd-write-event.sh
<span class="hljs-built_in">set -e

NAMESPACE=<span class="hljs-string">"kube-system"
EVENT_TIMEOUT=10

<span class="hljs-built_in">echo <span class="hljs-string">"=== Test: write to /etc/passwd generates Tetragon event ==="

<span class="hljs-comment"># Apply the policy
kubectl apply -f policies/block-passwd-write.yaml
<span class="hljs-built_in">sleep 2  <span class="hljs-comment"># Wait for policy propagation

<span class="hljs-comment"># Start event streaming in background
EVENTS_FILE=$(<span class="hljs-built_in">mktemp)
kubectl <span class="hljs-built_in">exec -n <span class="hljs-string">"$NAMESPACE" ds/tetragon -- \
    tetra getevents \
    --output json \
    --filter-by-policy block-passwd-write \
    > <span class="hljs-string">"$EVENTS_FILE" &
STREAM_PID=$!

<span class="hljs-comment"># Trigger the matching condition in a test pod
<span class="hljs-built_in">cat <<<span class="hljs-string">'EOF' <span class="hljs-pipe">| kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: policy-test-writer
spec:
  containers:
  - name: writer
    image: busybox
    <span class="hljs-built_in">command: [<span class="hljs-string">"sh", <span class="hljs-string">"-c", <span class="hljs-string">"echo test >> /etc/passwd; sleep 5"]
  restartPolicy: Never
EOF

<span class="hljs-comment"># Wait for the pod to attempt the write
<span class="hljs-built_in">sleep <span class="hljs-string">"$EVENT_TIMEOUT"

<span class="hljs-comment"># Stop event streaming
<span class="hljs-built_in">kill <span class="hljs-string">"$STREAM_PID" 2>/dev/null <span class="hljs-pipe">|| <span class="hljs-literal">true

<span class="hljs-comment"># Clean up test pod
kubectl delete pod policy-test-writer --ignore-not-found

<span class="hljs-comment"># Assert: at least one event was generated
EVENT_COUNT=$(grep -c <span class="hljs-string">'"type":"KPROBE"' <span class="hljs-string">"$EVENTS_FILE" 2>/dev/null <span class="hljs-pipe">|| <span class="hljs-built_in">echo 0)
<span class="hljs-built_in">rm -f <span class="hljs-string">"$EVENTS_FILE"

<span class="hljs-keyword">if [ <span class="hljs-string">"$EVENT_COUNT" -gt 0 ]; <span class="hljs-keyword">then
    <span class="hljs-built_in">echo <span class="hljs-string">"PASS: $EVENT_COUNT event(s) generated for /etc/passwd write attempt"
<span class="hljs-keyword">else
    <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: No events generated — policy may not be loaded correctly"
    kubectl delete -f policies/block-passwd-write.yaml
    <span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi

kubectl delete -f policies/block-passwd-write.yaml

Parsing Event Fields

Tetragon events are structured JSON. Assert on specific fields to confirm the event content matches expectations:

#!/usr/bin/env python3
# tests/validate_event.py
import json, sys

def validate_tetragon_event(event_json_str):
    """Validate a Tetragon KPROBE event matches expectations."""
    event = json.loads(event_json_str)

    assertions = [
        ("type check", event.get("type") == "KPROBE"),
        ("function check", event.get("process_kprobe", {}).get("function_name") == "security_file_permission"),
        ("file path check", "/etc/passwd" in str(event.get("process_kprobe", {}).get("args", []))),
        ("policy name check", event.get("process_kprobe", {}).get("policy_name") == "block-passwd-write"),
        ("action check", "SIGKILL" in str(event.get("process_kprobe", {}).get("action", ""))),
    ]

    failures = []
    for name, result in assertions:
        if result:
            print(f"  PASS: {name}")
        else:
            print(f"  FAIL: {name}")
            failures.append(name)

    return len(failures) == 0

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        if validate_tetragon_event(line):
            print("Event validation: PASS")
        else:
            print("Event validation: FAIL")
            sys.exit(1)

Layer 3: Enforcement Testing

Enforcement testing verifies that the policy's action (SIGKILL in our case) actually prevents the operation. This is the most critical test — policy without enforcement is just logging.

Enforcement Test Pod

# tests/enforcement-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: enforcement-test
  labels:
    app: enforcement-test
spec:
  containers:
  - name: test
    image: ubuntu:22.04
    command:
    - bash
    - -c
    - |
      echo "Starting enforcement test"

      # This should be killed by Tetragon
      echo "testentry" >> /etc/passwd
      EXIT_CODE=$?

      if [ $EXIT_CODE -ne 0 ]; then
        echo "WRITE_BLOCKED=true"
      else
        echo "WRITE_BLOCKED=false"
        echo "FAIL: write to /etc/passwd was not blocked"
        exit 1
      fi

      echo "PASS: enforcement verified"
  restartPolicy: Never

Enforcement Test Runner

#!/bin/bash
<span class="hljs-comment"># tests/test-enforcement.sh
<span class="hljs-built_in">set -e

<span class="hljs-built_in">echo <span class="hljs-string">"=== Test: Tetragon SIGKILL enforcement for /etc/passwd write ==="

kubectl apply -f policies/block-passwd-write.yaml
<span class="hljs-built_in">sleep 3

kubectl apply -f tests/enforcement-test-pod.yaml

<span class="hljs-comment"># Wait for pod completion (with timeout)
TIMEOUT=60
START=$(<span class="hljs-built_in">date +%s)
<span class="hljs-keyword">while <span class="hljs-literal">true; <span class="hljs-keyword">do
    PHASE=$(kubectl get pod enforcement-test \
        -o jsonpath=<span class="hljs-string">'{.status.phase}' 2>/dev/null <span class="hljs-pipe">|| <span class="hljs-built_in">echo <span class="hljs-string">"Unknown")

    <span class="hljs-keyword">if [ <span class="hljs-string">"$PHASE" = <span class="hljs-string">"Succeeded" ] <span class="hljs-pipe">|| [ <span class="hljs-string">"$PHASE" = <span class="hljs-string">"Failed" ]; <span class="hljs-keyword">then
        <span class="hljs-built_in">break
    <span class="hljs-keyword">fi

    NOW=$(<span class="hljs-built_in">date +%s)
    <span class="hljs-keyword">if [ $((NOW - START)) -gt <span class="hljs-variable">$TIMEOUT ]; <span class="hljs-keyword">then
        <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: pod did not complete within ${TIMEOUT}s"
        kubectl delete pod enforcement-test --ignore-not-found
        kubectl delete -f policies/block-passwd-write.yaml
        <span class="hljs-built_in">exit 1
    <span class="hljs-keyword">fi
    <span class="hljs-built_in">sleep 2
<span class="hljs-keyword">done

<span class="hljs-comment"># Get logs and check result
LOGS=$(kubectl logs enforcement-test)
<span class="hljs-built_in">echo <span class="hljs-string">"$LOGS"

kubectl delete pod enforcement-test --ignore-not-found
kubectl delete -f policies/block-passwd-write.yaml

<span class="hljs-keyword">if <span class="hljs-built_in">echo <span class="hljs-string">"$LOGS" <span class="hljs-pipe">| grep -q <span class="hljs-string">"PASS: enforcement verified"; <span class="hljs-keyword">then
    <span class="hljs-built_in">echo <span class="hljs-string">"=== ENFORCEMENT TEST PASSED ==="
<span class="hljs-keyword">else
    <span class="hljs-built_in">echo <span class="hljs-string">"=== ENFORCEMENT TEST FAILED ==="
    <span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi

Testing Policy Allowlists

The policy allows login to write to /etc/passwd. Test that the allowlist works — a false positive that blocks legitimate processes is as bad as a policy that blocks nothing:

#!/bin/bash
<span class="hljs-comment"># tests/test-allowlist.sh
<span class="hljs-comment"># Verify that /bin/login is NOT killed when writing to /etc/passwd

<span class="hljs-built_in">echo <span class="hljs-string">"=== Test: allowlisted binary is not blocked ==="

kubectl apply -f policies/block-passwd-write.yaml
<span class="hljs-built_in">sleep 3

<span class="hljs-comment"># Create a pod that simulates login writing to passwd
<span class="hljs-built_in">cat <<<span class="hljs-string">'EOF' <span class="hljs-pipe">| kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: allowlist-test
spec:
  containers:
  - name: <span class="hljs-built_in">test
    image: ubuntu:22.04
    <span class="hljs-built_in">command:
    - bash
    - -c
    - <span class="hljs-pipe">|
      <span class="hljs-comment"># Copy bash to /bin/login to simulate the allowlisted binary
      <span class="hljs-comment"># (In a real test, use the actual /bin/login binary)
      <span class="hljs-built_in">cp /bin/bash /tmp/login
      /tmp/login -c <span class="hljs-string">"echo allowtest >> /etc/passwd"
      <span class="hljs-built_in">echo <span class="hljs-string">"EXIT_CODE=$?"
  restartPolicy: Never
EOF

<span class="hljs-built_in">sleep 15

LOGS=$(kubectl logs allowlist-test 2>/dev/null <span class="hljs-pipe">|| <span class="hljs-built_in">echo <span class="hljs-string">"")
kubectl delete pod allowlist-test --ignore-not-found
kubectl delete -f policies/block-passwd-write.yaml

<span class="hljs-keyword">if <span class="hljs-built_in">echo <span class="hljs-string">"$LOGS" <span class="hljs-pipe">| grep -q <span class="hljs-string">"EXIT_CODE=0"; <span class="hljs-keyword">then
    <span class="hljs-built_in">echo <span class="hljs-string">"PASS: allowlisted binary was not blocked"
<span class="hljs-keyword">else
    <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: allowlisted binary was incorrectly blocked (false positive)"
    <span class="hljs-built_in">exit 1
<span class="hljs-keyword">fi

CI Integration

Bring all three test layers together in a CI pipeline:

name: Tetragon Policy Tests

on:
  push:
    paths:
      - "policies/**"
      - "tests/**"

jobs:
  validate-policies:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - name: Validate policy schemas
        run: bash scripts/validate-policies.sh

  integration-tests:
    needs: validate-policies
    runs-on: self-hosted  # Requires a real k8s cluster
    steps:
      - uses: actions/checkout@v4

      - name: Verify Tetragon is running
        run: |
          kubectl get ds tetragon -n kube-system
          kubectl rollout status ds/tetragon -n kube-system

      - name: Run event generation tests
        run: bash tests/test-passwd-write-event.sh

      - name: Run enforcement tests
        run: bash tests/test-enforcement.sh

      - name: Run allowlist tests
        run: bash tests/test-allowlist.sh

      - name: Run full policy test suite
        run: |
          PASSED=0; FAILED=0
          for test_script in tests/test-*.sh; do
            if bash "$test_script"; then
              PASSED=$((PASSED + 1))
            else
              FAILED=$((FAILED + 1))
            fi
          done
          echo "Results: $PASSED passed, $FAILED failed"
          [ $FAILED -eq 0 ]

Testing Tetragon Upgrades

Tetragon versions change the internal BPF programs and event schema. Before upgrading Tetragon in production, run your policy tests against the new version in a staging cluster:

#!/bin/bash
<span class="hljs-comment"># scripts/test-upgrade.sh
OLD_VERSION=<span class="hljs-variable">$1
NEW_VERSION=<span class="hljs-variable">$2

<span class="hljs-built_in">echo <span class="hljs-string">"Testing Tetragon upgrade: $OLD_VERSION -> <span class="hljs-variable">$NEW_VERSION"

<span class="hljs-comment"># Install old version
helm upgrade --install tetragon cilium/tetragon \
    --namespace kube-system \
    --version <span class="hljs-string">"$OLD_VERSION" \
    --<span class="hljs-built_in">wait

<span class="hljs-comment"># Baseline test run
bash tests/run-all-tests.sh > results/baseline.txt

<span class="hljs-comment"># Upgrade
helm upgrade tetragon cilium/tetragon \
    --namespace kube-system \
    --version <span class="hljs-string">"$NEW_VERSION" \
    --<span class="hljs-built_in">wait

kubectl rollout status ds/tetragon -n kube-system

<span class="hljs-comment"># Post-upgrade test run
bash tests/run-all-tests.sh > results/post-upgrade.txt

<span class="hljs-comment"># Diff results
diff results/baseline.txt results/post-upgrade.txt && \
    <span class="hljs-built_in">echo <span class="hljs-string">"PASS: No regressions after upgrade" <span class="hljs-pipe">|| \
    <span class="hljs-built_in">echo <span class="hljs-string">"FAIL: Test results changed after upgrade — review diff"

End-to-End Security Coverage

Tetragon policy tests verify kernel-level enforcement. They do not verify the rest of your security pipeline: the event ingestion into your SIEM, the alerting rules that fire on those events, or the incident response UI that your security team uses. Testing that full pipeline requires application-level automation.

Teams that use HelpMeTest alongside Tetragon testing cover both layers: Tetragon tests verify that the kernel blocks the right syscalls and generates the right events, while HelpMeTest's AI-powered Robot Framework and Playwright runner exercises the control plane — the dashboards, alert routing, and policy management UI. HelpMeTest's Pro plan at $100/month provides cloud-based test infrastructure without requiring you to stand up and maintain additional tooling.

Policy Test Checklist

Before declaring a Tetragon policy production-ready, verify:

  • Schema validation passes (kubeconform, kubectl --dry-run)
  • Policy loads without errors in the target cluster
  • Matching operations generate events with correct fields
  • Policy action (SIGKILL, override) is enforced — not just logged
  • Allowlisted binaries are not incorrectly blocked (false positive test)
  • Policy survives Tetragon pod restart (state is in kernel, not userspace)
  • Policy behavior is consistent across node restarts
  • Upgrade test passes from current to next Tetragon version

Conclusion

Testing Tetragon security policies is not optional — an untested policy gives you false confidence at best and a silent security gap at worst. The three-layer testing model (validation, event generation, enforcement) covers the full policy lifecycle: syntax correctness, observability output, and actual behavior. Start with schema validation in CI, add event generation tests against a dev cluster, and add enforcement tests before any policy reaches production. Security policies that survive this gauntlet are policies you can trust.

Read more