seccomp Profiles: Writing and Testing Syscall Filters for Containers

Every system call a container makes is an attack surface. seccomp (Secure Computing Mode) lets you define exactly which syscalls a container is allowed to make — and kill or log anything outside that list.

The challenge is writing profiles that are tight enough to restrict attacker behavior but permissive enough that your application actually runs. That gap is where testing lives.

How seccomp Works

Linux seccomp operates at the kernel level. When a process makes a syscall, the kernel evaluates the process's seccomp filter, which returns one of these actions:

  • SCMP_ACT_ALLOW: syscall proceeds normally
  • SCMP_ACT_ERRNO: syscall is blocked, returns an error code
  • SCMP_ACT_KILL: the calling thread is killed immediately (SCMP_ACT_KILL_PROCESS kills the whole process)
  • SCMP_ACT_LOG: syscall is logged but allowed (audit mode)
  • SCMP_ACT_TRAP: sends SIGSYS to the process

Docker and Kubernetes both support seccomp profiles in JSON format. A profile lists the default action and any syscall-specific overrides.
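
To make the "default action plus overrides" idea concrete, here is a minimal Python sketch of how a profile maps a syscall name to an action. It is a simplified model, not how the kernel does it: the real filter is a compiled BPF program, and this sketch ignores argument-level conditions. The function name resolve_action is made up for illustration.

```python
def resolve_action(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name.

    Simplified model: the first rule naming the syscall wins, and
    argument-level conditions ("args") are ignored.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # No rule mentions the syscall, so the default applies.
    return profile["defaultAction"]


profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["ptrace"], "action": "SCMP_ACT_KILL"},
    ],
}

for name in ("read", "ptrace", "mount"):
    print(name, "->", resolve_action(profile, name))
```

Here mount is not listed anywhere, so it falls through to the default action.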

Profile Format

A seccomp profile is a JSON file:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "read", "write", "close", "fstat", "mmap", "mprotect",
        "munmap", "brk", "pread64", "access", "pipe", "select",
        "sched_yield", "mremap", "msync", "mincore", "madvise",
        "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
        "ioctl", "pwrite64", "readv", "writev", "getcwd",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "sendmsg", "recvmsg", "bind", "listen", "getsockname",
        "getpeername", "socketpair", "setsockopt", "getsockopt",
        "clone", "fork", "vfork", "execve", "exit", "wait4",
        "kill", "uname", "fcntl", "flock", "fsync", "getpid",
        "getppid", "getuid", "geteuid", "getgid", "getegid",
        "getdents64", "openat", "newfstatat", "exit_group",
        "set_tid_address", "set_robust_list", "futex",
        "epoll_create1", "epoll_ctl", "epoll_pwait", "eventfd2",
        "timerfd_create", "timerfd_settime", "timerfd_gettime",
        "accept4", "dup3", "pipe2", "prlimit64", "getrandom",
        "seccomp", "statx", "rseq"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

The defaultAction is the critical field: use SCMP_ACT_ERRNO in production and SCMP_ACT_LOG while profiling.
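
Since the two modes differ only in defaultAction, one way to keep them in sync is to derive the profiling variant from the production profile. The sketch below assumes the profile is already loaded as a dict; with_default_action is a hypothetical helper name. Note that with the allowlist kept in place, only syscalls outside the list fall through to the logged default.

```python
import copy
import json

KNOWN_ACTIONS = {
    "SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO", "SCMP_ACT_KILL",
    "SCMP_ACT_LOG", "SCMP_ACT_TRAP",
}


def with_default_action(profile: dict, action: str) -> dict:
    """Return a copy of the profile with a different defaultAction."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown seccomp action: {action}")
    derived = copy.deepcopy(profile)  # leave the original untouched
    derived["defaultAction"] = action
    return derived


enforce = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}
audit = with_default_action(enforce, "SCMP_ACT_LOG")
print(json.dumps(audit, indent=2))
```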

Phase 1: Discovery — What Syscalls Does Your App Actually Use?

Before writing a restrictive profile, you need to know what syscalls your application makes. Trace the application or run it in audit mode to discover them.

Method 1: Docker + strace

# Run your container under strace (assumes strace is installed in the image).
# The log is written to a bind mount so it survives --rm.
mkdir -p trace
docker run --rm \
  --security-opt seccomp=unconfined \
  -v "$PWD/trace:/trace" \
  --entrypoint strace \
  your-app:latest \
  -f -e trace=all \
  -o /trace/syscalls.log \
  /usr/bin/your-app --your-flags

# Extract unique syscall names (with -f -o, each line starts with a PID)
grep -oP '^(\d+\s+)?\K\w+(?=\()' trace/syscalls.log \
  | sort -u > syscalls-used.txt
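
If you would rather do the extraction in Python, the following sketch pulls syscall names out of strace output. It assumes the usual strace line shapes (an optional PID prefix, then name( ... ), plus "<... name resumed>" continuation lines); the sample lines are made up for illustration, and syscalls_from_strace is a hypothetical helper name.

```python
import re

# Optional PID prefix ("1234 " from -f -o, or "[pid 1234] "), then "name(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")
# Continuation of an interrupted call: "<... read resumed>"
RESUMED_RE = re.compile(r"<\.\.\. ([a-z_][a-z0-9_]*) resumed>")


def syscalls_from_strace(lines):
    """Return the sorted, de-duplicated syscall names found in strace lines."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line) or RESUMED_RE.search(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


sample = [
    '101 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3',
    '101 read(3, "127.0.0.1 localhost\\n", 4096) = 20',
    '102 <... futex resumed>) = 0',
    '101 --- SIGCHLD {si_signo=SIGCHLD} ---',   # signal line, ignored
    '101 +++ exited with 0 +++',                # exit line, ignored
    '[pid   103] close(3) = 0',
]
print(syscalls_from_strace(sample))
```

For a real log, pass an open file object instead of the sample list.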

Method 2: seccomp Audit Mode

Create an audit profile that logs everything but blocks nothing:

{
  "defaultAction": "SCMP_ACT_LOG",
  "syscalls": []
}

Apply it:

docker run --rm \
  --security-opt seccomp=audit-profile.json \
  your-app:latest

Read the audit log:

# On the host
journalctl -k <span class="hljs-pipe">| grep <span class="hljs-string">"seccomp" <span class="hljs-pipe">| grep <span class="hljs-string">"syscall=" <span class="hljs-pipe">| \
  awk -F<span class="hljs-string">'syscall=' <span class="hljs-string">'{print $2}' <span class="hljs-pipe">| awk <span class="hljs-string">'{print $1}' <span class="hljs-pipe">| \
  <span class="hljs-built_in">sort -u
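
The pipeline above yields bare syscall numbers. A small Python sketch can also keep track of which command triggered each one and how often. It assumes the audit record layout shown later in this article (type=1326 with comm= and syscall= fields); the sample records are synthetic, built from that layout.

```python
import re
from collections import Counter

# comm= and syscall= fields of a seccomp audit record (type 1326).
RECORD_RE = re.compile(
    r'type=1326\b.*?\bcomm="(?P<comm>[^"]*)".*?\bsyscall=(?P<nr>\d+)'
)


def count_logged_syscalls(lines):
    """Count (command, syscall number) pairs across kernel log lines."""
    counts = Counter()
    for line in lines:
        match = RECORD_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("nr")))] += 1
    return counts


# Synthetic records in the same shape as a real type=1326 line.
TEMPLATE = (
    'audit: type=1326 audit(1234567890.123:{serial}): auid=0 uid=0 gid=0 '
    'ses=1 pid=1234 comm="your-app" exe="/usr/bin/your-app" sig=0 '
    'arch=c000003e syscall={nr} compat=0 ip=0x0 code=0x7ffc0000'
)
sample_log = [TEMPLATE.format(serial=i, nr=nr) for i, nr in enumerate([281, 281, 0])]
sample_log.append("usb 1-1: new high-speed USB device")  # unrelated line

for (comm, nr), hits in sorted(count_logged_syscalls(sample_log).items()):
    print(f"{comm}: syscall {nr} logged {hits} time(s)")
```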

Method 3: Kubernetes + Audit Mode

Reference the audit profile as a Localhost profile in your pod spec (the file must already exist in the kubelet's seccomp directory on the node):

apiVersion: v1
kind: Pod
metadata:
  name: app-seccomp-audit
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: audit-profile.json
  containers:
    - name: app
      image: your-app:latest

After running your full test suite against the pod, extract syscalls from the node's audit log.

Phase 2: Building a Minimal Profile

From the syscall list collected during discovery, build a minimal allowlist profile.

Using oci-seccomp-bpf-hook (Automated)

# Install (or use the oci-seccomp-bpf-hook package where your distribution provides one)
go install github.com/containers/oci-seccomp-bpf-hook/cmd/oci-seccomp-bpf-hook@latest

# Run your container; the hook captures all syscalls.
# The hook is an OCI runtime hook for Podman and CRI-O, so run it with podman as root.
sudo podman run --rm \
  --annotation io.containers.trace-syscall=of:/tmp/profile.json \
  your-app:latest

# Output is a ready-to-use seccomp profile
cat /tmp/profile.json

Manual Approach Using Python

#!/usr/bin/env python3
"""Build a seccomp profile from a list of syscall names."""
import json
import sys


COMMON_BASE = [
    "read", "write", "close", "fstat", "mmap", "mprotect", "munmap",
    "brk", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
    "ioctl", "pread64", "pwrite64", "readv", "writev",
    "socket", "connect", "sendto", "recvfrom", "sendmsg", "recvmsg",
    "bind", "listen", "getsockname", "getpeername", "setsockopt", "getsockopt",
    "clone", "fork", "execve", "exit", "wait4", "getpid", "getppid",
    "getuid", "geteuid", "getgid", "getegid", "getcwd",
    "openat", "getdents64", "newfstatat", "exit_group",
    "futex", "set_tid_address", "set_robust_list", "prlimit64",
    "epoll_create1", "epoll_ctl", "epoll_pwait", "eventfd2",
    "accept4", "dup3", "pipe2", "getrandom", "statx", "rseq",
    "fcntl", "flock", "fsync", "uname", "kill", "select",
]


def build_profile(extra_syscalls=None):
    allowed = sorted(set(COMMON_BASE + (extra_syscalls or [])))
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
        ],
        "syscalls": [
            {
                "names": allowed,
                "action": "SCMP_ACT_ALLOW"
            }
        ]
    }
    return profile


if __name__ == "__main__":
    extra = sys.argv[1:]  # pass additional syscalls as args
    print(json.dumps(build_profile(extra), indent=2))

Usage:

python3 build-profile.py mq_open mq_send mq_receive > my-app-profile.json
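
Before passing extra syscalls on the command line, it can help to compare what discovery observed against the base list. The sketch below is one way to do that; split_observed is a hypothetical helper, and the short lists stand in for COMMON_BASE and the contents of syscalls-used.txt.

```python
def split_observed(observed, base):
    """Compare observed syscalls with a base allowlist.

    Returns (extra, unused): names seen but missing from the base,
    and base names that discovery never observed.
    """
    observed, base = set(observed), set(base)
    return sorted(observed - base), sorted(base - observed)


# Stand-ins for COMMON_BASE and the lines of syscalls-used.txt.
base = ["read", "write", "close", "fork", "select"]
observed = ["read", "write", "close", "mq_open", "mq_send"]

extra, unused = split_observed(observed, base)
print("pass as extra arguments:", extra)
print("never observed, candidates for removal:", unused)
```

Treat the "unused" list with care: it only reflects the code paths your test run exercised.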

Phase 3: Testing the Profile

Test 1: Application Functionality

Apply the profile and run your integration tests. Any blocked syscall causes an EPERM or kills the process:

docker run --rm \
  --security-opt seccomp=my-app-profile.json \
  your-app:latest \
  ./run-tests.sh

If tests fail, check dmesg for seccomp audit records (type 1326). By default the kernel logs kills and SCMP_ACT_LOG hits but not plain SCMP_ACT_ERRNO denials, so switch the defaultAction to SCMP_ACT_LOG temporarily if nothing shows up:

dmesg | grep 'type=1326' | tail -20
# Output: [12345.678] audit: type=1326 audit(1234567890.123:456):
#   auid=0 uid=0 gid=0 ses=1 subj=... pid=1234 comm="your-app"
#   exe="/usr/bin/your-app" sig=31 arch=c000003e syscall=281 compat=0
#   ip=0x... code=0x80000000

The syscall=281 is the syscall number. Look it up:

# ausyscall ships with the audit userspace tools
ausyscall x86_64 281  # prints: epoll_pwait
# Or use libseccomp's resolver
scmp_sys_resolver -a x86_64 281

Add the missing syscall to your allowlist and retest.
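
Editing the JSON by hand works, but a small helper keeps the allowlist sorted and free of duplicates. This is a sketch under the assumption that the profile has the single-rule shape used in this article; allow_syscalls is a made-up name, and rules carrying "args" conditions are deliberately left alone.

```python
import json


def allow_syscalls(profile: dict, names) -> dict:
    """Add names to the first unconditional SCMP_ACT_ALLOW rule, in place."""
    for rule in profile.setdefault("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW" and "args" not in rule:
            rule["names"] = sorted(set(rule.get("names", [])) | set(names))
            return profile
    # No suitable rule yet: create one.
    profile["syscalls"].append(
        {"names": sorted(set(names)), "action": "SCMP_ACT_ALLOW"}
    )
    return profile


profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}
allow_syscalls(profile, ["epoll_pwait", "read"])
print(json.dumps(profile, indent=2))
```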

Test 2: Security Verification

After tuning the profile to allow your app, verify that dangerous syscalls are blocked:

import subprocess
import pytest


PROFILE_PATH = "profiles/my-app-profile.json"
IMAGE = "your-app:latest"


def run_in_container(cmd, profile=PROFILE_PATH):
    result = subprocess.run(
        [
            "docker", "run", "--rm",
            "--security-opt", f"seccomp={profile}",
            IMAGE,
            "sh", "-c", cmd
        ],
        capture_output=True,
        text=True,
        timeout=30
    )
    return result


def test_app_runs_successfully():
    """Application starts and exits cleanly with the profile applied."""
    result = run_in_container("./my-app --check-health && echo OK")
    assert result.returncode == 0, \
        f"App failed with profile applied:\n{result.stderr}"
    assert "OK" in result.stdout


def test_ptrace_blocked():
    """ptrace is not allowed; this prevents container escapes."""
    # ptrace returns -1 when denied; map that to a non-zero exit code,
    # otherwise python3 exits 0 whether or not the call was blocked.
    result = run_in_container(
        "python3 -c \"import ctypes, sys; "
        "sys.exit(1 if ctypes.CDLL(None).ptrace(0, 0, 0, 0) == -1 else 0)\""
    )
    assert result.returncode != 0, "ptrace should be blocked"


def test_mount_blocked():
    """mount syscall is not allowed; this prevents filesystem manipulation."""
    result = run_in_container("mount --bind /tmp /mnt 2>&1 || echo BLOCKED")
    assert "BLOCKED" in result.stdout or result.returncode != 0


def test_kexec_blocked():
    """kexec is not allowed; this prevents kernel replacement."""
    # kexec_load = 246 on x86_64; syscall() returns -1 on failure.
    result = run_in_container(
        "python3 -c \"import ctypes, sys; "
        "sys.exit(1 if ctypes.CDLL(None).syscall(246) == -1 else 0)\""
    )
    assert result.returncode != 0


def test_module_loading_blocked():
    """Loading kernel modules is not allowed."""
    # init_module = 175 on x86_64; syscall() returns -1 on failure.
    result = run_in_container(
        "python3 -c \"import ctypes, sys; "
        "sys.exit(1 if ctypes.CDLL(None).syscall(175) == -1 else 0)\""
    )
    assert result.returncode != 0

Run:

pytest tests/test_seccomp_profile.py -v

Test 3: Profile Diff Validation

When profiles change, automatically check that no new high-risk syscalls were added:

#!/usr/bin/env python3
"""Check that a seccomp profile doesn't allow dangerous syscalls."""
import json
import sys


DANGEROUS_SYSCALLS = {
    "ptrace",           # Container escape
    "mount",            # Filesystem manipulation
    "umount2",          # Filesystem manipulation
    "kexec_load",       # Kernel replacement
    "kexec_file_load",  # Kernel replacement
    "init_module",      # Kernel module loading
    "finit_module",     # Kernel module loading
    "delete_module",    # Kernel module unloading
    "settimeofday",     # System clock manipulation
    "adjtimex",         # Clock adjustment
    "clock_settime",    # Clock setting
    "nfsservctl",       # NFS server
    "pivot_root",       # Root filesystem change
    "chroot",           # Root directory change (suspicious in containers)
    "unshare",          # Namespace creation (often used in escapes)
    "keyctl",           # Kernel keyring (credential theft)
    "add_key",          # Kernel keyring
    "request_key",      # Kernel keyring
}


def check_profile(profile_path):
    with open(profile_path) as f:
        profile = json.load(f)
    
    violations = []
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            for syscall in rule.get("names", []):
                if syscall in DANGEROUS_SYSCALLS:
                    violations.append(syscall)
    
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        violations.append("defaultAction=SCMP_ACT_ALLOW (allows all by default)")
    
    return violations


if __name__ == "__main__":
    path = sys.argv[1]
    violations = check_profile(path)
    if violations:
        print(f"FAIL: Dangerous syscalls found in {path}:")
        for v in violations:
            print(f"  - {v}")
        sys.exit(1)
    print(f"OK: {path} passes security check")
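
The script above checks a single profile. To review a change, it can also help to see exactly which syscalls a new version adds or removes. Here is a minimal sketch of that comparison; diff_profiles and the abbreviated HIGH_RISK set are illustrative, and in practice you would reuse DANGEROUS_SYSCALLS from the script above.

```python
def allowed_syscalls(profile: dict) -> set:
    """Collect every syscall name granted by an SCMP_ACT_ALLOW rule."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            names.update(rule.get("names", []))
    return names


def diff_profiles(old: dict, new: dict):
    """Return (added, removed) syscall names between two profiles."""
    before, after = allowed_syscalls(old), allowed_syscalls(new)
    return sorted(after - before), sorted(before - after)


HIGH_RISK = {"ptrace", "mount", "unshare"}  # abbreviated for the example

old = {"syscalls": [{"names": ["read", "write", "select"],
                     "action": "SCMP_ACT_ALLOW"}]}
new = {"syscalls": [{"names": ["read", "write", "unshare", "statx"],
                     "action": "SCMP_ACT_ALLOW"}]}

added, removed = diff_profiles(old, new)
risky = [name for name in added if name in HIGH_RISK]
print("added:", added)
print("removed:", removed)
print("newly allowed high-risk syscalls:", risky)
```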

Kubernetes Integration

Applying Profiles via RuntimeDefault

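Kubernetes 1.19+ supports seccomp through the seccompProfile field. The pod-level setting applies to every container unless a container overrides it: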

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Docker/containerd default profile
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        seccompProfile:
          type: Localhost                   # Container-specific profile
          localhostProfile: my-app-profile.json

Uploading Custom Profiles to Nodes

Profiles must exist on each node under /var/lib/kubelet/seccomp/ (the kubelet's default profile root); localhostProfile is resolved relative to that directory:

# Copy profile to each node (use DaemonSet in production)
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  scp profiles/my-app-profile.json "$node:/var/lib/kubelet/seccomp/my-app-profile.json"
done

Or use the Security Profiles Operator for automated profile management:

helm repo add spo https://kubernetes-sigs.github.io/security-profiles-operator/
helm install spo spo/security-profiles-operator -n security-profiles-operator --create-namespace

Then define the profile as a SeccompProfile resource:

apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: my-app-profile
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:
        - read
        - write
        - exit
        - exit_group
        # ... rest of allowlist

Testing Profile Application in CI

name: seccomp Profile CI

on:
  push:
    paths:
      - 'profiles/**'

jobs:
  validate-profiles:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check for dangerous syscalls
        run: |
          for profile in profiles/*.json; do
            python3 scripts/check-dangerous-syscalls.py "$profile"
          done

      - name: Test app runs with profile
        run: |
          docker build -t my-app:test .
          docker run --rm \
            --security-opt seccomp=profiles/my-app-profile.json \
            my-app:test \
            ./health-check.sh

      - name: Verify dangerous syscalls blocked
        run: |
          docker run --rm \
            --security-opt seccomp=profiles/my-app-profile.json \
            my-app:test \
            python3 -c "
          # Kept at the block's base indentation so python3 receives unindented code
          import ctypes, sys
          # Try ptrace (should fail and return -1)
          ret = ctypes.CDLL(None).ptrace(0, 0, 0, 0)
          if ret == -1:
              print('ptrace blocked: OK')
          else:
              print('ptrace NOT blocked: FAIL')
              sys.exit(1)
          "

Debugging Blocked Syscalls in Production

When a profile blocks a syscall your app needs, the symptom is usually a silent crash or EPERM error. Debug it:

# Enable audit logging temporarily
# In pod spec:
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: audit-mode-profile.json  # defaultAction: SCMP_ACT_LOG

# Watch audit log on node (seccomp records are audit type 1326)
journalctl -k -f | grep 'type=1326'

Convert syscall numbers to names quickly:

# One-liner to decode a syscall number (ausyscall ships with the audit userspace tools)
ausyscall x86_64 281

# Or use the syscall table directly (native x86_64 entries have ABI "common" or "64")
awk -v n=281 '$1==n && ($2=="common" || $2=="64") {print $3}' /usr/src/linux-headers-*/arch/x86/entry/syscalls/syscall_64.tbl
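
When you have many numbers to decode, loading the table once is more convenient than running awk per number. The sketch below parses text in the syscall_64.tbl layout (number, ABI, name, entry point); the embedded sample is a four-line excerpt, and parse_syscall_table is a hypothetical helper name.

```python
def parse_syscall_table(text: str, abis=("common", "64")) -> dict:
    """Map syscall numbers to names for the given ABI column values."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything without number/abi/name.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        number, abi, name = fields[:3]
        if abi in abis:
            table[int(number)] = name
    return table


# Small excerpt in the syscall_64.tbl layout.
sample = """\
# <number> <abi> <name> <entry point>
0    common  read          sys_read
13   64      rt_sigaction  sys_rt_sigaction
281  common  epoll_pwait   sys_epoll_pwait
512  x32     rt_sigaction  compat_sys_rt_sigaction
"""

table = parse_syscall_table(sample)
for number in (0, 13, 281):
    print(number, "->", table[number])
```

To use the real table, read the file into a string and pass it in; x32 entries are excluded by default.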

Summary

A hardened seccomp workflow:

  1. Discover: Run with SCMP_ACT_LOG + your full test suite, collect used syscalls
  2. Build: Generate a minimal allowlist profile with SCMP_ACT_ERRNO as default
  3. Test functionality: Run tests with the profile applied — must pass
  4. Test security: Verify dangerous syscalls (ptrace, mount, kexec, module loading) are blocked
  5. Enforce in CI: Validate profiles on every change, fail if dangerous syscalls added
  6. Deploy: Use Security Profiles Operator for managed rollout to nodes

The payoff: even if an attacker gets code execution inside your container, many of the syscalls they would need to escape or escalate are unavailable. seccomp makes your containers significantly harder to exploit.
