seccomp Profiles: Writing and Testing Syscall Filters for Containers
Every system call a container makes is an attack surface. seccomp (Secure Computing Mode) lets you define exactly which syscalls a container is allowed to make — and kill or log anything outside that list.
The challenge is writing profiles that are tight enough to restrict attacker behavior but permissive enough that your application actually runs. That gap is where testing lives.
How seccomp Works
Linux seccomp operates at the kernel level. When a process makes a syscall, the kernel checks the process's seccomp filter:
- SCMP_ACT_ALLOW: syscall proceeds normally
- SCMP_ACT_ERRNO: syscall is blocked, returns an error code
- SCMP_ACT_KILL: the offending thread is killed immediately, as if by SIGSYS (SCMP_ACT_KILL_PROCESS terminates the whole process)
- SCMP_ACT_LOG: syscall is logged but allowed (audit mode)
- SCMP_ACT_TRAP: sends SIGSYS to the process
Docker and Kubernetes both support seccomp profiles in JSON format. A profile lists the default action and any syscall-specific overrides.
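To make the lookup concrete, here is a minimal Python sketch of how a JSON profile maps a syscall name to an action: a rule that names the syscall wins, otherwise the default action applies. It is a toy model only. The real filter is compiled to BPF and evaluated in the kernel by syscall number and architecture, argument conditions are ignored here, and the sketch assumes no syscall is named in two rules. `resolve_action` and `SAMPLE_PROFILE` are illustrative names, not part of any library.

```python
"""Toy model of seccomp profile evaluation (illustration only)."""

# Hypothetical sample profile in the Docker/Kubernetes JSON layout.
SAMPLE_PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["ptrace"], "action": "SCMP_ACT_KILL"},
    ],
}


def resolve_action(profile, syscall_name):
    """Return the action of the rule naming the syscall, else the default action."""
    for rule in profile.get("syscalls", []):
        if syscall_name in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "ptrace", "mount"):
        print(f"{name}: {resolve_action(SAMPLE_PROFILE, name)}")
```

Here `mount` is not named in any rule, so it falls through to the default action.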
Profile Format
A seccomp profile is a JSON file:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": [
        "read", "write", "close", "fstat", "mmap", "mprotect",
        "munmap", "brk", "pread64", "access", "pipe", "select",
        "sched_yield", "mremap", "msync", "mincore", "madvise",
        "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
        "ioctl", "pwrite64", "readv", "writev", "getcwd",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "sendmsg", "recvmsg", "bind", "listen", "getsockname",
        "getpeername", "socketpair", "setsockopt", "getsockopt",
        "clone", "fork", "vfork", "execve", "exit", "wait4",
        "kill", "uname", "fcntl", "flock", "fsync", "getpid",
        "getppid", "getuid", "geteuid", "getgid", "getegid",
        "getdents64", "openat", "newfstatat", "exit_group",
        "set_tid_address", "set_robust_list", "futex",
        "epoll_create1", "epoll_ctl", "epoll_pwait", "eventfd2",
        "timerfd_create", "timerfd_settime", "timerfd_gettime",
        "accept4", "dup3", "pipe2", "prlimit64", "getrandom",
        "seccomp", "statx", "rseq"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
The defaultAction is the critical field. In production: SCMP_ACT_ERRNO. For profiling: SCMP_ACT_LOG.
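Because the two modes differ only in that one field, an audit copy can be derived from the production profile instead of being maintained by hand. The following is a small sketch under that assumption; `with_default_action` is a hypothetical helper name, and the inline profile is a shortened stand-in for a real one.

```python
"""Derive an audit-mode copy of a seccomp profile (illustrative helper)."""
import copy
import json


def with_default_action(profile, action):
    """Return a deep copy of the profile with defaultAction replaced."""
    variant = copy.deepcopy(profile)
    variant["defaultAction"] = action
    return variant


if __name__ == "__main__":
    # Shortened stand-in for a production profile.
    production = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
    }
    audit = with_default_action(production, "SCMP_ACT_LOG")
    print(json.dumps(audit, indent=2))
```

With the allowlist kept in place, the audit copy logs only the syscalls that the production profile would have denied, which keeps the log short.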
Phase 1: Discovery — What Syscalls Does Your App Actually Use?
Before writing a restrictive profile, you need to know what syscalls your application makes. Use audit mode to discover them.
Method 1: Docker + strace
# Run your container under strace (the image must include strace).
# The bind mount keeps the log on the host after the container exits.
mkdir -p trace
docker run --rm \
  --security-opt seccomp=unconfined \
  -v "$PWD/trace:/trace" \
  --entrypoint strace \
  your-app:latest \
  -f -e trace=all \
  -o /trace/syscalls.log \
  /usr/bin/your-app --your-flags
# Extract unique syscall names (with -f and -o, each line starts with the PID)
grep -oP '^(?:\d+\s+)?\K\w+(?=\()' trace/syscalls.log \
  | sort -u > syscalls-used.txt
Method 2: seccomp Audit Mode
Create an audit profile that logs everything but blocks nothing:
{
  "defaultAction": "SCMP_ACT_LOG",
  "syscalls": []
}
Apply it:
docker run --rm \
  --security-opt seccomp=audit-profile.json \
  your-app:latest
Read the audit log:
# On the host (seccomp events are audit record type 1326)
journalctl -k | grep "type=1326" | grep "syscall=" | \
  awk -F'syscall=' '{print $2}' | awk '{print $1}' | \
  sort -u
Method 3: Kubernetes + Audit Mode
Reference the audit profile as a Localhost seccomp profile in your pod spec. The file must already exist in the kubelet's seccomp directory on the node (by default /var/lib/kubelet/seccomp/):
apiVersion: v1
kind: Pod
metadata:
  name: app-seccomp-audit
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: audit-profile.json
  containers:
    - name: app
      image: your-app:latest
After running your full test suite against the pod, extract syscalls from the node's audit log.
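The extraction step can also be scripted. The sketch below pulls the unique syscall numbers out of seccomp audit records; it assumes the kernel-log format with `type=1326` (or `type=SECCOMP`, as auditd writes it) and a `syscall=<number>` field. `syscall_numbers` is an illustrative name, and the sample lines are made up for the demonstration.

```python
"""Collect syscall numbers from seccomp audit records (illustrative sketch)."""
import re

# Assumption: seccomp records are tagged type=1326 (kernel log) or type=SECCOMP (auditd).
RECORD_MARKERS = ("type=1326", "type=SECCOMP")
SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")


def syscall_numbers(lines):
    """Return the sorted, de-duplicated syscall numbers found in seccomp records."""
    found = set()
    for line in lines:
        if not any(marker in line for marker in RECORD_MARKERS):
            continue  # not a seccomp record
        match = SYSCALL_FIELD.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


if __name__ == "__main__":
    # Made-up sample records; in practice, feed in lines from `journalctl -k`.
    demo = [
        'audit: type=1326 audit(1234567890.123:456): pid=1234 comm="your-app" syscall=41 compat=0',
        'audit: type=1326 audit(1234567890.124:457): pid=1234 comm="your-app" syscall=0 compat=0',
        "some unrelated kernel message",
    ]
    print(syscall_numbers(demo))
```

The result is a list of numbers, not names; they still need to be resolved for the target architecture before they can go into a profile.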
Phase 2: Building a Minimal Profile
From the syscall list collected during discovery, build a minimal allowlist profile.
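Audit-mode discovery yields syscall numbers, while a profile lists names. One way to bridge the two, sketched here, is to parse text in the kernel's syscall table layout (`<number> <abi> <name> <entry point>`) and look the numbers up. The function names are hypothetical, and the embedded table is a tiny hand-copied x86_64 excerpt for demonstration; a real run would read the full table for the target architecture.

```python
"""Map syscall numbers to names using syscall-table-style text (illustrative)."""


def parse_syscall_table(text, abis=("common", "64")):
    """Parse '<number> <abi> <name> [entry point]' lines into {number: name}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] in abis:
            table[int(fields[0])] = fields[2]
    return table


def names_for(numbers, table):
    """Return sorted syscall names; unknown numbers are kept as 'unknown_<n>'."""
    return sorted(table.get(n, f"unknown_{n}") for n in numbers)


if __name__ == "__main__":
    # A few x86_64 entries in the table's format, for demonstration only.
    sample = """
    # <number> <abi> <name> <entry point>
    0    common  read        sys_read
    1    common  write       sys_write
    41   common  socket      sys_socket
    282  common  signalfd    sys_signalfd
    """
    table = parse_syscall_table(sample)
    print(names_for([282, 0, 41], table))
```

Keeping unknown numbers visible as `unknown_<n>` rather than dropping them makes it obvious when the table does not match the architecture the log came from.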
Using oci-seccomp-bpf-hook (Automated)
# Install from your distribution's packages (Fedora shown)
sudo dnf install oci-seccomp-bpf-hook
# Run your container with Podman as root; the hook captures all syscalls.
# OCI hooks are run by Podman and CRI-O, not by the Docker engine.
sudo podman run --rm \
  --annotation io.containers.trace-syscall=of:/tmp/profile.json \
  your-app:latest
# Output is a ready-to-use seccomp profile
cat /tmp/profile.json
Manual Approach Using Python
#!/usr/bin/env python3
"""Build a seccomp profile from a list of syscall names."""
import json
import sys

COMMON_BASE = [
    "read", "write", "close", "fstat", "mmap", "mprotect", "munmap",
    "brk", "rt_sigaction", "rt_sigprocmask", "rt_sigreturn",
    "ioctl", "pread64", "pwrite64", "readv", "writev",
    "socket", "connect", "sendto", "recvfrom", "sendmsg", "recvmsg",
    "bind", "listen", "getsockname", "getpeername", "setsockopt", "getsockopt",
    "clone", "fork", "execve", "exit", "wait4", "getpid", "getppid",
    "getuid", "geteuid", "getgid", "getegid", "getcwd",
    "openat", "getdents64", "newfstatat", "exit_group",
    "futex", "set_tid_address", "set_robust_list", "prlimit64",
    "epoll_create1", "epoll_ctl", "epoll_pwait", "eventfd2",
    "accept4", "dup3", "pipe2", "getrandom", "statx", "rseq",
    "fcntl", "flock", "fsync", "uname", "kill", "select",
]


def build_profile(extra_syscalls=None):
    allowed = sorted(set(COMMON_BASE + (extra_syscalls or [])))
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
        ],
        "syscalls": [
            {
                "names": allowed,
                "action": "SCMP_ACT_ALLOW"
            }
        ]
    }
    return profile


if __name__ == "__main__":
    extra = sys.argv[1:]  # pass additional syscalls as args
    print(json.dumps(build_profile(extra), indent=2))
Usage:
python3 build-profile.py mq_open mq_send mq_receive > my-app-profile.json
Phase 3: Testing the Profile
Test 1: Application Functionality
Apply the profile and run your integration tests. A blocked syscall either fails with EPERM (SCMP_ACT_ERRNO) or kills the process (SCMP_ACT_KILL):
docker run --rm \
  --security-opt seccomp=my-app-profile.json \
  your-app:latest \
  ./run-tests.sh
If tests fail, check the kernel log for seccomp audit records (type=1326). Kills are logged by default; plain SCMP_ACT_ERRNO denials usually are not, so temporarily switch defaultAction to SCMP_ACT_LOG if nothing shows up:
dmesg | grep 'type=1326' | tail -20
# Output: [12345.678] audit: type=1326 audit(1234567890.123:456):
#   auid=0 uid=0 gid=0 ses=1 subj=... pid=1234 comm="your-app"
#   exe="/usr/bin/your-app" sig=31 arch=c000003e syscall=282 compat=0
#   ip=0x... code=0x80000000
The syscall=282 is the syscall number. Look it up:
# Using ausyscall (from the audit tools)
ausyscall x86_64 282  # prints: signalfd
# Or scmp_sys_resolver (from the libseccomp tools), which resolves for the native architecture
scmp_sys_resolver 282
Add the missing syscall to your allowlist and retest.
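Editing the allowlist by hand gets error-prone as it grows, so the add-and-retest loop can be scripted. This is a minimal sketch, assuming the profile keeps its unconditional allowlist in a single SCMP_ACT_ALLOW rule as in the examples above; `allow_syscalls` is a hypothetical helper, not part of Docker or libseccomp.

```python
"""Add syscalls to a profile's allowlist (illustrative helper)."""
import json


def allow_syscalls(profile, names):
    """Add names to the first unconditional SCMP_ACT_ALLOW rule, creating one if needed.

    The profile dict is modified in place and also returned.
    """
    for rule in profile.setdefault("syscalls", []):
        # Skip rules with argument conditions; they are not a plain allowlist.
        if rule.get("action") == "SCMP_ACT_ALLOW" and "args" not in rule:
            rule["names"] = sorted(set(rule.get("names", [])) | set(names))
            return profile
    profile["syscalls"].append({"names": sorted(set(names)), "action": "SCMP_ACT_ALLOW"})
    return profile


if __name__ == "__main__":
    # In practice: json.load() the profile file, update it, json.dump() it back.
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
    }
    allow_syscalls(profile, ["signalfd"])
    print(json.dumps(profile, indent=2))
```

Sorting and de-duplicating the names keeps the profile stable under repeated runs, which makes later reviews of profile changes easier to read.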
Test 2: Security Verification
After tuning the profile to allow your app, verify that dangerous syscalls are blocked:
import subprocess

PROFILE_PATH = "profiles/my-app-profile.json"
IMAGE = "your-app:latest"


def run_in_container(cmd, profile=PROFILE_PATH):
    """Run a shell command in the image with the seccomp profile applied."""
    return subprocess.run(
        [
            "docker", "run", "--rm",
            "--security-opt", f"seccomp={profile}",
            IMAGE,
            "sh", "-c", cmd,
        ],
        capture_output=True,
        text=True,
        timeout=30,
    )


def probe(call):
    """Shell command that exits non-zero when the libc call fails (returns -1).

    Requires python3 in the image. A failure can also come from missing
    capabilities, so compare with an unconfined run to attribute it to seccomp.
    """
    return (
        'python3 -c "import ctypes, sys; '
        f"ret = ctypes.CDLL(None).{call}; "
        'sys.exit(1 if ret == -1 else 0)"'
    )


def test_app_runs_successfully():
    """Application starts and exits cleanly with the profile applied."""
    result = run_in_container("./my-app --check-health && echo OK")
    assert result.returncode == 0, \
        f"App failed with profile applied:\n{result.stderr}"
    assert "OK" in result.stdout


def test_ptrace_blocked():
    """ptrace is not allowed; this closes off a common container escape path."""
    result = run_in_container(probe("ptrace(0, 0, 0, 0)"))
    assert result.returncode != 0, "ptrace should be blocked"


def test_mount_blocked():
    """mount syscall is not allowed; this prevents filesystem manipulation."""
    result = run_in_container("mount --bind /tmp /mnt 2>&1 || echo BLOCKED")
    assert "BLOCKED" in result.stdout or result.returncode != 0


def test_kexec_blocked():
    """kexec is not allowed; this prevents kernel replacement."""
    # kexec_load = 246 on x86_64
    result = run_in_container(probe("syscall(246)"))
    assert result.returncode != 0


def test_module_loading_blocked():
    """Loading kernel modules is not allowed."""
    # init_module = 175 on x86_64
    result = run_in_container(probe("syscall(175)"))
    assert result.returncode != 0
Run:
pytest tests/test_seccomp_profile.py -v
Test 3: Profile Diff Validation
When profiles change, automatically check that no new high-risk syscalls were added:
#!/usr/bin/env python3
"""Check that a seccomp profile doesn't allow dangerous syscalls."""
import json
import sys

DANGEROUS_SYSCALLS = {
    "ptrace",           # Container escape
    "mount",            # Filesystem manipulation
    "umount2",          # Filesystem manipulation
    "kexec_load",       # Kernel replacement
    "kexec_file_load",  # Kernel replacement
    "init_module",      # Kernel module loading
    "finit_module",     # Kernel module loading
    "delete_module",    # Kernel module unloading
    "settimeofday",     # System clock manipulation
    "adjtimex",         # Clock adjustment
    "clock_settime",    # Clock setting
    "nfsservctl",       # NFS server
    "pivot_root",       # Root filesystem change
    "chroot",           # Root directory change (suspicious in containers)
    "unshare",          # Namespace creation (often used in escapes)
    "keyctl",           # Kernel keyring (credential theft)
    "add_key",          # Kernel keyring
    "request_key",      # Kernel keyring
}


def check_profile(profile_path):
    with open(profile_path) as f:
        profile = json.load(f)
    violations = []
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            for syscall in rule.get("names", []):
                if syscall in DANGEROUS_SYSCALLS:
                    violations.append(syscall)
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        violations.append("defaultAction=SCMP_ACT_ALLOW (allows all by default)")
    return violations


if __name__ == "__main__":
    path = sys.argv[1]
    violations = check_profile(path)
    if violations:
        print(f"FAIL: Dangerous syscalls found in {path}:")
        for v in violations:
            print(f"  - {v}")
        sys.exit(1)
    print(f"OK: {path} passes security check")
Kubernetes Integration
Applying Profiles via RuntimeDefault
Kubernetes 1.19+ has built-in seccomp support:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # Docker/containerd default profile
  containers:
    - name: app
      image: my-app:latest
      securityContext:
        seccompProfile:
          type: Localhost  # Container-specific profile
          localhostProfile: my-app-profile.json
Uploading Custom Profiles to Nodes
Profiles must exist on each node at /var/lib/kubelet/seccomp/:
# Copy profile to each node (use DaemonSet in production)
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  scp profiles/my-app-profile.json "$node:/var/lib/kubelet/seccomp/my-app-profile.json"
done
Or use the Security Profiles Operator for automated profile management:
helm repo add spo https://kubernetes-sigs.github.io/security-profiles-operator/
helm install spo spo/security-profiles-operator -n security-profiles-operator --create-namespace
Then declare the profile as a SeccompProfile resource:
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: my-app-profile
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:
        - read
        - write
        - exit
        - exit_group
        # ... rest of allowlist
Testing Profile Application in CI
name: seccomp Profile CI
on:
  push:
    paths:
      - 'profiles/**'
jobs:
  validate-profiles:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check for dangerous syscalls
        run: |
          for profile in profiles/*.json; do
            python3 scripts/check-dangerous-syscalls.py "$profile"
          done
      - name: Test app runs with profile
        run: |
          docker build -t my-app:test .
          docker run --rm \
            --security-opt seccomp=profiles/my-app-profile.json \
            my-app:test \
            ./health-check.sh
      - name: Verify dangerous syscalls blocked
        run: |
          docker run --rm \
            --security-opt seccomp=profiles/my-app-profile.json \
            my-app:test \
            python3 -c "
          import ctypes, sys
          # Try ptrace (should fail)
          ret = ctypes.CDLL(None).ptrace(0, 0, 0, 0)
          if ret == -1:
              print('ptrace blocked: OK')
          else:
              print('ptrace NOT blocked: FAIL')
              sys.exit(1)
          "
Debugging Blocked Syscalls in Production
When a profile blocks a syscall your app needs, the symptom is usually a silent crash or EPERM error. Debug it:
# Enable audit logging temporarily
# In pod spec:
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: audit-mode-profile.json  # defaultAction: SCMP_ACT_LOG
# Watch audit log on node (seccomp events are audit record type 1326)
journalctl -k -f | grep 'type=1326'
Convert syscall numbers to names quickly:
# One-liner to decode a syscall number (ausyscall ships with the audit tools)
ausyscall x86_64 281
# Or use the syscall table directly
awk -v n=281 '$2=="common" && $1==n {print $3}' /usr/src/linux-headers-*/arch/x86/entry/syscalls/syscall_64.tbl
Summary
A hardened seccomp workflow:
- Discover: Run with SCMP_ACT_LOG + your full test suite, collect used syscalls
- Build: Generate a minimal allowlist profile with SCMP_ACT_ERRNO as default
- Test functionality: Run tests with the profile applied; they must pass
- Test security: Verify dangerous syscalls (ptrace, mount, kexec, module loading) are blocked
- Enforce in CI: Validate profiles on every change, fail if dangerous syscalls added
- Deploy: Use Security Profiles Operator for managed rollout to nodes
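For the CI enforcement step, it can help to see exactly which syscalls a change adds or removes. The sketch below compares the allowlists of two profile versions; it only considers SCMP_ACT_ALLOW rules and ignores argument conditions, and the function names and sample profiles are illustrative.

```python
"""Report syscalls newly allowed between two profile versions (illustrative)."""


def allowed_syscalls(profile):
    """Return the set of syscall names granted by SCMP_ACT_ALLOW rules."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def diff_profiles(old, new):
    """Return (added, removed) syscall names as sorted lists."""
    before, after = allowed_syscalls(old), allowed_syscalls(new)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    # Made-up old and new versions of a profile.
    old = {"syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}]}
    new = {"syscalls": [{"names": ["read", "write", "mount"], "action": "SCMP_ACT_ALLOW"}]}
    added, removed = diff_profiles(old, new)
    print("added:", added)
    print("removed:", removed)
```

A CI job could fail when the added list intersects a deny set, and simply print the rest for the reviewer.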
The payoff: even if an attacker gets code execution inside your container, they can't call the syscalls needed to escape or escalate. seccomp makes your containers significantly harder to exploit.