ClusterFuzz: Google's Distributed Fuzzing Infrastructure for CI

OSS-Fuzz handles the hosting and infrastructure for open source projects. But what runs underneath OSS-Fuzz? The answer is ClusterFuzz — Google's distributed fuzzing platform that scales across thousands of machines, automatically manages corpora, deduplicates crashes, minimizes test cases, and integrates with bug trackers. ClusterFuzz is open source and can be run on your own infrastructure. For organizations that need enterprise-grade continuous fuzzing for private codebases, it provides the same capabilities that power the security testing of Chrome and Android.

ClusterFuzz Architecture

ClusterFuzz has several components:

Bot: A worker machine that runs fuzz jobs. Each bot runs a fuzzing engine (libFuzzer, AFL++, Honggfuzz) against a fuzz target, collecting crashes and coverage data. Bots can run on physical machines, VMs, or Kubernetes pods.

Fuzzing engines: ClusterFuzz supports libFuzzer, AFL++, Honggfuzz, and others. Each has different mutation strategies and strengths. ClusterFuzz automatically rotates between engines.

Crash deduplication: When two fuzz inputs cause the same crash, ClusterFuzz detects that they are duplicates (using the crash stack signature) and keeps only the minimal reproducer. Without deduplication, you would drown in thousands of reports for the same underlying bug.

Corpus management: ClusterFuzz maintains and syncs corpora across all bots. Inputs that find new coverage are added to the corpus; old inputs are periodically pruned. This ensures all bots benefit from each other's discoveries.

Dashboard: A web UI showing fuzzer status, crash reports, coverage over time, and job history.

Minimizer: When a crash is found, ClusterFuzz automatically minimizes the crashing input to the smallest possible reproducer. This is critical for debugging — a minimal reproducer is much easier to analyze than a 10KB file.

Bisection: ClusterFuzz can automatically find the commit that introduced a bug by binary-searching through git history and testing each commit.
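The minimizer and bisection steps can be illustrated with a minimal sketch. This is not ClusterFuzz's implementation: `minimize`, `bisect_first_bad`, and the `crashes` / `is_bad` callbacks are hypothetical names, and the bisection assumes history is linear, with the oldest commit known good and the newest known bad.

```python
from typing import Callable, Sequence


def minimize(data: bytes, crashes: Callable[[bytes], bool]) -> bytes:
    """Greedy chunk removal: drop chunks for as long as the crash still reproduces."""
    assert crashes(data), "input must reproduce the crash"
    chunk = max(len(data) // 2, 1)
    while chunk >= 1:
        pos = 0
        while pos < len(data):
            candidate = data[:pos] + data[pos + chunk:]
            if candidate and crashes(candidate):
                data = candidate   # chunk was irrelevant, keep it removed
            else:
                pos += chunk       # chunk is needed, move past it
        chunk //= 2
    return data


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit after
    the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

In practice `crashes` would run the fuzz target on the candidate input and `is_bad` would build the given commit and replay the reproducer; both are left abstract here. The greedy minimizer returns a small reproducer, not necessarily the smallest possible one.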

Running ClusterFuzz Locally

For local experimentation and small-scale deployment:

# Prerequisites: Python 3.8+, Docker
git clone https://github.com/google/clusterfuzz
cd clusterfuzz

# Install Python dependencies
pip install -r src/requirements.txt

# Start local development server
python3 butler.py run_server --storage-path /tmp/clusterfuzz-data

# In another terminal, start a bot
python3 butler.py run_bot \
  --name local-bot-1 \
  --server-storage-path /tmp/clusterfuzz-data

Access the dashboard at http://localhost:9000.

Google Cloud Platform Deployment

For production use, ClusterFuzz runs on GCP:

# Configure GCP project
gcloud config set project my-fuzz-project
gcloud services enable compute.googleapis.com
gcloud services enable storage.googleapis.com
gcloud services enable pubsub.googleapis.com

# Deploy ClusterFuzz
python3 butler.py deploy \
  --config-dir configs/test/ \
  --staging \
  --force

# Create fuzzer bots (GCE instances)
python3 butler.py create_config \
  --project-id my-fuzz-project \
  --bot-count 10 \
  --zone us-central1-a

The GCP deployment uses:

  • Cloud Storage for corpus, crashes, and builds
  • Cloud Pub/Sub for task distribution between bots
  • Cloud Datastore for metadata
  • App Engine for the dashboard

Integrating with CI

The most powerful ClusterFuzz CI integration is the build upload workflow. Every time your CI builds a new version, it uploads the fuzz targets to ClusterFuzz, which then runs them continuously.

GitHub Actions workflow:

name: Build and Upload Fuzz Targets
on:
  push:
    branches: [main]

jobs:
  fuzz-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build fuzz targets with instrumentation
        run: |
          cmake -B build \
            -DCMAKE_C_COMPILER=clang \
            -DCMAKE_CXX_COMPILER=clang++ \
            -DCMAKE_C_FLAGS="-fsanitize=fuzzer-no-link,address,undefined" \
            -DCMAKE_CXX_FLAGS="-fsanitize=fuzzer-no-link,address,undefined"
          make -C build all_fuzz_targets
          
          # Package for ClusterFuzz
          mkdir fuzz-artifacts
          cp build/fuzz_* fuzz-artifacts/
          tar czf fuzz-build-${{ github.sha }}.tar.gz fuzz-artifacts/
      
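      - name: Upload to ClusterFuzz
        env:
          # Pass secrets and values through the environment instead of
          # interpolating them into the script text.
          CLUSTERFUZZ_TOKEN: ${{ secrets.CLUSTERFUZZ_TOKEN }}
          BUILD_ARCHIVE: fuzz-build-${{ github.sha }}.tar.gz
          REVISION: ${{ github.sha }}
        run: |
          python3 - <<'EOF'
          import os
          import requests

          # Placeholder endpoint: replace with the upload URL of your deployment.
          with open(os.environ['BUILD_ARCHIVE'], 'rb') as f:
              r = requests.post(
                  'https://clusterfuzz.example.com/upload-build',
                  headers={'Authorization': f"Bearer {os.environ['CLUSTERFUZZ_TOKEN']}"},
                  files={'build': f},
                  data={
                      'project': 'my-project',
                      'platform': 'linux',
                      'job': 'libfuzzer_asan_my_project',
                      'revision': os.environ['REVISION'],
                  },
                  timeout=300,
              )
          r.raise_for_status()  # fail the CI step if the upload is rejected
          print(r.json())
          EOF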

ClusterFuzz picks up the new build and starts fuzzing with it. If a new build introduces a regression (a crash on an input that used to work), ClusterFuzz files a bug automatically.

Crash Deduplication in Depth

Crash deduplication is one of ClusterFuzz's most valuable features. Without it, a single bug that manifests in many ways generates thousands of duplicate reports.

ClusterFuzz's deduplication algorithm:

  1. Extract the crash signature: the top N frames of the stack trace (usually 3-5 frames, excluding fuzzer infrastructure frames)
  2. Normalize: remove memory addresses (which vary between runs), normalize module offsets
  3. Hash: compute a hash of the normalized signature
  4. Deduplicate: if two crashes have the same hash, they are the same bug; keep only the smaller reproducer

For AddressSanitizer crashes:

# Bug A (found first, reproducer = 1.2 KB input):
ASAN: heap-buffer-overflow READ 4
  #0 parse_number json.c:142
  #1 parse_value json.c:89
  #2 LLVMFuzzerTestOneInput fuzz_json.cc:12

# Bug B (found later, reproducer = 38 bytes):
ASAN: heap-buffer-overflow READ 4
  #0 parse_number json.c:142
  #1 parse_value json.c:89
  #2 LLVMFuzzerTestOneInput fuzz_json.cc:12

# Same signature → keep Bug B (smaller reproducer), close Bug A as duplicate
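The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ClusterFuzz's actual code: the `Crash` class, the `INFRA_FRAMES` filter, and the choice of SHA-256 over the crash type plus the top three frames are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical filter: frame-name prefixes treated as fuzzer infrastructure.
INFRA_FRAMES = ("LLVMFuzzerTestOneInput", "fuzzer::", "__asan", "__sanitizer")

# Matches "#0 parse_number json.c:142" and "#0 0x4f2a10 in parse_number json.c:142".
# The address is optional and never captured, which is the normalization step.
FRAME_RE = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


@dataclass
class Crash:
    crash_type: str    # e.g. "heap-buffer-overflow READ 4"
    stack: str         # raw stack trace text
    reproducer: bytes  # input that triggers the crash


def signature(crash: Crash, top_n: int = 3) -> str:
    """Hash the crash type plus the top N non-infrastructure frame names."""
    frames = []
    for line in crash.stack.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        if any(name.startswith(prefix) for prefix in INFRA_FRAMES):
            continue
        frames.append(name)
    key = crash.crash_type + "|" + "|".join(frames[:top_n])
    return hashlib.sha256(key.encode()).hexdigest()


def deduplicate(crashes):
    """Keep one crash per signature, preferring the smallest reproducer."""
    best = {}
    for crash in crashes:
        sig = signature(crash)
        if sig not in best or len(crash.reproducer) < len(best[sig].reproducer):
            best[sig] = crash
    return best
```

Applied to Bug A and Bug B from the example, both traces reduce to the same frame list (`parse_number`, `parse_value`), so `deduplicate` keeps only the 38-byte reproducer.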

Coverage-Guided Fuzzing at Scale

ClusterFuzz uses coverage data to coordinate across bots. Each bot reports which code paths its corpus exercises. The corpus manager merges corpora from all bots and ensures that inputs covering unique paths are preserved.
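As a rough illustration of the merge step, the sketch below combines per-bot corpora and keeps an input only if it covers something the already-kept inputs do not. It assumes each bot reports a mapping from input bytes to a set of integer edge IDs; that representation and the name `merge_corpora` are hypothetical, and the smallest-first greedy order is one common heuristic rather than a description of ClusterFuzz internals.

```python
from typing import Dict, FrozenSet, Iterable, List


def merge_corpora(corpora: Iterable[Dict[bytes, FrozenSet[int]]]) -> List[bytes]:
    """Merge per-bot corpora, keeping only inputs that add new coverage."""
    # Union all inputs; identical inputs reported by several bots collapse.
    all_inputs: Dict[bytes, FrozenSet[int]] = {}
    for corpus in corpora:
        all_inputs.update(corpus)

    covered = set()
    kept: List[bytes] = []
    # Visit smaller inputs first so the merged corpus favors short inputs;
    # ties are broken by content to keep the result deterministic.
    for data, edges in sorted(all_inputs.items(), key=lambda kv: (len(kv[0]), kv[0])):
        if not edges <= covered:   # input reaches at least one uncovered edge
            kept.append(data)
            covered |= edges
    return kept
```

For example, if one bot holds `b"a"` covering edges {1, 2} and another holds `b"xy"` covering only {2}, the merged corpus keeps `b"a"` and drops `b"xy"` as redundant.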

The coverage visualization shows which parts of your code are being exercised:

# Generate coverage report from ClusterFuzz corpus
llvm-profdata merge \
    -sparse corpus/*/default.profraw \
    -o coverage.profdata

llvm-cov show \
    ./fuzz_target \
    -instr-profile=coverage.profdata \
    -format=html \
    -output-dir=coverage-report/

Coverage gaps in the report tell you where to improve your seed corpus or fuzz target. As a rough rule of thumb, a fuzz target should reach 70-80% of the code it is meant to exercise. Lower coverage usually means some code paths are unreachable from the current harness or seeds, and those unreached paths are likely where the interesting bugs live.
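To turn the report into a number you can track, one option is to read the JSON summary that `llvm-cov export` produces. The sketch below assumes the standard export layout (`data[0].totals` and `data[0].files[].summary`); the function name `line_coverage` and the 70% default threshold are illustrative choices.

```python
def line_coverage(export: dict, threshold: float = 70.0):
    """Return (total line coverage %, [(filename, %)] for files below threshold)."""
    data = export["data"][0]
    total = data["totals"]["lines"]["percent"]
    weak = sorted(
        (
            (f["filename"], f["summary"]["lines"]["percent"])
            for f in data["files"]
            if f["summary"]["lines"]["percent"] < threshold
        ),
        key=lambda item: item[1],  # least-covered files first
    )
    return total, weak
```

The input dictionary would come from something like `llvm-cov export ./fuzz_target -instr-profile=coverage.profdata -summary-only > summary.json`, loaded with `json.load`.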

Multi-Fuzzer Strategy

ClusterFuzz supports running multiple fuzzing engines simultaneously against the same targets. Different engines find different bugs:

  • libFuzzer: fast, great at finding short execution paths, excellent with ASan
  • AFL++: better at state machine exploration, finds bugs in stateful protocols
  • Honggfuzz: strong at persistent fuzzing, good for multi-threaded code

Configure multiple jobs for the same target:

# Illustrative job definitions (in ClusterFuzz itself, jobs are created on the Jobs page of the web UI)
jobs = [
    {
        'name': 'libfuzzer_asan_myproject',
        'fuzzer': 'libFuzzer',
        'sanitizer': 'address',
        'targets': ['fuzz_json_parser', 'fuzz_http_parser'],
    },
    {
        'name': 'afl_asan_myproject',
        'fuzzer': 'AFL++',
        'sanitizer': 'address',
        'targets': ['fuzz_json_parser', 'fuzz_http_parser'],
    },
    {
        'name': 'libfuzzer_ubsan_myproject',
        'fuzzer': 'libFuzzer',
        'sanitizer': 'undefined',
        'targets': ['fuzz_json_parser', 'fuzz_http_parser'],
    },
]

Corpora are shared across all engines — if AFL++ finds a new path, that input is added to libFuzzer's corpus too, and vice versa.

Monitoring and Alerting

ClusterFuzz exposes metrics via its dashboard API. For integration with Prometheus/Grafana:

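# Query a metrics endpoint (placeholder URL; adjust to your deployment)
import os
import requests

# Read the API token from the environment rather than hard-coding it.
CLUSTERFUZZ_TOKEN = os.environ['CLUSTERFUZZ_TOKEN']

def get_clusterfuzz_metrics(project):
    r = requests.get(
        'https://clusterfuzz.example.com/api/metrics',
        params={'project': project},
        headers={'Authorization': f'Bearer {CLUSTERFUZZ_TOKEN}'},
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return r.json()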

# Metrics available:
# - crashes_per_hour
# - coverage_percent
# - corpus_size
# - executions_per_second
# - unique_crash_count

Set up alerts for:

  • New crash found (immediate alert)
  • Coverage drops below threshold (weekly check)
  • No fuzzing activity for >24 hours (infrastructure health)
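The three alert conditions above could be evaluated with a small function like the following. It is a sketch under assumptions: the metric names follow the list earlier in this section, while `last_execution_ts` (a Unix timestamp of the most recent fuzzing activity), the function name, and the default thresholds are hypothetical.

```python
def evaluate_alerts(metrics, previous_unique_crashes, now,
                    min_coverage=70.0, max_idle_seconds=24 * 3600):
    """Return the names of the alert conditions that currently fire."""
    alerts = []
    # New crash: the unique crash count grew since the last check.
    if metrics["unique_crash_count"] > previous_unique_crashes:
        alerts.append("new_crash")
    # Coverage regression: coverage fell below the configured floor.
    if metrics["coverage_percent"] < min_coverage:
        alerts.append("coverage_below_threshold")
    # Infrastructure health: no fuzzing activity within the idle window.
    if now - metrics["last_execution_ts"] > max_idle_seconds:
        alerts.append("no_fuzzing_activity")
    return alerts
```

A scheduler would call this with `now=time.time()` and the crash count saved from the previous run, then route each returned name to the appropriate channel (pager for new crashes, weekly digest for coverage).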

Cost Optimization

Running many bots 24/7 is expensive. Strategies to reduce costs:

Preemptible/Spot instances: ClusterFuzz bots tolerate being killed — they checkpoint progress regularly. Using spot instances can cut bot costs by 70-90%.

Coverage-based scheduling: Run more bots when coverage is growing (active exploration), fewer when it plateaus (maintenance mode).

Targeted fuzzing: Prioritize fuzz targets that cover recently changed code. Use git diff to identify which fuzz targets are affected by each commit, and schedule those jobs with higher priority.

Corpus pruning: Regularly prune the corpus to remove inputs that are redundant (covered by other inputs). ClusterFuzz does this automatically, but tuning the pruning aggressiveness reduces corpus bloat.
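For the targeted fuzzing strategy, the mapping from changed files to fuzz targets can be as simple as a prefix table. The sketch below is one possible approach; the `TARGET_SOURCES` table, its directory layout, and the function name are hypothetical and would need to reflect your own repository.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping: fuzz target -> source path prefixes it exercises.
TARGET_SOURCES: Dict[str, Tuple[str, ...]] = {
    "fuzz_json_parser": ("src/json/",),
    "fuzz_http_parser": ("src/http/", "src/net/"),
}


def affected_targets(changed_files: Iterable[str],
                     target_sources: Dict[str, Tuple[str, ...]] = TARGET_SOURCES) -> List[str]:
    """Return the fuzz targets whose source prefixes match any changed file."""
    changed = list(changed_files)
    return sorted(
        target
        for target, prefixes in target_sources.items()
        # str.startswith accepts a tuple and matches any of its prefixes
        if any(path.startswith(prefixes) for path in changed)
    )
```

The list of changed files would typically come from `git diff --name-only` between the previous and current revision; the returned targets are the ones to schedule with higher priority.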

ClusterFuzz represents the state of the art in continuous fuzzing infrastructure. For organizations that handle untrusted input in security-critical software, deploying it internally with your own compute provides the same level of continuous security testing that Google applies to Chrome. The initial setup investment pays back quickly when the first remotely exploitable vulnerability is found before it ships.
