libFuzzer with Clang: Writing Fuzz Targets, Sanitizers, and Corpus Management

libFuzzer is a coverage-guided fuzzing engine built directly into LLVM/Clang. Unlike AFL++, which runs as a separate process and instruments via compiler wrappers, libFuzzer runs the fuzz target as a library — making it significantly faster for library-level fuzzing. It's the engine behind Google's OSS-Fuzz program, which runs continuous fuzzing on hundreds of critical open-source projects.

This tutorial covers writing effective fuzz targets, using sanitizers to catch more bugs, managing the corpus, and scaling to production-grade continuous fuzzing.

How libFuzzer Works

libFuzzer's architecture differs from AFL++ in one key way: the fuzzer and the target run in the same process.

┌─────────────────────────────────────────────────────┐
│  Single Process                                      │
│                                                     │
│  libFuzzer Engine                                   │
│    ├── Mutation engine                              │
│    ├── Coverage tracking (SanitizerCoverage)        │
│    └── Calls → LLVMFuzzerTestOneInput(data, size)   │
│                    │                                │
│                    └── Your parsing/library code    │
└─────────────────────────────────────────────────────┘

Benefits:

  • No fork overhead: ~10x faster than process-based fuzzers for small inputs
  • Direct ASAN integration: ASAN's shadow memory is in the same process — crashes are immediate
  • Easy setup: No separate fuzzing binary, no input/output file dance
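
To make the single-process model concrete, here is a toy driver loop. It is a sketch for illustration only, not libFuzzer's real implementation: `run_toy_loop` and the call counter are hypothetical names, and the real engine adds coverage feedback and corpus management on top. The point is that each iteration is an ordinary function call, with no fork, exec, or file I/O in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in fuzz target: only counts how often it was called.
static std::size_t g_calls = 0;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    ++g_calls;
    return 0;
}

// Hypothetical toy driver: overwrite one random byte, then call the
// target directly in the same process.
std::size_t run_toy_loop(std::size_t iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<uint8_t> input = {'s', 'e', 'e', 'd'};
    for (std::size_t i = 0; i < iterations; ++i) {
        std::size_t pos = rng() % input.size();
        input[pos] = static_cast<uint8_t>(rng() & 0xFF);
        int rc = LLVMFuzzerTestOneInput(input.data(), input.size());
        assert(rc == 0);
        (void)rc;
    }
    return g_calls;
}
```

Because the target is called directly, a memory error inside it takes down the driver too, which is why crashes surface immediately.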

Writing Your First Fuzz Target

The libFuzzer interface is a single function:

#include <cstddef>
#include <cstdint>

// The entry point for libFuzzer
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Feed data to your target function
    // Return 0 normally; other values are reserved (see rule 4 below)
    return 0;
}

Rules for the fuzz target:

  1. Handle invalid input without crashing: Malformed input is expected, so the code under test should reject it gracefully. A crash on bad input is exactly the bug you're looking for
  2. No global state between calls: Each call to LLVMFuzzerTestOneInput must be independent
  3. No memory leaks: libFuzzer calls this function millions of times; leaks will OOM the process
  4. Return 0: Return -1 only to tell libFuzzer not to add this input to the corpus (supported in recent libFuzzer versions); other return values are reserved
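
The four rules can be sketched in one small target. This is a minimal illustration under stated assumptions: `sum_record` and its length-prefixed record format are hypothetical stand-ins for your own code, not part of libFuzzer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function under test: sums the payload of a
// [1-byte length][payload...] record. Returns false on malformed input.
static bool sum_record(const uint8_t *data, size_t size, uint32_t *out) {
    assert(out != nullptr);
    if (size < 1) return false;
    size_t len = data[0];
    if (len > size - 1) return false;  // declared length exceeds the input
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) sum += data[1 + i];
    *out = sum;
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Rule 4: an empty input tells us nothing, so keep it out of the corpus.
    if (size == 0) return -1;

    // Rules 2 and 3: all state is local and owned by RAII types, so nothing
    // leaks and nothing carries over to the next call.
    std::vector<uint8_t> copy(data, data + size);
    uint32_t sum = 0;

    // Rule 1: malformed input is reported through the return value, not a crash.
    (void)sum_record(copy.data(), copy.size(), &sum);
    return 0;
}
```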

Example: Fuzzing a JSON Parser

// fuzz_json.cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include "nlohmann/json.hpp"  // Or whichever JSON library you use

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Convert raw bytes to string
    std::string input(reinterpret_cast<const char*>(data), size);
    
    try {
        auto parsed = nlohmann::json::parse(input);
        
        // Optional: test round-trip correctness
        // If parse + serialize + re-parse gives different result, that's a bug
        std::string serialized = parsed.dump();
        auto reparsed = nlohmann::json::parse(serialized);
        // Could assert here but be careful about legitimate differences
        
    } catch (const nlohmann::json::exception&) {
        // Parse errors are expected — not a bug
    } catch (...) {
        // Any other exception = bug
        __builtin_trap();  // Force crash for libFuzzer to catch
    }
    
    return 0;
}

Example: Fuzzing a URL Parser

// fuzz_url_parser.cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include "url_parser.h"  // Your URL parsing code

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size == 0) return 0;
    
    // Copy into a std::string so that c_str() is null-terminated
    std::string url(reinterpret_cast<const char*>(data), size);
    
    ParsedURL result;
    bool success = parse_url(url.c_str(), &result);
    
    if (success) {
        // Verify invariants: if parse succeeded, output should be valid
        if (result.scheme.empty()) {
            // Parser claims success but no scheme — that's a bug
            __builtin_trap();
        }
        
        // Round-trip: serialize and re-parse
        std::string reconstructed = reconstruct_url(&result);
        ParsedURL reparsed;
        bool reparsed_ok = parse_url(reconstructed.c_str(), &reparsed);
        
        if (!reparsed_ok) {
            // We generated a URL our own parser can't parse
            __builtin_trap();
        }
    }
    
    return 0;
}

Round-trip testing (parse → serialize → re-parse) is extremely powerful for finding inconsistencies in your data handling code.
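
For a version of the round-trip pattern that compiles on its own, here is a sketch using a tiny hex codec. `hex_encode` and `hex_decode` are hypothetical stand-ins for your real parser and serializer; the target follows the same parse, serialize, re-parse sequence and aborts if the two parsed values differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical serializer: bytes -> lowercase hex text.
static std::string hex_encode(const std::vector<uint8_t> &bytes) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out.push_back(kDigits[b >> 4]);
        out.push_back(kDigits[b & 0x0F]);
    }
    return out;
}

static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical parser: lowercase hex text -> bytes. Returns false on bad input.
static bool hex_decode(const std::string &text, std::vector<uint8_t> *out) {
    assert(out != nullptr);
    if (text.size() % 2 != 0) return false;
    out->clear();
    for (size_t i = 0; i < text.size(); i += 2) {
        int hi = hex_value(text[i]);
        int lo = hex_value(text[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out->push_back(static_cast<uint8_t>((hi << 4) | lo));
    }
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    const std::string text(reinterpret_cast<const char *>(data), size);

    std::vector<uint8_t> first;
    if (!hex_decode(text, &first)) return 0;  // invalid input is not a bug

    // parse -> serialize -> re-parse must give back the same value.
    std::vector<uint8_t> second;
    if (!hex_decode(hex_encode(first), &second) || first != second) {
        std::abort();
    }
    return 0;
}
```

Comparing the parsed values rather than the raw text sidesteps legitimate differences such as whitespace or letter case that a serializer may normalize.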

Compiling and Running

Compilation

# Basic fuzz target
clang++ -fsanitize=fuzzer,address \
        -fno-omit-frame-pointer \
        -g \
        -O1 \
        -o fuzz_json \
        fuzz_json.cpp \
        -I/path/to/nlohmann

# With all recommended sanitizers
clang++ -fsanitize=fuzzer,address,undefined \
        -fno-omit-frame-pointer \
        -fno-optimize-sibling-calls \
        -g -O1 \
        -o fuzz_target \
        fuzz_target.cpp libparser.a

Key flags:

  • -fsanitize=fuzzer: Links libFuzzer engine and instruments for coverage
  • -fsanitize=address: AddressSanitizer — catches buffer overflows, use-after-free
  • -fsanitize=undefined: UndefinedBehaviorSanitizer — catches integer overflow, null dereference
  • -g: Debug symbols for readable crash reports
  • -O1: Low optimization — keeps code readable in stack traces, doesn't eliminate too many checks

Running

# Basic run with a corpus directory
./fuzz_json corpus/

# Run for a specific time budget (seconds)
./fuzz_json -max_total_time=300 corpus/

# Run with a specific number of iterations
./fuzz_json -runs=1000000 corpus/

# Print coverage information when the run exits
./fuzz_json -print_coverage=1 corpus/

libFuzzer Command-Line Reference

./fuzz_target [corpus_dir/] [flags]

# Key flags:
-max_len=N              # Max input length (default 0: libFuzzer guesses from the corpus)
-max_total_time=N       # Stop after N seconds
-runs=N                 # Stop after N executions
-workers=N              # Worker processes used with -jobs=N (default: min(jobs, cores/2))
-dict=file              # Dictionary of interesting tokens
-artifact_prefix=path/  # Where to write crash/timeout inputs
-print_final_stats=1    # Print stats at end
-only_ascii=1           # Only generate ASCII inputs
-seed=N                 # Fixed random seed

Sanitizers: The Bug Amplifiers

Sanitizers transform latent bugs into crashes that libFuzzer can detect. Without them, many bugs would be silent memory corruption that causes incorrect behavior much later, if at all.

AddressSanitizer (ASAN)

ASAN catches:

  • Heap buffer overflow/underflow
  • Stack buffer overflow
  • Use-after-free
  • Use-after-return
  • Double-free

clang++ -fsanitize=fuzzer,address -o fuzz_target fuzz_target.cpp

ASAN adds ~2x memory overhead and 2x CPU overhead — acceptable for fuzzing.
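
As an illustration of the kind of bug ASAN turns into a crash, here is a sketch of a target with a deliberate off-by-one heap write. `tag_length` and `kCap` are hypothetical. Without ASAN the one-byte overflow would usually go unnoticed; built with -fsanitize=fuzzer,address, an 8-byte input should be reported as a heap-buffer-overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

constexpr size_t kCap = 8;

// Hypothetical buggy helper: copies a tag into an 8-byte heap buffer,
// terminates it, and returns its C-string length.
static size_t tag_length(const uint8_t *data, size_t size) {
    assert(data != nullptr);
    if (size > kCap) return 0;               // BUG: should be size >= kCap
    std::unique_ptr<char[]> buf(new char[kCap]);
    std::memcpy(buf.get(), data, size);
    buf[size] = '\0';                        // writes buf[8] when size == 8
    return std::strlen(buf.get());
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Store the result in a volatile so the call is not optimized away.
    volatile size_t sink = tag_length(data, size);
    (void)sink;
    return 0;
}
```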

MemorySanitizer (MSAN)

MSAN detects reads from uninitialized memory — a class of bugs ASAN misses:

clang++ -fsanitize=fuzzer,memory \
        -fsanitize-memory-track-origins=2 \
        -o fuzz_msan \
        fuzz_target.cpp

Important: MSAN requires that ALL libraries (including libc++) be compiled with MSAN instrumentation. Otherwise you get false positives from uninstrumented library code. Use MSAN with a custom-built instrumented libc++ or disable MSAN for known-clean libraries.
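
A sketch of the bug class MSAN targets: the hypothetical `parse_header` below leaves `flags` unset for one-byte inputs, and the target then branches on it. ASAN sees nothing wrong here because every access is in bounds. Under -fsanitize=fuzzer,memory, a one-byte input would be expected to produce a use-of-uninitialized-value report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical two-field header.
struct Header {
    uint8_t version;
    uint8_t flags;
};

static bool parse_header(const uint8_t *data, size_t size, Header *out) {
    assert(out != nullptr);
    if (size < 1) return false;
    out->version = data[0];
    if (size >= 2) out->flags = data[1];  // BUG: flags stays unset for 1-byte input
    return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    Header h;  // deliberately not value-initialized
    if (!parse_header(data, size, &h)) return 0;

    // Branching on h.flags reads uninitialized memory when size == 1.
    volatile unsigned sink = 0;
    if (h.flags & 0x01) sink = 1;
    (void)sink;
    return 0;
}
```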

UndefinedBehaviorSanitizer (UBSAN)

UBSAN catches C/C++ undefined behavior:

clang++ -fsanitize=fuzzer,address,undefined \
        -fno-sanitize-recover=all \
        -o fuzz_ubsan \
        fuzz_target.cpp

-fno-sanitize-recover=all makes UBSAN abort on the first violation instead of continuing.

UBSAN catches:

  • Signed integer overflow
  • Null pointer dereference
  • Shift out of range
  • Array index out of bounds (with bounds checking)
  • Invalid enum values
  • Misaligned memory access
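
For the first item in that list, here is a sketch of a signed overflow that UBSAN would flag. `pixel_count` is a hypothetical helper, and the example assumes a 32-bit `int`. The multiplication overflows for large dimensions, for example 65535 x 65535, which neither ASAN nor a plain build would report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical buggy helper: reads big-endian 16-bit width and height
// and returns the pixel count, or -1 if the header is too short.
static int pixel_count(const uint8_t *data, size_t size) {
    assert(data != nullptr);
    if (size < 4) return -1;
    int width = (data[0] << 8) | data[1];   // 0..65535
    int height = (data[2] << 8) | data[3];  // 0..65535
    return width * height;  // BUG: signed overflow when the product exceeds INT_MAX
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Store the result in a volatile so the call is not optimized away.
    volatile int sink = pixel_count(data, size);
    (void)sink;
    return 0;
}
```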

Combining Sanitizers

# ASAN + UBSAN: most common combination
clang++ -fsanitize=fuzzer,address,undefined -fno-sanitize-recover=all ...

# MSAN: use separately from ASAN (they conflict)
clang++ -fsanitize=fuzzer,memory ...

Run separate fuzzing campaigns for ASAN+UBSAN and MSAN to get coverage from both.

Corpus Management

Seeding the Corpus

mkdir corpus_seeds

# Put real input samples in corpus_seeds/
cp /path/to/real_json_files/* corpus_seeds/
cp tests/fixtures/*.json corpus_seeds/

Corpus quality matters more than size. A few dozen real-world inputs are better than thousands of random files.

Minimizing the Corpus

libFuzzer has built-in corpus merging that removes duplicate coverage:

# Merge corpus and deduplicate
./fuzz_json -merge=1 corpus_output/ corpus_seeds/ existing_corpus/

-merge=1 runs every input in corpus_seeds/ and existing_corpus/ and copies into corpus_output/ only those that add coverage beyond what corpus_output/ already provides.

Corpus for Structured Inputs

For structured inputs (protobuf, ASN.1, etc.), use structured fuzzing:

// Protobuf-aware fuzzing with libprotobuf-mutator
#include "libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h"
#include "my_proto.pb.h"

DEFINE_PROTO_FUZZER(const MyRequest& request) {
    // request is always a valid protobuf message
    // libprotobuf-mutator mutates at the proto level
    process_request(request);
}
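
When no protobuf schema is available, a lighter variant of the same idea is to decode the raw fuzz bytes into a typed value before calling the code under test. The sketch below is an assumption-laden illustration, not the libprotobuf-mutator approach: `Request`, `ByteReader`, `decode_request`, and this `process_request` are all hypothetical. libFuzzer also ships a `FuzzedDataProvider` helper header for this kind of splitting; the hand-rolled reader here only keeps the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical request type standing in for a real structured input.
struct Request {
    uint8_t opcode = 0;
    uint16_t count = 0;
    std::string name;
};

// Minimal cursor over the fuzz input; reads past the end yield zeros.
class ByteReader {
public:
    ByteReader(const uint8_t *data, size_t size) : data_(data), size_(size) {}
    uint8_t u8() { return pos_ < size_ ? data_[pos_++] : 0; }
    uint16_t u16() {
        uint16_t hi = u8();
        uint16_t lo = u8();
        return static_cast<uint16_t>((hi << 8) | lo);
    }
    std::string rest() {
        assert(pos_ <= size_);
        std::string s(reinterpret_cast<const char *>(data_) + pos_, size_ - pos_);
        pos_ = size_;
        return s;
    }

private:
    const uint8_t *data_;
    size_t size_;
    size_t pos_ = 0;
};

static Request decode_request(const uint8_t *data, size_t size) {
    ByteReader in(data, size);
    Request req;
    req.opcode = in.u8() % 4;  // keep the opcode inside the valid range 0..3
    req.count = in.u16();
    req.name = in.rest();
    return req;
}

// Hypothetical function under test.
static size_t process_request(const Request &req) {
    return req.name.size() + req.count + req.opcode;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    process_request(decode_request(data, size));
    return 0;
}
```

Every byte string decodes to some well-formed `Request`, so the fuzzer spends its time inside `process_request` rather than in input validation.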

Dictionary-Guided Fuzzing

# json.dict
"null"
"true"
"false"
"\""
"{"
"}"
"["
"]"
":"
","
"\\n"
"\\t"

# Run with dictionary
./fuzz_json -dict=json.dict corpus/

Writing Oracle Assertions

A fuzz target that only checks for crashes misses many logic bugs. Add oracle assertions to catch incorrect behavior:

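#include <cstddef>
#include <cstdint>
#include <cstdlib>

// parse_v1 and parse_v2 are the two implementations being compared.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 4) return 0;
    
    // Differential testing: compare two implementations
    int result_v1 = parse_v1(data, size);
    int result_v2 = parse_v2(data, size);
    
    // They should always agree. Use abort() rather than assert(),
    // which is compiled out when NDEBUG is defined.
    if (result_v1 != result_v2) std::abort();
    
    return 0;
}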

Differential fuzzing — comparing two implementations or two versions of the same code — is one of the most powerful techniques for finding semantic bugs.
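
A self-contained sketch of the differential pattern, using two bit-counting routines as the pair under comparison. `popcount_naive` and `popcount_fast` are hypothetical stand-ins for a reference implementation and an optimized one; in practice the pair would be two parsers or two versions of the same library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Reference implementation: inspects one bit at a time.
static unsigned popcount_naive(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

// Optimized implementation: clears the lowest set bit on each step.
static unsigned popcount_fast(uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < sizeof(uint32_t)) return 0;
    assert(data != nullptr);

    uint32_t value;
    std::memcpy(&value, data, sizeof value);

    // Any disagreement between the two implementations is a bug in one of them.
    if (popcount_naive(value) != popcount_fast(value)) std::abort();
    return 0;
}
```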

Reproducing and Analyzing Crashes

When libFuzzer finds a crash, it saves the input to crash-<hash>:

# Reproduce the crash
./fuzz_target crash-a1b2c3d4e5f6...

# With more verbose sanitizer output (print_stacktrace is a UBSAN option)
ASAN_OPTIONS="verbosity=1" UBSAN_OPTIONS="print_stacktrace=1" ./fuzz_target crash-...

# Get the stack trace with symbolization
ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer) ./fuzz_target crash-...

Minimizing Crash Inputs

libFuzzer has built-in minimization:

# Minimize the crash input
./fuzz_target -minimize_crash=1 -runs=10000 crash-a1b2c3...

This writes minimized-from-<hash> files as it goes; the smallest one is a reduced input that still crashes the target.

Integrating with OSS-Fuzz

OSS-Fuzz is Google's free continuous fuzzing service for open-source projects. It runs your libFuzzer targets on a fleet of machines 24/7 and reports findings.

project.yaml

homepage: "https://github.com/yourorg/yourproject"
language: c++
primary_contact: "security@yourorg.com"
sanitizers:
  - address
  - undefined
  - memory
fuzzing_engines:
  - libfuzzer
  - afl

build.sh

#!/bin/bash -eu
# Build and install dependencies
cd $SRC/yourproject
./configure --disable-shared --enable-static
make -j$(nproc) install

# Build fuzz targets
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
    fuzz/fuzz_json.cpp \
    -I include/ \
    -l yourlib \
    -o $OUT/fuzz_json

OSS-Fuzz handles infrastructure, scaling, crash deduplication, and notifying maintainers.

Continuous Fuzzing in CI

# GitHub Actions: 5-minute fuzz campaign
name: Fuzz Tests

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *'  # Nightly extended run

jobs:
  fuzz:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Install LLVM/Clang
        run: sudo apt-get install -y clang
      
      - name: Build fuzz target
        run: |
          clang++ -fsanitize=fuzzer,address,undefined \
                  -fno-sanitize-recover=all \
                  -g -O1 \
                  -o fuzz_target \
                  fuzz/fuzz_target.cpp src/library.cpp
      
      - name: Run fuzzer
        run: |
          ./fuzz_target -max_total_time=300 corpus/
        continue-on-error: true
      
      - name: Check for crashes
        run: |
          if ls crash-* 2>/dev/null || ls timeout-* 2>/dev/null; then
            echo "FUZZER FOUND CRASHES OR TIMEOUTS"
            ls -la crash-* timeout-* 2>/dev/null
            exit 1
          fi

HelpMeTest and Continuous Monitoring

libFuzzer secures your C/C++ library code against malformed input. But production applications fail in ways that no fuzzer can simulate: misconfigurations, third-party API changes, database issues, and user-facing workflow bugs.

HelpMeTest runs continuous end-to-end tests against your live application 24/7. Write tests in plain English — no code required. HelpMeTest and libFuzzer complement each other: libFuzzer secures the parsing layer, HelpMeTest secures the user-facing behavior.

Summary

libFuzzer is the right tool when you need:

  • Maximum performance (in-process, no fork overhead)
  • Deep LLVM integration (sanitizers, coverage tracking)
  • Library-level fuzzing
  • Structured input fuzzing (protobuf, ASN.1)
  • OSS-Fuzz integration

Key practices:

  1. Always combine with ASAN (-fsanitize=address)
  2. Add UBSAN (-fsanitize=undefined -fno-sanitize-recover=all)
  3. Write oracle assertions for logic bugs, not just crashes
  4. Minimize crash inputs before reporting
  5. Manage and deduplicate the corpus regularly
  6. Integrate into CI with a time budget

A fuzz target takes 30 minutes to write. The bugs it finds can take days to hunt manually. The ROI is exceptional.
