libFuzzer with Clang: Writing Fuzz Targets, Sanitizers, and Corpus Management
libFuzzer is a coverage-guided fuzzing engine that ships with LLVM/Clang. Unlike AFL++, which typically drives the target as a separate process and instruments it via compiler wrappers, libFuzzer is linked into the same binary as the code under test and calls it in-process, which makes it significantly faster for library-level fuzzing. It's one of the engines behind Google's OSS-Fuzz program, which continuously fuzzes hundreds of critical open-source projects.
This tutorial covers writing effective fuzz targets, using sanitizers to catch more bugs, managing the corpus, and scaling to production-grade continuous fuzzing.
How libFuzzer Works
libFuzzer's architecture differs from AFL++ in one key way: the fuzzer and the target run in the same process.
┌─────────────────────────────────────────────────────┐
│ Single Process │
│ │
│ libFuzzer Engine │
│ ├── Mutation engine │
│ ├── Coverage tracking (SanitizerCoverage) │
│ └── Calls → LLVMFuzzerTestOneInput(data, size) │
│ │ │
│ └── Your parsing/library code │
└─────────────────────────────────────────────────────┘
Benefits:
- No fork overhead: often around 10x faster than process-based fuzzers for small inputs, though the exact speedup depends on the target
- Direct ASAN integration: ASAN's shadow memory is in the same process — crashes are immediate
- Easy setup: No separate fuzzing binary, no input/output file dance
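To make the in-process model concrete, here is a toy sketch of the loop shape. It is not libFuzzer's implementation: `toy_target`, `run_toy_loop`, and the single-byte mutation are hypothetical stand-ins, and the real engine picks mutations using coverage feedback rather than blind randomness.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for LLVMFuzzerTestOneInput.
// Returns 1 when the input starts with 'F', otherwise 0.
int toy_target(const uint8_t *data, size_t size) {
  return (size > 0 && data[0] == 'F') ? 1 : 0;
}

// Toy driver: mutate one byte, then call the target directly in this process.
// There is no fork, no exec, and no file I/O; each iteration is an ordinary
// function call. Returns how many mutated inputs started with 'F'.
size_t run_toy_loop(size_t iterations, unsigned seed) {
  std::mt19937 rng(seed);
  std::vector<uint8_t> input = {'A', 'B', 'C', 'D'};
  size_t hits = 0;
  for (size_t i = 0; i < iterations; ++i) {
    input[rng() % input.size()] = static_cast<uint8_t>(rng() & 0xFF);
    hits += static_cast<size_t>(toy_target(input.data(), input.size()));
  }
  return hits;
}
```

Because the target is just a function call away, the per-input cost is dominated by the code under test rather than by process creation.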
Writing Your First Fuzz Target
The libFuzzer interface is a single function:
#include <cstddef>
#include <cstdint>
// The entry point for libFuzzer
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Feed data to your target function
  // Return 0 for normal inputs; other values are reserved
  // (recent libFuzzer versions accept -1 to reject an input)
  return 0;
}
Rules for the fuzz target:
- Never crash on expected invalid input: the harness itself must tolerate malformed data, so that any crash points at a real bug in the code under test
- No global state between calls: Each call to LLVMFuzzerTestOneInput must be independent
- No memory leaks: libFuzzer calls this function millions of times; leaks will OOM the process
- Return 0: Return -1 only to tell libFuzzer to reject this input so it is not added to the corpus (supported in recent libFuzzer versions)
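A minimal sketch of a target that follows these rules. The record format and `sum_record` are hypothetical, invented only to give the harness something to call; the point is the shape of the harness, not the parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical code under test: the first byte is a count N, followed by up
// to N payload bytes. Returns the sum of the payload bytes actually present.
uint32_t sum_record(const std::vector<uint8_t> &record) {
  if (record.empty()) return 0;
  size_t count = record[0];
  const size_t available = record.size() - 1;
  if (count > available) count = available;  // malformed length: clamp, don't crash
  uint32_t sum = 0;
  for (size_t i = 0; i < count; ++i) sum += record[1 + i];
  return sum;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // All state is local, so each call is independent of the previous one.
  // std::vector owns its memory (RAII), so nothing leaks across calls.
  std::vector<uint8_t> record(data, data + size);
  volatile uint32_t sink = sum_record(record);  // keep the call from being optimized away
  (void)sink;
  return 0;  // 0 = input processed normally
}
```

If the code under test needs expensive one-time setup, that is the usual exception to the "no global state" rule, as long as the setup result is read-only afterwards.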
Example: Fuzzing a JSON Parser
// fuzz_json.cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include "nlohmann/json.hpp" // Or whichever JSON library you use
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Convert raw bytes to string
  std::string input(reinterpret_cast<const char*>(data), size);
  try {
    auto parsed = nlohmann::json::parse(input);
    // Optional: test round-trip correctness
    // If parse + serialize + re-parse gives different result, that's a bug
    std::string serialized = parsed.dump();
    auto reparsed = nlohmann::json::parse(serialized);
    // Could assert here but be careful about legitimate differences
  } catch (const nlohmann::json::exception&) {
    // Parse errors are expected, not a bug
  } catch (...) {
    // Any other exception = bug
    __builtin_trap(); // Force crash for libFuzzer to catch
  }
  return 0;
}
Example: Fuzzing a URL Parser
// fuzz_url_parser.cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include "url_parser.h" // Your URL parsing code
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size == 0) return 0;
  // Copy into a std::string so that c_str() is null-terminated
  std::string url(reinterpret_cast<const char*>(data), size);
  ParsedURL result;
  bool success = parse_url(url.c_str(), &result);
  if (success) {
    // Verify invariants: if parse succeeded, output should be valid
    if (result.scheme.empty()) {
      // Parser claims success but no scheme, that's a bug
      __builtin_trap();
    }
    // Round-trip: serialize and re-parse
    std::string reconstructed = reconstruct_url(&result);
    ParsedURL reparsed;
    bool reparsed_ok = parse_url(reconstructed.c_str(), &reparsed);
    if (!reparsed_ok) {
      // We generated a URL our own parser can't parse
      __builtin_trap();
    }
  }
  return 0;
}
Round-trip testing (parse → serialize → re-parse) is extremely powerful for finding inconsistencies in your data handling code.
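For a version of the round-trip idea that compiles without any external parser, here is a sketch using a tiny hex codec. `hex_encode` and `hex_decode` are hypothetical functions written for this example; in practice you would substitute your own serializer and parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical codec under test: bytes <-> lowercase hex text.
std::string hex_encode(const std::vector<uint8_t> &bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (uint8_t b : bytes) {
    out.push_back(kDigits[b >> 4]);
    out.push_back(kDigits[b & 0x0F]);
  }
  return out;
}

int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Returns false on malformed input (odd length or non-hex character).
bool hex_decode(const std::string &text, std::vector<uint8_t> *out) {
  out->clear();
  if (text.size() % 2 != 0) return false;
  for (size_t i = 0; i < text.size(); i += 2) {
    const int hi = hex_value(text[i]);
    const int lo = hex_value(text[i + 1]);
    if (hi < 0 || lo < 0) return false;
    out->push_back(static_cast<uint8_t>(hi * 16 + lo));
  }
  return true;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  const std::vector<uint8_t> original(data, data + size);
  std::vector<uint8_t> decoded;
  // Round-trip invariant: decoding our own encoder's output must succeed
  // and must give back exactly the bytes we started with.
  if (!hex_decode(hex_encode(original), &decoded) || decoded != original) {
    __builtin_trap();
  }
  return 0;
}
```

Starting from the raw bytes and going encode → decode means every fuzz input exercises the invariant, whereas the parse-first direction only checks inputs the parser accepts.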
Compiling and Running
Compilation
# Basic fuzz target
clang++ -fsanitize=fuzzer,address \
-fno-omit-frame-pointer \
-g \
-O1 \
-o fuzz_json \
fuzz_json.cpp \
-I/path/to/nlohmann
# With all recommended sanitizers
clang++ -fsanitize=fuzzer,address,undefined \
-fno-omit-frame-pointer \
-fno-optimize-sibling-calls \
-g -O1 \
-o fuzz_target \
fuzz_target.cpp libparser.a
Key flags:
- -fsanitize=fuzzer: Links libFuzzer engine and instruments for coverage
- -fsanitize=address: AddressSanitizer (catches buffer overflows, use-after-free)
- -fsanitize=undefined: UndefinedBehaviorSanitizer (catches integer overflow, null dereference)
- -g: Debug symbols for readable crash reports
- -O1: Low optimization (keeps code readable in stack traces, doesn't eliminate too many checks)
Running
# Basic run with a corpus directory
./fuzz_json corpus/
# Run for a specific time budget (seconds)
./fuzz_json -max_total_time=300 corpus/
# Run with a specific number of iterations
./fuzz_json -runs=1000000 corpus/
# Print coverage information at exit (combine with -runs or -max_total_time)
./fuzz_json -print_coverage=1 corpus/
libFuzzer Command-Line Reference
./fuzz_target [corpus_dir/] [flags]
# Key flags:
-max_len=N              # Max input length (default 0: libFuzzer guesses from the corpus)
-max_total_time=N       # Stop after N seconds
-runs=N                 # Stop after N executions
-workers=N              # Worker processes used to run -jobs=N fuzzing jobs in parallel
-dict=file              # Dictionary of interesting tokens
-artifact_prefix=path/  # Where to write crash/timeout inputs
-print_final_stats=1    # Print stats at end
-only_ascii=1           # Only generate ASCII inputs
-seed=N                 # Fixed random seed
Sanitizers: The Bug Amplifiers
Sanitizers transform latent bugs into crashes that libFuzzer can detect. Without them, many bugs would be silent memory corruption that causes incorrect behavior much later, if at all.
AddressSanitizer (ASAN)
ASAN catches:
- Heap buffer overflow/underflow
- Stack buffer overflow
- Use-after-free (heap)
- Use-after-return (on older Clang versions this may need ASAN_OPTIONS=detect_stack_use_after_return=1)
- Double-free
clang++ -fsanitize=fuzzer,address -o fuzz_target fuzz_target.cpp
ASAN typically adds roughly 2x CPU overhead plus a noticeable memory overhead, which is acceptable for fuzzing.
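As an illustration of the kind of latent bug ASAN turns into an immediate report, here is a sketch with a deliberate off-by-one. `store_tag` and `kTagCapacity` are hypothetical names made up for this example. Without ASAN, the one-byte overflow usually lands in allocator padding and goes unnoticed; built with `-fsanitize=fuzzer,address`, the first 8-byte input should be reported as a heap-buffer-overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

constexpr size_t kTagCapacity = 8;

// Hypothetical code under test with a DELIBERATE off-by-one: the check should
// be `len >= kTagCapacity`, because one byte is needed for the terminator.
// Returns the length of the stored C string, or -1 if the tag was rejected.
int store_tag(const uint8_t *src, size_t len) {
  if (len > kTagCapacity) return -1;  // BUG: accepts len == kTagCapacity
  std::unique_ptr<char[]> tag(new char[kTagCapacity]);
  if (len > 0) std::memcpy(tag.get(), src, len);
  tag[len] = '\0';  // writes one byte past the buffer when len == 8
  return static_cast<int>(std::strlen(tag.get()));
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  store_tag(data, size);
  return 0;
}
```

Inputs shorter than 8 bytes are handled correctly, which is why a handful of unit tests can easily miss this and a fuzzer with ASAN usually does not.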
MemorySanitizer (MSAN)
MSAN detects reads from uninitialized memory — a class of bugs ASAN misses:
clang++ -fsanitize=fuzzer,memory \
-fsanitize-memory-track-origins=2 \
-o fuzz_msan \
fuzz_target.cpp
Important: MSAN requires that ALL libraries (including libc++) be compiled with MSAN instrumentation. Otherwise you get false positives from uninstrumented library code. Use MSAN with a custom-built instrumented libc++ or disable MSAN for known-clean libraries.
UndefinedBehaviorSanitizer (UBSAN)
UBSAN catches C/C++ undefined behavior:
clang++ -fsanitize=fuzzer,address,undefined \
-fno-sanitize-recover=all \
-o fuzz_ubsan \
fuzz_target.cpp
-fno-sanitize-recover=all makes UBSAN abort on the first violation instead of continuing.
UBSAN catches:
- Signed integer overflow
- Null pointer dereference
- Shift out of range
- Array index out of bounds (with bounds checking)
- Invalid enum values
- Misaligned memory access
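To show what the signed-overflow and misalignment checks are guarding against, here is a sketch of a target that avoids both. `checked_pixel_count` and the 8-byte header layout are hypothetical; `__builtin_mul_overflow` is a GCC/Clang builtin, so this assumes one of those compilers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical code under test: computes a pixel count from two header fields.
// A plain `width * height` on int32_t is undefined behavior when the product
// overflows, which is what UBSAN's signed-integer-overflow check reports.
// This version detects the overflow instead of invoking it.
bool checked_pixel_count(int32_t width, int32_t height, int32_t *out) {
  if (width < 0 || height < 0) return false;
  // Returns true on overflow, so negate it to mean "success".
  return !__builtin_mul_overflow(width, height, out);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 8) return 0;
  int32_t width = 0;
  int32_t height = 0;
  // memcpy instead of casting `data` to int32_t*, which could be a
  // misaligned load (another UBSAN finding).
  std::memcpy(&width, data, 4);
  std::memcpy(&height, data + 4, 4);
  int32_t pixels = 0;
  if (checked_pixel_count(width, height, &pixels)) {
    // Invariant: a successful product divides back to the other factor.
    if (width != 0 && pixels / width != height) __builtin_trap();
  }
  return 0;
}
```

Swapping the body of `checked_pixel_count` for a bare multiplication and fuzzing with `-fsanitize=fuzzer,undefined` is a quick way to see what a UBSAN report looks like.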
Combining Sanitizers
# ASAN + UBSAN: most common combination
clang++ -fsanitize=fuzzer,address,undefined -fno-sanitize-recover=all ...
# MSAN: use separately from ASAN (they conflict)
clang++ -fsanitize=fuzzer,memory ...
Run separate fuzzing campaigns for ASAN+UBSAN and MSAN to get coverage from both.
Corpus Management
Seeding the Corpus
mkdir corpus_seeds
# Put real input samples in corpus_seeds/
cp /path/to/real_json_files/* corpus_seeds/
cp tests/fixtures/*.json corpus_seeds/
Corpus quality matters more than size. A few dozen real-world inputs are better than thousands of random files.
Minimizing the Corpus
libFuzzer has built-in corpus merging that removes duplicate coverage:
# Merge corpus and deduplicate
./fuzz_json -merge=1 corpus_output/ corpus_seeds/ existing_corpus/
-merge=1 runs all inputs in corpus_seeds/ and existing_corpus/ and adds only those that contribute new coverage to corpus_output/.
Corpus for Structured Inputs
For structured inputs (protobuf, ASN.1, etc.), use structured fuzzing:
// Protobuf-aware fuzzing with libprotobuf-mutator
#include "libprotobuf-mutator/src/libfuzzer/libfuzzer_macro.h"
#include "my_proto.pb.h"
DEFINE_PROTO_FUZZER(const MyRequest& request) {
  // request is always a valid protobuf message
  // libprotobuf-mutator mutates at the proto level
  process_request(request);
}
Dictionary-Guided Fuzzing
# json.dict
"null"
"true"
"false"
"\""
"{"
"}"
"["
"]"
":"
","
"\\n"
"\\t"
# Run with dictionary
./fuzz_json -dict=json.dict corpus/
Writing Oracle Assertions
A fuzz target that only checks for crashes misses many logic bugs. Add oracle assertions to catch incorrect behavior:
#include <cassert>
#include <cstddef>
#include <cstdint>
// parse_v1 and parse_v2 are the two implementations being compared
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size < 4) return 0;
  // Differential testing: compare two implementations
  int result_v1 = parse_v1(data, size);
  int result_v2 = parse_v2(data, size);
  // They should always agree (build without -DNDEBUG, or assert() is compiled out)
  assert(result_v1 == result_v2);
  return 0;
}
Differential fuzzing, which compares two implementations or two versions of the same code, is one of the most powerful techniques for finding semantic bugs.
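A self-contained sketch of the same pattern, using two bit-counting routines as the pair under comparison. `popcount_reference` and `popcount_fast` are hypothetical examples chosen because they are short; any slow-but-obvious reference paired with an optimized version fits the same harness.

```cpp
#include <cstddef>
#include <cstdint>

// Reference implementation: test every bit of every byte.
uint32_t popcount_reference(const uint8_t *data, size_t size) {
  uint32_t bits = 0;
  for (size_t i = 0; i < size; ++i) {
    for (int b = 0; b < 8; ++b) bits += (data[i] >> b) & 1u;
  }
  return bits;
}

// Optimized implementation: repeatedly clear the lowest set bit.
uint32_t popcount_fast(const uint8_t *data, size_t size) {
  uint32_t bits = 0;
  for (size_t i = 0; i < size; ++i) {
    uint8_t v = data[i];
    while (v != 0) {
      v &= static_cast<uint8_t>(v - 1);
      ++bits;
    }
  }
  return bits;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // Oracle: the two implementations must agree on every input.
  // __builtin_trap() is used instead of assert() so the check survives NDEBUG.
  if (popcount_reference(data, size) != popcount_fast(data, size)) {
    __builtin_trap();
  }
  return 0;
}
```

The reference does not need to be fast or elegant, only obviously correct; the fuzzer supplies the inputs that a hand-written test table would miss.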
Reproducing and Analyzing Crashes
When libFuzzer finds a crash, it saves the input to crash-<hash>:
# Reproduce the crash
./fuzz_target crash-a1b2c3d4e5f6...
# With ASAN verbose output
ASAN_OPTIONS="verbosity=1:print_stacktrace=1" ./fuzz_target crash-...
# Get the stack trace with symbolization
ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer) ./fuzz_target crash-...
Minimizing Crash Inputs
libFuzzer has built-in minimization:
# Minimize the crash input
./fuzz_target -minimize_crash=1 -runs=10000 crash-a1b2c3...
This produces minimized-from-<hash>, a smaller input that still crashes the target.
Integrating with OSS-Fuzz
OSS-Fuzz is Google's free continuous fuzzing service for open-source projects. It runs your libFuzzer targets on a fleet of machines 24/7 and reports findings.
project.yaml
homepage: "https://github.com/yourorg/yourproject"
language: c++
primary_contact: "security@yourorg.com"
sanitizers:
- address
- undefined
- memory
fuzzing_engines:
- libfuzzer
- afl
build.sh
#!/bin/bash -eu
# Build and install dependencies
cd $SRC/yourproject
./configure --disable-shared --enable-static
make -j$(nproc) install
# Build fuzz targets
$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
  fuzz/fuzz_json.cpp \
  -I include/ \
  -l yourlib \
  -o $OUT/fuzz_json
OSS-Fuzz handles infrastructure, scaling, crash deduplication, and notifying maintainers.
Continuous Fuzzing in CI
# GitHub Actions: 5-minute fuzz campaign
name: Fuzz Tests
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * *' # Nightly extended run
jobs:
  fuzz:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install LLVM/Clang
        run: sudo apt-get install -y clang
      - name: Build fuzz target
        run: |
          clang++ -fsanitize=fuzzer,address,undefined \
            -fno-sanitize-recover=all \
            -g -O1 \
            -o fuzz_target \
            fuzz/fuzz_target.cpp src/library.cpp
      - name: Run fuzzer
        run: |
          ./fuzz_target -max_total_time=300 corpus/
        continue-on-error: true
      - name: Check for crashes
        run: |
          if ls crash-* 2>/dev/null || ls timeout-* 2>/dev/null; then
            echo "FUZZER FOUND CRASHES OR TIMEOUTS"
            ls -la crash-* timeout-* 2>/dev/null
            exit 1
          fi
HelpMeTest and Continuous Monitoring
libFuzzer secures your C/C++ library code against malformed input. But production applications fail in ways that no fuzzer can simulate: misconfigurations, third-party API changes, database issues, and user-facing workflow bugs.
HelpMeTest runs continuous end-to-end tests against your live application 24/7. Write tests in plain English — no code required. HelpMeTest and libFuzzer complement each other: libFuzzer secures the parsing layer, HelpMeTest secures the user-facing behavior.
Summary
libFuzzer is the right tool when you need:
- Maximum performance (in-process, no fork overhead)
- Deep LLVM integration (sanitizers, coverage tracking)
- Library-level fuzzing
- Structured input fuzzing (protobuf, ASN.1)
- OSS-Fuzz integration
Key practices:
- Always combine with ASAN (-fsanitize=address)
- Add UBSAN (-fsanitize=undefined -fno-sanitize-recover=all)
- Write oracle assertions for logic bugs, not just crashes
- Minimize crash inputs before reporting
- Manage and deduplicate the corpus regularly
- Integrate into CI with a time budget
A simple fuzz target can often be written in about 30 minutes. The bugs it finds can take days to hunt down manually, so the return on that time is exceptional.