libFuzzer Guide: In-Process Fuzzing with LLVM

libFuzzer is LLVM's in-process, coverage-guided fuzzing engine. Unlike AFL, which by default forks a new process for each input, libFuzzer calls the target function repeatedly inside a single process. This removes process-creation overhead and, for targets with fast parsing paths, can yield far more executions per second.

libFuzzer is built into LLVM/Clang and requires no separate installation.

How libFuzzer Differs from AFL

Aspect           | libFuzzer                 | AFL
Execution model  | In-process                | New process per input
Speed            | Faster (no fork)          | Slower (fork overhead)
Stability        | Less (state leaks)        | More (clean process each time)
Setup            | Requires harness function | Works with any binary
Integration      | Library-based             | External tool
Best for         | Libraries, fast targets   | Programs with complex state

Use libFuzzer when your target is a library with a clean API. Use AFL when your target is a standalone binary or when state isolation is important.
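
The "state leaks" entry in the comparison is easiest to see in code. The following is a minimal sketch, assuming a made-up parser that keeps its nesting depth in a global; `g_depth` and `ParseBrackets` are hypothetical names, not part of any real library. Because every input runs in the same process, whatever one input leaves in the global is still there for the next one unless the harness resets it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical parser state kept in a global, as older C libraries often do.
static int g_depth = 0;

// Hypothetical toy parser: tracks bracket nesting and rejects depth > 8.
// It never resets g_depth itself, so unbalanced input leaves state behind.
static bool ParseBrackets(const uint8_t *data, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        if (data[i] == '[' && ++g_depth > 8) return false;
        if (data[i] == ']' && g_depth > 0) --g_depth;
    }
    return g_depth == 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    g_depth = 0;  // Reset leaked state so every input starts from the same point.
    ParseBrackets(data, size);
    return 0;
}
```

Without the reset, whether a given input passes would depend on the inputs that ran before it, which makes crashes hard to reproduce from the saved file alone.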

Writing a libFuzzer Harness

Every libFuzzer target implements one function:

// fuzz_target.c
#include <stdint.h>
#include <stdlib.h>

// Your library's header
#include "myparser.h"

// libFuzzer calls this with each generated input
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Parse/process the data
    // Return 0; newer libFuzzer treats -1 as "reject this input", other values are reserved
    parse_input(data, size);
    return 0;
}

Rules for the harness:

  1. Return 0. Recent libFuzzer releases also accept -1, meaning "do not add this input to the corpus"; all other values are reserved
  2. Do not call exit() — it terminates the fuzzer
  3. Do not print to stdout/stderr in the main loop (it slows things down significantly)
  4. If the function allocates memory, free it — libFuzzer runs millions of iterations in the same process
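
As an illustration of these rules, here is a small sketch of a harness for an API that expects a C string. `CountFields` is a hypothetical stand-in for a real library call. The fuzzer's buffer is not NUL-terminated, so the harness copies it, and it frees the copy before returning.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a library function that expects a C string.
static int CountFields(const char *text) {
    int fields = 1;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == ',') ++fields;
    }
    return fields;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Copy the input so it can be NUL-terminated for the string API.
    char *text = static_cast<char *>(std::malloc(size + 1));
    if (text == nullptr) return 0;  // Allocation failed: skip the input, do not exit().
    if (size > 0) std::memcpy(text, data, size);
    text[size] = '\0';

    CountFields(text);  // No printing; the result is deliberately unused.

    std::free(text);    // Rule 4: release everything allocated for this input.
    return 0;           // Rule 1.
}
```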

Compiling and Running

# Compile with libFuzzer and AddressSanitizer
clang -fsanitize=fuzzer,address -o fuzz_target fuzz_target.c mylib.c

# Run the fuzzer
mkdir corpus
./fuzz_target corpus/

# Run with a seed corpus
mkdir seeds
echo "valid input" > seeds/seed1
./fuzz_target corpus/ seeds/

# Run for a time limit (seconds)
./fuzz_target corpus/ -max_total_time=3600

# Run with multiple jobs (parallel)
./fuzz_target corpus/ -jobs=4 -workers=4

Sanitizer Integration

Always compile with sanitizers. Without them, libFuzzer only notices failures that crash, hang, or exhaust memory on their own, so many memory-safety bugs pass silently.

# AddressSanitizer (memory errors)
clang -fsanitize=fuzzer,address -o fuzz_target fuzz_target.c mylib.c

# UndefinedBehaviorSanitizer
clang -fsanitize=fuzzer,undefined -o fuzz_target fuzz_target.c mylib.c

# Both (recommended)
clang -fsanitize=fuzzer,address,undefined -o fuzz_target fuzz_target.c mylib.c

# MemorySanitizer (uninitialized reads; requires special build)
clang -fsanitize=fuzzer,memory -o fuzz_target fuzz_target.c mylib.c

What each sanitizer catches:

  • ASan: Heap, stack, and global buffer overflows; use-after-free; double-free
  • UBSan: Signed integer overflow, null pointer dereference, misaligned access, invalid enum values
  • MSan: Use of uninitialized memory (can catch information leaks)
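
To make the ASan entry concrete, here is a sketch of the kind of bug that tends to go unnoticed without it. The record format (one length byte followed by payload bytes) and both function names are assumptions made up for this example. The unchecked version trusts the length byte, so an input such as `{5, 1}` reads a few bytes past the buffer. A read that small often does not crash on its own, while an ASan build would typically report it as an out-of-bounds read.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record format: one length byte, then that many payload bytes.
// BUG (deliberate): the length byte is trusted and never compared with `size`.
static uint32_t SumPayloadUnchecked(const uint8_t *data, size_t size) {
    if (size == 0) return 0;
    uint32_t sum = 0;
    for (size_t i = 0; i < data[0]; ++i) sum += data[1 + i];  // may read past the end
    return sum;
}

// Fixed variant: clamp the claimed length to the bytes actually present.
static uint32_t SumPayloadChecked(const uint8_t *data, size_t size) {
    if (size == 0) return 0;
    size_t len = data[0];
    if (len > size - 1) len = size - 1;
    uint32_t sum = 0;
    for (size_t i = 0; i < len; ++i) sum += data[1 + i];
    return sum;
}
```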

FuzzedDataProvider: Structured Fuzzing

Raw bytes work for binary parsers. For structured input (multiple fields, integers, strings), use FuzzedDataProvider:

// fuzz_structured.cpp
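#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

#include <fuzzer/FuzzedDataProvider.h>
#include "myapi.h"  // your own API: User, process_request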

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fdp(data, size);

    // Extract structured data from the fuzzer input
    int user_id = fdp.ConsumeIntegral<int>();
    bool is_admin = fdp.ConsumeBool();
    std::string username = fdp.ConsumeRandomLengthString(64);
    std::vector<uint8_t> payload = fdp.ConsumeRemainingBytes<uint8_t>();

    // Use structured data in your API
    User user(user_id, username, is_admin);
    process_request(user, payload.data(), payload.size());

    return 0;
}

FuzzedDataProvider methods:

  • ConsumeIntegral<T>() — consume an integer of type T
  • ConsumeBool() — consume a boolean
  • ConsumeFloatingPoint<T>() — consume a float/double
  • ConsumeRandomLengthString(max_length) — consume a string
  • ConsumeBytes<T>(count) — consume N bytes as a vector
  • ConsumeRemainingBytes<T>() — consume all remaining bytes
  • PickValueInArray(array) — pick a value from an array
  • ConsumeProbability<T>() — value in [0.0, 1.0]
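
The idea behind these methods can be sketched with a hand-rolled reader. This is a simplified illustration only, not the real FuzzedDataProvider: the class name `ByteReader` and its methods are hypothetical, and the real class has its own rules for byte order, ranges, and exhausted input. What the sketch shows is the general pattern of carving typed fields out of one flat byte buffer without ever failing when the data runs out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in that shows the idea only.
class ByteReader {
public:
    ByteReader(const uint8_t *data, size_t size) : data_(data), left_(size) {}

    // Read up to sizeof(T) bytes as an integer; missing bytes stay zero.
    template <typename T> T TakeIntegral() {
        T value = 0;
        size_t n = std::min(sizeof(T), left_);
        if (n > 0) std::memcpy(&value, data_, n);
        Advance(n);
        return value;
    }

    // Read up to max_len bytes as a string (shorter if the input runs out).
    std::string TakeString(size_t max_len) {
        size_t n = std::min(max_len, left_);
        std::string out(reinterpret_cast<const char *>(data_), n);
        Advance(n);
        return out;
    }

    size_t remaining() const { return left_; }

private:
    void Advance(size_t n) { data_ += n; left_ -= n; }
    const uint8_t *data_;
    size_t left_;
};
```

In a real harness, prefer FuzzedDataProvider itself; the sketch is only meant to show why a harness built this way accepts any input length.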

Corpus Management

# Merge new inputs into an existing corpus
./fuzz_target -merge=1 corpus/ new_inputs/

# Minimize the corpus (remove redundant inputs)
./fuzz_target -merge=1 minimized_corpus/ corpus/

# Run a single input (for debugging)
./fuzz_target crash_input

# Print coverage information for a specific input
./fuzz_target -print_coverage=1 corpus/seed1
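
The idea behind corpus minimization can be sketched as a greedy pass over the inputs. This is a conceptual illustration under stated assumptions, not libFuzzer's actual merge code: `CorpusEntry` and `MinimizeCorpus` are hypothetical names, and coverage features are modeled as plain integers. Inputs are visited smallest first, and an input is kept only if it covers something the inputs kept so far do not.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical record: an input's name, size, and the coverage features it hits.
struct CorpusEntry {
    std::string name;
    size_t size;
    std::set<int> features;
};

// Greedy pass: visit inputs smallest-first and keep one only if it adds
// at least one feature not covered by the inputs kept so far.
static std::vector<std::string> MinimizeCorpus(std::vector<CorpusEntry> entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const CorpusEntry &a, const CorpusEntry &b) {
                         return a.size < b.size;
                     });
    std::set<int> covered;
    std::vector<std::string> kept;
    for (const CorpusEntry &e : entries) {
        size_t before = covered.size();
        covered.insert(e.features.begin(), e.features.end());
        if (covered.size() > before) kept.push_back(e.name);
    }
    return kept;
}
```

Preferring small inputs keeps the corpus cheap to execute and mutate, which is why a minimized corpus usually fuzzes faster than the original.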

Understanding libFuzzer Output

INFO: Seed: 1234567890
INFO: Loaded 5 modules   (40965 inline 8-bit counters): 5 [0x7f..., 0x7f...)
INFO: Loaded 5 PC tables (40965 PCs): 5 [0x7f..., 0x7f...),
INFO: 10 files found in corpus/
INFO: seed corpus: files: 10 min: 1b max: 1024b total: 4096b rss: 35Mb
#10      INITED cov: 428 ft: 1234 corp: 10/4096b exec/s: 0 rss: 36Mb
#512     NEW    cov: 431 ft: 1246 corp: 11/4098b lim: 64 exec/s: 512 L: 2/1024 MS: 2 EraseBytes-InsertByte-
#1024    REDUCE cov: 431 ft: 1246 corp: 11/4096b lim: 64 exec/s: 1024 L: 2/1024 MS: 1 EraseBytes-
...

Key fields:

  • cov: Number of code coverage edges triggered
  • ft: Number of unique features (more granular than edges)
  • corp: Number of inputs in corpus / total corpus size
  • exec/s: Executions per second
  • L: Size of the current input / max corpus input size
  • MS: Number and names of the mutations applied to produce this input

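NEW means the fuzzer found an input that covers code or features not reached by the existing corpus. REDUCE means it found a smaller input that triggers the same features as an existing corpus entry and replaced that entry. A crash does not appear as a status line: the sanitizer or libFuzzer prints an error report, and the failing input is written to a crash-<hash> file.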
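
When tracking a long run, it can help to pull these fields out of the log programmatically. The helper below is a minimal sketch; `FieldValue` is a hypothetical name, and the parsing assumes the `key: number` layout shown in the sample output above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Return the integer that follows `key` (for example "cov: ") in a status
// line, or -1 if the key does not occur in the line.
static long FieldValue(const std::string &line, const std::string &key) {
    size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    return std::strtol(line.c_str() + pos + key.size(), nullptr, 10);
}
```

For the `corp:` field this returns only the entry count, because parsing stops at the `/` that separates it from the total size.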

Reproducing and Minimizing Crashes

# The path of the crashing input is printed on crash:
# artifact_prefix='./'; Test unit written to ./crash-abc123

# Reproduce
./fuzz_target crash-abc123

# Minimize (find smallest input that still crashes)
./fuzz_target -minimize_crash=1 -max_total_time=60 crash-abc123
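
What "find the smallest input that still crashes" means can be shown with a simple greedy reducer. This is an illustration of the goal, not libFuzzer's own minimization algorithm; `Minimize` and the `still_crashes` callback are hypothetical. It tries to delete chunks of decreasing size and keeps a deletion whenever the reduced input still triggers the failure.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Shrink `input` for as long as `still_crashes` keeps returning true.
static Bytes Minimize(Bytes input,
                      const std::function<bool(const Bytes &)> &still_crashes) {
    for (size_t chunk = input.size() / 2; chunk >= 1; chunk /= 2) {
        size_t pos = 0;
        while (pos + chunk <= input.size()) {
            // Candidate = input with bytes [pos, pos + chunk) removed.
            Bytes candidate(input.begin(), input.begin() + pos);
            candidate.insert(candidate.end(), input.begin() + pos + chunk, input.end());
            if (still_crashes(candidate)) {
                input = candidate;  // Still crashes: accept and retry at the same position.
            } else {
                pos += chunk;       // This chunk is needed: move past it.
            }
        }
    }
    return input;
}
```

In practice the callback would run the target on the candidate bytes; here it is left abstract so the sketch stays self-contained.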

Custom Mutators

For targets with complex input formats (protocols, structured configs), you can write a custom mutator:

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <string>

// MyFormat is a placeholder for your own parser/serializer type.
extern "C" size_t LLVMFuzzerCustomMutator(
    uint8_t *data, size_t size, size_t max_size, unsigned int seed) {

    // Parse the input into your format
    MyFormat fmt;
    if (!fmt.Parse(data, size)) {
        // If unparseable, return a valid minimal input (when it fits)
        const std::string minimal = "{}";
        if (minimal.size() > max_size) return size;
        std::memcpy(data, minimal.data(), minimal.size());
        return minimal.size();
    }

    // Apply format-aware mutations
    std::mt19937 rng(seed);
    fmt.MutateField(rng);

    // Serialize back
    const std::string serialized = fmt.Serialize();
    if (serialized.size() > max_size) return size; // too large, return unchanged

    std::memcpy(data, serialized.data(), serialized.size());
    return serialized.size();
}

Custom mutators can substantially improve coverage for structured inputs because they produce valid-but-unexpected inputs rather than random byte sequences that the parser rejects early.
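
A common concrete case is a format with a checksum, where random byte flips almost always fail validation before reaching interesting code. The sketch below assumes a made-up format (payload bytes followed by a single checksum byte holding the payload sum modulo 256); `Checksum` is a hypothetical helper. The mutator changes the payload and then repairs the checksum so the record stays valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical format: payload bytes followed by one checksum byte
// (sum of the payload bytes modulo 256).
static uint8_t Checksum(const uint8_t *data, size_t size) {
    uint8_t sum = 0;
    for (size_t i = 0; i < size; ++i) sum = static_cast<uint8_t>(sum + data[i]);
    return sum;
}

extern "C" size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                                          size_t max_size, unsigned int seed) {
    if (max_size < 2) return size;  // No room for payload + checksum: leave unchanged.
    std::mt19937 rng(seed);

    if (size < 2) {  // Too short to be a record: start from a 1-byte payload.
        data[0] = 0;
        size = 2;
    }
    size_t payload = size - 1;

    // Overwrite one random payload byte, then sometimes grow the payload by one byte.
    data[rng() % payload] = static_cast<uint8_t>(rng());
    if (size < max_size && rng() % 2 == 0) {
        data[payload++] = static_cast<uint8_t>(rng());
    }

    data[payload] = Checksum(data, payload);  // Repair the checksum.
    return payload + 1;
}
```

Seeding the generator from `seed` keeps the mutation deterministic for a given call, which helps when replaying a run.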

OSS-Fuzz Integration

If your project is open source and accepted into OSS-Fuzz, the service runs your libFuzzer harnesses continuously at no cost.

Requirements:

  • libFuzzer harnesses
  • Docker-based build configuration
  • Public repository

Minimal integration:

# projects/mylib/Dockerfile
FROM gcr.io/oss-fuzz-base/base-builder
COPY . $SRC/mylib
WORKDIR $SRC/mylib
COPY build.sh $SRC/

# projects/mylib/build.sh
#!/bin/bash -eu
cd $SRC/mylib
cmake . -DCMAKE_CXX_COMPILER=$CXX -DCMAKE_C_COMPILER=$CC \
        -DCMAKE_CXX_FLAGS="$CXXFLAGS" -DCMAKE_C_FLAGS="$CFLAGS"
make -j$(nproc)

# Copy harness binaries
for fuzzer in fuzz_parser fuzz_decoder; do
    cp $fuzzer $OUT/
done

# Copy seed corpus
zip -r $OUT/fuzz_parser_seed_corpus.zip corpus/

# projects/mylib/project.yaml
homepage: "https://github.com/org/mylib"
language: c++
primary_contact: "maintainer@example.com"
auto_ccs:
  - "security@example.com"

Submit as a PR to the OSS-Fuzz repository.

Performance Optimization

Hoist expensive allocations: If your harness constructs a large object on every call, construct it once instead, using a static object or LLVMFuzzerInitialize:

// Slow: allocates every call
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    MyExpensiveObject obj;  // expensive construction
    obj.Parse(data, size);
    return 0;
}

// Fast: initialize once (use this definition instead of the one above)
static MyExpensiveObject *obj = nullptr;

extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
    obj = new MyExpensiveObject();
    return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    obj->Reset();  // cheap reset
    obj->Parse(data, size);
    return 0;
}

Reduce logging: Disable logging in test builds. Even stderr writes slow down fuzzing substantially.
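
One way to do this is to compile logging out of fuzzing builds entirely. The sketch below keys off `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION`, a macro conventionally defined for fuzzing builds; it is defined inline here only so the example is self-contained. `LOG_DEBUG`, `g_log_calls`, and `ParseByte` are hypothetical names.

```cpp
#include <cstdio>

// Normally supplied on the compiler command line
// (-DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION); defined here for the sketch only.
#define FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION 1

static int g_log_calls = 0;  // Hypothetical counter, for demonstration only.

#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
#define LOG_DEBUG(...) ((void)0)  // Fuzzing build: logging compiles to nothing.
#else
#define LOG_DEBUG(...) (++g_log_calls, std::fprintf(stderr, __VA_ARGS__))
#endif

// Hypothetical library function that logs on every call.
static int ParseByte(unsigned char b) {
    LOG_DEBUG("parsing byte %u\n", static_cast<unsigned>(b));
    return b & 0x0F;
}
```

Because the macro expands to nothing, the format arguments are not evaluated either, so even expensive-to-format log lines cost nothing in the fuzzing loop.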

Focus the harness: A harness that exercises 500 lines of critical parsing code will usually reach deep coverage of that code faster than one that drags in 10,000 lines of application logic. Narrow the harness to the attack surface you care about most.
