AFL++ Fuzzing Tutorial: Setup, Corpus Management, and Coverage-Guided Fuzzing

AFL++ (American Fuzzy Lop Plus Plus) is one of the most widely used coverage-guided fuzzers. Together with its predecessor AFL, it has been used to find thousands of vulnerabilities in production software, from image parsers to network protocols to cryptographic libraries. This tutorial walks you through setting up AFL++, instrumenting a target, building an effective corpus, and analyzing the crashes it finds.

What Makes AFL++ Different

AFL++ evolved from the original AFL (developed by Michal Zalewski at Google). It adds:

  • Multiple instrumentation backends: LLVM, GCC, QEMU, Unicorn, Frida
  • Custom mutators API: Write domain-specific mutation strategies in Python or C
  • Persistent mode: Run target in-process without fork overhead (10-20x faster)
  • Cmplog: Tracks comparison operands to break through magic byte checks
  • MOpt: Mutation scheduling driven by particle swarm optimization
  • Parallel fuzzing: Scale across multiple cores with minimal configuration

AFL++ is the standard choice for fuzzing C and C++ code.

Installation

From Package Manager

# Ubuntu/Debian
apt-get install afl++

# macOS (Homebrew)
brew install afl++

# Arch Linux
pacman -S afl++

From Source

Building from source gives you the latest features and instrumentation options:

git clone https://github.com/AFLplusplus/AFLplusplus
cd AFLplusplus
make distrib
sudo make install

Verify installation:

afl-fuzz --help 2>&1 | head -5

Instrumenting a Target

AFL++ needs to instrument your target program to observe code coverage. The simplest approach uses the AFL++ compiler wrappers.
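
To make "observing coverage" concrete, here is a minimal sketch of the edge-counting scheme described in the original AFL technical notes. It is illustrative only: AFL++'s default LLVM instrumentation assigns edge IDs differently, and `TOY_MAP_SIZE`, `coverage_map`, `visit_block`, `edges_hit`, and `reset_coverage` are placeholder names, not AFL++ symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAP_SIZE 65536u  /* assumed map size for this sketch */

static uint8_t  coverage_map[TOY_MAP_SIZE];
static uint32_t prev_location;

/* Called at the start of every basic block. In the classic scheme,
   block_id is a random constant chosen at compile time. */
static void visit_block(uint32_t block_id) {
    uint32_t edge = (block_id ^ prev_location) % TOY_MAP_SIZE;
    /* Saturating hit counter (a simplification chosen for this sketch). */
    if (coverage_map[edge] < UINT8_MAX) coverage_map[edge]++;
    /* Shift so that A->B and B->A land in different map slots. */
    prev_location = block_id >> 1;
}

/* Number of distinct map slots hit so far. */
static size_t edges_hit(void) {
    size_t n = 0;
    for (size_t i = 0; i < TOY_MAP_SIZE; i++) n += coverage_map[i] != 0;
    return n;
}

/* Clears the map before the next execution. */
static void reset_coverage(void) {
    memset(coverage_map, 0, sizeof(coverage_map));
    prev_location = 0;
}
```

The fuzzer compares the map after each execution against what it has seen before; an input that touches a new slot is kept in the corpus.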

Compiling with afl-clang-fast

# For an autotools-based project
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --prefix=/tmp/fuzz-install
make -j$(nproc)
make install

# Or compile a single file directly
afl-clang-fast -o target target.c

afl-clang-fast is the recommended instrumentation method — it uses LLVM's pass-based instrumentation, which is more precise than AFL's original GCC-based approach.

For programs using CMake:

cmake -DCMAKE_C_COMPILER=afl-clang-fast \
      -DCMAKE_CXX_COMPILER=afl-clang-fast++ \
      ..
make -j$(nproc)

A Simple Target for Learning

Let's create a target with a known vulnerability to demonstrate AFL++:

// target.c — intentionally buggy parser
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
    char header[4];
    int size;
    char data[64];
} Packet;

void parse_packet(const unsigned char *buf, size_t len) {
    if (len < 8) return;
    
    Packet pkt;
    memcpy(pkt.header, buf, 4);
    
    if (memcmp(pkt.header, "PKT!", 4) != 0) return;
    
    pkt.size = *(int *)(buf + 4);
    
    // BUG: pkt.size not bounds-checked before memcpy!
    memcpy(pkt.data, buf + 8, pkt.size);  // Stack buffer overflow (pkt is a local variable)
    
    printf("Packet size: %d\n", pkt.size);
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <input_file>\n", argv[0]);
        return 1;
    }
    
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    
    unsigned char buf[1024];
    size_t len = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    
    parse_packet(buf, len);
    return 0;
}
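
For contrast with the buggy parser, here is a sketch of what a bounds-checked variant might look like once the fuzzer has pointed at the bug. The name `parse_packet_checked`, the `PKT_DATA_MAX` constant, and the return codes are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_DATA_MAX 64

/* Returns the payload length on success, -1 if the packet is rejected. */
static int parse_packet_checked(const unsigned char *buf, size_t len,
                                unsigned char out[PKT_DATA_MAX]) {
    if (len < 8) return -1;
    if (memcmp(buf, "PKT!", 4) != 0) return -1;

    /* memcpy avoids the unaligned *(int *) read; an unsigned type
       rules out negative sizes. Host byte order, as in target.c. */
    uint32_t size;
    memcpy(&size, buf + 4, sizeof(size));

    /* Reject sizes larger than the destination or the bytes actually read. */
    if (size > PKT_DATA_MAX || size > len - 8) return -1;

    memcpy(out, buf + 8, size);
    return (int)size;
}
```

Keeping the buggy version around for the rest of this tutorial is deliberate: it gives AFL++ something to find.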

Compile with AFL++ and AddressSanitizer (ASAN):

AFL_USE_ASAN=1 afl-clang-fast -fsanitize=address -o target_fuzz target.c

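ASAN makes buffer overflows and use-after-free bugs crash immediately instead of silently corrupting memory. Use it for at least part of every fuzzing campaign; because it slows execution, parallel setups often run the ASAN build on only some instances.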

Building a Seed Corpus

AFL++ starts with a corpus of seed inputs and mutates them. A good corpus:

  • Contains valid inputs your parser accepts
  • Covers different code paths
  • Is small in size (AFL++ prefers many small files over few large ones)

Creating Initial Seeds

mkdir -p corpus_in

# Seed 1: minimal valid packet
printf 'PKT!\x04\x00\x00\x00data' > corpus_in/valid_small

# Seed 2: packet with different data
printf 'PKT!\x08\x00\x00\x00datadata' > corpus_in/valid_medium

# Seed 3: invalid header (will fail validation, but still a useful path)
printf 'NOPE\x04\x00\x00\x00data' > corpus_in/invalid_header
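
Seeds can also be produced programmatically, which helps when a format has length fields that are tedious to type by hand. The helper below is a hypothetical sketch (`build_seed` is not part of AFL++). It writes the 4-byte magic, a 32-bit little-endian length, and the payload, which is the layout the tutorial's target expects on a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes "PKT!" + 32-bit little-endian length + payload into out.
   Returns the number of bytes written, or 0 if out is too small. */
static size_t build_seed(unsigned char *out, size_t cap,
                         const void *payload, uint32_t payload_len) {
    size_t total = 8 + (size_t)payload_len;
    if (cap < total) return 0;

    memcpy(out, "PKT!", 4);
    out[4] = (unsigned char)(payload_len & 0xff);
    out[5] = (unsigned char)((payload_len >> 8) & 0xff);
    out[6] = (unsigned char)((payload_len >> 16) & 0xff);
    out[7] = (unsigned char)((payload_len >> 24) & 0xff);
    memcpy(out + 8, payload, payload_len);
    return total;
}
```

Calling `build_seed(buf, sizeof(buf), "data", 4)` fills `buf` with the same 12 bytes as the first seed; writing them out with `fwrite` into corpus_in/ completes the job.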

For real targets, collect actual input samples from:

  • Protocol captures (Wireshark pcap → individual packets)
  • File format samples (valid JPEG files for an image parser)
  • Test fixtures from the project's test suite
  • Publicly available corpus collections

Minimizing the Corpus

Duplicate inputs waste fuzzer time. Use afl-cmin to reduce your corpus to unique coverage:

afl-cmin -i corpus_in/ -o corpus_min/ -- ./target_fuzz @@

@@ tells AFL++ where to pass the input file path. afl-cmin runs all inputs, discards duplicates with identical coverage, and outputs only the unique ones.
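
Conceptually, corpus minimization keeps an input only when it contributes coverage the kept set does not already have. The toy sketch below shows that greedy idea on precomputed coverage bitmasks. It is not afl-cmin's actual algorithm, and `minimize_corpus` and the 8-edge masks are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* coverage[i] is a bitmask for input i: bit k set means edge k was hit.
   Stores the indices of the inputs worth keeping in kept[] (which must
   have room for n_inputs entries) and returns how many were kept. */
static size_t minimize_corpus(const uint8_t *coverage, size_t n_inputs,
                              size_t *kept) {
    uint8_t seen = 0;
    size_t n_kept = 0;
    for (size_t i = 0; i < n_inputs; i++) {
        if (coverage[i] & (uint8_t)~seen) {  /* adds at least one new edge */
            seen |= coverage[i];
            kept[n_kept++] = i;
        }
    }
    return n_kept;
}
```

With masks `{0x03, 0x03, 0x01, 0x04}`, the second input duplicates the first and the third is a subset of it, so only inputs 0 and 3 survive.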

Running AFL++

Basic Single-Core Fuzzing

afl-fuzz -i corpus_min/ -o fuzz_output/ -- ./target_fuzz @@

You'll see the AFL++ TUI — a dashboard showing:

  • cycles done: How many times the corpus has been fully processed
  • corpus count: Number of test cases in the corpus
  • map density: Share of the coverage bitmap that has been hit
  • saved crashes: Number of unique crash inputs saved
  • saved hangs: Number of unique timeout inputs saved

Important Flags

# -t: timeout per execution (milliseconds, default 1000ms)
afl-fuzz -i corpus/ -o out/ -t 500 -- ./target @@

# -m: memory limit in MB (AFL++ applies no limit by default; classic AFL defaulted to 50MB)
afl-fuzz -i corpus/ -o out/ -m 200 -- ./target @@

# -x: dictionary file with interesting tokens
afl-fuzz -i corpus/ -o out/ -x /usr/share/aflplusplus/dictionaries/http.dict -- ./target @@

# -s: fixed random seed (for reproducibility)
afl-fuzz -i corpus/ -o out/ -s 42 -- ./target @@

Parallel Fuzzing (Multi-Core)

Parallel fuzzing scales throughput roughly with the number of cores you dedicate to it:

# Main instance (-M flag)
afl-fuzz -i corpus/ -o fuzz_output/ -M main -- ./target @@

# Secondary instances (-S flag, run in separate terminals)
afl-fuzz -i corpus/ -o fuzz_output/ -S worker1 -- ./target @@
afl-fuzz -i corpus/ -o fuzz_output/ -S worker2 -- ./target @@
afl-fuzz -i corpus/ -o fuzz_output/ -S worker3 -- ./target @@

Secondary instances share discoveries with the main instance. Run as many as you have CPU cores.

A quick launch script for all cores:

#!/bin/bash
CORES=$(nproc)

# Launch main instance
afl-fuzz -i corpus/ -o fuzz_output/ -M main -- ./target @@ &

# Launch secondary instances
for i in $(seq 1 $((CORES - 1))); do
    afl-fuzz -i corpus/ -o fuzz_output/ -S "worker${i}" -- ./target @@ &
done

wait

Persistent Mode (10-20x Faster)

Persistent mode runs the fuzz target in a loop within a single process, eliminating fork() overhead:

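// target_persistent.c
// __AFL_LOOP is defined by the afl-clang-fast compiler wrapper; no AFL++ header is needed.
#include <stddef.h>
#include <unistd.h>  // read(), ssize_t

// Parser under test, e.g. parse_packet() from target.c built without its main()
void parse_packet(const unsigned char *buf, size_t len);

int main(void) {
    // __AFL_LOOP: run the fuzz target in-process for up to 10000 iterations
    while (__AFL_LOOP(10000)) {
        // Read from stdin (afl-fuzz feeds the test case on stdin when no @@ is given)
        unsigned char buf[4096];
        ssize_t len = read(0, buf, sizeof(buf));
        if (len < 0) continue;

        parse_packet(buf, (size_t)len);
    }
    return 0;
}

# Run in persistent mode (stdin input)
afl-fuzz -i corpus/ -o out/ -- ./target_persistent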
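
AFL++ can also build libFuzzer-style harnesses, which run persistently without an explicit loop in your code. The sketch below assumes the file is compiled with `afl-clang-fast -fsanitize=fuzzer`; `checked_len` is a stand-in parser written for this example, not code from the tutorial's target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in parser: returns the payload length, or -1 for rejected input. */
static int checked_len(const uint8_t *data, size_t size) {
    if (size < 8 || memcmp(data, "PKT!", 4) != 0) return -1;
    uint32_t n;
    memcpy(&n, data + 4, sizeof(n));
    return (n <= 64 && n <= size - 8) ? (int)n : -1;
}

/* Entry point the fuzzing driver calls once per test case. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)checked_len(data, size);
    return 0;  /* libFuzzer convention: return 0 */
}
```

A harness in this shape has no `main()` of its own, so the same file can be reused with libFuzzer or called directly from a unit test.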

Using Cmplog to Break Magic Bytes

Many programs have "magic byte" checks that fuzzers struggle with:

if (memcmp(buf, "MAGIC_HEADER\x42\x13", 14) != 0) return;

Randomly mutating inputs to match 14 specific bytes is nearly impossible. AFL++'s Cmplog tracks comparison operands and automatically feeds them back as mutations:

# Compile with Cmplog instrumentation
AFL_LLVM_CMPLOG=1 afl-clang-fast -o target_cmplog target.c

# Run with Cmplog
afl-fuzz -i corpus/ -o out/ -c ./target_cmplog -- ./target_fuzz @@
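
The idea behind Cmplog can be sketched in a few lines: when a comparison fails, look for the observed operand inside the input and overwrite it with the operand the program wanted. This toy is far simpler than what AFL++ implements, and `patch_from_compare` is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* If `observed` occurs in `input`, overwrite its first occurrence in place
   with `expected` (same length n) and return 1; otherwise return 0. */
static int patch_from_compare(unsigned char *input, size_t len,
                              const unsigned char *observed,
                              const unsigned char *expected, size_t n) {
    if (n == 0 || n > len) return 0;
    for (size_t off = 0; off + n <= len; off++) {
        if (memcmp(input + off, observed, n) == 0) {
            memcpy(input + off, expected, n);
            return 1;
        }
    }
    return 0;
}
```

Given the input `NOPE1234` and a logged comparison of `NOPE` against `PKT!`, one call produces `PKT!1234`, which passes the header check without any random guessing.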

Analyzing Crashes

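AFL++ saves crash inputs in the crashes/ directory of each instance: fuzz_output/main/crashes/ when started with -M main, or fuzz_output/default/crashes/ for a single instance started without -M or -S. Each file is an input that caused a crash.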

Reproducing a Crash

# Run the target directly with the crash input
./target_fuzz fuzz_output/main/crashes/id:000000,sig:11,src:000001,...

# With ASAN for detailed analysis
ASAN_OPTIONS=symbolize=1 ./target_fuzz fuzz_output/main/crashes/id:000000,...

Illustrative ASAN output for our buffer overflow bug (addresses elided; the access type and size depend on the crashing input):

==12345==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x...
READ of size 1024 at 0x... thread T0
    #0 0x... in parse_packet target.c:23
    #1 0x... in main target.c:41
    
SUMMARY: AddressSanitizer: stack-buffer-overflow target.c:23 in parse_packet

Minimizing Crash Inputs

Crash inputs from AFL++ are often large with unnecessary bytes. afl-tmin minimizes them:

afl-tmin -i fuzz_output/main/crashes/id:000000,... -o crash_minimal.bin -- ./target_fuzz @@

The minimized input is easier to understand and turn into a regression test.

Deduplicating Crashes

Multiple crash inputs often represent the same underlying bug:

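# afl-collect (from the third-party afl-utils project, not AFL++ itself) gathers and deduplicates crashes.
# -e names the gdb script file to generate and run; -r removes samples that no longer crash.
afl-collect -e gdb_script -r fuzz_output/ crashes/ -- ./target_fuzz @@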
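
One common way to bucket crashes is to hash the top few stack frames from the sanitizer or debugger report and treat equal hashes as the same bug. The sketch below uses 64-bit FNV-1a over frame names. The function name, the frame strings, and the depth of three are illustrative assumptions, and this is not a description of how afl-collect works internally.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over the first `depth` frame names. */
static uint64_t crash_bucket(const char *const *frames, size_t n_frames,
                             size_t depth) {
    uint64_t h = 0xcbf29ce484222325ULL;       /* FNV offset basis */
    if (depth > n_frames) depth = n_frames;
    for (size_t i = 0; i < depth; i++) {
        for (const char *p = frames[i]; *p; p++) {
            h ^= (unsigned char)*p;
            h *= 0x100000001b3ULL;            /* FNV prime */
        }
        h ^= 0xff;                            /* separator between frames */
        h *= 0x100000001b3ULL;
    }
    return h;
}
```

Two crashes whose top three frames match land in the same bucket even if the deeper frames differ; a shallower depth merges more aggressively, a deeper one splits more.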

Reading AFL++ Stats

Check campaign statistics without the TUI:

cat fuzz_output/main/fuzzer_stats

Key metrics:

run_time          : 3600         (seconds)
cycles_done       : 47           (corpus cycles completed)
corpus_count      : 1847         (test cases in the queue)
bitmap_cvg        : 23.45%       (% of the coverage map hit)
saved_crashes     : 3            (unique crashes)
saved_hangs       : 0
execs_per_sec     : 2847.32      (target executions/second)
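
Because fuzzer_stats is plain `key : value` text, it is easy to read from a monitoring tool. Here is a minimal sketch assuming that layout; `stats_lookup` is a hypothetical helper, not an AFL++ API.

```c
#include <stddef.h>
#include <string.h>

/* Finds `key` in "key : value" formatted text and copies the value,
   without surrounding spaces, into out. Returns 1 if found, else 0. */
static int stats_lookup(const char *text, const char *key,
                        char *out, size_t out_cap) {
    size_t klen = strlen(key);
    if (out_cap == 0) return 0;
    for (const char *line = text; line && *line; ) {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);
        const char *colon = memchr(line, ':', llen);
        if (colon && llen >= klen && strncmp(line, key, klen) == 0) {
            /* Only spaces may sit between the key and the colon. */
            const char *p = line + klen;
            while (p < colon && *p == ' ') p++;
            if (p == colon) {
                const char *v = colon + 1;
                const char *end = line + llen;
                while (v < end && *v == ' ') v++;
                while (end > v && end[-1] == ' ') end--;
                size_t vlen = (size_t)(end - v);
                if (vlen >= out_cap) vlen = out_cap - 1;  /* truncate */
                memcpy(out, v, vlen);
                out[vlen] = '\0';
                return 1;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

After loading the file into a string, `stats_lookup(text, "execs_per_sec", buf, sizeof buf)` leaves the rate in `buf`, ready for `strtod` or a dashboard.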

Using Dictionaries

Dictionaries tell AFL++ interesting byte sequences to use in mutations:

# AFL++ ships dictionaries for common formats
ls /usr/share/aflplusplus/dictionaries/
# http.dict html.dict json.dict xml.dict jpeg.dict zip.dict ...

afl-fuzz -i corpus/ -o out/ -x /usr/share/aflplusplus/dictionaries/json.dict -- ./target @@

Create custom dictionaries for domain-specific targets:

# custom.dict
"PKT!"
"MAGIC_BYTES"
"\x00\x01\x02\x03"
"\xff\xfe\xfd\xfc"
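
When tokens contain non-printable bytes, it can be easier to generate dictionary lines than to write the escapes by hand. The helper below is a sketch (`dict_entry` is a hypothetical name). It emits a quoted entry and uses `\xNN` escapes for anything outside printable ASCII, as well as for quotes and backslashes.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats the n bytes of `tok` as a quoted dictionary entry in `out`.
   Returns the resulting string length, or -1 if `out` is too small. */
static int dict_entry(const unsigned char *tok, size_t n,
                      char *out, size_t cap) {
    size_t pos = 0;
    if (cap < 3) return -1;                   /* room for "" and NUL */
    out[pos++] = '"';
    for (size_t i = 0; i < n; i++) {
        unsigned char c = tok[i];
        int printable = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';
        size_t need = printable ? 1 : 4;
        if (pos + need + 2 > cap) return -1;  /* +2: closing quote, NUL */
        if (printable)
            out[pos++] = (char)c;
        else
            pos += (size_t)snprintf(out + pos, cap - pos, "\\x%02x", c);
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

Each returned string is one line of the dictionary file; writing them out with `fputs` plus a newline produces a file suitable for `-x`.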

Continuous Fuzzing in CI

Running AFL++ in CI catches regressions and extends coverage over time:

# GitHub Actions
- name: Run AFL++ fuzz campaign
  run: |
    afl-fuzz -i corpus/ -o fuzz_out/ -t 1000 -m 200 \
      -- ./target @@ &
    AFL_PID=$!
    sleep 300  # 5 minutes
    kill $AFL_PID
    
    # Fail CI if crashes found
    if ls fuzz_out/default/crashes/id:* 2>/dev/null; then
      echo "CRASHES FOUND"
      exit 1
    fi

From Fuzzing to Continuous Monitoring

AFL++ finds crashes in your code under controlled conditions. Production, however, has real users, real data, and runtime conditions a fuzzer can't simulate. A crash-free fuzzing run doesn't mean your application works correctly in production.

HelpMeTest complements AFL++ by running continuous end-to-end tests against your live production environment — verifying correct behavior 24/7. While AFL++ explores edge cases in unit-level code, HelpMeTest monitors whether the full application works for real users. No code required.

Summary

AFL++ is production-grade fuzzing infrastructure with a manageable learning curve. The key steps:

  1. Compile with afl-clang-fast and AFL_USE_ASAN=1
  2. Build a minimal seed corpus from real inputs
  3. Minimize corpus with afl-cmin
  4. Run with parallel workers (one per CPU core)
  5. Use Cmplog (-c) to break through magic byte checks
  6. Analyze crashes with ASAN symbolization
  7. Minimize crash inputs with afl-tmin
  8. Add crash inputs as regression tests

Give AFL++ a few hours on any C/C++ parser that handles untrusted input. The bugs it finds will surprise you.
