Rust Benchmarking with Criterion: Performance Testing Guide

Rust's built-in #[bench] attribute requires nightly and lacks statistical rigor. Criterion.rs is the de facto standard for benchmarking on stable Rust: it handles warm-up, measures statistical variance, compares runs against baselines, and produces HTML reports.

Installation

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_benchmark"
harness = false

The harness = false line disables the default test harness so Criterion can take over.

Create benches/my_benchmark.rs so the project layout looks like this:

my_crate/
├── benches/
│   └── my_benchmark.rs
├── src/
│   └── lib.rs
└── Cargo.toml

A Minimal Benchmark

// benches/my_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use my_crate::sort_numbers;

fn bench_sort(c: &mut Criterion) {
    let data: Vec<i32> = (0..1000).rev().collect();

    c.bench_function("sort 1000 integers", |b| {
        // data.clone() runs inside the timed closure, so its cost is measured too.
        b.iter(|| sort_numbers(data.clone()))
    });
}

criterion_group!(benches, bench_sort);
criterion_main!(benches);
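The benchmark imports sort_numbers from my_crate, which the article does not show. A minimal sketch of what src/lib.rs might contain, assuming the function takes a Vec<i32> by value and returns it sorted (the signature is inferred from the call site, not confirmed by the source):

```rust
// src/lib.rs (hypothetical implementation, assumed for illustration)

/// Sorts the vector in ascending order and returns it.
/// Taking the Vec by value matches the `sort_numbers(data.clone())` call site.
pub fn sort_numbers(mut numbers: Vec<i32>) -> Vec<i32> {
    // Unstable sort is fine for integers: equal elements are indistinguishable.
    numbers.sort_unstable();
    numbers
}
```

Any function with the same signature can be dropped in here; the benchmark file does not need to change.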

Run with:

cargo bench

Criterion runs the benchmark function many times, measures latency, calculates mean and standard deviation, and prints a summary:

sort 1000 integers      time:   [42.3 µs 42.7 µs 43.1 µs]

The three numbers are the lower bound, the point estimate, and the upper bound of the time per iteration. The two bounds form a confidence interval, at the 95% level by default.
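To make the idea of an interval around an estimate concrete, here is a simplified sketch. It uses the textbook normal approximation (mean ± 1.96 standard errors), which is an assumption made for brevity; Criterion itself derives its intervals by bootstrap resampling, so its numbers will differ. The function name is hypothetical.

```rust
/// Returns (lower, mean, upper) for an approximate 95% confidence interval
/// of the mean, using the normal approximation: mean ± 1.96 * s / sqrt(n).
/// This is an illustration only, not how Criterion computes its bounds.
pub fn confidence_interval_95(samples: &[f64]) -> (f64, f64, f64) {
    assert!(samples.len() >= 2, "need at least two samples");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;

    // Sample variance with Bessel's correction (divide by n - 1).
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);

    // Standard error of the mean, then the 95% margin.
    let standard_error = variance.sqrt() / n.sqrt();
    let margin = 1.96 * standard_error;

    (mean - margin, mean, mean + margin)
}
```

A tighter interval means the measurements were more consistent; a wide one usually points to noise on the machine running the benchmark.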

black_box: Preventing Compiler Optimizations

The compiler may optimize away computations whose results aren't used. Use criterion::black_box to prevent this:

use criterion::black_box;

c.bench_function("fibonacci", |b| {
    b.iter(|| fibonacci(black_box(30)))
});

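Wrapping the argument in black_box hides the constant 30 from the optimizer, so the call cannot be constant-folded at compile time. Criterion already passes the closure's return value through black_box, which keeps the result from being discarded as dead code.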
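The same effect can be seen outside Criterion with a hand-rolled timing loop. The sketch below uses std::hint::black_box (stable since Rust 1.66), which serves the same purpose as criterion::black_box. The fibonacci function is a hypothetical iterative implementation, since the article does not show one, and time_fibonacci is an illustrative helper rather than a substitute for Criterion's measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test: iterative Fibonacci.
pub fn fibonacci(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// Naive timing loop. black_box on the argument stops constant folding;
/// black_box on the result stops the call from being removed as dead code.
pub fn time_fibonacci(iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(fibonacci(black_box(30)));
    }
    start.elapsed()
}
```

Without the two black_box calls, an optimizing build is free to compute fibonacci(30) once at compile time or drop the loop entirely, and the measured time would say nothing about the function.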

Setup Outside the Hot Loop

Keep expensive setup out of the measurement by building inputs once, before the call to b.iter:

fn bench_search(c: &mut Criterion) {
    // Setup runs once, not on every iteration
    let haystack: Vec<String> = (0..10_000)
        .map(|i| format!("item_{}", i))
        .collect();

    c.bench_function("search in 10k strings", |b| {
        b.iter(|| {
            haystack.iter().find(|s| s.contains("5000"))
        });
    });
}

For setup that must run fresh each iteration:

c.bench_function("sort with clone", |b| {
    b.iter_batched(
        || (0..1000i32).rev().collect::<Vec<_>>(),  // setup
        |data| sort_numbers(data),                    // benchmark
        criterion::BatchSize::SmallInput,
    )
});

iter_batched creates fresh input for each iteration without including setup time in the measurement.
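The idea behind batched timing can be sketched with the standard library alone. This is a simplified illustration, not Criterion's actual code, and time_batched is a hypothetical name: all inputs are built before the clock starts, only the routine is timed, and the outputs are dropped after the clock stops.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Builds `batch` inputs up front, then times only the routine.
pub fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    // Setup happens before the clock starts.
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    for input in inputs {
        outputs.push(black_box(routine(input)));
    }
    let elapsed = start.elapsed();

    // Outputs are dropped after the clock stops, so destructor cost is excluded.
    drop(outputs);
    elapsed
}
```

Holding a whole batch of inputs in memory is the trade-off here, which is why Criterion asks for a BatchSize hint: small inputs can be batched generously, large ones cannot.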

Parameterized Benchmarks with BenchmarkGroup

Compare the same function across different input sizes:

use criterion::{BenchmarkId, Criterion};

fn bench_sort_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("sort by size");

    // Inputs are built in the iter_batched setup closure, so nothing is allocated here.
    for size in [100, 1_000, 10_000, 100_000].iter() {
        group.bench_with_input(
            BenchmarkId::new("reverse_sort", size),
            size,
            |b, &size| {
                b.iter_batched(
                    || (0..size).rev().collect::<Vec<i32>>(),
                    |d| sort_numbers(d),
                    criterion::BatchSize::SmallInput,
                )
            },
        );
    }

    group.finish();
}

Output:

sort by size/reverse_sort/100     time: [1.2 µs 1.2 µs 1.3 µs]
sort by size/reverse_sort/1000    time: [15.4 µs 15.6 µs 15.8 µs]
sort by size/reverse_sort/10000   time: [188 µs 190 µs 192 µs]
sort by size/reverse_sort/100000  time: [2.2 ms 2.2 ms 2.3 ms]

Comparing Implementations

Benchmark multiple implementations in the same group:

fn bench_search_algorithms(c: &mut Criterion) {
    let mut group = c.benchmark_group("search");
    let haystack: Vec<i32> = (0..10_000).collect();
    let needle = 7_777;

    group.bench_function("linear_search", |b| {
        b.iter(|| linear_search(&haystack, black_box(needle)))
    });

    group.bench_function("binary_search", |b| {
        b.iter(|| binary_search(&haystack, black_box(needle)))
    });

    group.finish();
}
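The two functions being compared are not shown in the article. A minimal sketch of what they might look like, assuming both take a sorted slice and return the index of the match (names and signatures are inferred from the call sites and are otherwise hypothetical):

```rust
use std::cmp::Ordering;

/// Scans the slice from the front; O(n) comparisons in the worst case.
pub fn linear_search(haystack: &[i32], needle: i32) -> Option<usize> {
    haystack.iter().position(|&x| x == needle)
}

/// Halves the search range each step; requires `haystack` to be sorted ascending.
pub fn binary_search(haystack: &[i32], needle: i32) -> Option<usize> {
    let (mut low, mut high) = (0, haystack.len());
    while low < high {
        // Written this way to avoid overflow in `low + high`.
        let mid = low + (high - low) / 2;
        match haystack[mid].cmp(&needle) {
            Ordering::Equal => return Some(mid),
            Ordering::Less => low = mid + 1,
            Ordering::Greater => high = mid,
        }
    }
    None
}
```

Because both functions share a signature, the two bench_function calls differ only in the function name, which keeps the comparison fair.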

Baselines and Regression Detection

Criterion saves results to target/criterion/. On subsequent runs, it compares against the saved baseline:

sort 1000 integers    time:   [42.3 µs 42.7 µs 43.1 µs]
                      change: [-3.2% -2.1% -0.9%] (p = 0.001 < 0.05)
                      Performance has improved.

Save a named baseline before a refactor:

cargo bench -- --save-baseline before_refactor

Compare against it after:

cargo bench -- --baseline before_refactor

HTML Reports

With the html_reports feature enabled, Criterion generates HTML reports in target/criterion/:

open target/criterion/report/index.html

Reports include time-series plots, distribution histograms, and comparison charts, which makes them useful for presenting benchmark results to teammates.

Throughput Measurement

For I/O or data-processing benchmarks, have Criterion report throughput in addition to time per iteration:

use criterion::Throughput;

fn bench_compress(c: &mut Criterion) {
    let data = vec![0u8; 1_000_000];  // 1 MB

    let mut group = c.benchmark_group("compression");
    group.throughput(Throughput::Bytes(data.len() as u64));

    group.bench_function("compress_1mb", |b| {
        b.iter(|| compress(&data))
    });

    group.finish();
}

The output then includes a throughput (thrpt) line, scaled to a suitable unit such as MiB/s or GiB/s:

compression/compress_1mb    thrpt: [1.23 GiB/s 1.25 GiB/s 1.27 GiB/s]
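Throughput is simply the declared byte count divided by the measured time per iteration. A small sketch of that arithmetic, with hypothetical function names, assuming the binary unit convention (1 GiB = 1024³ bytes) seen in the sample output:

```rust
use std::time::Duration;

/// Bytes processed per second for one iteration that took `elapsed`.
pub fn bytes_per_second(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64()
}

/// Same figure expressed in GiB/s (1 GiB = 1024 * 1024 * 1024 bytes).
pub fn gib_per_second(bytes: u64, elapsed: Duration) -> f64 {
    bytes_per_second(bytes, elapsed) / (1024.0 * 1024.0 * 1024.0)
}
```

Read the other way round, 1.25 GiB/s on a 1,000,000-byte input corresponds to roughly 0.75 ms per iteration.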

CI Integration

Benchmarks in CI catch regressions before merge. A GitHub Actions example with critcmp:

name: Benchmarks
on: [pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so the base commit can be checked out
      - uses: dtolnay/rust-toolchain@stable
      - name: Install critcmp
        run: cargo install critcmp

      # Each runner starts with an empty target/ directory, so both baselines
      # are produced in the same job.
      - name: Run benchmarks (base branch baseline)
        run: |
          git checkout ${{ github.event.pull_request.base.sha }}
          cargo bench -- --save-baseline main

      - name: Run benchmarks (PR comparison)
        run: |
          git checkout ${{ github.sha }}
          cargo bench -- --save-baseline current
          critcmp main current --threshold 5

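critcmp prints a side-by-side comparison of the two baselines; --threshold 5 hides benchmarks that differ by less than 5%. It does not fail the job on its own, so enforcing a limit needs an extra step that inspects the comparison. Both baselines must exist in the same target/criterion directory for the comparison to work.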
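The 5% rule itself is a one-line calculation. The sketch below assumes the mean times for both baselines are already available in nanoseconds; how they are extracted (for example from the comparison output) is left open, and both function names are hypothetical.

```rust
/// Relative change of `current_ns` against `baseline_ns`, in percent.
/// Positive means slower, negative means faster.
pub fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// True when the current run is slower than the baseline by more than
/// `threshold_pct` percent.
pub fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    percent_change(baseline_ns, current_ns) > threshold_pct
}
```

A gate built on this should leave some headroom, since shared CI runners add noise that can move timings by a few percent on their own.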

What to Benchmark

Not everything needs a benchmark. Good candidates:

  • Hot paths: code called millions of times per second
  • Algorithmic alternatives: choosing between two implementations
  • Before/after refactors: verifying that a change did not regress performance
  • Data structure selection: HashMap vs BTreeMap for your access pattern
  • Serialization: JSON vs MessagePack vs Bincode for your payload sizes

Avoid benchmarking code that isn't on a performance-critical path. The cost of maintaining benchmarks should be justified by the value of the measurement.

Production Monitoring

Benchmarks measure performance in a controlled environment. Production behavior — under real load, real data sizes, and real concurrency — is different. HelpMeTest monitors your live endpoints for latency regressions 24/7, alerting before users notice.

Summary

  • Criterion works on stable Rust; #[bench] requires nightly
  • b.iter(|| ...) is the hot loop; setup goes outside
  • Use black_box to prevent dead-code optimization
  • BenchmarkGroup compares multiple inputs or implementations
  • Save baselines before refactors, compare after
  • Throughput measurement for I/O-intensive code
  • Compare baselines in CI with critcmp to spot regressions
