Rust Benchmarking with Criterion: Performance Testing Guide

HelpMeTest

15 May 2026 — 4 min read

Rust's built-in #[bench] attribute requires nightly and lacks statistical rigor. Criterion.rs is the de facto standard for benchmarking on stable Rust: it handles warm-up, measures statistical variance, compares runs against baselines, and produces HTML reports.

Installation

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_benchmark"
harness = false

The harness = false line disables the default test harness so Criterion can take over.

Create benches/my_benchmark.rs:

my_crate/
├── benches/
│   └── my_benchmark.rs
├── src/
│   └── lib.rs
└── Cargo.toml

A Minimal Benchmark

// benches/my_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use my_crate::sort_numbers;

fn bench_sort(c: &mut Criterion) {
    let data: Vec<i32> = (0..1000).rev().collect();

    c.bench_function("sort 1000 integers", |b| {
        b.iter(|| sort_numbers(data.clone()))
    });
}

criterion_group!(benches, bench_sort);
criterion_main!(benches);
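The benchmark imports sort_numbers from my_crate, but the article does not show that function. A minimal sketch of what src/lib.rs might contain is below; the body is an assumption chosen only so the benchmark compiles, and any function with the same signature would work.

```rust
// src/lib.rs (hypothetical contents; the article does not show this file)

/// Sorts the input in ascending order and returns it.
/// Takes the vector by value, matching the `sort_numbers(data.clone())` call.
pub fn sort_numbers(mut numbers: Vec<i32>) -> Vec<i32> {
    // An unstable sort is fine for integers: equal elements are indistinguishable.
    numbers.sort_unstable();
    numbers
}
```

Because the function takes ownership of its input, the benchmark has to clone data on every iteration, and that clone is part of the measured time.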

Run with:

cargo bench

Criterion runs the benchmark function many times, measures latency, calculates mean and standard deviation, and prints a summary:

sort 1000 integers      time:   [42.3 µs 42.7 µs 43.1 µs]

The three numbers are the lower bound, Criterion's best estimate of the time per iteration, and the upper bound of a confidence interval (95% by default).
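To make the bracketed triple concrete, here is a sketch that turns raw timings into a lower bound, a point estimate, and an upper bound. Criterion derives its interval by bootstrap resampling; the normal approximation below is a simpler stand-in used only for illustration, and the function name is hypothetical.

```rust
/// Returns (lower, mean, upper) for an approximate 95% interval around the
/// sample mean, using the normal approximation: mean ± 1.96 * s / sqrt(n).
/// Returns None when there are fewer than two samples.
pub fn mean_with_interval(samples: &[f64]) -> Option<(f64, f64, f64)> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let n_f = n as f64;
    let mean = samples.iter().sum::<f64>() / n_f;
    // Sample variance with Bessel's correction (divide by n - 1).
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n_f - 1.0);
    // Standard error of the mean, scaled to a 95% half-width.
    let half_width = 1.96 * (variance / n_f).sqrt();
    Some((mean - half_width, mean, mean + half_width))
}
```

A narrow interval means the runs agreed closely; a wide one usually points to noise on the machine or to too few samples.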

`black_box`: Preventing Compiler Optimizations

The compiler may optimize away computations whose results aren't used. Use criterion::black_box to prevent this:

use criterion::black_box;

c.bench_function("fibonacci", |b| {
    b.iter(|| fibonacci(black_box(30)))
});

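black_box is an identity function that the optimizer treats as opaque. Wrapping the argument, as in the call above, keeps the compiler from precomputing fibonacci(30) at compile time. Criterion already passes the closure's return value through black_box, so the result is not discarded as dead code.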
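The standard library offers the same tool as std::hint::black_box, stable since Rust 1.66. The sketch below uses it with a hypothetical fibonacci implementation (the article does not show one) to mark the two places where hiding a value matters: the input and the result.

```rust
use std::hint::black_box;

/// Iterative Fibonacci with fib(0) = 0 and fib(1) = 1 (hypothetical stand-in).
pub fn fibonacci(n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}

/// Calls `fibonacci` the way a benchmark body would.
pub fn measured_call() -> u64 {
    // Hiding the argument stops the compiler from folding fibonacci(30) into a constant.
    // Hiding the result keeps the call from being discarded as unused.
    black_box(fibonacci(black_box(30)))
}
```

Inside b.iter only the inner black_box is needed, since Criterion applies the outer one to whatever the closure returns.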

Setup Outside the Hot Loop

Keep expensive setup out of the measured closure. When the input can be shared across iterations, build it once before calling b.iter:

fn bench_search(c: &mut Criterion) {
    // Setup runs once, not on every iteration
    let haystack: Vec<String> = (0..10_000)
        .map(|i| format!("item_{}", i))
        .collect();

    c.bench_function("search in 10k strings", |b| {
        b.iter(|| {
            haystack.iter().find(|s| s.contains("5000"))
        });
    });
}

For setup that must run fresh each iteration:

c.bench_function("sort with clone", |b| {
    b.iter_batched(
        || (0..1000i32).rev().collect::<Vec<_>>(),  // setup
        |data| sort_numbers(data),                    // benchmark
        criterion::BatchSize::SmallInput,
    )
});

iter_batched creates fresh input for each iteration without including setup time in the measurement.
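The idea behind that separation can be sketched with std::time::Instant alone. This is a simplified illustration, not Criterion's implementation, and time_batched is a hypothetical name: inputs are built before the timer starts, and outputs are dropped after it stops.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `routine` over `batch_size` fresh inputs, excluding setup and drop costs.
pub fn time_batched<I, O>(
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
    batch_size: usize,
) -> Duration {
    // Build every input first; none of this is timed.
    let inputs: Vec<I> = (0..batch_size).map(|_| setup()).collect();
    let mut outputs = Vec::with_capacity(batch_size);

    let start = Instant::now();
    for input in inputs {
        // black_box keeps the optimizer from discarding the routine's result.
        outputs.push(black_box(routine(input)));
    }
    let elapsed = start.elapsed();

    // Outputs are dropped only after the timer has stopped.
    drop(outputs);
    elapsed
}
```

Holding a whole batch of inputs in memory is the trade-off here, which is why BatchSize lets you choose smaller batches for large inputs.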

Parameterized Benchmarks with `BenchmarkGroup`

Compare the same function across different input sizes:

use criterion::{BenchmarkId, Criterion};

fn bench_sort_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("sort by size");

    for size in [100, 1_000, 10_000, 100_000].iter() {
        // Input is built in the iter_batched setup closure below,
        // so nothing needs to be allocated here.
        group.bench_with_input(
            BenchmarkId::new("reverse_sort", size),
            size,
            |b, &size| {
                b.iter_batched(
                    || (0..size).rev().collect::<Vec<i32>>(),
                    |d| sort_numbers(d),
                    criterion::BatchSize::SmallInput,
                )
            },
        );
    }

    group.finish();
}

Output:

sort by size/reverse_sort/100     time: [1.2 µs 1.2 µs 1.3 µs]
sort by size/reverse_sort/1000    time: [15.4 µs 15.6 µs 15.8 µs]
sort by size/reverse_sort/10000   time: [188 µs 190 µs 192 µs]
sort by size/reverse_sort/100000  time: [2.2 ms 2.2 ms 2.3 ms]
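One way to read a size sweep is to divide each estimate by its input size. The sketch below does that for the middle estimates in the sample output above; the figures are the article's illustrative numbers, not fresh measurements, and the names are hypothetical.

```rust
/// Nanoseconds spent per element, given a total time in nanoseconds.
pub fn ns_per_element(total_ns: f64, elements: u64) -> f64 {
    total_ns / elements as f64
}

/// (input size, middle estimate in ns) copied from the sample output above.
pub const SAMPLE_RESULTS: [(u64, f64); 4] = [
    (100, 1_200.0),
    (1_000, 15_600.0),
    (10_000, 190_000.0),
    (100_000, 2_200_000.0),
];
```

For these figures the per-element cost rises from 12 ns to 22 ns while the input grows a thousandfold, which is the slow growth you would expect from an O(n log n) sort.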

Comparing Implementations

Benchmark multiple implementations in the same group:

fn bench_search_algorithms(c: &mut Criterion) {
    let mut group = c.benchmark_group("search");
    let haystack: Vec<i32> = (0..10_000).collect();
    let needle = 7_777;

    group.bench_function("linear_search", |b| {
        b.iter(|| linear_search(&haystack, black_box(needle)))
    });

    group.bench_function("binary_search", |b| {
        b.iter(|| binary_search(&haystack, black_box(needle)))
    });

    group.finish();
}
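The two functions being compared are not shown in the article. A minimal sketch of what they might look like follows; the signatures are assumptions inferred from the calls above, and binary_search requires a sorted slice, which the 0..10_000 haystack satisfies.

```rust
use std::cmp::Ordering;

/// Scans from the front and returns the index of the first match.
pub fn linear_search(haystack: &[i32], needle: i32) -> Option<usize> {
    haystack.iter().position(|&x| x == needle)
}

/// Halves the search range each step; `haystack` must be sorted ascending.
pub fn binary_search(haystack: &[i32], needle: i32) -> Option<usize> {
    let (mut lo, mut hi) = (0, haystack.len());
    while lo < hi {
        // Midpoint computed this way cannot overflow usize.
        let mid = lo + (hi - lo) / 2;
        match haystack[mid].cmp(&needle) {
            Ordering::Less => lo = mid + 1,
            Ordering::Greater => hi = mid,
            Ordering::Equal => return Some(mid),
        }
    }
    None
}
```

Because both benchmarks sit in the same group, their results appear together in the group's report, which makes the comparison easy to read.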

Baselines and Regression Detection

Criterion saves results to target/criterion/. On subsequent runs, it compares against the saved baseline:

sort 1000 integers    time:   [42.3 µs 42.7 µs 43.1 µs]
                      change: [-3.2% -2.1% -0.9%] (p = 0.001 < 0.05)
                      Performance has improved.

Save a named baseline before a refactor:

cargo bench -- --save-baseline before_refactor

Compare against it after:

cargo bench -- --baseline before_refactor

HTML Reports

With the html_reports feature enabled, Criterion generates HTML reports in target/criterion/:

open target/criterion/report/index.html

Reports include probability-density and regression plots for each benchmark, plus comparison charts against the previous run. They are useful for presenting benchmark results to teammates.

Throughput Measurement

For I/O or data-processing benchmarks, report throughput alongside latency:

use criterion::Throughput;

fn bench_compress(c: &mut Criterion) {
    let data = vec![0u8; 1_000_000];  // 1 MB

    let mut group = c.benchmark_group("compression");
    group.throughput(Throughput::Bytes(data.len() as u64));

    group.bench_function("compress_1mb", |b| {
        b.iter(|| compress(&data))
    });

    group.finish();
}

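With a throughput set, Criterion reports a rate in addition to the time estimate. For Throughput::Bytes the rate is scaled to binary units such as MiB/s or GiB/s: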

compression/compress_1mb    thrpt: [1.23 GiB/s 1.25 GiB/s 1.27 GiB/s]
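The throughput figure is the byte count divided by the time per iteration. A small sketch of that arithmetic, with a hypothetical helper name, shows how a time estimate maps to GiB/s:

```rust
use std::time::Duration;

/// Number of bytes in one GiB (2^30).
const BYTES_PER_GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Converts "bytes processed in `elapsed`" into GiB per second.
pub fn gib_per_second(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / elapsed.as_secs_f64() / BYTES_PER_GIB
}
```

Under this arithmetic, processing the 1,000,000-byte buffer in about 745 µs corresponds to roughly 1.25 GiB/s.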

CI Integration

Benchmarks in CI catch regressions before merge. A GitHub Actions example with critcmp:

name: Benchmarks
on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install critcmp
        run: cargo install critcmp

      - name: Run benchmarks (main baseline)
        if: github.ref == 'refs/heads/main'
        run: cargo bench -- --save-baseline main

      - name: Run benchmarks (PR comparison)
        if: github.ref != 'refs/heads/main'
        run: |
          cargo bench -- --save-baseline current
          critcmp main current --threshold 5

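critcmp prints a table comparing the two baselines, and --threshold 5 hides differences smaller than 5%. It reports the comparison rather than failing the job, so gating a merge needs an extra step that checks the numbers. The PR job also needs access to the main baseline (for example by restoring target/criterion from a cache or artifact), because each workflow run starts on a fresh runner.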
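The 5% rule itself is simple arithmetic on two estimates. The sketch below is a hypothetical gate, not part of Criterion or critcmp, showing how a percent change is computed and compared against a threshold:

```rust
/// Percent change from baseline to current; positive means slower.
pub fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// True when the current run is slower than the baseline by more than `threshold_pct`.
pub fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    percent_change(baseline_ns, current_ns) > threshold_pct
}
```

Shared CI runners tend to be noisy, so a threshold this tight may flag changes that are not real; a looser threshold or a dedicated machine is a common compromise.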

What to Benchmark

Not everything needs a benchmark. Good candidates:

Hot paths: code called millions of times per second
Algorithmic alternatives: choosing between two implementations
Before/after refactors: confirming that an optimization helped and that nothing regressed
Data structure selection: HashMap vs BTreeMap for your access pattern
Serialization: JSON vs MessagePack vs Bincode for your payload sizes

Avoid benchmarking code that isn't on a performance-critical path. The cost of maintaining benchmarks should be justified by the value of the measurement.

Production Monitoring

Benchmarks measure performance in a controlled environment. Production behavior — under real load, real data sizes, and real concurrency — is different. HelpMeTest monitors your live endpoints for latency regressions 24/7, alerting before users notice.

Summary

Criterion works on stable Rust; #[bench] requires nightly
b.iter(|| ...) is the hot loop; setup goes outside
Use black_box to prevent dead-code optimization
BenchmarkGroup compares multiple inputs or implementations
Save baselines before refactors, compare after
Throughput measurement for I/O-intensive code
Integrate with CI using critcmp for regression detection
