Rust Benchmarking with Criterion: Performance Testing Guide
Rust's built-in #[bench] attribute requires nightly and lacks statistical rigor. Criterion.rs is the de facto standard for benchmarking on stable Rust: it handles warm-up, measures statistical variance, compares runs against baselines, and produces HTML reports.
Installation
Add Criterion as a dev-dependency in Cargo.toml and declare the benchmark target:

[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }

[[bench]]
name = "my_benchmark"
harness = false

The harness = false line disables the default test harness so Criterion can take over.
Create benches/my_benchmark.rs:
my_crate/
├── benches/
│   └── my_benchmark.rs
├── src/
│   └── lib.rs
└── Cargo.toml

A Minimal Benchmark
// benches/my_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
use my_crate::sort_numbers;

fn bench_sort(c: &mut Criterion) {
    let data: Vec<i32> = (0..1000).rev().collect();
    c.bench_function("sort 1000 integers", |b| {
        // Note: the clone runs inside the timed closure, so its cost is measured too.
        b.iter(|| sort_numbers(data.clone()))
    });
}

criterion_group!(benches, bench_sort);
criterion_main!(benches);

Run with:
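cargo bench

Criterion warms up, runs the benchmark function many times, measures the time per iteration, computes statistics such as the mean and standard deviation, and prints a summary:

sort 1000 integers time: [42.3 µs 42.7 µs 43.1 µs]

The middle number is Criterion's best estimate of the time per iteration. The first and third numbers are the lower and upper bounds of the confidence interval around that estimate (95% by default).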
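To make the "lower bound, estimate, upper bound" triple concrete, here is a minimal sketch that summarizes a set of timing samples. It is a simplification: Criterion derives its intervals with bootstrap resampling, whereas this sketch assumes a plain normal approximation (mean ± 1.96 standard errors). The names Summary and summarize are hypothetical and not part of Criterion.

```rust
/// Point estimate plus interval bounds for a set of timing samples.
#[derive(Debug, Clone, Copy)]
pub struct Summary {
    pub lower: f64,
    pub mean: f64,
    pub upper: f64,
}

/// Computes the mean and an approximate 95% confidence interval for it,
/// using the normal approximation: mean ± 1.96 * (std_dev / sqrt(n)).
/// Returns None when fewer than two samples are given.
/// Assumption: this is NOT Criterion's method (it uses bootstrap resampling).
pub fn summarize(samples: &[f64]) -> Option<Summary> {
    let n = samples.len();
    if n < 2 {
        return None;
    }
    let n_f = n as f64;
    let mean = samples.iter().sum::<f64>() / n_f;
    // Sample variance: divide by n - 1 rather than n.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n_f - 1.0);
    let std_err = variance.sqrt() / n_f.sqrt();
    let half_width = 1.96 * std_err;
    Some(Summary {
        lower: mean - half_width,
        mean,
        upper: mean + half_width,
    })
}
```

A narrow interval means the samples agree closely; a wide one usually points to noise on the machine running the benchmark.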
black_box: Preventing Compiler Optimizations
The compiler may optimize away computations whose results aren't used. Use criterion::black_box to prevent this:
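use criterion::black_box;

c.bench_function("fibonacci", |b| {
    b.iter(|| fibonacci(black_box(30)))
});

black_box is an identity function that is opaque to the optimizer. The compiler cannot assume the argument is the constant 30, so it cannot precompute the result at compile time, and it must treat the value as used, which suppresses dead-code elimination.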
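The fibonacci function itself is not shown in the snippet. A minimal sketch of what it could look like follows, using std::hint::black_box from the standard library (stable since Rust 1.66), which serves the same purpose as criterion::black_box. Both fibonacci and fibonacci_opaque are hypothetical stand-ins written for illustration.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for the `fibonacci` function being benchmarked.
/// Deliberately naive recursion so there is real work to measure.
pub fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

/// Calls `fibonacci` with an input the optimizer cannot treat as a known
/// constant, and marks the result as used so the call is not discarded.
pub fn fibonacci_opaque(n: u64) -> u64 {
    black_box(fibonacci(black_box(n)))
}
```

Wrapping the input matters most when it is a literal, since a literal is exactly what the optimizer would otherwise fold away.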
Setup Outside the Hot Loop
Move expensive setup outside b.iter, either by pre-computing values before the benchmark closure or by using b.iter_batched:
fn bench_search(c: &mut Criterion) {
    // Setup runs once, not on every iteration
    let haystack: Vec<String> = (0..10_000)
        .map(|i| format!("item_{}", i))
        .collect();

    c.bench_function("search in 10k strings", |b| {
        b.iter(|| {
            haystack.iter().find(|s| s.contains("5000"))
        });
    });
}

For setup that must run fresh each iteration:
c.bench_function("sort with clone", |b| {
    b.iter_batched(
        || (0..1000i32).rev().collect::<Vec<_>>(), // setup
        |data| sort_numbers(data),                 // benchmark
        criterion::BatchSize::SmallInput,
    )
});

iter_batched creates fresh input for each iteration without including setup time in the measurement.
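The idea behind batched measurement can be illustrated without Criterion: prepare all inputs first, start the clock, run only the routine, and stop the clock before cleaning up. The sketch below is a simplified illustration of that idea, not Criterion's actual implementation. time_batch is a hypothetical helper, and sort_numbers is an assumed stand-in for the my_crate function used throughout.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `my_crate::sort_numbers`.
pub fn sort_numbers(mut data: Vec<i32>) -> Vec<i32> {
    data.sort_unstable();
    data
}

/// Builds `batch_size` fresh inputs with `setup`, then times only the calls
/// to `routine`. Returns the elapsed time for the whole batch.
pub fn time_batch<I, O>(
    batch_size: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    // Untimed: prepare every input up front.
    let inputs: Vec<I> = (0..batch_size).map(|_| setup()).collect();
    // Pre-allocate so storing outputs does not reallocate while timed.
    let mut outputs: Vec<O> = Vec::with_capacity(batch_size);

    let start = Instant::now();
    for input in inputs {
        outputs.push(routine(input));
    }
    let elapsed = start.elapsed();

    // Untimed: the outputs are dropped only after the clock has stopped.
    drop(outputs);
    elapsed
}
```

Dividing the returned duration by batch_size gives a rough per-call time. A real harness repeats this many times and analyzes the samples statistically.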
Parameterized Benchmarks with BenchmarkGroup
Compare the same function across different input sizes:
use criterion::{BatchSize, BenchmarkId, Criterion};

fn bench_sort_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("sort by size");
    for size in [100, 1_000, 10_000, 100_000].iter() {
        group.bench_with_input(
            BenchmarkId::new("reverse_sort", size),
            size,
            |b, &size| {
                // Build a fresh reversed vector per iteration; setup is not timed.
                b.iter_batched(
                    || (0..size).rev().collect::<Vec<i32>>(),
                    |d| sort_numbers(d),
                    BatchSize::SmallInput,
                )
            },
        );
    }
    group.finish();
}

Output:
sort by size/reverse_sort/100 time: [1.2 µs 1.2 µs 1.3 µs]
sort by size/reverse_sort/1000 time: [15.4 µs 15.6 µs 15.8 µs]
sort by size/reverse_sort/10000 time: [188 µs 190 µs 192 µs]
sort by size/reverse_sort/100000 time: [2.2 ms 2.2 ms 2.3 ms]

Comparing Implementations
Benchmark multiple implementations in the same group:
fn bench_search_algorithms(c: &mut Criterion) {
    let mut group = c.benchmark_group("search");
    let haystack: Vec<i32> = (0..10_000).collect();
    let needle = 7_777;

    group.bench_function("linear_search", |b| {
        b.iter(|| linear_search(&haystack, black_box(needle)))
    });
    group.bench_function("binary_search", |b| {
        b.iter(|| binary_search(&haystack, black_box(needle)))
    });
    group.finish();
}

Baselines and Regression Detection
Criterion saves results to target/criterion/. On subsequent runs, it compares against the saved baseline:
sort 1000 integers time: [42.3 µs 42.7 µs 43.1 µs]
change: [-3.2% -2.1% -0.9%] (p = 0.001 < 0.05)
Performance has improved.

Save a named baseline before a refactor:

cargo bench -- --save-baseline before_refactor

Compare against it after:

cargo bench -- --baseline before_refactor

HTML Reports
With the html_reports feature enabled, Criterion generates HTML reports in target/criterion/:
open target/criterion/report/index.html

Reports include time-series plots, distribution histograms, and comparison charts. Useful for presenting benchmark results to teammates.
Throughput Measurement
For I/O or data-processing benchmarks, have Criterion report throughput in addition to latency:

use criterion::Throughput;

fn bench_compress(c: &mut Criterion) {
    let data = vec![0u8; 1_000_000]; // 1 MB
    let mut group = c.benchmark_group("compression");
    // Tell Criterion how many bytes one iteration processes.
    group.throughput(Throughput::Bytes(data.len() as u64));
    group.bench_function("compress_1mb", |b| {
        b.iter(|| compress(&data))
    });
    group.finish();
}

With a throughput set, the output includes a thrpt line giving bytes per second (in binary units such as MiB/s or GiB/s) alongside the time estimate:
compression/compress_1mb thrpt: [1.23 GiB/s 1.25 GiB/s 1.27 GiB/s]

CI Integration
Benchmarks in CI catch regressions before merge. A GitHub Actions example with critcmp:
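name: Benchmarks
on: [push, pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - name: Install critcmp
        run: cargo install critcmp
      - name: Run benchmarks (main baseline)
        if: github.ref == 'refs/heads/main'
        run: cargo bench -- --save-baseline main
      - name: Run benchmarks (PR comparison)
        if: github.ref != 'refs/heads/main'
        run: |
          cargo bench -- --save-baseline current
          critcmp main current --threshold 5

Here critcmp prints a comparison of the two baselines, and --threshold 5 limits the table to benchmarks that differ by at least 5%. The flag only filters the output and does not, on its own, fail the job, so enforcing a limit needs an extra step that inspects the result. The main baseline must also be present on the runner doing the comparison, for example by restoring a cached target/criterion directory, since each job starts from a clean checkout.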
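Turning a 5% threshold into a pass/fail decision comes down to simple arithmetic on the two sets of timings. The following is a minimal sketch, assuming the per-benchmark times have already been extracted from the saved results. The function names and the tuple layout are hypothetical and not part of Criterion or critcmp.

```rust
/// Relative change of `current_ns` versus `baseline_ns`, in percent.
/// Positive values mean the benchmark got slower.
pub fn change_percent(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns * 100.0
}

/// Returns the names of benchmarks whose time grew by more than
/// `threshold_pct`. Each entry is (name, baseline ns, current ns).
pub fn regressions<'a>(
    results: &[(&'a str, f64, f64)],
    threshold_pct: f64,
) -> Vec<&'a str> {
    results
        .iter()
        .filter(|(_, base, cur)| change_percent(*base, *cur) > threshold_pct)
        .map(|(name, _, _)| *name)
        .collect()
}
```

A CI step built on this would exit with a non-zero status whenever the returned list is not empty. Shared CI runners are noisy, so a threshold that is too tight will produce false alarms.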
What to Benchmark
Not everything needs a benchmark. Good candidates:
- Hot paths: code called millions of times per second
- Algorithmic alternatives: choosing between two implementations
- Before/after refactors: verifying that an optimization helped and nothing regressed
- Data structure selection: HashMap vs BTreeMap for your access pattern
- Serialization: JSON vs MessagePack vs Bincode for your payload sizes
Avoid benchmarking code that isn't on a performance-critical path. The cost of maintaining benchmarks should be justified by the value of the measurement.
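As an illustration of the data structure case, the two candidates can be wrapped in functions with identical inputs and outputs, so that a benchmark group compares like with like. This is a sketch under assumptions: the lookup-and-sum workload and the names sum_hash and sum_btree are hypothetical, chosen only to give both maps the same job.

```rust
use std::collections::{BTreeMap, HashMap};

/// Sums the values found for `keys` in a HashMap. Missing keys are skipped.
pub fn sum_hash(map: &HashMap<u32, u64>, keys: &[u32]) -> u64 {
    keys.iter().filter_map(|k| map.get(k)).sum()
}

/// Same workload against a BTreeMap, so the two can be compared directly.
pub fn sum_btree(map: &BTreeMap<u32, u64>, keys: &[u32]) -> u64 {
    keys.iter().filter_map(|k| map.get(k)).sum()
}
```

Each function can then be registered with group.bench_function in a shared group, with the maps built once outside b.iter. Checking that both return the same result before timing them guards against benchmarking two different computations.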
Production Monitoring
Benchmarks measure performance in a controlled environment. Production behavior — under real load, real data sizes, and real concurrency — is different. HelpMeTest monitors your live endpoints for latency regressions 24/7, alerting before users notice.
Summary
- Criterion works on stable Rust; #[bench] requires nightly
- b.iter(|| ...) is the hot loop; setup goes outside
- Use black_box to prevent dead-code optimization
- BenchmarkGroup compares multiple inputs or implementations
- Save baselines before refactors, compare after
- Throughput measurement for I/O-intensive code
- Integrate with CI using critcmp for regression detection