WebAssembly Performance Benchmarking: criterion.rs, Native Comparison, and Memory Profiling

WebAssembly is fast, often within 10–20% of native performance, but "fast" is relative. Whether your Wasm module is fast enough depends on the baseline, the workload, and the runtime. Benchmarking Wasm therefore means measuring Wasm execution, comparing it against a native baseline, and tracking both over time to catch regressions.

This guide covers benchmarking WebAssembly with criterion.rs, comparing Wasm vs native performance, profiling memory usage in the Wasm sandbox, and integrating performance gates into CI.

Why Wasm Benchmarking Is Different

When benchmarking native Rust, you measure the code running on your hardware. When benchmarking Wasm, you also measure:

  1. Compilation overhead: the runtime compiles the module to machine code before it runs (Wasmtime when the Module is created, V8 through its tiered JIT)
  2. Sandbox overhead: bounds checks, memory isolation, and indirect calls
  3. JS/Wasm boundary crossing: for browser Wasm, each call across the JS boundary adds overhead
  4. Memory allocation: Wasm uses a linear memory model, which behaves differently from the native heap

Benchmark at each level to understand where performance comes from.
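One way to keep those costs apart is to time each phase separately and only combine them afterwards. The sketch below uses only the standard library; `timed` and `PhaseTimings` are hypothetical helpers (not part of criterion.rs or Wasmtime), and the split into compile, instantiate, and call phases is an assumption about how you structure the harness.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Hypothetical breakdown of one measurement run (field names are illustrative).
#[derive(Debug, Clone, Copy)]
struct PhaseTimings {
    compile: Duration,     // one-time cost of compiling the module
    instantiate: Duration, // cost of creating one instance
    call_total: Duration,  // time spent in all calls combined
    calls: u32,            // number of calls behind `call_total`
}

impl PhaseTimings {
    /// Steady-state cost of a single call, ignoring setup.
    fn per_call(&self) -> Duration {
        if self.calls == 0 {
            Duration::ZERO
        } else {
            self.call_total / self.calls
        }
    }

    /// Cost per call when compilation and instantiation are spread over the calls.
    fn amortized_per_call(&self) -> Duration {
        if self.calls == 0 {
            return Duration::ZERO;
        }
        (self.compile + self.instantiate + self.call_total) / self.calls
    }
}
```

With 50 ms of compilation, 1 ms of instantiation, and 49 ms spent across 100 calls, `per_call` reports 490 µs while `amortized_per_call` reports 1 ms. The second figure is closer to what a short-lived process sees; the first is what a long-running host sees.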

Setting Up criterion.rs

# Cargo.toml
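[dev-dependencies]
criterion = { version = "0.5", features = ["html_reports"] }
# The Wasm benchmarks and tests below also need the Wasmtime host crate.
# The version here is an assumption; pin the release your project targets.
wasmtime = "25"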

[[bench]]
name = "wasm_benchmarks"
harness = false

[[bench]]
name = "native_benchmarks"
harness = false

Native Benchmarks

Establish native baselines first:

// benches/native_benchmarks.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};

// The pure logic we'll also compile to Wasm
fn fib_native(n: u64) -> u64 {
    if n <= 1 { return n; }
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 2..=n {
        let c = a + b;
        a = b;
        b = c;
    }
    b
}

fn parse_csv_native(input: &str) -> Vec<Vec<f64>> {
    input.lines()
        .map(|line| {
            line.split(',')
                .filter_map(|s| s.trim().parse().ok())
                .collect()
        })
        .collect()
}

fn sum_matrix_native(matrix: &[Vec<f64>]) -> f64 {
    matrix.iter().flat_map(|row| row.iter()).sum()
}

fn benchmark_fibonacci(c: &mut Criterion) {
    let mut group = c.benchmark_group("fibonacci");
    
    for n in [10u64, 30, 50].iter() {
        group.bench_with_input(
            BenchmarkId::new("native", n),
            n,
            |b, &n| b.iter(|| fib_native(black_box(n)))
        );
    }
    
    group.finish();
}

fn benchmark_csv_parsing(c: &mut Criterion) {
    let csv_data = (0..1000)
        .map(|i| format!("{},{},{}", i, i as f64 * 1.5, i as f64 * 2.7))
        .collect::<Vec<_>>()
        .join("\n");
    
    c.bench_function("csv_parse_1000_rows_native", |b| {
        b.iter(|| parse_csv_native(black_box(&csv_data)))
    });
}

criterion_group!(benches, benchmark_fibonacci, benchmark_csv_parsing);
criterion_main!(benches);

Run native benchmarks:

cargo bench --bench native_benchmarks
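Before trusting any timing, it helps to confirm that the function returns the right values for the inputs being benchmarked, and to know where `u64` stops being enough. The sketch below is a checked variant of the same loop; `fib_checked` is a hypothetical helper, not part of the benchmark file above. The 93rd Fibonacci number is the largest that fits in a `u64`, so the benchmarked inputs (10, 30, 50) are well inside the safe range.

```rust
/// Same iterative Fibonacci as `fib_native`, but reports overflow as `None`
/// instead of panicking (debug builds) or wrapping (release builds).
fn fib_checked(n: u64) -> Option<u64> {
    if n <= 1 {
        return Some(n);
    }
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 2..=n {
        // `?` returns None as soon as the next term no longer fits in a u64
        let c = a.checked_add(b)?;
        a = b;
        b = c;
    }
    Some(b)
}
```

The same expected values can later be asserted against the Wasm export, so that a native-versus-Wasm comparison never times two functions that disagree.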

Wasm Benchmarks with Wasmtime

Benchmark Wasm execution using Wasmtime as the host:

// benches/wasm_benchmarks.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
use wasmtime::{Engine, Module, Store, Instance};
use std::time::Duration;

struct WasmBenchCtx {
    engine: Engine,
    module: Module,
}

impl WasmBenchCtx {
    fn new(wasm_path: &str) -> Self {
        let engine = Engine::default();
        let module = Module::from_file(&engine, wasm_path).expect("Failed to load Wasm module");
        Self { engine, module }
    }
    
    fn new_instance(&self) -> (Store<()>, Instance) {
        let mut store = Store::new(&self.engine, ());
        let instance = Instance::new(&mut store, &self.module, &[]).expect("Failed to instantiate");
        (store, instance)
    }
}

fn benchmark_wasm_fibonacci(c: &mut Criterion) {
    let ctx = WasmBenchCtx::new("target/wasm32-wasi/release/my_module.wasm");
    
    let mut group = c.benchmark_group("fibonacci");
    
    // Wasmtime compiles the module in Module::from_file, so compilation is
    // already outside the timed loop. This extra call confirms the export
    // exists and exercises the call path once before measuring.
    let (mut warm_store, warm_instance) = ctx.new_instance();
    let warm_fib = warm_instance
        .get_typed_func::<u64, u64>(&mut warm_store, "fib")
        .expect("fib export not found");
    warm_fib.call(&mut warm_store, 10).unwrap();
    
    for n in [10u64, 30, 50].iter() {
        group.bench_with_input(
            BenchmarkId::new("wasm_wasmtime", n),
            n,
            |b, &n| {
                let (mut store, instance) = ctx.new_instance();
                let fib = instance
                    .get_typed_func::<u64, u64>(&mut store, "fib")
                    .unwrap();
                
                b.iter(|| fib.call(&mut store, black_box(n)).unwrap())
            }
        );
    }
    
    group.finish();
}

fn benchmark_wasm_vs_native_csv(c: &mut Criterion) {
    let csv_data = (0..1000)
        .map(|i| format!("{},{},{}", i, i as f64 * 1.5, i as f64 * 2.7))
        .collect::<Vec<_>>()
        .join("\n");
    
    let ctx = WasmBenchCtx::new("target/wasm32-wasi/release/my_module.wasm");
    
    let mut group = c.benchmark_group("csv_parsing_1000_rows");
    group.measurement_time(Duration::from_secs(5));
    
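    // Native: do the same parsing work as the Wasm export, so the two
    // measurements are comparable
    group.bench_function("native", |b| {
        b.iter(|| {
            let data = black_box(&csv_data);
            data.lines()
                .map(|line| {
                    line.split(',')
                        .filter_map(|s| s.trim().parse::<f64>().ok())
                        .collect::<Vec<f64>>()
                })
                .collect::<Vec<_>>()
        })
    });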
    
    // Wasm via Wasmtime
    group.bench_function("wasm_wasmtime", |b| {
        let (mut store, instance) = ctx.new_instance();
        let memory = instance.get_memory(&mut store, "memory").unwrap();
        let parse_csv = instance
            .get_typed_func::<(i32, i32), i32>(&mut store, "parse_csv")
            .unwrap();
        
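        b.iter(|| {
            let input = black_box(csv_data.as_bytes());
            // Offset 0 is a placeholder. A Rust guest treats address 0 as a null
            // pointer and low memory may hold live data, so a real harness should
            // write into a buffer obtained from the guest instead.
            // Note that this copy is part of the timed region.
            memory.write(&mut store, 0, input).unwrap();
            parse_csv.call(&mut store, (0, input.len() as i32)).unwrap()
        })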
    });
    
    group.finish();
}

criterion_group!(benches, benchmark_wasm_fibonacci, benchmark_wasm_vs_native_csv);
criterion_main!(benches);
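The host code looks up exports named `fib` and `parse_csv`, but the guest module itself is not shown. A minimal sketch of what the guest side might look like follows, assuming a `cdylib` crate built for the Wasm target. The return convention of `parse_csv` (row count, or -1 on bad input) is an assumption made for illustration, not something the host code defines.

```rust
// Guest-side sketch (hypothetical). Build as a `cdylib` for the Wasm target.
// On Rust edition 2024 the attribute is written `#[unsafe(no_mangle)]`.

/// Exported as `fib`; matches the host's `get_typed_func::<u64, u64>`.
#[no_mangle]
pub extern "C" fn fib(n: u64) -> u64 {
    if n <= 1 {
        return n;
    }
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 2..=n {
        // wrapping_add: overflow wraps silently instead of trapping the guest
        let c = a.wrapping_add(b);
        a = b;
        b = c;
    }
    b
}

/// Exported as `parse_csv`. On wasm32 a pointer and a `usize` are both `i32`,
/// which lines up with the host's `(i32, i32) -> i32` signature.
/// Returns the number of rows containing at least one number, or -1 when the
/// pointer is null or the bytes are not valid UTF-8.
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
#[no_mangle]
pub unsafe extern "C" fn parse_csv(ptr: *const u8, len: usize) -> i32 {
    if ptr.is_null() {
        return -1;
    }
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    let text = match std::str::from_utf8(bytes) {
        Ok(t) => t,
        Err(_) => return -1,
    };
    let rows = text
        .lines()
        .filter(|line| line.split(',').any(|f| f.trim().parse::<f64>().is_ok()))
        .count();
    rows as i32
}
```

This sketch rejects a null pointer, and address 0 in linear memory is the null pointer for a Rust guest, so a host calling it would need to pass a non-zero offset.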

Memory Usage Profiling

Track Wasm memory growth under load:

// tests/memory_profile_tests.rs
use wasmtime::{Engine, Module, Store, Instance};

#[test]
fn test_wasm_memory_does_not_grow_unboundedly() {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "target/wasm32-wasi/release/my_module.wasm")
        .expect("Module load failed");
    
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    
    let memory = instance.get_memory(&mut store, "memory").unwrap();
    let process = instance
        .get_typed_func::<(i32, i32), i32>(&mut store, "process_batch")
        .unwrap();
    
    let initial_pages = memory.size(&store);
    println!("Initial memory: {} pages ({} KB)", initial_pages, initial_pages * 64);
    
    // Run 100 iterations of the workload
    for i in 0..100 {
        let data = format!("batch_{}", i);
        let input = data.as_bytes();
        memory.write(&mut store, 0, input).unwrap();
        process.call(&mut store, (0, input.len() as i32)).unwrap();
    }
    
    let final_pages = memory.size(&store);
    println!("Final memory: {} pages ({} KB)", final_pages, final_pages * 64);
    
    let growth_pages = final_pages - initial_pages;
    assert!(
        growth_pages <= 10, // Allow up to 640KB growth
        "Memory grew by {} pages ({} KB) over 100 iterations. Possible memory leak.",
        growth_pages,
        growth_pages * 64
    );
}
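The page arithmetic in that test is easy to get wrong by a factor of 1024, so it can be worth isolating. The sketch below relies on the fixed WebAssembly page size of 64 KiB; the helper names (`pages_to_bytes`, `bytes_to_pages_ceil`, `check_growth`) are hypothetical and not part of the Wasmtime API.

```rust
/// A WebAssembly linear-memory page is 64 KiB.
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Converts a page count (as returned by a memory size query) to bytes.
fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}

/// Number of whole pages needed to hold `bytes`, rounding up.
fn bytes_to_pages_ceil(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES
}

/// Returns the growth in pages, or an error message if it exceeds the budget.
/// Linear memory never shrinks, so `saturating_sub` only guards against
/// arguments passed in the wrong order.
fn check_growth(initial_pages: u64, final_pages: u64, max_growth_pages: u64) -> Result<u64, String> {
    let growth = final_pages.saturating_sub(initial_pages);
    if growth <= max_growth_pages {
        Ok(growth)
    } else {
        Err(format!(
            "memory grew by {} pages ({} KiB), budget is {} pages",
            growth,
            pages_to_bytes(growth) / 1024,
            max_growth_pages
        ))
    }
}
```

A budget of 10 pages corresponds to 655,360 bytes (640 KiB), and a 16 MiB limit corresponds to 256 pages.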

#[test]
fn test_wasm_memory_limit_respected() {
    use wasmtime::StoreLimitsBuilder;
    
    let engine = Engine::default();
    let module = Module::from_file(&engine, "target/wasm32-wasi/release/my_module.wasm")
        .unwrap();
    
    // Enforce a 16MB memory limit
    let limits = StoreLimitsBuilder::new()
        .memory_size(16 * 1024 * 1024) // 16 MB
        .build();
    
    let mut store = Store::new(&engine, limits);
    store.limiter(|state| state);
    
    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    
    let allocate_large = instance
        .get_typed_func::<i32, i32>(&mut store, "allocate_bytes")
        .unwrap();
    
    // Try to allocate 20MB — should fail or be rejected
    let result = allocate_large.call(&mut store, 20 * 1024 * 1024);
    
    // With memory limiter, this should fail gracefully
    // (either returns -1 error code or traps)
    match result {
        Ok(result_code) => {
            assert!(result_code < 0, "Expected failure code for over-limit allocation");
        }
        Err(_trap) => {
            // Trap is acceptable — the limiter rejected the allocation
        }
    }
}
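Whether the over-limit call returns an error code or traps depends on how the guest allocates. The sketch below shows one way a guest export could report failure instead of trapping, assuming a Rust guest; `allocate_bytes` here is a hypothetical implementation, and the 0 / -1 return convention is an assumption chosen to match the `result_code < 0` check in the test.

```rust
// Guest-side sketch (hypothetical). On Rust edition 2024 the attribute is
// written `#[unsafe(no_mangle)]`.

/// Tries to allocate `size` bytes. Returns 0 on success and -1 on failure.
#[no_mangle]
pub extern "C" fn allocate_bytes(size: i32) -> i32 {
    if size < 0 {
        return -1;
    }
    let mut buf: Vec<u8> = Vec::new();
    // try_reserve_exact reports allocation failure as an Err. A plain
    // Vec::with_capacity would abort instead, which surfaces as a trap
    // on the host side.
    match buf.try_reserve_exact(size as usize) {
        Ok(()) => {
            // Touch the memory so the allocation is really used; the buffer
            // is freed when it goes out of scope.
            buf.resize(size as usize, 0);
            0
        }
        Err(_) => -1,
    }
}
```

With a fallible allocation like this, a rejected memory growth shows up as the `Ok(result_code)` branch of the test; with an infallible one, it shows up as the `Err(_trap)` branch.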

Performance Regression Gates in CI

Set performance budgets and fail CI when they're exceeded:

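// tests/performance_gate_tests.rs
use wasmtime::{Engine, Instance, Module, Store};

// Same iterative Fibonacci as in benches/native_benchmarks.rs, repeated here
// because integration tests cannot import items from bench targets
fn fib_native(n: u64) -> u64 {
    if n <= 1 { return n; }
    let mut a = 0u64;
    let mut b = 1u64;
    for _ in 2..=n {
        let c = a + b;
        a = b;
        b = c;
    }
    b
}
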
const MAX_WASM_OVERHEAD_PERCENT: f64 = 30.0; // Wasm must be within 30% of native

#[test]
fn test_fibonacci_wasm_overhead_within_budget() {
    use std::time::Instant;
    
    let n = 40u64;
    let iterations = 1000;
    
    // Native timing
    let native_start = Instant::now();
    for _ in 0..iterations {
        // black_box the input too, so the call cannot be constant-folded or hoisted
        let _ = std::hint::black_box(fib_native(std::hint::black_box(n)));
    }
    let native_duration = native_start.elapsed();
    
    // Wasm timing (using cached instance — not including compilation)
    let engine = Engine::default();
    let module = Module::from_file(&engine, "target/wasm32-wasi/release/my_module.wasm").unwrap();
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[]).unwrap();
    let fib_wasm = instance.get_typed_func::<u64, u64>(&mut store, "fib").unwrap();
    
    // Warm-up
    fib_wasm.call(&mut store, 10).unwrap();
    
    let wasm_start = Instant::now();
    for _ in 0..iterations {
        let _ = std::hint::black_box(fib_wasm.call(&mut store, std::hint::black_box(n)).unwrap());
    }
    let wasm_duration = wasm_start.elapsed();
    
    let overhead_percent = (wasm_duration.as_nanos() as f64 / native_duration.as_nanos() as f64 - 1.0) * 100.0;
    
    println!(
        "Fibonacci n={}: native={:.2}μs, wasm={:.2}μs, overhead={:.1}%",
        n,
        // as_secs_f64 keeps sub-microsecond precision; as_micros truncates to whole μs
        native_duration.as_secs_f64() * 1e6 / iterations as f64,
        wasm_duration.as_secs_f64() * 1e6 / iterations as f64,
        overhead_percent
    );
    
    assert!(
        overhead_percent <= MAX_WASM_OVERHEAD_PERCENT,
        "Wasm overhead {:.1}% exceeds budget of {}%",
        overhead_percent,
        MAX_WASM_OVERHEAD_PERCENT
    );
}
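A single timed loop on a shared CI runner is noisy, so a gate like this can fail for reasons unrelated to the code. One way to make it steadier is to collect several samples per side and compare medians. The sketch below uses only the standard library; `median`, `overhead_percent`, and `within_budget` are hypothetical helpers, and the choice of the median as the summary statistic is an assumption rather than something criterion.rs prescribes.

```rust
use std::time::Duration;

/// Median of the samples (the upper middle element for even lengths).
fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    Some(sorted[sorted.len() / 2])
}

/// Wasm overhead relative to native, in percent. Returns None when the native
/// time is zero, because the ratio would be meaningless.
fn overhead_percent(native: Duration, wasm: Duration) -> Option<f64> {
    if native.is_zero() {
        return None;
    }
    Some((wasm.as_secs_f64() / native.as_secs_f64() - 1.0) * 100.0)
}

/// Compares the medians of both sample sets against a percentage budget.
fn within_budget(
    native_samples: &[Duration],
    wasm_samples: &[Duration],
    max_overhead_percent: f64,
) -> Option<bool> {
    let native = median(native_samples)?;
    let wasm = median(wasm_samples)?;
    Some(overhead_percent(native, wasm)? <= max_overhead_percent)
}
```

For very short functions the fixed cost of entering the Wasm instance can dominate the measurement, so the budget may need to be set per workload rather than globally.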

CI Pipeline

# .github/workflows/wasm-benchmarks.yml
name: Wasm Performance Benchmarks

on:
  pull_request:
  schedule:
    - cron: "0 8 * * 1"  # Weekly Monday morning

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
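      - uses: dtolnay/rust-toolchain@stable
        with:
          # Recent stable toolchains name this target wasm32-wasip1; if you
          # switch, update the build step and the target/ paths to match.
          targets: wasm32-wasi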
      
      - name: Build Wasm modules
        run: cargo build --target wasm32-wasi --release
      
      - name: Run performance gate tests
        run: cargo test --test performance_gate_tests -- --nocapture
      
      - name: Run benchmarks (save results)
        run: cargo bench --bench native_benchmarks --bench wasm_benchmarks -- --output-format bencher 2>&1 | tee bench-results.txt
      
      - name: Upload benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: bench-results
          path: bench-results.txt

Monitoring Wasm Performance in Production

Production performance can diverge from benchmark results due to real data distributions. HelpMeTest lets you run scheduled tests that measure your Wasm module's performance on production-representative workloads and alert when latency thresholds are exceeded — giving you early warning of performance regressions before they impact users.

Conclusion

Wasm benchmarking requires measuring at three levels: native baseline (what your algorithm costs without a sandbox), Wasm execution (what the sandbox and JIT add), and memory growth (to catch leaks in the linear memory model). Use criterion.rs for statistically rigorous benchmarks, StoreLimitsBuilder to enforce memory budgets in tests, and performance gate tests in CI to catch regressions automatically. The goal is not to make Wasm as fast as native — it's to ensure Wasm performance is predictable, within budget, and doesn't regress across releases.
