Go Benchmark Testing: go test -bench, benchstat, and Profiling

Go's testing package includes a built-in benchmark system. Benchmarks measure how long code takes to run and how much memory it allocates. They're the right tool for validating optimization work and catching performance regressions.

Writing Benchmarks

Benchmark functions live in _test.go files, are named BenchmarkXxx, and take a *testing.B parameter:

func BenchmarkAdd(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Add(2, 3)
    }
}

b.N is the number of iterations. The framework first calls the function with b.N = 1, then calls it again with progressively larger values until a run lasts at least the benchmark time (default: 1 second, changed with -benchtime). You never set b.N yourself; just loop over it.
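To watch b.N grow, the sketch below runs a benchmark outside go test with testing.Benchmark and records every b.N the framework tries. Add is a hypothetical stand-in for the function under test, and the exact sequence of values depends on the machine.

```go
package main

import (
	"fmt"
	"testing"
)

// Add is a hypothetical function under test.
func Add(a, b int) int { return a + b }

// sink receives each result so the call is not dead code.
var sink int

// observedN records every b.N the framework passes in.
var observedN []int

func BenchmarkAdd(b *testing.B) {
	observedN = append(observedN, b.N)
	for i := 0; i < b.N; i++ {
		sink = Add(i, 3)
	}
}

func main() {
	// testing.Benchmark runs a benchmark without "go test"; it keeps
	// raising b.N until a run lasts at least the default benchtime (1s).
	result := testing.Benchmark(BenchmarkAdd)
	fmt.Println("b.N values tried:", observedN)
	fmt.Printf("final: %d iterations, %.4f ns/op\n",
		result.N, float64(result.T.Nanoseconds())/float64(result.N))
}
```

The printed sequence starts at 1 and then jumps by large factors; the final value is the iteration count reported in the results.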

Running Benchmarks

# Run all benchmarks in the current package and its subpackages
go test -bench=. ./...

# Run one benchmark (-bench is a regular expression, so anchor it)
go test -bench='^BenchmarkAdd$' ./...

# Run benchmarks matching a pattern
go test -bench='BenchmarkJSON.*' ./...

# Run each benchmark for a specific duration (instead of the default 1s)
go test -bench=. -benchtime=10s ./...

# Run a specific number of iterations
go test -bench=. -benchtime=1000x ./...

# Skip unit tests and run only benchmarks (-run='^$' matches no test names)
go test -run='^$' -bench=. ./...

Sample output:

BenchmarkAdd-8          1000000000               0.2781 ns/op
BenchmarkStringConcat-8  5000000               350 ns/op
BenchmarkJSONMarshal-8    500000              2500 ns/op

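The columns: the benchmark name with a suffix (-8) giving the GOMAXPROCS value it ran with (by default, the number of logical CPUs), the number of iterations run (the final b.N), and the average time per operation in nanoseconds.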

Memory Allocation Stats

Track allocations with -benchmem:

go test -bench=. -benchmem ./...

Output:

BenchmarkJSONMarshal-8    500000    2500 ns/op    512 B/op    8 allocs/op

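The new columns are bytes allocated per operation and allocations per operation. Reducing allocations often matters more than shaving CPU time, because every heap allocation adds garbage-collector work, and that pressure reduces throughput across the whole program, not just in the benchmarked code.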
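To see both allocation columns respond to a code change, the sketch below compares a slice built by repeated append with one preallocated by make. buildGrowing and buildPrealloc are illustrative names. b.ReportAllocs turns on allocation reporting for a single benchmark, as -benchmem does for all of them, and testing.AllocsPerRun gives the same per-call count outside a benchmark.

```go
package main

import (
	"fmt"
	"testing"
)

// sinkSlice makes results escape to the heap, as they would for real callers.
var sinkSlice []int

// buildGrowing appends without preallocating, so the backing array is
// reallocated several times as it grows.
func buildGrowing(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// buildPrealloc sizes the slice once up front: a single allocation.
func buildPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func BenchmarkBuildGrowing(b *testing.B) {
	b.ReportAllocs() // like -benchmem, but only for this benchmark
	for i := 0; i < b.N; i++ {
		sinkSlice = buildGrowing(1000)
	}
}

func BenchmarkBuildPrealloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sinkSlice = buildPrealloc(1000)
	}
}

func main() {
	for _, bench := range []struct {
		name string
		fn   func(*testing.B)
	}{
		{"growing", BenchmarkBuildGrowing},
		{"prealloc", BenchmarkBuildPrealloc},
	} {
		r := testing.Benchmark(bench.fn)
		fmt.Printf("%-9s %s %s\n", bench.name, r.String(), r.MemString())
	}
}
```

The preallocated version should report 1 allocs/op, while the growing version reallocates each time append runs out of capacity.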

Setup Outside the Loop

Don't measure setup time. Do the setup before the loop, then call b.ResetTimer so the time it took is discarded:

func BenchmarkSort(b *testing.B) {
    // Generate data once
    data := generateLargeSlice(10000)

    b.ResetTimer()  // start measuring after setup

    for i := 0; i < b.N; i++ {
        // copy so each iteration sorts fresh, unsorted data (the copy is timed too)
        input := make([]int, len(data))
        copy(input, data)
        sort.Ints(input)
    }
}

Without b.ResetTimer(), the time to generate data pollutes the benchmark.

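When setup must happen on every iteration, pause the timer around it with b.StopTimer and b.StartTimer. Each of these calls reads runtime memory statistics, so for a fast operation their overhead can make the benchmark take far longer in wall-clock time than the reported numbers suggest: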

func BenchmarkDBInsert(b *testing.B) {
    db := setupTestDB(b)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        b.StopTimer()    // pause timer during setup
        record := generateRecord(i)
        b.StartTimer()   // resume timer

        db.Insert(record)
    }
}
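When inputs can be produced ahead of time, one alternative is to build a fixed pool of them before b.ResetTimer and cycle through it, so the timer never has to stop. The sketch below uses a hypothetical in-memory store in place of a database; Record, memStore, and generateRecord are illustrative names, not part of any library.

```go
package main

import (
	"fmt"
	"testing"
)

// Record and memStore are hypothetical stand-ins for a real row type and database.
type Record struct {
	ID   int
	Name string
}

type memStore struct{ rows map[int]Record }

func newMemStore() *memStore { return &memStore{rows: make(map[int]Record)} }

// Insert overwrites by ID, so the map stays bounded by the number of distinct IDs.
func (s *memStore) Insert(r Record) { s.rows[r.ID] = r }

func generateRecord(i int) Record {
	return Record{ID: i, Name: fmt.Sprintf("user-%d", i)}
}

func BenchmarkInsertPrebuilt(b *testing.B) {
	db := newMemStore()

	// Build a fixed pool of inputs once; its size does not depend on b.N.
	const poolSize = 1024
	pool := make([]Record, poolSize)
	for i := range pool {
		pool[i] = generateRecord(i)
	}

	b.ResetTimer() // only the inserts below are timed
	for i := 0; i < b.N; i++ {
		db.Insert(pool[i%poolSize])
	}
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkInsertPrebuilt))
}
```

The trade-off: after the first 1024 iterations every insert overwrites an existing key, so the workload differs from a stream of brand-new rows. Whether that matters depends on what the real store does on update.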

Sub-Benchmarks

Like t.Run, use b.Run for sub-benchmarks:

func BenchmarkJSON(b *testing.B) {
    user := User{Name: "Alice", Email: "alice@example.com", Age: 30}

    b.Run("marshal", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            _, _ = json.Marshal(user)
        }
    })

    b.Run("marshal+unmarshal", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            data, _ := json.Marshal(user)
            var out User
            _ = json.Unmarshal(data, &out)
        }
    })
}

Run a specific sub-benchmark (anchor each slash-separated part of the pattern):

go test -bench='^BenchmarkJSON$/^marshal$' ./...

Comparing Results with benchstat

benchstat is the standard tool for comparing benchmark results statistically:

go install golang.org/x/perf/cmd/benchstat@latest

Capture before/after:

# Run each benchmark 5 times (-count) so benchstat has samples to compare
go test -run='^$' -bench=. -count=5 ./... > old.txt
# Make your optimization
go test -run='^$' -bench=. -count=5 ./... > new.txt

benchstat old.txt new.txt

Output:

name              old time/op    new time/op    delta
JSONMarshal-8       2.50µs ± 2%    1.80µs ± 1%  -28.00%  (p=0.008 n=5+5)

name              old alloc/op   new alloc/op   delta
JSONMarshal-8        512B ± 0%      256B ± 0%  -50.00%  (p=0.008 n=5+5)

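The p value comes from a Mann-Whitney U test. A low p means the difference is unlikely to be noise alone; benchstat prints ~ instead of a delta when p is 0.05 or higher. With 5 runs on each side, 0.008 is the smallest p the test can produce, so treat -count=5 as a minimum and use 10 or more runs when you can. Newer versions of benchstat show the same information in a different table layout (sec/op and a "vs base" column).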
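Where that floor comes from: assuming the exact form of the Mann-Whitney U test and no tied values, the most extreme outcome (every new sample faster than every old one, or the reverse) has probability 1/C(n1+n2, n1) in each direction, so the two-sided p-value can never drop below 2/C(n1+n2, n1). The sketch computes that bound; binomial and minTwoSidedP are illustrative helper names.

```go
package main

import "fmt"

// binomial returns n choose k as a float64.
func binomial(n, k int) float64 {
	result := 1.0
	for i := 1; i <= k; i++ {
		result = result * float64(n-k+i) / float64(i)
	}
	return result
}

// minTwoSidedP is the smallest two-sided p-value an exact Mann-Whitney U
// test can give for sample sizes n1 and n2 when there are no ties.
func minTwoSidedP(n1, n2 int) float64 {
	return 2 / binomial(n1+n2, n1)
}

func main() {
	for _, n := range []int{3, 5, 10} {
		fmt.Printf("n=%d+%d  min p = %.6f\n", n, n, minTwoSidedP(n, n))
	}
}
```

For 3 runs per side the floor is 2/C(6,3) = 0.1, so no change could ever clear the usual 0.05 threshold; at 5 runs it is 2/252 ≈ 0.0079, the p=0.008 benchstat prints for a clear-cut result.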

CPU Profiling

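Generate a CPU profile during benchmarks. Profiling flags work on one package at a time, so pass a single package (here, the current directory) instead of ./...:

go test -bench=BenchmarkJSONMarshal -cpuprofile=cpu.prof .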

Analyze with pprof:

go tool pprof cpu.prof

Interactive commands in pprof:

(pprof) top10        # top 10 functions by CPU time
(pprof) list Marshal # annotated source for functions matching "Marshal"
(pprof) web          # open a call graph in the browser (requires Graphviz)

Or open pprof's interactive web UI, which includes a flame graph view under its View menu:

go tool pprof -http=:8080 cpu.prof

Memory Profiling

go test -bench=BenchmarkJSONMarshal -memprofile=mem.prof .
go tool pprof mem.prof

In pprof:

(pprof) top10 -cum    # top allocating functions (cumulative)
(pprof) list Marshal  # show allocations per line

Avoiding Benchmark Pitfalls

Compiler Optimization

The Go compiler may optimize away code that has no side effects. Use a sink variable to prevent this:

var result int  // package-level sink

func BenchmarkAdd(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = Add(2, 3)
    }
    result = r  // prevent compiler from eliminating Add calls
}

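Without the sink, the compiler could inline Add, notice that its result is never used, and eliminate the call entirely, giving you a benchmark that measures nothing. A sink does not stop the compiler from inlining Add(2, 3) and folding its constant arguments, though, so a suspiciously small figure such as the 0.2781 ns/op in the sample output above suggests little real work is being measured; feeding in values that change on each iteration helps. Go 1.24 added b.Loop, which keeps the arguments and results of calls inside its loop body alive, as an alternative to the sink pattern.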

Cache Effects

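A benchmark that repeats the same input measures the warm-cache case: the data stays in CPU caches (or in any in-memory cache your code keeps), so results can look better than production. If real traffic touches many different keys, vary the input across a key set larger than the cache you care about. Build that key set once, outside the timed loop; sizing it by b.N makes setup time and memory grow with the iteration count: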

func BenchmarkLookup(b *testing.B) {
    // A fixed key set sized independently of b.N
    // (generateUniqueKeys is a placeholder helper)
    keys := generateUniqueKeys(1 << 20)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        Lookup(keys[i%len(keys)])
    }
}
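A runnable sketch of the comparison, with hypothetical map-backed versions of Lookup and generateUniqueKeys: one benchmark hits a single key, the other cycles through 1 << 20 (about a million) keys. On most machines the larger key set should report more ns/op, since its map is unlikely to fit in CPU caches, but the size of the gap depends on the hardware.

```go
package main

import (
	"fmt"
	"testing"
)

// generateUniqueKeys is a hypothetical helper returning n distinct keys.
func generateUniqueKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%08d", i)
	}
	return keys
}

// table backs the hypothetical Lookup below.
var table map[string]int

// Lookup is a hypothetical read path under test.
func Lookup(k string) int { return table[k] }

// found keeps lookup results alive.
var found int

// benchLookup returns a benchmark that cycles through numKeys distinct keys.
func benchLookup(numKeys int) func(*testing.B) {
	return func(b *testing.B) {
		keys := generateUniqueKeys(numKeys)
		table = make(map[string]int, numKeys)
		for i, k := range keys {
			table[k] = i
		}
		b.ResetTimer() // exclude key generation and map building
		for i := 0; i < b.N; i++ {
			found = Lookup(keys[i%numKeys])
		}
	}
}

func main() {
	fmt.Println("1 key:     ", testing.Benchmark(benchLookup(1)))
	fmt.Println("1<<20 keys:", testing.Benchmark(benchLookup(1<<20)))
}
```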

Parallelism

Benchmark parallel code with b.RunParallel:

func BenchmarkConcurrentCache(b *testing.B) {
    cache := NewCache()

    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            cache.Set(fmt.Sprintf("key-%d", i%100), i)
            i++
        }
    })
}

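Run the benchmark at several GOMAXPROCS settings:

go test -bench=BenchmarkConcurrentCache -cpu=1,2,4,8 ./...

-cpu sets GOMAXPROCS for each run, and RunParallel starts GOMAXPROCS goroutines by default, so this measures the cache with 1, 2, 4, and 8 parallel workers and shows how it scales.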
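NewCache is not defined in the example, so the sketch below supplies a hypothetical mutex-protected map to make it runnable. It also builds the keys before the timer starts, so the fmt.Sprintf call is no longer part of what RunParallel measures.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Cache is a hypothetical mutex-protected map standing in for NewCache.
type Cache struct {
	mu   sync.Mutex
	data map[string]int
}

func NewCache() *Cache { return &Cache{data: make(map[string]int)} }

func (c *Cache) Set(k string, v int) {
	c.mu.Lock()
	c.data[k] = v
	c.mu.Unlock()
}

func (c *Cache) Get(k string) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[k]
	return v, ok
}

func BenchmarkConcurrentCache(b *testing.B) {
	cache := NewCache()
	// Build keys up front so string formatting is not measured.
	keys := make([]string, 100)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}
	b.ResetTimer()

	// RunParallel starts GOMAXPROCS goroutines (multiplied by
	// b.SetParallelism, default 1) and splits b.N among them via pb.Next.
	b.RunParallel(func(pb *testing.PB) {
		i := 0 // per-goroutine counter; not shared
		for pb.Next() {
			cache.Set(keys[i%len(keys)], i)
			i++
		}
	})
}

func main() {
	fmt.Println(testing.Benchmark(BenchmarkConcurrentCache))
}
```

A single mutex serializes every Set, so expect ns/op to stop improving, or get worse, as -cpu grows; that flattening is what a scalability run is meant to expose.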

Benchmarks in CI

Run benchmarks in CI to catch regressions:

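# .github/workflows/bench.yml (excerpt from a job's steps)
- name: Install benchstat
  run: go install golang.org/x/perf/cmd/benchstat@latest

- name: Run benchmarks
  run: go test -run='^$' -bench=. -benchmem -count=10 ./... | tee benchmark-result.txt

- name: Compare with baseline
  run: benchstat baseline.txt benchmark-result.txt

Store a baseline result (for example, one produced on the main branch and saved as a CI artifact) and compare against it on each PR. benchstat only reports the differences; to fail the build or alert when a benchmark slows down by more than some threshold, such as 10%, you need a small script of your own. Shared CI runners are noisy, so treat small deltas with suspicion.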
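Alerting on a 10% slowdown takes logic beyond benchstat's report. The sketch below is one minimal way to do it in Go: it averages ns/op per benchmark from two sets of go test -bench output and lists anything that got more than a threshold slower. meanNsPerOp and regressions are hypothetical names, and the values in main are made-up inputs. Comparing means ignores noise, so benchstat's significance test remains the better signal for borderline changes.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// meanNsPerOp parses "go test -bench" output and returns the mean ns/op
// for each benchmark name across all runs (for example, from -count=10).
func meanNsPerOp(output string) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: name iterations value ns/op [other metrics...]
		if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") || fields[3] != "ns/op" {
			continue
		}
		ns, err := strconv.ParseFloat(fields[2], 64)
		if err != nil {
			continue
		}
		sums[fields[0]] += ns
		counts[fields[0]]++
	}
	means := map[string]float64{}
	for name, s := range sums {
		means[name] = s / float64(counts[name])
	}
	return means
}

// regressions lists benchmarks whose mean ns/op grew by more than
// threshold (0.10 means 10%) relative to the baseline, sorted by name.
func regressions(baseline, current string, threshold float64) []string {
	old, cur := meanNsPerOp(baseline), meanNsPerOp(current)
	var out []string
	for name, before := range old {
		after, ok := cur[name]
		if ok && before > 0 && (after-before)/before > threshold {
			out = append(out, fmt.Sprintf("%s: %.1f -> %.1f ns/op", name, before, after))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up example inputs in go test -bench -benchmem format.
	baseline := "BenchmarkJSONMarshal-8  500000  2500 ns/op  512 B/op  8 allocs/op\n"
	current := "BenchmarkJSONMarshal-8  400000  2900 ns/op  512 B/op  8 allocs/op\n"
	for _, r := range regressions(baseline, current, 0.10) {
		fmt.Println("regression:", r)
	}
}
```

In CI, a script like this would exit with a non-zero status when the list is non-empty, so the job fails.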

When to Write Benchmarks

Write benchmarks for:

  • Code that processes large volumes of data (parsers, serializers, algorithms)
  • Hot paths called millions of times per second
  • Code where you're comparing implementation options
  • Before/after optimization work

Don't write benchmarks for:

  • Database calls (benchmark the database separately, not your Go wrapper)
  • Network I/O (too many external variables)
  • Code that's called rarely

Benchmarks are most valuable when you can run them in a stable environment and compare results over time with benchstat.
