Go Benchmark Testing: go test -bench, benchstat, and Profiling
Go's testing package includes a built-in benchmark system. Benchmarks measure how long code takes to run and how much memory it allocates. They're the right tool for validating optimization work and catching performance regressions.
Writing Benchmarks
Benchmark functions live in files ending in _test.go, have names that start with Benchmark, and take a single *testing.B parameter:
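```go
import "testing"

func BenchmarkAdd(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Add(2, 3)
	}
}
```

b.N is the number of iterations. The framework calls the benchmark function several times, raising b.N on each call, until the timed loop runs for at least the -benchtime target (1 second by default) or b.N reaches one billion; the reported result comes from that final run. You never set b.N yourself, you just loop over it.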
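To watch that adjustment happen, the sketch below records every b.N value the framework passes to the benchmark. Add and observedN are hypothetical names for illustration. Outside go test, a benchmark can be run with testing.Benchmark, which the testing package provides for custom benchmarking; testing.Init must be called first in that case.

```go
package main

import "testing"

// Add is a hypothetical stand-in for the function under test.
func Add(a, b int) int { return a + b }

// observedN records each b.N value the framework tries, in order.
var observedN []int

// BenchmarkAdd logs b.N on every invocation, then runs the usual loop.
func BenchmarkAdd(b *testing.B) {
	observedN = append(observedN, b.N)
	for i := 0; i < b.N; i++ {
		Add(2, 3)
	}
}
```

Calling testing.Init() and then testing.Benchmark(BenchmarkAdd) from an ordinary program should leave observedN holding a strictly increasing sequence that starts at 1, and the returned BenchmarkResult's N field is the last value tried. The exact steps depend on how fast the loop runs and on the Go version.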
Running Benchmarks
```bash
# Run all benchmarks in every package under the current directory
go test -bench=. ./...
# Run one benchmark (the pattern is an unanchored regexp, so anchor it)
go test -bench='^BenchmarkAdd$' ./...
# Run benchmarks matching a pattern (quoted so the shell leaves it alone)
go test -bench='BenchmarkJSON.*' ./...
# Run each benchmark for a specific duration (instead of the default 1s)
go test -bench=. -benchtime=10s ./...
# Run a fixed number of iterations
go test -bench=. -benchtime=1000x ./...
# Skip unit tests (-run matches nothing) so only benchmarks run
go test -run='^$' -bench=. ./...
```

Sample output:
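```text
BenchmarkAdd-8            1000000000    0.2781 ns/op
BenchmarkStringConcat-8      5000000       350 ns/op
BenchmarkJSONMarshal-8        500000      2500 ns/op
```

The columns: the benchmark name with a -N suffix giving the GOMAXPROCS value for the run (usually the number of CPU cores), the number of iterations in the final run (b.N), and the average time per iteration in nanoseconds. A sub-nanosecond figure like BenchmarkAdd's is a hint that the loop body may have been optimized away; see Avoiding Benchmark Pitfalls below.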
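The ns/op figure is the timed duration of the final run divided by its iteration count. testing.BenchmarkResult exposes both numbers as T and N, so the arithmetic can be checked directly. In this sketch BenchmarkStringConcat is a hypothetical body chosen to match the sample output's name, and nsPerOp is an illustrative helper.

```go
package main

import "testing"

// sink keeps each concatenated string reachable.
var sink string

// BenchmarkStringConcat is a hypothetical benchmark that builds a short
// string with += on every iteration.
func BenchmarkStringConcat(b *testing.B) {
	parts := []string{"go", "test", "-bench", "."}
	for i := 0; i < b.N; i++ {
		s := ""
		for _, p := range parts {
			s += p
		}
		sink = s
	}
}

// nsPerOp recomputes the ns/op column: the final run's timed duration
// divided by its iteration count.
func nsPerOp(r testing.BenchmarkResult) int64 {
	if r.N <= 0 {
		return 0
	}
	return r.T.Nanoseconds() / int64(r.N)
}
```

BenchmarkResult also has an NsPerOp method that performs the same division, and its String method formats a result much like one line of go test -bench output, minus the benchmark name.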
Memory Allocation Stats
Track allocations with -benchmem:
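```bash
go test -bench=. -benchmem ./...
```

Output:

```text
BenchmarkJSONMarshal-8    500000    2500 ns/op    512 B/op    8 allocs/op
```

The two new columns are bytes allocated per operation and heap allocations per operation. Reducing allocations often matters more than shaving CPU time, because every heap allocation also creates work for the garbage collector, and that GC pressure slows the whole program rather than just the code being measured.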
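The same columns can also be turned on from inside a benchmark with b.ReportAllocs(), which reports B/op and allocs/op for that benchmark even without -benchmem. The sketch compares two hypothetical slice builders: one that grows a slice with append and one that preallocates capacity, a common way to cut allocations.

```go
package main

import "testing"

// buildGrowing appends to a nil slice, so append reallocates as it grows.
func buildGrowing(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// buildPrealloc reserves capacity up front: one allocation for the backing array.
func buildPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// sliceSink keeps results reachable so the builders are not optimized away.
var sliceSink []int

func BenchmarkBuildGrowing(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sliceSink = buildGrowing(1000)
	}
}

func BenchmarkBuildPrealloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sliceSink = buildPrealloc(1000)
	}
}
```

Run with go test -bench=Build, the growing version should report several allocations per op against one for the preallocated version; the exact count depends on the runtime's slice growth policy.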
Setup Outside the Loop
Don't measure setup time. Reset the timer or move setup outside the loop:
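```go
import (
	"sort"
	"testing"
)

func BenchmarkSort(b *testing.B) {
	// Generate data once (generateLargeSlice is your own helper, not shown)
	data := generateLargeSlice(10000)
	b.ResetTimer() // start measuring after setup
	for i := 0; i < b.N; i++ {
		// Copy so each iteration sorts unsorted data (the copy is timed too)
		input := make([]int, len(data))
		copy(input, data)
		sort.Ints(input)
	}
}
```

Without b.ResetTimer(), the time spent generating data is counted and inflates ns/op. Note that the make and copy inside the loop are still measured; they are cheap compared with sorting 10,000 integers, but they are part of the number you get.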
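For setup that must run on every iteration, pause the timer around it:

```go
func BenchmarkDBInsert(b *testing.B) {
	db := setupTestDB(b) // setupTestDB and generateRecord are your own helpers
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		b.StopTimer() // pause timer during setup
		record := generateRecord(i)
		b.StartTimer() // resume timer
		db.Insert(record)
	}
}
```

StopTimer and StartTimer are not free: each call reads the runtime's memory statistics, so toggling them on every iteration adds overhead and can make a fast benchmark take far longer in wall-clock time than its reported ns/op suggests. Use them when per-iteration setup is unavoidable, and prefer doing setup before the loop when you can.

Sub-Benchmarks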
As with t.Run in tests, b.Run groups related measurements as named sub-benchmarks:
```go
import (
	"encoding/json"
	"testing"
)

// User's fields are inferred from the literal below.
type User struct {
	Name  string
	Email string
	Age   int
}

func BenchmarkJSON(b *testing.B) {
	user := User{Name: "Alice", Email: "alice@example.com", Age: 30}
	b.Run("marshal", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_, _ = json.Marshal(user)
		}
	})
	b.Run("marshal+unmarshal", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			data, _ := json.Marshal(user)
			var out User
			_ = json.Unmarshal(data, &out)
		}
	})
}
```

Run a specific sub-benchmark:
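```bash
go test -bench='BenchmarkJSON/^marshal$' ./...
```

Each slash-separated part of the pattern is an unanchored regular expression matched against one level of the name, so it needs both anchors: marshal$ alone would also match marshal+unmarshal.

Comparing Results with benchstat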
benchstat is the standard tool for comparing benchmark results statistically:
```bash
go install golang.org/x/perf/cmd/benchstat@latest
```

Capture results before and after your change:

```bash
# Run each benchmark several times so benchstat has samples to compare
go test -bench=. -benchmem -count=5 ./... > old.txt
# Make your optimization, then:
go test -bench=. -benchmem -count=5 ./... > new.txt
benchstat old.txt new.txt
```

Output:
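```text
name            old time/op    new time/op    delta
JSONMarshal-8   2.50µs ± 2%    1.80µs ± 1%    -28.00%  (p=0.008 n=5+5)

name            old alloc/op   new alloc/op   delta
JSONMarshal-8   512B ± 0%      256B ± 0%      -50.00%  (p=0.008 n=5+5)
```

The p value comes from a statistical test (Mann-Whitney U by default) and measures how plausible a difference this large would be if the change had no real effect. A low p means the difference is unlikely to be noise; when p is above benchstat's threshold (0.05 by default), it prints ~ instead of a delta. With five runs per side, 0.008 is the smallest p-value this test can report, so more runs (10 is a common choice) give benchstat more room to separate real changes from noise. Recent benchstat releases lay this table out differently (for example sec/op with a "vs base" column) but report the same information.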
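The delta column is the relative change from the old value to the new one. For the two rows above:

$$
\frac{1.80\ \mu\mathrm{s} - 2.50\ \mu\mathrm{s}}{2.50\ \mu\mathrm{s}} \times 100\% = -28.00\%,
\qquad
\frac{256\ \mathrm{B} - 512\ \mathrm{B}}{512\ \mathrm{B}} \times 100\% = -50.00\%
$$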
CPU Profiling
Generate a CPU profile during benchmarks:
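```bash
go test -bench=BenchmarkJSONMarshal -cpuprofile=cpu.prof .
```

go test accepts profiling flags only when testing a single package, so point it at one package, such as the current directory, instead of using ./... with these flags. Analyze the result with pprof: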
```bash
go tool pprof cpu.prof
```

Interactive commands in pprof:

```text
(pprof) top10          # top 10 functions by CPU time
(pprof) list Marshal   # annotated source for functions matching "Marshal"
(pprof) web            # call graph as SVG in a browser (requires Graphviz)
```

For a flame graph, open pprof's web UI and pick Flame Graph from the View menu:

```bash
go tool pprof -http=:8080 cpu.prof
```

Memory Profiling
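```bash
# One package only, as with -cpuprofile
go test -bench=BenchmarkJSONMarshal -memprofile=mem.prof .
# Show allocations made during the run, not just memory still in use at exit
go tool pprof -sample_index=alloc_space mem.prof
```

In pprof:

```text
(pprof) top10 -cum     # top allocating functions, sorted by cumulative allocations
(pprof) list Marshal   # allocations per source line
```

Avoiding Benchmark Pitfalls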
Compiler Optimization
The Go compiler may optimize away code that has no side effects. Use a sink variable to prevent this:
```go
var result int // package-level sink

func BenchmarkAdd(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = Add(2, 3)
	}
	result = r // prevent compiler from eliminating Add calls
}
```

Without the sink, the compiler could detect that Add(2, 3) has no side effects and eliminate the call entirely, giving you a benchmark that measures nothing.
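Constant arguments add a second trap: once Add is inlined, Add(2, 3) can be folded to the constant 5 at compile time, so even with a sink the loop may do almost no work. One way around both problems, sketched below with hypothetical names, is to make each call depend on the previous result and the loop counter, then publish the final value through the sink.

```go
package main

import "testing"

// Add is a hypothetical function under test.
func Add(a, b int) int { return a + b }

// result is a package-level sink; storing to it is a side effect the
// compiler must keep.
var result int

// BenchmarkAddConstant mirrors the naive form: constant inputs, result
// discarded. After inlining, the compiler is free to drop the work.
func BenchmarkAddConstant(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Add(2, 3)
	}
}

// BenchmarkAddChained feeds each result into the next call together with
// the loop counter, so the sums cannot be precomputed as constants, and
// publishes the final value through the sink.
func BenchmarkAddChained(b *testing.B) {
	r := 0
	for i := 0; i < b.N; i++ {
		r = Add(r, i)
	}
	result = r
}
```

For an operation as cheap as one addition, even the chained version may report well under a nanosecond per op. What the chain and sink buy you is that the stored result depends on every addition, which makes the work far harder for the compiler to discard.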
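Cache Effects
Feeding the same input on every iteration keeps the relevant data hot in CPU caches (and in any in-memory cache your code maintains), so the benchmark reports a best case that real traffic may never see. To measure something closer to realistic behavior, vary the input across iterations: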
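```go
func BenchmarkLookup(b *testing.B) {
	// Use a fixed number of keys rather than b.N keys: b.N can grow very
	// large, and a key set that changes size between runs changes the
	// cache behavior being measured. Size it to exceed the cache you
	// want to defeat. (generateUniqueKeys and Lookup are your own code.)
	keys := generateUniqueKeys(100_000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Lookup(keys[i%len(keys)])
	}
}
```

Parallelism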
Benchmark parallel code with b.RunParallel:
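```go
func BenchmarkConcurrentCache(b *testing.B) {
	cache := NewCache()
	b.RunParallel(func(pb *testing.PB) {
		i := 0 // each goroutine keeps its own counter
		for pb.Next() {
			cache.Set(fmt.Sprintf("key-%d", i%100), i)
			i++
		}
	})
}
```

By default, RunParallel starts one goroutine per GOMAXPROCS and divides the b.N iterations among them, so the -cpu flag, which sets GOMAXPROCS for each run, controls the number of goroutines: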
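```bash
go test -bench=BenchmarkConcurrentCache -cpu=1,2,4,8 ./...
```

This runs the benchmark once for each GOMAXPROCS value (1, 2, 4, and 8) to measure scalability: if ns/op stops falling as the count rises, contention is limiting throughput.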
Benchmarks in CI
Run benchmarks in CI to catch regressions:
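```yaml
# .github/workflows/bench.yml (steps only)
- name: Install benchstat
  run: go install golang.org/x/perf/cmd/benchstat@latest
- name: Run benchmarks
  run: go test -run='^$' -bench=. -benchmem -count=5 ./... | tee benchmark-result.txt
- name: Compare with baseline
  run: benchstat baseline.txt benchmark-result.txt
```

Store the baseline result as an artifact (for example from the main branch), download it as baseline.txt, and compare on each PR. Alert if a benchmark degrades by more than 10%. Shared CI runners are noisy, so collect several runs per benchmark and treat small deltas with caution.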
When to Write Benchmarks
Write benchmarks for:
- Code that processes large volumes of data (parsers, serializers, algorithms)
- Hot paths called millions of times per second
- Code where you're comparing implementation options
- Before/after optimization work
Don't write benchmarks for:
- Database calls (benchmark the database separately, not your Go wrapper)
- Network I/O (too many external variables)
- Code that's called rarely
Benchmarks are most valuable when you can run them in a stable environment and compare results over time with benchstat.