BenchmarkDotNet: Measure .NET Performance Correctly

BenchmarkDotNet is the standard .NET library for micro-benchmarking — it handles JIT warmup, statistical analysis, and memory allocation measurement automatically, giving you reliable, comparable performance numbers instead of naive Stopwatch timing.

Key Takeaways

  • The [Benchmark] attribute marks methods to measure; BenchmarkRunner.Run() executes the full benchmark suite
  • [MemoryDiagnoser] adds allocation columns (Gen0/Gen1/Gen2 GC collections and allocated bytes) to every run
  • [Params] lets you parameterize benchmarks to compare performance across different input sizes or configurations
  • [GlobalSetup] and [GlobalCleanup] handle expensive initialization that should not be included in benchmark timing
  • Never use Stopwatch for performance comparison: BenchmarkDotNet controls for the JIT bias, OS noise, and measurement error that make manual timing unreliable

Measuring performance in .NET is harder than it looks. Write a naive Stopwatch benchmark and you will measure JIT compilation time, OS scheduling noise, and GC pressure instead of the code you intended to measure. The first call to a method is typically slower than later calls because it includes JIT compilation. Different runtimes and platforms optimize differently. BenchmarkDotNet handles all of this automatically: it warms up the JIT, runs multiple iterations, performs statistical analysis to filter outliers, and reports results with confidence intervals. This post covers everything you need to use it effectively.

Why Not Stopwatch?

Before looking at BenchmarkDotNet, it is worth understanding why manual timing fails.

// This is wrong — measures JIT compilation, not execution
var sw = Stopwatch.StartNew();
var result = MyMethod(input);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // includes JIT warmup

// This is also wrong — too few iterations, high variance
long total = 0;
for (int i = 0; i < 100; i++)
{
    sw.Restart();
    MyMethod(input);
    sw.Stop();
    total += sw.ElapsedTicks;
}
Console.WriteLine(total / 100); // no outlier removal, no statistical significance

Problems with manual timing:

  • The first execution triggers JIT compilation. Subsequent executions run compiled native code. If you measure both together, you get a mix.
  • .NET's tiered compilation promotes hot methods to more aggressively optimized native code after several runs. Your benchmark might measure tier-0 code, tier-1 code, or a mix.
  • The GC may run during your timing window, adding milliseconds of noise.
  • For very fast operations (nanoseconds), timer resolution is insufficient to measure individual calls.

BenchmarkDotNet solves all of these by design.
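
To make those problems concrete, here is a rough sketch of the minimum a hand-written harness would need: a warmup phase, batched timing so each reading spans far more than the timer resolution, and a robust summary. ManualTimingSketch and its parameters are hypothetical names chosen for illustration. This is not how BenchmarkDotNet is implemented, and it still lacks process isolation, overhead subtraction, and proper statistics.

```csharp
using System;
using System.Diagnostics;

public static class ManualTimingSketch
{
    // Returns one nanoseconds-per-call sample per batch.
    public static double[] MeasureNanosecondsPerCall(
        Action action, int warmupCalls = 10_000, int batches = 15, int callsPerBatch = 100_000)
    {
        // Warmup: forces JIT compilation and gives tiered compilation a chance
        // to promote the method before any timing starts.
        for (int i = 0; i < warmupCalls; i++)
            action();

        double nanosecondsPerTick = 1_000_000_000.0 / Stopwatch.Frequency;
        var samples = new double[batches];
        for (int b = 0; b < batches; b++)
        {
            // Time a whole batch so timer resolution stops being the limiting factor.
            // Note that the delegate call overhead is included in every sample.
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < callsPerBatch; i++)
                action();
            sw.Stop();
            samples[b] = sw.ElapsedTicks * nanosecondsPerTick / callsPerBatch;
        }
        return samples;
    }

    // The median is a crude stand-in for outlier handling: a single batch
    // hit by a GC pause does not move it much.
    public static double Median(double[] samples)
    {
        var sorted = (double[])samples.Clone();
        Array.Sort(sorted);
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Even this much code leaves most of the listed problems only partly addressed, which is the practical argument for using a dedicated library.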

Installation and Basic Setup

dotnet add package BenchmarkDotNet

Create a benchmark class with [Benchmark]-attributed methods:

using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class StringConcatenationBenchmarks
{
    private const int Iterations = 1000;

    [Benchmark(Baseline = true)]
    public string StringConcatenation()
    {
        string result = "";
        for (int i = 0; i < Iterations; i++)
            result += i.ToString();
        return result;
    }

    [Benchmark]
    public string StringBuilderConcatenation()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < Iterations; i++)
            sb.Append(i);
        return sb.ToString();
    }

    [Benchmark]
    public string StringJoin()
    {
        return string.Join("", Enumerable.Range(0, Iterations));
    }
}

// Run in Main() or a test
class Program
{
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<StringConcatenationBenchmarks>();
    }
}

Important: benchmarks must be run in Release configuration. By default BenchmarkDotNet refuses to run benchmarks from a non-optimized (Debug) build and reports a validation error. Run with:

dotnet run -c Release

Reading the Output Table

BenchmarkDotNet produces a formatted table after each run:

| Method                    | Mean        | Error     | StdDev    | Ratio | Gen0      | Allocated  |
|---------------------------|-------------|-----------|-----------|-------|-----------|------------|
| StringConcatenation       | 1,842.3 μs  | 12.41 μs  | 11.61 μs  | 1.00  | 1000.0000 | 4,058 KB   |
| StringBuilderConcatenation|    12.8 μs  |  0.08 μs  |  0.07 μs  | 0.007 |    7.6294 |    16 KB   |
| StringJoin                |    24.1 μs  |  0.14 μs  |  0.13 μs  | 0.013 |   19.1040 |    39 KB   |

Column meanings:

  • Mean: average execution time over all iterations
  • Error: half the 99.9% confidence interval — how much the mean might deviate due to measurement noise
  • StdDev: standard deviation of all measurements — lower is more stable
  • Ratio: relative to the baseline method (the one marked [Benchmark(Baseline = true)])
  • Gen0/Gen1/Gen2: garbage collections per 1000 operations. High Gen0 means frequent short-lived allocations. Gen2 is costly.
  • Allocated: total managed memory allocated per operation (requires [MemoryDiagnoser])

From this table you can see that StringBuilder is 144x faster than + concatenation and allocates 254x less memory. The numbers are reliable because BenchmarkDotNet ran each method thousands of times and applied statistical analysis.
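
The speedup figures quoted from the table are plain divisions of the Mean and Allocated columns. The small helper below shows the arithmetic; TableMath is a hypothetical name, and the inputs are copied from the table above.

```csharp
public static class TableMath
{
    // Value shown in the Ratio column: candidate mean relative to the baseline mean.
    public static double Ratio(double baselineMean, double candidateMean)
        => candidateMean / baselineMean;

    // "N times faster" (or "N times less memory") is the inverse of the ratio.
    public static double Speedup(double baselineMean, double candidateMean)
        => baselineMean / candidateMean;
}

// TableMath.Ratio(1842.3, 12.8)   is about 0.007  (Ratio column for StringBuilder)
// TableMath.Speedup(1842.3, 12.8) is about 143.9  (the "144x faster" figure)
// TableMath.Speedup(4058, 16)     is about 253.6  (the "254x less memory" figure)
```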

[MemoryDiagnoser]

Memory allocation is often as important as raw speed. [MemoryDiagnoser] adds the Gen0, Gen1, Gen2, and Allocated columns with zero configuration.

[MemoryDiagnoser]
public class SerializationBenchmarks
{
    private readonly Order _order = CreateTestOrder();

    [Benchmark]
    public string SerializeWithNewtonsoft()
    {
        return JsonConvert.SerializeObject(_order);
    }

    [Benchmark]
    public string SerializeWithSystemTextJson()
    {
        return System.Text.Json.JsonSerializer.Serialize(_order);
    }

    [Benchmark]
    public byte[] SerializeWithMessagePack()
    {
        return MessagePackSerializer.Serialize(_order);
    }
}

The Allocated column shows bytes allocated per single call, accounting for all intermediate objects created during execution. This is critical for hot paths: an allocation of 1 KB per request sounds small until you're at 10,000 req/s and allocating roughly 10 MB every second, around 36 GB per hour, which keeps the Gen0 collector constantly busy.

[Params]: Benchmarking Across Input Sizes

Performance often varies dramatically with input size. [Params] runs the same benchmark with multiple values, producing a separate row for each.

[MemoryDiagnoser]
public class SearchBenchmarks
{
    [Params(10, 100, 1_000, 10_000)]
    public int CollectionSize;

    private List<int> _data;

    [GlobalSetup]
    public void Setup()
    {
        _data = Enumerable.Range(0, CollectionSize)
                          .OrderBy(_ => Guid.NewGuid())
                          .ToList();
    }

    [Benchmark(Baseline = true)]
    public bool LinearSearch()
    {
        return _data.Contains(CollectionSize / 2);
    }

    [Benchmark]
    public bool BinarySearch()
    {
        var sorted = _data.OrderBy(x => x).ToList();
        return sorted.BinarySearch(CollectionSize / 2) >= 0;
    }

    [Benchmark]
    public bool HashSetLookup()
    {
        var set = new HashSet<int>(_data);
        return set.Contains(CollectionSize / 2);
    }
}

Output (the rows for CollectionSize = 100 are not shown):

| Method        | CollectionSize | Mean        | Ratio |
|---------------|----------------|-------------|-------|
| LinearSearch  | 10             |    12.3 ns  | 1.00  |
| BinarySearch  | 10             |   445.1 ns  | 36.19 |
| HashSetLookup | 10             |   312.4 ns  | 25.40 |
|               |                |             |       |
| LinearSearch  | 1000           |  1,234.8 ns | 1.00  |
| BinarySearch  | 1000           |  4,123.7 ns | 3.34  |
| HashSetLookup | 1000           |    312.9 ns | 0.25  |
|               |                |             |       |
| LinearSearch  | 10000          | 12,891.2 ns | 1.00  |
| BinarySearch  | 10000          |  4,891.3 ns | 0.38  |
| HashSetLookup | 10000          |    314.1 ns | 0.02  |

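At size 10, linear search wins. At size 10,000, HashSet lookup is 41x faster. [Params] makes this crossover point visible. Note that BinarySearch and HashSetLookup, as written above, re-sort the list and rebuild the set on every call, so they measure construction plus lookup and not the lookup alone.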
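
If the goal is to compare lookup cost only, the sorted list and the hash set can be prepared once in [GlobalSetup]. The variant below is a sketch under that assumption; LookupOnlyBenchmarks and its field names are hypothetical, and it will produce different numbers from the table above because it measures less work per call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class LookupOnlyBenchmarks
{
    [Params(10, 100, 1_000, 10_000)]
    public int CollectionSize;

    private List<int> _unsorted = new List<int>();
    private List<int> _sorted = new List<int>();
    private HashSet<int> _set = new HashSet<int>();

    [GlobalSetup]
    public void Setup()
    {
        // A fixed seed keeps the shuffled order identical between runs.
        var random = new Random(42);
        _unsorted = Enumerable.Range(0, CollectionSize)
                              .OrderBy(_ => random.Next())
                              .ToList();

        // Built once here, outside the measured code.
        _sorted = _unsorted.OrderBy(x => x).ToList();
        _set = new HashSet<int>(_unsorted);
    }

    [Benchmark(Baseline = true)]
    public bool LinearSearch() => _unsorted.Contains(CollectionSize / 2);

    [Benchmark]
    public bool BinarySearch() => _sorted.BinarySearch(CollectionSize / 2) >= 0;

    [Benchmark]
    public bool HashSetLookup() => _set.Contains(CollectionSize / 2);
}
```

Which version is the right one depends on the question: if the collection is built once and queried many times, measure the lookup alone; if it is rebuilt for every query, the construction cost belongs in the benchmark.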

[GlobalSetup] and [GlobalCleanup]

Expensive initialization, such as opening database connections, loading files, or deserializing large objects, should not be included in benchmark timing. [GlobalSetup] runs once per benchmark method (and per [Params] combination) before any iterations, [GlobalCleanup] runs once after them, and [IterationSetup] runs before each individual iteration.

[MemoryDiagnoser]
public class DatabaseQueryBenchmarks
{
    private SqlConnection _connection;
    private byte[] _largePayload;

    [GlobalSetup]
    public async Task Setup()
    {
        // Connection setup — not measured
        _connection = new SqlConnection(connectionString);
        await _connection.OpenAsync();

        // Large payload preparation — not measured
        _largePayload = File.ReadAllBytes("testdata/large-payload.json");
    }

    [GlobalCleanup]
    public async Task Cleanup()
    {
        await _connection.CloseAsync();
        _connection.Dispose();
    }

    [Benchmark]
    public async Task<List<Product>> QueryProducts()
    {
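        // Only this is measured. Assuming Dapper's QueryAsync, which returns
        // Task<IEnumerable<T>>: await it first, then materialize the list.
        var products = await _connection.QueryAsync<Product>(
            "SELECT * FROM Products WHERE IsActive = 1");
        return products.ToList();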
    }

    [Benchmark]
    public ParsedPayload ParseLargeJson()
    {
        // Only parsing is measured — not loading from disk
        return JsonSerializer.Deserialize<ParsedPayload>(_largePayload);
    }
}

[IterationSetup] is useful when each iteration needs a fresh state (like a clear cache or a reset data structure) but creating that state should not count toward the benchmark time.
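
As a sketch of that pattern, the benchmark below sorts an array in place, so every iteration needs a fresh unsorted copy. SortBenchmarks and its members are hypothetical names. The BenchmarkDotNet documentation advises against [IterationSetup] for microbenchmarks, which is why the workload here is deliberately large.

```csharp
using System;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks
{
    [Params(1_000_000)]
    public int Size;

    private int[] _source = Array.Empty<int>();
    private int[] _work = Array.Empty<int>();

    [GlobalSetup]
    public void CreateSource()
    {
        // Generated once; a fixed seed keeps the input identical between runs.
        var random = new Random(42);
        _source = new int[Size];
        for (int i = 0; i < _source.Length; i++)
            _source[i] = random.Next();
    }

    [IterationSetup]
    public void ResetWorkingCopy()
    {
        // Fresh unsorted copy before every iteration; the copy is not timed.
        _work = (int[])_source.Clone();
    }

    [Benchmark]
    public int[] SortInPlace()
    {
        Array.Sort(_work);
        return _work; // returned so the work is observable
    }
}
```

Without the reset, every iteration after the first would sort an already sorted array and report a misleadingly low time.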

Comparing Implementations

A common workflow is benchmarking a proposed optimization against the current implementation. Mark the current implementation as Baseline = true and the ratio column immediately shows the speedup or regression.

[MemoryDiagnoser]
public class ParserBenchmarks
{
    private readonly string _input = File.ReadAllText("testdata/sample.json");

    [Benchmark(Baseline = true)]
    public object ParseWithRegex()
    {
        return LegacyParser.Parse(_input); // existing implementation
    }

    [Benchmark]
    public object ParseWithSpan()
    {
        return OptimizedParser.Parse(_input); // proposed replacement
    }
}

If Ratio for ParseWithSpan is 0.12, the new implementation is 8.3x faster. If it is 1.15, you have a 15% regression to investigate before shipping.

Exporters and CI Regression Detection

BenchmarkDotNet can export results to JSON, CSV, and Markdown. These exports enable CI-based performance regression detection.

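// In the benchmark project entry point
// (uses the BenchmarkDotNet.Configs, Diagnosers, Exporters, Exporters.Json, and Running namespaces)
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly)
    .Run(args, DefaultConfig.Instance        // start from the defaults so loggers and columns stay configured
        .AddExporter(JsonExporter.Full)      // outputs BenchmarkDotNet.Artifacts/results/*.json
        .AddExporter(MarkdownExporter.GitHub)
        .AddDiagnoser(MemoryDiagnoser.Default));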

In CI, compare the JSON output to a stored baseline:

# Run benchmarks and save results
dotnet run -c Release -- --filter '*' --exporters json

# Compare with baseline (simple example using jq)
CURRENT_MEAN=$(jq '.Benchmarks[0].Statistics.Mean' results/current.json)
BASELINE_MEAN=$(jq '.Benchmarks[0].Statistics.Mean' results/baseline.json)

# Fail if current is more than 10% slower than baseline
RATIO=$(echo "$CURRENT_MEAN / $BASELINE_MEAN" | bc -l)
if (( $(echo "$RATIO > 1.10" | bc -l) )); then
    echo "Performance regression detected: ${RATIO}x slowdown"
    exit 1
fi

For more sophisticated comparison, ResultsComparer is a tool maintained by the .NET team that compares BenchmarkDotNet JSON output files and reports regressions with statistical significance.
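
A middle ground between the one-benchmark jq check and a full statistical tool is a short program that compares every benchmark by name. The sketch below uses only System.Text.Json. RegressionCheck is a hypothetical helper, and it assumes each entry in the export's Benchmarks array carries a FullName string next to the Statistics.Mean value used above. Verify that against your own JSON output before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class RegressionCheck
{
    // Maps each benchmark's FullName to its mean time from one JSON export.
    public static Dictionary<string, double> ReadMeans(string json)
    {
        var means = new Dictionary<string, double>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement benchmark in doc.RootElement.GetProperty("Benchmarks").EnumerateArray())
        {
            // Skip entries without statistics (for example, a benchmark that failed to run).
            if (!benchmark.TryGetProperty("Statistics", out JsonElement stats)
                || stats.ValueKind != JsonValueKind.Object)
                continue;

            string name = benchmark.GetProperty("FullName").GetString() ?? "";
            means[name] = stats.GetProperty("Mean").GetDouble();
        }
        return means;
    }

    // Lists benchmarks whose current mean exceeds the baseline mean by more than the threshold.
    public static List<string> FindRegressions(
        Dictionary<string, double> baseline, Dictionary<string, double> current, double threshold = 1.10)
    {
        var regressions = new List<string>();
        foreach (KeyValuePair<string, double> entry in current)
        {
            if (baseline.TryGetValue(entry.Key, out double baselineMean)
                && entry.Value / baselineMean > threshold)
            {
                regressions.Add($"{entry.Key}: {entry.Value / baselineMean:F2}x baseline");
            }
        }
        return regressions;
    }
}
```

In a CI step you would read both files with File.ReadAllText, pass the contents to ReadMeans, and exit with a non-zero code when FindRegressions returns anything. A fixed 10% threshold ignores measurement noise, so treat a failure as a prompt to investigate, not as proof of a regression.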

Common Pitfalls

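Benchmarking in Debug mode. Debug builds disable most JIT optimizations, such as inlining and bounds-check elimination, so the numbers are meaningless. Always run with dotnet run -c Release. By default BenchmarkDotNet rejects non-optimized assemblies with a validation error; if you override that check in order to debug a benchmark, do not trust the resulting timings.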

Dead code elimination. The JIT can detect that a return value is never used and eliminate the computation entirely, benchmarking nothing. Always return the result of the computed value or use BenchmarkDotNet.Engines.Consumer to consume it.

// Wrong: the result is unused, so the JIT may eliminate the call entirely
[Benchmark]
public void ComputeSomething()
{
    var result = ExpensiveComputation();
    // result is discarded — JIT may skip the computation
}

// Correct — return the value
[Benchmark]
public int ComputeSomething()
{
    return ExpensiveComputation(); // returned, cannot be eliminated
}
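
When returning a value is not practical, for example when the benchmark produces a lazy sequence or many intermediate values, the Consumer class mentioned above can absorb them. The sketch below assumes the Consume overloads and the Consume extension method for IEnumerable<T> in BenchmarkDotNet.Engines; ConsumerBenchmarks and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

public class ConsumerBenchmarks
{
    private readonly Consumer _consumer = new Consumer();
    private readonly int[] _values = Enumerable.Range(0, 1_000).ToArray();

    [Benchmark]
    public void SquareAllWithLinq()
    {
        // A lazy LINQ query does no work until it is enumerated.
        IEnumerable<int> squares = _values.Select(v => v * v);

        // Enumerates the query and hands every element to the consumer,
        // so the projection actually runs.
        squares.Consume(_consumer);
    }

    [Benchmark]
    public void SquareAllWithLoop()
    {
        foreach (int v in _values)
            _consumer.Consume(v * v); // each result is observed, not discarded
    }
}
```

Returning the lazy query instead would hand back an unevaluated IEnumerable<int>, so the projection might never run inside the measured code.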

Measuring operations too fast to time accurately. For operations in the single-digit nanosecond range (array access, simple arithmetic), perform the operation many times inside the benchmark method and declare the count with [Benchmark(OperationsPerInvoke = 1000)]. BenchmarkDotNet then divides the measured time by that count and reports the per-operation figure automatically.

[Benchmark(OperationsPerInvoke = 1000)]
public int ArrayAccess()
{
    int sum = 0;
    for (int i = 0; i < 1000; i++)
        sum += _array[i];
    return sum;
}
// Reports per-operation time (total / 1000)

Including setup in benchmarks accidentally. LINQ queries like _list.OrderBy(x => x).ToList() allocate on every call. If you're benchmarking search, sort inside [GlobalSetup] and benchmark only the search. If you're benchmarking the sort, include it in the benchmark but use [IterationSetup] to reset the list to unsorted state before each iteration.

Over-parameterizing. [Params(1, 2, 5, 10, 50, 100, 500, 1000, 5000, 10000)] produces 10 rows × number of methods = a very long run. Keep params focused on the crossover points you care about.

Running a Subset of Benchmarks

For large benchmark suites, you can filter which benchmarks to run:

# Run only benchmarks whose name contains "String"
# (quote the pattern so the shell does not expand the wildcard)
dotnet run -c Release -- --filter '*String*'

# Run a specific class
dotnet run -c Release -- --filter 'StringConcatenationBenchmarks.*'

# List all available benchmarks without running
dotnet run -c Release -- --list flat
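
These options only take effect if the command-line arguments reach BenchmarkDotNet. The Main method shown earlier calls BenchmarkRunner.Run<T>() without passing args, so an entry point along the following lines is assumed for the commands above:

```csharp
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Forwarding args lets BenchmarkDotNet parse --filter, --list,
        // --exporters, and the other console options.
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
    }
}
```

BenchmarkSwitcher discovers every benchmark class in the assembly, so new classes become filterable without touching the entry point.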

Conclusion

BenchmarkDotNet is the correct way to measure .NET performance. The moment you have two implementations to compare or a performance requirement to validate, reach for it instead of Stopwatch. The [Benchmark], [MemoryDiagnoser], and [Params] attributes cover most common use cases. Add [GlobalSetup] for expensive initialization, export to JSON for CI regression tracking, and you have a complete performance testing workflow that produces reliable, reproducible numbers you can confidently base optimization decisions on.