Performance is the lifeblood of systems programming. Whether you are optimizing a database engine, a game physics loop, or a WebSocket server, you need to measure how fast your code actually runs — not just guess. C++ microbenchmarking libraries give you precisely that: a scientific, reproducible way to time small code snippets across hundreds of thousands of iterations and determine which implementation is faster.

But not all C++ benchmarking libraries are created equal. Some prioritize statistical rigor with p-values and confidence intervals, while others aim for minimal overhead and single-header integration. In this guide, we compare four leading C++ microbenchmarking frameworks — Google Benchmark, Celero, nanobench, and Hayai — covering their architecture, ease of use, statistical features, and real-world suitability.

Overview of C++ Microbenchmarking Libraries

| Feature | Google Benchmark | Celero | nanobench | Hayai |
|---------|------------------|--------|-----------|-------|
| GitHub Stars | 10,244 | 861 | 1,704 | 363 |
| Last Updated | June 2026 | June 2026 | Oct 2024 | Aug 2019 |
| Primary Language | C++ | C++ | C++ | C++ |
| Header-Only | No (lib) | No (lib) | Yes (single header) | Yes (header-only) |
| CMake Integration | FetchContent, find_package | FetchContent, find_package | Copy header | Copy header |
| Statistical Output | Mean, median, stddev | Mean, variance, baseline | Mean, median, stddev, MAD | Mean, min, max |
| Fixture Support | Yes (BENCHMARK_F) | Yes (BASELINE_F / BENCHMARK_F) | No (manual) | Yes (BENCHMARK_F) |
| Custom Counters | Yes (user counters) | Yes (user metrics) | Limited | No |
| Template Benchmarks | Yes | No | No | No |
| License | Apache 2.0 | Apache 2.0 | MIT | Apache 2.0 |

Google Benchmark: The Industry Standard

Google Benchmark (github.com/google/benchmark) is the most widely adopted C++ microbenchmark library, maintained by Google. With over 10,000 stars and continuous updates (last commit June 2026), it is the de facto choice for performance-critical projects like Chromium, LLVM, and Abseil.

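Its hallmark is statistical rigor. Google Benchmark runs each benchmark for a configurable minimum time and picks the iteration count automatically. When you request repetitions (for example with --benchmark_repetitions), it also reports the mean, median, and standard deviation across runs. It does not stop the compiler from deleting unused results on its own; wrap such values in benchmark::DoNotOptimize(). The example below sorts vectors of increasing size and asks for a complexity estimate: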

#include <benchmark/benchmark.h>
#include <algorithm>
#include <cstdlib>  // std::rand
#include <vector>

static void BM_VectorSort(benchmark::State& state) {
    std::vector<int> data(state.range(0));
    for (auto _ : state) {
        state.PauseTiming();
        std::generate(data.begin(), data.end(), std::rand);
        state.ResumeTiming();
        std::sort(data.begin(), data.end());
    }
    state.SetComplexityN(state.range(0));
}
BENCHMARK(BM_VectorSort)
    ->RangeMultiplier(2)->Range(1<<10, 1<<20)
    ->Complexity();

BENCHMARK_MAIN();
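
The automatic iteration scaling described above can be illustrated without the library. The following is a simplified sketch, not Google Benchmark's actual algorithm: a hypothetical helper, pick_iteration_count, multiplies the batch size by ten until one timed batch lasts at least a chosen minimum time. The measure callback, the growth factor, and the iteration cap are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper (not part of Google Benchmark): grows the iteration
// count until a single timed batch lasts at least `min_seconds`.
// `measure(n)` is expected to run the code under test n times and return
// the elapsed time in seconds.
inline std::uint64_t pick_iteration_count(
    const std::function<double(std::uint64_t)>& measure,
    double min_seconds,
    std::uint64_t max_iterations = 1000000000ULL) {
    assert(min_seconds > 0.0);
    std::uint64_t n = 1;
    while (n < max_iterations) {
        if (measure(n) >= min_seconds) {
            return n;  // the batch is long enough to dominate timer noise
        }
        n *= 10;  // too short: try a batch ten times larger
    }
    return max_iterations;  // give up growing at the cap
}
```

In real use, measure would wrap a std::chrono::steady_clock timing loop around the code under test. Passing it in as a callback keeps the scaling logic separate from the clock, which also makes the logic easy to check with a fake timer.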

Integration is straightforward via CMake:

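include(FetchContent)
# Skip Google Benchmark's own tests so the build does not require GoogleTest
set(BENCHMARK_ENABLE_TESTING OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
  benchmark
  GIT_REPOSITORY https://github.com/google/benchmark.git
  GIT_TAG v1.9.0
)
FetchContent_MakeAvailable(benchmark)
# my_benchmarks is assumed to be an executable target defined elsewhere
target_link_libraries(my_benchmarks PRIVATE benchmark::benchmark)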

Key strengths:

  • Automatic iteration count scaling (each benchmark runs long enough to produce stable timings)
  • Template benchmarks (BENCHMARK_TEMPLATE) for testing multiple types
  • Asymptotic complexity analysis that fits measurements to Big-O curves
  • User-defined counters for custom metrics (cache misses, allocations)
  • Integration with perf and hardware performance counters

Limitations: Larger binary (links as a library, ~200KB+), steeper CMake setup than header-only alternatives, and the output format requires post-processing for human-friendly dashboards.
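
To make the complexity analysis less of a black box, here is a minimal sketch of the underlying idea: fit a single coefficient to a candidate curve by least squares, then compare candidates by their normalized error. This is an illustration under stated assumptions, not the library's implementation; fit_coefficient and normalized_rms are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Least-squares coefficient c that minimizes sum((t_i - c * f(n_i))^2):
//   c = sum(t_i * f(n_i)) / sum(f(n_i)^2)
inline double fit_coefficient(const std::vector<double>& sizes,
                              const std::vector<double>& times,
                              const std::function<double(double)>& curve) {
    assert(sizes.size() == times.size() && !sizes.empty());
    double numerator = 0.0;
    double denominator = 0.0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const double f = curve(sizes[i]);
        numerator += times[i] * f;
        denominator += f * f;
    }
    assert(denominator > 0.0);
    return numerator / denominator;
}

// Root-mean-square error of the fitted curve, divided by the mean time so
// that fits for different candidate curves can be compared.
inline double normalized_rms(const std::vector<double>& sizes,
                             const std::vector<double>& times,
                             const std::function<double(double)>& curve,
                             double coefficient) {
    assert(sizes.size() == times.size() && !sizes.empty());
    double squared_error = 0.0;
    double total_time = 0.0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        const double residual = times[i] - coefficient * curve(sizes[i]);
        squared_error += residual * residual;
        total_time += times[i];
    }
    const double count = static_cast<double>(sizes.size());
    const double mean_time = total_time / count;
    assert(mean_time > 0.0);
    return std::sqrt(squared_error / count) / mean_time;
}
```

Fitting several candidates (n, n log n, n squared) and keeping the one with the smallest normalized error gives a rough Big-O estimate. The estimate is only as good as the range of input sizes you measure.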

Celero: Baseline-Relative Comparisons

Celero (github.com/DigitalInBlue/Celero) takes a different philosophical approach. Instead of focusing on raw timing numbers, it centers the workflow around baseline comparisons. You designate one benchmark as the baseline, and Celero automatically computes how each subsequent benchmark performs relative to it — expressed as iterations per second and percentage difference.

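#include <celero/Celero.h>
#include <algorithm>
#include <cstdint>
#include <cstdlib>  // std::rand
#include <vector>

CELERO_MAIN

class SortFixture : public celero::TestFixture {
public:
    std::vector<int> data;
    // The setUp() parameter type differs between Celero releases (older
    // ones pass int64_t, newer ones an ExperimentValue); match the
    // declaration in your installed celero/TestFixture.h.
    void setUp(int64_t /*experimentValue*/) override {
        // Fixed size: this example defines no problem space
        // (no getExperimentValues() override).
        data.resize(10000);
        std::generate(data.begin(), data.end(), std::rand);
    }
};

// Each macro body is a member function of a class derived from the
// fixture, so fixture members are reached through this->.
BASELINE_F(QuickSort, Baseline, SortFixture, 10, 100000) {
    std::sort(this->data.begin(), this->data.end());
}

BENCHMARK_F(QuickSort, StableSort, SortFixture, 10, 100000) {
    std::stable_sort(this->data.begin(), this->data.end());
}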

Celero’s output table is immediately actionable:

|  Experiment  |  Baseline  |  us/Op  |  Iterations  |
|--------------|------------|---------|--------------|
| Baseline     |  1.00000   |  45.23  |  100000      |
| StableSort   |  1.27460   |  57.65  |  100000      |
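
The Baseline column is a plain ratio of per-operation times. The sketch below assumes the convention that the experiment's time is divided by the baseline's, so 1.0 means "same as baseline" and values above 1.0 are slower. The function names are hypothetical and not part of Celero.

```cpp
#include <cassert>

// Ratio of an experiment's per-operation time to the baseline's.
// Assumed convention: 1.0 = same as baseline, > 1.0 = slower, < 1.0 = faster.
inline double baseline_ratio(double experiment_us_per_op,
                             double baseline_us_per_op) {
    assert(baseline_us_per_op > 0.0);
    return experiment_us_per_op / baseline_us_per_op;
}

// Percentage difference relative to the baseline; positive means slower.
inline double percent_slower(double experiment_us_per_op,
                             double baseline_us_per_op) {
    return (baseline_ratio(experiment_us_per_op, baseline_us_per_op) - 1.0) * 100.0;
}
```

With the timings from the table, baseline_ratio(57.65, 45.23) is roughly 1.27, meaning StableSort takes about 27% longer per operation than the baseline.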

Key strengths:

  • Clean baseline-relative comparison workflow
  • Built-in CSV and JUnit XML output (ready for CI/CD dashboards)
  • TestFixture class for shared setup/teardown across benchmarks
  • Simple CMake integration with find_package

Limitations: Smaller community (861 stars), fewer analysis features than Google Benchmark (no Big-O analysis), and the baseline-centric model may not suit all use cases.

nanobench: Minimalist, Header-Only Precision

nanobench (github.com/martinus/nanobench) is a single-header library that packs remarkable statistical power into a tiny footprint. Developed by Martin Ankerl, it times code with std::chrono clocks, can read hardware counters through Linux perf events, and prints its results as Markdown tables.

The entire library is one header — drag it into your project and you are benchmarking:

#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>
#include <unordered_set>
#include <set>

int main() {
    ankerl::nanobench::Bench bench;
    bench.title("Set Insertion Performance")
         .relative(true)
         .minEpochIterations(200000);

    std::unordered_set<int> uset;
    bench.run("std::unordered_set", [&] {
        uset.insert(42);
        uset.erase(42);
    });

    std::set<int> oset;
    bench.run("std::set", [&] {
        oset.insert(42);
        oset.erase(42);
    });
}

Output includes the median time per operation, an error estimate (err%, the median absolute percentage error across measurement epochs), and, when Linux perf counters are available, instructions and cycles per operation, IPC, and branch misses.
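
Median-based statistics are used because a single slow run (a context switch, a page fault) can skew a mean badly. The sketch below computes a median and a median absolute deviation (MAD) over raw samples. It is a generic illustration, not nanobench's internal code, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the samples; takes the vector by value so it can sort a copy.
inline double median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    return values.size() % 2 == 1 ? values[mid]
                                  : 0.5 * (values[mid - 1] + values[mid]);
}

// Median absolute deviation: the median of |sample - median(samples)|.
inline double median_abs_deviation(const std::vector<double>& samples) {
    const double center = median(samples);
    std::vector<double> deviations;
    deviations.reserve(samples.size());
    for (double s : samples) {
        deviations.push_back(std::fabs(s - center));
    }
    return median(deviations);
}
```

For the samples {10, 12, 11, 10, 30}, the median is 11 and the MAD is 1, while the mean is 14.6: the outlier at 30 barely moves the median-based numbers. Dividing the MAD by the median gives a relative error figure.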

Key strengths:

  • Zero-dependency single header — just #include "nanobench.h"
  • Markdown-formatted terminal tables with error estimates and relative speedup
  • Hardware counter integration via Linux perf (instructions, cycles, branches, cache misses)
  • Built-in “doNotOptimizeAway” to prevent dead-code elimination

Limitations: No built-in fixture pattern (manual setup required for shared state), last updated October 2024, and hardware counters are available only through Linux perf.

Hayai: The Lightweight Veteran

Hayai (github.com/nickbruun/hayai) was one of the earliest C++ microbenchmark frameworks, modeled on the style of unit-testing frameworks such as googletest. It is header-only, dead simple, and has not changed much since 2019, which makes it a fit for projects that value stability over new features.

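#include <hayai.hpp>
#include <algorithm>
#include <cstdlib>  // std::rand
#include <vector>

class SortFixture : public ::hayai::Fixture {
public:
    // Fill with pseudo-random values so the sort has real work to do
    virtual void SetUp() {
        data.resize(10000);
        std::generate(data.begin(), data.end(), std::rand);
    }
    virtual void TearDown() {}
    std::vector<int> data;
};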

BENCHMARK_F(SortFixture, StdSort, 10, 100) {
    std::sort(data.begin(), data.end());
}

BENCHMARK_F(SortFixture, StableSort, 10, 100) {
    std::stable_sort(data.begin(), data.end());
}

int main() {
    hayai::ConsoleOutputter consoleOutputter;
    hayai::Benchmarker::AddOutputter(consoleOutputter);
    hayai::Benchmarker::RunAllTests();
    return 0;
}

Key strengths:

  • Extremely simple API built on the BENCHMARK and BENCHMARK_F macros
  • Familiar xUnit-style fixture pattern
  • Minimal overhead for basic timing needs
  • Header-only, no build system configuration needed

Limitations: Effectively unmaintained (last update August 2019), no statistical analysis beyond mean/min/max, no complexity analysis, and limited template support. Best suited for legacy codebases or environments where stability is paramount.

Integration with Build System and CI/CD Pipeline

All four libraries integrate with CMake, but the pattern differs. Here is a CMakeLists.txt that conditionally builds a Google Benchmark target and a nanobench target:

cmake_minimum_required(VERSION 3.16)
project(PerformanceAnalysis LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

option(USE_GOOGLE_BENCHMARK "Build the Google Benchmark target" OFF)
option(USE_NANOBENCH "Build the nanobench target" OFF)

# Option 1: Google Benchmark (library, most features)
if(USE_GOOGLE_BENCHMARK)
    include(FetchContent)
    # Skip Google Benchmark's own tests so GoogleTest is not required
    set(BENCHMARK_ENABLE_TESTING OFF CACHE BOOL "" FORCE)
    FetchContent_Declare(benchmark
        GIT_REPOSITORY https://github.com/google/benchmark.git
        GIT_TAG v1.9.0
    )
    FetchContent_MakeAvailable(benchmark)
    add_executable(bm_google gbench.cpp)
    target_link_libraries(bm_google PRIVATE benchmark::benchmark)
endif()

# Option 2: nanobench (header-only, zero config)
if(USE_NANOBENCH)
    # Download the single header at configure time; pin a release tag
    # instead of master for reproducible builds
    set(NANOBENCH_DIR ${CMAKE_BINARY_DIR}/nanobench)
    file(DOWNLOAD
        https://raw.githubusercontent.com/martinus/nanobench/master/src/include/nanobench.h
        ${NANOBENCH_DIR}/nanobench.h
    )
    add_executable(bm_nano nanobench_test.cpp)
    target_include_directories(bm_nano PRIVATE ${NANOBENCH_DIR})
endif()

When integrating with CI, Google Benchmark’s --benchmark_format=json and Celero’s CSV/JUnit output are particularly useful. See our self-hosted CI/CD dashboard guide for automated benchmark regression detection.
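
Whatever output format you choose, the CI check itself reduces to comparing current timings against a stored baseline with some tolerance for noise. The sketch below assumes the results have already been parsed into name-to-nanoseconds maps. The function name, the map layout, and the tolerance value are illustrative assumptions, not part of any of the four libraries.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical CI gate: returns the names of benchmarks whose current time
// exceeds the stored baseline by more than `tolerance` (0.05 means 5%).
inline std::vector<std::string> find_regressions(
    const std::map<std::string, double>& baseline_ns,
    const std::map<std::string, double>& current_ns,
    double tolerance) {
    assert(tolerance >= 0.0);
    std::vector<std::string> regressed;
    for (const auto& entry : current_ns) {
        const auto it = baseline_ns.find(entry.first);
        if (it == baseline_ns.end()) {
            continue;  // new benchmark: nothing to compare against yet
        }
        if (entry.second > it->second * (1.0 + tolerance)) {
            regressed.push_back(entry.first);
        }
    }
    return regressed;
}
```

A CI job would fail the build when the returned list is non-empty. The tolerance should be set wider than the run-to-run noise on your CI machines, otherwise the gate will flag changes that are not real.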

Choosing the Right Library for Your Project

| Use Case | Recommended Library | Reason |
|----------|---------------------|--------|
| Large production project with statistical rigor needs | Google Benchmark | Industry standard, active maintenance, complexity analysis |
| Quick, single-file performance checks | nanobench | Zero config, beautiful terminal output, hardware counters |
| Baseline-relative comparison workflow | Celero | Native baseline comparisons, JUnit output for CI |
| Legacy codebase needing minimal dependency | Hayai | Simple API, unmaintained but stable, small footprint |
| Template-heavy benchmarks | Google Benchmark | BENCHMARK_TEMPLATE for multi-type benchmarks |
| CI dashboard integration | Celero or Google Benchmark | Native CSV/JUnit output or --benchmark_format=json |

Why Invest in Microbenchmarking Infrastructure?

Regular microbenchmarking catches performance regressions before they reach production, including small slowdowns that integration tests alone would not surface. Setting up a benchmarking harness takes an afternoon and pays dividends for the lifetime of the project.

For broader performance analysis, complement microbenchmarks with system-level profiling tools to identify hotspots, and database benchmarking frameworks for infrastructure-level performance testing. For build system integration, our self-hosted build systems comparison covers Bazel, Pants, and Please for repeatable benchmark environments.

FAQ

What is the difference between microbenchmarking and profiling?

Microbenchmarking measures the execution time of small, isolated code snippets (a single function, an algorithm) under controlled conditions. Profiling captures a holistic view of where an entire application spends time — it identifies hotspots but cannot isolate individual operations as precisely. Use microbenchmarks to compare two implementations and profiling to find what to optimize.
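
At its simplest, a microbenchmark is a timed loop around one operation. The sketch below shows that core with std::chrono::steady_clock. It is a bare illustration with a hypothetical function name; it has no warmup, no repetition, and no protection against the compiler removing the work, which is what the libraries in this guide add.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Average wall-clock nanoseconds per call of `fn` over `iterations` calls.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::uint64_t iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;  // monotonic, safe for intervals
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Timing a whole batch and dividing by the iteration count keeps the clock's own overhead and resolution from dominating the result for very short operations.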

Does Google Benchmark prevent dead-code elimination?

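Not automatically. The compiler may still delete computations whose results are never used, so wrap results in benchmark::DoNotOptimize() and, where memory writes matter, call benchmark::ClobberMemory(). The state.PauseTiming()/state.ResumeTiming() pair only excludes setup work from the measurement; it has no effect on optimization. nanobench provides doNotOptimizeAway() for the same purpose. Always verify with -O2 or -O3 builds.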
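
For intuition about how such a helper can work, here is a minimal sketch of an optimization barrier. It assumes GCC or Clang inline assembly, with a weaker volatile-based fallback for other compilers. do_not_optimize and sum_to are hypothetical names, and the real library helpers differ in detail.

```cpp
#include <cassert>
#include <memory>

// Hypothetical barrier: makes the compiler treat `value` as observed.
template <typename T>
inline void do_not_optimize(const T& value) {
#if defined(__GNUC__) || defined(__clang__)
    // Empty asm statement that takes the object's address as an input and
    // declares a memory clobber, so the compiler must assume the object is
    // read and cannot discard the computation that produced it.
    asm volatile("" : : "r"(std::addressof(value)) : "memory");
#else
    // Weaker fallback (assumption): publish the address through a volatile
    // pointer so the object is treated as escaping.
    const volatile void* volatile sink = std::addressof(value);
    (void)sink;
#endif
}

// Example workload: without the barrier, an optimizer may fold or drop the
// loop entirely when the caller ignores the result.
inline int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    do_not_optimize(total);
    return total;
}
```

The barrier emits no instructions of its own; it only limits what the optimizer is allowed to assume about the value.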

Can I use nanobench on ARM or RISC-V?

nanobench measures time with std::chrono clocks, so it works on ARM and RISC-V as well as x86. However, hardware counter support (instructions, cycles) requires Linux perf and is platform-dependent.

Which library has the smallest binary overhead?

nanobench and Hayai are header-only and add negligible binary size (the timing code is inlined). Google Benchmark links as a static library (~200KB) and Celero also links as a library.

How often should I run microbenchmarks?

Run microbenchmarks on every commit that touches performance-sensitive code — ideally in CI. Google Benchmark supports the --benchmark_min_time and --benchmark_repetitions flags to balance statistical significance with CI time budgets. For daily trend tracking, store benchmark results in a time-series database and use regression alerts.

