Introduction
Memory allocation is one of the most performance-critical aspects of C++ applications. The default new/delete operators use a general-purpose allocator that works reasonably well for most cases, but high-performance systems often suffer from allocation bottlenecks, memory fragmentation, and cache-unfriendly allocation patterns.
C++17 introduced Polymorphic Memory Resources (PMR), a standard library framework that lets you swap allocation strategies at runtime without changing container types. Combining PMR with industrial-grade allocators such as jemalloc and mimalloc, and with Boost.Container’s PMR-aware containers, can substantially improve allocation performance.
This article compares four approaches to custom memory allocation in C++: std::pmr, Boost.Container PMR, jemalloc, and mimalloc.
Allocator Architecture Comparison
| Feature | std::pmr (C++17) | Boost.Container | jemalloc | mimalloc |
|---|---|---|---|---|
| GitHub Stars | Part of Standard | Part of Boost | 10,975 | 13,128 |
| Language | C++ Standard Library | C++ (mostly header-only) | C (C++ bindings) | C (C++ bindings) |
| Approach | Runtime-polymorphic allocator | PMR-compatible containers | Drop-in malloc replacement | Drop-in malloc replacement |
| Thread Caching | Thread-safe pools (synchronized_pool_resource) | Delegates to backend | Per-thread cache over multiple arenas | Per-thread heap |
| Fragmentation Control | monotonic_buffer_resource, pool resources | Through PMR backend | Size classes, dirty page purging | Free list sharding |
| Debug Support | No | No | Yes (MALLOC_CONF=stats_print:true) | Yes (MIMALLOC_SHOW_STATS=1) |
| Header-Only | Library component | Mostly (PMR resources need the compiled library) | Compiled library | Compiled library |
| Portable | Yes (any standard library that ships memory_resource) | Yes | Linux/macOS/FreeBSD | Linux/macOS/Windows |
std::pmr — Standard Library Polymorphic Allocators
The C++17 PMR framework provides a set of memory resource classes that implement different allocation strategies, accessible through the std::pmr namespace.
Key PMR resource types:
- monotonic_buffer_resource: Append-only allocation from a pre-allocated buffer. Individual deallocations are no-ops, and memory is reclaimed only when the resource is released or destroyed. Ideal for request-scoped or frame-based allocation patterns.
- unsynchronized_pool_resource: Pool allocator for single-threaded use. Groups allocations by size class for reduced fragmentation.
- synchronized_pool_resource: Thread-safe version of the pool resource. Good for general-purpose multi-threaded allocation.
- new_delete_resource(): Wraps global new/delete, and is the default fallback.
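The resource types above can be swapped without touching the container type. The following is a minimal sketch of that idea, not a reference implementation; the function names make_squares and sum_squares_with_two_strategies are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Build a vector of squares using whatever memory resource the caller supplies.
// The container type is the same regardless of the allocation strategy.
std::pmr::vector<int> make_squares(int n, std::pmr::memory_resource* mr) {
    std::pmr::vector<int> v{mr};
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i * i);
    return v;
}

// Build the same data twice: once from a stack-backed arena, once from a pool.
int sum_squares_with_two_strategies(int n) {
    // Arena over a stack buffer; it falls back to the default resource if full.
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size()};

    // Single-threaded pool allocator; its upstream is the default resource.
    std::pmr::unsynchronized_pool_resource pool;

    std::pmr::vector<int> from_arena = make_squares(n, &arena);
    std::pmr::vector<int> from_pool = make_squares(n, &pool);

    // Each container remembers the resource it was constructed with.
    assert(from_arena.get_allocator().resource() == &arena);
    assert(from_pool.get_allocator().resource() == &pool);
    assert(from_arena == from_pool);

    return std::accumulate(from_arena.begin(), from_arena.end(), 0);
}
```

Note the declaration order: the resources are declared before the containers, so the containers are destroyed first and never outlive the memory they point into.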
Boost.Container PMR
Boost.Container ships its own implementation of the PMR framework in the boost::container::pmr namespace, together with PMR aliases for its containers.
Alongside counterparts of the standard containers (vector, deque, list, map, set), Boost.Container offers PMR-enabled containers that have no std::pmr alias, such as flat_map, flat_set, slist, and stable_vector. This gives you type-erased allocation for virtually any data structure.
jemalloc
jemalloc is a production memory allocator, originally written for FreeBSD and long developed at Facebook (now Meta), designed for multi-threaded server workloads. It emphasizes low fragmentation and detailed statistics.
jemalloc uses multiple arenas and per-thread caches to reduce lock contention, size classes for efficient reuse, and optional background threads for dirty page purging. It’s particularly strong in scenarios with many small, short-lived allocations, which is exactly the pattern seen in network servers and database engines.
mimalloc
mimalloc is Microsoft’s compact general-purpose allocator, optimized for performance and low memory overhead. It features excellent security properties, including guard pages and hardened metadata.
mimalloc’s key innovations include free list sharding (free lists are kept per page rather than per size class, which improves locality and reduces contention), eager page reset (returning unused memory to the OS quickly), and secure mode (guard pages, encoded free lists). It is generally reported to match or beat jemalloc in single-threaded performance; the 10-15% memory saving cited for it is workload-dependent and worth measuring on your own application.
Deployment Architecture
For production deployments, you have two main integration strategies:
Strategy 1: Global LD_PRELOAD Replacement
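Setting the LD_PRELOAD environment variable to the path of the allocator’s shared library when launching a process replaces all malloc/free calls globally. It works with any language and any library that allocates through the C allocator. Use this for existing applications where you can’t change code.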
Strategy 2: Selective PMR Usage (C++)
This is the recommended approach for new C++ codebases where you can control allocation per data structure.
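As a rough sketch of per-data-structure control, the handler below takes its memory resource as a parameter, so the hot path can pass a request-scoped arena while other callers pass the default resource. The names count_long_fields and handle_request, the comma-separated input format, and the 2 KiB scratch size are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical request handler: splits a comma-separated line into fields and
// counts those longer than three characters. All temporary storage comes from
// `mr`, which the caller chooses.
std::size_t count_long_fields(std::string_view line, std::pmr::memory_resource* mr) {
    std::pmr::vector<std::pmr::string> fields{mr};
    std::size_t start = 0;
    while (start <= line.size()) {
        std::size_t end = line.find(',', start);
        if (end == std::string_view::npos) end = line.size();
        // emplace_back hands the vector's resource on to the new string.
        fields.emplace_back(line.substr(start, end - start));
        start = end + 1;
    }

    std::size_t long_fields = 0;
    for (const auto& field : fields) {
        // Nested containers share the outer container's resource.
        assert(field.get_allocator().resource() == mr);
        if (field.size() > 3) ++long_fields;
    }
    return long_fields;
}

// Hot path: a request-scoped arena on the stack, discarded when the call ends.
std::size_t handle_request(std::string_view line) {
    std::byte scratch[2048];
    std::pmr::monotonic_buffer_resource arena{scratch, sizeof scratch};
    return count_long_fields(line, &arena);
}
```

A cold path can call the same function with std::pmr::get_default_resource(), which keeps the allocation decision at the call site rather than inside the data structure.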
Performance Benchmarks and Scaling Considerations
The choice between these allocators depends heavily on your workload characteristics. In single-threaded benchmarks with many small allocations (common in parsing and serialization workloads), mimalloc typically outperforms other options significantly. Its free list sharding and aggressive inlining reduce per-allocation overhead.
For multi-threaded server workloads with high contention, jemalloc’s thread-cache eviction strategies and background page purging provide more consistent tail latency. This matters because glibc’s default allocator on Linux can exhibit long pauses, reportedly multi-second in bad cases, during malloc_trim() when threads allocate and free in different orders.
The PMR framework (both std::pmr and Boost.Container) gives you fine-grained control without changing the global allocator. This is especially valuable in applications with mixed allocation patterns: you can use monotonic_buffer_resource for the hot path (often quoted as roughly 10x faster than malloc for sequential allocation, though the exact ratio depends on the workload) while leaving cold paths with the default allocator.
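Speed-up figures like the one above are worth checking on your own hardware. The following is a minimal timing sketch, not a rigorous benchmark: it builds the same node-based container from the heap and from a monotonic arena and reports the elapsed time for each. The names timing, time_list_build, and compare_heap_and_arena are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <memory_resource>
#include <list>

struct timing {
    long long checksum;                // guards against the work being skipped
    std::chrono::nanoseconds elapsed;  // wall-clock time for build + teardown
};

// Build and destroy a list of `nodes` ints using the given resource.
// A list is used because it performs one allocation per element.
timing time_list_build(int nodes, std::pmr::memory_resource* mr) {
    assert(mr != nullptr);
    const auto start = std::chrono::steady_clock::now();
    long long checksum = 0;
    {
        std::pmr::list<int> items{mr};
        for (int i = 0; i < nodes; ++i) items.push_back(i);
        for (int value : items) checksum += value;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {checksum,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

struct comparison {
    timing heap;
    timing arena;
};

// Run the same workload against global new/delete and a monotonic arena.
comparison compare_heap_and_arena(int nodes) {
    std::pmr::monotonic_buffer_resource arena;  // grows from the default resource
    return {time_list_build(nodes, std::pmr::new_delete_resource()),
            time_list_build(nodes, &arena)};
}
```

For meaningful numbers, compile with optimizations, repeat the measurement many times, and compare medians rather than single runs.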
For applications processing millions of small requests per second, combining mimalloc as the global allocator with PMR arenas for request-local data can yield 40-60% throughput improvements over glibc’s default allocator. For deep dives into memory optimization, see our memory allocator comparison guide. For container-level optimizations that pair well with custom allocators, check our small vector optimization comparison and high-performance hash container guide.
CI/CD Integration and Testing
Integrating custom allocators into your continuous integration pipeline requires careful attention to both build configuration and runtime testing. Here’s how to set up proper validation for your chosen allocator strategy.
For jemalloc and mimalloc, link-time integration is straightforward: add -ljemalloc or -lmimalloc to your linker flags, then run ldd on the resulting binary and check that libjemalloc or libmimalloc appears among the loaded shared libraries.
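For runtime verification, both allocators can print statistics at process exit when configured through environment variables: MALLOC_CONF=stats_print:true for jemalloc and MIMALLOC_SHOW_STATS=1 for mimalloc.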
For PMR-based code, unit testing allocation strategies is essential. Use std::pmr::monotonic_buffer_resource with a fixed buffer during tests to catch accidental heap allocations in performance-critical paths, and pass std::pmr::null_memory_resource() as its upstream resource. If the buffer overflows, the upstream then throws std::bad_alloc instead of silently falling back to heap allocation, which catches regressions immediately in CI rather than in production.
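A test helper along these lines might look like the following sketch. The function name fits_in_1k_arena and the 1 KiB budget are assumptions chosen for illustration; a real test would wrap the code path under test instead of a bare reserve call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Returns true if reserving `count` ints fits in a 1 KiB arena, and false if
// the arena overflowed and the null upstream threw std::bad_alloc.
bool fits_in_1k_arena(std::size_t count) {
    alignas(std::max_align_t) std::byte buffer[1024];
    // With null_memory_resource() as upstream there is no heap fallback.
    std::pmr::monotonic_buffer_resource arena{
        buffer, sizeof buffer, std::pmr::null_memory_resource()};
    try {
        std::pmr::vector<int> v{&arena};
        v.reserve(count);
        assert(v.capacity() >= count);
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the code under test exceeded its memory budget
    }
}
```

In a unit test, a false result (or an uncaught std::bad_alloc) signals that a hot path started allocating more than its fixed budget allows.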
Integration testing should also verify thread safety. When using synchronized_pool_resource or global jemalloc/mimalloc replacement, run your test suite under Thread Sanitizer (-fsanitize=thread) to catch data races in allocation paths. Both jemalloc and mimalloc are thoroughly tested for thread safety, but custom PMR allocator wrappers may introduce races if you modify shared state without synchronization.
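To illustrate the point about shared state in custom wrappers, here is a sketch of a hypothetical counting_resource that forwards to an upstream resource and keeps its counters in std::atomic variables. Plain integer counters in the same position would be a data race once the wrapper is shared between threads. The wrapper is only as thread-safe as its upstream, which is an assumption to verify for whatever resource you pass in.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical wrapper: forwards every call upstream and counts the calls.
class counting_resource final : public std::pmr::memory_resource {
public:
    explicit counting_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_{upstream} {
        assert(upstream_ != nullptr);
    }

    std::size_t allocations() const {
        return allocations_.load(std::memory_order_relaxed);
    }
    std::size_t deallocations() const {
        return deallocations_.load(std::memory_order_relaxed);
    }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        allocations_.fetch_add(1, std::memory_order_relaxed);
        return upstream_->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        deallocations_.fetch_add(1, std::memory_order_relaxed);
        upstream_->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    // Atomic counters: safe to update from several threads at once.
    std::atomic<std::size_t> allocations_{0};
    std::atomic<std::size_t> deallocations_{0};
};

// Returns true when every allocation made through the wrapper was released.
bool vector_allocations_balance() {
    counting_resource counter;
    {
        std::pmr::vector<int> v{&counter};
        v.reserve(100);  // forces at least one allocation through the wrapper
    }
    return counter.allocations() >= 1 &&
           counter.allocations() == counter.deallocations();
}
```

The same wrapper doubles as a leak check in tests: a mismatch between the two counters after a scope ends points at memory that was never returned.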
FAQ
What is the difference between PMR allocators and replacing the global allocator?
PMR allocators work at the C++ container level — each container can use a different allocation strategy. Global replacement (via LD_PRELOAD or linking) affects every allocation in the process, including C libraries and third-party code. PMR is safer for incremental optimization; global replacement is broader but can interact unexpectedly with libraries that have custom allocator expectations.
Can I use jemalloc and mimalloc together?
Not as global allocators: they both replace malloc/free, and only one can be active. However, you can call mimalloc’s own API (mi_malloc and mi_free, or the mi_stl_allocator adapter for C++ containers) for specific allocations while jemalloc handles the rest globally, or vice versa.
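One way to route only selected containers through a second allocator is a small std::pmr::memory_resource adapter over a pair of C functions. The sketch below uses the C library as a stand-in backend so it compiles without extra dependencies; the function-pointer signatures are chosen to match mimalloc's mi_malloc_aligned(size, alignment) and mi_free(ptr), on the assumption that those are what you would plug in. The names function_backed_resource, libc_aligned_alloc, and libc_free are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory_resource>
#include <new>
#include <vector>

// Signatures mirror mi_malloc_aligned(size, alignment) and mi_free(ptr).
using aligned_alloc_fn = void* (*)(std::size_t size, std::size_t alignment);
using free_fn = void (*)(void* ptr);

// Stand-in backend built on the C library (assumes std::aligned_alloc exists,
// as it does on Linux and recent macOS).
inline void* libc_aligned_alloc(std::size_t size, std::size_t alignment) {
    // Over-align small requests; some platforms reject tiny alignments.
    if (alignment < alignof(std::max_align_t)) alignment = alignof(std::max_align_t);
    // std::aligned_alloc expects a non-zero multiple of the alignment.
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    if (rounded == 0) rounded = alignment;
    return std::aligned_alloc(alignment, rounded);
}
inline void libc_free(void* ptr) { std::free(ptr); }

// Adapter: exposes any (allocate, free) function pair as a PMR resource.
class function_backed_resource final : public std::pmr::memory_resource {
public:
    function_backed_resource(aligned_alloc_fn alloc, free_fn release)
        : alloc_{alloc}, free_{release} {
        assert(alloc_ != nullptr && free_ != nullptr);
    }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = alloc_(bytes, alignment);
        if (p == nullptr) throw std::bad_alloc{};
        return p;
    }
    void do_deallocate(void* p, std::size_t, std::size_t) override { free_(p); }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    aligned_alloc_fn alloc_;
    free_fn free_;
};

// Only this vector uses the custom backend; everything else is untouched.
int sum_with_custom_backend() {
    function_backed_resource backend{libc_aligned_alloc, libc_free};
    std::pmr::vector<int> values{&backend};
    for (int i = 1; i <= 100; ++i) values.push_back(i);
    int total = 0;
    for (int value : values) total += value;
    return total;
}
```

With mimalloc linked in, constructing the adapter from mi_malloc_aligned and mi_free instead of the stand-ins should be the only change needed, since memory is always freed by the same backend that allocated it.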
Does PMR work with third-party libraries?
Only if the library’s containers use std::pmr::polymorphic_allocator (or accept an allocator template parameter). Most third-party libraries use std::allocator by default, so PMR integration requires the library to support custom allocators. This is a major reason Boost.Container includes its own PMR-aware container implementations.
Which allocator is best for a game engine?
Game engines benefit from mimalloc’s low overhead and fast single-threaded performance. For frame-based allocation, pair mimalloc as the global allocator with monotonic_buffer_resource for per-frame scratch data. This gives you both fast general allocation and zero-fragmentation arena allocation for the hot path.
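The per-frame scratch pattern can be sketched as follows. This is an illustrative loop, not engine code: simulate_frames, the 64 KiB scratch size, and the "even ids are visible" rule are all made up. The null upstream is there to show that the scratch block really is reused, since without the release() call the arena would run out after a few hundred frames and throw.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical frame loop: each frame collects the ids of "visible" entities
// into scratch memory that is recycled at the end of the frame.
std::size_t simulate_frames(int frames, int entities_per_frame) {
    // One scratch block, allocated once and reused for every frame.
    std::vector<std::byte> scratch(64 * 1024);
    std::pmr::monotonic_buffer_resource frame_arena{
        scratch.data(), scratch.size(), std::pmr::null_memory_resource()};

    std::size_t total_visible = 0;
    for (int frame = 0; frame < frames; ++frame) {
        {
            std::pmr::vector<int> visible{&frame_arena};
            assert(visible.get_allocator().resource() == &frame_arena);
            for (int id = 0; id < entities_per_frame; ++id) {
                if (id % 2 == 0) visible.push_back(id);
            }
            total_visible += visible.size();
        }  // containers must be gone before the arena is reset

        // Rewind the arena to the start of the scratch block.
        frame_arena.release();
    }
    return total_visible;
}
```

The inner scope matters: release() invalidates everything allocated from the arena, so any container still alive at that point would be left pointing at recycled memory.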
Is std::pmr production-ready?
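Yes. The PMR framework has been part of the C++ standard since C++17 and is implemented in libstdc++ (GCC 9+), libc++ (LLVM 16+), and MSVC 2019+. The monotonic_buffer_resource and synchronized_pool_resource are well-tested in production. The main limitation is coverage: the allocator-aware containers (vector, string, deque, list, forward_list, map, set, multimap, multiset, and the unordered variants) have std::pmr aliases, but container adaptors such as stack, queue, and priority_queue do not, so you have to instantiate them over a pmr container yourself.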