Introduction

Memory allocation is one of the most performance-critical aspects of C++ applications. The default new/delete operators use a general-purpose allocator that works reasonably well for most cases, but high-performance systems often suffer from allocation bottlenecks, memory fragmentation, and cache-unfriendly allocation patterns.

C++17 introduced Polymorphic Memory Resources (PMR) — a standard library framework that lets you swap allocation strategies at runtime without changing container types. Combined with industrial-grade allocators like jemalloc and mimalloc, and Boost.Container’s PMR-aware containers, you can dramatically improve allocation performance.

This article compares four approaches to custom memory allocation in C++: std::pmr, Boost.Container PMR, jemalloc, and mimalloc.

Allocator Architecture Comparison

| Feature | std::pmr (C++17) | Boost.Container | jemalloc | mimalloc |
|---|---|---|---|---|
| GitHub stars | Part of the standard | Part of Boost | 10,975 | 13,128 |
| Language | C++ standard library | C++ | C (C++ bindings) | C (C++ bindings) |
| Approach | Runtime-polymorphic allocator | PMR-compatible containers | Drop-in malloc replacement | Drop-in malloc replacement |
| Thread caching | Thread-safe pools (synchronized_pool_resource) | Delegates to backend | Per-thread cache over shared arenas | Per-thread heap |
| Fragmentation control | monotonic_buffer_resource | Through PMR backend | Size classes, dirty-page purging | Free list sharding |
| Debug support | No | No | Yes (JEMALLOC_STATS) | Yes (MI_STATS) |
| Header-only | Library component | Mostly (the PMR resources need the compiled library) | Compiled library | Compiled library |
| Portable | Yes (any C++17 compiler) | Yes | Linux/macOS/FreeBSD | Linux/macOS/Windows |

std::pmr — Standard Library Polymorphic Allocators

The C++17 PMR framework provides a set of memory resource classes that implement different allocation strategies, accessible through the std::pmr namespace.

#include <memory_resource>
#include <string>
#include <vector>

int main() {
    // Buffer-backed arena: no individual deallocations, bulk release at the end.
    // static keeps the 1 MB buffer off the stack, which may be too small for it on some platforms.
    static char buffer[1024 * 1024];
    std::pmr::monotonic_buffer_resource pool{
        buffer, sizeof(buffer),
        std::pmr::null_memory_resource()  // Throw std::bad_alloc if the buffer is exhausted
    };

    // The vector passes its allocator on to each element (uses-allocator
    // construction), so the strings use the same arena without naming &pool again.
    std::pmr::vector<std::pmr::string> items{&pool};
    items.emplace_back("first item");
    items.emplace_back("second item");
    // All allocations come from the 1 MB buffer.
    // The arena is released in one step when pool goes out of scope.
}

Key PMR resource types:

  • monotonic_buffer_resource: Append-only allocation from a pre-allocated buffer. Individual deallocations are no-ops and memory is reclaimed only when the resource is released or destroyed, which makes it ideal for request-scoped or frame-based allocation patterns.
  • unsynchronized_pool_resource: Pool allocator for single-threaded use. Groups allocations by size class for reduced fragmentation.
  • synchronized_pool_resource: Thread-safe version with internal locking. Good for general-purpose multi-threaded allocation.
  • new_delete_resource(): Wraps global new/delete — the default fallback.
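
As a rough illustration of how these resource types compose, the sketch below layers an unsynchronized_pool_resource on top of a monotonic_buffer_resource. The function name sum_with_pooled_list and the 64 KB arena size are assumptions made for this example, not anything prescribed by the standard.

```cpp
#include <cassert>
#include <list>
#include <memory_resource>
#include <numeric>

// Hypothetical helper: fills a list from a pool resource whose upstream is a
// monotonic arena, recycles half the nodes, and returns the sum of what is left.
inline long sum_with_pooled_list(int n) {
    assert(n >= 0);
    // Arena with a 64 KB initial size; it grows through new_delete_resource() if needed.
    std::pmr::monotonic_buffer_resource arena{64 * 1024, std::pmr::new_delete_resource()};
    // Pool resource: groups blocks by size class and refills itself from the arena.
    std::pmr::unsynchronized_pool_resource pool{&arena};

    std::pmr::list<int> values{&pool};
    for (int i = 1; i <= n; ++i) values.push_back(i);
    for (int i = 0; i < n / 2; ++i) values.pop_front();   // freed nodes go back to the pool
    for (int i = 0; i < n / 2; ++i) values.push_back(0);  // the pool can hand those nodes out again
    return std::accumulate(values.begin(), values.end(), 0L);
}
```

The declaration order matters: the list is destroyed before the pool, and the pool before the arena it draws from.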

Boost.Container PMR

Boost.Container ships its own implementation of the PMR framework in the boost::container::pmr namespace, usable on compilers without C++17 support, together with PMR aliases for its containers.

#include <boost/container/pmr/string.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>
#include <boost/container/pmr/vector.hpp>

int main() {
    // Boost's pool resource for single-threaded use
    boost::container::pmr::unsynchronized_pool_resource pool;

    // PMR-aware containers from Boost
    boost::container::pmr::vector<int> vec{&pool};
    vec.reserve(10000);  // Allocates through the pool resource

    boost::container::pmr::string str{"allocated from pool", &pool};
}

Boost.Container provides PMR aliases for the same container families as the standard library (vector, deque, list, map, set, string) and adds containers that have no std::pmr counterpart: flat_map, flat_set, slist, and stable_vector. This gives you type-erased allocation for virtually any data structure.

jemalloc

jemalloc is a general-purpose allocator originally written by Jason Evans for FreeBSD and later developed and used in production at Facebook. It is designed for multi-threaded server workloads and emphasizes low fragmentation and detailed statistics.

# Build from source
git clone https://github.com/jemalloc/jemalloc.git
cd jemalloc
./autogen.sh
./configure --enable-stats --enable-prof
make -j$(nproc)
sudo make install

// Link with -ljemalloc; jemalloc then replaces malloc/free for the whole process.
// Existing code needs no changes.

// Optional: query jemalloc statistics (requires a build with statistics enabled)
#include <jemalloc/jemalloc.h>
#include <cstdint>
#include <cstdio>

int main() {
    // Statistics are cached; writing to "epoch" refreshes them before reading.
    uint64_t epoch = 1;
    size_t epoch_sz = sizeof(epoch);
    mallctl("epoch", &epoch, &epoch_sz, &epoch, epoch_sz);

    size_t allocated = 0, active = 0, metadata = 0;
    size_t sz = sizeof(size_t);
    mallctl("stats.allocated", &allocated, &sz, nullptr, 0);
    mallctl("stats.active", &active, &sz, nullptr, 0);
    mallctl("stats.metadata", &metadata, &sz, nullptr, 0);
    std::printf("Allocated: %zu, Active: %zu, Metadata: %zu\n",
                allocated, active, metadata);
}

jemalloc uses multiple arenas plus per-thread caches (tcache) to reduce lock contention, size classes for efficient reuse, and an optional background thread for dirty page purging. It’s particularly strong in scenarios with many small, short-lived allocations, which is exactly the pattern seen in network servers and database engines.

mimalloc

mimalloc is Microsoft’s compact general-purpose allocator, optimized for performance and low memory overhead. Its optional secure mode adds hardening features such as guard pages and encoded free lists.

# Build from source
git clone https://github.com/microsoft/mimalloc.git
cd mimalloc
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install

// Link with -lmimalloc (or preload it) to replace malloc/free globally.
// For containers, mimalloc also ships mi_stl_allocator<T>, a std::allocator-compatible allocator.

// Or call the mimalloc API directly
#include <mimalloc.h>

int main() {
    // Allocate with mimalloc directly (independent of any global replacement)
    void* ptr = mi_malloc(1024);
    mi_free(ptr);

    // Use a mimalloc heap as an arena
    mi_heap_t* heap = mi_heap_new();
    void* p = mi_heap_malloc(heap, 4096);
    (void)p;  // ... use p ...
    mi_heap_destroy(heap);  // Frees every block still allocated from this heap
    // Note: mi_heap_delete(heap) would instead migrate live blocks to the default heap.
}

mimalloc’s key innovations include free list sharding (many small free lists, one per mimalloc page, instead of one large list per size class, which improves locality and reduces contention), eager page reset (returning unused memory to the OS quickly), and secure mode (guard pages, encoded free lists). It typically matches or beats jemalloc in single-threaded performance while using less memory, roughly 10-15% in some workloads.

Deployment Architecture

For production deployments, you have two main integration strategies:

Strategy 1: Global LD_PRELOAD Replacement

# jemalloc
LD_PRELOAD=/usr/local/lib/libjemalloc.so ./your_server

# mimalloc
LD_PRELOAD=/usr/local/lib/libmimalloc.so ./your_server

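This replaces all malloc/free calls globally in any dynamically linked program that allocates through the C library, regardless of the language it was written in. Statically linked binaries and runtimes that bring their own allocator are not affected. Use this for existing applications where you can’t change code.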

Strategy 2: Selective PMR Usage (C++)

#include <memory_resource>
#include <string>
#include <unordered_map>
#include <vector>

// Use PMR for specific performance-critical containers
std::pmr::monotonic_buffer_resource request_arena(1024 * 1024);  // 1 MB initial size
std::pmr::unordered_map<int, std::pmr::string> cache{&request_arena};

// Keep default allocator for everything else
std::vector<std::string> other_data;  // Uses default new/delete

This is the recommended approach for new C++ codebases where you can control allocation per data structure.
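
A minimal sketch of what request-scoped use might look like follows. The handler name handle_request, the label prefix, and the 64 KB arena size are hypothetical choices for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical label prefix, long enough that labels do not fit a small-string buffer.
constexpr std::string_view kLabelPrefix = "request-scoped-label-for-item-number-";

// Hypothetical request handler: builds an id -> label table inside a
// request-local arena and returns the total length of all labels.
inline std::size_t handle_request(int item_count) {
    assert(item_count >= 0);
    // 64 KB initial arena; it falls back to the default resource if a request needs more.
    std::pmr::monotonic_buffer_resource request_arena{64 * 1024};
    std::pmr::unordered_map<int, std::pmr::string> cache{&request_arena};

    std::size_t total_length = 0;
    for (int id = 0; id < item_count; ++id) {
        std::pmr::string& label = cache[id];  // created by the map with the arena allocator
        assert(label.get_allocator().resource() == &request_arena);
        label = kLabelPrefix;
        label += std::to_string(id);
        total_length += label.size();
    }
    return total_length;
}   // everything allocated for this request is released here in one step
```

Nothing in the table has to be freed piece by piece; destroying request_arena at the end of the function returns all of it.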

Performance Benchmarks and Scaling Considerations

The choice between these allocators depends heavily on your workload characteristics. In single-threaded benchmarks with many small allocations (common in parsing and serialization workloads), mimalloc typically outperforms other options significantly. Its free list sharding and aggressive inlining reduce per-allocation overhead.

For multi-threaded server workloads with high contention, jemalloc’s thread-cache eviction strategies and background page purging provide more consistent tail latency. This matters because glibc’s default allocator (ptmalloc2) can exhibit long pauses, reportedly multi-second in bad cases, during malloc_trim() when threads allocate and free in different orders and leave the heap fragmented.

The PMR framework (both std::pmr and Boost.Container) gives you fine-grained control without changing the global allocator. This is especially valuable in applications with mixed allocation patterns: you can use monotonic_buffer_resource for the hot path (in microbenchmarks it can be on the order of 10x faster than malloc for sequential allocation, because each allocation is little more than a pointer bump) while leaving cold paths with the default allocator.
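
Speed-ups of this kind depend heavily on the workload, so it is worth measuring on your own system. The following is a rough timing sketch, not a rigorous benchmark. The names time_allocations and compare_sequential_allocation and the 32-byte block size are assumptions chosen for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Times `count` allocations of `bytes` bytes followed by their deallocation.
inline std::chrono::nanoseconds time_allocations(std::pmr::memory_resource& resource,
                                                 std::size_t count, std::size_t bytes) {
    std::vector<void*> blocks;
    blocks.reserve(count);  // keep the bookkeeping vector's growth out of the timed region
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) blocks.push_back(resource.allocate(bytes));
    for (void* block : blocks) resource.deallocate(block, bytes);  // no-op for a monotonic arena
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}

struct allocation_timings {
    std::chrono::nanoseconds heap;   // new_delete_resource()
    std::chrono::nanoseconds arena;  // monotonic_buffer_resource
};

inline allocation_timings compare_sequential_allocation(std::size_t count) {
    assert(count > 0);
    constexpr std::size_t block_size = 32;
    allocation_timings result{};
    result.heap = time_allocations(*std::pmr::new_delete_resource(), count, block_size);
    // Sized so that a single upstream buffer should cover the whole run.
    std::pmr::monotonic_buffer_resource arena{count * block_size * 2};
    result.arena = time_allocations(arena, count, block_size);
    return result;
}
```

Run it several times with an optimized build and realistic counts before drawing conclusions; a single run is easily dominated by noise.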

For applications processing millions of small requests per second, combining mimalloc as the global allocator with PMR arenas for request-local data can yield 40-60% throughput improvements over glibc’s default allocator. For deep dives into memory optimization, see our memory allocator comparison guide. For container-level optimizations that pair well with custom allocators, check our small vector optimization comparison and high-performance hash container guide.

CI/CD Integration and Testing

Integrating custom allocators into your continuous integration pipeline requires careful attention to both build configuration and runtime testing. Here’s how to set up proper validation for your chosen allocator strategy.

For jemalloc and mimalloc, linking is straightforward: add -ljemalloc or -lmimalloc to your linker flags and, for dynamically linked builds, verify with ldd that the correct shared library is loaded:

# Verify which allocator is linked
ldd ./your_binary | grep -E "jemalloc|mimalloc"
# Should show the path to libjemalloc.so or libmimalloc.so

For runtime verification, both allocators provide environment variables for statistics:

# jemalloc stats
MALLOC_CONF="stats_print:true" ./your_binary 2>&1 | grep -A5 "Allocated"

# mimalloc stats
MIMALLOC_SHOW_STATS=1 ./your_binary

For PMR-based code, unit testing allocation strategies is essential. Use std::pmr::monotonic_buffer_resource with a fixed buffer and std::pmr::null_memory_resource() as its upstream during tests to catch accidental heap allocations in performance-critical paths. With that upstream, an overflowing buffer throws std::bad_alloc instead of silently falling back to heap allocation, which catches regressions immediately in CI rather than in production. Without it, the arena falls back to the default resource and the overflow goes unnoticed.
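
One way such a test could be written is sketched below. The helper name fits_in_fixed_buffer and the buffer sizes are hypothetical, and a real test would call the code under test instead of filling a vector.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical check: returns true if filling a vector with `count` ints stays
// inside a fixed buffer of `Capacity` bytes, false if it would have needed more.
template <std::size_t Capacity>
bool fits_in_fixed_buffer(std::size_t count) {
    alignas(std::max_align_t) std::array<std::byte, Capacity> buffer;
    std::pmr::monotonic_buffer_resource arena{
        buffer.data(), buffer.size(),
        std::pmr::null_memory_resource()};  // upstream that always throws std::bad_alloc
    try {
        std::pmr::vector<int> values{&arena};
        values.reserve(count);  // one request of count * sizeof(int) bytes
        for (std::size_t i = 0; i < count; ++i) values.push_back(static_cast<int>(i));
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // the code under test asked for more than the budget
    }
}
```

A test can then assert that the hot path fits its budget, for example that fits_in_fixed_buffer<1024>(100) holds while a much larger count does not.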

Integration testing should also verify thread safety. When using synchronized_pool_resource or global jemalloc/mimalloc replacement, run your test suite under Thread Sanitizer (-fsanitize=thread) to catch data races in allocation paths. Both jemalloc and mimalloc are thoroughly tested for thread safety, but custom PMR allocator wrappers may introduce races if you modify shared state without synchronization.
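
To make the point about custom wrappers concrete, here is a sketch of a statistics-gathering resource that keeps its shared state in atomics. The class counting_resource and the helper allocations_for_reserve are hypothetical names, not part of std::pmr, and the wrapper is only as thread-safe as the upstream resource it forwards to.

```cpp
#include <atomic>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical wrapper: forwards to an upstream resource and counts usage.
// The counters are atomic, so concurrent callers do not race on them.
class counting_resource final : public std::pmr::memory_resource {
public:
    explicit counting_resource(std::pmr::memory_resource* upstream)
        : upstream_{upstream} {}

    std::size_t allocation_count() const noexcept {
        return allocations_.load(std::memory_order_relaxed);
    }
    std::size_t bytes_in_use() const noexcept {
        return bytes_.load(std::memory_order_relaxed);
    }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* block = upstream_->allocate(bytes, alignment);  // may throw; counters stay unchanged
        allocations_.fetch_add(1, std::memory_order_relaxed);
        bytes_.fetch_add(bytes, std::memory_order_relaxed);
        return block;
    }
    void do_deallocate(void* block, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(block, bytes, alignment);
        bytes_.fetch_sub(bytes, std::memory_order_relaxed);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::atomic<std::size_t> allocations_{0};
    std::atomic<std::size_t> bytes_{0};
};

// Usage sketch: how many upstream allocations does a single reserve() cost?
inline std::size_t allocations_for_reserve(std::size_t n) {
    counting_resource counter{std::pmr::new_delete_resource()};
    std::pmr::vector<int> values{&counter};
    values.reserve(n);
    return counter.allocation_count();
}
```

Replacing the atomics with plain integers would compile and pass single-threaded tests, which is the kind of race Thread Sanitizer is meant to flag.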

FAQ

What is the difference between PMR allocators and replacing the global allocator?

PMR allocators work at the C++ container level — each container can use a different allocation strategy. Global replacement (via LD_PRELOAD or linking) affects every allocation in the process, including C libraries and third-party code. PMR is safer for incremental optimization; global replacement is broader but can interact unexpectedly with libraries that have custom allocator expectations.

Can I use jemalloc and mimalloc together?

Not as global allocators: they both replace malloc/free and only one can be active. However, you can call mimalloc’s own functions (mi_malloc, mi_free) directly for specific allocations while jemalloc handles the rest globally, or vice versa.

Does PMR work with third-party libraries?

Only if the library’s containers use std::pmr::polymorphic_allocator (or accept an allocator template parameter). Most third-party libraries use std::allocator by default, so PMR integration requires the library to support custom allocators. This is a major reason Boost.Container includes its own PMR-aware container implementations.
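
The sketch below shows the favorable case, where an API is generic over its allocator. Here make_squares is a hypothetical stand-in for a third-party function, and sum_of_squares_in_arena is a hypothetical caller that plugs in a polymorphic allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <memory_resource>
#include <vector>

// Stand-in for a third-party API that accepts an allocator template parameter.
template <class Alloc = std::allocator<int>>
std::vector<int, Alloc> make_squares(int n, const Alloc& alloc = Alloc{}) {
    assert(n >= 0);
    std::vector<int, Alloc> out{alloc};
    out.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) out.push_back(i * i);
    return out;
}

// Caller side: pass a polymorphic allocator backed by a local arena.
inline int sum_of_squares_in_arena(int n) {
    std::pmr::monotonic_buffer_resource arena{4096};
    std::pmr::polymorphic_allocator<int> alloc{&arena};
    auto squares = make_squares(n, alloc);  // a std::vector<int, polymorphic_allocator<int>>
    int sum = 0;
    for (int value : squares) sum += value;
    return sum;
}
```

A library that hard-codes std::vector<int> in its interface offers no such hook, and its allocations stay with the global allocator.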

Which allocator is best for a game engine?

Game engines benefit from mimalloc’s low overhead and fast single-threaded performance. For frame-based allocation, pair mimalloc as the global allocator with monotonic_buffer_resource for per-frame scratch data. This gives you both fast general allocation and zero-fragmentation arena allocation for the hot path.
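
A per-frame scratch arena could look roughly like the sketch below. The function run_frames, the 64 KB scratch size, and the "every second entity is visible" rule are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical frame loop: one scratch buffer is reused for every frame.
inline std::size_t run_frames(int frame_count, int entities_per_frame) {
    assert(frame_count >= 0 && entities_per_frame >= 0);
    alignas(std::max_align_t) std::array<std::byte, 64 * 1024> scratch;
    std::pmr::monotonic_buffer_resource frame_arena{
        scratch.data(), scratch.size(),
        std::pmr::null_memory_resource()};  // never touch the heap for scratch data

    std::size_t total_visible = 0;
    for (int frame = 0; frame < frame_count; ++frame) {
        {
            std::pmr::vector<int> visible{&frame_arena};
            visible.reserve(static_cast<std::size_t>(entities_per_frame));
            for (int id = 0; id < entities_per_frame; ++id) {
                if (id % 2 == 0) visible.push_back(id);  // placeholder visibility test
            }
            total_visible += visible.size();
        }  // scratch containers must be destroyed before the arena is reset
        frame_arena.release();  // rewind to the start of the scratch buffer
    }
    return total_visible;
}
```

Because the upstream is null_memory_resource(), the loop only survives many frames if release() really makes the buffer available again; a frame that outgrows the budget throws std::bad_alloc instead of spilling to the heap.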

Is std::pmr production-ready?

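Yes. The PMR framework has been part of the C++ standard since C++17 and is implemented in GCC 9+ (libstdc++), Clang 16+ (libc++), and MSVC 2019+. The monotonic_buffer_resource and synchronized_pool_resource are well-tested in production. Every allocator-aware standard container has a std::pmr alias: vector, deque, list, forward_list, string, and the map, set, unordered_map, and unordered_set families, including the multi variants. The main limitation is that container adaptors (stack, queue, priority_queue) have no aliases of their own and must be declared over a std::pmr container.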
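
Container adaptors such as std::stack have no std::pmr alias, but they can be declared over a PMR container. A minimal sketch follows, with sum_with_pmr_stack as a hypothetical helper name.

```cpp
#include <cassert>
#include <memory_resource>
#include <stack>
#include <vector>

// Hypothetical helper: a std::stack whose underlying vector allocates from an arena.
inline int sum_with_pmr_stack(int n) {
    assert(n >= 0);
    std::pmr::monotonic_buffer_resource arena{1024};
    std::pmr::polymorphic_allocator<int> alloc{&arena};
    // The adaptor forwards the allocator to its underlying std::pmr::vector.
    std::stack<int, std::pmr::vector<int>> pending{alloc};

    for (int i = 1; i <= n; ++i) pending.push(i);
    int sum = 0;
    while (!pending.empty()) {
        sum += pending.top();
        pending.pop();
    }
    return sum;
}
```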

