Introduction

Transparent Hugepages (THP) is one of the most impactful yet misunderstood Linux kernel features for server performance. When enabled, the kernel automatically backs eligible memory with 2MB hugepages instead of standard 4KB pages. Each 2MB page replaces 512 base pages, so a single TLB (Translation Lookaside Buffer) entry covers 512 times more memory and TLB misses drop accordingly for mapped regions. (1GB pages on x86_64 are available through HugeTLB, not through THP for anonymous memory.) For database servers, in-memory caches, and virtualization workloads, this can mean 5-20% performance improvements with zero application changes, although the gain depends heavily on the workload.

However, THP comes with tradeoffs: memory compaction overhead, increased allocation latency, and potential memory fragmentation. This guide compares three approaches to managing Transparent Hugepages: hugeadm (from the libhugetlbfs project), the kernel’s built-in khugepaged daemon, and direct sysfs/procfs tuning. We cover configuration, monitoring, workload-specific recommendations, and troubleshooting for self-hosted server environments.

Comparison Table

| Feature | hugeadm (libhugetlbfs) | khugepaged (Kernel) | Direct sysfs Tuning |
|---|---|---|---|
| Primary Purpose | Pool management and reporting | Automatic THP promotion | Runtime THP policy control |
| Author | libhugetlbfs project | Linux kernel (Andrea Arcangeli) | Linux kernel |
| Language | C | C (kernel) | Shell / config files |
| Active Since | 2005 | Kernel 2.6.38 (2011) | Always available |
| Distribution | Bundled in distros | Part of kernel | N/A |
| Interface | CLI (hugeadm) | Kernel thread (automatic) | /sys/kernel/mm/transparent_hugepage/ |
| Requires Root | Yes | N/A (kernel daemon) | Yes (for writing) |
| Pool Reporting | Yes (detailed) | Limited (/proc/meminfo) | Limited |
| Reservation | Yes (HugeTLB pools) | No (THP is transparent) | No |
| Fragmentation Info | Yes (--page-sizes) | No | Via /proc/buddyinfo |
| Runtime Disable | Yes (--thp-never) | No | Yes (via sysfs) |
| Persistence | No (runtime only) | No | Via kernel command line, systemd unit, or tmpfiles.d |
| Best For | Diagnostics and reporting | Default THP behavior | Production tuning and policy |

khugepaged: The Automatic Promotion Engine

Khugepaged is the kernel thread responsible for scanning memory and collapsing eligible 4KB pages into 2MB hugepages. It runs automatically when THP is set to always or madvise mode.

How khugepaged Works

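Khugepaged periodically scans the address space of processes for 2MB-aligned virtual ranges that are still backed by 4KB pages. When it finds a candidate range (512 base pages, of which up to max_ptes_none may be unmapped), it allocates a free 2MB page, copies the existing contents into it, and remaps the range as a single hugepage, transparently to the application. The small pages do not need to be physically contiguous beforehand, but a free contiguous 2MB block must be available. The thread runs at low priority to avoid CPU contention.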
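The arithmetic behind the 512-page figure and the 2MB alignment requirement can be sketched in shell. This is an illustration only: the function name hugepage_base and the sample address are assumptions made for the example, and the sizes are the x86_64 defaults.

```shell
#!/bin/sh
# Illustrative arithmetic only; sizes are the x86_64 defaults.
BASE_PAGE=4096                     # 4KB base page
HUGE_PAGE=$((2 * 1024 * 1024))     # 2MB PMD-sized hugepage

# Number of base pages that one hugepage replaces.
echo "base pages per hugepage: $((HUGE_PAGE / BASE_PAGE))"

# hugepage_base ADDR (hypothetical helper): round a virtual address
# (decimal or 0x-prefixed hex) down to the start of the 2MB-aligned
# region that contains it.
hugepage_base() {
    printf '0x%x\n' $(( $1 / HUGE_PAGE * HUGE_PAGE ))
}

hugepage_base 0x7f0000345000
```

Only ranges that start on such a boundary and span the full 2MB are candidates for collapse, which is why small or oddly aligned mappings never show up in AnonHugePages.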

Monitoring khugepaged Activity

# Check current THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# Output: always [madvise] never

# View allocation stats
grep thp /proc/vmstat
# thp_fault_alloc: number of hugepages allocated at page fault time
# thp_collapse_alloc: number of hugepages created by khugepaged
# thp_swpout: hugepages swapped out (bad — indicates memory pressure)
# thp_swpout_fallback: normal pages swapped because THP swap failed

# Monitor khugepaged scan rate
grep -H "" /sys/kernel/mm/transparent_hugepage/khugepaged/*

Key khugepaged tunables in /sys/kernel/mm/transparent_hugepage/khugepaged/:

| Parameter | Default | Description |
|---|---|---|
| scan_sleep_millisecs | 10000 | Time khugepaged sleeps between scans (10s) |
| alloc_sleep_millisecs | 60000 | Sleep after a failed allocation (60s) |
| pages_to_scan | 4096 | Pages to scan per wakeup |
| defrag | 1 | Enable defragmentation on allocation failure |
| max_ptes_none | 511 | Max non-present PTEs in range (higher = more aggressive) |
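To get a feel for what pages_to_scan and scan_sleep_millisecs mean in practice, the following sketch estimates how long one full pass over a given amount of eligible memory takes. The function name full_scan_seconds is hypothetical, and the model is deliberately simplified: it counts only the sleep between wakeups and ignores the scan's own CPU time and any alloc_sleep_millisecs back-off.

```shell
#!/bin/sh
# full_scan_seconds MEM_GIB PAGES_TO_SCAN SCAN_SLEEP_MS  (hypothetical helper)
# Rough lower bound on the time khugepaged needs to walk MEM_GIB of
# 4KB-backed memory: each wakeup scans PAGES_TO_SCAN pages, then sleeps.
full_scan_seconds() {
    pages=$(( $1 * 1024 * 1024 / 4 ))       # GiB -> number of 4KB pages
    wakeups=$(( (pages + $2 - 1) / $2 ))    # round up to whole wakeups
    echo $(( wakeups * $3 / 1000 ))
}

echo "defaults, 64 GiB: $(full_scan_seconds 64 4096 10000) s"
echo "tuned,    64 GiB: $(full_scan_seconds 64 8192 5000) s"
```

Under these assumptions the defaults need roughly 11 hours to cover 64 GiB, which is why long-running servers with large heaps are the usual candidates for more aggressive values.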

Tuning for Database Workloads

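PostgreSQL, MySQL, and MongoDB can benefit significantly from THP, although vendor guidance varies by product and version, so benchmark before and after. For database servers where THP helps, increase scan aggressiveness: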

# Make khugepaged more aggressive for 24/7 database server
echo 5000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
echo 8192 > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
# 0 = retry immediately after a failed hugepage allocation (costs CPU when memory is fragmented)
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs

To make these settings persistent, note that khugepaged parameters are sysfs files and are not controllable via sysctl, so a file under /etc/sysctl.d will not work. Create a systemd service instead:

[Unit]
Description=Tune THP for database workloads
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo 5000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs && echo 8192 > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

When to Disable THP

Some workloads suffer from THP. Redis, MongoDB (older versions), and Oracle Database have documented recommendations to disable THP:

# Disable THP immediately
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Make persistent (GRUB kernel command line)
# Add transparent_hugepage=never to GRUB_CMDLINE_LINUX in /etc/default/grub
# Then: update-grub (Debian/Ubuntu) or grub2-mkconfig -o /boot/grub2/grub.cfg (RHEL/Fedora), and reboot

hugeadm: Diagnostics and Reporting

The hugeadm command from the libhugetlbfs project provides comprehensive hugepage pool reporting and management. While primarily designed for the older HugeTLB reservation system, its diagnostic capabilities are invaluable for THP troubleshooting.

Installation

# Debian/Ubuntu
sudo apt install libhugetlbfs-bin

# RHEL/Fedora
sudo dnf install libhugetlbfs-utils

Pool Diagnostics

# Show all hugepage pool sizes (default, huge, gigantic)
sudo hugeadm --pool-list

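# List every page size the hardware supports (--page-sizes lists only sizes with a configured pool)
sudo hugeadm --page-sizes-all

# Show HugeTLB pool size per NUMA node (read from sysfs)
grep -H "" /sys/devices/system/node/node*/hugepages/hugepages-*/nr_hugepages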

# Display current pool configuration
sudo hugeadm --explain

Example output:

      Size  Minimum  Current  Maximum  Default
   2097152        0        0        0        *
1073741824        0        0        0

Memory Mapping Diagnostics

Check whether a running process is using hugepages:

# Summarize AnonHugePages per process, showing only non-zero entries
# (smaps_rollup requires kernel 4.14+; root is needed to read other users' processes)
sudo grep -H AnonHugePages /proc/[0-9]*/smaps_rollup 2>/dev/null | awk '$2 > 0'

Monitoring with /proc/meminfo

The kernel exposes THP statistics in /proc/meminfo:

grep -i huge /proc/meminfo

Key fields:

| Field | Meaning |
|---|---|
| AnonHugePages | Anonymous transparent hugepages currently in use |
| ShmemHugePages | Shared memory (tmpfs) hugepages |
| FileHugePages | File-backed (page cache) hugepages |
| HugePages_Total | Pre-allocated HugeTLB pool pages |
| HugePages_Free | Free HugeTLB pages |
| Hugepagesize | HugeTLB page size (typically 2048 KB) |
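A single number is often easier to track than raw fields. The sketch below computes the share of anonymous memory that is currently backed by THP. The function name thp_coverage is hypothetical, and the calculation assumes that AnonPages already includes the memory reported under AnonHugePages, so treat the result as an approximation.

```shell
#!/bin/sh
# thp_coverage (hypothetical helper): read meminfo-format text on standard
# input and print AnonHugePages as a percentage of AnonPages.
thp_coverage() {
    awk '/^AnonPages:/     { anon = $2 }
         /^AnonHugePages:/ { huge = $2 }
         END { if (anon > 0) printf "%.1f\n", 100 * huge / anon
               else print "n/a" }'
}

# Sample input; on a live system run: thp_coverage < /proc/meminfo
thp_coverage <<'EOF'
AnonPages:       4194304 kB
AnonHugePages:   1048576 kB
EOF
```

A value that stays near zero under THP=always usually points at fragmentation or at mappings that are too small or misaligned to be promoted.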

Direct sysfs Tuning: Production Control

For production environments, direct sysfs control provides the most flexible and precise THP management.

THP System-Wide Modes

# Check current mode
cat /sys/kernel/mm/transparent_hugepage/enabled
# always [madvise] never

# Switch modes at runtime
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled

The three modes:

| Mode | Behavior | Best For |
|---|---|---|
| always | Hugepages are allocated at page fault time and by khugepaged for all eligible mappings | Database servers, JVM, VMs |
| madvise | Only promote if application requests via madvise() | Mixed workloads, Redis, Node.js |
| never | THP completely disabled | Real-time systems, Redis, Oracle |
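Scripts usually need the active mode rather than the whole selector line. A minimal sketch, assuming the bracket convention shown above (the function name thp_active_mode is hypothetical):

```shell
#!/bin/sh
# thp_active_mode (hypothetical helper): print the bracketed (active) value
# from a sysfs selector line such as "always [madvise] never", read from
# standard input.
thp_active_mode() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Sample input; on a live system run:
#   thp_active_mode < /sys/kernel/mm/transparent_hugepage/enabled
echo 'always [madvise] never' | thp_active_mode
```

The same helper works for the defrag file, because it uses the same bracket notation.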

Application-Specific THP (madvise)

With madvise mode, applications opt in to hugepages:

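// C: request hugepages for an anonymous mapping
#include <stdio.h>
#include <sys/mman.h>
int main(void) {
    size_t size = 64UL << 20;  /* 64MB, a multiple of the 2MB hugepage size */
    void *addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }
    if (madvise(addr, size, MADV_HUGEPAGE) != 0) perror("madvise");  /* non-fatal: 4KB pages still work */
    return 0;
}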

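For applications that cannot be modified, use runtime flags or environment variables:

# JVM: explicitly request hugepages
java -XX:+UseTransparentHugePages -jar app.jar

# Python: use libhugetlbfs preload. Note that HUGETLB_MORECORE=yes backs the
# heap with reserved HugeTLB pages rather than THP, so a HugeTLB pool must exist.
LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes python3 app.py

# glibc 2.35+: ask malloc to madvise() its memory for THP
GLIBC_TUNABLES=glibc.malloc.hugetlb=1 python3 app.py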

Defragmentation Control

THP allocation may fail if memory is fragmented. Control defrag behavior:

# Check defrag setting
cat /sys/kernel/mm/transparent_hugepage/defrag
# always defer defer+madvise [madvise] never   (brackets mark the active value)

# Aggressive defrag for database servers
echo always > /sys/kernel/mm/transparent_hugepage/defrag

# Deferred defrag (lighter, lower latency impact)
echo defer > /sys/kernel/mm/transparent_hugepage/defrag

# No defrag for latency-sensitive workloads
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Shmem (tmpfs) Hugepages

Since kernel 4.8, tmpfs mounts can use hugepages:

# Mount tmpfs with hugepage support
mount -t tmpfs -o huge=always tmpfs /mnt/hugetmp

# Or in /etc/fstab:
# tmpfs /dev/shm tmpfs rw,nodev,nosuid,noexec,huge=always 0 0

Persistent Configuration

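The enabled and defrag settings are sysfs files, not sysctl keys, so /etc/sysctl.d cannot set them. For persistence across reboots, either pass transparent_hugepage=madvise on the kernel command line (this covers enabled only) or let systemd-tmpfiles write the sysfs files at boot. Keep khugepaged tuning in a systemd service as described earlier.

# /etc/tmpfiles.d/thp.conf
w /sys/kernel/mm/transparent_hugepage/enabled - - - - madvise
w /sys/kernel/mm/transparent_hugepage/defrag  - - - - defer

Apply:

sudo systemd-tmpfiles --create /etc/tmpfiles.d/thp.conf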

Workload-Specific Recommendations

Database Servers (PostgreSQL, MySQL, MongoDB 8.0+)

THP: always
Defrag: always or defer
Khugepaged: more aggressive (scan_sleep = 5000ms, pages_to_scan = 8192)

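Databases have large, long-lived memory allocations that benefit from reduced TLB pressure. Note that PostgreSQL's huge_pages = on setting backs shared memory with explicitly reserved HugeTLB pages rather than THP; THP=always then covers the remaining anonymous memory. Gains of 5-15% in OLTP throughput are commonly cited, but the result is workload-dependent, so benchmark before committing.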

Redis and In-Memory Stores

THP: never (or madvise)
Defrag: never

Redis documentation explicitly recommends disabling THP. Fork-based persistence (RDB snapshots, AOF rewrites) relies on copy-on-write, and with THP a write to a 2MB hugepage can force the kernel to copy or split the whole hugepage rather than a single 4KB page, which inflates memory use and causes latency spikes while the child process is running. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled before starting Redis.
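The cost of copy-on-write at hugepage granularity is easy to underestimate. The sketch below is a worst-case back-of-the-envelope model, not a measurement: it assumes every write during a snapshot lands on a different page and that the kernel copies the entire page on each copy-on-write fault. The function name cow_copy_mib and the figure of 10000 writes are illustrative assumptions.

```shell
#!/bin/sh
# cow_copy_mib WRITES PAGE_KIB  (hypothetical helper)
# Worst-case memory copied, in MiB, when WRITES scattered writes each hit
# a different page of PAGE_KIB kilobytes while a forked child is alive.
cow_copy_mib() {
    echo $(( $1 * $2 / 1024 ))
}

echo "4KB pages: $(cow_copy_mib 10000 4) MiB"
echo "2MB pages: $(cow_copy_mib 10000 2048) MiB"
```

In practice the total is capped by the size of the dataset, and kernels differ in how they handle copy-on-write on hugepages, but the 512x gap between the two page sizes is what drives the recommendation.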

Java Virtual Machines

THP: always (with -XX:+UseTransparentHugePages)
Defrag: defer

Modern JVMs (OpenJDK 8+) support THP natively. The heap is allocated as a single large contiguous region — perfect for hugepage promotion. Enable with -XX:+UseTransparentHugePages and verify with grep AnonHugePages /proc/<pid>/smaps.

Virtualization (KVM/QEMU, LXC)

THP: always
Defrag: always
Khugepaged: default settings

Virtual machine memory is allocated as large, aligned chunks. THP reduces EPT (Extended Page Table) overhead for nested paging, improving VM density and performance. QEMU automatically uses THP when available.

Real-Time and Low-Latency Workloads

THP: never
Defrag: never

Real-time applications cannot tolerate the latency spikes from memory compaction during khugepaged runs. Use transparent_hugepage=never on the kernel command line and allocate static HugeTLB pages if large page support is needed.

Why Self-Host Your Memory Management Tuning?

Memory management tuning is perhaps the single highest-ROI optimization you can make on a self-hosted server:

Zero hardware cost: Unlike adding RAM, tuning THP costs nothing. A database server with THP properly configured can serve up to 20% more queries per second in favorable cases, which is equivalent to a free hardware upgrade. Our Linux hugepages management guide covers the older HugeTLB system for applications that need explicitly reserved hugepages.

Database performance multiplier: OLTP databases show the most dramatic THP benefits because their large, randomly accessed working sets cause frequent TLB misses, each of which triggers a hardware page table walk. Covering a 2MB region with 1 TLB entry instead of 512 directly translates to higher throughput at the same CPU utilization.

Container density: Each container’s memory mappings consume TLB entries. THP reduces per-container TLB pressure, allowing you to run more containers per host without performance degradation. For container-specific memory tuning, see our KSM memory deduplication guide.

NUMA-aware hugepages: On multi-socket servers, THP + NUMA awareness ensures that processes access hugepages on their local memory node, avoiding cross-socket latency penalties that can halve memory bandwidth.

FAQ

How do I know if THP is helping or hurting my workload?

Measure with a before-and-after benchmark. Disable THP (echo never > .../enabled), run your workload for 10 minutes, re-enable THP, and compare. Monitor /proc/vmstat for thp_fault_alloc (successful allocations) vs thp_fault_fallback (failed allocations). If thp_fault_fallback is high (>10% of allocations), THP is struggling — consider madvise mode instead of always. If you see many compact_stall events in /proc/vmstat, defragmentation is causing latency — switch to defer or never defrag.
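The 10% threshold can be checked with a few lines of awk. This is a minimal sketch: the function name thp_fallback_pct is hypothetical, and the sample counters are invented for illustration.

```shell
#!/bin/sh
# thp_fallback_pct (hypothetical helper): read vmstat-format text on standard
# input and print the share of THP page-fault attempts that fell back to
# 4KB pages, in percent.
thp_fallback_pct() {
    awk '$1 == "thp_fault_alloc"    { ok = $2 }
         $1 == "thp_fault_fallback" { fb = $2 }
         END { total = ok + fb
               if (total > 0) printf "%.1f\n", 100 * fb / total
               else print "n/a" }'
}

# Sample input; on a live system run: thp_fallback_pct < /proc/vmstat
thp_fallback_pct <<'EOF'
thp_fault_alloc 9000
thp_fault_fallback 1000
thp_collapse_alloc 120
EOF
```

The counters in /proc/vmstat are cumulative since boot, so for a before-and-after comparison it is more meaningful to save the file at the start and end of each run and compute the ratio on the differences.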

What is the difference between HugeTLB and Transparent Hugepages?

HugeTLB is the older, explicit hugepage system: you pre-allocate a pool of hugepages at boot (hugepages=N kernel parameter), and applications must explicitly request them via mmap() with MAP_HUGETLB or shared memory. THP is transparent — the kernel automatically promotes eligible 4KB pages to 2MB without application changes. HugeTLB guarantees availability (pages are reserved); THP is best-effort (pages may not be available if memory is fragmented). For workloads that MUST have hugepages (DPDK, VMs with passthrough), use HugeTLB. For general server workloads, use THP.

Can I use 1GB hugepages with THP?

Not for ordinary anonymous memory. On x86_64, THP works at the PMD level, which is 2MB; /sys/kernel/mm/transparent_hugepage/hpage_pmd_size reports that size in bytes (2097152). 1GB pages are available through HugeTLB instead: the CPU must have the pdpe1gb flag (check with grep pdpe1gb /proc/cpuinfo), and the pages should be reserved at boot with the kernel command line parameters hugepagesz=1G hugepages=N, because a contiguous 1GB block is very hard to find once memory is fragmented. They are primarily useful for virtual machine memory backed by hugepages.

What happens when memory is fragmented and THP cannot allocate?

When the kernel cannot find a free, contiguous, 2MB-aligned block of physical memory for a hugepage, it may trigger compaction, depending on the defrag setting. Compaction moves pages around to create contiguous regions. This has a CPU cost and a (usually small) latency impact. If compaction fails, the allocation falls back to 4KB pages transparently: the application continues working correctly but without the TLB benefit. The thp_fault_fallback counter in /proc/vmstat tracks fallbacks at page fault time, and thp_collapse_alloc_failed tracks failed khugepaged collapses.
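Fragmentation can be inspected directly in /proc/buddyinfo, which lists free blocks per allocation order. The sketch below counts how many 2MB hugepages each zone could satisfy without compaction. It assumes the common x86_64 layout with 4KB base pages and eleven order columns (orders 0 to 10); the function name free_2mb_blocks and the sample figures are illustrative.

```shell
#!/bin/sh
# free_2mb_blocks (hypothetical helper): read buddyinfo-format text on
# standard input and print, per zone, how many 2MB hugepages the free lists
# could satisfy right now. The columns after the zone name are free-block
# counts for orders 0..10; with 4KB base pages an order-9 block is 2MB and
# an order-10 block is 4MB (two hugepages).
free_2mb_blocks() {
    awk '$1 == "Node" && NF >= 15 {
             node = $2; sub(/,$/, "", node)
             printf "node %s zone %s: %d\n", node, $4, $14 + 2 * $15
         }'
}

# Sample input; on a live system run: free_2mb_blocks < /proc/buddyinfo
free_2mb_blocks <<'EOF'
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone   Normal   4021   2210    987    312     95     20      6      2      1      0      0
EOF
```

A Normal zone with plenty of low-order blocks but zeros in the last two columns, as in the sample, is the typical picture when thp_fault_fallback keeps climbing.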

How do I troubleshoot high compact_stall counts?

High compact_stall means the kernel is spending significant time on memory compaction. Solutions:

  1. Switch THP defrag to defer or madvise — less aggressive compaction
  2. Increase khugepaged/alloc_sleep_millisecs — longer wait between retries
  3. Reduce memory pressure — add more RAM, reduce workload memory usage
  4. Switch to madvise THP mode — only promote when explicitly requested
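  5. Trigger compaction manually during quiet periods with echo 1 > /proc/sys/vm/compact_memory; drop_caches can also free reclaimable page cache and slab memory, but it discards useful cache and should not be run routinely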

For advanced memory analysis, see our Linux memory reclaim guide.

