Infrastructure

Memory Tuning for Low-Latency: The THP Trap and HugePage Mastery

How Transparent Huge Pages cause unpredictable latency spikes, and the explicit HugePage reservation strategy that eliminates memory stalls.

3 min
#memory #hugepages #thp #numa #latency #kernel

A trading engine at a crypto exchange exhibited a perfect 50µs P99, until memory demand crossed 70% of available RAM. Then P99 spiked to 15ms.

The root cause: Transparent Huge Pages (THP). The kernel was “helpfully” defragmenting memory in the background, stealing CPU cycles from the trading thread.

This post documents how to disable THP and pre-allocate explicit HugePages for deterministic memory access.

1. The Physics of Page Tables

When your application accesses memory, the CPU must translate a virtual address to a physical address. This translation uses the Translation Lookaside Buffer (TLB), a hardware cache.

A TLB miss is expensive: 10-100 CPU cycles.
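
You don't have to take the miss rate on faith: perf exposes dTLB counters, so you can measure how often a process actually pays that cost. A minimal sketch (the generic x86 event aliases and the target PID are placeholders for your setup):

perf stat -e dTLB-loads,dTLB-load-misses -p <trading_pid> -- sleep 10
# A high dTLB-load-misses / dTLB-loads ratio is the signal that larger pages will help.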

The THP Trap

THP attempts to reduce TLB misses by using 2MB pages instead of 4KB pages. In theory, this is good. In practice, the compaction daemon (khugepaged) runs in the kernel, consuming CPU and holding spinlocks while it defragments memory.

App Allocates → THP Enabled? → khugepaged Runs → Spinlocks (1-15ms) → App Stalls
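
You can watch this chain on a live box: khugepaged's CPU time and the kernel's THP/compaction counters are exposed through standard /proc paths (pidstat comes from the sysstat package):

pidstat -p "$(pgrep khugepaged)" 1           # CPU burned by the compaction daemon
grep -E 'thp_|compact_stall' /proc/vmstat    # THP collapses and direct-compaction stalls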

2. The Decision Matrix

Approach | TLB Pressure | Compaction Stalls | Verdict
A. THP Enabled (Default) | Low | High (Bad) | Unpredictable. Rejected.
B. THP Disabled, No HugePages | High | None | Stable but slower.
C. THP Disabled + Explicit HugePages | Low | None | Selected. Best of both worlds.

3. The Kill: Explicit HugePage Reservation

Step 1: Disable THP Globally

# /etc/default/grub
# Append transparent_hugepage=never to the existing GRUB_CMDLINE_LINUX value; don't drop other parameters.
GRUB_CMDLINE_LINUX="transparent_hugepage=never"

sudo update-grub && sudo reboot

Verification:

cat /sys/kernel/mm/transparent_hugepage/enabled
# Expected: always madvise [never]
#           The brackets should surround 'never'.
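
If you cannot reboot right away, THP can also be switched off at runtime for testing. This does not survive a reboot, which is why the GRUB change above is the durable fix:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag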

Step 2: Reserve Explicit HugePages at Boot

Calculate how many 2MB pages you need. For a 32GB heap: 32 * 1024 MB / 2 MB per page = 16384 pages

# /etc/sysctl.conf
vm.nr_hugepages = 16384

sudo sysctl -p
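
Verify the reservation actually landed. On a machine that has been up for a while, the kernel may grant fewer pages than requested because physical memory is already fragmented; in that case, reboot with the setting in place so the reservation happens early:

grep -E 'HugePages_(Total|Free)|Hugepagesize' /proc/meminfo
# Expect HugePages_Total: 16384; anything lower means the reservation was only partially satisfied.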

Step 3: Use HugePages in Your Application

Java (JVM):

java -XX:+UseLargePages -XX:LargePageSizeInBytes=2m -jar myapp.jar
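
Depending on your restart budget, pre-touching the heap at startup is a common companion flag (my addition, not part of the original command), so page faults are taken at boot rather than on the trading path:

# Assumes you can afford a slower JVM start in exchange for a quieter steady state
java -XX:+UseLargePages -XX:LargePageSizeInBytes=2m -XX:+AlwaysPreTouch -jar myapp.jar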

C/Rust (mmap):

/* size must be a multiple of the 2MB huge page size */
void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (ptr == MAP_FAILED) { /* pool exhausted or not reserved: fall back or abort */ }
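
A successful MAP_HUGETLB mapping shows up immediately in the kernel's accounting, which is a cheap way to confirm the allocation really came from the explicit pool:

grep -E 'HugePages_(Rsvd|Free)' /proc/meminfo
# HugePages_Rsvd rises when the mapping is created; HugePages_Free drops as the pages are touched.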

4. The Tool: Auditing HugePage State

pip install latency-audit && latency-audit --check memory

This checks that THP is disabled and that vm.nr_hugepages is set to a non-zero value.
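
If you would rather audit by hand, or cannot install packages on the host, the same two checks reduce to:

cat /sys/kernel/mm/transparent_hugepage/enabled   # expect: always madvise [never]
sysctl vm.nr_hugepages                            # expect a non-zero value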

5. Systems Thinking: The Trade-offs

  1. Memory Fragmentation: Explicit HugePages reserve physical memory at boot. If you over-allocate, you starve other processes.
  2. Non-Swappable: HugePages cannot be swapped to disk. Ensure you have enough physical RAM (a quick sizing check follows this list).
  3. Application Support: Not all applications support HugePages. Test thoroughly before production.
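
A sizing sketch for the first two points, using only standard /proc/meminfo fields: the reservation pins HugePages_Total * Hugepagesize of physical RAM, and that amount plus the rest of your working set must fit inside MemTotal.

grep -E 'MemTotal|HugePages_Total|Hugepagesize' /proc/meminfo
# Pinned RAM = HugePages_Total * Hugepagesize (16384 * 2048 kB = 32 GB here);
# whatever remains of MemTotal is all that other processes get.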

6. The Philosophy

The kernel’s “smart” features assume you value average throughput. In HFT, you value worst-case latency. These are opposing goals.

THP is a prime example: it optimizes the common case (less TLB pressure) at the cost of the rare case (catastrophic compaction stalls). For HFT, the rare case is your SLA.

Disable the kernel’s intelligence. Replace it with your own, explicit configuration.

Reading Path

Continue exploring with these related deep dives:

Topic | Next Post
CPU governors, C-states, NUMA, isolation | CPU Isolation for HFT: The isolcpus Lie and What Actually Works
I/O schedulers, Direct I/O, EBS tuning | I/O Schedulers: Why the Kernel Reorders Your Writes
The 5 kernel settings that cost you latency | The $2M Millisecond: Linux Defaults That Cost You Money
Measuring without overhead using eBPF | eBPF Profiling: Nanoseconds Without Adding Any
Design philosophy & architecture decisions | Trading Infrastructure: First Principles That Scale