Linux Memory: The Physics of RAM
Why access to RAM is slow (TLB Misses), how the Kernel cheats with Slab Allocators, and the math behind the OOM Killer.
🎯 What You'll Learn
- Deconstruct the process memory map (VMAs: Virtual Memory Areas)
- Calculate the latency cost of a TLB Miss vs Page Fault
- Analyze the SLAB/SLUB allocator for kernel objects
- Predict OOM Killer behavior using `oom_score`
- Trace a Major Page Fault from CPU to Disk IO
📚 Prerequisites
Before this lesson, you should understand:
Introduction
RAM is a sparsely populated map, not a bucket of bytes. Your 16GB laptop can run processes asking for 100TB of memory. How? The Kernel lies.
Virtual Memory is the art of promising distinct memory addresses to every process while mapping them to the same physical chips, or often to nothing at all.
The Physics: Translation Lookaside Buffer (TLB)
The CPU does not know “Physical RAM”. It only knows Virtual Addresses.
Every memory access requires a translation: Virtual -> Physical.
Reading the page tables from RAM costs on the order of 100ns, because a walk of the multi-level page table takes several memory accesses.
Paying that on every access would roughly double the cost of every load and store.
The Fix: The TLB. A tiny hardware cache that remembers "Virtual Page 5 = Physical Frame 900". TLB Hit: ~0ns overhead. TLB Miss: ~100ns (Page Walk).
Physics: If your working set exceeds what the TLB can map, your program can slow down dramatically (sometimes by half), even if you have infinite RAM. This is why HugePages (2MB instead of 4KB) exist: one entry then covers 512x more address space.
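A back-of-the-envelope calculation makes the coverage argument concrete. The 1,536-entry TLB size below is an assumption (roughly a recent x86 core), not a universal constant:

```python
# Back-of-the-envelope TLB coverage. The 1,536-entry TLB size is an
# assumption (ballpark for a modern x86 core); check `cpuid` for yours.
TLB_ENTRIES = 1536

def coverage(page_size_bytes: int, entries: int = TLB_ENTRIES) -> int:
    """Bytes of address space the TLB can map without a single miss."""
    return entries * page_size_bytes

small = coverage(4 * 1024)        # 4 KiB pages
huge = coverage(2 * 1024 * 1024)  # 2 MiB HugePages

print(f"4 KiB pages: {small // (1024**2)} MiB of coverage")
print(f"2 MiB pages: {huge // (1024**3)} GiB of coverage")
```

With 4 KiB pages the whole TLB covers only a few MiB; any working set larger than that starts paying page-walk costs, which is exactly the gap HugePages close.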
Deep Dive: The SLAB Allocator
`malloc()` is general-purpose and comparatively slow: it must search for a free block of the right size.
The Kernel needs to allocate millions of identical objects (e.g., task_struct, inode).
The Solution: SLAB (and SLUB). Pre-allocated caches of specific object sizes.
- Need a `task_struct`? Pop one off the top of the `task_struct` cache's stack.
- Freeing? Just push it back on the stack.
- Near-zero fragmentation. Zero searching.
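The free-list discipline above can be sketched as a toy cache. `TaskStruct` and `SlabCache` are illustrative stand-ins; the real SLAB/SLUB allocators add per-CPU caches, slab coloring, and NUMA awareness on top of this core idea:

```python
# Toy slab cache: a preallocated pool of fixed-size objects with a
# free list used as a stack. Alloc and free are both O(1) pointer ops.
class TaskStruct:
    __slots__ = ("pid", "state")
    def __init__(self):
        self.pid = 0
        self.state = "NEW"

class SlabCache:
    def __init__(self, obj_type, capacity):
        # Pay the allocation cost once, up front.
        self._free = [obj_type() for _ in range(capacity)]

    def alloc(self):
        return self._free.pop()    # O(1): pop off the free stack

    def free(self, obj):
        self._free.append(obj)     # O(1): push back, ready for reuse

cache = SlabCache(TaskStruct, capacity=64)
t = cache.alloc()
t.pid = 1234
cache.free(t)                      # no search, no coalescing
assert cache.alloc() is t          # the same object comes straight back
```

Note the last line: a freed object is handed out again immediately, still warm in the CPU cache, which is a second (hidden) benefit of slab reuse.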
The Assassin: OOM Killer
When RAM + Swap is full, the Kernel must kill something. It doesn’t pick randomly. It calculates a score.
- Badness: driven mostly by memory usage (up to 1000 points).
- Protection: root processes get a small discount (roughly -30 points on older kernels).
- Adjustment: You can manually set `/proc/<pid>/oom_score_adj` to -1000 to become unkillable (as `sshd` often is).
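You can watch these scores directly. A minimal sketch that reads a process's current score and adjustment from `/proc` (Linux only, hence the guard):

```python
import os

def oom_info(pid="self"):
    """Read a process's OOM score and adjustment from /proc.

    Returns (oom_score, oom_score_adj), or None off-Linux / if the
    pid has vanished.
    """
    base = f"/proc/{pid}"
    try:
        with open(f"{base}/oom_score") as f:
            score = int(f.read())
        with open(f"{base}/oom_score_adj") as f:
            adj = int(f.read())
        return score, adj
    except (FileNotFoundError, PermissionError):
        return None

info = oom_info()
if info:
    score, adj = info
    print(f"oom_score={score} oom_score_adj={adj}")
```

Run it inside a memory-hungry process and you can see its score climb; write -1000 into `oom_score_adj` (as root) and the kernel will pass it over.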
Code: Inspecting the Maps
Where is your heap? Where is your stack?
`cat /proc/self/maps` tells the truth:
```
# address                 perms offset   dev   inode   pathname
00400000-00452000         r-xp  00000000 08:02 173521  /usr/bin/zsh   (Text)
00651000-00652000         r--p  00051000 08:02 173521  /usr/bin/zsh   (Data)
01cea000-01d0d000         rw-p  00000000 00:00 0       [heap]
...
7ffeeb9a0000-7ffeeb9c1000 rw-p  00000000 00:00 0       [stack]
```
Key Insight: Notice `r-xp` (Read-Execute-Private) for code, and `rw-p` for data.
You cannot write to code, and you cannot execute data (the NX Bit). This blunts classic buffer-overflow exploits: an attacker can still overflow a buffer, but the injected payload cannot be executed.
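A small parser makes the format easy to explore programmatically. `parse_maps` is a hypothetical helper, exercised here on sample lines so it runs anywhere; on Linux you can feed it `open("/proc/self/maps").read()`:

```python
# Minimal parser for /proc/<pid>/maps lines, used here to check the
# W^X property: no region should be both writable and executable.
def parse_maps(text):
    regions = []
    for line in text.splitlines():
        parts = line.split(maxsplit=5)
        if len(parts) < 5:
            continue                      # skip malformed lines
        addr, perms = parts[0], parts[1]
        start, end = (int(x, 16) for x in addr.split("-"))
        path = parts[5] if len(parts) == 6 else ""  # anonymous if empty
        regions.append({"start": start, "end": end,
                        "perms": perms, "path": path})
    return regions

sample = """00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/zsh
01cea000-01d0d000 rw-p 00000000 00:00 0 [heap]"""

for r in parse_maps(sample):
    # W^X: writable and executable must never co-occur
    assert not ("w" in r["perms"] and "x" in r["perms"])
```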
Practice Exercises
Exercise 1: The Page Fault (Beginner)
Scenario: malloc(1GB).
Task: Does the OS allocate 1GB of RAM?
No. It allocates "Virtual Promises". Physical RAM is committed only when you touch (write to) the pages, one Minor Page Fault per page. Verify this with `ps` (VSZ vs RSS).
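The VSZ-vs-RSS claim is easy to demonstrate: map a large anonymous region, then watch resident memory jump only once the pages are touched. A minimal sketch, assuming a Linux `/proc` filesystem (hence the guard):

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")       # usually 4096

def rss_bytes():
    """Resident set size in bytes, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * PAGE

SIZE = 64 * 1024 * 1024                 # 64 MiB of "virtual promises"

if os.path.exists("/proc/self/statm"):
    m = mmap.mmap(-1, SIZE)             # VSZ grows immediately; RSS barely moves
    before = rss_bytes()
    for off in range(0, SIZE, PAGE):    # write one byte per page -> minor faults
        m[off] = 1
    after = rss_bytes()
    print(f"RSS grew by ~{(after - before) // (1024**2)} MiB after touching")
    m.close()
```

The mapping itself is nearly free; the resident-set growth only appears in the touch loop, one minor fault at a time.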
Exercise 2: OOM Survival (Intermediate)
Task:
- Run a process.
- `echo -1000 > /proc/<pid>/oom_score_adj`
- Trigger an OOM condition (safely!).
- Confirm your process survives while others die.
Exercise 3: HugePages (Advanced)
Task: Check `/proc/meminfo` for `HugePages_Total`.
Enable HugePages and run a memory-intensive database (like Postgres). Measure the performance difference (TLB Miss reduction).
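A small helper for reading those counters; `parse_meminfo` is a hypothetical name, tested here against a sample snippet so it runs anywhere. On Linux, point it at the real file with `parse_meminfo(open("/proc/meminfo").read())`:

```python
# Parse /proc/meminfo-style "Key:   value [unit]" lines into a dict
# of integers (the kB unit, where present, is dropped).
def parse_meminfo(text):
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

sample = """HugePages_Total:      128
HugePages_Free:       128
Hugepagesize:       2048 kB"""

stats = parse_meminfo(sample)
print(stats["HugePages_Total"], "huge pages of", stats["Hugepagesize"], "kB")
```

Comparing `HugePages_Free` before and after starting the database tells you whether it actually picked the pages up.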
Knowledge Check
- Why do HugePages improve performance?
- What is the difference between Major and Minor Page Faults?
- Why does the Kernel generally refuse to kill `sshd` during OOM?
- What is the "NX Bit"?
- Does `free()` return memory to the OS immediately?
Answers
- TLB Efficiency. One TLB entry covers 2MB instead of 4KB, so far fewer misses.
- Major: Disk I/O required (Swap/File). Minor: Memory mapping only (First access/COW).
- Safety. `sshd` lets the admin log in and fix the problem; killing it locks you out.
- No-Execute. A hardware flag that prevents code execution in Data/Stack segments.
- No. Usually `glibc` keeps the memory in the heap for reuse. It is returned to the kernel only when the top of the heap can be trimmed (via `brk`) or when a large `mmap`-backed chunk is freed (via `munmap`).
Summary
- Virtual Memory: A lie that enables isolation.
- TLB: The hardware cache that makes the lie fast.
- OOM: The grim reaper of memory leaks.