The Physics of Caching: Locality, Coherence & Thundering Herds

Why accessing RAM is 100x slower than L1. The physics of Temporal vs Spatial Locality, Cache Coherence problems, and the mathematics of the Thundering Herd.


🎯 What You'll Learn

  • Quantify Latency: L1 vs L2 vs RAM vs Network
  • Master Locality of Reference (Temporal & Spatial)
  • Implement Cache-Aside vs Write-Through Patterns
  • Solve the Thundering Herd Problem (Probabilistic Locking)
  • Tune Redis Eviction Policies (LRU vs LFU Physics)

Introduction

In physics, the speed of light is the limit. In computers, the speed of light is too slow. Light travels 30cm in 1 nanosecond. A CPU cycle is 0.2 nanoseconds. If data is not physically close to the CPU, the CPU stalls.

Caching is not just “saving data.” It is Hierarchical Storage Optimization based on the probability of access.


The Latency Hierarchy

Every engineer should memorize these orders of magnitude:

Storage Layer      Latency       Physics Metaphor
L1 Cache           1 ns          Picking a book from your desk.
L2 Cache           4 ns          Picking a book from the bookshelf.
RAM                100 ns        Walking to the library.
SSD (NVMe)         100,000 ns    Flying to New York.
Network (Redis)    500,000 ns    Flying to the Moon.

Key Insight: Redis is fast, but at ~500,000 ns per round trip it is 500,000x slower than L1 Cache. Do not reach for it inside extremely hot loops.


The Physics: Locality of Reference

Caching works because code is predictable.

  1. Temporal Locality: If you used a variable recently, you will likely use it again soon.
    • Implementation: Least Recently Used (LRU) Eviction.
  2. Spatial Locality: If you accessed address X, you will likely access X+1.
    • Implementation: Cache Lines (fetching 64 bytes at a time).
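Temporal locality is exactly what an LRU eviction policy exploits. A minimal sketch in Python (the `LRUCache` class and its capacity are illustrative, not from any particular library):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: exploits temporal locality by evicting
    the least recently used key when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # capacity exceeded: evicts "b", the least recently used
```

Redis's `allkeys-lru` policy applies the same idea probabilistically, sampling candidate keys rather than maintaining a full ordering.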

Anti-Pattern: Linked Lists. Nodes are scattered in memory (random pointers). This destroys Spatial Locality. Pro-Pattern: Arrays. Contiguous memory blocks maximize Spatial Locality.


The Thundering Herd Problem

Scenario: You have a hot cache key price:BTC with TTL 1 second. At t = 1.001s, the key expires. 10,000 requests arrive simultaneously. All 10,000 see a MISS. All 10,000 hit the Database to calculate the price. Result: Database CPU goes to 100%. Service dies.

Physics Solution: Probabilistic Early Expiration (X-Fetch). Instead of a hard TTL, each request recomputes before expiration with a probability that grows as the key ages:

P = ((TimeNow − CreatedTime) / TTL) × β × (−log(Rand()))

If P > 1, recompute early. Because Rand() is uniform in (0, 1], −log(Rand()) is a small positive number for most requests, so in practice only one request triggers the early recompute, preventing the herd.
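A sketch of that check in Python (the function name and the injectable `rand` parameter are ours, added so the behavior is testable):

```python
import math
import random

def should_recompute_early(created_time, ttl, now, beta=1.0, rand=None):
    """X-Fetch-style check: refresh before the hard TTL with a
    probability that rises as the key ages. `rand` can be injected
    for testing; in production it is uniform in (0, 1]."""
    r = rand if rand is not None else (random.random() or 1e-12)
    age_fraction = (now - created_time) / ttl
    # -log(r) >= 0, and only grows large as r approaches 0, so just
    # a small random subset of requests crosses the P > 1 threshold.
    p = age_fraction * beta * -math.log(r)
    return p > 1.0
```

Raising β makes refreshes happen earlier (safer but more recomputes); β = 1 is a reasonable default.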


Cache Consistency: The Hard Problem

“There are only two hard things in Computer Science: cache invalidation and naming things.”

Write-Through: App writes to Cache AND DB.

  • Pros: Read consistency.
  • Cons: Write latency doubles.
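A write-through sketch, using plain dicts as stand-ins for Redis and the DB:

```python
# Hypothetical in-memory stand-ins for the cache and database.
cache = {}
db = {}

def write_through(key, value):
    """Write-through: every write hits both stores, so cache reads
    are always consistent with the DB."""
    db[key] = value      # the slow, durable write
    cache[key] = value   # the fast copy; total write latency is the sum

write_through("price:BTC", 67000)
```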

Cache-Aside (Lazy Loading): App reads Cache. Miss? Read DB -> Write Cache.

  • Pros: Fast writes.
  • Cons: Stale hits are possible if DB is updated via backchannel.
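The cache-aside read path, sketched with a stubbed-out DB read (`db_read` and the key are illustrative):

```python
cache = {}

def db_read(key):
    # stand-in for a real database query
    return {"price:BTC": 67000}.get(key)

def get_cache_aside(key):
    """Cache-aside (lazy loading): only reads populate the cache."""
    value = cache.get(key)
    if value is not None:
        return value          # HIT: served from cache
    value = db_read(key)      # MISS: fall back to the DB
    if value is not None:
        cache[key] = value    # populate for the next reader
    return value
```

Note the stale-hit risk: if the DB row changes through any path that skips this function, the cached copy lingers until eviction.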

Physics Solution: Change Data Capture (CDC). Trigger cache invalidation by listening to the database's write-ahead log (Postgres WAL) directly, so the cache reacts to the source of truth rather than trusting every writer to invalidate.
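The consumer side of CDC can be sketched with a simulated decoded change event (the event shape and key scheme here are hypothetical; a real pipeline would decode them from the WAL via logical replication or a tool like Debezium):

```python
cache = {"user:1": {"name": "Ada"}}

def on_change_event(event):
    """React to a change event decoded from the DB log (e.g. a row
    UPDATE from the Postgres WAL) by invalidating the stale entry."""
    key = f"{event['table']}:{event['id']}"
    cache.pop(key, None)   # next read repopulates from the fresh row

# A hypothetical decoded WAL event:
on_change_event({"table": "user", "id": 1, "op": "UPDATE"})
```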


Practice Exercises

Exercise 1: RAM vs Redis (Beginner)

Task: Benchmark a Python dictionary vs Redis GET. Action: Measure 10,000 reads. Result: Dictionary (L1/RAM) will be nanoseconds. Redis will be milliseconds.
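A starting point for the dictionary half of the benchmark (the Redis half, commented out, assumes a server at the default localhost:6379):

```python
import timeit

# Time 10,000 reads of one key; per-read cost lands in the tens of ns.
d = {f"key:{i}": i for i in range(10_000)}
total = timeit.timeit(lambda: d["key:5000"], number=10_000)
per_read_ns = total / 10_000 * 1e9
print(f"dict read: ~{per_read_ns:.0f} ns")

# The Redis half needs a running server and the redis-py client:
#   import redis
#   r = redis.Redis()
#   r.set("key:5000", 5000)
#   timeit.timeit(lambda: r.get("key:5000"), number=10_000)
```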

Exercise 2: Simulating Stampede (Intermediate)

Task: Create a script that spawns 100 threads requesting a key. Action: Delete the key. Watch them all execute the “DB Fetch” function. Fix: Implement a Semaphore/Lock on the “DB Fetch”.
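One possible fix, sketched with threads and a single lock guarding the fetch (the class name and `db_calls` counter are ours, added to make the herd collapse observable):

```python
import threading

class SingleFlightCache:
    """One lock guards the expensive fetch, so only a single
    thread of the herd actually hits the 'DB'."""

    def __init__(self, loader):
        self.loader = loader
        self.cache = {}
        self.lock = threading.Lock()
        self.db_calls = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value
        with self.lock:
            if key in self.cache:          # double-check: another
                return self.cache[key]     # thread filled it first
            self.db_calls += 1             # simulated expensive DB fetch
            self.cache[key] = self.loader(key)
            return self.cache[key]

sfc = SingleFlightCache(lambda k: f"value-for-{k}")
threads = [threading.Thread(target=sfc.get, args=("price:BTC",))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(sfc.db_calls)  # 1: the herd collapsed to a single fetch
```

The double-check inside the lock is essential; without it, every thread that queued on the lock would still run the fetch.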

Exercise 3: Array vs Linked List (Advanced)

Task: Iterate over a 10MB Array vs a 10MB Linked List. Action: Measure time. Why: The Array fits in Cache Lines. The List causes constant Cache Misses.
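A sketch of the measurement using `array` (contiguous doubles) versus a hand-rolled node chain. Caveat: CPython's interpreter overhead dims the pure cache effect, so the gap is far starker in C or Rust, but the contiguous layout still wins:

```python
import time
from array import array

N = 1_000_000
arr = array("d", range(N))          # one contiguous block of doubles

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value):
        self.value = value
        self.next = None

# Build the linked list back-to-front so construction is O(N).
head = None
for i in reversed(range(N)):
    node = Node(float(i))
    node.next = head
    head = node

t0 = time.perf_counter()
array_sum = sum(arr)                # sequential, cache-line friendly
t1 = time.perf_counter()

node, list_sum = head, 0.0
while node is not None:             # pointer chase: scattered nodes
    list_sum += node.value
    node = node.next
t2 = time.perf_counter()

print(f"array: {t1 - t0:.4f}s, linked list: {t2 - t1:.4f}s")
```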


Knowledge Check

  1. Why is a Linked List bad for Caching?
  2. What is the “Thundering Herd”?
  3. If Redis is 0.5ms, is it “fast”?
  4. What is Spatial Locality?
  5. How does CDC improve Cache Consistency?
Answers
  1. Poor Spatial Locality. Nodes are scattered, preventing Cache Line optimizations.
  2. Simultaneous Misses. Thousands of requests hitting the backing store at once.
  3. Depends. Fast for a network hop, slow for a CPU. 0.5 ms = 500,000 ns, which at 0.2 ns per cycle is roughly 2,500,000 CPU cycles.
  4. Neighborhood Access. Accessing memory locations near each other.
  5. Real-time Invalidation. It reacts to the source of truth (DB Log).

Summary

  • L1 Cache: The only thing that is truly fast.
  • Locality: Arrays > Linked Lists.
  • Thundering Herd: Solve with Math (Probabilistic Recompute) or Locks.
  • Consistency: The hardest problem.
