The Physics of Time: Concurrency, Parallelism & Amdahl's Law

Why adding threads can make your code slower. The physics of Context Switching, Cache Thrashing, and the mathematical limit of parallel speedup (Amdahl's Law).

Intermediate · 40 min read

🎯 What You'll Learn

  • Differentiate Concurrency (Interleaving) vs Parallelism (Simultaneity)
  • Calculate Theoretical Max Speedup using Amdahl's Law
  • Visualise the Cost of Context Switching (L1 Cache Flush)
  • Understand the Global Interpreter Lock (GIL) Mutex
  • Implement Safe Locks to prevent Race Conditions

Introduction

“Let’s add more threads to make it faster.” This is the most dangerous sentence in software engineering.

Threads are not free. They cost memory (Stack) and CPU time (Context Switch). If you don’t understand the physics, adding threads will make your program slower.


Part 1: The Physics of Simultaneity

Concurrency != Parallelism

  • Concurrency: Dealing with multiple things at once. (Structure).
    • Physics: Time Slicing. The CPU runs Task A for 10ms, checks interrupt, switches to Task B for 10ms.
    • Illusion: It looks like they run together, but they are interleaved.
  • Parallelism: Doing multiple things at once. (Execution).
    • Physics: Multicore. Core 1 runs Task A. Core 2 runs Task B. They truly execute at the exact same nanosecond.

The Kitchen Metaphor:

  • Concurrency: One chef chopping onions, then stirring sauce, then checking oven.
  • Parallelism: Three chefs. One chops, one stirs, one bakes.
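The one-chef version can be sketched with plain generators: a toy cooperative scheduler that interleaves steps the way a time-slicing CPU does. (Names like `one_chef` are illustrative, not a real API.)

```python
from collections import deque

def chop():
    for i in range(3):
        yield f"chop onion {i}"

def stir():
    for i in range(3):
        yield f"stir sauce {i}"

def one_chef(*tasks):
    """Toy time-slicer: run one step of a task, then switch to the next."""
    queue = deque(tasks)
    timeline = []
    while queue:
        task = queue.popleft()
        try:
            timeline.append(next(task))  # run one "time slice"
            queue.append(task)           # not finished: back of the queue
        except StopIteration:
            pass                         # task finished, drop it
    return timeline

print(one_chef(chop(), stir()))
# ['chop onion 0', 'stir sauce 0', 'chop onion 1', 'stir sauce 1',
#  'chop onion 2', 'stir sauce 2']
```

One worker, interleaved steps: that is concurrency. Parallelism would require a second worker actually running at the same instant.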

Part 2: The Speed Limit (Amdahl’s Law)

You cannot parallelize everything. Setup code, sequential dependencies, and I/O are strictly serial. Amdahl’s Law defines the maximum theoretical speedup:

S(n) = \frac{1}{(1 - P) + \frac{P}{n}}

  • S(n): Speedup with n cores.
  • P: Portion of code that is parallelizable (0.0 to 1.0).
  • 1 - P: Serial portion.

The Trap: If 10% of your program is serial (P = 0.9): even with Infinite Cores (n = ∞), the max speedup is 1 / 0.1 = 10x. You can never go faster than the sequential part.
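The formula is a one-liner; a quick sketch (the function name `amdahl_speedup` is ours) makes the trap concrete:

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: S(n) = 1 / ((1 - p) + p / n), p = parallel fraction."""
    return 1 / ((1 - p) + p / n)

# 90% parallel (p = 0.9):
print(amdahl_speedup(0.9, 8))      # 8 cores: ~4.7x, nowhere near 8x
print(amdahl_speedup(0.9, 10**9))  # ~infinite cores: creeps toward 10x
```

Note how quickly returns diminish: going from 8 cores to a billion cores only roughly doubles the speedup.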


Part 3: The Cost of Switching

Switching threads is violent. When the OS preempts Thread A to run Thread B:

  1. Save Registers: Instruction Pointer, Stack Pointer.
  2. Pollute Cache: Thread B needs different data. It evicts Thread A’s hot data from L1/L2 Cache.
  3. Resume: When Thread A returns, it faces a Cache Miss Storm.

Physics: A Context Switch takes ≈5μs. If you switch after every 10μs of useful work, each 15μs cycle wastes 5μs — you spend 33% of your CPU time doing nothing but switching.
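The arithmetic behind that 33% figure, using the assumed 5μs/10μs numbers above:

```python
switch_cost_us = 5   # assumed cost of one context switch
work_slice_us = 10   # assumed useful work between switches

# Each cycle is 10us of work plus 5us of pure switching overhead:
overhead = switch_cost_us / (work_slice_us + switch_cost_us)
print(f"{overhead:.0%} of CPU time lost to switching")  # 33% ...
```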


Part 4: The GIL (Global Interpreter Lock)

Python (CPython) and Ruby (MRI, where it is called the GVL) have a GIL. Physics: A Mutex protects the Interpreter internals. Only ONE thread can execute Bytecode at a time, even on a 64-core machine.

  • I/O Bound (Network/Disk): GIL is released. Threading works.
  • CPU Bound (Math/Loop): GIL is held. Threading adds overhead without speedup. Use Multiprocessing (Separate processes) instead.
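A sketch of the I/O-bound case: `time.sleep` stands in for a network or disk wait, during which CPython releases the GIL, so four threads overlap their waits instead of queuing them.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(_):
    time.sleep(0.2)  # stand-in for a network/disk wait; the GIL is released here

start = time.perf_counter()
for i in range(4):
    fake_io(i)
serial = time.perf_counter() - start      # ~0.8s: waits happen one by one

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fake_io, range(4)))
threaded = time.perf_counter() - start    # ~0.2s: waits overlap

print(f"serial {serial:.2f}s vs threaded {threaded:.2f}s")
```

Swap `time.sleep` for a pure-Python loop and the threaded version loses its advantage: the GIL is held during computation, so the threads merely take turns.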

Practice Exercises

Exercise 1: Amdahl’s Limit (Beginner)

Task: Calculate max speedup for a program that is 50% serial. Math: with P = 0.5 and n = ∞, S = 1 / (0.5 + 0) = 2x. Even with a supercomputer, it only runs twice as fast.

Exercise 2: The GIL Wall (Intermediate)

Task: Run a CPU-heavy Fibonacci calculation in Python using 2 threads vs 2 processes. Observation: Threads take longer than serial (context switch overhead). Processes run 2x faster.
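A sketch of the experiment (CPython assumed; `fib(28)` is an arbitrary workload, and the `__main__` guard is required for multiprocessing on Windows/macOS). Exact timings depend on your machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fib(n):
    """Deliberately slow recursive Fibonacci: pure CPU-bound work."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def timed(executor_cls, workers=2, n=28):
    """Run `workers` copies of fib(n) in the given executor; return seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        list(pool.map(fib, [n] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    # Threads fight over the GIL; processes each get their own interpreter.
    print(f"2 threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"2 processes: {timed(ProcessPoolExecutor):.2f}s")
```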

Exercise 3: Race Condition (Advanced)

Task: Increment a shared counter 1 million times from 2 threads without a lock. Observation: The final count comes up short of 2 million and varies per run (e.g., 1,452,991) due to non-atomic Read-Modify-Write cycles.
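A sketch of both the broken and the fixed version. The unsafe result varies by run and by CPython version — newer interpreters can make this particular race harder to observe — while the locked version is always exact.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_add(n):
    global counter
    for _ in range(n):
        counter += 1          # read-modify-write: three steps, NOT atomic

def safe_add(n):
    global counter
    for _ in range(n):
        with lock:            # the lock makes the read-modify-write atomic
            counter += 1

def run(worker, n=1_000_000):
    """Reset the counter, run `worker` on 2 threads, return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(safe_add))    # always 2,000,000
print(run(unsafe_add))  # may come up short: lost updates
```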


Knowledge Check

  1. What is the theoretical max speedup if 1% of code is serial?
  2. Why does Python use a GIL?
  3. What hardware resource is thrashed during a context switch?
  4. Difference between threading and multiprocessing in Python?
  5. Does adding cores help if P=0?
Answers
  1. 100x (1 / 0.01).
  2. Memory Safety. Simplifies memory management (ref counting) in CPython.
  3. L1/L2 Cache.
  4. Threads share memory (GIL limited). Processes have separate memory (True Parallelism).
  5. No. Speedup is exactly 1x.

Summary

  • Concurrency: Structure (Interleaving).
  • Parallelism: Execution (Simultaneous).
  • Amdahl’s Law: The serial part kills you.
  • Context Switches: The hidden tax on threads.
