How Computers Actually Work: Complete Guide
From electrons to applications. Understand the CPU, memory, and fetch-decode-execute cycle that powers every program you'll ever write.
🎯 What You'll Learn
- Understand the basic components of a computer
- Learn how the CPU executes instructions
- Grasp the memory hierarchy and why it matters
- See how hardware and software connect
- Build intuition for performance optimization
The Invisible Machine
You’re reading this on a device that performs billions of operations per second. But what’s actually happening inside that metal box?
Here’s what most developers never truly grasp: your computer is just doing math really, really fast. That’s it. Every video, every game, every AI model: it’s all just numbers being shuffled around at incomprehensible speed.
Understanding this machinery isn’t academic curiosity. It’s the difference between writing code that crawls and code that flies. It’s why some engineers command $500K salaries while others struggle: they understand the machine.
The Core Components
Every computer, from your phone to a supercomputer, has the same basic architecture:
```
CPU          ↔  Memory (RAM)    ↔  Storage (SSD/HDD)
(The Brain)     (Working Space)    (Long-term Memory)
```
Let’s understand each one:
The CPU: The Calculator
The Central Processing Unit is where computation happens. It can only do a few things:
- Arithmetic: Add, subtract, multiply, divide
- Logic: Compare values, AND/OR/NOT operations
- Move data: Load from memory, store to memory
- Branch: Jump to different instructions based on conditions
That’s it. Every program you’ve ever used (from Photoshop to video games to operating systems) is just clever combinations of these four operations.
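To make those four operation types concrete, here is a toy sketch in plain Python. The "registers", memory layout, and values are invented for illustration; real CPUs do this in hardware, not dictionaries:

```python
# A toy machine using only the four operation types above.
memory = {100: 7, 101: 0}     # tiny RAM: address -> value
registers = {"A": 0}

# Goal: if memory[100] > 0, store memory[100] + 1 at memory[101].
registers["A"] = memory[100]            # move data: load from memory
is_positive = registers["A"] > 0        # logic: compare a value
if is_positive:                         # branch: pick the next instruction
    registers["A"] = registers["A"] + 1 # arithmetic: add
memory[101] = registers["A"]            # move data: store to memory

print(memory[101])  # 8
```

Every higher-level construct (function calls, loops, objects) ultimately compiles down to sequences like this.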
Memory (RAM): The Desk
Random Access Memory is your computer’s working space. It’s:
- Fast: 100 nanoseconds to access (compared to 10 milliseconds for disk)
- Volatile: Contents disappear when power is off
- Limited: Usually 8-64 GB in modern computers
Think of RAM like a desk. You can only work on what’s on the desk. Everything else is in filing cabinets (storage) and needs to be retrieved.
Storage: The Filing Cabinet
Persistent storage (SSD/HDD) keeps your data safe when power is off:
- Slow: 1000x slower than RAM
- Persistent: Data survives power loss
- Large: 256 GB to several TB
The Fetch-Decode-Execute Cycle
This is the heartbeat of every computer. The CPU repeats this cycle billions of times per second:
1. FETCH: Get the instruction from memory
2. DECODE: Figure out what to do
3. EXECUTE: Do the operation
Step 1: Fetch
The CPU has a special register called the Program Counter (PC) that holds the memory address of the next instruction. The CPU:
- Reads the address in PC
- Goes to that memory location
- Retrieves the instruction stored there
- Increments PC to point to the next instruction
Step 2: Decode
The fetched instruction is a number. The CPU decodes it to understand:
- What operation? (add, subtract, load, etc.)
- What data? (which registers, which memory addresses)
For example, the instruction 0x01D8 might mean “add register B to register A.”
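Decoding is just bit manipulation. The sketch below assumes one possible, entirely invented 16-bit encoding (opcode in the high byte, two register numbers packed into the low byte); real instruction sets differ, but the masking-and-shifting idea is the same:

```python
# Hypothetical 16-bit instruction layout (invented for illustration):
#   bits 15-8: opcode, bits 7-4: destination register, bits 3-0: source register
OPCODES = {0x01: "ADD"}

instruction = 0x01D8
opcode = (instruction >> 8) & 0xFF  # high byte        -> 0x01 (ADD)
dest   = (instruction >> 4) & 0x0F  # upper low nibble -> register 13
src    = instruction & 0x0F         # lower nibble     -> register 8

print(OPCODES[opcode], dest, src)  # ADD 13 8
```

Hardware decoders do exactly this kind of field extraction, just with wires instead of shift operators.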
Step 3: Execute
The CPU performs the operation. If it’s:
- Arithmetic: The ALU (Arithmetic Logic Unit) computes the result
- Memory access: Data is loaded from or stored to RAM
- Branch: The PC is updated to a new address
Putting It Together
Here’s a simple example. The operation “add 5 to the value in memory location 100” breaks down into three instructions:
1. FETCH: PC=0x1000, get instruction at 0x1000
2. DECODE: Instruction means "load from address 100"
3. EXECUTE: Read value from address 100 into register
4. FETCH: PC=0x1004, get next instruction
5. DECODE: Instruction means "add 5 to register"
6. EXECUTE: Add 5 to the register value
7. FETCH: PC=0x1008, get next instruction
8. DECODE: Instruction means "store register to address 100"
9. EXECUTE: Write register value back to address 100
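The nine steps above can be sketched as a miniature simulator. Everything here (the instruction names, the 4-byte instruction width, the addresses) is invented to mirror the walkthrough, not any real instruction set:

```python
# Minimal fetch-decode-execute loop mirroring the walkthrough above.
# Instructions are 4 bytes wide, so the PC advances by 4 after each fetch.
instructions = {
    0x1000: ("LOAD", 100),   # load memory[100] into the register
    0x1004: ("ADD", 5),      # add 5 to the register
    0x1008: ("STORE", 100),  # write the register back to memory[100]
}
memory = {100: 37}
register = 0
pc = 0x1000                  # Program Counter

while pc in instructions:
    op, arg = instructions[pc]  # FETCH + DECODE
    pc += 4                     # increment PC to the next instruction
    if op == "LOAD":            # EXECUTE
        register = memory[arg]
    elif op == "ADD":
        register += arg
    elif op == "STORE":
        memory[arg] = register

print(memory[100])  # 42
```

A real CPU does nothing conceptually different; it just does it in silicon, billions of times per second.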
At 3 GHz, the clock driving this cycle ticks 3 billion times per second. Your “instant” mouse click involves millions of these cycles.
The Memory Hierarchy
Here’s the dirty secret of computing: the CPU is far faster than memory.
A modern CPU can execute an operation every 0.3 nanoseconds. But accessing RAM takes 100 nanoseconds. That’s roughly 300 cycles wasted waiting for data!
The solution: caches, small banks of fast memory placed close to the CPU.
| Level | Size | Latency | Analogy |
|---|---|---|---|
| Registers | ~1 KB | 0.3 ns | Your hands |
| L1 Cache | 64 KB | 1 ns | Your desk |
| L2 Cache | 512 KB | 4 ns | Your office drawer |
| L3 Cache | 8 MB | 20 ns | Filing cabinet |
| RAM | 16 GB | 100 ns | Library in your building |
| SSD | 500 GB | 100,000 ns | Library across town |
Why This Matters for Performance
```python
# Bad: random memory access
for i in random_order:
    array[i] += 1  # cache miss almost every time!

# Good: sequential access
for i in range(len(array)):
    array[i] += 1  # the cache loves this
```
The difference? Sequential access can be 100x faster because of how caches work. The cache loads data in chunks (cache lines). Sequential access uses the whole chunk; random access wastes it.
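You can put rough numbers on that. Assuming 64-byte cache lines and 8-byte elements (typical values, not guaranteed on every machine), a back-of-the-envelope miss estimate looks like this:

```python
# Back-of-the-envelope cache-miss estimate.
# Assumes 64-byte cache lines and 8-byte array elements.
line_bytes, elem_bytes = 64, 8
elems_per_line = line_bytes // elem_bytes  # 8 elements fit in one line

n = 1_000_000
sequential_misses = n // elems_per_line    # one miss fetches 8 useful elements
random_misses = n                          # worst case: every access misses

print(sequential_misses)                   # 125000
print(random_misses // sequential_misses)  # 8x fewer misses sequentially
```

The miss count alone gives 8x; hardware prefetching (loading the next lines before you ask) widens the gap well beyond that in practice.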
Numbers Every Programmer Should Know
Memorize these. They’ll inform every performance decision:
| Operation | Time |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD random read | 150,000 ns (150 µs) |
| HDD seek | 10,000,000 ns (10 ms) |
| Network round-trip (same datacenter) | 500,000 ns (500 µs) |
| Network round-trip (cross-country) | 150,000,000 ns (150 ms) |
Relative Scale
If an L1 cache access takes 1 second:
- L2 cache: 4 seconds
- RAM: 1.7 minutes
- SSD: 1.5 days
- HDD: 4 months
- Cross-country network: 5 years
This is why caching, locality, and data structure choice matter so much.
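The scaled analogy is just unit conversion: treat 1 ns as 1 second, and each nanosecond count reads directly as seconds. A quick sketch, using the 150 µs SSD figure from the latency table:

```python
# Rescale latencies so one L1 access (1 ns) becomes 1 second.
latencies_ns = {
    "L2 cache": 4,
    "RAM": 100,
    "SSD": 150_000,
    "HDD": 10_000_000,
    "cross-country network": 150_000_000,
}

for name, ns in latencies_ns.items():
    scaled_seconds = ns  # 1 ns -> 1 s, so the ns count becomes seconds
    if scaled_seconds >= 86_400:
        print(f"{name}: ~{scaled_seconds / 86_400:.0f} days")
    else:
        print(f"{name}: {scaled_seconds:,} seconds")
# HDD works out to ~116 days (about 4 months);
# the cross-country round-trip to ~1,736 days (almost 5 years).
```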
From Hardware to Software
How does your Python code become CPU operations?
Python/JS/etc. → Interpreter/Compiler → Machine Code → CPU
Compiled languages (C, Rust, Go): Code is translated to machine code ahead of time. Fast execution.
Interpreted languages (Python, JavaScript): Code is translated line-by-line at runtime. More flexible, slower.
JIT-compiled (Java, modern JS): A hybrid. Code is interpreted first, then hot paths are compiled to machine code during execution.
```c
// This C code:
int x = 5;
int y = 10;
int z = x + y;

// Becomes something like:
// MOV R1, 5       ; Put 5 in register 1
// MOV R2, 10      ; Put 10 in register 2
// ADD R3, R1, R2  ; Add R1 + R2, store in R3
```
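On the interpreted side you can watch the translation happen: CPython compiles your source to bytecode, which its interpreter loop executes one instruction at a time. The standard-library `dis` module shows that bytecode (exact opcode names vary by Python version):

```python
import dis

def add():
    x = 5
    y = 10
    return x + y

# Prints the bytecode instructions the CPython interpreter executes,
# e.g. LOAD_CONST, STORE_FAST, LOAD_FAST, a binary-add opcode, RETURN_VALUE.
dis.dis(add)
```

Each bytecode instruction still costs many machine instructions to dispatch and execute, which is a big part of why interpreters are slower than compiled code.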
Practice Exercises
Exercise 1: Latency Intuition (Beginner)
Rank these operations from fastest to slowest:
- Reading from L1 cache
- Reading from SSD
- Reading from RAM
- Network request to a server in the same city
- Reading from L3 cache
Answer
- L1 cache (1 ns)
- L3 cache (20 ns)
- RAM (100 ns)
- SSD (150,000 ns)
- Network (5,000,000+ ns)
Exercise 2: Cache Behavior (Intermediate)
Why is this loop fast:

```python
for i in range(1000):
    total += array[i]
```

And this loop slow:

```python
for i in range(1000):
    total += array[random.randint(0, len(array) - 1)]
```
Answer
Sequential access benefits from cache prefetching. The CPU predicts you’ll need the next bytes and loads them in advance. Random access defeats this: nearly every access is a cache miss.
Exercise 3: CPU Bottleneck Analysis (Advanced)
Your program takes 10 seconds. Profiling shows:
- 8 seconds waiting for disk I/O
- 1 second in CPU computation
- 1 second waiting for network
What’s the bottleneck? How would you optimize?
Answer
Disk I/O is the bottleneck (80% of time). Optimizations:
- Load data into RAM once, process in memory
- Use an SSD instead of HDD
- Read files sequentially, not randomly
- Use async I/O to overlap reading with processing
Optimizing CPU computation would save at most 1 second (10%).
Knowledge Check
1. What are the three main steps of the fetch-decode-execute cycle?
2. Why do we have cache memory? Why not just use more RAM?
3. An L1 cache access takes 1 ns. A RAM access takes 100 ns. How many L1 accesses could you do in the time of one RAM access?
4. What’s the difference between compiled and interpreted languages at the hardware level?
5. True or False: A 3 GHz CPU executes 3 billion instructions per second.
Answers
1. Fetch (get the instruction from memory), Decode (understand what it means), Execute (perform the operation).
2. RAM is too slow for the CPU. The CPU would spend most of its time waiting. Cache is faster but more expensive per byte, so we use small amounts close to the CPU.
3. 100 accesses. This is why cache hit rate matters so much.
4. Compiled: machine code is generated once and runs directly on the CPU. Interpreted: code is translated to machine operations at runtime by another program.
5. False. Modern CPUs can execute multiple instructions per cycle (superscalar) and run at varying speeds (turbo boost). Actual instructions per second depend on the workload.
Summary
| Concept | Key Takeaway |
|---|---|
| CPU | The calculator that executes instructions |
| RAM | Fast, volatile working memory |
| Storage | Slow, persistent long-term storage |
| Fetch-Decode-Execute | The fundamental cycle of computation |
| Memory Hierarchy | Trade-off between speed and capacity |
| Cache | Small, fast memory that hides RAM latency |
The mental model to internalize: Your computer is just shuffling numbers between different speed levels of storage, while a calculator adds them up billions of times per second.
What’s Next?
You now understand the machine. Next, understand how the operating system manages it:
🎯 Continue learning:
- Networking Basics - How data moves between computers
- What Is the Linux Kernel - The software that controls the hardware
🔬 Go deeper: The Linux Kernel Optimization Series covers how to tune these systems.
You now understand the machine that runs every piece of software on Earth. ⚡
Questions about this lesson? Working on related infrastructure?
Let's discuss