How Computers Actually Work: Complete Guide
From electrons to applications. Understand the CPU, memory, and fetch-decode-execute cycle that powers every program you'll ever write.
🎯 What You'll Learn
- Understand the basic components of a computer
- Learn how the CPU executes instructions
- Grasp the memory hierarchy and why it matters
- See how hardware and software connect
- Build intuition for performance optimization
The Invisible Machine
You’re reading this on a device that performs billions of operations per second. But what’s actually happening inside that metal box?
Here’s what most developers never truly grasp: your computer is just doing math really, really fast. That’s it. Every video, every game, every AI model: it’s all just numbers being shuffled around at incomprehensible speed.
Understanding this machinery isn’t academic curiosity. It’s the difference between writing code that crawls and code that flies. It’s why some engineers command $500K salaries while others struggle: they understand the machine.
The Core Components
Every computer, from your phone to a supercomputer, has the same basic architecture:
```
CPU          ↔  Memory (RAM)    ↔  Storage (SSD/HDD)
(The Brain)     (Working Space)    (Long-term Memory)
```
Let’s understand each one:
The CPU: The Calculator
The Central Processing Unit is where computation happens. It can only do a few things:
- Arithmetic: Add, subtract, multiply, divide
- Logic: Compare values, AND/OR/NOT operations
- Move data: Load from memory, store to memory
- Branch: Jump to different instructions based on conditions
That’s it. Every program you’ve ever used (from Photoshop to video games to operating systems) is just clever combinations of these four operations.
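To make those four operation types concrete, here is a toy sketch in plain Python. The "registers", memory layout, and values are invented for illustration; real CPUs do this in hardware, not dictionaries:

```python
# A toy machine using only the four operation types above.
memory = {100: 7, 101: 0}     # tiny RAM: address -> value
registers = {"A": 0}

# Goal: if memory[100] > 0, store memory[100] + 1 at memory[101].
registers["A"] = memory[100]            # move data: load from memory
is_positive = registers["A"] > 0        # logic: compare a value
if is_positive:                         # branch: pick the next instruction
    registers["A"] = registers["A"] + 1 # arithmetic: add
memory[101] = registers["A"]            # move data: store to memory

print(memory[101])  # 8
```

Every higher-level construct (function calls, loops, objects) ultimately compiles down to sequences like this.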
Memory (RAM): The Desk
Random Access Memory is your computer’s working space. It’s:
- Fast: 100 nanoseconds to access (compared to 10 milliseconds for disk)
- Volatile: Contents disappear when power is off
- Limited: Usually 8-64 GB in modern computers
Think of RAM like a desk. You can only work on what’s on the desk. Everything else is in filing cabinets (storage) and needs to be retrieved.
Storage: The Filing Cabinet
Persistent storage (SSD/HDD) keeps your data safe when power is off:
- Slow: 1000x slower than RAM
- Persistent: Data survives power loss
- Large: 256 GB to several TB
The Fetch-Decode-Execute Cycle
This is the heartbeat of every computer. The CPU repeats this cycle billions of times per second:
1. FETCH: Get the instruction from memory
2. DECODE: Figure out what to do
3. EXECUTE: Do the operation
Step 1: Fetch
The CPU has a special register called the Program Counter (PC) that holds the memory address of the next instruction. The CPU:
- Reads the address in PC
- Goes to that memory location
- Retrieves the instruction stored there
- Increments PC to point to the next instruction
Step 2: Decode
The fetched instruction is a number. The CPU decodes it to understand:
- What operation? (add, subtract, load, etc.)
- What data? (which registers, which memory addresses)
For example, the instruction 0x01D8 might mean “add register B to register A.”
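Decoding is just bit manipulation. The sketch below assumes one possible, entirely invented 16-bit encoding (opcode in the high byte, two register numbers packed into the low byte); real instruction sets differ, but the masking-and-shifting idea is the same:

```python
# Hypothetical 16-bit instruction layout (invented for illustration):
#   bits 15-8: opcode, bits 7-4: destination register, bits 3-0: source register
OPCODES = {0x01: "ADD"}

instruction = 0x01D8
opcode = (instruction >> 8) & 0xFF  # high byte        -> 0x01 (ADD)
dest   = (instruction >> 4) & 0x0F  # upper low nibble -> register 13
src    = instruction & 0x0F         # lower nibble     -> register 8

print(OPCODES[opcode], dest, src)  # ADD 13 8
```

Hardware decoders do exactly this kind of field extraction, just with wires instead of shift operators.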
Step 3: Execute
The CPU performs the operation. If it’s:
- Arithmetic: The ALU (Arithmetic Logic Unit) computes the result
- Memory access: Data is loaded from or stored to RAM
- Branch: The PC is updated to a new address
Putting It Together
Here’s a simple example. The operation “add 5 to the value in memory location 100” breaks down into three instructions:
1. FETCH: PC=0x1000, get instruction at 0x1000
2. DECODE: Instruction means "load from address 100"
3. EXECUTE: Read value from address 100 into register
4. FETCH: PC=0x1004, get next instruction
5. DECODE: Instruction means "add 5 to register"
6. EXECUTE: Add 5 to the register value
7. FETCH: PC=0x1008, get next instruction
8. DECODE: Instruction means "store register to address 100"
9. EXECUTE: Write register value back to address 100
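The nine steps above can be sketched as a miniature simulator. Everything here (the instruction names, the 4-byte instruction width, the addresses) is invented to mirror the walkthrough, not any real instruction set:

```python
# Minimal fetch-decode-execute loop mirroring the walkthrough above.
# Instructions are 4 bytes wide, so the PC advances by 4 after each fetch.
instructions = {
    0x1000: ("LOAD", 100),   # load memory[100] into the register
    0x1004: ("ADD", 5),      # add 5 to the register
    0x1008: ("STORE", 100),  # write the register back to memory[100]
}
memory = {100: 37}
register = 0
pc = 0x1000                  # Program Counter

while pc in instructions:
    op, arg = instructions[pc]  # FETCH + DECODE
    pc += 4                     # increment PC to the next instruction
    if op == "LOAD":            # EXECUTE
        register = memory[arg]
    elif op == "ADD":
        register += arg
    elif op == "STORE":
        memory[arg] = register

print(memory[100])  # 42
```

A real CPU does nothing conceptually different; it just does it in silicon, billions of times per second.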
At 3 GHz, the clock driving this cycle ticks 3 billion times per second. Your “instant” mouse click involves millions of these cycles.
The Memory Hierarchy
Here’s the dirty secret of computing: the CPU is far faster than memory.
A modern CPU can execute an operation every 0.3 nanoseconds. But accessing RAM takes 100 nanoseconds. That’s roughly 300 cycles wasted waiting for data!
The solution: caches, small banks of fast memory placed close to the CPU.
| Level | Size | Latency | Analogy |
|---|---|---|---|
| Registers | ~1 KB | 0.3 ns | Your hands |
| L1 Cache | 64 KB | 1 ns | Your desk |
| L2 Cache | 512 KB | 4 ns | Your office drawer |
| L3 Cache | 8 MB | 20 ns | Filing cabinet |
| RAM | 16 GB | 100 ns | Library in your building |
| SSD | 500 GB | 100,000 ns | Library across town |
Why This Matters for Performance
```python
# Bad: random memory access
for i in random_order:
    array[i] += 1  # cache miss almost every time!

# Good: sequential access
for i in range(len(array)):
    array[i] += 1  # the cache loves this
```
The difference? Sequential access can be 100x faster because of how caches work. The cache loads data in chunks (cache lines). Sequential access uses the whole chunk; random access wastes it.
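You can put rough numbers on that. Assuming 64-byte cache lines and 8-byte elements (typical values, not guaranteed on every machine), a back-of-the-envelope miss estimate looks like this:

```python
# Back-of-the-envelope cache-miss estimate.
# Assumes 64-byte cache lines and 8-byte array elements.
line_bytes, elem_bytes = 64, 8
elems_per_line = line_bytes // elem_bytes  # 8 elements fit in one line

n = 1_000_000
sequential_misses = n // elems_per_line    # one miss fetches 8 useful elements
random_misses = n                          # worst case: every access misses

print(sequential_misses)                   # 125000
print(random_misses // sequential_misses)  # 8x fewer misses sequentially
```

The miss count alone gives 8x; hardware prefetching (loading the next lines before you ask) widens the gap well beyond that in practice.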
Numbers Every Programmer Should Know
Memorize these. They’ll inform every performance decision:
| Operation | Time |
|---|---|
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| RAM reference | 100 ns |
| SSD random read | 150,000 ns (150 µs) |
| HDD seek | 10,000,000 ns (10 ms) |
| Network round-trip (same datacenter) | 500,000 ns (500 µs) |
| Network round-trip (cross-country) | 150,000,000 ns (150 ms) |
Relative Scale
If an L1 cache access takes 1 second:
- L2 cache: 4 seconds
- RAM: 1.7 minutes
- SSD: 1.5 days
- HDD: 4 months
- Cross-country network: 5 years
This is why caching, locality, and data structure choice matter so much.
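The scaled analogy is just unit conversion: treat 1 ns as 1 second, and each nanosecond count reads directly as seconds. A quick sketch, using the 150 µs SSD figure from the latency table:

```python
# Rescale latencies so one L1 access (1 ns) becomes 1 second.
latencies_ns = {
    "L2 cache": 4,
    "RAM": 100,
    "SSD": 150_000,
    "HDD": 10_000_000,
    "cross-country network": 150_000_000,
}

for name, ns in latencies_ns.items():
    scaled_seconds = ns  # 1 ns -> 1 s, so the ns count becomes seconds
    if scaled_seconds >= 86_400:
        print(f"{name}: ~{scaled_seconds / 86_400:.0f} days")
    else:
        print(f"{name}: {scaled_seconds:,} seconds")
# HDD works out to ~116 days (about 4 months);
# the cross-country round-trip to ~1,736 days (almost 5 years).
```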
From Hardware to Software
How does your Python code become CPU operations?
Python/JS/etc. → Interpreter/Compiler → Machine Code → CPU
Compiled languages (C, Rust, Go): Code is translated to machine code ahead of time. Fast execution.
Interpreted languages (Python, JavaScript): Code is translated line-by-line at runtime. More flexible, slower.
JIT-compiled (Java, modern JS): A hybrid. Code is interpreted first, then hot paths are compiled to machine code during execution.
```c
// This C code:
int x = 5;
int y = 10;
int z = x + y;

// Becomes something like:
// MOV R1, 5       ; Put 5 in register 1
// MOV R2, 10      ; Put 10 in register 2
// ADD R3, R1, R2  ; Add R1 + R2, store in R3
```
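On the interpreted side you can watch the translation happen: CPython compiles your source to bytecode, which its interpreter loop executes one instruction at a time. The standard-library `dis` module shows that bytecode (exact opcode names vary by Python version):

```python
import dis

def add():
    x = 5
    y = 10
    return x + y

# Prints the bytecode instructions the CPython interpreter executes,
# e.g. LOAD_CONST, STORE_FAST, LOAD_FAST, a binary-add opcode, RETURN_VALUE.
dis.dis(add)
```

Each bytecode instruction still costs many machine instructions to dispatch and execute, which is a big part of why interpreters are slower than compiled code.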
Practice Exercises
Exercise 1: Latency Intuition (Beginner)
Rank these operations from fastest to slowest:
- Reading from L1 cache
- Reading from SSD
- Reading from RAM
- Network request to a server in the same city
- Reading from L3 cache
Answer
- L1 cache (1 ns)
- L3 cache (20 ns)
- RAM (100 ns)
- SSD (150,000 ns)
- Network (5,000,000+ ns)
Exercise 2: Cache Behavior (Intermediate)
Why is this loop fast:

```python
for i in range(1000):
    total += array[i]
```

And this loop slow:

```python
for i in range(1000):
    total += array[random.randint(0, len(array) - 1)]
```
Answer
Sequential access benefits from cache prefetching. The CPU predicts you’ll need the next bytes and loads them in advance. Random access defeats this: nearly every access is a cache miss.
Exercise 3: CPU Bottleneck Analysis (Advanced)
Your program takes 10 seconds. Profiling shows:
- 8 seconds waiting for disk I/O
- 1 second in CPU computation
- 1 second waiting for network
What’s the bottleneck? How would you optimize?
Answer
Disk I/O is the bottleneck (80% of time). Optimizations:
- Load data into RAM once, process in memory
- Use an SSD instead of HDD
- Read files sequentially, not randomly
- Use async I/O to overlap reading with processing
Optimizing CPU computation would save at most 1 second (10%).
Knowledge Check
1. What are the three main steps of the fetch-decode-execute cycle?
2. Why do we have cache memory? Why not just use more RAM?
3. An L1 cache access takes 1 ns. A RAM access takes 100 ns. How many L1 accesses could you do in the time of one RAM access?
4. What’s the difference between compiled and interpreted languages at the hardware level?
5. True or False: A 3 GHz CPU executes 3 billion instructions per second.
Answers
1. Fetch (get the instruction from memory), Decode (understand what it means), Execute (perform the operation).
2. RAM is too slow for the CPU. The CPU would spend most of its time waiting. Cache is faster but more expensive per byte, so we use small amounts close to the CPU.
3. 100 accesses. This is why cache hit rate matters so much.
4. Compiled: machine code is generated once and runs directly on the CPU. Interpreted: code is translated to machine operations at runtime by another program.
5. False. Modern CPUs can execute multiple instructions per cycle (superscalar) and run at varying speeds (turbo boost). Actual instructions per second depend on the workload.
Summary
| Concept | Key Takeaway |
|---|---|
| CPU | The calculator that executes instructions |
| RAM | Fast, volatile working memory |
| Storage | Slow, persistent long-term storage |
| Fetch-Decode-Execute | The fundamental cycle of computation |
| Memory Hierarchy | Trade-off between speed and capacity |
| Cache | Small, fast memory that hides RAM latency |
The mental model to internalize: Your computer is just shuffling numbers between different speed levels of storage, while a calculator adds them up billions of times per second.
What’s Next?
You now understand the machine. Next, understand how the operating system manages it:
🎯 Continue learning:
- Networking Basics - How data moves between computers
- What Is the Linux Kernel - The software that controls the hardware
🔬 Go deeper: The Linux Kernel Optimization Series covers how to tune these systems.
You now understand the machine that runs every piece of software on Earth. ⚡
Questions about this lesson? Working on related infrastructure?
Let's discuss