The Physics of Containers: Layers, OverlayFS & Chroot
A container is not a Virtual Machine. It is a process with a mask. The physics of Copy-on-Write (CoW), Union Filesystems, and manual container assembly.
🎯 What You'll Learn
- Deconstruct the Union Filesystem (OverlayFS) architecture
- Explain Copy-on-Write (CoW) at the inode level
- Build a container manually using `unshare`, `mount`, and `chroot`
- Differentiate `chroot` vs `pivot_root` (Security Physics)
- Trace an Image Pull: Manifests, Blobs, and SHA256 hashing
Introduction
“Containers are lightweight VMs.” Incorrect. A VM is a Simulation of Hardware. A Container is an Isolation of Software.
When you run `docker run ubuntu`, you are not booting an OS.
You are running a standard Linux process (`bash`), but the kernel is lying to it about:
- Who it is (PID Namespace).
- Where it lives (Mount Namespace + OverlayFS).
- What it owns (User Namespace).
This lesson explores the physics of the Filesystem Layering that makes containers efficient.
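The namespaces listed above are visible from userspace. As a minimal sketch, assuming a Linux host with `/proc` mounted, the loop below prints the namespace identifiers of the current shell. Two processes that show the same identifier share that namespace; a containerized process would show different ones.

```shell
#!/bin/sh
# Print the PID, mount, and user namespace identifiers of this shell.
# Each /proc/<pid>/ns/<name> entry is a symlink whose target names the namespace.
for ns in pid mnt user; do
    printf '%s -> %s\n' "$ns" "$(readlink "/proc/$$/ns/$ns")"
done

# Keep one value around for comparison with another process
pid_ns=$(readlink "/proc/$$/ns/pid")
```

Running the same loop inside a container and on the host, then comparing the values, is a quick way to see which parts of the "mask" are in place.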
The Physics: OverlayFS (The Magic of Layers)
A 1GB Ubuntu image occupies 1GB on disk. But if you run 10 Ubuntu containers, you still only use ~1GB of disk space, plus whatever each container writes. Why? OverlayFS (Union Filesystem).
OverlayFS takes multiple directories and stacks them like sheets of glass.
- LowerDir (Read-Only): The Base Image (Ubuntu). Physics: Immutable.
- UpperDir (Read-Write): The Container Layer. Physics: Empty at start.
- MergedDir: The Virtual View.
Copy-On-Write (CoW) Mechanics
When the container tries to Read /etc/passwd:
- The kernel looks in `Upper`. Empty.
- The kernel looks in `Lower`. Found! It passes the inode pointer up. Zero extra space used.
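The read path can be imitated in userspace. The sketch below is a simulation, not a real overlay mount: it uses two ordinary directories in a temporary location, and `overlay_resolve` is a hypothetical helper that returns the path that would serve a read. It ignores deletions and directory merging.

```shell
#!/bin/sh
set -eu

# Scratch directories standing in for LowerDir and UpperDir (assumed layout)
demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper/etc"
echo "root:x:0:0" > "$demo/lower/etc/passwd"

# overlay_resolve (hypothetical): check Upper first, then fall back to Lower
overlay_resolve() {
    rel=$1
    if [ -e "$demo/upper/$rel" ]; then
        echo "$demo/upper/$rel"
    elif [ -e "$demo/lower/$rel" ]; then
        echo "$demo/lower/$rel"
    else
        return 1
    fi
}

resolved=$(overlay_resolve etc/passwd)
echo "read served from: $resolved"
```

Because nothing has been written yet, the lookup falls through to the lower directory and no data is duplicated.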
When the container tries to Write/Modify /etc/passwd:
- Trap: The kernel pauses the write.
- Copy: It copies the entire file from `Lower` to `Upper`.
- Modify: The write happens in `Upper`.
- Mask: The `Upper` file now “masks” (hides) the `Lower` file.
Physics Cost: The first write to a file is slow (Copy latency). Subsequent writes are fast.
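The copy-up sequence can also be imitated with plain directories. This is a userspace sketch under the same assumptions as a simulation: `cow_append` is a hypothetical helper, the directories are temporary, and the real work is done by the kernel, not by `cp`.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper"
printf 'root:x:0:0\n' > "$demo/lower/etc/passwd"

# cow_append (hypothetical): copy the file up on first write, then modify the copy
cow_append() {
    rel=$1
    line=$2
    if [ ! -e "$demo/upper/$rel" ] && [ -e "$demo/lower/$rel" ]; then
        mkdir -p "$(dirname "$demo/upper/$rel")"
        cp -p "$demo/lower/$rel" "$demo/upper/$rel"   # the whole file is copied
    fi
    printf '%s\n' "$line" >> "$demo/upper/$rel"       # the write lands in Upper only
}

cow_append etc/passwd "app:x:1000:1000"

# Arithmetic expansion strips any padding that wc adds on some platforms
lower_lines=$(( $(wc -l < "$demo/lower/etc/passwd") ))
upper_lines=$(( $(wc -l < "$demo/upper/etc/passwd") ))
echo "lower: $lower_lines line(s), upper: $upper_lines line(s)"
```

The lower file keeps its original content while the upper copy carries the change. A second call to `cow_append` skips the copy, which mirrors why only the first write pays the latency.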
The Root: chroot vs pivot_root
Technically, a container needs to change its Root Directory (/) to the merged overlay directory.
1. chroot (Legacy)
“Change Root”.
- `chroot /var/lib/docker/overlay2/merged /bin/bash`.
- Problem: A root user can break out of a chroot jail. It is not a security boundary.
2. pivot_root (Modern)
“Swap Root”.
- Swaps the root mount for the entire mount namespace, not just one process's view of `/`.
- The old host root `/` is moved to a directory inside the new root (e.g., `/old_root`), which is then unmounted and detached.
- Result: The process literally loses the reference to the host filesystem. It cannot go back.
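The sequence described above can be sketched with the `pivot_root(8)` utility. This is an illustration under assumptions, not what any particular runtime does line for line: `enter_new_root` is a hypothetical function name, it must run as root inside a fresh mount namespace, and the new root is assumed to provide `sh`, `umount`, and `rmdir` (for example BusyBox with its applet links installed).

```shell
#!/bin/sh
# Sketch only. Intended to be called as root inside a new mount namespace, e.g.
#   sudo unshare -m -p -f sh -c '. ./enter.sh; enter_new_root rootfs'
# where enter.sh is a hypothetical file holding this function.

enter_new_root() {
    new_root=$1
    mount --make-rprivate /                # keep mount changes out of the host namespace
    mount --bind "$new_root" "$new_root"   # pivot_root needs new_root to be a mount point
    mkdir -p "$new_root/old_root"
    cd "$new_root"
    pivot_root . old_root                  # new root becomes /, host root moves to /old_root
    cd /
    umount -l /old_root                    # lazily detach the host filesystem
    rmdir /old_root
    exec /bin/sh                           # this shell no longer has a path to the host root
}
```

After the lazy unmount there is no directory entry left that leads back to the host root, which is the difference from `chroot`, where the host filesystem is still mounted and merely hidden.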
Code: Building a Container from Scratch
Let’s build a container without Docker.
# 1. Create the filesystem directories
mkdir -p rootfs/{bin,lib,lib64,proc,sys}
# 2. Copy binaries (a statically linked BusyBox is ideal;
#    a dynamically linked one would also need its libraries copied)
cp /bin/busybox rootfs/bin/
# 3. Create the namespace isolation
# -p (PID), -m (Mount), -u (UTS), -n (Net), -f (Fork)
sudo unshare -p -m -u -n -f --mount-proc sh -c "
# INSIDE THE CONTAINER:
# 4. Mount /proc inside the new root (crucial for ps to work after chroot)
mount -t proc proc rootfs/proc
# 5. Hostname
hostname container-physics
# 6. Pivot root (simplified as chroot here for readability)
# In reality, runc uses pivot_root
chroot rootfs /bin/busybox sh
"
Physics observation: If you run busybox ps inside, the process list starts at PID 1 and shows only the container's processes. If you run busybox ls /, you see only bin, lib, lib64, proc, and sys. You are trapped.
The Image: Manifests & Blobs
A Docker image is actually a JSON file (the Manifest) pointing to content-addressable blobs (tarballs). Each blob is named by the SHA256 hash of its own bytes.
- Registry: “Give me `ubuntu:latest`”.
- Manifest: “Here is the JSON. It has 3 layers: `sha256:abc...`, `sha256:def...`, `sha256:ghi...`”.
- Pull: Download the 3 tarballs.
- Extract: Unpack each tarball into a separate directory (`lowerdir` candidates).
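Content addressing can be tried locally without a registry. The sketch below assumes GNU `sha256sum` and `tar` are available; the `blobs/sha256/<digest>` directory shape follows the OCI image layout, while the file names and contents are made up for illustration.

```shell
#!/bin/sh
set -eu

store=$(mktemp -d)
mkdir -p "$store/blobs/sha256" "$store/layer/etc"
echo "hello from a layer" > "$store/layer/etc/motd"

# Pack a directory into a tarball: this is the "blob"
tar -C "$store/layer" -cf "$store/layer.tar" .

# The hash of the blob's bytes becomes its name
digest=$(sha256sum "$store/layer.tar" | cut -d ' ' -f 1)
mv "$store/layer.tar" "$store/blobs/sha256/$digest"

# On "pull", recompute the hash and compare it with the name
actual=$(sha256sum "$store/blobs/sha256/$digest" | cut -d ' ' -f 1)
if [ "$actual" = "$digest" ]; then
    echo "verified sha256:$digest"
fi

# Extract the layer into its own directory (a lowerdir candidate)
mkdir -p "$store/lowerdir-1"
tar -C "$store/lowerdir-1" -xf "$store/blobs/sha256/$digest"
```

Because the name is derived from the content, a corrupted or tampered blob no longer matches its digest, and two images that share a layer share the same blob.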
Practice Exercises
Exercise 1: Manual Overlay (Beginner)
Task: Create four directories: lower, upper, work, merged.
Action (requires root):
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
Observation: Create a file in merged. See it appear in upper. lower remains untouched.
Exercise 2: The Breakout (Intermediate)
Task: Use chroot.
Challenge: As a C programmer, try to escape using `fchdir` and ... (Google “chroot breakout”).
Lesson: Why we use pivot_root.
Exercise 3: Storage Driver Analysis (Advanced)
Task: Run docker info | grep Storage.
Action: Inspect /var/lib/docker/overlay2.
Physics: Find the exact correlation between a defined layer hash and the directory on disk.
Knowledge Check
- What happens when you write to a file in a container for the first time?
- Why is `pivot_root` more secure than `chroot`?
- Does a container have its own kernel?
- What filesystem driver does Docker use by default?
- Where does the “Writeable Layer” exist?
Answers
- Latency Spike. The Copy-Up operation copies the file from Lower to Upper.
- Detachment. It unmounts the old root, removing the kernel handle to the host filesystem.
- No. It shares the Host Kernel.
- OverlayFS (overlay2).
- UpperDir. On the host disk, usually in `/var/lib/docker/overlay2/.../diff`.
Summary
- OverlayFS: Stacked Filesystem.
- CoW: Reads run at native speed; the first write to a file pays for a full copy.
- Structure: Manifests + Blobs.
- Definition: Namespace + Cgroup + OverlayFS.