The Physics of Containers: Layers, OverlayFS & Chroot

A container is not a Virtual Machine. It is a process with a mask. The physics of Copy-on-Write (CoW), Union Filesystems, and manual container assembly.


🎯 What You'll Learn

  • Deconstruct the Union Filesystem (OverlayFS) architecture
  • Explain Copy-on-Write (CoW) at the inode level
  • Build a container manually using `unshare`, `mount`, and `chroot`
  • Differentiate `chroot` vs `pivot_root` (Security Physics)
  • Trace an Image Pull: Manifests, Blobs, and SHA256 hashing

Introduction

“Containers are lightweight VMs.” Incorrect. A VM is a Simulation of Hardware. A Container is an Isolation of Software.

When you run `docker run ubuntu`, you are not booting an OS. You are running a standard Linux process (`bash`), but the kernel is lying to it about:

  1. Who it is (PID Namespace).
  2. Where it lives (Mount Namespace + OverlayFS).
  3. What it owns (User Namespace, when enabled; Docker does not turn it on by default).

This lesson explores the physics of the Filesystem Layering that makes containers efficient.
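
The kernel exposes each process's namespace membership under /proc, so the "lies" listed above can be inspected directly. The snippet below is a minimal sketch, assuming a Linux host with /proc mounted; `show_namespaces` is a hypothetical helper name, not a standard tool.

```shell
# Print the namespaces the current shell belongs to.
# Each symlink target looks like "pid:[<inode>]"; two processes that
# show the same inode number share that namespace.
show_namespaces() {
  for ns in pid mnt user uts net; do
    printf '%s -> %s\n' "$ns" "$(readlink "/proc/self/ns/$ns")"
  done
}

show_namespaces
```

Running the same function on the host and inside a container should print different inode numbers for every namespace the container does not share with the host.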


The Physics: OverlayFS (The Magic of Layers)

A 1GB Ubuntu image takes up 1GB on disk. But if you run 10 Ubuntu containers, you still use only ~1GB, plus whatever each container writes. Why? OverlayFS (Union Filesystem): every container shares the same read-only image layers.

OverlayFS takes multiple directories and stacks them like sheets of glass.

  1. LowerDir (Read-Only): The Base Image (Ubuntu). Physics: Immutable.
  2. UpperDir (Read-Write): The Container Layer. Physics: Empty at start.
  3. MergedDir: The Virtual View. Physics: What the container sees as /, with Upper stacked on top of Lower.

Copy-On-Write (CoW) Mechanics

When the container tries to Read /etc/passwd:

  • The kernel looks in Upper. Empty.
  • The kernel looks in Lower. Found! It passes the inode pointer up. Zero extra space used.
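
The lookup order above can be imitated with two ordinary directories. This is a toy model in plain shell, not OverlayFS itself (the real lookup happens inside the kernel and needs no copying or scripting); `overlay_lookup` and the `demo` directory are hypothetical names used only for illustration.

```shell
# overlay_lookup UPPER LOWER PATH
# Toy model of a read: check upper first, fall back to lower.
overlay_lookup() {
  if [ -e "$1/$3" ]; then
    printf '%s\n' "$1/$3"      # upper wins when the file exists there
  elif [ -e "$2/$3" ]; then
    printf '%s\n' "$2/$3"      # otherwise the read is served from lower
  else
    return 1                   # the file exists in neither layer
  fi
}

demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper"
echo 'root:x:0:0:root:/root:/bin/sh' > "$demo/lower/etc/passwd"

# Upper is empty, so this prints the path inside lower.
overlay_lookup "$demo/upper" "$demo/lower" etc/passwd
```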

When the container tries to Write/Modify /etc/passwd:

  1. Trap: The kernel intercepts the open-for-write before any data changes.
  2. Copy: It copies the entire file from Lower to Upper (the “copy-up”), even if only one byte will change.
  3. Modify: The write happens in Upper.
  4. Mask: The Upper file now “masks” (hides) the Lower file.

Physics Cost: The first write to a file is slow (copy latency that grows with file size). Subsequent writes go straight to Upper and are fast.
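
The Trap, Copy, Modify, Mask sequence can be sketched with ordinary directories. Again, this is a toy model rather than OverlayFS: `overlay_append` is a hypothetical helper, and the real copy-up is performed by the kernel when a file in Lower is opened for writing.

```shell
# overlay_append UPPER LOWER PATH TEXT
# Toy model of a write: copy the file up on first write, then modify upper only.
overlay_append() {
  mkdir -p "$(dirname "$1/$3")"
  if [ ! -e "$1/$3" ] && [ -e "$2/$3" ]; then
    cp -p "$2/$3" "$1/$3"          # Copy: the whole file moves up, not just the changed bytes
  fi
  printf '%s\n' "$4" >> "$1/$3"    # Modify: the write lands in upper
}

demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper"
echo 'root:x:0:0:root:/root:/bin/sh' > "$demo/lower/etc/passwd"

overlay_append "$demo/upper" "$demo/lower" etc/passwd 'app:x:1000:1000::/home/app:/bin/sh'

wc -l < "$demo/lower/etc/passwd"   # lower still holds only its original line
wc -l < "$demo/upper/etc/passwd"   # upper holds the copied line plus the new one
```

A second call to `overlay_append` skips the `cp` because the file already exists in upper, which is why only the first write pays the copy cost.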


The Root: chroot vs pivot_root

Technically, a container needs to change its Root Directory (/) to the merged overlay directory.

1. chroot (Legacy)

“Change Root”.

  • Example: `chroot /var/lib/docker/overlay2/.../merged /bin/bash`
  • Problem: A root user can break out of a chroot jail. It is not a security boundary.

2. pivot_root (Modern)

“Swap Root”.

  • Swaps the root mount of the process's mount namespace instead of just changing one process's idea of /.
  • The new root becomes /, and the old host root is moved to a directory inside the new root (e.g., /old_root), which is then unmounted and detached.
  • Result: The process literally loses the reference to the host filesystem. It cannot go back.
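
The swap described above can be sketched in shell. This is a minimal sketch under stated assumptions, not what a runtime literally executes (runc calls the pivot_root(2) syscall directly): `enter_new_root` is a hypothetical helper name, the caller is assumed to be root inside a private mount namespace (for example one created with `unshare -m`), the util-linux `mount` and `pivot_root` commands are assumed to be available, and the new root is assumed to contain a static `/bin/busybox`.

```shell
# enter_new_root NEW_ROOT
# Hypothetical helper: must run as root inside a private mount namespace.
enter_new_root() {
  # pivot_root requires the new root to be a mount point, so bind-mount it onto itself.
  mount --bind "$1" "$1" || return 1
  cd "$1" || return 1
  # Parking spot for the old root, located inside the new root.
  mkdir -p old_root || return 1
  # Swap: "." becomes /, and the host root reappears at /old_root.
  pivot_root . old_root || return 1
  cd / || return 1
  # Lazily detach the host root; after this no path leads back to it.
  /bin/busybox umount -l /old_root || return 1
  /bin/busybox rmdir /old_root
}
```

Calling `enter_new_root "$PWD/rootfs"` and then starting `/bin/busybox sh` would stand in for a `chroot` into the same directory, with the difference that the host root is no longer mounted anywhere in the namespace.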

Code: Building a Container from Scratch

Let’s build a container without Docker.

# 1. Create Filesystem directories
mkdir -p rootfs/{bin,lib,lib64,proc,sys}

# 2. Copy binaries (a statically linked Busybox is great for this: it needs no shared libraries)
cp /bin/busybox rootfs/bin/

# 3. Create the Namespace Isolation
# -p (PID), -m (Mount), -u (UTS), -n (Net), -f (Fork)
sudo unshare -p -m -u -n -f --mount-proc sh -c "
  # INSIDE THE CONTAINER:

  # 4. Mount /proc inside the new root (Crucial for ps to work after the chroot)
  mount -t proc proc rootfs/proc

  # 5. Hostname
  hostname container-physics

  # 6. Pivot Root (Simplified as chroot here for readability)
  # In reality, runc uses pivot_root
  chroot rootfs /bin/busybox sh
"

Physics observation: If you run ps aux inside, the process list starts at PID 1 and the host's processes are gone. If you run ls /, you see only bin, lib, lib64, proc, sys. You are trapped.


The Image: Manifests & Blobs

A Docker Image is actually a JSON file (The Manifest) pointing to content-addressable blobs (tarballs). Each blob is named by the SHA256 hash of its own bytes.

  • Request: The client asks the registry, “Give me ubuntu:latest”.
  • Manifest: The registry answers, “Here is the JSON. It has 3 layers: sha256:abc..., sha256:def..., sha256:ghi...”.
  • Pull: Download the 3 tarballs and check each one against its SHA256 digest.
  • Extract: Unpack each tarball into a separate directory (lowerdir candidates).
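
Content addressing means a downloaded blob can be checked against the digest the manifest promised. The sketch below assumes `sha256sum` from GNU coreutils; `blob_digest` and `verify_blob` are hypothetical helper names, and the temporary file merely stands in for a layer tarball fetched from a registry.

```shell
# Compute a digest string in the "sha256:<hex>" form that manifests use.
blob_digest() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# verify_blob FILE EXPECTED_DIGEST
# Succeeds only when the bytes on disk hash to the expected digest.
verify_blob() {
  [ "$(blob_digest "$1")" = "$2" ]
}

blob=$(mktemp)
printf 'pretend this is a layer tarball\n' > "$blob"
expected=$(blob_digest "$blob")    # stands in for the digest listed in the manifest

verify_blob "$blob" "$expected" && echo "layer verified"

printf 'tampered\n' >> "$blob"     # change a single line of the blob
verify_blob "$blob" "$expected" || echo "digest mismatch, reject the layer"
```

Because the digest is computed over the blob's bytes, two images that list the same layer digest can share one copy of that layer on disk.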

Practice Exercises

Exercise 1: Manual Overlay (Beginner)

Task: Create lower, upper, work, merged. Action:

mkdir -p lower upper work merged
sudo mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged

Observation: Create a file in merged. See it appear in upper. lower remains untouched.

Exercise 2: The Breakout (Intermediate)

Task: Use chroot. Challenge: As a C programmer, try to escape using fchdir() and chdir("..") (Google “chroot breakout”). Lesson: Why we use pivot_root.

Exercise 3: Storage Driver Analysis (Advanced)

Task: Run docker info | grep Storage. Action: Inspect /var/lib/docker/overlay2. Physics: Find the exact correlation between a defined layer hash and the directory on disk.


Knowledge Check

  1. What happens when you write to a file in a container for the first time?
  2. Why is pivot_root more secure than chroot?
  3. Does a container have its own kernel?
  4. What storage driver does Docker use by default?
  5. Where does the “Writeable Layer” exist?

Answers

  1. Latency Spike. The Copy-Up operation copies the file from Lower to Upper.
  2. Detachment. It unmounts the old root, removing the kernel handle to the host filesystem.
  3. No. It shares the Host Kernel.
  4. OverlayFS (overlay2).
  5. UpperDir. On the host disk, usually in /var/lib/docker/overlay2/.../diff.

Summary

  • OverlayFS: Stacked Filesystem.
  • CoW: Reads run at native speed; the first write to a Lower file pays for a full copy-up.
  • Structure: Manifests + Blobs.
  • Definition: Namespace + Cgroup + OverlayFS.
