The Physics of the Order Book: L2, L3, and Sequence Gaps
Why 'Price' is an aggregation of intent. Understanding Level 2 vs Level 3 data, UDP Sequence Gaps, and the Crossed Book phenomenon.
🎯 What You'll Learn
- Differentiate L1 (Top), L2 (Aggregated), and L3 (Individual) Data
- Deconstruct a WebSocket Delta Update (`Add`, `Update`, `Delete`)
- Analyze the 'Crossed Book' state (Bid >= Ask)
- Implement a Sequence Gap Detector for UDP Feeds
- Visualize Market Depth using Heatmaps
Introduction
Most traders see a chart. Engineers see a State Machine synchronized across thousands of miles.
The Order Book is not static. It is a living data structure that, on a busy venue, can mutate millions of times per second. To build a trading bot, you don’t just “read” the book. You reconstruct it locally, packet by packet, verifying the integrity of the universe with every sequence number.
The Physics: L2 vs L3 Data
Data comes in resolutions.
Level 1 (Top of Book):
- “Best Bid is 101.”
- Physics: Low Bandwidth. Useful for retail UI. Too coarse for most algo trading, because it says nothing about depth behind the best price.
Level 2 (Aggregated Depth):
- “There are 500 shares at 99.”
- Physics: You know how much is there, but not who is there or how many orders make up the level. A great deal of algorithmic trading runs on this view.
Level 3 (Market by Order):
- “Order ID 123 (Size 100) added at $100.”
- “Order ID 456 (Size 50) added at $100.”
- Physics: Full visibility. Highest bandwidth (it can reach Gbps during bursts). You can track individual queue positions.
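The three resolutions are views of the same data: summing L3 orders by price gives L2, and taking the best L2 level gives L1. Here is a minimal sketch of that reduction, reusing the order IDs and sizes from the L3 bullets above. The tuple layout, the extra order 789, and the function names are assumptions made for illustration, not any exchange's format.

```python
from collections import defaultdict

# Hypothetical L3 view: resting buy orders as (order_id, price, size) tuples.
l3_bids = [
    (123, 100.0, 100),
    (456, 100.0, 50),
    (789, 99.0, 500),
]

def aggregate_to_l2(orders):
    """Collapse individual orders (L3) into total size per price level (L2)."""
    levels = defaultdict(int)
    for _order_id, price, size in orders:
        levels[price] += size
    return dict(levels)

def best_bid(l2_bids):
    """Reduce L2 bid levels to the top of book (L1): (price, size) or None."""
    if not l2_bids:
        return None
    price = max(l2_bids)
    return price, l2_bids[price]

l2_bids = aggregate_to_l2(l3_bids)
l1_bid = best_bid(l2_bids)
print(l2_bids)  # {100.0: 150, 99.0: 500}
print(l1_bid)   # (100.0, 150)
```

The reduction only goes one way: you can derive L2 from L3, but you cannot recover order IDs or queue positions from L2.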
Deep Dive: Delta Updates & Sequence Gaps
Downloading the full book (Snapshot) is slow: it can take on the order of 100 ms, which is an eternity at trading timescales. Instead, we download a Snapshot once, and then apply Deltas (Changes) on top of it.
The Protocol:
- Snapshot: { "bids": [[100, 500]], "seq": 50 }
- Delta: { "action": "update", "price": 100, "size": 600, "seq": 51 }
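To make the snapshot-then-delta flow concrete, here is a small sketch that loads the snapshot message and applies the delta message shown above. The helper names are hypothetical. Treating a `delete` action or a size of 0 as "remove the level" is an assumption, since the messages above only show an `update`.

```python
def book_from_snapshot(snapshot):
    """Build a local bid map (price -> size) and return it with the snapshot's seq."""
    bids = {price: size for price, size in snapshot["bids"]}
    return bids, snapshot["seq"]

def apply_delta(bids, delta):
    """Apply one delta in place and return its sequence number.

    Assumption: a 'delete' action or a size of 0 removes the price level;
    'add' and 'update' both set the level's total size.
    """
    if delta["action"] == "delete" or delta.get("size") == 0:
        bids.pop(delta["price"], None)
    else:
        bids[delta["price"]] = delta["size"]
    return delta["seq"]

snapshot = {"bids": [[100, 500]], "seq": 50}
delta = {"action": "update", "price": 100, "size": 600, "seq": 51}

bids, seq = book_from_snapshot(snapshot)
seq = apply_delta(bids, delta)
print(bids, seq)  # {100: 600} 51
```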
The Physics of Gaps:
If you receive seq: 50 and then seq: 52, you have lost reality.
You cannot just “skip” packet 51. Packet 51 might have been “Sell 1 Million BTC”.
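If you miss a packet, your local book is corrupted, and every later delta is applied to the wrong state. You must disconnect, flush, and restart from a fresh snapshot, unless the feed offers a retransmission or recovery channel for the missing sequence numbers.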
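The rule above can be sketched as a small classifier that compares each incoming sequence number with the next expected one. This assumes sequence numbers increase by exactly 1 per message. Handling of duplicates (a number at or below the last one applied) is an addition for illustration, and `SeqStatus` and `GapDetector` are hypothetical names.

```python
from enum import Enum

class SeqStatus(Enum):
    IN_ORDER = "in_order"
    DUPLICATE = "duplicate"  # already applied; safe to drop
    GAP = "gap"              # at least one message was missed

class GapDetector:
    """Tracks the last applied sequence number and classifies each new one."""

    def __init__(self, last_seq):
        self.last_seq = last_seq  # typically the snapshot's seq

    def check(self, seq):
        expected = self.last_seq + 1
        if seq == expected:
            self.last_seq = seq
            return SeqStatus.IN_ORDER
        if seq < expected:
            return SeqStatus.DUPLICATE
        # Do not advance last_seq: the caller should discard the book and resync.
        return SeqStatus.GAP

detector = GapDetector(last_seq=49)
for seq in (50, 50, 52):
    print(seq, detector.check(seq).value)
# 50 in_order
# 50 duplicate
# 52 gap
```

Keeping the detector separate from the book makes the recovery policy (resnapshot, request a retransmission, or halt trading) an explicit decision for the caller.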
The Anomaly: Crossed Markets
In a sane universe, Best Bid < Best Ask.
If Bid >= Ask, a trade should have happened.
Why do we sometimes see Bid: $100, Ask: $99?
- Latency: The trade report packet hasn’t arrived yet.
- Exchange Lag: The Matching Engine is overwhelmed and hasn’t processed the cross yet.
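- Arbitrage: The two quotes come from different exchanges, e.g. a bid of $100 on exchange A and an ask of $99 on exchange B. Neither book is crossed on its own; the gap between them is the arbitrage opportunity.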
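A local book can check for this condition after every update. The sketch below classifies the top of book. The "locked" label for Bid == Ask is common market terminology rather than something this lesson defines, and `book_state` is a hypothetical helper.

```python
def book_state(best_bid, best_ask):
    """Classify the top of book from the best bid and best ask prices."""
    if best_bid is None or best_ask is None:
        return "one_sided"  # nothing to compare against
    if best_bid > best_ask:
        return "crossed"
    if best_bid == best_ask:
        return "locked"
    return "normal"

print(book_state(99, 100))   # normal
print(book_state(100, 100))  # locked
print(book_state(100, 99))   # crossed
```

A crossed or locked state in a single-exchange book is usually a signal to distrust the local copy for the moment, not a signal to trade.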
Code: The Local Book Builder
How to maintain a local L2 book from a stream of updates.
class OrderBook:
    """Local L2 book: one aggregated size per price level."""

    def __init__(self):
        self.bids = {}  # Price -> Size
        self.asks = {}  # Price -> Size
        self.last_seq = None  # None until the first message is applied

    def process_update(self, msg):
        # 1. Sequence Gap Detection
        # Compare against None explicitly so a last_seq of 0 is still checked.
        if self.last_seq is not None and msg['seq'] != self.last_seq + 1:
            raise RuntimeError(
                f"GAP DETECTED! Expected {self.last_seq + 1}, got {msg['seq']}"
            )
        self.last_seq = msg['seq']

        # 2. Apply Delta
        side = self.bids if msg['side'] == 'buy' else self.asks
        price = msg['price']
        if msg['size'] == 0:
            side.pop(price, None)  # Delete level (no-op if already absent)
        else:
            side[price] = msg['size']  # Upsert level

    def get_best_bid(self):
        # Highest bid price, or None when the bid side is empty.
        return max(self.bids) if self.bids else None
Practice Exercises
Exercise 1: Bandwidth Calculation (Beginner)
Scenario: An L3 feed sends 100 bytes per order at 50,000 orders/sec. Task: What is the bandwidth requirement? (5 MB/s, i.e. 40 Mbps.) What happens if volatility spikes to 1,000,000 orders/sec? (100 MB/s, i.e. 800 Mbps. Do you have a 1 Gbps line, and how much headroom does that leave?)
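The arithmetic for Exercise 1 can be checked with a few lines. This sketch uses decimal units (1 MB = 10^6 bytes) and ignores packet header overhead, so real requirements would be somewhat higher. `feed_bandwidth` is a hypothetical helper.

```python
def feed_bandwidth(bytes_per_msg, msgs_per_sec):
    """Return (megabytes per second, megabits per second) for a message stream."""
    bytes_per_sec = bytes_per_msg * msgs_per_sec
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6

print(feed_bandwidth(100, 50_000))     # (5.0, 40.0)
print(feed_bandwidth(100, 1_000_000))  # (100.0, 800.0)
```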
Exercise 2: The Ghost Order (Intermediate)
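Scenario: You miss a “Delete” packet for Order A. Task: Your bot thinks Order A is still there. You try to trade against it. What happens? (The liquidity is missing. Typically an immediate-or-cancel order comes back unfilled, a plain limit order rests in the book unmatched, and a market order fills at the next available, worse price.)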
Exercise 3: Crossed Book Arb (Advanced)
Task: Write a script that listens to 2 mock orderbooks.
Print “ARBITRAGE” whenever BookA.Bid > BookB.Ask.
Knowledge Check
- Why is L3 data “heavier” than L2?
- What does a sequence gap imply about your network?
- Why can’t you trade against a “Crossed Market” on the same exchange?
- What is a “Snapshot” vs a “Delta”?
- Why do HFTs prefer UDP over TCP for market data?
Answers
- Granularity. L3 sends every single order add/cancel. L2 only sends price level summaries.
- Packet Loss. UDP packets were dropped, or the CPU was too slow to read the socket buffer.
- Matching Engine Logic. The engine would have matched them instantly. If you see it, it’s a display artifact or a timing race.
- State vs Change. Snapshot is the full state (slow). Delta is the change (fast).
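- Speed. TCP guarantees ordered, reliable delivery through ACKs and retransmissions, so one lost packet stalls everything behind it (slow). UDP, usually multicast, fires and forgets (fast), leaving gap detection and recovery to the receiver.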
Summary
- L2 vs L3: Resolution vs Bandwidth trade-off.
- Sequence Numbers: The heartbeat of data integrity.
- Reconstruction: The art of keeping your local truth in sync with the exchange.
Pro Version: See the full research: Orderbook Reconstruction at Sub-Millisecond