Order Book Reconstruction in Sub-Millisecond
How to build and maintain order books from L2/L3 feeds. Sequence numbers, gap detection, and snapshot recovery.
🎯 What You'll Learn
- Understand L1/L2/L3 market data feeds
- Build an order book from incremental updates
- Handle sequence gaps and snapshot recovery
- Optimize for sub-millisecond reconstruction
📚 Prerequisites
Before this lesson, you should understand:
Why Sub-Millisecond Matters
Your trading strategy sees a price. By the time you react, the price has moved.
Order book age: 1ms → You might win
Order book age: 10ms → You're trading on stale data
Order book age: 100ms → You're the sucker at the table
This lesson teaches you how to build and maintain real-time order books from exchange data feeds.
What You’ll Learn
By the end of this lesson, you’ll understand:
- L1/L2/L3 data formats - Trade-offs between detail and bandwidth
- Incremental updates - Building books from deltas
- Sequence handling - Detecting and recovering from gaps
- Performance optimization - Memory layout and data structures
The Foundation: Market Data Levels
| Level | Contains | Updates/sec | Use Case |
|---|---|---|---|
| L1 | Best bid/ask only | 10-100 | Simple trading |
| L2 | Top N levels (e.g., top 10) | 100-1,000 | Most strategies |
| L3 | Every order (individual IDs) | 1,000-100,000 | Market making |
Most exchanges provide L2. True L3 is rare (mainly equity markets).
The “Aha!” Moment
Here’s what makes or breaks order book systems:
Order books are event-sourced. You don’t receive the current state-you receive a stream of changes. Miss one update, and your book is wrong forever until you re-sync.
Every update has a sequence number. Gap detection is mandatory.
Let’s See It In Action: Order Book Data Structure
from collections import defaultdict
from sortedcontainers import SortedDict
class OrderBook:
def __init__(self, symbol: str):
self.symbol = symbol
self.bids = SortedDict() # price -> quantity (descending)
self.asks = SortedDict() # price -> quantity (ascending)
self.sequence = 0
def apply_snapshot(self, snapshot: dict):
"""Initialize from full snapshot."""
self.bids.clear()
self.asks.clear()
for price, qty in snapshot['bids']:
self.bids[-float(price)] = float(qty) # Negative for descending
for price, qty in snapshot['asks']:
self.asks[float(price)] = float(qty)
self.sequence = snapshot['sequence']
def apply_update(self, update: dict) -> bool:
"""Apply incremental update. Returns False if gap detected."""
expected_seq = self.sequence + 1
if update['sequence'] != expected_seq:
return False # GAP! Need to re-sync
for price, qty in update.get('bids', []):
price = -float(price)
if float(qty) == 0:
self.bids.pop(price, None)
else:
self.bids[price] = float(qty)
for price, qty in update.get('asks', []):
price = float(price)
if float(qty) == 0:
self.asks.pop(price, None)
else:
self.asks[price] = float(qty)
self.sequence = update['sequence']
return True
def best_bid(self):
return -self.bids.peekitem(0)[0] if self.bids else None
def best_ask(self):
return self.asks.peekitem(0)[0] if self.asks else None
def mid_price(self):
bid, ask = self.best_bid(), self.best_ask()
return (bid + ask) / 2 if bid and ask else None
Sequence Gap Detection
class SafeOrderBook(OrderBook):
def __init__(self, symbol: str, on_gap_callback):
super().__init__(symbol)
self.on_gap = on_gap_callback
self.buffered_updates = []
self.max_buffer = 1000
def process_message(self, msg: dict):
if msg['type'] == 'snapshot':
self.apply_snapshot(msg)
self._replay_buffered()
elif msg['type'] == 'update':
if not self.apply_update(msg):
# Gap detected!
self._handle_gap(msg)
def _handle_gap(self, msg):
"""Buffer updates and request new snapshot."""
self.buffered_updates.append(msg)
if len(self.buffered_updates) > self.max_buffer:
self.buffered_updates.pop(0)
self.on_gap(self.symbol) # Trigger snapshot request
def _replay_buffered(self):
"""Apply buffered updates after snapshot."""
valid_updates = [u for u in self.buffered_updates
if u['sequence'] > self.sequence]
valid_updates.sort(key=lambda x: x['sequence'])
for update in valid_updates:
if not self.apply_update(update):
break # Still have a gap
self.buffered_updates.clear()
Performance Optimization
1. Memory Layout
# Bad: Python dicts with string keys
book = {"bids": {}, "asks": {}} # Slow, memory-intensive
# Better: NumPy arrays for numerical data
import numpy as np
bids = np.zeros((1000, 2), dtype=np.float64) # [price, qty] pairs
asks = np.zeros((1000, 2), dtype=np.float64)
# Best: Pre-allocated fixed-size arrays with manual management
2. Update Parsing
# Bad: JSON parsing every message
data = json.loads(raw_message)
# Better: Use orjson (3-10x faster)
import orjson
data = orjson.loads(raw_message)
# Best: Binary protocols (protobuf, FlatBuffers)
3. Price Level Indexing
# For fixed-tick instruments, use array indexing
# E.g., BTC with $1 ticks: price $50,000 = index 50000
class TickBook:
def __init__(self, min_price, max_price, tick_size):
self.min_price = min_price
self.tick_size = tick_size
n_levels = int((max_price - min_price) / tick_size)
self.bids = np.zeros(n_levels, dtype=np.float64)
self.asks = np.zeros(n_levels, dtype=np.float64)
def _price_to_idx(self, price):
return int((price - self.min_price) / self.tick_size)
def update_bid(self, price, qty):
self.bids[self._price_to_idx(price)] = qty # O(1)!
Common Misconceptions
Myth: “I can just poll the REST API for order book state.”
Reality: REST is seconds behind WebSocket. By the time you get the response, it’s ancient history. WebSocket with incremental updates is mandatory.
Myth: “Sequence gaps are rare, I can ignore them.”
Reality: Networks drop packets. WebSocket connections hiccup. Gaps happen in production-your code MUST handle them or your book diverges silently.
Myth: “My order book matches the exchange perfectly.”
Reality: There’s always propagation delay. Your book is a local approximation. Build strategies that tolerate this uncertainty.
Exchange Integration Examples
Binance
# Subscribe to order book updates
{
"method": "SUBSCRIBE",
"params": ["btcusdt@depth@100ms"], # 100ms batching
"id": 1
}
# Snapshot endpoint for recovery
GET /api/v3/depth?symbol=BTCUSDT&limit=1000
Coinbase
# Subscribe
{
"type": "subscribe",
"channels": [{"name": "level2", "product_ids": ["BTC-USD"]}]
}
# Initial snapshot comes first, then updates
Practice Exercises
Exercise 1: Build a Simple Book
# Process this sequence:
snapshot = {"bids": [["100", "10"]], "asks": [["101", "5"]], "sequence": 1}
update1 = {"bids": [["99", "20"]], "asks": [], "sequence": 2}
update2 = {"bids": [["100", "0"]], "asks": [["101", "8"]], "sequence": 3}
# What's the final state?
Exercise 2: Gap Detection
# What happens with this sequence?
snapshot = {"sequence": 100, ...}
update = {"sequence": 102, ...} # Gap! 101 is missing
Exercise 3: Benchmark Performance
# Measure updates per second
import time
start = time.perf_counter()
for _ in range(100000):
book.apply_update(sample_update)
elapsed = time.perf_counter() - start
print(f"{100000/elapsed:.0f} updates/sec")
Key Takeaways
- Event-sourcing model - Books are built from update streams
- Sequence numbers are critical - One missed update corrupts everything
- Buffer + snapshot = recovery - Handle gaps gracefully
- Performance is architecture - Data structures matter more than code optimization
What’s Next?
🎯 Continue learning: Market Making Fundamentals
🔬 Expert version: Order Book Reconstruction: Nanoseconds Matter
Now you can build order books that keep up with the market. 📊
Questions about this lesson? Working on related infrastructure?
Let's discuss