The Sub-50µs Cloud Lie
Why cloud vendors' latency claims don't match reality for trading. Real measurements and the hard limits of cloud infrastructure.
🎯 What You'll Learn
- Understand why vendor latency claims are misleading
- Learn how to measure real trading latency
- Identify cloud infrastructure limitations
- Know when cloud works and when it doesn't
📚 Prerequisites
Before this lesson, you should understand:
The Marketing vs Reality Gap
Cloud vendors claim “sub-millisecond latency.” Your trading system measures 5-50ms. What’s going on?
AWS claims: "Single-digit millisecond latency"
Your measurement: 15ms to Binance
Reality: Both are "correct" - but measuring different things
This lesson exposes the gap between marketing claims and trading reality.
What You’ll Learn
By the end of this lesson, you’ll understand:
- What vendors actually measure - VM-to-VM, not your use case
- Real trading latency sources - Network, hypervisor, kernel
- How to measure properly - End-to-end, with percentiles
- When cloud makes sense - Not for HFT, but maybe for you
The Foundation: What “Latency” Actually Means
Vendors measure inter-VM latency within the same datacenter:
EC2 instance-A → EC2 instance-B (same AZ)
AWS claims: ~50-100µs
What you actually need:
Your EC2 → Internet → Exchange → Processing → Response
Reality: 5-50ms depending on exchange
Marketing latency ≠ application latency
The “Aha!” Moment
Here’s what cloud vendors won’t tell you:
The hypervisor adds 5-20µs of jitter to every network operation. You share physical hardware with other tenants. When they spike, you spike. This variability is invisible in averages but destroys your p99 latency.
Dedicated hardware doesn’t have this problem.
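To see why jitter that is invisible in an average still wrecks the tail, here is a minimal simulation (the 50µs baseline, 2% spike rate, and 450µs spike size are illustrative assumptions, not measured values):

import random
import statistics

random.seed(42)

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already-sorted list."""
    idx = max(int(p / 100 * len(sorted_values)) - 1, 0)
    return sorted_values[idx]

def cloud_sample():
    latency = random.gauss(50, 5)     # same ~50µs baseline as dedicated hardware
    if random.random() < 0.02:        # 2% of operations hit a noisy-neighbor spike
        latency += 450
    return latency

dedicated = sorted(random.gauss(50, 5) for _ in range(100_000))
cloud = sorted(cloud_sample() for _ in range(100_000))

for name, samples in (("dedicated", dedicated), ("cloud", cloud)):
    print(f"{name:>9}: avg {statistics.mean(samples):6.1f}µs  "
          f"p99 {percentile(samples, 99):6.1f}µs  "
          f"max {max(samples):6.1f}µs")

# The averages differ by roughly 20%; the p99 jumps from ~62µs to ~500µs.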
Let’s See It In Action: Measuring Real Latency
Measure VM-to-VM (What AWS Claims)
# Install sockperf on two EC2 instances
sudo apt install sockperf
# Server side
sockperf server -i 0.0.0.0 -p 12345
# Client side - measure latency
sockperf ping-pong -i <server-ip> -p 12345 --pps=max -t 60
# Typical AWS result: avg 60µs, p99 150µs
Measure to Exchange (What You Actually Get)
import time
import requests

def measure_exchange_latency(url, n=100):
    """Measure end-to-end HTTP round-trip latency to an exchange endpoint."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        # Each call opens a fresh connection, so TLS handshake time is included
        requests.get(url, timeout=5)
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    latencies.sort()
    print(f"Min: {latencies[0]:.1f}ms")
    print(f"Avg: {sum(latencies)/len(latencies):.1f}ms")
    print(f"P99: {latencies[max(int(n * 0.99) - 1, 0)]:.1f}ms")  # nearest-rank percentile
    print(f"Max: {latencies[-1]:.1f}ms")

# Run from EC2
measure_exchange_latency("https://api.binance.com/api/v3/time")
# Typical: Min 15ms, Avg 25ms, P99 80ms
Where Cloud Latency Comes From
| Source | Contribution | Fixable? |
|---|---|---|
| Physical distance | 1-50ms | Move to colo |
| Internet routing | 1-20ms | Pay for direct connect |
| Hypervisor overhead | 5-20µs | Bare metal instance |
| Kernel network stack | 10-50µs | Kernel tuning |
| Your application | Variable | Code optimization |
90% of your latency is location + network path. Optimizing code won’t fix this.
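The "physical distance" row is pure physics. As a rough sketch (the great-circle distances and the ~200,000 km/s speed of light in fibre are approximations; real fibre routes are longer, so real RTTs are higher):

# Light in optical fibre travels at roughly 2/3 of c, about 200 km per millisecond.
FIBRE_KM_PER_MS = 200

# Approximate great-circle distances (illustrative, not actual route lengths)
routes_km = {
    "New York -> Chicago": 1_150,
    "Virginia (us-east-1) -> London": 5_900,
    "Virginia (us-east-1) -> Tokyo": 10_900,
}

for route, km in routes_km.items():
    rtt_ms = 2 * km / FIBRE_KM_PER_MS   # best-case round trip over straight-line fibre
    print(f"{route:32s} >= {rtt_ms:5.1f} ms round trip")

# No kernel tuning or faster instance type can beat these lower bounds.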
The Noisy Neighbor Problem
Shared infrastructure means shared variability:
Normal operation:
Your latency: 50µs
Neighbor running ML training:
Your latency: 200µs (CPU steal)
Neighbor doing heavy I/O:
Your latency: 500µs (network contention)
This variability is random and unpredictable. Your p99 suffers.
Measuring CPU Steal
# Check if you're losing CPU to other tenants
# (locate the "st" column from the header instead of hard-coding its position)
vmstat 1 | awk 'NR==2 {for (i=1; i<=NF; i++) if ($i == "st") c=i} NR>2 {print "steal:", $c"%"}'
# >0% steal means other tenants are taking your CPU time
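If you would rather log steal time from Python alongside your trading metrics, here is a minimal sketch that reads /proc/stat directly (Linux only; it relies on the standard layout of the aggregate "cpu" line):

import time

def cpu_times():
    """Aggregate CPU counters from the first line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]   # drop the leading "cpu" label
    return [int(x) for x in fields]

def steal_percent(interval=1.0):
    """Percentage of CPU time stolen by the hypervisor over `interval` seconds."""
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    steal = deltas[7]                       # field 8 of the cpu line is "steal"
    return 100.0 * steal / total if total else 0.0

while True:                                 # simple monitor loop; Ctrl-C to stop
    print(f"steal: {steal_percent():.1f}%")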
AWS Instance Selection
| Instance Type | Latency Profile | Monthly Cost |
|---|---|---|
| t3.medium | High variability, burst | $30 |
| c6i.2xlarge | Better, still shared | $250 |
| c6i.metal | Bare metal, no hypervisor | $3,000 |
| p4d.24xlarge | Dedicated network | $30,000+ |
For trading: Minimum c5n/c6i.xlarge with Enhanced Networking.
Common Misconceptions
Myth: “Faster instance types = lower latency.”
Reality: Instance type affects CPU, not network latency. A t3.micro and p4d.24xlarge have similar network latency to external destinations.
Myth: “AWS Direct Connect solves all latency problems.”
Reality: Direct Connect reduces internet routing variability (~5-10ms savings) but doesn’t fix hypervisor jitter or distance.
Myth: “My cloud setup is fast enough because average latency is low.”
Reality: Averages hide tail latency. Your p99 or p99.9 is what matters for trading. One 500ms spike per minute is catastrophic.
When Cloud Makes Sense
Cloud is Fine For:
- Swing trading (minutes to days)
- Backtesting and research
- Non-latency-sensitive strategies
- Starting out / proving concepts
Cloud is Not Fine For:
- Market making
- HFT strategies
- Arbitrage (especially cross-exchange)
- Any strategy where you compete on speed
Honest Latency Budget
If you’re serious about cloud trading:
Fixed costs (can't optimize):
Distance to exchange: 10-30ms
Internet routing: 5-15ms
TLS handshake: 5-10ms
Variable costs (can optimize):
Application code: 0.1-10ms
Network stack: 0.01-0.1ms
Realistic total: 25-70ms
Your competitor in colo: 0.1-1ms
You’re 25-700x slower. Accept it or move to colo.
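As a back-of-the-envelope check, here is a sketch that adds up a per-hop budget and compares it to a colocated competitor (the per-hop ranges mirror the illustrative figures above and land in the same ballpark as the 25-70ms total; substitute your own measurements):

# (component, best-case ms, worst-case ms) -- illustrative figures, not measurements
budget = [
    ("Distance to exchange", 10,   30),
    ("Internet routing",      5,   15),
    ("TLS handshake",         5,   10),
    ("Application code",      0.1, 10),
    ("Network stack",         0.01, 0.1),
]

best = sum(lo for _, lo, _ in budget)
worst = sum(hi for _, _, hi in budget)
colo_best, colo_worst = 0.1, 1.0            # typical colocated competitor

print(f"Cloud total: {best:.1f}-{worst:.1f} ms")
print(f"Colo total:  {colo_best:.1f}-{colo_worst:.1f} ms")
print(f"Slowdown:    ~{best / colo_worst:.0f}x to ~{worst / colo_best:.0f}x")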
Practice Exercises
Exercise 1: Measure Your Reality
# From your trading server, measure to your exchange
while true; do
  curl -w "%{time_total}\n" -o /dev/null -s https://api.exchange.com/time
  sleep 1
done | tee latency.log
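Once latency.log has some data, you can summarize it with the same percentile treatment used earlier (a small sketch; it assumes one curl time_total value per line, in seconds):

# Summarize latency.log (one time_total value per line, in seconds)
with open("latency.log") as f:
    samples = sorted(float(line) * 1000 for line in f if line.strip())  # ms

def pct(p):
    """Nearest-rank percentile of the sorted samples."""
    return samples[max(int(p / 100 * len(samples)) - 1, 0)]

print(f"samples: {len(samples)}")
print(f"min {samples[0]:.1f}ms  p50 {pct(50):.1f}ms  p99 {pct(99):.1f}ms  max {samples[-1]:.1f}ms")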
Exercise 2: Check for Steal Time
# Monitor steal time for 1 hour (locate the "st" column from the header)
vmstat 1 3600 | awk 'NR==2 {for (i=1; i<=NF; i++) if ($i == "st") c=i} NR>2 {print $c}' > steal.log
# Any non-zero values?
Exercise 3: Compare Instance Types
If budget allows:
- Spin up c6i.xlarge and c6i.metal
- Run same latency test on both
- Compare p99 latency
Key Takeaways
- Vendor claims measure the wrong thing - VM-to-VM ≠ to-exchange
- Hypervisor adds jitter - Shared infrastructure = shared variability
- Distance dominates - No amount of tuning fixes 10ms of physics
- Know your use case - Cloud works for some strategies, not others
What’s Next?
🎯 Continue learning: Trading Infrastructure First Principles
🔬 Expert version: The Sub-50µs Cloud Lie
Now you know what cloud vendors aren’t telling you. ☁️
Questions about this lesson? Working on related infrastructure?
Let's discuss