LatencyScope
The Anatomy of a Nanosecond. Surgical latency analysis with eBPF.
pip install latencyscope
The Observer Effect
You cannot debug a latency spike if your debugger causes the spike. Traditional tools like strace introduce massive overhead.
# Cost of "observing" a write() syscall:
Native:        ~300 ns
With strace:   ~50,000 ns (~166x slower)
LatencyScope uses eBPF to achieve <500 ns overhead per event, fully decoupled from your application's runtime.
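Why the gap is so large: the probe runs inside the kernel and drops events from other processes before anything crosses into userspace. Below is a minimal, hypothetical sketch of that filtering pattern in libbpf-style CO-RE C; the program, map, and counter names are illustrative, not LatencyScope's source.

// Hypothetical sketch: count write() entries for one PID, entirely in-kernel.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

const volatile __u32 target_pid = 0;        /* filled in by the loader */

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} write_count SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int count_writes(void *ctx)
{
    /* Per-event work is a compare and an increment; nothing is copied out. */
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (pid != target_pid)
        return 0;

    __u32 key = 0;
    __u64 *cnt = bpf_map_lookup_elem(&write_count, &key);
    if (cnt)
        (*cnt)++;
    return 0;
}

char LICENSE[] SEC("license") = "GPL";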
Surgical Precision
$ sudo latencyscope
LatencyScope v0.1.1 - HFT Latency Profiler
Target: PID 12345 (trading_engine) | Duration: 10.0s | Cores: 4,5,6,7 (isolated)
╭──────────────────────────────────────────────────────────────────╮
│ ISOLATION VERIFIER │
├──────────────────────────────────────────────────────────────────┤
[FAIL] Context switches detected: 47 events
Worst: 12,847 ns runqueue latency @ 14:32:17.847
Cause: kworker/4:0 preempted trading_engine
Runqueue Latency:
P50: 124 ns | P99: 312 ns | P99.999: 12,847 ns
╰──────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────╮
│ IRQ STORM DETECTOR │
├──────────────────────────────────────────────────────────────────┤
[WARN] IRQs on isolated cores: 12 events
Device: nvme0q5 | Max duration: 2,347 ns
Recommendation:
echo 0f > /proc/irq/124/smp_affinity
╰──────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────╮
│ MEMORY STALL PROFILER │
├──────────────────────────────────────────────────────────────────┤
[PASS] No major page faults
[WARN] Minor faults: 23 | TLB shootdowns: 8
╰──────────────────────────────────────────────────────────────────╯
══════════════════════════════════════════════════════════════════
SUMMARY: 1 violation, 2 warnings | Exit code: 2
══════════════════════════════════════════════════════════════════
Detailed Telemetry
What hides in the tail?
Isolation
- Scheduler Switch: Detects any switch away from pinned PIDs on isolated cores
- Runqueue Latency: Measures time spent 'Runnable' but waiting for a CPU (see the sketch after this list)
- Migration Cost: Tracks expensive cross-core task migrations
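To make the Runqueue Latency item concrete, here is a hypothetical CO-RE sketch of the usual two-tracepoint pattern: stamp the task when it becomes runnable at sched_wakeup, then take the delta at sched_switch when it finally gets the CPU. The names and the bpf_printk output are illustrative; the shipped probes aggregate in-kernel instead of printing.

// Hypothetical sketch: time spent runnable but off-CPU for one process.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

const volatile __u32 target_pid = 0;        /* filled in by the loader */

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);                     /* thread id */
    __type(value, __u64);                   /* wakeup timestamp, ns */
} wakeup_ts SEC(".maps");

SEC("tp_btf/sched_wakeup")
int BPF_PROG(on_wakeup, struct task_struct *p)
{
    if (BPF_CORE_READ(p, tgid) != target_pid)
        return 0;
    __u32 tid = BPF_CORE_READ(p, pid);
    __u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&wakeup_ts, &tid, &ts, BPF_ANY);
    return 0;
}

SEC("tp_btf/sched_switch")
int BPF_PROG(on_switch, bool preempt, struct task_struct *prev,
             struct task_struct *next)
{
    __u32 tid = BPF_CORE_READ(next, pid);
    __u64 *ts = bpf_map_lookup_elem(&wakeup_ts, &tid);
    if (!ts)
        return 0;                           /* not a stamped target thread */
    __u64 delta = bpf_ktime_get_ns() - *ts; /* runnable-but-waiting time */
    bpf_map_delete_elem(&wakeup_ts, &tid);
    bpf_printk("runqueue latency: %llu ns", delta);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";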
Interrupts
- HardIRQs: Precise duration of hardware interrupt vectors (see the sketch after this list)
- SoftIRQs: Bottom-half processing latency and stolen cycles
- IRQ Affinity: Verifies interrupts abide by smp_affinity masks
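The HardIRQs duration can be sketched the same way using the irq:irq_handler_entry and irq:irq_handler_exit tracepoints. Hard interrupt handlers do not nest on a core, so one per-CPU scratch timestamp is enough; again a hypothetical sketch rather than the shipped probe.

// Hypothetical sketch: wall-clock duration of each hardware interrupt handler.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} irq_start SEC(".maps");

SEC("tracepoint/irq/irq_handler_entry")
int on_irq_entry(void *ctx)
{
    __u32 key = 0;
    __u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&irq_start, &key, &ts, BPF_ANY);
    return 0;
}

SEC("tracepoint/irq/irq_handler_exit")
int on_irq_exit(void *ctx)
{
    __u32 key = 0;
    __u64 *ts = bpf_map_lookup_elem(&irq_start, &key);
    if (!ts || !*ts)
        return 0;
    __u64 duration = bpf_ktime_get_ns() - *ts;
    *ts = 0;                                /* ready for the next interrupt */
    bpf_printk("hardirq duration: %llu ns", duration);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";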
Memory
- Page Faults: Minor (pipeline freeze) and major (disk I/O) fault tracking
- TLB Shootdowns: Cross-core synchronization penalties
- Compaction: Stalls from transparent hugepage defragmentation
Architecture
- eBPF / CO-RE: Compile Once, Run Everywhere; safe kernel tracing
- Per-CPU Buffers: Lock-free ring buffers for nanosecond overhead
- Zero Copy: In-kernel aggregation avoids userspace context switches
Safety First Architecture
Control Plane (Python)
Manages the lifecycle, parses symbol tables, and renders the TUI. It loads the BPF programs but stays out of the hot path.
Data Plane (eBPF/C)
Runs safely inside the kernel VM. Events are filtered in-kernel (if pid != target return 0) and aggregated in per-CPU ring buffers.
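As a hypothetical illustration of the in-kernel aggregation idea: instead of emitting every sample, a probe can fold each latency into a log2 histogram held in a per-CPU array map, and the control plane reads the map once when the run ends. The helper below is the kind of routine the probes sketched above would call; it is illustrative, not LatencyScope's source.

// Hypothetical sketch: aggregate latencies in-kernel, read once from userspace.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_SLOTS 32

/* One log2 histogram per CPU; userspace sums the per-CPU copies at the end. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, MAX_SLOTS);
    __type(key, __u32);
    __type(value, __u64);
} lat_hist SEC(".maps");

/* Fold one latency sample (in ns) into its log2 bucket, entirely in-kernel. */
static __always_inline void record_latency(__u64 delta_ns)
{
    __u32 slot = 0;
    while (delta_ns >>= 1) {                /* bounded: at most 63 shifts */
        if (++slot >= MAX_SLOTS - 1)
            break;
    }
    __u64 *count = bpf_map_lookup_elem(&lat_hist, &slot);
    if (count)
        (*count)++;                         /* per-CPU slot, no lock needed */
}

char LICENSE[] SEC("license") = "GPL";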
CLI Reference
Basic Profiling
Profile all modules for 10 seconds
sudo latencyscope --duration 10
Target PID
Focus analysis on a specific process
sudo latencyscope --pid $(pgrep trading)
Isolation Check
Verify isolation on specific cores
sudo latencyscope --cpus 4,5,6,7
CI/CD Integration
Generate machine-readable output
sudo latencyscope --json > report.json
Ready to go deeper?
As we push toward the theoretical limits of silicon, our eyes must improve before our hands can.
pip install latencyscope