LatencyScope
The Anatomy of a Nanosecond. Surgical latency analysis with eBPF.
pip install latencyscope
The Observer Effect
You cannot debug a latency spike if your debugger causes the spike. Traditional tools like strace introduce massive overhead.
# Cost of "observing" a write() syscall:
Native:        ~300 ns
With strace:   ~50,000 ns (~166x slower)
LatencyScope uses eBPF to achieve <500 ns overhead per event, fully decoupled from your application's runtime.
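Why the gap is so large: the probe runs inside the kernel and drops events from other processes before anything crosses into userspace. Below is a minimal, hypothetical sketch of that filtering pattern in libbpf-style CO-RE C; the program, map, and counter names are illustrative, not LatencyScope's source.

// Hypothetical sketch: count write() entries for one PID, entirely in-kernel.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

const volatile __u32 target_pid = 0;        /* filled in by the loader */

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} write_count SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int count_writes(void *ctx)
{
    /* Per-event work is a compare and an increment; nothing is copied out. */
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    if (pid != target_pid)
        return 0;

    __u32 key = 0;
    __u64 *cnt = bpf_map_lookup_elem(&write_count, &key);
    if (cnt)
        (*cnt)++;
    return 0;
}

char LICENSE[] SEC("license") = "GPL";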
Surgical Precision
$ sudo latencyscope
LatencyScope v0.1.1 - HFT Latency Profiler
Target: PID 12345 (trading_engine) | Duration: 10.0s | Cores: 4,5,6,7 (isolated)
╭──────────────────────────────────────────────────────────────────╮
│ ISOLATION VERIFIER │
├──────────────────────────────────────────────────────────────────┤
[FAIL] Context switches detected: 47 events
Worst: 12,847 ns runqueue latency @ 14:32:17.847
Cause: kworker/4:0 preempted trading_engine
Runqueue Latency:
P50: 124 ns | P99: 312 ns | P99.999: 12,847 ns
╰──────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────╮
│ IRQ STORM DETECTOR │
├──────────────────────────────────────────────────────────────────┤
[WARN] IRQs on isolated cores: 12 events
Device: nvme0q5 | Max duration: 2,347 ns
Recommendation:
echo 0f > /proc/irq/124/smp_affinity
╰──────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────────────────────────╮
│ MEMORY STALL PROFILER │
├──────────────────────────────────────────────────────────────────┤
[PASS] No major page faults
[WARN] Minor faults: 23 | TLB shootdowns: 8
╰──────────────────────────────────────────────────────────────────╯
══════════════════════════════════════════════════════════════════
SUMMARY: 1 violation, 2 warnings | Exit code: 2
══════════════════════════════════════════════════════════════════
Detailed Telemetry
What hides in the tail?
Isolation
- Scheduler Switch: Detects any switch away from pinned PIDs on isolated cores
- Runqueue Latency: Measures time spent 'Runnable' but waiting for a CPU (see the sketch after this list)
- Migration Cost: Tracks expensive cross-core task migrations
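To make the Runqueue Latency item concrete, here is a hypothetical CO-RE sketch of the usual two-tracepoint pattern: stamp the task when it becomes runnable at sched_wakeup, then take the delta at sched_switch when it finally gets the CPU. The names and the bpf_printk output are illustrative; the shipped probes aggregate in-kernel instead of printing.

// Hypothetical sketch: time spent runnable but off-CPU for one process.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

const volatile __u32 target_pid = 0;        /* filled in by the loader */

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);                     /* thread id */
    __type(value, __u64);                   /* wakeup timestamp, ns */
} wakeup_ts SEC(".maps");

SEC("tp_btf/sched_wakeup")
int BPF_PROG(on_wakeup, struct task_struct *p)
{
    if (BPF_CORE_READ(p, tgid) != target_pid)
        return 0;
    __u32 tid = BPF_CORE_READ(p, pid);
    __u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&wakeup_ts, &tid, &ts, BPF_ANY);
    return 0;
}

SEC("tp_btf/sched_switch")
int BPF_PROG(on_switch, bool preempt, struct task_struct *prev,
             struct task_struct *next)
{
    __u32 tid = BPF_CORE_READ(next, pid);
    __u64 *ts = bpf_map_lookup_elem(&wakeup_ts, &tid);
    if (!ts)
        return 0;                           /* not a stamped target thread */
    __u64 delta = bpf_ktime_get_ns() - *ts; /* runnable-but-waiting time */
    bpf_map_delete_elem(&wakeup_ts, &tid);
    bpf_printk("runqueue latency: %llu ns", delta);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";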
Interrupts
- HardIRQs: Precise duration of hardware interrupt vectors (see the sketch after this list)
- SoftIRQs: Bottom-half processing latency and stolen cycles
- IRQ Affinity: Verifies interrupts abide by smp_affinity masks
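The HardIRQs duration can be sketched the same way using the irq:irq_handler_entry and irq:irq_handler_exit tracepoints. Hard interrupt handlers do not nest on a core, so one per-CPU scratch timestamp is enough; again a hypothetical sketch rather than the shipped probe.

// Hypothetical sketch: wall-clock duration of each hardware interrupt handler.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} irq_start SEC(".maps");

SEC("tracepoint/irq/irq_handler_entry")
int on_irq_entry(void *ctx)
{
    __u32 key = 0;
    __u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&irq_start, &key, &ts, BPF_ANY);
    return 0;
}

SEC("tracepoint/irq/irq_handler_exit")
int on_irq_exit(void *ctx)
{
    __u32 key = 0;
    __u64 *ts = bpf_map_lookup_elem(&irq_start, &key);
    if (!ts || !*ts)
        return 0;
    __u64 duration = bpf_ktime_get_ns() - *ts;
    *ts = 0;                                /* ready for the next interrupt */
    bpf_printk("hardirq duration: %llu ns", duration);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";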
Memory
- Page Faults: Minor (pipeline freeze) and major (disk I/O) fault tracking
- TLB Shootdowns: Cross-core synchronization penalties
- Compaction: Stalls from transparent hugepage defragmentation
Architecture
- eBPF / CO-RE: Compile Once, Run Everywhere; safe kernel tracing
- Per-CPU Buffers: Lock-free ring buffers for nanosecond overhead
- Zero Copy: In-kernel aggregation avoids userspace context switches
Safety First Architecture
Control Plane (Python)
Manages the lifecycle, parses symbol tables, and renders the TUI. It loads the BPF programs but stays out of the hot path.
Data Plane (eBPF/C)
Runs safely inside the kernel VM. Events are filtered in-kernel (if pid != target return 0) and aggregated in per-CPU ring buffers.
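As a hypothetical illustration of the in-kernel aggregation idea: instead of emitting every sample, a probe can fold each latency into a log2 histogram held in a per-CPU array map, and the control plane reads the map once when the run ends. The helper below is the kind of routine the probes sketched above would call; it is illustrative, not LatencyScope's source.

// Hypothetical sketch: aggregate latencies in-kernel, read once from userspace.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_SLOTS 32

/* One log2 histogram per CPU; userspace sums the per-CPU copies at the end. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, MAX_SLOTS);
    __type(key, __u32);
    __type(value, __u64);
} lat_hist SEC(".maps");

/* Fold one latency sample (in ns) into its log2 bucket, entirely in-kernel. */
static __always_inline void record_latency(__u64 delta_ns)
{
    __u32 slot = 0;
    while (delta_ns >>= 1) {                /* bounded: at most 63 shifts */
        if (++slot >= MAX_SLOTS - 1)
            break;
    }
    __u64 *count = bpf_map_lookup_elem(&lat_hist, &slot);
    if (count)
        (*count)++;                         /* per-CPU slot, no lock needed */
}

char LICENSE[] SEC("license") = "GPL";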
CLI Reference
Basic Profiling
Profile all modules for 10 seconds
sudo latencyscope --duration 10
Target PID
Focus analysis on a specific process
sudo latencyscope --pid $(pgrep trading)
Isolation Check
Verify isolation on specific cores
sudo latencyscope --cpus 4,5,6,7
CI/CD Integration
Generate machine-readable output
sudo latencyscope --json > report.json
Ready to go deeper?
As we push toward the theoretical limits of silicon, our eyes must improve before our hands can.
pip install latencyscope