v0.1.1 — Now Available

LatencyScope

The Anatomy of a Nanosecond. Surgical latency analysis with eBPF.

$ pip install latencyscope

The Observer Effect

You cannot debug a latency spike if your debugger causes the spike. Traditional tools like strace stop the traced process on every syscall through ptrace, which adds tens of microseconds per call.

# Cost of "observing" a write() syscall:

Native        ~300 ns
With strace   ~50,000 ns (~167x slower)

LatencyScope uses eBPF to achieve <500 ns overhead per event, fully decoupled from your application's runtime.
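The arithmetic behind these figures can be checked in a few lines. This is a sketch that only uses the approximate numbers quoted above and assumes the eBPF overhead is added in full to every traced call; the function name `slowdown` is illustrative, not part of LatencyScope.

```python
def slowdown(native_ns: float, traced_ns: float) -> float:
    """Return how many times slower the traced call is than the native one."""
    if native_ns <= 0:
        raise ValueError("native_ns must be positive")
    return traced_ns / native_ns


NATIVE_NS = 300          # approximate native write() cost quoted above
STRACE_NS = 50_000       # approximate cost under strace quoted above
EBPF_OVERHEAD_NS = 500   # upper bound on per-event overhead quoted above

strace_factor = slowdown(NATIVE_NS, STRACE_NS)
# Worst case: the full 500 ns is paid on top of every native call.
ebpf_factor = slowdown(NATIVE_NS, NATIVE_NS + EBPF_OVERHEAD_NS)

print(f"strace: {strace_factor:.0f}x")           # strace: 167x
print(f"eBPF worst case: {ebpf_factor:.1f}x")    # eBPF worst case: 2.7x
```

Even in the worst case, the difference is roughly two orders of magnitude versus under one.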

Surgical Precision

$ sudo latencyscope
LatencyScope v0.1.1 - HFT Latency Profiler
Target: PID 12345 (trading_engine) | Duration: 10.0s | Cores: 4,5,6,7 (isolated)

╭──────────────────────────────────────────────────────────────────╮
│                       ISOLATION VERIFIER                         │
├──────────────────────────────────────────────────────────────────┤
  [FAIL] Context switches detected: 47 events
  Worst: 12,847 ns runqueue latency @ 14:32:17.847
  Cause: kworker/4:0 preempted trading_engine

  Runqueue Latency:
  P50: 124 ns    P99: 312 ns    P99.999: 12,847 ns
╰──────────────────────────────────────────────────────────────────╯

╭──────────────────────────────────────────────────────────────────╮
│                       IRQ STORM DETECTOR                         │
├──────────────────────────────────────────────────────────────────┤
  [WARN] IRQs on isolated cores: 12 events
  Device: nvme0q5 | Max duration: 2,347 ns
  
  Recommendation:
  echo 0f > /proc/irq/124/smp_affinity
╰──────────────────────────────────────────────────────────────────╯

╭──────────────────────────────────────────────────────────────────╮
│                      MEMORY STALL PROFILER                       │
├──────────────────────────────────────────────────────────────────┤
  [PASS] No major page faults
  [WARN] Minor faults: 23 | TLB shootdowns: 8
╰──────────────────────────────────────────────────────────────────╯

══════════════════════════════════════════════════════════════════
 SUMMARY: 1 violation, 2 warnings | Exit code: 2
══════════════════════════════════════════════════════════════════

Detailed Telemetry

What hides in the tail?

Isolation

  • Scheduler Switch

    Detects any switch away from pinned PIDs on isolated cores

  • Runqueue Latency

    Measures time spent 'Runnable' but waiting for CPU

  • Migration Cost

    Tracks expensive cross-core task migrations
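To see why the tail matters more than the median for runqueue latency, here is a minimal percentile sketch. It assumes the nearest-rank method; how LatencyScope itself computes percentiles is not stated here, and the sample values are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 99 quiet wakeups and a single preemption: the median hides it, the tail does not.
latencies_ns = [124] * 99 + [12_847]

print(percentile(latencies_ns, 50))      # 124
print(percentile(latencies_ns, 99))      # 124
print(percentile(latencies_ns, 99.999))  # 12847
```

A single preempted wakeup is invisible at P50 and P99 in this sample, which is why the isolation check looks at the extreme tail.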

Interrupts

  • HardIRQs

    Precise duration of each hardware interrupt handler

  • SoftIRQs

    Bottom-half processing latency and the CPU cycles it steals

  • IRQ Affinity

    Verifies interrupts abide by smp_affinity masks
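An smp_affinity mask is a hexadecimal bitmap in which bit N selects CPU N. The sketch below converts between CPU lists and masks and checks whether a mask overlaps a set of isolated cores. The helper names are illustrative and are not LatencyScope APIs.

```python
def cpus_to_mask(cpus):
    """Build the hex bitmap written to /proc/irq/<N>/smp_affinity (bit N = CPU N)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Decode a hex mask; commas separate 32-bit groups on large machines."""
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def irq_violates_isolation(mask_hex, isolated_cpus):
    """True if the mask allows delivery to any isolated core."""
    return bool(set(mask_to_cpus(mask_hex)) & set(isolated_cpus))


print(cpus_to_mask([4, 5, 6, 7]))                    # f0
print(cpus_to_mask([0, 1, 2, 3]))                    # f
print(irq_violates_isolation("f0", [4, 5, 6, 7]))    # True
print(irq_violates_isolation("0f", [4, 5, 6, 7]))    # False
```

With cores 4-7 isolated, a mask of f0 routes an interrupt onto exactly the cores you want quiet, while 0f keeps it on cores 0-3.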

Memory

  • Page Faults

    Tracks minor faults (resolved without disk I/O, but still a stall) and major faults (disk I/O)

  • TLB Shootdowns

    Cross-core synchronization penalties

  • Compaction

    Stalls from memory compaction triggered by transparent hugepage allocation
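Fault counts can be cross-checked from userspace without any tracing, because the kernel exposes cumulative counters in /proc/<pid>/stat (field 10 is minflt, field 12 is majflt). The sketch below is an independent sanity check, not how LatencyScope collects its data, and the function names are illustrative.

```python
def parse_fault_counts(stat_line):
    """Return (minor_faults, major_faults) from one /proc/<pid>/stat line."""
    # Field 2 (comm) may contain spaces or parentheses, so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 is rest[7] and field 12 is rest[9].
    return int(rest[7]), int(rest[9])


def read_fault_counts(pid):
    """Read the cumulative fault counters for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_fault_counts(stat_file.read())


# Illustrative line, truncated after the fields used here.
sample = "12345 (trading engine) S 1 12345 12345 0 -1 4194304 23 0 0 0 10 5"
print(parse_fault_counts(sample))  # (23, 0)
```

Sampling `read_fault_counts` before and after a run and subtracting gives the faults incurred during that window.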

Architecture

  • eBPF / CO-RE

    Compile Once, Run Everywhere: safe kernel tracing

  • Per-CPU Buffers

    Lock-free ring buffers keep per-event overhead in the nanosecond range

  • Zero Copy

    In-kernel aggregation avoids userspace context switches

Safety First Architecture

Control Plane (Python)

Manages the lifecycle, parses symbol tables, and renders the TUI. It loads the BPF programs but stays out of the hot path.

Data Plane (eBPF/C)

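Runs inside the kernel's eBPF VM after passing the verifier. Events are filtered in-kernel (if (pid != target) return 0;), so events from other processes are dropped immediately, and the rest are aggregated in per-CPU ring buffers.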
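The filter-then-aggregate pattern can be modeled in plain Python to show the control flow. This is a userspace model only: the real data plane is eBPF/C, a per-CPU log2 histogram stands in here for the per-CPU ring buffers, and every name below is hypothetical.

```python
from collections import defaultdict

TARGET_PID = 12345   # illustrative target, matching the sample output
NUM_SLOTS = 32       # log2 buckets: slot i counts latencies in [2**i, 2**(i+1)) ns


def log2_slot(latency_ns):
    """Map a latency to its floor(log2) bucket, clamped to the last slot."""
    return min(max(latency_ns, 1).bit_length() - 1, NUM_SLOTS - 1)


class PerCpuHistogram:
    """One histogram per CPU, so no CPU ever contends with another on update."""

    def __init__(self):
        self.slots = defaultdict(lambda: [0] * NUM_SLOTS)

    def on_event(self, cpu, pid, latency_ns):
        if pid != TARGET_PID:
            return 0  # early exit: other processes cost one comparison
        self.slots[cpu][log2_slot(latency_ns)] += 1
        return 0

    def merged(self):
        """Sum the per-CPU histograms; done once at read time, off the hot path."""
        total = [0] * NUM_SLOTS
        for hist in self.slots.values():
            for slot, count in enumerate(hist):
                total[slot] += count
        return total


hist = PerCpuHistogram()
hist.on_event(cpu=4, pid=12345, latency_ns=124)      # kept, slot 6
hist.on_event(cpu=5, pid=999, latency_ns=50)         # filtered out
hist.on_event(cpu=5, pid=12345, latency_ns=12_847)   # kept, slot 13
print(sum(hist.merged()))  # 2
```

Keeping state per CPU means the hot path needs no locks, and merging is deferred until the control plane reads the results.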

CLI Reference

Basic Profiling

Profile all modules for 10 seconds

sudo latencyscope --duration 10

Target PID

Focus analysis on a specific process

sudo latencyscope --pid "$(pgrep -o trading_engine)"

Isolation Check

Verify isolation on specific cores

sudo latencyscope --cpus 4,5,6,7

CI/CD Integration

Generate machine-readable output

sudo latencyscope --json > report.json
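A CI job can also gate on the process exit status rather than parsing the report. The sketch below assumes that exit status 0 means no violations and any other status means failure (the sample run above exits with 2); that mapping, the combination of flags, and the function names are assumptions, not documented behavior.

```python
import subprocess
import sys


def isolation_check_passed(cmd):
    """Run the given command and report whether it exited with status 0."""
    result = subprocess.run(cmd, check=False)
    return result.returncode == 0


def main():
    # Flags are taken from the CLI reference above; combining them is an assumption.
    cmd = ["sudo", "latencyscope", "--cpus", "4,5,6,7", "--duration", "10"]
    if not isolation_check_passed(cmd):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("latency isolation check failed")
```

Calling `main()` from the pipeline's entry point fails the build whenever the profiler reports a non-zero status.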

Ready to go deeper?

As we push toward the theoretical limits of silicon, our eyes must improve before our hands can.

$ pip install latencyscope