eBPF Profiling Without the Observer Effect
How to measure latency at nanosecond precision without slowing down your application. eBPF, bpftrace, and kernel tracing.
🎯 What You'll Learn
- Understand why traditional profilers add latency
- Use bpftrace for nanosecond-precision tracing
- Write custom eBPF programs for application tracing
- Avoid the observer effect in production profiling
📚 Prerequisites
Before this lesson, you should understand:
The Profiling Paradox
You want to measure your application’s latency. You attach a profiler. Suddenly, your application is 10-100x slower.
Without profiler: 10µs per operation
With strace: 500µs per operation (50x slower!)
With gdb: Don't even try in production
This is the observer effect: measuring the system changes the system.
What You’ll Learn
By the end of this lesson, you’ll understand:
- Why profilers slow things down - Ptrace, context switches, memory overhead
- How eBPF bypasses the problem - Kernel-space tracing with JIT compilation
- Using bpftrace - One-liners for latency profiling
- Writing eBPF programs - Custom tracing for your application
The Foundation: Why Traditional Profilers Hurt
Traditional profilers use ptrace() to intercept syscalls:
That’s 3 extra context switches per syscall. Each context switch costs 1-5µs.
The “Aha!” Moment
Here’s what makes eBPF different:
eBPF programs run inside the kernel, not in a separate process. There’s no context switch. The tracing code executes in nanoseconds, not microseconds-and it’s JIT-compiled to native machine code.
eBPF adds ~100ns overhead. Traditional profilers add ~100µs. That’s a 1000x difference.
Let’s See It In Action: Compare strace vs bpftrace
# The slow way: strace (don't use in production)
strace -tt -T ./your_app 2>&1 | head -20
# Note: Your app is now 50x slower
# The fast way: bpftrace (production-safe)
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_write /pid == 12345/ { @writes = count(); }'
# Your app runs at normal speed
Measuring syscall latency with bpftrace
# Histogram of write() syscall latency
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write /pid == 12345/ { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /pid == 12345/ {
@latency = hist(nsecs - @start[tid]);
delete(@start[tid]);
}'
Output:
@latency:
[1K, 2K) 5234 |@@@@@@@@@@@@@@@@@@@@ |
[2K, 4K) 8123 |@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K) 2341 |@@@@@@@ |
[8K, 16K) 423 |@ |
bpftrace One-Liners for Trading
1. Network latency per packet
# Time from packet arrival to application read
sudo bpftrace -e '
kprobe:__netif_receive_skb { @start[arg0] = nsecs; }
kprobe:tcp_recvmsg /@start[arg0]/ {
@latency = hist(nsecs - @start[arg0]);
delete(@start[arg0]);
}'
2. Scheduler latency (time waiting to run)
# How long threads wait for CPU
sudo bpftrace -e '
tracepoint:sched:sched_wakeup { @qtime[args->pid] = nsecs; }
tracepoint:sched:sched_switch /@qtime[args->next_pid]/ {
@latency = hist(nsecs - @qtime[args->next_pid]);
delete(@qtime[args->next_pid]);
}'
3. Lock contention
# Time spent waiting for mutex
sudo bpftrace -e '
uprobe:/path/to/app:pthread_mutex_lock { @start[tid] = nsecs; }
uretprobe:/path/to/app:pthread_mutex_lock /@start[tid]/ {
@lock_time = hist(nsecs - @start[tid]);
delete(@start[tid]);
}'
Writing Custom eBPF Programs
For complex tracing, write eBPF in C:
// latency_trace.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 10240);
__type(key, u32);
__type(value, u64);
} start_time SEC(".maps");
SEC("tracepoint/syscalls/sys_enter_write")
int trace_write_entry(struct trace_event_raw_sys_enter *ctx) {
u32 tid = bpf_get_current_pid_tgid();
u64 ts = bpf_ktime_get_ns();
bpf_map_update_elem(&start_time, &tid, &ts, BPF_ANY);
return 0;
}
SEC("tracepoint/syscalls/sys_exit_write")
int trace_write_exit(struct trace_event_raw_sys_exit *ctx) {
u32 tid = bpf_get_current_pid_tgid();
u64 *ts = bpf_map_lookup_elem(&start_time, &tid);
if (ts) {
u64 latency = bpf_ktime_get_ns() - *ts;
// Log latency to perf buffer or map
bpf_map_delete_elem(&start_time, &tid);
}
return 0;
}
char LICENSE[] SEC("license") = "GPL";
Compile and load:
clang -O2 -target bpf -c latency_trace.bpf.c -o latency_trace.o
bpftool prog load latency_trace.o /sys/fs/bpf/latency_trace
Common Misconceptions
Myth: “eBPF is only for network tracing.”
Reality: eBPF can trace any kernel or user-space function-syscalls, scheduler events, locks, memory allocation, file I/O, and more.
Myth: “eBPF is unsafe and can crash the kernel.”
Reality: The eBPF verifier checks every program before loading. It ensures programs terminate, don’t access invalid memory, and don’t cause kernel panics.
Myth: “I need to learn C to use eBPF.”
Reality: bpftrace provides a high-level scripting language. You can do most tracing without writing any C.
Overhead Comparison
| Tool | Overhead | Use Case |
|---|---|---|
| bpftrace | ~100ns/event | Production tracing |
| perf | ~500ns/event | Sampling profiler |
| strace | ~50-100µs/event | Development only |
| gdb | Stops process | Development only |
Practice Exercises
Exercise 1: Trace Your Shell
# Count syscalls by type
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_* /pid == '$BASHPID'/ { @[probe] = count(); }'
Exercise 2: Find Slow Writes
# Writes taking >1ms
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_write { @start[tid] = nsecs; }
tracepoint:syscalls:sys_exit_write /@start[tid]/ {
$lat = nsecs - @start[tid];
if ($lat > 1000000) { printf("pid %d: %d ns\n", pid, $lat); }
delete(@start[tid]);
}'
Exercise 3: Profile Network Path
# Time from NIC to application
sudo bpftrace -e '
kprobe:napi_gro_receive { @recv[arg1] = nsecs; }
kretprobe:tcp_recvmsg /@recv[arg0]/ {
printf("network->app: %d ns\n", nsecs - @recv[arg0]);
delete(@recv[arg0]);
}'
Key Takeaways
- Traditional profilers add 50-100µs overhead - Context switches kill latency
- eBPF runs in kernel space - ~100ns overhead, 1000x faster
- bpftrace for quick wins - One-liners for common traces
- Custom eBPF for production - Full control with C programs
What’s Next?
🎯 Continue learning: Monitoring Trading Systems
🔬 Expert version: eBPF Profiling: Nanoseconds Without Adding Any
Now you know how to measure without disturbing. 🔬
Questions about this lesson? Working on related infrastructure?
Let's discuss