GrigoryEvko/symbiotic

Symbiotic

Symbiotic is an embedded profiling runtime for Rust. It instruments a program from the inside — hardware counters, kernel tracepoints, memory layout, per-line sample attribution — producing the kind of data that previously required a combination of perf, bpftrace, valgrind, and manual /proc parsing. One symbiotic::init() call replaces all of them.

The library compiles on stable Rust. Subsystems that require kernel support (eBPF, PMU counters) degrade gracefully when unavailable. Everything is feature-gated; what you don't enable doesn't compile.

Why another profiler

Existing Rust profiling tools fall into two categories: external sampling profilers that attach to a running process (perf record, samply, flamegraph-rs), and lightweight timing wrappers that measure wall time. Neither gives you hardware counter attribution at the source-line level, kernel event correlation during a specific code region, or a unified view of what the CPU, memory subsystem, scheduler, and I/O stack were doing during your function call.

Symbiotic closes that gap. A region!("sort", { data.sort(); }) block captures cycles, instructions, IPC, branch misses, L1D/LLC cache misses, page faults, context switches, futex contention, and I/O bytes — all attributed to that specific block, all in a single run.

Capabilities

Per-region hardware counters. RAII guards read six PMU counters (cycles, instructions, branches, branch misses, L1D misses, LLC misses) via diff-on-drop. Nested regions compose correctly. Cost: ~20ns per enter/exit.
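The diff-on-drop pattern can be illustrated with a small, self-contained sketch. This is not Symbiotic's implementation: `FakeCounter` and `Guard` are hypothetical names, and a plain in-memory counter stands in for the PMU so the example runs anywhere.

```rust
use std::cell::Cell;

/// Hypothetical counter source. A real guard would read PMU counters instead.
pub struct FakeCounter(Cell<u64>);

impl FakeCounter {
    pub fn new() -> Self {
        FakeCounter(Cell::new(0))
    }
    /// Simulate work that advances the counter by `n` events.
    pub fn advance(&self, n: u64) {
        self.0.set(self.0.get() + n);
    }
    pub fn read(&self) -> u64 {
        self.0.get()
    }
}

/// Snapshots the counter on entry and stores the difference on drop.
pub struct Guard<'a> {
    counter: &'a FakeCounter,
    start: u64,
    out: &'a Cell<u64>,
}

impl<'a> Guard<'a> {
    pub fn enter(counter: &'a FakeCounter, out: &'a Cell<u64>) -> Self {
        Guard { counter, start: counter.read(), out }
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // wrapping_sub keeps the delta correct if the counter wraps around.
        self.out.set(self.counter.read().wrapping_sub(self.start));
    }
}

/// Nested regions: the outer delta includes the work done in the inner one.
pub fn nested_demo() -> (u64, u64) {
    let counter = FakeCounter::new();
    let (outer, inner) = (Cell::new(0), Cell::new(0));
    {
        let _outer = Guard::enter(&counter, &outer);
        counter.advance(10);
        {
            let _inner = Guard::enter(&counter, &inner);
            counter.advance(5);
        }
        counter.advance(1);
    }
    (outer.get(), inner.get())
}
```

Because each guard only stores its own start value, nesting needs no shared state: the inner guard reports 5 events and the outer guard reports all 16.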

96-counter kernel sensory system. A BPF program (sense_hub) attaches to 58 tracepoints and maintains 96 u64 counters in a BPF_F_MMAPABLE array. Userspace reads are volatile pointer loads — no syscalls, no ring buffer drain. Covers: network state (TCP/UDP/UNIX), file descriptors, I/O byte counts, page faults, RSS breakdown, scheduler stats, futex/lock contention, thread lifecycle, block I/O latency, page cache, writeback pressure, TCP health (retransmits, RTT, congestion), OOM, thermals, MCE.
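As a rough sketch of what a syscall-free read looks like, the snippet below performs bounds-checked volatile loads through a raw pointer. `CounterView` is a hypothetical name, and in the usage note a local array stands in for the mmap'd BPF array; how Symbiotic maps and indexes the real array is not shown here.

```rust
use std::ptr;

/// Hypothetical read-only view over an array of u64 counters.
/// In a real setup `base` would point into an mmap'd BPF array.
pub struct CounterView {
    base: *const u64,
    len: usize,
}

impl CounterView {
    /// # Safety
    /// `base` must point to `len` readable, aligned `u64` values that stay
    /// mapped for as long as the view is used.
    pub unsafe fn new(base: *const u64, len: usize) -> Self {
        CounterView { base, len }
    }

    /// Read one counter with a volatile load; `None` if `idx` is out of range.
    pub fn read(&self, idx: usize) -> Option<u64> {
        if idx >= self.len {
            return None;
        }
        // SAFETY: `idx` is bounds-checked above, and the constructor contract
        // guarantees the memory is valid and aligned.
        Some(unsafe { ptr::read_volatile(self.base.add(idx)) })
    }
}
```

A minimal usage, with a stack array in place of the mapped region:

```rust
let mut backing = [0u64; 96];
backing[3] = 42;
let view = unsafe { CounterView::new(backing.as_ptr(), backing.len()) };
assert_eq!(view.read(3), Some(42));
```

The volatile load stops the compiler from caching or eliding the read, which matters when another party (here, the kernel) updates the memory.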

Per-line IP sampling with DWARF resolution. Ten hardware and software events are captured at the instruction-pointer level. Two are read from the perf_event ring buffer: cycles and L1D misses. The other eight are collected by BPF perf_event programs writing to a zero-copy circular buffer: LLC misses, branch misses, DTLB misses, AMD IBS-Op, AMD IBS-Fetch, major page faults, CPU migrations, and alignment faults. After profiling, IPs are batch-resolved through DWARF via blazesym and aggregated per source line. Output is a .symbiot trace file (zstd-compressed JSON).
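The resolve-then-aggregate step might look roughly like the sketch below. It does not call blazesym: the `resolve` closure is a hypothetical stand-in for the DWARF lookup, and `SourceLine` and `aggregate_by_line` are illustrative names rather than Symbiotic types.

```rust
use std::collections::HashMap;

/// Hypothetical source location produced by symbol resolution.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SourceLine {
    pub file: String,
    pub line: u32,
}

/// Count samples per source line, hottest first.
/// `resolve` stands in for a DWARF lookup and returns `None` for unknown IPs.
pub fn aggregate_by_line<F>(ips: &[u64], mut resolve: F) -> Vec<(SourceLine, u64)>
where
    F: FnMut(u64) -> Option<SourceLine>,
{
    // Count duplicate IPs first so each distinct address is resolved once.
    let mut per_ip: HashMap<u64, u64> = HashMap::new();
    for &ip in ips {
        *per_ip.entry(ip).or_insert(0) += 1;
    }

    // Several IPs can map to the same line; merge their counts.
    let mut per_line: HashMap<SourceLine, u64> = HashMap::new();
    for (ip, count) in per_ip {
        if let Some(loc) = resolve(ip) {
            *per_line.entry(loc).or_insert(0) += count;
        }
    }

    let mut out: Vec<(SourceLine, u64)> = per_line.into_iter().collect();
    // Sort by descending count, then by file and line for a stable order.
    out.sort_by(|a, b| {
        b.1.cmp(&a.1)
            .then_with(|| (&a.0.file, a.0.line).cmp(&(&b.0.file, b.0.line)))
    });
    out
}
```

Resolving each distinct address once keeps the expensive lookup proportional to the number of unique IPs rather than the number of samples.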

Multi-level code view. For hot functions, the disassembler correlates four representations — Rust source, MIR, LLVM IR, x86 assembly — with per-instruction sample counts. DWARF .loc directives, MIR scope chains, and IR debug metadata provide the cross-level mapping.

eBPF off-CPU, syscall, and lock profiling. Separate BPF programs track sched_switch (off-CPU stacks with blazesym symbolization), per-syscall latency distributions, and futex contention (wait time + call stacks). BTF tracepoints are preferred where available, with automatic fallback to legacy tracepoints.

Process-wide PMU counters. inherit(true) counters span all threads — rayon workers, thread pools, everything. Provides accurate IPC and branch miss rate for the entire process.
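For reference, the two derived metrics are simple ratios of raw counter totals. The sketch below assumes a hypothetical `CounterTotals` struct; it is not a Symbiotic type.

```rust
/// Hypothetical process-wide totals read from PMU counters.
pub struct CounterTotals {
    pub cycles: u64,
    pub instructions: u64,
    pub branches: u64,
    pub branch_misses: u64,
}

impl CounterTotals {
    /// Instructions per cycle; `None` if no cycles were counted.
    pub fn ipc(&self) -> Option<f64> {
        if self.cycles == 0 {
            None
        } else {
            Some(self.instructions as f64 / self.cycles as f64)
        }
    }

    /// Fraction of branches that were mispredicted; `None` if no branches were counted.
    pub fn branch_miss_rate(&self) -> Option<f64> {
        if self.branches == 0 {
            None
        } else {
            Some(self.branch_misses as f64 / self.branches as f64)
        }
    }
}
```

Returning `Option` avoids reporting NaN or infinity when a counter was unavailable or never incremented.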

Interactive report viewer. A terminal UI (ratatui) with tabs for overview, CPU, cache hierarchy, per-line samples, region tree, sensory state, and analysis. An HTML dashboard (self-contained, offline) provides the same data in a VS Code-style Monokai Pro layout with a timeline swimlane.

Query server. HTTP/JSON, gRPC, and Unix socket endpoints expose live metrics while the profiled program runs. The dashboard auto-starts on port 9882.

Usage

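use symbiotic::{region, BenchProfiler};

// Placeholder workload so the example is self-contained.
fn expensive_computation(data: &[u64]) -> u64 {
    data.iter().map(|x| x.wrapping_mul(31)).fold(0, u64::wrapping_add)
}

fn main() {
    symbiotic::init(); // loads BPF, enables sensory capture

    let _profiler = BenchProfiler::new("my_workload");

    // Sample input: one million integers in descending order.
    let mut data: Vec<u64> = (0..1_000_000).rev().collect();

    region!("sort", {
        data.sort_unstable();
    });

    region!("process", {
        let _checksum = expensive_computation(&data);
    });

    // report generated on drop: TUI, HTML dashboard, .symbiot trace
}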

Annotate functions directly:

#[symbiotic::profile]
fn hot_function(data: &mut [f64]) -> f64 {
    // total_cmp defines a total order over f64, so NaN inputs cannot panic the sort.
    data.sort_unstable_by(f64::total_cmp);
    data.iter().sum()
}

Feature flags

Flag            What it enables
profiling       Per-region PMU counters (region!, RegionGuard)
ebpf            BPF sensory system + off-CPU/syscall/lock profiling
line-profiler   Per-line IP sampling + DWARF resolution + .symbiot export
disasm          Multi-level code view (source/MIR/IR/ASM)
tui             Interactive terminal report viewer
dashboard       VS Code-style HTML dashboard + live web UI
server          HTTP/JSON query server
server-grpc     gRPC transport
pmu             Raw PMU counter access
jemalloc        jemalloc allocator statistics
alloc-track     Per-region allocation counting
flamegraph      Flamegraph SVG export
full            Everything above
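Feature gating of this kind is usually done with `cfg` attributes, so that disabled code is never compiled. The following is a generic sketch of the mechanism, not Symbiotic's source: `counters_enabled` is a hypothetical function, and only the flag name `profiling` is taken from the table above.

```rust
/// Compiled only when the crate is built with the `profiling` feature.
#[cfg(feature = "profiling")]
pub fn counters_enabled() -> bool {
    true
}

/// Fallback compiled when `profiling` is off; the version above does not
/// exist in the binary at all in that case.
#[cfg(not(feature = "profiling"))]
pub fn counters_enabled() -> bool {
    false
}
```

Exactly one of the two definitions survives compilation, so callers can use `counters_enabled()` unconditionally while the disabled path adds no code to the binary.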

Requirements

  • Linux (kernel >= 5.8 for full eBPF support; PMU counters work on older kernels)
  • perf_event_paranoid <= 1 for hardware counter access, or CAP_PERFMON
  • CAP_BPF + CAP_PERFMON for eBPF programs (or root)
  • Stable Rust toolchain
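The perf_event_paranoid requirement can be checked before profiling starts. The sketch below reads the standard Linux sysctl file and applies the `<= 1` threshold stated above; the function names are hypothetical, and the check deliberately ignores capabilities such as CAP_PERFMON, which can grant access regardless of this setting.

```rust
use std::fs;

/// Parse the contents of the perf_event_paranoid sysctl (for example "2\n" or "-1\n").
pub fn parse_paranoid(raw: &str) -> Option<i32> {
    raw.trim().parse().ok()
}

/// Threshold from the requirements list: a level of 1 or lower permits
/// hardware counter access. Capabilities are not considered here.
pub fn hardware_counters_likely_allowed(level: i32) -> bool {
    level <= 1
}

/// Read the current level; `None` if the file is missing (non-Linux) or unparsable.
pub fn read_paranoid() -> Option<i32> {
    let raw = fs::read_to_string("/proc/sys/kernel/perf_event_paranoid").ok()?;
    parse_paranoid(&raw)
}
```

A program could call `read_paranoid()` at startup and print a hint when the level is too high, instead of silently running without hardware counters.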

Roadmap

The following areas are under active development. Items are listed by theme, not priority.

Ecosystem integration. Criterion adapter for automatic hardware-counter enrichment of benchmark results. cargo-symbiotic subcommand for one-shot profiling of any binary or test. Integration with tracing spans so existing instrumented code gets PMU attribution without source changes.

CI/CD and regression detection. Machine-readable profile output (JSON, protobuf) suitable for ingestion by CI systems. Diff mode: compare two .symbiot traces and surface regressions in IPC, cache miss rate, or branch misprediction at the source-line level. Threshold-based assertions (assert_ipc!(region, >= 2.0)) for performance-gated CI pipelines.

Profile-guided development. IDE integration beyond VS Code: per-line hardware counters displayed inline in the editor during development. Persistent profile database across runs for trend analysis. Correlation of profile data with git blame to attribute regressions to specific commits.

Platform and architecture. Full aarch64 PMU support (ARMv8.1+ SPE for precise memory sampling). macOS kperf backend for Apple Silicon. Cross-platform fallback mode (wall time + allocation tracking) for environments without PMU access.

Deeper kernel visibility. Per-syscall latency histograms with automatic slow-path detection. IRQ and softirq stolen-time attribution to specific code regions. NUMA topology-aware memory placement analysis.

Allocation profiling. Integration with jemalloc's prof facility for heap flamegraphs. Per-region allocation rate tracking with leak detection heuristics. Object lifetime analysis for identifying unnecessary clones.

License

Dual-licensed under MIT and Apache 2.0.
