Symbiotic is an embedded profiling runtime for Rust. It instruments a program from the inside — hardware counters, kernel tracepoints, memory layout, per-line sample attribution — producing the kind of data that previously required a combination of `perf`, `bpftrace`, `valgrind`, and manual `/proc` parsing. One `symbiotic::init()` call replaces all of them.
The library compiles on stable Rust. Subsystems that require kernel support (eBPF, PMU counters) degrade gracefully when unavailable. Everything is feature-gated; what you don't enable doesn't compile.
Existing Rust profiling tools fall into two categories: external sampling profilers that attach to a running process (`perf record`, `samply`, `flamegraph-rs`), and lightweight timing wrappers that measure wall time. Neither gives you hardware counter attribution at the source-line level, kernel event correlation during a specific code region, or a unified view of what the CPU, memory subsystem, scheduler, and I/O stack were doing during your function call.

Symbiotic closes that gap. A `region!("sort", { data.sort(); })` block captures cycles, instructions, IPC, branch misses, L1D/LLC cache misses, page faults, context switches, futex contention, and I/O bytes — all attributed to that specific block, all in a single run.
**Per-region hardware counters.** RAII guards read six PMU counters (cycles, instructions, branches, branch misses, L1D misses, LLC misses) via diff-on-drop. Nested regions compose correctly. Cost: ~20ns per enter/exit.
**96-counter kernel sensory system.** A BPF program (`sense_hub`) attaches to 58 tracepoints and maintains 96 `u64` counters in a `BPF_F_MMAPABLE` array. Userspace reads are volatile pointer loads — no syscalls, no ring buffer drain. Covers: network state (TCP/UDP/UNIX), file descriptors, I/O byte counts, page faults, RSS breakdown, scheduler stats, futex/lock contention, thread lifecycle, block I/O latency, page cache, writeback pressure, TCP health (retransmits, RTT, congestion), OOM, thermals, MCE.
**Per-line IP sampling with DWARF resolution.** Ten hardware and software events are captured at the instruction-pointer level: cycles, L1D misses (via the perf_event ring buffer), LLC misses, branch misses, DTLB misses, AMD IBS-Op, AMD IBS-Fetch, major page faults, CPU migrations, and alignment faults (via BPF perf_event programs writing to a zero-copy circular buffer). After profiling, IPs are batch-resolved through DWARF via blazesym and aggregated per source line. Output is a `.symbiot` trace file (zstd-compressed JSON).

**Multi-level code view.** For hot functions, the disassembler correlates four representations — Rust source, MIR, LLVM IR, x86 assembly — with per-instruction sample counts. DWARF `.loc` directives, MIR scope chains, and IR debug metadata provide the cross-level mapping.

**eBPF off-CPU, syscall, and lock profiling.** Separate BPF programs track `sched_switch` (off-CPU stacks with blazesym symbolization), per-syscall latency distributions, and futex contention (wait time + call stacks). BTF tracepoints are preferred where available, with automatic fallback to legacy tracepoints.

**Process-wide PMU counters.** `inherit(true)` counters span all threads — rayon workers, thread pools, everything. Provides accurate IPC and branch miss rate for the entire process.

**Interactive report viewer.** A terminal UI (ratatui) with tabs for overview, CPU, cache hierarchy, per-line samples, region tree, sensory state, and analysis. An HTML dashboard (self-contained, offline) provides the same data in a VS Code-style Monokai Pro layout with a timeline swimlane.

**Query server.** HTTP/JSON, gRPC, and Unix socket endpoints expose live metrics while the profiled program runs. The dashboard auto-starts on port 9882.
```rust
use symbiotic::{region, BenchProfiler};

fn main() {
    symbiotic::init(); // loads BPF, enables sensory capture
    let _profiler = BenchProfiler::new("my_workload");

    region!("sort", {
        data.sort_unstable();
    });
    region!("process", {
        expensive_computation(&data);
    });
    // report generated on drop: TUI, HTML dashboard, .symbiot trace
}
```

Annotate functions directly:
```rust
#[symbiotic::profile]
fn hot_function(data: &mut [f64]) -> f64 {
    data.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    data.iter().sum()
}
```

| Flag | What it enables |
|---|---|
| `profiling` | Per-region PMU counters (`region!`, `RegionGuard`) |
| `ebpf` | BPF sensory system + off-CPU/syscall/lock profiling |
| `line-profiler` | Per-line IP sampling + DWARF resolution + `.symbiot` export |
| `disasm` | Multi-level code view (source/MIR/IR/ASM) |
| `tui` | Interactive terminal report viewer |
| `dashboard` | VS Code-style HTML dashboard + live web UI |
| `server` | HTTP/JSON query server |
| `server-grpc` | gRPC transport |
| `pmu` | Raw PMU counter access |
| `jemalloc` | jemalloc allocator statistics |
| `alloc-track` | Per-region allocation counting |
| `flamegraph` | Flamegraph SVG export |
| `full` | Everything above |
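A typical opt-in set in `Cargo.toml` might look like this (the version number is a placeholder, not a real release):

```toml
[dependencies]
# Enable only what you need; unused subsystems are compiled out.
symbiotic = { version = "0.1", features = ["profiling", "ebpf", "tui"] }
```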
- Linux (kernel >= 5.8 for full eBPF support; PMU counters work on older kernels)
- `perf_event_paranoid` <= 1 for hardware counter access, or `CAP_PERFMON`
- `CAP_BPF` + `CAP_PERFMON` for eBPF programs (or root)
- Stable Rust toolchain
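To check whether unprivileged counter access is available on a given machine (the `sysctl` line is the standard kernel knob, shown as a setup hint rather than anything Symbiotic-specific):

```shell
# Values range from -1 to 4; anything <= 1 allows unprivileged
# hardware counter access.
cat /proc/sys/kernel/perf_event_paranoid

# To lower it until the next reboot (requires root):
#   sysctl -w kernel.perf_event_paranoid=1
```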
The following areas are under active development. Items are listed by theme, not priority.
**Ecosystem integration.** Criterion adapter for automatic hardware-counter enrichment of benchmark results. A `cargo-symbiotic` subcommand for one-shot profiling of any binary or test. Integration with `tracing` spans so existing instrumented code gets PMU attribution without source changes.

**CI/CD and regression detection.** Machine-readable profile output (JSON, protobuf) suitable for ingestion by CI systems. Diff mode: compare two `.symbiot` traces and surface regressions in IPC, cache miss rate, or branch misprediction at the source-line level. Threshold-based assertions (`assert_ipc!(region, >= 2.0)`) for performance-gated CI pipelines.

**Profile-guided development.** IDE integration beyond VS Code: per-line hardware counters displayed inline in the editor during development. A persistent profile database across runs for trend analysis. Correlation of profile data with `git blame` to attribute regressions to specific commits.

**Platform and architecture.** Full aarch64 PMU support (ARMv8.1+ SPE for precise memory sampling). A macOS kperf backend for Apple Silicon. A cross-platform fallback mode (wall time + allocation tracking) for environments without PMU access.

**Deeper kernel visibility.** Per-syscall latency histograms with automatic slow-path detection. IRQ and softirq stolen-time attribution to specific code regions. NUMA topology-aware memory placement analysis.

**Allocation profiling.** Integration with jemalloc's `prof` facility for heap flamegraphs. Per-region allocation rate tracking with leak-detection heuristics. Object lifetime analysis for identifying unnecessary clones.
Dual-licensed under MIT and Apache 2.0.