myelon

Ultra-low-latency and high-throughput multiprocess transport stack over SHM and mmap ring buffers on Linux and macOS.

Dr Bill Dally is working on shaving nanoseconds by shrinking the distance data has to travel (on-chip wires, off-chip PHYs, memory-to-compute) and stripping overhead out of the path. System software should be just as serious about stripping copies, wakeups, coordination, and communication distance out of its own path.

myelon borrows its name from the Greek root behind the spinal cord: the signal path between brain and body. Wrapped in myelin, that cord exists to move impulses fast and clean. This crate tries to do the same for processes: give independent OS processes a low-latency fabric over SHM and mmap, with as little copying, waiting, and coordination drag as possible.

myelon is the default crate in this repository. It builds on top of disruptor-mp, which extends LMAX Disruptor's HFT-grade design (lock-free atomics, no syscalls in the hot path, cache-aligned cursors, busyspin wait) to cross-process IPC. myelon keeps the raw ring reachable and adds framing, codecs, typed zero-copy, topology helpers, and layout helpers for low-latency pipelines.

This repository publishes two crates:

| Crate | Use it when... |
| --- | --- |
| `disruptor-mp` | You want the raw fixed-size event substrate and you own the wire format. |
| `myelon` | You want one dependency that keeps the raw ring reachable and also gives you framing, codecs, typed zero-copy, topology helpers, and layout helpers. |

disruptor-mp is the Layer 0 substrate. myelon is the broader public transport crate built on top of it.

Both crates do non-trivial unsafe work internally (raw ring slots, SHM mmap, atomic ordering, manual cache-line layout), but the public API is safe Rust with only a small set of documented escape hatches. See Safety & quality gates below.

Headline numbers

Throughput mode reports the peak attainable rate plus its latency distribution under self-pressure. CO mode reports coordinated-omission-corrected latency while holding a configured constant offered rate, so tail percentiles reflect real time-to-receipt, not just bench iteration time.
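
To make the difference between the two modes concrete, here is a minimal sketch of coordinated-omission correction. It is not the perf-bench implementation; `Sample`, `simulate`, `naive`, and `corrected` are hypothetical names, and the closed-loop simulation is an assumption chosen to keep the example self-contained. The idea is that, at a constant offered rate, latency is measured from the time an operation was scheduled to start, not from the time the stalled bench loop finally sent it.

```rust
use std::time::Duration;

/// Timestamps for one operation, as offsets from the start of the run.
pub struct Sample {
    /// When the constant-rate schedule wanted the operation to start.
    pub intended: Duration,
    /// When the bench loop actually issued it.
    pub actual_start: Duration,
    /// When the reply arrived.
    pub done: Duration,
}

/// Simulate a closed bench loop at a fixed offered rate.
/// Operation `i` is due at `i * interval`, but cannot start before the
/// previous operation has completed.
pub fn simulate(interval: Duration, service: &[Duration]) -> Vec<Sample> {
    let mut out = Vec::with_capacity(service.len());
    let mut free_at = Duration::ZERO; // earliest time the loop can send again
    for (i, &svc) in service.iter().enumerate() {
        let intended = interval * i as u32;
        let actual_start = intended.max(free_at);
        let done = actual_start + svc;
        free_at = done;
        out.push(Sample { intended, actual_start, done });
    }
    out
}

/// Naive latency: measured from the actual send, so queueing delay is hidden.
pub fn naive(s: &Sample) -> Duration {
    s.done - s.actual_start
}

/// CO-corrected latency: measured from the intended send time.
pub fn corrected(s: &Sample) -> Duration {
    s.done - s.intended
}
```

With a 1 µs interval and service times of 100 ns, 5 µs, 100 ns, and 100 ns, the third operation has a naive latency of 100 ns but a corrected latency of 4.1 µs: it was due at 2 µs and only completed at 6.1 µs because the slow second operation held up the loop.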

Signal ceiling, consumer scaling (no payload)

| Topology | shm (ops/s) | mmap (ops/s) |
| --- | --- | --- |
| 1p1c | 332 M | 239 M |
| 1p2c | 163 M | 188 M |
| 1p4c | 95 M | 97 M |
| 1p8c | 30 M | 44 M |

Pipelined fan-out, no ack. Per-consumer rate is within 0.5% of the producer rate. The signal rate falls as consumers are added because the publisher contends with each consumer's cursor.
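
One way this cost arises in a Disruptor-style ring is sketched below. This is an illustration under stated assumptions, not disruptor-mp's code: `Cursor`, `slowest`, and `can_publish` are hypothetical names, each cursor is assumed to hold the count of events that consumer has finished reading, and the 64-byte alignment is an assumed cache-line size. Before reusing a slot, the producer has to read every consumer's cursor, so the gating check grows with fan-out.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One cursor per cache line so that consumers do not false-share.
/// 64 bytes is an assumption about the target CPU.
#[repr(align(64))]
pub struct Cursor(pub AtomicU64);

/// Number of events consumed by the slowest consumer.
/// This is one acquire load per consumer, so its cost scales with fan-out.
pub fn slowest(consumers: &[Cursor]) -> u64 {
    consumers
        .iter()
        .map(|c| c.0.load(Ordering::Acquire))
        .min()
        .unwrap_or(u64::MAX) // no consumers: nothing gates the producer
}

/// True if the producer may publish event number `next` (0-based) without
/// overwriting a slot that some consumer has not read yet.
pub fn can_publish(next: u64, capacity: u64, consumers: &[Cursor]) -> bool {
    // Publishing `next` reuses the slot of event `next - capacity`,
    // which must already have been consumed by everyone.
    next < slowest(consumers).saturating_add(capacity)
}
```

With three consumers at 4, 2, and 7 and a capacity of 8, the producer can publish event 9 but not event 10 until the slowest consumer advances past 2.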

Raw ping-pong, 1p1c, 64B, shm, busyspin

| Mode | Achieved ops/s | P50 | P99 | P99.99 |
| --- | --- | --- | --- | --- |
| Max throughput | 5.58 M | 130 ns | 240 ns | 2.5 µs |
| CO constant rate @ 1.2 M ops/s | 1.20 M | 188 ns | 282 ns | 13.3 µs |

Producer rate equals consumer rate (single ring, round-trip). The CO row holds 1.2 M ops/s sustained with coordinated-omission-corrected percentiles. mmap variants reach the same throughput at the same P50, with the P99.99 tail 4-5x wider.

Framed ping-pong, payload scaling (1p1c, shm, throughput mode)

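| Payload | ops/s | GB/s | P50 | P99 | P99.99 |
| --- | --- | --- | --- | --- | --- |
| 64 B | 4.51 M | 0.29 | 180 ns | 300 ns | 3.02 µs |
| 1 KB | 2.45 M | 2.51 | 360 ns | 610 ns | 3.70 µs |
| 32 KB | 133.6 K | 4.38 | 7.31 µs | 11.41 µs | 24.89 µs |
| 128 KB (multi-frame) | 31.2 K | 4.09 | 31.68 µs | 38.08 µs | 50.24 µs |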

Producer rate equals consumer rate (single ring, round-trip). 128 KB messages are fragmented across multiple frames; the message rate drops, but bandwidth stays at ~4 GB/s.
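
Splitting a large message across frames can be pictured with the sketch below. It is not myelon's framing code: `fragment` is a hypothetical helper, and the frame capacity is a parameter because the source does not state the actual frame size.

```rust
use std::ops::Range;

/// Split a message of `len` bytes into consecutive byte ranges of at most
/// `max_frame` bytes each. `max_frame` is a hypothetical frame capacity.
pub fn fragment(len: usize, max_frame: usize) -> Vec<Range<usize>> {
    assert!(max_frame > 0, "frame capacity must be non-zero");
    (0..len)
        .step_by(max_frame)
        .map(|start| start..(start + max_frame).min(len))
        .collect()
}
```

For example, with an assumed 48 KiB frame capacity, a 128 KiB message yields three ranges, the last one covering the remaining 32 KiB. A message that fits in one frame yields a single range.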

Framed broadcast, consumer × payload scaling (mmap, throughput mode)

| Topology | Payload | Producer ops/s | Per-consumer ops/s | Producer GB/s |
| --- | --- | --- | --- | --- |
| 1p4c | 1 KB | 9.09 M | 9.07 M | 9.31 |
| 1p4c | 128 KB | 108.9 K | 109.0 K | 14.27 |
| 1p8c | 1 KB | 5.98 M | 5.97 M | 6.13 |
| 1p8c | 128 KB | 88.3 K | 85.5 K | 11.57 |

Each consumer receives every message; per-consumer rate within 0.5-3.2% of producer. Aggregate fan-out scales by N: 1p8c × 128 KB delivers ~92.6 GB/s aggregate across 8 consumers (8 × the 11.57 GB/s producer rate). Broadcast throughput mode doesn't measure per-message RTT because there is no ack; CO-mode latency under sustained load lives in the crates/perf-bench/ bench output.
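
The bandwidth figures can be reproduced from the ops/s columns. The sketch below assumes 1 KB = 1024 bytes and 1 GB = 10^9 bytes, which is inferred from the published numbers, not stated in the source; `gb_per_s` and `aggregate_gb_per_s` are hypothetical helper names.

```rust
/// Bandwidth in decimal GB/s for a message rate and a payload size in bytes.
pub fn gb_per_s(ops_per_s: f64, payload_bytes: f64) -> f64 {
    ops_per_s * payload_bytes / 1e9
}

/// Bandwidth summed over `consumers` receivers that each see `ops_per_s`.
pub fn aggregate_gb_per_s(ops_per_s: f64, payload_bytes: f64, consumers: u32) -> f64 {
    gb_per_s(ops_per_s, payload_bytes) * f64::from(consumers)
}
```

For the 1p8c × 128 KB row, `gb_per_s(88.3e3, 131072.0)` gives about 11.57 GB/s, and eight times that is about 92.6 GB/s. Using the measured per-consumer rate instead, `aggregate_gb_per_s(85.5e3, 131072.0, 8)` gives about 89.7 GB/s, so the ~92.6 GB/s figure is best read as eight times the producer bandwidth.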

Typed zero-copy ping-pong (shm, rkyv, throughput mode)

| Batch | Payload | ops/s | P50 | P99 | P99.99 |
| --- | --- | --- | --- | --- | --- |
| 1 | 592 B | 1.89 M | 490 ns | 660 ns | 4.05 µs |
| 64 | 37 KB | 94.2 K | 10.5 µs | 13.7 µs | 22.4 µs |
| 256 | 150 KB | 26.7 K | 37.2 µs | 44.4 µs | 55.0 µs |

ZeroCopyCodec::access reads Archived<T> fields in place. Speedup vs full owned decode: 3.0× at batch=1, 5.5× at batch=64, 5.7× at batch=256.
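
The gap between in-place access and a full owned decode can be illustrated without rkyv. The sketch below is not `ZeroCopyCodec` and does not use rkyv's `Archived<T>`; it assumes a hand-rolled 16-byte record layout (two little-endian `u64` fields), and `Record`, `RecordView`, `decode_owned`, and `views` are hypothetical names. The owned path copies every record into a `Vec`, while the view path borrows the buffer and decodes only the field that is asked for.

```rust
/// Assumed wire layout for this sketch: 8-byte little-endian `id`
/// followed by 8-byte little-endian `qty`.
pub const RECORD_LEN: usize = 16;

#[derive(Debug, PartialEq)]
pub struct Record {
    pub id: u64,
    pub qty: u64,
}

/// Decode one little-endian u64 from the first 8 bytes of `bytes`.
fn le_u64(bytes: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(raw)
}

/// Owned decode: every record is copied out of the buffer into a Vec.
pub fn decode_owned(buf: &[u8]) -> Vec<Record> {
    buf.chunks_exact(RECORD_LEN)
        .map(|r| Record { id: le_u64(&r[0..8]), qty: le_u64(&r[8..16]) })
        .collect()
}

/// In-place view: borrows the buffer and decodes a field only on request.
pub struct RecordView<'a>(&'a [u8]);

impl<'a> RecordView<'a> {
    pub fn qty(&self) -> u64 {
        le_u64(&self.0[8..16])
    }
}

/// Iterate over records in place, without allocating.
pub fn views<'a>(buf: &'a [u8]) -> impl Iterator<Item = RecordView<'a>> + 'a {
    buf.chunks_exact(RECORD_LEN).map(RecordView)
}
```

Summing `qty` through `views` touches only 8 bytes per record and allocates nothing, which is the kind of saving that grows with batch size.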

Measured on AMD Ryzen 7 5800X (8 cores / 16 threads, 4.85 GHz boost, 32 MiB L3), 64 GiB DDR4, Ubuntu 22.04, kernel 6.8 via crates/perf-bench/.

Charts

Pingpong throughput at 1 KB payload: myelon-raw vs 11 other in-machine IPC adapters.

Broadcast P99 latency at 1 KB · 4 consumers · 400 K msgs/s sustained (coordinated-omission-corrected).

Pingpong throughput heatmap across the full adapter × payload matrix.

More bench charts (per-layer heatmaps, payload-vs-latency curves, broadcast scaling) live in assets/ and the crates/perf-bench/ bench output.

Quick start

Start with the crate that matches your data model.

[dependencies]
myelon = "0.1.0-alpha.2"

Or, if you only want the raw ring:

[dependencies]
disruptor-mp = "0.1.0-alpha.2"

Runnable first-party examples live under examples/demos:

cargo run --release -p demos --example shm_disruptor
cargo run --release -p demos --example pingpong

Repository layout

crates/
├── disruptor-mp/        # Publishable raw multiprocess substrate.
├── myelon/              # Publishable layered transport crate.
├── myelon-dst/          # Internal deterministic-simulation runner. Inspired by FoundationDB, TigerBeetle, Turso & SlateDB.
├── perf-bench/          # Internal broad transport sweep harness.
└── competitive-bench/   # Internal external-comparison harness.

examples/
├── demos/               # Runnable first-party examples.
└── myelon-pulse-vanity/ # Brand vanity demo shown above.

book/                    # mdBook source. Maintained separately from this README.

Validation and benchmarks

Top-level workspace commands:

make help
make build
make test
make workspace-smoke
make orchestrate-rust
make smoke

Benchmark crate entry points:

make -C crates/perf-bench super-tiny
make -C crates/competitive-bench super-tiny

Bench binaries are built with --profile competitive (max-perf, panic=abort, stripped). These are bench-fairness defaults, not production settings. For production, use release or prod-max (release + fat LTO + line-table debug).

Features

`disruptor-mp`: raw multiprocess substrate

`myelon`: layered transport on top of `disruptor-mp`

`myelon-dst`: internal deterministic-simulation harness

Runner with fault injection and invariant oracle.
Verification, report emission, and DST-coverage sweep.

`perf-bench`: internal broad transport sweep harness

Pingpong, broadcast, signal, repeatability binaries.
Layer matrix: raw, framed, typed, codec, typed-zero-copy (all × shm / mmap).
Throughput and CO-aware fixed-rate measurement modes.
Tier ladder: super-tiny, simple-smoke, smoke, quick, extensive.

`competitive-bench`: internal external-comparison harness

Adapters: Crossbeam, Iceoryx2, Rusteron (Aeron), shmipc, ZeroMQ (IPC / IPC-abs / TCP), Boost.MQ, OpenMPI.
Tier ladder matching perf-bench.
Aggregate report and Pareto-frontier summary per run.

Bindings

Python.
C / C++.
Zig.

Safety & quality gates

Both crates do non-trivial unsafe work internally (raw ring slots, SHM mmap, atomic ordering, manual cache-line layout). The public API is safe Rust, with a small set of documented escape hatches.

Tier 1: public-surface contract (required for crates.io)

myelon public surface: zero pub unsafe fn.
disruptor-mp public surface: four documented pub unsafe fn escape hatches in observability (CountersFile::init, ::attach, ::from_ptr, AggregatorHandle::spawn); each carries a # Safety rustdoc section.
unsafe_op_in_unsafe_fn = warn workspace-wide.
clippy::missing_safety_doc = warn workspace-wide.
cargo clippy --workspace --all-targets -- -D warnings clean.
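
As an illustration of what these lints ask for, here is a small sketch of a documented escape hatch with a safe wrapper. The functions `read_counter` and `read_counter_ref` are hypothetical and are not part of either crate's API.

```rust
/// Reads the counter at `ptr`.
///
/// # Safety
///
/// `ptr` must be non-null, aligned for `u64`, and point to an initialized
/// `u64` that stays valid and is not written concurrently during the call.
#[warn(unsafe_op_in_unsafe_fn)]
pub unsafe fn read_counter(ptr: *const u64) -> u64 {
    // SAFETY: the caller upholds the contract in the `# Safety` section.
    // The explicit block is what `unsafe_op_in_unsafe_fn` requires.
    unsafe { ptr.read() }
}

/// Safe wrapper: a shared reference already guarantees a valid, aligned,
/// initialized value for the duration of the call.
pub fn read_counter_ref(counter: &u64) -> u64 {
    // SAFETY: `counter` is a live `&u64`, so the pointer derived from it
    // satisfies every requirement of `read_counter`.
    unsafe { read_counter(counter as *const u64) }
}
```

`clippy::missing_safety_doc` would flag `read_counter` if the `# Safety` section were removed; callers who do not need the raw pointer path can stay on `read_counter_ref`.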

Tier 2: internal correctness

DST harness: assert_always / assert_sometimes + BUGGIFY fault injection, FoundationDB + TigerBeetle-style. Run with RUSTFLAGS="--cfg dst".
Per-block // SAFETY: comment on every internal unsafe { ... } block. Currently 12 / 81 blocks ≈ 15% coverage; backfill scheduled for v0.1.0-alpha.2.
cargo miri test lane (pointer-math helpers; not the syscall paths).
AddressSanitizer lane.
ThreadSanitizer lane.
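
The DST style named in the first item can be sketched in a few lines. This is not the myelon-dst harness: `Rng`, `buggify`, `Sometimes`, and `run` are hypothetical names, and the single-threaded ring model is an assumption made to keep the example small. A seeded PRNG makes every run replayable, a BUGGIFY-style hook injects faults (here, a stalled consumer), an "always" assertion checks an invariant on every step, and a "sometimes" tracker records whether an interesting state was ever reached.

```rust
/// Tiny deterministic PRNG (xorshift64) so a run can be replayed from its seed.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng(seed.max(1)) // xorshift state must be non-zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// BUGGIFY-style hook: true roughly `percent` times out of 100.
    pub fn buggify(&mut self, percent: u64) -> bool {
        self.next_u64() % 100 < percent
    }
}

/// Tracks a "sometimes" property: it should hold at least once per run.
#[derive(Default)]
pub struct Sometimes {
    pub hits: u64,
    pub checks: u64,
}

impl Sometimes {
    pub fn observe(&mut self, cond: bool) {
        self.checks += 1;
        if cond {
            self.hits += 1;
        }
    }

    pub fn satisfied(&self) -> bool {
        self.hits > 0
    }
}

/// Simulate `steps` producer/consumer rounds on a ring with `capacity` slots.
/// Returns the number of events consumed and the "ring was full" tracker.
pub fn run(seed: u64, steps: u64, capacity: u64) -> (u64, Sometimes) {
    let mut rng = Rng::new(seed);
    let mut ring_full = Sometimes::default();
    let (mut published, mut consumed) = (0u64, 0u64);
    for _ in 0..steps {
        // Producer step: publish only when a slot is free.
        let full = published - consumed == capacity;
        ring_full.observe(full);
        if !full {
            published += 1;
        }
        // Consumer step: fault injection stalls the consumer about half the time.
        if !rng.buggify(50) && consumed < published {
            consumed += 1;
        }
        // "Always" invariant: never more unread events than slots.
        assert!(published - consumed <= capacity, "overwrote an unread slot");
    }
    (consumed, ring_full)
}
```

Two runs with the same seed produce identical results, which is what makes a failing seed reproducible. After a sweep, a harness would report any `Sometimes` tracker that was never satisfied as a coverage gap.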

Tier 3: enforcement and audit

clippy::undocumented_unsafe_blocks = warn enabled workspace-wide (after Tier 2 backfill).
Safe wrappers around the four pub unsafe fn (boxed(), with_shm_segment()); the four escape hatches stay as power-user surface.
cargo geiger audit reported.

Platform support

Linux: officially supported.
macOS: officially supported.
Windows: unsupported.

Acknowledgements

LMAX Disruptor by Martin Thompson & team for the original lock-free, single-process, multi-threaded ring-buffer design and the mechanical-sympathy mindset behind it.
disruptor-rs by Nicholas Schultz-Møller for the single-process, multi-threaded Rust port that disruptor-mp extends.
vLLM shm_broadcast.py by Kaichao You for the SOTA Python shared-memory broadcast fabric used between inference worker processes on a single node.
Jeff Dean and Dr Bill Dally, Advancing to AI's Next Frontier, NVIDIA GTC 2026 for stating the systems point clearly: at the ultra-low-latency edge of inference, the bulk of the delay is communication latency.

Resources

LMAX Disruptor & mechanical sympathy:

Built with

Agentic engineering, using:

Citation

If you use myelon or disruptor-mp in research or downstream work, cite this repository.

Repository: https://github.com/Venkat2811/myelon

Twitter/X: @venkat_systems

Formal citation metadata lives in CITATION.cff. BibTeX entries can live in CITATION.bib.

License

Licensed under either of:

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Issues, feedback, discussions, and PRs are welcome and appreciated!
