# nexus-queue

High-performance SPSC, MPSC, and SPMC ring buffers for Rust, optimized for ultra-low-latency messaging.
## Performance
Benchmarked on Intel Core Ultra 7 165U, 2.69 GHz, pinned to physical P-cores (0,2):
| Variant | p50 latency | p99 latency | Throughput |
|---|---|---|---|
| SPSC | 200 cycles (74 ns) | 210 cycles | 113 M msgs/sec |
| MPSC | 180 cycles (67 ns) | 304 cycles | — |
| SPMC | 169 cycles (63 ns) | 325 cycles | 47 M msgs/sec (1 consumer) |
| crossbeam (MPMC) | 520 cycles | 580 cycles | — |
All variants use a unified `ring_buffer()` constructor. See BENCHMARKS.md for detailed methodology and results.
## Usage

```rust
use nexus_queue::spsc;

let (mut tx, mut rx) = spsc::ring_buffer::<u64>(1024);

// Producer thread
tx.push(42).unwrap();

// Consumer thread
assert_eq!(rx.pop(), Some(42));
```
### Handling backpressure

```rust
use nexus_queue::Full;

// Spin until space is available
while tx.push(msg).is_err() {
    std::hint::spin_loop();
}

// Or handle the full case explicitly
match tx.push(msg) {
    Ok(()) => { /* sent */ }
    Err(Full(returned_msg)) => { /* queue full, msg returned */ }
}
```
### Disconnection detection

```rust
// Check if the other end has been dropped
if rx.is_disconnected() {
    // Producer was dropped; drain remaining messages
}

if tx.is_disconnected() {
    // Consumer was dropped; stop producing
}
```
## Design

```text
┌─────────────────────────────────────────────────────────────┐
│ Shared (Arc):                                               │
│   tail: CachePadded<AtomicUsize>   ← Producer writes        │
│   head: CachePadded<AtomicUsize>   ← Consumer writes        │
│   buffer: *mut T                                            │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────┐      ┌─────────────────────┐
│ Producer:           │      │ Consumer:           │
│   local_tail        │      │   local_head        │
│   cached_head       │      │   cached_tail       │
│   buffer (cached)   │      │   buffer (cached)   │
└─────────────────────┘      └─────────────────────┘
```
Producer and consumer write to separate cache lines (128-byte padding). Each endpoint caches the buffer pointer, mask, and the other's index locally, only refreshing from atomics when the cache indicates full/empty.
This design performs well on multi-socket NUMA systems where cache line ownership is important for latency.
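The cached-index scheme can be sketched with std atomics alone. Everything below is a simplified illustration, not nexus-queue's actual implementation: the names (`ring`, `Producer`, `Consumer`), the `Option`-slot storage, and the omission of cache-line padding are all choices made for brevity.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Shared state; real implementations pad head/tail to separate cache lines.
struct Shared<T> {
    head: AtomicUsize, // consumer-owned index
    tail: AtomicUsize, // producer-owned index
    buf: Box<[UnsafeCell<Option<T>>]>,
    mask: usize,
}

// Sound only under the strict one-producer/one-consumer protocol below.
unsafe impl<T: Send> Send for Shared<T> {}
unsafe impl<T: Send> Sync for Shared<T> {}

pub struct Producer<T> {
    shared: Arc<Shared<T>>,
    cached_head: usize, // last-seen consumer index; refreshed only when ring looks full
}

pub struct Consumer<T> {
    shared: Arc<Shared<T>>,
    cached_tail: usize, // last-seen producer index; refreshed only when ring looks empty
}

pub fn ring<T>(capacity: usize) -> (Producer<T>, Consumer<T>) {
    assert!(capacity.is_power_of_two());
    let buf: Vec<_> = (0..capacity).map(|_| UnsafeCell::new(None)).collect();
    let shared = Arc::new(Shared {
        head: AtomicUsize::new(0),
        tail: AtomicUsize::new(0),
        buf: buf.into_boxed_slice(),
        mask: capacity - 1,
    });
    (
        Producer { shared: shared.clone(), cached_head: 0 },
        Consumer { shared, cached_tail: 0 },
    )
}

impl<T> Producer<T> {
    pub fn push(&mut self, v: T) -> Result<(), T> {
        let tail = self.shared.tail.load(Ordering::Relaxed);
        if tail - self.cached_head == self.shared.buf.len() {
            // Looks full: refresh the cached consumer index from the shared atomic.
            self.cached_head = self.shared.head.load(Ordering::Acquire);
            if tail - self.cached_head == self.shared.buf.len() {
                return Err(v); // genuinely full
            }
        }
        unsafe { *self.shared.buf[tail & self.shared.mask].get() = Some(v) };
        self.shared.tail.store(tail + 1, Ordering::Release);
        Ok(())
    }
}

impl<T> Consumer<T> {
    pub fn pop(&mut self) -> Option<T> {
        let head = self.shared.head.load(Ordering::Relaxed);
        if head == self.cached_tail {
            // Looks empty: refresh the cached producer index.
            self.cached_tail = self.shared.tail.load(Ordering::Acquire);
            if head == self.cached_tail {
                return None; // genuinely empty
            }
        }
        let v = unsafe { (*self.shared.buf[head & self.shared.mask].get()).take() };
        self.shared.head.store(head + 1, Ordering::Release);
        v
    }
}
```

The payoff is in the slow-path-only atomic refresh: on the common path, each endpoint touches only its own index and its stale cached copy of the other's, so no cache line bounces between cores until the ring actually looks full or empty.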
## Benchmarking

For accurate results, disable turbo boost and pin to physical cores:

```sh
# Build
cargo build -p nexus-queue --examples --release

# Run pinned to two cores
taskset -c 0,1 ./target/release/examples/bench_spsc

# For more stable results, disable turbo boost:
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Re-enable after:
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
```
Verify your core topology with `lscpu -e`; pick cores with different CORE numbers to avoid hyperthreading siblings.
## Memory Ordering

Uses manual fencing for clarity and portability:

- Producer: `fence(Release)` before publishing `tail`
- Consumer: `fence(Acquire)` after reading `tail`, `fence(Release)` before advancing `head`
On x86 these compile to no instructions (strong memory model), but they're required for correctness on ARM and other weakly-ordered architectures.
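The producer-side pattern can be illustrated with a single published slot. This is a sketch of the fence discipline described above, not the crate's code; the `Slot` type and its methods are invented for the example.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// One producer writes the payload, then publishes it by bumping `tail`;
// one consumer observes `tail` before reading the payload.
struct Slot {
    data: UnsafeCell<u64>,
    tail: AtomicUsize,
}

// Sound only under the one-writer/one-reader protocol below.
unsafe impl Sync for Slot {}

impl Slot {
    fn new() -> Self {
        Slot { data: UnsafeCell::new(0), tail: AtomicUsize::new(0) }
    }

    fn publish(&self, v: u64) {
        unsafe { *self.data.get() = v }; // plain payload write
        fence(Ordering::Release);        // payload write ordered before the tail store
        self.tail.store(1, Ordering::Relaxed);
    }

    fn try_read(&self) -> Option<u64> {
        if self.tail.load(Ordering::Relaxed) == 0 {
            return None; // not published yet
        }
        fence(Ordering::Acquire); // tail load ordered before the payload read
        Some(unsafe { *self.data.get() })
    }
}
```

If a consumer sees `tail == 1`, the acquire fence guarantees it also sees the payload write that preceded the release fence in the producer.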
## When to Use This

Use nexus-queue when:
- You know your producer/consumer topology at compile time
- You need the lowest possible latency
- You're building trading systems, audio pipelines, or real-time applications
Consider alternatives when:

- You have multiple producers AND multiple consumers → use an MPMC queue (crossbeam)
- You need async/await → use `tokio::sync::mpsc`
## License
MIT OR Apache-2.0