This library is still pre-0.1.0. The API is therefore in heavy flux, and everything should be considered alpha!
A small library for conveniently working with immutable bytes from different sources, providing zero-copy slicing and cloning.
Access itself is extremely cheap via a no-op conversion to a &[u8].
The storage mechanism backing the bytes can be extended
and is already implemented for a variety of sources,
including the Bytes types of other byte-handling crates, mmap-ed files,
Strings, and Zerocopy types.
See INVENTORY.md for notes on possible cleanup and future functionality.
Bytes decouples data access from lifetime management through two traits:
ByteSource and ByteOwner. A ByteSource can yield a slice of its bytes and then
convert itself into a ByteOwner that keeps the underlying storage alive. This
separation lets callers obtain a borrow of the bytes, drop any locks or
external guards, and still retain the data by storing the owner behind an Arc.
No runtime indirection is required when constructing a Bytes, and custom
storage types integrate by implementing ByteSource.
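The split between the two traits can be illustrated with a small model built on the standard library alone. This is only a sketch: `Source`, `Owner`, and `SketchBytes` are hypothetical stand-ins, and the real trait definitions in the crate may look different.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `ByteOwner`: anything that can be kept alive
/// behind a type-erased `Arc`.
trait Owner: Send + Sync + 'static {}
impl<T: Send + Sync + 'static> Owner for T {}

/// Hypothetical stand-in for `ByteSource`.
///
/// Safety contract (an assumption of this sketch): the bytes returned by
/// `as_bytes` must stay valid and at the same address for as long as the
/// value returned by `into_owner` is alive.
unsafe trait Source {
    type Owner: Owner;
    fn as_bytes(&self) -> &[u8];
    fn into_owner(self) -> Self::Owner;
}

// A Vec keeps its bytes on the heap, so moving the Vec does not move them.
unsafe impl Source for Vec<u8> {
    type Owner = Vec<u8>;
    fn as_bytes(&self) -> &[u8] {
        self
    }
    fn into_owner(self) -> Vec<u8> {
        self
    }
}

/// A pointer/length pair for access plus an `Arc` for lifetime management.
struct SketchBytes {
    ptr: *const u8,
    len: usize,
    _owner: Arc<dyn Owner>,
}

impl SketchBytes {
    fn from_source<S: Source>(source: S) -> Self {
        // Step 1: borrow the bytes and remember where they live.
        let (ptr, len) = {
            let bytes = source.as_bytes();
            (bytes.as_ptr(), bytes.len())
        };
        // Step 2: turn the source into an owner and park it behind an Arc.
        let owner: Arc<dyn Owner> = Arc::new(source.into_owner());
        SketchBytes { ptr, len, _owner: owner }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `_owner` keeps the storage alive, and the `Source`
        // contract guarantees the address has not changed.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

Calling `SketchBytes::from_source(vec![1u8, 2, 3])` and then `as_slice()` gives back the three bytes. Reading them never goes through the type-erased owner, which only exists to keep the storage alive.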
use anybytes::Bytes;
fn main() {
    // create `Bytes` from a vector
    let bytes = Bytes::from(vec![1u8, 2, 3, 4]);
    // take a zero-copy slice
    let slice = bytes.slice(1..3);
    // convert it to a typed View
    let view = slice.view::<[u8]>().unwrap();
    assert_eq!(&*view, &[2, 3]);
}
The full example is available in examples/quick_start.rs.
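To see why slicing and cloning can avoid copying, it may help to model the idea with a shared buffer plus a range. The type below, `SharedSlice`, is a hypothetical illustration written against the standard library only; it is not how the crate itself is implemented.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical model of a zero-copy slice: a shared buffer and a window.
#[derive(Clone)]
struct SharedSlice {
    data: Arc<[u8]>,
    range: Range<usize>,
}

impl SharedSlice {
    fn new(data: Vec<u8>) -> Self {
        // Converting the Vec into an Arc<[u8]> copies once, up front.
        let data: Arc<[u8]> = data.into();
        let range = 0..data.len();
        SharedSlice { data, range }
    }

    /// Returns a sub-window relative to the current one. Only the reference
    /// count and the range change; the bytes themselves are not copied.
    fn slice(&self, sub: Range<usize>) -> Self {
        let start = self.range.start + sub.start;
        let end = self.range.start + sub.end;
        assert!(start <= end && end <= self.range.end, "range out of bounds");
        SharedSlice { data: Arc::clone(&self.data), range: start..end }
    }

    fn as_slice(&self) -> &[u8] {
        &self.data[self.range.clone()]
    }
}
```

With `let all = SharedSlice::new(vec![1, 2, 3, 4])`, the call `all.slice(1..3)` yields a view of `[2, 3]` whose pointer lies inside the buffer owned by `all`.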
Bytes can also be created directly from an Arc holding a byte container.
This avoids allocating another Arc wrapper:
use anybytes::Bytes;
use std::sync::Arc;
let data = Arc::new(vec![1u8, 2, 3, 4]);
let bytes = Bytes::from(data.clone());
assert_eq!(bytes.as_ref(), data.as_slice());
Implementing ByteSource for Arc<[u8]> or Arc<Vec<u8>> is therefore
unnecessary, since Bytes::from already reuses the provided Arc.
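The difference between reusing an Arc and wrapping it again can be shown with the standard library alone. The sketch below uses `Arc<dyn Any + Send + Sync>` as a stand-in for a type-erased owner; whether the crate erases owners in exactly this way is an assumption, and `reuse` and `rewrap` are hypothetical helper names.

```rust
use std::any::Any;
use std::sync::Arc;

/// Reuse: coerce the existing Arc into a type-erased handle.
/// No new allocation; only the static type changes.
fn reuse(data: Arc<Vec<u8>>) -> Arc<dyn Any + Send + Sync> {
    data
}

/// Rewrap: allocate a second Arc whose payload is the first Arc.
/// Every access now has to go through two reference-counted boxes.
fn rewrap(data: Arc<Vec<u8>>) -> Arc<dyn Any + Send + Sync> {
    Arc::new(data)
}
```

After `reuse`, the erased handle still holds a `Vec<u8>` directly, so `downcast_ref::<Vec<u8>>()` succeeds. After `rewrap`, the same downcast fails because the payload is an `Arc<Vec<u8>>`.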
Bytes::try_unwrap_owner allows recovering the original owner when no other
references exist.
use anybytes::Bytes;
let bytes = Bytes::from(vec![1u8, 2, 3]);
let vec = bytes.try_unwrap_owner::<Vec<u8>>().expect("unique owner");
assert_eq!(vec, vec![1, 2, 3]);
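The "no other references" rule behaves like `Arc::try_unwrap` from the standard library, which the following sketch uses to show both outcomes. `reclaim` is a hypothetical helper, and the sketch makes no claim about the exact return type of `try_unwrap_owner`.

```rust
use std::sync::Arc;

/// Hands back the inner Vec only when `owner` is the last handle;
/// otherwise the Arc is returned unchanged so nothing is lost.
fn reclaim(owner: Arc<Vec<u8>>) -> Result<Vec<u8>, Arc<Vec<u8>>> {
    Arc::try_unwrap(owner)
}
```

A uniquely held `Arc` yields `Ok(vec)`, while an `Arc` that still has a live clone yields `Err` containing the original handle.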
Bytes can directly wrap memory-mapped files or other large buffers. Combined
with the view module this enables simple parsing of structured
data without copying:
use anybytes::Bytes;
use zerocopy::{FromBytes, Immutable, KnownLayout};
#[derive(FromBytes, Immutable, KnownLayout)]
#[repr(C)]
struct Header { magic: u32, count: u32 }
// `file` can be any type that implements `memmap2::MmapAsRawDesc` such as
// `&std::fs::File` or `&tempfile::NamedTempFile`.
fn read_header(file: &std::fs::File) -> std::io::Result<anybytes::view::View<Header>> {
    let bytes = unsafe { Bytes::map_file(file)? };
    Ok(bytes.view().unwrap())
}
To map only a portion of a file use the unsafe helper
Bytes::map_file_region(file, offset, len).
Use ByteArea to incrementally build immutable bytes on disk; each section can
yield a handle that reconstructs its range after the area is frozen:
use anybytes::area::ByteArea;
let mut area = ByteArea::new().unwrap();
let mut sections = area.sections();
let mut section = sections.reserve::<u8>(4).unwrap();
section.copy_from_slice(b"test");
let handle = section.handle();
let bytes = section.freeze().unwrap();
drop(sections);
let all = area.freeze().unwrap();
assert_eq!(handle.bytes(&all).as_ref(), bytes.as_ref());
assert_eq!(handle.view(&all).unwrap().as_ref(), b"test".as_ref());
Call area.persist(path) to keep the temporary file instead of mapping it.
The area only aligns allocations to the element type and may share pages between adjacent sections to minimize wasted space. Multiple sections may be active simultaneously; their byte ranges do not overlap.
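Aligning only to the element type, rather than to page boundaries, can be pictured with a small offset planner. This is a sketch under assumptions: `align_up` and `reserve` are hypothetical helpers that compute byte ranges, not the crate's actual allocation code.

```rust
use std::mem::{align_of, size_of};
use std::ops::Range;

/// Rounds `offset` up to the next multiple of `align` (a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Plans the byte range for `count` elements of `T`, starting at the first
/// suitably aligned offset at or after `cursor`.
fn reserve<T>(cursor: usize, count: usize) -> Range<usize> {
    let start = align_up(cursor, align_of::<T>());
    start..start + count * size_of::<T>()
}
```

Reserving three `u8` values at offset 0 covers bytes `0..3`; a following reservation of two `u32` values skips a single padding byte and covers `4..12`. Both ranges sit in the same page and do not overlap.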
See examples/byte_area.rs for a complete example
that reserves different typed sections, mutates them simultaneously, and then
either freezes the area into Bytes or persists it to disk.
By default the crate enables the mmap and zerocopy features.
Other optional features provide additional integrations:
- bytes – support for the bytes crate so bytes::Bytes can act as a ByteSource.
- ownedbytes – adds compatibility with ownedbytes and implements its StableDeref trait.
- mmap – enables memory-mapped file handling via the memmap2 crate.
- zerocopy – exposes the view module for typed zero-copy access and allows using zerocopy types as sources.
- pyo3 – builds the pyanybytes module to provide Python bindings for Bytes.
- winnow – implements the Stream traits for Bytes and offers parsers (view, view_elems(count)) that return typed Views.
Enabling the pyo3 feature requires the Python development headers and libraries
(for example libpython3.x). Running cargo test --all-features therefore
needs these libraries installed; otherwise disable the feature during testing.
- examples/quick_start.rs – the quick start shown above
- examples/try_unwrap_owner.rs – reclaim the owner when uniquely referenced
- examples/pyanybytes.rs – demonstrates the pyo3 feature using PyAnyBytes
- examples/from_python.rs – wrap a Python bytes object into Bytes
- examples/python_winnow.rs – parse Python bytes with winnow
- examples/python_winnow_view.rs – parse structured data from Python bytes using winnow's view
- examples/byte_area.rs – reserve and mutate multiple typed sections, then either freeze the area into Bytes or persist it to disk
Crate | Active | Extensible | mmap support | Zerocopy Integration | Pyo3 Integration | kani verified |
---|---|---|---|---|---|---|
anybytes | ✅ | ✅ | ✅ | ✅ | ✅ | 🚧 |
bytes | ✅ | ✅ | ✅1 | ❌ | ❌ | ❌ |
ownedbytes | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
minibytes | ✅2 | ✅ | ✅ | ❌ | ❌ | ❌ |
Run ./scripts/preflight.sh from the repository root before committing. The
script formats the code and executes all tests using Python 3.12 for the pyo3
feature.
Kani proofs are executed separately with ./scripts/verify.sh, which should be
run on a dedicated system. The script will install the Kani verifier
automatically. Verification can take a long time and isn't needed for quick
development iterations.
- Bytes – primary container type.
- ByteSource – trait for objects that can provide bytes.
- ByteOwner – keeps backing storage alive.
- view module – typed zero-copy access to bytes.
- pyanybytes module – Python bindings.
This library started as a fork of the minibytes library in Facebook's Sapling SCM.
Thanks to @kylebarron for his feedback and ideas on Pyo3 integration.