GitHub - murrdb/murr: Sub-millisecond cache for ML/AI workloads. Parquets in, Arrow-Flight out.

🐱 What is Murr? · 🚀 Why Murr? · 🚫 Why NOT Murr? · ⚡ Quickstart · 📊 Benchmarks · 🗺 Roadmap

Murrdb: A RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

This README.md is 99%¹ human written.

What is Murr?

Murr is a caching layer for ML/AI data serving that sits between your batch data pipelines and inference apps:

Tiered storage: hot data lives in memory, cold data stays on disk with S3-based replication. It's 2026, RAM is expensive - keep only the hot stuff there.
Batch-in, batch-out: native batch reads and writes over columnar storage, with no per-row overhead. Dumping 1GB Parquet/Arrow files into the ingestion API is a perfectly valid use case.

# yes this works for batch writes
curl --data-binary @0000.parquet -H "Content-Type: application/vnd.apache.parquet" \
  -XPUT http://localhost:8080/api/v1/table/yolo/write

Zero-copy wire protocol: no conversion needed when building np.ndarray, pd.DataFrame or torch.Tensor from API responses. Sure, Redis is fast, but parsing its replies is not (especially in Python!).

result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())  # look mom, zero copy!

Stateless: Murr is not a database - all state is persisted on S3. When a Redis node gets evicted, you're cooked. Murr just self-bootstraps from S3.
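
To make the zero-copy point concrete, here is a stdlib-only sketch. It is not Murr's wire protocol or client code; the payload layout (four native-endian float32 values) is an assumption chosen for illustration. It contrasts reinterpreting a binary buffer in place with parsing a text reply value by value.

```python
import struct

# Hypothetical binary response body: one float32 column, native byte order.
payload = bytearray(struct.pack("4f", 0.95, 0.87, 0.72, 0.91))

# Zero-copy path: view the same bytes as float32 without allocating a new buffer.
column = memoryview(payload).cast("f")

# Text path: every value is parsed into a fresh Python object.
text_reply = b"0.95 0.87 0.72 0.91"
parsed = [float(token) for token in text_reply.split()]

# Prove the view shares memory: overwrite the first value in the payload
# and observe the change through the view. The parsed list is unaffected.
struct.pack_into("f", payload, 0, 0.5)
print(column[0], parsed[0])        # 0.5 0.95
print(column.nbytes, len(column))  # 16 4
```

Arrow-based clients apply the same idea to whole columns, which is why handing a response to NumPy or pandas can avoid a per-value parse.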

Murr shines when:

your data is heavy and tabular: that giant Parquet dump on S3 your AI inference or ML prep job produces? Perfect fit.
reads are batched: pulling 100 columns across 1000 documents your agent wants to analyze? Great!
you care about costs: sure, Redis with 1TB of RAM will work fine, but disk/S3 offloading is operationally simpler and way cheaper.

Short quickstart (see full example):

uv pip install murrdb

and then

from murr.sync import Murr

db = Murr.start_local(cache_dir="/tmp/murr")  # embedded local instance

# fetch columns for a batch of document keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops

Why Murr?

TLDR: latency, simplicity, cost -- pick two. Murrdb tries to nail all three: fastest, cheapest, and easiest to operate. A bold claim, I know.

For the typical use case of read N datapoints across M documents (an agent reading document attributes, an ML ranker fetching feature values), on top of being the fastest, Murrdb:

vs Redis: is persistent (S3 is the new filesystem) and can offload cold data to local NVMe.
vs embedded RocksDB: no need to build data sync between producer jobs and inference nodes yourself. Murrdb was designed to be distributed from the start.
vs DynamoDB: roughly 10x cheaper, since you only pay for CPU/RAM, not per query.

Not being a general-purpose database, it tries to be friendly to the everyday pain points of ML/AI engineers:

First-class Python support: pip install murrdb, then map to/from NumPy/pandas/Polars/PyTorch arrays with zero copy.
Sparse columns: when a column has no data, it takes up zero bytes. Unlike the packed feature blob approach, where null columns aren't actually null.
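
The difference from a packed blob can be sketched with the standard library. This is an illustration only, not Murr's storage format: the feature names, the NaN sentinel, and the per-value encoding are assumptions, and key and metadata overhead are ignored.

```python
import math
import struct

# Hypothetical schema of ten float32 features; this row only has one of them.
columns = [f"feature_{i}" for i in range(10)]
row = {"feature_0": 0.95}

# Packed blob: fixed layout, so missing features still occupy 4 bytes each
# and need a sentinel (NaN here) that readers must know how to interpret.
blob = struct.pack("<10f", *(row.get(name, math.nan) for name in columns))

# Sparse layout: only values that exist are stored.
sparse = {name: struct.pack("<f", value) for name, value in row.items()}
sparse_bytes = sum(len(encoded) for encoded in sparse.values())

print(len(blob), sparse_bytes)  # 40 4
```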

Why NOT Murr?

Murr is not a general-purpose database:

OLTP workloads: if you have relations, transactions, and per-row reads/writes, go with Postgres.
Analytics: aggregating over entire tables to produce reports? Pick ClickHouse, BigQuery, or Snowflake.
General-purpose caching: need to cache user session data for a web app? Use Redis.
Feature store: yes, it kinda looks like one — but Murrdb doesn't govern how you compute and store your data. Murr is an online serving layer, and can be a part of both internal feature stores and open-source ones like Feast, Hopsworks, and Databricks Feature Store.

Warning

Murr is still in its early days and may not be stable enough for your use case yet. But it's improving quickly.

Quickstart

import pandas as pd
import pyarrow as pa
from murr import TableSchema, ColumnSchema, DType
from murr.sync import Murr

db = Murr.start_local(cache_dir="/tmp/murr")

# define table schema
schema = TableSchema(
    key="doc_id",  # column whose values are used as lookup keys
    columns={
        "doc_id": ColumnSchema(dtype=DType.UTF8, nullable=False),
        "score": ColumnSchema(dtype=DType.FLOAT32),
        "category": ColumnSchema(dtype=DType.UTF8),
    },
)
db.create_table("docs", schema)

# write a batch of documents
df = pd.DataFrame.from_dict({
    "doc_id":   ["doc_1", "doc_2", "doc_3", "doc_4", "doc_5"],
    "score":    [0.95, 0.87, 0.72, 0.91, 0.68],
    "category": ["ml", "search", "infra", "ml", "ops"],
})
db.write("docs", pa.Table.from_pandas(df))

# fetch specific columns for a few keys
result = db.read("docs", keys=["doc_1", "doc_3", "doc_5"], columns=["score", "category"])
print(result.to_pandas())

# Output:
#    score category
# 0   0.95       ml
# 1   0.72    infra
# 2   0.68      ops
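
Besides db.write, the HTTP ingestion endpoint from the curl example earlier can be called from Python. The sketch below only builds the request with the standard library; the helper name is hypothetical, the URL path and content type are copied from that curl command, and the placeholder bytes stand in for a real Parquet file. Nothing is sent until urlopen is called against a running server.

```python
import urllib.request


def build_parquet_write_request(base_url, table, parquet_bytes):
    """Build (but do not send) a PUT request carrying a Parquet payload."""
    url = f"{base_url}/api/v1/table/{table}/write"
    return urllib.request.Request(
        url,
        data=parquet_bytes,
        method="PUT",
        headers={"Content-Type": "application/vnd.apache.parquet"},
    )


# Placeholder payload; in practice read the bytes of a real file,
# e.g. open("0000.parquet", "rb").read().
req = build_parquet_write_request("http://localhost:8080", "yolo", b"placeholder")
print(req.get_method(), req.full_url)

# To actually send it (requires a running Murr server):
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```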

Benchmarks

Full benchmark suite with reproduction steps: murrdb/murr-benchmark.

We benchmark a typical ML Ranking use case: 100M rows, 10 float32 columns, 1000 random key lookups per iteration. The suite includes two complementary harnesses:

Rust (Criterion) — measures raw service throughput as time-to-last-byte. Reads select_rows random keys per iteration and consumes raw response bytes without decoding. This isolates the storage/network layer and shows the theoretical ceiling of each backend.
Python (pyperf) — measures end-to-end latency as experienced by a Python ML client. Performs the same random-key reads but includes full protocol decoding and conversion into a pd.DataFrame. This captures the real cost a user pays: protocol parsing, byte deserialization, and DataFrame construction.
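
As a rough picture of what such a harness measures, here is a stdlib-only sketch of a p50 measurement over random key batches. It is not the code from murrdb/murr-benchmark (which uses Criterion and pyperf); the function name, the in-memory dict backend, and the iteration counts are assumptions for illustration.

```python
import random
import statistics
import time


def p50_latency_us(read_fn, keys, batch_size=1000, iterations=50, seed=0):
    """Return the median wall-clock time of read_fn over random key batches, in µs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(iterations):
        batch = rng.sample(keys, batch_size)  # random keys, drawn outside the timed region
        start = time.perf_counter_ns()
        read_fn(batch)
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)


# Stand-in backend: a plain dict with 10 float values per key.
store = {f"doc_{i}": (float(i),) * 10 for i in range(10_000)}
keys = list(store)

p50 = p50_latency_us(lambda batch: [store[k] for k in batch], keys)
print(f"p50: {p50:.1f} µs")
```

Swapping the lambda for a call into a real client gives a comparable end-to-end number, as long as decoding into the final data structure happens inside read_fn.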

Backends and data layouts tested:

murr (native, Arrow IPC) — row-wise storage on top of RocksDB SSTables, with zero-copy reads and projection pushdown. Two modes: mmap (PlainTable, in-memory) and block (BlockTable, NVMe-backed).
Redis / Valkey / Dragonfly, blob — all features packed into a single MGET blob. Compact and cache-friendly, but always reads all columns.
Redis / Valkey / Dragonfly, HSET — Feast-style hash-per-row: each feature is a separate HSET field. Flexible, but per-field overhead adds up.
PostgreSQL blob — BYTEA column with packed features.
PostgreSQL col-per-feature — explicit typed columns, one per feature.

Rust time-to-last-byte

All backends run on the same machine; container-backed ones use Docker via testcontainers. Memory is the container TOTAL (RSS+SHR) delta around the load phase. Net TX is server-to-client bytes per read. The disk variants are cgroup-capped at 2 GiB RAM to force reads from disk.
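
A quick sanity check for the Net TX column in the tables that follow: the raw payload for one read can be estimated from the workload description. This assumes every read returns all 10 float32 columns for all 1000 keys and counts no framing, keys, or headers, so it is a lower bound rather than a measured value.

```python
KEYS_PER_READ = 1000
FLOAT32_COLUMNS = 10
BYTES_PER_FLOAT32 = 4

raw_bytes = KEYS_PER_READ * FLOAT32_COLUMNS * BYTES_PER_FLOAT32
raw_kib = raw_bytes / 1024
print(raw_bytes, raw_kib)  # 40000 39.0625

# Reported Net TX per read (KiB), copied from the Rust tables.
reported_kib = {"murr native": 42, "Redis blob": 46, "Redis hash": 210}
for name, kib in reported_kib.items():
    print(f"{name}: {kib / raw_kib:.2f}x the raw payload")
```

Under that assumption, murr and the Redis-family blob layouts stay within about 20% of the raw payload, while the hash layout sends roughly five times as much.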

Blob layouts

Engine	Layout	Memory	Disk	Ingestion	p50 latency	Net TX/read
murr 0.2.0 mmap	native	7.5 GiB	5.9 GiB	948K rows/s	268 µs	42 KiB
Dragonfly 1.31	blob	7.3 GiB	—	4.01M rows/s	296 µs	46 KiB
Valkey 8.1	blob	8.9 GiB	—	1.58M rows/s	657 µs	46 KiB
Redis 8.6.3	blob	9.6 GiB	—	1.43M rows/s	815 µs	46 KiB
pgsql 18.4	blob	24.0 GiB	12.8 GiB	400K rows/s	5.69 ms	62 KiB

Hash / col-per-feature layouts

Engine	Layout	Memory	Disk	Ingestion	p50 latency	Net TX/read
murr 0.2.0 mmap	native	7.5 GiB	5.9 GiB	948K rows/s	268 µs	42 KiB
Dragonfly 1.31	hash	20.1 GiB	—	650K rows/s	2.82 ms	213 KiB
Valkey 8.1	hash	19.4 GiB	—	378K rows/s	3.20 ms	210 KiB
Redis 8.6.3	hash	20.1 GiB	—	398K rows/s	3.25 ms	210 KiB
pgsql 18.4	col	23.4 GiB	12.7 GiB	384K rows/s	6.54 ms	86 KiB

Disk mode (2 GiB RAM cap)

Engine	Layout	Memory	Disk	Ingestion	p50 latency	Net TX/read
murr 0.2.0 block	native	1.7 GiB	5.8 GiB	1.00M rows/s	6.33 ms	42 KiB
pgsql 18.4	blob	2.0 GiB	12.8 GiB	329K rows/s	189 ms	62 KiB
pgsql 18.4	col	2.0 GiB	12.7 GiB	327K rows/s	217 ms	86 KiB

Python end-to-end

Measures full round-trip latency including protocol decoding and pd.DataFrame conversion. Ingestion throughput includes Python-side serialization and batch writes.

Blob layouts

Engine	Layout	Ingestion	Read latency
murr 0.2.0 mmap	native	1.06M rows/s	1.08 ms
Dragonfly	blob	524K rows/s	1.68 ms
Valkey 8.1	blob	436K rows/s	2.04 ms
Redis 8.6.3	blob	421K rows/s	2.46 ms
pgsql 18.4	blob	298K rows/s	28.6 ms

Hash / col-per-feature layouts

Engine	Layout	Ingestion	Read latency
murr 0.2.0 mmap	native	1.06M rows/s	1.08 ms
Dragonfly	hash	64K rows/s	8.25 ms
Valkey 8.1	hash	62K rows/s	8.63 ms
Redis 8.6.3	hash	62K rows/s	8.50 ms
pgsql 18.4	col	271K rows/s	13.8 ms

Disk mode (2 GiB RAM cap)

Engine	Layout	Ingestion	Read latency
murr 0.2.0 block	native	662K rows/s	6.69 ms
pgsql 18.4	blob	317K rows/s	171 ms
pgsql 18.4	col	303K rows/s	153 ms

In the Rust time-to-last-byte numbers, Murr is ~3x faster than Redis on packed-blob reads and ~12x faster than the Feast-style HSET layout, while using ~2.7x less RAM than the HSET equivalent. Dragonfly's packed-blob mode is close on raw latency, but still pays the protocol-parsing cost on the client, which shows up in the Python end-to-end results (1.68 ms vs 1.08 ms).
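
The ratios can be recomputed directly from the tables above. The figures in this snippet are copied from the Rust time-to-last-byte and Python end-to-end tables; only the variable names are made up.

```python
# Rust harness: p50 latency in microseconds and memory in GiB.
rust_p50_us = {"murr mmap": 268, "Redis blob": 815, "Redis hash": 3250}
memory_gib = {"murr mmap": 7.5, "Redis hash": 20.1}

# Python harness: end-to-end read latency in milliseconds.
python_read_ms = {"murr mmap": 1.08, "Dragonfly blob": 1.68, "Redis blob": 2.46}

blob_speedup = rust_p50_us["Redis blob"] / rust_p50_us["murr mmap"]
hash_speedup = rust_p50_us["Redis hash"] / rust_p50_us["murr mmap"]
ram_ratio = memory_gib["Redis hash"] / memory_gib["murr mmap"]
dragonfly_e2e = python_read_ms["Dragonfly blob"] / python_read_ms["murr mmap"]

print(f"{blob_speedup:.2f}x {hash_speedup:.2f}x {ram_ratio:.2f}x {dragonfly_e2e:.2f}x")
# 3.04x 12.13x 2.68x 1.56x
```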

Roadmap

No ETAs, but at least you can see where things stand:

Development

cargo build                  # Build the project
cargo test                   # Run all tests
cargo check                  # Fast syntax/type check
cargo clippy                 # Linting
cargo fmt                    # Format code
cargo bench --bench <name>   # Run a benchmark (multi_segment_index_bench, row_vs_col_bench)

License

Apache 2.0

¹ Used only for grammar and syntax checking.
