174 releases (12 stable)
Uses new Rust 2024
| Version | Released |
|---|---|
| 6.0.0 (new) | May 11, 2026 |
| 4.0.1 | Apr 24, 2026 |
| 4.0.0 | Mar 30, 2026 |
| 3.0.1 | Mar 19, 2026 |
| 0.0.1-alpha0 | Jul 28, 2022 |
#9 in Machine learning
214,455 downloads per month
Used in 124 crates (27 directly)
13MB, 287K SLoC
Rust Implementation of Lance
The Open Lakehouse Format for Multimodal AI
Installation
Add the crate to your project using cargo:
cargo add lance
Examples
Create dataset
Suppose batches is an Arrow Vec<RecordBatch> and schema is Arrow SchemaRef:
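use arrow_array::RecordBatchIterator;
use lance::{dataset::WriteParams, Dataset};

let write_params = WriteParams::default();
// Wrap the batches in a RecordBatchReader that Dataset::write can consume.
let reader = RecordBatchIterator::new(
    batches.into_iter().map(Ok),
    schema,
);
// `uri` is the destination path or object-store URI of the dataset.
Dataset::write(reader, &uri, Some(write_params)).await.unwrap();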
Read
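use futures::StreamExt; // provides `map` and `collect` on the record batch stream

let dataset = Dataset::open(path).await.unwrap();
let scanner = dataset.scan();
// Collect the whole scan into memory; unwrap panics on the first failed batch.
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect::<Vec<RecordBatch>>()
    .await;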
Take
let values: Result<RecordBatch> = dataset.take(&[200, 199, 39, 40, 100], &projection).await;
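To make the semantics of a take concrete, here is a toy in-memory model that does not use Lance at all. `take_rows` is a hypothetical helper, and the sketch assumes that rows come back in the order the indices are given and that an out-of-range index is an error.

```rust
/// Toy model of a positional take over a single in-memory column.
/// Hypothetical helper for illustration; not part of Lance's API.
pub fn take_rows<T: Clone>(column: &[T], indices: &[usize]) -> Result<Vec<T>, String> {
    indices
        .iter()
        .map(|&i| {
            // Look up each requested row; fail on the first index past the end.
            column
                .get(i)
                .cloned()
                .ok_or_else(|| format!("row index {i} is out of bounds for {} rows", column.len()))
        })
        // Collecting into a Result stops at the first error.
        .collect()
}
```

Calling `take_rows(&column, &[200, 199, 39, 40, 100])` on a 300-row column yields five values in exactly that order, which is the access pattern the `take` call above expresses against a dataset.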
Vector index
Assume "embeddings" is a FixedSizeListArray:
use ::lance::index::vector::VectorIndexParams;
// `params` must be mutable because its fields are set below.
let mut params = VectorIndexParams::default();
params.num_partitions = 256;
params.num_sub_vectors = 16;
// This returns an Err if list_size(embeddings) / num_sub_vectors does not meet SIMD alignment.
dataset.create_index(&["embeddings"], IndexType::Vector, None, &params, true).await;
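The alignment constraint mentioned in the comment can be checked before building the index. The sketch below is not a Lance function: `sub_vector_dim` is a hypothetical helper, and because the exact alignment Lance requires is not stated here, the caller passes it in as `alignment`.

```rust
/// Hypothetical pre-flight check; not part of Lance's API.
/// Returns the dimension of each sub-vector if `dim` splits evenly into
/// `num_sub_vectors` pieces and each piece is a multiple of `alignment`.
pub fn sub_vector_dim(
    dim: usize,
    num_sub_vectors: usize,
    alignment: usize, // assumed value supplied by the caller, e.g. 8
) -> Result<usize, String> {
    if num_sub_vectors == 0 || alignment == 0 {
        return Err("num_sub_vectors and alignment must be non-zero".to_string());
    }
    // The embedding dimension must divide evenly across the sub-vectors.
    if dim % num_sub_vectors != 0 {
        return Err(format!(
            "dimension {dim} is not divisible by num_sub_vectors {num_sub_vectors}"
        ));
    }
    let sub_dim = dim / num_sub_vectors;
    // Each sub-vector must also satisfy the assumed alignment.
    if sub_dim % alignment != 0 {
        return Err(format!(
            "sub-vector dimension {sub_dim} is not a multiple of {alignment}"
        ));
    }
    Ok(sub_dim)
}
```

With 128-dimensional embeddings, `num_sub_vectors = 16`, and an assumed alignment of 8, the check passes with a sub-vector dimension of 8. With 100-dimensional embeddings it fails, because 100 is not divisible by 16.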
What is Lance?
Lance is an open lakehouse format for multimodal AI. It contains a file format, a table format, and a catalog spec that together allow you to build a complete lakehouse on top of object storage to power your AI workflows.
The key features of Lance include:
- Expressive hybrid search: Combine vector similarity search, full-text search (BM25), and SQL analytics on the same dataset with accelerated secondary indices.
- Lightning-fast random access: 100x faster than Parquet or Iceberg for random access without sacrificing scan performance.
- Native multimodal data support: Store images, videos, audio, text, and embeddings in a single unified format with efficient blob encoding and lazy loading.
- Data evolution: Efficiently add columns with backfilled values without full table rewrites, perfect for ML feature engineering.
- Zero-copy versioning: ACID transactions, time travel, and automatic versioning without needing extra infrastructure.
- Rich ecosystem integrations: Apache Arrow, Pandas, Polars, DuckDB, Apache Spark, Ray, Trino, Apache Flink, and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino).
For more details, see the full Lance format specification.
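As a rough intuition for zero-copy versioning and time travel, the toy model below commits each write as a new version that references earlier data instead of copying it. This is a conceptual sketch only: `Fragment` and `VersionedTable` are hypothetical types, and nothing here reflects Lance's actual on-disk layout or API.

```rust
use std::sync::Arc;

/// Toy stand-in for an immutable chunk of rows (hypothetical type).
#[derive(Debug)]
pub struct Fragment {
    pub rows: Vec<i64>,
}

/// Toy table in which every version is a list of shared fragment handles.
#[derive(Default)]
pub struct VersionedTable {
    versions: Vec<Vec<Arc<Fragment>>>,
}

impl VersionedTable {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append rows as a new fragment and commit a new version.
    /// Earlier fragments are referenced through `Arc`, never copied.
    /// Returns the new version number, starting at 1.
    pub fn append(&mut self, rows: Vec<i64>) -> usize {
        let mut manifest = self.versions.last().cloned().unwrap_or_default();
        manifest.push(Arc::new(Fragment { rows }));
        self.versions.push(manifest);
        self.versions.len()
    }

    /// Fragment handles visible at `version`, or None if it does not exist.
    pub fn fragments(&self, version: usize) -> Option<&[Arc<Fragment>]> {
        let idx = version.checked_sub(1)?;
        self.versions.get(idx).map(|m| m.as_slice())
    }

    /// Read every row visible at `version` (time travel).
    pub fn read(&self, version: usize) -> Option<Vec<i64>> {
        let manifest = self.fragments(version)?;
        Some(manifest.iter().flat_map(|f| f.rows.iter().copied()).collect())
    }
}
```

After two appends, `read(1)` still returns only the rows from the first write, while `read(2)` returns both. Both versions point at the same first fragment in memory, which is the sense in which the older data is shared and not rewritten.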
Dependencies
~100–150MB, ~2.5M SLoC