LanceDB

LanceDB · 2026-05-12T15:35:23.594Z

Vector search gets expensive at scale because the index has to live in RAM. Bigger dataset, bigger instance: you're paying for memory, not queries.

LanceDB stores the index in S3 and memory-maps it. Only the pages a query touches get loaded, so RAM scales with QPS, not dataset size.

At 100M docs (1152-dim, SQ8): ~$779/month, including $397 compute, $4 S3 index, and ~$10 GETs at 10 QPS. At 10M: ~$148/mo. At 1M: ~$65/mo.

Benchmarked on 287K COCO 2017 images, SigLIP 2 embeddings, IVF_HNSW_SQ: above 0.95 recall@10, sub-50ms p95 on a single node.

Full cost breakdown and OpenSearch comparison at the same scale: https://lnkd.in/gAbQ62m7
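To make the scaling claim concrete, here is a small arithmetic sketch using only the figures quoted in the post. The helper names are made up for illustration. The index-size estimate assumes SQ8 stores one byte per dimension and ignores graph and metadata overhead, which the post does not spell out.

```python
# Monthly totals quoted in the post: number of documents -> USD per month.
quoted_monthly_cost = {1_000_000: 65, 10_000_000: 148, 100_000_000: 779}

# Line items the post lists for the 100M-document case (USD per month).
itemized_100m = {"compute": 397, "s3_index": 4, "s3_gets": 10}


def cost_per_million(docs: int, monthly_usd: float) -> float:
    """Monthly cost divided by dataset size in millions of documents."""
    return monthly_usd / (docs / 1_000_000)


def raw_sq8_bytes(num_vectors: int, dim: int) -> int:
    """Raw vector payload, ASSUMING one byte per dimension for SQ8.

    Ignores HNSW graph links, IVF centroids, and metadata.
    """
    return num_vectors * dim


if __name__ == "__main__":
    for docs, usd in sorted(quoted_monthly_cost.items()):
        per_m = cost_per_million(docs, usd)
        print(f"{docs:>11,} docs: ${usd}/mo total, ${per_m:.2f} per million docs")

    listed = sum(itemized_100m.values())
    # The post lists only these three items; the rest of the total is not
    # broken down there (see the linked article for the full breakdown).
    print(f"listed items: ${listed}, not itemized in the post: ${779 - listed}")

    size_gb = raw_sq8_bytes(100_000_000, 1152) / 1e9
    print(f"raw SQ8 payload at 100M x 1152 dims: {size_gb:.1f} GB")
```

The per-million cost falls as the dataset grows, which is what you would expect if most of the bill is fixed compute rather than memory sized to the index.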

Information Services

San Francisco, California 11,247 followers

Developer-friendly, open source database for multi-modal AI

View all 47 employees

About us

LanceDB is a developer-friendly, open source database for multimodal AI. From hyper scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large scale AI datasets, LanceDB is the best foundation for your AI application.

Website: http://lancedb.com
Industry: Information Services
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2022

Locations

Primary

San Francisco, California, US

Updates

LanceDB

11,247 followers
8h Edited
Three talks in one room TOMORROW NIGHT in Menlo Park: ingestion, retrieval, and metadata lineage, the parts of the AI stack that most teams are still stitching together manually.

ChanChan M. will walk through Lance, the default storage layer for multimodal AI. One table for raw bytes, embeddings, and features, without the export pipelines and stale snapshots that come with a separate vector DB and data lake.

Joining us: elvis kahoro from dltHub on the connective tissue of your AI data stack, and Gabe Lyons & Manuela Wei from DataHub on how trusted context makes Cortex agents more reliable in production.

Doors open at 6pm, see you there!

📅 Thu, May 21, 6 PM, SVAI @ Menlo Park
🔗 Register: https://lnkd.in/gErcViP9

The missing data layer for ML: dltHub x LanceDB x DataHub @ SVAI · Luma luma.com

LanceDB reposted this
dltHub

13,135 followers
2d
The AI stack is evolving fast, but reliable data movement is still the foundation. As AI systems become more complex, connecting ingestion, retrieval, and metadata layers reliably is becoming a core engineering challenge, and that’s what we’ll be discussing live in Menlo Park.

We're joining forces with LanceDB and DataHub for a Community Meetup on May 21 at Silicon Valley AI Hub, and the lineup is built for engineers in the trenches:

- Lance as the default storage layer for multimodal AI, LanceDB
- The connective tissue of your AI data stack, dltHub
- Lineage you can trust: Supercharging Cortex with DataHub

No fluff, just the engineers actually building this stuff, showing you how it works in production.

📅 Thursday, May 21 · 6:00 PM - 8:00 PM PDT · Menlo Park, CA
🍕 Talks, demos, networking, food & drinks
🔗 Join us here: https://luma.com/80pocni3

The missing data layer for ML: dltHub x LanceDB x DataHub @ SVAI · Luma luma.com

LanceDB

11,247 followers
1d
With LeRobot v3's default layout, exploring a dataset before training means pulling more data than you need. lerobot-lancedb changes that.

Open any Lance-formatted dataset from the Hub via hf:// and filter by episode_index, frame_index, or task metadata before touching a video blob. From there, materialize your slice, attach embeddings, add columns, and pass it to LeRobotLanceDataset, a drop-in for LeRobotDataset that leaves existing PyTorch code unchanged.

This gives robotics teams fast random access, lazy multimodal blob reads, and a cleaner path from dataset curation to training. Filter, curate, then train, all from one table: https://lnkd.in/grSCkU6P

LeRobotDataset - LanceDB docs.lancedb.com

LanceDB

11,247 followers
2d
What happens when a duck with a shield walks onto a yacht? We're still not entirely sure — but it involved LanceDB, MotherDuck, Theory Ventures and a boat full of builders talking AI and data infrastructure on a sunset cruise around the SF Bay. Thanks to everyone who joined us. Ship happens. ⚓
LanceDB

11,247 followers
5d
Lance for robotics 🤝🤗🤖
Quentin Lhoest
5d

The repo is public! https://lnkd.in/e3Vc-cf3
LanceDB reposted this
Dhruv Singh
1w
You know the talk was good when the speaker can’t make it out of the room after. We kicked off our AI Council AI Engineering track afternoon session with Co-founder & CTO of LanceDB, Lei Xu.

Multimodal AI is exposing the limits of traditional data infra. Once video, embeddings, metadata, and training data are split across different systems, teams lose a clean way to search and debug what’s happening as datasets keep changing.

Lei’s argument is for a versioned system of record for multimodal data that keeps data and indexes together and lets teams update datasets incrementally instead of constantly creating new copies. That shift feels especially important as model development gets more data-intensive and research velocity depends more on how quickly teams can trace and iterate on the underlying data.

The recording will be posted to the AI Council YouTube channel in a few weeks.
LanceDB reposted this
Theory Ventures

9,056 followers
6d
From Theory, MotherDuck, and LanceDB: we salute you, AI Council and the conference attendees. If you missed our boat party, we hope you’ll join us in the future. 🫡
LanceDB

11,247 followers
6d
Hannes Mühleisen just announced Quack, the DuckDB Client-Server Protocol, at Day 1 of AI Council, and it's a natural fit for Lance. Load the Lance extension on the DuckDB server, attach a Lance namespace, and any DuckDB client can query those tables over the wire.

Here's how it works:
→ Lance handles storage and catalog: local directories, object stores, REST namespaces
→ Quack handles remote DuckDB execution
→ Clients query Lance tables directly via quack_query(), no changes to your SQL or data layout

DuckDB Quack announcement: https://lnkd.in/gWXnZzhv
LanceDB

11,247 followers
1w
Past 1B vectors, three things break: the index won't fit on one node, centroid scans go linear, and RaBitQ rotation costs O(d²) per query at production embedding dimensions.

→ Segments built in parallel: 5x faster index construction with 10 nodes, build time bounded by the slowest segment
→ Plan Executors per segment: add workers to scale throughput, per-query latency doesn’t change
→ HNSW over centroids replaces the linear scan; Walsh-Hadamard rotation drops RaBitQ prep from O(d²) to O(d log d)

More on the architectural decisions and benchmark numbers at 10B scale: https://lnkd.in/gH-KBmvH

How LanceDB Accelerates Vector Search at 10 Billion Scale lancedb.com
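The O(d²) to O(d log d) drop mentioned in the post can be illustrated with a generic fast Walsh-Hadamard transform. This is a minimal textbook sketch, not LanceDB's implementation. The function names are hypothetical, and the randomized sign flip is one common way to build a fast pseudo-random rotation, assumed here for illustration. The transform needs a power-of-two length, so a 1152-dim vector would have to be padded or split into blocks first; the post does not say how that is handled.

```python
import math
import random


def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform: O(d log d) add/subtract steps."""
    y = list(values)
    d = len(y)
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        # Butterfly pass: pair element j with element j + h in each block of 2h.
        for start in range(0, d, 2 * h):
            for j in range(start, start + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    return y


def dense_hadamard_apply(values):
    """Reference O(d^2) version: multiply by the Hadamard matrix entry by entry.

    Entry (i, j) of the Sylvester Hadamard matrix is +1 when i & j has an
    even number of set bits, and -1 otherwise.
    """
    d = len(values)
    return [
        sum(v if bin(i & j).count("1") % 2 == 0 else -v for j, v in enumerate(values))
        for i in range(d)
    ]


def randomized_rotation(values, signs):
    """Flip signs, transform, and rescale by 1/sqrt(d) so length is preserved."""
    d = len(values)
    scale = 1.0 / math.sqrt(d)
    return [scale * t for t in fwht(s * v for s, v in zip(signs, values))]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 1024  # power of two; a dense rotation here costs d*d multiply-adds
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    rotated = randomized_rotation(x, signs)
    print("norm before:", math.sqrt(sum(v * v for v in x)))
    print("norm after: ", math.sqrt(sum(v * v for v in rotated)))
```

At d = 1024 the dense product takes about a million multiply-adds (d²), while the butterfly version takes d · log2(d) = 10,240 add/subtract steps.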



Funding

LanceDB 3 total rounds

Last Round

Series A Jul 24, 2025

US$ 30.0M

Investors

Theory Ventures + 6 Other investors


Employees at LanceDB

David Wang

Dave Unger

Peter Ebert

Catherine Chung

Similar pages

Eventual

DuckDB

Apache DataFusion

Polars

Pinecone

Qdrant

Kuzu

AtoB

Spice AI

Warp
