Three talks in one room TOMORROW NIGHT in Menlo Park: ingestion, retrieval, and metadata lineage, the parts of the AI stack that most teams are still stitching together manually.
ChanChan M. will walk through Lance, the default storage layer for multimodal AI: one table for raw bytes, embeddings, and features, without the export pipelines and stale snapshots that come with a separate vector DB and data lake.
Joining us: elvis kahoro from dltHub on the connective tissue of your AI data stack, and Gabe Lyons & Manuela Wei from DataHub on how trusted context makes Cortex agents more reliable in production.
Doors open at 6pm. See you there!
📅 Thu, May 21, 6PM, SVAI @ Menlo Park
🔗 Register: https://lnkd.in/gErcViP9
LanceDB
Information Services
San Francisco, California 11,247 followers
Developer-friendly, open source database for multi-modal AI
About us
LanceDB is a developer-friendly, open source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.
- Website
- http://lancedb.com
- Industry
- Information Services
- Company size
- 11-50 employees
- Headquarters
- San Francisco, California
- Type
- Privately Held
- Founded
- 2022
Locations
- Primary
- San Francisco, California, US
Updates
-
LanceDB reposted this
The AI stack is evolving fast, but reliable data movement is still the foundation. As AI systems become more complex, connecting ingestion, retrieval, and metadata layers reliably is becoming a core engineering challenge, and that’s what we’ll be discussing live in Menlo Park.
We're joining forces with LanceDB and DataHub for a Community Meetup on May 21 at Silicon Valley AI Hub, and the lineup is built for engineers in the trenches:
- Lance as the default storage layer for multimodal AI, LanceDB
- The connective tissue of your AI data stack, dltHub
- Lineage you can trust: Supercharging Cortex with DataHub
No fluff, just the engineers actually building this stuff, showing you how it works in production.
📅 Thursday, May 21 · 6:00 PM - 8:00 PM PDT · Menlo Park, CA
🍕 Talks, demos, networking, food & drinks
🔗 Join us here: https://luma.com/80pocni3
-
With LeRobot v3's default layout, exploring a dataset before training means pulling more data than you need. lerobot-lancedb changes that.
Open any Lance-formatted dataset from the Hub via hf:// and filter by episode_index, frame_index, or task metadata before touching a video blob. From there, materialize your slice, attach embeddings, add columns, and pass it to LeRobotLanceDataset, a drop-in for LeRobotDataset that leaves existing PyTorch code unchanged.
This gives robotics teams fast random access, lazy multimodal blob reads, and a cleaner path from dataset curation to training.
Filter, curate, then train, all from one table: https://lnkd.in/grSCkU6P
-
What happens when a duck with a shield walks onto a yacht? We're still not entirely sure — but it involved LanceDB, MotherDuck, Theory Ventures and a boat full of builders talking AI and data infrastructure on a sunset cruise around the SF Bay. Thanks to everyone who joined us. Ship happens. ⚓
-
Lance for robotics 🤝🤗🤖
The repo is public! https://lnkd.in/e3Vc-cf3
-
LanceDB reposted this
You know the talk was good when the speaker can’t make it out of the room after. We kicked off our AI Council AI Engineering track afternoon session with Co-founder & CTO of LanceDB, Lei Xu.
Multimodal AI is exposing the limits of traditional data infra. Once video, embeddings, metadata, and training data are split across different systems, teams lose a clean way to search and debug what’s happening as datasets keep changing. Lei’s argument is for a versioned system of record for multimodal data that keeps data and indexes together and lets teams update datasets incrementally instead of constantly creating new copies.
That shift feels especially important as model development gets more data-intensive and research velocity depends more on how quickly teams can trace and iterate on the underlying data.
The recording will be posted to the AI Council YouTube channel in a few weeks.
-
LanceDB reposted this
From Theory, MotherDuck, and LanceDB: we salute you, AI Council and conference attendees. If you missed our boat party, we hope you’ll join us in the future. 🫡
-
Hannes Mühleisen just announced Quack, the DuckDB Client-Server Protocol, at Day 1 of AI Council, and it's a natural fit for Lance. Load the Lance extension on the DuckDB server, attach a Lance namespace, and any DuckDB client can query those tables over the wire.
Here's how it works:
→ Lance handles storage and catalog: local directories, object stores, REST namespaces
→ Quack handles remote DuckDB execution
→ Clients query Lance tables directly via quack_query(), no changes to your SQL or data layout
DuckDB Quack announcement: https://lnkd.in/gWXnZzhv
-
Past 1B vectors, three things break: the index won't fit on one node, centroid scans go linear, and RaBitQ rotation costs O(d²) per query at production embedding dimensions.
→ Segments built in parallel: 5x faster index construction with 10 nodes, build time bounded by the slowest segment
→ Plan Executors per segment: add workers to scale throughput, per-query latency doesn’t change
→ HNSW over centroids replaces the linear scan; Walsh-Hadamard rotation drops RaBitQ prep from O(d²) to O(d log d)
More on the architectural decisions and benchmark numbers at 10B scale: https://lnkd.in/gH-KBmvH
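For readers wondering where the O(d²) to O(d log d) drop comes from, here is a minimal textbook sketch of a randomized Walsh-Hadamard rotation. It is not LanceDB's implementation: the function names `fwht` and `random_rotation` are hypothetical, and the sketch assumes the vector length is a power of two (the post does not say how other dimensions are handled). A dense d×d rotation matrix needs d² multiply-adds per vector, while the butterfly loop below does d additions and subtractions in each of log₂(d) passes.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform in O(d log d).

    Assumes len(vec) is a power of two (an assumption of this sketch).
    """
    d = len(vec)
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < d:  # log2(d) passes
        for start in range(0, d, 2 * h):
            for i in range(start, start + h):  # d/2 butterflies per pass
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(d)  # makes the transform orthogonal
    return [x * scale for x in out]


def random_rotation(vec, signs):
    """Hypothetical helper: random sign flips followed by the FWHT.

    Both steps are orthogonal, so vector norms are preserved.
    """
    return fwht([s * x for s, x in zip(signs, vec)])


if __name__ == "__main__":
    rng = random.Random(0)
    d = 8
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = random_rotation(x, signs)
    print(math.sqrt(sum(v * v for v in x)), math.sqrt(sum(v * v for v in y)))
```

The two printed norms agree up to floating-point error, which is the property a rotation step needs: distances between vectors are unchanged while the coordinates get mixed.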
-
Vector search gets expensive at scale because the index has to live in RAM. Bigger dataset, bigger instance: you're paying for memory, not queries.
LanceDB stores the index in S3 and memory-maps it. Only the pages a query touches get loaded, so RAM scales with QPS, not dataset size.
At 100M docs (1152-dim, SQ8): ~$779/month, including $397 compute, $4 S3 index, ~$10 GETs at 10 QPS. At 10M: ~$148/mo. At 1M: ~$65/mo.
Benchmarked on 287K COCO 2017 images, SigLIP 2 embeddings, IVF_HNSW_SQ: above 0.95 recall@10, sub-50ms p95 on a single node.
Full cost breakdown and OpenSearch comparison at the same scale: https://lnkd.in/gAbQ62m7
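A rough back-of-envelope sketch of the storage and request line items, for anyone who wants to plug in their own numbers. This is not the cost model from the linked breakdown. The unit prices ($0.023 per GB-month and $0.0004 per 1,000 GETs) are assumed S3 Standard list prices, one GET per query is a hypothetical simplification, and all function names are made up for illustration. The sketch covers only raw SQ8 codes and GET requests, so it does not attempt to reproduce the ~$779 total.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an assumption


def sq8_vector_bytes(num_vectors, dim):
    """Raw SQ8 codes: one byte per dimension, excluding any index structures."""
    return num_vectors * dim


def monthly_storage_usd(num_bytes, usd_per_gb_month=0.023):
    """Storage cost at an assumed per-GB-month price."""
    return num_bytes / 1e9 * usd_per_gb_month


def monthly_get_usd(qps, gets_per_query=1.0, usd_per_1000_gets=0.0004):
    """GET request cost; gets_per_query is a hypothetical knob."""
    requests = qps * SECONDS_PER_MONTH * gets_per_query
    return requests / 1000 * usd_per_1000_gets


if __name__ == "__main__":
    raw = sq8_vector_bytes(100_000_000, 1152)
    print(f"raw SQ8 codes: {raw / 1e9:.1f} GB")
    print(f"storage: ${monthly_storage_usd(raw):.2f}/month")
    print(f"GETs at 10 QPS: ${monthly_get_usd(10):.2f}/month")
```

Under these assumptions the raw codes come to about 115 GB and the GET line lands near $10 per month at 10 QPS. Both unit costs are small, which is the post's point: with the index in object storage, storage and requests are minor line items compared with compute.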