LanceDB

Information Services

San Francisco, California · 11,247 followers

Developer-friendly, open source database for multimodal AI

About us

LanceDB is a developer-friendly, open source database for multimodal AI. From hyperscalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.

Website: http://lancedb.com
Industry: Information Services
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2022

Updates

  • Three talks in one room TOMORROW NIGHT in Menlo Park: ingestion, retrieval, and metadata lineage, the parts of the AI stack that most teams are still stitching together manually. ChanChan M. will walk through Lance, the default storage layer for multimodal AI: one table for raw bytes, embeddings, and features, without the export pipelines and stale snapshots that come with a separate vector DB and data lake. Joining us: elvis kahoro from dltHub on the connective tissue of your AI data stack, and Gabe Lyons & Manuela Wei from DataHub on how trusted context makes Cortex agents more reliable in production. Doors open at 6 PM, see you there!
    📅 Thu, May 21, 6 PM, SVAI @ Menlo Park
    🔗 Register: https://lnkd.in/gErcViP9

  • LanceDB reposted this

    dltHub

    13,135 followers

    The AI stack is evolving fast, but reliable data movement is still the foundation. As AI systems become more complex, connecting ingestion, retrieval, and metadata layers reliably is becoming a core engineering challenge, and that’s what we’ll be discussing live in Menlo Park. We're joining forces with LanceDB and DataHub for a Community Meetup on May 21 at Silicon Valley AI Hub, and the lineup is built for engineers in the trenches:
    - Lance as the default storage layer for multimodal AI, LanceDB
    - The connective tissue of your AI data stack, dltHub
    - Lineage you can trust: Supercharging Cortex with DataHub
    No fluff, just the engineers actually building this stuff, showing you how it works in production.
    📅 Thursday, May 21 · 6:00 PM - 8:00 PM PDT · Menlo Park, CA
    🍕 Talks, demos, networking, food & drinks
    🔗 Join us here: https://luma.com/80pocni3

  • With LeRobot v3's default layout, exploring a dataset before training means pulling more data than you need. lerobot-lancedb changes that. Open any Lance-formatted dataset from the Hub via hf:// and filter by episode_index, frame_index, or task metadata before touching a video blob. From there, materialize your slice, attach embeddings, add columns, and pass it to LeRobotLanceDataset, a drop-in replacement for LeRobotDataset that leaves existing PyTorch code unchanged. This gives robotics teams fast random access, lazy multimodal blob reads, and a cleaner path from dataset curation to training. Filter, curate, then train, all from one table: https://lnkd.in/grSCkU6P

  • LanceDB reposted this

    You know the talk was good when the speaker can’t make it out of the room after. We kicked off the afternoon session of our AI Council AI Engineering track with Lei Xu, Co-founder & CTO of LanceDB. Multimodal AI is exposing the limits of traditional data infra. Once video, embeddings, metadata, and training data are split across different systems, teams lose a clean way to search and debug what’s happening as datasets keep changing. Lei’s argument is for a versioned system of record for multimodal data that keeps data and indexes together and lets teams update datasets incrementally instead of constantly creating new copies. That shift feels especially important as model development gets more data-intensive and research velocity depends more on how quickly teams can trace and iterate on the underlying data. The recording will be posted to the AI Council YouTube channel in a few weeks.

  • Hannes Mühleisen just announced Quack, the DuckDB Client-Server Protocol, at Day 1 of AI Council, and it's a natural fit for Lance. Load the Lance extension on the DuckDB server, attach a Lance namespace, and any DuckDB client can query those tables over the wire. Here's how it works:
    → Lance handles storage and catalog: local directories, object stores, REST namespaces
    → Quack handles remote DuckDB execution
    → Clients query Lance tables directly via quack_query(), no changes to your SQL or data layout
    DuckDB Quack announcement: https://lnkd.in/gWXnZzhv

  • Past 1B vectors, three things break: the index won't fit on one node, centroid scans go linear, and RaBitQ rotation costs O(d²) per query at production embedding dimensions.
    → Segments built in parallel: 5x faster index construction with 10 nodes, build time bounded by the slowest segment
    → Plan Executors per segment: add workers to scale throughput, per-query latency doesn’t change
    → HNSW over centroids replaces the linear scan; Walsh-Hadamard rotation drops RaBitQ prep from O(d²) to O(d log d)
    More on the architectural decisions and benchmark numbers at 10B scale: https://lnkd.in/gH-KBmvH
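
To make the O(d²) versus O(d log d) claim in the post above concrete, here is a minimal sketch of a fast Walsh-Hadamard transform used as a randomized rotation. It is an illustration only, not LanceDB's implementation: the function names (`fwht`, `random_signs`, `rotate`) are hypothetical, the random sign flip is one common way to randomize the transform and is an assumption here, and the sketch only accepts power-of-two lengths (other dimensions would need padding or blocking, which is not shown).

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform in O(d log d) operations.

    Returns a new list; len(vec) must be a power of two.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:  # log2(n) passes over the data
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)  # scaling that keeps vector norms unchanged
    return [x * scale for x in out]


def random_signs(d, seed=0):
    """Fixed random +1/-1 diagonal (hypothetical helper, assumed for illustration)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(d)]


def rotate(vec, signs):
    """Flip signs, then apply the transform: a structured stand-in for a dense rotation."""
    return fwht([v * s for v, s in zip(vec, signs)])


if __name__ == "__main__":
    query = [0.3, -1.2, 2.0, 0.5, 1.1, -0.7, 0.0, 4.2]
    signs = random_signs(len(query), seed=42)
    rotated = rotate(query, signs)
    # The rotation preserves the squared norm up to floating-point error.
    print(sum(x * x for x in query), sum(x * x for x in rotated))
```

A dense rotation multiplies the query by a d × d matrix, which is where the O(d²) cost comes from; the butterfly structure above touches each element once per pass and needs only log₂(d) passes.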

  • Vector search gets expensive at scale because the index has to live in RAM. Bigger dataset, bigger instance: you're paying for memory, not queries. LanceDB stores the index in S3 and memory-maps it. Only the pages a query touches get loaded, so RAM scales with QPS, not dataset size. At 100M docs (1152-dim, SQ8): ~$779/month, including $397 compute, $4 S3 index, and ~$10 in GETs at 10 QPS. At 10M: ~$148/mo. At 1M: ~$65/mo. Benchmarked on 287K COCO 2017 images, SigLIP 2 embeddings, IVF_HNSW_SQ: above 0.95 recall@10, sub-50ms p95 on a single node. Full cost breakdown and OpenSearch comparison at the same scale: https://lnkd.in/gAbQ62m7
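
As a rough sense of scale for the 100M-document case in the post above, the sketch below estimates the size of the quantized vectors alone. It assumes SQ8 stores one byte per dimension and float32 stores four, and it ignores graph links, IVF centroids, and metadata, so it is a lower bound on the index size rather than a reproduction of the post's cost figures. The function name is hypothetical.

```python
def vector_payload_bytes(num_vectors, dim, bytes_per_dim):
    """Bytes needed for the raw vector payload only (no graph or metadata overhead)."""
    return num_vectors * dim * bytes_per_dim


if __name__ == "__main__":
    num_vectors = 100_000_000  # 100M docs, from the post
    dim = 1152                 # embedding dimension, from the post

    sq8 = vector_payload_bytes(num_vectors, dim, 1)   # assumed: 1 byte per dimension
    f32 = vector_payload_bytes(num_vectors, dim, 4)   # assumed: 4 bytes per dimension

    print(f"SQ8 payload:     {sq8 / 1e9:.1f} GB")   # 115.2 GB
    print(f"float32 payload: {f32 / 1e9:.1f} GB")   # 460.8 GB
```

Holding even the quantized payload fully in RAM would require an instance sized to the dataset, which is the cost the post contrasts with loading only the pages a query touches.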

Funding

LanceDB 3 total rounds

Last Round

Series A

US$ 30.0M
