LakeSail

Software Development

Spark rewritten in Rust. A new standard for unified compute, reimagined for modern data and AI infrastructure.

About us

LakeSail is a cloud-native platform redefining big data processing for the AI-driven future. Its unified open source computation framework, Sail, is built entirely in Rust and runs ~4x faster than Spark, reducing hardware costs by up to 94% while maintaining Spark compatibility. LakeSail's mission is to unify batch processing, stream processing, and compute-intensive AI workloads into a seamless framework engineered for scalability and speed at a fraction of the cost.

Website: https://lakesail.com/
Industry: Software Development
Company size: 11-50 employees
Headquarters: San Francisco
Type: Privately Held
Founded: 2023

Updates

  • LakeSail reposted this

    It was a tremendous honor to represent LakeSail for the first time in my official capacity, as the head of community, at the Data for AI meetup today! Thanks to Adi Wabisabi, who organized and moderated, and is Lisa N. Cao 2.0 at Datastrato lol, we had a deep dive into the semantic layer for the Agentic AI Lakehouse of the future. Kudos to my fellow panelists Josh W., Mark Hoerth, and Andrew Madson, and to our fearless CEO Shehab Amin, who took this photo and was there along with Everett Roeth and Delly Tamer on our team and Kranti K. Parisa of LaserData, our blazing-fast, Rust-backed streaming partner!

  • Spark accelerators like Photon speed up portions of execution, but are still tied to the JVM. Sail, written in Rust, removes the JVM entirely. The result is drastically better performance with no JVM tuning required. The same familiar Spark API on a Rust-native runtime. That’s Sail.

  • Why fully rebuild Spark in Rust instead of just accelerating it? Spark accelerators speed up parts of execution, but they still inherit Spark's JVM control plane, memory model, shuffle path, and Python serialization costs. You're still tuning a JVM. With Sail, we took a clean-slate approach: a fully Rust-native runtime, with no JVM, no heap tuning, and no GC pauses. Same Spark interface with an entirely new runtime underneath. Read our full breakdown of how Sail compares to Spark accelerators here → https://lnkd.in/gwyF8Fq2

  • LakeSail reposted this

    We wrote a book, with Codex, delving deep into LakeSail architecture. Sail is a modern data ecosystem with deep roots in Apache Arrow and Apache DataFusion, rebuilding the whole Apache Spark ecosystem in Rust. One of the key advantages of an AI-native stack is extensibility. We are convening the community to build Sail extensions. A proposal is on the table in lakehq/sail. In order to extend an engine as profoundly efficient as Sail, you need to operate at several levels (physical and logical plans, loading and linking) and perform at the top both on a single node and in cluster mode. The book is written as an exploration of the codebase, of the overall Sail architecture, its use of Arrow and DataFusion, its implementation of the Spark Connect protocol, and everything else pertinent to the extensions. https://lnkd.in/gcH9puUX You’ll also learn Rust along the way, seeing it in action, doing heavy lifting.

  • LakeSail reposted this

    I'm excited to announce the panel for our next Data for AI on Jun 3rd at Yes SF! We have four experts in the field who will be speaking on our panel on unifying enterprise-wide data for agents, each covering a different dimension of this challenge at scale.
    Josh W., LiveRamp: Josh is a Principal Architect at LiveRamp, where he's at the intersection of AI, data and MarTech. In case you missed his last Data for AI presentation, he's built a semantic middle layer that allows LiveRamp's agents to understand the context of data from over a dozen systems. They had to balance cost, security, auditability, and many other factors to make this initiative a success. Previously, he's held senior roles at Highnote, Coinbase, Twilio, and Salesforce, and holds a patent on database server access management.
    Mark Hoerth, Datastrato: Mark leads product at Datastrato, working on Apache Gravitino and the next chapter of open table formats for AI. He joined from Dremio, where he held product and solution architecture roles spanning Apache Iceberg lakehouse deployments and AI semantic search, and led Dremio's efforts on security, Iceberg, and Apache Polaris. A Stanford alum based in the Bay Area, Mark is a longtime Silicon Valley startup builder.
    Andrew Madson, Fivetran: Andrew leads Developer Relations at Fivetran, where he builds programs that help developers and data teams adopt modern data and AI tooling. He's the author of O'Reilly's Apache Polaris: The Definitive Guide, with two more books on the way: AI-Ready Data (Wiley) and Data Transformation (O'Reilly). Andrew previously built DevRel functions at Tobiko Data and Dremio, and he teaches data science and engineering as a graduate professor.
    Alexy Khrabrov, LakeSail: Dr. Khrabrov is the Head of Community at LakeSail, building the Spark-compatible AI Lakehouse of tomorrow in Rust. He is also the founder of the Community Research Center for Reliable AI at Northeastern University, and the founder and organizer of AI By the Bay, Bay Area AI, and AI Agent SF, the longest-running, deepest technical OSS AI communities, conferences, and meetups in the San Francisco Bay Area. Previously, Alexy was the Director of Open-Source Science at IBM Research, AI Community Architect at Neo4j, Senior Software Engineer at Amazon, and a co-founder and engineer in several Bay Area startups.
    Spaces are still available, but are running out! Sign up today! https://luma.com/8tvd2xla
    Thank you to our sponsors, Datastrato, Fivetran & LakeSail, for making this event possible! See you there!
