Mosaico is a blazing-fast data platform designed to close the loop between physical world data and scalable, production-grade systems. Mosaico it does one thing exceptionally well: transforms monolithic sensor logs into a structured, queryable archive built for multi-modal data.
Retrieval is highly efficient thanks to a modern data lake approach.
Leveraging zero-copy architecture eliminates serialization overhead, surpassing the limitations of .bag or .mcap storage by enabling direct, random access to specific signals without parsing entire files.
Streams are automatically indexed, allowing the query engine to prune irrelevant data and stream only the precise slices required.
At its foundation is a strictly-typed data ontology. Instead of treating data as generic byte arrays, Mosaico enforces a semantic understanding of every object. This guarantees data is validatable, optimized for transport, and deeply queryable by physical values.
Targeting durable long-term storage and strict data lineage, Mosaico rejects traditional versioning, which introduces query ambiguity. Instead, it uses immutable data layers. This horizontal growth ensures query history remains deterministic and absolutely immutable.
This repo contains both the Python SDK (mosaico-sdk-py) and the Rust backend (mosaicod). We have chosen to keep the code in a monorepo configuration to simplify the testing and reduce compatibility issues.
Mosaico takes a strictly code-first approach. We didn't want to force you to learn yet another SQL-like sublanguage just to move data around. Instead, we provide native SDKs (starting with Python so you can query and upload data using the programming language you are already comfortable with.
Under the hood, the system operates on a standard client-server model.
The server daemon, mosaicod, acts as the central hub that takes care of the heavy lifting, like data conversion, compression, and organized storage.
On the other side, the client SDK is what you actually import into your scripts; it manages the communication logic and abstracts away the implementation details to keep your API usage stable, even as the platform evolves in the background.
The transition from classical robotics to Physical AI represents a fundamental shift in data requirements.
Classical Robotics operates in an event-driven world. Data is asynchronous, sparse, and stored in monolithic sequential files (like ROS bags). A Lidar might fire at 10Hz, an IMU at 100Hz, and a camera at 30Hz, all drifting relative to one another.
Physical AI requires synchronous, dense, and tabular data. Models expect fixed-size tensors arriving at a constant frequency (e.g., a batch of state vectors at exactly 50Hz).
Mosaico’s ML module automates this tedious data plumbing. It ingests raw, unsynchronized data and transforms it on the fly into the aligned, flattened formats ready for model training, eliminating the need for massive intermediate CSV files.
For a comprehensive description, please visit our documentation. If you are building with AI, you can find specialized technical guides in the Agent-ready section.
If you use Mosaico for a publication, please cite it as:
@software{MosaicoLabs,
author = {{Mosaico Team}},
title = {{Mosaico: The Open-Source Data Platform for Robotics and Physical AI.}},
url = {https://mosaico.dev},
version = {0.0},
year = {2025},
month = {12},
address = {Online},
note = {Available from https://mosaico.dev/ and https://github.com/mosaico-labs/mosaico}
}