Your data, on your own infrastructure
Mosaico runs entirely on-premises. Ontology, ingestion, and query: every layer of the platform runs on your servers, your cloud account, or your edge hardware. No data ever leaves your infrastructure.
Mosaico is the open-source infrastructure that transforms petabytes of robotic sensor data into training-ready assets.
A purpose-built data platform for robotics. Stop building fragile workarounds and handle petabyte-scale sensor data on your own infrastructure. From ingestion to retrieval, everything your team needs in one place.
Filter by physical sensor values across your entire dataset: acceleration spikes, GPS drops, joint overloads, you name it. Mosaico returns exact timestamp windows, not files to scrub through manually.
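Conceptually, turning a per-sample condition into timestamp windows means collapsing runs of consecutive matching samples into `(start, end)` pairs. The sketch below illustrates that idea in plain Python. It is not Mosaico's server-side implementation: `matching_windows` is a hypothetical helper, and it assumes samples arrive as `(timestamp_ns, value)` pairs already sorted by time.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def matching_windows(
    samples: Iterable[Tuple[int, float]],
    predicate: Callable[[float], bool],
) -> List[Tuple[int, int]]:
    """Collapse consecutive matching samples into (start_ns, end_ns) windows.

    Hypothetical helper. `samples` must be sorted by timestamp. A window opens
    at the first matching sample and closes at the last matching sample
    before a non-match (or at the end of the data).
    """
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end: Optional[int] = None
    for ts, value in samples:
        if predicate(value):
            if start is None:
                start = ts  # a new window opens here
            end = ts  # extend the current window
        elif start is not None:
            windows.append((start, end))
            start = end = None
    if start is not None:
        windows.append((start, end))  # close a window still open at the end
    return windows


# Example: acceleration spikes above 5.0 on the x axis.
accel_x = [(0, 0.1), (10, 6.2), (20, 7.0), (30, 0.3), (40, 5.5)]
print(matching_windows(accel_x, lambda v: v > 5.0))  # [(10, 20), (40, 40)]
```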
ROS bags, MCAP, custom sensors: Mosaico ingests them all through a single structured interface. Schema translation is automatic. One integration, any source, any project.
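Automatic schema translation amounts to mapping each source's layout onto one common record type. Below is a simplified sketch with two hypothetical adapters: one for CSV rows and one for a ROS 2-style IMU message that has already been decoded into a dict. `ImuSample`, `ADAPTERS`, and `ingest` are illustrative names for this sketch, not Mosaico's API, and the CSV column names are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ImuSample:
    """Hypothetical common record that every source is translated into."""
    timestamp_ns: int
    ax: float
    ay: float
    az: float


def from_csv_row(row: Dict[str, str]) -> ImuSample:
    # CSV sources deliver strings; the timestamp column is assumed to be in ns.
    return ImuSample(
        int(row["t_ns"]), float(row["ax"]), float(row["ay"]), float(row["az"])
    )


def from_ros2_imu(msg: Dict[str, Any]) -> ImuSample:
    # ROS 2 splits time into sec + nanosec and nests the acceleration vector.
    stamp = msg["header"]["stamp"]
    acc = msg["linear_acceleration"]
    return ImuSample(
        stamp["sec"] * 1_000_000_000 + stamp["nanosec"], acc["x"], acc["y"], acc["z"]
    )


ADAPTERS: Dict[str, Callable[[Any], ImuSample]] = {
    "csv": from_csv_row,
    "ros2": from_ros2_imu,
}


def ingest(source_kind: str, record: Any) -> ImuSample:
    """Single entry point: pick the adapter for the source, return one schema."""
    return ADAPTERS[source_kind](record)


a = ingest("csv", {"t_ns": "1000000005", "ax": "0.1", "ay": "0.0", "az": "9.81"})
b = ingest("ros2", {"header": {"stamp": {"sec": 1, "nanosec": 5}},
                    "linear_acceleration": {"x": 0.1, "y": 0.0, "z": 9.81}})
print(a == b)  # True
```

Adding a new source in this design means registering one more adapter; downstream code only ever sees `ImuSample`.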
Mosaico's Python SDK covers the full data lifecycle without SQL, schemas, or boilerplate. Three primitives: push, find, stream.
from mosaicolabs import MosaicoClient, IMU, SessionLevelErrorPolicy

with MosaicoClient.connect("localhost", 6726) as client:
    # Open a new sequence; on failure, the Delete policy removes it cleanly
    with client.sequence_create(
        sequence_name="imu_session_01",
        metadata={"source": "sensor_rig_v3"},
        on_error=SessionLevelErrorPolicy.Delete,
    ) as swriter:
        # One topic per sensor stream, typed by a built-in ontology model
        imu_writer = swriter.topic_create("sensors/imu", ontology_type=IMU)
        for msg in stream_imu("imu.csv"):  # user-supplied generator
            imu_writer.push(message=msg)
Built-in models for IMU, GPS, Image, and Pressure data. Extend them with your own in a few lines of Python.
Generator-based I/O from CSV, MCAP, or ROS bags lets you ingest datasets larger than RAM without ever loading a full file into memory.
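The `stream_imu("imu.csv")` call in the ingestion example is left to the user. A minimal sketch of such a generator is shown below, assuming a CSV with hypothetical columns `timestamp_ns,ax,ay,az`. It yields plain tuples rather than Mosaico message objects, since building those depends on SDK details not covered here; `stream_imu_rows` is an illustrative name.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple

ImuRow = Tuple[int, float, float, float]  # (timestamp_ns, ax, ay, az)


def stream_imu_rows(f: TextIO) -> Iterator[ImuRow]:
    """Lazily parse an IMU CSV; only the current row is held in memory.

    Column names are hypothetical; adapt them to your own export format.
    """
    for row in csv.DictReader(f):
        yield int(row["timestamp_ns"]), float(row["ax"]), float(row["ay"]), float(row["az"])


# In-memory stand-in for a real file. With a real file you would write
# `with open("imu.csv", newline="") as f:` and pass `f` instead.
sample = io.StringIO("timestamp_ns,ax,ay,az\n0,0.1,0.0,9.81\n10,6.2,0.0,9.80\n")
for ts, ax, ay, az in stream_imu_rows(sample):
    print(ts, ax)
```

Because the function is a generator, a multi-gigabyte file is consumed one row at a time instead of being read whole.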
Session and topic-level error policies define what happens on failure. Delete cleanly or retain partial data for recovery.
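To make the difference between deleting and retaining concrete, here is a toy model of session-level behavior backed by an in-memory dict. `ErrorPolicy`, `STORE`, and `session` are hypothetical names for illustration only; the real SDK exposes its own policy types (such as `SessionLevelErrorPolicy.Delete` in the example above), and this sketch does not model topic-level policies.

```python
from contextlib import contextmanager
from enum import Enum
from typing import Dict, Iterator, List


class ErrorPolicy(Enum):
    DELETE = "delete"  # discard everything the session wrote
    RETAIN = "retain"  # keep data written before the failure


# Hypothetical in-memory stand-in for the platform's storage.
STORE: Dict[str, List[object]] = {}


@contextmanager
def session(name: str, on_error: ErrorPolicy) -> Iterator[List[object]]:
    """Open a toy session whose fate on failure is decided by `on_error`."""
    buffer: List[object] = []
    STORE[name] = buffer
    try:
        yield buffer
    except Exception:
        if on_error is ErrorPolicy.DELETE:
            del STORE[name]  # clean delete: no partial session survives
        raise  # either way, the caller still sees the original error


for policy in (ErrorPolicy.DELETE, ErrorPolicy.RETAIN):
    try:
        with session(f"run_{policy.value}", policy) as s:
            s.append("sample_0")
            raise RuntimeError("sensor dropped out")
    except RuntimeError:
        pass

print(STORE)  # {'run_retain': ['sample_0']}
```

Re-raising after cleanup is deliberate: the policy decides what happens to the data, not whether the caller learns about the failure.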
from mosaicolabs import MosaicoClient, QuerySequence

with MosaicoClient.connect("localhost", 6726) as client:
    # Combine name and metadata filters into a single server-side query
    results = client.query(
        QuerySequence()
        .with_name_match("test_drive")
        .with_user_metadata("project.name", eq="Apollo")
        .with_user_metadata("environment.visibility", lt=50)
    )
    if results:
        for item in results:
            # Each result lists the matching sequence and its topics
            print(item.sequence.name, [t.name for t in item.topics])
Filter across session metadata, topic names, and physical sensor values in one atomic server-side request.
Query with dot notation on your own models. No SQL, no query strings: just plain Python expressions like IMU.Q.acceleration.x.gt(5.0).
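One common way to build this kind of dot-notation API in Python is `__getattr__`, which records each attribute access as a path segment until a comparison method turns the path into a condition. The sketch below shows the idea; `FieldPath`, `Condition`, and the toy `IMU.Q` root are hypothetical and not Mosaico's implementation.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Condition:
    path: str   # e.g. "acceleration.x"
    op: str     # e.g. "gt"
    value: Any


class FieldPath:
    """Accumulates attribute names into a dotted path, then emits a Condition."""

    def __init__(self, parts: Tuple[str, ...] = ()) -> None:
        self._parts = parts

    def __getattr__(self, name: str) -> "FieldPath":
        # Only called for attributes not found normally; refuse private/dunder
        # names so tools probing for e.g. __deepcopy__ are not misled.
        if name.startswith("_"):
            raise AttributeError(name)
        return FieldPath(self._parts + (name,))

    def _cond(self, op: str, value: Any) -> Condition:
        return Condition(".".join(self._parts), op, value)

    def gt(self, value: Any) -> Condition:
        return self._cond("gt", value)

    def lt(self, value: Any) -> Condition:
        return self._cond("lt", value)

    def eq(self, value: Any) -> Condition:
        return self._cond("eq", value)


class IMU:
    Q = FieldPath()  # hypothetical query root, mirroring IMU.Q in the text


print(IMU.Q.acceleration.x.gt(5.0))
# Condition(path='acceleration.x', op='gt', value=5.0)
```

A production version would validate each segment against the model's declared fields, so a typo fails immediately instead of producing a path that matches nothing.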
Every match returns exact start and end timestamps, ready to feed directly into a data streamer for replay.
from mosaicolabs import MosaicoClient

with MosaicoClient.connect("localhost", 6726) as client:
    handler = client.sequence_handler("mission_alpha")
    if handler:
        try:
            # Merged stream over three topics, limited to a time window (ns)
            stream = handler.get_data_streamer(
                topics=["/sensors/gps", "/sensors/imu", "/sensors/pressure"],
                start_timestamp_ns=1738508778000000000,
                end_timestamp_ns=1738509618000000000,
            )
            print(f"Replay starts at: {stream.next_timestamp()}")
            for topic, msg in stream:
                print(f"[{topic}] {msg.timestamp_ns}: {type(msg.data).__name__}")
        finally:
            # Release the handler even if iteration fails
            handler.close()
K-way merge sort delivers messages from multiple sensors in exact chronological order, regardless of their sampling rates.
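In Python, the K-way merge idea can be sketched with the standard library's `heapq.merge`, which lazily combines inputs that are each already sorted. Messages here are hypothetical `(timestamp_ns, topic, payload)` tuples; this illustrates the technique, not Mosaico's internal streamer.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Message = Tuple[int, str, object]  # (timestamp_ns, topic, payload)


def merge_topics(*streams: Iterable[Message]) -> Iterator[Message]:
    """K-way merge of per-topic streams, each already sorted by timestamp.

    heapq.merge keeps one pending item per input on its heap, so memory grows
    with the number of topics, not with the number of messages.
    """
    return heapq.merge(*streams, key=lambda m: m[0])


# GPS at a low rate, IMU at a higher rate: the merge interleaves them by time.
gps = [(0, "/sensors/gps", "fix0"), (100, "/sensors/gps", "fix1")]
imu = [(10, "/sensors/imu", "a"), (20, "/sensors/imu", "b"), (30, "/sensors/imu", "c")]
print([ts for ts, _, _ in merge_topics(gps, imu)])  # [0, 10, 20, 30, 100]
```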
Set precise start and end timestamps to pull only the exact data window you need, skipping anything that falls outside the range.
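Selecting a window out of a time-sorted series comes down to one binary search per bound. A minimal sketch with the standard `bisect` module, assuming sorted nanosecond timestamps and inclusive bounds on both ends; Mosaico's exact boundary semantics are not specified on this page, and `window` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def window(timestamps_ns: Sequence[int], start_ns: int, end_ns: int) -> range:
    """Return the index range of samples with start_ns <= t <= end_ns.

    Assumes `timestamps_ns` is sorted ascending. Both bounds are inclusive
    here, which is an assumption of this sketch.
    """
    lo = bisect_left(timestamps_ns, start_ns)   # first index with t >= start
    hi = bisect_right(timestamps_ns, end_ns)    # first index with t > end
    return range(lo, hi)


ts = [100, 200, 300, 400, 500]
print([ts[i] for i in window(ts, 200, 400)])  # [200, 300, 400]
```

Both searches are O(log n), so everything outside the range is skipped without being read.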
Processed batches are discarded and replaced as you iterate. Stream terabytes without exhausting local memory.
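Bounded memory follows from pulling data in batches and keeping only the current one alive. The sketch below assumes a hypothetical `fetch_batch(i)` callback that returns batch `i`, or `None` when the stream ends; the real streamer's transport and batch size are not described on this page.

```python
from typing import Callable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iterate_batches(fetch_batch: Callable[[int], Optional[List[T]]]) -> Iterator[T]:
    """Yield items one by one, holding only the current batch in memory.

    `fetch_batch(i)` is a hypothetical callback (e.g. a request to a server)
    returning batch i, or None once the stream is exhausted.
    """
    i = 0
    while True:
        batch = fetch_batch(i)
        if batch is None:
            return
        yield from batch
        # Rebinding `batch` on the next loop iteration drops the last
        # reference to the consumed list, so it can be garbage-collected.
        i += 1


fetched: List[int] = []


def fake_fetch(i: int) -> Optional[List[int]]:
    fetched.append(i)  # record which batches were actually requested
    return [i * 10 + k for k in range(3)] if i < 2 else None


items = iterate_batches(fake_fetch)
print(next(items), fetched)  # 0 [0]
```

Batches are requested only when the consumer needs them, so only the first batch has been fetched after the first item is read.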
Mosaico is built around two components: the SDK, which your application uses directly for ingestion, retrieval, and querying, and mosaicod, the daemon that sits between your code and your data layer, handling storage, indexing, state, and retrieval. Data moves between the two without serialization overhead or format conversion.
Your application → Mosaico SDK → mosaicod → Your data layer
Mosaico handles every stage of your robotics data lifecycle, so your team can focus on building robots, not data infrastructure.
Standardized representation for the most common robotics data models. Speak one language across all sensors.
Search complex scenarios using simple text queries. Find the needle in petabytes of sensor data.
True root-cause debugging with full traceability. Track every transformation from raw sensor to training set.
Coordinate data containers and third-party tools. Automate labeling, segmentation, and inference pipelines.
Build certifiable data pipelines against rigorous industry standards. Enterprise-grade compliance from day one.
Native support for ROS, ROS 2, and MCAP formats. Middleware-agnostic, with optimal data compression.
From open source to enterprise, we've got you covered.
AGPL licensed · Community support
Dedicated support, with custom development available.
Researchers from SISSA, the University of Pisa, and the University of Parma, with engineering experience at Ambarella and Magneti Marelli. At some point we all ended up debugging the same broken data pipelines, so we decided to fix them once and for all.
CEO & CO-FOUNDER
Ambarella · Magneti Marelli · SISSA
CSO & CO-FOUNDER
Ambarella · Magneti Marelli · Univ. Pisa
CTO & CO-FOUNDER
Ambarella · Magneti Marelli · Univ. Parma
We didn't think it was right to put a paywall on something this fundamental. Data management isn't a premium feature of robotics development; it's the foundation. It felt wrong to have teams lose months of work, or entire datasets, just because they couldn't justify a license fee at an early stage.
Join disruptive companies and robotics leaders.
Start building with Mosaico in minutes.
Open source · Deploy in < 5 min