The trust layer for robots that ship

From demo to deployment: the loop that actually closes.

Bench evaluates, Data cleans, Synth augments. Ship faster, fail less with tooling your team can inspect and extend.

Explore the product → · View on GitHub →

Code, docs, and early SDK work live on GitHub.

Example Bench report

run_8c91a4 · kitchen_handover · real Franka

Three subtasks pass. Handover still fails. → Find bottlenecks before hardware ever moves.

open_drawer: 98%
pick_mug: 91%
pour: 82%
handover: 60%

Verdict

Ship drawer + pick + pour.

Replay handover failures before sign-off.

MIT

Source available

Dataset formats

Policy types

12+

Embodiments targeted

The gap

A demo proves a robot can. Deployment needs proof it will — across seeds, across embodiments, on the hardware that ships.

The loop

One loop. Four building blocks.

Data, Synth, Bench, and API work together from failed rollout to verified policy.

OpenBot Data

Teleop curation

Clean episodes, mine failures, build the next training set.

Explore Data

OpenBot Synth

Data synthesis

Generate hard cases from the failures Bench finds.

Explore Synth

OpenBot Bench

Policy evaluation

Acceptance metrics across seeds, subtasks, and embodiments.

Explore Bench

OpenBot API

Programmatic access

Wire the loop into your runner, CI, or agent.

Explore API

Example report

A report built for sign-off.

Task success, failure point, and next action in one view.

openbot bench · kitchen_handover

example · mock data

run_8c91a4 · policy: openvla-7b · embodiment: franka_panda · 200 rollouts × 10 seeds

kitchen_handover · open_drawer → pick_mug → pour → handover

Conditional pass

Task success: 73% (+8 pp)

Sim→Real gap closed: −29 pp (+12 pp)

Intervention rate: 14% (−6 pp)

Mean time-to-success: 18.4 s (−2.1 s)

Subtask success

200 rollouts, real Franka

open_drawer: 98%
pick_mug: 91%
pour: 82%
handover: 60%

Success across 10 seeds

73% ± 5.2

Worst: seed 3 · 65%
Best: seed 2 · 80%
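As an illustration of how a figure like 73% ± 5.2 could be produced, here is a minimal sketch that aggregates per-seed success rates into a mean, a spread, and the worst and best seeds. The per-seed values are invented for the example (only the worst and best seeds mirror the mock report), and the sketch assumes a sample standard deviation; the page does not say which form Bench reports.

```python
from statistics import mean, stdev

# Hypothetical per-seed success rates (seed index -> fraction of successful rollouts).
# Only seed 3 (worst) and seed 2 (best) mirror the mock report; the rest are made up.
per_seed = {
    0: 0.72, 1: 0.75, 2: 0.80, 3: 0.65, 4: 0.74,
    5: 0.70, 6: 0.78, 7: 0.73, 8: 0.71, 9: 0.72,
}

def summarize(rates):
    """Return the mean, sample standard deviation, and worst/best seeds."""
    values = list(rates.values())
    worst = min(rates, key=rates.get)
    best = max(rates, key=rates.get)
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (assumption)
        "worst": (worst, rates[worst]),
        "best": (best, rates[best]),
    }

summary = summarize(per_seed)
print(f"{summary['mean']:.0%} ± {summary['std'] * 100:.1f} pp")
print("worst seed:", summary["worst"], "best seed:", summary["best"])
```

Reporting the spread and the worst seed alongside the mean makes it harder for one lucky seed to hide an unstable policy.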

Ship for drawer + pick + pour. Replay failed handovers in Synth, then re-run Bench.
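The verdict above can be read as a simple per-subtask gate. A minimal sketch, assuming a hypothetical 80% acceptance threshold (the report does not state the bar Bench actually applies) and a hypothetical `gate` helper that is not part of the SDK:

```python
# Subtask success rates taken from the mock report.
subtask_success = {
    "open_drawer": 0.98,
    "pick_mug": 0.91,
    "pour": 0.82,
    "handover": 0.60,
}

ACCEPT_THRESHOLD = 0.80  # assumed for illustration; not a documented Bench default

def gate(rates, threshold=ACCEPT_THRESHOLD):
    """Split subtasks into those that clear the threshold and those to replay."""
    ship = [name for name, rate in rates.items() if rate >= threshold]
    replay = [name for name, rate in rates.items() if rate < threshold]
    if not replay:
        verdict = "pass"
    elif ship:
        verdict = "conditional pass"
    else:
        verdict = "fail"
    return verdict, ship, replay

verdict, ship, replay = gate(subtask_success)
print(verdict)           # overall outcome
print("ship:", ship)     # subtasks that clear the bar
print("replay:", replay) # subtasks to send back through Synth
```

Under that assumed threshold, the three passing subtasks ship and handover is the one queued for replay.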

Open source

Open core. Hosted scale.

The methods that decide readiness are public. Managed workflows are available for teams that need scale.

Explore GitHub · See the API

Auditable methods

Metrics and schemas stay public.

A shared standard

Use the same contracts in your own runner.

Hosted at scale

Move to managed rollouts when needed.

Real code

Actual SDK, not a mockup.

The openbot Python client is public in the repo today.

View the SDK source · Read the API spec

bench_eval.py

from openbot import Client

ob = Client()                       # reads OPENBOT_API_KEY

run = ob.bench.rollout(
    policy="openvla-7b",
    embodiment="franka_panda",
    task="open_drawer → pick_mug → pour → handover",
    rollouts=200,
    seeds=10,
)

result = run.wait()                 # poll until the run finishes
print(result.task_success)          # e.g. 0.73
print(result.subtask["handover"])   # e.g. 0.60  ← bottleneck
print(result.sim_to_real_gap)       # e.g. -0.29
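The sample describes `run.wait()` as polling until the run finishes. How the SDK does this is not shown on this page; a generic polling loop with capped backoff and a timeout might look like the sketch below, where `wait_for`, `fetch_status`, and the status strings are hypothetical stand-ins rather than SDK names.

```python
import time

def wait_for(fetch_status, timeout_s=600.0, interval_s=1.0,
             max_interval_s=30.0, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):  # assumed terminal states
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s} s")
        sleep(interval_s)
        # Exponential backoff, capped so long runs are still checked regularly.
        interval_s = min(interval_s * 2, max_interval_s)

# Usage with a fake status source that finishes on the third poll.
_states = iter(["queued", "running", "succeeded"])
final = wait_for(lambda: next(_states), sleep=lambda seconds: None)
print(final)
```

Injecting `sleep` keeps the loop testable in CI without real delays.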

Open source

Build your verification loop today.

Clone the core, read the spec, run the examples.

Explore GitHub→

Early access

Bring OpenBot into your robot loop.

We're onboarding teams that need evaluation, data, and synthesis wired into real deployment workflows.

Request early access→