Concepts
Mental models for building on the Menlo Platform — architecture layers, robots, simulator, agent, and sessions.
The mental model you need before writing code. For exact message shapes and field lists, see the Wire format reference.
Architecture layers
System 2, 1, and 0 — the three timing contracts of robot software.
Robots
Virtual vs physical. Same API surface, different hardware.
Simulator
Uranus — a browser-based MuJoCo digital twin.
Agent
The voice-driven layer that interprets intent and moves the robot.
Sessions & channels
How the browser, cloud, and robot stay connected in real time.
Architecture layers
Robot software is organized into three layers with different timing contracts. System 2 can take seconds to reason; System 0 must respond in microseconds. A well-designed robot moves work down the stack only as fast as the timing budget allows.
| Layer | Timing | Responsibility |
|---|---|---|
| System 2 | 100 ms – seconds | Goal reasoning, task planning, LLM queries |
| System 1 | 10 – 100 ms | Skill execution, behavior trees, reactive policies |
| System 0 | < 1 ms | Motor drivers, sensor reads, real-time control loop |
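The table above can be read as a placement rule: work belongs in the slowest layer whose budget still covers it. A minimal sketch of that rule, assuming the budgets from the table (the class and function names here are illustrative, not platform API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    budget_s: float  # worst-case time a task in this layer may take

SYSTEM_2 = Layer("System 2", budget_s=5.0)    # goal reasoning, LLM queries
SYSTEM_1 = Layer("System 1", budget_s=0.1)    # skill execution, reactive policies
SYSTEM_0 = Layer("System 0", budget_s=0.001)  # real-time control loop

def assign_layer(task_duration_s: float) -> Layer:
    """Place work in the fastest layer whose budget still covers it."""
    for layer in (SYSTEM_0, SYSTEM_1, SYSTEM_2):
        if task_duration_s <= layer.budget_s:
            return layer
    raise ValueError("task too slow even for System 2")
```

A motor-drive step (~0.5 ms) lands in System 0, a reactive policy tick (~50 ms) in System 1, and an LLM query (seconds) in System 2.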
Robots
A robot is the core resource in the platform. Every robot — virtual or physical — exposes the same API surface, so code written against one deploys to the other without changes.
| Type | Description |
|---|---|
| virtual | Simulated robot running in the browser via the Uranus simulator. No hardware required. Create one instantly from the Platform UI. |
| physical | An Asimov humanoid robot. Controlled today via the Asimov API. Platform UI integration is coming soon. |
Physical robots additionally expose firmware modes (damp, stand, move) controlled through the Asimov API. Those modes aren't surfaced in the Platform UI today.
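The three firmware modes named above can be modeled as a closed set on the client side. This is a hypothetical sketch: the mode names come from the text, but the parsing helper is illustrative and not part of the real Asimov API:

```python
from enum import Enum

class FirmwareMode(Enum):
    DAMP = "damp"    # joints compliant, safe resting state
    STAND = "stand"  # balance controller active
    MOVE = "move"    # accepts locomotion commands

def parse_mode(value: str) -> FirmwareMode:
    """Map a wire string onto a known mode, rejecting unknown values."""
    return FirmwareMode(value)
```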
Simulator
The Uranus simulator is a browser-based MuJoCo digital twin of the Asimov robot.
- Runs inside the Platform UI — no local install required.
- Uses actuator models measured from real hardware, not idealized physics.
- Streams telemetry over the same protobuf wire format as physical robots.
- Provides FPV and third-person camera views, streamed as WebRTC video tracks.
Agents validated in the simulator deploy directly to physical hardware without code changes.
Agent
Every session launches with an agent — a voice-driven layer that sits above the direct motion API. Hold Shift in the cockpit and speak a command; the agent transcribes your voice, reasons about the intent, issues the matching robot command, and replies verbally.
The pipeline is three stages: STT (speech-to-text) → LLM (intent + tool use) → TTS (text-to-speech). The LLM has tools for semantic commands and can see the robot's camera feed, so it can answer questions about what the robot is looking at and act on natural language instead of button presses.
The agent is dispatched automatically when you start a session — no separate connection needed. See the Manual control guide for how to use it.
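The three-stage pipeline can be sketched as plain function composition. Every function body below is a stand-in for illustration only; the stage order (STT → LLM → TTS) is from the text, nothing else is:

```python
def transcribe(audio: bytes) -> str:
    """STT stage (stub): audio in, transcript out."""
    return "wave your right hand"

def plan(transcript: str) -> dict:
    """LLM stage (stub): transcript in, tool call out."""
    return {"tool": "semantic_command", "args": {"action": transcript}}

def speak(reply: str) -> bytes:
    """TTS stage (stub): text in, audio out."""
    return reply.encode("utf-8")

def handle_utterance(audio: bytes) -> tuple[dict, bytes]:
    """One push-to-talk turn: voice in, robot command plus spoken reply out."""
    transcript = transcribe(audio)
    command = plan(transcript)
    reply = speak(f"Okay, I will {transcript}.")
    return command, reply
```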
Sessions & channels
A session is a live, bidirectional WebRTC connection to a running robot. Under the hood it's a LiveKit room — the browser, the agent, and the robot all join the same room and exchange messages over a fixed set of channels.
The channels have different reliability guarantees tuned to what they carry:
- Commands go out on a reliable, ordered channel — you don't want a "stop" to get dropped.
- Telemetry comes back on a lossy channel at ~10 Hz — if a packet drops, the next one is 100 ms away anyway.
- System events (boot progress, errors, mode changes) flow on a separate reliable channel so you can react to state transitions without polling telemetry.
- Video and audio are standard WebRTC media tracks.
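The delivery policies above can be captured as a lookup table. The channel names here ("commands", "telemetry", "events") are placeholders; the real names are in the Wire format reference:

```python
# Per-channel delivery policy, as described in the list above.
CHANNEL_POLICY = {
    "commands":  {"reliable": True,  "ordered": True},   # a dropped "stop" is unacceptable
    "telemetry": {"reliable": False, "ordered": False},  # ~10 Hz; the next sample is 100 ms away
    "events":    {"reliable": True,  "ordered": True},   # boot progress, errors, mode changes
}

def delivery_options(channel: str) -> dict:
    """Look up how a message on this channel should be sent."""
    return CHANNEL_POLICY[channel]
```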
Start a session with `POST /v1/robots/{id}/session`. For the exact channel names, message shapes, and field lists, see the Wire format reference.
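A minimal sketch of building that request with the standard library, assuming bearer-token auth (see the Authentication page); the base URL and robot id here are placeholders:

```python
import urllib.request

def build_session_request(base_url: str, robot_id: str, token: str) -> urllib.request.Request:
    """Construct the POST /v1/robots/{id}/session call without sending it."""
    return urllib.request.Request(
        f"{base_url}/v1/robots/{robot_id}/session",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Pass the built request to `urllib.request.urlopen` to actually start the session; the response shape is documented in the Wire format reference.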
For API authentication, see API reference → Authentication.