Domain Randomization
This chapter describes the domain randomization strategy used for Asimov locomotion.
1. Targeted, not broad
The randomization strategy is intentionally selective. The goal is not to randomize every quantity in the simulator. The goal is to randomize the quantities that are known to vary between simulation and hardware.
This chapter should therefore be read with the following principle in mind:
Randomize what is known to vary. Do not randomize what has already been measured with sufficient accuracy.
2. Quantities that are randomized
Representative randomized terms include:
| Parameter | Range | Reason |
|---|---|---|
| encoder zero offset (qpos0) | +/-0.02 rad | Calibration error |
| PD gains | x0.9 - x1.1 | Motor response variation |
| toe stiffness | 3.5 - 5.5 Nm/rad | Spring variation |
| foot friction | 1.0 - 1.5 | Surface variation |
| observation delay | 0-2 steps | CAN timing jitter |
| action delay | 0-1 steps | Command latency |
| push disturbance | velocity perturbations on the order of +/-0.5 m/s | External perturbations |
| reset base orientation | yaw +/-180°, pitch +/-0.15 rad, roll +/-0.1 rad | Initial orientation variation |
| joint velocity noise | +/-0.1 rad/s | Encoder velocity noise |
| IMU angular velocity noise | +/-0.01 rad/s | Gyro measurement noise |
These randomizations are tied directly to known sources of mismatch.
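As a rough illustration, the ranges in the table can be written down as data and sampled. The sketch below is not the Asimov implementation: the dictionary keys and function names are hypothetical, and it assumes uniform sampling, one draw per episode for the model parameters and delays, an independent encoder offset per joint, and independent per-step sensor noise. The source states only the ranges, not the distributions or the resampling schedule.

```python
import numpy as np

# Ranges copied from the table above. Key names are illustrative only.
EPISODE_RANGES = {
    "encoder_zero_offset_rad": (-0.02, 0.02),  # assumed: one offset per joint
    "pd_gain_scale": (0.9, 1.1),               # assumed: one shared multiplier
    "toe_stiffness_nm_per_rad": (3.5, 5.5),
    "foot_friction": (1.0, 1.5),
}
DELAY_RANGES_STEPS = {
    "observation_delay": (0, 2),  # inclusive bounds, in control steps
    "action_delay": (0, 1),
}
STEP_NOISE_AMPLITUDE = {
    "joint_velocity_rad_s": 0.1,
    "imu_angular_velocity_rad_s": 0.01,
}


def sample_episode_params(rng, num_joints):
    """Draw one set of randomized parameters for an episode (uniform, assumed)."""
    params = {}
    for name, (low, high) in EPISODE_RANGES.items():
        size = num_joints if name == "encoder_zero_offset_rad" else None
        params[name] = rng.uniform(low, high, size=size)
    for name, (low, high) in DELAY_RANGES_STEPS.items():
        # Generator.integers excludes the upper bound, hence high + 1.
        params[name] = int(rng.integers(low, high + 1))
    return params


def add_step_noise(rng, joint_velocity, imu_angular_velocity):
    """Add bounded uniform noise to one step of sensor readings."""
    a_vel = STEP_NOISE_AMPLITUDE["joint_velocity_rad_s"]
    a_imu = STEP_NOISE_AMPLITUDE["imu_angular_velocity_rad_s"]
    noisy_velocity = joint_velocity + rng.uniform(-a_vel, a_vel, joint_velocity.shape)
    noisy_imu = imu_angular_velocity + rng.uniform(-a_imu, a_imu, imu_angular_velocity.shape)
    return noisy_velocity, noisy_imu
```

A call such as `sample_episode_params(np.random.default_rng(0), num_joints=12)` returns a dictionary that a reset routine could apply to the simulator. The joint count of 12 is a placeholder, not a figure from this chapter.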
3. Quantities intentionally not randomized
Some quantities are intentionally left fixed during initial training.
| Parameter | Reason |
|---|---|
| body mass | Broad randomization reduced learning stability during initial walking |
| link lengths | CAD and URDF geometry were already close to hardware |
| gravity | Deployment environment does not vary meaningfully |
Keeping these quantities fixed prevents training from spending policy capacity on unlikely or unnecessary variability.
4. Delay randomization is not generic noise
Observation and action delay randomization is especially important in this stack. These delays are not abstract robustness noise; they reflect the real CAN polling structure and firmware timing described in Deep Dive: System Identification. The observation delay (0-2 steps) and action delay (0-1 steps) ranges in the table above correspond directly to the measured variation in those timing paths.
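One common way to model step-quantized delays like these is a short FIFO buffer on each signal path. The sketch below is illustrative only: `DelayLine` and `make_episode_delays` are hypothetical names, and it assumes each delay is drawn uniformly once per episode and then held constant, which the source does not specify.

```python
import random
from collections import deque


class DelayLine:
    """Returns the value pushed `delay` steps earlier (delay=0 passes through)."""

    def __init__(self, delay, initial):
        # Pre-fill so the first `delay` outputs are the initial value.
        self.buffer = deque([initial] * (delay + 1), maxlen=delay + 1)

    def step(self, value):
        self.buffer.append(value)  # the oldest entry is dropped automatically
        return self.buffer[0]


def make_episode_delays(rng, initial_observation, initial_action):
    """Sample per-episode delays using the ranges from the table (assumed uniform)."""
    observation_delay = rng.randint(0, 2)  # random.randint is inclusive on both ends
    action_delay = rng.randint(0, 1)
    return (
        DelayLine(observation_delay, initial_observation),
        DelayLine(action_delay, initial_action),
    )


# Usage: the policy sees a delayed observation, the actuators a delayed command.
obs_line, act_line = make_episode_delays(random.Random(0), 0.0, 0.0)
delayed_observation = obs_line.step(1.0)
delayed_action = act_line.step(0.5)
```

Keeping the two paths in separate buffers mirrors the table, where observation delay and action delay have different ranges and different physical causes.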
5. Contact-side randomization
Foot friction and contact-dependent terms are also randomized because walking quality depends strongly on floor condition and contact consistency.
These terms help the policy remain usable across:
- slightly different surfaces
- moderate contact-model mismatch
- unit-to-unit variation in toe and foot response
6. Randomization still depends on an accurate base model
Domain randomization is not a substitute for system identification.
The stack first requires:
- correct hardware mapping
- realistic actuator parameters
- stable contact geometry
- deployable observation design
Only after those are in place does targeted randomization improve robustness in a meaningful way.