Domain Randomization

This chapter describes the domain randomization strategy used for Asimov locomotion.

1. Targeted, not broad

The randomization strategy is intentionally selective. The goal is not to randomize every quantity in the simulator. The goal is to randomize the quantities that are known to vary between simulation and hardware.

This chapter should therefore be read with the following principle in mind:

Randomize what is known to vary. Do not randomize what has already been measured with sufficient accuracy.

2. Quantities that are randomized

Representative randomized terms include:

Parameter	Range	Reason
encoder zero offset (`qpos0`)	`+/-0.02 rad`	Calibration error
PD gains	`x0.9 - x1.1`	Motor response variation
toe stiffness	`3.5 - 5.5 Nm/rad`	Spring variation
foot friction	`1.0 - 1.5`	Surface variation
observation delay	`0-2` steps	CAN timing jitter
action delay	`0-1` steps	Command latency
push disturbance	on the order of `+/-0.5 m/s`	External perturbations
reset base orientation	yaw `+/-180°`, pitch `+/-0.15 rad`, roll `+/-0.1 rad`	Initial orientation variation
joint velocity noise	`+/-0.1 rad/s`	Encoder velocity noise
IMU angular velocity noise	`+/-0.01 rad/s`	Gyro measurement noise

These randomizations are tied directly to known sources of mismatch.
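As a rough illustration, the ranges in the table above could be encoded as a per-episode sampler plus a per-step noise function. This is a minimal sketch, not the actual Asimov training code. It assumes uniform sampling within each range (the chapter does not state the distribution), a 12-joint robot, and per-joint PD gain scaling. The names `EpisodeParams`, `sample_episode_params`, and `add_sensor_noise` are hypothetical. Push disturbances are left out because their application depends on the simulator.

```python
from dataclasses import dataclass

import numpy as np

NUM_JOINTS = 12  # assumption: the joint count is not given in this chapter


@dataclass
class EpisodeParams:
    """Quantities drawn once per episode (hypothetical container)."""
    qpos0_offset: np.ndarray   # rad, per-joint encoder zero offset
    pd_gain_scale: np.ndarray  # unitless multiplier on nominal PD gains
    toe_stiffness: float       # Nm/rad
    foot_friction: float       # friction coefficient
    obs_delay_steps: int       # control steps
    action_delay_steps: int    # control steps
    reset_rpy: np.ndarray      # rad, (roll, pitch, yaw) at reset


def sample_episode_params(rng: np.random.Generator) -> EpisodeParams:
    """Draw one parameter set; uniform sampling is an assumption."""
    return EpisodeParams(
        qpos0_offset=rng.uniform(-0.02, 0.02, size=NUM_JOINTS),
        pd_gain_scale=rng.uniform(0.9, 1.1, size=NUM_JOINTS),
        toe_stiffness=float(rng.uniform(3.5, 5.5)),
        foot_friction=float(rng.uniform(1.0, 1.5)),
        # integers() excludes the upper bound unless endpoint=True is set.
        obs_delay_steps=int(rng.integers(0, 2, endpoint=True)),
        action_delay_steps=int(rng.integers(0, 1, endpoint=True)),
        # Bounds ordered as (roll, pitch, yaw); yaw +/-180 deg is +/-pi rad.
        reset_rpy=rng.uniform([-0.1, -0.15, -np.pi], [0.1, 0.15, np.pi]),
    )


def add_sensor_noise(joint_vel, imu_gyro, rng):
    """Per-step additive noise on velocity observations (uniform, assumed)."""
    noisy_vel = joint_vel + rng.uniform(-0.1, 0.1, size=np.shape(joint_vel))
    noisy_gyro = imu_gyro + rng.uniform(-0.01, 0.01, size=np.shape(imu_gyro))
    return noisy_vel, noisy_gyro
```

Keeping every range in one place makes it straightforward to check the sampler against the table when a measurement is updated.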

3. Quantities intentionally not randomized

Some quantities are intentionally left fixed during initial training.

Parameter	Reason
body mass	broad randomization reduced learning stability during initial walking
link lengths	CAD and URDF geometry were already close to hardware
gravity	deployment environment does not vary meaningfully

Leaving these quantities fixed keeps the policy from spending capacity on variability that is unlikely or unnecessary in deployment.
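One way to keep this decision from drifting is a small guard that rejects a randomization config touching the fixed quantities. The sketch below is illustrative only. The key names (`body_mass`, `link_lengths`, `gravity`) and the function `check_randomization_config` are hypothetical and do not come from the Asimov codebase.

```python
# Hypothetical keys for the quantities left fixed during initial training.
FIXED_PARAMETERS = frozenset({"body_mass", "link_lengths", "gravity"})


def check_randomization_config(randomized: dict) -> None:
    """Raise if a parameter that should stay fixed is being randomized."""
    overlap = FIXED_PARAMETERS & set(randomized)
    if overlap:
        raise ValueError(
            f"not randomized during initial training: {sorted(overlap)}"
        )
```

Calling `check_randomization_config({"foot_friction": (1.0, 1.5)})` returns without error, while a config containing `body_mass` raises `ValueError`.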

4. Delay randomization is not generic noise

Observation and action delay randomization is especially important in this stack. These delays are not abstract robustness noise; they reflect the real CAN polling structure and firmware timing described in Deep Dive: System Identification. The randomization ranges in the table above (`0-2` steps for observations, `0-1` steps for actions) correspond directly to the measured variation in those timing paths.
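The delay mechanism can be sketched as a short history buffer that returns the value from a fixed number of control steps ago. This is a minimal sketch under stated assumptions: the delay is drawn once per episode (the chapter does not say whether it is resampled per step), the history is pre-filled with an initial value at reset, and `DelayBuffer` is a hypothetical helper, not part of the Asimov stack.

```python
from collections import deque

import numpy as np


class DelayBuffer:
    """Returns the value pushed `delay` steps ago (hypothetical helper)."""

    def __init__(self, max_delay: int):
        self.max_delay = max_delay
        self.delay = 0
        self._history = deque(maxlen=max_delay + 1)

    def reset(self, delay: int, initial_value) -> None:
        """Start a new episode with a fixed delay and a pre-filled history."""
        if not 0 <= delay <= self.max_delay:
            raise ValueError(f"delay must be in [0, {self.max_delay}]")
        self.delay = delay
        self._history.clear()
        for _ in range(self.max_delay + 1):
            self._history.append(np.array(initial_value, dtype=float))

    def step(self, value):
        """Push the newest value and return the one from `delay` steps ago."""
        self._history.append(np.array(value, dtype=float))
        # Index -1 is the newest entry, so -1 - delay is `delay` steps old.
        return self._history[-1 - self.delay]


rng = np.random.default_rng(0)
obs_buffer = DelayBuffer(max_delay=2)  # observation delay: 0-2 steps
act_buffer = DelayBuffer(max_delay=1)  # action delay: 0-1 steps
obs_buffer.reset(int(rng.integers(0, 2, endpoint=True)), np.zeros(3))
act_buffer.reset(int(rng.integers(0, 1, endpoint=True)), np.zeros(2))
```

In a training loop, the policy would read `obs_buffer.step(raw_obs)` instead of the raw observation, and the simulator would apply `act_buffer.step(policy_action)` instead of the newest action.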

5. Contact-side randomization

Foot friction and other contact-dependent terms, such as toe stiffness, are also randomized because walking quality depends strongly on floor condition and contact consistency.

These terms help the policy remain usable across:

slightly different surfaces

moderate contact-model mismatch

unit-to-unit variation in toe and foot response

6. Randomization still depends on an accurate base model

Domain randomization is not a substitute for system identification.

The stack first requires:

correct hardware mapping

realistic actuator parameters

stable contact geometry

deployable observation design

Only after those are in place does targeted randomization improve robustness in a meaningful way.