Locomotion Training

This collection documents the locomotion stack used to take Asimov from simulation to real-world walking. The current material centers on the legs-only platform, which has 12 actuated leg joints and 2 passive toe joints. The same design principles are intended to carry forward into the full-body controller.

The locomotion stack is organized around one core idea: successful transfer is determined less by policy novelty and more by whether the policy sees the same type, timing, and quality of data in simulation that it will see on hardware.

The chapters in this collection are organized as follows:

Understanding Your Simulation Environment defines the simulator assumptions, actuator interfaces, and timing model, and explains the limits of a physics-only view of sim2real.

Reinforcement Learning for Locomotion describes the policy formulation, actor and critic observations, and the control philosophy used for transfer.

Deep Dive: System Identification documents the hardware-to-simulation mapping, motor parameters, armature values, joint constraints, and other identified quantities that matter for stable behavior.

Simulation Training Environment covers action and observation interfaces, control rates, delays, filters, and environment structure.

Reward Design explains which rewards were kept, which ones were removed, and which ones were modified for Asimov hardware.

Domain Randomization describes the targeted randomization strategy used for sim2real transfer.

Policy Deployment documents the on-robot firmware loop, processor-in-the-loop validation, and the practical issues encountered during hardware bring-up.

This collection should be read together with the hardware chapters, especially the discussions of the parallel ankle mechanism and passive toes in Joint Design and Actuation.