Hello!
Playground has become the natural successor to brax/envs: the physics is faster, the tooling is better, and the Brax README explicitly redirects users here:
"Instead of brax/envs, users should use MuJoCo Playground, all of which train well with brax/training."
As far as I can tell, though, that's not quite true yet. Playground has the DMC versions (WalkerWalk, HopperHop, ...), which are close in spirit to Brax's gym-style locomotion envs but are not the same tasks. They differ on reset, reward, and termination:
- Reset: DMC WalkerWalk resets pitch in [−π, π] and joints across their full range; Brax walker2d resets to init_q ± 5e-3.
- Reward: DMC uses a shaped stand·walk product; Brax uses forward·x_velocity + healthy − ctrl_cost·‖a‖².
- Termination: DMC doesn't terminate on falling; Brax terminates when torso_z ∉ [0.8, 2.0] or |angle| > 1.0.
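To make the reward and termination differences concrete, here is a minimal JAX sketch of the Brax-style walker2d step logic. The helper name and its arguments are hypothetical (not an existing Brax or Playground function), and the default weights are the walker2d defaults as I recall them from Brax's walker2d.py, so they should be checked against the source before relying on them.

```python
import jax.numpy as jnp


def brax_walker2d_reward_and_done(
    x_before,
    x_after,
    torso_z,
    torso_angle,
    action,
    dt,
    forward_reward_weight=1.0,  # assumed Brax default
    healthy_reward=1.0,  # assumed Brax default
    ctrl_cost_weight=1e-3,  # assumed Brax default
    healthy_z_range=(0.8, 2.0),
    healthy_angle_range=(-1.0, 1.0),
):
    """Hypothetical helper: Brax-style walker2d reward and termination flag."""
    # Forward velocity of the torso, estimated by finite difference over one step.
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity

    # Quadratic control cost on the raw action.
    ctrl_cost = ctrl_cost_weight * jnp.sum(jnp.square(jnp.asarray(action)))

    # Healthy iff torso height and pitch both lie in their closed ranges.
    torso_z = jnp.asarray(torso_z)
    torso_angle = jnp.asarray(torso_angle)
    min_z, max_z = healthy_z_range
    min_a, max_a = healthy_angle_range
    is_healthy = (
        (torso_z >= min_z)
        & (torso_z <= max_z)
        & (torso_angle >= min_a)
        & (torso_angle <= max_a)
    )

    # With termination enabled, the healthy bonus is paid on every step taken.
    reward = forward_reward + healthy_reward - ctrl_cost
    done = 1.0 - is_healthy.astype(jnp.float32)
    return reward, done
```

Contrast this with DMC's multiplicative stand·walk reward: here a fallen walker simply ends the episode instead of being scored down.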
I ran into this on a safety-RL project and ended up writing a local wrapper stack to get Brax-walker2d semantics on top of Playground's WalkerWalk. Not hard, but it really should live upstream so we all share one implementation. This would also give conversations like #265 (reproducing PPO baselines on humanoid walk) a single tested reference point.
What I'd propose
A subpackage (working name brax_reborn, comebrax, or whatever you prefer) with the five classic envs: Walker2d, Hopper, HalfCheetah, Ant, Humanoid.
Each port uses Brax's original MJCF (the MuJoCo XML robot description), Brax's reset (init_q perturbed by noise of magnitude reset_noise_scale), Brax's reward with its default weights, and Brax's healthy-range termination. Physics runs on MJX, with both the JAX and Warp backends available through Playground's existing impl config.
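The reset piece is small enough to sketch. This is a hypothetical helper (not a Brax or Playground API) that assumes walker2d-style uniform noise on both positions and velocities; if I remember correctly, some of the other Brax envs (e.g. Ant, HalfCheetah) draw velocities from a scaled normal instead, so each port should copy its own env's reset.

```python
import jax
import jax.numpy as jnp


def brax_style_reset(rng, init_q, qd_size, reset_noise_scale=5e-3):
    """Hypothetical helper: Brax walker2d-style reset of qpos and qvel."""
    rng_q, rng_qd = jax.random.split(rng)
    low, high = -reset_noise_scale, reset_noise_scale
    # Small uniform perturbation around the nominal pose from the MJCF.
    qpos = init_q + jax.random.uniform(rng_q, init_q.shape, minval=low, maxval=high)
    # Velocities start near zero, with the same uniform noise band.
    qvel = jax.random.uniform(rng_qd, (qd_size,), minval=low, maxval=high)
    return qpos, qvel
```

Compared with DMC's full-range joint randomization, this keeps every episode starting from an almost upright pose, which is why return and termination statistics differ so much between the two task families.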
Faithfulness tests
For each env, roll a random uniform policy for N episodes in Brax (positional backend) and in the port with matching seeds. Assert that episode return, episode length, and termination rate distributions match within a loose tolerance. MJX and Brax's positional pipeline won't agree to the last digit, but checking for gross mismatches catches the porting bugs worth catching. Also assert that the observation and action dimensions match Brax's.
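A rough sketch of the comparison step, assuming the per-episode returns, lengths, and termination flags have already been collected from both sides. The function names and tolerances are hypothetical placeholders; this compares summary statistics only, and a stricter variant could swap in a two-sample distribution test (e.g. Kolmogorov-Smirnov).

```python
import jax.numpy as jnp


def rollout_stats(returns, lengths, terminated):
    """Summarize a batch of random-policy episodes."""
    return {
        "mean_return": float(jnp.mean(returns)),
        "mean_length": float(jnp.mean(lengths)),
        "term_rate": float(jnp.mean(terminated.astype(jnp.float32))),
    }


def assert_faithful(brax_stats, port_stats, rtol=0.2, term_atol=0.1):
    """Hypothetical loose-tolerance check between Brax and the port."""
    for key in ("mean_return", "mean_length"):
        ref, got = brax_stats[key], port_stats[key]
        # Relative tolerance, floored at 1.0 so a near-zero reference is not too strict.
        assert abs(got - ref) <= rtol * max(abs(ref), 1.0), (key, ref, got)
    # Termination rate is a probability, so an absolute tolerance fits better.
    ref_t, got_t = brax_stats["term_rate"], port_stats["term_rate"]
    assert abs(got_t - ref_t) <= term_atol, ("term_rate", ref_t, got_t)
```

Loose tolerances are deliberate: the goal is to catch a wrong sign, a missing healthy bonus, or a mis-set termination range, not small physics discrepancies between MJX and the positional pipeline.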
If this is of interest
I'm happy to do Walker2d end-to-end as a first PR (I have a prototype locally from the safety-RL project), and follow up with the other four if the shape looks right. Before I start, a few questions:
- Is this something you'd want upstream, or are you leaning toward keeping it external?
- Preferred name and location for the subpackage?
- Reuse Brax's MJCFs verbatim, or re-author fresh ones? Verbatim is easier; re-authoring sidesteps any licensing question.
Cheers,
Yann