Port the classic Brax/Gym locomotion tasks (walker2d, hopper, halfcheetah, ant, humanoid) to Playground physics #310

@YannBerthelot

Hello!

Playground has become the natural successor to brax/envs: the physics is faster, the tooling is better, and the Brax README explicitly redirects users here:

"Instead of brax/envs, users should use MuJoCo Playground, all of which train well with brax/training."

But, as far as I am aware, that's not quite true yet. Playground has the DMC versions (WalkerWalk, HopperHop, ...), which are close in spirit to Brax's gym-style locomotion envs but not the same task. They differ on reset, reward, and termination:

  • DMC WalkerWalk resets pitch in [−π, π] and joints across their full range; Brax walker2d resets to init_q plus uniform noise of ±5e-3.
  • DMC reward is a shaped stand·walk product; Brax is forward_reward_weight·x_velocity + healthy_reward − ctrl_cost_weight·‖a‖².
  • DMC doesn't terminate on falling; Brax terminates when torso_z ∉ [0.8, 2.0] or |angle| > 1.0.
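
To make the Brax side of that comparison concrete, here is a minimal sketch of the reward and termination rule as described in the bullets above. It is plain Python so it runs without dependencies; a real port would use jax.numpy. The `Walker2dConfig` container and the function names are hypothetical, and the three weight defaults are my assumption of Brax's defaults, to be checked against the Brax source before relying on them.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Walker2dConfig:
    """Hypothetical config. Ranges come from the bullets above; weights are assumed defaults."""
    forward_reward_weight: float = 1.0   # assumed default
    healthy_reward: float = 1.0          # assumed default
    ctrl_cost_weight: float = 1e-3       # assumed default
    healthy_z_range: Tuple[float, float] = (0.8, 2.0)
    healthy_angle_limit: float = 1.0


def is_healthy(torso_z: float, torso_angle: float, cfg: Walker2dConfig) -> bool:
    """Healthy means torso height inside the range and |angle| within the limit."""
    z_lo, z_hi = cfg.healthy_z_range
    return z_lo <= torso_z <= z_hi and abs(torso_angle) <= cfg.healthy_angle_limit


def gym_style_signal(
    x_velocity: float,
    torso_z: float,
    torso_angle: float,
    action: Sequence[float],
    cfg: Walker2dConfig = Walker2dConfig(),
) -> Tuple[float, bool]:
    """Return (reward, done) for one step under the gym-style rule."""
    ctrl_cost = cfg.ctrl_cost_weight * sum(a * a for a in action)
    reward = cfg.forward_reward_weight * x_velocity + cfg.healthy_reward - ctrl_cost
    done = not is_healthy(torso_z, torso_angle, cfg)
    return reward, done
```

The contrast with DMC is the `done` flag: a DMC-style task would keep stepping after a fall, whereas this rule ends the episode as soon as the torso leaves the healthy range.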

I ran into this on a safety-RL project and ended up writing a local wrapper stack to get Brax-walker2d semantics on top of Playground's WalkerWalk. Not hard, but it really should live upstream so we all share one implementation. This would also give conversations like #265 (reproducing PPO baselines on humanoid walk) a single tested reference point.

What I'd propose

A subpackage (working name brax_reborn, comebrax, or whatever you prefer) with the five classic envs: Walker2d, Hopper, HalfCheetah, Ant, Humanoid.

Each port uses Brax's original MJCF (the MuJoCo XML robot description), Brax's init_q + reset_noise_scale reset, Brax reward with default weights, and Brax healthy-range termination. Physics is MJX, with both JAX and Warp backends via Playground's existing impl config.
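
As a rough illustration of the reset described above, the sketch below perturbs each coordinate of init_q by independent uniform noise in [−reset_noise_scale, +reset_noise_scale]. It uses Python's standard `random` module purely to stay dependency-free; an actual port would draw the noise with `jax.random`. The function name `gym_style_reset` and the example `init_q` values are hypothetical, and joint velocities are left out for brevity.

```python
import random
from typing import List, Optional, Sequence


def gym_style_reset(
    init_q: Sequence[float],
    reset_noise_scale: float = 5e-3,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return init_q with independent uniform noise added to every coordinate."""
    if rng is None:
        rng = random.Random()
    return [
        q + rng.uniform(-reset_noise_scale, reset_noise_scale)
        for q in init_q
    ]


# Example: a made-up three-coordinate initial pose, seeded for reproducibility.
start_q = gym_style_reset([0.0, 1.25, 0.0], rng=random.Random(0))
```

Compared with DMC's full-range randomization, every episode here starts within 5e-3 of the nominal pose, which is why early termination on falling is informative rather than immediate.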

Faithfulness tests

For each env, roll out a uniform-random policy for N episodes in Brax (positional backend) and in the port with matching seeds. Assert that the distributions of episode return, episode length, and termination rate match within a loose tolerance. MJX and Brax's positional pipeline won't agree to the digit, but a gross mismatch catches the porting bugs worth catching. Also assert that the observation and action dimensions match Brax.
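
The comparison step could look roughly like the sketch below, which takes already-collected episode statistics from both sides and reports which summary metrics disagree. Everything here is hypothetical: the `EpisodeStats` container, the function names, and in particular the tolerance values, which are placeholders rather than proposed thresholds. It compares means only; a fuller test might also compare spread or run a two-sample test.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Sequence


@dataclass(frozen=True)
class EpisodeStats:
    """Per-episode results from N rollouts of one environment."""
    returns: Sequence[float]
    lengths: Sequence[int]
    terminated: Sequence[bool]  # True if the episode ended by termination


def relative_gap(a: float, b: float, eps: float = 1e-8) -> float:
    """Symmetric relative difference between two scalars."""
    return abs(a - b) / max(abs(a), abs(b), eps)


def termination_rate(stats: EpisodeStats) -> float:
    return sum(1 for t in stats.terminated if t) / len(stats.terminated)


def faithfulness_failures(
    ref: EpisodeStats,
    port: EpisodeStats,
    rtol: float = 0.2,        # placeholder tolerance
    rate_atol: float = 0.1,   # placeholder tolerance
) -> List[str]:
    """Return the names of metrics whose gap exceeds the tolerance."""
    failures = []
    if relative_gap(mean(ref.returns), mean(port.returns)) > rtol:
        failures.append("mean_return")
    if relative_gap(mean(ref.lengths), mean(port.lengths)) > rtol:
        failures.append("mean_length")
    if abs(termination_rate(ref) - termination_rate(port)) > rate_atol:
        failures.append("termination_rate")
    return failures
```

A test would then assert that `faithfulness_failures(brax_stats, port_stats)` is empty, with the returned names making it clear which metric drifted when it is not.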

If this is of interest

I'm happy to do Walker2d end-to-end as a first PR (I have a prototype locally from the safety-RL project), and follow up with the other four if the shape looks right. Before I start, a few questions:

  1. Is this something you'd want upstream, or are you leaning toward keeping it external?
  2. Preferred name and location for the subpackage?
  3. Reuse Brax's MJCFs verbatim, or re-author fresh ones? Verbatim is easier; re-authoring sidesteps any licensing question.

Cheers,

Yann
