update update_gradient_JTCJ_sparse #1225

Merged
thowell merged 1 commit into google-deepmind:main from
thowell:update_gradient_JTCJ_sparse
Mar 18, 2026

@thowell
Collaborator

@thowell thowell commented Mar 13, 2026

improve performance for elliptic friction cones with sparse constraints
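
The kernel itself is not shown in this conversation. As a rough illustration only of why a sparse Jacobian can help with a product of the form J^T C J (the form the kernel name suggests), here is a pure-Python sketch. The row layout (lists of `(column, value)` pairs), the function names, and the small matrices are all hypothetical and are not taken from mujoco_warp.

```python
def to_dense(rows, nv):
    """Expand sparse rows [(column, value), ...] into a dense dim x nv matrix."""
    J = [[0.0] * nv for _ in rows]
    for a, row in enumerate(rows):
        for col, val in row:
            J[a][col] = val
    return J


def jtcj_dense(J, C, nv):
    """Reference: H[i][j] = sum_ab J[a][i] * C[a][b] * J[b][j] over all columns."""
    dim = len(J)
    return [
        [
            sum(J[a][i] * C[a][b] * J[b][j] for a in range(dim) for b in range(dim))
            for j in range(nv)
        ]
        for i in range(nv)
    ]


def jtcj_sparse(rows, C, nv):
    """Same product, but only the stored nonzeros of each Jacobian row are visited."""
    H = [[0.0] * nv for _ in range(nv)]
    for a, row_a in enumerate(rows):
        for b, row_b in enumerate(rows):
            cab = C[a][b]
            if cab == 0.0:
                continue  # this pair of rows contributes nothing
            for i, va in row_a:
                for j, vb in row_b:
                    H[i][j] += va * cab * vb
    return H


# Hypothetical 3-row contact block acting on nv = 4 degrees of freedom.
ROWS = [[(0, 1.0), (2, -2.0)], [(2, 0.5)], [(1, 3.0), (3, 1.5)]]
C = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 3.0]]
NV = 4

H_sparse = jtcj_sparse(ROWS, C, NV)
H_dense = jtcj_dense(to_dense(ROWS, NV), C, NV)
```

The sparse version does work proportional to the number of nonzero pairs rather than nv squared, which is the general motivation for a sparse code path; whether and how the actual kernel exploits this is not stated here.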


benchmark: aloha pot

mjwarp-testspeed benchmarks/aloha_pot/scene.xml --nworld=8192 --nconmax=24 --njmax=128 --replay=lift_pot -o "opt.jacobian="sparse"" --event_trace

(note: there are ccd_iteration warnings)

main 02cde5b SPARSE_CONSTRAINT_JACOBIAN=False w/ bug fix from #1223

Summary for 8192 parallel rollouts

Total JIT time: 0.43 s
Total simulation time: 3.97 s
Total steps per second: 2,064,309
Total realtime factor: 4,128.62 x
Total time per step: 484.42 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 471.52
  forward: 457.94
    fwd_position: 228.40
      kinematics: 22.03
      com_pos: 9.03
      camlight: 1.90
      flex: 0.17
      tendon: 0.18
      crb: 7.42
      tendon_armature: 0.17
      collision: 170.07
        nxn_broadphase: 84.46
        convex_narrowphase: 83.15
        primitive_narrowphase: 1.55
      make_constraint: 14.16
      transmission: 1.37
    sensor_pos: 0.17
    fwd_velocity: 25.80
      com_vel: 7.08
      passive: 1.17
      rne: 9.18
      tendon_bias: 0.17
    sensor_vel: 0.17
    fwd_actuation: 1.45
    fwd_acceleration: 13.34
      xfrc_accumulate: 2.07
    solve: 182.97
      mul_m: 1.23
    sensor_acc: 3.83
  euler: 13.06
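
The summary fields reported for each run are related by simple arithmetic, which is handy for sanity-checking a run. The sketch below assumes a simulation timestep of 0.002 s; that value is not stated in this thread and is only inferred from the ratio of the reported realtime factor to steps per second. The function names are illustrative.

```python
TIMESTEP_S = 0.002  # assumption: inferred from realtime factor / steps per second


def time_per_step_ns(steps_per_second):
    """Average wall-clock time per simulated step, in nanoseconds."""
    return 1e9 / steps_per_second


def realtime_factor(steps_per_second, timestep_s=TIMESTEP_S):
    """Simulated seconds advanced per wall-clock second, summed over all worlds."""
    return steps_per_second * timestep_s


# Reported steps per second for the dense run on main.
dense_sps = 2_064_309
print(f"time per step: {time_per_step_ns(dense_sps):.2f} ns")
print(f"realtime factor: {realtime_factor(dense_sps):.2f} x")
```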

main 02cde5b SPARSE_CONSTRAINT_JACOBIAN=True w/ bug fix from #1223

Summary for 8192 parallel rollouts

Total JIT time: 0.44 s
Total simulation time: 12.92 s
Total steps per second: 633,914
Total realtime factor: 1,267.83 x
Total time per step: 1577.50 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 1565.73
  forward: 1552.46
    fwd_position: 224.02
      kinematics: 21.93
      com_pos: 9.15
      camlight: 1.92
      flex: 0.17
      tendon: 0.17
      crb: 7.33
      tendon_armature: 0.18
      collision: 169.03
        nxn_broadphase: 84.51
        convex_narrowphase: 82.09
        primitive_narrowphase: 1.54
      make_constraint: 10.97
      transmission: 1.28
    sensor_pos: 0.17
    fwd_velocity: 24.94
      com_vel: 6.57
      passive: 1.16
      rne: 8.87
      tendon_bias: 0.18
    sensor_vel: 0.17
    fwd_actuation: 1.45
    fwd_acceleration: 13.22
      xfrc_accumulate: 2.05
    solve: 1282.84
      mul_m: 1.21
    sensor_acc: 3.83
  euler: 12.75

this pr SPARSE_CONSTRAINT_JACOBIAN=True w/ bug fix from #1223 ctx.h.zero_()

Summary for 8192 parallel rollouts

Total JIT time: 0.44 s
Total simulation time: 4.07 s
Total steps per second: 2,012,126
Total realtime factor: 4,024.25 x
Total time per step: 496.99 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 483.68
  forward: 470.30
    fwd_position: 224.81
      kinematics: 22.00
      com_pos: 9.21
      camlight: 1.93
      flex: 0.17
      tendon: 0.18
      crb: 7.38
      tendon_armature: 0.17
      collision: 169.56
        nxn_broadphase: 84.56
        convex_narrowphase: 82.55
        primitive_narrowphase: 1.55
      make_constraint: 11.03
      transmission: 1.28
    sensor_pos: 0.17
    fwd_velocity: 25.08
      com_vel: 6.59
      passive: 1.17
      rne: 8.93
      tendon_bias: 0.17
    sensor_vel: 0.17
    fwd_actuation: 1.46
    fwd_acceleration: 13.31
      xfrc_accumulate: 2.06
    solve: 199.62
      mul_m: 1.22
    sensor_acc: 3.86
  euler: 12.85
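
To see where the time went, the indented event traces can be parsed and diffed. The sketch below is a minimal helper, not part of mjwarp-testspeed; it assumes two spaces of indentation per nesting level and uses a trimmed copy of the sparse traces above (values appear to be in the same units as time per step).

```python
def parse_trace(text):
    """Parse indented 'name: value' lines into {'step/forward/solve': value}."""
    out, stack = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // 2  # two spaces per level
        name, value = line.strip().rsplit(":", 1)
        stack = stack[:depth] + [name.strip()]
        out["/".join(stack)] = float(value)
    return out


# Trimmed copies of the two sparse traces reported above.
MAIN_SPARSE = """\
step: 1565.73
  forward: 1552.46
    fwd_position: 224.02
    solve: 1282.84
  euler: 12.75
"""
THIS_PR = """\
step: 483.68
  forward: 470.30
    fwd_position: 224.81
    solve: 199.62
  euler: 12.85
"""

before, after = parse_trace(MAIN_SPARSE), parse_trace(THIS_PR)
saved = {key: before[key] - after[key] for key in before if key in after}
for key, delta in sorted(saved.items(), key=lambda kv: -kv[1]):
    print(f"{key}: {delta:+.2f}")
```

On these numbers the reduction in `solve` accounts for essentially all of the reduction in `step`; the other events are unchanged to within noise.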

tl;dr
performance for the sparse constraint jacobian code path is improved
sps: 633,914 -> 2,012,126

the sparse code path is still not as performant as the dense constraint jacobian code path, but the gap has closed significantly
sps: 2,064,309 (dense) vs 2,012,126 (sparse, this pr)
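
For reference, the ratios behind these figures can be computed directly from the reported steps per second. The dictionary keys are just labels chosen for this snippet.

```python
# Steps per second as reported in the three benchmark summaries above.
SPS = {
    "dense (main)": 2_064_309,
    "sparse (main)": 633_914,
    "sparse (this pr)": 2_012_126,
}

speedup = SPS["sparse (this pr)"] / SPS["sparse (main)"]
gap = 1.0 - SPS["sparse (this pr)"] / SPS["dense (main)"]

print(f"sparse speedup from this pr: {speedup:.2f}x")
print(f"remaining gap to dense: {gap:.1%}")
```

This works out to roughly a 3.2x speedup for the sparse path, leaving it about 2.5% behind the dense path on this benchmark.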

@thowell
Collaborator Author

thowell commented Mar 13, 2026

@adenzler-nvidia @Kenny-Vilella any ideas for further improving performance for elliptic friction cones + sparse constraint jacobian?

@Kenny-Vilella
Collaborator

@thowell I will hopefully start to look at perf again this week. Will come back to you.

@thowell
Collaborator Author

thowell commented Mar 16, 2026

there was a bug in the sparse code path that is now fixed in #1223.

performance between the sparse and dense code paths is now significantly closer

@thowell thowell marked this pull request as ready for review March 16, 2026 18:54
Collaborator

@erikfrey erikfrey left a comment

Nice!

@Kenny-Vilella
Collaborator

Yeah you can probably merge that and we can continuously improve perf.
Let's improve one kernel at a time.

@thowell thowell merged commit 43ca0fb into google-deepmind:main Mar 18, 2026
26 checks passed