Conversation

@SergiMuac commented Dec 12, 2025

Starting with CUDA 12.4, loading kernel modules while a graph is being captured is supported. However, this does not work on older CUDA versions that are still actively used on GPU clusters (such as 12.2).

This PR adds logic to check the CUDA driver version, which may differ from the toolkit version, so that the variable enabling graph creation is set correctly.

This fixes the following error:
Warp CUDA error 900: operation not permitted when stream is capturing (in function wp_cuda_load_module, /builds/omniverse/warp/warp/native/warp.cu:4389)

Tested on a PC with an RTX 5090 (CUDA 13.0) and on an H100 cluster (CUDA 12.2).

@kevinzakka (Collaborator)

Hi @SergiMuac, thanks for your contribution! I checked the Newton code and they don't do this kind of check either. I'd like to err on the side of simplicity and just warn the user that they need CUDA 12.4+ for graph capture support, something we do in fact currently document. Do you feel strongly about having this in the code?

@SergiMuac (Author)

Hi @kevinzakka,

Short answer: Yes!

I have identified a bug in sim.py. The condition

self.use_cuda_graph = self.wp_device.is_cuda and wp.is_mempool_enabled(self.wp_device)

is necessary but not sufficient. While it is true that this logic works for all CUDA 12.4+ versions, and it would be possible to simply state in the documentation that MJLab requires CUDA 12.4+, that does not seem to be the best approach.

Allow me to explain, to the best of my understanding, what is happening. CUDA is backward compatible across versions in the sense that kernel modules can be recompiled on demand with instructions compatible with older versions. However, the CUDA Runtime itself has version-dependent limitations, and certain features are simply unavailable in older runtimes. In the case of sim.py, I have identified two issues that cause the code to fail. First, when using an older CUDA Runtime, kernel modules are recompiled as described above, but the script enables graph capture before all modules are fully loaded, which leads to an error. This can be addressed by a simple warm-up of the modules, for example by calling step once at the beginning, as sketched below.
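For concreteness, the warm-up from the diff under review (quoted again in the inline comment further down) amounts to:

print("Warming up CUDA kernels...")
mjwarp.step(self.wp_model, self.wp_data)  # one eager step forces every kernel module to load
wp.synchronize()  # block until module loading and the step have finished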

The second issue concerns CUDA graph capture. This feature is only available starting from CUDA Runtime 12.4, but the requirement is not checked anywhere in the code. As a result, the code starts normally but fails when create_graph is called. This could be fixed by adding an additional condition that explicitly checks the driver version, for example:

# Minimum driver version for loading modules during graph capture (CUDA 12.4+).
# Comparing (major, minor) tuples avoids float pitfalls such as 12.10 < 12.4.
_MIN_DRIVER_FOR_CONDITIONAL_GRAPHS = (12, 4)

driver_ver = wp.context.runtime.driver_version  # (major, minor); may differ from the toolkit version
self.use_cuda_graph = (
    self.wp_device.is_cuda
    and wp.is_mempool_enabled(self.wp_device)
    and driver_ver >= _MIN_DRIVER_FOR_CONDITIONAL_GRAPHS
)

With these two changes, execution would be robust across any CUDA 12.x version. If, after this explanation, you still prefer not to introduce them, I strongly believe that at a minimum the relevant CUDA graph or module-loading exceptions should be caught, so that the failure is reported clearly, rather than surfacing as an opaque CUDA error, when a version prior to 12.4 is used.
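As a sketch of that fallback, assuming Warp's capture_begin/capture_end API and the attribute names used above (self.graph is hypothetical here):

try:
    wp.capture_begin()
    mjwarp.step(self.wp_model, self.wp_data)
    self.graph = wp.capture_end()  # returns the captured graph
except Exception as e:
    # Pre-12.4 drivers reject module loading while a stream is capturing
    # (CUDA error 900); fall back to eager stepping instead of crashing.
    print(f"CUDA graph capture unavailable, falling back to eager mode: {e}")
    self.use_cuda_graph = False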

After further investigation, I have simplified the checks and reduced the additional code to a minimum. I would be interested to hear your thoughts on this approach.

Comment on lines 142 to 145

if self.use_cuda_graph:
    print("Warming up CUDA kernels...")
    mjwarp.step(self.wp_model, self.wp_data)
    wp.synchronize()
Collaborator

Is this part necessary?

@SergiMuac (Author)

I cannot confirm that this step is unnecessary, as no dedicated setup is available to validate the scenario. In theory, if the CUDA driver accepts graph capture but does not support lazy module loading, preloading all modules before enabling graph capture would prevent a potential crash; however, it is unclear whether any released driver version actually exhibits this behavior.
Empirically, the CUDA versions that satisfy the initial driver check also appear to support lazy module loading, suggesting that this potential failure mode is already implicitly covered.
Tests on the available machines indicate that the code works correctly without the warm-up step, so it can be removed, with the understanding that a separate pull request can be opened in the future if needed. I will push the changes accordingly.

@SergiMuac (Author) commented Dec 22, 2025

It appears that the pipeline crashes when checking the CUDA version because GitHub runners do not have CUDA drivers installed. I'll add a try/except mechanism to handle this scenario gracefully.
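A rough sketch of that guard (the helper name is hypothetical; on a machine without a CUDA driver the driver_version query may fail or return None):

import warp as wp

_MIN_DRIVER_FOR_CONDITIONAL_GRAPHS = (12, 4)

def _driver_supports_graph_capture() -> bool:
    # Treat any failure to query the driver (e.g. on driverless CI runners)
    # as "graph capture not supported".
    try:
        driver_ver = wp.context.runtime.driver_version  # assumed (major, minor)
        return driver_ver is not None and tuple(driver_ver) >= _MIN_DRIVER_FOR_CONDITIONAL_GRAPHS
    except Exception:
        return False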

@kevinzakka (Collaborator)

Need to take a look at the failing tests

@kevinzakka (Collaborator)

Hi @SergiMuac! I've refactored the CUDA graph checking logic to make it cleaner and fix a scope bug. Could you cherry-pick this commit into your PR?

git fetch https://github.com/mujocolab/mjlab.git feat/support_cuda122
git cherry-pick 9baedc7

@kevinzakka (Collaborator) left a review comment

See last comment.
