Release 22/inconsistent device tensor action in trainers [Don't Merge] #6225
base: develop
This reverts commit d67dc94.
…gies#6145) * Update PerformancProject and DevProject. * Removed mac perf tests.
…ts into release/3.0.0
Aurimas Petrovas seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
❗Cycode: Security vulnerabilities found in newly introduced dependency.
Ecosystem | PyPI |
Dependency | grpcio |
Dependency Paths | grpcio 1.48.2 |
Direct Dependency | Yes |
The following vulnerabilities were introduced:
GHSA | CVE | Severity | Fixed Version |
---|---|---|---|
GHSA-496j-2rq6-j6cc | CVE-2023-33953 | HIGH | 1.53.2 |
GHSA-6628-q6j9-w8vg | CVE-2023-1428 | HIGH | 1.53.0 |
GHSA-cfgp-2977-2fmm | CVE-2023-32731 | HIGH | 1.53.0 |
Highest fixed version: 1.53.2
Description
Detects when new vulnerabilities affect your dependencies.
Tell us how you wish to proceed using one of the following commands:
Tag | Short Description |
---|---|
#cycode_vulnerable_package_fix_this_violation | Fix this violation via a commit to this branch |
#cycode_ignore_manifest_here <reason> | Applies to this manifest in this request only |
Proposed change(s)
Hi,
While working with mlagents-learn and running my environment with --torch-device cuda, I encountered multiple runtime errors such as:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
After investigating the root cause, I found that several tensor operations in the codebase implicitly rely on default CPU-based tensors. These issues are often introduced when:
Creating tensors directly with torch.tensor() without specifying a device.
Converting NumPy-based parameters or variables without specifying a device (e.g., torch.tensor(numpy_value) instead of torch.from_numpy(numpy_value).to(device)).
Combining GPU-based tensors with CPU-based masks or constants in a single computation.
This is especially problematic in scenarios where ML-Agents interacts with compute shaders or when training with Unity in GPU mode, as values originating from RAM (via NumPy) default to CPU memory and cause device mismatches in PyTorch computations.
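The mismatch described above can be reproduced in isolation. The snippet below is a minimal sketch, not code taken from ML-Agents: the names `numpy_value`, `gpu_like`, `cpu_mask`, and `mask` are illustrative, and the device falls back to CPU when CUDA is unavailable so the example still runs (in that case no error is raised).

```python
import numpy as np
import torch

# Use CUDA when available; fall back to CPU so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Illustrative data: a value that originates in host memory via NumPy.
numpy_value = np.array([1.0, 0.0, 1.0], dtype=np.float32)
gpu_like = torch.ones(3, device=device)

# Implicit placement: without a `device` argument the tensor lands on the
# default device, which is the CPU unless the default has been changed.
cpu_mask = torch.tensor(numpy_value)

# Explicit placement: both forms put the data on the chosen device.
mask = torch.from_numpy(numpy_value).to(device)
mask_alt = torch.as_tensor(numpy_value, device=device)

# Mixing devices raises RuntimeError on CUDA; on a CPU-only machine both
# operands are on the CPU, so the multiplication succeeds.
try:
    _ = gpu_like * cpu_mask
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# With explicit placement the operation works on either device.
result = gpu_like * mask
```

On a CUDA machine `mismatch_raised` should end up `True`, which corresponds to the "Expected all tensors to be on the same device" error quoted earlier.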
What I did:
To address this, I:
Identified common points where tensors were created without explicit device assignment.
Ensured that all relevant tensors (especially masks, constants, and externally created inputs) are moved to the correct device using .to(device) based on the context tensor.
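The pattern of placing a tensor "based on the context tensor" might look roughly like the sketch below. Both `to_device_of` and `example_masked_mean` are hypothetical helpers written for illustration; they are not functions from the ML-Agents codebase and do not represent the exact changes in this PR.

```python
import numpy as np
import torch


def to_device_of(value, context: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return `value` as a tensor on `context`'s device.

    Accepts NumPy arrays, Python sequences, or existing tensors.
    """
    return torch.as_tensor(value, device=context.device)


def example_masked_mean(values: torch.Tensor, mask) -> torch.Tensor:
    """Illustrative masked mean that tolerates a mask created on the CPU."""
    # Align the mask with the device and dtype of the values it is applied to.
    mask = to_device_of(mask, values).to(values.dtype)
    # Clamp the denominator so an all-zero mask does not divide by zero.
    return (values * mask).sum() / mask.sum().clamp(min=1.0)


# Use CUDA when available; fall back to CPU so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
values = torch.tensor([2.0, 4.0, 6.0], device=device)
cpu_mask = np.array([1.0, 0.0, 1.0], dtype=np.float32)  # created in host memory

mean = example_masked_mean(values, cpu_mask)
```

Deriving the device from the tensor already in use, rather than from a global setting, keeps a function correct regardless of which device the caller trains on.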
This should make training on CUDA more stable and prevent errors due to cross-device tensor operations.
Thanks for the great work on ML-Agents!
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
None
Types of change(s)
Checklist
Other comments
Error and Success Run Log

