
fix: add utility capability for local_docker (#906) #907

Merged

Conversation

clumsy

@clumsy clumsy commented May 6, 2024

NVIDIA Docker images need libraries such as libnvidia-ml, which are provided by the utility driver capability.

TorchX currently requests only compute here.

This PR adds utility next to compute. Similar fixes here and here.

NOTE: nvidia-container-runtime has been superseded by nvidia-container-toolkit.
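
For readers who don't know how device capabilities are passed to Docker, here is a minimal sketch of the change described above. It is not the TorchX code itself. The helper `gpu_device_request` is hypothetical. The dict keys `Count` and `Capabilities` follow the shape of a device request in the Docker Engine API, where `Capabilities` is a list of lists and every capability in an inner list must be satisfied.

```python
from typing import Any, Dict, List


def gpu_device_request(gpu_count: int, capabilities: List[str]) -> Dict[str, Any]:
    """Build a dict shaped like a Docker Engine API device request.

    Hypothetical helper for illustration only; TorchX builds its request
    through the Docker Python SDK rather than a raw dict.
    """
    return {
        "Count": gpu_count,
        # One inner list: all listed capabilities are required together.
        "Capabilities": [list(capabilities)],
    }


# Before this PR: only the compute capability is requested.
before = gpu_device_request(2, ["compute"])

# After this PR: utility is requested as well, which is the capability
# that makes libraries such as libnvidia-ml available in the container.
after = gpu_device_request(2, ["compute", "utility"])

print(before["Capabilities"])
print(after["Capabilities"])
```

With the Docker Python SDK, the same fields would typically be passed as `docker.types.DeviceRequest(count=..., capabilities=[[...]])`.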

Test plan:
☑ updated unit test

☑ verified works with compute,utility

☑ verified works with default device capabilities too

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 6, 2024
@facebook-github-bot

@andywag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@andywag andywag left a comment

lgtm

@clumsy

clumsy commented May 7, 2024

Thanks for checking, @andywag. Do you happen to know why we still need device_request capabilities for local_docker?

In #906 I mentioned an alternative: delete this section entirely (we don't have it for aws_batch, for example). I decided to fix it in a somewhat backward-compatible way here, since I don't know why this code is there.

@facebook-github-bot facebook-github-bot merged commit ec8d0e8 into pytorch:main May 7, 2024
22 of 23 checks passed
@clumsy clumsy deleted the fix/local_docker_utility_capability branch May 8, 2024 14:06
4 participants