Support accelerator directive for local executor #5850
base: master
Conversation
✅ Deploy Preview for nextflow-docs-staging canceled.
@bentsherman The feature implemented in this PR would really help us use all the local GPUs without having to schedule tasks on them manually. Do you know when this feature will be released? Or is it even planned?
Right now I just put it out so that people can try it out, so I encourage you to try it with a local build of this PR. In principle we do want to have this, just haven't decided whether it should be part of
b4b321e to 069653d
Hi @bentsherman or @pditommaso, I finally got some time to try this out. However, I was not able to compile Nextflow from source. I used the following steps:

The error I get is as below: I'm not really sure why I receive the 403 status code during the download. Do you have any ideas to fix this? I would really like to try out this feature on our local GPU machines.
Hi @bentsherman or @pditommaso, I'd be happy if you could have a look at this PR: |
07b6a01 to d1f1a8a
Hi @bentsherman or @pditommaso, could you please also have a look at this PR: It proposes a fix to respect gpuIDs set in
d1f1a8a to ec6a888
ec6a888 to b88d058
@thealanjason thank you again; you actually inspired me to improve the overall approach and make it more generic. I removed the

@pditommaso I think this PR is ready for serious consideration. Using
It's great that now NVIDIA, AMD, and HIP devices can be handled generically :)
b88d058 to 90d1422
90d1422 to 28bff07
Just commenting to say this would help on SO many cloud compute deployments.
@ECM893 do you typically use the local executor in the cloud for GPUs? If so, I'm curious what your process looks like.
Yes. |
I've added one suggestion. Please check that the meaning is retained. Otherwise the docs look good. Approved.
0df91bb to 1eb6b61
Hi @bentsherman and @pditommaso

The command I am using is

My observation is in the environment variable

I'm not sure why this happens, but maybe some more tests are needed.
@thealanjason thanks for testing. I added an end-to-end test based on your use case, but I wasn't able to replicate your issue. Can you add

When a pipeline runs multiple GPU-enabled tasks on the same node, each task will see all GPUs and will not try to coordinate which task should use which GPU.
NVIDIA provides the `CUDA_VISIBLE_DEVICES` environment variable to control which tasks can see which GPUs, and users generally have to manage this variable themselves. Some HPC schedulers can assign this variable automatically, or use cgroups to control GPU visibility at a lower level.

Nextflow should be able to manage this variable for the local executor, so that the user doesn't have to add complex pipeline logic to do the same. Running a GPU workload locally on a multi-GPU node is a common use case, so it is worth doing.
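For illustration, the kind of manual bookkeeping referred to above might look roughly like the following; the process name, input shape, and `run_inference.sh` are hypothetical, and the upstream channel logic that hands out `gpu_id` values is the part users currently have to write themselves.

```groovy
// Hypothetical sketch of the manual approach: the pipeline itself pairs each
// task with a GPU index and exports CUDA_VISIBLE_DEVICES in the task script.
process gpuTask {
    input:
    tuple val(sample), val(gpu_id)   // gpu_id must be assigned by upstream channel logic

    script:
    """
    export CUDA_VISIBLE_DEVICES=${gpu_id}
    run_inference.sh ${sample}
    """
}
```

Even then, nothing prevents two concurrent tasks from receiving the same index unless that channel logic is written carefully, which is exactly the coordination problem described above.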
See the docs in the PR for usage.
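As a rough, non-authoritative sketch of the intended usage (the PR docs are authoritative; the process and script names below are made up), a task requests GPUs with the existing `accelerator` directive and the local executor is expected to export a per-task `CUDA_VISIBLE_DEVICES`:

```groovy
// Sketch only: request one GPU per task. With this PR, the local executor is
// expected to pick a free device and set CUDA_VISIBLE_DEVICES for the task.
process gpuTask {
    accelerator 1

    script:
    """
    echo "Assigned GPUs: \$CUDA_VISIBLE_DEVICES"
    run_inference.sh
    """
}
```

Concurrent tasks should then see disjoint device lists, up to the number of GPUs available on the node.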
To use with containers, you might have to add `CUDA_VISIBLE_DEVICES` to `docker.envWhitelist`. I'm not sure whether `CUDA_VISIBLE_DEVICES` works with containers or if you have to set `NVIDIA_VISIBLE_DEVICES`. You may also need `--gpus` for the docker command in order to use the GPUs at all; that can be set in `docker.runOptions` (a minimal config sketch is shown below).

See also: #5570
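A minimal `nextflow.config` sketch of the container settings mentioned above; `docker.envWhitelist` and `docker.runOptions` are standard Nextflow options, but whether `--gpus all` is needed depends on your Docker/NVIDIA runtime setup:

```groovy
// nextflow.config (sketch): pass CUDA_VISIBLE_DEVICES through to the container
// and expose the GPUs to Docker if your setup requires it.
docker {
    enabled      = true
    envWhitelist = 'CUDA_VISIBLE_DEVICES'
    runOptions   = '--gpus all'
}
```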