Multiple GPUs #726
Hello! What do you mean by rest.py? Can you be more specific as to what your issue is? |
I am sorry, I mean the openmmtools script for replica exchange solute
tempering. It works very well, but since every replica is running on the
same GPU, it is slow. I was wondering if workstations with more than one
GPU could be used to distribute the replicas. It is easy to run OpenMM jobs
in such manner by specifying multiple device indices for the simulation
platform, but I couldn’t find a similar option within openmmtools. Hope
this makes my question more clear. Thank you very much, Istvan
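
For readers unfamiliar with the plain OpenMM option mentioned above, here is a minimal sketch of how device indices are usually passed to the CUDA platform. The helper names `cuda_platform_properties` and `make_simulation` are hypothetical, chosen for illustration; `DeviceIndex` is the OpenMM CUDA platform property. Note that this spreads a single simulation over several GPUs and is not an openmmtools replica-exchange setting.

```python
def cuda_platform_properties(device_indices):
    """Build the property dict telling OpenMM's CUDA platform which GPUs to use."""
    # OpenMM expects a comma-separated string of device indices, e.g. "0,1".
    return {"DeviceIndex": ",".join(str(i) for i in device_indices)}


def make_simulation(topology, system, integrator, device_indices):
    """Create a Simulation on the given GPUs (requires OpenMM with CUDA)."""
    # Imported lazily so the helper above works without OpenMM installed.
    from openmm import Platform
    from openmm.app import Simulation

    platform = Platform.getPlatformByName("CUDA")
    properties = cuda_platform_properties(device_indices)
    return Simulation(topology, system, integrator, platform, properties)
```

Calling `make_simulation(topology, system, integrator, [0, 1])` would request GPUs 0 and 1 for one simulation, assuming both are visible to the process.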
|
I posted this question first on the OpenMM cookbook GitHub but Peter
Eastman said that it belonged here.
|
I have the same question. I am able to run multiple solute tempering REMD simulations in parallel with mpirun according to this issue (#648), but I don't know how to distribute replicas among multiple GPUs so that they contribute to the same REMD simulation. |
Yes, that was exactly my point. When I run such a simulation using
openmmtools, the only option seems to be that all, say, 8 jobs run on the
same GPU and that makes it prohibitively slow. I have not received any
reply from the developers. (Plain OpenMM jobs can readily run on multiple
GPUs.)
|
Hi @gitkol, I think I figured it out. Do you have mpi4py installed correctly? In my case, without mpi4py installed, mpirun ran multiple copies of the same REMD simulation (each GPU ran a whole REMD, with multiple GPUs running at the same time), but with mpi4py installed, multiple GPUs contribute to a single REMD. Here is an example of job files that worked for me (if it's helpful). On my system, using 4 GPUs gave a 2x speedup over 1 GPU (not 4x). |
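
A quick way to check the mpi4py point above before launching a long run is a small diagnostic script. This is only a sketch: the function name `mpi_rank_and_size` and the file name `check_mpi.py` are hypothetical, and it uses nothing beyond the standard `mpi4py.MPI.COMM_WORLD` communicator.

```python
def mpi_rank_and_size():
    """Return (rank, size, mpi_available) for the current process."""
    try:
        from mpi4py import MPI
    except ImportError:
        # Without mpi4py every process sees itself as the only one, which is
        # consistent with the "each GPU runs a whole REMD" symptom described above.
        return 0, 1, False
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size(), True


if __name__ == "__main__":
    rank, size, available = mpi_rank_and_size()
    print(f"mpi4py available: {available}; rank {rank} of {size}")
```

Launched as `mpirun -np 4 python check_mpi.py`, a working setup should report four distinct ranks of size 4; four lines all reporting rank 0 of 1 would suggest the processes are not sharing one MPI communicator.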
@xiaowei-xie2 is correct, having There's always some part of the code that cannot be fully parallelized, for example when communicating between the different GPUs. It would be interesting to see if we can accomplish some profiling to check where the overhead is. Thanks! |
Hi @ijpulidos, thank you for the insight. Yes, I totally understand that using n GPUs won't necessarily give an n-fold speedup (sometimes not even any speedup), so I am actually satisfied with the current performance. But yes, it would be nice to see where the overhead is! I am also curious: does the current repo support parallelizing over multiple GPUs across multiple nodes? |
@xiaowei-xie2 It does support that, since everything is handled by the MPI environment. That also means it's highly dependent on the MPI setup of the system. Depending on the connectivity of your HPC system and the system being simulated it might make sense, or not, to do this. We should try to come up with an example on how to accomplish this that people can use and add it to the docs. |
Thanks for the comments here, @ijpulidos - following up on the above, is it possible to specify each replica to live on its own GPU somehow? It seems to me that that setup should provide the best performance, or is that what happens by default under the hood? |
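
One generic pattern for pinning each MPI process to its own GPU, independent of whatever openmmtools does internally, is to set `CUDA_VISIBLE_DEVICES` from the process's local rank before CUDA is initialized. The sketch below assumes Open MPI, which exports `OMPI_COMM_WORLD_LOCAL_RANK`; other launchers use different variables. The function names are hypothetical, and this is not confirmed in the thread as the library's own mechanism.

```python
import os


def device_for_rank(local_rank, n_gpus):
    """Round-robin assignment of a node-local rank to a GPU index."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return local_rank % n_gpus


def pin_process_to_gpu(n_gpus, environ=os.environ):
    """Restrict this process to one GPU; call before any CUDA context is created."""
    # Assumption: Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK for each process.
    local_rank = int(environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    device = device_for_rank(local_rank, n_gpus)
    environ["CUDA_VISIBLE_DEVICES"] = str(device)
    return device
```

With 4 GPUs per node and 4 processes per node, each process would see exactly one device. With more processes than GPUs, several processes share a device in round-robin order.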
Hi,
Can rest.py run on multiple GPUs?
Thanks,
Istvan