One model per GPU #786
Unanswered · WaleedAlfaris asked this question in Q&A
Replies: 1 comment
- Wish I had multiple GPUs to test it out, but have you tried the main_gpu param?
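In case it is useful, here is a minimal, untested sketch of what that suggestion could look like with llama-cpp-python, assuming the Llama constructor's main_gpu and tensor_split parameters are available on your build (the model path and the 4-GPU layout are placeholders based on the question below):

```python
# Untested sketch: pin each model to a single GPU from one script.
# Assumes a llama-cpp-python build whose Llama() accepts main_gpu and
# tensor_split; the model path below is a placeholder.
from llama_cpp import Llama

MODEL_PATH = "./models/llama-2-7b.Q4_K_M.gguf"  # placeholder
NUM_GPUS = 4

models = []
for gpu in range(NUM_GPUS):
    # Give the full weight allocation to one device so this model is not
    # spread across GPUs; main_gpu keeps scratch/small tensors there too.
    split = [0.0] * NUM_GPUS
    split[gpu] = 1.0
    models.append(
        Llama(
            model_path=MODEL_PATH,
            n_gpu_layers=-1,   # offload every layer
            main_gpu=gpu,
            tensor_split=split,
        )
    )
```

Whether the KV cache and compute buffers also stay on the chosen device can depend on the build, so it is worth checking nvidia-smi after each model loads.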
-
Hello,
I have a system with 4 CUDA-enabled GPUs, each with 16 GB of VRAM. I have a single API which loads the models into a pool and uses a queue to process queries in first-in, first-out order. I am able to successfully run 4 llama2-7B models on this system. However, when I do this, the models are split across the 4 GPUs automatically. Is there any way to specify which models are loaded on which devices? I would like to load each model fully onto a single GPU, with model 1 fully on GPU 0, model 2 on GPU 1, and so on, without splitting a single model across multiple GPUs. Is this possible?
When looking online, I found the
export CUDA_VISIBLE_DEVICES=1
command, but since I am loading all the models in a single script, this would limit all of the models to the visible GPUs and would still allocate them automatically, unless there is a way to use the command in another way.
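For the CUDA_VISIBLE_DEVICES route specifically: the variable is read when the CUDA runtime initializes in a given process, so one parent script can still pin each model to its own card if every model lives in its own worker process. A rough sketch, assuming a multiprocessing worker per GPU and llama-cpp-python for loading (the queue layout and helper names are illustrative, not from the original post):

```python
# Rough sketch: one worker process per GPU, each seeing only its own device.
# CUDA_VISIBLE_DEVICES must be set before llama-cpp-python initializes CUDA,
# so it is set inside the child process and the import happens there too.
import multiprocessing as mp
import os

MODEL_PATH = "./models/llama-2-7b.Q4_K_M.gguf"  # placeholder
NUM_GPUS = 4

def worker(gpu_id, requests, results):
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # this process sees one GPU
    from llama_cpp import Llama  # import after the env var is set

    llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1)
    while True:
        prompt = requests.get()           # FIFO: first in, first out
        if prompt is None:                # sentinel to shut the worker down
            break
        results.put((gpu_id, llm(prompt, max_tokens=256)))

if __name__ == "__main__":
    mp.set_start_method("spawn")          # fresh interpreter per worker
    requests, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(i, requests, results))
             for i in range(NUM_GPUS)]
    for p in procs:
        p.start()
    requests.put("Q: What is 2 + 2? A:")  # example prompt
    print(results.get())                  # (gpu_id, completion dict)
    for _ in range(NUM_GPUS):
        requests.put(None)                # ask the workers to exit
    for p in procs:
        p.join()
```

With this layout the parent only pushes prompts onto the shared request queue; each model stays resident on its own card, and the CUDA_VISIBLE_DEVICES setting never leaks between workers because it is set per process before llama_cpp is imported.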