Long Context Models - Possible to split the ctx memory across GPUs? #1639
Unanswered · Alumniminium asked this question in Q&A
Hey, what's the approach here? I just got a second RTX 3090 with 24 GB of VRAM so I could run a 7B model with 64k context, but I still get OOM errors.
What's the proper way to invoke llama? I tried a tensor split at a 1,1 ratio, but I have no idea if that's right. Something like the sketch below.
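
A minimal sketch of the kind of invocation I mean, assuming the `main` binary built with CUDA support; the model path and context size are placeholders, and `--tensor-split`, `--n-gpu-layers`, and `--ctx-size` are the flags as I understand them:

```sh
# Split tensors evenly across the two 3090s (1,1 ratio) and try to
# offload all layers; 65536 is the target context length for the 64k model.
./main \
  -m ./models/7b-64k-model.bin \
  --ctx-size 65536 \
  --n-gpu-layers 99 \
  --tensor-split 1,1 \
  -p "Hello"
```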