Replies: 1 comment
-
Yes, generation is the main bottleneck here: for each prompt, you need to generate several completions (8 by default). We're currently pushing hard to make generation much faster, see #2600.
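To make the scaling concrete, here is a rough back-of-the-envelope sketch of how generation work grows with the group size G. The function names, the prompt count of 4, and the 256-token completion cap are hypothetical values chosen for illustration, not taken from the library; only the default of 8 completions per prompt comes from the comment above.

```python
def completions_per_step(num_prompts: int, group_size: int = 8) -> int:
    """Completions sampled in one step: one group of size G per prompt."""
    return num_prompts * group_size


def max_generated_tokens_per_step(
    num_prompts: int,
    group_size: int = 8,
    max_completion_tokens: int = 256,  # hypothetical cap, for illustration only
) -> int:
    """Upper bound on tokens decoded in one step."""
    return completions_per_step(num_prompts, group_size) * max_completion_tokens


if __name__ == "__main__":
    # Compare a few group sizes for the same (hypothetical) 4 prompts.
    for g in (1, 4, 8):
        print(g, completions_per_step(4, g), max_generated_tokens_per_step(4, g))
```

Under these assumptions the decoding work grows linearly with G, so going from G=1 to G=8 means roughly eight times as many tokens to generate per step.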
-
It seems to be much slower on my side with G=8, though it uses less memory (which is expected).