How to control request size within a batch? #2030
Replies: 2 comments
I'm also curious about this, and about the relationship between `--max-batch-size` and `--max-concurrent-requests`.
`--max-batch-size` limits how many requests are grouped into a batch, but TGI adjusts batches dynamically under heavy load (e.g. due to long sequences or backpressure). This can cause the batch size to exceed your configured limit. How to fix it: reduce `--max-total-tokens` to limit the total tokens in a batch.
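For reference, here is a minimal sketch of a launch combining these flags. The model id and numeric values are placeholders, not a tuned configuration:

```shell
# Sketch only: <your-model-id> and the numbers below are placeholders.
# --max-batch-size caps how many requests are grouped per batch,
# --max-total-tokens caps input+output tokens for a single request,
# --max-concurrent-requests caps how many requests the server accepts
# at once before rejecting new ones.
text-generation-launcher \
  --model-id <your-model-id> \
  --max-batch-size 10 \
  --max-total-tokens 2048 \
  --max-concurrent-requests 128
```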
Hi all,
I've tried the `--max-batch-size` option, but it doesn't work as I expected; I thought it was supposed to limit `tgi_batch_current_size`. I'd like to control the queue size and how many requests go into each inference batch. Can someone please clarify this? These are my launch commands and version:

My situation: with `--max-batch-size 10`, `tgi_batch_current_size` goes over 10 after a short while.

Expected situation: with `--max-batch-size 10`, `tgi_batch_current_size` stays at 10, and the other 30 requests stay in `tgi_queue_size` until the current batch is done.

TLDR: `tgi_batch_current_size` doesn't align with `--max-batch-size`. Thanks in advance.
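One way to observe this is to poll the Prometheus metrics TGI serves on its `/metrics` endpoint while traffic is running. A sketch, assuming the server is reachable at `localhost:8080` (adjust host/port to your deployment):

```shell
# Poll the two gauges from the question once per second.
# localhost:8080 is an assumption; point this at your TGI server.
watch -n 1 "curl -s http://localhost:8080/metrics \
  | grep -E '^tgi_(batch_current_size|queue_size)'"
```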