-
Hello community,
I'm working on a project where we process large documents using the Microsoft.KernelMemory library. Is there a way to pause file processing once a certain number of tokens has been processed?
Current Approach:
I call ImportDocumentAsync() to ingest the document, assuming it would process it in chunks internally:
What I Need:
Questions:
Replies: 2 comments
-
Kernel Memory (KM) automatically handles the tokens-per-minute (TPM) quota with Azure OpenAI and OpenAI, pausing requests as instructed by the HTTP 429 response headers. When KM reaches the quota, the OpenAI service returns a response with HTTP status code 429 and a header saying something like "max quota reached, wait N seconds". As long as this header is returned, KM will honor it.
-
Thanks @dluc! Will try this approach.
The retry policy code is here: https://github.com/microsoft/kernel-memory/blob/main/extensions/OpenAI/OpenAI/Internals/ClientSequentialRetryPolicy.cs
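
For readers who want to see the general pattern, here is a minimal, language-neutral sketch in Python of "wait as long as the 429 response says, then retry". It is not KM's code (the real implementation is the C# class linked above). The names `send_with_retry` and `parse_retry_after` are hypothetical, and the sketch assumes the wait time arrives as a number of seconds in a `Retry-After` header, which may differ from the headers the service actually sends.

```python
import time
from typing import Callable, Mapping, Tuple

# A response is modeled as (status code, headers); this shape is an assumption.
Response = Tuple[int, Mapping[str, str]]


def parse_retry_after(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the wait time in seconds requested by the server.

    Only the numeric form of Retry-After is handled here; a missing header
    or any other form (such as an HTTP date) falls back to `default`.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return max(0.0, float(value))
            except ValueError:
                return default
    return default


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, pausing and retrying while the server answers 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break  # success or an error unrelated to the quota
        sleep(parse_retry_after(headers))  # honor the server's instruction
        response = send()
    return response
```

The `sleep` parameter is injectable only so the pause can be observed in tests without actually waiting. If every attempt returns 429, the last 429 response is returned to the caller rather than raising.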