
How to pause document ingestion after processing a set number of Tokens using Kernel Memory ImportDocumentAsync? #967

Answered by dluc
mahadzk asked this question in 1. Q&A

KM automatically handles the tokens-per-minute (TPM) quota with Azure OpenAI and OpenAI, pausing requests as instructed by the headers of HTTP 429 responses.

When KM reaches the quota, the OpenAI service responds with HTTP status code 429 and a header saying something like "max quota reached, wait N seconds". As long as this header is returned, KM will honor it, so there is no need to pause ingestion manually after a set number of tokens.

Code here: https://github.com/microsoft/kernel-memory/blob/main/extensions/OpenAI/OpenAI/Internals/ClientSequentialRetryPolicy.cs
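To illustrate the general idea (this is not KM's code, which is C# and lives at the link above), here is a minimal Python sketch of a client that waits as instructed after a 429 and then retries. It assumes the wait time arrives in the standard HTTP `Retry-After` header expressed in seconds; the actual header names and parsing used by KM may differ. `send_with_retry`, `retry_delay_seconds`, and the `send` callable are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, headers, body) as returned by the hypothetical `send` callable
Response = Tuple[int, Mapping[str, str], str]


def retry_delay_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the wait requested by a Retry-After header given in seconds.

    Assumption: the header uses the delay-in-seconds form. The HTTP-date form
    is not handled in this sketch and falls back to `default`.
    """
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default
    return default


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it stops returning 429 or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts:
            # Honor the server's instruction before trying again.
            sleep(retry_delay_seconds(headers))
    # Still throttled after the last attempt: hand the 429 back to the caller.
    return status, headers, body
```

The `sleep` parameter is injectable only so the wait can be observed in a test without actually blocking; in normal use the default `time.sleep` applies.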

Answer selected by dluc