How to pause document ingestion after processing a set number of Tokens using Kernel Memory ImportDocumentAsync? #967

mahadzk · 2025-01-07T03:17:45Z


Hello community,

I'm working on a project where we process large documents using the Microsoft.KernelMemory library. What I'm trying to figure out is if there is a way to pause file processing once a certain number of tokens has been processed.

Current Approach:
I’m using the KernelMemoryBuilder with WithCustomTextPartitioningOptions to handle the chunking automatically:

    .WithCustomTextPartitioningOptions(new TextPartitioningOptions
    {
        MaxTokensPerParagraph = _chunkingOptions.MaxTokensPerParagraph,
        MaxTokensPerLine = _chunkingOptions.MaxTokensPerLine,
        OverlappingTokens = _chunkingOptions.OverlappingTokens
    })

I call ImportDocumentAsync() to ingest the document, assuming it would process it in chunks internally:

await memory.ImportDocumentAsync(stream, fileName, indexName, documentId, tags);

What I Need:
I need the ingestion process to pause for 60 seconds once 100,000 tokens have been processed, and then continue. This is because of OpenAI's limit of 250,000 tokens per minute (TPM).

Questions:
1. Does ImportDocumentAsync() allow for mid-process control (e.g., tracking chunks in real time and pausing)?
2. If not, is there a recommended way to achieve this without manually splitting the document into smaller pieces beforehand?
3. Are there any built-in methods or extension points that provide visibility into the individual chunks created by WithCustomTextPartitioningOptions?

Context:
- I’m using version 0.62.240605.1 of Microsoft.KernelMemory.Core.
- The goal is to stay within OpenAI’s API rate limit of 250,000 tokens per minute.

Any guidance or recommendations would be highly appreciated. Thank you for your help!

Answered by dluc

Jan 8, 2025


dluc · 2025-01-08T19:59:38Z

Maintainer

KM automatically handles TPM quota with Azure OpenAI and OpenAI, pausing requests as instructed by HTTP 429 response headers.

When KM reaches the quota, the OpenAI service returns a response with HTTP status code 429 and a header saying something like "max quota reached, wait N seconds". As long as this header is returned, KM will honor it.

Code here: https://github.com/microsoft/kernel-memory/blob/main/extensions/OpenAI/OpenAI/Internals/ClientSequentialRetryPolicy.cs
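
To illustrate the mechanism described above, here is a minimal sketch in plain .NET of a handler that waits for the interval given in the Retry-After header of a 429 response and then retries. This is not the code in ClientSequentialRetryPolicy.cs; the names RetryAfterHandler and GetDelay, the retry count, and the fallback delay are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler, not part of Kernel Memory: retries a request when the
// service answers 429 and waits for the interval given in the Retry-After header.
public sealed class RetryAfterHandler : DelegatingHandler
{
    private readonly int _maxRetries;
    private readonly TimeSpan _fallbackDelay;

    // maxRetries and fallbackDelay are illustrative defaults, not KM settings.
    public RetryAfterHandler(HttpMessageHandler inner, int maxRetries = 5, TimeSpan? fallbackDelay = null)
        : base(inner)
    {
        _maxRetries = maxRetries;
        _fallbackDelay = fallbackDelay ?? TimeSpan.FromSeconds(1);
    }

    // Reads Retry-After as either a delay (delta) or an absolute HTTP date.
    // Returns the fallback when the header is missing.
    public static TimeSpan GetDelay(HttpResponseMessage response, TimeSpan fallback)
    {
        var retryAfter = response.Headers.RetryAfter;
        if (retryAfter?.Delta is TimeSpan delta)
        {
            return delta;
        }

        if (retryAfter?.Date is DateTimeOffset date)
        {
            TimeSpan wait = date - DateTimeOffset.UtcNow;
            return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
        }

        return fallback;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Assumes the request content is buffered (e.g. StringContent),
        // so the same request can be sent again.
        for (int attempt = 0; ; attempt++)
        {
            HttpResponseMessage response = await base.SendAsync(request, cancellationToken);
            if (response.StatusCode != HttpStatusCode.TooManyRequests || attempt >= _maxRetries)
            {
                return response;
            }

            TimeSpan delay = GetDelay(response, _fallbackDelay);
            response.Dispose();
            await Task.Delay(delay, cancellationToken);
        }
    }
}
```

A handler like this would be wrapped around a normal one, for example `new HttpClient(new RetryAfterHandler(new HttpClientHandler()))`. The point is that the wait time comes from the service's response rather than from counting tokens on the client, so no manual 60-second pause is needed.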


mahadzk · 2025-01-09T18:06:42Z

Author

Thanks @dluc! Will try this approach
