I have searched the existing questions and discussions, and this question has not already been answered.
I believe this is a legitimate question, not just a bug or feature request.
Your Question
Hi there,
First of all, thank you for sharing this amazing project! I’ve been going through the code, and I really appreciate the effort and thought put into it.
I had a question regarding how the project manages concurrent OpenAI API calls. From what I can see, multiple functions (e.g., extract_entities, kg_query, mix_kg_vector_query, etc.) use llm_model_func, which in turn calls openai_complete_if_cache. However, I didn’t notice any explicit mechanism limiting the number of concurrent requests to OpenAI.
Given that OpenAI enforces rate limits, I was wondering:
Is there an existing mechanism to control the number of concurrent API calls that I might have missed?
If not, was this an intentional design choice, or could this potentially lead to exceeding rate limits when multiple requests are sent simultaneously?
Would adding an async semaphore (e.g., asyncio.Semaphore, roughly as sketched after this question) be a recommended way to limit concurrent calls if needed?
I just wanted to clarify this before running it at scale. I really appreciate any insights you can provide. Thanks again for the great work!
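To make that last question concrete, here is a minimal sketch of what I had in mind, assuming a module-level asyncio.Semaphore wrapped around the completion call. The name limited_llm_call and the limit of 4 are purely illustrative, not anything from the project:

```python
import asyncio

# Illustrative only: cap the number of in-flight OpenAI requests.
# The limit of 4 is an arbitrary example value.
_llm_semaphore = asyncio.Semaphore(4)

async def limited_llm_call(llm_func, *args, **kwargs):
    # At most 4 coroutines proceed past this point at once; the rest
    # wait here instead of all hitting the OpenAI API simultaneously.
    async with _llm_semaphore:
        return await llm_func(*args, **kwargs)
```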
Additional Context
No response
Look at the parameter llm_model_max_async, which governs the maximum number of parallel requests to the Large Language Model (LLM). If the rate limit is reached, the system waits for a randomized delay and then retries.
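For anyone finding this later, here is a minimal sketch of that behaviour, assuming a semaphore sized by llm_model_max_async plus a randomized-delay retry on rate-limit errors. The helper below is illustrative only, not the project's actual implementation:

```python
import asyncio
import random

# Illustrative sketch, not the project's actual implementation.
# `max_async` plays the role of llm_model_max_async: the maximum
# number of LLM requests allowed to be in flight at the same time.
max_async = 4
_semaphore = asyncio.Semaphore(max_async)

async def call_llm_with_retry(llm_func, prompt, max_retries=5):
    async with _semaphore:  # bound the number of concurrent requests
        for attempt in range(max_retries):
            try:
                return await llm_func(prompt)
            except Exception:  # in practice, catch the provider's rate-limit error specifically
                if attempt == max_retries - 1:
                    raise
                # Randomized delay before retrying, so concurrent workers
                # do not all retry at the same instant.
                await asyncio.sleep(random.uniform(1.0, 2.0 ** (attempt + 1)))
```

Libraries such as tenacity provide the same retry-with-backoff pattern declaratively; the explicit loop here is only to show the shape of the randomized delay.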