[Question]: How is OpenAI API Call Concurrency Managed? #1095

Open
zhangxingeng opened this issue Mar 16, 2025 · 1 comment
Labels: question (Further information is requested)

Comments

@zhangxingeng

Do you need to ask a question?

  • I have searched the existing questions and discussions, and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi there,

First of all, thank you for sharing this amazing project! I’ve been going through the code, and I really appreciate the effort and thought put into it.

I had a question about how the project manages concurrent OpenAI API calls. From what I can see, multiple functions (e.g., extract_entities, kg_query, mix_kg_vector_query) use llm_model_func, which in turn calls openai_complete_if_cache. However, I didn’t notice any explicit mechanism limiting the number of concurrent requests to OpenAI.

Given that OpenAI enforces rate limits, I was wondering:

  1. Is there an existing mechanism to control the number of concurrent API calls that I might have missed?
  2. If not, was this an intentional design choice, or could this potentially lead to exceeding rate limits when multiple requests are sent simultaneously?
  3. Would adding an async semaphore (e.g., asyncio.Semaphore) be a recommended way to limit concurrent calls if needed? (A rough sketch of what I have in mind is included below.)

I just wanted to clarify this before running it at scale. I really appreciate any insights you can provide. Thanks again for the great work!
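
For reference, the kind of semaphore-based limiting I have in mind looks roughly like the self-contained sketch below. It uses a fake LLM call and a placeholder concurrency limit, so it only illustrates the idea and is not a proposal for how this project should wire it in:

```python
import asyncio
import random

# Placeholder limit; a real value would depend on the account's rate tier.
MAX_CONCURRENT_LLM_CALLS = 8

async def fake_llm_call(prompt: str) -> str:
    """Stand-in for an OpenAI request; sleeps to simulate network latency."""
    await asyncio.sleep(random.uniform(0.1, 0.3))
    return f"answer to: {prompt}"

async def limited_llm_call(sem: asyncio.Semaphore, prompt: str) -> str:
    # At most MAX_CONCURRENT_LLM_CALLS coroutines run the body at once;
    # the rest wait here until a slot frees up.
    async with sem:
        return await fake_llm_call(prompt)

async def main() -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT_LLM_CALLS)
    prompts = [f"question {i}" for i in range(50)]
    # gather() schedules all 50 tasks, but the semaphore caps in-flight calls.
    answers = await asyncio.gather(*(limited_llm_call(sem, p) for p in prompts))
    print(len(answers))

if __name__ == "__main__":
    asyncio.run(main())
```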

Additional Context

No response

zhangxingeng added the question (Further information is requested) label on Mar 16, 2025
@danielaskdd
Collaborator

Look for the parameter llm_model_max_async, which governs the maximum number of parallel requests to the Large Language Model (LLM). When the rate limit is reached, the system applies a randomized delay and then retries the request.
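
Roughly speaking, the behaviour is equivalent to the simplified sketch below: a semaphore sized by llm_model_max_async caps parallel requests, and rate-limit errors trigger a randomized, growing delay before a retry. The model name, retry count, and function names here are placeholders for illustration, not the project's actual code:

```python
import asyncio
import random

import openai  # assumes the openai>=1.0 async client

llm_model_max_async = 4  # cap on parallel LLM requests (example value)

async def call_with_retry(
    client: openai.AsyncOpenAI,
    sem: asyncio.Semaphore,
    prompt: str,
    max_retries: int = 5,
) -> str:
    async with sem:  # enforce the parallel-request cap
        for attempt in range(max_retries):
            try:
                resp = await client.chat.completions.create(
                    model="gpt-4o-mini",  # example model name only
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp.choices[0].message.content
            except openai.RateLimitError:
                # Randomized, growing delay before retrying.
                await asyncio.sleep(random.uniform(1, 2) * (2 ** attempt))
        raise RuntimeError("rate limit retries exhausted")

async def main() -> None:
    client = openai.AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
    sem = asyncio.Semaphore(llm_model_max_async)
    answers = await asyncio.gather(
        *(call_with_retry(client, sem, f"question {i}") for i in range(10))
    )
    print(len(answers))

if __name__ == "__main__":
    asyncio.run(main())
```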
