Python: Handling Rate Limits and Potential Code Interpreter Limitations in Azure Assistant Agent #10287
Comments
Hi @anu43, we allow one to provide overrides for the polling behavior via `RunPollingOptions`:

```python
@experimental_class
class RunPollingOptions(KernelBaseModel):
    """Configuration and defaults associated with polling behavior for Assistant API requests."""

    default_polling_interval: timedelta = Field(default=timedelta(milliseconds=250))
    default_polling_backoff: timedelta = Field(default=timedelta(seconds=1))
    default_polling_backoff_threshold: int = Field(default=2)
    default_message_synchronization_delay: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_interval: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_backoff: timedelta = Field(default=timedelta(seconds=1))
    run_polling_backoff_threshold: int = Field(default=2)
    message_synchronization_delay: timedelta = Field(default=timedelta(milliseconds=250))
    run_polling_timeout: timedelta = Field(default=timedelta(minutes=1))  # New timeout attribute
```

See the class definition here. You could do something like:

```python
from datetime import timedelta

from semantic_kernel.agents.open_ai.run_polling_options import RunPollingOptions

polling_options = RunPollingOptions(run_polling_interval=timedelta(seconds=5))  # or something based on your RPM

# Create the agent configuration
agent = await AzureAssistantAgent.create(
    kernel=kernel,
    service_id=service_id,
    name=AGENT_NAME,
    instructions=AGENT_INSTRUCTIONS,
    ...,
    polling_options=polling_options,
)
```

The attributes you'll want to pay attention to are the `run_polling_*` ones.
We use these based on:

```python
def get_polling_interval(self, iteration_count: int) -> timedelta:
    """Get the polling interval for the given iteration count."""
    return (
        self.run_polling_backoff
        if iteration_count > self.run_polling_backoff_threshold
        else self.run_polling_interval
    )
```

Additionally, in your AI Foundry Portal, you can adjust the RPM/TPM for your model deployment. Could you have a look at whether you can increase your RPM?
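For illustration, here's how those two settings interact, as a standalone sketch of the same logic using the default values (not SDK code, just the arithmetic):

```python
# Standalone illustration of the backoff logic above, using the
# RunPollingOptions defaults shown earlier.
from datetime import timedelta

run_polling_interval = timedelta(milliseconds=250)
run_polling_backoff = timedelta(seconds=1)
run_polling_backoff_threshold = 2


def get_polling_interval(iteration_count: int) -> timedelta:
    # The first few polls use the fast interval; after the threshold we back off.
    return (
        run_polling_backoff
        if iteration_count > run_polling_backoff_threshold
        else run_polling_interval
    )


for i in range(5):
    print(i, get_polling_interval(i))
# 0 0:00:00.250000
# 1 0:00:00.250000
# 2 0:00:00.250000
# 3 0:00:01
# 4 0:00:01
```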
I should add: yes, we can do better at handling rate limits for the caller -- a feature we should explore in the future. But hopefully my suggestion above can help mitigate your current 429s.
Hi @moonbox3, thanks for the detailed explanation. I believe it worked. I noticed that when responding to some conversations, the model took a bit of time. I added additional parameters based on my discussions with GPT. Since I'm not a software expert, I found it challenging to understand what happens behind the scenes.

```python
polling_options = RunPollingOptions(
    run_polling_interval=timedelta(seconds=5),
    run_polling_backoff=timedelta(seconds=30),
    run_polling_backoff_threshold=2,
    run_polling_timeout=timedelta(minutes=5),
)
```

Would you mind if I asked a few more questions? I think it would help me understand the concept better.

Additionally, I have a question unrelated to my initial issue, but I still want to understand my limitations.

Thanks in advance!
Hi @anu43, based on your current settings, you will be polling OpenAI's server every 5 seconds for the result of your operation (you could probably reduce this if you want less latency during a conversation).
In the synchronous code execution, yes, it's a config to wait for the model's operation to complete. When you have an OpenAI assistant, you create a thread (similar to a chat history, but it lives on the server). You then add a message to the thread and invoke a run, kicking off the execution. To know when the run is complete, we poll on it. We can either poll quickly (which is what the default values do, but that can run into 429s if your RPM/TPM limits are low) or poll more slowly, which saves API calls but can introduce latency and higher processing times. The server-side operations are asynchronous, so that is why polling is required.
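To make that concrete, here is a rough sketch of the thread → message → run → poll flow. It reuses the `AzureAssistantAgent` calls already shown in this thread; the `create_thread`/`invoke` names are assumptions and may differ slightly depending on your semantic-kernel version.

```python
# Rough sketch of the thread -> message -> run -> poll flow.
# Method names beyond those shown earlier in this thread are assumptions.
from datetime import timedelta

from semantic_kernel.agents.open_ai.azure_assistant_agent import AzureAssistantAgent
from semantic_kernel.agents.open_ai.run_polling_options import RunPollingOptions
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole


async def ask(kernel, service_id: str, user_input: str) -> None:
    agent = await AzureAssistantAgent.create(
        kernel=kernel,
        service_id=service_id,
        name="assistant",
        instructions="You are a helpful assistant.",
        polling_options=RunPollingOptions(run_polling_interval=timedelta(seconds=2)),
    )
    thread_id = await agent.create_thread()  # the thread lives on the server
    await agent.add_chat_message(
        thread_id=thread_id,
        message=ChatMessageContent(role=AuthorRole.USER, content=user_input),
    )
    # invoke() kicks off a run and polls it using the polling options above
    async for response in agent.invoke(thread_id=thread_id):
        print(response.content)
```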
I do believe we are still limited by the compute OpenAI allocates to the code interpreter. You could have a look at
Closing as we've solved the original issue by setting custom run poll options for the assistant agent.
My initial intention was to enhance the computational complexity of the ML/DL algorithms run through the agent. For each turn, I add the user's message to the thread:

```python
await agent.add_chat_message(
    thread_id=thread_id,
    message=ChatMessageContent(role=AuthorRole.USER, content=user_input),
)
```

I decided to also include the assistant's response in the conversation history:

```python
# Add the assistant's message to the history
await agent.add_chat_message(
    thread_id=thread_id,
    message=ChatMessageContent(
        role=AuthorRole.ASSISTANT, content=response.content
    ),
)
```

I'm uncertain whether this addition is necessary, as it's possible that AzureAssistantAgent might already append its responses automatically. I would appreciate your feedback on this.
@moonbox3, any comments on the chat history?
ChatHistory is usually a concept/operation driven by the caller. That means you'll want to choose how and when you update it, and with what information.
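For example, a minimal caller-side sketch using semantic_kernel's `ChatHistory` class (the variable values here are just stand-ins for the ones in your snippets above; this local copy is separate from the server-side thread):

```python
# Minimal caller-managed history; kept purely for your own bookkeeping or
# inspection, independent of the server-side assistant thread.
from semantic_kernel.contents.chat_history import ChatHistory

history = ChatHistory()

user_input = "Run a logistic regression on the Titanic dataset."  # example input
history.add_user_message(user_input)

# ... after invoking the agent and receiving its reply ...
assistant_reply = "Here are the results..."  # stand-in for response.content
history.add_assistant_message(assistant_reply)
```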
We're encountering challenges when attempting to run more complex ML/DL algorithms on the Titanic dataset using an Azure Assistant Agent. It's unclear whether this is due to code interpreter limitations or our implementation.
Current Behavior:
Error Message:
Relevant Code Snippet:
Questions:
Desired Outcome:
We aim to understand the source of this limitation and find ways to handle rate limits effectively, allowing us to perform more complex ML tasks without errors. Additionally, we seek guidance on best practices for working with the Azure Assistant Agent for computationally intensive tasks.
Any insights, suggestions, or examples of addressing these issues would be greatly appreciated.