Describe the Bug
I am encountering an issue where concurrent requests are processed sequentially rather than in parallel when the service is deployed on AWS Fargate.
I suspect the cause is that boto3 is synchronous, so each call blocks until its response returns and the remaining requests have to wait.
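The suspected mechanism can be shown with a minimal sketch. It assumes the service handles requests in an asyncio event loop, as async Python web frameworks do. `blocking_call` is a hypothetical stand-in for a synchronous boto3 call (it only sleeps), and the handler names are illustrative, not taken from the service's code. When ten simulated requests call the blocking function directly, they take roughly ten times the per-call delay. Offloading the call with `asyncio.to_thread` (Python 3.9+) lets them overlap.

```python
import asyncio
import time


def blocking_call(delay: float) -> str:
    """Hypothetical stand-in for a synchronous boto3 call.

    time.sleep blocks the calling thread, as a synchronous network call would.
    """
    time.sleep(delay)
    return "done"


async def handler_blocking(delay: float) -> str:
    # Calling the sync function directly inside a coroutine blocks the
    # event loop thread, so no other request can progress until it returns.
    return blocking_call(delay)


async def handler_offloaded(delay: float) -> str:
    # asyncio.to_thread runs the sync call in a worker thread, leaving the
    # event loop free to start other requests in the meantime.
    return await asyncio.to_thread(blocking_call, delay)


async def run_batch(handler, n: int, delay: float) -> float:
    """Start n simulated requests concurrently; return total wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    n, delay = 10, 0.2
    print(f"blocking:  {asyncio.run(run_batch(handler_blocking, n, delay)):.2f}s")
    print(f"offloaded: {asyncio.run(run_batch(handler_offloaded, n, delay)):.2f}s")
```

If this matches what the deployed service does, moving the boto3 calls into worker threads is one possible direction. Low-level boto3 clients are generally thread-safe, so sharing one client across those threads is usually acceptable.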
API Details
- API Used: /chat/completions
- Model Used: all models
To Reproduce
Steps to reproduce the behavior:
- Deploy the service on AWS Fargate following the standard setup procedures.
- Send multiple concurrent requests (e.g., 10 concurrent requests) to the API.
- Observe that the requests are processed sequentially instead of concurrently.
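One way to make step 3 measurable is to compare the wall-clock time of a batch of simultaneous requests against one isolated request. Below is a minimal sketch under that assumption. `concurrency_ratio` is a hypothetical helper, and the two `fake_*` functions are stand-ins that simulate a serialized server and a parallel server so the sketch runs without a deployment. A ratio near 1 means the requests overlapped, while a ratio near `n` means they were handled one after another.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _elapsed(send_one: Callable[[], object]) -> float:
    """Run one request and return how long it took, in seconds."""
    start = time.perf_counter()
    send_one()
    return time.perf_counter() - start


def concurrency_ratio(send_one: Callable[[], object], n: int = 10) -> float:
    """Batch time for n simultaneous requests divided by one isolated request.

    Near 1: the requests overlapped. Near n: they were processed sequentially.
    """
    single = _elapsed(send_one)  # baseline: one request on its own
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        # One thread per request so all n are in flight at the same time.
        list(pool.map(lambda _: send_one(), range(n)))
    batch = time.perf_counter() - start
    return batch / single


# Hypothetical stand-ins so the sketch runs without a deployment.
_server_lock = threading.Lock()


def fake_serialized_request() -> None:
    """Simulates a server that handles one request at a time."""
    with _server_lock:
        time.sleep(0.05)


def fake_concurrent_request() -> None:
    """Simulates a server that handles requests in parallel."""
    time.sleep(0.05)


if __name__ == "__main__":
    print(f"concurrent server: ratio {concurrency_ratio(fake_concurrent_request):.1f}")
    print(f"serialized server: ratio {concurrency_ratio(fake_serialized_request):.1f}")
```

To point this at a real deployment, `send_one` could instead POST a request to `/chat/completions` with `urllib.request.urlopen`. The base URL, authentication header, and request body depend on the deployment and are not specified in this report. Sending a warm-up request first keeps a cold start from inflating the baseline.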
Expected Behavior
I expected that multiple concurrent requests to the API would be handled simultaneously, or at least as many of them at once as the server can handle.