🎯 The Goal / Use Case
In real-world usage, different models and providers (e.g., OpenAI, Anthropic, or self-hosted models) enforce different rate limits.
Currently, PicoClaw does not provide a built-in way to control request rates at the model level, which may lead to:
- Frequent rate limit exceeded errors
- Unstable gateway behavior under high concurrency
- Difficulty managing multi-model or multi-tenant workloads
A common use case is to limit a specific model to a fixed rate, such as 50 requests per minute (RPM).
💡 Proposed Solution
Introduce model-level rate limiting configuration, allowing users to define limits per model.
Suggested capabilities:
- Configure RPM (requests per minute) per model (e.g., 50 RPM)
- Optionally support TPM (tokens per minute)
- Apply rate limiting automatically in the gateway before forwarding requests to providers
Example configuration:

```json
{
  "model": "gpt-4",
  "rate_limit": {
    "rpm": 50,
    "tpm": 10000
  }
}
```

🛠 Potential Implementation (Optional)
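For illustration, the example configuration could map onto Go types like the following. The struct and field names here are an assumption for the sketch, not PicoClaw's actual schema:

```go
// Illustrative Go types mirroring the example JSON configuration.
// Names (ModelConfig, RateLimitConfig, ParseModelConfig) are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
)

// RateLimitConfig holds per-model limits; TPM is optional.
type RateLimitConfig struct {
	RPM int `json:"rpm"`           // requests per minute
	TPM int `json:"tpm,omitempty"` // tokens per minute (optional)
}

// ModelConfig ties a model name to its rate limit settings.
type ModelConfig struct {
	Model     string          `json:"model"`
	RateLimit RateLimitConfig `json:"rate_limit"`
}

// ParseModelConfig decodes one model entry from raw JSON.
func ParseModelConfig(raw []byte) (ModelConfig, error) {
	var cfg ModelConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"model":"gpt-4","rate_limit":{"rpm":50,"tpm":10000}}`)
	cfg, err := ParseModelConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d RPM, %d TPM\n", cfg.Model, cfg.RateLimit.RPM, cfg.RateLimit.TPM)
}
```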
- Implement a token bucket or leaky bucket rate limiter per model
- Maintain a rate limiter map keyed by model name in the gateway layer
- Apply limiting before dispatching requests to the provider
Possible structure in Go:
- RateLimiterManager (map[model]limiter)
- Middleware in gateway request pipeline
- Config-driven initialization
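The structure above could be sketched roughly as follows. This is a minimal, hedged illustration of a per-model token bucket keyed by model name; all names (`RateLimiterManager`, `Register`, `Allow`) are made up for the sketch and do not reflect any existing PicoClaw API:

```go
// Minimal sketch: a token-bucket limiter per model, held in a map
// keyed by model name, as gateway middleware might consult it.
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket refills continuously at rpm tokens per minute,
// capped at a burst capacity equal to rpm.
type tokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	ratePerS float64 // refill rate, tokens per second
	last     time.Time
}

func newTokenBucket(rpm int) *tokenBucket {
	return &tokenBucket{
		tokens:   float64(rpm),
		capacity: float64(rpm),
		ratePerS: float64(rpm) / 60.0,
		last:     time.Now(),
	}
}

// Allow consumes one token if available; false means the limit is exhausted.
func (b *tokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.ratePerS
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// RateLimiterManager keeps one limiter per model name (hypothetical name).
type RateLimiterManager struct {
	mu       sync.Mutex
	limiters map[string]*tokenBucket
}

func NewRateLimiterManager() *RateLimiterManager {
	return &RateLimiterManager{limiters: map[string]*tokenBucket{}}
}

// Register wires a model to its configured RPM (config-driven init).
func (m *RateLimiterManager) Register(model string, rpm int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.limiters[model] = newTokenBucket(rpm)
}

// Allow is what gateway middleware would call before dispatching to a
// provider. Unconfigured models pass through here; that policy is a choice.
func (m *RateLimiterManager) Allow(model string) bool {
	m.mu.Lock()
	b, ok := m.limiters[model]
	m.mu.Unlock()
	if !ok {
		return true
	}
	return b.Allow()
}

func main() {
	mgr := NewRateLimiterManager()
	mgr.Register("gpt-4", 2) // tiny limit so the demo exhausts it quickly
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d allowed=%v\n", i, mgr.Allow("gpt-4"))
	}
}
```

In this sketch the middleware would return an HTTP 429 (or queue the request) when `Allow` returns false; a production version would likely also need TPM accounting and burst configuration separate from RPM.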
🚦 Impact & Roadmap Alignment
- This is a Core Feature
- This is a Nice-to-Have / Enhancement
- This aligns with the current Roadmap
🔄 Alternatives Considered
- External rate limiting (e.g., Nginx, API Gateway)
  - Not flexible enough for per-model control
- Client-side throttling
  - Hard to maintain and not centralized
💬 Additional Context
- Many providers enforce strict rate limits, making this feature essential for stable operation
- This would greatly improve reliability in multi-model and production deployments
- Could be extended in the future to support:
  - Per-user or per-API-key rate limiting
  - Burst control and priority queues