
[Feature] Add Model-Level Rate Limiting #2029

@SiYue-ZO

Description

🎯 The Goal / Use Case

In real-world usage, different models and providers (e.g., OpenAI, Anthropic, or self-hosted models) enforce different rate limits.

Currently, PicoClaw does not provide a built-in way to control request rates at the model level, which may lead to:

  • Frequent rate limit exceeded errors
  • Unstable gateway behavior under high concurrency
  • Difficulty managing multi-model or multi-tenant workloads

A common use case is to limit a specific model to a fixed rate, such as 50 requests per minute (RPM).

💡 Proposed Solution

Introduce model-level rate limiting configuration, allowing users to define limits per model.

Suggested capabilities:

  • Configure RPM (requests per minute) per model (e.g., 50 RPM)
  • Optionally support TPM (tokens per minute)
  • Apply rate limiting automatically in the gateway before forwarding requests to providers

Example configuration:

{
  "model": "gpt-4",
  "rate_limit": {
    "rpm": 50,
    "tpm": 10000
  }
}

🛠 Potential Implementation (Optional)

  • Implement a token bucket or leaky bucket rate limiter per model
  • Maintain a rate limiter map keyed by model name in the gateway layer
  • Apply limiting before dispatching requests to the provider

Possible structure in Go:

  • RateLimiterManager (map[model]limiter)
  • Middleware in gateway request pipeline
  • Config-driven initialization

🚦 Impact & Roadmap Alignment

  • This is a Core Feature
  • This is a Nice-to-Have / Enhancement
  • This aligns with the current Roadmap

🔄 Alternatives Considered

  • External rate limiting (e.g., Nginx, API Gateway)
    • Not flexible for per-model control
  • Client-side throttling
    • Hard to maintain and not centralized

💬 Additional Context

  • Many providers enforce strict rate limits, making this feature essential for stable operation
  • This would greatly improve reliability in multi-model and production deployments
  • Could be extended in the future to support:
    • Per-user or per-API-key rate limiting
    • Burst control and priority queues
