TokenRateLimitPolicy: Figure out a solution for updating Limitador's token counters at request time #200

@eguzki

Description

The core challenge with token-based rate limiting is that the token count is not known until the AI request has completed. The usage counter in Limitador can therefore only be updated after the resources have been consumed, which is too late to block that request. The current implementation works around this with a two-call approach: a preliminary check during the request phase determines whether the limit has already been breached, and a second call after the response updates the counter. This works, but it costs two Limitador calls for every AI request instead of one.
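To make the two-call pattern concrete, here is a minimal sketch. It assumes an in-memory stand-in for the counter: `TokenCounter`, `check`, `report`, and `handle_request` are hypothetical names for illustration, not Limitador's API. It shows both the extra round trip and the fact that the request crossing the limit is still served.

```python
from dataclasses import dataclass


@dataclass
class TokenCounter:
    """Hypothetical in-memory stand-in for a token counter (not Limitador's API)."""
    limit: int
    used: int = 0
    calls: int = 0  # number of round trips made to the limiter

    def check(self) -> bool:
        """Call 1 (request phase): allow only if the limit is not yet reached."""
        self.calls += 1
        return self.used < self.limit

    def report(self, tokens: int) -> None:
        """Call 2 (response phase): add the tokens the request actually used."""
        self.calls += 1
        self.used += tokens


def handle_request(counter: TokenCounter, upstream_tokens: int) -> str:
    """Simulate one AI request; upstream_tokens is only known after the response."""
    if not counter.check():
        return "429 Too Many Requests"
    # The upstream AI call would happen here; its token usage is reported afterwards.
    counter.report(upstream_tokens)
    return "200 OK"


counter = TokenCounter(limit=100)
results = [handle_request(counter, tokens) for tokens in (60, 70, 10)]
```

In this sketch the second request is admitted because the counter is still below the limit when checked, so usage overshoots the limit before the third request is rejected. Each admitted request costs two limiter calls; a rejected one costs a single call.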

Sequence Diagram: Token rate limiting and auth

Labels: enhancement (New feature or request)

Project status: Todo
