The core challenge with token-based rate limiting is that we don't know the token count until after the AI request is complete. This means we can only update the usage counter in Limitador after the resources have already been consumed, making it too late to block the request. The current implementation mitigates this with a two-call approach: a preliminary check during the request phase determines whether the limit is already breached, and a second call after the response updates the counter with the actual token count. While effective, this pattern adds the overhead of a second Limitador call for every AI request.
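The two-call pattern can be sketched as follows. This is a minimal in-process stand-in, not Limitador's actual API: `TokenLimiter`, `check`, `report`, and `handle_ai_request` are hypothetical names chosen for illustration, and the real implementation talks to Limitador over its service interface rather than an in-memory counter.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenLimiter:
    """Hypothetical in-process stand-in for Limitador's counter."""
    limit: int                  # max tokens per window
    window_seconds: int = 60
    _used: int = 0
    _window_start: float = field(default_factory=time.monotonic)

    def _maybe_reset(self) -> None:
        if time.monotonic() - self._window_start >= self.window_seconds:
            self._used = 0
            self._window_start = time.monotonic()

    # Call 1 (request phase): check only. The token count is not yet
    # known, so all we can ask is "is the limit already breached?"
    def check(self) -> bool:
        self._maybe_reset()
        return self._used < self.limit

    # Call 2 (response phase): report actual usage once the AI
    # response is complete and the token count is known.
    def report(self, tokens_used: int) -> None:
        self._maybe_reset()
        self._used += tokens_used


def handle_ai_request(limiter: TokenLimiter, run_inference) -> str:
    if not limiter.check():            # first Limitador call
        return "429 Too Many Requests"
    response, tokens = run_inference() # token count known only now
    limiter.report(tokens)             # second Limitador call
    return response
```

Note the consequence of reporting after the fact: a request that starts while the counter is still under the limit is admitted even if its own tokens push usage past the limit, and only subsequent requests are blocked.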
Sequence Diagram: Token rate limiting and auth