Proposal: LLMRateLimitingPlugin to simulate token-based throttling #1309

@waldekmastykarz

Description

Proposal

Introduce a new plugin named LLMRateLimitingPlugin that simulates throttling based on the number of input and output tokens consumed within a specified timeframe. This plugin will let developers test how their applications behave when token limits are exceeded, similar to how the existing RateLimitingPlugin works for request/response rates.

Key Features:

  • Simulate throttling based on configurable token limits (input and output tokens) within a user-defined timeframe.
  • Allow developers to configure thresholds and time windows for token consumption.
  • Provide feedback/response when token limits are exceeded, mirroring real-world LLM API behavior.
  • Useful for preparing applications for production LLM rate limits and improving resilience.
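As a rough illustration of the configurable thresholds listed above, the plugin's configuration might follow the same JSON style as the existing RateLimitingPlugin. All property names below are hypothetical; only the general shape (a plugin entry plus a named config section) mirrors how existing plugins are configured:

```json
{
  "plugins": [
    {
      "name": "LLMRateLimitingPlugin",
      "enabled": true,
      "configSection": "llmRateLimiting"
    }
  ],
  "llmRateLimiting": {
    "promptTokenLimit": 5000,
    "completionTokenLimit": 5000,
    "resetTimeWindowSeconds": 60
  }
}
```

Separate limits for input (prompt) and output (completion) tokens are shown here because many LLM providers meter the two independently.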

Motivation

The existing RateLimitingPlugin focuses on request/response rates. Many LLM providers, however, enforce limits on tokens rather than request counts. This new plugin will help developers proactively identify and handle token-based throttling scenarios before they occur in production.

Reference

  • The new plugin should be modeled similarly to the existing RateLimitingPlugin, but should focus on tokens instead of requests/responses.
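To make the intended behavior concrete, the token-window accounting could be sketched as a sliding window of recent token usage. This is a minimal illustration, not the plugin's actual implementation; all names (TokenRateLimiter, record, should_throttle) are hypothetical:

```python
import time
from collections import deque


class TokenRateLimiter:
    """Sliding-window token budget (hypothetical sketch of the proposal).

    Records the tokens consumed by each request and reports when the
    configured budget for the current time window is exhausted, at which
    point the plugin would return a throttled (e.g. 429-style) response.
    """

    def __init__(self, tokens_per_window: int, window_seconds: float):
        self.tokens_per_window = tokens_per_window
        self.window_seconds = window_seconds
        self._events: deque = deque()  # (timestamp, tokens) pairs

    def _prune(self, now: float) -> None:
        # Drop token usage that has aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()

    def record(self, tokens: int, now: float = None) -> None:
        # Count the input + output tokens of a completed request.
        now = time.monotonic() if now is None else now
        self._prune(now)
        self._events.append((now, tokens))

    def should_throttle(self, now: float = None) -> bool:
        # True once the window's token budget is exhausted.
        now = time.monotonic() if now is None else now
        self._prune(now)
        return sum(t for _, t in self._events) >= self.tokens_per_window
```

For example, with a budget of 100 tokens per 60 seconds, two requests consuming 40 and 70 tokens would trip the limiter, and it would clear again once the window has passed.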
