Adds LanguageModelRateLimitingPlugin. Closes #1309 #1324

Merged 4 commits into dotnet:main on Jul 15, 2025

Conversation

waldekmastykarz
Collaborator

Adds LanguageModelRateLimitingPlugin. Closes #1309

Test:

devproxyrc.json:

{
  "$schema": "https://github.com/raw/dotnet/dev-proxy/main/schemas/v1.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelRateLimitingPlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "languageModelRateLimitingPlugin"
    }
  ],
  "urlsToWatch": [
    "*"
  ],
  "languageModelRateLimitingPlugin": {
    "$schema": "https://github.com/raw/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelratelimitingplugin.schema.json",
    "promptTokenLimit": 500,
    "completionTokenLimit": 500,
    "resetTimeWindowSeconds": 300
  },
  "logLevel": "information",
  "newVersionNotification": "stable",
  "showSkipMessages": true,
  "showTimestamps": true,
  "validateSchemas": true,
  "asSystemProxy": false
}

Call ollama a few times:

curl -ikx http://127.0.0.1:8000 -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ]
  }'
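To avoid re-running the command by hand, the same call can be repeated from a small script until the proxy answers with a 429. This is only a sketch: the helper names (`build_body`, `send_via_proxy`, `call_until_throttled`) are hypothetical, and it assumes Dev Proxy is listening on 127.0.0.1:8000 and Ollama on localhost:11434, as in the curl command.

```python
import http.client
import json

# Values copied from the curl command; adjust them if your setup differs.
PROXY_HOST, PROXY_PORT = "127.0.0.1", 8000
TARGET_URL = "http://localhost:11434/v1/chat/completions"


def build_body(prompt, model="llama3.2"):
    """Serialize the same chat-completions payload the curl command sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return json.dumps(payload).encode("utf-8")


def send_via_proxy(prompt):
    """Send one request through the proxy and return the HTTP status code."""
    conn = http.client.HTTPConnection(PROXY_HOST, PROXY_PORT, timeout=120)
    try:
        # Using the absolute URL as the request target asks the proxy to forward the call.
        conn.request(
            "POST",
            TARGET_URL,
            body=build_body(prompt),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status
    finally:
        conn.close()


def call_until_throttled(send, prompt="Why is the sky blue?", max_calls=10):
    """Call send(prompt) until it returns 429; return the attempt number, or None."""
    for attempt in range(1, max_calls + 1):
        status = send(prompt)
        print(f"request {attempt}: HTTP {status}")
        if status == 429:
            return attempt
    return None
```

With Dev Proxy and Ollama running, `call_until_throttled(send_via_proxy)` sends requests until one is throttled. The sender is passed in as a parameter so the loop can be exercised without a network.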

Roughly the third request should fail with a 429.
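Why the failure shows up around the third request can be illustrated with a small model of a per-window token budget. This is a Python sketch, not the plugin's C# implementation. It assumes that a request is rejected once either budget is exhausted, that usage is deducted after each successful response, and that each answer costs about 30 prompt tokens and 250 completion tokens. The class name `TokenWindow` and those per-request costs are made up for illustration.

```python
class TokenWindow:
    """Hypothetical per-window token budget; names are illustrative only."""

    def __init__(self, prompt_token_limit, completion_token_limit,
                 reset_time_window_seconds):
        self.prompt_token_limit = prompt_token_limit
        self.completion_token_limit = completion_token_limit
        self.reset_time_window_seconds = reset_time_window_seconds
        self.prompt_left = prompt_token_limit
        self.completion_left = completion_token_limit
        self.reset_at = None  # set when the first request arrives

    def _reset_if_due(self, now):
        # Refill both budgets when the window has elapsed (or on first use).
        if self.reset_at is None or now >= self.reset_at:
            self.prompt_left = self.prompt_token_limit
            self.completion_left = self.completion_token_limit
            self.reset_at = now + self.reset_time_window_seconds

    def handle(self, now, prompt_tokens, completion_tokens):
        """Return 429 if a budget is exhausted, else deduct usage and return 200."""
        self._reset_if_due(now)
        if self.prompt_left <= 0 or self.completion_left <= 0:
            return 429
        # Usage is only known after the model has answered, so deduct afterwards.
        self.prompt_left = max(0, self.prompt_left - prompt_tokens)
        self.completion_left = max(0, self.completion_left - completion_tokens)
        return 200


# Limits taken from the devproxyrc.json in this PR; per-request costs are assumed.
window = TokenWindow(prompt_token_limit=500, completion_token_limit=500,
                     reset_time_window_seconds=300)
for second in (0, 1, 2, 301):
    print(second, window.handle(second, prompt_tokens=30, completion_tokens=250))
```

Under these assumptions the completion budget runs out after two answers, so the third call is throttled, and a call made after the 300-second window succeeds again. Real responses vary in length, which is why the failing request is only roughly the third.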

The schema validation error on startup is expected because the schema is in this PR.

@waldekmastykarz waldekmastykarz requested a review from a team as a code owner July 13, 2025 10:50
@garrytrinder garrytrinder requested a review from Copilot July 14, 2025 08:11
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

Adds a new LanguageModelRateLimitingPlugin that enforces per-window token quotas and returns either a throttling response or a custom response when a limit is exceeded.

  • Introduces JSON schemas for plugin configuration and custom response files
  • Implements the core plugin logic and a file-watcher loader for custom responses
  • Updates OpenAIModels classes from abstract to concrete to support deserialization

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| schemas/v1.0.0/languagemodelratelimitingplugin.schema.json | Defines config properties for the rate-limiting plugin |
| schemas/v1.0.0/languagemodelratelimitingplugin.customresponsefile.schema.json | Defines schema for custom error responses |
| DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs | Implements rate-limiting, token tracking, throttle/custom responses |
| DevProxy.Plugins/Behavior/LanguageModelRateLimitingCustomResponseLoader.cs | Loads and watches custom response files |
| DevProxy.Abstractions/LanguageModel/OpenAIModels.cs | Changed OpenAIRequest/Response from abstract classes to concrete classes |
Comments suppressed due to low confidence (4)

schemas/v1.0.0/languagemodelratelimitingplugin.schema.json:4

  • Add a "required" array to enforce mandatory properties (e.g., "promptTokenLimit", "completionTokenLimit", "resetTimeWindowSeconds") to ensure configuration completeness.
  "type": "object",

schemas/v1.0.0/languagemodelratelimitingplugin.customresponsefile.schema.json:5

  • Define a "required" array (e.g., ["body", "statusCode"]) so consumers must include these mandatory fields in custom response files.
  "type": "object",

DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs:38

  • Consider adding unit tests covering rate limiting logic (token decrement, reset window, throttle/custom response) to ensure behavior is validated and prevent regressions.
public sealed class LanguageModelRateLimitingPlugin(

DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs:155

  • The empty list literal '[]' may not compile or infer the correct type for headersList. Use 'new List()' to explicitly create an empty List.
                        [];

@garrytrinder
Contributor

image

@garrytrinder garrytrinder merged commit 7077cae into dotnet:main Jul 15, 2025
4 checks passed
@waldekmastykarz waldekmastykarz deleted the lmratelimitingplugin branch July 15, 2025 13:11
Development

Successfully merging this pull request may close these issues.

Proposal: LLMRateLimitingPlugin to simulate token-based throttling
2 participants