
Conversation

michaelfeil
Contributor

@michaelfeil michaelfeil commented Aug 20, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Chores
    • Enforced a 45 MB maximum request body size for the Completions, Chat Completions, and Embeddings endpoints.
    • Requests exceeding this limit will be rejected, ensuring more reliable performance and protection against oversized payloads.
    • Typical usage is unaffected; only unusually large inputs are impacted.
    • No changes to public APIs or request/response formats.

@michaelfeil michaelfeil requested a review from a team as a code owner August 20, 2025 23:01

copy-pr-bot bot commented Aug 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Aug 20, 2025
@michaelfeil michaelfeil changed the title "Support for HTTP Body limit in axum server" → "fix: limit Support for HTTP Body limit in axum server" Aug 20, 2025
@github-actions github-actions bot added the fix label Aug 20, 2025
Contributor

coderabbitai bot commented Aug 20, 2025

Walkthrough

A private 45 MB BODY_LIMIT was added and applied via DefaultBodyLimit to three OpenAI HTTP routers: completions, chat_completions, and embeddings. This enforces maximum request body sizes internally without changing public APIs or signatures.

Changes

Cohort / File(s): OpenAI HTTP service body limit enforcement — lib/llm/src/http/service/openai.rs
Summary: Added a private const BODY_LIMIT of 45 MB and applied DefaultBodyLimit::max(BODY_LIMIT) to completions_router, chat_completions_router, and embeddings_router; no public API changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant S as OpenAI Service
  participant BL as DefaultBodyLimit (45 MB)
  participant H as Endpoint Handler

  C->>S: HTTP request (completion/chat/embedding)
  S->>BL: Route request
  alt Body <= 45 MB
    BL->>H: Forward request
    H-->>S: Response
    S-->>C: HTTP 200/4xx/5xx
  else Body > 45 MB
    BL-->>S: Reject (Payload Too Large)
    S-->>C: HTTP 413
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I nibbled bytes and set a gate,
Forty-five meg—no more on the plate.
Completions, chats, embeddings too,
Queue up small, we’ll speed you through.
Thump-thump logs, the burrow’s tidy—
Requests behave, no payload mighty.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/llm/src/http/service/openai.rs (4)

51-52: Make the limit name self-descriptive and consider configurability.

The constant works, but a more descriptive name and docs help future maintainers. Consider also making it configurable (env/config) so ops can tune without a rebuild.

-const BODY_LIMIT: usize = 45 * 1024 * 1024;
+const MB: usize = 1024 * 1024;
+/// Maximum OpenAI request body size in bytes (default: 45 MiB).
+const OPENAI_BODY_LIMIT_BYTES: usize = 45 * MB;

And update the usages accordingly:

- .layer(axum::extract::DefaultBodyLimit::max(BODY_LIMIT))
+ .layer(axum::extract::DefaultBodyLimit::max(OPENAI_BODY_LIMIT_BYTES))

If you’d like, I can wire this to a config/env flag (e.g., DYNAMO_OPENAI_MAX_BODY_BYTES) in a follow-up.


1023-1024: Apply the same body limit to the /v1/responses router for consistency.

Responses accepts Json and should likely be capped the same as the other OpenAI endpoints to avoid surprising differences.

 pub fn responses_router(
     state: Arc<service_v2::State>,
     template: Option<RequestTemplate>,
     path: Option<String>,
 ) -> (Vec<RouteDoc>, Router) {
     let path = path.unwrap_or("/v1/responses".to_string());
     let doc = RouteDoc::new(axum::http::Method::POST, &path);
     let router = Router::new()
         .route(&path, post(handler_responses))
+        .layer(axum::extract::DefaultBodyLimit::max(OPENAI_BODY_LIMIT_BYTES))
         .with_state((state, template));
     (vec![doc], router)
 }

If the omission is intentional (e.g., different limits per endpoint), consider documenting the rationale or making the per-endpoint limit explicit at the call site.


1038-1039: DRY: factor the body-limit layer into a tiny helper to avoid duplication.

This is minor, but it cuts repetition across routers and keeps future changes a one-liner.

// Near this module:
fn apply_openai_body_limit<S>(router: Router<S>) -> Router<S>
where
    S: Clone + Send + Sync + 'static, // `Router::layer` requires these bounds
{
    router.layer(axum::extract::DefaultBodyLimit::max(OPENAI_BODY_LIMIT_BYTES))
}

// Usage:
let router = apply_openai_body_limit(
    Router::new().route(&path, post(embeddings))
).with_state(state);

1007-1008: Map oversized JSON-body rejections to your ErrorMessage shape.

Axum’s default body limit is 2 MB; DefaultBodyLimit::max replaces that cap. When the Json extractor exceeds it, extraction fails with a JsonRejection::BytesRejection whose .status() is 413 Payload Too Large and whose .body_text() yields a plain-text error (docs.rs). To keep your API’s ErrorMessage { error: String } format consistent, catch this rejection and return JSON:

 // In your handler signature:
- request: Json<NvCreateCompletionRequest>,
+ request: Result<Json<NvCreateCompletionRequest>, JsonRejection>,

 async fn handler_completions(
     State(state): State<Arc<service_v2::State>>,
     headers: HeaderMap,
-    request: Json<NvCreateCompletionRequest>,
+    request: Result<Json<NvCreateCompletionRequest>, JsonRejection>,
 ) -> Result<Response, ErrorResponse> {
     let Json(request) = request.map_err(|rej| {
-        // Default produces a 413 with plain text
-        rej.into_response()
+        (
+            StatusCode::PAYLOAD_TOO_LARGE,
+            Json(ErrorMessage {
+                error: rej.body_text().into(),
+            }),
+        )
     })?;
     // ...
 }

Alternatively, apply a global mapping middleware:

use axum::{error_handling::HandleErrorLayer, extract::rejection::JsonRejection};
use tower::{BoxError, ServiceBuilder};

let app = Router::new()
    // ...
    .layer(
        ServiceBuilder::new()
            // HandleErrorLayer must be outermost so it sees errors from inner layers,
            // and its handler must return a single response type.
            .layer(HandleErrorLayer::new(|err: BoxError| async move {
                if let Some(JsonRejection::BytesRejection(_)) =
                    err.downcast_ref::<JsonRejection>()
                {
                    (
                        StatusCode::PAYLOAD_TOO_LARGE,
                        Json(ErrorMessage {
                            error: "Request body too large".into(),
                        }),
                    )
                } else {
                    (
                        StatusCode::INTERNAL_SERVER_ERROR,
                        Json(ErrorMessage {
                            error: err.to_string(),
                        }),
                    )
                }
            }))
            .layer(DefaultBodyLimit::max(BODY_LIMIT)),
    )
    .with_state(state);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bc290e7 and d0fa8e6.

📒 Files selected for processing (1)
  • lib/llm/src/http/service/openai.rs (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)

Contributor

@rmccorm4 rmccorm4 left a comment


LGTM. We probably want an env var to control this, but we can follow up on it. CC @grahamking @GuanLuo @kthui

Added #2584 to follow up on making this configurable

@rmccorm4
Contributor

Likely relates to #2580

michaelfeil and others added 2 commits August 20, 2025 19:30
Contributor

@grahamking grahamking left a comment


Very nice! Completely missed the 2MB default.

@grahamking grahamking merged commit 41a617f into ai-dynamo:main Aug 21, 2025
11 checks passed
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Michael Feil <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
nv-anants pushed a commit that referenced this pull request Aug 28, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Michael Feil <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>