Conversation

Contributor

@ayushag-nv ayushag-nv commented Aug 21, 2025

Overview:

Adds two CLI args to the vLLM backend worker so users can select tool and reasoning parsers:

  • dyn-tool-call-parser
  • dyn-reasoning-parser

These names were chosen because individual backends have their own CLI flag names, and we want to avoid conflicting with them.

These args are passed to and consumed by the frontend, which performs the tool and reasoning parsing.

The current PR scope covers only tool-call parsing and vLLM.
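
For orientation, a condensed, self-contained sketch of the plumbing (names follow CodeRabbit's walkthrough below; the stub parser and main function are illustrative, not the actual implementation):

/// Carries per-model parser selections from config into stream aggregation.
#[derive(Clone, Debug, Default)]
pub struct StreamArgs {
    pub tool_call_parser: Option<String>,
    pub reasoning_parser: Option<String>,
}

// Stand-in for the real try_tool_call_parse_aggregate in lib/parsers.
fn try_tool_call_parse_aggregate(
    text: &str,
    parser: Option<&str>,
) -> Result<Vec<String>, String> {
    let _ = (text, parser);
    Ok(vec![]) // the real parser returns structured tool calls
}

fn main() {
    // The worker publishes parser names in its runtime config; the
    // frontend builds StreamArgs from them per model...
    let args = StreamArgs {
        tool_call_parser: Some("hermes".to_string()),
        reasoning_parser: None, // reasoning parsing is out of scope for this PR
    };
    // ...and threads them into aggregation so responses can carry
    // structured tool calls instead of raw text.
    let calls = try_tool_call_parse_aggregate("model output", args.tool_call_parser.as_deref());
    println!("{calls:?}");
}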

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Configure tool-call and reasoning parsers via CLI and configuration; settings propagate to runtime and Python API.
    • Per-model streaming arguments added; non-streaming completions and chat responses now honor the configured tool-call parser.
  • Improvements

    • Added informative logs indicating which tool-call parser is used (or if none is set).
  • Tests

    • Updated tests to validate new streaming argument handling.

copy-pr-bot bot commented Aug 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv requested review from PeaBrane, hhzhang16 and a team as code owners August 21, 2025 20:50
@ayushag-nv ayushag-nv marked this pull request as draft August 21, 2025 20:50
@github-actions github-actions bot added the feat label Aug 21, 2025
Member

@paulhendricks paulhendricks left a comment

Looks good so far

@ayushag-nv ayushag-nv changed the title feat: cli args for parsers in backend and MDC feat: implement cli args for tool and reasoning parsers Aug 22, 2025
@ayushag-nv ayushag-nv changed the title feat: implement cli args for tool and reasoning parsers feat: [vLLM] implement cli args for tool and reasoning parsers Aug 22, 2025
@ayushag-nv ayushag-nv marked this pull request as ready for review August 22, 2025 19:24
@ayushag-nv ayushag-nv requested a review from messiaen August 22, 2025 19:26
Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Walkthrough

Adds optional parser names (tool_call_parser, reasoning_parser) to config/CLI, propagates them into ModelRuntimeConfig, exposes them via bindings/manager, introduces StreamArgs carrying these options, and threads StreamArgs through OpenAI HTTP handlers and aggregators to influence tool-call parsing. Minor whitespace/logging tweaks and test updates accompany signature changes.

Changes

Cohort / File(s) Summary of changes
VLLM config & CLI
components/backends/vllm/src/dynamo/vllm/args.py, components/backends/vllm/src/dynamo/vllm/main.py
Added Optional[str] fields tool_call_parser, reasoning_parser to Config; new CLI flags --dyn-tool-call-parser, --dyn-reasoning-parser; propagated into runtime_config in init.
Runtime config struct & accessors
lib/llm/src/local_model/runtime_config.rs, lib/bindings/python/rust/llm/local_model.rs, lib/llm/src/discovery/model_manager.rs
Added tool_call_parser and reasoning_parser fields to ModelRuntimeConfig; Python getters/setters; ModelManager accessor get_model_tool_call_parser.
OpenAI protocol args
lib/llm/src/protocols/openai.rs
Introduced public StreamArgs { tool_call_parser, reasoning_parser } with derives and constructor.
Aggregators (chat & completions)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs, lib/llm/src/protocols/openai/completions/aggregator.rs
Updated aggregator apply/constructors to accept StreamArgs; threaded through delta aggregation; tests adjusted to pass StreamArgs::default().
HTTP service wiring
lib/llm/src/http/service/openai.rs
Built per-model StreamArgs (using tool_call_parser; reasoning_parser None for now) and passed to from_annotated_stream across completions/chat/responses; updated imports and call sites.
Tests
lib/llm/tests/aggregators.rs
Updated to import/export StreamArgs and pass it to from_sse_stream.
Parsers logging
lib/parsers/src/tool_calling/tools.rs
Added info logs indicating selected tool parser (or absence) in try_tool_call_parse_aggregate.
Whitespace-only
lib/llm/src/local_model.rs, lib/llm/src/preprocessor.rs
Inserted/removed blank lines; no functional change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI
  participant VArgs as VLLM Args/Config
  participant VM as VLLM Main
  participant RT as ModelRuntimeConfig
  participant HTTP as OpenAI HTTP Service
  participant Prot as StreamArgs
  participant Agg as Aggregators
  participant Pars as Parsers

  User->>CLI: Start with flags (--dyn-...-parser)
  CLI->>VArgs: parse_args()
  VArgs->>VM: Config{tool_call_parser, reasoning_parser}
  VM->>RT: set tool_call_parser / reasoning_parser
  User->>HTTP: Completion/Chat request (model)
  HTTP->>RT: lookup model runtime_config
  HTTP->>Prot: get_stream_args(model)
  note right of Prot: StreamArgs{tool_call_parser,<br/>reasoning_parser}
  HTTP->>Agg: from_annotated_stream(stream, StreamArgs)
  Agg->>Pars: try_tool_call_parse_aggregate(StreamArgs.tool_call_parser)
  Pars-->>Agg: Parsed tool calls (optional)
  Agg-->>HTTP: Aggregated response
  HTTP-->>User: Final response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In burrows of bytes I hop with glee,
Two parsers named, now roaming free.
StreamArgs packed in a carrot crate,
They ride the deltas, aggregate.
Tool bells ring, reasoning sings—
Thump-thump! says QA as the response springs. 🥕🐇


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (1)

181-201: Avoid logging raw tool-call arguments; log parse errors instead.

Logging full JSON arguments can leak PII at debug level. Also, parse errors are silently dropped, making issues hard to diagnose.

Apply this diff to redact arguments and surface parse failures at debug:

-            if choice.tool_calls.is_none() {
-                if let Ok(tool_calls) = try_tool_call_parse_aggregate(
-                    &choice.text,
-                    extra_stream_args.tool_call_parser.as_deref(),
-                ) {
+            if choice.tool_calls.is_none() {
+                match try_tool_call_parse_aggregate(
+                    &choice.text,
+                    extra_stream_args.tool_call_parser.as_deref(),
+                ) {
+                    Ok(tool_calls) => {
                     if tool_calls.is_empty() {
                         continue;
                     }
                     for tool_call in &tool_calls {
-                        tracing::debug!(
-                            tool_call_id = %tool_call.id,
-                            function_name = %tool_call.function.name,
-                            arguments = %tool_call.function.arguments,
-                            "Parsed structured tool call from aggregated content"
-                        );
+                        tracing::debug!(
+                            tool_call_id = %tool_call.id,
+                            function_name = %tool_call.function.name,
+                            args_len = tool_call.function.arguments.len(),
+                            "Parsed structured tool call from aggregated content"
+                        );
                     }
                     choice.tool_calls = Some(tool_calls);
                     choice.text.clear();
                     choice.finish_reason =
                         Some(dynamo_async_openai::types::FinishReason::ToolCalls);
-                }
+                    }
+                    Err(e) => {
+                        tracing::debug!(
+                            error = %e,
+                            "Tool-call parsing failed; leaving content as-is"
+                        );
+                    }
+                }
             }
🧹 Nitpick comments (21)
lib/parsers/src/tool_calling/tools.rs (2)

17-21: Lower log level and use structured fields to avoid noisy logs and Option formatting

These logs sit on a hot path and will emit per parse attempt. Prefer debug level and structured fields, and avoid printing Option as Some("...").

Apply:

-    if parser_str.is_none() {
-        tracing::info!("No tool parser provided. Trying parsing with default parser.");
-    } else {
-        tracing::info!("Using tool parser: {:?}", parser_str);
-    }
+    match parser_str {
+        Some(p) => tracing::debug!(parser = %p, "Using tool parser for aggregation"),
+        None => tracing::debug!("No tool parser provided; default parser will be used"),
+    }

45-48: Consider consistent logging in stream path (mirrors aggregate path)

For parity and easier troubleshooting, add the same (debug-level) structured log at the start of try_tool_call_parse_stream.

Example:

pub fn try_tool_call_parse_stream(
    message: &str,
    parser_str: Option<&str>,
) -> anyhow::Result<Vec<dynamo_async_openai::types::ChatCompletionMessageToolCallChunk>> {
    match parser_str {
        Some(p) => tracing::debug!(parser = %p, "Using tool parser for streaming"),
        None => tracing::debug!("No tool parser provided; default parser will be used"),
    }
    let parsed = detect_and_parse_tool_call(message, parser_str)?;
    // ...
}
lib/llm/src/local_model/runtime_config.rs (1)

16-19: Document new fields (optionally skip serializing None)

Good additive fields; serde with Option<String> remains backward-compatible. Consider brief docs for expected values. Optionally, skip serializing None to avoid null noise in etcd snapshots.

-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier for this model. If None, default parsing is used.
+    pub tool_call_parser: Option<String>,
 
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier for this model. Reserved for future use.
+    pub reasoning_parser: Option<String>,
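
As a sketch of the optional serde tweak above (other ModelRuntimeConfig fields trimmed; the attributes are standard serde, the exact placement is a suggestion):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Default)]
pub struct ModelRuntimeConfig {
    // Omit the field from serialized output when unset, instead of emitting null.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub tool_call_parser: Option<String>,

    #[serde(skip_serializing_if = "Option::is_none")]
    pub reasoning_parser: Option<String>,
}
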
components/backends/vllm/src/dynamo/vllm/main.py (1)

237-239: Normalize empty CLI values to None to avoid passing "" downstream

If a user passes an empty string (e.g., via env plumbing), we currently propagate "", which could be interpreted as “use a parser named empty-string.” Normalize to None.

-        runtime_config.tool_call_parser = config.tool_call_parser
-        runtime_config.reasoning_parser = config.reasoning_parser
+        # Normalize empty strings to None to avoid passing "" to downstream parsers
+        runtime_config.tool_call_parser = config.tool_call_parser or None
+        runtime_config.reasoning_parser = config.reasoning_parser or None
lib/llm/src/discovery/model_manager.rs (3)

250-259: Avoid redundant clone/to_string — return the existing String

config.tool_call_parser.clone() already yields Option<String>. The subsequent .map(|parser| parser.to_string()) needlessly clones.

-            .and_then(|config| config.tool_call_parser.clone())
-            .map(|parser| parser.to_string())
+            .and_then(|config| config.tool_call_parser.clone())

250-259: Match by slug as well as display name to prevent lookup mismatches

Callers may pass either the served model slug or display name. Matching only on entry.name == model can miss entries. Consider comparing slugs too.

 pub fn get_model_tool_call_parser(&self, model: &str) -> Option<String> {
-        self.entries
+        let model_slug = Slug::from_string(model);
+        self.entries
             .lock()
             .unwrap()
             .values()
-            .find(|entry| entry.name == model)
+            .find(|entry| entry.name == model || Slug::from_string(&entry.name) == model_slug)
             .and_then(|entry| entry.runtime_config.as_ref())
             .and_then(|config| config.tool_call_parser.clone())
 }

250-259: Add symmetric accessor for reasoning parser (future-proofing)

For API symmetry and future usage, consider get_model_reasoning_parser(&self, model: &str) -> Option<String> alongside the tool-call accessor. I can follow up with a small patch if desired.
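
A sketch of that accessor, mirroring get_model_tool_call_parser above (same lookup, different field):

pub fn get_model_reasoning_parser(&self, model: &str) -> Option<String> {
    self.entries
        .lock()
        .unwrap()
        .values()
        .find(|entry| entry.name == model)
        .and_then(|entry| entry.runtime_config.as_ref())
        .and_then(|config| config.reasoning_parser.clone())
}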

lib/llm/src/protocols/openai.rs (1)

197-211: StreamArgs: solid public surface; add docs and minor ergonomics (PartialEq/Eq, is_empty).

These fields will flow through many call sites. A tiny polish helps testing and introspection.

Apply this diff to add docs, derive PartialEq/Eq, and a convenience helper:

-#[derive(Clone, Debug, Serialize, Deserialize, Default)]
+#[derive(Clone, Debug, Serialize, Deserialize, Default, PartialEq, Eq)]
 pub struct StreamArgs {
-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier, e.g. "hermes", "llama3_json".
+    pub tool_call_parser: Option<String>,
 
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier (non-breaking; may be unused for now).
+    pub reasoning_parser: Option<String>,
 }
 
 impl StreamArgs {
     pub fn new(tool_call_parser: Option<String>, reasoning_parser: Option<String>) -> Self {
         Self {
             tool_call_parser,
             reasoning_parser,
         }
     }
+
+    /// Convenience: true when no extra stream behavior is requested.
+    pub fn is_empty(&self) -> bool {
+        self.tool_call_parser.is_none() && self.reasoning_parser.is_none()
+    }
 }
components/backends/vllm/src/dynamo/vllm/args.py (3)

109-109: Nit: typo in comment.

“adoped” → “adopted”.

-    # To avoid name conflicts with different backends, adoped prefix "dyn-" for dynamo specific args
+    # To avoid name conflicts with different backends, adopted prefix "dyn-" for Dynamo-specific args

111-121: Add CLI validation (choices) to prevent typos; optionally log selection.

Users will benefit from early validation for known tool-call parsers. Suggest adding argparse choices and a brief log for visibility.

-    parser.add_argument(
-        "--dyn-tool-call-parser",
-        type=str,
-        default=None,
-        help="Tool call parser name for the model. Available options: 'hermes', 'nemotron_deci', 'llama3_json', 'mistral', 'phi4'.",
-    )
+    parser.add_argument(
+        "--dyn-tool-call-parser",
+        type=str,
+        choices=["hermes", "nemotron_deci", "llama3_json", "mistral", "phi4"],
+        default=None,
+        help="Tool call parser name for the model. Options: hermes, nemotron_deci, llama3_json, mistral, phi4.",
+    )
@@
-    parser.add_argument(
+    parser.add_argument(
         "--dyn-reasoning-parser",
-        type=str,
+        type=str,
         default=None,
         help="Reasoning parser name for the model.",
     )

Optionally log the selected values after assignment (see below on lines 171-173).


171-173: Surface chosen parser selections in logs.

A single info/debug log aids debugging deployments.

     config.tool_call_parser = args.dyn_tool_call_parser
     config.reasoning_parser = args.dyn_reasoning_parser
+    if config.tool_call_parser:
+        logger.info(f"Using tool call parser: {config.tool_call_parser}")
+    if config.reasoning_parser:
+        logger.info(f"Using reasoning parser: {config.reasoning_parser}")
lib/bindings/python/rust/llm/local_model.rs (2)

37-45: Add Python-visible docs on new setters to reduce misuse.

Docstrings help users discover valid values and the ability to pass None.

+/// Set the tool-call parser name (e.g., "hermes", "llama3_json"). Pass None to clear.
 #[setter]
 fn set_tool_call_parser(&mut self, tool_call_parser: Option<String>) {
     self.inner.tool_call_parser = tool_call_parser;
 }
 
+/// Set the reasoning parser name (if supported by the backend). Pass None to clear.
 #[setter]
 fn set_reasoning_parser(&mut self, reasoning_parser: Option<String>) {
     self.inner.reasoning_parser = reasoning_parser;
 }

70-78: Mirror docs on getters; no functional issues.

Returning clones is fine. Add brief doc comments for Python help() output.

+/// Get the configured tool-call parser name, if any.
 #[getter]
 fn tool_call_parser(&self) -> Option<String> {
     self.inner.tool_call_parser.clone()
 }
 
+/// Get the configured reasoning parser name, if any.
 #[getter]
 fn reasoning_parser(&self) -> Option<String> {
     self.inner.reasoning_parser.clone()
 }
lib/llm/tests/aggregators.rs (1)

41-44: Add one test that exercises non-default StreamArgs.

Current tests only use StreamArgs::default(). Add a smoke test passing a non-default parser to guard API wiring (even if behavior is currently a no-op).

@@
 #[tokio::test]
 async fn test_openai_chat_stream() {
@@
 }
 
+#[tokio::test]
+async fn test_openai_chat_stream_with_stream_args() {
+    let data = std::fs::read_to_string("tests/data/replays/meta/llama-3.1-8b-instruct/chat_completions/chat-completion.streaming.1").unwrap();
+    let stream = create_message_stream(&data).take(8);
+    let args = StreamArgs {
+        tool_call_parser: Some("hermes".to_string()),
+        reasoning_parser: None,
+    };
+    let result = NvCreateChatCompletionResponse::from_sse_stream(Box::pin(stream), args)
+        .await
+        .unwrap();
+    assert!(result.choices.first().is_some());
+}

Also applies to: 65-68, 86-89, 107-109, 122-124

lib/llm/src/http/service/openai.rs (1)

198-203: get_stream_args: OK; consider logging and wiring reasoning when available.

Minor improvement: add a debug log so we can trace which parser was applied per model. Reasoning parser TODO noted.

 fn get_stream_args(state: &Arc<service_v2::State>, model: &str) -> StreamArgs {
     let tool_call_parser = state.manager().get_model_tool_call_parser(model);
     let reasoning_parser = None; // TODO: Implement reasoning parser
 
-    StreamArgs::new(tool_call_parser, reasoning_parser)
+    let args = StreamArgs::new(tool_call_parser, reasoning_parser);
+    tracing::debug!(model, tool_call_parser=?args.tool_call_parser, reasoning_parser=?args.reasoning_parser, "Constructed StreamArgs");
+    args
 }
lib/llm/src/protocols/openai/completions/aggregator.rs (3)

67-72: extra_stream_args is only logged; avoid noisy logs and keep PII surface minimal.

You don’t use extra_stream_args yet (expected), but the debug log is easy to forget. Prefer trace-level and structured fields, or remove.

Apply one of the following diffs (preferred: trace):

-        tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+        tracing::trace!(
+            tool_call_parser = ?extra_stream_args.tool_call_parser,
+            "completions: received StreamArgs"
+        ); // TODO: downgrade or remove once completion has tool-call support

Or, remove entirely:

-        tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+        // TODO: add usage when completion gets tool-call support

117-129: Minor comment typos (“return” → “returned”, “conversation” → “conversion”).

Nit-level polish for readability.

-                    // to be return as part of the NIM Response Extension
+                    // to be returned as part of the NIM Response Extension
@@
-                        // Handle CompletionFinishReason -> FinishReason conversation
+                        // Handle CompletionFinishReason -> FinishReason conversion

290-299: Duplicate assertion in test_single_delta.

There are two identical assertions for choice.finish_reason.

-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
+        assert_eq!(
+            choice.finish_reason,
+            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
+        );
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)

139-147: Make role aggregation resilient to late-arriving roles.

If the first delta for a choice lacks role, we keep None and later panic in From via expect(). Update role when a later delta provides it.

                 let state_choice =
                     aggregator
                         .choices
                         .entry(choice.index)
                         .or_insert(DeltaChoice {
                             index: choice.index,
                             text: "".to_string(),
                             role: choice.delta.role,
                             finish_reason: None,
                             logprobs: choice.logprobs,
                             tool_calls: None,
                             reasoning_content: None,
                         });
+
+                // If role arrived late in the stream, adopt it.
+                if state_choice.role.is_none() && choice.delta.role.is_some() {
+                    state_choice.role = choice.delta.role;
+                }

229-255: Avoid panic if role is still missing at the end.

Default to Assistant rather than expect()-panic to keep the aggregator robust.

-                role: delta.role.expect("delta should have a Role"),
+                role: delta
+                    .role
+                    .unwrap_or(dynamo_async_openai::types::Role::Assistant),

541-597: Add a test where finish_reason is not ToolCalls but content parses as a tool-call.

This ensures we set finish_reason = ToolCalls on successful parse even if upstream didn’t.

I can add a test like below:

@@
     #[tokio::test]
     async fn test_tool_calling_output() {
@@
     }
+
+    #[tokio::test]
+    async fn test_tool_calling_when_finish_reason_is_none() {
+        let tool_call_json = r#"{"name":"get_weather","arguments":{"location":"SF"}}"#;
+        let annotated_delta = create_test_delta(
+            0,
+            tool_call_json,
+            Some(dynamo_async_openai::types::Role::Assistant),
+            None, // upstream didn't mark ToolCalls
+        );
+        let data = annotated_delta.data.unwrap();
+        let annotated_delta = Annotated { data: Some(data), id: Some("test_id".into()), event: None, comment: None };
+        let stream = Box::pin(stream::iter(vec![annotated_delta]));
+        let result = DeltaAggregator::apply(stream, StreamArgs::default()).await;
+        assert!(result.is_ok());
+        let response = result.unwrap();
+        assert_eq!(response.choices.len(), 1);
+        let choice = &response.choices[0];
+        assert!(choice.message.tool_calls.is_some());
+        assert!(choice.message.content.is_none());
+        assert_eq!(
+            choice.finish_reason,
+            Some(dynamo_async_openai::types::FinishReason::ToolCalls)
+        );
+    }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 5045f13 and ef14880.

📒 Files selected for processing (13)
  • components/backends/vllm/src/dynamo/vllm/args.py (3 hunks)
  • components/backends/vllm/src/dynamo/vllm/main.py (1 hunks)
  • lib/bindings/python/rust/llm/local_model.rs (2 hunks)
  • lib/llm/src/discovery/model_manager.rs (1 hunks)
  • lib/llm/src/http/service/openai.rs (8 hunks)
  • lib/llm/src/local_model.rs (2 hunks)
  • lib/llm/src/local_model/runtime_config.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (0 hunks)
  • lib/llm/src/protocols/openai.rs (1 hunks)
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs (10 hunks)
  • lib/llm/src/protocols/openai/completions/aggregator.rs (7 hunks)
  • lib/llm/tests/aggregators.rs (6 hunks)
  • lib/parsers/src/tool_calling/tools.rs (1 hunks)
💤 Files with no reviewable changes (1)
  • lib/llm/src/preprocessor.rs
🧰 Additional context used
🧬 Code graph analysis (7)
components/backends/vllm/src/dynamo/vllm/main.py (2)
lib/llm/src/local_model.rs (2)
  • runtime_config (165-168)
  • runtime_config (355-357)
lib/bindings/python/rust/llm/local_model.rs (2)
  • tool_call_parser (71-73)
  • reasoning_parser (76-78)
components/backends/vllm/src/dynamo/vllm/args.py (2)
lib/bindings/python/rust/llm/local_model.rs (2)
  • tool_call_parser (71-73)
  • reasoning_parser (76-78)
lib/llm/src/local_model.rs (1)
  • default (65-85)
lib/llm/src/protocols/openai.rs (1)
lib/bindings/python/rust/llm/local_model.rs (2)
  • tool_call_parser (71-73)
  • reasoning_parser (76-78)
lib/llm/src/protocols/openai/completions/aggregator.rs (3)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (4)
  • convert_sse_stream (300-300)
  • from_annotated_stream (269-272)
  • from_annotated_stream (289-294)
  • apply (102-226)
lib/llm/src/protocols.rs (1)
  • convert_sse_stream (51-67)
lib/bindings/python/rust/llm/local_model.rs (1)
  • tool_call_parser (71-73)
lib/llm/tests/aggregators.rs (2)
lib/llm/src/protocols/openai/completions/aggregator.rs (2)
  • from_sse_stream (182-188)
  • default (49-51)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
  • from_sse_stream (282-285)
  • from_sse_stream (296-302)
  • default (73-75)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
lib/llm/src/protocols/openai/completions/aggregator.rs (5)
  • convert_sse_stream (186-186)
  • from_annotated_stream (190-195)
  • apply (68-165)
  • from_sse_stream (182-188)
  • default (49-51)
lib/llm/src/protocols.rs (1)
  • convert_sse_stream (51-67)
lib/parsers/src/tool_calling/tools.rs (1)
  • try_tool_call_parse_aggregate (13-39)
lib/llm/src/http/service/openai.rs (4)
lib/bindings/python/rust/llm/local_model.rs (2)
  • tool_call_parser (71-73)
  • reasoning_parser (76-78)
lib/llm/src/protocols/openai/completions/aggregator.rs (2)
  • new (55-65)
  • from_annotated_stream (190-195)
lib/llm/src/protocols/openai.rs (1)
  • new (205-210)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
  • new (80-91)
  • from_annotated_stream (269-272)
  • from_annotated_stream (289-294)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (10)
lib/llm/src/local_model.rs (1)

205-205: No-op whitespace changes — fine to keep

The blank lines improve readability slightly and have no functional impact.

Also applies to: 396-396

lib/llm/tests/aggregators.rs (1)

21-22: Public re-export of StreamArgs used correctly.

Import path and usage look good.

lib/llm/src/http/service/openai.rs (3)

40-41: Import of StreamArgs is appropriate.

No issues.


278-279: Completions: threading StreamArgs into the fold path looks correct.

This ensures non-streaming completions can leverage the selected parser.

Also applies to: 338-347


507-508: LGTM on retrieving and passing StreamArgs.

Non-streaming paths now consistently receive per-model parsing configuration.

Also applies to: 744-745

lib/llm/src/protocols/openai/completions/aggregator.rs (2)

25-28: Good plumbing: StreamArgs and convert_sse_stream imports are consistent with chat path.

This aligns the completions path with chat and keeps the API surface uniform.


181-195: Signature propagation is correct.

from_sse_stream and from_annotated_stream cleanly thread StreamArgs through to the aggregator.

lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)

22-25: Imports for convert_sse_stream and StreamArgs look right.

Matches the completions path and keeps the API consistent.


103-105: Added StreamArgs to apply signature — good threading of parser config.

This unlocks model-specific tool-call parsing without altering aggregation semantics.


269-299: No Other Internal Implementations—Breaking Change Is Limited

I ran a repo-wide search for all impl ChatCompletionAggregator and direct calls to from_annotated_stream/from_sse_stream. The only trait implementation in this crate is in lib/llm/src/protocols/openai/chat_completions/aggregator.rs (lines 288–299), and all internal call sites have already been updated to match the new signatures. There are no other implementors within our codebase.

• If you’re bumping this crate’s version, treat this as a breaking change:
– Update the changelog to note the signature change.
– Increment the major version (per SemVer) so downstream users get a clear upgrade signal.
• No further internal changes are needed—everything compiles and tests pass with the new signatures.
• External consumers who provided their own impl ChatCompletionAggregator for _ will need to update their implementations.

@paulhendricks paulhendricks self-requested a review August 22, 2025 20:08
@ayushag-nv ayushag-nv enabled auto-merge (squash) August 22, 2025 20:11
@ayushag-nv ayushag-nv merged commit cbe854f into main Aug 22, 2025
10 checks passed
@ayushag-nv ayushag-nv deleted the ayushag/worker-to-mdc-parser-info branch August 22, 2025 20:29
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025