feat: [vLLM] implement cli args for tool and reasoning parsers #2619
Conversation
Looks good so far
Walkthrough

Adds optional parser names (tool_call_parser, reasoning_parser) to config/CLI, propagates them into ModelRuntimeConfig, exposes them via bindings/manager, introduces StreamArgs carrying these options, and threads StreamArgs through OpenAI HTTP handlers and aggregators to influence tool-call parsing. Minor whitespace/logging tweaks and test updates accompany signature changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant CLI as CLI
participant VArgs as VLLM Args/Config
participant VM as VLLM Main
participant RT as ModelRuntimeConfig
participant HTTP as OpenAI HTTP Service
participant Prot as StreamArgs
participant Agg as Aggregators
participant Pars as Parsers
User->>CLI: Start with flags (--dyn-...-parser)
CLI->>VArgs: parse_args()
VArgs->>VM: Config{tool_call_parser, reasoning_parser}
VM->>RT: set tool_call_parser / reasoning_parser
User->>HTTP: Completion/Chat request (model)
HTTP->>RT: lookup model runtime_config
HTTP->>Prot: get_stream_args(model)
note right of Prot: StreamArgs{tool_call_parser,<br/>reasoning_parser}
HTTP->>Agg: from_annotated_stream(stream, StreamArgs)
Agg->>Pars: try_tool_call_parse_aggregate(StreamArgs.tool_call_parser)
Pars-->>Agg: Parsed tool calls (optional)
Agg-->>HTTP: Aggregated response
HTTP-->>User: Final response
```
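In code, the path sketched above boils down to roughly the following. This is a condensed sketch assembled from the signatures cited later in this review (`get_model_tool_call_parser`, `StreamArgs::new`, `from_annotated_stream`); the HTTP layer and error handling are omitted, and the surrounding setup (`state`, `model`, `stream`) is assumed.

```rust
// Per-model parser names come from the model manager (None => default parsing).
let tool_call_parser = state.manager().get_model_tool_call_parser(model);

// Bundle them into StreamArgs; the reasoning parser is not wired up yet in this PR.
let stream_args = StreamArgs::new(tool_call_parser, None);

// Non-streaming handlers fold the annotated delta stream with those args,
// letting the aggregator consult the selected tool-call parser.
let response =
    NvCreateChatCompletionResponse::from_annotated_stream(stream, stream_args).await?;
```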
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (1)
181-201: Avoid logging raw tool-call arguments; log parse errors instead.

Logging full JSON arguments can leak PII at debug level. Also, parse errors are silently dropped, making issues hard to diagnose.
Apply this diff to redact arguments and surface parse failures at debug:
```diff
-        if choice.tool_calls.is_none() {
-            if let Ok(tool_calls) = try_tool_call_parse_aggregate(
-                &choice.text,
-                extra_stream_args.tool_call_parser.as_deref(),
-            ) {
+        if choice.tool_calls.is_none() {
+            match try_tool_call_parse_aggregate(
+                &choice.text,
+                extra_stream_args.tool_call_parser.as_deref(),
+            ) {
+                Ok(tool_calls) => {
                     if tool_calls.is_empty() {
                         continue;
                     }
                     for tool_call in &tool_calls {
-                        tracing::debug!(
-                            tool_call_id = %tool_call.id,
-                            function_name = %tool_call.function.name,
-                            arguments = %tool_call.function.arguments,
-                            "Parsed structured tool call from aggregated content"
-                        );
+                        tracing::debug!(
+                            tool_call_id = %tool_call.id,
+                            function_name = %tool_call.function.name,
+                            args_len = tool_call.function.arguments.len(),
+                            "Parsed structured tool call from aggregated content"
+                        );
                     }
                     choice.tool_calls = Some(tool_calls);
                     choice.text.clear();
                     choice.finish_reason =
                         Some(dynamo_async_openai::types::FinishReason::ToolCalls);
-            }
+                }
+                Err(e) => {
+                    tracing::debug!(
+                        error = %e,
+                        "Tool-call parsing failed; leaving content as-is"
+                    );
+                }
+            }
         }
```
🧹 Nitpick comments (21)
lib/parsers/src/tool_calling/tools.rs (2)
17-21: Lower log level and use structured fields to avoid noisy logs and Option formatting

These logs sit on a hot path and will emit per parse attempt. Prefer debug level and structured fields, and avoid printing `Option` as `Some("...")`.

Apply:
```diff
- if parser_str.is_none() {
-     tracing::info!("No tool parser provided. Trying parsing with default parser.");
- } else {
-     tracing::info!("Using tool parser: {:?}", parser_str);
- }
+ match parser_str {
+     Some(p) => tracing::debug!(parser = %p, "Using tool parser for aggregation"),
+     None => tracing::debug!("No tool parser provided; default parser will be used"),
+ }
```
45-48: Consider consistent logging in stream path (mirrors aggregate path)

For parity and easier troubleshooting, add the same (debug-level) structured log at the start of `try_tool_call_parse_stream`.

Example:
```rust
pub fn try_tool_call_parse_stream(
    message: &str,
    parser_str: Option<&str>,
) -> anyhow::Result<Vec<dynamo_async_openai::types::ChatCompletionMessageToolCallChunk>> {
    match parser_str {
        Some(p) => tracing::debug!(parser = %p, "Using tool parser for streaming"),
        None => tracing::debug!("No tool parser provided; default parser will be used"),
    }
    let parsed = detect_and_parse_tool_call(message, parser_str)?;
    // ...
}
```

lib/llm/src/local_model/runtime_config.rs (1)
16-19: Document new fields (optionally skip serializing None)

Good additive fields; serde with `Option<String>` remains backward-compatible. Consider brief docs for expected values. Optionally, skip serializing `None` to avoid `null` noise in etcd snapshots.

```diff
-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier for this model. If None, default parsing is used.
+    pub tool_call_parser: Option<String>,
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier for this model. Reserved for future use.
+    pub reasoning_parser: Option<String>,
```

components/backends/vllm/src/dynamo/vllm/main.py (1)
237-239: Normalize empty CLI values to None to avoid passing "" downstream

If a user passes an empty string (e.g., via env plumbing), we currently propagate `""`, which could be interpreted as “use a parser named empty-string.” Normalize to `None`.

```diff
- runtime_config.tool_call_parser = config.tool_call_parser
- runtime_config.reasoning_parser = config.reasoning_parser
+ # Normalize empty strings to None to avoid passing "" to downstream parsers
+ runtime_config.tool_call_parser = config.tool_call_parser or None
+ runtime_config.reasoning_parser = config.reasoning_parser or None
```

lib/llm/src/discovery/model_manager.rs (3)
250-259: Avoid redundant clone/to_string — return the existing String

`config.tool_call_parser.clone()` already yields `Option<String>`. The subsequent `.map(|parser| parser.to_string())` needlessly clones.

```diff
-            .and_then(|config| config.tool_call_parser.clone())
-            .map(|parser| parser.to_string())
+            .and_then(|config| config.tool_call_parser.clone())
```
250-259: Match by slug as well as display name to prevent lookup mismatches

Callers may pass either the served model slug or display name. Matching only on `entry.name == model` can miss entries. Consider comparing slugs too.

```diff
 pub fn get_model_tool_call_parser(&self, model: &str) -> Option<String> {
-    self.entries
+    let model_slug = Slug::from_string(model);
+    self.entries
         .lock()
         .unwrap()
         .values()
-        .find(|entry| entry.name == model)
+        .find(|entry| entry.name == model || Slug::from_string(&entry.name) == model_slug)
         .and_then(|entry| entry.runtime_config.as_ref())
         .and_then(|config| config.tool_call_parser.clone())
 }
```
250-259: Add symmetric accessor for reasoning parser (future-proofing)

For API symmetry and future usage, consider `get_model_reasoning_parser(&self, model: &str) -> Option<String>` alongside the tool-call accessor. I can follow up with a small patch if desired.
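A minimal sketch of that accessor, mirroring the `get_model_tool_call_parser` body shown above (hypothetical until such a patch lands):

```rust
// Look up the per-model reasoning parser name, if one was configured.
pub fn get_model_reasoning_parser(&self, model: &str) -> Option<String> {
    self.entries
        .lock()
        .unwrap()
        .values()
        .find(|entry| entry.name == model)
        .and_then(|entry| entry.runtime_config.as_ref())
        .and_then(|config| config.reasoning_parser.clone())
}
```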
lib/llm/src/protocols/openai.rs (1)

197-211: StreamArgs: solid public surface; add docs and minor ergonomics (PartialEq/Eq, is_empty).

These fields will flow through many call sites. A tiny polish helps testing and introspection.

Apply this diff to add docs, derive PartialEq/Eq, and a convenience helper:
```diff
-#[derive(Clone, Debug, Serialize, Deserialize, Default)]
+#[derive(Clone, Debug, Serialize, Deserialize, Default, PartialEq, Eq)]
 pub struct StreamArgs {
-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier, e.g. "hermes", "llama3_json".
+    pub tool_call_parser: Option<String>,
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier (non-breaking; may be unused for now).
+    pub reasoning_parser: Option<String>,
 }

 impl StreamArgs {
     pub fn new(tool_call_parser: Option<String>, reasoning_parser: Option<String>) -> Self {
         Self {
             tool_call_parser,
             reasoning_parser,
         }
     }
+
+    /// Convenience: true when no extra stream behavior is requested.
+    pub fn is_empty(&self) -> bool {
+        self.tool_call_parser.is_none() && self.reasoning_parser.is_none()
+    }
 }
```
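If the helper is adopted, a quick sanity check of the proposed surface (hypothetical until the diff above is applied; `new` exists today, `is_empty` is only proposed):

```rust
// `new` bundles the per-model parser names; `is_empty` is the proposed helper.
let args = StreamArgs::new(Some("hermes".to_string()), None);
assert!(!args.is_empty());
assert!(StreamArgs::default().is_empty());
```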
components/backends/vllm/src/dynamo/vllm/args.py (3)

109-109: Nit: typo in comment. “adoped” → “adopted”.

```diff
-    # To avoid name conflicts with different backends, adoped prefix "dyn-" for dynamo specific args
+    # To avoid name conflicts with different backends, adopted prefix "dyn-" for Dynamo-specific args
```
111-121: Add CLI validation (choices) to prevent typos; optionally log selection.

Users will benefit from early validation for known tool-call parsers. Suggest adding argparse choices and a brief log for visibility.

```diff
-    parser.add_argument(
-        "--dyn-tool-call-parser",
-        type=str,
-        default=None,
-        help="Tool call parser name for the model. Available options: 'hermes', 'nemotron_deci', 'llama3_json', 'mistral', 'phi4'.",
-    )
+    parser.add_argument(
+        "--dyn-tool-call-parser",
+        type=str,
+        choices=["hermes", "nemotron_deci", "llama3_json", "mistral", "phi4"],
+        default=None,
+        help="Tool call parser name for the model. Options: hermes, nemotron_deci, llama3_json, mistral, phi4.",
+    )
 @@
-    parser.add_argument(
+    parser.add_argument(
         "--dyn-reasoning-parser",
-        type=str,
+        type=str,
         default=None,
         help="Reasoning parser name for the model.",
     )
```

Optionally log the selected values after assignment (see below on lines 171-173).
171-173: Surface chosen parser selections in logs.

A single info/debug log aids debugging deployments.

```diff
     config.tool_call_parser = args.dyn_tool_call_parser
     config.reasoning_parser = args.dyn_reasoning_parser
+    if config.tool_call_parser:
+        logger.info(f"Using tool call parser: {config.tool_call_parser}")
+    if config.reasoning_parser:
+        logger.info(f"Using reasoning parser: {config.reasoning_parser}")
```

lib/bindings/python/rust/llm/local_model.rs (2)
37-45: Add Python-visible docs on new setters to reduce misuse.

Docstrings help users discover valid values and the ability to pass None.

```diff
 #[setter]
 fn set_tool_call_parser(&mut self, tool_call_parser: Option<String>) {
+    /// Set the tool-call parser name (e.g., "hermes", "llama3_json"). Pass None to clear.
     self.inner.tool_call_parser = tool_call_parser;
 }

 #[setter]
 fn set_reasoning_parser(&mut self, reasoning_parser: Option<String>) {
+    /// Set the reasoning parser name (if supported by the backend). Pass None to clear.
     self.inner.reasoning_parser = reasoning_parser;
 }
```
70-78: Mirror docs on getters; no functional issues.

Returning clones is fine. Add brief doc comments for Python help() output.

```diff
 #[getter]
 fn tool_call_parser(&self) -> Option<String> {
+    /// Get the configured tool-call parser name, if any.
     self.inner.tool_call_parser.clone()
 }

 #[getter]
 fn reasoning_parser(&self) -> Option<String> {
+    /// Get the configured reasoning parser name, if any.
     self.inner.reasoning_parser.clone()
 }
```
41-44: Add one test that exercises non-default StreamArgs.

Current tests only use StreamArgs::default(). Add a smoke test passing a non-default parser to guard API wiring (even if behavior is currently a no-op).

```diff
 @@
 #[tokio::test]
 async fn test_openai_chat_stream() {
 @@
 }
+
+#[tokio::test]
+async fn test_openai_chat_stream_with_stream_args() {
+    let data = std::fs::read_to_string("tests/data/replays/meta/llama-3.1-8b-instruct/chat_completions/chat-completion.streaming.1").unwrap();
+    let stream = create_message_stream(&data).take(8);
+    let args = StreamArgs {
+        tool_call_parser: Some("hermes".to_string()),
+        reasoning_parser: None,
+    };
+    let result = NvCreateChatCompletionResponse::from_sse_stream(Box::pin(stream), args)
+        .await
+        .unwrap();
+    assert!(result.choices.first().is_some());
+}
```

Also applies to: 65-68, 86-89, 107-109, 122-124
lib/llm/src/http/service/openai.rs (1)
198-203: get_stream_args: OK; consider logging and wiring reasoning when available.

Minor improvement: add a debug log so we can trace which parser was applied per model. Reasoning parser TODO noted.

```diff
 fn get_stream_args(state: &Arc<service_v2::State>, model: &str) -> StreamArgs {
     let tool_call_parser = state.manager().get_model_tool_call_parser(model);
     let reasoning_parser = None; // TODO: Implement reasoning parser
-    StreamArgs::new(tool_call_parser, reasoning_parser)
+    let args = StreamArgs::new(tool_call_parser, reasoning_parser);
+    tracing::debug!(model, tool_call_parser = ?args.tool_call_parser, reasoning_parser = ?args.reasoning_parser, "Constructed StreamArgs");
+    args
 }
```

lib/llm/src/protocols/openai/completions/aggregator.rs (3)
67-72: extra_stream_args is only logged; avoid noisy logs and keep PII surface minimal.

You don’t use extra_stream_args yet (expected), but the debug log is easy to forget. Prefer trace-level and structured fields, or remove.

Apply one of the following diffs (preferred: trace):

```diff
- tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+ tracing::trace!(
+     tool_call_parser = ?extra_stream_args.tool_call_parser,
+     "completions: received StreamArgs"
+ ); // TODO: downgrade or remove once completion has tool-call support
```

Or, remove entirely:

```diff
- tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+ // TODO: add usage when completion gets tool-call support
```
117-129: Minor comment typos (“return”/“conversion”).

Nit-level polish for readability.

```diff
- // to be return as part of the NIM Response Extension
+ // to be returned as part of the NIM Response Extension
 @@
- // Handle CompletionFinishReason -> FinishReason conversation
+ // Handle CompletionFinishReason -> FinishReason conversion
```
290-299: Duplicate assertion in test_single_delta.

There are two identical assertions for choice.finish_reason.

```diff
-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
+        assert_eq!(
+            choice.finish_reason,
+            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
+        );
```
139-147: Make role aggregation resilient to late-arriving roles.

If the first delta for a choice lacks role, we keep None and later panic in From via expect(). Update role when a later delta provides it.

```diff
 let state_choice = aggregator
     .choices
     .entry(choice.index)
     .or_insert(DeltaChoice {
         index: choice.index,
         text: "".to_string(),
         role: choice.delta.role,
         finish_reason: None,
         logprobs: choice.logprobs,
         tool_calls: None,
         reasoning_content: None,
     });
+
+// If role arrived late in the stream, adopt it.
+if state_choice.role.is_none() && choice.delta.role.is_some() {
+    state_choice.role = choice.delta.role;
+}
```
229-255: Avoid panic if role is still missing at the end.

Default to Assistant rather than expect()-panic to keep the aggregator robust.

```diff
-    role: delta.role.expect("delta should have a Role"),
+    role: delta
+        .role
+        .unwrap_or(dynamo_async_openai::types::Role::Assistant),
```
541-597: Add a test where finish_reason is not ToolCalls but content parses as a tool-call.

This ensures we set finish_reason = ToolCalls on successful parse even if upstream didn’t.

I can add a test like below:

```diff
 @@
 #[tokio::test]
 async fn test_tool_calling_output() {
 @@
 }
+
+#[tokio::test]
+async fn test_tool_calling_when_finish_reason_is_none() {
+    let tool_call_json = r#"{"name":"get_weather","arguments":{"location":"SF"}}"#;
+    let annotated_delta = create_test_delta(
+        0,
+        tool_call_json,
+        Some(dynamo_async_openai::types::Role::Assistant),
+        None, // upstream didn't mark ToolCalls
+    );
+    let data = annotated_delta.data.unwrap();
+    let annotated_delta = Annotated { data: Some(data), id: Some("test_id".into()), event: None, comment: None };
+    let stream = Box::pin(stream::iter(vec![annotated_delta]));
+    let result = DeltaAggregator::apply(stream, StreamArgs::default()).await;
+    assert!(result.is_ok());
+    let response = result.unwrap();
+    assert_eq!(response.choices.len(), 1);
+    let choice = &response.choices[0];
+    assert!(choice.message.tool_calls.is_some());
+    assert!(choice.message.content.is_none());
+    assert_eq!(
+        choice.finish_reason,
+        Some(dynamo_async_openai::types::FinishReason::ToolCalls)
+    );
+}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)

- components/backends/vllm/src/dynamo/vllm/args.py (3 hunks)
- components/backends/vllm/src/dynamo/vllm/main.py (1 hunk)
- lib/bindings/python/rust/llm/local_model.rs (2 hunks)
- lib/llm/src/discovery/model_manager.rs (1 hunk)
- lib/llm/src/http/service/openai.rs (8 hunks)
- lib/llm/src/local_model.rs (2 hunks)
- lib/llm/src/local_model/runtime_config.rs (1 hunk)
- lib/llm/src/preprocessor.rs (0 hunks)
- lib/llm/src/protocols/openai.rs (1 hunk)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (10 hunks)
- lib/llm/src/protocols/openai/completions/aggregator.rs (7 hunks)
- lib/llm/tests/aggregators.rs (6 hunks)
- lib/parsers/src/tool_calling/tools.rs (1 hunk)
💤 Files with no reviewable changes (1)
- lib/llm/src/preprocessor.rs
🧰 Additional context used
🧬 Code graph analysis (7)
components/backends/vllm/src/dynamo/vllm/main.py (2)
- lib/llm/src/local_model.rs (2): runtime_config (165-168), runtime_config (355-357)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)

components/backends/vllm/src/dynamo/vllm/args.py (2)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)
- lib/llm/src/local_model.rs (1): default (65-85)

lib/llm/src/protocols/openai.rs (1)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)

lib/llm/src/protocols/openai/completions/aggregator.rs (3)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (4): convert_sse_stream (300-300), from_annotated_stream (269-272), from_annotated_stream (289-294), apply (102-226)
- lib/llm/src/protocols.rs (1): convert_sse_stream (51-67)
- lib/bindings/python/rust/llm/local_model.rs (1): tool_call_parser (71-73)

lib/llm/tests/aggregators.rs (2)
- lib/llm/src/protocols/openai/completions/aggregator.rs (2): from_sse_stream (182-188), default (49-51)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3): from_sse_stream (282-285), from_sse_stream (296-302), default (73-75)

lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
- lib/llm/src/protocols/openai/completions/aggregator.rs (5): convert_sse_stream (186-186), from_annotated_stream (190-195), apply (68-165), from_sse_stream (182-188), default (49-51)
- lib/llm/src/protocols.rs (1): convert_sse_stream (51-67)
- lib/parsers/src/tool_calling/tools.rs (1): try_tool_call_parse_aggregate (13-39)

lib/llm/src/http/service/openai.rs (4)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)
- lib/llm/src/protocols/openai/completions/aggregator.rs (2): new (55-65), from_annotated_stream (190-195)
- lib/llm/src/protocols/openai.rs (1): new (205-210)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3): new (80-91), from_annotated_stream (269-272), from_annotated_stream (289-294)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (10)
lib/llm/src/local_model.rs (1)
205-205: No-op whitespace changes — fine to keep

The blank lines improve readability slightly and have no functional impact.

Also applies to: 396-396
lib/llm/tests/aggregators.rs (1)
21-22: Public re-export of StreamArgs used correctly.

Import path and usage look good.
lib/llm/src/http/service/openai.rs (3)
40-41: Import of StreamArgs is appropriate.

No issues.
278-279: Completions: threading StreamArgs into the fold path looks correct.

This ensures non-streaming completions can leverage the selected parser.

Also applies to: 338-347
507-508: LGTM on retrieving and passing StreamArgs.

Non-streaming paths now consistently receive per-model parsing configuration.

Also applies to: 744-745
lib/llm/src/protocols/openai/completions/aggregator.rs (2)
25-28: Good plumbing: StreamArgs and convert_sse_stream imports are consistent with chat path.

This aligns the completions path with chat and keeps the API surface uniform.
181-195: Signature propagation is correct.

from_sse_stream and from_annotated_stream cleanly thread StreamArgs through to the aggregator.
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
22-25: Imports for convert_sse_stream and StreamArgs look right.

Matches the completions path and keeps the API consistent.
103-105: Added StreamArgs to apply signature — good threading of parser config.

This unlocks model-specific tool-call parsing without altering aggregation semantics.
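For reference, a call site under the new signature looks like this (shape taken from this PR's tests; the annotated stream setup is assumed):

```rust
// Aggregate an annotated delta stream; default StreamArgs means no extra parsing.
let response = DeltaAggregator::apply(stream, StreamArgs::default()).await?;
```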
269-299: No other internal implementations — breaking change is limited

I ran a repo-wide search for all `impl ChatCompletionAggregator` and direct calls to `from_annotated_stream`/`from_sse_stream`. The only trait implementation in this crate is in lib/llm/src/protocols/openai/chat_completions/aggregator.rs (lines 288–299), and all internal call sites have already been updated to match the new signatures. There are no other implementors within our codebase.

- If you’re bumping this crate’s version, treat this as a breaking change:
  - Update the changelog to note the signature change.
  - Increment the major version (per SemVer) so downstream users get a clear upgrade signal.
- No further internal changes are needed — everything compiles and tests pass with the new signatures.
- External consumers who provided their own `impl ChatCompletionAggregator for _` will need to update their implementations.
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
Overview:
- Adds two CLI args to the vLLM backend worker to enable user-provided tool and reasoning parsing.
- These names were chosen because individual backends have their own CLI arg names, and we don't want them to conflict.
- These args are passed through and consumed in the frontend to do tool and reasoning parsing.
- The current PR scope is just tool-call parsing and vLLM.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Improvements
Tests