feat: [vLLM] implement cli args for tool and reasoning parsers #2619
Conversation
Looks good so far
Walkthrough

Adds optional parser names (tool_call_parser, reasoning_parser) to config/CLI, propagates them into ModelRuntimeConfig, exposes them via bindings/manager, introduces StreamArgs carrying these options, and threads StreamArgs through OpenAI HTTP handlers and aggregators to influence tool-call parsing. Minor whitespace/logging tweaks and test updates accompany signature changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant CLI as CLI
participant VArgs as VLLM Args/Config
participant VM as VLLM Main
participant RT as ModelRuntimeConfig
participant HTTP as OpenAI HTTP Service
participant Prot as StreamArgs
participant Agg as Aggregators
participant Pars as Parsers
User->>CLI: Start with flags (--dyn-...-parser)
CLI->>VArgs: parse_args()
VArgs->>VM: Config{tool_call_parser, reasoning_parser}
VM->>RT: set tool_call_parser / reasoning_parser
User->>HTTP: Completion/Chat request (model)
HTTP->>RT: lookup model runtime_config
HTTP->>Prot: get_stream_args(model)
note right of Prot: StreamArgs{tool_call_parser,<br/>reasoning_parser}
HTTP->>Agg: from_annotated_stream(stream, StreamArgs)
Agg->>Pars: try_tool_call_parse_aggregate(StreamArgs.tool_call_parser)
Pars-->>Agg: Parsed tool calls (optional)
Agg-->>HTTP: Aggregated response
HTTP-->>User: Final response
```
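In code, the path sketched above boils down to roughly the following. This is a condensed sketch assembled from the signatures cited later in this review (`get_model_tool_call_parser`, `StreamArgs::new`, `from_annotated_stream`); the HTTP layer and error handling are omitted, and the surrounding setup (`state`, `model`, `stream`) is assumed.

```rust
// Per-model parser names come from the model manager (None => default parsing).
let tool_call_parser = state.manager().get_model_tool_call_parser(model);

// Bundle them into StreamArgs; the reasoning parser is not wired up yet in this PR.
let stream_args = StreamArgs::new(tool_call_parser, None);

// Non-streaming handlers fold the annotated delta stream with those args,
// letting the aggregator consult the selected tool-call parser.
let response =
    NvCreateChatCompletionResponse::from_annotated_stream(stream, stream_args).await?;
```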
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (1)
181-201: Avoid logging raw tool-call arguments; log parse errors instead.

Logging full JSON arguments can leak PII at debug level. Also, parse errors are silently dropped, making issues hard to diagnose.
Apply this diff to redact arguments and surface parse failures at debug:
```diff
-        if choice.tool_calls.is_none() {
-            if let Ok(tool_calls) = try_tool_call_parse_aggregate(
-                &choice.text,
-                extra_stream_args.tool_call_parser.as_deref(),
-            ) {
+        if choice.tool_calls.is_none() {
+            match try_tool_call_parse_aggregate(
+                &choice.text,
+                extra_stream_args.tool_call_parser.as_deref(),
+            ) {
+                Ok(tool_calls) => {
                     if tool_calls.is_empty() {
                         continue;
                     }
                     for tool_call in &tool_calls {
-                        tracing::debug!(
-                            tool_call_id = %tool_call.id,
-                            function_name = %tool_call.function.name,
-                            arguments = %tool_call.function.arguments,
-                            "Parsed structured tool call from aggregated content"
-                        );
+                        tracing::debug!(
+                            tool_call_id = %tool_call.id,
+                            function_name = %tool_call.function.name,
+                            args_len = tool_call.function.arguments.len(),
+                            "Parsed structured tool call from aggregated content"
+                        );
                     }
                     choice.tool_calls = Some(tool_calls);
                     choice.text.clear();
                     choice.finish_reason =
                         Some(dynamo_async_openai::types::FinishReason::ToolCalls);
-            }
+                }
+                Err(e) => {
+                    tracing::debug!(
+                        error = %e,
+                        "Tool-call parsing failed; leaving content as-is"
+                    );
+                }
+            }
         }
```
🧹 Nitpick comments (21)
lib/parsers/src/tool_calling/tools.rs (2)
17-21: Lower log level and use structured fields to avoid noisy logs and Option formatting

These logs sit on a hot path and will emit per parse attempt. Prefer debug level and structured fields, and avoid printing `Option` as `Some("...")`.

Apply:
```diff
- if parser_str.is_none() {
-     tracing::info!("No tool parser provided. Trying parsing with default parser.");
- } else {
-     tracing::info!("Using tool parser: {:?}", parser_str);
- }
+ match parser_str {
+     Some(p) => tracing::debug!(parser = %p, "Using tool parser for aggregation"),
+     None => tracing::debug!("No tool parser provided; default parser will be used"),
+ }
```
45-48: Consider consistent logging in stream path (mirrors aggregate path)

For parity and easier troubleshooting, add the same (debug-level) structured log at the start of `try_tool_call_parse_stream`.

Example:
```rust
pub fn try_tool_call_parse_stream(
    message: &str,
    parser_str: Option<&str>,
) -> anyhow::Result<Vec<dynamo_async_openai::types::ChatCompletionMessageToolCallChunk>> {
    match parser_str {
        Some(p) => tracing::debug!(parser = %p, "Using tool parser for streaming"),
        None => tracing::debug!("No tool parser provided; default parser will be used"),
    }
    let parsed = detect_and_parse_tool_call(message, parser_str)?;
    // ...
}
```

lib/llm/src/local_model/runtime_config.rs (1)
16-19: Document new fields (optionally skip serializing None)

Good additive fields; serde with `Option<String>` remains backward-compatible. Consider brief docs for expected values. Optionally, skip serializing `None` to avoid `null` noise in etcd snapshots.

```diff
-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier for this model. If None, default parsing is used.
+    pub tool_call_parser: Option<String>,
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier for this model. Reserved for future use.
+    pub reasoning_parser: Option<String>,
```

components/backends/vllm/src/dynamo/vllm/main.py (1)
237-239: Normalize empty CLI values to None to avoid passing "" downstream

If a user passes an empty string (e.g., via env plumbing), we currently propagate `""`, which could be interpreted as “use a parser named empty-string.” Normalize to `None`.

```diff
- runtime_config.tool_call_parser = config.tool_call_parser
- runtime_config.reasoning_parser = config.reasoning_parser
+ # Normalize empty strings to None to avoid passing "" to downstream parsers
+ runtime_config.tool_call_parser = config.tool_call_parser or None
+ runtime_config.reasoning_parser = config.reasoning_parser or None
```

lib/llm/src/discovery/model_manager.rs (3)
250-259: Avoid redundant clone/to_string — return the existing String

`config.tool_call_parser.clone()` already yields `Option<String>`. The subsequent `.map(|parser| parser.to_string())` needlessly clones.

```diff
-            .and_then(|config| config.tool_call_parser.clone())
-            .map(|parser| parser.to_string())
+            .and_then(|config| config.tool_call_parser.clone())
```
250-259: Match by slug as well as display name to prevent lookup mismatches

Callers may pass either the served model slug or display name. Matching only on `entry.name == model` can miss entries. Consider comparing slugs too.

```diff
 pub fn get_model_tool_call_parser(&self, model: &str) -> Option<String> {
-    self.entries
+    let model_slug = Slug::from_string(model);
+    self.entries
         .lock()
         .unwrap()
         .values()
-        .find(|entry| entry.name == model)
+        .find(|entry| entry.name == model || Slug::from_string(&entry.name) == model_slug)
         .and_then(|entry| entry.runtime_config.as_ref())
         .and_then(|config| config.tool_call_parser.clone())
 }
```
250-259: Add symmetric accessor for reasoning parser (future-proofing)

For API symmetry and future usage, consider `get_model_reasoning_parser(&self, model: &str) -> Option<String>` alongside the tool-call accessor. I can follow up with a small patch if desired.
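A minimal sketch of that accessor, mirroring the `get_model_tool_call_parser` body shown above (hypothetical until such a patch lands):

```rust
// Look up the per-model reasoning parser name, if one was configured.
pub fn get_model_reasoning_parser(&self, model: &str) -> Option<String> {
    self.entries
        .lock()
        .unwrap()
        .values()
        .find(|entry| entry.name == model)
        .and_then(|entry| entry.runtime_config.as_ref())
        .and_then(|config| config.reasoning_parser.clone())
}
```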
lib/llm/src/protocols/openai.rs (1)

197-211: StreamArgs: solid public surface; add docs and minor ergonomics (PartialEq/Eq, is_empty).

These fields will flow through many call sites. A tiny polish helps testing and introspection.

Apply this diff to add docs, derive PartialEq/Eq, and a convenience helper:
```diff
-#[derive(Clone, Debug, Serialize, Deserialize, Default)]
+#[derive(Clone, Debug, Serialize, Deserialize, Default, PartialEq, Eq)]
 pub struct StreamArgs {
-    pub tool_call_parser: Option<String>,
+    /// Optional tool-call parser identifier, e.g. "hermes", "llama3_json".
+    pub tool_call_parser: Option<String>,
-    pub reasoning_parser: Option<String>,
+    /// Optional reasoning parser identifier (non-breaking; may be unused for now).
+    pub reasoning_parser: Option<String>,
 }

 impl StreamArgs {
     pub fn new(tool_call_parser: Option<String>, reasoning_parser: Option<String>) -> Self {
         Self {
             tool_call_parser,
             reasoning_parser,
         }
     }
+
+    /// Convenience: true when no extra stream behavior is requested.
+    pub fn is_empty(&self) -> bool {
+        self.tool_call_parser.is_none() && self.reasoning_parser.is_none()
+    }
 }
```
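If the helper is adopted, a quick sanity check of the proposed surface (hypothetical until the diff above is applied; `new` exists today, `is_empty` is only proposed):

```rust
// `new` bundles the per-model parser names; `is_empty` is the proposed helper.
let args = StreamArgs::new(Some("hermes".to_string()), None);
assert!(!args.is_empty());
assert!(StreamArgs::default().is_empty());
```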
components/backends/vllm/src/dynamo/vllm/args.py (3)

109-109: Nit: typo in comment. “adoped” → “adopted”.

```diff
-    # To avoid name conflicts with different backends, adoped prefix "dyn-" for dynamo specific args
+    # To avoid name conflicts with different backends, adopted prefix "dyn-" for Dynamo-specific args
```
111-121: Add CLI validation (choices) to prevent typos; optionally log selection.

Users will benefit from early validation for known tool-call parsers. Suggest adding argparse choices and a brief log for visibility.

```diff
-    parser.add_argument(
-        "--dyn-tool-call-parser",
-        type=str,
-        default=None,
-        help="Tool call parser name for the model. Available options: 'hermes', 'nemotron_deci', 'llama3_json', 'mistral', 'phi4'.",
-    )
+    parser.add_argument(
+        "--dyn-tool-call-parser",
+        type=str,
+        choices=["hermes", "nemotron_deci", "llama3_json", "mistral", "phi4"],
+        default=None,
+        help="Tool call parser name for the model. Options: hermes, nemotron_deci, llama3_json, mistral, phi4.",
+    )
 @@
-    parser.add_argument(
+    parser.add_argument(
         "--dyn-reasoning-parser",
-        type=str,
+        type=str,
         default=None,
         help="Reasoning parser name for the model.",
     )
```

Optionally log the selected values after assignment (see below on lines 171-173).
171-173: Surface chosen parser selections in logs.

A single info/debug log aids debugging deployments.

```diff
     config.tool_call_parser = args.dyn_tool_call_parser
     config.reasoning_parser = args.dyn_reasoning_parser
+    if config.tool_call_parser:
+        logger.info(f"Using tool call parser: {config.tool_call_parser}")
+    if config.reasoning_parser:
+        logger.info(f"Using reasoning parser: {config.reasoning_parser}")
```

lib/bindings/python/rust/llm/local_model.rs (2)
37-45: Add Python-visible docs on new setters to reduce misuse.

Docstrings help users discover valid values and the ability to pass None.

```diff
 #[setter]
 fn set_tool_call_parser(&mut self, tool_call_parser: Option<String>) {
+    /// Set the tool-call parser name (e.g., "hermes", "llama3_json"). Pass None to clear.
     self.inner.tool_call_parser = tool_call_parser;
 }

 #[setter]
 fn set_reasoning_parser(&mut self, reasoning_parser: Option<String>) {
+    /// Set the reasoning parser name (if supported by the backend). Pass None to clear.
     self.inner.reasoning_parser = reasoning_parser;
 }
```
70-78: Mirror docs on getters; no functional issues.

Returning clones is fine. Add brief doc comments for Python help() output.

```diff
 #[getter]
 fn tool_call_parser(&self) -> Option<String> {
+    /// Get the configured tool-call parser name, if any.
     self.inner.tool_call_parser.clone()
 }

 #[getter]
 fn reasoning_parser(&self) -> Option<String> {
+    /// Get the configured reasoning parser name, if any.
     self.inner.reasoning_parser.clone()
 }
```
41-44: Add one test that exercises non-default StreamArgs.

Current tests only use StreamArgs::default(). Add a smoke test passing a non-default parser to guard API wiring (even if behavior is currently a no-op).

```diff
 @@
 #[tokio::test]
 async fn test_openai_chat_stream() {
 @@
 }
+
+#[tokio::test]
+async fn test_openai_chat_stream_with_stream_args() {
+    let data = std::fs::read_to_string("tests/data/replays/meta/llama-3.1-8b-instruct/chat_completions/chat-completion.streaming.1").unwrap();
+    let stream = create_message_stream(&data).take(8);
+    let args = StreamArgs {
+        tool_call_parser: Some("hermes".to_string()),
+        reasoning_parser: None,
+    };
+    let result = NvCreateChatCompletionResponse::from_sse_stream(Box::pin(stream), args)
+        .await
+        .unwrap();
+    assert!(result.choices.first().is_some());
+}
```

Also applies to: 65-68, 86-89, 107-109, 122-124
lib/llm/src/http/service/openai.rs (1)
198-203: get_stream_args: OK; consider logging and wiring reasoning when available.

Minor improvement: add a debug log so we can trace which parser was applied per model. Reasoning parser TODO noted.

```diff
 fn get_stream_args(state: &Arc<service_v2::State>, model: &str) -> StreamArgs {
     let tool_call_parser = state.manager().get_model_tool_call_parser(model);
     let reasoning_parser = None; // TODO: Implement reasoning parser
-    StreamArgs::new(tool_call_parser, reasoning_parser)
+    let args = StreamArgs::new(tool_call_parser, reasoning_parser);
+    tracing::debug!(model, tool_call_parser = ?args.tool_call_parser, reasoning_parser = ?args.reasoning_parser, "Constructed StreamArgs");
+    args
 }
```

lib/llm/src/protocols/openai/completions/aggregator.rs (3)
67-72: extra_stream_args is only logged; avoid noisy logs and keep PII surface minimal.

You don’t use extra_stream_args yet (expected), but the debug log is easy to forget. Prefer trace-level and structured fields, or remove.

Apply one of the following diffs (preferred: trace):

```diff
- tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+ tracing::trace!(
+     tool_call_parser = ?extra_stream_args.tool_call_parser,
+     "completions: received StreamArgs"
+ ); // TODO: downgrade or remove once completion has tool-call support
```

Or, remove entirely:

```diff
- tracing::debug!("Tool Call Parser: {:?}", extra_stream_args.tool_call_parser); // TODO: remove this once completion has tool call support
+ // TODO: add usage when completion gets tool-call support
```
117-129: Minor comment typos (“return”/“conversion”).

Nit-level polish for readability.

```diff
- // to be return as part of the NIM Response Extension
+ // to be returned as part of the NIM Response Extension
 @@
- // Handle CompletionFinishReason -> FinishReason conversation
+ // Handle CompletionFinishReason -> FinishReason conversion
```
290-299: Duplicate assertion in test_single_delta.

There are two identical assertions for choice.finish_reason.

```diff
-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
-        assert_eq!(
-            choice.finish_reason,
-            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
-        );
+        assert_eq!(
+            choice.finish_reason,
+            Some(dynamo_async_openai::types::CompletionFinishReason::Length)
+        );
```
139-147: Make role aggregation resilient to late-arriving roles.

If the first delta for a choice lacks role, we keep None and later panic in From via expect(). Update role when a later delta provides it.

```diff
 let state_choice = aggregator
     .choices
     .entry(choice.index)
     .or_insert(DeltaChoice {
         index: choice.index,
         text: "".to_string(),
         role: choice.delta.role,
         finish_reason: None,
         logprobs: choice.logprobs,
         tool_calls: None,
         reasoning_content: None,
     });
+
+// If role arrived late in the stream, adopt it.
+if state_choice.role.is_none() && choice.delta.role.is_some() {
+    state_choice.role = choice.delta.role;
+}
```
229-255: Avoid panic if role is still missing at the end.

Default to Assistant rather than expect()-panic to keep the aggregator robust.

```diff
-    role: delta.role.expect("delta should have a Role"),
+    role: delta
+        .role
+        .unwrap_or(dynamo_async_openai::types::Role::Assistant),
```
541-597: Add a test where finish_reason is not ToolCalls but content parses as a tool-call.

This ensures we set finish_reason = ToolCalls on successful parse even if upstream didn’t.

I can add a test like below:

```diff
 @@
 #[tokio::test]
 async fn test_tool_calling_output() {
 @@
 }
+
+#[tokio::test]
+async fn test_tool_calling_when_finish_reason_is_none() {
+    let tool_call_json = r#"{"name":"get_weather","arguments":{"location":"SF"}}"#;
+    let annotated_delta = create_test_delta(
+        0,
+        tool_call_json,
+        Some(dynamo_async_openai::types::Role::Assistant),
+        None, // upstream didn't mark ToolCalls
+    );
+    let data = annotated_delta.data.unwrap();
+    let annotated_delta = Annotated { data: Some(data), id: Some("test_id".into()), event: None, comment: None };
+    let stream = Box::pin(stream::iter(vec![annotated_delta]));
+    let result = DeltaAggregator::apply(stream, StreamArgs::default()).await;
+    assert!(result.is_ok());
+    let response = result.unwrap();
+    assert_eq!(response.choices.len(), 1);
+    let choice = &response.choices[0];
+    assert!(choice.message.tool_calls.is_some());
+    assert!(choice.message.content.is_none());
+    assert_eq!(
+        choice.finish_reason,
+        Some(dynamo_async_openai::types::FinishReason::ToolCalls)
+    );
+}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)

- components/backends/vllm/src/dynamo/vllm/args.py (3 hunks)
- components/backends/vllm/src/dynamo/vllm/main.py (1 hunk)
- lib/bindings/python/rust/llm/local_model.rs (2 hunks)
- lib/llm/src/discovery/model_manager.rs (1 hunk)
- lib/llm/src/http/service/openai.rs (8 hunks)
- lib/llm/src/local_model.rs (2 hunks)
- lib/llm/src/local_model/runtime_config.rs (1 hunk)
- lib/llm/src/preprocessor.rs (0 hunks)
- lib/llm/src/protocols/openai.rs (1 hunk)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (10 hunks)
- lib/llm/src/protocols/openai/completions/aggregator.rs (7 hunks)
- lib/llm/tests/aggregators.rs (6 hunks)
- lib/parsers/src/tool_calling/tools.rs (1 hunk)
💤 Files with no reviewable changes (1)
- lib/llm/src/preprocessor.rs
🧰 Additional context used
🧬 Code graph analysis (7)
components/backends/vllm/src/dynamo/vllm/main.py (2)
- lib/llm/src/local_model.rs (2): runtime_config (165-168), runtime_config (355-357)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)

components/backends/vllm/src/dynamo/vllm/args.py (2)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)
- lib/llm/src/local_model.rs (1): default (65-85)

lib/llm/src/protocols/openai.rs (1)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)

lib/llm/src/protocols/openai/completions/aggregator.rs (3)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (4): convert_sse_stream (300-300), from_annotated_stream (269-272), from_annotated_stream (289-294), apply (102-226)
- lib/llm/src/protocols.rs (1): convert_sse_stream (51-67)
- lib/bindings/python/rust/llm/local_model.rs (1): tool_call_parser (71-73)

lib/llm/tests/aggregators.rs (2)
- lib/llm/src/protocols/openai/completions/aggregator.rs (2): from_sse_stream (182-188), default (49-51)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3): from_sse_stream (282-285), from_sse_stream (296-302), default (73-75)

lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
- lib/llm/src/protocols/openai/completions/aggregator.rs (5): convert_sse_stream (186-186), from_annotated_stream (190-195), apply (68-165), from_sse_stream (182-188), default (49-51)
- lib/llm/src/protocols.rs (1): convert_sse_stream (51-67)
- lib/parsers/src/tool_calling/tools.rs (1): try_tool_call_parse_aggregate (13-39)

lib/llm/src/http/service/openai.rs (4)
- lib/bindings/python/rust/llm/local_model.rs (2): tool_call_parser (71-73), reasoning_parser (76-78)
- lib/llm/src/protocols/openai/completions/aggregator.rs (2): new (55-65), from_annotated_stream (190-195)
- lib/llm/src/protocols/openai.rs (1): new (205-210)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3): new (80-91), from_annotated_stream (269-272), from_annotated_stream (289-294)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (10)
lib/llm/src/local_model.rs (1)
205-205: No-op whitespace changes — fine to keep

The blank lines improve readability slightly and have no functional impact.

Also applies to: 396-396
lib/llm/tests/aggregators.rs (1)
21-22: Public re-export of StreamArgs used correctly.

Import path and usage look good.
lib/llm/src/http/service/openai.rs (3)
40-41: Import of StreamArgs is appropriate.

No issues.
278-279: Completions: threading StreamArgs into the fold path looks correct.

This ensures non-streaming completions can leverage the selected parser.

Also applies to: 338-347
507-508: LGTM on retrieving and passing StreamArgs.

Non-streaming paths now consistently receive per-model parsing configuration.

Also applies to: 744-745
lib/llm/src/protocols/openai/completions/aggregator.rs (2)
25-28: Good plumbing: StreamArgs and convert_sse_stream imports are consistent with chat path.

This aligns the completions path with chat and keeps the API surface uniform.
181-195: Signature propagation is correct.

from_sse_stream and from_annotated_stream cleanly thread StreamArgs through to the aggregator.
lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3)
22-25: Imports for convert_sse_stream and StreamArgs look right.

Matches the completions path and keeps the API consistent.
103-105: Added StreamArgs to apply signature — good threading of parser config.

This unlocks model-specific tool-call parsing without altering aggregation semantics.
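For reference, a call site under the new signature looks like this (shape taken from this PR's tests; the annotated stream setup is assumed):

```rust
// Aggregate an annotated delta stream; default StreamArgs means no extra parsing.
let response = DeltaAggregator::apply(stream, StreamArgs::default()).await?;
```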
269-299: No other internal implementations — breaking change is limited

I ran a repo-wide search for all `impl ChatCompletionAggregator` and direct calls to `from_annotated_stream`/`from_sse_stream`. The only trait implementation in this crate is in lib/llm/src/protocols/openai/chat_completions/aggregator.rs (lines 288–299), and all internal call sites have already been updated to match the new signatures. There are no other implementors within our codebase.

- If you’re bumping this crate’s version, treat this as a breaking change:
  - Update the changelog to note the signature change.
  - Increment the major version (per SemVer) so downstream users get a clear upgrade signal.
- No further internal changes are needed — everything compiles and tests pass with the new signatures.
- External consumers who provided their own `impl ChatCompletionAggregator for _` will need to update their implementations.
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
Overview:
- Adds two CLI args to the vLLM backend worker to enable user-provided tool and reasoning parsing.
- These names were chosen because individual backends have their own CLI arg names, and we don't want them to conflict.
- These args are passed through and consumed in the frontend to do tool and reasoning parsing.
- The current PR scope is just tool-call parsing and vLLM.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Improvements
Tests