chore: use token ids for reasoning parsing instead of strings directly #2629
Conversation
Walkthrough
Threads the tokenizer and vocab into the chat-completions delta pipeline, switches reasoning parsing to operate on token IDs, exposes the delta module publicly, and adjusts the generator APIs and call sites to accept an optional tokenizer/vocab and a new reasoning_content argument in create_choice.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Preprocessor
participant Request
participant DeltaGenerator
participant Stream
Client->>Preprocessor: NvCreateChatCompletionRequest
Preprocessor->>Preprocessor: tokenizer.get_vocab(true)
Preprocessor->>Request: response_generator(Some(&tokenizer), Some(&vocab))
Request-->>Preprocessor: DeltaGenerator(model, opts, tokenizer?, vocab?)
Preprocessor->>DeltaGenerator: stream tokens (token_ids, text)
rect rgba(220,235,255,0.5)
note over DeltaGenerator: Optional reasoning path
DeltaGenerator->>DeltaGenerator: reasoning_parser?vocab→init
DeltaGenerator->>DeltaGenerator: parse token_ids → {normal_token_ids, reasoning_token_ids}
DeltaGenerator->>DeltaGenerator: decode via tokenizer? → text, reasoning_content?
end
DeltaGenerator-->>Stream: create_choice(index, text?, reasoning_content?, finish_reason?, logprobs?)
Stream-->>Client: incremental deltas
DeltaGenerator-->>Stream: final create_choice(..., FinishReason::Stop, ...)
Stream-->>Client: termination
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
307-313: Add explicit length checks to prevent silent truncation in create_logprobs.
Before zipping the token, token_id, and logprob vectors (which currently truncate to the shortest length without warning), insert runtime (or debug) assertions to ensure:
- tokens.len() == token_ids.len()
- token_ids.len() == logprobs.len() (when logprobs.is_some())
- token_ids.len() == top_logprobs.len() (when top_logprobs.is_some())
This will catch any backend that produces mismatched lengths and avoid silent data loss.
Locations to update:
- lib/llm/src/protocols/openai/chat_completions/delta.rs around fn create_logprobs (just before the first .zip).
Proposed diff:
 fn create_logprobs(
     &self,
     tokens: Vec<TokenType>,
     token_ids: Vec<TokenIdType>,
     logprobs: Option<LogProbs>,
     top_logprobs: Option<TopLogprobs>,
 ) -> Option<ChatChoiceLogprobs> {
     if !self.options.enable_logprobs || logprobs.is_none() {
         return None;
     }
+    // Prevent silent truncation when zipping mismatched vectors
+    debug_assert_eq!(
+        tokens.len(),
+        token_ids.len(),
+        "Mismatch: tokens ({}) vs token_ids ({}) lengths",
+        tokens.len(),
+        token_ids.len()
+    );
+    if let Some(ref lps) = logprobs {
+        debug_assert_eq!(
+            token_ids.len(),
+            lps.len(),
+            "Mismatch: token_ids ({}) vs logprobs ({}) lengths",
+            token_ids.len(),
+            lps.len()
+        );
+    }
+    if let Some(ref tops) = top_logprobs {
+        debug_assert_eq!(
+            token_ids.len(),
+            tops.len(),
+            "Mismatch: token_ids ({}) vs top_logprobs ({}) lengths",
+            token_ids.len(),
+            tops.len()
+        );
+    }
     let toks = tokens
         .into_iter()
         .zip(token_ids)
         // ...
lib/parsers/src/reasoning/base_parser.rs (3)
21-29: Panic risk: unwrap() on missing vocab entries. Return Option instead.
If "<think>" or "</think>" is missing from the vocab, unwrap() will panic at runtime. Make the constructor fallible and let callers skip reasoning parsing when the markers aren't present.
Apply this diff:
-    pub fn new(
+    pub fn try_new(
         think_start_token: String,
         think_end_token: String,
         force_reasoning: bool,
         stream_reasoning: bool,
         vocab: &HashMap<String, u32>,
-    ) -> Self {
-        let think_start_token_id = vocab.get(&think_start_token).cloned().unwrap();
-        let think_end_token_id = vocab.get(&think_end_token).cloned().unwrap();
+    ) -> Option<Self> {
+        let (Some(think_start_token_id), Some(think_end_token_id)) = (
+            vocab.get(&think_start_token).cloned(),
+            vocab.get(&think_end_token).cloned(),
+        ) else {
+            log::warn!(
+                "Reasoning markers not found in vocab: start='{}' present? {}, end='{}' present? {}",
+                think_start_token,
+                vocab.contains_key(&think_start_token),
+                think_end_token,
+                vocab.contains_key(&think_end_token)
+            );
+            return None;
+        };
-        Self {
+        Some(Self {
             think_start_token: think_start_token_id,
             think_end_token: think_end_token_id,
             _in_reasoning: force_reasoning,
             stream_reasoning,
             _buffer: Vec::new(),
             stripped_think_start: false,
-        }
+        })
     }
Follow-up: update call sites to use try_new (see the mod.rs and delta.rs diffs).
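A possible call-site shape with the fallible constructor (a minimal sketch; the concrete wiring is in the mod.rs and delta.rs diffs, and the vocab binding here is assumed to come from tokenizer.get_vocab(true)):
// Sketch only: skip reasoning parsing when the markers are absent.
let reasoning_parser = BasicReasoningParser::try_new(
    "<think>".into(),
    "</think>".into(),
    false, // force_reasoning
    true,  // stream_reasoning
    &vocab,
); // -> Option<BasicReasoningParser>
if reasoning_parser.is_none() {
    // Reasoning content is simply not emitted for this model.
    log::debug!("vocab lacks <think>/</think>; reasoning parsing disabled");
}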
60-94: Bug: tokens before <think> are incorrectly treated as reasoning in non-streaming parse.
When a chunk contains normal tokens before the start marker, only the start token is removed, so those leading tokens are carried into the reasoning side. They should be returned as normal tokens and excluded from the reasoning output.
Apply this diff:
-    // The text is considered to be in a reasoning block.
-    if let Some(pos) = processed_text
-        .iter()
-        .position(|&x| x == self.think_start_token)
-    {
-        processed_text.remove(pos);
-    }
+    // The text is considered to be in a reasoning block.
+    // Split out any normal prefix before the start marker.
+    let mut normal_prefix: Vec<u32> = Vec::new();
+    if let Some(pos) = processed_text
+        .iter()
+        .position(|&x| x == self.think_start_token)
+    {
+        if pos > 0 {
+            normal_prefix = processed_text[..pos].to_vec();
+        }
+        // Keep only content after the start marker
+        processed_text = processed_text[pos + 1..].to_vec();
+    }
@@
     if !processed_text.contains(&self.think_end_token) {
         log::debug!(
             "Reasoning truncated, think_end_token not found. Returning reasoning text."
         );
         // Assume reasoning was truncated before `think_end_token`
         return ParserResult {
-            normal_token_ids: Vec::new(),
+            // Preserve any normal text that preceded the start marker
+            normal_token_ids: normal_prefix,
             reasoning_token_ids: processed_text,
         };
     }
@@
     let (reasoning_token_ids, normal_token_ids) = if let Some(pos) = processed_text
         .iter()
         .position(|&x| x == self.think_end_token)
     {
         let reasoning_token_ids = processed_text[..pos].to_vec();
-        let normal_token_ids = processed_text[pos + 1..].to_vec();
-        (reasoning_token_ids, normal_token_ids)
+        let normal_suffix = processed_text[pos + 1..].to_vec();
+        (reasoning_token_ids, normal_suffix)
     } else {
         (processed_text, Vec::new())
     };
@@
+    // Prepend any normal text that appeared before <think>.
+    if !normal_prefix.is_empty() {
+        let mut merged = normal_prefix;
+        merged.extend(normal_token_ids);
+        return ParserResult {
+            normal_token_ids: merged,
+            reasoning_token_ids,
+        };
+    }
     ParserResult {
         normal_token_ids,
         reasoning_token_ids,
     }
104-122: Bug: streaming parse misroutes pre-<think> tokens to reasoning. Split the chunk at the start marker.
If a single streaming chunk contains normal tokens followed by the start marker, those leading tokens are currently emitted as reasoning because only the marker is removed. Split the chunk into (normal_prefix, reasoning_tail), keep reasoning_tail in the buffer, and emit both portions appropriately.
Apply this diff:
 fn parse_reasoning_streaming_incremental(&mut self, token_ids: &[u32]) -> ParserResult {
     // Incrementally parse the streaming text
     self._buffer.extend(token_ids);
     let mut current_text = self._buffer.clone();
+    let mut normal_prefix: Vec<u32> = Vec::new();
@@
     // Strip `<think>` token if present
     if !self.stripped_think_start && current_text.contains(&self.think_start_token) {
-        if let Some(pos) = current_text
-            .iter()
-            .position(|&x| x == self.think_start_token)
-        {
-            current_text.remove(pos);
-        }
-        // current_text = current_text.replace(&self.think_start_token, "");
-        self._buffer = current_text.clone();
+        if let Some(pos) = current_text.iter().position(|&x| x == self.think_start_token) {
+            // Split out any preceding normal text in this same chunk
+            if pos > 0 {
+                normal_prefix = current_text[..pos].to_vec();
+            }
+            // Keep only content after the start marker
+            current_text = current_text[pos + 1..].to_vec();
+            self._buffer = current_text.clone();
+        }
         self.stripped_think_start = true;
         self._in_reasoning = true;
     }
@@
     if self._in_reasoning && think_end_idx < current_text.len() {
         let reasoning_token_ids = &current_text[..think_end_idx];
         self._buffer.clear();
         self._in_reasoning = false;
         let start_idx = think_end_idx + 1;
-        let normal_token_ids: &[u32] = if start_idx < current_text.len() {
-            &current_text[start_idx..]
-        } else {
-            [].as_ref()
-        };
-        return ParserResult {
-            normal_token_ids: normal_token_ids.to_vec(),
-            reasoning_token_ids: reasoning_token_ids.to_vec(),
-        };
+        let normal_suffix: &[u32] = if start_idx < current_text.len() {
+            &current_text[start_idx..]
+        } else {
+            [].as_ref()
+        };
+        // Merge any normal prefix (before <think>) with the suffix (after </think>)
+        let mut normal_out = normal_prefix;
+        normal_out.extend_from_slice(normal_suffix);
+        return ParserResult {
+            normal_token_ids: normal_out,
+            reasoning_token_ids: reasoning_token_ids.to_vec(),
+        };
     }
@@
     if self._in_reasoning && self.stream_reasoning {
         // Stream the content immediately
         let reasoning_token_ids = current_text;
         self._buffer.clear();
         ParserResult {
-            normal_token_ids: Vec::new(),
+            normal_token_ids: normal_prefix, // may be empty
             reasoning_token_ids,
         }
     } else if !self._in_reasoning {
         // If we're not in a reasoning block return as normal text
         let normal_token_ids = current_text;
         self._buffer.clear();
         ParserResult {
             normal_token_ids,
             reasoning_token_ids: Vec::new(),
         }
     } else {
         // If we are in a reasoning block but no end token is found, return the current buffer
         ParserResult {
             normal_token_ids: Vec::new(),
             reasoning_token_ids: Vec::new(),
         }
     }
Also applies to: 140-174, 176-199
🧹 Nitpick comments (8)
lib/llm/src/tokenizers/hf.rs (1)
41-43: Use TokenIdType in the public API for consistency; add brief docs.
Returning u32 works, but the rest of the tokenizer surface prefers TokenIdType; using it here avoids type drift if the alias changes later. A short doc comment on with_added_tokens will also help call sites understand the semantics. This remains zero-cost because TokenIdType is a type alias.
Apply this diff:
-    pub fn get_vocab(&self, with_added_tokens: bool) -> HashMap<String, u32> {
-        self.tokenizer.get_vocab(with_added_tokens)
-    }
+    /// Returns the tokenizer vocabulary as a map from token string to token id.
+    ///
+    /// If `with_added_tokens` is true, includes user-added tokens (special tokens, merges, etc.).
+    pub fn get_vocab(&self, with_added_tokens: bool) -> HashMap<String, TokenIdType> {
+        self.tokenizer.get_vocab(with_added_tokens)
+    }
lib/llm/src/protocols/openai/chat_completions.rs (1)
31-31: Publicly exposing the delta module: confirm intended API surface and stability.
Making the entire module public may expose internal types unintentionally despite the existing pub use re-exports. If the goal is only to make DeltaGenerator and related helpers available, consider documenting the public contents of delta (module-level docs) and verifying no internal-only symbols are inadvertently exported.
Would you like me to scan for additional public items inside delta and propose pub(crate) constraints where appropriate?
lib/llm/src/preprocessor.rs (2)
92-99: Store vocab with the tokenizer context; prefer TokenIdType for type consistency.
The added vocab field is appropriate to feed token-id based parsers. Use TokenIdType to avoid hard-coding u32 at the API boundary.
Apply this diff:
     tokenizer: Arc<dyn Tokenizer>,
     model_info: Arc<dyn ModelInfo>,
-    vocab: HashMap<String, u32>,
+    vocab: HashMap<String, TokenIdType>,
117-117: One-time clone of the full vocab; acceptable, but note the memory footprint.
tokenizer.get_vocab(true) clones the entire vocab. Since OpenAIPreprocessor is wrapped in an Arc and reused, this is likely fine. If we expect many preprocessors per model, consider centralizing and reusing the same vocab instance across operators. No change required now.
If memory pressure becomes an issue, I can draft a small cache keyed by mdcsum to share the vocab across preprocessors, along the lines of the sketch below.
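A rough sketch of such a cache (hypothetical helper; the vocab_cache/shared_vocab names and the mdcsum key are illustrative, not existing APIs in this crate):
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical process-wide cache: mdcsum -> shared vocab.
type SharedVocab = Arc<HashMap<String, u32>>;

fn vocab_cache() -> &'static Mutex<HashMap<String, SharedVocab>> {
    static CACHE: OnceLock<Mutex<HashMap<String, SharedVocab>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

// Builds the vocab at most once per mdcsum; later preprocessors clone the Arc.
fn shared_vocab(mdcsum: &str, build: impl FnOnce() -> HashMap<String, u32>) -> SharedVocab {
    let mut cache = vocab_cache().lock().unwrap();
    cache
        .entry(mdcsum.to_string())
        .or_insert_with(|| Arc::new(build()))
        .clone()
}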
lib/parsers/src/reasoning/mod.rs (1)
21-37: Remove or port the commented-out getters to ID-based accessors.
These commented methods still reference the old string-based API and add noise. Either delete them or reintroduce ID-based helpers if they are still useful to callers.
Apply this diff to remove the dead code:
-// impl ParserResult {
-//     pub fn get_some_reasoning(&self) -> Option<Vec<u32>> {
-//         if self.reasoning_token_ids.is_empty() {
-//             None
-//         } else {
-//             Some(self.reasoning_token_ids.clone())
-//         }
-//     }
-//
-//     pub fn get_some_normal_text(&self) -> Option<Vec<u32>> {
-//         if self.normal_token_ids.is_empty() {
-//             None
-//         } else {
-//             Some(self.normal_token_ids.clone())
-//         }
-//     }
-// }
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
200-210: create_reasoning_content naming/readability nit.
The method returns a ParserResult, not directly "content". Consider renaming to create_reasoning_parse_result or parse_reasoning_delta for clarity. No functional impact.
lib/parsers/src/reasoning/base_parser.rs (2)
123-139: Remove stale, string-based commented code.
These comments reference the pre-migration string matching and are misleading for the ID-based implementation. Safe to delete.
Apply this diff:
-    // if self.think_start_token.starts_with(&current_text)
-    //     && self.think_start_token.as_str() != current_text.as_str()
-    // {
-    //     return ParserResult {
-    //         normal_token_ids: String::new(),
-    //         reasoning_token_ids: String::new(),
-    //     };
-    // }
-    // if self.think_end_token.starts_with(&current_text)
-    //     && self.think_end_token.as_str() != current_text.as_str()
-    // {
-    //     return ParserResult {
-    //         normal_token_ids: String::new(),
-    //         reasoning_token_ids: String::new(),
-    //     };
-    // }
202-460: Reintroduce unit tests for ID-based parsing, especially split-at-start cases.
The old string-based tests are commented out. Please add minimal tests with a tiny fake vocab mapping "<think>" and "</think>" to IDs, covering:
- normal_prefix + <think> + reasoning + </think> + normal_suffix
- a chunk where <think> and preceding normal text appear in the same call (streaming)
- a missing end marker (ensure normal_prefix is preserved)
I can draft a test module using a tiny vocab HashMap and deterministic IDs if you want me to push a patch; a sketch follows below.
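A minimal sketch of such a test module, assuming the try_new constructor proposed above and the ParserResult fields from this PR (the detect_and_parse_reasoning signature and &mut self receiver are assumptions):
#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::HashMap;

    // Tiny deterministic vocab: only the marker tokens need IDs here.
    fn tiny_vocab() -> HashMap<String, u32> {
        HashMap::from([("<think>".to_string(), 1), ("</think>".to_string(), 2)])
    }

    fn parser() -> BasicReasoningParser {
        BasicReasoningParser::try_new(
            "<think>".into(),
            "</think>".into(),
            false,
            true,
            &tiny_vocab(),
        )
        .expect("markers present in tiny vocab")
    }

    #[test]
    fn splits_prefix_reasoning_and_suffix() {
        // 10 <think> 11 </think> 12  =>  normal [10, 12], reasoning [11]
        let mut p = parser();
        let result = p.detect_and_parse_reasoning(&[10, 1, 11, 2, 12]);
        assert_eq!(result.normal_token_ids, vec![10, 12]);
        assert_eq!(result.reasoning_token_ids, vec![11]);
    }

    #[test]
    fn preserves_prefix_when_end_marker_missing() {
        // 10 <think> 11  =>  the normal prefix [10] must not be lost
        let mut p = parser();
        let result = p.detect_and_parse_reasoning(&[10, 1, 11]);
        assert_eq!(result.normal_token_ids, vec![10]);
        assert_eq!(result.reasoning_token_ids, vec![11]);
    }
}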
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/preprocessor.rs (4 hunks)
- lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (10 hunks)
- lib/llm/src/tokenizers/hf.rs (2 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
- lib/parsers/src/reasoning/base_parser.rs (4 hunks)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (2 hunks)
- lib/parsers/src/reasoning/mod.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/parsers/src/reasoning/deepseek_r1_parser.rs (2)
- lib/parsers/src/reasoning/base_parser.rs (3): new (20-38), parse_reasoning_streaming_incremental (104-199), detect_and_parse_reasoning (42-102)
- lib/parsers/src/reasoning/mod.rs (4): parse_reasoning_streaming_incremental (48-48), parse_reasoning_streaming_incremental (68-70), detect_and_parse_reasoning (43-43), detect_and_parse_reasoning (64-66)
lib/llm/tests/http-service.rs (1)
- lib/llm/src/protocols/openai/completions/delta.rs (1): create_choice (147-181)
lib/parsers/src/reasoning/base_parser.rs (2)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (3): new (16-26), detect_and_parse_reasoning (34-36), parse_reasoning_streaming_incremental (30-32)
- lib/parsers/src/reasoning/mod.rs (4): detect_and_parse_reasoning (43-43), detect_and_parse_reasoning (64-66), parse_reasoning_streaming_incremental (48-48), parse_reasoning_streaming_incremental (68-70)
lib/parsers/src/reasoning/mod.rs (2)
- lib/parsers/src/reasoning/base_parser.rs (3): detect_and_parse_reasoning (42-102), parse_reasoning_streaming_incremental (104-199), new (20-38)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (3): detect_and_parse_reasoning (34-36), parse_reasoning_streaming_incremental (30-32), new (16-26)
🔇 Additional comments (11)
lib/llm/src/tokenizers/hf.rs (1)
16-17: Import looks correct and scoped.
Importing HashMap is required for the new API. No concerns.
lib/llm/tests/http-service.rs (2)
98-99: Good migration to instance-based generator.
Switching to request.response_generator(None, None) aligns the tests with the new API shape and keeps model/metadata consistent with the request.
103-104: Updated create_choice call shape acknowledged.
Adding the trailing None for reasoning_content matches the new signature and keeps tests focused on non-reasoning paths.
lib/llm/src/preprocessor.rs (1)
132-133: Field initialization matches the new struct; LGTM.
vocab is correctly captured once during construction.
lib/llm/src/engines.rs (2)
186-187: Use of (None, None) for tokenizer/vocab in the echo engine is appropriate.
The echo engine emits character-level chunks and doesn't need token-aware reasoning. This keeps the test engine minimal.
207-208: Updated create_choice signature usage confirmed.
Supplying None for reasoning_content keeps behavior unchanged while conforming to the new API. No issues.
Also applies to: 212-213
lib/parsers/src/reasoning/deepseek_r1_parser.rs (1)
29-36: Delegation looks correct.
Forwarding to the base parser preserves the updated token-ID semantics. No concerns here.
lib/parsers/src/reasoning/mod.rs (2)
39-49: Trait API migration to token IDs looks consistent.
Signatures and docs are aligned with the new token-id based parsing. Good.
63-71: Wrapper forwarding is clean.
The wrapper delegates correctly to the inner parser with the new signatures.
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
365-371: create_choice wiring with reasoning_content looks good.
Passing both normal_text and reasoning_content aligns with the new delta shape. Nice.
21-33: Public API change for response_generator: call sites updated, version bump recommended.
All existing call sites correctly use the new two-parameter chat variant and the unchanged no-arg completions variant:
- In lib/llm/src/preprocessor.rs, the chat path invokes request.response_generator(Some(&self.tokenizer), Some(&self.vocab)); and the common-completion path invokes request.response_generator();
- In lib/llm/src/engines.rs and lib/llm/tests/http-service.rs, the chat stream uses request.response_generator(None, None);
- The standalone completions API in lib/llm/src/protocols/openai/completions/delta.rs still provides the zero-argument response_generator().
Everything compiles against both signatures, so no call sites are missing. Since this constitutes a public API addition in the chat module, consider bumping the crate's minor version to signal a compatibility-preserving change.
let response_generator =
    request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
let mut response_generator = Box::new(response_generator);
💡 Verification agent
🧩 Analysis chain
Passing tokenizer and vocab downstream is correct and future-proof.
Providing both Some(&self.tokenizer) and Some(&self.vocab) enables token-id aware postprocessing. Confirm all other call sites are updated to the new signature.
Run to locate any lingering old signature usages:
🏁 Script executed:
#!/bin/bash
# Find old zero-arg response_generator calls for chat completions
rg -nP --type=rust '\.response_generator\s*\(\s*\)' -C2
# Find four-arg create_choice calls in chat completions path (should be five-arg now)
rg -nP --type=rust '\.create_choice\s*\(\s*[^\)]*(?:,[^\)]*){3}\s*\)' -g '!*completions/**' -C2
Length of output: 1772
Critical: Lingering old response_generator() and four-arg create_choice() call sites must be updated.
The grep run uncovered remaining zero-argument calls to response_generator() and a four-argument call to create_choice(). These must be changed to the new signatures before merging:
• lib/engines/mistralrs/src/lib.rs:489
// before
let response_generator = request.response_generator();
// after
let response_generator =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/preprocessor.rs:564
// before
let response_generator = request.response_generator();
// after
let response_generator =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/engines.rs:233
// before
let deltas = request.response_generator();
// after
let deltas =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/engines.rs:212
// before (four args)
let response = deltas.create_choice(
0,
None,
None,
Some(dynamo_async_openai::types::FinishReason::Stop),
None,
);
// after (five args – add the missing parameter to match the updated signature)
let response = deltas.create_choice(
0,
None,
None,
Some(dynamo_async_openai::types::FinishReason::Stop),
None,
  /* e.g. Some(&self.tokenizer) or the new parameter */
);
Please update these call sites to pass the tokenizer and vocab (or the appropriate fifth argument) so that token-ID-aware postprocessing works everywhere.
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor.rs around lines 505-507, the code calls
request.response_generator() without the new tokenizer and vocab args and then
boxes it; change the call to request.response_generator(Some(&self.tokenizer),
Some(&self.vocab)) and then Box::new that result (i.e., replace the zero-arg
invocation with the two-arg form and keep the Box::new wrapping).
pub fn new(
    model: String,
    options: DeltaGeneratorOptions,
    tokenizer: Option<&Arc<(dyn Tokenizer + 'static)>>,
    vocab: Option<&HashMap<String, u32>>,
) -> Self {
    let now = std::time::SystemTime::now()
🛠️ Refactor suggestion
Avoid constructing a parser when markers are missing (use try_get + and_then).
If vocab is present but lacks "<think>" or "</think>", BasicReasoningParser::new will panic. Use a fallible acquisition to skip reasoning gracefully.
Apply this diff (paired with mod.rs change):
- let reasoning_parser_type = ReasoningParserType::Basic;
- // Reasoning parser wrapper
- let reasoning_parser = vocab.map(|v| reasoning_parser_type.get_reasoning_parser(v));
+ let reasoning_parser_type = ReasoningParserType::Basic;
+ // Try to create a reasoning parser only if vocab has required markers.
+ let reasoning_parser = vocab.and_then(|v| reasoning_parser_type.try_get_reasoning_parser(v));
Also applies to: 104-111, 123-124
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/delta.rs around lines 81-87
(and similarly at 104-111 and 123-124), avoid unconditionally constructing
BasicReasoningParser when the vocab may be missing the "<think>" or "</think>"
markers; instead use fallible lookups (e.g., vocab.try_get or vocab.get
and_then) to retrieve both token IDs and only call BasicReasoningParser::new if
both markers are present, otherwise leave the reasoning parser as None so
reasoning is skipped gracefully; update the mapping logic to chain Option
results (and_then/map) and propagate None when either marker is absent.
// Create the streaming response.
let reasoning_parser_result = self.create_reasoning_content(delta.token_ids);

let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result
{
    let none_if_empty = |vec: String| {
        if vec.is_empty() {
            None
        } else {
            Some(vec)
        }
    };
    (
        none_if_empty(
            self.tokenizer
                .as_ref()
                .unwrap()
                .decode(&parser_result.normal_token_ids, false)
                .unwrap(),
        ),
        none_if_empty(
            self.tokenizer
                .as_ref()
                .unwrap()
                .decode(&parser_result.reasoning_token_ids, false)
                .unwrap(),
        ),
    )
} else {
    (delta.text, None)
};
🛠️ Refactor suggestion
Don’t unwrap on tokenizer.decode during streaming; fall back gracefully.
Decoding can fail (e.g., invalid IDs or tokenizer errors). Unwrap will crash the stream. Handle decode errors and treat them as “no delta” or log and continue.
Apply this diff:
- let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result
- {
- let none_if_empty = |vec: String| {
- if vec.is_empty() {
- None
- } else {
- Some(vec)
- }
- };
- (
- none_if_empty(
- self.tokenizer
- .as_ref()
- .unwrap()
- .decode(&parser_result.normal_token_ids, false)
- .unwrap(),
- ),
- none_if_empty(
- self.tokenizer
- .as_ref()
- .unwrap()
- .decode(&parser_result.reasoning_token_ids, false)
- .unwrap(),
- ),
- )
- } else {
- (delta.text, None)
- };
+ let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result {
+ let decode = |ids: &Vec<u32>| {
+ self.tokenizer
+ .as_ref()
+ .unwrap()
+ .decode(ids, false)
+ .ok()
+ .and_then(|s| if s.is_empty() { None } else { Some(s) })
+ };
+ (decode(&parser_result.normal_token_ids), decode(&parser_result.reasoning_token_ids))
+ } else {
+ (delta.text, None)
+ };
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Create the streaming response.
let reasoning_parser_result = self.create_reasoning_content(delta.token_ids);

let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result {
    let decode = |ids: &Vec<u32>| {
        self.tokenizer
            .as_ref()
            .unwrap()
            .decode(ids, false)
            .ok()
            .and_then(|s| if s.is_empty() { None } else { Some(s) })
    };
    (
        decode(&parser_result.normal_token_ids),
        decode(&parser_result.reasoning_token_ids),
    )
} else {
    (delta.text, None)
};
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/delta.rs around lines 333 to
363, the code calls tokenizer.decode(...).unwrap() which will panic on decode
errors during streaming; instead handle decode failures gracefully by replacing
unwraps with error-aware logic: attempt to decode each token id vector with
tokenizer.decode(...), map successful results to Some(string) if non-empty, and
on Err return None (or log the error and return None) so streaming continues;
ensure the closure none_if_empty is applied after successful decode and that in
the else branch you still use delta.text when appropriate.
pub fn new(vocab: &HashMap<String, u32>) -> Self {
    Self {
        base: BasicReasoningParser::new(
            "<think>".to_string(),
            "</think>".to_string(),
            true,
            true,
            vocab,
        ),
Avoid panics when tokens are missing in vocab (make construction fallible).
BasicReasoningParser::new currently unwraps token IDs from the vocab. If the provided tokenizer/vocab doesn't contain "<think>" and/or "</think>", this will panic during parser construction. Prefer a fallible constructor and propagate None/Result to the caller. See the proposed cross-file diffs in base_parser.rs and mod.rs to introduce a try_new/try_get pattern and wire it in delta.rs.
Apply this diff here once BasicReasoningParser::try_new exists:
- pub fn new(vocab: &HashMap<String, u32>) -> Self {
- Self {
- base: BasicReasoningParser::new(
+ pub fn new(vocab: &HashMap<String, u32>) -> Self {
+ Self {
+ base: BasicReasoningParser::try_new(
"<think>".to_string(),
"</think>".to_string(),
true,
true,
- vocab,
- ),
+ vocab,
+ )
+ .expect("DeepseekR1ReasoningParser requires <think> markers in vocab"),
}
}
If you prefer to avoid propagating Result here, you can log+skip parser creation at the callsite (see delta.rs diff) and keep this file unchanged. Let me know your preferred approach and I’ll align the patches.
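For reference, the log-and-skip alternative at the call site could look roughly like this (an illustrative sketch mirroring the guard proposed in the mod.rs diff below):
// Sketch: keep DeepseekR1ReasoningParser::new unchanged and guard before
// constructing, so a vocab without the markers just disables reasoning.
let reasoning_parser = if vocab.contains_key("<think>") && vocab.contains_key("</think>") {
    Some(ReasoningParserType::DeepseekR1.get_reasoning_parser(&vocab))
} else {
    log::warn!("<think>/</think> missing from vocab; skipping reasoning parser");
    None
};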
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pub fn new(vocab: &HashMap<String, u32>) -> Self {
    Self {
        base: BasicReasoningParser::try_new(
            "<think>".to_string(),
            "</think>".to_string(),
            true,
            true,
            vocab,
        )
        .expect("DeepseekR1ReasoningParser requires <think> markers in vocab"),
    }
}
🤖 Prompt for AI Agents
In lib/parsers/src/reasoning/deepseek_r1_parser.rs around lines 16 to 24, the
current constructor calls BasicReasoningParser::new which unwraps vocab token
IDs and can panic if "<think>" or "</think>" are missing; change this to call
BasicReasoningParser::try_new (the fallible constructor) and propagate the
failure by making DeepseekR1Parser::new return a Result or Option (matching the
new base API) so construction does not panic: replace the direct call with the
try_new call, map/propagate the returned error or None to the caller, and update
the function signature and any calling sites accordingly so missing tokens are
handled gracefully.
 impl ReasoningParserType {
-    pub fn get_reasoning_parser(self) -> ReasoningParserWrapper {
+    pub fn get_reasoning_parser(self, vocab: &HashMap<String, u32>) -> ReasoningParserWrapper {
         match self {
             ReasoningParserType::DeepseekR1 => ReasoningParserWrapper {
-                parser: Box::new(DeepseekR1ReasoningParser::new()),
+                parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
             },
             ReasoningParserType::Basic => ReasoningParserWrapper {
                 parser: Box::new(BasicReasoningParser::new(
                     "<think>".into(),
                     "</think>".into(),
                     false,
                     true,
+                    vocab,
                 )),
🛠️ Refactor suggestion
Make parser acquisition fallible to avoid runtime panics when markers are absent.
get_reasoning_parser unconditionally constructs a parser, but BasicReasoningParser::new unwraps marker IDs and can panic. Recommend returning Option and letting call sites skip reasoning gracefully if the vocab lacks markers. This is low-risk and avoids hard failures in heterogeneous model/tokenizer setups.
Apply this diff:
-impl ReasoningParserType {
- pub fn get_reasoning_parser(self, vocab: &HashMap<String, u32>) -> ReasoningParserWrapper {
- match self {
- ReasoningParserType::DeepseekR1 => ReasoningParserWrapper {
- parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
- },
- ReasoningParserType::Basic => ReasoningParserWrapper {
- parser: Box::new(BasicReasoningParser::new(
- "<think>".into(),
- "</think>".into(),
- false,
- true,
- vocab,
- )),
- },
- }
- }
-}
+impl ReasoningParserType {
+ pub fn try_get_reasoning_parser(
+ self,
+ vocab: &HashMap<String, u32>,
+ ) -> Option<ReasoningParserWrapper> {
+ match self {
+ ReasoningParserType::DeepseekR1 => {
+ // If you keep DeepseekR1ReasoningParser::new infallible, guard here:
+ if !vocab.contains_key("<think>") || !vocab.contains_key("</think>") {
+ return None;
+ }
+ Some(ReasoningParserWrapper {
+ parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
+ })
+ }
+ ReasoningParserType::Basic => {
+ BasicReasoningParser::try_new(
+ "<think>".into(),
+ "</think>".into(),
+ false,
+ true,
+ vocab,
+ )
+ .map(|p| ReasoningParserWrapper { parser: Box::new(p) })
+ }
+ }
+ }
+}
Follow-up: wire this into DeltaGenerator::new (see delta.rs comment).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/parsers/src/reasoning/mod.rs around lines 73 to 86, get_reasoning_parser
currently always constructs a parser but BasicReasoningParser::new can panic
when marker IDs are missing; change get_reasoning_parser to return
Option<ReasoningParserWrapper> (or Result with context) and make the Basic
branch propagate None when marker lookup fails instead of unwrapping; update the
DeepseekR1 branch to return Some(...), adjust the function signature and all
call sites (notably wire the fallible result through into DeltaGenerator::new in
delta.rs so callers can skip reasoning if markers are absent), and ensure
compiler errors are resolved by handling the Option/Result where parsers are
requested.
closing in favour of #2656
Overview:
Use token ids directly instead of raw strings for the reasoning parsers, and add a way to provide the tokenizer and vocab to postprocessing.
Summary by CodeRabbit
New Features
Refactor
Tests