chore: use token ids for reasoning parsing instead of strings directly #2629
Conversation
Walkthrough
Threads the tokenizer and vocab into the chat-completions delta pipeline, switches reasoning parsing to operate on token IDs, exposes the delta module publicly, and adjusts the generator APIs and call sites to accept an optional tokenizer/vocab and a new reasoning_content argument in create_choice.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Preprocessor
participant Request
participant DeltaGenerator
participant Stream
Client->>Preprocessor: NvCreateChatCompletionRequest
Preprocessor->>Preprocessor: tokenizer.get_vocab(true)
Preprocessor->>Request: response_generator(Some(&tokenizer), Some(&vocab))
Request-->>Preprocessor: DeltaGenerator(model, opts, tokenizer?, vocab?)
Preprocessor->>DeltaGenerator: stream tokens (token_ids, text)
rect rgba(220,235,255,0.5)
note over DeltaGenerator: Optional reasoning path
DeltaGenerator->>DeltaGenerator: reasoning_parser?vocab→init
DeltaGenerator->>DeltaGenerator: parse token_ids → {normal_token_ids, reasoning_token_ids}
DeltaGenerator->>DeltaGenerator: decode via tokenizer? → text, reasoning_content?
end
DeltaGenerator-->>Stream: create_choice(index, text?, reasoning_content?, finish_reason?, logprobs?)
Stream-->>Client: incremental deltas
DeltaGenerator-->>Stream: final create_choice(..., FinishReason::Stop, ...)
Stream-->>Client: termination
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
307-313: Add explicit length checks to prevent silent truncation in create_logprobs.
Before zipping the token, token_id, and logprob vectors (which currently truncate to the shortest length without warning), insert runtime (or debug) assertions to ensure:
- tokens.len() == token_ids.len()
- token_ids.len() == logprobs.len() (when logprobs.is_some())
- token_ids.len() == top_logprobs.len() (when top_logprobs.is_some())
This will catch any backend that produces mismatched lengths and avoid silent data loss.
Locations to update:
- lib/llm/src/protocols/openai/chat_completions/delta.rs around fn create_logprobs (just before the first .zip).
Proposed diff:
 fn create_logprobs(
     &self,
     tokens: Vec<TokenType>,
     token_ids: Vec<TokenIdType>,
     logprobs: Option<LogProbs>,
     top_logprobs: Option<TopLogprobs>,
 ) -> Option<ChatChoiceLogprobs> {
     if !self.options.enable_logprobs || logprobs.is_none() {
         return None;
     }
+    // Prevent silent truncation when zipping mismatched vectors
+    debug_assert_eq!(
+        tokens.len(),
+        token_ids.len(),
+        "Mismatch: tokens ({}) vs token_ids ({}) lengths",
+        tokens.len(),
+        token_ids.len()
+    );
+    if let Some(ref lps) = logprobs {
+        debug_assert_eq!(
+            token_ids.len(),
+            lps.len(),
+            "Mismatch: token_ids ({}) vs logprobs ({}) lengths",
+            token_ids.len(),
+            lps.len()
+        );
+    }
+    if let Some(ref tops) = top_logprobs {
+        debug_assert_eq!(
+            token_ids.len(),
+            tops.len(),
+            "Mismatch: token_ids ({}) vs top_logprobs ({}) lengths",
+            token_ids.len(),
+            tops.len()
+        );
+    }
     let toks = tokens
         .into_iter()
         .zip(token_ids)
         // ...
lib/parsers/src/reasoning/base_parser.rs (3)
21-29: Panic risk: unwrap() on missing vocab entries. Return Option instead.
If "<think>" or "</think>" is missing from the vocab, unwrap() will panic at runtime. Make the constructor fallible and let callers skip reasoning parsing when the markers aren't present.
Apply this diff:
-    pub fn new(
+    pub fn try_new(
         think_start_token: String,
         think_end_token: String,
         force_reasoning: bool,
         stream_reasoning: bool,
         vocab: &HashMap<String, u32>,
-    ) -> Self {
-        let think_start_token_id = vocab.get(&think_start_token).cloned().unwrap();
-        let think_end_token_id = vocab.get(&think_end_token).cloned().unwrap();
+    ) -> Option<Self> {
+        let (Some(think_start_token_id), Some(think_end_token_id)) = (
+            vocab.get(&think_start_token).cloned(),
+            vocab.get(&think_end_token).cloned(),
+        ) else {
+            log::warn!(
+                "Reasoning markers not found in vocab: start='{}' present? {}, end='{}' present? {}",
+                think_start_token,
+                vocab.contains_key(&think_start_token),
+                think_end_token,
+                vocab.contains_key(&think_end_token)
+            );
+            return None;
+        };
-        Self {
+        Some(Self {
             think_start_token: think_start_token_id,
             think_end_token: think_end_token_id,
             _in_reasoning: force_reasoning,
             stream_reasoning,
             _buffer: Vec::new(),
             stripped_think_start: false,
-        }
+        })
     }
Follow-up: update call sites to use try_new (see the mod.rs and delta.rs diffs).
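A possible call-site shape with the fallible constructor (a minimal sketch; the concrete wiring is in the mod.rs and delta.rs diffs, and the vocab binding here is assumed to come from tokenizer.get_vocab(true)):
// Sketch only: skip reasoning parsing when the markers are absent.
let reasoning_parser = BasicReasoningParser::try_new(
    "<think>".into(),
    "</think>".into(),
    false, // force_reasoning
    true,  // stream_reasoning
    &vocab,
); // -> Option<BasicReasoningParser>
if reasoning_parser.is_none() {
    // Reasoning content is simply not emitted for this model.
    log::debug!("vocab lacks <think>/</think>; reasoning parsing disabled");
}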
60-94: Bug: tokens before <think> are incorrectly treated as reasoning in non-streaming parse.
When a chunk contains normal tokens before the start marker, only the start token is removed, so those leading tokens are carried into the reasoning side. They should be returned as normal tokens and excluded from the reasoning output.
Apply this diff:
-    // The text is considered to be in a reasoning block.
-    if let Some(pos) = processed_text
-        .iter()
-        .position(|&x| x == self.think_start_token)
-    {
-        processed_text.remove(pos);
-    }
+    // The text is considered to be in a reasoning block.
+    // Split out any normal prefix before the start marker.
+    let mut normal_prefix: Vec<u32> = Vec::new();
+    if let Some(pos) = processed_text
+        .iter()
+        .position(|&x| x == self.think_start_token)
+    {
+        if pos > 0 {
+            normal_prefix = processed_text[..pos].to_vec();
+        }
+        // Keep only content after the start marker
+        processed_text = processed_text[pos + 1..].to_vec();
+    }
@@
     if !processed_text.contains(&self.think_end_token) {
         log::debug!(
             "Reasoning truncated, think_end_token not found. Returning reasoning text."
         );
         // Assume reasoning was truncated before `think_end_token`
         return ParserResult {
-            normal_token_ids: Vec::new(),
+            // Preserve any normal text that preceded the start marker
+            normal_token_ids: normal_prefix,
             reasoning_token_ids: processed_text,
         };
     }
@@
     let (reasoning_token_ids, normal_token_ids) = if let Some(pos) = processed_text
         .iter()
         .position(|&x| x == self.think_end_token)
     {
         let reasoning_token_ids = processed_text[..pos].to_vec();
-        let normal_token_ids = processed_text[pos + 1..].to_vec();
-        (reasoning_token_ids, normal_token_ids)
+        let normal_suffix = processed_text[pos + 1..].to_vec();
+        (reasoning_token_ids, normal_suffix)
     } else {
         (processed_text, Vec::new())
     };
@@
+    // Prepend any normal text that appeared before <think>.
+    if !normal_prefix.is_empty() {
+        let mut merged = normal_prefix;
+        merged.extend(normal_token_ids);
+        return ParserResult {
+            normal_token_ids: merged,
+            reasoning_token_ids,
+        };
+    }
     ParserResult {
         normal_token_ids,
         reasoning_token_ids,
     }
104-122: Bug: streaming parse misroutes pre-<think> tokens to reasoning. Split the chunk at the start marker.
If a single streaming chunk contains normal tokens followed by the start marker, those leading tokens are currently emitted as reasoning because only the marker is removed. Split the chunk into (normal_prefix, reasoning_tail), keep reasoning_tail in the buffer, and emit both portions appropriately.
Apply this diff:
 fn parse_reasoning_streaming_incremental(&mut self, token_ids: &[u32]) -> ParserResult {
     // Incrementally parse the streaming text
     self._buffer.extend(token_ids);
     let mut current_text = self._buffer.clone();
+    let mut normal_prefix: Vec<u32> = Vec::new();
@@
     // Strip `<think>` token if present
     if !self.stripped_think_start && current_text.contains(&self.think_start_token) {
-        if let Some(pos) = current_text
-            .iter()
-            .position(|&x| x == self.think_start_token)
-        {
-            current_text.remove(pos);
-        }
-        // current_text = current_text.replace(&self.think_start_token, "");
-        self._buffer = current_text.clone();
+        if let Some(pos) = current_text.iter().position(|&x| x == self.think_start_token) {
+            // Split out any preceding normal text in this same chunk
+            if pos > 0 {
+                normal_prefix = current_text[..pos].to_vec();
+            }
+            // Keep only content after the start marker
+            current_text = current_text[pos + 1..].to_vec();
+            self._buffer = current_text.clone();
+        }
         self.stripped_think_start = true;
         self._in_reasoning = true;
     }
@@
     if self._in_reasoning && think_end_idx < current_text.len() {
         let reasoning_token_ids = &current_text[..think_end_idx];
         self._buffer.clear();
         self._in_reasoning = false;
         let start_idx = think_end_idx + 1;
-        let normal_token_ids: &[u32] = if start_idx < current_text.len() {
-            &current_text[start_idx..]
-        } else {
-            [].as_ref()
-        };
-        return ParserResult {
-            normal_token_ids: normal_token_ids.to_vec(),
-            reasoning_token_ids: reasoning_token_ids.to_vec(),
-        };
+        let normal_suffix: &[u32] = if start_idx < current_text.len() {
+            &current_text[start_idx..]
+        } else {
+            [].as_ref()
+        };
+        // Merge any normal prefix (before <think>) with the suffix (after </think>)
+        let mut normal_out = normal_prefix;
+        normal_out.extend_from_slice(normal_suffix);
+        return ParserResult {
+            normal_token_ids: normal_out,
+            reasoning_token_ids: reasoning_token_ids.to_vec(),
+        };
     }
@@
     if self._in_reasoning && self.stream_reasoning {
         // Stream the content immediately
         let reasoning_token_ids = current_text;
         self._buffer.clear();
         ParserResult {
-            normal_token_ids: Vec::new(),
+            normal_token_ids: normal_prefix, // may be empty
             reasoning_token_ids,
         }
     } else if !self._in_reasoning {
         // If we're not in a reasoning block return as normal text
         let normal_token_ids = current_text;
         self._buffer.clear();
         ParserResult {
             normal_token_ids,
             reasoning_token_ids: Vec::new(),
         }
     } else {
         // If we are in a reasoning block but no end token is found, return the current buffer
         ParserResult {
             normal_token_ids: Vec::new(),
             reasoning_token_ids: Vec::new(),
         }
     }
Also applies to: 140-174, 176-199
🧹 Nitpick comments (8)
lib/llm/src/tokenizers/hf.rs (1)
41-43: Use TokenIdType in the public API for consistency; add brief docs.
Returning u32 works, but the rest of the tokenizer surface prefers TokenIdType; using it here avoids type drift if the alias changes later. A short doc comment on with_added_tokens will also help call sites understand the semantics. This remains zero-cost because TokenIdType is a type alias.
Apply this diff:
-    pub fn get_vocab(&self, with_added_tokens: bool) -> HashMap<String, u32> {
-        self.tokenizer.get_vocab(with_added_tokens)
-    }
+    /// Returns the tokenizer vocabulary as a map from token string to token id.
+    ///
+    /// If `with_added_tokens` is true, includes user-added tokens (special tokens, merges, etc.).
+    pub fn get_vocab(&self, with_added_tokens: bool) -> HashMap<String, TokenIdType> {
+        self.tokenizer.get_vocab(with_added_tokens)
+    }
lib/llm/src/protocols/openai/chat_completions.rs (1)
31-31: Publicly exposing the delta module: confirm intended API surface and stability.
Making the entire module public may expose internal types unintentionally despite the existing pub use re-exports. If the goal is only to make DeltaGenerator and related helpers available, consider documenting the public contents of delta (module-level docs) and verifying no internal-only symbols are inadvertently exported.
Would you like me to scan for additional public items inside delta and propose pub(crate) constraints where appropriate?
lib/llm/src/preprocessor.rs (2)
92-99: Store vocab with the tokenizer context; prefer TokenIdType for type consistency.
The added vocab field is appropriate to feed token-id based parsers. Use TokenIdType to avoid hard-coding u32 at the API boundary.
Apply this diff:
     tokenizer: Arc<dyn Tokenizer>,
     model_info: Arc<dyn ModelInfo>,
-    vocab: HashMap<String, u32>,
+    vocab: HashMap<String, TokenIdType>,
117-117: One-time clone of the full vocab; acceptable, but note the memory footprint.
tokenizer.get_vocab(true) clones the entire vocab. Since OpenAIPreprocessor is wrapped in an Arc and reused, this is likely fine. If we expect many preprocessors per model, consider centralizing and reusing the same vocab instance across operators. No change required now.
If memory pressure becomes an issue, I can draft a small cache keyed by mdcsum to share the vocab across preprocessors, along the lines of the sketch below.
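A rough sketch of such a cache (hypothetical helper; the vocab_cache/shared_vocab names and the mdcsum key are illustrative, not existing APIs in this crate):
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical process-wide cache: mdcsum -> shared vocab.
type SharedVocab = Arc<HashMap<String, u32>>;

fn vocab_cache() -> &'static Mutex<HashMap<String, SharedVocab>> {
    static CACHE: OnceLock<Mutex<HashMap<String, SharedVocab>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

// Builds the vocab at most once per mdcsum; later preprocessors clone the Arc.
fn shared_vocab(mdcsum: &str, build: impl FnOnce() -> HashMap<String, u32>) -> SharedVocab {
    let mut cache = vocab_cache().lock().unwrap();
    cache
        .entry(mdcsum.to_string())
        .or_insert_with(|| Arc::new(build()))
        .clone()
}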
lib/parsers/src/reasoning/mod.rs (1)
21-37: Remove or port the commented-out getters to ID-based accessors.
These commented methods still reference the old string-based API and add noise. Either delete them or reintroduce ID-based helpers if they are still useful to callers.
Apply this diff to remove the dead code:
-// impl ParserResult {
-//     pub fn get_some_reasoning(&self) -> Option<Vec<u32>> {
-//         if self.reasoning_token_ids.is_empty() {
-//             None
-//         } else {
-//             Some(self.reasoning_token_ids.clone())
-//         }
-//     }
-//
-//     pub fn get_some_normal_text(&self) -> Option<Vec<u32>> {
-//         if self.normal_token_ids.is_empty() {
-//             None
-//         } else {
-//             Some(self.normal_token_ids.clone())
-//         }
-//     }
-// }
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
200-210: create_reasoning_content naming/readability nit.
The method returns a ParserResult, not directly "content". Consider renaming to create_reasoning_parse_result or parse_reasoning_delta for clarity. No functional impact.
lib/parsers/src/reasoning/base_parser.rs (2)
123-139: Remove stale, string-based commented code.
These comments reference the pre-migration string matching and are misleading for the ID-based implementation. Safe to delete.
Apply this diff:
-    // if self.think_start_token.starts_with(&current_text)
-    //     && self.think_start_token.as_str() != current_text.as_str()
-    // {
-    //     return ParserResult {
-    //         normal_token_ids: String::new(),
-    //         reasoning_token_ids: String::new(),
-    //     };
-    // }
-    // if self.think_end_token.starts_with(&current_text)
-    //     && self.think_end_token.as_str() != current_text.as_str()
-    // {
-    //     return ParserResult {
-    //         normal_token_ids: String::new(),
-    //         reasoning_token_ids: String::new(),
-    //     };
-    // }
202-460: Reintroduce unit tests for ID-based parsing, especially split-at-start cases.
The old string-based tests are commented out. Please add minimal tests with a tiny fake vocab mapping "<think>" and "</think>" to IDs, covering:
- normal_prefix + <think> + reasoning + </think> + normal_suffix
- a chunk where <think> and preceding normal text appear in the same call (streaming)
- a missing end marker (ensure normal_prefix is preserved)
I can draft a test module using a tiny vocab HashMap and deterministic IDs if you want me to push a patch; a sketch follows below.
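A minimal sketch of such a test module, assuming the try_new constructor proposed above and the ParserResult fields from this PR (the detect_and_parse_reasoning signature and &mut self receiver are assumptions):
#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::HashMap;

    // Tiny deterministic vocab: only the marker tokens need IDs here.
    fn tiny_vocab() -> HashMap<String, u32> {
        HashMap::from([("<think>".to_string(), 1), ("</think>".to_string(), 2)])
    }

    fn parser() -> BasicReasoningParser {
        BasicReasoningParser::try_new(
            "<think>".into(),
            "</think>".into(),
            false,
            true,
            &tiny_vocab(),
        )
        .expect("markers present in tiny vocab")
    }

    #[test]
    fn splits_prefix_reasoning_and_suffix() {
        // 10 <think> 11 </think> 12  =>  normal [10, 12], reasoning [11]
        let mut p = parser();
        let result = p.detect_and_parse_reasoning(&[10, 1, 11, 2, 12]);
        assert_eq!(result.normal_token_ids, vec![10, 12]);
        assert_eq!(result.reasoning_token_ids, vec![11]);
    }

    #[test]
    fn preserves_prefix_when_end_marker_missing() {
        // 10 <think> 11  =>  the normal prefix [10] must not be lost
        let mut p = parser();
        let result = p.detect_and_parse_reasoning(&[10, 1, 11]);
        assert_eq!(result.normal_token_ids, vec![10]);
        assert_eq!(result.reasoning_token_ids, vec![11]);
    }
}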
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/preprocessor.rs (4 hunks)
- lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (10 hunks)
- lib/llm/src/tokenizers/hf.rs (2 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
- lib/parsers/src/reasoning/base_parser.rs (4 hunks)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (2 hunks)
- lib/parsers/src/reasoning/mod.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/parsers/src/reasoning/deepseek_r1_parser.rs (2)
- lib/parsers/src/reasoning/base_parser.rs (3): new (20-38), parse_reasoning_streaming_incremental (104-199), detect_and_parse_reasoning (42-102)
- lib/parsers/src/reasoning/mod.rs (4): parse_reasoning_streaming_incremental (48-48), parse_reasoning_streaming_incremental (68-70), detect_and_parse_reasoning (43-43), detect_and_parse_reasoning (64-66)
lib/llm/tests/http-service.rs (1)
- lib/llm/src/protocols/openai/completions/delta.rs (1): create_choice (147-181)
lib/parsers/src/reasoning/base_parser.rs (2)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (3): new (16-26), detect_and_parse_reasoning (34-36), parse_reasoning_streaming_incremental (30-32)
- lib/parsers/src/reasoning/mod.rs (4): detect_and_parse_reasoning (43-43), detect_and_parse_reasoning (64-66), parse_reasoning_streaming_incremental (48-48), parse_reasoning_streaming_incremental (68-70)
lib/parsers/src/reasoning/mod.rs (2)
- lib/parsers/src/reasoning/base_parser.rs (3): detect_and_parse_reasoning (42-102), parse_reasoning_streaming_incremental (104-199), new (20-38)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (3): detect_and_parse_reasoning (34-36), parse_reasoning_streaming_incremental (30-32), new (16-26)
🔇 Additional comments (11)
lib/llm/src/tokenizers/hf.rs (1)
16-17: Import looks correct and scoped.
Importing HashMap is required for the new API. No concerns.
lib/llm/tests/http-service.rs (2)
98-99: Good migration to instance-based generator.
Switching to request.response_generator(None, None) aligns the tests with the new API shape and keeps model/metadata consistent with the request.
103-104: Updated create_choice call shape acknowledged.
Adding the trailing None for reasoning_content matches the new signature and keeps tests focused on non-reasoning paths.
lib/llm/src/preprocessor.rs (1)
132-133: Field initialization matches the new struct; LGTM.
vocab is correctly captured once during construction.
lib/llm/src/engines.rs (2)
186-187: Use of (None, None) for tokenizer/vocab in the echo engine is appropriate.
The echo engine emits character-level chunks and doesn't need token-aware reasoning. This keeps the test engine minimal.
207-208: Updated create_choice signature usage confirmed.
Supplying None for reasoning_content keeps behavior unchanged while conforming to the new API. No issues.
Also applies to: 212-213
lib/parsers/src/reasoning/deepseek_r1_parser.rs (1)
29-36: Delegation looks correct.
Forwarding to the base parser preserves the updated token-ID semantics. No concerns here.
lib/parsers/src/reasoning/mod.rs (2)
39-49: Trait API migration to token IDs looks consistent.
Signatures and docs are aligned with the new token-id based parsing. Good.
63-71: Wrapper forwarding is clean.
The wrapper delegates correctly to the inner parser with the new signatures.
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
365-371: create_choice wiring with reasoning_content looks good.
Passing both normal_text and reasoning_content aligns with the new delta shape. Nice.
21-33: Public API change for response_generator: call sites updated, version bump recommended.
All existing call sites correctly use the new two-parameter chat variant and the unchanged no-arg completions variant:
- In lib/llm/src/preprocessor.rs, the chat path invokes request.response_generator(Some(&self.tokenizer), Some(&self.vocab)); and the common-completion path invokes request.response_generator();
- In lib/llm/src/engines.rs and lib/llm/tests/http-service.rs, the chat stream uses request.response_generator(None, None);
- The standalone completions API in lib/llm/src/protocols/openai/completions/delta.rs still provides the zero-argument response_generator().
Everything compiles against both signatures, so no call sites are missing. Since this constitutes a public API addition in the chat module, consider bumping the crate's minor version to signal a compatibility-preserving change.
let response_generator =
    request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
let mut response_generator = Box::new(response_generator);
💡 Verification agent
🧩 Analysis chain
Passing tokenizer and vocab downstream is correct and future-proof.
Providing both Some(&self.tokenizer) and Some(&self.vocab) enables token-id aware postprocessing. Confirm all other call sites are updated to the new signature.
Run to locate any lingering old signature usages:
🏁 Script executed:
#!/bin/bash
# Find old zero-arg response_generator calls for chat completions
rg -nP --type=rust '\.response_generator\s*\(\s*\)' -C2
# Find four-arg create_choice calls in chat completions path (should be five-arg now)
rg -nP --type=rust '\.create_choice\s*\(\s*[^\)]*(?:,[^\)]*){3}\s*\)' -g '!*completions/**' -C2
Length of output: 1772
Critical: Lingering old response_generator() and four-arg create_choice() call sites must be updated.
The grep run uncovered remaining zero-argument calls to response_generator() and a four-argument call to create_choice(). These must be changed to the new signatures before merging:
• lib/engines/mistralrs/src/lib.rs:489
// before
let response_generator = request.response_generator();
// after
let response_generator =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/preprocessor.rs:564
// before
let response_generator = request.response_generator();
// after
let response_generator =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/engines.rs:233
// before
let deltas = request.response_generator();
// after
let deltas =
request.response_generator(Some(&self.tokenizer), Some(&self.vocab));
• lib/llm/src/engines.rs:212
// before (four args)
let response = deltas.create_choice(
0,
None,
None,
Some(dynamo_async_openai::types::FinishReason::Stop),
None,
);
// after (five args – add the missing parameter to match the updated signature)
let response = deltas.create_choice(
0,
None,
None,
Some(dynamo_async_openai::types::FinishReason::Stop),
None,
  /* e.g. Some(&self.tokenizer) or the new parameter */
);
Please update these call sites to pass the tokenizer and vocab (or the appropriate fifth argument) so that token-ID-aware postprocessing works everywhere.
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor.rs around lines 505-507, the code calls
request.response_generator() without the new tokenizer and vocab args and then
boxes it; change the call to request.response_generator(Some(&self.tokenizer),
Some(&self.vocab)) and then Box::new that result (i.e., replace the zero-arg
invocation with the two-arg form and keep the Box::new wrapping).
pub fn new(
    model: String,
    options: DeltaGeneratorOptions,
    tokenizer: Option<&Arc<(dyn Tokenizer + 'static)>>,
    vocab: Option<&HashMap<String, u32>>,
) -> Self {
    let now = std::time::SystemTime::now()
🛠️ Refactor suggestion
Avoid constructing a parser when markers are missing (use try_get + and_then).
If vocab is present but lacks "<think>" or "</think>", BasicReasoningParser::new will panic. Use a fallible acquisition to skip reasoning gracefully.
Apply this diff (paired with mod.rs change):
- let reasoning_parser_type = ReasoningParserType::Basic;
- // Reasoning parser wrapper
- let reasoning_parser = vocab.map(|v| reasoning_parser_type.get_reasoning_parser(v));
+ let reasoning_parser_type = ReasoningParserType::Basic;
+ // Try to create a reasoning parser only if vocab has required markers.
+ let reasoning_parser = vocab.and_then(|v| reasoning_parser_type.try_get_reasoning_parser(v));
Also applies to: 104-111, 123-124
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/delta.rs around lines 81-87
(and similarly at 104-111 and 123-124), avoid unconditionally constructing
BasicReasoningParser when the vocab may be missing the "<think>" or "</think>"
markers; instead use fallible lookups (e.g., vocab.try_get or vocab.get
and_then) to retrieve both token IDs and only call BasicReasoningParser::new if
both markers are present, otherwise leave the reasoning parser as None so
reasoning is skipped gracefully; update the mapping logic to chain Option
results (and_then/map) and propagate None when either marker is absent.
// Create the streaming response.
let reasoning_parser_result = self.create_reasoning_content(delta.token_ids);

let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result
{
    let none_if_empty = |vec: String| {
        if vec.is_empty() {
            None
        } else {
            Some(vec)
        }
    };
    (
        none_if_empty(
            self.tokenizer
                .as_ref()
                .unwrap()
                .decode(&parser_result.normal_token_ids, false)
                .unwrap(),
        ),
        none_if_empty(
            self.tokenizer
                .as_ref()
                .unwrap()
                .decode(&parser_result.reasoning_token_ids, false)
                .unwrap(),
        ),
    )
} else {
    (delta.text, None)
};
🛠️ Refactor suggestion
Don’t unwrap on tokenizer.decode during streaming; fall back gracefully.
Decoding can fail (e.g., invalid IDs or tokenizer errors). Unwrap will crash the stream. Handle decode errors and treat them as “no delta” or log and continue.
Apply this diff:
- let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result
- {
- let none_if_empty = |vec: String| {
- if vec.is_empty() {
- None
- } else {
- Some(vec)
- }
- };
- (
- none_if_empty(
- self.tokenizer
- .as_ref()
- .unwrap()
- .decode(&parser_result.normal_token_ids, false)
- .unwrap(),
- ),
- none_if_empty(
- self.tokenizer
- .as_ref()
- .unwrap()
- .decode(&parser_result.reasoning_token_ids, false)
- .unwrap(),
- ),
- )
- } else {
- (delta.text, None)
- };
+ let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result {
+ let decode = |ids: &Vec<u32>| {
+ self.tokenizer
+ .as_ref()
+ .unwrap()
+ .decode(ids, false)
+ .ok()
+ .and_then(|s| if s.is_empty() { None } else { Some(s) })
+ };
+ (decode(&parser_result.normal_token_ids), decode(&parser_result.reasoning_token_ids))
+ } else {
+ (delta.text, None)
+ };
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Create the streaming response.
let reasoning_parser_result = self.create_reasoning_content(delta.token_ids);

let (normal_text, reasoning_content) = if let Some(parser_result) = reasoning_parser_result {
    let decode = |ids: &Vec<u32>| {
        self.tokenizer
            .as_ref()
            .unwrap()
            .decode(ids, false)
            .ok()
            .and_then(|s| if s.is_empty() { None } else { Some(s) })
    };
    (
        decode(&parser_result.normal_token_ids),
        decode(&parser_result.reasoning_token_ids),
    )
} else {
    (delta.text, None)
};
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/delta.rs around lines 333 to
363, the code calls tokenizer.decode(...).unwrap() which will panic on decode
errors during streaming; instead handle decode failures gracefully by replacing
unwraps with error-aware logic: attempt to decode each token id vector with
tokenizer.decode(...), map successful results to Some(string) if non-empty, and
on Err return None (or log the error and return None) so streaming continues;
ensure the closure none_if_empty is applied after successful decode and that in
the else branch you still use delta.text when appropriate.
pub fn new(vocab: &HashMap<String, u32>) -> Self {
    Self {
        base: BasicReasoningParser::new(
            "<think>".to_string(),
            "</think>".to_string(),
            true,
            true,
            vocab,
        ),
Avoid panics when tokens are missing in vocab (make construction fallible).
BasicReasoningParser::new currently unwraps token IDs from the vocab. If the provided tokenizer/vocab doesn't contain "<think>" and/or "</think>", this will panic during parser construction. Prefer a fallible constructor and propagate None/Result to the caller. See the proposed cross-file diffs in base_parser.rs and mod.rs to introduce a try_new/try_get pattern and wire it in delta.rs.
Apply this diff here once BasicReasoningParser::try_new exists:
- pub fn new(vocab: &HashMap<String, u32>) -> Self {
- Self {
- base: BasicReasoningParser::new(
+ pub fn new(vocab: &HashMap<String, u32>) -> Self {
+ Self {
+ base: BasicReasoningParser::try_new(
"<think>".to_string(),
"</think>".to_string(),
true,
true,
- vocab,
- ),
+ vocab,
+ )
+ .expect("DeepseekR1ReasoningParser requires <think> markers in vocab"),
}
}
If you prefer to avoid propagating Result here, you can log+skip parser creation at the callsite (see delta.rs diff) and keep this file unchanged. Let me know your preferred approach and I’ll align the patches.
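For reference, the log-and-skip alternative at the call site could look roughly like this (an illustrative sketch mirroring the guard proposed in the mod.rs diff below):
// Sketch: keep DeepseekR1ReasoningParser::new unchanged and guard before
// constructing, so a vocab without the markers just disables reasoning.
let reasoning_parser = if vocab.contains_key("<think>") && vocab.contains_key("</think>") {
    Some(ReasoningParserType::DeepseekR1.get_reasoning_parser(&vocab))
} else {
    log::warn!("<think>/</think> missing from vocab; skipping reasoning parser");
    None
};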
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pub fn new(vocab: &HashMap<String, u32>) -> Self {
    Self {
        base: BasicReasoningParser::try_new(
            "<think>".to_string(),
            "</think>".to_string(),
            true,
            true,
            vocab,
        )
        .expect("DeepseekR1ReasoningParser requires <think> markers in vocab"),
    }
}
🤖 Prompt for AI Agents
In lib/parsers/src/reasoning/deepseek_r1_parser.rs around lines 16 to 24, the
current constructor calls BasicReasoningParser::new which unwraps vocab token
IDs and can panic if "<think>" or "</think>" are missing; change this to call
BasicReasoningParser::try_new (the fallible constructor) and propagate the
failure by making DeepseekR1Parser::new return a Result or Option (matching the
new base API) so construction does not panic: replace the direct call with the
try_new call, map/propagate the returned error or None to the caller, and update
the function signature and any calling sites accordingly so missing tokens are
handled gracefully.
 impl ReasoningParserType {
-    pub fn get_reasoning_parser(self) -> ReasoningParserWrapper {
+    pub fn get_reasoning_parser(self, vocab: &HashMap<String, u32>) -> ReasoningParserWrapper {
         match self {
             ReasoningParserType::DeepseekR1 => ReasoningParserWrapper {
-                parser: Box::new(DeepseekR1ReasoningParser::new()),
+                parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
             },
             ReasoningParserType::Basic => ReasoningParserWrapper {
                 parser: Box::new(BasicReasoningParser::new(
                     "<think>".into(),
                     "</think>".into(),
                     false,
                     true,
+                    vocab,
                 )),
🛠️ Refactor suggestion
Make parser acquisition fallible to avoid runtime panics when markers are absent.
get_reasoning_parser unconditionally constructs a parser, but BasicReasoningParser::new unwraps marker IDs and can panic. Recommend returning Option and letting call sites skip reasoning gracefully if the vocab lacks markers. This is low-risk and avoids hard failures in heterogeneous model/tokenizer setups.
Apply this diff:
-impl ReasoningParserType {
- pub fn get_reasoning_parser(self, vocab: &HashMap<String, u32>) -> ReasoningParserWrapper {
- match self {
- ReasoningParserType::DeepseekR1 => ReasoningParserWrapper {
- parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
- },
- ReasoningParserType::Basic => ReasoningParserWrapper {
- parser: Box::new(BasicReasoningParser::new(
- "<think>".into(),
- "</think>".into(),
- false,
- true,
- vocab,
- )),
- },
- }
- }
-}
+impl ReasoningParserType {
+ pub fn try_get_reasoning_parser(
+ self,
+ vocab: &HashMap<String, u32>,
+ ) -> Option<ReasoningParserWrapper> {
+ match self {
+ ReasoningParserType::DeepseekR1 => {
+ // If you keep DeepseekR1ReasoningParser::new infallible, guard here:
+ if !vocab.contains_key("<think>") || !vocab.contains_key("</think>") {
+ return None;
+ }
+ Some(ReasoningParserWrapper {
+ parser: Box::new(DeepseekR1ReasoningParser::new(vocab)),
+ })
+ }
+ ReasoningParserType::Basic => {
+ BasicReasoningParser::try_new(
+ "<think>".into(),
+ "</think>".into(),
+ false,
+ true,
+ vocab,
+ )
+ .map(|p| ReasoningParserWrapper { parser: Box::new(p) })
+ }
+ }
+ }
+}
Follow-up: wire this into DeltaGenerator::new (see delta.rs comment).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In lib/parsers/src/reasoning/mod.rs around lines 73 to 86, get_reasoning_parser
currently always constructs a parser but BasicReasoningParser::new can panic
when marker IDs are missing; change get_reasoning_parser to return
Option<ReasoningParserWrapper> (or Result with context) and make the Basic
branch propagate None when marker lookup fails instead of unwrapping; update the
DeepseekR1 branch to return Some(...), adjust the function signature and all
call sites (notably wire the fallible result through into DeltaGenerator::new in
delta.rs so callers can skip reasoning if markers are absent), and ensure
compiler errors are resolved by handling the Option/Result where parsers are
requested.
closing in favour of #2656
Overview:
Use token ids directly instead of raw strings for the reasoning parsers, and add a way to provide the tokenizer and vocab to postprocessing.
Summary by CodeRabbit
New Features
Refactor
Tests