
Conversation

nachiketb-nvidia
Contributor

@nachiketb-nvidia nachiketb-nvidia commented Aug 20, 2025

Overview:

Enable basic parsing of reasoning content in LLM outputs

Details:

  • Moved traits around for better code organization
  • Renamed the base parser to BasicReasoningParser
  • Hardcoded the reasoning parser type to Basic for now, until we figure out how to pass it from the frontend

Where should the reviewer start?

The changes in aggregator.rs and delta.rs

Summary by CodeRabbit

  • New Features

    • Streaming chat completions now separate "reasoning" from visible text; streams and final responses may include a reasoning field.
    • Think-tagged blocks (`<think>…</think>`) are stripped from visible message content and delivered via the reasoning output.
  • Refactor

    • Pluggable, selectable reasoning parser with a thread-safe wrapper for different parsing strategies.
  • Tests

    • Expanded tests for incremental and streaming reasoning parsing.
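For illustration only (the exact wire shape is not shown in this thread; the `reasoning_content` field name comes from the summary above), a streamed chunk carrying both fields might look like:

```json
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"The answer is 4.","reasoning_content":"2 + 2 = 4"},"finish_reason":null}]}
```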

@nachiketb-nvidia nachiketb-nvidia requested a review from a team as a code owner August 20, 2025 16:31

copy-pr-bot bot commented Aug 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Aug 20, 2025
@nachiketb-nvidia nachiketb-nvidia force-pushed the nachiketb/enable-basic-reasoning-parsing branch from 22d0268 to 9ca1004 Compare August 20, 2025 16:37

coderabbitai bot commented Aug 20, 2025

Walkthrough

Adds streaming reasoning parsing and propagation into chat-completion deltas: introduces a ReasoningParser abstraction and wrapper, renames and refactors base parser to BasicReasoningParser, wires parsing into DeltaGenerator (mutating create_choice), accumulates per-choice reasoning_content in the aggregator, and changes response generator bindings to mutable.

Changes

Cohort / File(s) Summary
Response generator mutability
lib/llm/src/engines.rs, lib/llm/tests/http-service.rs
Change response generator binding to mut so streaming code can call mutating methods (e.g., create_choice).
Delta generator: reasoning parsing
lib/llm/src/protocols/openai/chat_completions/delta.rs
Add reasoning_parser: ReasoningParserWrapper to DeltaGenerator; parse streaming text into normal_text and reasoning_content; change create_choice to &mut self and return NvCreateChatCompletionStreamResponse; generated delta sets content = normal_text and includes reasoning_content.
Delta aggregator: reasoning accumulation
lib/llm/src/protocols/openai/chat_completions/aggregator.rs
Add reasoning_content: Option<String> to internal DeltaChoice; append per-delta reasoning_content into choice state; final conversion exposes aggregated reasoning_content on ChatChoice.
Parsers: API, renames, wrapper
lib/parsers/src/reasoning/mod.rs, lib/parsers/src/reasoning/base_parser.rs, lib/parsers/src/reasoning/deepseek_r1_parser.rs
Move ParserResult and ReasoningParser to crate root; rename BaseReasoningParser to BasicReasoningParser with expanded streaming state/logic and derive changes; make submodules private; add ReasoningParser trait, ReasoningParserType enum, and ReasoningParserWrapper (Arc<Mutex>) with factory method to obtain parser instances; DeepseekR1 updated to use BasicReasoningParser and implement the trait.
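The wrapper described above can be sketched roughly as follows (names `ParserResult`, `ReasoningParser`, and `ReasoningParserWrapper` come from the walkthrough; the internals are an assumption, not the PR's exact code):

```rust
use std::sync::{Arc, Mutex};

// Split of a chunk into visible and reasoning text (field names from the walkthrough).
#[derive(Debug, Clone, Default)]
pub struct ParserResult {
    pub normal_text: String,
    pub reasoning_text: String,
}

// Trait shape as described: a mutable, stateful streaming parser.
pub trait ReasoningParser: Send {
    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
}

// Thread-safe wrapper: Arc<Mutex<..>> around a boxed parser, so the
// DeltaGenerator can hold it and clone it across streaming calls.
#[derive(Clone)]
pub struct ReasoningParserWrapper {
    inner: Arc<Mutex<Box<dyn ReasoningParser>>>,
}

impl ReasoningParserWrapper {
    pub fn new(parser: Box<dyn ReasoningParser>) -> Self {
        Self {
            inner: Arc::new(Mutex::new(parser)),
        }
    }

    // Delegates to the inner parser while holding the lock.
    pub fn parse_reasoning_streaming_incremental(&self, text: &str) -> ParserResult {
        self.inner
            .lock()
            .expect("parser mutex poisoned")
            .parse_reasoning_streaming_incremental(text)
    }
}
```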

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as HTTP API
  participant Engine as Engine
  participant RG as ResponseGenerator (mutable)
  participant DG as DeltaGenerator
  participant RP as ReasoningParser (wrapper)
  participant Agg as DeltaAggregator

  Client->>API: POST /chat/completions (stream)
  API->>Engine: build request
  Engine->>RG: response_generator() as mutable
  loop token/delta stream
    RG->>DG: create_choice(index, text, finish_reason, logprobs) (mut)
    DG->>RP: parse streaming text
    RP-->>DG: {normal_text, reasoning_text}
    DG-->>RG: emit delta {content: normal_text, reasoning_content: reasoning_text}
    RG->>Agg: apply(delta)
    Agg-->>API: stream chunk (choices[].delta.content + internal reasoning accumulation)
    API-->>Client: data: {choices[].delta.content,...}
  end
  note over Agg,API: aggregated reasoning_content is carried per-choice and exposed in final ChatChoice
  API-->>Client: [DONE]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

I hop and parse the thought between,
nibbling tokens, keeping lean—
split the think, keep chatter clear,
stitch the reasons, close and near.
Tiny paws, big streaming cheer. 🥕🐇


@grahamking

@coderabbitai review


coderabbitai bot commented Aug 20, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/parsers/src/reasoning/mod.rs (4)

11-17: Derive Debug/Clone/Default on ParserResult for usability and logging

These derives are low-cost and make the type easier to log, clone in tests, and initialize.

Apply:

-pub struct ParserResult {
+#[derive(Debug, Clone, Default)]
+pub struct ParserResult {
     /// The normal text outside of reasoning blocks.
     pub normal_text: String,
 
     /// The extracted reasoning text from within reasoning blocks.
     pub reasoning_text: String,
 }

Optionally consider a more specific name like ReasoningParseResult in a follow-up, but not required.


19-25: Clarify method contracts (streaming vs full) and align naming

The trait is clear, but downstream components (delta/aggregator) depend on the exact contract of what each method returns:

  • Should parse_reasoning_streaming_incremental return only the newly parsed delta for this chunk vs cumulative content?
  • Does detect_and_parse_reasoning reset internal state, or is it stateful across calls?

Tighten the docs so implementers and callers agree on semantics. Also consider naming consistency: e.g., parse_full or parse_complete vs detect_and_parse_reasoning.

Suggested docs tweak:

-pub trait ReasoningParser {
-    /// Detects and parses reasoning from the input text.
-    fn detect_and_parse_reasoning(&mut self, text: &str) -> ParserResult;
-
-    /// Parses reasoning incrementally from streaming input.
-    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
-}
+pub trait ReasoningParser {
+    /// Parses a standalone, non-streaming input chunk. Implementations may reset or ignore
+    /// internal streaming state and should return the split of normal vs reasoning text for
+    /// this complete input. Marker tokens must not be included in either output.
+    fn detect_and_parse_reasoning(&mut self, text: &str) -> ParserResult;
+
+    /// Parses a streaming chunk and updates internal state. The return value should be the
+    /// delta: only the newly discovered normal and reasoning text attributable to this chunk
+    /// (not the cumulative totals). Marker tokens must not be included in either output.
+    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
+}

If the aggregator relies on cumulative semantics today, call that out explicitly and we can adjust the wording.


27-31: Consider #[non_exhaustive] and a Default variant; add brief variant docs

If this enum is part of your public API and you plan to add new parser types, marking it #[non_exhaustive] avoids breaking consumers that use exhaustive matches. Also deriving Default with Basic helps with configuration defaults.

Apply:

-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum ReasoningParserType {
-    DeepseekR1,
-    Basic,
-}
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+#[non_exhaustive]
+pub enum ReasoningParserType {
+    /// Parser tuned for DeepSeek-R1 style reasoning traces.
+    DeepseekR1,
+    /// Marker-based parser that extracts content between `<think>` and `</think>`.
+    #[default]
+    Basic,
+}

Note: #[non_exhaustive] is a public API change; if the crate is already consumed externally, confirm semver impact before adopting.


33-45: Thread-safety bound and minor nit: prefer .into() over .to_string()

  • If this parser is ever moved across threads (e.g., spawned tasks consuming a stream), consider returning Box<dyn ReasoningParser + Send> to make the bound explicit. This will surface any non-Send fields early.
  • Style nit: "<think>".into() avoids explicitly calling to_string() and lets the compiler infer the target type.

Apply:

-    pub fn get_reasoning_parser(self) -> Box<dyn ReasoningParser> {
+    pub fn get_reasoning_parser(self) -> Box<dyn ReasoningParser + Send> {
         match self {
             ReasoningParserType::DeepseekR1 => Box::new(DeepseekR1ReasoningParser::new()),
             ReasoningParserType::Basic => Box::new(BasicReasoningParser::new(
-                "<think>".to_string(),
-                "</think>".to_string(),
+                "<think>".into(),
+                "</think>".into(),
                 false,
                 true,
             )),
         }
     }

If the parser is strictly single-threaded by design, you can skip the + Send bound—please confirm how the delta/aggregator pipeline holds and propagates the parser object (e.g., shared between tasks vs confined to a single thread). As a separate follow-up, consider renaming this to into_parser since it consumes self, matching Rust naming idioms.
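To make the suggested `+ Send` bound concrete, here is a minimal sketch (the trait and the `Passthrough` type are stand-ins, not the PR's code): without `Send` on the trait object, moving the parser into a spawned thread would not compile.

```rust
use std::thread;

// Stand-in trait for illustration; the real trait lives in lib/parsers.
trait ReasoningParser {
    fn parse(&mut self, text: &str) -> String;
}

#[derive(Default)]
struct Passthrough;

impl ReasoningParser for Passthrough {
    fn parse(&mut self, text: &str) -> String {
        text.to_string()
    }
}

fn parse_on_worker(mut parser: Box<dyn ReasoningParser + Send>, text: String) -> String {
    // With `+ Send` the trait object can cross the thread boundary;
    // dropping the bound makes this `thread::spawn` a compile error.
    thread::spawn(move || parser.parse(&text))
        .join()
        .expect("worker thread panicked")
}
```

This is why making the bound explicit in the factory's return type surfaces non-`Send` fields early, at the definition site rather than at the first spawn.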

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 867c07b and 9ca1004.

📒 Files selected for processing (7)
  • lib/llm/src/engines.rs (1 hunks)
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (7 hunks)
  • lib/llm/tests/http-service.rs (1 hunks)
  • lib/parsers/src/reasoning/base_parser.rs (22 hunks)
  • lib/parsers/src/reasoning/deepseek_r1_parser.rs (1 hunks)
  • lib/parsers/src/reasoning/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs
  • lib/parsers/src/reasoning/base_parser.rs
  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • lib/parsers/src/reasoning/deepseek_r1_parser.rs
🔇 Additional comments (5)
lib/llm/src/engines.rs (2)

186-186: Making the response generator mutable is correct and required

Using a mutable binding here aligns with the updated DeltaGenerator API (create_choice takes &mut self). The borrow occurs after the await and does not live across yield points, so this is safe.


186-214: All DeltaGenerator bindings match their create_choice signatures

Verified that in chat_completions/delta.rs, create_choice(&mut self, …) is paired with let mut deltas, and in completions/delta.rs, create_choice(&self, …) is paired with an immutable let deltas. No updates to the completions path are needed.

lib/llm/tests/http-service.rs (1)

98-98: Good: mutable generator matches streaming API change

The mutable binding is necessary to call create_choice(&mut self, ...). This keeps the test aligned with the updated chat-completions delta generator.

lib/parsers/src/reasoning/mod.rs (2)

8-9: Re-exports look good

Keeping DeepseekR1ReasoningParser and exposing BasicReasoningParser at the module root preserves ergonomics while hiding internal structure. Nice.


4-5: No external imports of private submodules found; change is safe

A ripgrep scan for reasoning::base_parser and reasoning::deepseek_r1_parser across the repo yielded no matches. Both parser types still expose public new() constructors and implement the ReasoningParser trait via your top-level re-exports. This tightening of the API surface has no remaining callers using the old module paths.

@nachiketb-nvidia nachiketb-nvidia force-pushed the nachiketb/enable-basic-reasoning-parsing branch 2 times, most recently from 8035210 to 0c732cd Compare August 20, 2025 18:34
@nachiketb-nvidia nachiketb-nvidia force-pushed the nachiketb/enable-basic-reasoning-parsing branch from e9f045a to 663a565 Compare August 20, 2025 22:44
@nachiketb-nvidia
Contributor Author

@coderabbitai review again


coderabbitai bot commented Aug 20, 2025

@nachiketb-nvidia I'll review the changes in the PR again.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
lib/llm/src/protocols/openai/chat_completions/delta.rs (3)

103-119: Consider making parser type configurable.

The reasoning parser type is hardcoded to Basic. While the PR description mentions this is temporary for front-end integration determination, consider adding a comment indicating this is a temporary hardcode.

-        // Reasoning parser
-        let reasoning_parser_type = ReasoningParserType::Basic;
-
-        let reasoning_parser = reasoning_parser_type.get_reasoning_parser();
+        // Reasoning parser
+        // TODO: Make parser type configurable once front-end integration is determined
+        let reasoning_parser_type = ReasoningParserType::Basic;
+
+        let reasoning_parser = reasoning_parser_type.get_reasoning_parser();

195-202: Simplify the unwrap operation.

The current implementation checks if text is Some but still calls unwrap() on the same value. This can be simplified using ? operator or pattern matching.

-    fn create_reasoning_content(&mut self, text: Option<String>) -> Option<ParserResult> {
-        text.as_ref()?;
-        let parser_result = self
-            .reasoning_parser
-            .parse_reasoning_streaming_incremental(text.as_deref().unwrap());
-
-        Some(parser_result)
-    }
+    fn create_reasoning_content(&mut self, text: Option<String>) -> Option<ParserResult> {
+        let text = text?;
+        let parser_result = self
+            .reasoning_parser
+            .parse_reasoning_streaming_incremental(&text);
+
+        Some(parser_result)
+    }

216-227: Potential content loss when parsing fails
By using unwrap_or_default() on the Option<ParserResult> returned by create_reasoning_content, you'll get a derived-default ParserResult—i.e. both normal_text and reasoning_text are empty—whenever parsing fails or text is None. In those failure cases, the entire message content is dropped rather than preserved.

• File: lib/llm/src/protocols/openai/chat_completions/delta.rs
Line: 222

Consider instead preserving the original text as normal_text on failure. For example:

- let reasoning_parser_result = self.create_reasoning_content(text).unwrap_or_default();
+ let reasoning_parser_result = self
+     .create_reasoning_content(text.clone())
+     .unwrap_or_else(|| ParserResult {
+         normal_text: text.clone().unwrap_or_default(),
+         reasoning_text: String::new(),
+     });

This ensures that even when parsing fails, the user’s message still appears in normal_text.

lib/parsers/src/reasoning/base_parser.rs (1)

85-174: Review streaming incremental parsing logic.

The streaming parsing implementation handles complex buffering scenarios, but there are some areas that could be improved:

  1. Lines 104-119: The prefix matching logic for partial tokens is good, but could be more efficient.
  2. Lines 151-158: When stream_reasoning is true, the buffer is cleared immediately after streaming content. This might cause issues with partial tokens at the end of the content.
  3. Lines 135-149: The end token handling correctly manages state transitions.

The implementation handles the complex requirements of streaming parsing with partial tokens, but the logic is quite intricate.

Consider adding more inline comments to explain the complex state transitions, especially around buffer management and partial token handling.
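For readers following along, the partial-token buffering being discussed can be illustrated with a simplified splitter (a sketch of the general technique under assumed `<think>`/`</think>` markers, not the PR's BasicReasoningParser): when a chunk ends in a prefix of a marker, that suffix is held in the buffer rather than emitted.

```rust
// Minimal marker-based streaming splitter with partial-token buffering.
#[derive(Debug, Default)]
struct MarkerSplitter {
    buffer: String,
    in_reasoning: bool,
}

impl MarkerSplitter {
    const START: &'static str = "<think>";
    const END: &'static str = "</think>";

    /// Returns (normal_text, reasoning_text) newly attributable to this chunk.
    fn feed(&mut self, chunk: &str) -> (String, String) {
        self.buffer.push_str(chunk);
        let (mut normal, mut reasoning) = (String::new(), String::new());
        loop {
            let marker = if self.in_reasoning { Self::END } else { Self::START };
            match self.buffer.find(marker) {
                Some(pos) => {
                    // Everything before the marker belongs to the current mode.
                    let segment = self.buffer[..pos].to_string();
                    if self.in_reasoning { reasoning.push_str(&segment) } else { normal.push_str(&segment) }
                    self.buffer.drain(..pos + marker.len());
                    self.in_reasoning = !self.in_reasoning;
                }
                None => {
                    // Hold back any suffix that could be the start of a marker.
                    let keep = partial_suffix_len(&self.buffer, marker);
                    let emit_to = self.buffer.len() - keep;
                    let segment = self.buffer[..emit_to].to_string();
                    if self.in_reasoning { reasoning.push_str(&segment) } else { normal.push_str(&segment) }
                    self.buffer.drain(..emit_to);
                    break;
                }
            }
        }
        (normal, reasoning)
    }
}

/// Length of the longest proper suffix of `s` that is a prefix of `marker`.
fn partial_suffix_len(s: &str, marker: &str) -> usize {
    for len in (1..marker.len()).rev() {
        if len <= s.len() && s.ends_with(&marker[..len]) {
            return len;
        }
    }
    0
}
```

This sidesteps the interaction the comment above points at: with eager streaming, clearing the buffer before checking for a partial marker at its tail would mis-emit marker fragments as content.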

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9ca1004 and c19f465.

📒 Files selected for processing (7)
  • lib/llm/src/engines.rs (1 hunks)
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (7 hunks)
  • lib/llm/tests/http-service.rs (1 hunks)
  • lib/parsers/src/reasoning/base_parser.rs (22 hunks)
  • lib/parsers/src/reasoning/deepseek_r1_parser.rs (1 hunks)
  • lib/parsers/src/reasoning/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • lib/llm/tests/http-service.rs
  • lib/llm/src/engines.rs
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs
  • lib/parsers/src/reasoning/deepseek_r1_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (17)
lib/llm/src/protocols/openai/chat_completions/delta.rs (4)

16-16: LGTM! Good use of structured imports.

The import statement properly brings in all necessary components from the reasoning module with clear visibility into what's being used.


70-74: LGTM! Well-documented field addition.

The reasoning parser field is properly documented and follows the existing struct conventions. The field type ReasoningParserWrapper provides the necessary abstraction for thread-safe access to the underlying parser.


224-227: LGTM! Clean extraction of parser results.

The tuple destructuring cleanly separates normal text and reasoning content from the parser result, making the code readable and maintainable.


228-239: LGTM! Proper delta construction with reasoning content.

The delta construction properly integrates the reasoning content while preserving existing functionality. The conditional role assignment logic remains intact.

lib/parsers/src/reasoning/mod.rs (8)

4-5: LGTM! Proper module visibility.

Making the parser modules private while exposing their public types through re-exports is a good encapsulation practice.


7-7: LGTM! Necessary imports for thread safety.

The Arc and Mutex imports are required for the thread-safe ReasoningParserWrapper implementation.


10-11: LGTM! Clean re-exports with updated naming.

The re-exports properly expose BasicReasoningParser (renamed from BaseReasoningParser) while maintaining the existing DeepseekR1ReasoningParser export.


13-38: LGTM! Well-structured ParserResult with helpful utility methods.

The ParserResult struct is well-designed with:

  • Clear field documentation
  • Appropriate derives including Default
  • Utility methods that return Option<String> for empty check convenience
  • Consistent naming conventions

40-50: LGTM! Clean trait definition with clear method contracts.

The ReasoningParser trait is well-designed with:

  • Proper trait bounds (Send + std::fmt::Debug)
  • Clear method documentation explaining streaming vs non-streaming behavior
  • Consistent return types using ParserResult
  • Good separation of concerns between standalone and incremental parsing

52-57: LGTM! Well-designed enum with future extensibility.

The ReasoningParserType enum is properly designed with:

  • #[non_exhaustive] for future extensibility
  • Appropriate derives
  • Clear variant naming

59-75: LGTM! Proper wrapper implementation for thread safety.

The ReasoningParserWrapper provides clean delegation to the underlying parser with proper mutex locking. The trait implementation correctly forwards calls to the inner parser.


77-93: LGTM! Clean factory pattern implementation.

The get_reasoning_parser method provides a clean factory pattern for creating parser instances:

  • Proper wrapping in Arc<Mutex<>> for thread safety
  • Correct constructor parameters for BasicReasoningParser
  • Clean match pattern for extensibility
lib/parsers/src/reasoning/base_parser.rs (5)

5-5: LGTM! Proper import from crate root.

The import correctly references ParserResult and ReasoningParser from the crate root, aligning with the module restructuring.


7-8: LGTM! Proper rename and derive additions.

The rename from BaseReasoningParser to BasicReasoningParser is consistent with the PR objectives, and the added derives (Default, Debug, Clone) are appropriate for the use cases.


17-33: LGTM! Constructor properly handles new parameters.

The constructor correctly maps force_reasoning to _in_reasoning and initializes the new streaming-related fields (stream_reasoning, _buffer, stripped_think_start).


35-83: LGTM! Non-streaming parsing logic is sound.

The detect_and_parse_reasoning method handles the various cases correctly:

  • Returns normal text when no reasoning is detected
  • Handles truncated reasoning (missing end token)
  • Properly splits reasoning and normal text at the first end token
  • Good logging for debugging

177-435: Comprehensive test coverage validates the implementation.

The test suite covers a wide range of scenarios:

  • Basic reasoning detection and parsing
  • Streaming incremental parsing with partial tokens
  • Multiple reasoning blocks
  • Edge cases (empty blocks, whitespace-only, malformed input)
  • State persistence across streaming calls
  • Different parser configurations

All tests have been updated to use BasicReasoningParser, maintaining consistency with the rename.

@nachiketb-nvidia nachiketb-nvidia force-pushed the nachiketb/enable-basic-reasoning-parsing branch from c19f465 to b5088e6 Compare August 21, 2025 16:57
@nachiketb-nvidia nachiketb-nvidia force-pushed the nachiketb/enable-basic-reasoning-parsing branch from b5088e6 to 8247b64 Compare August 21, 2025 17:14