feat: enable basic reasoning parsing of <think> </think> tokens #2555
Conversation
Walkthrough

Adds streaming reasoning parsing and propagation into chat-completion deltas: introduces a ReasoningParser abstraction and wrapper, renames and refactors the base parser to BasicReasoningParser, wires parsing into DeltaGenerator (making create_choice mutating), accumulates per-choice reasoning_content in the aggregator, and changes response generator bindings to mutable.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant API as HTTP API
    participant Engine as Engine
    participant RG as ResponseGenerator (mutable)
    participant DG as DeltaGenerator
    participant RP as ReasoningParser (wrapper)
    participant Agg as DeltaAggregator
    Client->>API: POST /chat/completions (stream)
    API->>Engine: build request
    Engine->>RG: response_generator() as mutable
    loop token/delta stream
        RG->>DG: create_choice(index, text, finish_reason, logprobs) (mut)
        DG->>RP: parse streaming text
        RP-->>DG: {normal_text, reasoning_text}
        DG-->>RG: emit delta {content: normal_text, reasoning_content: reasoning_text}
        RG->>Agg: apply(delta)
        Agg-->>API: stream chunk (choices[].delta.content + internal reasoning accumulation)
        API-->>Client: data: {choices[].delta.content,...}
    end
    note over Agg,API: aggregated reasoning_content is carried per-choice and exposed in final ChatChoice
    API-->>Client: [DONE]
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/parsers/src/reasoning/mod.rs (4)
11-17: Derive Debug/Clone/Default on ParserResult for usability and logging

These derives are low-cost and make the type easier to log, clone in tests, and initialize.

Apply:

```diff
-pub struct ParserResult {
+#[derive(Debug, Clone, Default)]
+pub struct ParserResult {
     /// The normal text outside of reasoning blocks.
     pub normal_text: String,
     /// The extracted reasoning text from within reasoning blocks.
     pub reasoning_text: String,
 }
```

Optionally consider a more specific name like `ReasoningParseResult` in a follow-up, but not required.
19-25: Clarify method contracts (streaming vs full) and align naming

The trait is clear, but downstream components (delta/aggregator) depend on the exact contract of what each method returns:

- Should `parse_reasoning_streaming_incremental` return only the newly parsed delta for this chunk, or cumulative content?
- Does `detect_and_parse_reasoning` reset internal state, or is it stateful across calls?

Tighten the docs so implementers and callers agree on semantics. Also consider naming consistency: e.g., `parse_full` or `parse_complete` vs `detect_and_parse_reasoning`.

Suggested docs tweak:

```diff
-pub trait ReasoningParser {
-    /// Detects and parses reasoning from the input text.
-    fn detect_and_parse_reasoning(&mut self, text: &str) -> ParserResult;
-
-    /// Parses reasoning incrementally from streaming input.
-    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
-}
+pub trait ReasoningParser {
+    /// Parses a standalone, non-streaming input chunk. Implementations may reset or ignore
+    /// internal streaming state and should return the split of normal vs reasoning text for
+    /// this complete input. Marker tokens must not be included in either output.
+    fn detect_and_parse_reasoning(&mut self, text: &str) -> ParserResult;
+
+    /// Parses a streaming chunk and updates internal state. The return value should be the
+    /// delta: only the newly discovered normal and reasoning text attributable to this chunk
+    /// (not the cumulative totals). Marker tokens must not be included in either output.
+    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
+}
```

If the aggregator relies on cumulative semantics today, call that out explicitly and we can adjust the wording.
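To make the delta-vs-cumulative question concrete, here is a minimal, hypothetical sketch of the per-chunk contract (the `ToyParser` type and its assumption that each marker token arrives whole, in its own chunk, are ours for illustration, not the PR's implementation):

```rust
#[derive(Debug, Default, PartialEq)]
struct ParserResult {
    normal_text: String,
    reasoning_text: String,
}

trait ReasoningParser {
    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult;
}

#[derive(Default)]
struct ToyParser {
    in_reasoning: bool,
}

impl ReasoningParser for ToyParser {
    // Simplified: assumes each marker token arrives whole, in its own chunk.
    fn parse_reasoning_streaming_incremental(&mut self, text: &str) -> ParserResult {
        match text {
            "<think>" => {
                self.in_reasoning = true;
                ParserResult::default() // marker tokens never appear in output
            }
            "</think>" => {
                self.in_reasoning = false;
                ParserResult::default()
            }
            _ if self.in_reasoning => ParserResult {
                reasoning_text: text.to_string(),
                ..Default::default()
            },
            _ => ParserResult {
                normal_text: text.to_string(),
                ..Default::default()
            },
        }
    }
}

fn main() {
    let mut p = ToyParser::default();
    p.parse_reasoning_streaming_incremental("<think>");
    let r = p.parse_reasoning_streaming_incremental("step one");
    // Delta semantics: only this chunk's text comes back, never cumulative totals.
    assert_eq!(r.reasoning_text, "step one");
    p.parse_reasoning_streaming_incremental("</think>");
    let r = p.parse_reasoning_streaming_incremental("final answer");
    assert_eq!(r.normal_text, "final answer");
    println!("delta contract holds");
}
```

Pinning the contract to this delta shape keeps the aggregator's job trivial: it just appends each chunk's fields to its accumulators.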
27-31: Consider #[non_exhaustive] and a Default variant; add brief variant docs

If this enum is part of your public API and you plan to add new parser types, marking it `#[non_exhaustive]` avoids breaking consumers that use exhaustive matches. Also, deriving Default with `Basic` as the default variant helps with configuration defaults.

Apply:

```diff
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum ReasoningParserType {
-    DeepseekR1,
-    Basic,
-}
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
+#[non_exhaustive]
+pub enum ReasoningParserType {
+    /// Parser tuned for DeepSeek-R1 style reasoning traces.
+    DeepseekR1,
+    /// Marker-based parser that extracts content between `<think>` and `</think>`.
+    #[default]
+    Basic,
+}
```

Note: `#[non_exhaustive]` is a public API change; if the crate is already consumed externally, confirm semver impact before adopting.
33-45: Thread-safety bound and a minor nit: prefer .into() over .to_string()

- If this parser is ever moved across threads (e.g., spawned tasks consuming a stream), consider returning `Box<dyn ReasoningParser + Send>` to make the bound explicit. This will surface any non-Send fields early.
- Style nit: `"<think>".into()` avoids explicitly calling `to_string()` and lets the compiler infer the target type.

Apply:

```diff
-    pub fn get_reasoning_parser(self) -> Box<dyn ReasoningParser> {
+    pub fn get_reasoning_parser(self) -> Box<dyn ReasoningParser + Send> {
         match self {
             ReasoningParserType::DeepseekR1 => Box::new(DeepseekR1ReasoningParser::new()),
             ReasoningParserType::Basic => Box::new(BasicReasoningParser::new(
-                "<think>".to_string(),
-                "</think>".to_string(),
+                "<think>".into(),
+                "</think>".into(),
                 false,
                 true,
             )),
         }
     }
```

If the parser is strictly single-threaded by design, you can skip the `+ Send` bound; please confirm how the delta/aggregator pipeline holds and propagates the parser object (e.g., shared between tasks vs confined to a single thread). As a separate follow-up, consider renaming this to `into_parser` since it consumes `self`, matching Rust naming idioms.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)

- lib/llm/src/engines.rs (1 hunks)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (7 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
- lib/parsers/src/reasoning/base_parser.rs (22 hunks)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (1 hunks)
- lib/parsers/src/reasoning/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs
- lib/parsers/src/reasoning/base_parser.rs
- lib/llm/src/protocols/openai/chat_completions/delta.rs
- lib/parsers/src/reasoning/deepseek_r1_parser.rs
🔇 Additional comments (5)
lib/llm/src/engines.rs (2)
186-186: Making the response generator mutable is correct and required

Using a mutable binding here aligns with the updated DeltaGenerator API (create_choice takes &mut self). The borrow occurs after the await and does not live across yield points, so this is safe.

186-214: All DeltaGenerator bindings match their create_choice signatures

Verified that in chat_completions/delta.rs, `create_choice(&mut self, …)` is paired with `let mut deltas`, and in completions/delta.rs, `create_choice(&self, …)` is paired with an immutable `let deltas`. No updates to the completions path are needed.

lib/llm/tests/http-service.rs (1)

98-98: Good: mutable generator matches streaming API change

The mutable binding is necessary to call create_choice(&mut self, ...). This keeps the test aligned with the updated chat-completions delta generator.
lib/parsers/src/reasoning/mod.rs (2)

8-9: Re-exports look good

Keeping `DeepseekR1ReasoningParser` and exposing `BasicReasoningParser` at the module root preserves ergonomics while hiding internal structure. Nice.

4-5: No external imports of private submodules found; change is safe

A ripgrep scan for `reasoning::base_parser` and `reasoning::deepseek_r1_parser` across the repo yielded no matches. Both parser types still expose public `new()` constructors and implement the `ReasoningParser` trait via your top-level re-exports. This tightening of the API surface has no remaining callers using the old module paths.
@coderabbitai review again

@nachiketb-nvidia I'll review the changes in the PR again.

✅ Actions performed: Review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/llm/src/protocols/openai/chat_completions/delta.rs (3)
103-119: Consider making parser type configurable

The reasoning parser type is hardcoded to `Basic`. While the PR description mentions this is temporary for front-end integration determination, consider adding a comment indicating this is a temporary hardcode.

```diff
-        // Reasoning parser
-        let reasoning_parser_type = ReasoningParserType::Basic;
-
-        let reasoning_parser = reasoning_parser_type.get_reasoning_parser();
+        // Reasoning parser
+        // TODO: Make parser type configurable once front-end integration is determined
+        let reasoning_parser_type = ReasoningParserType::Basic;
+
+        let reasoning_parser = reasoning_parser_type.get_reasoning_parser();
```
195-202: Simplify the unwrap operation

The current implementation checks if `text` is `Some` but still calls `unwrap()` on the same value. This can be simplified with the `?` operator or pattern matching.

```diff
-    fn create_reasoning_content(&mut self, text: Option<String>) -> Option<ParserResult> {
-        text.as_ref()?;
-        let parser_result = self
-            .reasoning_parser
-            .parse_reasoning_streaming_incremental(text.as_deref().unwrap());
-
-        Some(parser_result)
-    }
+    fn create_reasoning_content(&mut self, text: Option<String>) -> Option<ParserResult> {
+        let text = text?;
+        let parser_result = self
+            .reasoning_parser
+            .parse_reasoning_streaming_incremental(&text);
+
+        Some(parser_result)
+    }
```
216-227: Potential content loss when parsing fails

By using `unwrap_or_default()` on the `Option<ParserResult>` returned by `create_reasoning_content`, you'll get a derived-default `ParserResult` (both `normal_text` and `reasoning_text` empty) whenever parsing fails or `text` is `None`. In those failure cases, the entire message content is dropped rather than preserved.

File: lib/llm/src/protocols/openai/chat_completions/delta.rs, line 222

Consider instead preserving the original `text` as `normal_text` on failure. For example:

```diff
-        let reasoning_parser_result = self.create_reasoning_content(text).unwrap_or_default();
+        let reasoning_parser_result = self
+            .create_reasoning_content(text.clone())
+            .unwrap_or_else(|| ParserResult {
+                normal_text: text.clone().unwrap_or_default(),
+                reasoning_text: String::new(),
+            });
```

This ensures that even when parsing fails, the user's message still appears in `normal_text`.

lib/parsers/src/reasoning/base_parser.rs (1)
85-174: Review streaming incremental parsing logic

The streaming parsing implementation handles complex buffering scenarios, but there are some areas that could be improved:

- Lines 104-119: The prefix matching logic for partial tokens is good, but could be more efficient.
- Lines 151-158: When `stream_reasoning` is true, the buffer is cleared immediately after streaming content. This might cause issues with partial tokens at the end of the content.
- Lines 135-149: The end token handling correctly manages state transitions.

The implementation handles the complex requirements of streaming parsing with partial tokens, but the logic is quite intricate. Consider adding more inline comments to explain the complex state transitions, especially around buffer management and partial token handling.
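For reference, the core of the partial-token problem can be sketched in isolation (a minimal, ASCII-only illustration; the `held_back_len` helper is ours, not the PR's code): when a chunk ends mid-marker, the tail must be buffered rather than emitted as normal text.

```rust
/// Length of the longest proper suffix of `buf` that is a prefix of `token`.
/// That many trailing bytes must be held back until the next chunk reveals
/// whether they complete the marker or were ordinary text.
/// (ASCII-only sketch; byte slicing would panic on multi-byte UTF-8 boundaries.)
fn held_back_len(buf: &str, token: &str) -> usize {
    for start in 0..buf.len() {
        let tail = &buf[start..];
        if tail.len() < token.len() && token.starts_with(tail) {
            return tail.len();
        }
    }
    0
}

fn main() {
    // "hello <thi" ends with "<thi", a prefix of "<think>": hold back 4 bytes.
    assert_eq!(held_back_len("hello <thi", "<think>"), 4);
    // No marker prefix at the end: the whole chunk can be emitted.
    assert_eq!(held_back_len("hello world", "<think>"), 0);
    println!("ok");
}
```

Naming and documenting this decision as its own small step is one way to make the buffer-management state transitions easier to follow.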
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)

- lib/llm/src/engines.rs (1 hunks)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (3 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (7 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
- lib/parsers/src/reasoning/base_parser.rs (22 hunks)
- lib/parsers/src/reasoning/deepseek_r1_parser.rs (1 hunks)
- lib/parsers/src/reasoning/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- lib/llm/tests/http-service.rs
- lib/llm/src/engines.rs
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs
- lib/parsers/src/reasoning/deepseek_r1_parser.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (17)
lib/llm/src/protocols/openai/chat_completions/delta.rs (4)
16-16: LGTM! Good use of structured imports.

The import statement properly brings in all necessary components from the reasoning module with clear visibility into what's being used.

70-74: LGTM! Well-documented field addition.

The reasoning parser field is properly documented and follows the existing struct conventions. The field type `ReasoningParserWrapper` provides the necessary abstraction for thread-safe access to the underlying parser.

224-227: LGTM! Clean extraction of parser results.

The tuple destructuring cleanly separates normal text and reasoning content from the parser result, making the code readable and maintainable.

228-239: LGTM! Proper delta construction with reasoning content.

The delta construction properly integrates the reasoning content while preserving existing functionality. The conditional role assignment logic remains intact.
lib/parsers/src/reasoning/mod.rs (8)
4-5: LGTM! Proper module visibility.

Making the parser modules private while exposing their public types through re-exports is a good encapsulation practice.

7-7: LGTM! Necessary imports for thread safety.

The `Arc` and `Mutex` imports are required for the thread-safe `ReasoningParserWrapper` implementation.

10-11: LGTM! Clean re-exports with updated naming.

The re-exports properly expose `BasicReasoningParser` (renamed from `BaseReasoningParser`) while maintaining the existing `DeepseekR1ReasoningParser` export.

13-38: LGTM! Well-structured ParserResult with helpful utility methods.

The `ParserResult` struct is well-designed with:

- Clear field documentation
- Appropriate derives including `Default`
- Utility methods that return `Option<String>` for empty-check convenience
- Consistent naming conventions
40-50: LGTM! Clean trait definition with clear method contracts.

The `ReasoningParser` trait is well-designed with:

- Proper trait bounds (`Send + std::fmt::Debug`)
- Clear method documentation explaining streaming vs non-streaming behavior
- Consistent return types using `ParserResult`
- Good separation of concerns between standalone and incremental parsing

52-57: LGTM! Well-designed enum with future extensibility.

The `ReasoningParserType` enum is properly designed with:

- `#[non_exhaustive]` for future extensibility
- Appropriate derives
- Clear variant naming

59-75: LGTM! Proper wrapper implementation for thread safety.

The `ReasoningParserWrapper` provides clean delegation to the underlying parser with proper mutex locking. The trait implementation correctly forwards calls to the inner parser.

77-93: LGTM! Clean factory pattern implementation.

The `get_reasoning_parser` method provides a clean factory pattern for creating parser instances:

- Proper wrapping in `Arc<Mutex<>>` for thread safety
- Correct constructor parameters for `BasicReasoningParser`
- Clean match pattern for extensibility
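The wrapper-plus-factory shape discussed above can be sketched generically (a hypothetical, self-contained toy: the `Parser` trait, `Upcase` stand-in, and `ParserWrapper` names are ours, not the crate's types): an `Arc<Mutex<Box<dyn Trait>>>` that itself implements the trait by locking and delegating, so clones share one stateful parser.

```rust
use std::sync::{Arc, Mutex};

trait Parser {
    fn feed(&mut self, chunk: &str) -> String;
}

// Toy parser standing in for a stateful reasoning parser.
struct Upcase;

impl Parser for Upcase {
    fn feed(&mut self, chunk: &str) -> String {
        chunk.to_uppercase()
    }
}

// Cloneable handle; all clones delegate to the same inner parser.
#[derive(Clone)]
struct ParserWrapper {
    inner: Arc<Mutex<Box<dyn Parser + Send>>>,
}

impl Parser for ParserWrapper {
    fn feed(&mut self, chunk: &str) -> String {
        // Lock, delegate, unlock on drop; a poisoned lock would panic here.
        self.inner.lock().unwrap().feed(chunk)
    }
}

fn main() {
    let mut w = ParserWrapper {
        inner: Arc::new(Mutex::new(Box::new(Upcase))),
    };
    let mut w2 = w.clone(); // shares the same underlying parser
    assert_eq!(w.feed("think"), "THINK");
    assert_eq!(w2.feed("delta"), "DELTA");
    println!("ok");
}
```

One trade-off of this design worth noting in review: every streamed chunk takes the mutex, which is fine when the parser is effectively confined to one task but adds contention if deltas are ever processed concurrently.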
lib/parsers/src/reasoning/base_parser.rs (5)

5-5: LGTM! Proper import from crate root.

The import correctly references `ParserResult` and `ReasoningParser` from the crate root, aligning with the module restructuring.

7-8: LGTM! Proper rename and derive additions.

The rename from `BaseReasoningParser` to `BasicReasoningParser` is consistent with the PR objectives, and the added derives (`Default`, `Debug`, `Clone`) are appropriate for the use cases.

17-33: LGTM! Constructor properly handles new parameters.

The constructor correctly maps `force_reasoning` to `_in_reasoning` and initializes the new streaming-related fields (`stream_reasoning`, `_buffer`, `stripped_think_start`).

35-83: LGTM! Non-streaming parsing logic is sound.

The `detect_and_parse_reasoning` method handles the various cases correctly:

- Returns normal text when no reasoning is detected
- Handles truncated reasoning (missing end token)
- Properly splits reasoning and normal text at the first end token
- Good logging for debugging
177-435: Comprehensive test coverage validates the implementation.

The test suite covers a wide range of scenarios:

- Basic reasoning detection and parsing
- Streaming incremental parsing with partial tokens
- Multiple reasoning blocks
- Edge cases (empty blocks, whitespace-only, malformed input)
- State persistence across streaming calls
- Different parser configurations

All tests have been updated to use `BasicReasoningParser`, maintaining consistency with the rename.
… from rabbit, remove mut from non streaming parser
Signed-off-by: Hannah Zhang <[email protected]>
Overview:
Enable basic reasoning parsing of <think>-delimited reasoning in LLM outputs.
Details:
Where should the reviewer start?
Changes in aggregator and delta.rs
Summary by CodeRabbit

- New Features
- Refactor
- Tests