
Conversation

@messiaen messiaen (Contributor) commented Aug 13, 2025

Overview:

  • Passes output_options through to engine workers (e.g. return_logprobs)
  • Enables logprobs and top_logprobs

Details:

Where should the reviewer start?

  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • lib/llm/src/protocols/openai/completions/delta.rs

Related Issues:

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Per-token log probabilities now available in streaming responses (including prefill path).
    • OpenAI-style logprobs supported for Completions and Chat when requested.
    • New request/output options to control logprob behavior (enable/disable, prompt logprobs, etc.).
    • Streaming deltas now include token IDs together with logprobs for incremental updates.
    • Placeholder support for top-logprobs added to responses (field present; not yet populated).

coderabbitai bot (Contributor) commented Aug 13, 2025

Caution

Review failed

The head commit changed during the review from e25a256 to 49f3d15.

Walkthrough

Adds per-token logprob support end-to-end: sglang streaming now emits token_ids and log_probs; new TopLogprob/top_logprobs types and optional fields are threaded through engine/backend outputs; OutputOptions is introduced and propagated from OpenAI requests into preprocessors and delta generators, which now emit OpenAI-style logprobs.
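
As a rough illustration of the request surface this walkthrough describes, here is a minimal client sketch; it is not taken from the PR. The endpoint URL, port, and model name are placeholders, and it assumes the reqwest crate (blocking and json features) plus serde_json.

// Hypothetical client: ask an OpenAI-compatible chat completions endpoint
// for per-token logprobs. URL, port, and model name are placeholders.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "example-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": true,
        // The OpenAI-style knobs this PR threads through as OutputOptions:
        "logprobs": true,
        "top_logprobs": 2
    });

    let resp = reqwest::blocking::Client::new()
        .post("http://localhost:8000/v1/chat/completions")
        .json(&body)
        .send()?;

    // With this PR, each streamed delta should carry per-token logprobs.
    println!("status: {}", resp.status());
    Ok(())
}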

Changes

Cohort / File(s) and Summary

  • SGLang streaming (components/backends/sglang/src/dynamo/sglang/worker/main.py):
    Enable return_logprob on prefill/stream paths; remove the batch streaming path; switch to per-token processing using offsets and itemgetter; emit token_ids and log_probs (or a finish payload); add debug logging.
  • Common LLM types (lib/llm/src/protocols/common/llm_backend.rs):
    Add a TopLogprob struct and TopLogprobs alias; add top_logprobs: Option<TopLogprobs> to BackendOutput and LLMEngineOutput; initialize it as None in constructors (a type sketch follows this list).
  • Engine & backend plumbing (lib/llm/src/engines.rs, lib/llm/src/backend.rs, lib/engines/llamacpp/src/lib.rs, lib/engines/mistralrs/src/lib.rs, lib/llm/src/mocker/engine.rs):
    Thread top_logprobs into engine/backend outputs (initialized to None where produced); add runtime logging; update create_choice call sites to pass a fourth argument (None) where required.
  • OpenAI protocols & deltas (lib/llm/src/protocols/openai.rs, .../completions.rs, .../completions/delta.rs, .../chat_completions.rs, .../chat_completions/delta.rs):
    Add OutputOptionsProvider and an adapter from OpenAI request types; extract and propagate output_options (logprobs settings); add DeltaGenerator::create_logprobs; expand create_choice signatures to accept logprobs and wire per-token logprobs into OpenAI-style responses.
  • Preprocessing & request surfaces (lib/llm/src/preprocessor.rs, lib/llm/src/protocols/common/preprocessor.rs, lib/llm/src/protocols/common.rs, lib/llm/src/migration.rs):
    Introduce the OutputOptions type and an output_options field on PreprocessedRequest/CompletionRequest; add provider trait bounds and builder plumbing; update tests/mocks to include output_options and top_logprobs.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAILayer
    participant Preprocessor
    participant Engine
    participant Backend
    participant DeltaGen
    participant Stream

    Client->>OpenAILayer: Request (with logprobs/output_options)
    OpenAILayer->>Preprocessor: extract_output_options(...)
    Preprocessor-->>OpenAILayer: PreprocessedRequest (includes output_options)
    OpenAILayer->>Engine: run request
    Engine->>Backend: generate tokens
    Backend-->>Engine: LLMEngineOutput { tokens, log_probs?, top_logprobs? }
    Engine-->>DeltaGen: BackendOutput { tokens, log_probs?, top_logprobs? }
    DeltaGen-->>Stream: create_choice(..., logprobs)
    Stream-->>Client: streaming deltas containing token logprobs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hop through tokens, numbers in tow,
Counting logprobs where soft breezes blow.
Per-token crumbs of probabilistic cheer,
I stash them in outputs and whisper them near.
Thump-thump — new options, a carrot for code. 🥕🐇



@grahamking (Contributor) commented:

@coderabbitai full review

coderabbitai bot (Contributor) commented Aug 13, 2025

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🔭 Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

25-32: Enable logprobs when either logprobs=true or top_logprobs>0

Currently enable_logprobs is gated only by self.inner.logprobs. If a client requests top_logprobs without setting logprobs=true (OpenAI supports this), no logprobs will be emitted. Include top_logprobs in the condition.

Apply:

-        let options = DeltaGeneratorOptions {
-            enable_usage: true,
-            enable_logprobs: self.inner.logprobs.unwrap_or(false),
-        };
+        let options = DeltaGeneratorOptions {
+            enable_usage: true,
+            // Enable if either logprobs=true or top_logprobs>0
+            enable_logprobs: self.inner.logprobs.unwrap_or(false)
+                || self.inner.top_logprobs.unwrap_or(0) > 0,
+        };
🧹 Nitpick comments (8)
components/backends/sglang/src/dynamo/sglang/worker/main.py (1)

278-278: Use lazy logging to avoid f-string work when DEBUG is disabled

Minor perf/readability improvement; prevents formatting when debug logs are off.

Apply:

-            logging.debug(f"Generated output: {out}")
+            logging.debug("Generated output: %s", out)
lib/llm/src/backend.rs (1)

221-221: Remove debugging log statement.

This debug log statement logs the entire data structure at the info level, which could be noisy in production and potentially expose sensitive information.

-                log::info!("data: {:?}", data);
lib/engines/llamacpp/src/lib.rs (1)

263-274: top_logprobs field wired through (initialized to None) — OK; consider conditional population when requested

This aligns the engine output shape with the new protocol. As a follow-up, consider conditionally computing and populating top_logprobs (and/or log_probs) when request.output_options.logprobs is Some(k). Llama’s logits are available after decode; you could derive top-k probabilities there to honor the request end-to-end.

If helpful, I can sketch the minimal wiring and a helper that extracts top-k from logits for k in 1..=N.
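
For illustration, one way such a helper could look, as a free-standing sketch that is not tied to the llama.cpp bindings; logits access and token decoding are left out.

// Hypothetical helper: given raw logits for one decode step, return the
// top-k (token_id, logprob) pairs via a numerically stable log-softmax.
// This is a standalone sketch, not the llama.cpp binding API.
fn top_k_logprobs(logits: &[f32], k: usize) -> Vec<(u32, f32)> {
    // log-softmax via the log-sum-exp trick
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&x| (x - max).exp()).sum::<f32>().ln() + max;

    let mut scored: Vec<(u32, f32)> = logits
        .iter()
        .enumerate()
        .map(|(id, &x)| (id as u32, x - log_sum_exp))
        .collect();

    // Sort by logprob descending and keep the k best candidates.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}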

lib/llm/src/protocols/openai.rs (2)

82-91: Clarify semantics of the new OpenAIOutputOptionsProvider with concise docs

Add short doc comments to prevent ambiguity (e.g., that get_logprobs represents the number of top logprobs to emit per token, and that 0 means “disabled”/None). This helps maintainers implement consistent behavior across endpoints.

-trait OpenAIOutputOptionsProvider {
-    fn get_logprobs(&self) -> Option<u32>;
-
-    fn get_prompt_logprobs(&self) -> Option<u32>;
-
-    fn get_skip_special_tokens(&self) -> Option<bool>;
-
-    fn get_formatted_prompt(&self) -> Option<bool>;
-}
+trait OpenAIOutputOptionsProvider {
+    /// Number of top logprobs per token to include (e.g., 1..N). None disables logprobs.
+    /// Implementations should avoid returning Some(0); use None instead.
+    fn get_logprobs(&self) -> Option<u32>;
+
+    /// Number of top logprobs to include for prompt tokens (if supported). None disables.
+    fn get_prompt_logprobs(&self) -> Option<u32>;
+
+    /// Whether to skip special tokens in decoding/output (if supported).
+    fn get_skip_special_tokens(&self) -> Option<bool>;
+
+    /// Whether to format the prompt (e.g., apply templates) before generation (if supported).
+    fn get_formatted_prompt(&self) -> Option<bool>;
+}

181-195: Normalize zero values to None in extract_output_options for defensive behavior

Zero values for “counts” are effectively disabled. Normalizing here prevents passing Some(0) downstream and simplifies consumers.

 impl<T: OpenAIOutputOptionsProvider> OutputOptionsProvider for T {
     fn extract_output_options(&self) -> Result<common::OutputOptions> {
-        let logprobs = self.get_logprobs();
-        let prompt_logprobs = self.get_prompt_logprobs();
+        let logprobs = match self.get_logprobs() {
+            Some(0) => None,
+            other => other,
+        };
+        let prompt_logprobs = match self.get_prompt_logprobs() {
+            Some(0) => None,
+            other => other,
+        };
         let skip_special_tokens = self.get_skip_special_tokens();
         let formatted_prompt = self.get_formatted_prompt();
 
         Ok(common::OutputOptions {
             logprobs,
             prompt_logprobs,
             skip_special_tokens,
             formatted_prompt,
         })
     }
 }
lib/llm/src/mocker/engine.rs (1)

641-647: Optionally exercise logprobs in integration test to validate end-to-end plumbing

Setting output_options.logprobs for at least one request would validate the streaming/logprob path (even if engines currently don’t populate values, this guards future regressions in delta formatting).

-            output_options: OutputOptions::default(),
+            output_options: OutputOptions { logprobs: Some(1), ..Default::default() },

If you prefer to avoid hard-coding across all requests, apply this only to one test case to keep coverage focused.

lib/llm/src/protocols/openai/chat_completions.rs (1)

236-264: Defensive handling for top_logprobs=0 in get_logprobs

OpenAI clients shouldn’t send 0, and you validate top_logprobs elsewhere, but adding a local guard makes this robust to edge cases and keeps semantics consistent (0 → disabled).

     fn get_logprobs(&self) -> Option<u32> {
         match self.inner.logprobs {
             Some(logprobs) => {
                 if logprobs {
                     match self.inner.top_logprobs {
-                        Some(top_logprobs) => Some(top_logprobs as u32),
+                        Some(top_logprobs) if top_logprobs > 0 => Some(top_logprobs as u32),
+                        Some(_) => None,
                         None => Some(1_u32),
                     }
                 } else {
                     None
                 }
             }
             None => None,
         }
     }

Also: thanks for defaulting to 1 when logprobs=true and top_logprobs is unspecified—this matches common expectations.

Consider a brief doc comment on this behavior for future maintainers.

lib/llm/src/protocols/openai/completions.rs (1)

303-335: Non-stream conversion still drops logprobs

The TryFrom to Choice still sets logprobs=None (TODO). If you intend to support non-stream completions with logprobs, follow up by aggregating per-delta logprobs into the final Choice as well.

I can draft an aggregator that accumulates token strings, logprobs, and optional top_logprobs across deltas into the final Choice.Logprobs payload.
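
As a rough illustration of that aggregator idea, a sketch against stand-in structs; the field names are illustrative and are not the async-openai types.

// Illustrative aggregator: fold per-delta token/logprob data into a single
// payload for the non-streaming Choice. Both structs are stand-ins, not the
// actual protocol types used in this crate.
#[derive(Debug, Default)]
struct AggregatedLogprobs {
    tokens: Vec<String>,
    token_logprobs: Vec<f32>,
    top_logprobs: Vec<Vec<(String, f32)>>, // per token: (candidate, logprob)
}

struct DeltaLogprobs {
    tokens: Vec<String>,
    log_probs: Vec<f32>,
    top_logprobs: Option<Vec<Vec<(String, f32)>>>,
}

fn aggregate(deltas: impl IntoIterator<Item = DeltaLogprobs>) -> AggregatedLogprobs {
    let mut out = AggregatedLogprobs::default();
    for d in deltas {
        out.tokens.extend(d.tokens);
        out.token_logprobs.extend(d.log_probs);
        match d.top_logprobs {
            Some(tops) => out.top_logprobs.extend(tops),
            // Keep positions aligned when a delta carries no top_logprobs.
            None => {
                let missing = out.tokens.len().saturating_sub(out.top_logprobs.len());
                out.top_logprobs.extend(std::iter::repeat(Vec::new()).take(missing));
            }
        }
    }
    out
}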

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72ec5f5 and 406ef5e.

📒 Files selected for processing (16)
  • components/backends/sglang/src/dynamo/sglang/worker/main.py (3 hunks)
  • lib/engines/llamacpp/src/lib.rs (1 hunks)
  • lib/engines/mistralrs/src/lib.rs (1 hunks)
  • lib/llm/src/backend.rs (1 hunks)
  • lib/llm/src/engines.rs (2 hunks)
  • lib/llm/src/migration.rs (3 hunks)
  • lib/llm/src/mocker/engine.rs (3 hunks)
  • lib/llm/src/preprocessor.rs (3 hunks)
  • lib/llm/src/protocols/common.rs (2 hunks)
  • lib/llm/src/protocols/common/llm_backend.rs (7 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (2 hunks)
  • lib/llm/src/protocols/openai.rs (3 hunks)
  • lib/llm/src/protocols/openai/chat_completions.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
  • lib/llm/src/protocols/openai/completions.rs (4 hunks)
  • lib/llm/src/protocols/openai/completions/delta.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.

Applied to files:

  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/openai/completions.rs
🧬 Code Graph Analysis (10)
lib/llm/src/protocols/common.rs (1)
lib/llm/src/protocols/openai.rs (1)
  • extract_output_options (182-194)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • create_logprobs (85-104)
lib/llm/src/protocols/openai/chat_completions.rs (2)
lib/llm/src/protocols/openai/completions.rs (5)
  • validate (362-383)
  • get_logprobs (339-341)
  • get_prompt_logprobs (343-348)
  • get_skip_special_tokens (350-352)
  • get_formatted_prompt (354-356)
lib/llm/src/protocols/openai.rs (4)
  • get_logprobs (83-83)
  • get_prompt_logprobs (85-85)
  • get_skip_special_tokens (87-87)
  • get_formatted_prompt (89-89)
lib/llm/src/protocols/openai.rs (3)
lib/llm/src/protocols/openai/chat_completions.rs (4)
  • get_logprobs (237-251)
  • get_prompt_logprobs (253-255)
  • get_skip_special_tokens (257-259)
  • get_formatted_prompt (261-263)
lib/llm/src/protocols/openai/completions.rs (4)
  • get_logprobs (339-341)
  • get_prompt_logprobs (343-348)
  • get_skip_special_tokens (350-352)
  • get_formatted_prompt (354-356)
lib/llm/src/protocols/common.rs (1)
  • extract_output_options (49-49)
lib/llm/src/migration.rs (1)
components/backends/sglang/src/dynamo/sglang/common/protocol.py (2)
  • SamplingOptions (33-45)
  • StopConditions (25-30)
lib/engines/mistralrs/src/lib.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • create_choice (160-207)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • create_choice (106-140)
lib/llm/src/engines.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • create_choice (160-207)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • create_choice (106-140)
lib/llm/src/preprocessor.rs (7)
lib/llm/src/protocols/common/preprocessor.rs (2)
  • builder (62-64)
  • builder (99-101)
lib/llm/src/protocols/openai/completions.rs (1)
  • builder (223-225)
lib/llm/src/protocols/common.rs (1)
  • builder (199-201)
lib/runtime/src/config.rs (1)
  • builder (159-161)
lib/llm/src/model_card.rs (1)
  • builder (143-145)
lib/llm/src/tokenizers.rs (1)
  • builder (436-438)
lib/runtime/src/transports/nats.rs (2)
  • builder (57-59)
  • builder (242-244)
lib/llm/src/protocols/openai/completions/delta.rs (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
  • create_logprobs (115-147)
  • create_choice (160-207)
lib/llm/src/protocols/openai/completions.rs (1)
lib/llm/src/protocols/openai.rs (6)
  • nvext (55-55)
  • nvext (65-65)
  • get_logprobs (83-83)
  • get_prompt_logprobs (85-85)
  • get_skip_special_tokens (87-87)
  • get_formatted_prompt (89-89)
🔇 Additional comments (28)
components/backends/sglang/src/dynamo/sglang/worker/main.py (1)

10-11: Imports look good; itemgetter use is appropriate

No issues here. The narrowed typing imports are fine and itemgetter is used correctly below.

lib/llm/src/protocols/common/preprocessor.rs (2)

7-7: LGTM!

The import addition aligns with the introduction of OutputOptions to the module scope.


32-34: Well-documented and consistent output options field.

The field documentation clearly explains its purpose and follows the same style as existing fields. The positioning after sampling_options maintains logical grouping of request configuration options.

lib/engines/mistralrs/src/lib.rs (1)

593-593: No changes needed for create_choice parameters
The create_choice signature in the llm crate is

pub fn create_choice(
    &self,
    index: u32,
    text: Option<String>,
    finish_reason: Option<…>,
    logprobs: Option<…>,
) -> …

Across all callers—including lib/engines/mistralrs/src/lib.rs:593—the fourth argument correctly corresponds to the logprobs parameter, so None is valid here.

lib/llm/src/protocols/common/llm_backend.rs (3)

26-33: Well-structured TopLogprob implementation.

The TopLogprob struct has appropriate derives and clear field names. The type alias TopLogprobs effectively represents the two-dimensional structure (tokens × top_logprobs).


53-53: Consistent top_logprobs field addition.

The optional top_logprobs field is consistently added to both BackendOutput and LLMEngineOutput structs, maintaining the same positioning and type.

Also applies to: 91-91


109-109: Proper initialization in constructor methods.

All constructor methods (cancelled, stop, length, error) correctly initialize the new top_logprobs field to None, maintaining consistency with the existing pattern.

Also applies to: 123-123, 135-135, 148-148

lib/llm/src/backend.rs (1)

228-228: Correct propagation of top_logprobs field.

The BackendOutput construction correctly propagates the top_logprobs field from the LLMEngineOutput data, maintaining the optional nature of the field.

lib/llm/src/protocols/common.rs (2)

48-50: Clean trait definition for OutputOptions extraction.

The OutputOptionsProvider trait follows the established pattern of other provider traits in the module, with a clear method signature that returns a Result.


186-187: Proper builder pattern integration.

The output_options field is correctly added to CompletionRequest with the #[builder(default)] attribute, allowing it to be optional in the builder pattern while maintaining backward compatibility.
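
For readers unfamiliar with the pattern, here is a generic illustration of how #[builder(default)] keeps existing builder call sites compiling, using derive_builder conventions; the struct below is a toy, not the real CompletionRequest.

// Toy example of #[builder(default)] (derive_builder), showing that callers
// which never set output_options still build successfully.
use derive_builder::Builder;

#[derive(Debug, Default, Clone)]
struct OutputOptions {
    logprobs: Option<u32>,
}

#[derive(Debug, Builder)]
struct CompletionRequest {
    prompt: String,
    // Newly added field: defaults when the caller does not set it.
    #[builder(default)]
    output_options: OutputOptions,
}

fn main() {
    // Existing call sites need no changes; output_options falls back to Default.
    let req = CompletionRequestBuilder::default()
        .prompt("hello".to_string())
        .build()
        .unwrap();
    println!("{:?}", req.output_options);
}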

lib/llm/src/protocols/openai.rs (1)

20-22: Added OutputOptionsProvider to imports — LGTM

This is the right place to wire the new OutputOptionsProvider alongside sampling and stop conditions.

lib/llm/src/mocker/engine.rs (2)

408-411: Mock output now includes top_logprobs: None — LGTM

The mock engine’s output shape matches the protocol evolution and remains backward compatible.


530-530: Tests import OutputOptions — LGTM

Bringing OutputOptions into the test module is correct given the new PreprocessedRequest field.

lib/llm/src/migration.rs (3)

170-170: Tests import OutputOptions — LGTM

Matches the new field usage in PreprocessedRequest.


186-191: PreprocessedRequest now carries output_options — LGTM

Defaulting here is appropriate for these migration tests.


195-205: Mock output includes top_logprobs: None — LGTM

The test fixtures reflect the protocol changes without altering existing test intent.

lib/llm/src/protocols/openai/chat_completions.rs (1)

26-28: Bridging trait imported here — LGTM

Correctly pulls in OpenAIOutputOptionsProvider to expose OutputOptions via the blanket impl.

lib/llm/src/preprocessor.rs (3)

36-44: Importing OutputOptionsProvider is correct and necessary

Good call threading output options through preprocessing. This aligns with the new OutputOptions plumbing.


251-255: Threading output_options into PreprocessedRequest

Placing builder.output_options(request.extract_output_options()?) alongside sampling and stop conditions keeps concerns centralized and consistent for both token and text inputs. Looks good.


145-151: Bound Verified: OutputOptionsProvider is implemented for all chat/completion request types

I confirmed that both NvCreateCompletionRequest and NvCreateChatCompletionRequest implement OpenAIOutputOptionsProvider, and there’s a blanket impl in lib/llm/src/protocols/openai.rs (lines 181–184) that makes any T: OpenAIOutputOptionsProvider also implement OutputOptionsProvider. Adding the OutputOptionsProvider bound on preprocess_request will not break compilation.

lib/llm/src/engines.rs (2)

98-109: LLMEngineOutput now includes top_logprobs: None

Adding the new field (initialized to None) keeps Echo engines compiling against the expanded API while backends start populating it. No issues.


244-251: Updated create_choice signature usage (added logprobs)

Call sites updated to pass the new logprobs parameter (None). This matches the new signature and keeps Echo behavior unchanged.

lib/llm/src/protocols/openai/completions/delta.rs (3)

23-27: Enablement condition for completions looks correct

Using inner.logprobs.unwrap_or(0) > 0 to drive enable_logprobs matches the legacy Completions API semantics.
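
For reference, the two OpenAI request surfaces express this differently, which is why the completions and chat delta generators gate logprobs on different fields; a quick serde_json sketch of both request bodies (the model name is a placeholder).

// Legacy Completions: logprobs is an integer count of top alternatives.
// Chat Completions: logprobs is a boolean switch plus a top_logprobs count.
use serde_json::json;

fn main() {
    let completions_body = json!({
        "model": "example-model",
        "prompt": "Hello",
        "logprobs": 2             // integer: number of top logprobs per token
    });
    let chat_body = json!({
        "model": "example-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "logprobs": true,         // boolean switch
        "top_logprobs": 2         // integer: candidates per token
    });
    println!("{completions_body}\n{chat_body}");
}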


110-131: Choice construction now includes logprobs

Wiring logprobs into create_choice is correct and matches async-openai Choice schema.


161-168: Good: passing through delta.top_logprobs for future use

This matches the PR’s direction and keeps parity with the chat path once it’s updated to forward top_logprobs too.

lib/llm/src/protocols/openai/completions.rs (3)

24-29: Importing OutputOptionsProvider and related providers

This is needed for extract_output_options and is consistent with the new plumbing.


283-287: Extracting output_options during TryFrom is correct

Forwarding output options into common::CompletionRequest ensures backends can honor logprob/top_logprob requests end-to-end.


296-299: CompletionRequest now carries output_options

This matches changes across the common layer. Looks good.

@grahamking (Contributor) commented:

@messiaen Can you take a look at the Code Rabbit review, and "Resolve" the ones that don't apply? I usually find them really insightful.

@messiaen (Contributor, Author) commented:

@coderabbitai full review

coderabbitai bot (Contributor) commented Aug 14, 2025

✅ Actions performed

Full review triggered.

Signed-off-by: Greg Clark <[email protected]>
@messiaen messiaen changed the title from "feat: Token logprob handling" to "feat: logprob handling" on Aug 14, 2025
@messiaen messiaen merged commit f476fd7 into main Aug 14, 2025
14 of 15 checks passed
@messiaen messiaen deleted the grclark/rust-logprobs branch August 14, 2025 15:38
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>