feat: logprob handling #2426
Conversation
48ce1da to 406ef5e (Compare)
Walkthrough
Adds per-token logprob support end-to-end: sglang streaming now emits token_ids and log_probs; new TopLogprob/top_logprobs types and optional fields are threaded through engine/backend outputs; OutputOptions is introduced and propagated from OpenAI requests into preprocessors and delta generators, which now emit OpenAI-style logprobs.
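For orientation, the new `OutputOptions` carried on preprocessed requests has roughly the shape sketched below. The field names are taken from the `extract_output_options` snippet later in this review; the derives and doc comments are assumptions, not the crate's actual code.

```rust
// Sketch only: mirrors the fields built in extract_output_options();
// the real definition lives in lib/llm/src/protocols/common.rs and may differ in derives/attrs.
#[derive(Debug, Clone, Default)]
pub struct OutputOptions {
    /// Number of top logprobs to emit per generated token; None disables logprobs.
    pub logprobs: Option<u32>,
    /// Number of top logprobs to emit for prompt tokens, if the backend supports it.
    pub prompt_logprobs: Option<u32>,
    /// Whether special tokens are skipped when detokenizing output.
    pub skip_special_tokens: Option<bool>,
    /// Whether the formatted (templated) prompt is returned alongside the output.
    pub formatted_prompt: Option<bool>,
}
```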
Sequence Diagram(s)
```mermaid
sequenceDiagram
participant Client
participant OpenAILayer
participant Preprocessor
participant Engine
participant Backend
participant DeltaGen
participant Stream
Client->>OpenAILayer: Request (with logprobs/output_options)
OpenAILayer->>Preprocessor: extract_output_options(...)
Preprocessor-->>OpenAILayer: PreprocessedRequest (includes output_options)
OpenAILayer->>Engine: run request
Engine->>Backend: generate tokens
Backend-->>Engine: LLMEngineOutput { tokens, log_probs?, top_logprobs? }
Engine-->>DeltaGen: BackendOutput { tokens, log_probs?, top_logprobs? }
DeltaGen-->>Stream: create_choice(..., logprobs)
Stream-->>Client: streaming deltas containing token logprobs
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 7
🔭 Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
Lines 25-32: Enable logprobs when either logprobs=true or top_logprobs>0
Currently, enable_logprobs is gated only by self.inner.logprobs. If a client requests top_logprobs without setting logprobs=true (OpenAI supports this), no logprobs will be emitted. Include top_logprobs in the condition.
Apply:
```diff
-        let options = DeltaGeneratorOptions {
-            enable_usage: true,
-            enable_logprobs: self.inner.logprobs.unwrap_or(false),
-        };
+        let options = DeltaGeneratorOptions {
+            enable_usage: true,
+            // Enable if either logprobs=true or top_logprobs>0
+            enable_logprobs: self.inner.logprobs.unwrap_or(false)
+                || self.inner.top_logprobs.unwrap_or(0) > 0,
+        };
```
🧹 Nitpick comments (8)
components/backends/sglang/src/dynamo/sglang/worker/main.py (1)
Line 278: Use lazy logging to avoid f-string work when DEBUG is disabled
Minor perf/readability improvement; prevents formatting when debug logs are off.
Apply:
```diff
-        logging.debug(f"Generated output: {out}")
+        logging.debug("Generated output: %s", out)
```
lib/llm/src/backend.rs (1)
Line 221: Remove debugging log statement.
This debug log statement logs the entire data structure at the info level, which could be noisy in production and potentially expose sensitive information.
```diff
-    log::info!("data: {:?}", data);
```
lib/engines/llamacpp/src/lib.rs (1)
Lines 263-274: top_logprobs field wired through (initialized to None) - OK; consider conditional population when requested
This aligns the engine output shape with the new protocol. As a follow-up, consider conditionally computing and populating top_logprobs (and/or log_probs) when request.output_options.logprobs is Some(k). Llama's logits are available after decode; you could derive top-k probabilities there to honor the request end-to-end.
If helpful, I can sketch the minimal wiring and a helper that extracts top-k from logits for k in 1..=N.
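For reference, a helper along these lines could do the extraction. This is a minimal sketch, not the llamacpp engine's actual API: the plain `&[f32]` logits input, the function name, and the `(token_id, logprob)` output shape are all assumptions.

```rust
/// Sketch: convert a raw logits vector into the top-k (token_id, logprob) pairs.
/// Applies log-softmax so the returned values are log-probabilities, then ranks them.
fn top_k_logprobs(logits: &[f32], k: usize) -> Vec<(u32, f32)> {
    // Numerically stable log-sum-exp.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = logits.iter().map(|&x| (x - max).exp()).sum::<f32>().ln() + max;

    let mut pairs: Vec<(u32, f32)> = logits
        .iter()
        .enumerate()
        .map(|(id, &logit)| (id as u32, logit - log_sum_exp))
        .collect();

    // Full sort is fine for a sketch; a real engine would use partial selection.
    pairs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    pairs.truncate(k);
    pairs
}
```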
lib/llm/src/protocols/openai.rs (2)
Lines 82-91: Clarify semantics of the new OpenAIOutputOptionsProvider with concise docs
Add short doc comments to prevent ambiguity (e.g., that get_logprobs represents the number of top logprobs to emit per token, and that 0 means "disabled"/None). This helps maintainers implement consistent behavior across endpoints.
```diff
-trait OpenAIOutputOptionsProvider {
-    fn get_logprobs(&self) -> Option<u32>;
-
-    fn get_prompt_logprobs(&self) -> Option<u32>;
-
-    fn get_skip_special_tokens(&self) -> Option<bool>;
-
-    fn get_formatted_prompt(&self) -> Option<bool>;
-}
+trait OpenAIOutputOptionsProvider {
+    /// Number of top logprobs per token to include (e.g., 1..N). None disables logprobs.
+    /// Implementations should avoid returning Some(0); use None instead.
+    fn get_logprobs(&self) -> Option<u32>;
+
+    /// Number of top logprobs to include for prompt tokens (if supported). None disables.
+    fn get_prompt_logprobs(&self) -> Option<u32>;
+
+    /// Whether to skip special tokens in decoding/output (if supported).
+    fn get_skip_special_tokens(&self) -> Option<bool>;
+
+    /// Whether to format the prompt (e.g., apply templates) before generation (if supported).
+    fn get_formatted_prompt(&self) -> Option<bool>;
+}
```
Lines 181-195: Normalize zero values to None in extract_output_options for defensive behavior
Zero values for "counts" are effectively disabled. Normalizing here prevents passing Some(0) downstream and simplifies consumers.
```diff
 impl<T: OpenAIOutputOptionsProvider> OutputOptionsProvider for T {
     fn extract_output_options(&self) -> Result<common::OutputOptions> {
-        let logprobs = self.get_logprobs();
-        let prompt_logprobs = self.get_prompt_logprobs();
+        let logprobs = match self.get_logprobs() {
+            Some(0) => None,
+            other => other,
+        };
+        let prompt_logprobs = match self.get_prompt_logprobs() {
+            Some(0) => None,
+            other => other,
+        };
         let skip_special_tokens = self.get_skip_special_tokens();
         let formatted_prompt = self.get_formatted_prompt();

         Ok(common::OutputOptions {
             logprobs,
             prompt_logprobs,
             skip_special_tokens,
             formatted_prompt,
         })
     }
 }
```
lib/llm/src/mocker/engine.rs (1)
Lines 641-647: Optionally exercise logprobs in integration test to validate end-to-end plumbing
Setting output_options.logprobs for at least one request would validate the streaming/logprob path (even if engines currently don't populate values, this guards future regressions in delta formatting).
```diff
-            output_options: OutputOptions::default(),
+            output_options: OutputOptions { logprobs: Some(1), ..Default::default() },
```
If you prefer to avoid hard-coding across all requests, apply this only to one test case to keep coverage focused.
lib/llm/src/protocols/openai/chat_completions.rs (1)
Lines 236-264: Defensive handling for top_logprobs=0 in get_logprobs
OpenAI clients shouldn't send 0, and you validate top_logprobs elsewhere, but adding a local guard makes this robust to edge cases and keeps semantics consistent (0 → disabled).
```diff
 fn get_logprobs(&self) -> Option<u32> {
     match self.inner.logprobs {
         Some(logprobs) => {
             if logprobs {
                 match self.inner.top_logprobs {
-                    Some(top_logprobs) => Some(top_logprobs as u32),
+                    Some(top_logprobs) if top_logprobs > 0 => Some(top_logprobs as u32),
+                    Some(_) => None,
                     None => Some(1_u32),
                 }
             } else {
                 None
             }
         }
         None => None,
     }
 }
```
Also: thanks for defaulting to 1 when logprobs=true and top_logprobs is unspecified; this matches common expectations.
Consider a brief doc comment on this behavior for future maintainers.
lib/llm/src/protocols/openai/completions.rs (1)
Lines 303-335: Non-stream conversion still drops logprobs
The TryFrom to Choice still sets logprobs=None (TODO). If you intend to support non-stream completions with logprobs, follow up by aggregating per-delta logprobs into the final Choice as well.
I can draft an aggregator that accumulates token strings, logprobs, and optional top_logprobs across deltas into the final Choice's logprobs payload.
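If that follow-up lands, the aggregation could look roughly like the sketch below. The `DeltaLogprobs`/`AggregatedLogprobs` types are hypothetical stand-ins; the real code would fold into the async-openai Choice logprobs structure instead.

```rust
/// Hypothetical per-delta payload, standing in for what each streaming chunk carries.
struct DeltaLogprobs {
    tokens: Vec<String>,
    token_logprobs: Vec<f32>,
    /// Per generated token: optional (token, logprob) alternatives.
    top_logprobs: Option<Vec<Vec<(String, f32)>>>,
}

/// Hypothetical aggregate destined for the final non-stream Choice.
#[derive(Default)]
struct AggregatedLogprobs {
    tokens: Vec<String>,
    token_logprobs: Vec<f32>,
    top_logprobs: Vec<Vec<(String, f32)>>,
}

/// Fold per-delta logprobs into a single payload, keeping top_logprobs
/// aligned with tokens by pushing an empty Vec when a delta omits alternatives.
fn aggregate(deltas: impl IntoIterator<Item = DeltaLogprobs>) -> AggregatedLogprobs {
    let mut out = AggregatedLogprobs::default();
    for delta in deltas {
        let n = delta.tokens.len();
        out.tokens.extend(delta.tokens);
        out.token_logprobs.extend(delta.token_logprobs);
        match delta.top_logprobs {
            Some(tops) => out.top_logprobs.extend(tops),
            None => out.top_logprobs.extend(std::iter::repeat(Vec::new()).take(n)),
        }
    }
    out
}
```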
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- components/backends/sglang/src/dynamo/sglang/worker/main.py (3 hunks)
- lib/engines/llamacpp/src/lib.rs (1 hunks)
- lib/engines/mistralrs/src/lib.rs (1 hunks)
- lib/llm/src/backend.rs (1 hunks)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/migration.rs (3 hunks)
- lib/llm/src/mocker/engine.rs (3 hunks)
- lib/llm/src/preprocessor.rs (3 hunks)
- lib/llm/src/protocols/common.rs (2 hunks)
- lib/llm/src/protocols/common/llm_backend.rs (7 hunks)
- lib/llm/src/protocols/common/preprocessor.rs (2 hunks)
- lib/llm/src/protocols/openai.rs (3 hunks)
- lib/llm/src/protocols/openai/chat_completions.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
- lib/llm/src/protocols/openai/completions.rs (4 hunks)
- lib/llm/src/protocols/openai/completions/delta.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-24T20:59:35.725Z
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
Applied to files:
lib/llm/src/protocols/common/preprocessor.rs
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/openai/completions.rs
🧬 Code Graph Analysis (10)
lib/llm/src/protocols/common.rs (1)
- lib/llm/src/protocols/openai.rs (1): extract_output_options (182-194)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
- lib/llm/src/protocols/openai/completions/delta.rs (1): create_logprobs (85-104)
lib/llm/src/protocols/openai/chat_completions.rs (2)
- lib/llm/src/protocols/openai/completions.rs (5): validate (362-383), get_logprobs (339-341), get_prompt_logprobs (343-348), get_skip_special_tokens (350-352), get_formatted_prompt (354-356)
- lib/llm/src/protocols/openai.rs (4): get_logprobs (83-83), get_prompt_logprobs (85-85), get_skip_special_tokens (87-87), get_formatted_prompt (89-89)
lib/llm/src/protocols/openai.rs (3)
- lib/llm/src/protocols/openai/chat_completions.rs (4): get_logprobs (237-251), get_prompt_logprobs (253-255), get_skip_special_tokens (257-259), get_formatted_prompt (261-263)
- lib/llm/src/protocols/openai/completions.rs (4): get_logprobs (339-341), get_prompt_logprobs (343-348), get_skip_special_tokens (350-352), get_formatted_prompt (354-356)
- lib/llm/src/protocols/common.rs (1): extract_output_options (49-49)
lib/llm/src/migration.rs (1)
- components/backends/sglang/src/dynamo/sglang/common/protocol.py (2): SamplingOptions (33-45), StopConditions (25-30)
lib/engines/mistralrs/src/lib.rs (2)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1): create_choice (160-207)
- lib/llm/src/protocols/openai/completions/delta.rs (1): create_choice (106-140)
lib/llm/src/engines.rs (2)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1): create_choice (160-207)
- lib/llm/src/protocols/openai/completions/delta.rs (1): create_choice (106-140)
lib/llm/src/preprocessor.rs (7)
- lib/llm/src/protocols/common/preprocessor.rs (2): builder (62-64), builder (99-101)
- lib/llm/src/protocols/openai/completions.rs (1): builder (223-225)
- lib/llm/src/protocols/common.rs (1): builder (199-201)
- lib/runtime/src/config.rs (1): builder (159-161)
- lib/llm/src/model_card.rs (1): builder (143-145)
- lib/llm/src/tokenizers.rs (1): builder (436-438)
- lib/runtime/src/transports/nats.rs (2): builder (57-59), builder (242-244)
lib/llm/src/protocols/openai/completions/delta.rs (1)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2): create_logprobs (115-147), create_choice (160-207)
lib/llm/src/protocols/openai/completions.rs (1)
- lib/llm/src/protocols/openai.rs (6): nvext (55-55), nvext (65-65), get_logprobs (83-83), get_prompt_logprobs (85-85), get_skip_special_tokens (87-87), get_formatted_prompt (89-89)
🔇 Additional comments (28)
components/backends/sglang/src/dynamo/sglang/worker/main.py (1)
Lines 10-11: Imports look good; `itemgetter` use is appropriate
No issues here. The narrowed typing imports are fine and `itemgetter` is used correctly below.
lib/llm/src/protocols/common/preprocessor.rs (2)
Line 7: LGTM!
The import addition aligns with the introduction of OutputOptions to the module scope.
Lines 32-34: Well-documented and consistent output options field.
The field documentation clearly explains its purpose and follows the same style as existing fields. The positioning after `sampling_options` maintains logical grouping of request configuration options.
lib/engines/mistralrs/src/lib.rs (1)
Line 593: No changes needed for `create_choice` parameters
The `create_choice` signature in the llm crate is:
```rust
pub fn create_choice(
    &self,
    index: u32,
    text: Option<String>,
    finish_reason: Option<…>,
    logprobs: Option<…>,
) -> …
```
Across all callers, including lib/engines/mistralrs/src/lib.rs:593, the fourth argument correctly corresponds to the `logprobs` parameter, so `None` is valid here.
lib/llm/src/protocols/common/llm_backend.rs (3)
Lines 26-33: Well-structured TopLogprob implementation.
The TopLogprob struct has appropriate derives and clear field names. The type alias TopLogprobs effectively represents the two-dimensional structure (tokens × top_logprobs).
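For readers without the diff open, the reviewed pair is approximately the shape below. The field names are an illustration inferred from this description, not copied from llm_backend.rs.

```rust
/// Illustrative only: one candidate per alternative token at a given position.
/// The actual field names and derives are defined in lib/llm/src/protocols/common/llm_backend.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct TopLogprob {
    pub token_id: u32, // assumed field
    pub token: String, // assumed field
    pub logprob: f32,  // assumed field
}

/// Two-dimensional: outer index = generated token position, inner = top-k candidates.
pub type TopLogprobs = Vec<Vec<TopLogprob>>;
```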
Line 53: Consistent top_logprobs field addition.
The optional `top_logprobs` field is consistently added to both BackendOutput and LLMEngineOutput structs, maintaining the same positioning and type. Also applies to: 91-91.
Line 109: Proper initialization in constructor methods.
All constructor methods (cancelled, stop, length, error) correctly initialize the new `top_logprobs` field to `None`, maintaining consistency with the existing pattern. Also applies to: 123-123, 135-135, 148-148.
lib/llm/src/backend.rs (1)
Line 228: Correct propagation of top_logprobs field.
The BackendOutput construction correctly propagates the `top_logprobs` field from the LLMEngineOutput data, maintaining the optional nature of the field.
lib/llm/src/protocols/common.rs (2)
Lines 48-50: Clean trait definition for OutputOptions extraction.
The OutputOptionsProvider trait follows the established pattern of other provider traits in the module, with a clear method signature that returns a Result.
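Concretely, the provider is a single-method trait along these lines (reconstructed from the blanket impl quoted earlier in this review; the exact Result/error alias is an assumption):

```rust
use anyhow::Result;

// Placeholder mirroring common::OutputOptions (see the sketch near the walkthrough above).
pub struct OutputOptions {
    pub logprobs: Option<u32>,
    pub prompt_logprobs: Option<u32>,
    pub skip_special_tokens: Option<bool>,
    pub formatted_prompt: Option<bool>,
}

/// Single extraction method; OpenAI request types get this via the blanket impl
/// over OpenAIOutputOptionsProvider shown in protocols/openai.rs.
pub trait OutputOptionsProvider {
    fn extract_output_options(&self) -> Result<OutputOptions>;
}
```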
Lines 186-187: Proper builder pattern integration.
The `output_options` field is correctly added to CompletionRequest with the `#[builder(default)]` attribute, allowing it to be optional in the builder pattern while maintaining backward compatibility.
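As a generic illustration of why `#[builder(default)]` keeps this backward compatible (the derive_builder crate is used here as an assumption about the builder machinery; the repo's actual derive may differ):

```rust
use derive_builder::Builder;

#[derive(Debug, Clone, Default)]
pub struct OutputOptions {
    pub logprobs: Option<u32>,
}

#[derive(Debug, Clone, Builder)]
pub struct CompletionRequest {
    pub prompt: String,
    /// New field: falls back to OutputOptions::default() when the builder never sets it,
    /// so pre-existing builder call sites keep compiling unchanged.
    #[builder(default)]
    pub output_options: OutputOptions,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Old-style call site: no output_options setter, still builds.
    let req = CompletionRequestBuilder::default()
        .prompt("hello".to_string())
        .build()?;
    assert!(req.output_options.logprobs.is_none());
    Ok(())
}
```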
lib/llm/src/protocols/openai.rs (1)
Lines 20-22: Added OutputOptionsProvider to imports - LGTM
This is the right place to wire the new OpenAIOutputOptionsProvider alongside sampling and stop conditions.
lib/llm/src/mocker/engine.rs (2)
Lines 408-411: Mock output now includes top_logprobs: None - LGTM
The mock engine's output shape matches the protocol evolution and remains backward compatible.
Line 530: Tests import OutputOptions - LGTM
Bringing OutputOptions into the test module is correct given the new PreprocessedRequest field.
lib/llm/src/migration.rs (3)
Line 170: Tests import OutputOptions - LGTM
Matches the new field usage in PreprocessedRequest.
Lines 186-191: PreprocessedRequest now carries output_options - LGTM
Defaulting here is appropriate for these migration tests.
Lines 195-205: Mock output includes top_logprobs: None - LGTM
The test fixtures reflect the protocol changes without altering existing test intent.
lib/llm/src/protocols/openai/chat_completions.rs (1)
Lines 26-28: Bridging trait imported here - LGTM
Correctly pulls in OpenAIOutputOptionsProvider to expose OutputOptions via the blanket impl.
lib/llm/src/preprocessor.rs (3)
Lines 36-44: Importing OutputOptionsProvider is correct and necessary
Good call threading output options through preprocessing. This aligns with the new OutputOptions plumbing.
Lines 251-255: Threading output_options into PreprocessedRequest
Placing builder.output_options(request.extract_output_options()?) alongside sampling and stop conditions keeps concerns centralized and consistent for both token and text inputs. Looks good.
Lines 145-151: Bound verified: `OutputOptionsProvider` is implemented for all chat/completion request types
I confirmed that both `NvCreateCompletionRequest` and `NvCreateChatCompletionRequest` implement `OpenAIOutputOptionsProvider`, and there's a blanket impl in lib/llm/src/protocols/openai.rs (lines 181-184) that makes any `T: OpenAIOutputOptionsProvider` also implement `OutputOptionsProvider`. Adding the `OutputOptionsProvider` bound on `preprocess_request` will not break compilation.
lib/llm/src/engines.rs (2)
Lines 98-109: LLMEngineOutput now includes top_logprobs: None
Adding the new field (initialized to None) keeps Echo engines compiling against the expanded API while backends start populating it. No issues.
Lines 244-251: Updated create_choice signature usage (added logprobs)
Call sites updated to pass the new logprobs parameter (None). This matches the new signature and keeps Echo behavior unchanged.
lib/llm/src/protocols/openai/completions/delta.rs (3)
Lines 23-27: Enablement condition for completions looks correct
Using inner.logprobs.unwrap_or(0) > 0 to drive enable_logprobs matches the legacy Completions API semantics.
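A side-by-side of the two enablement rules, pulled out of their request structs as plain functions for clarity (the function names and the surrounding types are illustrative, not the crate's actual code):

```rust
/// Legacy Completions API: `logprobs` is an integer count of alternatives.
fn completions_enable_logprobs(logprobs: Option<u32>) -> bool {
    logprobs.unwrap_or(0) > 0
}

/// Chat Completions API: `logprobs` is a bool and `top_logprobs` a separate count.
/// Per the out-of-diff-range comment above, either one should enable emission.
fn chat_enable_logprobs(logprobs: Option<bool>, top_logprobs: Option<u8>) -> bool {
    logprobs.unwrap_or(false) || top_logprobs.unwrap_or(0) > 0
}
```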
Lines 110-131: Choice construction now includes logprobs
Wiring logprobs into create_choice is correct and matches the async-openai Choice schema.
Lines 161-168: Good: passing through delta.top_logprobs for future use
This matches the PR's direction and keeps parity with the chat path once it's updated to forward top_logprobs too.
lib/llm/src/protocols/openai/completions.rs (3)
Lines 24-29: Importing OutputOptionsProvider and related providers
This is needed for extract_output_options and is consistent with the new plumbing.
Lines 283-287: Extracting output_options during TryFrom is correct
Forwarding output options into common::CompletionRequest ensures backends can honor logprob/top_logprob requests end-to-end.
Lines 296-299: CompletionRequest now carries output_options
This matches changes across the common layer. Looks good.
@messiaen Can you take a look at the Code Rabbit review, and "Resolve" the ones that don't apply? I usually find them really insightful.
Signed-off-by: Greg Clark <[email protected]>
406ef5e to fc8cd4e (Compare)
Signed-off-by: Greg Clark <[email protected]>
@coderabbitai full review
✅ Actions performed: Full review triggered.
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
Overview:
top_logprobs
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit