
Conversation

afeldman-nm
Contributor

@afeldman-nm afeldman-nm commented Aug 14, 2025

Purpose

  • Document the vLLM v1 logits processor functionality including built-in logits processors and custom logits processors

Test Plan

N/A

Test Result

N/A

(Optional) Documentation Update

See Purpose

Essential Elements of an Effective PR Description Checklist
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • [x] The test plan, such as providing test command.
  • [x] The test results, such as pasting the results comparison before and after, or e2e results.
  • [x] (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Andrew Feldman <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 14, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces documentation for the custom logits processor extensibility feature. The new markdown file explains how to create and use custom logits processors, including a code example. My review focuses on fixing several issues in this code example to make it functional and clear for developers.

Comment on lines 16 to 77
The contrived example below implements a

??? code "Example custom logits processor definition"

    ```python
    from typing import Optional
    import torch
    from vllm.config import VllmConfig
    from vllm.sampling_params import SamplingParams
    from vllm.v1.sample.logits_processor import (BatchUpdate,
                                                  LogitsProcessor,
                                                  MoveDirectionality)

    class DummyLogitsProcessor(LogitsProcessor):
        """Fake logit processor to support unit testing and examples"""

        def __init__(self, vllm_config: "VllmConfig", device: torch.device,
                     is_pin_memory: bool):
            self.req_info: dict[int, SamplingParams] = {}

        def is_argmax_invariant(self) -> bool:
            """Never impacts greedy sampling"""
            return False

        def update_state(self, batch_update: Optional[BatchUpdate]):
            if not batch_update:
                return

            # Process added requests.
            for index, params, _, _ in batch_update.added:
                assert params is not None
                if params.extra_args and (target_token :=
                                          params.extra_args.get("target_token")):
                    self.req_info[index] = target_token

            if self.req_info:
                # Process removed requests.
                for index in batch_update.removed:
                    self.req_info.pop(index, None)

                # Process moved requests, unidirectional move (a->b) and swap
                # (a<->b)
                for adx, bdx, direct in batch_update.moved:
                    a_val = self.req_info.pop(adx, None)
                    b_val = self.req_info.pop(bdx, None)
                    if a_val is not None:
                        self.req_info[bdx] = a_val
                    if direct == MoveDirectionality.SWAP and b_val is not None:
                        self.req_info[adx] = b_val

        def apply(self, logits: torch.Tensor) -> torch.Tensor:
            if not self.req_info:
                return logits

            # Save target values before modification
            rows_list = list(self.req_info.keys())
            cols = torch.tensor([self.req_info[i] for i in rows_list],
                                dtype=torch.long,
                                device=logits.device)
            rows = torch.tensor(rows_list, dtype=torch.long, device=logits.device)
            values_to_keep = logits[rows, cols].clone()
    ```
Contributor

Severity: high

The example DummyLogitsProcessor has a few issues that prevent it from working correctly and could confuse users:

  1. The introductory sentence on line 16 is incomplete.
  2. The type hint for self.req_info in __init__ is dict[int, SamplingParams], but it is used to store integer target_token values. This should be dict[int, int].
  3. The apply method is incomplete. When self.req_info is populated, it doesn't return the logits tensor, which will lead to a runtime error. The logic is also unfinished.

I've provided a corrected version of the example that addresses these points, making it a complete and functional illustration of a custom logits processor.

The contrived example below implements a logits processor that forces the model to select a specific `target_token` for requests that provide it.

??? code "Example custom logits processor definition"

    ```python
    from typing import Optional
    import torch
    from vllm.config import VllmConfig
    from vllm.sampling_params import SamplingParams
    from vllm.v1.sample.logits_processor import (BatchUpdate,
                                                LogitsProcessor,
                                                MoveDirectionality)

    class DummyLogitsProcessor(LogitsProcessor):
        """Fake logit processor to support unit testing and examples"""

        def __init__(self, vllm_config: "VllmConfig", device: torch.device,
                    is_pin_memory: bool):
            self.req_info: dict[int, int] = {}

        def is_argmax_invariant(self) -> bool:
            """Never impacts greedy sampling"""
            return False

        def update_state(self, batch_update: Optional[BatchUpdate]):
            if not batch_update:
                return

            # Process added requests.
            for index, params, _, _ in batch_update.added:
                assert params is not None
                if params.extra_args and (target_token :=
                                        params.extra_args.get("target_token")):
                    self.req_info[index] = target_token

            if self.req_info:
                # Process removed requests.
                for index in batch_update.removed:
                    self.req_info.pop(index, None)

                # Process moved requests, unidirectional move (a->b) and swap
                # (a<->b)
                for adx, bdx, direct in batch_update.moved:
                    a_val = self.req_info.pop(adx, None)
                    b_val = self.req_info.pop(bdx, None)
                    if a_val is not None:
                        self.req_info[bdx] = a_val
                    if direct == MoveDirectionality.SWAP and b_val is not None:
                        self.req_info[adx] = b_val

        def apply(self, logits: torch.Tensor) -> torch.Tensor:
            if not self.req_info:
                return logits

            rows_list = list(self.req_info.keys())
            cols = torch.tensor([self.req_info[i] for i in rows_list],
                                dtype=torch.long,
                                device=logits.device)
            rows = torch.tensor(rows_list, dtype=torch.long, device=logits.device)

            # Get the original logits for the target tokens.
            values_to_keep = logits[rows, cols].clone()

            # For requests with a target token, set all other logits to -inf.
            # This is a contrived example to force the model to select the
            # target token.
            for row_idx in rows_list:
                logits[row_idx, :] = -float("inf")

            logits[rows, cols] = values_to_keep
            return logits
    ```
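
For reference, a minimal usage sketch for the example above might look like the following. This is not part of the PR itself: it assumes the `logits_processors` constructor argument on `LLM` for registering custom logits processors (per the vLLM v1 design this PR documents), and the model name and token id are placeholders.

```python
# Hypothetical usage sketch for DummyLogitsProcessor (assumptions noted above).
from vllm import LLM, SamplingParams

# Register the custom logits processor with the engine
# (the `logits_processors` argument is assumed from the vLLM v1 design).
llm = LLM(model="facebook/opt-125m",
          logits_processors=[DummyLogitsProcessor])

# A request opts in by passing `target_token` via `extra_args`;
# requests that omit it are left untouched by the processor.
params = SamplingParams(temperature=0.0,
                        extra_args={"target_token": 42})

outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```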

Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
@afeldman-nm afeldman-nm marked this pull request as ready for review August 19, 2025 14:41
@afeldman-nm afeldman-nm requested a review from hmellor as a code owner August 19, 2025 14:41
afeldman-nm and others added 3 commits August 19, 2025 10:56
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
@afeldman-nm afeldman-nm changed the title [V1] Logits processor extensibility docs [V1] Logits processor docs, V0 logits processor wrapper, and V0 logits processor docs Aug 20, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
@mergify mergify bot added the v1 label Aug 21, 2025
Signed-off-by: Andrew Feldman <[email protected]>
@afeldman-nm
Contributor Author

Thank you @JosephMarinier for your review; I believe I addressed everything you mentioned.

@afeldman-nm afeldman-nm changed the title [V1] Logits processor docs, V0 logits processor wrapper, and V0 logits processor docs [V1] Logits processor docs Aug 21, 2025
Contributor

@JosephMarinier JosephMarinier left a comment

Thank you for the cool feature! 🙏

Signed-off-by: Andrew Feldman <[email protected]>
Member

@njhill njhill left a comment

Thanks @afeldman-nm, looks great apart from the one comment; we could merge this since that will likely need to change soon anyhow.

Member

@njhill njhill left a comment

Thanks @afeldman-nm

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 17, 2025
@njhill njhill enabled auto-merge (squash) September 17, 2025 15:30
@njhill njhill merged commit 7ae9887 into vllm-project:main Sep 17, 2025
42 checks passed
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: afeldman-nm <[email protected]>
Co-authored-by: Joseph Marinier <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: afeldman-nm <[email protected]>
Co-authored-by: Joseph Marinier <[email protected]>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: afeldman-nm <[email protected]>
Co-authored-by: Joseph Marinier <[email protected]>
Signed-off-by: charlifu <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: afeldman-nm <[email protected]>
Co-authored-by: Joseph Marinier <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: afeldman-nm <[email protected]>
Co-authored-by: Joseph Marinier <[email protected]>