[V1] Wrapper which plumbs request-level logits processors into vLLM batch-level logits processing #23656
Conversation
I think we should call it RequestLogitsProcessor
The contents of this class look like the min tokens LP ... is this just copy-pasted temporary state before updating it with the wrapper LP impl?
This pull request has merge conflicts that must be resolved before it can be merged.
@afeldman-nm @njhill Do we really want to support this feature? I thought we decided to deprecate it.
Hi Woosuk. This PR represents a compromise solution. New logits processors should subclass the V1 batch-level interface; however, existing collections of request-level logits processors such as https://github.com/NVIDIA/logits-processor-zoo were written against the old per-request interface. The purpose of this PR is to provide the thinnest possible wrapper for plumbing these into the logits processor extensibility interface, without making any changes to the interface defined in #19912. This way we are not committing to supporting V0-style processors in the engine internals/interface, but we are still providing a solution for existing logits processor "zoos". In the long term, anyone with a V0-style logits processor should upgrade their implementation for the perf benefits.
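For context, a V0-style request-level logits processor is just a callable applied to one request's logits row at each decode step. The sketch below assumes the two-argument form from the linked interface (the token ids generated so far, plus that request's logits row); the masking rule itself is hypothetical, purely for illustration.

```python
import torch

# A minimal sketch of a V0-style, request-level logits processor: the kind
# of callable this PR's wrapper adapts. The rule here is hypothetical.
def ban_token_zero(past_token_ids: list[int],
                   logits: torch.Tensor) -> torch.Tensor:
    # Once any tokens have been generated, forbid token id 0.
    if past_token_ids:
        logits[0] = float("-inf")
    return logits
```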
Once this PR and the logits processor documentation PR land, it might make sense for me to file an issue against https://github.com/NVIDIA/logits-processor-zoo and any other similar repo...
Thanks @afeldman-nm
@WoosukKwon IMO we do want to support it for users that want to experiment and/or where performance isn't critical, but with the understanding that vectorized LPs should be used for performance. As well as support for all the preexisting impls, as @afeldman-nm said. It's much easier to implement something to mutate the LPs of a single request.
Thanks @afeldman-nm, the class looks good to me now, just a couple of nits remaining
```python
def __init__(self, vllm_config: VllmConfig, device: torch.device,
             is_pin_memory: bool):
```
I guess we don't need/want any args here, so that they can call super().__init__(). But we want the contract to be that their subclass of this must have those args, since that's expected of our regular LogitsProcessors (and they can also then make use of them if they need to). I'm just not sure how to reflect that in the abstract class apart from via comments/doc.
If I am understanding your comment correctly, yes, I think these arguments cannot be skipped, unfortunately. I added a docstring explaining that these arguments may be used by the subclass but must be present regardless.
This is also worth mentioning in the docs; however, I think I will leave documentation of this per-request logits processor wrapper to the docs PR.
I was saying that we don't actually need the args here if we require the subclasses to have such a constructor.
However, on second thought, maybe this is better, because then implementations don't need to define an init method at all if they don't need any of the args (like your dummy impls, for example).
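A hedged sketch of the contract being discussed: the subclass keeps the full constructor signature (even if unused) so it matches what is expected of regular batch-level LogitsProcessors. The module path, the base class name AdapterLogitsProcessor, and the super().__init__ signature below are my assumptions about this PR and may differ from the merged code.

```python
import torch
from vllm.config import VllmConfig
# Assumed module path and class name for the wrapper from this PR.
from vllm.v1.sample.logits_processor import AdapterLogitsProcessor

class MyWrappedProcessor(AdapterLogitsProcessor):
    # Per the discussion above, subclasses must accept these args even if
    # they ignore them; they may also use them (e.g. to place buffers on
    # the right device).
    def __init__(self, vllm_config: VllmConfig, device: torch.device,
                 is_pin_memory: bool):
        super().__init__(vllm_config, device, is_pin_memory)
        self.device = device
```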
Thanks @afeldman-nm! LGTM just a couple of tiny things
Thanks @afeldman-nm !
Note that something here is still a little opaque about how to use logits processors that were developed in the V0 style. If an old-school logits processor is run in version 0.10.1.1, then you land at vllm/vllm/engine/llm_engine.py, line 689 (at commit 37efc63).
The fix reached on that zoo repo's side was a downgrade to 0.10.0, which also just "worked" for me as well. All in all, though, I get the sense that something needs to be resolved here, but I have no experience with either library to understand why/how this is happening or whether it will get resolved.
Purpose
This PR adds support for passing request-level logits processors into the vLLM V1 engine. This is accomplished using a wrapper which builds a batch-level logits processor class out of a Callable request-level logits processor that complies with the interface described here: https://docs.vllm.ai/en/v0.9.2/api/vllm/logits_process.html
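A hedged usage sketch based on my reading of this PR: the class name AdapterLogitsProcessor, the new_req_logits_processor hook, and the extra_args plumbing are assumptions and may differ from the merged code.

```python
from typing import Callable, Optional

import torch
from vllm import SamplingParams
# Assumed module path and class name for the wrapper from this PR.
from vllm.v1.sample.logits_processor import AdapterLogitsProcessor

class BanTokenAdapter(AdapterLogitsProcessor):
    """Adapts a per-request callable into a batch-level logits processor."""

    def new_req_logits_processor(
        self, params: SamplingParams,
    ) -> Optional[Callable[[list[int], torch.Tensor], torch.Tensor]]:
        # Hypothetical per-request argument carried in SamplingParams.
        banned = (params.extra_args or {}).get("banned_token_id")
        if banned is None:
            return None  # this request does not use the processor

        def process(past_token_ids: list[int],
                    logits: torch.Tensor) -> torch.Tensor:
            logits[banned] = float("-inf")
            return logits

        return process
```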
Test Plan
A unit test wraps a request-level logits processor, passes it to the batch-level logits processor abstraction, and confirms correct behavior.
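A minimal sketch of the behavioral check described above, assuming nothing beyond the request-level interface itself (it exercises a standalone per-request callable rather than the full engine path):

```python
import torch

def ban_token_three(past_token_ids: list[int],
                    logits: torch.Tensor) -> torch.Tensor:
    # Illustrative request-level rule: always mask token id 3.
    logits[3] = float("-inf")
    return logits

def test_request_level_masking():
    logits = torch.zeros(8)
    out = ban_token_three([1, 2], logits)
    assert out[3] == float("-inf")       # banned token is masked
    assert torch.isfinite(out[:3]).all() # other tokens are untouched
```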
Test Result
Passes
Documentation update
Will be provided in #22919