Conversation

@rymc (Contributor) commented Apr 12, 2025

In V1, submitting an out-of-vocab token ID in logit_bias causes the engine to crash.

For example, the following request crashes the engine because token ID 191222 is outside the model's vocabulary.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "max_tokens": 4000,
    "logit_bias": {
      "191222": -100,
      "99999": 5
    },
    "messages": [
      {
        "role": "user",
        "content": "Hello, tell me about a nice dog."
      }
    ]
  }'
The server log shows the resulting crash:

INFO 04-12 06:41:33 [async_llm.py:228] Added request chatcmpl-5da6fafa57b640e4835e057cf19021f2.
ERROR 04-12 06:41:33 [core.py:390] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 383, in run_engine_core
ERROR 04-12 06:41:33 [core.py:390]     engine_core.run_busy_loop()
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 405, in run_busy_loop
ERROR 04-12 06:41:33 [core.py:390]     self._process_engine_step()
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 434, in _process_engine_step
ERROR 04-12 06:41:33 [core.py:390]     outputs = self.step_fn()
ERROR 04-12 06:41:33 [core.py:390]               ^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 206, in step
ERROR 04-12 06:41:33 [core.py:390]     output = self.model_executor.execute_model(scheduler_output)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     output = self.collective_rpc("execute_model",
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-12 06:41:33 [core.py:390]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/utils.py", line 2347, in run_method
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 242, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     output = self.model_runner.execute_model(scheduler_output)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1070, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     sampler_output = self.model.sample(
ERROR 04-12 06:41:33 [core.py:390]                      ^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 556, in sample
ERROR 04-12 06:41:33 [core.py:390]     next_tokens = self.sampler(logits, sampling_metadata)
ERROR 04-12 06:41:33 [core.py:390]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-12 06:41:33 [core.py:390]     return self._call_impl(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-12 06:41:33 [core.py:390]     return forward_call(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/sample/sampler.py", line 45, in forward
ERROR 04-12 06:41:33 [core.py:390]     logits = self.apply_logits_bias(logits, sampling_metadata)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/sample/sampler.py", line 236, in apply_logits_bias
ERROR 04-12 06:41:33 [core.py:390]     logits[i, token_id] += bias
ERROR 04-12 06:41:33 [core.py:390]     ~~~~~~^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390] IndexError: index 141718 is out of bounds for dimension 1 with size 128256
ERROR 04-12 06:41:33 [core.py:390]
CRITICAL 04-12 06:41:33 [core_client.py:361] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed

After this fix, we return a 400 Bad Request error in V1:
{"object":"error","message":"token_id(s) [191222] in logit_bias contain out-of-vocab token ids. Vocabulary size: 128256","type":"BadRequestError","param":null,"code":400}

which is similar to the error returned in V0:
{"object":"error","message":"token_id 191222 in logit_bias contains out-of-vocab token id","type":"BadRequestError","param":null,"code":400}

rymc added 2 commits April 12, 2025 07:17
Signed-off-by: Ryan McConville <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify bot added the v1 label Apr 12, 2025
@mgoin (Member) left a comment

Thank you for the fix! Would you mind adding a test case for this?

@rymc (Contributor, Author) commented Apr 12, 2025

I added a test, let me know if there is anything else needed.
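
For illustration, a test along these lines would exercise the new behaviour; the fixture-free setup, constants, and exact assertions here are hypothetical rather than the test added in this PR:

import openai
import pytest

# Hypothetical setup: assumes a vLLM OpenAI-compatible server is already
# running at this address with the Llama model loaded.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

def test_out_of_vocab_logit_bias_returns_400():
    client = openai.OpenAI(base_url=BASE_URL, api_key="EMPTY")
    # Token id 191222 is outside the 128256-token vocabulary, so the
    # frontend validation should reject the request with HTTP 400.
    with pytest.raises(openai.BadRequestError):
        client.chat.completions.create(
            model=MODEL,
            max_tokens=8,
            logit_bias={"191222": -100},
            messages=[{"role": "user", "content": "Hello"}],
        )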

@mgoin (Member) left a comment

Great work!

@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and bug (Something isn't working) labels Apr 12, 2025
@mgoin enabled auto-merge (squash) April 12, 2025 18:26
@mgoin merged commit 6c11ecf into vllm-project:main Apr 12, 2025
59 checks passed
@njhill (Member) commented Apr 15, 2025

Apologies, I just came across this. I think we should be doing this check higher up as part of the request validation. Raising an error here will impact the entire batch and potentially destabilize things.

It also adds unnecessary overhead on the engine critical path - better to do the check in the separate front-end process before it hits the engine.

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
Comment on lines +233 to +245 of vllm/v1/sample/sampler.py (apply_logits_bias)

# Get vocabulary size from logits
vocab_size = logits.shape[-1]

for i, logit_bias in enumerate(sampling_metadata.logit_bias):
    if logit_bias:
        for token_id, bias in logit_bias.items():
            # Check token_id bounds to ensure within vocabulary
            if token_id < 0 or token_id >= vocab_size:
                raise ValueError(
                    f"token_id {token_id} in logit_bias contains "
                    f"out-of-vocab token id. Vocabulary size: "
                    f"{vocab_size}")
@afeldman-nm (Contributor) commented May 6, 2025

Hello @rymc, a question: why do we need out-of-vocab token id validation inside the logit bias logits processor? This PR already added validation in the frontend (in processor.py), where the _validate_logit_bias() method is called for every new request submitted to both the sync and async engines. Would it be acceptable to remove this validation from the internals of the logit bias logits processor?

(I know this PR is merged already; I'm asking because #16728 omits this out-of-vocab token check from the logit bias logits processor but keeps the frontend _validate_logit_bias() check.)

CC @njhill

@njhill (Member) replied

I think when I made the above comment I may have missed the fact that this PR adds the validation in both places. Given that, all that's needed is to remove the validation within the sampler / omit it from the vectorized impl that we'll move to.
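
For illustration only: a vectorized logit_bias application with the per-token bounds check removed could look roughly like the sketch below. This is an assumption about the general shape of such an implementation, not the actual code from #16728.

import torch

def apply_logit_bias_vectorized(
        logits: torch.Tensor,
        logit_bias: list[dict[int, float] | None]) -> torch.Tensor:
    # Gather (row, token_id, bias) triples for every request in the batch
    # that supplied a logit_bias mapping.
    rows: list[int] = []
    cols: list[int] = []
    vals: list[float] = []
    for i, bias_dict in enumerate(logit_bias):
        if not bias_dict:
            continue
        rows.extend([i] * len(bias_dict))
        cols.extend(bias_dict.keys())
        vals.extend(bias_dict.values())
    if rows:
        # One indexed addition instead of a Python-level loop over token ids.
        logits[torch.tensor(rows, device=logits.device),
               torch.tensor(cols, device=logits.device)] += torch.tensor(
                   vals, device=logits.device, dtype=logits.dtype)
    return logits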

@rymc (Contributor, Author) replied

Ah, yes. The reason was that I wasn't confident every path that ends up here goes through the frontend processor, and I didn't want a crash. Given that it does, I'm happy to submit a new PR with the sampler validation removed. Let me know :)

@afeldman-nm (Contributor) replied

@rymc thanks for the response :) most likely the omission of the redundant checks will be accomplished by #16728

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025