Conversation

@rymc (Contributor) commented Apr 12, 2025

In V1, submitting an out-of-vocab token ID in logit_bias causes the engine to crash.

For example, the following request crashes the engine because token ID 191222 is outside the model's vocabulary.

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "max_tokens": 4000,
    "logit_bias": {
      "191222": -100,
      "99999": 5
    },
    "messages": [
      {
        "role": "user",
        "content": "Hello, tell me about a nice dog."
      }
    ]
  }'
The server log shows the resulting crash:

INFO 04-12 06:41:33 [async_llm.py:228] Added request chatcmpl-5da6fafa57b640e4835e057cf19021f2.
ERROR 04-12 06:41:33 [core.py:390] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 383, in run_engine_core
ERROR 04-12 06:41:33 [core.py:390]     engine_core.run_busy_loop()
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 405, in run_busy_loop
ERROR 04-12 06:41:33 [core.py:390]     self._process_engine_step()
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 434, in _process_engine_step
ERROR 04-12 06:41:33 [core.py:390]     outputs = self.step_fn()
ERROR 04-12 06:41:33 [core.py:390]               ^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 206, in step
ERROR 04-12 06:41:33 [core.py:390]     output = self.model_executor.execute_model(scheduler_output)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 77, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     output = self.collective_rpc("execute_model",
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-12 06:41:33 [core.py:390]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/utils.py", line 2347, in run_method
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 242, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     output = self.model_runner.execute_model(scheduler_output)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-12 06:41:33 [core.py:390]     return func(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1070, in execute_model
ERROR 04-12 06:41:33 [core.py:390]     sampler_output = self.model.sample(
ERROR 04-12 06:41:33 [core.py:390]                      ^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/model_executor/models/llama.py", line 556, in sample
ERROR 04-12 06:41:33 [core.py:390]     next_tokens = self.sampler(logits, sampling_metadata)
ERROR 04-12 06:41:33 [core.py:390]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-12 06:41:33 [core.py:390]     return self._call_impl(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-12 06:41:33 [core.py:390]     return forward_call(*args, **kwargs)
ERROR 04-12 06:41:33 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/sample/sampler.py", line 45, in forward
ERROR 04-12 06:41:33 [core.py:390]     logits = self.apply_logits_bias(logits, sampling_metadata)
ERROR 04-12 06:41:33 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390]   File "/home/ryan/venv/lib/python3.12/site-packages/vllm/v1/sample/sampler.py", line 236, in apply_logits_bias
ERROR 04-12 06:41:33 [core.py:390]     logits[i, token_id] += bias
ERROR 04-12 06:41:33 [core.py:390]     ~~~~~~^^^^^^^^^^^^^
ERROR 04-12 06:41:33 [core.py:390] IndexError: index 141718 is out of bounds for dimension 1 with size 128256
ERROR 04-12 06:41:33 [core.py:390]
CRITICAL 04-12 06:41:33 [core_client.py:361] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed

After this fix, we return a 400 Bad Request error in V1:
{"object":"error","message":"token_id(s) [191222] in logit_bias contain out-of-vocab token ids. Vocabulary size: 128256","type":"BadRequestError","param":null,"code":400}

which is similar to the error returned in V0:
{"object":"error","message":"token_id 191222 in logit_bias contains out-of-vocab token id","type":"BadRequestError","param":null,"code":400}

rymc added 2 commits April 12, 2025 07:17
Signed-off-by: Ryan McConville <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small, essential subset of CI tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify bot added the v1 label Apr 12, 2025
@mgoin (Member) left a comment

Thank you for the fix! Would you mind adding a test case for this?

@rymc (Contributor, Author) commented Apr 12, 2025

I added a test, let me know if there is anything else needed.
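
For illustration, a test along these lines would exercise the new behaviour; the fixture-free setup, constants, and exact assertions here are hypothetical rather than the test added in this PR:

import openai
import pytest

# Hypothetical setup: assumes a vLLM OpenAI-compatible server is already
# running at this address with the Llama model loaded.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

def test_out_of_vocab_logit_bias_returns_400():
    client = openai.OpenAI(base_url=BASE_URL, api_key="EMPTY")
    # Token id 191222 is outside the 128256-token vocabulary, so the
    # frontend validation should reject the request with HTTP 400.
    with pytest.raises(openai.BadRequestError):
        client.chat.completions.create(
            model=MODEL,
            max_tokens=8,
            logit_bias={"191222": -100},
            messages=[{"role": "user", "content": "Hello"}],
        )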

@mgoin (Member) left a comment

Great work!

@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and bug (Something isn't working) labels Apr 12, 2025
@mgoin enabled auto-merge (squash) April 12, 2025 18:26
@mgoin merged commit 6c11ecf into vllm-project:main Apr 12, 2025
59 checks passed
@njhill (Member) commented Apr 15, 2025

Apologies, I just came across this. I think we should be doing this check higher up as part of the request validation. Raising an error here will impact the entire batch and potentially destabilize things.

It also adds unnecessary overhead on the engine critical path - better to do the check in the separate front-end process before it hits the engine.

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
Comment on lines +233 to +245 of vllm/v1/sample/sampler.py (apply_logits_bias)

# Get vocabulary size from logits
vocab_size = logits.shape[-1]

for i, logit_bias in enumerate(sampling_metadata.logit_bias):
    if logit_bias:
        for token_id, bias in logit_bias.items():
            # Check token_id bounds to ensure within vocabulary
            if token_id < 0 or token_id >= vocab_size:
                raise ValueError(
                    f"token_id {token_id} in logit_bias contains "
                    f"out-of-vocab token id. Vocabulary size: "
                    f"{vocab_size}")
@afeldman-nm (Contributor) commented May 6, 2025

Hello @rymc, a question: why do we need out-of-vocab token id validation inside the logit bias logits processor? This PR already added validation in the frontend (in processor.py), where the _validate_logit_bias() method is called for every new request submitted to both the sync and async engines. Would it be acceptable to remove this validation from the internals of the logit bias logits processor?

(I know this PR is merged already; I'm asking because #16728 omits this out-of-vocab token check from the logit bias logits processor but keeps the frontend _validate_logit_bias() check.)

CC @njhill

@njhill (Member) replied

I think when I made the above comment I may have missed the fact that this PR adds the validation in both places. Given that, all that's needed is to remove the validation within the sampler / omit it from the vectorized impl that we'll move to.
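
For illustration only: a vectorized logit_bias application with the per-token bounds check removed could look roughly like the sketch below. This is an assumption about the general shape of such an implementation, not the actual code from #16728.

import torch

def apply_logit_bias_vectorized(
        logits: torch.Tensor,
        logit_bias: list[dict[int, float] | None]) -> torch.Tensor:
    # Gather (row, token_id, bias) triples for every request in the batch
    # that supplied a logit_bias mapping.
    rows: list[int] = []
    cols: list[int] = []
    vals: list[float] = []
    for i, bias_dict in enumerate(logit_bias):
        if not bias_dict:
            continue
        rows.extend([i] * len(bias_dict))
        cols.extend(bias_dict.keys())
        vals.extend(bias_dict.values())
    if rows:
        # One indexed addition instead of a Python-level loop over token ids.
        logits[torch.tensor(rows, device=logits.device),
               torch.tensor(cols, device=logits.device)] += torch.tensor(
                   vals, device=logits.device, dtype=logits.dtype)
    return logits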

@rymc (Contributor, Author) replied

Ah, yes. The reason was that I wasn't confident every path that ends up here goes through the frontend processor, and I didn't want a crash. Given that it does, I'm happy to submit a new PR with the sampler validation removed. Let me know :)

@afeldman-nm (Contributor) replied

@rymc thanks for the response :) most likely the omission of the redundant checks will be accomplished by #16728

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025