[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine #16529
Conversation
Signed-off-by: Ryan McConville <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Thank you for the fix! Would you mind adding a test case for this?
Signed-off-by: Ryan McConville <[email protected]>
I added a test; let me know if there is anything else needed.
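A regression test for this could look roughly like the following. This is a hedged sketch: the helper name `check_logit_bias` is a stand-in for the engine-side check this PR adds, and the actual test added in the PR may differ.

```python
import pytest


def check_logit_bias(logit_bias: dict, vocab_size: int) -> None:
    """Hypothetical stand-in for the validation this PR adds."""
    for token_id in logit_bias:
        if token_id < 0 or token_id >= vocab_size:
            raise ValueError(
                f"token_id {token_id} in logit_bias contains "
                f"out-of-vocab token id. Vocabulary size: {vocab_size}")


def test_out_of_vocab_logit_bias_is_rejected():
    # 191222 is outside a 128256-token vocabulary, so this must raise.
    with pytest.raises(ValueError):
        check_logit_bias({191222: 1.0}, vocab_size=128256)


def test_in_vocab_logit_bias_is_accepted():
    # An in-range token id passes without error.
    check_logit_bias({42: 1.0}, vocab_size=128256)
```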
Great work!
Apologies, I just came across this. I think we should be doing this check higher up as part of the request validation. Raising an error here will impact the entire batch and potentially destabilize things. It also adds unnecessary overhead on the engine's critical path; better to do the check in the separate front-end process before the request hits the engine.
…ngine (vllm-project#16529) Signed-off-by: Ryan McConville <[email protected]> Signed-off-by: Yang Wang <[email protected]>
```python
# Get vocabulary size from logits
vocab_size = logits.shape[-1]

for i, logit_bias in enumerate(sampling_metadata.logit_bias):
    if logit_bias:
        for token_id, bias in logit_bias.items():
            # Check token_id bounds to ensure within vocabulary
            if token_id < 0 or token_id >= vocab_size:
                raise ValueError(
                    f"token_id {token_id} in logit_bias contains "
                    f"out-of-vocab token id. Vocabulary size: "
                    f"{vocab_size}")
```
Hello @rymc, a question: why do we need out-of-vocab token id validation inside the logit bias logits processor? This PR already added validation in the frontend (in processor.py), where the _validate_logit_bias() method is called for every new request submitted to both the sync and async engines. Would it be acceptable to remove this logit bias validation from the internals of the logit bias logits processor?
(I know this PR is merged already; I'm asking because #16728 omits this out-of-vocab token check from the logit bias logits processor but keeps the frontend _validate_logit_bias() check.)
CC @njhill
I think when I made the above comment I may have missed the fact that this PR adds the validation in both places. Given that, all that's needed is to remove the validation within the sampler / omit it from the vectorized impl that we'll move to.
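The vectorized implementation mentioned above could then apply all biases in a batch with a single indexed add, since out-of-vocab ids are already rejected at the frontend. A rough sketch, where the function name, tensor layout, and shapes are assumptions and not vLLM's actual code:

```python
import torch


def apply_logit_bias(logits: torch.Tensor,
                     rows: torch.Tensor,
                     token_ids: torch.Tensor,
                     biases: torch.Tensor) -> torch.Tensor:
    """Add per-token biases to a [batch, vocab] logits tensor in one op.

    rows / token_ids / biases are parallel 1-D tensors gathered from all
    requests in the batch that supplied a logit_bias. No bounds check is
    needed here if the frontend already validated token_ids.
    """
    logits[rows, token_ids] += biases
    return logits


# Example: bias token 3 up for request 0 and token 5 down for request 1.
logits = torch.zeros(2, 8)
out = apply_logit_bias(
    logits,
    rows=torch.tensor([0, 1]),
    token_ids=torch.tensor([3, 5]),
    biases=torch.tensor([1.5, -2.0]),
)
```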
Ah, yes. The reason was that I wasn't confident every path ending up here goes through the frontend processor, and I didn't want a crash. Given that it does, I'm happy to submit a new PR with the sampler validation removed. Let me know :)
In V1, when an out-of-vocab token ID is submitted within logit_bias in a request, it will crash the engine. For example, a request whose logit_bias references a token ID outside the model's vocabulary brings the engine down.
After this fix, we return a 400 Bad Request error in V1:
{"object":"error","message":"token_id(s) [191222] in logit_bias contain out-of-vocab token ids. Vocabulary size: 128256","type":"BadRequestError","param":null,"code":400}
which is similar to the error in V0:
{"object":"error","message":"token_id 191222 in logit_bias contains out-of-vocab token id","type":"BadRequestError","param":null,"code":400}
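For context, a request body shaped like the following would have triggered the crash before this fix. This is a hypothetical payload: the model name is a placeholder, while the token id 191222 and vocabulary size 128256 come from the error messages above.

```python
import json

# Hypothetical OpenAI-compatible request payload; token id 191222 is
# outside the model's 128256-token vocabulary, so V1 now rejects it
# with a 400 instead of crashing the engine.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    "prompt": "Hello",
    "logit_bias": {"191222": 5.0},
}
body = json.dumps(payload)
```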