fixed reasoning streaming with tool_choice="required" #24108
base: main
Conversation
Force-pushed the branch from a0c7ec3 to e95416c.
Tested on multiple Qwen models + tools.
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Force-pushed the branch from 58cfd8a to 17853a1.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ExtReMLapin <[email protected]>
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Force-pushed the branch from 02a8dde to 4d8d81c.
@DarkLight1337 @heheda12345 I'm not sure exactly who to ping to get it reviewed.
> reasoning not being sent to client when tool_choice="required"

Could you provide reproduction steps?
The combination of stream + enable_thinking + required has been continuously tested in e2e.
@chaunceyjiang there is no assert/check/test in stream mode for reasoning.

master/HEAD: [screenshot] (see how it goes directly into the tool call; something like 10 seconds between the first message and the start of the tool call)

This branch: [screenshot]
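For anyone trying to reproduce, a minimal sketch (the server URL, model name, and tool schema below are illustrative assumptions, not taken from this PR):

```python
# Reproduction sketch: stream a chat completion with tool_choice="required"
# against a vLLM OpenAI-compatible server. base_url, model name, and the
# tool schema are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="required",
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # On master/HEAD nothing is ever printed here; on this branch the
    # reasoning streams in before the tool call starts.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
```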
Also, this PR covers both forced-reasoning models (like Qwen3 2507, which doesn't output an opening reasoning tag) and the original ones that output both opening and closing reasoning tags.
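For concreteness, the two raw-output shapes being handled look roughly like this (illustrative, not exact model output):

```
# Model that emits both tags (original Qwen3 behavior):
<think>...reasoning...</think><tool_call>...</tool_call>

# Forced-reasoning model (e.g. Qwen3 2507): the opening <think> is baked
# into the chat template, so generation starts mid-reasoning:
...reasoning...</think><tool_call>...</tool_call>
```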
Good catch.
Can you add a check for this here: https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L165-L177?
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Got it for the changes. In the tests I'm having a weird issue where

```python
output = []
reasoning = []
async for chunk in output_stream:
    if chunk.choices:
        if enable_thinking and chunk.choices[0].delta.reasoning_content:
            reasoning.append(chunk.choices[0].delta.reasoning_content)
        if chunk.choices[0].delta.tool_calls:
            output.extend(chunk.choices[0].delta.tool_calls)
assert len(output) > 0
if enable_thinking:
    assert len(reasoning) > 0
```

doesn't work because the openai class doesn't have this param, and I don't understand why it doesn't error in the non-stream part. So instead I moved to
And something's broken with non-reasoning models, so I'll fix it when I get back from vacation.
…dded)
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: ExtReMLapin <[email protected]>
Before releasing the last fix, I triple-checked with
/gemini review
Code Review
This pull request fixes an issue where reasoning content was not streamed correctly when tool_choice="required". The fix involves using the correct streaming-aware function for extracting reasoning content. The associated test is also updated to verify this behavior.

My review focuses on the maintainability of the fix. While the fix is correct, it introduces code duplication for handling reasoning streaming across different tool_choice scenarios. I've suggested refactoring this duplicated logic into a helper function to improve code clarity and reduce the risk of future inconsistencies.
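A sketch of the kind of helper meant here; the name and exact signature are hypothetical, not vLLM's actual internals:

```python
# Hypothetical helper: both the "auto" and "required" tool_choice branches
# would call this instead of each invoking the parser themselves, so the
# two code paths cannot drift apart.
def _stream_reasoning_delta(reasoning_parser, previous_text, current_text,
                            delta_text, previous_token_ids,
                            current_token_ids, delta_token_ids):
    return reasoning_parser.extract_reasoning_content_streaming(
        previous_text,
        current_text,
        delta_text,
        previous_token_ids,
        current_token_ids,
        delta_token_ids,
    )
```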
Considering the precommit warning linked to the values of

```python
if tool_choice_auto or self.reasoning_parser:
    # These are only required in "auto" tool choice case
    all_previous_token_ids = [[]] * num_choices
    # For reasoning parser and tool call all enabled
    added_content_delta_arr = [False] * num_choices
    reasoning_end_arr = [False] * num_choices
else:
    all_previous_token_ids = None
    reasoning_end_arr = None
```

would you be fine with
I haven’t reviewed your PR carefully yet, but my understanding is that
…periment, and removed added parentheses
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Purpose
fixed reasoning not being sent to client when `tool_choice="required"`
closes #14429
kinda fixes #21026
Test Plan
Added one test to ensure reasoning is returned in the streamed data.
pytest ./tests/entrypoints/openai/test_completion_with_function_calling.py
Test Result