Conversation

ExtReMLapin

@ExtReMLapin ExtReMLapin commented Sep 2, 2025

Purpose

Fixes reasoning not being sent to the client when tool_choice="required".

closes #14429
Partially addresses #21026

Test Plan

Added one test to ensure reasoning is returned in the streamed data: pytest ./tests/entrypoints/openai/test_completion_with_function_calling.py

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the frontend label Sep 2, 2025
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from a0c7ec3 to e95416c on September 3, 2025 13:12
@ExtReMLapin ExtReMLapin marked this pull request as ready for review September 3, 2025 13:13
@ExtReMLapin ExtReMLapin requested a review from aarnphm as a code owner September 3, 2025 13:13
@ExtReMLapin
Author

Tested on multiple Qwen models with tool calling:

  • Qwen 3 with reasoning
  • Qwen 3 with reasoning disabled
  • Qwen 2.5

@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 58cfd8a to 17853a1 on September 4, 2025 13:47

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
CNE Pierre FICHEPOIL added 2 commits September 12, 2025 12:08
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 02a8dde to 4d8d81c on September 12, 2025 12:09
@ExtReMLapin
Author

@DarkLight1337 @heheda12345
@simon-mo

I'm not sure exactly who to ping to get it reviewed

@DarkLight1337
Member

cc @aarnphm @chaunceyjiang

Collaborator

@chaunceyjiang chaunceyjiang left a comment


reasoning not being sent to client when tool_choice="required"

Could you provide a reproduction step?

The combination of stream + enable_thinking + required has been continuously tested in e2e.

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L165-L177

@ExtReMLapin
Author

ExtReMLapin commented Sep 12, 2025

@chaunceyjiang there is no assert/check on reasoning in the stream-mode part of that test

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L218

master/HEAD :

[screenshots of streamed output on master/HEAD]

(See how it goes directly into the tool call; roughly 10 seconds elapse between the first message and the start of the tool call.)

query.js


This branch :

[screenshot of streamed output on this branch, showing reasoning deltas]

@ExtReMLapin
Author

ExtReMLapin commented Sep 12, 2025

Also, this PR covers both forced-reasoning models (like Qwen3 2507, which doesn't output an opening reasoning tag) and original ones that output both the opening and the closing reasoning tags.
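To illustrate the two tag styles being discussed, here is a minimal hypothetical sketch of splitting reasoning from content (the tag names and the split_reasoning helper are assumptions for illustration, not vLLM's actual reasoning parser):

```python
def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>"):
    """Split model output into (reasoning, content) for both tag styles.

    Forced-reasoning models start emitting reasoning immediately and only
    produce the closing tag; other models wrap reasoning in both an opening
    and a closing tag.
    """
    if close_tag not in text:
        # No reasoning present at all
        return None, text
    head, _, content = text.partition(close_tag)
    if head.startswith(open_tag):
        # Both tags present: strip the opening tag
        reasoning = head[len(open_tag):]
    else:
        # Forced reasoning: the opening tag was never emitted
        reasoning = head
    return reasoning.strip(), content.strip()
```

The streaming fix has to handle the same asymmetry incrementally, which is why both model families were tested.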

Collaborator

@chaunceyjiang chaunceyjiang left a comment


CNE Pierre FICHEPOIL added 2 commits September 15, 2025 15:20
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
@ExtReMLapin
Author

Got it, I'll apply the requested changes.

In the tests, I'm having a weird issue where the following

        output = []
        reasoning = []
        async for chunk in output_stream:
            if chunk.choices:
                if enable_thinking and chunk.choices[0].delta.reasoning_content:
                    reasoning.append(chunk.choices[0].delta.reasoning_content)
                if chunk.choices[0].delta.tool_calls:
                    output.extend(chunk.choices[0].delta.tool_calls)

        assert len(output) > 0
        if enable_thinking:
            assert len(reasoning) > 0

doesn't work, because the openai client class doesn't declare this attribute, and I don't understand why it doesn't error in the non-stream part.

So instead I moved to if enable_thinking and getattr(chunk.choices[0].delta, "reasoning_content", None): — not sure if that's the right approach.
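The getattr fallback described above can be sketched like this (the Delta class below is a stand-in for the openai client's ChoiceDelta, which may not declare reasoning_content as a field; collect is a hypothetical helper, not part of the test suite):

```python
class Delta:
    """Stand-in for openai's ChoiceDelta; reasoning_content may be absent."""

    def __init__(self, reasoning_content=None, tool_calls=None):
        if reasoning_content is not None:
            self.reasoning_content = reasoning_content
        self.tool_calls = tool_calls or []


def collect(deltas, enable_thinking=True):
    """Gather reasoning and tool-call deltas from a stream of Delta objects."""
    reasoning, tool_calls = [], []
    for delta in deltas:
        # getattr avoids AttributeError when the client model lacks the field
        if enable_thinking and getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if delta.tool_calls:
            tool_calls.extend(delta.tool_calls)
    return reasoning, tool_calls
```

Direct attribute access would raise AttributeError on chunks whose delta never carried reasoning_content, which is why the defensive getattr is needed in the streaming path.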

@ExtReMLapin ExtReMLapin marked this pull request as draft September 15, 2025 16:14
@ExtReMLapin
Author

Also, something's broken with non-reasoning models, so I'll fix it when I get back from vacation.


mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 17, 2025
@ExtReMLapin ExtReMLapin marked this pull request as ready for review September 17, 2025 12:30
@mergify mergify bot removed the needs-rebase label Sep 17, 2025
@ExtReMLapin
Author

ExtReMLapin commented Sep 17, 2025

Before pushing the last fix, I triple-checked with:

  • Qwen 3 (qwen3 reasoning)
  • Qwen 3 instruct
  • Qwen 3 thinking (deepseek_r1 reasoning)
  • Qwen/QwQ-32B-AWQ

vllm_last/tests/entrypoints/openai/test_completion.py ............................ssssssssss......sss.................................ssssssssss......sss.....                                                                        [100%]

================================================================================================ 78 passed, 26 skipped in 318.63s (0:05:18) =================================================================================================

vllm_last/tests/entrypoints/openai/test_completion_with_function_calling.py ..............                                                                                                                                            [100%]

====================================================================================================== 14 passed in 130.91s (0:02:10) =======================================================================================================

@ExtReMLapin
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue where reasoning content was not streamed correctly when tool_choice="required". The fix involves using the correct streaming-aware function for extracting reasoning content. The associated test is also updated to verify this behavior.

My review focuses on the maintainability of the fix. While the fix is correct, it introduces code duplication for handling reasoning streaming across different tool_choice scenarios. I've suggested refactoring this duplicated logic into a helper function to improve code clarity and reduce the risk of future inconsistencies.

@ExtReMLapin
Author

Regarding the pre-commit warning tied to the values of reasoning_end_arr in

        if tool_choice_auto or self.reasoning_parser:
            # These are only required in "auto" tool choice case
            all_previous_token_ids = [[]] * num_choices
            # For reasoning parser and tool call all enabled
            added_content_delta_arr = [False] * num_choices
            reasoning_end_arr = [False] * num_choices
        else:
            all_previous_token_ids = None
            reasoning_end_arr = None

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized unconditionally, outside of the if? @chaunceyjiang
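The proposal above could be sketched as a small state-initialization helper (init_stream_state is a hypothetical name, not vLLM's actual code); initializing reasoning_end_arr unconditionally keeps it a plain list of bools instead of Optional, which is what the pre-commit warning is about:

```python
def init_stream_state(num_choices: int, tool_choice_auto: bool, reasoning_parser):
    """Hypothetical sketch: reasoning_end_arr is always a list of bools."""
    # Initialized unconditionally, so downstream code never sees None here
    reasoning_end_arr = [False] * num_choices
    if tool_choice_auto or reasoning_parser:
        # Independent inner lists (a [[]] * n literal would alias one list)
        all_previous_token_ids = [[] for _ in range(num_choices)]
        added_content_delta_arr = [False] * num_choices
    else:
        all_previous_token_ids = None
        added_content_delta_arr = None
    return all_previous_token_ids, added_content_delta_arr, reasoning_end_arr
```

The trade-off is a few wasted booleans when no reasoning parser is configured, in exchange for a simpler, non-Optional type.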

@chaunceyjiang
Collaborator

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized either way outside of the if ? @chaunceyjiang

I haven't reviewed your PR carefully yet, but my understanding is that reasoning_end_arr should only be used when self.reasoning_parser is set.

…periment, and removed added parentheses

Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>