Conversation

ExtReMLapin

@ExtReMLapin ExtReMLapin commented Sep 2, 2025

Purpose

Fixes reasoning not being sent to the client when tool_choice="required".

closes #14429
Partially addresses #21026

Test Plan

Added one test to ensure reasoning is returned in the streamed data: pytest ./tests/entrypoints/openai/test_completion_with_function_calling.py

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the frontend label Sep 2, 2025
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from a0c7ec3 to e95416c on September 3, 2025 13:12
@ExtReMLapin ExtReMLapin marked this pull request as ready for review September 3, 2025 13:13
@ExtReMLapin ExtReMLapin requested a review from aarnphm as a code owner September 3, 2025 13:13
@ExtReMLapin
Author

Tested on multiple Qwen models with tool calling:

  • Qwen 3 with reasoning
  • Qwen 3 with reasoning disabled
  • Qwen 2.5

@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 58cfd8a to 17853a1 on September 4, 2025 13:47

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
CNE Pierre FICHEPOIL added 2 commits September 12, 2025 12:08
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
@ExtReMLapin ExtReMLapin force-pushed the streaming_tool_required_true branch from 02a8dde to 4d8d81c on September 12, 2025 12:09
@ExtReMLapin
Author

@DarkLight1337 @heheda12345
@simon-mo

I'm not sure exactly who to ping to get it reviewed

@DarkLight1337
Member

cc @aarnphm @chaunceyjiang

Collaborator

@chaunceyjiang chaunceyjiang left a comment


reasoning not being sent to client when tool_choice="required"

Could you provide a reproduction step?

The combination of stream + enable_thinking + required has been continuously tested in e2e.

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L165-L177

@ExtReMLapin
Author

ExtReMLapin commented Sep 12, 2025

@chaunceyjiang there is no assert/check on reasoning in the stream-mode part of that test

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L218

master/HEAD :

[screenshots of streamed output on master/HEAD]

(See how it goes directly into the tool call; roughly 10 seconds elapse between the first message and the start of the tool call.)

query.js


This branch :

[screenshot of streamed output on this branch, showing reasoning deltas]

@ExtReMLapin
Author

ExtReMLapin commented Sep 12, 2025

Also, this PR covers both forced-reasoning models (like Qwen3 2507, which doesn't output an opening reasoning tag) and original ones that output both the opening and the closing reasoning tags.
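To illustrate the two tag styles being discussed, here is a minimal hypothetical sketch of splitting reasoning from content (the tag names and the split_reasoning helper are assumptions for illustration, not vLLM's actual reasoning parser):

```python
def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>"):
    """Split model output into (reasoning, content) for both tag styles.

    Forced-reasoning models start emitting reasoning immediately and only
    produce the closing tag; other models wrap reasoning in both an opening
    and a closing tag.
    """
    if close_tag not in text:
        # No reasoning present at all
        return None, text
    head, _, content = text.partition(close_tag)
    if head.startswith(open_tag):
        # Both tags present: strip the opening tag
        reasoning = head[len(open_tag):]
    else:
        # Forced reasoning: the opening tag was never emitted
        reasoning = head
    return reasoning.strip(), content.strip()
```

The streaming fix has to handle the same asymmetry incrementally, which is why both model families were tested.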

Collaborator

@chaunceyjiang chaunceyjiang left a comment


CNE Pierre FICHEPOIL added 2 commits September 15, 2025 15:20
Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>
@ExtReMLapin
Author

Got it, I'll apply the requested changes.

In the tests, I'm having a weird issue where the following

        output = []
        reasoning = []
        async for chunk in output_stream:
            if chunk.choices:
                if enable_thinking and chunk.choices[0].delta.reasoning_content:
                    reasoning.append(chunk.choices[0].delta.reasoning_content)
                if chunk.choices[0].delta.tool_calls:
                    output.extend(chunk.choices[0].delta.tool_calls)

        assert len(output) > 0
        if enable_thinking:
            assert len(reasoning) > 0

doesn't work, because the openai client class doesn't declare this attribute, and I don't understand why it doesn't error in the non-stream part.

So instead I moved to if enable_thinking and getattr(chunk.choices[0].delta, "reasoning_content", None): — not sure if that's the right approach.
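The getattr fallback described above can be sketched like this (the Delta class below is a stand-in for the openai client's ChoiceDelta, which may not declare reasoning_content as a field; collect is a hypothetical helper, not part of the test suite):

```python
class Delta:
    """Stand-in for openai's ChoiceDelta; reasoning_content may be absent."""

    def __init__(self, reasoning_content=None, tool_calls=None):
        if reasoning_content is not None:
            self.reasoning_content = reasoning_content
        self.tool_calls = tool_calls or []


def collect(deltas, enable_thinking=True):
    """Gather reasoning and tool-call deltas from a stream of Delta objects."""
    reasoning, tool_calls = [], []
    for delta in deltas:
        # getattr avoids AttributeError when the client model lacks the field
        if enable_thinking and getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if delta.tool_calls:
            tool_calls.extend(delta.tool_calls)
    return reasoning, tool_calls
```

Direct attribute access would raise AttributeError on chunks whose delta never carried reasoning_content, which is why the defensive getattr is needed in the streaming path.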

@ExtReMLapin ExtReMLapin marked this pull request as draft September 15, 2025 16:14
@ExtReMLapin
Author

Also, something's broken with non-reasoning models, so I'll fix it when I get back from vacation.


mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ExtReMLapin.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 17, 2025
@ExtReMLapin ExtReMLapin marked this pull request as ready for review September 17, 2025 12:30
@mergify mergify bot removed the needs-rebase label Sep 17, 2025
@ExtReMLapin
Author

ExtReMLapin commented Sep 17, 2025

Before pushing the last fix, I triple-checked with:

  • Qwen 3 (qwen3 reasoning)
  • Qwen 3 instruct
  • Qwen 3 thinking (deepseek_r1 reasoning)
  • Qwen/QwQ-32B-AWQ

vllm_last/tests/entrypoints/openai/test_completion.py ............................ssssssssss......sss.................................ssssssssss......sss.....                                                                        [100%]

================================================================================================ 78 passed, 26 skipped in 318.63s (0:05:18) =================================================================================================

vllm_last/tests/entrypoints/openai/test_completion_with_function_calling.py ..............                                                                                                                                            [100%]

====================================================================================================== 14 passed in 130.91s (0:02:10) =======================================================================================================

@ExtReMLapin
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an issue where reasoning content was not streamed correctly when tool_choice="required". The fix involves using the correct streaming-aware function for extracting reasoning content. The associated test is also updated to verify this behavior.

My review focuses on the maintainability of the fix. While the fix is correct, it introduces code duplication for handling reasoning streaming across different tool_choice scenarios. I've suggested refactoring this duplicated logic into a helper function to improve code clarity and reduce the risk of future inconsistencies.

@ExtReMLapin
Author

Regarding the pre-commit warning tied to the values of reasoning_end_arr in

        if tool_choice_auto or self.reasoning_parser:
            # These are only required in "auto" tool choice case
            all_previous_token_ids = [[]] * num_choices
            # For reasoning parser and tool call all enabled
            added_content_delta_arr = [False] * num_choices
            reasoning_end_arr = [False] * num_choices
        else:
            all_previous_token_ids = None
            reasoning_end_arr = None

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized unconditionally, outside of the if? @chaunceyjiang
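The proposal above could be sketched as a small state-initialization helper (init_stream_state is a hypothetical name, not vLLM's actual code); initializing reasoning_end_arr unconditionally keeps it a plain list of bools instead of Optional, which is what the pre-commit warning is about:

```python
def init_stream_state(num_choices: int, tool_choice_auto: bool, reasoning_parser):
    """Hypothetical sketch: reasoning_end_arr is always a list of bools."""
    # Initialized unconditionally, so downstream code never sees None here
    reasoning_end_arr = [False] * num_choices
    if tool_choice_auto or reasoning_parser:
        # Independent inner lists (a [[]] * n literal would alias one list)
        all_previous_token_ids = [[] for _ in range(num_choices)]
        added_content_delta_arr = [False] * num_choices
    else:
        all_previous_token_ids = None
        added_content_delta_arr = None
    return all_previous_token_ids, added_content_delta_arr, reasoning_end_arr
```

The trade-off is a few wasted booleans when no reasoning parser is configured, in exchange for a simpler, non-Optional type.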

@chaunceyjiang
Collaborator

Would you be fine with reasoning_end_arr = [False] * num_choices being initialized either way outside of the if ? @chaunceyjiang

I haven't reviewed your PR carefully yet, but my understanding is that reasoning_end_arr should only be used when self.reasoning_parser is set.

…periment, and removed added parentheses

Signed-off-by: CNE Pierre FICHEPOIL <[email protected]>