feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL #5522

yechank-nvidia · 2025-06-26T15:02:37Z

Description

This PR adds an mm_data element so that multimodal-related data can be transferred easily.
It also refactors Qwen2/2.5-VL and HyperCLOVAX-Vision accordingly.
In particular, it moves the Qwen2/2.5-VL encoder into the LLM.
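
To make the idea of a single container for multimodal data concrete, here is a minimal sketch. It is not the class added by this PR: the name `MultimodalParamsSketch`, its single field, and the `has_content` helper are assumptions made for illustration only, and the real definition should be read from the files changed in this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MultimodalParamsSketch:
    """Hypothetical per-request bundle of multimodal data (illustration only)."""

    # Free-form payload; the keys used below are placeholders, not the real schema.
    multimodal_data: Dict[str, Any] = field(default_factory=dict)

    def has_content(self) -> bool:
        """Return True if this request carries any multimodal payload."""
        return bool(self.multimodal_data)


# A text-only request carries an empty container; an image request does not.
text_only = MultimodalParamsSketch()
with_image = MultimodalParamsSketch(
    multimodal_data={"image": {"pixel_values": [0.1, 0.2]}})
print(text_only.has_content(), with_image.has_content())  # False True
```

Keeping everything behind one object means a request can be passed between components as a single argument, and a text-only request is simply one whose container is empty.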

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

tensorrt_llm/_torch/models/modeling_qwen2vl.py

tensorrt_llm/llmapi/llm.py

tensorrt_llm/executor/worker.py

tensorrt_llm/inputs/multimodal.py

yechank-nvidia · 2025-07-03T10:59:41Z

/bot run

tensorrt-cicd · 2025-07-03T11:04:53Z

PR_Github #10815 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-03T11:16:01Z

PR_Github #10815 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #7989 completed with status: 'FAILURE'

yechank-nvidia · 2025-07-03T11:34:07Z

/bot run

tensorrt-cicd · 2025-07-03T11:39:13Z

PR_Github #10820 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-03T13:01:35Z

PR_Github #10820 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7994 completed with status: 'FAILURE'

yechank-nvidia · 2025-07-03T14:36:48Z

/bot run

tensorrt-cicd · 2025-07-03T14:41:52Z

PR_Github #10842 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-03T17:11:16Z

PR_Github #10842 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8011 completed with status: 'FAILURE'

symphonylyh

LGTM

yechank-nvidia · 2025-07-04T06:20:15Z

/bot run

yechank-nvidia · 2025-07-04T06:21:07Z

/bot kill

tensorrt-cicd · 2025-07-04T06:25:44Z

PR_Github #10949 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-04T06:25:59Z

PR_Github #10950 [ kill ] triggered by Bot

tensorrt-cicd · 2025-07-04T06:26:00Z

PR_Github #10949 [ run ] completed with state ABORTED

Signed-off-by: yechank <[email protected]>

yechank-nvidia · 2025-07-07T16:23:16Z

/bot run

tensorrt-cicd · 2025-07-07T16:28:56Z

PR_Github #11166 [ run ] triggered by Bot

yechank-nvidia · 2025-07-07T16:33:03Z

/bot kill

tensorrt_llm/executor/worker.py

Signed-off-by: yechank <[email protected]>

tensorrt-cicd · 2025-07-07T16:38:31Z

PR_Github #11170 [ kill ] triggered by Bot

tensorrt-cicd · 2025-07-07T16:38:32Z

PR_Github #11166 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-07-07T16:39:02Z

PR_Github #11170 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit dcf6ab1

Signed-off-by: yechank <[email protected]>

yechank-nvidia · 2025-07-07T16:57:11Z

/bot run

tensorrt-cicd · 2025-07-07T17:02:20Z

PR_Github #11173 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-07T20:39:31Z

PR_Github #11173 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8264 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

chang-l · 2025-07-10T22:00:58Z

tensorrt_llm/_torch/models/modeling_qwen2vl.py

+        if len(multimodal_params) > 0:
+            if not DISAGG:
+                mm_embeds = self.mm_encoder.forward(
+                    multimodal_params[:num_context_requests])


Hi @yechank-nvidia, here if some of the ctx requests do not have multimodal items (non-mm request), would it crash?

It will not. If no mm_data is found, I return input_ids directly, so len(multimodal_params) == 0 in this case. It therefore skips calling the vision encoder's forward() and runs only the LLM forward.

When extra_processed_inputs is None, multimodal_params is None here.

Right. I mean that in one batch of context requests, len(multimodal_params) may be less than num_context_requests, right? (Some of the ctx requests may be pure text prompts.)

I see. Then naively truncating to num_context_requests is not the right solution, and it may crash in this case. I had assumed a batch would contain either all text prompts or all multimodal prompts.

Let me think how I could handle this.

Not sure if it is a common or meaningful scenario though...
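
The concern in this thread can be sketched in isolation. Slicing `multimodal_params[:num_context_requests]` assumes one entry per context request; if text-only requests contribute no entry, the slice no longer lines up with the requests. One possible approach, shown here as an assumption and not as the fix adopted in the repository, is to keep one (possibly empty) entry per request and pick out the indices that actually carry multimodal data. The function name `select_multimodal_requests` and the dict-based params are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# None or an empty dict marks a text-only request (assumption for this sketch).
Params = Optional[Dict[str, Any]]


def select_multimodal_requests(
    per_request_params: List[Params],
    num_context_requests: int,
) -> Tuple[List[int], List[Dict[str, Any]]]:
    """Return (indices, params) for context requests that carry multimodal data.

    `per_request_params` is assumed to hold exactly one entry per request,
    context requests first, so positions stay aligned with the batch.
    """
    indices: List[int] = []
    selected: List[Dict[str, Any]] = []
    for idx, params in enumerate(per_request_params[:num_context_requests]):
        if params:  # skips None and empty dicts
            indices.append(idx)
            selected.append(params)
    return indices, selected


# Mixed batch: requests 0 and 2 have images, request 1 is pure text,
# and request 3 is a generation-phase request that the slice excludes.
batch = [{"image": "a"}, None, {"image": "b"}, {"image": "c"}]
indices, selected = select_multimodal_requests(batch, num_context_requests=3)
print(indices)   # [0, 2]
print(selected)  # [{'image': 'a'}, {'image': 'b'}]
```

In this sketch the encoder would run only on `selected`, and `indices` records which context requests the resulting embeddings belong to.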

…d refactor HyperCLOVAX & Qwen2/2.5-VL (NVIDIA#5522) Signed-off-by: yechank <[email protected]> Signed-off-by: Yuxin <[email protected]>

Aze1999 · 2025-07-24T01:36:44Z

Hi, may I ask whether this means Qwen2.5-VL will be supported, or already is? Thanks for your patience!

yechank-nvidia · 2025-07-24T01:41:59Z

Hi @Aze1999, you can query either text-only prompts or multimodal-only prompts (meaning prompts that contain image or video content). It works fine unless you query a mixture of the two.

I will file a PR to resolve this issue later!

Aze1999 · 2025-07-24T04:42:21Z

Hi @yechank-nvidia, big thanks for the fast response! Looking forward to the upcoming PR. Meanwhile, I will try to see where I can help.

yechank-nvidia requested review from a team as code owners June 26, 2025 15:02

yechank-nvidia requested review from dongxuy04 and liji-nv June 26, 2025 15:02

yechank-nvidia self-assigned this Jun 26, 2025

yechank-nvidia commented Jun 26, 2025

tensorrt_llm/_torch/models/modeling_qwen2vl.py (outdated, resolved)

tensorrt_llm/llmapi/llm.py (outdated, resolved)

yechank-nvidia changed the title ~~feat: transfer mm_data and refactor HyperCLOVAX & Qwen2/2.5-VL~~ [DRAFT] feat: transfer mm_data and refactor HyperCLOVAX & Qwen2/2.5-VL Jun 26, 2025

yechank-nvidia marked this pull request as draft June 26, 2025 15:10

yechank-nvidia changed the title ~~[DRAFT] feat: transfer mm_data and refactor HyperCLOVAX & Qwen2/2.5-VL~~ [DRAFT] feat: transfer multimodal_data and refactor HyperCLOVAX & Qwen2/2.5-VL Jun 30, 2025

chang-l reviewed Jul 2, 2025

tensorrt_llm/executor/worker.py (resolved)

chang-l reviewed Jul 2, 2025

tensorrt_llm/inputs/multimodal.py (outdated, resolved)

yechank-nvidia force-pushed the multimodal_refactor branch from 294077e to ac0bb0f Compare July 3, 2025 10:58

yechank-nvidia changed the title ~~[DRAFT] feat: transfer multimodal_data and refactor HyperCLOVAX & Qwen2/2.5-VL~~ feat: transfer multimodal_data and refactor HyperCLOVAX & Qwen2/2.5-VL Jul 3, 2025

yechank-nvidia marked this pull request as ready for review July 3, 2025 11:35

symphonylyh approved these changes Jul 4, 2025

yechank-nvidia force-pushed the multimodal_refactor branch from 5a466d4 to a508aee Compare July 4, 2025 06:20

yechank-nvidia added 9 commits July 8, 2025 01:21

add MultimodalParams

24499b4

Signed-off-by: yechank <[email protected]>

multimodal_data into multimodal_params

b8f76b7

Signed-off-by: yechank <[email protected]>

address pin memory

105ad8f

Signed-off-by: yechank <[email protected]>

add example of multimodal_data and pre-commit

9602cdb

Signed-off-by: yechank <[email protected]>

move mrope_config and mm_embedding under MultimodalParams

e2b57d7

Signed-off-by: yechank <[email protected]>

add more description to MultimodalParams

00f82a0

Signed-off-by: yechank <[email protected]>

rebase fix

a7a4e19

Signed-off-by: yechank <[email protected]>

mrope_position_deltas fix

777a83d

Signed-off-by: yechank <[email protected]>

change image_url to video_url on video test

82cf62e

Signed-off-by: yechank <[email protected]>

yechank-nvidia force-pushed the multimodal_refactor branch from 441cc10 to 82cf62e Compare July 7, 2025 16:22

chang-l reviewed Jul 7, 2025

tensorrt_llm/executor/worker.py (resolved)

rebase fix

dcf6ab1

Signed-off-by: yechank <[email protected]>

chang-l approved these changes Jul 7, 2025

add comment

8c50d02

Signed-off-by: yechank <[email protected]>

symphonylyh mentioned this pull request Jul 7, 2025

[TRTLLM-4958][feat] improve multimodal workflow with Vision + LLM inflight batching #4673

Open

symphonylyh merged commit 5bc3a15 into NVIDIA:main Jul 8, 2025
3 checks passed

chang-l reviewed Jul 10, 2025

