[nvbug/5280806][fix] Fix 2 model spec decode flow #4807

mikeiovine · 2025-05-30T18:13:00Z

[fix] Fix 2 model spec decode flow

Description

Fix a small issue introduced by the feat/llama4 integration (#4739).

There was a bad merge:

In #3686 ([fix] Fix a few issues with EAGLE3 in PyTorch backend), we cleaned up the 2 engine path. This line was confusing because of the hardcoded layer index:

spec_metadata.maybe_capture_hidden_states(1, hidden_states_to_save)

The fix was to capture the hidden states in the midlayer, which makes everything consistent with how hidden states are captured in the target model.
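
To illustrate why a hardcoded index at the call site is fragile, here is a minimal toy sketch. It is not the TensorRT-LLM implementation: `ToySpecMetadata`, `ToyLayer`, `layers_to_capture`, and `run_layers` are hypothetical names, and only the call shape `maybe_capture_hidden_states(layer_idx, hidden_states)` is taken from the line above. It assumes that "capturing in the midlayer" means the layer itself makes the call with its own index, so the metadata object alone decides which layers are saved.

```python
from dataclasses import dataclass, field


@dataclass
class ToySpecMetadata:
    """Hypothetical stand-in for the spec metadata; not the real class."""
    layers_to_capture: frozenset
    captured: dict = field(default_factory=dict)

    def maybe_capture_hidden_states(self, layer_idx, hidden_states):
        # Save a copy only if this layer was requested.
        if layer_idx in self.layers_to_capture:
            self.captured[layer_idx] = list(hidden_states)


class ToyLayer:
    """Hypothetical decoder layer that knows its own index."""

    def __init__(self, layer_idx):
        self.layer_idx = layer_idx

    def forward(self, hidden_states, spec_metadata):
        # Placeholder for the real layer computation.
        hidden_states = [h + 1.0 for h in hidden_states]
        # The layer reports its own index; nothing is hardcoded here.
        spec_metadata.maybe_capture_hidden_states(self.layer_idx, hidden_states)
        return hidden_states


def run_layers(num_layers, layers_to_capture):
    """Run the toy layers in order and return the captured states."""
    spec_metadata = ToySpecMetadata(frozenset(layers_to_capture))
    hidden_states = [0.0, 0.0]
    for idx in range(num_layers):
        hidden_states = ToyLayer(idx).forward(hidden_states, spec_metadata)
    return spec_metadata.captured
```

With this shape, changing which layers are captured only requires changing `layers_to_capture`; no call site needs to be edited.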

The feat/llama4 branch added these lines back by mistake. They don't work any more due to an API change.

The one model flow doesn't use this code path, so it was probably untested.

This PR also fixes another small issue introduced by #4379: prompt_length was not being set correctly for extend_ctx cases.

Test Coverage

Re-enabled EAGLE3 2 model tests in L0.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

mikeiovine · 2025-06-02T13:13:22Z

/bot run

tensorrt-cicd · 2025-06-02T13:19:39Z

PR_Github #7215 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-02T16:14:13Z

PR_Github #7215 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5225 completed with status: 'FAILURE'

mikeiovine · 2025-06-02T17:03:06Z

/bot run

tensorrt-cicd · 2025-06-02T17:09:23Z

PR_Github #7233 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-02T19:13:35Z

PR_Github #7233 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5238 completed with status: 'FAILURE'

Signed-off-by: Mike Iovine <[email protected]>

mikeiovine · 2025-06-02T21:22:40Z

Fixed one more issue (extend_ctx method using isinstance instead of issubclass, causing the prefill kernels to be used instead of the new generation kernels on Blackwell).
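
The isinstance/issubclass mix-up is easy to reproduce in isolation. The sketch below assumes the check receives a class object rather than an instance; `BaseAttentionMetadata`, `GenerationCapableMetadata`, and both helper functions are hypothetical names, not the actual extend_ctx code.

```python
class BaseAttentionMetadata:
    """Hypothetical base metadata type."""


class GenerationCapableMetadata(BaseAttentionMetadata):
    """Hypothetical metadata type that supports the generation kernels."""


def supports_generation_buggy(metadata_cls):
    # Bug: metadata_cls is a class, so isinstance asks whether the class
    # object itself is an instance of GenerationCapableMetadata. That is
    # False for every class passed in.
    return isinstance(metadata_cls, GenerationCapableMetadata)


def supports_generation_fixed(metadata_cls):
    # issubclass walks the class hierarchy, which is what was intended.
    return issubclass(metadata_cls, GenerationCapableMetadata)
```

Because the buggy check returns False without raising, a dispatch that depends on it falls back to the other branch silently, which would be consistent with the wrong kernels being selected rather than an error being reported.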

Also uncovered one more bug. Added a workaround in the unit tests and opened https://nvbugspro.nvidia.com/bug/5314469 to follow up.

mikeiovine · 2025-06-02T21:22:52Z

/bot run

tensorrt-cicd · 2025-06-02T21:28:25Z

PR_Github #7240 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T04:10:52Z

PR_Github #7240 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5244 completed with status: 'FAILURE'

mikeiovine · 2025-06-03T13:39:11Z

/bot run

tensorrt-cicd · 2025-06-03T13:44:55Z

PR_Github #7357 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T15:33:45Z

PR_Github #7357 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5333 completed with status: 'FAILURE'

mikeiovine · 2025-06-03T15:39:47Z

/bot run

tensorrt-cicd · 2025-06-03T15:46:06Z

PR_Github #7376 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T17:05:32Z

PR_Github #7376 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5351 completed with status: 'FAILURE'

mikeiovine · 2025-06-04T14:13:59Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-04T14:19:49Z

PR_Github #7530 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T18:01:09Z

PR_Github #7530 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5462 completed with status: 'FAILURE'

mikeiovine · 2025-06-05T13:21:31Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-05T13:27:07Z

PR_Github #7742 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T17:29:08Z

PR_Github #7742 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5611 completed with status: 'FAILURE'

mikeiovine · 2025-06-05T21:08:01Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-05T21:14:43Z

PR_Github #7792 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T00:40:41Z

PR_Github #7792 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5643 completed with status: 'FAILURE'

mikeiovine · 2025-06-06T14:30:21Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-06T14:54:39Z

PR_Github #7915 [ run ] triggered by Bot

mikeiovine · 2025-06-06T20:49:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-06T20:56:27Z

PR_Github #7936 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T20:56:29Z

PR_Github #7915 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-06T21:17:14Z

PR_Github #7936 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5739 completed with status: 'FAILURE'

mikeiovine · 2025-06-07T17:48:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-07T17:55:07Z

PR_Github #7990 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-08T03:26:32Z

PR_Github #7990 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5789 completed with status: 'SUCCESS'

Signed-off-by: Mike Iovine <[email protected]>

mikeiovine requested review from lfr-0531 and nv-yilinf May 30, 2025 18:13

mikeiovine requested a review from a team as a code owner May 30, 2025 18:13

mikeiovine requested a review from litaotju May 30, 2025 18:13

nv-yilinf approved these changes May 30, 2025

hlu1 approved these changes May 30, 2025

lfr-0531 approved these changes Jun 2, 2025

mikeiovine changed the title ~~[fix] Fix 2 model spec decode flow~~ [nvbug/5280806][fix] Fix 2 model spec decode flow Jun 2, 2025

mikeiovine force-pushed the fix-2-model branch from 1ddd487 to 2532f1c Compare June 2, 2025 17:02

mikeiovine requested a review from a team as a code owner June 2, 2025 17:02

[fix] Fix 2 model spec decode flow

4636791

Signed-off-by: Mike Iovine <[email protected]>

mikeiovine force-pushed the fix-2-model branch from 2532f1c to 4636791 Compare June 2, 2025 21:22

Merge branch 'main' into fix-2-model

a278e68

Merge branch 'main' into fix-2-model

de42b92

IzzyPutterman mentioned this pull request Jun 4, 2025

[TRTLLM-3456] Speculation: Draft Target in new FW #4558

Merged

Merge branch 'main' into fix-2-model

ed487cd

Merge branch 'main' into fix-2-model

b0c1d93

Merge branch 'main' into fix-2-model

7384b3f

Merge branch 'main' into fix-2-model

e630868

Merge branch 'main' into fix-2-model

e0b43f9

mikeiovine merged commit ec0d984 into NVIDIA:main Jun 8, 2025
3 checks passed

mikeiovine deleted the fix-2-model branch June 8, 2025 11:40

crazydemo pushed a commit to crazydemo/TensorRT-LLM that referenced this pull request Jun 9, 2025

[nvbug/5280806][fix] Fix 2 model spec decode flow (NVIDIA#4807)

ee128b1

Signed-off-by: Mike Iovine <[email protected]>

mikeiovine mentioned this pull request Jun 9, 2025

[fix] Unwaive test_llama_eagle3 #5042

Merged
