
Conversation

@lfr-0531 (Collaborator) commented May 16, 2025

Description

Fixed two issues:

  • MTP + attention dp illegal memory access
  • MTP + attention dp + overlap scheduler accuracy drop

The illegal memory access is caused by the token_num used in add_dummy_requests: with token_num=1, the prompt_len becomes zero, and our attention kernel cannot handle that special case, so we hit an illegal memory access. The fix in this PR is a workaround to make attention DP + MTP work.
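
The workaround can be sketched roughly as follows. This is a hypothetical illustration, not the actual TensorRT-LLM code; the names make_dummy_prompt and pad_id are made up for the example, following the token_num / prompt_len relationship described above:

```python
# Hypothetical sketch of the workaround: make sure dummy requests never
# produce a zero-length prompt. With token_num=1 the prompt would be empty
# (the single token is the one being generated), which the attention
# kernel cannot handle, so we clamp to at least 2 tokens.
def make_dummy_prompt(token_num: int, pad_id: int = 0) -> list:
    safe_token_num = max(token_num, 2)
    # prompt_len = safe_token_num - 1, which is now guaranteed >= 1
    return [pad_id] * (safe_token_num - 1)
```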

The accuracy drop happened because we didn't correctly prepare the inputs for the dummy requests when the overlap scheduler and speculative decoding were both enabled.
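
As an illustrative sketch (again, not the actual TensorRT-LLM code; build_input_ids and the request dict layout are invented for the example): with the overlap scheduler plus speculative decoding, each request contributes 1 + num_draft_tokens input ids per step, so dummy requests must be padded the same way or the batched input tensor gets mis-shaped and real tokens land in the wrong slots:

```python
# Illustrative sketch: dummy requests must contribute the same number of
# input ids as real requests when speculative decoding is enabled.
def build_input_ids(requests, num_draft_tokens, pad_id=0):
    input_ids = []
    for req in requests:
        if req.get("is_dummy"):
            # If dummy requests only contributed one token here, every
            # request after them would be shifted, corrupting accuracy.
            input_ids += [pad_id] * (1 + num_draft_tokens)
        else:
            input_ids += [req["new_token"]] + req["draft_tokens"]
    return input_ids
```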

Other changes:

  • Change the default py_draft_tokens to an empty list.
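
The empty-list default can be illustrated like this (a hypothetical sketch assuming a dataclass-style request object; the real field lives on the runtime request class). Using default_factory avoids the classic shared-mutable-default trap while letting callers iterate or extend py_draft_tokens without a None check:

```python
from dataclasses import dataclass, field

# Hypothetical illustration of defaulting py_draft_tokens to an empty list.
@dataclass
class Request:
    # default_factory gives each instance its own fresh list, so mutating
    # one request's draft tokens never leaks into another's.
    py_draft_tokens: list = field(default_factory=list)

a, b = Request(), Request()
a.py_draft_tokens.append(1)
# b's list is unaffected because each instance gets its own empty list.
```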

NOTE: there is still an illegal memory access issue when running DeepSeek-V3-Lite MTP with tp + autotuner + cuda graph + overlap scheduler. This issue is tracked in nvbug 5304040.

Test Coverage

The waived tests are added back.

@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5431 [ run ] triggered by Bot

@lfr-0531 (Collaborator, Author)

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #5441 [ kill ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5431 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator)

PR_Github #5441 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit f10d532

@lfr-0531 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #5442 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #5442 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3972 completed with status: 'FAILURE'

@lfr-0531 (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #6297 [ run ] triggered by Bot

@lfr-0531 changed the title from "fix: fix input ids when adding dummy requests" to "fix: fix accuracy and illegal memory access issues when using mtp + attention dp" May 23, 2025
@lfr-0531 lfr-0531 requested a review from PerkzZheng May 23, 2025 12:39
@tensorrt-cicd (Collaborator)

PR_Github #6297 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4602 completed with status: 'FAILURE'

@PerkzZheng (Collaborator)

@lfr-0531 have all the illegal memory access issues been resolved, or does the one you mentioned still exist? Thanks.

@lfr-0531 (Collaborator, Author)

> @lfr-0531 have all the illegal memory access issues been resolved, or does the one you mentioned still exist? Thanks.

No, we haven't fixed it yet. But it looks like we didn't cover this special case in our test list before, i.e. the setting mtp_nextn=2-attention_dp=False-cuda_graph=True-overlap_scheduler=True. I'd like to merge this PR first so that we can get the other waived tests back, and we can track this special case in a new nvbug and fix it later.

@PerkzZheng, what do you think?

@lfr-0531 (Collaborator, Author)

/bot run

@PerkzZheng (Collaborator)

> @lfr-0531 have all the illegal memory access issues been resolved, or does the one you mentioned still exist? Thanks.
>
> No, we haven't fixed it yet. But it looks like we didn't cover this special case in our test list before, i.e. the setting mtp_nextn=2-attention_dp=False-cuda_graph=True-overlap_scheduler=True. I'd like to merge this PR first so that we can get the other waived tests back, and we can track this special case in a new nvbug and fix it later.
>
> @PerkzZheng, what do you think?

that makes sense to me.

@tensorrt-cicd (Collaborator)

PR_Github #6382 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #6382 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4664 completed with status: 'FAILURE'

@tensorrt-cicd (Collaborator)

PR_Github #7154 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5170 completed with status: 'SUCCESS'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_input_ids_in_dummy_req branch from 7175803 to bead11c Compare June 1, 2025 08:24
@lfr-0531 (Collaborator, Author) commented Jun 1, 2025

/bot run

@lfr-0531 (Collaborator, Author) commented Jun 1, 2025

Got another conflict; rerunning the pipeline.

@tensorrt-cicd (Collaborator)

PR_Github #7160 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #7160 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5175 completed with status: 'FAILURE'

@lfr-0531 (Collaborator, Author) commented Jun 1, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #7170 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #7170 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5185 completed with status: 'SUCCESS'

@lfr-0531 lfr-0531 merged commit 7d356ef into NVIDIA:main Jun 1, 2025
3 checks passed
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jun 2, 2025
lfr-0531 added a commit that referenced this pull request Jun 3, 2025
Signed-off-by: Fanrong Li <[email protected]>
darraghdog pushed a commit to darraghdog/TensorRT-LLM that referenced this pull request Jun 3, 2025
k-l-lambda pushed a commit to k-l-lambda/TensorRT-LLM that referenced this pull request Jun 23, 2025
@lfr-0531 lfr-0531 deleted the user/fanrongl/fix_input_ids_in_dummy_req branch June 27, 2025 12:43
7 participants