
Conversation

@lfr-0531 lfr-0531 commented Jun 17, 2025

Description

This PR updates the first draft forward in the MTP Eagle path to take all accepted tokens as input instead of just the last one, so that the KV cache for the draft layer is refreshed.

Before this fix, in each iteration the KV cache did not store a correct key/value pair for the last draft token (call it D), because D was the output of the final draft forward pass and its entry was never recomputed. In the next iteration, if all draft tokens were accepted, the newly generated tokens were used as inputs for the first draft forward. But since the KV cache held incorrect key/value data for token D, identical inputs could produce different draft tokens.

With this fix, the KV cache is updated in the first draft forward of each iteration, making MTP Eagle deterministic.
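To illustrate the mechanism, here is a toy sketch (not the actual TensorRT-LLM code; `draft_forward` and the cache layout are hypothetical, modeling KV entries as `(token, position)` pairs so a stale entry is visible). Before the fix, only the newly generated token enters the first draft forward, so the stale entry for D survives; after the fix, all accepted tokens are re-run, overwriting it:

```python
def draft_forward(kv_cache, tokens, start_pos):
    """Toy draft-layer forward: writes one fresh KV entry per input token."""
    for i, tok in enumerate(tokens):
        pos = start_pos + i
        # In a real model the K/V depend on the full prefix; here we just
        # record the position seen when the entry was written.
        kv_cache[pos] = (tok, pos)
    return kv_cache

# Draft token D sits at position 2 with a stale entry (marked -1) left over
# from the previous iteration's last draft forward pass.
kv_before = {0: ("a", 0), 1: ("b", 1), 2: ("D", -1)}
kv_after = dict(kv_before)

# Before the fix: only the newly generated token is fed, position 2 is untouched.
draft_forward(kv_before, ["e"], 3)

# After the fix: all accepted tokens (including D) go through the first
# draft forward, so the stale entry is recomputed.
draft_forward(kv_after, ["D", "e"], 2)

assert kv_before[2] == ("D", -1)  # stale entry survives -> nondeterminism
assert kv_after[2] == ("D", 2)    # entry refreshed -> deterministic drafts
```

The point of the toy: determinism requires that the cached K/V for D match what a forward pass over the verified prefix would produce, which only happens if D is re-processed.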

I tested the DS-R1-FP4 model with the same dataset on the same node (BS=1). The acceptance rate increases slightly:

Before the changes:

===========================================================
= PERFORMANCE OVERVIEW 
===========================================================
Request Throughput (req/sec):                     0.1530
Total Output Throughput (tokens/sec):             313.0214
Total Token Throughput (tokens/sec):              472.7368
Total Latency (ms):                               65372.5274
Average request latency (ms):                     6537.2043
Per User Output Throughput [w/ ctx] (tps/user):   313.3934
Per GPU Output Throughput (tps/gpu):              39.1277

-- Acceptance Rate Details --------------------------------
[AR] MINIMUM: 2.69
[AR] MAXIMUM: 3.08
[AR] AVERAGE: 2.81
[AR] P50    : 2.80
[AR] P90    : 3.08
[AR] P95    : 3.08
[AR] P99    : 3.08

After:

===========================================================
= PERFORMANCE OVERVIEW 
===========================================================
Request Throughput (req/sec):                     0.1544
Total Output Throughput (tokens/sec):             316.0127
Total Token Throughput (tokens/sec):              477.2701
Total Latency (ms):                               64747.4007
Average request latency (ms):                     6474.6948
Per User Output Throughput [w/ ctx] (tps/user):   316.3113
Per GPU Output Throughput (tps/gpu):              39.5016

-- Acceptance Rate Details --------------------------------
[AR] MINIMUM: 2.75
[AR] MAXIMUM: 3.07
[AR] AVERAGE: 2.83
[AR] P50    : 2.84
[AR] P90    : 3.07
[AR] P95    : 3.07
[AR] P99    : 3.07

For model accuracy, with this fix I enabled MTP Eagle and ran GPQA diamond twice with the same random seed. Both runs produced the same result, 70.202 ± 3.2586, as expected.

@lfr-0531 lfr-0531 requested a review from yweng0828 June 17, 2025 12:12
@lfr-0531 lfr-0531 requested a review from a team as a code owner June 17, 2025 12:12
@lfr-0531 lfr-0531 requested review from byshiue and yilin-void June 17, 2025 12:12
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_mtp_eagle_deterministic branch from 3883280 to cc397aa Compare June 17, 2025 12:12
@lfr-0531

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #9204 [ run ] triggered by Bot

@lfr-0531

/bot run

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_mtp_eagle_deterministic branch from cc397aa to 9fcb06c Compare June 18, 2025 01:17
@lfr-0531

/bot kill

@tensorrt-cicd

PR_Github #9264 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #9265 [ kill ] triggered by Bot

@tensorrt-cicd

PR_Github #9264 [ run ] completed with state ABORTED

@tensorrt-cicd

PR_Github #9265 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 9fcb06c

@lfr-0531

/bot run

@tensorrt-cicd

PR_Github #9277 [ run ] triggered by Bot

@lfr-0531

/bot kill

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_mtp_eagle_deterministic branch from 9fcb06c to 0bf3101 Compare June 18, 2025 04:29
@tensorrt-cicd

PR_Github #9307 [ kill ] triggered by Bot

@tensorrt-cicd

PR_Github #9277 [ run ] completed with state ABORTED

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_mtp_eagle_deterministic branch from 0bf3101 to df46b63 Compare June 18, 2025 04:29
@lfr-0531

/bot run

@tensorrt-cicd

PR_Github #9307 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 0bf3101

@tensorrt-cicd

PR_Github #9308 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #9308 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6830 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_mtp_eagle_deterministic branch from df46b63 to d6276c0 Compare June 18, 2025 07:35
@lfr-0531

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #9334 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #9334 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6851 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@lfr-0531 lfr-0531 merged commit c7af650 into NVIDIA:main Jun 19, 2025
3 checks passed
k-l-lambda pushed a commit to k-l-lambda/TensorRT-LLM that referenced this pull request Jun 23, 2025
@lfr-0531 lfr-0531 deleted the user/fanrongl/fix_mtp_eagle_deterministic branch June 27, 2025 12:43
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025