@lfr-0531 lfr-0531 commented May 29, 2025

Description

  • Refactor the MTP vanilla implementation
  • Fix two bugs in the MTP vanilla forward pass:
    • One is in the saved past tokens
    • The other is in the input hidden states of the draft layers

Now MTP vanilla can achieve comparable performance to MTP Eagle:

| model | num_samples | nextn | Total Output Throughput (tokens/sec) | acceptance rate |
| --- | --- | --- | --- | --- |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 0 | 151.1382389365772 | 1.0 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 1 | 266.1345253466809 | 1.9064093282044987 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 2 | 317.71522794209443 | 2.53556285582307 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 3 | 320.31630735801565 | 2.83028864320805 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 4 | 306.35980785857925 | 2.944817683111899 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 5 | 291.8027987875048 | 2.986063372598476 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 6 | 269.4846495353019 | 2.989467810471117 |
| deepseek-ai/DeepSeek-R1-fp4 | 10 | 7 | 253.9811841654886 | 3.0055965829479336 |
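
As a reading aid, the reported figures can be converted into speedups over the nextn=0 baseline. This is a minimal sketch that only reuses the numbers from the results above; the variable names are illustrative and are not part of the PR.

```python
# Figures copied from the reported results:
# nextn -> (total output throughput in tokens/sec, acceptance rate)
results = {
    0: (151.1382389365772, 1.0),
    1: (266.1345253466809, 1.9064093282044987),
    2: (317.71522794209443, 2.53556285582307),
    3: (320.31630735801565, 2.83028864320805),
    4: (306.35980785857925, 2.944817683111899),
    5: (291.8027987875048, 2.986063372598476),
    6: (269.4846495353019, 2.989467810471117),
    7: (253.9811841654886, 3.0055965829479336),
}

# Throughput without speculative decoding (nextn=0) is the baseline.
baseline = results[0][0]

# Speedup of each setting relative to the baseline.
speedups = {n: tput / baseline for n, (tput, _) in results.items()}

# The nextn value with the highest measured throughput.
best_nextn = max(results, key=lambda n: results[n][0])

for n in sorted(results):
    tput, accept = results[n]
    print(f"nextn={n}: {tput:7.2f} tok/s, "
          f"speedup {speedups[n]:.2f}x, acceptance {accept:.2f}")
print(f"best nextn by throughput: {best_nextn}")
```

On these numbers, throughput peaks at nextn=3 (about 2.1x the baseline) and then declines, even though the acceptance rate keeps rising slowly to roughly 3.0.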

@lfr-0531 lfr-0531 requested a review from a team as a code owner May 29, 2025 09:35
@lfr-0531 lfr-0531 requested review from mikeiovine, lucaslie and yweng0828 and removed request for mikeiovine and lucaslie May 29, 2025 09:35
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch 5 times, most recently from 0d16978 to 57f634a June 4, 2025 09:01
@lfr-0531 lfr-0531 changed the title from "draft: refactor and fix mtp vanilla" to "fix: refactor and fix mtp vanilla" Jun 4, 2025

lfr-0531 commented Jun 4, 2025

/bot run

@tensorrt-cicd

PR_Github #7484 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #7484 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5429 completed with status: 'ABORTED'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch from 57f634a to c9765d5 June 5, 2025 00:32

lfr-0531 commented Jun 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #7585 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #7585 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5506 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch from c9765d5 to d2f3183 June 5, 2025 17:14

lfr-0531 commented Jun 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #7785 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #7785 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #5638 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch from d2f3183 to 5e5bdf8 June 6, 2025 04:52

lfr-0531 commented Jun 6, 2025

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #7837 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #7837 [ run ] completed with state FAILURE

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch from 5e5bdf8 to bfe434d June 6, 2025 16:21

lfr-0531 commented Jun 6, 2025

/bot run --disable-fail-fast

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/simplify_mtp_vanilla branch from 2c85775 to 8b8aa4b June 19, 2025 10:28
@lfr-0531

/bot run

@tensorrt-cicd

PR_Github #9505 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #9478 [ run ] completed with state ABORTED

@tensorrt-cicd

PR_Github #9505 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6972 completed with status: 'FAILURE'

@lfr-0531

/bot run

@tensorrt-cicd

PR_Github #9527 [ run ] triggered by Bot

@tensorrt-cicd

PR_Github #9527 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6990 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@lfr-0531 lfr-0531 merged commit 5d4ab47 into NVIDIA:main Jun 19, 2025
3 checks passed
k-l-lambda pushed a commit to k-l-lambda/TensorRT-LLM that referenced this pull request Jun 23, 2025
@lfr-0531 lfr-0531 deleted the user/fanrongl/simplify_mtp_vanilla branch June 27, 2025 12:43
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

4 participants