[TRTLLM-3456] Speculation: Draft Target in new FW #4558
Looks good. Let's add a test?
By the way, I realized that this is a bit too specific to eagle 3: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/py_executor.py#L1817 You don't have to recompute the KV cache for verified tokens in this case; that's only needed for MTP/eagle. We can make it optional. No need to fix this now: it's a performance issue and won't hurt correctness.
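To make the "optional" idea in the comment above concrete, here is a minimal sketch. It is not the code in `py_executor.py`: `VerifiedStep`, `draft_input_tokens`, and the `recompute_verified_kv` flag are hypothetical names introduced only for illustration, and the model of the draft step is deliberately simplified and ignores edge cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerifiedStep:
    """Outcome of one verification step (hypothetical structure)."""
    accepted_draft_tokens: List[int]  # draft tokens the target model accepted
    new_token: int                    # token sampled by the target model


def draft_input_tokens(step: VerifiedStep,
                       recompute_verified_kv: bool) -> List[int]:
    """Return the tokens the draft model processes on its next forward pass.

    With recompute_verified_kv=True the accepted tokens are fed again so
    their draft-side KV entries are rebuilt (the MTP/eagle behaviour the
    review comment refers to). With False only the newly sampled token is
    fed and the existing KV entries are reused.
    """
    if recompute_verified_kv:
        return step.accepted_draft_tokens + [step.new_token]
    return [step.new_token]


step = VerifiedStep(accepted_draft_tokens=[11, 12], new_token=13)
with_recompute = draft_input_tokens(step, recompute_verified_kv=True)
without_recompute = draft_input_tokens(step, recompute_verified_kv=False)
```

Under these assumptions, both settings continue from the same verified tokens; skipping the recompute only shortens the draft model's input, which matches the comment's point that this is a performance matter rather than a correctness one.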
Some variance in output is expected (I think), specifically when a draft token comes from the prefill portion.
Apparently I have found an edge case in the formatting :)
I guess it was a conflict between yapf and isort or something similar? Could you please disable the formatting locally instead of changing the config file?
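A minimal sketch of what disabling the formatters locally can look like, assuming the conflict sits in an import block. This is not the snippet from the review: the imports are stdlib placeholders, not lines from this PR. It relies on the `# yapf: disable` / `# yapf: enable` and `# isort: off` / `# isort: on` comment pairs that the two tools document.

```python
# isort: off
# yapf: disable
import os
import json
from collections import (OrderedDict,
                         defaultdict)
# yapf: enable
# isort: on

# Code outside the guarded region is formatted as usual.
settings = OrderedDict(cwd=os.getcwd())
counts = defaultdict(int)
summary = json.dumps({"keys": list(settings)})
```

Keeping the guard around only the offending lines leaves the shared formatter configuration untouched for the rest of the repository.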
Depends on these fixes: #4807
/bot run --disable-fail-fast
/bot run --disable-fail-fast
PR_Github #8320 [ run ] triggered by Bot
/bot run --disable-fail-fast
PR_Github #8320 [ run ] completed with state
/bot run --disable-fail-fast
/bot run --disable-fail-fast
5ce0535 to 340922c
/bot run --disable-fail-fast
/bot run --disable-fail-fast
PR_Github #8517 [ run ] triggered by Bot
PR_Github #8517 [ run ] completed with state
PR_Github #8568 [ run ] completed with state
PR_Github #8569 [ kill ] completed with state
PR_Github #8570 [ run ] triggered by Bot
PR_Github #8570 [ run ] completed with state
a1cff6c to e22b7a4
/bot run --disable-fail-fast
PR_Github #8704 [ run ] triggered by Bot
PR_Github #8704 [ run ] completed with state
/bot run
PR_Github #8774 [ run ] triggered by Bot
PR_Github #8774 [ run ] completed with state
e22b7a4 to 73f069a
Signed-off-by: Izzy Putterman <[email protected]>
73f069a to d6d087d
/bot skip
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.
run
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot skip "No major changes since last CI pass, just rebase"
/bot skip --comment "No major changes since last CI pass, just rebase"
PR_Github #9054 [ skip ] triggered by Bot
PR_Github #9054 [ skip ] completed with state
No description provided.