feature: make trtllmsampler new_tokens format the universal format #4401
af06fcd to 27fd1d1
/bot run
PR_Github #6632 [ run ] triggered by Bot
PR_Github #6632 [ run ] completed with state
8345da8 to b07fa8e
/bot run
PR_Github #7516 [ run ] triggered by Bot
PR_Github #7516 [ run ] completed with state
/bot run
PR_Github #7698 [ run ] triggered by Bot
PR_Github #7698 [ run ] completed with state
f48672a to 71783a4
/bot run
PR_Github #9539 [ run ] triggered by Bot
PR_Github #9539 [ run ] completed with state
Pull Request Overview
This PR refactors how speculative samplers handle new token formatting by unifying on a single `TorchSampler.Args` structure, streamlining decoder factory logic, and replacing legacy sampler implementations.
- Refactored `get_spec_decoder` to accept `TorchSampler.Args` and updated MTP/Eagle3OneModel sampler constructors.
- Consolidated request iteration via `ScheduledRequests.all_requests()`, replacing `itertools.chain` across the codebase.
- Removed outdated Eagle3Sampler/Eagle3Decoder classes and integrated `SeqSlotManager` for draft slot management.
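The consolidation of request iteration described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `Request` and `ScheduledRequests` here are simplified stand-ins, and the field names `context_requests` and `generation_requests` are assumptions made for the example. Only the idea of `all_requests()` being a method that returns a list comes from the PR summary.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class Request:
    # Hypothetical stand-in for a scheduled request.
    request_id: int


@dataclass
class ScheduledRequests:
    # Field names are assumptions made for this sketch.
    context_requests: list[Request] = field(default_factory=list)
    generation_requests: list[Request] = field(default_factory=list)

    def all_requests(self) -> list[Request]:
        # A method returning a list, as noted in the per-file summary.
        return self.context_requests + self.generation_requests


scheduled = ScheduledRequests(
    context_requests=[Request(0), Request(1)],
    generation_requests=[Request(2)],
)

# Before: each call site chains the two lists itself.
ids_before = [
    r.request_id
    for r in chain(scheduled.context_requests, scheduled.generation_requests)
]

# After: call sites share a single helper.
ids_after = [r.request_id for r in scheduled.all_requests()]
```

Keeping the concatenation in one method means a later change to which request groups count as "all" only has to be made in one place.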
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.
File | Description |
---|---|
speculative/utils.py | Updated decoder factory signature and imports for spec decoders |
speculative/mtp.py | Refactored `MTPSampler` constructor, updated stop criteria logic |
speculative/eagle3.py | Removed legacy sampler/decoder classes, added new sampler class |
pyexecutor/seq_slot_manager.py | Simplified resource prep using `all_requests()` |
pyexecutor/scheduler.py | Changed `all_requests` to a method returning a list |
pyexecutor/py_executor.py | Integrated `SeqSlotManager`, updated logits field assignments |
pyexecutor/model_engine.py | Introduced `BEAM_WIDTH`, centralized batch indexing logic |
pyexecutor/llm_request.py | Added `py_is_draft` flag to `LlmRequest` |
pyexecutor/guided_decoder.py | Replaced `itertools.chain` with `all_requests()` |
pyexecutor/_util.py | Centralized sampler instantiation with `create_torch_sampler_args` |
auto_deploy/shim/ad_executor.py | Updated AD executor to use `TorchSampler.Args` and slot manager |
Comments suppressed due to low confidence (3)
tensorrt_llm/_torch/speculative/mtp.py:314
- The returned `SampleStateMTP` no longer includes a logits field, which may be accessed downstream in the executor (e.g., in `_executor_loop_pp`). Consider preserving or setting `device.logits` and `host.logits` in `SampleStateMTP` to avoid missing attribute errors.
)
tensorrt_llm/_torch/speculative/utils.py:83
- [nitpick] The parameter name `sampler_args` is more verbose than other code that uses `args` for `TorchSampler` parameters. Consider renaming it to `args` for consistency and brevity.
def get_spec_decoder(sampler_args: TorchSampler.Args, spec_config: SpecConfig):
tensorrt_llm/_torch/pyexecutor/model_engine.py:1160
- [nitpick] The `nonlocal mtp_batch_idx` declaration appears after a conditional return in the nested `py_batch_idx` function. For clarity, move the `nonlocal` statement to the top of the function body before any logic.
nonlocal mtp_batch_idx
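The layout suggested in the last comment might look like the sketch below. The outer function `make_batch_indexer` and the `is_mtp` / `is_dummy` conditions are hypothetical and exist only to make the example runnable; only the names `py_batch_idx` and `mtp_batch_idx` come from the review comment.

```python
def make_batch_indexer(is_mtp: bool):
    # Hypothetical outer function; it owns the counter that the
    # nested function rebinds.
    mtp_batch_idx = 0

    def py_batch_idx(is_dummy: bool):
        # Declare nonlocal first, before any early return or other logic.
        nonlocal mtp_batch_idx
        if not is_mtp or is_dummy:
            return None
        idx = mtp_batch_idx
        mtp_batch_idx += 1
        return idx

    return py_batch_idx


indexer = make_batch_indexer(is_mtp=True)
results = [indexer(False), indexer(True), indexer(False)]
```

Python accepts a `nonlocal` statement anywhere in a function body as long as the name is not used earlier in that scope, so moving it to the top affects readability rather than behavior.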
AD changes LGTM
…er new_tokens format (NVIDIA#4401)" This reverts commit 58a8a8f. Signed-off-by: Netanel Haber <[email protected]>
…r new_tokens format (#4401)" (#5474) Signed-off-by: Netanel Haber <[email protected]>
…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>
…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>
No description provided.