feature: make trtllmsampler new_tokens format the universal format #4401

netanel-haber · 2025-05-16T16:10:45Z

No description provided.

netanel-haber · 2025-05-27T11:25:10Z

/bot run

tensorrt-cicd · 2025-05-27T11:30:53Z

PR_Github #6632 [ run ] triggered by Bot

tensorrt-cicd · 2025-05-27T13:47:59Z

PR_Github #6632 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4848 completed with status: 'FAILURE'

netanel-haber · 2025-06-04T11:54:53Z

/bot run

tensorrt-cicd · 2025-06-04T12:00:59Z

PR_Github #7516 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T13:08:57Z

PR_Github #7516 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5451 completed with status: 'FAILURE'

netanel-haber · 2025-06-05T09:58:45Z

/bot run

tensorrt-cicd · 2025-06-05T10:05:34Z

PR_Github #7698 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T12:42:03Z

PR_Github #7698 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5588 completed with status: 'FAILURE'

netanel-haber · 2025-06-19T21:59:51Z

/bot run

tensorrt-cicd · 2025-06-19T22:04:49Z

PR_Github #9539 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-20T01:56:55Z

PR_Github #9539 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7000 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Copilot

Pull Request Overview

This PR refactors how speculative samplers handle new token formatting by unifying on a single TorchSampler.Args structure, streamlining decoder factory logic, and replacing legacy sampler implementations.

- Refactored get_spec_decoder to accept TorchSampler.Args and updated MTP/Eagle3OneModel sampler constructors.
- Consolidated request iteration via ScheduledRequests.all_requests(), replacing itertools.chain across the codebase.
- Removed outdated Eagle3Sampler/Eagle3Decoder classes and integrated SeqSlotManager for draft slot management.
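To illustrate the `all_requests()` consolidation mentioned above, here is a minimal sketch. It is not the code from this PR: the `Request` class, the two field names, and the ordering (context first, then generation) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import chain


@dataclass
class Request:
    # Hypothetical stand-in for a scheduled request.
    request_id: int


@dataclass
class ScheduledRequests:
    # Field names are assumptions for this sketch.
    context_requests: list[Request] = field(default_factory=list)
    generation_requests: list[Request] = field(default_factory=list)

    def all_requests(self) -> list[Request]:
        # A method returning a new list: context requests first, then generation.
        return self.context_requests + self.generation_requests


batch = ScheduledRequests(
    context_requests=[Request(0), Request(1)],
    generation_requests=[Request(2)],
)

# Before: each call site chains the two lists itself.
old_ids = [r.request_id for r in chain(batch.context_requests, batch.generation_requests)]

# After: one method owns the iteration order.
new_ids = [r.request_id for r in batch.all_requests()]
```

Keeping the ordering in one method means call sites cannot drift apart if a third request category is added later.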

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| speculative/utils.py | Updated decoder factory signature and imports for spec decoders |
| speculative/mtp.py | Refactored `MTPSampler` constructor, updated stop criteria logic |
| speculative/eagle3.py | Removed legacy sampler/decoder classes, added new sampler class |
| pyexecutor/seq_slot_manager.py | Simplified resource prep using `all_requests()` |
| pyexecutor/scheduler.py | Changed `all_requests` to a method returning a list |
| pyexecutor/py_executor.py | Integrated `SeqSlotManager`, updated logits field assignments |
| pyexecutor/model_engine.py | Introduced `BEAM_WIDTH`, centralized batch indexing logic |
| pyexecutor/llm_request.py | Added `py_is_draft` flag to `LlmRequest` |
| pyexecutor/guided_decoder.py | Replaced `itertools.chain` with `all_requests()` |
| pyexecutor/_util.py | Centralized sampler instantiation with `create_torch_sampler_args` |
| auto_deploy/shim/ad_executor.py | Updated AD executor to use `TorchSampler.Args` and slot manager |

Comments suppressed due to low confidence (3)

tensorrt_llm/_torch/speculative/mtp.py:314

The returned SampleStateMTP no longer includes a logits field, which may be accessed downstream in the executor (e.g., in _executor_loop_pp). Consider preserving or setting device.logits and host.logits in SampleStateMTP to avoid missing attribute errors.
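One way to address the concern above is to keep `logits` as an optional field so downstream reads never hit a missing attribute. The sketch below is only an illustration under assumptions: `SampleStateTensorsSketch` and `read_logits` are hypothetical names, and plain lists stand in for device/host tensors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleStateTensorsSketch:
    # Hypothetical stand-in for the device/host tensor holder of a sample state.
    new_tokens: list[int]
    # The field always exists; None means the sampler produced no logits.
    logits: Optional[list[float]] = None


def read_logits(tensors: SampleStateTensorsSketch) -> list[float]:
    # Downstream code can check for None instead of risking an AttributeError.
    if tensors.logits is None:
        return []
    return tensors.logits


without_logits = SampleStateTensorsSketch(new_tokens=[11, 12])
with_logits = SampleStateTensorsSketch(new_tokens=[11], logits=[0.1, 0.9])
```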

tensorrt_llm/_torch/speculative/utils.py:83

[nitpick] The parameter name sampler_args is more verbose than other code that uses args for TorchSampler parameters. Consider renaming it to args for consistency and brevity.

def get_spec_decoder(sampler_args: TorchSampler.Args, spec_config: SpecConfig):
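For readers unfamiliar with the pattern under discussion, a factory that takes one shared argument bundle could look roughly like this. Everything here is a hypothetical stand-in: `Sampler`, its nested `Args` fields, `MTPLikeSampler`, and the string `mode` switch are assumptions for the sketch, not the TensorRT-LLM API.

```python
from dataclasses import dataclass


class Sampler:
    @dataclass(frozen=True)
    class Args:
        # Field names are assumptions for this sketch.
        max_seq_len: int
        max_draft_tokens: int

    def __init__(self, args: "Sampler.Args"):
        self.max_seq_len = args.max_seq_len
        self.max_draft_tokens = args.max_draft_tokens


class MTPLikeSampler(Sampler):
    def __init__(self, args: Sampler.Args, *, num_layers: int):
        # The shared bundle is forwarded unchanged; extras stay keyword-only.
        super().__init__(args)
        self.num_layers = num_layers


def get_decoder(args: Sampler.Args, mode: str) -> Sampler:
    # Every branch receives the same Args object, so constructor
    # signatures cannot drift apart between sampler variants.
    if mode == "mtp":
        return MTPLikeSampler(args, num_layers=args.max_draft_tokens)
    if mode == "plain":
        return Sampler(args)
    raise ValueError(f"unsupported mode: {mode!r}")


decoder = get_decoder(Sampler.Args(max_seq_len=128, max_draft_tokens=3), "mtp")
```

Whether the parameter is called `args` or `sampler_args` is purely a naming choice; the sketch uses `args` as the nitpick suggests.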

tensorrt_llm/_torch/pyexecutor/model_engine.py:1160

[nitpick] The nonlocal mtp_batch_idx declaration appears after a conditional return in the nested py_batch_idx function. For clarity, move the nonlocal statement to the top of the function body before any logic.

            nonlocal mtp_batch_idx
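As a sketch of the suggested placement, the closure below declares `nonlocal` first and only then branches. The surrounding `make_batch_indexer` function, the `is_draft` parameter, and the `-1` sentinel are hypothetical; only the names `py_batch_idx` and `mtp_batch_idx` come from the review comment.

```python
from typing import Callable


def make_batch_indexer() -> Callable[[bool], int]:
    mtp_batch_idx = 0

    def py_batch_idx(is_draft: bool) -> int:
        # Declared at the top, before any logic or early return.
        nonlocal mtp_batch_idx
        if is_draft:
            # Hypothetical early exit: draft requests get no batch index.
            return -1
        idx = mtp_batch_idx
        mtp_batch_idx += 1
        return idx

    return py_batch_idx


indexer = make_batch_indexer()
results = [indexer(False), indexer(True), indexer(False)]
```

Python accepts a `nonlocal` statement after a conditional return as long as the name is not used earlier in the function, so the move is about readability rather than behavior.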

suyoggupta

AD changes LGTM

…er new_tokens format (NVIDIA#4401)" This reverts commit 58a8a8f. Signed-off-by: Netanel Haber <[email protected]>

…r new_tokens format (#4401)" (#5474) Signed-off-by: Netanel Haber <[email protected]>

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

netanel-haber force-pushed the user/nhaber/feature/align_sample_state_with_trtllm_sampler_sample_state branch from af06fcd to 27fd1d1 on May 19, 2025 13:06

netanel-haber force-pushed the user/nhaber/feature/align_sample_state_with_trtllm_sampler_sample_state branch 12 times, most recently from 8345da8 to b07fa8e on June 4, 2025 11:54

netanel-haber force-pushed the user/nhaber/feature/align_sample_state_with_trtllm_sampler_sample_state branch 2 times, most recently from f48672a to 71783a4 on June 5, 2025 16:11

netanel-haber marked this pull request as ready for review June 5, 2025 16:16

netanel-haber requested review from a team as code owners June 5, 2025 16:16

netanel-haber requested review from suyoggupta and schetlur-nv June 5, 2025 16:16

Funatiq requested a review from Copilot June 20, 2025 07:05

Copilot AI reviewed Jun 20, 2025

netanel-haber requested a review from dcampora June 20, 2025 09:31

dcampora approved these changes Jun 20, 2025

DomBrown approved these changes Jun 23, 2025

suyoggupta approved these changes Jun 23, 2025

netanel-haber merged commit 58a8a8f into NVIDIA:main Jun 23, 2025
3 checks passed

wili-65535 mentioned this pull request Jun 23, 2025

[TRTLLM-5000][feat] NGrams V2 #4569

Merged

litaotju pushed a commit that referenced this pull request Jun 26, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

6aef149

…r new_tokens format (#4401)" (#5474) Signed-off-by: Netanel Haber <[email protected]>

netanel-haber deleted the user/nhaber/feature/align_sample_state_with_trtllm_sampler_sample_state branch July 1, 2025 09:56

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

eefd602

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 9, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

77783c5

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

a51e62b

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

05b9065

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

3eed5da

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

c9a4441

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

904c0e3

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

55b9df7

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

b8f98ad

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 10, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

537ae78

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

0082feb

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

5edd8db

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

7adc31a

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

3ee37e4

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

feature: unify new_tokens format sample state to trtllm sampler new_t…

7511c20

…okens format (NVIDIA#4401) Signed-off-by: Netanel Haber <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Jul 11, 2025

Revert "feature: unify new_tokens format sample state to trtllm sampe…

8278fd3

…r new_tokens format (NVIDIA#4401)" (NVIDIA#5474) Signed-off-by: Netanel Haber <[email protected]>
