[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default #6216
Conversation
📝 Walkthrough
This change set standardizes the naming of a configuration parameter across the codebase, renaming the sampler-selection flag (from enable_trtllm_sampler to use_torch_sampler, called enable_torch_sampler in intermediate revisions of this PR) and inverting its semantics so that the TRTLLM sampler becomes the default.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI/Config
participant LLM Constructor
participant PyExecutor
participant Sampler (TorchSampler/TRTLLMSampler)
User->>CLI/Config: Set --use_torch_sampler flag
CLI/Config->>LLM Constructor: Pass use_torch_sampler param
LLM Constructor->>PyExecutor: Pass use_torch_sampler in config
PyExecutor->>Sampler: Instantiate based on use_torch_sampler
Sampler-->>PyExecutor: TorchSampler or TRTLLMSampler instance
PyExecutor-->>LLM Constructor: Sampler ready
LLM Constructor-->>User: LLM object initialized
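To make the flag plumbing above concrete, here is a minimal, hedged sketch of constructing an LLM under the new default. The flag name follows the diagram and the summary (use_torch_sampler); intermediate review comments in this thread call it enable_torch_sampler, and the model path is a placeholder.

```python
from tensorrt_llm import LLM  # top-level import assumed

# With no flag set, the TRTLLM sampler is now the default (the point of this PR).
llm_default = LLM(model="/path/to/model")  # placeholder model path

# Explicitly opting in to the Torch sampler via the renamed flag
# (previously enable_trtllm_sampler, whose meaning this PR inverts).
llm_torch = LLM(model="/path/to/model", use_torch_sampler=True)
```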
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
/bot run |
PR_Github #12446 [ run ] triggered by Bot |
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)
Line 14: Consider renaming the configuration file to reflect the new parameter. The filename disagg_config_trtllm_sampler.yaml still references the old parameter name, but the file now uses enable_torch_sampler. Consider renaming the file to disagg_config_torch_sampler.yaml or similar to maintain consistency with the new parameter naming.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (1 hunks)
- tensorrt_llm/llmapi/llm_args.py (2 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1 hunks)
- tests/unittest/_torch/test_beam_search.py (1 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/_torch/test_trtllm_sampler.py (1 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/unittest/_torch/test_overlap_scheduler.py (4)
- tests/unittest/_torch/test_trtllm_sampler.py (3): create_llm (25-41), model_path (21-22), test_case (15-17)
- tests/unittest/llmapi/test_llm_kv_cache_events.py (1): create_llm (49-54)
- tensorrt_llm/llmapi/llm_args.py (2): model_dir (908-910), model_dir (913-917)
- tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1): model_path (31-44)
tensorrt_llm/_torch/pyexecutor/_util.py (3)
- tensorrt_llm/_torch/pyexecutor/sampler.py (3): TorchSampler (208-457), EarlyStopSampler (70-97), TRTLLMSampler (483-942)
- tensorrt_llm/_torch/models/checkpoints/base_weight_mapper.py (2): model (162-165), mapping (152-153)
- tensorrt_llm/runtime/generation.py (2): dtype (815-816), dtype (1218-1219)
🔇 Additional comments (20)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)
Line 14: Confirm TRTLLM sampler activation.
I wasn't able to locate the Python-level branch that inverts enable_torch_sampler (it's likely handled in the native extension), so please manually verify that setting enable_torch_sampler: False at tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml:14 indeed forces use of the TRTLLM sampler. You can confirm by:
- Reviewing the native (C++/CUDA) entry point where enable_torch_sampler is interpreted.
- Adding a small smoke test that inspects or logs which sampler implementation is chosen (a minimal starting point is sketched below).
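A lighter-weight starting point for the second bullet is sketched below; it only validates the configuration value rather than inspecting the instantiated sampler, and it assumes enable_torch_sampler sits at the top level of the YAML file.

```python
# Sketch: confirm the disaggregated config still requests the TRTLLM sampler.
import yaml

CONFIG = "tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml"


def test_config_requests_trtllm_sampler():
    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)
    # Under the inverted semantics, False means the TRTLLM sampler is selected.
    assert cfg["enable_torch_sampler"] is False  # key assumed to be top-level
```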
tests/unittest/_torch/test_beam_search.py (1)
Line 46: LGTM! Correct parameter migration. The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the same sampler behavior (TRTLLM) while adopting the new parameter naming and inverted logic.
tests/unittest/api_stability/references/llm.yaml (1)
Lines 87-89: Confirm default sampler behavior flip. The rename in tests/unittest/api_stability/references/llm.yaml (lines 87-89) from

enable_trtllm_sampler:
  annotation: bool
  default: False

to

enable_torch_sampler:
  annotation: bool
  default: False

inverts the default sampler:
- Old default (enable_trtllm_sampler=False): uses TorchSampler
- New default (enable_torch_sampler=False): uses TRTLLMSampler
This aligns with the PR objective "Enable TRTLLM sampler by default" but introduces a breaking change for users relying on the previous default. Please verify that switching to TRTLLMSampler as the default sampler is intentional and acceptable for downstream consumers. (A hedged sketch of the corresponding Python-side field follows below.)
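For orientation, here is a hedged sketch of what the corresponding Python-side field (reviewed further below in tensorrt_llm/llmapi/llm_args.py, lines 1849-1852) might look like; the pydantic-style definition, the class name, and the description text are assumptions, not the PR's exact code.

```python
from pydantic import BaseModel, Field


class SamplerArgsSketch(BaseModel):  # hypothetical stand-in for the real args class
    enable_torch_sampler: bool = Field(
        default=False,
        description="If True, use TorchSampler; with the default False, the TRTLLM sampler is used.",
    )
```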
tests/unittest/_torch/test_trtllm_sampler.py (1)
Line 27: LGTM! Correct sampler configuration for TRTLLM sampler testing. The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the TRTLLM sampler selection for this test, which is appropriate given the test file's purpose (test_trtllm_sampler.py).
).tensorrt_llm/_torch/pyexecutor/config.py (1)
Lines 57-60: instantiate_sampler correctly respects enable_torch_sampler.
- Verified instantiate_sampler in _util.py (lines 555-570) checks pytorch_backend_config.enable_torch_sampler and returns TorchSampler(sampler_args) when true, falling back to TRTLLMSampler otherwise (sketched below).
- Confirmed no remaining references to the old enable_trtllm_sampler flag.
- All sampler instantiation logic aligns with the updated docstring in config.py.
LGTM!
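A hedged sketch of that selection branch, using only the names quoted in this review (import path taken from the code-graph analysis above; constructor signatures are simplified assumptions):

```python
from tensorrt_llm._torch.pyexecutor.sampler import TorchSampler, TRTLLMSampler


def instantiate_sampler_sketch(pytorch_backend_config, sampler_args):
    # The flag now opts in to the Torch sampler; the TRTLLM sampler is the default fallback.
    if pytorch_backend_config.enable_torch_sampler:
        return TorchSampler(sampler_args)
    return TRTLLMSampler(sampler_args)  # argument shape simplified for illustration
```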
tests/integration/defs/accuracy/test_llm_api_pytorch.py (2)
Line 198: Verify the test's intended sampler selection. The change from enable_trtllm_sampler=True to enable_torch_sampler=True represents both a flag rename and a logic inversion. Please confirm that this test should indeed be using the Torch sampler rather than the TRTLLM sampler for FP8 LLM sampling validation.
Line 231: LGTM - Appropriate sampler selection for beam search test. The change to enable_torch_sampler=False ensures this beam search test uses the TRTLLM sampler (default behavior), which is appropriate since the test focuses on beam search functionality rather than sampler comparison.
examples/llm-api/quickstart_advanced.py (1)
Line 57: LGTM - Consistent flag rename. The changes consistently rename the sampler flag from --enable_trtllm_sampler to --enable_torch_sampler in both the argument parser definition and LLM constructor usage (a sketch follows below). This aligns perfectly with the broader codebase refactor.
Also applies to: 211-211
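A hedged sketch of the renamed CLI flag as described above; the store_true behavior and help text are assumptions, and the model path is a placeholder.

```python
import argparse

from tensorrt_llm import LLM  # top-level import assumed

parser = argparse.ArgumentParser()
parser.add_argument(
    "--enable_torch_sampler",
    action="store_true",  # assumed; leaving it unset keeps the default TRTLLM sampler
    help="Use TorchSampler instead of the default TRTLLM sampler.",
)
args = parser.parse_args()

# The parsed flag is forwarded to the LLM constructor, mirroring the rename.
llm = LLM(model="/path/to/model", enable_torch_sampler=args.enable_torch_sampler)
```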
tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1)
Line 43: LGTM - Behavior preserved with new flag. The change from enable_trtllm_sampler=True to enable_torch_sampler=False maintains the same effective behavior (using the TRTLLM sampler) while adopting the new flag convention. This is appropriate for this Nemotron-H correctness test.
tensorrt_llm/llmapi/llm_args.py (2)
Lines 1849-1852: LGTM! Semantic inversion correctly implements the PR objective. The field renaming from enable_trtllm_sampler to enable_torch_sampler with inverted logic successfully enables the TRTLLM sampler by default while maintaining the same default value (False). The updated description clearly documents the new behavior.
Line 2113: LGTM! Parameter usage correctly updated for consistency. The parameter passed to the PyTorchConfig constructor properly reflects the field renaming, maintaining consistency across the codebase.
tests/unittest/_torch/test_overlap_scheduler.py (4)
Lines 24-27: LGTM! Parameter renaming maintains consistency with codebase changes. The function signature and dictionary key updates correctly reflect the field renaming from enable_trtllm_sampler to enable_torch_sampler.
Lines 44-47: LGTM! Test parameterization correctly updated. The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage for both sampler configurations.
Line 53: LGTM! Stop words logic correctly adapted to inverted flag semantics. The condition if not enable_torch_sampler correctly maintains the intended behavior - stop words are set when using the TRTLLM sampler (when enable_torch_sampler=False), which aligns with the inverted flag semantics.
Line 65: LGTM! Function calls correctly updated with new parameter name. The create_llm function calls properly use the renamed enable_torch_sampler parameter, maintaining consistency with the updated function signature (see the parameterization sketch below).
Also applies to: 77-77
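A hedged sketch of the parameterized pattern described above; the model path, prompt, and SamplingParams field names are assumptions, while the create_llm signature and the stop-words branching mirror what the review describes.

```python
import pytest

from tensorrt_llm import LLM, SamplingParams  # top-level imports assumed

MODEL_PATH = "/path/to/model"  # placeholder


def create_llm(model_dir: str, enable_torch_sampler: bool) -> LLM:
    # Mirrors the reviewed helper: True selects TorchSampler, False keeps the default TRTLLM sampler.
    return LLM(model=model_dir, enable_torch_sampler=enable_torch_sampler)


@pytest.mark.parametrize("enable_torch_sampler", [True, False])
def test_generation_with_both_samplers(enable_torch_sampler):
    llm = create_llm(MODEL_PATH, enable_torch_sampler)

    stop_words = None
    if not enable_torch_sampler:
        # Stop words are only exercised on the TRTLLM-sampler path, as in the reviewed test.
        stop_words = ["</s>"]

    outputs = llm.generate(["The capital of France is"],
                           SamplingParams(max_tokens=8, stop=stop_words))
    assert len(outputs) == 1
```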
tests/unittest/_torch/test_return_logits.py (4)
Line 19: LGTM! Test parameterization correctly updated. The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage.
Also applies to: 22-22
Lines 30-31: LGTM! Skip condition correctly adapted to inverted flag semantics. The condition if enable_torch_sampler and gather_context_logits correctly identifies when the Torch sampler is being used and skips the test appropriately, maintaining the intended behavior after the flag semantic inversion (a sketch of the pattern follows below).
Line 44: LGTM! Constructor parameter correctly updated. The LLM constructor parameter properly uses the renamed enable_torch_sampler field, maintaining consistency with the codebase changes.
Line 86: LGTM! Second test function correctly updated. The pytest parameterization and function signature in the second test function properly reflect the parameter renaming, consistent with the first test function.
Also applies to: 89-89
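For reference, a hedged sketch of the skip pattern described for lines 30-31; the decorator ordering, test name, and skip message are illustrative only.

```python
import pytest


@pytest.mark.parametrize("gather_context_logits", [True, False])
@pytest.mark.parametrize("enable_torch_sampler", [True, False])
def test_return_logits_sketch(enable_torch_sampler, gather_context_logits):
    if enable_torch_sampler and gather_context_logits:
        # Context-logit gathering is not exercised on the Torch sampler path.
        pytest.skip("gather_context_logits is skipped with the Torch sampler")
    ...  # body elided in this sketch
```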
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-576: LGTM: Torch sampler flag implementation is correct. The condition correctly checks for enable_torch_sampler and instantiates the appropriate TorchSampler when enabled.
PR_Github #12446 [ run ] completed with state |
/bot run |
PR_Github #12523 [ run ] triggered by Bot |
PR_Github #12523 [ run ] completed with state |
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-582: LGTM: Logic reversal correctly implements the default sampler change. The flag renaming and logic inversion correctly achieve the PR objective of making TRTLLMSampler the default. The implementation properly handles the semantic reversal where enable_torch_sampler=True now selects TorchSampler, while the default behavior uses TRTLLMSampler.
Note: This aligns with the previous review comment about the breaking change requiring documentation and migration guidance.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Line 645: Fix line length violation. The line exceeds the 120-character limit (131 characters). Consider breaking it into multiple lines for better readability:

- "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
+ "Model is built with 'explicit draft tokens' decoding, but decoding mode is "
+ "something else. Overwriting decoding mode."
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
645-645: Line too long (131 > 120)
(E501)
🔇 Additional comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 604-605: LGTM: Defensive programming prevents AttributeError exceptions. The change from direct attribute access to getattr with False defaults is a good defensive programming practice. This prevents potential AttributeError exceptions when speculative_config lacks the expected boolean attributes, while maintaining the same logical behavior (a small illustration follows below).
Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682
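A small illustration of the defensive-access pattern praised here; "some_spec_flag" is a placeholder attribute name, not necessarily a real speculative_config field.

```python
def uses_spec_flag(executor_config) -> bool:
    spec_config = getattr(executor_config, "speculative_config", None)
    # getattr with a False default returns False when spec_config is None or lacks the
    # attribute, instead of raising AttributeError like direct attribute access would.
    return getattr(spec_config, "some_spec_flag", False)
```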
/bot run --disable-fail-fast |
PR_Github #12541 [ run ] triggered by Bot |
/bot run --disable-fail-fast |
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-582: Breaking change implemented correctly but still lacks documentation. The sampler selection logic has been correctly inverted to make TRTLLMSampler the default, which aligns with the PR objective. However, the previous concern about missing documentation for this breaking change remains unaddressed.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Line 575: Fix line length violations. Two lines exceed the 120-character limit. Consider breaking them up for better readability.

For line 575:

- if pytorch_backend_config.enable_torch_sampler or pytorch_backend_config.enable_mixed_sampler or engine.spec_config is not None:
+ if (pytorch_backend_config.enable_torch_sampler or
+     pytorch_backend_config.enable_mixed_sampler or
+     engine.spec_config is not None):

For line 645:

- logger.warning(
-     "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
- )
+ logger.warning(
+     "Model is built with 'explicit draft tokens' decoding, but "
+     "decoding mode is something else. Overwriting decoding mode."
+ )

Also applies to: 645-645
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (1 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (1 hunks)
🧬 Code Graph Analysis (1)
tests/unittest/_torch/speculative/test_draft_target.py (1)
tensorrt_llm/llmapi/llm_args.py (2)
DraftTargetDecodingConfig (404-413)
speculative_model_dir (1159-1160)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
575-575: Line too long (132 > 120)
(E501)
645-645: Line too long (131 > 120)
(E501)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/unittest/_torch/speculative/test_eagle3.py (1)
Line 62: LGTM: Correct flag usage for enabling Torch sampler. The addition of enable_torch_sampler=True correctly reflects the new flag semantics where this parameter explicitly selects the Torch sampler. This is appropriate for Eagle3 speculative decoding tests.
tests/unittest/_torch/speculative/test_draft_target.py (2)
Line 44: LGTM: Correct flag usage in main configuration. The addition of enable_torch_sampler=True in the main LLM configuration correctly enables the Torch sampler for draft-target speculative decoding.
Line 50: LGTM: Consistent sampler configuration in speculative config. Adding enable_torch_sampler=True to the DraftTargetDecodingConfig ensures consistent sampler selection across both the main model and speculative configuration. This is appropriate since the DraftTargetDecodingConfig only supports the "pytorch" backend.
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 604-605: LGTM: Improved error handling with defensive getattr usage. The replacement of direct attribute access with getattr(executor_config.speculative_config, "attribute", False) calls is excellent defensive programming. This prevents AttributeError exceptions when speculative_config is None or missing these boolean attributes, defaulting safely to False. The consistent pattern across all speculative configuration checks improves the robustness of the decoding mode determination logic.
Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682
PR_Github #12567 [ run ] triggered by Bot |
PR_Github #12541 [ run ] completed with state |
PR_Github #12567 [ run ] completed with state |
/bot run |
PR_Github #12676 [ run ] triggered by Bot |
6c48327 to 30376d2 (Compare)
/bot run --disable-fail-fast |
PR_Github #12679 [ run ] triggered by Bot |
PR_Github #12676 [ run ] completed with state |
…coder classes
- Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
- Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface.
- This change improves clarity regarding the number of sequences being processed.
Signed-off-by: Robin Kobus <[email protected]>
`Optional` to accommodate `_forward_step_inter_pp` which creates a `SampleState` without `finalize_events` Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Netanel Haber <[email protected]> something Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
2610cd2 to f66a946 (Compare)
/bot run |
PR_Github #14488 [ run ] triggered by Bot |
PR_Github #14488 [ run ] completed with state |
Signed-off-by: Daniel Campora <[email protected]>
/bot run |
PR_Github #14512 [ run ] triggered by Bot |
PR_Github #14512 [ run ] completed with state |
…VIDIA#6216) Signed-off-by: Daniel Campora <[email protected]>
Summary by CodeRabbit
New Features
Refactor
- Renamed enable_trtllm_sampler to use_torch_sampler throughout the application and documentation for improved clarity.
Bug Fixes
Tests
Documentation
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
Kill all running builds associated with pull request.
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.