
Conversation

@dcampora (Collaborator) commented Jul 21, 2025

Summary by CodeRabbit

  • New Features

    • Introduced a new option to enable the Torch sampler for model inference, replacing the previous TensorRT LLM sampler flag across user interfaces, configuration files, and command-line arguments.
  • Refactor

    • Renamed parameters and configuration options from enable_trtllm_sampler to use_torch_sampler throughout the application and documentation for improved clarity.
    • Updated terminology from "max batch size" to "max number of sequences" in APIs and user-facing documentation for model decoding.
    • Improved internal naming and added optionality to sampler state attributes for better clarity and robustness.
    • Enhanced sampler setup to handle additional request types and improved event handling robustness.
  • Bug Fixes

    • Enhanced error messaging for out-of-range sampling parameters.
    • Improved robustness against missing configuration attributes and event handling.
  • Tests

    • Updated all relevant tests and configurations to use the new sampler flag and naming conventions.
    • Adjusted test expectations and fixtures to reflect updated model and configuration options.
  • Documentation

    • Updated descriptions and statuses in help texts and configuration files to reflect the new Torch sampler option and its maturity level.
    • Simplified example usage by removing deprecated sampler flags.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Supported backends: [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
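
For example, typical invocations might look like the following (the pipeline ID is illustrative):

    /bot run --disable-fail-fast --stage-list "A10-PyTorch-1"
    /bot run --reuse-test 12345 --gpu-type "A30, H100_PCIe"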

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping tests without careful validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing a pipeline without careful validation can break the top of tree.

@dcampora requested review from a team as code owners on July 21, 2025 14:03
@dcampora requested a review from juney-nvidia on July 21, 2025 14:03
@coderabbitai bot (Contributor) commented Jul 21, 2025

📝 Walkthrough

This change set standardizes the naming of a configuration parameter across the codebase, renaming enable_trtllm_sampler to use_torch_sampler and inverting its logic to reflect the use of the Torch sampler instead of the TRTLLM sampler. The update is applied in Python, C++, YAML configs, test code, and documentation. Additionally, C++ code and bindings rename "maxBatchSize" to "maxNumSequences" for semantic clarity. Minor logic improvements and logging enhancements are also included.
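
To make the rename concrete, here is a hedged before/after sketch of the user-facing call, following the flag names used in this walkthrough; the import path, model argument, and any other constructor parameters are assumptions rather than the PR's exact code:

    from tensorrt_llm import LLM  # assumed import path

    # Before this PR: the TRTLLM sampler had to be requested explicitly.
    llm = LLM(model="/path/to/model", enable_trtllm_sampler=True)

    # After this PR: the flag is renamed and its logic inverted, so the TRTLLM
    # sampler is the default and the Torch sampler is what gets requested explicitly.
    # The call below is therefore equivalent to the old one above.
    llm = LLM(model="/path/to/model", use_torch_sampler=False)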

Changes

  • Sampler Option Rename & Logic Inversion
    Files: tensorrt_llm/llmapi/llm_args.py, tensorrt_llm/_torch/pyexecutor/config.py, tensorrt_llm/_torch/pyexecutor/_util.py, examples/llm-api/quickstart_advanced.py, tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml, tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml, tests/unittest/api_stability/references/llm.yaml, tests/integration/defs/disaggregated/test_disaggregated.py, tests/integration/test_lists/qa/llm_function_full.txt, tests/integration/test_lists/qa/llm_function_sanity.txt
    Summary: Renames enable_trtllm_sampler to use_torch_sampler throughout code, configs, and docs; inverts logic so the flag enables the Torch sampler; updates descriptions and status; updates all references and related test identifiers.
  • Test Updates for Sampler Option
    Files: tests/integration/defs/accuracy/test_llm_api_pytorch.py, tests/unittest/_torch/modeling/test_modeling_nemotron_h.py, tests/unittest/_torch/test_beam_search.py, tests/unittest/_torch/test_overlap_scheduler.py, tests/unittest/_torch/test_return_logits.py, tests/unittest/_torch/test_trtllm_sampler.py, tests/unittest/_torch/speculative/test_draft_target.py, tests/unittest/_torch/speculative/test_eagle3.py, tests/unittest/llmapi/apps/_test_openai_misc.py, tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
    Summary: Updates all test code to use use_torch_sampler instead of enable_trtllm_sampler; adjusts test logic, parameterization, and config construction accordingly; updates fixtures and server startup args where relevant.
  • Sampler Instantiation & Logging
    Files: tensorrt_llm/_torch/pyexecutor/_util.py, tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
    Summary: Modifies sampler instantiation logic to match new flag meaning; adds logging of sampler type at runtime; improves robustness with getattr usage for speculative config flags.
  • C++ API: Batch Size → Num Sequences
    Files: cpp/include/tensorrt_llm/runtime/decoderState.h, cpp/include/tensorrt_llm/runtime/gptDecoder.h, cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h, cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h, cpp/tensorrt_llm/runtime/decoderState.cpp, cpp/tensorrt_llm/runtime/gptDecoder.cpp, cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp, cpp/tensorrt_llm/batch_manager/createNewDecoderRequests.cpp, cpp/tensorrt_llm/pybind/runtime/bindings.cpp
    Summary: Renames all C++ API parameters, member variables, and comments from maxBatchSize to maxNumSequences for clarity; updates Python bindings accordingly; applies change to all relevant constructors, methods, and internal logic.
  • Sampler Implementation Improvements
    Files: tensorrt_llm/_torch/pyexecutor/sampler.py
    Summary: Renames num_seq_slots to max_num_sequences for clarity; updates tensor shapes and setup calls; makes finalize_events optional with docstring; adds new tensor to store; adjusts attention window assignment; improves event handling robustness.
  • Sampling Kernel Error Reporting
    Files: cpp/tensorrt_llm/kernels/samplingTopKKernels.h
    Summary: Enhances error messages in parameter checks by using macros with detailed info for out-of-range values; includes necessary header for assertions.
  • Test Keyword & Fixture Updates
    Files: tests/integration/defs/test_e2e.py, tests/unittest/llmapi/apps/_test_openai_misc.py
    Summary: Updates expected keywords in a multimodal test; parameterizes model fixture based on backend; adjusts max sequence length fixture for model compatibility.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI/Config
    participant LLM Constructor
    participant PyExecutor
    participant Sampler (TorchSampler/TRTLLMSampler)

    User->>CLI/Config: Set --use_torch_sampler flag
    CLI/Config->>LLM Constructor: Pass use_torch_sampler param
    LLM Constructor->>PyExecutor: Pass use_torch_sampler in config
    PyExecutor->>Sampler: Instantiate based on use_torch_sampler
    Sampler-->>PyExecutor: TorchSampler or TRTLLMSampler instance
    PyExecutor-->>LLM Constructor: Sampler ready
    LLM Constructor-->>User: LLM object initialized

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Community want to contribute

Suggested reviewers

  • pcastonguay
  • litaotju
  • yilin-void
  • Superjomn

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f66a946 and 8f678cb.

📒 Files selected for processing (1)
  • tests/integration/test_lists/waives.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/test_lists/waives.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check


@dcampora (Collaborator Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12446 [ run ] triggered by Bot

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)

14-14: Consider renaming the configuration file to reflect the new parameter.

The filename disagg_config_trtllm_sampler.yaml still references the old parameter name, but now uses enable_torch_sampler. Consider renaming the file to disagg_config_torch_sampler.yaml or similar to maintain consistency with the new parameter naming.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9832bef and 6673725.

📒 Files selected for processing (12)
  • examples/llm-api/quickstart_advanced.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/_util.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/config.py (1 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
  • tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1 hunks)
  • tests/unittest/_torch/test_beam_search.py (1 hunks)
  • tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
  • tests/unittest/_torch/test_return_logits.py (4 hunks)
  • tests/unittest/_torch/test_trtllm_sampler.py (1 hunks)
  • tests/unittest/api_stability/references/llm.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/unittest/_torch/test_overlap_scheduler.py (4)
tests/unittest/_torch/test_trtllm_sampler.py (3)
  • create_llm (25-41)
  • model_path (21-22)
  • test_case (15-17)
tests/unittest/llmapi/test_llm_kv_cache_events.py (1)
  • create_llm (49-54)
tensorrt_llm/llmapi/llm_args.py (2)
  • model_dir (908-910)
  • model_dir (913-917)
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
  • model_path (31-44)
tensorrt_llm/_torch/pyexecutor/_util.py (3)
tensorrt_llm/_torch/pyexecutor/sampler.py (3)
  • TorchSampler (208-457)
  • EarlyStopSampler (70-97)
  • TRTLLMSampler (483-942)
tensorrt_llm/_torch/models/checkpoints/base_weight_mapper.py (2)
  • model (162-165)
  • mapping (152-153)
tensorrt_llm/runtime/generation.py (2)
  • dtype (815-816)
  • dtype (1218-1219)
🔇 Additional comments (20)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)

14-14: Confirm TRTLLM Sampler Activation
I wasn’t able to locate the Python‐level branch that inverts enable_torch_sampler (it’s likely handled in the native extension), so please manually verify that setting

tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml:14
  enable_torch_sampler: False

indeed forces use of the TRTLLM sampler. You can confirm by:

  • Reviewing the native (C++/CUDA) entry point where enable_torch_sampler is interpreted.
  • Adding a small smoke test that inspects or logs which sampler implementation is chosen.
tests/unittest/_torch/test_beam_search.py (1)

46-46: LGTM! Correct parameter migration.

The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the same sampler behavior (TRTLLM) while adopting the new parameter naming and inverted logic.

tests/unittest/api_stability/references/llm.yaml (1)

87-89: Confirm default sampler behavior flip

The rename in tests/unittest/api_stability/references/llm.yaml (lines 87–89) from

enable_trtllm_sampler:
  annotation: bool
  default: False

to

enable_torch_sampler:
  annotation: bool
  default: False

inverts the default sampler:

  • Old default (enable_trtllm_sampler=False): uses TorchSampler
  • New default (enable_torch_sampler=False): uses TRTLLMSampler

This aligns with the PR objective “Enable TRTLLM sampler by default” but introduces a breaking change for users relying on the previous default. Please verify that switching to TRTLLMSampler as the default sampler is intentional and acceptable for downstream consumers.

tests/unittest/_torch/test_trtllm_sampler.py (1)

27-27: LGTM! Correct sampler configuration for TRTLLM sampler testing.

The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the TRTLLM sampler selection for this test, which is appropriate given the test file's purpose (test_trtllm_sampler.py).

tensorrt_llm/_torch/pyexecutor/config.py (1)

57-60: instantiate_sampler correctly respects enable_torch_sampler

  • Verified instantiate_sampler in _util.py (lines 555–570) checks pytorch_backend_config.enable_torch_sampler and returns TorchSampler(sampler_args) when true, falling back to TRTLLMSampler otherwise.
  • Confirmed no remaining references to the old enable_trtllm_sampler flag.
  • All sampler instantiation logic aligns with the updated docstring in config.py.

LGTM!
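
For illustration, a self-contained sketch of the selection branch described above; the placeholder classes and constructor arguments stand in for the real ones in sampler.py and this is not the exact _util.py code:

    class TorchSampler:      # placeholder for the real sampler class
        def __init__(self, args):
            self.args = args

    class TRTLLMSampler:     # placeholder, chosen when the flag is left at its default
        def __init__(self, args):
            self.args = args

    def instantiate_sampler(pytorch_backend_config, sampler_args):
        # enable_torch_sampler=True selects the Torch sampler; otherwise fall back
        # to the TRTLLM sampler, which is now the default.
        if getattr(pytorch_backend_config, "enable_torch_sampler", False):
            return TorchSampler(sampler_args)
        return TRTLLMSampler(sampler_args)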

tests/integration/defs/accuracy/test_llm_api_pytorch.py (2)

198-198: Verify the test's intended sampler selection.

The change from enable_trtllm_sampler=True to enable_torch_sampler=True represents both a flag rename and a logic inversion. Please confirm that this test should indeed be using the Torch sampler rather than the TRTLLM sampler for FP8 LLM sampling validation.


231-231: LGTM - Appropriate sampler selection for beam search test.

The change to enable_torch_sampler=False ensures this beam search test uses the TRTLLM sampler (default behavior), which is appropriate since the test focuses on beam search functionality rather than sampler comparison.

examples/llm-api/quickstart_advanced.py (1)

57-57: LGTM - Consistent flag rename.

The changes consistently rename the sampler flag from --enable_trtllm_sampler to --enable_torch_sampler in both the argument parser definition and LLM constructor usage. This aligns perfectly with the broader codebase refactor.

Also applies to: 211-211

tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1)

43-43: LGTM - Behavior preserved with new flag.

The change from enable_trtllm_sampler=True to enable_torch_sampler=False maintains the same effective behavior (using the TRTLLM sampler) while adopting the new flag convention. This is appropriate for this Nemotron-H correctness test.

tensorrt_llm/llmapi/llm_args.py (2)

1849-1852: LGTM! Semantic inversion correctly implements the PR objective.

The field renaming from enable_trtllm_sampler to enable_torch_sampler with inverted logic successfully enables the TRTLLM sampler by default while maintaining the same default value (False). The updated description clearly documents the new behavior.


2113-2113: LGTM! Parameter usage correctly updated for consistency.

The parameter passed to PyTorchConfig constructor properly reflects the field renaming, maintaining consistency across the codebase.
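
For illustration only, the renamed field could be declared along these lines; this is a hedged sketch, and the actual class name, description text, and any status metadata in llm_args.py may differ:

    from pydantic import BaseModel, Field

    class TorchLlmArgsSketch(BaseModel):  # hypothetical stand-in for the real args class
        enable_torch_sampler: bool = Field(
            default=False,
            description="If true, use the Torch sampler; otherwise use the TRTLLM sampler (default).",
        )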

tests/unittest/_torch/test_overlap_scheduler.py (4)

24-27: LGTM! Parameter renaming maintains consistency with codebase changes.

The function signature and dictionary key updates correctly reflect the field renaming from enable_trtllm_sampler to enable_torch_sampler.


44-47: LGTM! Test parameterization correctly updated.

The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage for both sampler configurations.


53-53: LGTM! Stop words logic correctly adapted to inverted flag semantics.

The condition if not enable_torch_sampler correctly maintains the intended behavior - stop words are set when using the TRTLLM sampler (when enable_torch_sampler=False), which aligns with the inverted flag semantics.


65-65: LGTM! Function calls correctly updated with new parameter name.

The create_llm function calls properly use the renamed enable_torch_sampler parameter, maintaining consistency with the updated function signature.

Also applies to: 77-77
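
For illustration, a self-contained sketch of the renamed parameterization and the inverted condition highlighted above; create_llm here is a trivial stand-in for the test's own helper and only the flag handling is modeled:

    import pytest

    def create_llm(enable_torch_sampler: bool) -> str:
        # Stand-in: the real helper builds an LLM; here we just report the sampler choice.
        return "TorchSampler" if enable_torch_sampler else "TRTLLMSampler"

    @pytest.mark.parametrize("enable_torch_sampler", [True, False])
    def test_sampler_selection(enable_torch_sampler):
        sampler = create_llm(enable_torch_sampler)
        if not enable_torch_sampler:
            # TRTLLM sampler path (the default), where the real test also sets stop words.
            assert sampler == "TRTLLMSampler"
        else:
            assert sampler == "TorchSampler"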

tests/unittest/_torch/test_return_logits.py (4)

19-19: LGTM! Test parameterization correctly updated.

The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage.

Also applies to: 22-22


30-31: LGTM! Skip condition correctly adapted to inverted flag semantics.

The condition if enable_torch_sampler and gather_context_logits correctly identifies when the Torch sampler is being used and skips the test appropriately, maintaining the intended behavior after the flag semantic inversion.


44-44: LGTM! Constructor parameter correctly updated.

The LLM constructor parameter properly uses the renamed enable_torch_sampler field, maintaining consistency with the codebase changes.


86-86: LGTM! Second test function correctly updated.

The pytest parameterization and function signature in the second test function properly reflect the parameter renaming, consistent with the first test function.

Also applies to: 89-89

tensorrt_llm/_torch/pyexecutor/_util.py (1)

575-576: LGTM: Torch sampler flag implementation is correct.

The condition correctly checks for enable_torch_sampler and instantiates the appropriate TorchSampler when enabled.

@tensorrt-cicd (Collaborator)

PR_Github #12446 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9255 completed with status: 'FAILURE'

@dcampora (Collaborator Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12523 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12523 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9307 completed with status: 'FAILURE'

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

575-582: LGTM: Logic reversal correctly implements the default sampler change.

The flag renaming and logic inversion correctly achieves the PR objective of making TRTLLMSampler the default. The implementation properly handles the semantic reversal where enable_torch_sampler=True now selects TorchSampler, while the default behavior uses TRTLLMSampler.

Note: This aligns with the previous review comment about the breaking change requiring documentation and migration guidance.

🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

645-645: Fix line length violation.

The line exceeds the 120-character limit (131 characters). Consider breaking it into multiple lines for better readability.

-            "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
+            "Model is built with 'explicit draft tokens' decoding, but decoding mode is "
+            "something else. Overwriting decoding mode."
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae6c16 and 69b55e0.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py

645-645: Line too long (131 > 120)

(E501)

🔇 Additional comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

604-605: LGTM: Defensive programming prevents AttributeError exceptions.

The change from direct attribute access to getattr with False defaults is a good defensive programming practice. This prevents potential AttributeError exceptions when speculative_config lacks the expected boolean attributes, while maintaining the same logical behavior.

Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682
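
For illustration, the defensive pattern described above behaves like this (the attribute and config names here are hypothetical):

    from types import SimpleNamespace

    spec_cfg = SimpleNamespace(is_mtp=True)   # config that defines the flag
    bare_cfg = SimpleNamespace()              # config that lacks it entirely

    # Direct access (bare_cfg.is_mtp) would raise AttributeError; getattr with a
    # False default keeps the decoding-mode logic on its safe path instead.
    assert getattr(spec_cfg, "is_mtp", False) is True
    assert getattr(bare_cfg, "is_mtp", False) is False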

@dcampora (Collaborator Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #12541 [ run ] triggered by Bot

@dcampora (Collaborator Author)

/bot run --disable-fail-fast

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

575-582: Breaking change implemented correctly but still lacks documentation

The sampler selection logic has been correctly inverted to make TRTLLMSampler the default, which aligns with the PR objective. However, the previous concern about missing documentation for this breaking change remains unaddressed.

🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

575-575: Fix line length violations

Two lines exceed the 120-character limit. Consider breaking them up for better readability:

For line 575:

-    if pytorch_backend_config.enable_torch_sampler or pytorch_backend_config.enable_mixed_sampler or engine.spec_config is not None:
+    if (pytorch_backend_config.enable_torch_sampler or 
+        pytorch_backend_config.enable_mixed_sampler or 
+        engine.spec_config is not None):

For line 645:

-        logger.warning(
-            "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
-        )
+        logger.warning(
+            "Model is built with 'explicit draft tokens' decoding, but "
+            "decoding mode is something else. Overwriting decoding mode."
+        )

Also applies to: 645-645

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be424dc and 5b786d2.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
  • tests/unittest/_torch/speculative/test_draft_target.py (1 hunks)
  • tests/unittest/_torch/speculative/test_eagle3.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/unittest/_torch/speculative/test_draft_target.py (1)
tensorrt_llm/llmapi/llm_args.py (2)
  • DraftTargetDecodingConfig (404-413)
  • speculative_model_dir (1159-1160)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py

575-575: Line too long (132 > 120)

(E501)


645-645: Line too long (131 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/unittest/_torch/speculative/test_eagle3.py (1)

62-62: LGTM: Correct flag usage for enabling Torch sampler

The addition of enable_torch_sampler=True correctly reflects the new flag semantics where this parameter explicitly selects the Torch sampler. This is appropriate for Eagle3 speculative decoding tests.

tests/unittest/_torch/speculative/test_draft_target.py (2)

44-44: LGTM: Correct flag usage in main configuration

The addition of enable_torch_sampler=True in the main LLM configuration correctly enables the Torch sampler for draft-target speculative decoding.


50-50: LGTM: Consistent sampler configuration in speculative config

Adding enable_torch_sampler=True to the DraftTargetDecodingConfig ensures consistent sampler selection across both the main model and speculative configuration. This is appropriate since the DraftTargetDecodingConfig only supports the "pytorch" backend.

tensorrt_llm/_torch/pyexecutor/_util.py (1)

604-605: LGTM: Improved error handling with defensive getattr usage

The replacement of direct attribute access with getattr(executor_config.speculative_config, "attribute", False) calls is excellent defensive programming. This prevents AttributeError exceptions when speculative_config is None or missing these boolean attributes, defaulting safely to False.

The consistent pattern across all speculative configuration checks improves the robustness of the decoding mode determination logic.

Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682

@tensorrt-cicd (Collaborator)

PR_Github #12567 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12541 [ run ] completed with state ABORTED

@tensorrt-cicd (Collaborator)

PR_Github #12567 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9347 completed with status: 'FAILURE'

@dcampora (Collaborator Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12676 [ run ] triggered by Bot

@dcampora force-pushed the user/dcampora/enable_trtllm_sampler_default branch from 6c48327 to 30376d2 on July 23, 2025 08:06
@dcampora (Collaborator Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #12679 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12676 [ run ] completed with state ABORTED

Funatiq and others added 11 commits August 7, 2025 15:11
…coder classes

- Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
- Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface.
- This change improves clarity regarding the number of sequences being processed.

Signed-off-by: Robin Kobus <[email protected]>
`Optional` to accommodate `_forward_step_inter_pp` which creates a `SampleState` without `finalize_events`

Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>

something

Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
@dcampora force-pushed the user/dcampora/enable_trtllm_sampler_default branch from 2610cd2 to f66a946 on August 7, 2025 15:12
@dcampora (Collaborator Author) commented Aug 7, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #14488 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #14488 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10943 completed with status: 'FAILURE'

Signed-off-by: Daniel Campora <[email protected]>
@dcampora (Collaborator Author) commented Aug 7, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #14512 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #14512 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10961 completed with status: 'SUCCESS'
