[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default #6216
Conversation
📝 Walkthrough
This change set standardizes the naming of a configuration parameter across the codebase, renaming the sampler-selection flag (from enable_trtllm_sampler to use_torch_sampler, called enable_torch_sampler in intermediate revisions of this PR) and inverting its semantics so that the TRTLLM sampler becomes the default.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI/Config
participant LLM Constructor
participant PyExecutor
participant Sampler (TorchSampler/TRTLLMSampler)
User->>CLI/Config: Set --use_torch_sampler flag
CLI/Config->>LLM Constructor: Pass use_torch_sampler param
LLM Constructor->>PyExecutor: Pass use_torch_sampler in config
PyExecutor->>Sampler: Instantiate based on use_torch_sampler
Sampler-->>PyExecutor: TorchSampler or TRTLLMSampler instance
PyExecutor-->>LLM Constructor: Sampler ready
LLM Constructor-->>User: LLM object initialized
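To make the flag plumbing above concrete, here is a minimal, hedged sketch of constructing an LLM under the new default. The flag name follows the diagram and the summary (use_torch_sampler); intermediate review comments in this thread call it enable_torch_sampler, and the model path is a placeholder.

```python
from tensorrt_llm import LLM  # top-level import assumed

# With no flag set, the TRTLLM sampler is now the default (the point of this PR).
llm_default = LLM(model="/path/to/model")  # placeholder model path

# Explicitly opting in to the Torch sampler via the renamed flag
# (previously enable_trtllm_sampler, whose meaning this PR inverts).
llm_torch = LLM(model="/path/to/model", use_torch_sampler=True)
```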
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
/bot run |
PR_Github #12446 [ run ] triggered by Bot |
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)
Line 14: Consider renaming the configuration file to reflect the new parameter. The filename disagg_config_trtllm_sampler.yaml still references the old parameter name, but the file now uses enable_torch_sampler. Consider renaming the file to disagg_config_torch_sampler.yaml or similar to maintain consistency with the new parameter naming.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (1 hunks)
- tensorrt_llm/llmapi/llm_args.py (2 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1 hunks)
- tests/unittest/_torch/test_beam_search.py (1 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/_torch/test_trtllm_sampler.py (1 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/unittest/_torch/test_overlap_scheduler.py (4)
- tests/unittest/_torch/test_trtllm_sampler.py (3): create_llm (25-41), model_path (21-22), test_case (15-17)
- tests/unittest/llmapi/test_llm_kv_cache_events.py (1): create_llm (49-54)
- tensorrt_llm/llmapi/llm_args.py (2): model_dir (908-910), model_dir (913-917)
- tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1): model_path (31-44)
tensorrt_llm/_torch/pyexecutor/_util.py (3)
- tensorrt_llm/_torch/pyexecutor/sampler.py (3): TorchSampler (208-457), EarlyStopSampler (70-97), TRTLLMSampler (483-942)
- tensorrt_llm/_torch/models/checkpoints/base_weight_mapper.py (2): model (162-165), mapping (152-153)
- tensorrt_llm/runtime/generation.py (2): dtype (815-816), dtype (1218-1219)
🔇 Additional comments (20)
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)
Line 14: Confirm TRTLLM sampler activation.
I wasn't able to locate the Python-level branch that inverts enable_torch_sampler (it's likely handled in the native extension), so please manually verify that setting enable_torch_sampler: False at tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml:14 indeed forces use of the TRTLLM sampler. You can confirm by:
- Reviewing the native (C++/CUDA) entry point where enable_torch_sampler is interpreted.
- Adding a small smoke test that inspects or logs which sampler implementation is chosen (a minimal starting point is sketched below).
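A lighter-weight starting point for the second bullet is sketched below; it only validates the configuration value rather than inspecting the instantiated sampler, and it assumes enable_torch_sampler sits at the top level of the YAML file.

```python
# Sketch: confirm the disaggregated config still requests the TRTLLM sampler.
import yaml

CONFIG = "tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml"


def test_config_requests_trtllm_sampler():
    with open(CONFIG) as f:
        cfg = yaml.safe_load(f)
    # Under the inverted semantics, False means the TRTLLM sampler is selected.
    assert cfg["enable_torch_sampler"] is False  # key assumed to be top-level
```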
tests/unittest/_torch/test_beam_search.py (1)
Line 46: LGTM! Correct parameter migration. The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the same sampler behavior (TRTLLM) while adopting the new parameter naming and inverted logic.
tests/unittest/api_stability/references/llm.yaml (1)
Lines 87-89: Confirm default sampler behavior flip. The rename in tests/unittest/api_stability/references/llm.yaml (lines 87-89) from

enable_trtllm_sampler:
  annotation: bool
  default: False

to

enable_torch_sampler:
  annotation: bool
  default: False

inverts the default sampler:
- Old default (enable_trtllm_sampler=False): uses TorchSampler
- New default (enable_torch_sampler=False): uses TRTLLMSampler
This aligns with the PR objective "Enable TRTLLM sampler by default" but introduces a breaking change for users relying on the previous default. Please verify that switching to TRTLLMSampler as the default sampler is intentional and acceptable for downstream consumers. (A hedged sketch of the corresponding Python-side field follows below.)
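For orientation, here is a hedged sketch of what the corresponding Python-side field (reviewed further below in tensorrt_llm/llmapi/llm_args.py, lines 1849-1852) might look like; the pydantic-style definition, the class name, and the description text are assumptions, not the PR's exact code.

```python
from pydantic import BaseModel, Field


class SamplerArgsSketch(BaseModel):  # hypothetical stand-in for the real args class
    enable_torch_sampler: bool = Field(
        default=False,
        description="If True, use TorchSampler; with the default False, the TRTLLM sampler is used.",
    )
```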
tests/unittest/_torch/test_trtllm_sampler.py (1)
Line 27: LGTM! Correct sampler configuration for TRTLLM sampler testing. The change from enable_trtllm_sampler=True to enable_torch_sampler=False correctly maintains the TRTLLM sampler selection for this test, which is appropriate given the test file's purpose (test_trtllm_sampler.py).
).tensorrt_llm/_torch/pyexecutor/config.py (1)
Lines 57-60: instantiate_sampler correctly respects enable_torch_sampler.
- Verified instantiate_sampler in _util.py (lines 555-570) checks pytorch_backend_config.enable_torch_sampler and returns TorchSampler(sampler_args) when true, falling back to TRTLLMSampler otherwise (sketched below).
- Confirmed no remaining references to the old enable_trtllm_sampler flag.
- All sampler instantiation logic aligns with the updated docstring in config.py.
LGTM!
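A hedged sketch of that selection branch, using only the names quoted in this review (import path taken from the code-graph analysis above; constructor signatures are simplified assumptions):

```python
from tensorrt_llm._torch.pyexecutor.sampler import TorchSampler, TRTLLMSampler


def instantiate_sampler_sketch(pytorch_backend_config, sampler_args):
    # The flag now opts in to the Torch sampler; the TRTLLM sampler is the default fallback.
    if pytorch_backend_config.enable_torch_sampler:
        return TorchSampler(sampler_args)
    return TRTLLMSampler(sampler_args)  # argument shape simplified for illustration
```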
tests/integration/defs/accuracy/test_llm_api_pytorch.py (2)
Line 198: Verify the test's intended sampler selection. The change from enable_trtllm_sampler=True to enable_torch_sampler=True represents both a flag rename and a logic inversion. Please confirm that this test should indeed be using the Torch sampler rather than the TRTLLM sampler for FP8 LLM sampling validation.
Line 231: LGTM - Appropriate sampler selection for beam search test. The change to enable_torch_sampler=False ensures this beam search test uses the TRTLLM sampler (default behavior), which is appropriate since the test focuses on beam search functionality rather than sampler comparison.
examples/llm-api/quickstart_advanced.py (1)
Line 57: LGTM - Consistent flag rename. The changes consistently rename the sampler flag from --enable_trtllm_sampler to --enable_torch_sampler in both the argument parser definition and LLM constructor usage (a sketch follows below). This aligns perfectly with the broader codebase refactor.
Also applies to: 211-211
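A hedged sketch of the renamed CLI flag as described above; the store_true behavior and help text are assumptions, and the model path is a placeholder.

```python
import argparse

from tensorrt_llm import LLM  # top-level import assumed

parser = argparse.ArgumentParser()
parser.add_argument(
    "--enable_torch_sampler",
    action="store_true",  # assumed; leaving it unset keeps the default TRTLLM sampler
    help="Use TorchSampler instead of the default TRTLLM sampler.",
)
args = parser.parse_args()

# The parsed flag is forwarded to the LLM constructor, mirroring the rename.
llm = LLM(model="/path/to/model", enable_torch_sampler=args.enable_torch_sampler)
```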
tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1)
Line 43: LGTM - Behavior preserved with new flag. The change from enable_trtllm_sampler=True to enable_torch_sampler=False maintains the same effective behavior (using the TRTLLM sampler) while adopting the new flag convention. This is appropriate for this Nemotron-H correctness test.
tensorrt_llm/llmapi/llm_args.py (2)
Lines 1849-1852: LGTM! Semantic inversion correctly implements the PR objective. The field renaming from enable_trtllm_sampler to enable_torch_sampler with inverted logic successfully enables the TRTLLM sampler by default while maintaining the same default value (False). The updated description clearly documents the new behavior.
Line 2113: LGTM! Parameter usage correctly updated for consistency. The parameter passed to the PyTorchConfig constructor properly reflects the field renaming, maintaining consistency across the codebase.
tests/unittest/_torch/test_overlap_scheduler.py (4)
Lines 24-27: LGTM! Parameter renaming maintains consistency with codebase changes. The function signature and dictionary key updates correctly reflect the field renaming from enable_trtllm_sampler to enable_torch_sampler.
Lines 44-47: LGTM! Test parameterization correctly updated. The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage for both sampler configurations.
Line 53: LGTM! Stop words logic correctly adapted to inverted flag semantics. The condition if not enable_torch_sampler correctly maintains the intended behavior - stop words are set when using the TRTLLM sampler (when enable_torch_sampler=False), which aligns with the inverted flag semantics.
Line 65: LGTM! Function calls correctly updated with new parameter name. The create_llm function calls properly use the renamed enable_torch_sampler parameter, maintaining consistency with the updated function signature (see the parameterization sketch below).
Also applies to: 77-77
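A hedged sketch of the parameterized pattern described above; the model path, prompt, and SamplingParams field names are assumptions, while the create_llm signature and the stop-words branching mirror what the review describes.

```python
import pytest

from tensorrt_llm import LLM, SamplingParams  # top-level imports assumed

MODEL_PATH = "/path/to/model"  # placeholder


def create_llm(model_dir: str, enable_torch_sampler: bool) -> LLM:
    # Mirrors the reviewed helper: True selects TorchSampler, False keeps the default TRTLLM sampler.
    return LLM(model=model_dir, enable_torch_sampler=enable_torch_sampler)


@pytest.mark.parametrize("enable_torch_sampler", [True, False])
def test_generation_with_both_samplers(enable_torch_sampler):
    llm = create_llm(MODEL_PATH, enable_torch_sampler)

    stop_words = None
    if not enable_torch_sampler:
        # Stop words are only exercised on the TRTLLM-sampler path, as in the reviewed test.
        stop_words = ["</s>"]

    outputs = llm.generate(["The capital of France is"],
                           SamplingParams(max_tokens=8, stop=stop_words))
    assert len(outputs) == 1
```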
tests/unittest/_torch/test_return_logits.py (4)
Line 19: LGTM! Test parameterization correctly updated. The pytest parameterization and function signature properly reflect the parameter renaming while maintaining the same test coverage.
Also applies to: 22-22
Lines 30-31: LGTM! Skip condition correctly adapted to inverted flag semantics. The condition if enable_torch_sampler and gather_context_logits correctly identifies when the Torch sampler is being used and skips the test appropriately, maintaining the intended behavior after the flag semantic inversion (a sketch of the pattern follows below).
Line 44: LGTM! Constructor parameter correctly updated. The LLM constructor parameter properly uses the renamed enable_torch_sampler field, maintaining consistency with the codebase changes.
Line 86: LGTM! Second test function correctly updated. The pytest parameterization and function signature in the second test function properly reflect the parameter renaming, consistent with the first test function.
Also applies to: 89-89
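For reference, a hedged sketch of the skip pattern described for lines 30-31; the decorator ordering, test name, and skip message are illustrative only.

```python
import pytest


@pytest.mark.parametrize("gather_context_logits", [True, False])
@pytest.mark.parametrize("enable_torch_sampler", [True, False])
def test_return_logits_sketch(enable_torch_sampler, gather_context_logits):
    if enable_torch_sampler and gather_context_logits:
        # Context-logit gathering is not exercised on the Torch sampler path.
        pytest.skip("gather_context_logits is skipped with the Torch sampler")
    ...  # body elided in this sketch
```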
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-576: LGTM: Torch sampler flag implementation is correct. The condition correctly checks for enable_torch_sampler and instantiates the appropriate TorchSampler when enabled.
PR_Github #12446 [ run ] completed with state |
/bot run |
PR_Github #12523 [ run ] triggered by Bot |
PR_Github #12523 [ run ] completed with state |
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-582: LGTM: Logic reversal correctly implements the default sampler change. The flag renaming and logic inversion correctly achieve the PR objective of making TRTLLMSampler the default. The implementation properly handles the semantic reversal where enable_torch_sampler=True now selects TorchSampler, while the default behavior uses TRTLLMSampler.
Note: This aligns with the previous review comment about the breaking change requiring documentation and migration guidance.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Line 645: Fix line length violation. The line exceeds the 120-character limit (131 characters). Consider breaking it into multiple lines for better readability:

- "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
+ "Model is built with 'explicit draft tokens' decoding, but decoding mode is "
+ "something else. Overwriting decoding mode."
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
645-645: Line too long (131 > 120)
(E501)
🔇 Additional comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 604-605: LGTM: Defensive programming prevents AttributeError exceptions. The change from direct attribute access to getattr with False defaults is a good defensive programming practice. This prevents potential AttributeError exceptions when speculative_config lacks the expected boolean attributes, while maintaining the same logical behavior (a small illustration follows below).
Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682
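A small illustration of the defensive-access pattern praised here; "some_spec_flag" is a placeholder attribute name, not necessarily a real speculative_config field.

```python
def uses_spec_flag(executor_config) -> bool:
    spec_config = getattr(executor_config, "speculative_config", None)
    # getattr with a False default returns False when spec_config is None or lacks the
    # attribute, instead of raising AttributeError like direct attribute access would.
    return getattr(spec_config, "some_spec_flag", False)
```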
/bot run --disable-fail-fast |
PR_Github #12541 [ run ] triggered by Bot |
/bot run --disable-fail-fast |
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 575-582: Breaking change implemented correctly but still lacks documentation. The sampler selection logic has been correctly inverted to make TRTLLMSampler the default, which aligns with the PR objective. However, the previous concern about missing documentation for this breaking change remains unaddressed.
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Line 575: Fix line length violations. Two lines exceed the 120-character limit. Consider breaking them up for better readability.

For line 575:

- if pytorch_backend_config.enable_torch_sampler or pytorch_backend_config.enable_mixed_sampler or engine.spec_config is not None:
+ if (pytorch_backend_config.enable_torch_sampler or
+     pytorch_backend_config.enable_mixed_sampler or
+     engine.spec_config is not None):

For line 645:

- logger.warning(
-     "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
- )
+ logger.warning(
+     "Model is built with 'explicit draft tokens' decoding, but "
+     "decoding mode is something else. Overwriting decoding mode."
+ )

Also applies to: 645-645
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (1 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (1 hunks)
🧬 Code Graph Analysis (1)
tests/unittest/_torch/speculative/test_draft_target.py (1)
tensorrt_llm/llmapi/llm_args.py (2)
DraftTargetDecodingConfig (404-413)
speculative_model_dir (1159-1160)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
575-575: Line too long (132 > 120)
(E501)
645-645: Line too long (131 > 120)
(E501)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tests/unittest/_torch/speculative/test_eagle3.py (1)
Line 62: LGTM: Correct flag usage for enabling Torch sampler. The addition of enable_torch_sampler=True correctly reflects the new flag semantics where this parameter explicitly selects the Torch sampler. This is appropriate for Eagle3 speculative decoding tests.
tests/unittest/_torch/speculative/test_draft_target.py (2)
Line 44: LGTM: Correct flag usage in main configuration. The addition of enable_torch_sampler=True in the main LLM configuration correctly enables the Torch sampler for draft-target speculative decoding.
Line 50: LGTM: Consistent sampler configuration in speculative config. Adding enable_torch_sampler=True to the DraftTargetDecodingConfig ensures consistent sampler selection across both the main model and speculative configuration. This is appropriate since the DraftTargetDecodingConfig only supports the "pytorch" backend.
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Lines 604-605: LGTM: Improved error handling with defensive getattr usage. The replacement of direct attribute access with getattr(executor_config.speculative_config, "attribute", False) calls is excellent defensive programming. This prevents AttributeError exceptions when speculative_config is None or missing these boolean attributes, defaulting safely to False. The consistent pattern across all speculative configuration checks improves the robustness of the decoding mode determination logic.
Also applies to: 612-613, 623-624, 631-632, 642-643, 650-652, 662-663, 670-671, 681-682
PR_Github #12567 [ run ] triggered by Bot |
PR_Github #12541 [ run ] completed with state |
PR_Github #12567 [ run ] completed with state |
/bot run |
PR_Github #12676 [ run ] triggered by Bot |
6c48327 to 30376d2 (Compare)
/bot run --disable-fail-fast |
PR_Github #12679 [ run ] triggered by Bot |
PR_Github #12676 [ run ] completed with state |
…coder classes
- Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
- Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface.
- This change improves clarity regarding the number of sequences being processed.
Signed-off-by: Robin Kobus <[email protected]>
`Optional` to accommodate `_forward_step_inter_pp` which creates a `SampleState` without `finalize_events` Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Netanel Haber <[email protected]> something Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
2610cd2 to f66a946 (Compare)
/bot run |
PR_Github #14488 [ run ] triggered by Bot |
PR_Github #14488 [ run ] completed with state |
Signed-off-by: Daniel Campora <[email protected]>
/bot run |
PR_Github #14512 [ run ] triggered by Bot |
PR_Github #14512 [ run ] completed with state |
…VIDIA#6216) Signed-off-by: Daniel Campora <[email protected]>
Summary by CodeRabbit
New Features
Refactor
- Renamed enable_trtllm_sampler to use_torch_sampler throughout the application and documentation for improved clarity.
Bug Fixes
Tests
Documentation
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
Kill all running builds associated with pull request.
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.