Conversation

@Funatiq Funatiq commented Jun 18, 2025

Description

  • Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
  • Adjusted the setup methods, member variables, and the associated bindings in the Python interface.
  • This change improves clarity regarding the number of sequences being processed; a simplified sketch of the rename follows.
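
As a rough illustration, here is a minimal sketch of the rename with simplified, assumed signatures; the real setup methods in the runtime take many more parameters, so treat the names below as illustrative only:

    // Minimal sketch of the maxBatchSize -> maxNumSequences rename.
    // Simplified: the real DecoderState takes many more parameters.
    #include <cstdint>
    #include <stdexcept>

    using SizeType32 = std::int32_t;

    class DecoderState
    {
    public:
        // before: void setup(SizeType32 maxBatchSize, ...);
        void setup(SizeType32 maxNumSequences)
        {
            if (maxNumSequences <= 0)
            {
                throw std::invalid_argument("maxNumSequences must be positive");
            }
            mMaxNumSequences = maxNumSequences; // was mMaxBatchSize
        }

        // before: SizeType32 getMaxBatchSize() const;
        SizeType32 getMaxNumSequences() const
        {
            return mMaxNumSequences;
        }

    private:
        SizeType32 mMaxNumSequences{0}; // was mMaxBatchSize
    };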

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
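
For example, a hypothetical invocation combining the options documented above could look like:

    /bot run --stage-list "A10-1" --disable-fail-fast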

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

Summary by CodeRabbit

  • Refactor

    • Renamed parameters and variables related to batch size to "number of sequences" across the user-facing API and Python bindings for improved clarity and consistency.
    • Updated method and constructor signatures, argument names, and documentation in both C++ and Python interfaces to reflect this terminology change.
  • Style

    • Improved naming consistency in user-facing interfaces and documentation.

@Funatiq Funatiq requested a review from dcampora June 18, 2025 08:15
@Funatiq Funatiq force-pushed the dev/fix_decoder_max branch from ec19ed3 to ced3ab5 June 18, 2025 09:05
…coder classes

- Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
- Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface.
- This change improves clarity regarding the number of sequences being processed.

Signed-off-by: Robin Kobus <[email protected]>
@Funatiq Funatiq force-pushed the dev/fix_decoder_max branch from ced3ab5 to 01cf692 July 28, 2025 13:29

coderabbitai bot commented Jul 28, 2025

📝 Walkthrough

This change performs a systematic renaming of the "batch size" concept to "number of sequences" across the decoder-related interfaces, implementations, and Python bindings. All relevant identifiers, parameters, and comments are updated for clarity and semantic accuracy. No logic, control flow, or error handling is modified; the changes are strictly nomenclature updates.

Changes

  • DecoderState interface and implementation
    cpp/include/tensorrt_llm/runtime/decoderState.h, cpp/tensorrt_llm/runtime/decoderState.cpp
    Renamed all maxBatchSize parameters, member variables, and related identifiers to maxNumSequences. Updated method signatures, comments, and internal usages accordingly. No logic or control flow changes.

  • GPT Decoder interface and implementation
    cpp/include/tensorrt_llm/runtime/gptDecoder.h, cpp/tensorrt_llm/runtime/gptDecoder.cpp
    Renamed maxBatchSize to maxNumSequences in all public and internal APIs, including constructor parameters, member variables, and method calls. Updated static factory methods and related usages.

  • GPT Decoder Batched interface and implementation
    cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h, cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp, cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h
    Updated the setup method and related parameter names from maxBatchSize to maxNumSequences in both the interface and the implementation. Adjusted internal usages and checks accordingly.

  • Python bindings
    cpp/tensorrt_llm/pybind/runtime/bindings.cpp
    Updated Python binding argument names from max_batch_size to max_num_sequences for the relevant methods in DecoderState and GptDecoderBatched. No functional changes; a minimal binding sketch follows this list.

  • Batch manager usage
    cpp/tensorrt_llm/batch_manager/createNewDecoderRequests.cpp
    Replaced usage of decoderState.getMaxBatchSize() with decoderState.getMaxNumSequences() in batch slot checks.

  • PyTorch executor sampler
    tensorrt_llm/_torch/pyexecutor/sampler.py
    Replaced all usages of self.executor_config.max_batch_size with self.max_num_sequences in tensor shape and algorithm setup logic within TRTLLMSampler.
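
As referenced in the Python bindings entry above, here is a minimal pybind11 sketch of the keyword-argument rename; the module and struct names are illustrative, and the real bindings expose far more methods and parameters:

    // Illustrative pybind11 sketch: the Python keyword becomes
    // max_num_sequences (previously max_batch_size).
    #include <pybind11/pybind11.h>

    namespace py = pybind11;

    struct DecoderStateSketch
    {
        int maxNumSequences{0};
        void setup(int numSequences) { maxNumSequences = numSequences; }
    };

    PYBIND11_MODULE(decoder_sketch, m)
    {
        py::class_<DecoderStateSketch>(m, "DecoderState")
            .def(py::init<>())
            // Python callers now write setup(max_num_sequences=...).
            .def("setup", &DecoderStateSketch::setup, py::arg("max_num_sequences"));
    }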

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PythonAPI
    participant DecoderState
    participant GptDecoder
    participant GptDecoderBatched

    User->>PythonAPI: Call setup(max_num_sequences, ...)
    PythonAPI->>DecoderState: setup(maxNumSequences, ...)
    PythonAPI->>GptDecoderBatched: setup(maxNumSequences, ...)
    PythonAPI->>GptDecoder: create(..., maxNumSequences, ...)
    DecoderState-->>PythonAPI: Initialized state for maxNumSequences
    GptDecoderBatched-->>PythonAPI: Batched decoder ready
    GptDecoder-->>PythonAPI: Decoder created for maxNumSequences

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • juney-nvidia
  • nv-guomingz
  • pcastonguay

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03632a6 and 01cf692.

📒 Files selected for processing (10)
  • cpp/include/tensorrt_llm/runtime/decoderState.h (4 hunks)
  • cpp/include/tensorrt_llm/runtime/gptDecoder.h (3 hunks)
  • cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (1 hunks)
  • cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (1 hunks)
  • cpp/tensorrt_llm/batch_manager/createNewDecoderRequests.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/runtime/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/runtime/decoderState.cpp (9 hunks)
  • cpp/tensorrt_llm/runtime/gptDecoder.cpp (8 hunks)
  • cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampler.py (3 hunks)
🔇 Additional comments (30)
cpp/tensorrt_llm/batch_manager/createNewDecoderRequests.cpp (1)

179-179: LGTM - Method call updated consistently.

The method call has been correctly updated from getMaxBatchSize() to getMaxNumSequences() to align with the systematic renaming across the codebase.

tensorrt_llm/_torch/pyexecutor/sampler.py (3)

547-547: LGTM - Tensor shape parameter updated correctly.

The shape parameter for sequence_lengths_host has been correctly updated to use self.max_num_sequences, maintaining consistency with the semantic renaming.


557-557: LGTM - DecoderState setup parameter updated correctly.

The decoder_state.setup() call has been correctly updated to use the max_num_sequences parameter name, aligning with the interface changes in the C++ components.


573-573: LGTM - Decoder setup parameter updated correctly.

The algs.decoder.setup() call has been correctly updated to use the max_num_sequences parameter name, maintaining consistency with the systematic renaming across the decoder interfaces.

cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h (1)

50-51: LGTM - Method signature parameter renamed consistently.

The setup method parameter has been correctly renamed from maxBatchSize to maxNumSequences while maintaining the same type and position, ensuring consistency with the systematic renaming across the decoder interfaces.

cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp (3)

75-76: LGTM - Method signature updated consistently.

The setup method signature has been correctly updated to use maxNumSequences parameter name, matching the header declaration and maintaining consistency with the systematic renaming.


79-79: LGTM - Validation check updated correctly.

The parameter validation check has been correctly updated to use the renamed maxNumSequences parameter while maintaining the same validation logic.


95-96: LGTM - Factory method call updated correctly.

The IGptDecoder::create call has been correctly updated to pass the renamed maxNumSequences parameter, maintaining consistency with the interface changes.

cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h (1)

89-90: LGTM - Interface method parameter renamed consistently.

The pure virtual setup method parameter has been correctly renamed from maxBatchSize to maxNumSequences in the interface, ensuring all implementing classes will use the consistent parameter naming throughout the codebase.
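
As a sketch of how the interface and implementation stay in lockstep after the rename (simplified, assumed signatures; the real setup() takes additional parameters):

    // Illustrative pure-virtual declaration and matching override.
    using SizeType32 = int;

    struct IGptDecoderBatched
    {
        virtual ~IGptDecoderBatched() = default;
        // was: setup(SizeType32 maxBatchSize, ...)
        virtual void setup(SizeType32 maxNumSequences, SizeType32 maxBeamWidth) = 0;
    };

    struct GptDecoderBatched : IGptDecoderBatched
    {
        void setup(SizeType32 maxNumSequences, SizeType32 maxBeamWidth) override
        {
            mMaxNumSequences = maxNumSequences;
            mMaxBeamWidth = maxBeamWidth;
        }

        SizeType32 mMaxNumSequences{0};
        SizeType32 mMaxBeamWidth{0};
    };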

cpp/tensorrt_llm/pybind/runtime/bindings.cpp (2)

329-333: LGTM! Parameter renaming maintains consistency with C++ interface.

The renaming from max_batch_size to max_num_sequences in the DecoderState setup methods aligns perfectly with the underlying C++ interface changes, ensuring consistent terminology across the Python bindings.


389-390: LGTM! GptDecoderBatched parameter renaming is consistent.

The parameter renaming in the GptDecoderBatched setup method maintains consistency with the C++ interface and follows the same pattern as the DecoderState changes.

cpp/include/tensorrt_llm/runtime/gptDecoder.h (4)

75-77: LGTM! Factory method parameter renaming improves semantic clarity.

The renaming from maxBatchSize to maxNumSequences in the IGptDecoder::create static method better represents the actual meaning of the parameter.


88-90: LGTM! Constructor parameter renaming maintains interface consistency.

The constructor parameter renaming aligns with the factory method changes and improves semantic accuracy.


118-121: LGTM! Member variable renaming ensures internal consistency.

The member variable renaming from mMaxBatchSize to mMaxNumSequences maintains consistency with the parameter changes and internal usage.


126-141: LGTM! Factory method implementation consistently updated.

The inline factory method implementation correctly updates the parameter name and all template instantiation calls to use maxNumSequences, maintaining consistency throughout the interface.

cpp/include/tensorrt_llm/runtime/decoderState.h (3)

54-56: LGTM! Setup method parameter renaming improves semantic accuracy.

The renaming from maxBatchSize to maxNumSequences in the setup method better represents the actual meaning of the parameter.


60-61: LGTM! Cache indirection setup parameter renaming maintains consistency.

The parameter renaming in setupCacheIndirection aligns with the main setup method and improves semantic clarity.


140-140: LGTM! Complete interface consistency achieved through systematic renaming.

The getter method renaming, member variable update, and comment modifications ensure complete consistency throughout the DecoderState class interface and implementation.

Also applies to: 193-197, 215-215, 219-222

cpp/tensorrt_llm/runtime/gptDecoder.cpp (6)

39-51: LGTM! Constructor implementation consistently updated.

The constructor parameter renaming and member initialization properly align with the header file changes, maintaining consistency throughout the implementation.


68-68: LGTM! Method implementation uses updated member variable.

The disableLookahead method correctly uses the renamed member variable mMaxNumSequences.


289-306: LGTM! Helper function consistently updated.

The prepareMedusaInputs function parameter renaming and usage maintain consistency with the overall changes.


415-415: LGTM! PrepareInputs function parameter consistently renamed.

The function parameter renaming maintains consistency with the overall changes.


502-502: LGTM! Function call uses correct parameter name.

The prepareMedusaInputs function call correctly uses the renamed parameter.


742-742: LGTM! Forward methods use updated member variable.

The forwardAsync and forwardSync methods correctly use the renamed member variable mMaxNumSequences in their prepareInputs calls.

Also applies to: 753-753

cpp/tensorrt_llm/runtime/decoderState.cpp (6)

56-65: LGTM! Setup method implementation consistently updated.

The method parameter renaming and function call maintain consistency with the header declaration.


199-297: LGTM! Comprehensive buffer reshaping implementation updated.

The reshapeBuffers method implementation correctly updates parameter naming, bounds checking, member variable assignment, and all tensor reshaping operations to use maxNumSequences terminology while maintaining identical logic.


300-322: LGTM! Cache indirection methods consistently updated.

The setupCacheIndirection method and related buffer reshaping operations correctly use the renamed parameter throughout.


342-441: LGTM! Speculative decoding buffer management consistently updated.

The reshapeSpeculativeDecodingBuffers method correctly uses mMaxNumSequences throughout all tensor reshaping operations for different speculative decoding modes while maintaining identical functionality.


568-571: LGTM! Getter method implementation properly updated.

The getter method renaming and return value maintain perfect consistency with the header declaration and member variable changes.


621-622: LGTM! Bounds checking uses correct member variable.

The bounds checking in both getter and setter methods correctly uses mMaxNumSequences while maintaining identical validation logic and error messages.

Also applies to: 627-628
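
To illustrate the kind of slot bounds check described above, here is a self-contained sketch; the actual code in decoderState.cpp uses TLLM_CHECK-style macros and different method names, so everything below is hypothetical:

    // Hypothetical per-slot accessor guarded by mMaxNumSequences
    // (was mMaxBatchSize before this PR).
    #include <stdexcept>
    #include <vector>

    class SlotTableSketch
    {
    public:
        explicit SlotTableSketch(int maxNumSequences)
            : mMaxNumSequences(maxNumSequences)
            , mGenerationSteps(maxNumSequences, 0)
        {
        }

        int get(int batchSlot) const
        {
            checkSlot(batchSlot);
            return mGenerationSteps[batchSlot];
        }

        void set(int batchSlot, int steps)
        {
            checkSlot(batchSlot);
            mGenerationSteps[batchSlot] = steps;
        }

    private:
        void checkSlot(int batchSlot) const
        {
            if (batchSlot < 0 || batchSlot >= mMaxNumSequences)
            {
                throw std::out_of_range("batchSlot out of range");
            }
        }

        int mMaxNumSequences;
        std::vector<int> mGenerationSteps;
    };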

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


Funatiq commented Aug 11, 2025

Changes were integrated in #6216.

@Funatiq Funatiq closed this Aug 11, 2025
@Funatiq Funatiq deleted the dev/fix_decoder_max branch August 11, 2025 09:46