[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 #6537

yifeizhang-c · 2025-08-01T03:41:33Z

Description

Under disagg, if the generation server runs with bs == 1, the dummy GENERATION_IN_PROGRESS request added for attention DP prevents DISAGG_GENERATION_INIT requests from being scheduled, which leaves the generation server running in an endless cycle. This PR enlarges the scheduler capacity and related resources so that the capacity is at least 2.

In addition, the py_executor logic originally assigned a new SEQ_SLOT resource in the DISAGG_GEN_INIT state. This PR delays the assignment until the DISAGG_TRANS_COMPLETE state.

Summary by CodeRabbit

Bug Fixes
- Resolved a scheduling stall that could occur with single-sequence generation when using KV cache by adjusting capacity and sampler sizing, improving stability.
- Prevented initialization-only requests in disaggregated generation from blocking sequence-slot assignment, enhancing throughput and reliability.
Tests
- Added an integration test for disaggregated generation with batch size 1 (TinyLlama) and included it in the l0_dgx_h100 pre-merge test suite.

coderabbitai · 2025-08-01T03:41:40Z

Walkthrough

Adds conditional +1 to max_num_sequences in PyTorch executor/sampler when gen batch size is 1 with KV cache present, skips DISAGG_GENERATION_INIT requests during sequence-slot assignment in C++ scheduler, and introduces a new integration test and test-list entry for gen batch size 1 (TinyLlama).

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Disaggregated scheduling and executor adjustments `cpp/tensorrt_llm/batch_manager/assignReqSeqSlots.cpp`, `tensorrt_llm/_torch/pyexecutor/_util.py` | C++: Skip sequence-slot assignment for requests in DISAGG_GENERATION_INIT. Python: When max_num_sequences == 1 and KV cache is used, increment it by 1 in create_py_executor_instance and create_torch_sampler_args. |
| Integration tests for gen batch size 1 `tests/integration/defs/disaggregated/test_disaggregated.py`, `tests/integration/test_lists/test-db/l0_dgx_h100.yml` | Add test_disaggregated_genbs1 with TinyLlama and corresponding entry in l0_dgx_h100 pre_merge PyTorch list. |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Scheduler
  participant AssignReqSeqSlots
  Client->>Scheduler: Submit requests
  Scheduler->>AssignReqSeqSlots: Assign sequence slots
  loop For each request
    AssignReqSeqSlots->>AssignReqSeqSlots: Check state
    alt state == DISAGG_GENERATION_INIT
      AssignReqSeqSlots-->>AssignReqSeqSlots: continue (skip assignment)
    else
      AssignReqSeqSlots->>AssignReqSeqSlots: Evaluate isReqNew / acquire slot
    end
  end
  AssignReqSeqSlots-->>Scheduler: Assigned slots (excluding INIT)

sequenceDiagram
  participant Caller
  participant PyExecUtil
  Caller->>PyExecUtil: create_py_executor_instance(max_num_sequences, kv_cache_mgr)
  alt max_num_sequences==1 and kv_cache_mgr present
    PyExecUtil->>PyExecUtil: max_num_sequences += 1
  end
  PyExecUtil-->>Caller: Executor instance

  Caller->>PyExecUtil: create_torch_sampler_args(max_num_sequences, kv_cache_cfg)
  alt max_num_sequences==1 and kv_cache_cfg present
    PyExecUtil->>PyExecUtil: max_num_sequences += 1
  end
  PyExecUtil-->>Caller: Sampler args
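
The sizing rule shown in the second diagram can be sketched in a few lines of Python. This is only an illustration of the rule as the walkthrough describes it, not the code in `_util.py`: the function name `bump_max_num_sequences` and its `has_kv_cache` flag are hypothetical.

```python
def bump_max_num_sequences(max_num_sequences: int, has_kv_cache: bool) -> int:
    """Return the sequence capacity after the bs == 1 adjustment.

    Hypothetical helper that mirrors the walkthrough: a capacity of 1 is
    raised to 2 when a KV cache is in use, and left alone otherwise.
    """
    if max_num_sequences == 1 and has_kv_cache:
        return max_num_sequences + 1
    return max_num_sequences


print(bump_max_num_sequences(1, True))   # 2
print(bump_max_num_sequences(1, False))  # 1
print(bump_max_num_sequences(4, True))   # 4
```

Applying the same function at both call sites (executor and sampler) would keep the two capacities equal, which is the property the diagram relies on.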

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related PRs

feat: Add support for disaggregation with pp with pytorch backend #6369: Also changes disaggregated PyTorch serving paths (KV-cache/pyexecutor and related tests), likely aligned with this scheduling/executor adjustment.

Suggested reviewers

Shixiaowei02
Tabrizian
pcastonguay
mikeiovine
yilin-void

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

tensorrt_llm/_torch/pyexecutor/_util.py (1)

565-568: Same adjustment duplicated here – apply the helper for consistency

This block repeats the logic discussed above. Once a helper like _effective_max_num_sequences() is introduced, replace the open-coded calculation here to guarantee both the sampler and the slot/scheduler stay in lock-step.

🧹 Nitpick comments (1)

tensorrt_llm/_torch/pyexecutor/_util.py (1)
509-513: Factor out the “extra-capacity” formula to avoid duplication and future drift

The same `if executor_config.max_batch_size == 1: max_num_sequences += mapping.pp_size` pattern now lives in two different code paths (`create_py_executor_instance` and `create_torch_sampler_args`). If this heuristic ever needs tuning (e.g. the required headroom changes again), the risk of the two locations diverging is high.

Consider extracting a small helper, e.g.
def _effective_max_num_sequences(max_batch_size: int, pp_size: int) -> int:
    cap = max_batch_size * pp_size
    if max_batch_size == 1:
        cap += pp_size
    return cap
and reuse it in both call sites.
This keeps the policy in one place, makes the intent explicit, and simplifies unit testing for edge cases (pp_size > 1, max_batch_size > 1, etc.).

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4472f11 and 4ea8dff.

📒 Files selected for processing (1)

tensorrt_llm/_torch/pyexecutor/_util.py (2 hunks)

yifeizhang-c · 2025-08-05T05:37:53Z

/bot run

tensorrt-cicd · 2025-08-05T05:43:05Z

PR_Github #14084 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-05T06:19:14Z

PR_Github #14084 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #10628 completed with status: 'FAILURE'

yifeizhang-c · 2025-08-05T06:35:15Z

/bot run

tensorrt-cicd · 2025-08-05T06:40:42Z

PR_Github #14096 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-05T08:57:20Z

PR_Github #14096 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10636 completed with status: 'FAILURE'

yifeizhang-c · 2025-08-05T09:24:04Z

/bot run

tensorrt-cicd · 2025-08-05T09:29:21Z

PR_Github #14131 [ run ] triggered by Bot

yifeizhang-c · 2025-08-05T09:29:28Z

@Shixiaowei02 @qiaoxj07 Can you also help take a look at the current PR?

tensorrt-cicd · 2025-08-05T14:52:05Z

PR_Github #14131 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10661 completed with status: 'FAILURE'

yifeizhang-c · 2025-08-13T02:39:55Z

/bot run

tensorrt-cicd · 2025-08-13T02:45:39Z

PR_Github #15051 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-13T04:51:45Z

PR_Github #15051 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11368 completed with status: 'SUCCESS'

Tabrizian · 2025-08-13T19:45:49Z

@yifeizhang-c Do you mind responding to the question here? #6537 (comment)

Increasing max_num_sequences by 1 when ADP is enabled is like a workaround, no?

yifeizhang-c · 2025-08-14T07:01:57Z

> @yifeizhang-c Do you mind responding to the question here? #6537 (comment)
>
> Increasing max_num_sequences by 1 when ADP is enabled is like a workaround, no?

Increasing max_num_sequences by 1 can be regarded as a workaround.

Sorry for not replying to the previous comment in time. The suggestion would indeed make the code changes more concise, and I was still looking into it.

To resolve the disagg bs == 1 non-stop issue, increasing the CapacityScheduler's capacity to at least 2 seems necessary, because the active request pool holds both DISAGG_GENERATION_INIT and GENERATION_IN_PROGRESS requests. If the CapacityScheduler's capacity is only 1, then by request scheduling priority GENERATION_IN_PROGRESS is scheduled first, since it is closer to the final GENERATION_COMPLETE status. DISAGG_GEN_INIT therefore stays unscheduled forever, and the generation workers never start. The CapacityScheduler currently cannot distinguish a dummy GENERATION_IN_PROGRESS request from a normal one.
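
The starvation described above can be reproduced with a toy model. The sketch below is not the CapacityScheduler; the `PRIORITY` table, the `Request` class, and both functions are hypothetical. It only assumes what the comment states: GENERATION_IN_PROGRESS is preferred over DISAGG_GENERATION_INIT, and a fresh dummy request is present in every iteration.

```python
from dataclasses import dataclass

# Hypothetical priority order: a lower number is scheduled first.
PRIORITY = {"GENERATION_IN_PROGRESS": 0, "DISAGG_GENERATION_INIT": 1}


@dataclass
class Request:
    request_id: int
    state: str
    is_dummy: bool = False


def schedule(active: list[Request], capacity: int) -> list[Request]:
    """Pick up to `capacity` requests, preferring states closer to completion."""
    ordered = sorted(active, key=lambda r: PRIORITY[r.state])
    return ordered[:capacity]


def init_ever_scheduled(capacity: int, iterations: int = 5) -> bool:
    """Replay a few iterations in which a dummy request is always present."""
    init_request = Request(1, "DISAGG_GENERATION_INIT")
    for step in range(iterations):
        dummy = Request(1000 + step, "GENERATION_IN_PROGRESS", is_dummy=True)
        scheduled = schedule([init_request, dummy], capacity)
        if init_request in scheduled:
            return True
    return False


print(init_ever_scheduled(capacity=1))  # False: the dummy takes the only slot
print(init_ever_scheduled(capacity=2))  # True: both requests fit
```

Note that `schedule` never looks at `is_dummy`, which matches the last point of the comment: the toy scheduler treats dummy and real in-progress requests identically.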

Signed-off-by: Yifei Zhang <[email protected]>

yifeizhang-c · 2025-08-14T16:47:45Z

Updated the logic to move SEQ_SLOT assignment from the DISAGG_GEN_INIT stage to the DISAGG_GEN_TRANS_COMPLETE stage. The current solution resolves the bs == 1 non-stop issue, but I am not sure whether this workflow change will lead to other issues.

yifeizhang-c · 2025-08-14T16:48:01Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-14T16:53:02Z

PR_Github #15326 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

cpp/tensorrt_llm/batch_manager/assignReqSeqSlots.cpp (1)
35-36: Nit: punctuate the comment and optionally add a trace log to aid debugging.

A terminal period aligns with our style. The trace log makes it easier to confirm that INIT requests are indeed skipped at runtime.

Apply:
-                // Skip assigning sequence slot for DISAGG_GENERATION_INIT request
-                continue;
+                // Skip assigning sequence slot for DISAGG_GENERATION_INIT request.
+                TLLM_LOG_TRACE("Skipping seq-slot assignment for DISAGG_GENERATION_INIT request ID %lu", llmReq->mRequestId);
+                continue;

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6852001 and 14eecb5.

📒 Files selected for processing (1)

cpp/tensorrt_llm/batch_manager/assignReqSeqSlots.cpp (1 hunks)

🔇 Additional comments (1)

cpp/tensorrt_llm/batch_manager/assignReqSeqSlots.cpp (1)

33-37: Skipping DISAGG_GENERATION_INIT from seq-slot assignment — verified safe

Short summary: I inspected the call flow and usages of mSeqSlot / perf timestamps. Skipping seq-slot assignment for DISAGG_GENERATION_INIT is intentional and safe — no downstream code path reads mSeqSlot or relies on firstScheduledTime while a request is in the INIT state; slots are assigned later when transmission completes.

Relevant findings (key locations inspected)

cpp/tensorrt_llm/batch_manager/assignReqSeqSlots.cpp (lines ~29–46): the continue for isDisaggGenerationInitState() is present; setFirstScheduledTime() is only called for isReqNew (context init or isDisaggGenerationTransmissionComplete).

cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp:

capacity + micro-batching flow: capacity scheduler returns fittingRequests and fittingDisaggGenInitRequests; prepareDisaggGenInitRequests(...) handles KV transfer for DISAGG_GENERATION_INIT (no seq-slot use) and mAssignReqSeqSlots is called only for microBatchScheduler output (currRequests) — see the calls around lines ~1035 and ~1064, and prepareDisaggGenInitRequests around ~1520–1620.

cpp/tensorrt_llm/batch_manager/sequenceSlotManager.cpp: getSequenceSlot(startFlag, sequenceId) only assigns on startFlag; freeSequenceSlot / freeIdleSequenceSlots manage lifetimes.

Code paths that dereference mSeqSlot (e.g., createNewDecoderRequests.cpp, makeDecodingBatchInputOutput.cpp, runtime/gptDecoderBatched.cpp, runtime/decoderState.cpp, transformerBuffers.cpp, handleGenerationLogits.cpp, runtimeBuffers.cpp) operate on requests produced by the microBatchScheduler (scheduled context/generation requests) — i.e., requests that should have a slot by the time those functions run.

Perf metrics: setFirstScheduledTime is invoked only when a slot is newly assigned (isReqNew && getReturnPerfMetrics()); firstScheduledTime is used for reporting/serialization (executor serialization, triton reporting), not for control flow that would break if absent during INIT.

Python-side mirror: tensorrt_llm/_torch/pyexecutor/seq_slot_manager.py also explicitly skips DISAGG_GENERATION_INIT, and assigns slot + first-scheduled-time later (consistent behavior).

Integration test & scheduler guard: the repo contains test_disaggregated_genbs1 and the Python-side scheduler_capacity bump for bs==1 (tensorrt_llm/_torch/pyexecutor/_util.py) to prevent scheduler deadlocks — complementary to this change.

Conclusion / action

No code changes required here; the continue is correct and safe. Approve this change.
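
As a rough illustration of the behavior verified above, the following Python sketch skips requests that are still in the INIT state and hands out a slot only once the request has moved on. It is a toy model under stated assumptions, not the `assignReqSeqSlots.cpp` or `seq_slot_manager.py` code: `SeqSlotPool`, `assign_seq_slots`, and the state strings are hypothetical labels.

```python
class SeqSlotPool:
    """Toy slot pool; a stand-in for a real sequence slot manager."""

    def __init__(self, num_slots: int) -> None:
        self.free = list(range(num_slots))
        self.assigned: dict[int, int] = {}

    def acquire(self, request_id: int) -> int:
        # A request keeps the slot it already holds; new requests take a free one.
        if request_id not in self.assigned:
            self.assigned[request_id] = self.free.pop(0)
        return self.assigned[request_id]


def assign_seq_slots(requests: list[tuple[int, str]], pool: SeqSlotPool) -> dict[int, int]:
    """Map request id to slot, skipping requests that are still in INIT."""
    slots: dict[int, int] = {}
    for request_id, state in requests:
        if state == "DISAGG_GENERATION_INIT":
            # KV transfer is still pending, so no slot is taken yet.
            continue
        slots[request_id] = pool.acquire(request_id)
    return slots


pool = SeqSlotPool(num_slots=2)
first = assign_seq_slots(
    [(7, "DISAGG_GENERATION_INIT"), (8, "GENERATION_IN_PROGRESS")], pool)
print(first)   # {8: 0}
second = assign_seq_slots(
    [(7, "DISAGG_GENERATION_TRANS_COMPLETE"), (8, "GENERATION_IN_PROGRESS")], pool)
print(second)  # {7: 1, 8: 0}
```

In the first call request 7 holds no slot, so a pool of size 1 would still have room for the in-progress request; the slot is consumed only in the second call, after the transfer is marked complete.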

Tabrizian · 2025-08-14T17:19:26Z

Thanks @yifeizhang-c, I think the pytorch runtime doesn't use the C++ batch scheduler (feel free to correct me). I think the reason that _schedule function doesn't schedule the generation init requests is that as soon as it hits the max batch size, it would stop looking at the other requests.

In this case, it would just schedule the dummy request and never reach the generation init request since there is a dummy request added in every iteration.

Here is a draft of the scheduler changes which I think is needed for this: https://github.com/NVIDIA/TensorRT-LLM/compare/main...Tabrizian:TensorRT-LLM:user/imant/schedulerChange?expand=1 Please let me know if you have any feedback.
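
The early-exit behavior hypothesized in this comment can be sketched as follows. This is not the `_schedule` function and not the linked draft; `scan_until_full` is a hypothetical name, and the sketch assumes the dummy request sits ahead of the generation init request in queue order.

```python
def scan_until_full(queue: list[str], max_batch_size: int) -> tuple[list[str], list[str]]:
    """Take requests in queue order and stop scanning once the batch is full.

    Returns the scheduled requests and the requests that were never inspected.
    """
    scheduled: list[str] = []
    for index, name in enumerate(queue):
        if len(scheduled) >= max_batch_size:
            # Everything from here on is not looked at in this iteration.
            return scheduled, queue[index:]
        scheduled.append(name)
    return scheduled, []


scheduled, unvisited = scan_until_full(["dummy_gen", "gen_init"], max_batch_size=1)
print(scheduled)  # ['dummy_gen']
print(unvisited)  # ['gen_init']
```

If a new dummy request is placed at the front of the queue in every iteration, `gen_init` ends up in the unvisited list each time, which is the stall described above.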

tensorrt-cicd · 2025-08-14T23:45:37Z

PR_Github #15326 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11567 completed with status: 'FAILURE'

yifeizhang-c · 2025-08-15T03:44:30Z

> I think the pytorch runtime doesn't use the C++ batch scheduler (feel free to correct me).

Currently py_executor uses BindCapacityScheduler, which eventually invokes C++ GuaranteedNoEvictScheduler.

> I think the reason that _schedule function doesn't schedule the generation init requests is that as soon as it hits the max batch size, it would stop looking at the other requests.
>
> In this case, it would just schedule the dummy request and never reach the generation init request since there is a dummy request added in every iteration.
>
> Here is a draft of the scheduler changes which I think is needed for this: https://github.com/NVIDIA/TensorRT-LLM/compare/main...Tabrizian:TensorRT-LLM:user/imant/schedulerChange?expand=1 Please let me know if you have any feedback.

I am not clear about the logic behind the implementation of GuaranteedNoEvictScheduler. But I wonder whether we can schedule DISAGG_GEN_INIT totally separately from GEN_IN_PROGRESS. Will there be cases where the (N-1)-th request is still in GEN_IN_PROGRESS and we received the N-th request’s DISAGG_GEN_INIT, and will scheduling DISAGG_GEN_INIT cause trouble in this case?

yifeizhang-c · 2025-08-15T04:06:47Z

/bot run

tensorrt-cicd · 2025-08-15T04:12:00Z

PR_Github #15394 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-15T09:51:42Z

PR_Github #15394 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #11603 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

coderabbitai bot reviewed Aug 1, 2025

yifeizhang-c marked this pull request as ready for review August 1, 2025 04:31

yifeizhang-c requested a review from a team as a code owner August 1, 2025 04:31

yifeizhang-c requested a review from schetlur-nv August 1, 2025 04:31

yifeizhang-c force-pushed the dev-yifeiz-fix-bs1-nonstop branch 3 times, most recently from ca73d42 to 69a5319 Compare August 1, 2025 06:29

yifeizhang-c changed the title ~~Enlarge scheduler and slot manager capacity under disagg bs == 1~~ https://nvbugs/5394392: Enlarge scheduler and slot manager capacity under disagg bs == 1 Aug 1, 2025

yifeizhang-c changed the title ~~https://nvbugs/5394392: Enlarge scheduler and slot manager capacity under disagg bs == 1~~ [https://nvbugs/5394392] [Fix] Enlarge scheduler and slot manager capacity under disagg bs == 1 Aug 1, 2025

yifeizhang-c changed the title ~~[https://nvbugs/5394392] [Fix] Enlarge scheduler and slot manager capacity under disagg bs == 1~~ [https://nvbugs/5394392][fix] Enlarge scheduler and slot manager capacity under disagg bs == 1 Aug 4, 2025

coderabbitai bot requested review from mikeiovine, pcastonguay, wili-65535 and yuxianq August 4, 2025 05:36

yifeizhang-c force-pushed the dev-yifeiz-fix-bs1-nonstop branch 3 times, most recently from be42893 to 4f68507 Compare August 5, 2025 05:36

yifeizhang-c force-pushed the dev-yifeiz-fix-bs1-nonstop branch from 4f68507 to 00f167f Compare August 5, 2025 06:34

yifeizhang-c force-pushed the dev-yifeiz-fix-bs1-nonstop branch from 00f167f to f8cc579 Compare August 5, 2025 09:18

Move seq slot resources init to disagg trans complete stage

14eecb5

Signed-off-by: Yifei Zhang <[email protected]>

coderabbitai bot reviewed Aug 14, 2025

yifeizhang-c requested a review from Shixiaowei02 August 15, 2025 02:54

yifeizhang-c changed the title ~~[https://nvbugs/5394392][fix] Enlarge scheduler and slot manager capacity under disagg bs == 1~~ [https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 Aug 15, 2025

schetlur-nv approved these changes Aug 15, 2025

schetlur-nv merged commit 4127d77 into NVIDIA:main Aug 15, 2025
7 checks passed

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025

[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg…

b1098df

… bs == 1 (NVIDIA#6537) Signed-off-by: Yifei Zhang <[email protected]>

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025

[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg…

665af09

… bs == 1 (NVIDIA#6537) Signed-off-by: Yifei Zhang <[email protected]>

yifeizhang-c added a commit to yifeizhang-c/TensorRT-LLM that referenced this pull request Aug 18, 2025

[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg…

35e3f0b

… bs == 1 (NVIDIA#6537) Signed-off-by: Yifei Zhang <[email protected]>

coderabbitai bot mentioned this pull request Aug 18, 2025

[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 #6975

Merged

dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 18, 2025

[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg…

8316274

… bs == 1 (NVIDIA#6537) Signed-off-by: Yifei Zhang <[email protected]>

coderabbitai bot mentioned this pull request Aug 18, 2025

[https://nvbugs/5440521][bug] Fix sequence slot allocation for attention DP #6878

Closed
