[None][fix] Overlap: Skip last iter on length #5211
Conversation
Thanks for the change. How to test the change?
Force-pushed from 282cca7 to b596c3c
📝 Walkthrough
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15 minutes
Actionable comments posted: 1
🔭 Outside diff range comments (2)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)
1014-1031: Potential undefined variable error for ctx_transmission_reqs

The variable ctx_transmission_reqs is only defined inside the if self.kv_cache_transceiver condition (lines 1014-1016), but it is referenced unconditionally at line 1031. This will cause a NameError when kv_cache_transceiver is None. Initialize ctx_transmission_reqs before the conditional block:

    if scheduled_batch.batch_size > 0:
        if self.kv_cache_transceiver:
            # Return the first token to the client
            self._handle_first_token_response(scheduled_batch)

        previous_tensors_device = self.previous_batch and self.previous_batch.sample_state and self.previous_batch.sample_state.device

        batch_outputs = self._forward_step(scheduled_batch,
                                           previous_tensors_device)
        if self.previous_batch is not None:
            self._update_requests(self.previous_batch.sample_state)

        if self.guided_decoder is not None:
            self.guided_decoder.build(scheduled_batch)
            self.guided_decoder.execute(scheduled_batch,
                                        batch_outputs['logits'])

        sample_state = self._sample_async(scheduled_batch, batch_outputs)
        assert sample_state is not None, "Sampling failed"

        self._update_request_states(scheduled_batch)

+       ctx_transmission_reqs = []
        ctx_transmission_reqs = self._send_disagg_ctx_cache(
            scheduled_batch.context_requests
        ) if self.kv_cache_transceiver else []
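To make the failure mode concrete, here is a small self-contained sketch; the function names and list contents are illustrative stand-ins, not the real py_executor code:

def send_ctx_cache_unsafe(has_transceiver):
    # Bug pattern: the name is bound only inside the branch ...
    if has_transceiver:
        ctx_transmission_reqs = ["req0", "req1"]
    # ... so this line raises when the branch was skipped.
    return ctx_transmission_reqs


def send_ctx_cache_safe(has_transceiver):
    ctx_transmission_reqs = []  # initialize before the conditional
    if has_transceiver:
        ctx_transmission_reqs = ["req0", "req1"]
    return ctx_transmission_reqs


if __name__ == "__main__":
    print(send_ctx_cache_safe(False))          # -> []
    try:
        send_ctx_cache_unsafe(False)
    except NameError as err:                   # UnboundLocalError is a NameError
        print("unsafe variant raised:", err)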
946-1037: Add unit tests for _executor_loop_overlap behavior

Our search didn't uncover any existing tests targeting this overlap executor logic (sorting of generation requests, empty-batch handling, or previous-batch cleanup). To ensure these changes are correct and guard against regressions, please add dedicated tests covering:

- Sorting criteria: mixed generation_requests with and without py_batch_idx, varying py_decoding_iter and py_max_new_tokens, and assert the expected stable order (a sketch of such a test follows below).
- Empty-batch handling: invoke _executor_loop_overlap (or simulate its loop) with a batch of size 0 in overlap mode and verify it breaks/skips correctly without errors.
- Previous-batch cleanup: set up a non-empty previous_batch, then trigger conditions where it should be cleared (e.g., empty next batch or end of loop), and assert self.previous_batch is None.

Suggested location:
- tests/unittest/_torch/pyexecutor/test_executor_overlap.py
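As referenced in the first bullet, a minimal sketch of what the sorting-criteria test could look like, exercising only the sort key on stand-in request objects. The test name, the SimpleNamespace stand-ins, and the local copy of the key are assumptions; a real test would exercise the key through the actual executor:

from types import SimpleNamespace


def _overlap_sort_key(req):
    # Local copy of the sort key under review, kept here only for illustration.
    return int(req.py_batch_idx is not None
               and req.py_decoding_iter + 1 < req.py_max_new_tokens)


def _make_req(name, batch_idx, decoding_iter, max_new_tokens):
    return SimpleNamespace(name=name,
                           py_batch_idx=batch_idx,
                           py_decoding_iter=decoding_iter,
                           py_max_new_tokens=max_new_tokens)


def test_overlap_generation_request_sort_is_stable():
    reqs = [
        _make_req("mid_decode_a", 1, 2, 8),     # key 1: still has tokens to generate
        _make_req("no_batch_idx", None, 1, 8),  # key 0: no py_batch_idx
        _make_req("last_iter", 0, 7, 8),        # key 0: 7 + 1 == max_new_tokens
        _make_req("mid_decode_b", 2, 3, 8),     # key 1
    ]
    ordered = sorted(reqs, key=_overlap_sort_key)

    # Key-0 requests come first and key-1 requests last; relative order within
    # each group is preserved because sorted() is stable.
    assert [r.name for r in ordered] == [
        "no_batch_idx", "last_iter", "mid_decode_a", "mid_decode_b"
    ]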
🧹 Nitpick comments (2)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)
985-988: The new sorting condition may impact request ordering and execution

The sorting key now includes an additional condition, req.py_decoding_iter + 1 < req.py_max_new_tokens, which changes how generation requests are ordered. This ensures that requests that have not reached their maximum token limit are processed first, which aligns with the PR's goal of "skipping last iter on length". However, this change could affect the execution order of requests in mixed batches. Consider adding a comment to explain the sorting logic for future maintainability.

     scheduled_batch.generation_requests = sorted(  # stable sort
         scheduled_batch.generation_requests,
+        # Sort requests that haven't reached max tokens first to ensure proper batch ordering
         key=lambda req: int(req.py_batch_idx is not None and req.
                             py_decoding_iter + 1 < req.py_max_new_tokens),
     )
994-994: Line exceeds maximum length

This line is 150 characters long, exceeding the project's 120-character limit.

-            previous_tensors_device = self.previous_batch and self.previous_batch.sample_state and self.previous_batch.sample_state.device
+            previous_tensors_device = (
+                self.previous_batch and
+                self.previous_batch.sample_state and
+                self.previous_batch.sample_state.device
+            )
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2 hunks)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/py_executor.py
994-994: Line too long (150 > 120)
(E501)
🔇 Additional comments (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
1026-1033: Good defensive programming for empty batches

Setting previous_batch to None when batch_size is 0 prevents unnecessary processing in subsequent iterations and aligns with the PR's goal of skipping iterations when appropriate. This is a clean way to handle empty batches.
if scheduled_batch.batch_size > 0:
Remove redundant batch size check
This conditional check is redundant since it's already inside an if scheduled_batch.batch_size > 0: block starting at line 970. The nested check serves no purpose and reduces code readability.
- if scheduled_batch.batch_size > 0:
Committable suggestion skipped: line range outside the PR's diff.
Please address
🧩 Analysis chain
@pcastonguay You're absolutely right, and I can confirm this is indeed a redundant conditional check.
Looking at the code structure, there are two nested if scheduled_batch.batch_size > 0: conditions:

- Outer condition (line 970): if scheduled_batch.batch_size > 0:
- Inner condition (line 989): if scheduled_batch.batch_size > 0: (newly added)

The inner check is completely redundant since we're already inside a block that only executes when scheduled_batch.batch_size > 0 is true. This creates unnecessary nesting and reduces code readability.
Solution: Remove the redundant inner conditional check at line 989. The code inside that block should remain but be unindented by one level to align with the rest of the outer conditional block.
The fix would involve:
- Removing the line if scheduled_batch.batch_size > 0:
- Unindenting the subsequent code block that was inside this redundant condition

This will maintain the exact same functionality while improving code clarity and reducing unnecessary nesting depth.
🏁 Script executed:
#!/bin/bash
# Let's examine the specific section of code around the redundant conditional check
sed -n '965,995p' tensorrt_llm/_torch/pyexecutor/py_executor.py
Length of output: 1402
🏁 Script executed:
#!/bin/bash
# Let's get more context around line 970 to see the outer conditional block structure
sed -n '968,1020p' tensorrt_llm/_torch/pyexecutor/py_executor.py
Length of output: 2436
@pcastonguay After examining the code more carefully, I can see that the issue is in the _executor_loop_overlap
method, not the _executor_loop
method I was initially looking at. Let me get the correct context for the redundant conditional check.
🏁 Script executed:
#!/bin/bash
# Let's look specifically at the _executor_loop_overlap method where the redundant check is located
grep -n "_executor_loop_overlap" tensorrt_llm/_torch/pyexecutor/py_executor.py -A 100 | head -150
Length of output: 7082
🏁 Script executed:
#!/bin/bash
# Let's examine the specific line range around 970-1000 in the _executor_loop_overlap method
awk 'NR >= 950 && NR <= 1000' tensorrt_llm/_torch/pyexecutor/py_executor.py
Length of output: 2317
@pcastonguay Perfect! Now I can see the exact issue. Looking at the _executor_loop_overlap method, there are indeed two nested conditional checks for scheduled_batch.batch_size > 0:

- Line 1138: if scheduled_batch.batch_size > 0: (outer condition)
- Line 1147: if scheduled_batch.batch_size > 0: (inner condition)
And from the diff, I can see that the change adds yet another nested check at what appears to be around line 989 in the diff context.
The structure shows:
- Line 1138: First if scheduled_batch.batch_size > 0: block
- Line 1147: Second if scheduled_batch.batch_size > 0: block (redundant)
- The diff adds a third nested check
This creates unnecessary nesting and makes the code harder to read. The inner conditional checks are completely redundant since the code is already inside a block that only executes when scheduled_batch.batch_size > 0
is true.
Recommendation: Remove the redundant inner conditional check(s) and unindent the code accordingly to maintain the same functionality while improving readability.
🏁 Script executed:
#!/bin/bash
# Let's get more lines from the _executor_loop_overlap method to see the redundant check
grep -n "_executor_loop_overlap" tensorrt_llm/_torch/pyexecutor/py_executor.py -A 200 | grep -A 50 "scheduled_batch.batch_size > 0"
Length of output: 6562
🏁 Script executed:
#!/bin/bash
# Let's look at the specific area around lines 1138-1200 in the _executor_loop_overlap method
awk 'NR >= 1138 && NR <= 1200' tensorrt_llm/_torch/pyexecutor/py_executor.py
Length of output: 3613
@pcastonguay Perfect! Now I can clearly see the issue. Looking at the _executor_loop_overlap method, there are three redundant nested conditional checks for scheduled_batch.batch_size > 0:

- Line 1138: if scheduled_batch.batch_size > 0: (first, outer condition)
- Line 1147: if scheduled_batch.batch_size > 0: (second, redundant condition)
- Line 1160: if scheduled_batch.batch_size > 0: (third, redundant condition - this is the one highlighted in the diff)
- Line 1200: if scheduled_batch.batch_size > 0: (fourth, redundant condition)
This creates deeply nested code that's hard to read and maintain. All the inner conditions are redundant since we're already inside the first conditional block.
Here's what needs to be fixed:
- Remove the redundant checks at lines 1147, 1160, and 1200
- Unindent the code blocks that were inside these redundant conditions
- Merge the logic into the single outer conditional at line 1138
The corrected structure should be:
if scheduled_batch.batch_size > 0:
    # All the functionality from the nested blocks should be here
    # without additional batch_size > 0 checks

    if self.kv_cache_transceiver:
        self._prepare_disagg_gen_transmission_complete(scheduled_batch)

    self.resource_manager.prepare_resources(scheduled_batch)
    self._kv_connector_start_batch(scheduled_batch)

    # Sorting logic
    scheduled_batch.generation_requests = sorted(
        scheduled_batch.generation_requests,
        key=lambda req: int(req.py_batch_idx is not None and
                            req.py_decoding_iter + 1 < req.py_max_new_tokens)
    )

    # First token response and forward step logic
    if self.kv_cache_transceiver:
        self._handle_first_token_response(scheduled_batch)

    # ... rest of the logic without nested batch_size checks

    self.previous_batch = BatchState(
        sample_state=sample_state,
        iter_start_time=iter_start_time,
        iter_stats=iter_stats,
        ctx_transmission_reqs=ctx_transmission_reqs
    )
else:
    self.previous_batch = None
This will eliminate the redundant nesting and make the code much more readable while maintaining exactly the same functionality.
It is not redundant, the batch size can be set to 0 because of the added check py_decoding_iter + 1 < req.py_max_new_tokens, so we need to redo the if statement.
Mmm, you're changing the key used to sort the generation requests, but that won't change the size of scheduled_batch.generation_requests. So batch_size should stay the same unless I'm missing something.
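As a quick illustration of this point, sorting alone re-orders a list but never changes its length; the toy request objects below are stand-ins, not the real LlmRequest class:

from types import SimpleNamespace

# Toy generation requests; only the fields used by the sort key matter here.
reqs = [
    SimpleNamespace(py_batch_idx=0, py_decoding_iter=7, py_max_new_tokens=8),
    SimpleNamespace(py_batch_idx=None, py_decoding_iter=1, py_max_new_tokens=8),
    SimpleNamespace(py_batch_idx=1, py_decoding_iter=2, py_max_new_tokens=8),
]

sorted_reqs = sorted(
    reqs,
    key=lambda req: int(req.py_batch_idx is not None and
                        req.py_decoding_iter + 1 < req.py_max_new_tokens),
)

# The key only affects ordering; the number of scheduled requests is unchanged.
assert len(sorted_reqs) == len(reqs)
print(len(reqs), len(sorted_reqs))  # 3 3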
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
989-989: Remove redundant batch size check.

This conditional check is redundant since it's already inside an if scheduled_batch.batch_size > 0: block starting at line 970. The nested check serves no purpose and reduces code readability.

-            if scheduled_batch.batch_size > 0:
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
983-987: LGTM! Sorting logic correctly implements last-iteration optimization.

The addition of req.py_decoding_iter + 1 < req.py_max_new_tokens to the sorting key effectively identifies requests that are not on their last iteration, which aligns with the PR objective to "skip last iter on length". However, there's a line-length issue that should be addressed:

-            key=lambda req:
-            int(req.py_batch_idx is not None and req.
-                py_decoding_iter + 1 < req.py_max_new_tokens),
+            key=lambda req: int(
+                req.py_batch_idx is not None and
+                req.py_decoding_iter + 1 < req.py_max_new_tokens
+            ),
🔇 Additional comments (2)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)
990-1016: LGTM! Control flow reorganization improves efficiency.

The reorganization to execute expensive operations only when batch_size > 0 is a good optimization. This avoids unnecessary work when there are no requests to process, including:
- KV cache transceiver operations
- Forward step execution
- Guided decoding and sampling
- Context cache transmission
The variable scoping is correct -
previous_tensors_device
is properly initialized within the conditional where it's needed.
1026-1033: LGTM! Conditional batch state assignment completes the optimization.

The conditional assignment of self.previous_batch is consistent with the overall optimization strategy:

- When batch_size > 0: creates a proper BatchState with sampling results to carry forward
- When batch_size == 0: sets it to None to avoid carrying forward empty state

This ensures that the "skip last iter on length" optimization is properly reflected in the state management, preventing unnecessary state propagation when no processing occurred.
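A compact, purely illustrative sketch of this state-management pattern; the names and the toy loop are assumptions, not the real executor code:

def run_toy_overlap_loop(batch_sizes):
    """Toy loop: only non-empty batches do work or become the carried-over state."""
    previous_batch = None
    forward_calls = 0

    for size in batch_sizes:
        if size > 0:
            forward_calls += 1          # stand-in for the expensive forward step
            previous_batch = list(range(size))
        else:
            previous_batch = None       # do not carry stale state past an empty batch

    return forward_calls, previous_batch


if __name__ == "__main__":
    # Three scheduled iterations, one of them empty: only two forward calls,
    # and the final carried state comes from the last non-empty batch.
    print(run_toy_overlap_loop([3, 0, 2]))  # -> (2, [0, 1])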
@IzzyPutterman is this PR still relevant? If so, could you rebase and resolve conflicts?
Force-pushed from 42e9603 to 3ca3494
Yes, I believe this is still relevant; we are doing extra work in some of these cases when it's not needed.
if scheduled_batch.batch_size > 0:
Please address
iter_start_time=iter_start_time,
iter_stats=iter_stats,
ctx_transmission_reqs=ctx_transmission_reqs)
if scheduled_batch.batch_size > 0:
Again, this seems redundant since it's already inside an if scheduled_batch.batch_size > 0.
Same as above, the scheduled_batch.generation_requests reassignment can reduce BS to 0 when already inside the top-level conditional.
key=lambda req:
int(req.py_batch_idx is not None and req.
    py_decoding_iter + 1 < req.py_max_new_tokens),
Could we add a test to verify this is behaving as expected? Maybe check the per iteration stats?
/bot run
PR_Github #17820 [ run ] triggered by Bot
PR_Github #17820 [ run ] completed with state
/bot run
PR_Github #17846 [ run ] triggered by Bot
PR_Github #17846 [ run ] completed with state
/bot run
PR_Github #17892 [ run ] triggered by Bot
PR_Github #17892 [ run ] completed with state
Force-pushed from 6d9c6a8 to 2f2b8e3
/bot run
PR_Github #17895 [ run ] triggered by Bot
/bot run
PR_Github #17950 [ run ] triggered by Bot
PR_Github #17950 [ run ] completed with state
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Signed-off-by: Izzy Putterman <[email protected]>
Force-pushed from 73ad66d to 0dcc979
/bot run
PR_Github #18082 [ run ] triggered by Bot
PR_Github #18082 [ run ] completed with state
/bot run
PR_Github #18091 [ run ] triggered by Bot
PR_Github #18091 [ run ] completed with state
/bot run
PR_Github #18099 [ run ] triggered by Bot
PR_Github #18099 [ run ] completed with state
Signed-off-by: Izzy Putterman <[email protected]>
/bot run
PR_Github #18110 [ run ] triggered by Bot
PR_Github #18110 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #18218 [ run ] triggered by Bot
/bot run
PR_Github #18401 [ run ] triggered by Bot
PR_Github #18401 [ run ] completed with state