[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 #6735
Conversation
📝 Walkthrough

Adds nixl-backed accuracy tests, a disaggregated benchmarking workflow (fixtures, helpers, and a parameterized benchmark test), updates multiple test lists and test-db exclusions, and makes benchmark_serving.py's metrics construction tolerant of missing keys to avoid KeyError. Some inserted blocks are duplicated in the diffs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test Runner
    participant Disagg as Disaggregated Server
    participant MPI as MPI Workers
    participant Bench as benchmark_serving.py
    Runner->>Disagg: Start disaggregated server (config: nixl or ucx)
    Runner->>MPI: Launch 2 MPI workers
    Runner->>Bench: Warm up client
    Runner->>Bench: Run benchmark_serving.py
    Bench->>Disagg: Send inference requests
    Disagg->>MPI: Dispatch to workers
    MPI-->>Disagg: Return responses
    Disagg-->>Bench: Aggregate metrics
    Bench-->>Runner: Emit Median E2EL and Median TTFT
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
/bot run --add-multi-gpu-test
PR_Github #14571 [ run ] triggered by Bot
PR_Github #14571 [ run ] completed with state
31e80c8 to 2b87930 (Compare)
/bot run --add-multi-gpu-test
PR_Github #14713 [ run ] triggered by Bot
Actionable comments posted: 8
🧹 Nitpick comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
23-24: Import style: prefer namespace imports per guidelines

Guideline: “Always maintain the namespace when importing in Python.” Consider importing the module and using qualified names.
Example:

```diff
-from defs.trt_test_alternative import check_call, check_output, popen
+import defs.trt_test_alternative as trt_test_alt
```

Outside this hunk, calls would become:

```python
trt_test_alt.check_call(...)
trt_test_alt.check_output(...)
trt_test_alt.popen(...)
```
1064-1069: Dataset path is unused for random dataset; optionally drop it

You pass --dataset-name random, so --dataset-path is ignored. Removing it reduces environment dependencies. Apply together with the benchmark_cmd change below (lines 1148-1152).
1139-1142: Verify benchmark script path before running

Avoid confusing subprocess errors if the script path is wrong.

```diff
-        benchmark_script = os.path.join(benchmark_root,
-                                        "benchmark_serving.py")
+        benchmark_script = os.path.join(benchmark_root, "benchmark_serving.py")
+        if not os.path.exists(benchmark_script):
+            pytest.skip(f"benchmark_serving.py not found at: {benchmark_script}")
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- tensorrt_llm/serve/scripts/benchmark_serving.py (1 hunks)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (2 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
- tests/integration/test_lists/test-db/l0_dgx_h100.yml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/serve/scripts/benchmark_serving.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/serve/scripts/benchmark_serving.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/test_lists/qa/llm_function_sanity.txt
tests/integration/test_lists/qa/llm_function_full.txt
tests/integration/test_lists/test-db/l0_dgx_h100.yml
tests/integration/test_lists/test-db/l0_dgx_b200.yml
tests/integration/defs/accuracy/test_disaggregated_serving.py
🧬 Code Graph Analysis (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

tests/integration/defs/accuracy/accuracy_core.py (4)
- MMLU (269-283)
- evaluate (146-199)
- evaluate (678-688)
- GSM8K (286-299)
🔇 Additional comments (8)
tensorrt_llm/serve/scripts/benchmark_serving.py (1)
582-588: Safer metrics dict construction (avoids KeyError).

Filtering keys to only those present in `results` is correct and prevents errors when selected percentiles differ from fixed entries like `p99_*`. LGTM.

tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

568-601: Add nixl backend accuracy test for DeepSeek-V3-Lite — looks good.

Config uses nixl for both ctx/gen transceivers and standard disaggregated server wiring. Evaluation on MMLU and GSM8K aligns with existing harness patterns.
tests/integration/test_lists/qa/llm_function_full.txt (1)
563-565: Register nixl backend accuracy tests — OK.

Entries correctly reference the newly added tests and follow list conventions.
tests/integration/test_lists/qa/llm_function_sanity.txt (1)
113-115: Add nixl backend tests to sanity list — OK.

Consistent with accuracy list additions; names/paths align.
tests/integration/test_lists/test-db/l0_dgx_h100.yml (2)
55-56: Include nixl backend accuracy tests in H100 suite — OK.

These align with newly added test methods and the platform constraints.
115-118: Benchmark test names exist and are collected by pytest

- Verified that test_disaggregated_benchmark_on_diff_backends is defined in tests/integration/defs/disaggregated/test_disaggregated.py (line 1244).
- Confirmed that the four parameterized IDs in tests/integration/test_lists/test-db/l0_dgx_h100.yml (lines 115-118) match the ones collected by pytest.

No further action required; CI scheduling will succeed.
tests/integration/test_lists/test-db/l0_dgx_b200.yml (2)
77-78: Confirmed accuracy tests are present

Both TestQwen3_8B::test_nixl_backend and TestDeepSeekV3Lite::test_nixl_backend are defined in tests/integration/defs/accuracy/test_disaggregated_serving.py (lines ~564-569 and ~699-704). No renames detected; this list will collect successfully.
73-76: Backend benchmark entries verified

- Confirmed test_disaggregated_benchmark_on_diff_backends is defined and parameterized with exactly these four models:
  - DeepSeek-V3-Lite-fp8
  - DeepSeek-V3-Lite-bf16
  - llama-v3-8b-hf
  - llama-3.1-8b-instruct-hf-fp8
- The YAML entries in tests/integration/test_lists/test-db/l0_dgx_b200.yml include each of these IDs.
- The path disaggregated/test_disaggregated.py correctly resolves to tests/integration/defs/disaggregated/test_disaggregated.py.

No changes required.
Actionable comments posted: 4
🔭 Outside diff range comments (2)
tensorrt_llm/serve/scripts/benchmark_serving.py (2)
471-473: Bug: JSON key typo includes a trailing colon ("request_goodput:")

This will produce an unexpected key name in results and could break downstream consumers expecting "request_goodput".

Apply this diff:

```diff
-        "request_goodput:":
-        metrics.request_goodput if goodput_config_dict else None,
+        "request_goodput": (
+            metrics.request_goodput if goodput_config_dict else None
+        ),
```
19-24: Add postponed annotation evaluation for Python 3.8 compatibility

The project's python_requires=">=3.7,<4" (setup.py) means this script must run on Python 3.8, where built-in generics like list[float] and dict[str, float] are not subscriptable at import time unless you use the PEP 563 future import. Please prepend the future import to avoid TypeError on 3.8.

File: tensorrt_llm/serve/scripts/benchmark_serving.py

```diff
 #!/usr/bin/env python3
+from __future__ import annotations
 import argparse
 import asyncio
 import gc
 import json
 import os
 import random
```
🧹 Nitpick comments (5)
tensorrt_llm/serve/scripts/benchmark_serving.py (2)
1-4: Missing NVIDIA copyright header per repo guidelines

All .py files should include an NVIDIA copyright header with the current year. Keep the existing Apache SPDX notice, but prepend the NVIDIA header.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 # Adopted from
 # https://github.com/vllm-project/vllm/blob/200bbf92e8861e2458a6f90bca73f40cc3b1ad1f/benchmarks/benchmark_serving.py
-# SPDX-License-Identifier: Apache-2.0
+# SPDX-License-Identifier: Apache-2.0
```
570-575: Consider exporting E2EL stats in the metrics payload too

You compute and optionally attach E2EL stats to the result; include them in the primary metrics list so they land in the structured payload rather than only extra_info.

```diff
     metrics = [
         "median_ttft_ms", "mean_ttft_ms", "std_ttft_ms", "p99_ttft_ms",
         "mean_tpot_ms", "median_tpot_ms", "std_tpot_ms", "p99_tpot_ms",
         "median_itl_ms", "mean_itl_ms", "std_itl_ms", "p99_itl_ms",
+        "mean_e2el_ms", "median_e2el_ms", "std_e2el_ms", "p99_e2el_ms",
         "mean_request_ar", "median_request_ar", "std_request_ar"
     ]
```

Note: your guarded comprehension already filters to keys present in results.
tests/integration/test_lists/test-db/l0_dgx_h100.yml (1)
115-118: Pre-merge includes benchmark-on-diff-backends; verify runtime budget

Adding parameterized benchmarking tests pre-merge can be time-consuming. If this was meant as an exclusion, it's currently included. If inclusion is intended, ensure timeouts/GPU allocation are adequate.
Would you prefer moving these to post_merge (similar to B200), or keep them behind a specific auto_trigger label?
tests/integration/defs/disaggregated/test_disaggregated.py (2)
1249-1256: Prefer context-managed TemporaryDirectory to ensure cleanup

Use a with-statement so the temp dir is always cleaned up, even on failures.

Example refactor:

```diff
-    temp_dir = tempfile.TemporaryDirectory()
-    nixl_config_path = os.path.join(temp_dir.name, "nixl_config.yaml")
-    ucx_config_path = os.path.join(temp_dir.name, "ucx_config.yaml")
-    with open(nixl_config_path, 'w', encoding='utf-8') as f:
+    with tempfile.TemporaryDirectory() as temp_dir:
+        nixl_config_path = os.path.join(temp_dir, "nixl_config.yaml")
+        ucx_config_path = os.path.join(temp_dir, "ucx_config.yaml")
+        with open(nixl_config_path, 'w', encoding='utf-8') as f:
             yaml.dump(nixl_config, f)
-    with open(ucx_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(ucx_config, f)
+        with open(ucx_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(ucx_config, f)
+        # ... and keep the rest of the test body indented inside this block ...
```
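As a minimal stdlib sketch of why the context-managed form is preferred (file name and contents here are illustrative, not the test's actual config):

```python
import os
import tempfile

# Context-managed: the directory is removed when the block exits,
# even if an exception is raised inside it.
with tempfile.TemporaryDirectory() as temp_dir:
    config_path = os.path.join(temp_dir, "nixl_config.yaml")
    with open(config_path, "w", encoding="utf-8") as f:
        f.write("backend: nixl\n")
    # The file exists while we are inside the block.
    assert os.path.exists(config_path)

# After the with-block, the whole tree is gone without any explicit cleanup.
assert not os.path.exists(temp_dir)
```

With the non-context form (`temp_dir = tempfile.TemporaryDirectory()`), cleanup only happens on `.cleanup()` or object finalization, so an early test failure can leak the directory.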
1095-1099: Nit: docstring wording

The docstring still says "test"; this is a benchmark runner.

```diff
-    """Run disaggregated test with given configuration."""
+    """Run disaggregated benchmark with given configuration."""
```
🔇 Additional comments (5)
tests/integration/test_lists/qa/llm_function_full.txt (1)
563-564: nixl-backend accuracy tests verified

Both TestDeepSeekV3Lite and TestQwen3_8B in tests/integration/defs/accuracy/test_disaggregated_serving.py define test_nixl_backend, so the entries in your test list are correct and discoverable. No further changes needed.

tests/integration/test_lists/qa/llm_function_sanity.txt (1)
113-114: Sanity list entries verified; tests are implemented

Both entries in tests/integration/test_lists/qa/llm_function_sanity.txt correspond to real tests in tests/integration/defs/accuracy/test_disaggregated_serving.py:

- TestDeepSeekV3Lite::test_nixl_backend found at lines 563-568
- TestQwen3_8B::test_nixl_backend found at lines 698-703
No further changes required.
tensorrt_llm/serve/scripts/benchmark_serving.py (1)
584-585: Good defensive fix to avoid KeyError on missing metrics

Filtering to keys present in results prevents crashes when percentile metrics are not selected.
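The guarded construction can be sketched as follows (a standalone illustration of the pattern, with made-up values; not the exact code in benchmark_serving.py):

```python
# Only a subset of percentile metrics may be present in `results`,
# depending on which percentiles were selected on the CLI.
results = {
    "median_ttft_ms": 12.3,
    "mean_ttft_ms": 13.1,
    # "p99_ttft_ms" deliberately absent: p99 was not selected.
}

selected = ["median_ttft_ms", "mean_ttft_ms", "p99_ttft_ms"]

# Filtering to keys present in `results` avoids a KeyError for
# metrics that were never computed; missing keys are simply dropped.
metrics = {name: results[name] for name in selected if name in results}
```

A plain `{name: results[name] for name in selected}` would raise `KeyError: 'p99_ttft_ms'` here, which is exactly the crash the fix avoids.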
tests/integration/test_lists/test-db/l0_dgx_h100.yml (1)
55-56: These tests are being added to the run list (not excluded); confirm intent

The AI summary states exclusion, but this YAML adds:
- accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend
- accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_nixl_backend
Confirm we want to run these in pre_merge on H100.
If intended only for specific environments, consider gating here (or adding skip markers). Would you like a PR to adjust gating?
tests/integration/test_lists/test-db/l0_dgx_b200.yml (1)
73-79: Verified test implementations present; approving post_merge additions

The newly added benchmarks and nixl backend accuracy tests are present in the codebase at the expected locations:

- tests/integration/defs/disaggregated/test_disaggregated.py: test_disaggregated_benchmark_on_diff_backends
- tests/integration/defs/accuracy/test_disaggregated_serving.py: TestDeepSeekV3Lite::test_nixl_backend, TestQwen3_8B::test_nixl_backend

Placement and availability confirmed; approving these changes.
PR_Github #14713 [ run ] completed with state
9240562 to 2dd10cf (Compare)
/bot run --add-multi-gpu-test
PR_Github #14749 [ run ] triggered by Bot
Actionable comments posted: 0
♻️ Duplicate comments (8)
tests/integration/defs/disaggregated/test_disaggregated.py (8)
1151-1154: Drop dataset-path when dataset-name=random

For random dataset generation, dataset-path is unused and can be removed.

```diff
-        '--dataset-name',
-        'random',
-        '--dataset-path',
-        shared_gpt_path,
+        '--dataset-name', 'random',
```
1059-1063: Guard LLM_ROOT and verify benchmark script exists

Avoid hard failures if env isn't set or the script path moved; skip cleanly instead.

```diff
 @pytest.fixture(scope="module")
 def benchmark_root():
-    llm_root = os.getenv("LLM_ROOT")
-    return os.path.join(llm_root, "tensorrt_llm", "serve", "scripts")
+    llm_root = os.getenv("LLM_ROOT")
+    if not llm_root:
+        pytest.skip("LLM_ROOT must be set to locate benchmark scripts")
+    root = os.path.join(llm_root, "tensorrt_llm", "serve", "scripts")
+    script = os.path.join(root, "benchmark_serving.py")
+    if not os.path.exists(script):
+        pytest.skip(f"benchmark_serving.py not found at: {script}")
+    return root
```
1073-1087: Skip gracefully if model path is missing

Avoid hard failures when the model is not present on the runner.

```diff
 @pytest.fixture(scope="function")
 def benchmark_model_root(request):
     models_root = llm_models_root()
@@
     else:
         raise ValueError(f"Failed to find the model: {request.param}")
-    return model_path
+    if not os.path.isdir(model_path):
+        pytest.skip(f"Model not found at: {model_path}")
+    return model_path
```
1097-1111: Env handling: support env=None, derive KV backend env from config, and set UCX_TLS only for UCX

Current code unconditionally sets UCX_TLS and doesn't set TRTLLM_USE_{NIXL,UCX}_KVCACHE. This can cause mismatched backend behavior and CI failures.

```diff
-def run_disaggregated_benchmark(example_dir,
+def run_disaggregated_benchmark(example_dir,
                                 config_file,
                                 benchmark_root,
                                 benchmark_model_root,
                                 shared_gpt_path,
                                 env=None,
                                 cwd=None):
-    """Run disaggregated test with given configuration."""
-    run_env = env.copy()
-    run_env["UCX_TLS"] = "^ib"
+    """Run disaggregated benchmark with given configuration and return (median_e2el, median_ttft)."""
+    if env is None:
+        env = os.environ.copy()
+    run_env = env.copy()
+    # Derive backend from the config file and set KV cache transport env
+    with open(config_file, "r", encoding="utf-8") as _f:
+        _cfg = yaml.safe_load(_f) or {}
+    _gen_backend = (((_cfg.get("generation_servers") or {}).get("cache_transceiver_config") or {}).get("backend"))
+    _ctx_backend = (((_cfg.get("context_servers") or {}).get("cache_transceiver_config") or {}).get("backend"))
+    _backend = _gen_backend or _ctx_backend or "ucx"
+    for _k in ("TRTLLM_USE_UCX_KVCACHE", "TRTLLM_USE_NIXL_KVCACHE"):
+        run_env.pop(_k, None)
+    if _backend == "nixl":
+        run_env["TRTLLM_USE_NIXL_KVCACHE"] = "1"
+        # do not set UCX_TLS for nixl
+    else:
+        run_env["TRTLLM_USE_UCX_KVCACHE"] = "1"
+        run_env["UCX_TLS"] = "^ib"
```
1130-1135: Warm-up client must use the same config file the server is using

Avoid mismatched ports/urls by using config_file instead of a hardcoded path.

```diff
     client_cmd = [
         'python3', f'{client_dir}/disagg_client.py', '-c',
-        f'{example_dir}/disagg_config.yaml', '-p',
+        config_file, '-p',
         f'{client_dir}/prompts.json', '--ignore-eos',
         '--server-start-timeout',
         str(server_start_timeout)
     ]
```
1137-1139: Pass run_env consistently so UCX/NIXL settings propagate to subprocesses

Both the warm-up and benchmark should use run_env, not env.

```diff
-    check_call(client_cmd,
-               env=env,
+    check_call(client_cmd,
+               env=run_env,
                poll_procs=[workers_proc, server_proc])
@@
-    check_call(benchmark_cmd, env=env)
-    output = check_output(benchmark_cmd, env=env)
+    check_call(benchmark_cmd, env=run_env)
+    output = check_output(benchmark_cmd, env=run_env)
```

Also applies to: 1174-1175
1113-1120: Robust process cleanup; avoid UnboundLocalError and noisy terminate

Initialize procs to None and guard terminate/wait to prevent exceptions if startup fails or procs already exited.

```diff
-    try:
+    workers_proc = None
+    server_proc = None
+    try:
@@
-    finally:
-        server_proc.terminate()
-        workers_proc.terminate()
-        server_proc.wait()
-        workers_proc.wait()
+    finally:
+        for proc in (server_proc, workers_proc):
+            if proc and proc.poll() is None:
+                try:
+                    proc.terminate()
+                except Exception:
+                    pass
+        for proc in (server_proc, workers_proc):
+            if proc:
+                try:
+                    proc.wait(timeout=30)
+                except Exception:
+                    pass
```

Also applies to: 1198-1202
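A self-contained sketch of the guarded pattern, using dummy sleeping child processes in place of the server and workers:

```python
import subprocess
import sys

server_proc = None
workers_proc = None
try:
    # Stand-ins for the disaggregated server and MPI workers.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    server_proc = subprocess.Popen(sleeper)
    workers_proc = subprocess.Popen(sleeper)
    # ... body that may raise before or after the Popen calls ...
finally:
    # Guard: skip procs that were never started or already exited,
    # so cleanup itself cannot raise UnboundLocalError or mask errors.
    for proc in (server_proc, workers_proc):
        if proc is not None and proc.poll() is None:
            proc.terminate()
    for proc in (server_proc, workers_proc):
        if proc is not None:
            proc.wait(timeout=30)
```

If an exception fired before the first `Popen`, both variables are still `None` and the `finally` block is a no-op instead of a crash.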
1251-1267: Use TemporaryDirectory as a context manager for deterministic cleanup

Prevents leaked temp dirs on early failures and keeps files scoped.

```diff
-    temp_dir = tempfile.TemporaryDirectory()
-    nixl_config_path = os.path.join(temp_dir.name, "nixl_config.yaml")
-    ucx_config_path = os.path.join(temp_dir.name, "ucx_config.yaml")
-    with open(nixl_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(nixl_config, f)
-    with open(ucx_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(ucx_config, f)
-
-    env = llm_venv._new_env.copy()
-    nixl_e2el, nixl_ttft = run_disaggregated_benchmark(
-        disaggregated_example_root,
-        nixl_config_path,
-        benchmark_root,
-        benchmark_model_root,
-        shared_gpt_path,
-        env=env,
-        cwd=llm_venv.get_working_directory())
-    ucx_e2el, ucx_ttft = run_disaggregated_benchmark(
-        disaggregated_example_root,
-        ucx_config_path,
-        benchmark_root,
-        benchmark_model_root,
-        shared_gpt_path,
-        env=env,
-        cwd=llm_venv.get_working_directory())
+    with tempfile.TemporaryDirectory() as td:
+        nixl_config_path = os.path.join(td, "nixl_config.yaml")
+        ucx_config_path = os.path.join(td, "ucx_config.yaml")
+        with open(nixl_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(nixl_config, f)
+        with open(ucx_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(ucx_config, f)
+
+        env = llm_venv._new_env.copy()
+        nixl_e2el, nixl_ttft = run_disaggregated_benchmark(
+            disaggregated_example_root,
+            nixl_config_path,
+            benchmark_root,
+            benchmark_model_root,
+            shared_gpt_path,
+            env=env,
+            cwd=llm_venv.get_working_directory())
+        ucx_e2el, ucx_ttft = run_disaggregated_benchmark(
+            disaggregated_example_root,
+            ucx_config_path,
+            benchmark_root,
+            benchmark_model_root,
+            shared_gpt_path,
+            env=env,
+            cwd=llm_venv.get_working_directory())
```
🧹 Nitpick comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
1065-1071: Optionally skip if dataset file is missing (or drop this when using random dataset)

The benchmark uses dataset-name=random, so dataset-path is not required. If you keep this fixture for other usages, skip gracefully when the file is missing.

```diff
 @pytest.fixture(scope="module")
 def shared_gpt_path():
     DEFAULT_LLM_MODEL_ROOT = os.path.join("/scratch.trt_llm_data", "llm-models")
     LLM_MODELS_ROOT = os.environ.get("LLM_MODELS_ROOT", DEFAULT_LLM_MODEL_ROOT)
-    return os.path.join(LLM_MODELS_ROOT, "datasets",
-                        "ShareGPT_V3_unfiltered_cleaned_split.json")
+    path = os.path.join(LLM_MODELS_ROOT, "datasets",
+                        "ShareGPT_V3_unfiltered_cleaned_split.json")
+    if not os.path.isfile(path):
+        pytest.skip(f"Dataset not found at: {path}")
+    return path
```
1127-1127: Typo in comment

Minor: "sever" → "server".

```diff
-    # Ensure the sever has started
+    # Ensure the server has started
```
1175-1175: Decode check_output if it returns bytes

Ensure regex operates on str.

```diff
-    output = check_output(benchmark_cmd, env=run_env)
+    output = check_output(benchmark_cmd, env=run_env)
+    if isinstance(output, (bytes, bytearray)):
+        output = output.decode("utf-8", errors="replace")
```
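For illustration, here is the decode-then-parse flow on a mock summary; the "Median TTFT/E2EL" line format is assumed from the emitted metrics, not copied from benchmark_serving.py:

```python
import re

# Pretend check_output returned raw bytes; decode first so the
# regexes below always operate on str.
output = b"Median TTFT (ms): 42.17\nMedian E2EL (ms): 180.50\n"
if isinstance(output, (bytes, bytearray)):
    output = output.decode("utf-8", errors="replace")

# Extract the two medians the benchmark test compares across backends.
ttft_match = re.search(r"Median TTFT \(ms\):\s*([\d.]+)", output)
e2el_match = re.search(r"Median E2EL \(ms\):\s*([\d.]+)", output)
median_ttft = float(ttft_match.group(1)) if ttft_match else None
median_e2el = float(e2el_match.group(1)) if e2el_match else None
```

Without the decode, `re.search` with a str pattern raises `TypeError: cannot use a string pattern on a bytes-like object`.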
📜 Review details
✅ Files skipped from review due to trivial changes (1)
- tests/integration/test_lists/qa/llm_function_sanity.txt
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tensorrt_llm/serve/scripts/benchmark_serving.py
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/disaggregated/test_disaggregated.py (2)
17-17: LGTM: imports are necessary and used

re, tempfile, yaml, and the additional defs imports are required by the new code paths.
Also applies to: 19-19, 22-24
1204-1239: LGTM: benchmark config builder is clear and parameterized

The helper cleanly injects the KV backend and sensible defaults for small-node benchmarking.
PR_Github #14749 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #14776 [ run ] triggered by Bot
PR_Github #14776 [ run ] completed with state
2dd10cf to a87a577 (Compare)
/bot run --add-multi-gpu-test
PR_Github #14785 [ run ] triggered by Bot
Actionable comments posted: 1
♻️ Duplicate comments (9)
tests/integration/defs/disaggregated/test_disaggregated.py (9)
1059-1063
: Guard LLM_ROOT and verify the benchmark script exists. If LLM_ROOT isn't set or the script path changed, this fixture will crash. Prefer a clear skip to avoid hard failures.
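A minimal sketch of the guard, assuming a hypothetical helper name and an illustrative relative script path (the real repo layout may differ); a pytest fixture would call pytest.skip() on a None result instead of letting the test crash:

```python
import os
from typing import Optional


def resolve_benchmark_script(env: dict) -> Optional[str]:
    """Return the benchmark script path, or None when it cannot be used.

    Sketch only: the relative path below is an assumption, not the
    verified repository layout.
    """
    llm_root = env.get("LLM_ROOT")
    if not llm_root:
        return None  # caller should pytest.skip("LLM_ROOT is not set")
    script = os.path.join(llm_root, "tensorrt_llm", "serve", "scripts",
                          "benchmark_serving.py")
    # Missing script also becomes a skip, not a crash mid-fixture.
    return script if os.path.isfile(script) else None
```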
1073-1087
: Skip gracefully if the model path is missing. Avoid hard failure if the model is not available on the runner.
1098-1099
: Initialize env safely and unify UCX_TLS usage. Handle env=None and ensure all subprocesses (including benchmark) get the same UCX settings.
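One way to sketch this normalization (the helper name and the "^cuda_ipc" value are illustrative assumptions, not the reviewed code):

```python
import os


def make_run_env(env=None):
    """Return a fresh env dict with UCX settings pinned once.

    Sketch: starting from os.environ when env is None, and setting
    UCX_TLS via setdefault, means the server, the MPI workers, and the
    benchmark client all see the same transport selection without any
    caller-provided value being overwritten.
    """
    run_env = dict(os.environ) if env is None else dict(env)
    run_env.setdefault("UCX_TLS", "^cuda_ipc")  # assumed value
    return run_env
```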
1111-1113
: Robust process cleanup; avoid UnboundLocalError and noisy terminate. If process startup fails, server_proc/workers_proc may be undefined; also, terminate() on already-exited procs can raise.
1130-1135
: Warm-up client should use the same config_file. Using example_dir/disagg_config.yaml risks mismatched ports/urls vs the server's temp config. Use the same config you're serving with.
1150-1154
: Benchmark command: drop dataset-path for random; pass run_env consistently
- Remove dataset-path since dataset-name=random.
- Use run_env so UCX_TLS applies.
Also applies to: 1172-1175
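A sketch of the conditional command construction, assuming flag names that mirror benchmark_serving.py's CLI (not verified here):

```python
def build_benchmark_cmd(script, model_path, dataset_name, dataset_path=None):
    """Build the benchmark argv, passing --dataset-path only when needed.

    Sketch: with dataset-name=random the benchmark synthesizes inputs,
    so a dataset path is unnecessary and passing one can mask
    configuration errors.
    """
    cmd = ["python", script,
           "--model", model_path,
           "--dataset-name", dataset_name]
    if dataset_name != "random" and dataset_path:
        cmd += ["--dataset-path", dataset_path]
    return cmd
```

The resulting list would be handed to the subprocess call together with the shared run_env so UCX_TLS applies to the client as well.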
1198-1202
: Guard process cleanup to avoid UnboundLocalError. If an exception occurs before server_proc/workers_proc are created, the finally block will raise UnboundLocalError. Guard cleanup.
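A sketch of a tolerant cleanup helper: initialize server_proc/workers_proc to None before the try block, then let the finally clause call something like this (the helper name is hypothetical):

```python
import subprocess
import sys


def cleanup_procs(*procs):
    """Terminate any still-running processes; skip missing ones.

    Sketch: None entries cover the case where startup failed before
    Popen ran, and the poll() check avoids terminating a process that
    has already exited.
    """
    for proc in procs:
        if proc is None:            # startup failed before Popen
            continue
        if proc.poll() is None:     # still running
            proc.terminate()
            proc.wait(timeout=30)


# Demo: a short-lived child that cleanup must stop.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
cleanup_procs(None, demo)
```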
1251-1258
: Use TemporaryDirectory as a context manager for deterministic cleanup. This prevents leaked temp dirs if the test fails early.
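The pattern, sketched with an illustrative config write (unlike tempfile.mkdtemp() plus manual removal, the context manager cleans up even when the body raises):

```python
import os
import tempfile

# The directory and everything in it are removed when the block exits,
# including on exceptions raised inside the block.
with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "disagg_config.yaml")
    with open(config_path, "w") as f:
        f.write("hostname: localhost\n")  # illustrative content
    created = os.path.exists(config_path)

# After the block, the temp dir and the config file are gone.
print(created, os.path.exists(config_path))
```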
1259-1267
: Set backend-specific KV cache env from config (NIXL vs UCX). The benchmark runner doesn't set TRTLLM_USE_NIXL_KVCACHE / TRTLLM_USE_UCX_KVCACHE. Earlier disaggregated tests rely on these env flags. Derive the backend from the YAML config and set env accordingly; also, only set UCX_TLS for UCX.
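A sketch of deriving the env flags from an already-loaded config dict; the config shape (cache_transceiver_config.backend under context_servers) mirrors the test YAML but is an assumption here:

```python
import os


def kv_cache_env(config, base_env=None):
    """Return an env dict with backend-specific KV cache flags set.

    Sketch: NIXL gets TRTLLM_USE_NIXL_KVCACHE; anything else falls back
    to UCX, which also needs UCX_TLS. The env variable names follow the
    review discussion above, not a verified API.
    """
    env = dict(os.environ) if base_env is None else dict(base_env)
    backend = (config.get("context_servers", {})
                     .get("cache_transceiver_config", {})
                     .get("backend", "UCX")).upper()
    if backend == "NIXL":
        env["TRTLLM_USE_NIXL_KVCACHE"] = "1"
    else:
        env["TRTLLM_USE_UCX_KVCACHE"] = "1"
        env.setdefault("UCX_TLS", "^cuda_ipc")  # only UCX needs this
    return env
```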
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tensorrt_llm/serve/scripts/benchmark_serving.py (1 hunks)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2 hunks)
tests/integration/defs/disaggregated/test_disaggregated.py (2 hunks)
tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
tests/integration/test_lists/test-db/l0_dgx_h100.yml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tensorrt_llm/serve/scripts/benchmark_serving.py
- tests/integration/test_lists/qa/llm_function_sanity.txt
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
🧬 Code Graph Analysis (1)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
tests/integration/defs/conftest.py (5)
llm_models_root (77-83)
llm_root (180-181)
disaggregated_test_root (2339-2344)
disaggregated_example_root (270-275)
llm_venv (707-723)
tests/integration/defs/trt_test_alternative.py (3)
check_call (230-238)
check_output (241-267)
popen (179-198)
tensorrt_llm/executor/request.py (1)
path (48-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
17-17
: LGTM! The new imports are necessary for the benchmarking functionality and follow proper Python import conventions.
Also applies to: 19-19, 22-24
1204-1238
: LGTM! The configuration helper function is well-structured and creates appropriate configurations for different backends.
1279-1280
: LGTM! The performance assertions comparing NIXL and UCX backends are reasonable, ensuring NIXL performance is within 5% of UCX for both E2EL and TTFT metrics.
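The 5% regression check can be sketched as follows (the helper name and the sample numbers are illustrative, not measured results from the test):

```python
def within_tolerance(candidate, baseline, tol=0.05):
    """Return True when candidate does not exceed baseline by more than tol.

    Lower is better for latency metrics (Median E2EL, Median TTFT), so
    a NIXL measurement passes when it is at most 5% above the UCX one.
    """
    return candidate <= baseline * (1.0 + tol)


# Illustrative latencies in milliseconds.
print(within_tolerance(102.0, 100.0))  # within 5% of baseline
print(within_tolerance(110.0, 100.0))  # more than 5% slower
```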
PR_Github #14785 [ run ] completed with state |
Force-pushed a87a577 to fafd1d6 (Compare)
/bot run --add-multi-gpu-test |
PR_Github #14826 [ run ] triggered by Bot |
PR_Github #14826 [ run ] completed with state |
/bot run --add-multi-gpu-test |
PR_Github #14873 [ run ] triggered by Bot |
Signed-off-by: Bo Deng <[email protected]>
Force-pushed 32484da to a7be498 (Compare)
/bot run --add-multi-gpu-test |
PR_Github #14993 [ run ] triggered by Bot |
PR_Github #14993 [ run ] completed with state |
Signed-off-by: Bo Deng <[email protected]>
/bot run --add-multi-gpu-test |
PR_Github #15039 [ run ] triggered by Bot |
PR_Github #15039 [ run ] completed with state |
/bot run --add-multi-gpu-test |
PR_Github #15061 [ run ] triggered by Bot |
PR_Github #15061 [ run ] completed with state |
/bot run --add-multi-gpu-test --disable-fail-fast |
PR_Github #15161 [ run ] triggered by Bot |
PR_Github #15161 [ run ] completed with state |
Force-pushed 018f4ba to 4e21fab (Compare)
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id
(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test
(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp"
(OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log
(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug
(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md
and the
scripts/test_to_stage_mapping.py
helper.
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.