[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 #6735
Conversation
📝 Walkthrough

Adds nixl-backed accuracy tests, a disaggregated benchmarking workflow (fixtures, helpers, and a parameterized benchmark test), updates multiple test lists and test-db exclusions, and makes benchmark_serving.py's metrics construction tolerant of missing keys to avoid KeyError. Some inserted blocks are duplicated in the diffs.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Runner as Test Runner
    participant Disagg as Disaggregated Server
    participant MPI as MPI Workers
    participant Bench as benchmark_serving.py
    Runner->>Disagg: Start disaggregated server (config: nixl or ucx)
    Runner->>MPI: Launch 2 MPI workers
    Runner->>Bench: Warm up client
    Runner->>Bench: Run benchmark_serving.py
    Bench->>Disagg: Send inference requests
    Disagg->>MPI: Dispatch to workers
    MPI-->>Disagg: Return responses
    Disagg-->>Bench: Aggregate metrics
    Bench-->>Runner: Emit Median E2EL and Median TTFT
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
/bot run --add-multi-gpu-test
PR_Github #14571 [ run ] triggered by Bot
PR_Github #14571 [ run ] completed with state
31e80c8 to 2b87930 (Compare)
/bot run --add-multi-gpu-test
PR_Github #14713 [ run ] triggered by Bot
Actionable comments posted: 8
🧹 Nitpick comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
23-24: Import style: prefer namespace imports per guidelines

Guideline: “Always maintain the namespace when importing in Python.” Consider importing the module and using qualified names.
Example:

```diff
-from defs.trt_test_alternative import check_call, check_output, popen
+import defs.trt_test_alternative as trt_test_alt
```

Outside this hunk, calls would become:

```python
trt_test_alt.check_call(...)
trt_test_alt.check_output(...)
trt_test_alt.popen(...)
```
1064-1069: Dataset path is unused for random dataset; optionally drop it

You pass --dataset-name random, so --dataset-path is ignored. Removing it reduces environment dependencies. Apply together with the benchmark_cmd change below (lines 1148-1152).
1139-1142: Verify benchmark script path before running

Avoid confusing subprocess errors if the script path is wrong.

```diff
-        benchmark_script = os.path.join(benchmark_root,
-                                        "benchmark_serving.py")
+        benchmark_script = os.path.join(benchmark_root, "benchmark_serving.py")
+        if not os.path.exists(benchmark_script):
+            pytest.skip(f"benchmark_serving.py not found at: {benchmark_script}")
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- tensorrt_llm/serve/scripts/benchmark_serving.py (1 hunks)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (2 hunks)
- tests/integration/defs/disaggregated/test_disaggregated.py (2 hunks)
- tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
- tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
- tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
- tests/integration/test_lists/test-db/l0_dgx_h100.yml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/serve/scripts/benchmark_serving.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/serve/scripts/benchmark_serving.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/test_lists/qa/llm_function_sanity.txt
tests/integration/test_lists/qa/llm_function_full.txt
tests/integration/test_lists/test-db/l0_dgx_h100.yml
tests/integration/test_lists/test-db/l0_dgx_b200.yml
tests/integration/defs/accuracy/test_disaggregated_serving.py
🧬 Code Graph Analysis (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

tests/integration/defs/accuracy/accuracy_core.py (4)
- MMLU (269-283)
- evaluate (146-199)
- evaluate (678-688)
- GSM8K (286-299)
🔇 Additional comments (8)
tensorrt_llm/serve/scripts/benchmark_serving.py (1)
582-588: Safer metrics dict construction (avoids KeyError).

Filtering keys to only those present in `results` is correct and prevents errors when selected percentiles differ from fixed entries like `p99_*`. LGTM.

tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

568-601: Add nixl backend accuracy test for DeepSeek-V3-Lite — looks good.

Config uses nixl for both ctx/gen transceivers and standard disaggregated server wiring. Evaluation on MMLU and GSM8K aligns with existing harness patterns.
tests/integration/test_lists/qa/llm_function_full.txt (1)
563-565: Register nixl backend accuracy tests — OK.

Entries correctly reference the newly added tests and follow list conventions.
tests/integration/test_lists/qa/llm_function_sanity.txt (1)
113-115: Add nixl backend tests to sanity list — OK.

Consistent with accuracy list additions; names/paths align.
tests/integration/test_lists/test-db/l0_dgx_h100.yml (2)
55-56: Include nixl backend accuracy tests in H100 suite — OK.

These align with newly added test methods and the platform constraints.
115-118: Benchmark test names exist and are collected by pytest

- Verified that test_disaggregated_benchmark_on_diff_backends is defined in tests/integration/defs/disaggregated/test_disaggregated.py (line 1244).
- Confirmed that the four parameterized IDs in tests/integration/test_lists/test-db/l0_dgx_h100.yml (lines 115-118) match the ones collected by pytest.

No further action required; CI scheduling will succeed.
tests/integration/test_lists/test-db/l0_dgx_b200.yml (2)
77-78: Confirmed accuracy tests are present

Both TestQwen3_8B::test_nixl_backend and TestDeepSeekV3Lite::test_nixl_backend are defined in tests/integration/defs/accuracy/test_disaggregated_serving.py (lines ~564-569 and ~699-704). No renames detected; this list will collect successfully.
73-76: Backend benchmark entries verified

- Confirmed test_disaggregated_benchmark_on_diff_backends is defined and parameterized with exactly these four models:
  - DeepSeek-V3-Lite-fp8
  - DeepSeek-V3-Lite-bf16
  - llama-v3-8b-hf
  - llama-3.1-8b-instruct-hf-fp8
- The YAML entries in tests/integration/test_lists/test-db/l0_dgx_b200.yml include each of these IDs.
- The path disaggregated/test_disaggregated.py correctly resolves to tests/integration/defs/disaggregated/test_disaggregated.py.

No changes required.
Actionable comments posted: 4
🔭 Outside diff range comments (2)
tensorrt_llm/serve/scripts/benchmark_serving.py (2)
471-473: Bug: JSON key typo includes a trailing colon ("request_goodput:")

This will produce an unexpected key name in results and could break downstream consumers expecting "request_goodput".

Apply this diff:

```diff
-        "request_goodput:":
-        metrics.request_goodput if goodput_config_dict else None,
+        "request_goodput": (
+            metrics.request_goodput if goodput_config_dict else None
+        ),
```
19-24: Add postponed annotation evaluation for Python 3.8 compatibility

The project's python_requires=">=3.7,<4" (setup.py) means this script must run on Python 3.8, where built-in generics like list[float] and dict[str, float] are not subscriptable at import time unless you use the PEP 563 future import. Please prepend the future import to avoid TypeError on 3.8.

File: tensorrt_llm/serve/scripts/benchmark_serving.py

```diff
 #!/usr/bin/env python3
+from __future__ import annotations
 import argparse
 import asyncio
 import gc
 import json
 import os
 import random
```
🧹 Nitpick comments (5)
tensorrt_llm/serve/scripts/benchmark_serving.py (2)
1-4: Missing NVIDIA copyright header per repo guidelines

All .py files should include an NVIDIA copyright header with the current year. Keep the existing Apache SPDX notice, but prepend the NVIDIA header.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 # Adopted from
 # https://github.com/vllm-project/vllm/blob/200bbf92e8861e2458a6f90bca73f40cc3b1ad1f/benchmarks/benchmark_serving.py
-# SPDX-License-Identifier: Apache-2.0
+# SPDX-License-Identifier: Apache-2.0
```
570-575: Consider exporting E2EL stats in the metrics payload too

You compute and optionally attach E2EL stats to the result; include them in the primary metrics list so they land in the structured payload rather than only extra_info.

```diff
     metrics = [
         "median_ttft_ms", "mean_ttft_ms", "std_ttft_ms", "p99_ttft_ms",
         "mean_tpot_ms", "median_tpot_ms", "std_tpot_ms", "p99_tpot_ms",
         "median_itl_ms", "mean_itl_ms", "std_itl_ms", "p99_itl_ms",
+        "mean_e2el_ms", "median_e2el_ms", "std_e2el_ms", "p99_e2el_ms",
         "mean_request_ar", "median_request_ar", "std_request_ar"
     ]
```

Note: your guarded comprehension already filters to keys present in results.
tests/integration/test_lists/test-db/l0_dgx_h100.yml (1)
115-118: Pre-merge includes benchmark-on-diff-backends; verify runtime budget

Adding parameterized benchmarking tests pre-merge can be time-consuming. If this was meant as an exclusion, it's currently included. If inclusion is intended, ensure timeouts/GPU allocation are adequate.
Would you prefer moving these to post_merge (similar to B200), or keep them behind a specific auto_trigger label?
tests/integration/defs/disaggregated/test_disaggregated.py (2)
1249-1256: Prefer context-managed TemporaryDirectory to ensure cleanup

Use a with-statement so the temp dir is always cleaned up, even on failures.

Example refactor:

```diff
-    temp_dir = tempfile.TemporaryDirectory()
-    nixl_config_path = os.path.join(temp_dir.name, "nixl_config.yaml")
-    ucx_config_path = os.path.join(temp_dir.name, "ucx_config.yaml")
-    with open(nixl_config_path, 'w', encoding='utf-8') as f:
+    with tempfile.TemporaryDirectory() as temp_dir:
+        nixl_config_path = os.path.join(temp_dir, "nixl_config.yaml")
+        ucx_config_path = os.path.join(temp_dir, "ucx_config.yaml")
+        with open(nixl_config_path, 'w', encoding='utf-8') as f:
             yaml.dump(nixl_config, f)
-    with open(ucx_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(ucx_config, f)
+        with open(ucx_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(ucx_config, f)
+        # ... and keep the rest of the test body indented inside this block ...
```
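As a minimal stdlib sketch of why the context-managed form is preferred (file name and contents here are illustrative, not the test's actual config):

```python
import os
import tempfile

# Context-managed: the directory is removed when the block exits,
# even if an exception is raised inside it.
with tempfile.TemporaryDirectory() as temp_dir:
    config_path = os.path.join(temp_dir, "nixl_config.yaml")
    with open(config_path, "w", encoding="utf-8") as f:
        f.write("backend: nixl\n")
    # The file exists while we are inside the block.
    assert os.path.exists(config_path)

# After the with-block, the whole tree is gone without any explicit cleanup.
assert not os.path.exists(temp_dir)
```

With the non-context form (`temp_dir = tempfile.TemporaryDirectory()`), cleanup only happens on `.cleanup()` or object finalization, so an early test failure can leak the directory.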
1095-1099: Nit: docstring wording

The docstring still says "test"; this is a benchmark runner.

```diff
-    """Run disaggregated test with given configuration."""
+    """Run disaggregated benchmark with given configuration."""
```
🔇 Additional comments (5)
tests/integration/test_lists/qa/llm_function_full.txt (1)
563-564: nixl-backend accuracy tests verified

Both TestDeepSeekV3Lite and TestQwen3_8B in tests/integration/defs/accuracy/test_disaggregated_serving.py define test_nixl_backend, so the entries in your test list are correct and discoverable. No further changes needed.

tests/integration/test_lists/qa/llm_function_sanity.txt (1)
113-114: Sanity list entries verified; tests are implemented

Both entries in tests/integration/test_lists/qa/llm_function_sanity.txt correspond to real tests in tests/integration/defs/accuracy/test_disaggregated_serving.py:

- TestDeepSeekV3Lite::test_nixl_backend found at lines 563-568
- TestQwen3_8B::test_nixl_backend found at lines 698-703
No further changes required.
tensorrt_llm/serve/scripts/benchmark_serving.py (1)
584-585: Good defensive fix to avoid KeyError on missing metrics

Filtering to keys present in results prevents crashes when percentile metrics are not selected.
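The guarded construction can be sketched as follows (a standalone illustration of the pattern, with made-up values; not the exact code in benchmark_serving.py):

```python
# Only a subset of percentile metrics may be present in `results`,
# depending on which percentiles were selected on the CLI.
results = {
    "median_ttft_ms": 12.3,
    "mean_ttft_ms": 13.1,
    # "p99_ttft_ms" deliberately absent: p99 was not selected.
}

selected = ["median_ttft_ms", "mean_ttft_ms", "p99_ttft_ms"]

# Filtering to keys present in `results` avoids a KeyError for
# metrics that were never computed; missing keys are simply dropped.
metrics = {name: results[name] for name in selected if name in results}
```

A plain `{name: results[name] for name in selected}` would raise `KeyError: 'p99_ttft_ms'` here, which is exactly the crash the fix avoids.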
tests/integration/test_lists/test-db/l0_dgx_h100.yml (1)
55-56: These tests are being added to the run list (not excluded); confirm intent

The AI summary states exclusion, but this YAML adds:
- accuracy/test_disaggregated_serving.py::TestQwen3_8B::test_nixl_backend
- accuracy/test_disaggregated_serving.py::TestDeepSeekV3Lite::test_nixl_backend
Confirm we want to run these in pre_merge on H100.
If intended only for specific environments, consider gating here (or adding skip markers). Would you like a PR to adjust gating?
tests/integration/test_lists/test-db/l0_dgx_b200.yml (1)
73-79: Verified test implementations present; approving post_merge additions

The newly added benchmarks and nixl backend accuracy tests are present in the codebase at the expected locations:

- tests/integration/defs/disaggregated/test_disaggregated.py: test_disaggregated_benchmark_on_diff_backends
- tests/integration/defs/accuracy/test_disaggregated_serving.py: TestDeepSeekV3Lite::test_nixl_backend, TestQwen3_8B::test_nixl_backend

Placement and availability confirmed; approving these changes.
PR_Github #14713 [ run ] completed with state
9240562 to 2dd10cf (Compare)
/bot run --add-multi-gpu-test
PR_Github #14749 [ run ] triggered by Bot
Actionable comments posted: 0
♻️ Duplicate comments (8)
tests/integration/defs/disaggregated/test_disaggregated.py (8)
1151-1154: Drop dataset-path when dataset-name=random

For random dataset generation, dataset-path is unused and can be removed.

```diff
-        '--dataset-name',
-        'random',
-        '--dataset-path',
-        shared_gpt_path,
+        '--dataset-name', 'random',
```
1059-1063: Guard LLM_ROOT and verify benchmark script exists

Avoid hard failures if env isn't set or the script path moved; skip cleanly instead.

```diff
 @pytest.fixture(scope="module")
 def benchmark_root():
-    llm_root = os.getenv("LLM_ROOT")
-    return os.path.join(llm_root, "tensorrt_llm", "serve", "scripts")
+    llm_root = os.getenv("LLM_ROOT")
+    if not llm_root:
+        pytest.skip("LLM_ROOT must be set to locate benchmark scripts")
+    root = os.path.join(llm_root, "tensorrt_llm", "serve", "scripts")
+    script = os.path.join(root, "benchmark_serving.py")
+    if not os.path.exists(script):
+        pytest.skip(f"benchmark_serving.py not found at: {script}")
+    return root
```
1073-1087: Skip gracefully if model path is missing

Avoid hard failures when the model is not present on the runner.

```diff
 @pytest.fixture(scope="function")
 def benchmark_model_root(request):
     models_root = llm_models_root()
@@
     else:
         raise ValueError(f"Failed to find the model: {request.param}")
-    return model_path
+    if not os.path.isdir(model_path):
+        pytest.skip(f"Model not found at: {model_path}")
+    return model_path
```
1097-1111: Env handling: support env=None, derive KV backend env from config, and set UCX_TLS only for UCX

Current code unconditionally sets UCX_TLS and doesn't set TRTLLM_USE_{NIXL,UCX}_KVCACHE. This can cause mismatched backend behavior and CI failures.

```diff
-def run_disaggregated_benchmark(example_dir,
+def run_disaggregated_benchmark(example_dir,
                                 config_file,
                                 benchmark_root,
                                 benchmark_model_root,
                                 shared_gpt_path,
                                 env=None,
                                 cwd=None):
-    """Run disaggregated test with given configuration."""
-    run_env = env.copy()
-    run_env["UCX_TLS"] = "^ib"
+    """Run disaggregated benchmark with given configuration and return (median_e2el, median_ttft)."""
+    if env is None:
+        env = os.environ.copy()
+    run_env = env.copy()
+    # Derive backend from the config file and set KV cache transport env
+    with open(config_file, "r", encoding="utf-8") as _f:
+        _cfg = yaml.safe_load(_f) or {}
+    _gen_backend = (((_cfg.get("generation_servers") or {}).get("cache_transceiver_config") or {}).get("backend"))
+    _ctx_backend = (((_cfg.get("context_servers") or {}).get("cache_transceiver_config") or {}).get("backend"))
+    _backend = _gen_backend or _ctx_backend or "ucx"
+    for _k in ("TRTLLM_USE_UCX_KVCACHE", "TRTLLM_USE_NIXL_KVCACHE"):
+        run_env.pop(_k, None)
+    if _backend == "nixl":
+        run_env["TRTLLM_USE_NIXL_KVCACHE"] = "1"
+        # do not set UCX_TLS for nixl
+    else:
+        run_env["TRTLLM_USE_UCX_KVCACHE"] = "1"
+        run_env["UCX_TLS"] = "^ib"
```
1130-1135: Warm-up client must use the same config file the server is using

Avoid mismatched ports/urls by using config_file instead of a hardcoded path.

```diff
     client_cmd = [
         'python3', f'{client_dir}/disagg_client.py', '-c',
-        f'{example_dir}/disagg_config.yaml', '-p',
+        config_file, '-p',
         f'{client_dir}/prompts.json', '--ignore-eos',
         '--server-start-timeout',
         str(server_start_timeout)
     ]
```
1137-1139: Pass run_env consistently so UCX/NIXL settings propagate to subprocesses

Both the warm-up and benchmark should use run_env, not env.

```diff
-    check_call(client_cmd,
-               env=env,
+    check_call(client_cmd,
+               env=run_env,
                poll_procs=[workers_proc, server_proc])
@@
-    check_call(benchmark_cmd, env=env)
-    output = check_output(benchmark_cmd, env=env)
+    check_call(benchmark_cmd, env=run_env)
+    output = check_output(benchmark_cmd, env=run_env)
```

Also applies to: 1174-1175
1113-1120: Robust process cleanup; avoid UnboundLocalError and noisy terminate

Initialize procs to None and guard terminate/wait to prevent exceptions if startup fails or procs already exited.

```diff
-    try:
+    workers_proc = None
+    server_proc = None
+    try:
@@
-    finally:
-        server_proc.terminate()
-        workers_proc.terminate()
-        server_proc.wait()
-        workers_proc.wait()
+    finally:
+        for proc in (server_proc, workers_proc):
+            if proc and proc.poll() is None:
+                try:
+                    proc.terminate()
+                except Exception:
+                    pass
+        for proc in (server_proc, workers_proc):
+            if proc:
+                try:
+                    proc.wait(timeout=30)
+                except Exception:
+                    pass
```

Also applies to: 1198-1202
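A self-contained sketch of the guarded pattern, using dummy sleeping child processes in place of the server and workers:

```python
import subprocess
import sys

server_proc = None
workers_proc = None
try:
    # Stand-ins for the disaggregated server and MPI workers.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    server_proc = subprocess.Popen(sleeper)
    workers_proc = subprocess.Popen(sleeper)
    # ... body that may raise before or after the Popen calls ...
finally:
    # Guard: skip procs that were never started or already exited,
    # so cleanup itself cannot raise UnboundLocalError or mask errors.
    for proc in (server_proc, workers_proc):
        if proc is not None and proc.poll() is None:
            proc.terminate()
    for proc in (server_proc, workers_proc):
        if proc is not None:
            proc.wait(timeout=30)
```

If an exception fired before the first `Popen`, both variables are still `None` and the `finally` block is a no-op instead of a crash.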
1251-1267: Use TemporaryDirectory as a context manager for deterministic cleanup

Prevents leaked temp dirs on early failures and keeps files scoped.

```diff
-    temp_dir = tempfile.TemporaryDirectory()
-    nixl_config_path = os.path.join(temp_dir.name, "nixl_config.yaml")
-    ucx_config_path = os.path.join(temp_dir.name, "ucx_config.yaml")
-    with open(nixl_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(nixl_config, f)
-    with open(ucx_config_path, 'w', encoding='utf-8') as f:
-        yaml.dump(ucx_config, f)
-
-    env = llm_venv._new_env.copy()
-    nixl_e2el, nixl_ttft = run_disaggregated_benchmark(
-        disaggregated_example_root,
-        nixl_config_path,
-        benchmark_root,
-        benchmark_model_root,
-        shared_gpt_path,
-        env=env,
-        cwd=llm_venv.get_working_directory())
-    ucx_e2el, ucx_ttft = run_disaggregated_benchmark(
-        disaggregated_example_root,
-        ucx_config_path,
-        benchmark_root,
-        benchmark_model_root,
-        shared_gpt_path,
-        env=env,
-        cwd=llm_venv.get_working_directory())
+    with tempfile.TemporaryDirectory() as td:
+        nixl_config_path = os.path.join(td, "nixl_config.yaml")
+        ucx_config_path = os.path.join(td, "ucx_config.yaml")
+        with open(nixl_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(nixl_config, f)
+        with open(ucx_config_path, 'w', encoding='utf-8') as f:
+            yaml.dump(ucx_config, f)
+
+        env = llm_venv._new_env.copy()
+        nixl_e2el, nixl_ttft = run_disaggregated_benchmark(
+            disaggregated_example_root,
+            nixl_config_path,
+            benchmark_root,
+            benchmark_model_root,
+            shared_gpt_path,
+            env=env,
+            cwd=llm_venv.get_working_directory())
+        ucx_e2el, ucx_ttft = run_disaggregated_benchmark(
+            disaggregated_example_root,
+            ucx_config_path,
+            benchmark_root,
+            benchmark_model_root,
+            shared_gpt_path,
+            env=env,
+            cwd=llm_venv.get_working_directory())
```
🧹 Nitpick comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
1065-1071: Optionally skip if dataset file is missing (or drop this when using random dataset)

The benchmark uses dataset-name=random, so dataset-path is not required. If you keep this fixture for other usages, skip gracefully when the file is missing.

```diff
 @pytest.fixture(scope="module")
 def shared_gpt_path():
     DEFAULT_LLM_MODEL_ROOT = os.path.join("/scratch.trt_llm_data", "llm-models")
     LLM_MODELS_ROOT = os.environ.get("LLM_MODELS_ROOT", DEFAULT_LLM_MODEL_ROOT)
-    return os.path.join(LLM_MODELS_ROOT, "datasets",
-                        "ShareGPT_V3_unfiltered_cleaned_split.json")
+    path = os.path.join(LLM_MODELS_ROOT, "datasets",
+                        "ShareGPT_V3_unfiltered_cleaned_split.json")
+    if not os.path.isfile(path):
+        pytest.skip(f"Dataset not found at: {path}")
+    return path
```
1127-1127: Typo in comment

Minor: "sever" → "server".

```diff
-    # Ensure the sever has started
+    # Ensure the server has started
```
1175-1175: Decode check_output if it returns bytes

Ensure regex operates on str.

```diff
-    output = check_output(benchmark_cmd, env=run_env)
+    output = check_output(benchmark_cmd, env=run_env)
+    if isinstance(output, (bytes, bytearray)):
+        output = output.decode("utf-8", errors="replace")
```
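For illustration, here is the decode-then-parse flow on a mock summary; the "Median TTFT/E2EL" line format is assumed from the emitted metrics, not copied from benchmark_serving.py:

```python
import re

# Pretend check_output returned raw bytes; decode first so the
# regexes below always operate on str.
output = b"Median TTFT (ms): 42.17\nMedian E2EL (ms): 180.50\n"
if isinstance(output, (bytes, bytearray)):
    output = output.decode("utf-8", errors="replace")

# Extract the two medians the benchmark test compares across backends.
ttft_match = re.search(r"Median TTFT \(ms\):\s*([\d.]+)", output)
e2el_match = re.search(r"Median E2EL \(ms\):\s*([\d.]+)", output)
median_ttft = float(ttft_match.group(1)) if ttft_match else None
median_e2el = float(e2el_match.group(1)) if e2el_match else None
```

Without the decode, `re.search` with a str pattern raises `TypeError: cannot use a string pattern on a bytes-like object`.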
📜 Review details
✅ Files skipped from review due to trivial changes (1)
- tests/integration/test_lists/qa/llm_function_sanity.txt
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tensorrt_llm/serve/scripts/benchmark_serving.py
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/integration/defs/disaggregated/test_disaggregated.py (2)
17-17: LGTM: imports are necessary and used

re, tempfile, yaml, and the additional defs imports are required by the new code paths.
Also applies to: 19-19, 22-24
1204-1239: LGTM: benchmark config builder is clear and parameterized

The helper cleanly injects the KV backend and sensible defaults for small-node benchmarking.
PR_Github #14749 [ run ] completed with state
/bot run --add-multi-gpu-test
PR_Github #14776 [ run ] triggered by Bot
PR_Github #14776 [ run ] completed with state
2dd10cf to a87a577 (Compare)
/bot run --add-multi-gpu-test
PR_Github #14785 [ run ] triggered by Bot
Actionable comments posted: 1
♻️ Duplicate comments (9)
tests/integration/defs/disaggregated/test_disaggregated.py (9)
1059-1063
: Guard LLM_ROOT and verify the benchmark script exists. If LLM_ROOT isn't set or the script path changed, this fixture will crash. Prefer a clear skip to avoid hard failures.
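A minimal sketch of the guard, assuming a hypothetical helper name and an illustrative relative script path (the real repo layout may differ); a pytest fixture would call pytest.skip() on a None result instead of letting the test crash:

```python
import os
from typing import Optional


def resolve_benchmark_script(env: dict) -> Optional[str]:
    """Return the benchmark script path, or None when it cannot be used.

    Sketch only: the relative path below is an assumption, not the
    verified repository layout.
    """
    llm_root = env.get("LLM_ROOT")
    if not llm_root:
        return None  # caller should pytest.skip("LLM_ROOT is not set")
    script = os.path.join(llm_root, "tensorrt_llm", "serve", "scripts",
                          "benchmark_serving.py")
    # Missing script also becomes a skip, not a crash mid-fixture.
    return script if os.path.isfile(script) else None
```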
1073-1087
: Skip gracefully if the model path is missing. Avoid hard failure if the model is not available on the runner.
1098-1099
: Initialize env safely and unify UCX_TLS usage. Handle env=None and ensure all subprocesses (including benchmark) get the same UCX settings.
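One way to sketch this normalization (the helper name and the "^cuda_ipc" value are illustrative assumptions, not the reviewed code):

```python
import os


def make_run_env(env=None):
    """Return a fresh env dict with UCX settings pinned once.

    Sketch: starting from os.environ when env is None, and setting
    UCX_TLS via setdefault, means the server, the MPI workers, and the
    benchmark client all see the same transport selection without any
    caller-provided value being overwritten.
    """
    run_env = dict(os.environ) if env is None else dict(env)
    run_env.setdefault("UCX_TLS", "^cuda_ipc")  # assumed value
    return run_env
```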
1111-1113
: Robust process cleanup; avoid UnboundLocalError and noisy terminate. If process startup fails, server_proc/workers_proc may be undefined; also, terminate() on already-exited procs can raise.
1130-1135
: Warm-up client should use the same config_file. Using example_dir/disagg_config.yaml risks mismatched ports/urls vs the server's temp config. Use the same config you're serving with.
1150-1154
: Benchmark command: drop dataset-path for random; pass run_env consistently
- Remove dataset-path since dataset-name=random.
- Use run_env so UCX_TLS applies.
Also applies to: 1172-1175
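A sketch of the conditional command construction, assuming flag names that mirror benchmark_serving.py's CLI (not verified here):

```python
def build_benchmark_cmd(script, model_path, dataset_name, dataset_path=None):
    """Build the benchmark argv, passing --dataset-path only when needed.

    Sketch: with dataset-name=random the benchmark synthesizes inputs,
    so a dataset path is unnecessary and passing one can mask
    configuration errors.
    """
    cmd = ["python", script,
           "--model", model_path,
           "--dataset-name", dataset_name]
    if dataset_name != "random" and dataset_path:
        cmd += ["--dataset-path", dataset_path]
    return cmd
```

The resulting list would be handed to the subprocess call together with the shared run_env so UCX_TLS applies to the client as well.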
1198-1202
: Guard process cleanup to avoid UnboundLocalError. If an exception occurs before server_proc/workers_proc are created, the finally block will raise UnboundLocalError. Guard cleanup.
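A sketch of a tolerant cleanup helper: initialize server_proc/workers_proc to None before the try block, then let the finally clause call something like this (the helper name is hypothetical):

```python
import subprocess
import sys


def cleanup_procs(*procs):
    """Terminate any still-running processes; skip missing ones.

    Sketch: None entries cover the case where startup failed before
    Popen ran, and the poll() check avoids terminating a process that
    has already exited.
    """
    for proc in procs:
        if proc is None:            # startup failed before Popen
            continue
        if proc.poll() is None:     # still running
            proc.terminate()
            proc.wait(timeout=30)


# Demo: a short-lived child that cleanup must stop.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
cleanup_procs(None, demo)
```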
1251-1258
: Use TemporaryDirectory as a context manager for deterministic cleanup. This prevents leaked temp dirs if the test fails early.
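The pattern, sketched with an illustrative config write (unlike tempfile.mkdtemp() plus manual removal, the context manager cleans up even when the body raises):

```python
import os
import tempfile

# The directory and everything in it are removed when the block exits,
# including on exceptions raised inside the block.
with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "disagg_config.yaml")
    with open(config_path, "w") as f:
        f.write("hostname: localhost\n")  # illustrative content
    created = os.path.exists(config_path)

# After the block, the temp dir and the config file are gone.
print(created, os.path.exists(config_path))
```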
1259-1267
: Set backend-specific KV cache env from config (NIXL vs UCX). The benchmark runner doesn't set TRTLLM_USE_NIXL_KVCACHE / TRTLLM_USE_UCX_KVCACHE. Earlier disaggregated tests rely on these env flags. Derive the backend from the YAML config and set env accordingly; also, only set UCX_TLS for UCX.
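A sketch of deriving the env flags from an already-loaded config dict; the config shape (cache_transceiver_config.backend under context_servers) mirrors the test YAML but is an assumption here:

```python
import os


def kv_cache_env(config, base_env=None):
    """Return an env dict with backend-specific KV cache flags set.

    Sketch: NIXL gets TRTLLM_USE_NIXL_KVCACHE; anything else falls back
    to UCX, which also needs UCX_TLS. The env variable names follow the
    review discussion above, not a verified API.
    """
    env = dict(os.environ) if base_env is None else dict(base_env)
    backend = (config.get("context_servers", {})
                     .get("cache_transceiver_config", {})
                     .get("backend", "UCX")).upper()
    if backend == "NIXL":
        env["TRTLLM_USE_NIXL_KVCACHE"] = "1"
    else:
        env["TRTLLM_USE_UCX_KVCACHE"] = "1"
        env.setdefault("UCX_TLS", "^cuda_ipc")  # only UCX needs this
    return env
```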
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tensorrt_llm/serve/scripts/benchmark_serving.py (1 hunks)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2 hunks)
tests/integration/defs/disaggregated/test_disaggregated.py (2 hunks)
tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
tests/integration/test_lists/qa/llm_function_sanity.txt (1 hunks)
tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
tests/integration/test_lists/test-db/l0_dgx_h100.yml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/integration/test_lists/qa/llm_function_full.txt
- tests/integration/defs/accuracy/test_disaggregated_serving.py
- tensorrt_llm/serve/scripts/benchmark_serving.py
- tests/integration/test_lists/qa/llm_function_sanity.txt
- tests/integration/test_lists/test-db/l0_dgx_h100.yml
- tests/integration/test_lists/test-db/l0_dgx_b200.yml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
🧬 Code Graph Analysis (1)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
tests/integration/defs/conftest.py (5)
llm_models_root (77-83)
llm_root (180-181)
disaggregated_test_root (2339-2344)
disaggregated_example_root (270-275)
llm_venv (707-723)
tests/integration/defs/trt_test_alternative.py (3)
check_call (230-238)
check_output (241-267)
popen (179-198)
tensorrt_llm/executor/request.py (1)
path (48-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/integration/defs/disaggregated/test_disaggregated.py (3)
17-17
: LGTM! The new imports are necessary for the benchmarking functionality and follow proper Python import conventions.
Also applies to: 19-19, 22-24
1204-1238
: LGTM! The configuration helper function is well-structured and creates appropriate configurations for different backends.
1279-1280
: LGTM! The performance assertions comparing NIXL and UCX backends are reasonable, ensuring NIXL performance is within 5% of UCX for both E2EL and TTFT metrics.
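The 5% regression check can be sketched as follows (the helper name and the sample numbers are illustrative, not measured results from the test):

```python
def within_tolerance(candidate, baseline, tol=0.05):
    """Return True when candidate does not exceed baseline by more than tol.

    Lower is better for latency metrics (Median E2EL, Median TTFT), so
    a NIXL measurement passes when it is at most 5% above the UCX one.
    """
    return candidate <= baseline * (1.0 + tol)


# Illustrative latencies in milliseconds.
print(within_tolerance(102.0, 100.0))  # within 5% of baseline
print(within_tolerance(110.0, 100.0))  # more than 5% slower
```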
PR_Github #14785 [ run ] completed with state |
Force-pushed a87a577 to fafd1d6 (Compare)
/bot run --add-multi-gpu-test |
PR_Github #14826 [ run ] triggered by Bot |
PR_Github #14826 [ run ] completed with state |
/bot run --add-multi-gpu-test |
PR_Github #14873 [ run ] triggered by Bot |
Signed-off-by: Bo Deng <[email protected]>
Force-pushed 32484da to a7be498 (Compare)
/bot run --add-multi-gpu-test |
PR_Github #14993 [ run ] triggered by Bot |
PR_Github #14993 [ run ] completed with state |
Signed-off-by: Bo Deng <[email protected]>
/bot run --add-multi-gpu-test |
PR_Github #15039 [ run ] triggered by Bot |
PR_Github #15039 [ run ] completed with state |
/bot run --add-multi-gpu-test |
PR_Github #15061 [ run ] triggered by Bot |
PR_Github #15061 [ run ] completed with state |
/bot run --add-multi-gpu-test --disable-fail-fast |
PR_Github #15161 [ run ] triggered by Bot |
PR_Github #15161 [ run ] completed with state |
Force-pushed 018f4ba to 4e21fab (Compare)
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id
(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test
(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp"
(OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log
(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug
(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md
and the
scripts/test_to_stage_mapping.py
helper.
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.