Skip to content

Conversation

Shixiaowei02
Copy link
Collaborator

@Shixiaowei02 Shixiaowei02 commented Aug 11, 2025

Summary by CodeRabbit

  • Documentation

    • Prefer UCX-based KV cache in disaggregated serving docs, removed an outdated MPI troubleshooting FAQ, and clarified startup/config flow with explicit YAML examples and consistent "KV cache" terminology.
  • Refactor

    • Normalized cache backend names to uppercase (DEFAULT, UCX, MPI, NIXL); validation/schema updated accordingly.
  • Tests

    • Updated tests and test configs to use normalized uppercase backend values across disaggregated scenarios.

@Shixiaowei02 Shixiaowei02 requested review from a team as code owners August 11, 2025 06:20
Copy link
Contributor

coderabbitai bot commented Aug 11, 2025

📝 Walkthrough

Walkthrough

Enforces uppercase literals for CacheTransceiverConfig.backend and updates docs, examples, tests, and YAML configs to use uppercase backend values (e.g., "DEFAULT", "UCX", "NIXL", "MPI"). Removes an MPI-specific troubleshooting FAQ and replaces MPI KVCache env examples with UCX. No runtime control-flow changes.

Changes

Cohort / File(s) Summary
Core API change (backend literal types)
tensorrt_llm/llmapi/llm_args.py
Restrict CacheTransceiverConfig.backend to uppercase Literal values: ["DEFAULT","UCX","NIXL","MPI"].
Documentation & READMEs (MPI → UCX, FAQ removal)
docs/source/advanced/disaggregated-service.md, benchmarks/cpp/README.md, examples/cpp/executor/README.md
Remove MPI-centric troubleshooting FAQ and update README examples to use UCX KVCache env var (TRTLLM_USE_UCX_KVCACHE=1) instead of MPI.
Examples: disaggregated README & configs
examples/disaggregated/README.md, examples/disaggregated/disagg_config.yaml, examples/disaggregated/slurm/gen_yaml.py
Normalize backend values to uppercase; restructure README examples to use explicit YAML files and updated example filenames.
Integration tests: YAML configs (backend casing)
tests/integration/defs/disaggregated/test_configs/*
Change cache_transceiver_config.backend values to uppercase (defaultDEFAULT, ucxUCX, nixlNIXL, mpiMPI) across many test YAMLs.
Integration tests: test code updates
tests/integration/defs/accuracy/test_disaggregated_serving.py, tests/integration/defs/disaggregated/test_disaggregated_etcd.py, tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py, tests/integration/defs/disaggregated/test_disaggregated.py
Update test-generated/configured backend strings and related assertions to use uppercase backend literals.
Unit tests: llm_args adjustments
tests/unittest/llmapi/test_llm_args.py
Update unit test inputs and expectations from lowercase backend literals (e.g., ucx) to uppercase (UCX).
Benchmarks / examples (KVCache env var switch)
benchmarks/cpp/README.md, examples/cpp/executor/README.md
Replace example exports of TRTLLM_USE_MPI_KVCACHE=1 with TRTLLM_USE_UCX_KVCACHE=1 (documentation-only).

Sequence Diagram(s)

(No sequence diagram generated — changes are configuration, docs, and value normalization only.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • Superjomn
  • chzblych
  • kaiyux

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14754 [ run ] triggered by Bot

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (8)
examples/disaggregated/slurm/gen_yaml.py (3)

9-15: Python 3.8 compatibility: use typing.Tuple instead of builtin tuple[...]

tuple[...] requires Python 3.9+. Project guideline is Python 3.8+. Replace with Tuple[...] and import Tuple.

Apply:

-from typing import Dict, List
+from typing import Dict, List, Tuple
@@
-def process_node_and_task() -> tuple[int, List[str], List[str]]:
+def process_node_and_task() -> Tuple[int, List[str], List[str]]:

77-86: Python 3.8 compatibility: fix return annotation

Same issue as above: use Tuple[List[str], int].

Apply:

-def generate_urls(..., task_nodes_offset: int = 0) -> tuple[List[str], int]:
+def generate_urls(..., task_nodes_offset: int = 0) -> Tuple[List[str], int]:

Note: keep the actual parameters unchanged; only the return type needs Tuple[…]. Ensure Tuple is imported.


1-1: Missing required NVIDIA copyright header

Per repo guidelines, add a current-year header at the top of .py files.

Apply:

+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
 import argparse
examples/disaggregated/README.md (1)

130-131: Typo in example YAML key: “refersh_interval” should be “refresh_interval”

This will confuse users copying the snippet. Please fix the key name.

- ref ersh_interval: 10.0
+ refresh_interval: 10.0
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

460-467: Lowercase “nixl” will fail validation — use uppercase “NIXL”

CacheTransceiverConfig.backend now accepts only uppercase literals. Update both occurrences.

Apply this diff:

-            "cache_transceiver_config": {
-                "backend": "nixl"
-            }
+            "cache_transceiver_config": {
+                "backend": "NIXL"
+            }

Also applies to: 591-598


600-602: Remove stray no-op lines

These lines have no effect and are likely leftovers; they should be deleted.

-        ctx_server_config["cache_transceiver_config"]
-        ctx_server_config["cache_transceiver_config"]
+        # (removed no-op lines)
tests/unittest/llmapi/test_llm_args.py (1)

1-1: Add NVIDIA SPDX header (required by repo guidelines)

This file appears to be missing the standard header. Please add:

# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)

1-1: Add NVIDIA SPDX header (required by repo guidelines)

Please add the standard header at the top:

# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
🧹 Nitpick comments (13)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1)

12-12: Uppercasing to "MPI" matches the new Literal contract.

Looks correct and aligned with the enforced uppercase backend values.

  • Minor: quoting style is mixed across configs (some use quoted strings, others unquoted). Consider standardizing for consistency.
  • Optional: If MPI is not exercised in CI or is discouraged in docs, consider adding a brief comment in this file or moving tests to UCX to avoid drift.

Also applies to: 20-20

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (1)

14-14: DEFAULT uppercasing is consistent with llm_args validation.

Change is correct and should pass the new Literal validation.

Standardize quoting across YAML (either consistently quoted or unquoted) for readability.

Also applies to: 24-24

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)

12-12: DEFAULT uppercasing matches the new accepted values.

Good alignment with the schema; no functional issues anticipated.

Consider normalizing quoting across files for consistency.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1)

12-12: NIXL uppercasing is correct per the new Literal set.

Change is correct and consistent with the updated parser expectations.

Quoting is fine, but consider consistent quoting style across the repo.

Also applies to: 20-20

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)

17-17: DEFAULT uppercasing aligns with the new schema.

Looks good and should parse cleanly.

Keep quoting style consistent across similar configs.

Also applies to: 26-26

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)

17-17: DEFAULT uppercasing is correct.

Update matches the enforced Literal values.

Consider a repo-wide lint or check to keep backend literals consistently styled (quoted vs unquoted).

Also applies to: 27-27

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)

13-13: DEFAULT uppercasing is aligned with llm_args.

No issues from a parsing or validation perspective.

Optional consistency cleanup: unify quoting convention across YAML configs.

Also applies to: 21-21

benchmarks/cpp/README.md (2)

339-349: Add language to fenced code block for linting

markdownlint MD040 suggests specifying a language. For these shell snippets, annotate the opening fence with bash.

Example:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n ${proc} benchmarks/disaggServerBenchmark ...

---

`347-351`: **Add language to fenced code block here as well**

Annotate the opening fence with bash to satisfy MD040.

```diff
-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n 7 benchmarks/disaggServerBenchmark ...

</blockquote></details>
<details>
<summary>examples/cpp/executor/README.md (1)</summary><blockquote>

`127-134`: **Update to UCX env var looks correct; minor wording nit**

The env var switch is good. Nit: add a colon after “Note” and set fenced language to bash.

```diff
-Note setting `TRTLLM_USE_UCX_KVCACHE=1` is required to run disaggregated executor.
+Note: setting `TRTLLM_USE_UCX_KVCACHE=1` is required to run disaggregated executor.

Also annotate code fences:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 ...

</blockquote></details>
<details>
<summary>tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)</summary><blockquote>

`12-13`: **Quoting style inconsistency (optional)**

This file uses "DEFAULT" (quoted) while others use DEFAULT (unquoted). Both are valid YAML strings, but consider standardizing for consistency across configs.



Also applies to: 20-21

</blockquote></details>
<details>
<summary>tensorrt_llm/llmapi/llm_args.py (1)</summary><blockquote>

`1042-1055`: **Make backend input case-insensitive and clarify default**

Current typing enforces uppercase, which is fine given the BREAKING tag, but we can reduce friction by normalizing user input to uppercase before validation. Also, docs state default backend is UCX; here the default is `None`. If `from_string(None)` doesn’t resolve to UCX, that’s a mismatch.

- Option A (preferred): accept case-insensitive input via a validator, keep default as-is if `from_string(None)` maps to UCX.
- Option B: set default to `"UCX"` to precisely reflect docs.


Apply Option A (case-insensitive normalization):

```diff
 class CacheTransceiverConfig(StrictBaseModel, PybindMirror):
@@
-    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
+    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
         default=None,
         description=
         "The communication backend type to use for the cache transceiver.")
+
+    @field_validator("backend", mode="before")
+    @classmethod
+    def _normalize_backend(cls, v):
+        if v is None:
+            return v
+        vv = str(v).upper()
+        if vv not in ("DEFAULT", "UCX", "NIXL", "MPI"):
+            raise ValueError(f"Invalid cache_transceiver_config.backend: {v}")
+        return vv

Alternatively, Option B (explicit default):

-    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
-        default=None,
+    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
+        default="UCX",
         description=
         "The communication backend type to use for the cache transceiver.")

If you prefer Option A, please confirm that _CacheTransceiverBackendType.from_string(None) indeed defaults to UCX to keep behavior consistent with the docs.

tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

118-125: Optional: avoid forcing UCX via env when backend is configured

You always set TRTLLM_USE_UCX_KVCACHE=1 regardless of the configured backend. Prefer relying on the explicit cache_transceiver_config.backend to reduce environment coupling and prevent accidental mismatches in backend-specific tests.

-    env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1"
+    # Rely on cache_transceiver_config.backend; avoid forcing UCX via env vars
     env_ctx["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(ctx_total_gpus)))
@@
-    env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1"
+    # Rely on cache_transceiver_config.backend; avoid forcing UCX via env vars
     env_gen["CUDA_VISIBLE_DEVICES"] = ",".join(
         map(str, range(ctx_total_gpus, ctx_total_gpus + gen_total_gpus)))
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f9e63 and 7c9600c.

📒 Files selected for processing (48)
  • benchmarks/cpp/README.md (1 hunks)
  • docs/source/advanced/disaggregated-service.md (0 hunks)
  • examples/cpp/executor/README.md (1 hunks)
  • examples/disaggregated/README.md (3 hunks)
  • examples/disaggregated/disagg_config.yaml (1 hunks)
  • examples/disaggregated/slurm/gen_yaml.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (1 hunks)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
  • tests/unittest/llmapi/test_llm_args.py (1 hunks)
💤 Files with no reviewable changes (1)
  • docs/source/advanced/disaggregated-service.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/slurm/gen_yaml.py
  • tests/unittest/llmapi/test_llm_args.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • examples/disaggregated/slurm/gen_yaml.py
  • tests/unittest/llmapi/test_llm_args.py
  • tests/integration/defs/disaggregated/test_disaggregated_etcd.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/integration/defs/accuracy/test_disaggregated_serving.py
  • tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py

667-667: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)


675-675: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)

🪛 markdownlint-cli2 (0.17.2)
benchmarks/cpp/README.md

346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/cpp/executor/README.md

129-129: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (39)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (1)

19-19: All backend values are correctly uppercased

I ran a repository-wide search for any lowercase backend: entries in YAML, MD, and Python files and found none. ✅ Ready to merge.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (1)

13-13: No lowercase cache_transceiver_config.backend entries remain

The only lowercase backend: pytorch occurs at the top level in
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml:5,
not under any cache_transceiver_config block. All cache_transceiver_config.backend values are already uppercase.

tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1)

13-13: Consistent normalization to uppercase backend

DEFAULT aligns with the updated allowed literals. Looks good.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (1)

19-19: Backend value normalized to DEFAULT

Matches the new schema expectations and keeps both blocks consistent.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (1)

18-18: Uppercase backend literal applied correctly

DEFAULT is the expected value; both context and generation sections are in sync.

Also applies to: 33-33

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (1)

21-21: Alignment with uppercase backend contract

DEFAULT usage is correct in both locations.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)

13-13: DEFAULT backend literal adopted

Change is correct and consistent with the new validation logic.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1)

16-16: DEFAULT backend literal adopted

Generation servers’ cache_transceiver_config is updated correctly.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)

11-11: Uppercase backend literal applied consistently

Both sections correctly use DEFAULT per the new accepted values.

Also applies to: 19-19

examples/disaggregated/slurm/gen_yaml.py (2)

200-201: Normalize backend to uppercase DEFAULT — aligned with parser expectations

Matches the new Literal["DEFAULT","UCX","NIXL","MPI"]. Good.


228-229: Normalize backend to uppercase DEFAULT — aligned with parser expectations

Consistent with the new enforcement in llm_args.py. Good.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (1)

17-17: No lowercase backend values detected
The repository-wide search returned only uppercase DEFAULT, UCX, MPI, and NIXL entries for all backend settings. All backends have been correctly normalized.

tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)

19-19: Backend normalized to DEFAULT

Both blocks updated; matches the enforced uppercase. Good.

Also applies to: 35-35

tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1)

13-13: Backend normalized to DEFAULT

Updates are correct and consistent across context/generation blocks.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1)

13-13: Backend normalized to DEFAULT

Consistent with the new accepted literals. LGTM.

Also applies to: 21-21

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (1)

19-19: Backend normalized to DEFAULT

Both instances updated as expected. Good.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1)

22-22: Backend normalized to DEFAULT

Matches the uppercase-only change; consistent across both sections.

Also applies to: 41-41

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)

19-19: Backend normalized to DEFAULT

Both updated; aligned with llm_args constraints.

Also applies to: 34-34

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2)

19-21: LGTM: backend normalized to DEFAULT

Case normalization aligns with parser expectations. No other changes observed.

Also applies to: 34-36


19-21: All backend values are uppercase

A case-sensitive grep across tests/, examples/, docs/, and tensorrt_llm/ returned no lowercase backend: values—everything (DEFAULT, UCX, MPI, NIXL) is correctly capitalized now. No further action needed.

tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (1)

19-21: LGTM: backend normalized to DEFAULT

Consistent with new Literal set. No functional changes to other fields.

Also applies to: 37-39

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (1)

17-20: LGTM: backend normalized to DEFAULT

Matches the enforced uppercase literals. Disagg config semantics unchanged.

Also applies to: 28-29

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (1)

19-21: LGTM: backend normalized to DEFAULT

Two occurrences updated; consistent with parser expectations.

Also applies to: 34-36

tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1)

16-20: LGTM: backend normalized to DEFAULT

Generation-only config updated accordingly. No other changes.

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

12-15: LGTM: backend normalized to "DEFAULT"

Functionally correct.

Also applies to: 20-23

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1)

13-13: Uppercase backend literal — LGTM

Using DEFAULT aligns with the updated CacheTransceiverConfig.backend validation.

Also applies to: 21-21

examples/disaggregated/disagg_config.yaml (1)

14-14: Backend normalization to "DEFAULT" — LGTM

Consistent with the new uppercase-only Literal in llm_args.py.

Also applies to: 22-22

examples/disaggregated/README.md (1)

15-15: Disaggregated KV cache docs updates — LGTM

  • Backend options and examples now show uppercase tokens and UCX default.
  • Split ctx/gen extra-llm-api YAMLs clarifies setup.
  • Dynamic scaling section aligned with UCX usage.

Looks good.

Also applies to: 17-17, 19-40, 42-51, 112-118, 198-198

tests/integration/defs/accuracy/test_disaggregated_serving.py (1)

224-225: Backend normalization to "DEFAULT" in test configs — LGTM

Matches new uppercase-only validation in llm_args.py and keeps tests aligned with docs.

Also applies to: 233-234, 268-270, 308-309, 316-317, 359-360, 373-374, 422-424, 497-498, 542-543, 549-550, 629-630, 636-637

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1)

12-12: UCX backend normalization — LGTM

Uppercase UCX complies with the new validation in llm_args.py.

Also applies to: 20-20

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)

20-20: DEFAULT backend normalization — LGTM

Consistent and correct with the updated schema.

Also applies to: 36-36

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)

18-18: DEFAULT backend normalization — LGTM

Valid YAML; matches the uppercase-only contract.

Also applies to: 33-33

tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)

21-21: Backend normalization verified across repository

Ran the suggested regex scan across all .yaml, .yml, and .py files and confirmed there are no lowercase backend values—every entry uses DEFAULT, UCX, MPI, or NIXL. Ready to merge.

tests/unittest/llmapi/test_llm_args.py (1)

667-671: UCX uppercasing matches the new Literal contract

Switching backend to "UCX" in both the valid and invalid-args tests is correct and consistent.

Also applies to: 675-676

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (1)

14-14: Uppercase DEFAULT applied consistently

Both occurrences updated to DEFAULT. No structural changes — this is correct.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)

134-134: Switch to backend="DEFAULT" is correct

All three CacheTransceiverConfig constructions now match the uppercase Literal. Consistent with the API change.

Also applies to: 277-277, 380-380

tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)

247-249: Generated config now uses backend: "DEFAULT" — good

Matches the new accepted values and keeps env-based UCX usage consistent.

Also applies to: 255-257

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (1)

14-14: DEFAULT normalization — OK

Both cache_transceiver_config.backend entries are updated to DEFAULT.

Also applies to: 23-23

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (1)

24-24: DEFAULT applied in both sections

Uppercase values align with the updated validation logic.

Also applies to: 38-38

Comment on lines +339 to 349
export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
--generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}
```
This command will launch m context engines and n generation engines. You need to ensure `proc` is equal to the sum of the number of processes required for each engine plus 1. Since we use orchestrator mode for `disaggServerBenchmark` we need an additional process as the orchestrator. For example, if there are two context engines (one is TP2_PP1,another is TP1_PP1) and two generation engines(one is TP2_PP1,another is TP1_PP1), then the `proc` value should be set to 7.
for example:
```
export TRTLLM_USE_MPI_KVCACHE=1
export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Switch to UCX env var looks correct

TRTLLM_USE_UCX_KVCACHE supersedes MPI variant in disaggServerBenchmark examples. Please ensure any internal tooling/docs refer to the new env var.

Quick scan for old variable:


🏁 Script executed:

#!/bin/bash
rg -n --no-heading -e 'TRTLLM_USE_MPI_KVCACHE'

Length of output: 1674


Ensure all references to the old MPI KV-cache env var are updated

I confirmed your switch to TRTLLM_USE_UCX_KVCACHE in benchmarks/cpp/README.md is correct, but there are still numerous references to the deprecated TRTLLM_USE_MPI_KVCACHE elsewhere in the repo. Please update these to use the new UCX-based variable for consistency:

• tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py:44
• tests/integration/defs/triton_server/test.sh:1130, 1145
• tests/integration/defs/triton_server/test_triton_llm.py:3380
• tests/integration/defs/cpp/test_e2e.py:48
• tests/integration/defs/cpp/test_multi_gpu.py:35
• tests/integration/defs/disaggregated/test_disaggregated.py:706
• docker/common/install_mpi4py.sh:34, 47, 50
• examples/disaggregated/README.md:155, 198
• cpp/tensorrt_llm/common/envUtils.cpp:271

Let’s replace TRTLLM_USE_MPI_KVCACHE with TRTLLM_USE_UCX_KVCACHE in these locations (and any other internal tooling/docs) to complete the migration.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines +667 to 673
config = CacheTransceiverConfig(backend="UCX",
max_tokens_in_buffer=1024)
assert config.backend == "ucx"
assert config.backend == "UCX"
assert config.max_tokens_in_buffer == 1024

# Arbitrary arguments should be rejected
with pytest.raises(
pydantic_core._pydantic_core.ValidationError) as exc_info:
CacheTransceiverConfig(backend="ucx", invalid_config="should_fail")
CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
assert "invalid_config" in str(exc_info.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Fix Ruff F405: star import makes CacheTransceiverConfig ambiguous

Ruff flags these lines because CacheTransceiverConfig originates from a star import. Two options:

  • Short-term minimal patch (keeps current code style):
    • Add an explicit import alongside the star import to satisfy Ruff.
-from tensorrt_llm.llmapi.llm_args import *
+from tensorrt_llm.llmapi.llm_args import *  # noqa: F403 (intentional)
+from tensorrt_llm.llmapi.llm_args import CacheTransceiverConfig  # Explicit import resolves F405
  • Recommended refactor (aligns with project guideline to maintain namespace):
    • Replace the star import with a module import and qualify usages (larger change across this file).
-from tensorrt_llm.llmapi.llm_args import *
+import tensorrt_llm.llmapi.llm_args as llm_args

Then reference via llm_args.CacheTransceiverConfig, llm_args.CudaGraphConfig, etc.

If you want, I can follow up with a patch converting this file to fully-qualified imports.

You can confirm the F405 fix locally with:


🏁 Script executed:

#!/bin/bash
# If ruff is available:
ruff check tests/unittest/llmapi/test_llm_args.py || true

Length of output: 65602


Refactor star import to resolve Ruff F403/F405
Ruff reports 105 F403/F405 errors in tests/unittest/llmapi/test_llm_args.py because of the wildcard import. Rather than adding dozens of explicit imports, replace the star import with a module import and qualify every symbol usage:

  • In tests/unittest/llmapi/test_llm_args.py, remove:
    from tensorrt_llm.llmapi.llm_args import *
  • Add:
    import tensorrt_llm.llmapi.llm_args as llm_args
  • Update all references, for example:
    - config = CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024)
    + config = llm_args.CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024)

This single refactor will eliminate all F403/F405 warnings in the file.

🧰 Tools
🪛 Ruff (0.12.2)

667-667: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)


675-675: CacheTransceiverConfig may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In tests/unittest/llmapi/test_llm_args.py around lines 667 to 676, the file uses
a wildcard import from tensorrt_llm.llmapi.llm_args causing many Ruff F403/F405
warnings; remove the star import and replace it with a module import (import
tensorrt_llm.llmapi.llm_args as llm_args), then update every symbol reference in
the file to be qualified with the llm_args. prefix (e.g.,
llm_args.CacheTransceiverConfig) so all names are explicitly referenced and the
F403/F405 errors are resolved.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #14754 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #51 completed with status: 'FAILURE'

@Shixiaowei02 Shixiaowei02 requested a review from a team as a code owner August 14, 2025 09:54
@Shixiaowei02
Copy link
Collaborator Author

/bot run

@Shixiaowei02 Shixiaowei02 changed the title [None][fix] BREAKING CHANGE: Mismatch between docs and actual commands [TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands Aug 14, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (6)
tests/integration/defs/accuracy/test_disaggregated_serving.py (6)

512-514: Lowercase "nixl" will break with new uppercase-only backend literals.

The backend is now validated against uppercase literals ["DEFAULT", "UCX", "NIXL", "MPI"]. Using "nixl" will fail argument parsing/validation.

Apply this diff to fix:

         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "cache_transceiver_config": {
-                "backend": "nixl"
+                "backend": "NIXL"
             }
         }
         gen_server_config = {
             "disable_overlap_scheduler": True,
             "cache_transceiver_config": {
-                "backend": "nixl"
+                "backend": "NIXL"
             }
         }

Also applies to: 516-520


644-646: Same issue: lowercase "nixl" should be "NIXL".

Mirror the uppercase normalization here as well.

Apply this diff:

         ctx_server_config = {
             "disable_overlap_scheduler": True,
             "cache_transceiver_config": {
-                "backend": "nixl"
+                "backend": "NIXL"
             }
         }
         gen_server_config = {
             "disable_overlap_scheduler": True,
             "cache_transceiver_config": {
-                "backend": "nixl"
+                "backend": "NIXL"
             }
         }

Also applies to: 648-652


653-654: Redundant no-op lines; remove to avoid confusion.

These two lines evaluate the dict key but do nothing, and should be deleted.

Apply this diff:

-        ctx_server_config["cache_transceiver_config"]
-        ctx_server_config["cache_transceiver_config"]

53-53: Type annotation uses PEP 585 builtins; breaks on Python 3.8.

Per coding guidelines, target Python 3.8+. The annotation list[concurrent.futures.Future[RequestOutput]] requires 3.9+. Use typing.List and avoid parameterizing concurrent.futures.Future.

Apply this diff:

-        self.futures: list[concurrent.futures.Future[RequestOutput]] = []
+        self.futures: List[concurrent.futures.Future] = []

61-64: ThreadPoolExecutor.shutdown(cancel_futures=...) is not available in Python 3.8.

Passing cancel_futures causes a TypeError on 3.8. Since you already cancel futures above, call shutdown without that parameter to retain 3.8 compatibility.

Apply this diff:

-        self.shutdown(wait=True, cancel_futures=True)
+        self.shutdown(wait=True)

1-4: Missing NVIDIA copyright header.

Per coding guidelines, prepend the NVIDIA copyright header (current year) to all source files.

Apply this diff:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+
 # I want to create accuracy tests for disaggregated serving.
 # I need to to this by creating a new class that mimics LLM class. Instead of implementing the
 # actual methods it will send OAI requests to the disaggregated serving endpoint.
 # Please take a look at the existing test_llm_api_pytorch.py file for reference.
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)

127-133: Consider gating TRTLLM_USE_UCX_KVCACHE by selected backend.

You're forcing UCX KVCache for all tests, including NIXL. If NIXL doesn't need or conflicts with this, prefer conditional enabling to avoid unintended behavior.

Apply this diff to set the env var only when using UCX-like backends:

-        env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1"
+        if ctx_server_config.get("cache_transceiver_config", {}).get("backend", "DEFAULT") in {"DEFAULT", "UCX"}:
+            env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1"
...
-        env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1"
+        if gen_server_config.get("cache_transceiver_config", {}).get("backend", "DEFAULT") in {"DEFAULT", "UCX"}:
+            env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1"

Also applies to: 148-154


197-201: Health check loop can hang; add network timeout.

requests.get without timeout can block. Use a small timeout to fail fast and keep the loop responsive.

Apply this diff:

-                        response = requests.get("http://localhost:8000/health")
+                        response = requests.get("http://localhost:8000/health", timeout=2)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these settings in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7c9600c and 9c7d203.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)

263-264: Uppercasing backend to DEFAULT is correct and aligned with new literals.

Using "DEFAULT" here matches the updated CacheTransceiverConfig backend literals and prevents validation errors at runtime.

Please ensure any other disaggregated tests/configs still using lower-case "default" are updated in this repo (including YAMLs and docs). If you want, I can scan and patch remaining occurrences.

Also applies to: 272-274


311-312: Good: tests now explicitly use uppercase DEFAULT backend.

This avoids failures with the new strict Literal typing.


351-352: Good: ngram test configs updated to DEFAULT.

Consistent with repository-wide backend normalization.

Also applies to: 359-360


403-404: Good: eagle3 test configs updated to DEFAULT.

No issues spotted with these changes.

Also applies to: 417-418


474-475: Good: Llama4Scout test configs updated to DEFAULT.

Matches the new accepted backend values.


549-550: Good: DeepSeek auto_dtype configs use DEFAULT.

Consistent with new Literal enforcement.


595-596: Good: Gemma test configs updated to DEFAULT.

No further comments here.

Also applies to: 602-603


683-684: Good: Qwen auto_dtype configs updated to DEFAULT.

Consistent and correct.

Also applies to: 690-691

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15283 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15283 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #113 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15376 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15376 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #131 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15428 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15428 [ run ] completed with state FAILURE
/LLM/release-1.0/L0_MergeRequest_PR pipeline #141 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15687 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15687 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #196 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15923 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #15923 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #237 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16099 [ run ] triggered by Bot

@Shixiaowei02
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16244 [ run ] triggered by Bot

Signed-off-by: Shixiaowei02 <[email protected]>
@tensorrt-cicd
Copy link
Collaborator

PR_Github #16244 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #273 completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

/bot run

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
benchmarks/cpp/README.md (2)

347-351: Mirror the same fixes in the second snippet (bash fence + ./ path)

Same notes as above: declare the code fence language and keep the executable path style consistent.

Apply:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
-mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}
+mpirun -n 7 ./benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}

---

`339-349`: **Repo-wide clean-up: replace deprecated MPI KV-cache env var everywhere**

This file is updated, but there are still many references to TRTLLM_USE_MPI_KVCACHE across the repo. Please complete the migration to TRTLLM_USE_UCX_KVCACHE in docs, scripts, tests, and helper tooling.




Run to locate remaining occurrences:

```shell
#!/bin/bash
# Find deprecated and new KV-cache env vars (exclude typical build artifacts).
rg -n --no-heading -g '!**/build/**' -g '!**/dist/**' -g '!**/.venv/**' -e '\bTRTLLM_USE_MPI_KVCACHE\b' -e '\bTRTLLM_USE_UCX_KVCACHE\b'
🧹 Nitpick comments (2)
benchmarks/cpp/README.md (2)

339-342: UCX env var switch looks good; add bash fence + path consistency

The move to TRTLLM_USE_UCX_KVCACHE is correct for disaggServerBenchmark. Please also label the fenced block as bash (MD040) and keep path style consistent with prior examples that use ./benchmarks/... for executables.

Apply:

-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
-mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
+mpirun -n ${proc} ./benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
 --generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}

---

`339-342`: **Avoid misleading placeholder syntax in shell snippets**

`${context_engine_{m-1}}` and `${generation_engine_{n-1}}` are not valid bash variable names. Consider using neutral placeholders or brace expansion examples to avoid copy/paste errors, e.g., `ctx_dir_{0..m-1}` and `gen_dir_{0..n-1}`, or angle-bracketed tokens like `<ctx_dir_0>,...,<ctx_dir_{m-1}>`.

</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used**: Path: .coderabbit.yaml

**Review profile**: CHILL

**Plan**: Pro

**💡 Knowledge Base configuration:**

- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 044700fee7d836160c1083d86758cc8351436f83 and 3823bee58b3045444903185c8f8ae341f14033a8.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `benchmarks/cpp/README.md` (1 hunks)

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

<details>
<summary>benchmarks/cpp/README.md</summary>

346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary>

* GitHub Check: Pre-commit Check

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16256 [ run ] triggered by Bot

@nv-guomingz
Copy link
Collaborator

Do we need to apply similar changes on 1.0 release? @Shixiaowei02

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16256 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #274 completed with status: 'FAILURE'

Signed-off-by: Shixiaowei02 <[email protected]>
@Shixiaowei02
Copy link
Collaborator Author

/bot run --only-multi-gpu-test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_disaggregated.py (1)

1231-1265: Optional: type-restrict and validate the backend parameter to fail fast on typos

Locking the accepted values here will surface mistakes early in CI and keeps tests self-documenting with the new uppercase contract.

Apply this diff to restrict and validate the parameter:

-def get_config_for_benchmark(model_root, backend):
+def get_config_for_benchmark(model_root: str,
+                             backend: "Literal['DEFAULT','UCX','NIXL','MPI']"):
+    allowed = {"DEFAULT", "UCX", "NIXL", "MPI"}
+    if backend not in allowed:
+        raise ValueError(f"Invalid cache_transceiver backend: {backend}. "
+                         f"Expected one of {sorted(allowed)}")

Add the missing import near the top of the file:

from typing import Literal
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3823bee and 88767e8.

📒 Files selected for processing (2)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_bs1.yaml (2 hunks)
  • tests/integration/defs/disaggregated/test_disaggregated.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_bs1.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/disaggregated/test_disaggregated.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16298 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16298 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #282 (Partly Tested) completed with status: 'FAILURE'

Signed-off-by: Shixiaowei02 <[email protected]>
@Shixiaowei02
Copy link
Collaborator Author

/bot run --only-multi-gpu-test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4)

1-1: Heads-up: missing NVIDIA copyright header (guideline).

Coding guidelines require the NVIDIA copyright header at the top of all Python sources. Consider adding it in a follow-up if this file doesn’t already include it via a repo-wide mechanism.


516-518: NIXL tests: verify interaction with unconditional UCX env.

Inside launch_disaggregated_llm, we always export TRTLLM_USE_UCX_KVCACHE=1 (Line 129 and Line 150) regardless of the chosen backend. When backend="NIXL", that env var is likely ignored, but it could be confusing or lead to unexpected code paths if the server code checks it early.

  • Ask: does setting TRTLLM_USE_UCX_KVCACHE=1 have any side effects when backend="NIXL"? If yes, consider scoping the env export to UCX/DEFAULT-only runs.

If needed, I can propose a minimal refactor to make the env opt-in via a function parameter (defaulting to UCX/DEFAULT) and disable it for NIXL tests.

Also applies to: 522-524


651-653: Qwen3 NIXL test: double-check UCX env vs. NIXL backend.

Same point as DeepSeek’s NIXL test: launch_disaggregated_llm always sets TRTLLM_USE_UCX_KVCACHE=1 (Lines 129, 150). Please confirm it’s benign under NIXL. If not, we should gate that env by backend.

Also applies to: 657-659


312-313: Reduce duplication with a shared constant for default transceiver config (optional).

"DEFAULT" appears many times across this file. A tiny constant reduces drift and edit churn:

+ # Near imports/top of file
+ CACHE_TRANSCEIVER_DEFAULT = {"backend": "DEFAULT"}

Then, for call sites (example below):

-ctx_server_config["cache_transceiver_config"] = {"backend": "DEFAULT"}
-gen_server_config["cache_transceiver_config"] = {"backend": "DEFAULT"}
+ctx_server_config["cache_transceiver_config"] = CACHE_TRANSCEIVER_DEFAULT
+gen_server_config["cache_transceiver_config"] = CACHE_TRANSCEIVER_DEFAULT

Repeat similarly where appropriate.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 88767e8 and 6087101.

📒 Files selected for processing (1)
  • tests/integration/defs/accuracy/test_disaggregated_serving.py (12 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • tests/integration/defs/accuracy/test_disaggregated_serving.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (9)
tests/integration/defs/accuracy/test_disaggregated_serving.py (9)

263-265: Uppercasing backend literals aligns tests with the new LlmArgs schema.

Switching to "DEFAULT" here is correct given CacheTransceiverConfig.backend now accepts uppercase-only literals. No logic changes; just prevents schema/validation failures.

Also applies to: 272-274


312-313: Consistent with breaking change: backend → "DEFAULT".

These updates keep the accuracy tests in sync with the uppercase-only backend enum. Good catch.


354-356: ngram test updated to uppercase backend.

Looks good and matches the Literal["DEFAULT","UCX","NIXL","MPI"] change.

Also applies to: 362-364


407-409: eagle3 config uses uppercase backend.

Change is correct; avoids config validation errors at server start.

Also applies to: 421-423


475-476: Scout model: uppercase backend is correct.

Keeps this test coherent with the schema change and the docs shift away from lowercase identifiers.


553-555: DeepSeek-V3-Lite auto-dtype: backend uppercasing is correct.

Matches the new allowed literals and prevents schema mismatches.


600-602: Gemma3 test: backend uppercasing is correct.

No functional change; aligns with the enum and doc normalization.

Also applies to: 607-609


689-691: Qwen3 auto-dtype: uppercase backend is correct.

Consistent with the schema change; nothing else to do.

Also applies to: 696-698


263-265: No lingering lowercase backend values detected. The repository-wide grep only surfaced uppercase "DEFAULT" usages and no lowercase occurrences of default, ucx, nixl, or mpi. All existing assignments are already uppercase, so no further changes are needed.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16348 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #16348 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #285 (Partly Tested) completed with status: 'FAILURE'

@Shixiaowei02
Copy link
Collaborator Author

Do we need to apply similar changes on 1.0 release? @Shixiaowei02

#7191 Yes, it has been merged.

@Shixiaowei02 Shixiaowei02 deleted the release/1.0-fix branch August 25, 2025 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants