-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands #6780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughEnforces uppercase literals for CacheTransceiverConfig.backend and updates docs, examples, tests, and YAML configs to use uppercase backend values (e.g., "DEFAULT", "UCX", "NIXL", "MPI"). Removes an MPI-specific troubleshooting FAQ and replaces MPI KVCache env examples with UCX. No runtime control-flow changes. Changes
Sequence Diagram(s)(No sequence diagram generated — changes are configuration, docs, and value normalization only.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
/bot run |
PR_Github #14754 [ run ] triggered by Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (8)
examples/disaggregated/slurm/gen_yaml.py (3)
9-15
: Python 3.8 compatibility: use typing.Tuple instead of builtin tuple[...]tuple[...] requires Python 3.9+. Project guideline is Python 3.8+. Replace with Tuple[...] and import Tuple.
Apply:
-from typing import Dict, List +from typing import Dict, List, Tuple @@ -def process_node_and_task() -> tuple[int, List[str], List[str]]: +def process_node_and_task() -> Tuple[int, List[str], List[str]]:
77-86
: Python 3.8 compatibility: fix return annotationSame issue as above: use Tuple[List[str], int].
Apply:
-def generate_urls(..., task_nodes_offset: int = 0) -> tuple[List[str], int]: +def generate_urls(..., task_nodes_offset: int = 0) -> Tuple[List[str], int]:Note: keep the actual parameters unchanged; only the return type needs Tuple[…]. Ensure Tuple is imported.
1-1
: Missing required NVIDIA copyright headerPer repo guidelines, add a current-year header at the top of .py files.
Apply:
+# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. + import argparseexamples/disaggregated/README.md (1)
130-131
: Typo in example YAML key: “refersh_interval” should be “refresh_interval”This will confuse users copying the snippet. Please fix the key name.
- ref ersh_interval: 10.0 + refresh_interval: 10.0tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
460-467
: Lowercase “nixl” will fail validation — use uppercase “NIXL”
CacheTransceiverConfig.backend
now accepts only uppercase literals. Update both occurrences.Apply this diff:
- "cache_transceiver_config": { - "backend": "nixl" - } + "cache_transceiver_config": { + "backend": "NIXL" + }Also applies to: 591-598
600-602
: Remove stray no-op linesThese lines have no effect and are likely leftovers; they should be deleted.
- ctx_server_config["cache_transceiver_config"] - ctx_server_config["cache_transceiver_config"] + # (removed no-op lines)tests/unittest/llmapi/test_llm_args.py (1)
1-1
: Add NVIDIA SPDX header (required by repo guidelines)This file appears to be missing the standard header. Please add:
# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
1-1
: Add NVIDIA SPDX header (required by repo guidelines)Please add the standard header at the top:
# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0
🧹 Nitpick comments (13)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1)
12-12
: Uppercasing to "MPI" matches the new Literal contract.Looks correct and aligned with the enforced uppercase backend values.
- Minor: quoting style is mixed across configs (some use quoted strings, others unquoted). Consider standardizing for consistency.
- Optional: If MPI is not exercised in CI or is discouraged in docs, consider adding a brief comment in this file or moving tests to UCX to avoid drift.
Also applies to: 20-20
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (1)
14-14
: DEFAULT uppercasing is consistent with llm_args validation.Change is correct and should pass the new Literal validation.
Standardize quoting across YAML (either consistently quoted or unquoted) for readability.
Also applies to: 24-24
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)
12-12
: DEFAULT uppercasing matches the new accepted values.Good alignment with the schema; no functional issues anticipated.
Consider normalizing quoting across files for consistency.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1)
12-12
: NIXL uppercasing is correct per the new Literal set.Change is correct and consistent with the updated parser expectations.
Quoting is fine, but consider consistent quoting style across the repo.
Also applies to: 20-20
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)
17-17
: DEFAULT uppercasing aligns with the new schema.Looks good and should parse cleanly.
Keep quoting style consistent across similar configs.
Also applies to: 26-26
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)
17-17
: DEFAULT uppercasing is correct.Update matches the enforced Literal values.
Consider a repo-wide lint or check to keep backend literals consistently styled (quoted vs unquoted).
Also applies to: 27-27
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)
13-13
: DEFAULT uppercasing is aligned with llm_args.No issues from a parsing or validation perspective.
Optional consistency cleanup: unify quoting convention across YAML configs.
Also applies to: 21-21
benchmarks/cpp/README.md (2)
339-349
: Add language to fenced code block for lintingmarkdownlint MD040 suggests specifying a language. For these shell snippets, annotate the opening fence with bash.
Example:
-``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 mpirun -n ${proc} benchmarks/disaggServerBenchmark ...--- `347-351`: **Add language to fenced code block here as well** Annotate the opening fence with bash to satisfy MD040. ```diff -``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 mpirun -n 7 benchmarks/disaggServerBenchmark ...
</blockquote></details> <details> <summary>examples/cpp/executor/README.md (1)</summary><blockquote> `127-134`: **Update to UCX env var looks correct; minor wording nit** The env var switch is good. Nit: add a colon after “Note” and set fenced language to bash. ```diff -Note setting `TRTLLM_USE_UCX_KVCACHE=1` is required to run disaggregated executor. +Note: setting `TRTLLM_USE_UCX_KVCACHE=1` is required to run disaggregated executor.
Also annotate code fences:
-``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 ...</blockquote></details> <details> <summary>tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)</summary><blockquote> `12-13`: **Quoting style inconsistency (optional)** This file uses "DEFAULT" (quoted) while others use DEFAULT (unquoted). Both are valid YAML strings, but consider standardizing for consistency across configs. Also applies to: 20-21 </blockquote></details> <details> <summary>tensorrt_llm/llmapi/llm_args.py (1)</summary><blockquote> `1042-1055`: **Make backend input case-insensitive and clarify default** Current typing enforces uppercase, which is fine given the BREAKING tag, but we can reduce friction by normalizing user input to uppercase before validation. Also, docs state default backend is UCX; here the default is `None`. If `from_string(None)` doesn’t resolve to UCX, that’s a mismatch. - Option A (preferred): accept case-insensitive input via a validator, keep default as-is if `from_string(None)` maps to UCX. - Option B: set default to `"UCX"` to precisely reflect docs. Apply Option A (case-insensitive normalization): ```diff class CacheTransceiverConfig(StrictBaseModel, PybindMirror): @@ - backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field( + backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field( default=None, description= "The communication backend type to use for the cache transceiver.") + + @field_validator("backend", mode="before") + @classmethod + def _normalize_backend(cls, v): + if v is None: + return v + vv = str(v).upper() + if vv not in ("DEFAULT", "UCX", "NIXL", "MPI"): + raise ValueError(f"Invalid cache_transceiver_config.backend: {v}") + return vv
Alternatively, Option B (explicit default):
- backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field( - default=None, + backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field( + default="UCX", description= "The communication backend type to use for the cache transceiver.")If you prefer Option A, please confirm that
_CacheTransceiverBackendType.from_string(None)
indeed defaults to UCX to keep behavior consistent with the docs.tests/integration/defs/accuracy/test_disaggregated_serving.py (1)
118-125
: Optional: avoid forcing UCX via env when backend is configuredYou always set
TRTLLM_USE_UCX_KVCACHE=1
regardless of the configured backend. Prefer relying on the explicitcache_transceiver_config.backend
to reduce environment coupling and prevent accidental mismatches in backend-specific tests.- env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1" + # Rely on cache_transceiver_config.backend; avoid forcing UCX via env vars env_ctx["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(ctx_total_gpus))) @@ - env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1" + # Rely on cache_transceiver_config.backend; avoid forcing UCX via env vars env_gen["CUDA_VISIBLE_DEVICES"] = ",".join( map(str, range(ctx_total_gpus, ctx_total_gpus + gen_total_gpus)))
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (48)
benchmarks/cpp/README.md
(1 hunks)docs/source/advanced/disaggregated-service.md
(0 hunks)examples/cpp/executor/README.md
(1 hunks)examples/disaggregated/README.md
(3 hunks)examples/disaggregated/disagg_config.yaml
(1 hunks)examples/disaggregated/slurm/gen_yaml.py
(2 hunks)tensorrt_llm/llmapi/llm_args.py
(1 hunks)tests/integration/defs/accuracy/test_disaggregated_serving.py
(10 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml
(2 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml
(1 hunks)tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml
(2 hunks)tests/integration/defs/disaggregated/test_disaggregated_etcd.py
(1 hunks)tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
(3 hunks)tests/unittest/llmapi/test_llm_args.py
(1 hunks)
💤 Files with no reviewable changes (1)
- docs/source/advanced/disaggregated-service.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/gen_yaml.py
tests/unittest/llmapi/test_llm_args.py
tests/integration/defs/disaggregated/test_disaggregated_etcd.py
tensorrt_llm/llmapi/llm_args.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/gen_yaml.py
tests/unittest/llmapi/test_llm_args.py
tests/integration/defs/disaggregated/test_disaggregated_etcd.py
tensorrt_llm/llmapi/llm_args.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py
667-667: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
675-675: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
🪛 markdownlint-cli2 (0.17.2)
benchmarks/cpp/README.md
346-346: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
examples/cpp/executor/README.md
129-129: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (39)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (1)
19-19
: All backend values are correctly uppercasedI ran a repository-wide search for any lowercase
backend:
entries in YAML, MD, and Python files and found none. ✅ Ready to merge.tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (1)
13-13
: No lowercasecache_transceiver_config.backend
entries remainThe only lowercase
backend: pytorch
occurs at the top level in
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml:5
,
not under anycache_transceiver_config
block. Allcache_transceiver_config.backend
values are already uppercase.tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1)
13-13
: Consistent normalization to uppercase backendDEFAULT aligns with the updated allowed literals. Looks good.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (1)
19-19
: Backend value normalized to DEFAULTMatches the new schema expectations and keeps both blocks consistent.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (1)
18-18
: Uppercase backend literal applied correctlyDEFAULT is the expected value; both context and generation sections are in sync.
Also applies to: 33-33
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (1)
21-21
: Alignment with uppercase backend contractDEFAULT usage is correct in both locations.
Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)
13-13
: DEFAULT backend literal adoptedChange is correct and consistent with the new validation logic.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1)
16-16
: DEFAULT backend literal adoptedGeneration servers’ cache_transceiver_config is updated correctly.
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)
11-11
: Uppercase backend literal applied consistentlyBoth sections correctly use DEFAULT per the new accepted values.
Also applies to: 19-19
examples/disaggregated/slurm/gen_yaml.py (2)
200-201
: Normalize backend to uppercase DEFAULT — aligned with parser expectationsMatches the new Literal["DEFAULT","UCX","NIXL","MPI"]. Good.
228-229
: Normalize backend to uppercase DEFAULT — aligned with parser expectationsConsistent with the new enforcement in llm_args.py. Good.
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (1)
17-17
: No lowercasebackend
values detected
The repository-wide search returned only uppercaseDEFAULT
,UCX
,MPI
, andNIXL
entries for allbackend
settings. All backends have been correctly normalized.tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (1)
19-19
: Backend normalized to DEFAULTBoth blocks updated; matches the enforced uppercase. Good.
Also applies to: 35-35
tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1)
13-13
: Backend normalized to DEFAULTUpdates are correct and consistent across context/generation blocks.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1)
13-13
: Backend normalized to DEFAULTConsistent with the new accepted literals. LGTM.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (1)
19-19
: Backend normalized to DEFAULTBoth instances updated as expected. Good.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1)
22-22
: Backend normalized to DEFAULTMatches the uppercase-only change; consistent across both sections.
Also applies to: 41-41
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)
19-19
: Backend normalized to DEFAULTBoth updated; aligned with llm_args constraints.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2)
19-21
: LGTM: backend normalized to DEFAULTCase normalization aligns with parser expectations. No other changes observed.
Also applies to: 34-36
19-21
: All backend values are uppercaseA case-sensitive grep across
tests/
,examples/
,docs/
, andtensorrt_llm/
returned no lowercasebackend:
values—everything (DEFAULT, UCX, MPI, NIXL) is correctly capitalized now. No further action needed.tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (1)
19-21
: LGTM: backend normalized to DEFAULTConsistent with new Literal set. No functional changes to other fields.
Also applies to: 37-39
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (1)
17-20
: LGTM: backend normalized to DEFAULTMatches the enforced uppercase literals. Disagg config semantics unchanged.
Also applies to: 28-29
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (1)
19-21
: LGTM: backend normalized to DEFAULTTwo occurrences updated; consistent with parser expectations.
Also applies to: 34-36
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1)
16-20
: LGTM: backend normalized to DEFAULTGeneration-only config updated accordingly. No other changes.
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)
12-15
: LGTM: backend normalized to "DEFAULT"Functionally correct.
Also applies to: 20-23
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1)
13-13
: Uppercase backend literal — LGTMUsing
DEFAULT
aligns with the updatedCacheTransceiverConfig.backend
validation.Also applies to: 21-21
examples/disaggregated/disagg_config.yaml (1)
14-14
: Backend normalization to "DEFAULT" — LGTMConsistent with the new uppercase-only
Literal
inllm_args.py
.Also applies to: 22-22
examples/disaggregated/README.md (1)
15-15
: Disaggregated KV cache docs updates — LGTM
- Backend options and examples now show uppercase tokens and UCX default.
- Split ctx/gen extra-llm-api YAMLs clarifies setup.
- Dynamic scaling section aligned with UCX usage.
Looks good.
Also applies to: 17-17, 19-40, 42-51, 112-118, 198-198
tests/integration/defs/accuracy/test_disaggregated_serving.py (1)
224-225
: Backend normalization to "DEFAULT" in test configs — LGTMMatches new uppercase-only validation in
llm_args.py
and keeps tests aligned with docs.Also applies to: 233-234, 268-270, 308-309, 316-317, 359-360, 373-374, 422-424, 497-498, 542-543, 549-550, 629-630, 636-637
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1)
12-12
: UCX backend normalization — LGTMUppercase
UCX
complies with the new validation inllm_args.py
.Also applies to: 20-20
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)
20-20
: DEFAULT backend normalization — LGTMConsistent and correct with the updated schema.
Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)
18-18
: DEFAULT backend normalization — LGTMValid YAML; matches the uppercase-only contract.
Also applies to: 33-33
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)
21-21
: Backend normalization verified across repositoryRan the suggested regex scan across all
.yaml
,.yml
, and.py
files and confirmed there are no lowercasebackend
values—every entry usesDEFAULT
,UCX
,MPI
, orNIXL
. Ready to merge.tests/unittest/llmapi/test_llm_args.py (1)
667-671
: UCX uppercasing matches the new Literal contractSwitching backend to "UCX" in both the valid and invalid-args tests is correct and consistent.
Also applies to: 675-676
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (1)
14-14
: Uppercase DEFAULT applied consistentlyBoth occurrences updated to DEFAULT. No structural changes — this is correct.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
134-134
: Switch to backend="DEFAULT" is correctAll three CacheTransceiverConfig constructions now match the uppercase Literal. Consistent with the API change.
Also applies to: 277-277, 380-380
tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)
247-249
: Generated config now uses backend: "DEFAULT" — goodMatches the new accepted values and keeps env-based UCX usage consistent.
Also applies to: 255-257
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (1)
14-14
: DEFAULT normalization — OKBoth cache_transceiver_config.backend entries are updated to DEFAULT.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (1)
24-24
: DEFAULT applied in both sectionsUppercase values align with the updated validation logic.
Also applies to: 38-38
export TRTLLM_USE_UCX_KVCACHE=1 | ||
mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \ | ||
--generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path} | ||
``` | ||
This command will launch m context engines and n generation engines. You need to ensure `proc` is equal to the sum of the number of processes required for each engine plus 1. Since we use orchestrator mode for `disaggServerBenchmark` we need an additional process as the orchestrator. For example, if there are two context engines (one is TP2_PP1,another is TP1_PP1) and two generation engines(one is TP2_PP1,another is TP1_PP1), then the `proc` value should be set to 7. | ||
for example: | ||
``` | ||
export TRTLLM_USE_MPI_KVCACHE=1 | ||
export TRTLLM_USE_UCX_KVCACHE=1 | ||
mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Switch to UCX env var looks correct
TRTLLM_USE_UCX_KVCACHE supersedes MPI variant in disaggServerBenchmark examples. Please ensure any internal tooling/docs refer to the new env var.
Quick scan for old variable:
🏁 Script executed:
#!/bin/bash
rg -n --no-heading -e 'TRTLLM_USE_MPI_KVCACHE'
Length of output: 1674
Ensure all references to the old MPI KV-cache env var are updated
I confirmed your switch to TRTLLM_USE_UCX_KVCACHE
in benchmarks/cpp/README.md
is correct, but there are still numerous references to the deprecated TRTLLM_USE_MPI_KVCACHE
elsewhere in the repo. Please update these to use the new UCX-based variable for consistency:
• tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py:44
• tests/integration/defs/triton_server/test.sh:1130, 1145
• tests/integration/defs/triton_server/test_triton_llm.py:3380
• tests/integration/defs/cpp/test_e2e.py:48
• tests/integration/defs/cpp/test_multi_gpu.py:35
• tests/integration/defs/disaggregated/test_disaggregated.py:706
• docker/common/install_mpi4py.sh:34, 47, 50
• examples/disaggregated/README.md:155, 198
• cpp/tensorrt_llm/common/envUtils.cpp:271
Let’s replace TRTLLM_USE_MPI_KVCACHE
with TRTLLM_USE_UCX_KVCACHE
in these locations (and any other internal tooling/docs) to complete the migration.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
346-346: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
config = CacheTransceiverConfig(backend="UCX", | ||
max_tokens_in_buffer=1024) | ||
assert config.backend == "ucx" | ||
assert config.backend == "UCX" | ||
assert config.max_tokens_in_buffer == 1024 | ||
|
||
# Arbitrary arguments should be rejected | ||
with pytest.raises( | ||
pydantic_core._pydantic_core.ValidationError) as exc_info: | ||
CacheTransceiverConfig(backend="ucx", invalid_config="should_fail") | ||
CacheTransceiverConfig(backend="UCX", invalid_config="should_fail") | ||
assert "invalid_config" in str(exc_info.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Fix Ruff F405: star import makes CacheTransceiverConfig ambiguous
Ruff flags these lines because CacheTransceiverConfig originates from a star import. Two options:
- Short-term minimal patch (keeps current code style):
- Add an explicit import alongside the star import to satisfy Ruff.
-from tensorrt_llm.llmapi.llm_args import *
+from tensorrt_llm.llmapi.llm_args import * # noqa: F403 (intentional)
+from tensorrt_llm.llmapi.llm_args import CacheTransceiverConfig # Explicit import resolves F405
- Recommended refactor (aligns with project guideline to maintain namespace):
- Replace the star import with a module import and qualify usages (larger change across this file).
-from tensorrt_llm.llmapi.llm_args import *
+import tensorrt_llm.llmapi.llm_args as llm_args
Then reference via llm_args.CacheTransceiverConfig, llm_args.CudaGraphConfig, etc.
If you want, I can follow up with a patch converting this file to fully-qualified imports.
You can confirm the F405 fix locally with:
🏁 Script executed:
#!/bin/bash
# If ruff is available:
ruff check tests/unittest/llmapi/test_llm_args.py || true
Length of output: 65602
Refactor star import to resolve Ruff F403/F405
Ruff reports 105 F403/F405 errors in tests/unittest/llmapi/test_llm_args.py because of the wildcard import. Rather than adding dozens of explicit imports, replace the star import with a module import and qualify every symbol usage:
- In tests/unittest/llmapi/test_llm_args.py, remove:
from tensorrt_llm.llmapi.llm_args import *
- Add:
import tensorrt_llm.llmapi.llm_args as llm_args
- Update all references, for example:
- config = CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024) + config = llm_args.CacheTransceiverConfig(backend="UCX", max_tokens_in_buffer=1024)
This single refactor will eliminate all F403/F405 warnings in the file.
🧰 Tools
🪛 Ruff (0.12.2)
667-667: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
675-675: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In tests/unittest/llmapi/test_llm_args.py around lines 667 to 676, the file uses
a wildcard import from tensorrt_llm.llmapi.llm_args causing many Ruff F403/F405
warnings; remove the star import and replace it with a module import (import
tensorrt_llm.llmapi.llm_args as llm_args), then update every symbol reference in
the file to be qualified with the llm_args. prefix (e.g.,
llm_args.CacheTransceiverConfig) so all names are explicitly referenced and the
F403/F405 errors are resolved.
PR_Github #14754 [ run ] completed with state |
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (6)
tests/integration/defs/accuracy/test_disaggregated_serving.py (6)
512-514
: Lowercase "nixl" will break with new uppercase-only backend literals.The backend is now validated against uppercase literals ["DEFAULT", "UCX", "NIXL", "MPI"]. Using "nixl" will fail argument parsing/validation.
Apply this diff to fix:
ctx_server_config = { "disable_overlap_scheduler": True, "cache_transceiver_config": { - "backend": "nixl" + "backend": "NIXL" } } gen_server_config = { "disable_overlap_scheduler": True, "cache_transceiver_config": { - "backend": "nixl" + "backend": "NIXL" } }Also applies to: 516-520
644-646
: Same issue: lowercase "nixl" should be "NIXL".Mirror the uppercase normalization here as well.
Apply this diff:
ctx_server_config = { "disable_overlap_scheduler": True, "cache_transceiver_config": { - "backend": "nixl" + "backend": "NIXL" } } gen_server_config = { "disable_overlap_scheduler": True, "cache_transceiver_config": { - "backend": "nixl" + "backend": "NIXL" } }Also applies to: 648-652
653-654
: Redundant no-op lines; remove to avoid confusion.These two lines evaluate the dict key but do nothing, and should be deleted.
Apply this diff:
- ctx_server_config["cache_transceiver_config"] - ctx_server_config["cache_transceiver_config"]
53-53
: Type annotation uses PEP 585 builtins; breaks on Python 3.8.Per coding guidelines, target Python 3.8+. The annotation
list[concurrent.futures.Future[RequestOutput]]
requires 3.9+. Use typing.List and avoid parameterizing concurrent.futures.Future.Apply this diff:
- self.futures: list[concurrent.futures.Future[RequestOutput]] = [] + self.futures: List[concurrent.futures.Future] = []
61-64
: ThreadPoolExecutor.shutdown(cancel_futures=...) is not available in Python 3.8.Passing cancel_futures causes a TypeError on 3.8. Since you already cancel futures above, call shutdown without that parameter to retain 3.8 compatibility.
Apply this diff:
- self.shutdown(wait=True, cancel_futures=True) + self.shutdown(wait=True)
1-4
: Missing NVIDIA copyright header.Per coding guidelines, prepend the NVIDIA copyright header (current year) to all source files.
Apply this diff:
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. + # I want to create accuracy tests for disaggregated serving. # I need to to this by creating a new class that mimics LLM class. Instead of implementing the # actual methods it will send OAI requests to the disaggregated serving endpoint. # Please take a look at the existing test_llm_api_pytorch.py file for reference.
🧹 Nitpick comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
127-133
: Consider gating TRTLLM_USE_UCX_KVCACHE by selected backend.You're forcing UCX KVCache for all tests, including NIXL. If NIXL doesn't need or conflicts with this, prefer conditional enabling to avoid unintended behavior.
Apply this diff to set the env var only when using UCX-like backends:
- env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1" + if ctx_server_config.get("cache_transceiver_config", {}).get("backend", "DEFAULT") in {"DEFAULT", "UCX"}: + env_ctx["TRTLLM_USE_UCX_KVCACHE"] = "1" ... - env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1" + if gen_server_config.get("cache_transceiver_config", {}).get("backend", "DEFAULT") in {"DEFAULT", "UCX"}: + env_gen["TRTLLM_USE_UCX_KVCACHE"] = "1"Also applies to: 148-154
197-201
: Health check loop can hang; add network timeout.requests.get without timeout can block. Use a small timeout to fail fast and keep the loop responsive.
Apply this diff:
- response = requests.get("http://localhost:8000/health") + response = requests.get("http://localhost:8000/health", timeout=2)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these settings in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py
(10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (8)
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)
263-264
: Uppercasing backend to DEFAULT is correct and aligned with new literals.Using "DEFAULT" here matches the updated CacheTransceiverConfig backend literals and prevents validation errors at runtime.
Please ensure any other disaggregated tests/configs still using lower-case "default" are updated in this repo (including YAMLs and docs). If you want, I can scan and patch remaining occurrences.
Also applies to: 272-274
311-312
: Good: tests now explicitly use uppercase DEFAULT backend.This avoids failures with the new strict Literal typing.
351-352
: Good: ngram test configs updated to DEFAULT.Consistent with repository-wide backend normalization.
Also applies to: 359-360
403-404
: Good: eagle3 test configs updated to DEFAULT.No issues spotted with these changes.
Also applies to: 417-418
474-475
: Good: Llama4Scout test configs updated to DEFAULT.Matches the new accepted backend values.
549-550
: Good: DeepSeek auto_dtype configs use DEFAULT.Consistent with new Literal enforcement.
595-596
: Good: Gemma test configs updated to DEFAULT.No further comments here.
Also applies to: 602-603
683-684
: Good: Qwen auto_dtype configs updated to DEFAULT.Consistent and correct.
Also applies to: 690-691
PR_Github #15283 [ run ] triggered by Bot |
PR_Github #15283 [ run ] completed with state |
/bot run |
PR_Github #15376 [ run ] triggered by Bot |
PR_Github #15376 [ run ] completed with state |
/bot run |
PR_Github #15428 [ run ] triggered by Bot |
PR_Github #15428 [ run ] completed with state |
/bot run |
PR_Github #15687 [ run ] triggered by Bot |
PR_Github #15687 [ run ] completed with state |
/bot run |
PR_Github #15923 [ run ] triggered by Bot |
PR_Github #15923 [ run ] completed with state |
/bot run |
PR_Github #16099 [ run ] triggered by Bot |
/bot run |
PR_Github #16244 [ run ] triggered by Bot |
Signed-off-by: Shixiaowei02 <[email protected]>
PR_Github #16244 [ run ] completed with state |
664ce9f
to
3823bee
Compare
/bot run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
benchmarks/cpp/README.md (2)
347-351
: Mirror the same fixes in the second snippet (bash fence + ./ path)Same notes as above: declare the code fence language and keep the executable path style consistent.
Apply:
-``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 -mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path} +mpirun -n 7 ./benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}--- `339-349`: **Repo-wide clean-up: replace deprecated MPI KV-cache env var everywhere** This file is updated, but there are still many references to TRTLLM_USE_MPI_KVCACHE across the repo. Please complete the migration to TRTLLM_USE_UCX_KVCACHE in docs, scripts, tests, and helper tooling. Run to locate remaining occurrences: ```shell #!/bin/bash # Find deprecated and new KV-cache env vars (exclude typical build artifacts). rg -n --no-heading -g '!**/build/**' -g '!**/dist/**' -g '!**/.venv/**' -e '\bTRTLLM_USE_MPI_KVCACHE\b' -e '\bTRTLLM_USE_UCX_KVCACHE\b'
🧹 Nitpick comments (2)
benchmarks/cpp/README.md (2)
339-342
: UCX env var switch looks good; add bash fence + path consistencyThe move to TRTLLM_USE_UCX_KVCACHE is correct for disaggServerBenchmark. Please also label the fenced block as bash (MD040) and keep path style consistent with prior examples that use ./benchmarks/... for executables.
Apply:
-``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 -mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \ +mpirun -n ${proc} ./benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \ --generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}--- `339-342`: **Avoid misleading placeholder syntax in shell snippets** `${context_engine_{m-1}}` and `${generation_engine_{n-1}}` are not valid bash variable names. Consider using neutral placeholders or brace expansion examples to avoid copy/paste errors, e.g., `ctx_dir_{0..m-1}` and `gen_dir_{0..n-1}`, or angle-bracketed tokens like `<ctx_dir_0>,...,<ctx_dir_{m-1}>`. </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Path: .coderabbit.yaml **Review profile**: CHILL **Plan**: Pro **💡 Knowledge Base configuration:** - MCP integration is disabled by default for public repositories - Jira integration is disabled by default for public repositories - Linear integration is disabled by default for public repositories You can enable these sources in your CodeRabbit configuration. <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 044700fee7d836160c1083d86758cc8351436f83 and 3823bee58b3045444903185c8f8ae341f14033a8. </details> <details> <summary>📒 Files selected for processing (1)</summary> * `benchmarks/cpp/README.md` (1 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🪛 markdownlint-cli2 (0.17.2)</summary> <details> <summary>benchmarks/cpp/README.md</summary> 346-346: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> </details> <details> <summary>⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)</summary> * GitHub Check: Pre-commit Check </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
PR_Github #16256 [ run ] triggered by Bot |
Do we need to apply similar changes on 1.0 release? @Shixiaowei02 |
PR_Github #16256 [ run ] completed with state |
Signed-off-by: Shixiaowei02 <[email protected]>
/bot run --only-multi-gpu-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integration/defs/disaggregated/test_disaggregated.py (1)
1231-1265
: Optional: type-restrict and validate the backend parameter to fail fast on typosLocking the accepted values here will surface mistakes early in CI and keeps tests self-documenting with the new uppercase contract.
Apply this diff to restrict and validate the parameter:
-def get_config_for_benchmark(model_root, backend): +def get_config_for_benchmark(model_root: str, + backend: "Literal['DEFAULT','UCX','NIXL','MPI']"): + allowed = {"DEFAULT", "UCX", "NIXL", "MPI"} + if backend not in allowed: + raise ValueError(f"Invalid cache_transceiver backend: {backend}. " + f"Expected one of {sorted(allowed)}")Add the missing import near the top of the file:
from typing import Literal
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_bs1.yaml
(2 hunks)tests/integration/defs/disaggregated/test_disaggregated.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_bs1.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/integration/defs/disaggregated/test_disaggregated.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
PR_Github #16298 [ run ] triggered by Bot |
PR_Github #16298 [ run ] completed with state |
Signed-off-by: Shixiaowei02 <[email protected]>
/bot run --only-multi-gpu-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
tests/integration/defs/accuracy/test_disaggregated_serving.py (4)
1-1
: Heads-up: missing NVIDIA copyright header (guideline).Coding guidelines require the NVIDIA copyright header at the top of all Python sources. Consider adding it in a follow-up if this file doesn’t already include it via a repo-wide mechanism.
516-518
: NIXL tests: verify interaction with unconditional UCX env.Inside launch_disaggregated_llm, we always export TRTLLM_USE_UCX_KVCACHE=1 (Line 129 and Line 150) regardless of the chosen backend. When backend="NIXL", that env var is likely ignored, but it could be confusing or lead to unexpected code paths if the server code checks it early.
- Ask: does setting TRTLLM_USE_UCX_KVCACHE=1 have any side effects when backend="NIXL"? If yes, consider scoping the env export to UCX/DEFAULT-only runs.
If needed, I can propose a minimal refactor to make the env opt-in via a function parameter (defaulting to UCX/DEFAULT) and disable it for NIXL tests.
Also applies to: 522-524
651-653
: Qwen3 NIXL test: double-check UCX env vs. NIXL backend.Same point as DeepSeek’s NIXL test: launch_disaggregated_llm always sets TRTLLM_USE_UCX_KVCACHE=1 (Lines 129, 150). Please confirm it’s benign under NIXL. If not, we should gate that env by backend.
Also applies to: 657-659
312-313
: Reduce duplication with a shared constant for default transceiver config (optional)."DEFAULT" appears many times across this file. A tiny constant reduces drift and edit churn:
+ # Near imports/top of file + CACHE_TRANSCEIVER_DEFAULT = {"backend": "DEFAULT"}Then, for call sites (example below):
-ctx_server_config["cache_transceiver_config"] = {"backend": "DEFAULT"} -gen_server_config["cache_transceiver_config"] = {"backend": "DEFAULT"} +ctx_server_config["cache_transceiver_config"] = CACHE_TRANSCEIVER_DEFAULT +gen_server_config["cache_transceiver_config"] = CACHE_TRANSCEIVER_DEFAULTRepeat similarly where appropriate.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
tests/integration/defs/accuracy/test_disaggregated_serving.py
(12 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py
: Python code must target Python 3.8+
Python indentation: 4 spaces, no tabs
Maintain module namespace in imports (from package.subpackage import foo; then use foo.SomeClass())
Python file names use snake_case
Python class names use PascalCase
Python functions/methods and local variables use snake_case; variables starting with a number get k_ prefix (e.g., k_99th_percentile)
Global variables use G_ prefixed UPPER_SNAKE_CASE (e.g., G_MY_GLOBAL)
Constants use UPPER_SNAKE_CASE in Python
Avoid shadowing variables from outer scopes in Python
Initialize all externally visible members of a Python class in init
Prefer docstrings for interfaces used outside a file; comments for local code
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes/variables inline with short docstrings
Avoid reflection when simple alternatives exist (e.g., prefer explicit parameters over dict(**locals()))
In try/except, catch the narrowest exceptions possible
For duck-typing with try/except, keep try body minimal and put logic in else
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/integration/defs/accuracy/test_disaggregated_serving.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (9)
tests/integration/defs/accuracy/test_disaggregated_serving.py (9)
263-265
: Uppercasing backend literals aligns tests with the new LlmArgs schema.Switching to "DEFAULT" here is correct given CacheTransceiverConfig.backend now accepts uppercase-only literals. No logic changes; just prevents schema/validation failures.
Also applies to: 272-274
312-313
: Consistent with breaking change: backend → "DEFAULT".These updates keep the accuracy tests in sync with the uppercase-only backend enum. Good catch.
354-356
: ngram test updated to uppercase backend.Looks good and matches the Literal["DEFAULT","UCX","NIXL","MPI"] change.
Also applies to: 362-364
407-409
: eagle3 config uses uppercase backend.Change is correct; avoids config validation errors at server start.
Also applies to: 421-423
475-476
: Scout model: uppercase backend is correct.Keeps this test coherent with the schema change and the docs shift away from lowercase identifiers.
553-555
: DeepSeek-V3-Lite auto-dtype: backend uppercasing is correct.Matches the new allowed literals and prevents schema mismatches.
600-602
: Gemma3 test: backend uppercasing is correct.No functional change; aligns with the enum and doc normalization.
Also applies to: 607-609
689-691
: Qwen3 auto-dtype: uppercase backend is correct.Consistent with the schema change; nothing else to do.
Also applies to: 696-698
263-265
: No lingering lowercasebackend
values detected. The repository-wide grep only surfaced uppercase"DEFAULT"
usages and no lowercase occurrences ofdefault
,ucx
,nixl
, ormpi
. All existing assignments are already uppercase, so no further changes are needed.
PR_Github #16348 [ run ] triggered by Bot |
PR_Github #16348 [ run ] completed with state |
#7191 Yes, it has been merged. |
Summary by CodeRabbit
Documentation
Refactor
Tests