[TRTLLM-7030][fix] Refactor the example doc of dist-serving #6766
base: main
Conversation
Signed-off-by: ShiXiaowei02 <[email protected]>
Signed-off-by: Shixiaowei02 <[email protected]>
📝 Walkthrough

The change standardizes cache transceiver backend strings to uppercase literals (DEFAULT/UCX/NIXL/MPI) across code, tests, and configs, and updates docs to use TRTLLM_USE_UCX_KVCACHE instead of TRTLLM_USE_MPI_KVCACHE. It also restructures disaggregated examples to use per-role YAML configs and adds rank allocation guidance.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
/bot run
Actionable comments posted: 15
🔭 Outside diff range comments (4)
examples/disaggregated/slurm/benchmark/gen_yaml.py (1)
1-1: examples/disaggregated/slurm/benchmark/gen_yaml.py – Add NVIDIA SPDX header

Per our Python file conventions, every non-test .py file must start with the NVIDIA SPDX copyright header including the current year (2025). Please insert the following lines at the very top of examples/disaggregated/slurm/benchmark/gen_yaml.py, before any imports:

```diff
+ # SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
  import argparse
```

This matches the canonical header used throughout the repo and satisfies compliance.
tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)
1-2: Update copyright year to include 2025.

Per project guidelines, headers should include the current year for .py files.

```diff
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

tensorrt_llm/llmapi/llm_args.py (1)
1050-1054: Normalize backend to uppercase for backward compatibility

The Literal now enforces uppercase inputs. Existing external configs using "default"/"ucx" (lowercase) will fail validation. Add a pre-validation normalizer to accept lowercase and coerce to uppercase, keeping the codebase strict and user-facing BC:

```diff
 @PybindMirror.mirror_pybind_fields(_CacheTransceiverConfig)
 class CacheTransceiverConfig(StrictBaseModel, PybindMirror):
     backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
         default=None,
         description=
         "The communication backend type to use for the cache transceiver.")

     def _to_pybind(self):
         return _CacheTransceiverConfig(
             backend=_CacheTransceiverBackendType.from_string(self.backend),
             max_tokens_in_buffer=self.max_tokens_in_buffer)
+
+    @field_validator('backend', mode='before')
+    @classmethod
+    def normalize_backend(cls, v):
+        if v is None:
+            return v
+        if isinstance(v, str):
+            v = v.upper()
+        return v
```

This preserves compatibility with legacy lowercase configs while enforcing uppercase internally.
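For illustration, the coerce-then-validate behavior proposed above can be sketched as a standalone helper; `normalize_backend` and `ALLOWED_BACKENDS` are hypothetical names for this sketch, not existing repo code:

```python
# Hypothetical helper mirroring the suggested 'before' validator: accept legacy
# lowercase backend strings, coerce them to uppercase, and reject anything
# outside the allowed literal set. Names here are illustrative only.
ALLOWED_BACKENDS = {"DEFAULT", "UCX", "NIXL", "MPI"}


def normalize_backend(value):
    """Return the uppercase backend literal, passing None through unchanged."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.upper()
    if value not in ALLOWED_BACKENDS:
        raise ValueError(f"invalid cache transceiver backend: {value!r}")
    return value


# Legacy lowercase input is accepted and normalized; None stays None.
assert normalize_backend("ucx") == "UCX"
assert normalize_backend(None) is None
```

Running this coercion in a `mode='before'` validator keeps the field's Literal type strict while old configs keep loading.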
examples/disaggregated/README.md (1)
232-236: Typo in metadata config: refersh_interval → refresh_interval

The YAML key is misspelled; users copying this will get a non-working config. Apply this diff:

```diff
- refersh_interval: 10.0
+ refresh_interval: 10.0
```
🧹 Nitpick comments (26)
tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1)
16-16: Keep quoting style consistent across configs

Uppercasing looks good. For consistency with other files that quote this literal, consider quoting DEFAULT.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1)
17-17: Quote the backend literal for consistency

Functionally equivalent, but quoting matches other config files and avoids mixed styles.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (1)
22-22: Unify quoting within the same file

Line 22 is unquoted while line 41 is quoted. Pick one style; recommending quoted to match other configs.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 41-41
tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1)
13-13: Quote backend literals for consistency across test configs

Both entries are correct; quoting them keeps style uniform with other files.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (1)
18-18: Quote backend string for consistency

YAML accepts bare words, but quoting string literals avoids ambiguity and keeps style consistent with other example configs. Apply this diff:

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 33-33
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)
13-13: Quote backend string for consistency

For consistency with other configs and to avoid YAML ambiguity, quote the string literal. Apply this diff:

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)
11-11: Quote backend string for consistency

Recommend quoting to match style used in example configs. Apply this diff:

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 19-19
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)
17-17: Quote literal for YAML consistency.

To match other string fields (e.g., backend: "pytorch") and avoid accidental keyword interpretation by tooling, quote the value.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 26-26
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)
21-21: Prefer quoting string literals.

Minor consistency tweak with other quoted strings in the file.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)
13-13: Nit: quote the string for clarity.

Keeps style consistent and avoids any YAML/tooling edge-cases.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)
19-19: Nit: quote the backend value.

Stylistic consistency with other quoted strings.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)
17-17: Nit: add quotes around string literal.

For consistency with the rest of the YAML config strings.

```diff
- backend: DEFAULT
+ backend: "DEFAULT"
```

Also applies to: 27-27
tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1)
13-13: Uppercase 'backend' values confirmed; ensure parser normalization

All occurrences of backend: default/ucx/mpi/nixl have been updated to uppercase (DEFAULT, UCX, etc.), and a repository-wide scan found no remaining lowercase entries.

Next steps:
- Verify that the configuration loader (e.g., in your YAML parsing code) normalizes incoming backend values to uppercase at parse-time to maintain backward compatibility for configs that may still use lowercase.
- No changes to the existing YAML files are required.
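The parse-time normalization suggested above could look roughly like the following sketch; `normalize_config_backends` is a hypothetical helper operating on an already-parsed config dict, and the `cache_transceiver_config` key layout is assumed from the example configs in this PR:

```python
# Hypothetical parse-time normalizer: after loading a disaggregated config into
# a dict, uppercase any cache_transceiver_config.backend value so legacy
# lowercase configs keep working against the stricter uppercase literals.
def normalize_config_backends(config):
    """Recursively uppercase 'backend' values under cache_transceiver_config keys."""
    if isinstance(config, dict):
        for key, value in config.items():
            if key == "cache_transceiver_config" and isinstance(value, dict):
                backend = value.get("backend")
                if isinstance(backend, str):
                    value["backend"] = backend.upper()
            else:
                normalize_config_backends(value)
    elif isinstance(config, list):
        for item in config:
            normalize_config_backends(item)
    return config


cfg = {
    "context_servers": {"cache_transceiver_config": {"backend": "ucx"}},
    "generation_servers": {"cache_transceiver_config": {"backend": "DEFAULT"}},
}
normalize_config_backends(cfg)
# cfg["context_servers"]["cache_transceiver_config"]["backend"] is now "UCX"
```

Calling this once on the dict returned by the YAML loader keeps the YAML files themselves untouched while absorbing lowercase values.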
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)
12-12: Nit: unify YAML quoting style for string scalars

Here DEFAULT is unquoted, while other configs in this PR quote "DEFAULT". Consider standardizing (prefer quoting) for readability and to avoid YAML edge-cases.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)
20-20: Nit: consistent quoting across configs

These lines use quotes, while some other configs in this PR don't. Consider consistently quoting string scalars repo-wide.
Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)
19-19: Nit: align YAML scalar quoting style

Recommend choosing a single style (quoted vs unquoted) for string values across all disaggregated configs.
Also applies to: 35-35
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)
12-12: Nit: standardize quoting

Unify quoting of string scalars across the PR for readability and consistency.
Also applies to: 21-21
benchmarks/cpp/README.md (1)
339-342: Add language to fenced code blocks (bash) for readability and consistency.

Use bash on the opening triple backticks enclosing the export/mpirun snippets so tooling renders it properly. Example:

````diff
-```
+```bash
 export TRTLLM_USE_UCX_KVCACHE=1
 mpirun -n ${proc} benchmarks/disaggServerBenchmark ...
 ```
````

Also applies to: 347-351
examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)
4-6: Optional: quote backend value for style consistency across configs.

Some configs use quoted strings (e.g., "DEFAULT"). Consider:

```diff
- backend: UCX
+ backend: "UCX"
```

examples/cpp/executor/README.md (1)
129-134
: Specify language on fenced block to satisfy markdownlint (MD040).Mark this code block as bash.
-``` +```bash export TRTLLM_USE_UCX_KVCACHE=1 mpirun -n <num_ranks> --allow-run-as-root --oversubscribe ./executorExampleDisaggregated ... -``` +```tests/unittest/llmapi/test_llm_args.py (2)
669-672: LGTM on "UCX"; add a small fix for Ruff F405

The value change is correct. Ruff warns F405 here due to star import; add an inline noqa to keep the test concise:

```diff
-    config = CacheTransceiverConfig(backend="UCX",
+    config = CacheTransceiverConfig(backend="UCX",  # noqa: F405
                                     max_tokens_in_buffer=1024)
```

Alternatively, explicitly import the class to satisfy both Ruff and the project's "maintain namespace" guideline:

```python
import tensorrt_llm.llmapi.llm_args as llm_args

config = llm_args.CacheTransceiverConfig(backend="UCX",
                                         max_tokens_in_buffer=1024)
```
677-677: Mirror the F405 fix on the negative test

Add noqa here as well:

```diff
-        CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+        CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")  # noqa: F405
```

If you adopt namespaced imports as above, switch to llm_args.CacheTransceiverConfig to avoid noqa.
tensorrt_llm/serve/openai_disagg_server.py (1)
207-209: Good defensive check; consider tightening validation

The None check is useful. If protocol guarantees ctx_request_id is a positive integer, also validate type/range:
- isinstance(ctx_request_id, int)
- ctx_request_id > 0
Optionally return a 4xx HTTP error upstream instead of generic ValueError, depending on error handling policy.
I can add a focused unit/integration test that asserts a missing/invalid ctx_request_id raises the expected error. Want a PR follow-up?
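The tightened check described above could be sketched as follows; `validate_ctx_request_id` is a hypothetical helper name, not the server's actual code:

```python
# Hypothetical validation helper for the suggestion above: require that
# ctx_request_id is present, an int (not bool), and positive.
def validate_ctx_request_id(ctx_request_id):
    """Raise ValueError unless ctx_request_id is a positive integer."""
    if ctx_request_id is None:
        raise ValueError("ctx_request_id is missing from the context response")
    # bool is a subclass of int in Python; exclude it explicitly.
    if isinstance(ctx_request_id, bool) or not isinstance(ctx_request_id, int):
        raise ValueError(
            f"ctx_request_id must be an int, got {type(ctx_request_id).__name__}")
    if ctx_request_id <= 0:
        raise ValueError(f"ctx_request_id must be positive, got {ctx_request_id}")
    return ctx_request_id
```

Whether this raises a ValueError or maps to a 4xx HTTP response upstream depends on the server's error handling policy, as noted above.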
examples/disaggregated/README.md (3)
113-115: Add a language to the fenced code block (markdownlint MD040)

Specify the language for the code block. Apply this diff:

````diff
-```
+```bash
 python3 ./clients/disagg_client.py -c disagg_config.yaml -p ./clients/prompts.json -e chat
````

171-199: SLURM srun example: ensure concurrency and parameter consistency

Consider mirroring the launch.slurm fixes here:
- Background long-running srun services for context and generation; keep proxy in foreground.
- Ensure the generation tp_size in the command matches the comment (4).

Would you like me to propose a full revised snippet aligned with the launch.slurm pattern?

245-250: Minor: log filename reuse

The example for adding another generation server writes to log_gen_0 again. Consider log_gen_1 to avoid clobbering.

```diff
- --metadata_server_config_file ./metadata_config.yaml &> log_gen_0 &
+ --metadata_server_config_file ./metadata_config.yaml &> log_gen_1 &
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (55)
- benchmarks/cpp/README.md (1 hunks)
- docs/source/advanced/disaggregated-service.md (0 hunks)
- docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md (1 hunks)
- examples/cpp/executor/README.md (1 hunks)
- examples/disaggregated/README.md (5 hunks)
- examples/disaggregated/disagg_config.yaml (1 hunks)
- examples/disaggregated/slurm/benchmark/gen_yaml.py (2 hunks)
- examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1 hunks)
- examples/disaggregated/slurm/simple_example/disagg_config.yaml (1 hunks)
- examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1 hunks)
- examples/disaggregated/slurm/simple_example/launch.slurm (1 hunks)
- examples/wide_ep/README.md (1 hunks)
- examples/wide_ep/slurm_scripts/README.md (2 hunks)
- examples/wide_ep/slurm_scripts/submit.sh (1 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
- tensorrt_llm/llmapi/llm_args.py (1 hunks)
- tensorrt_llm/serve/openai_disagg_server.py (1 hunks)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
- tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
- tests/unittest/llmapi/test_llm_args.py (1 hunks)
💤 Files with no reviewable changes (1)
- docs/source/advanced/disaggregated-service.md
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/benchmark/gen_yaml.py
tests/integration/defs/disaggregated/test_disaggregated_etcd.py
tensorrt_llm/serve/openai_disagg_server.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/unittest/llmapi/test_llm_args.py
tensorrt_llm/llmapi/llm_args.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/benchmark/gen_yaml.py
tests/integration/defs/disaggregated/test_disaggregated_etcd.py
tensorrt_llm/serve/openai_disagg_server.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/unittest/llmapi/test_llm_args.py
tensorrt_llm/llmapi/llm_args.py
🧠 Learnings (6)
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
examples/disaggregated/slurm/simple_example/launch.slurm
tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
examples/cpp/executor/README.md
tensorrt_llm/llmapi/llm_args.py
examples/disaggregated/README.md
📚 Learning: 2025-07-22T09:22:14.726Z
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
benchmarks/cpp/README.md
tensorrt_llm/_torch/pyexecutor/py_executor.py
examples/disaggregated/slurm/simple_example/launch.slurm
examples/disaggregated/README.md
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
benchmarks/cpp/README.md
tensorrt_llm/_torch/pyexecutor/py_executor.py
examples/disaggregated/slurm/simple_example/launch.slurm
tensorrt_llm/llmapi/llm_args.py
examples/disaggregated/README.md
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
tensorrt_llm/llmapi/llm_args.py
📚 Learning: 2025-08-09T02:04:49.580Z
Learnt from: Fridah-nv
PR: NVIDIA/TensorRT-LLM#6760
File: tensorrt_llm/_torch/auto_deploy/models/quant_config_reader.py:81-98
Timestamp: 2025-08-09T02:04:49.580Z
Learning: In TensorRT-LLM's auto_deploy module, torch.dtype values in configuration dictionaries must be stored as string representations (e.g., "float16" instead of torch.float16) because OmegaConf.merge does not support torch.dtype types. These string representations are converted to actual torch.dtype objects in downstream code.
Applied to files:
tensorrt_llm/llmapi/llm_args.py
🪛 markdownlint-cli2 (0.17.2)
examples/cpp/executor/README.md
129-129: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
examples/disaggregated/README.md
28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
99-99: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
113-113: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
131-131: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
148-148: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
163-163: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
174-174: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
191-191: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py
669-669: CacheTransceiverConfig may be undefined, or defined from star imports (F405)

677-677: CacheTransceiverConfig may be undefined, or defined from star imports (F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (46)
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2)
18-18: LGTM on backend standardization

Using uppercase backend literals aligns with the updated type constraints and improves consistency across configs and tests.
Also applies to: 33-33
18-18: Standardize backend literals to uppercase across all test configurations

The search uncovered remaining lowercase backend values in several integration test YAML files. Please update these entries to their uppercase equivalents:

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml
• Line 12: backend: "ucx" → backend: "UCX"
• Line 20: backend: "ucx" → backend: "UCX"

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml
• Line 12: backend: "nixl" → backend: "NIXL"
• Line 20: backend: "nixl" → backend: "NIXL"

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml
• Line 12: backend: "mpi" → backend: "MPI"
• Line 20: backend: "mpi" → backend: "MPI"

Example diff:

```diff
- backend: "ucx"
+ backend: "UCX"
```

After updating these, re-run the lowercase-literal search to confirm no lowercase backends remain.
Likely an incorrect or invalid review comment.
examples/disaggregated/disagg_config.yaml (1)
14-14: LGTM on backend standardization

Already quoted; aligns with the new uppercase literal set and keeps YAML style consistent.
Also applies to: 22-22
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1)
13-13: LGTM on backend standardization

Uppercase value matches the updated type constraints used by CacheTransceiverConfig.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1)
11-11: LGTM on backend standardization

Uppercase matches the new accepted literals; no other changes needed.
Also applies to: 19-19
examples/disaggregated/slurm/benchmark/gen_yaml.py (2)
200-201: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.
228-229: Uppercasing backend to "DEFAULT" — consistent with new standard.

Matches the uppercase literal convention introduced across the codebase. No issues.
examples/disaggregated/slurm/simple_example/gen_extra-llm-api-config.yaml (1)
1-3: UCX backend and buffer size look good.

Config aligns with the uppercase backend convention and adds a reasonable buffer size. LGTM.
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2)
24-24: Backend literal standardized to "DEFAULT" — OK.

38-38: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2)
17-17: Backend literal standardized to "DEFAULT" — OK.

28-28: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2)
18-18: Backend literal standardized to "DEFAULT" — OK.

33-33: Backend literal standardized to "DEFAULT" — OK.

tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2)
14-14: Backend literal standardized to DEFAULT — looks correct

Matches the updated uppercase literals and is consistent across context and generation server configs.
Also applies to: 24-24
14-14: No lowercase "default" remains in backend assignments

Verified via a case-insensitive search across all YAML and code files: every backend: entry uses uppercase DEFAULT. No mixed-case or lowercase default usages were found.
19-19: Consistent capitalization of backend (DEFAULT) — OK

Aligned with updated type validation. No other changes required here.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (1)
14-14: Backend set to DEFAULT for both roles — OK

Matches the new allowed literals and stays consistent across sections.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (1)
17-17: Updated backend to DEFAULT — looks good

Consistent with the standardized uppercase values and surrounding configs.
Also applies to: 26-26
tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1)
13-13: DEFAULT backend capitalization applied — good

Change is minimal and consistent across context and generation servers.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (1)
17-17: Uppercase backend value standardization LGTM.

Aligns with the updated literals in code/tests. No functional concerns.
Also applies to: 26-26
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)
21-21: Uppercase backend value standardization LGTM.

Consistent with updated accepted literals; test config looks correct.
Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1)
13-13: Standardization to uppercase DEFAULT is correct.

Matches the new allowed values; no issues spotted.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (1)
19-19
: LGTM on backend casing change. Consistent with the updated enum-like literals across configs.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (1)
17-17
: Backend normalization to DEFAULT looks good. No functional issues with the change.
Also applies to: 27-27
tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (1)
19-19
: LGTM: standardized backend literal to DEFAULT. Matches the new uppercase convention and remains valid YAML.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (1)
19-19
: LGTM: backend literal set to uppercase DEFAULT. Consistent with updated type annotations and other configs.
Also applies to: 34-34
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1)
13-13
: LGTM: uppercase DEFAULT applied. Change is correct and consistent with the broader refactor.
Also applies to: 21-21
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (1)
13-13
: LGTM: backend literal standardized to DEFAULT. Aligned with the new accepted values; no issues spotted.
Also applies to: 25-25
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (1)
12-12
: Backend literal normalized to DEFAULT — looks correct. Matches the uppercase literal convention enforced in code. No functional concerns.
Also applies to: 23-23
tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (1)
20-20
: Backend literal standardized to "DEFAULT". Aligned with new accepted uppercase literals. Good update.
Also applies to: 36-36
tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (1)
19-19
: Use of "DEFAULT" backend matches enforced literals. Change is correct and consistent with the API expectations.
Also applies to: 35-35
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)
12-12
: "DEFAULT" backend value normalization LGTM. Consistent with the new literal set and validation.
Also applies to: 21-21
benchmarks/cpp/README.md (1)
339-342
: Env var rename to UCX backend is correct and consistent. Switching to TRTLLM_USE_UCX_KVCACHE=1 matches the new transceiver backend naming and the rest of the docs. LGTM.
Also applies to: 347-351
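To make the renamed flag concrete, a minimal launch sketch follows; the serve invocation is a placeholder, not a verified command line from the benchmark docs.

```shell
# Enable the UCX KV-cache transceiver before launching the server.
export TRTLLM_USE_UCX_KVCACHE=1
echo "TRTLLM_USE_UCX_KVCACHE=${TRTLLM_USE_UCX_KVCACHE}"
# trtllm-serve <model> ...   # placeholder: runs with the variable in its environment
```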
examples/disaggregated/slurm/simple_example/ctx_extra-llm-api-config.yaml (1)
1-6
: Config looks good and aligns with disaggregated context constraints. UCX backend and disabling overlap scheduler are appropriate here. No issues found.
examples/wide_ep/slurm_scripts/README.md (1)
20-20
: Path updates look right. References now point to examples/disaggregated/benchmark/slurm. LGTM.
Also applies to: 38-40
examples/cpp/executor/README.md (2)
127-127
: UCX env var requirement is correct. Using TRTLLM_USE_UCX_KVCACHE=1 for disaggregated executor is consistent with the new backend naming. Good change.
Also applies to: 130-134
127-127
: Remove all remaining `TRTLLM_USE_MPI_KVCACHE` references
Stale mentions of the deprecated MPI-based KV cache flag were found in the following locations. Please update or remove them (e.g., switch to `TRTLLM_USE_UCX_KVCACHE` where appropriate):
- tests/integration/defs/triton_server/test_triton_llm.py (≈line 3380)
- tests/integration/defs/disaggregated/test_disaggregated.py (line 701)
- tests/integration/defs/cpp/test_e2e.py (line 48)
- tests/integration/defs/cpp/test_multi_gpu.py (line 35)
- tests/integration/defs/triton_server/test.sh (lines 1130, 1145)
- tensorrt_llm/_torch/pyexecutor/kv_cache_transceiver.py (line 44)
- cpp/tensorrt_llm/common/envUtils.cpp (line 271)
- examples/disaggregated/README.md (line 304)
- docker/common/install_mpi4py.sh (lines 34, 47, 50)
⛔ Skipped due to learnings
Learnt from: galagam PR: NVIDIA/TensorRT-LLM#6487 File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12 Timestamp: 2025-08-06T13:58:07.506Z Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Learnt from: moraxu PR: NVIDIA/TensorRT-LLM#6303 File: tests/integration/test_lists/qa/examples_test_list.txt:494-494 Timestamp: 2025-07-28T17:06:08.621Z Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv PR: NVIDIA/TensorRT-LLM#5616 File: tensorrt_llm/executor/worker.py:375-384 Timestamp: 2025-07-17T09:01:27.402Z Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: CR PR: NVIDIA/TensorRT-LLM#0 File: CODING_GUIDELINES.md:0-0 Timestamp: 2025-08-06T21:22:55.018Z Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Learnt from: yibinl-nvidia PR: NVIDIA/TensorRT-LLM#6506 File: examples/models/core/mixtral/requirements.txt:3-3 Timestamp: 2025-08-01T15:14:45.673Z Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Learnt from: sklevtsov-nvidia PR: NVIDIA/TensorRT-LLM#3294 File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:36-36 Timestamp: 2025-08-08T05:06:31.537Z Learning: CUTLASS extension files (under cpp/tensorrt_llm/cutlass_extensions/) follow CUTLASS coding style conventions, including using #pragma once instead of TRTLLM_ prefixed header guards, even though they are .hpp files.
Learnt from: CR PR: NVIDIA/TensorRT-LLM#0 File: CODING_GUIDELINES.md:0-0 Timestamp: 2025-08-06T21:22:55.018Z Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Learnt from: yiqingy0 PR: NVIDIA/TensorRT-LLM#5198 File: jenkins/mergeWaiveList.py:0-0 Timestamp: 2025-07-22T08:33:49.109Z Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1)
247-249
: Backend casing updated to 'DEFAULT' — good. Matches the standardized uppercase convention.
Also applies to: 255-257
examples/disaggregated/slurm/simple_example/disagg_config.yaml (1)
4-4
: Incorrect conflation of model backend with cache transceiver backend
The top-level `backend: pytorch` in examples/disaggregated/slurm/simple_example/disagg_config.yaml drives which inference engine to launch (PyTorch vs. TRT), not the KV-cache transceiver. Cache transceiver settings live under each server's `cache_transceiver_config` section. You can safely leave or rename the top-level `backend` field according to its own schema (e.g. `backend: pytorch`).
Likely an incorrect or invalid review comment.
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3)
277-278
: LGTM: consistent use of uppercase "DEFAULT". Aligned with new accepted literals.
380-381
: LGTM: consistent use of uppercase "DEFAULT". Aligned with new accepted literals.
134-135
: LGTM: backend standardized to "DEFAULT"
Verified there are no remaining lowercase `"default"` or `"ucx"` references in the repo.
247-248
: Consistent backend literal update to "DEFAULT" — LGTM. All these updates align with the new uppercase convention. No functional concerns.
Also applies to: 256-258, 291-293, 331-333, 339-341, 382-385, 396-398, 535-536, 575-576, 624-626, 631-633, 675-677, 682-684
426-429
: Lowercase "default" backend literals missed; update to "DEFAULT" for consistency and validation compatibility. A few cache_transceiver_config backends still use "default" (lowercase). Please update to "DEFAULT" to match the new accepted values and prevent config parsing/validation mismatches.
Apply this diff:
@@
 ctx_server_config = {
     "disable_overlap_scheduler": True,
     "guided_decoding_backend": backend,
     "cache_transceiver_config": {
-        "backend": "default"
+        "backend": "DEFAULT"
     }
 }
 gen_server_config = {
     "guided_decoding_backend": backend,
     "cache_transceiver_config": {
-        "backend": "default"
+        "backend": "DEFAULT"
     }
 }
@@
 ctx_server_config = {
     "disable_overlap_scheduler": True,
     "speculative_config": speculative_decoding_config,
     "kv_cache_config": {
         "free_gpu_memory_fraction": 0.8,
     },
     "guided_decoding_backend": backend,
     "cache_transceiver_config": {
-        "backend": "default"
+        "backend": "DEFAULT"
     }
 }
 gen_server_config = {
     "disable_overlap_scheduler": True,
     "speculative_config": speculative_decoding_config,
     "kv_cache_config": {
         "free_gpu_memory_fraction": 0.8,
     },
     "guided_decoding_backend": backend,
     "cache_transceiver_config": {
-        "backend": "default"
+        "backend": "DEFAULT"
     }
 }
Likely an incorrect or invalid review comment.
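The normalization these edits perform by hand can be sketched as a small helper; the accepted set and the config shape follow the snippets above, but the function itself is hypothetical, not part of the codebase.

```python
# Hypothetical helper: upper-case a legacy cache transceiver backend value
# and reject anything outside the accepted literals.
ACCEPTED_BACKENDS = {"DEFAULT", "UCX", "NIXL", "MPI"}

def normalize_backend(server_config: dict) -> dict:
    """Upper-case cache_transceiver_config.backend in place and validate it."""
    cfg = server_config.get("cache_transceiver_config")
    if cfg and "backend" in cfg:
        upper = str(cfg["backend"]).upper()
        if upper not in ACCEPTED_BACKENDS:
            raise ValueError(f"unknown cache transceiver backend: {cfg['backend']!r}")
        cfg["backend"] = upper
    return server_config

ctx_server_config = {
    "disable_overlap_scheduler": True,
    "cache_transceiver_config": {"backend": "default"},
}
print(normalize_backend(ctx_server_config)["cache_transceiver_config"]["backend"])  # DEFAULT
```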
examples/disaggregated/README.md (1)
24-32
: Good: example config uses uppercase backends and disables overlap on context. The examples align with the new uppercase backend convention and the current limitation on overlap scheduler for context servers.
Also applies to: 35-40
@@ -277,7 +277,7 @@ We also conducted performance evaluations of Qwen 3 on GB200 GPUs. The data indi

 ### Reproducing Steps

-We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
+We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).
💡 Verification agent
🧩 Analysis chain
Updated link path — please verify the URL resolves
The path now points to examples/disaggregated/benchmark/slurm. Verify it exists and renders as expected.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
URL="https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm"
# Expect: HTTP 200 for HTML content
curl -sI "$URL" | sed -n '1p;/^content-type:/Ip'
Length of output: 297
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate any directory named "slurm" in the repo to verify the correct path
fd -t d slurm
Length of output: 97
Fix broken link to Slurm scripts in blog post
The link to examples/disaggregated/benchmark/slurm
returns a 404. Update it to point to the existing examples/disaggregated/slurm
directory.
• File: docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md
• Line: 280
Suggested diff:
- [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm)
+ [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/benchmark/slurm).
+We provide a set of scripts to reproduce the performance data presented in this paper. Please refer to the usage instructions described in [this document](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated/slurm).
🤖 Prompt for AI Agents
In docs/source/blogs/tech_blog/blog5_Disaggregated_Serving_in_TensorRT-LLM.md at
line 280, the link to the Slurm scripts is broken because it points to a
non-existent directory. Update the URL from
"examples/disaggregated/benchmark/slurm" to "examples/disaggregated/slurm" to
fix the 404 error and ensure the link directs to the correct existing directory.
    'cache_transceiver_config': {
        'max_tokens_in_buffer': cache_transceiver_max_num_tokens,
-       'backend': 'default',
+       'backend': 'DEFAULT',
    },
},
💡 Verification agent
🧩 Analysis chain
Sanity check: ensure no lingering lowercase backend literals remain.
As a follow-up, verify no configs or code still use backend: default/ucx/etc. in lowercase.
Run:
Also applies to: 226-229
🏁 Script executed:
#!/bin/bash
# Find lingering lowercase backend literals in YAML/Python/JSON/TOML
rg -n --no-heading -e $'backend:\\s*default\\b' -e $'backend:\\s*ucx\\b' -e $'backend:\\s*mpi\\b' -e $'backend:\\s*nixl\\b' -e $'\"backend\"\\s*:\\s*\"default\"' -e $'\\'backend\\'\\s*:\\s*\\'default\\'' -e $'backend:\\s*\"default\"'
Length of output: 616
Update lingering lowercase backend literals in tests
The search uncovered lingering lowercase `"backend": "default"` entries in the integration tests. Please update these to use the uppercase literal (`"DEFAULT"`) for consistency:
- tests/integration/defs/accuracy/test_disaggregated_serving.py:427
- tests/integration/defs/accuracy/test_disaggregated_serving.py:433
- tests/integration/defs/accuracy/test_disaggregated_serving.py:475
- tests/integration/defs/accuracy/test_disaggregated_serving.py:486
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/benchmark/gen_yaml.py around lines 198 to 202,
update any occurrences of the lowercase string "default" used as a backend
literal to the uppercase "DEFAULT" to maintain consistency. Search through the
specified integration test files for similar lowercase literals and replace them
with the uppercase version as well.
context_servers:
  num_instances: 1
  urls:
    - "ctx_hostname:8001"
generation_servers:
  num_instances: 1
  urls:
    - "gen_hostname:8002"
🛠️ Refactor suggestion
Consider embedding per-server cache_transceiver_config blocks
Other configs in this PR place the transceiver backend under each server type. If this proxy config is intended to route disaggregated requests, consider aligning structure:
- context_servers.cache_transceiver_config.backend: DEFAULT
- generation_servers.cache_transceiver_config.backend: DEFAULT
Example structure:
hostname: localhost
port: 8000
context_servers:
  num_instances: 1
  urls:
    - "ctx_hostname:8001"
  cache_transceiver_config:
    backend: DEFAULT
generation_servers:
  num_instances: 1
  urls:
    - "gen_hostname:8002"
  cache_transceiver_config:
    backend: DEFAULT
Happy to propose a concrete patch once the schema is confirmed.
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/disagg_config.yaml around lines 5
to 12, the configuration lacks per-server cache_transceiver_config blocks under
context_servers and generation_servers. To fix this, embed a
cache_transceiver_config section with a backend key set to DEFAULT under both
context_servers and generation_servers, aligning the structure with other
configs in the PR and supporting proper routing of disaggregated requests.
srun --container-image=${container_image} \
    --container-mounts=${mount_paths} \
    -N 2 --ntasks-per-node=4 \
    --mpi=pmix \
    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
|
srun will block; background long-running services or they’ll never reach subsequent steps
As written, this srun will block until the context service exits, so the generation and proxy steps won’t run. Run long-lived services concurrently.
Apply this diff:
srun --container-image=${container_image} \
--container-mounts=${mount_paths} \
-N 2 --ntasks-per-node=4 \
--mpi=pmix \
- bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+ bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &
Optional hardening at top of the script:
#!/bin/bash
+set -euo pipefail
📝 Committable suggestion
-srun --container-image=${container_image} \
-    --container-mounts=${mount_paths} \
-    -N 2 --ntasks-per-node=4 \
-    --mpi=pmix \
-    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml"
+#!/bin/bash
+set -euo pipefail
+srun --container-image=${container_image} \
+    --container-mounts=${mount_paths} \
+    -N 2 --ntasks-per-node=4 \
+    --mpi=pmix \
+    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${ctx_port} --extra_llm_api_options ${work_path}/ctx_extra-llm-api-config.yaml" &
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 16 to
21, the srun command is currently blocking, preventing subsequent steps from
running. To fix this, modify the srun command to run the long-lived service in
the background by appending an ampersand (&) at the end of the command string.
This allows the script to continue executing subsequent commands concurrently.
Optionally, add script hardening at the top of the file as suggested.
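The fix above boils down to a standard shell pattern: background the long-lived services, then `wait` (or keep one step in the foreground) so the job allocation is held until they exit. A toy sketch with placeholder commands standing in for the `srun` invocations:

```shell
# Placeholder services stand in for the long-running srun commands.
logfile=$(mktemp)
( sleep 0.2; echo "context service exited" >> "$logfile" ) &    # srun ... ctx ... &
( sleep 0.2; echo "generation service exited" >> "$logfile" ) & # srun ... gen ... &
echo "proxy running in foreground" >> "$logfile"                # srun ... disaggregated ...
wait   # block until the backgrounded services finish
echo "all services finished" >> "$logfile"
```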
# Launch a generation with `tp_size=4` using one 4-GPU node.
srun --container-image=${container_image} \
    --container-mounts=${mount_paths} \
    -N 1 --ntasks-per-node=4 \
    --mpi=pmix \
    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
|
🛠️ Refactor suggestion
Comment says tp_size=4 but command uses --tp_size 8 — make them consistent
Either adjust the comment or the command. If you intend tp=4 on a single 4-GPU node, update the flag accordingly.
Apply this diff:
-# Launch a generation with `tp_size=4` using one 4-GPU node.
+# Launch a generation with `tp_size=4` using one 4-GPU node.
srun --container-image=${container_image} \
--container-mounts=${mount_paths} \
-N 1 --ntasks-per-node=4 \
--mpi=pmix \
- bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+ bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &
Finally, wait for backgrounded services (keep proxy in foreground to hold the allocation):
- srun --container-image=${container_image} \
+ srun --container-image=${container_image} \
--container-mounts=${mount_paths} \
-N 1 --ntasks-per-node=1 \
--mpi=pmix \
bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
📝 Committable suggestion
-# Launch a generation with `tp_size=4` using one 4-GPU node.
-srun --container-image=${container_image} \
-    --container-mounts=${mount_paths} \
-    -N 1 --ntasks-per-node=4 \
-    --mpi=pmix \
-    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 8 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml"
+# Launch a generation with `tp_size=4` using one 4-GPU node.
+srun --container-image=${container_image} \
+    --container-mounts=${mount_paths} \
+    -N 1 --ntasks-per-node=4 \
+    --mpi=pmix \
+    bash -c "trtllm-llmapi-launch trtllm-serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 --tp_size 4 --host 0.0.0.0 --port ${gen_port} --extra_llm_api_options ${work_path}/gen_extra-llm-api-config.yaml" &
+srun --container-image=${container_image} \
+    --container-mounts=${mount_paths} \
+    -N 1 --ntasks-per-node=1 \
+    --mpi=pmix \
+    bash -c "trtllm-llmapi-launch trtllm-serve disaggregated -c ${work_path}/disagg_config.yaml"
+wait
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/simple_example/launch.slurm around lines 22 to
28, the comment states tp_size=4 but the command uses --tp_size 8, causing
inconsistency. To fix this, update the --tp_size flag in the command to 4 to
match the comment, ensuring the command launches with tp_size=4 on a single
4-GPU node.
@@ -16,7 +16,7 @@ context_servers:
       enable_partial_reuse: False
     disable_overlap_scheduler: True
     cache_transceiver_config:
-      backend: default
+      backend: DEFAULT
💡 Verification agent
🧩 Analysis chain
Standardization to DEFAULT acknowledged; ensure backward compatibility and consistent quoting.
- Aligns with the new uppercase convention.
- Suggest quoting for consistency with other quoted string scalars in the same file.
Scan for missed lowercase occurrences:
Suggested change:
- backend: DEFAULT
+ backend: "DEFAULT"
Also applies to: 34-34
🏁 Script executed:
#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'
Length of output: 194
I wasn't able to confirm whether any lowercase `backend` values remain due to the regex error. Let's rerun with a corrected pattern:
🏁 Script executed:
#!/bin/bash
# Search for any unquoted or lowercase backend values in YAML configs
rg -n -S -e 'backend:\s*(["\'])?(default|ucx|nixl|mpi)\1?' --glob '*.yaml' tests/integration/defs/disaggregated/
Length of output: 295
We need a corrected regex (Rust regex doesn’t support backreferences). Let’s rerun:
🏁 Script executed:
#!/bin/bash
# Verify no lowercase or unquoted backend values exist in YAML configs
rg -n -S -e "backend:\s*['\"]?(default|ucx|nixl|mpi)['\"]?" --glob '*.yaml' tests/integration/defs/disaggregated/
Length of output: 8747
Quote and standardize all `backend:` values in disaggregated test configs
We've found dozens of unquoted uses of `DEFAULT` (and some other backends) across tests/integration/defs/disaggregated/test_configs/*.yaml. To ensure consistency:
- Use uppercase for all backend identifiers (DEFAULT, UCX, NIXL, MPI).
- Always wrap the value in double quotes:
-      backend: DEFAULT
+      backend: "DEFAULT"
- Apply the same quoting rule for other backends:
-      backend: ucx
+      backend: "UCX"
-      backend: nixl
+      backend: "NIXL"
-      backend: mpi
+      backend: "MPI"
Suggested one-liner to catch remaining unquoted entries:
rg -n -S --glob '*.yaml' '^\s*backend:\s*[A-Z]+' tests/integration/defs/disaggregated/test_configs/
And a bulk fix (run from repo root):
find tests/integration/defs/disaggregated/test_configs/ -name '*.yaml' \
-exec sed -i -E 's/^( *)(backend:\s*)([A-Z]+)$/\1\2"\3"/' {} +
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
at line 19, the backend value DEFAULT is unquoted. To fix this, update the
backend value to be uppercase and wrap it in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting and uppercase standard
consistently to all backend values in the disaggregated test configs directory.
@@ -16,7 +16,7 @@ context_servers:
       enable_partial_reuse: False
     disable_overlap_scheduler: True
     cache_transceiver_config:
-      backend: default
+      backend: DEFAULT
💡 Verification agent
❓ Verification inconclusive
Consistent with uppercase convention; minor quoting nit and migration check.
- Change is correct per the new allowed literals.
- Recommend quoting for consistency with other string fields.
Find missed lowercase:
Diff:
- backend: DEFAULT
+ backend: "DEFAULT"
Also applies to: 34-34
🏁 Script executed:
#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'
Length of output: 194
🏁 Script executed:
#!/usr/bin/env bash
# Search for any occurrences of lowercase backend values (quoted or unquoted) in YAML configs
rg -n -E "backend:\s*(['\"])?(default|ucx|nixl|mpi)\1" --glob "*.yaml"
Length of output: 471
#!/usr/bin/env bash
set -euo pipefail
echo "Searching for any lowercase backend values in YAML files (unquoted, single-quoted, or double-quoted)..."
for val in default ucx nixl mpi; do
echo "=== Checking unquoted: backend: $val ==="
grep -R --include='*.yaml' -n "backend: $val" . || true
echo "=== Checking double-quoted: backend: \"$val\" ==="
grep -R --include='*.yaml' -n "backend: \"$val\"" . || true
echo "=== Checking single-quoted: backend: '$val' ==="
grep -R --include='*.yaml' -n "backend: '$val'" . || true
done
Quote uppercase `backend` literal for consistency
To align with other string fields, please add quotes around the uppercase `DEFAULT` values in your YAML:
• tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
  – Line 19: change `backend: DEFAULT` → `backend: "DEFAULT"`
  – Line 34: change `backend: DEFAULT` → `backend: "DEFAULT"`
Diff:
--- a/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
+++ b/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
@@ -19,1 +19,1 @@
- backend: DEFAULT
+ backend: "DEFAULT"
@@ -34,1 +34,1 @@
- backend: DEFAULT
+ backend: "DEFAULT"
📝 Committable suggestion
--- a/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
+++ b/tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
@@ -19,1 +19,1 @@
-      backend: DEFAULT
+      backend: "DEFAULT"
@@ -34,1 +34,1 @@
-      backend: DEFAULT
+      backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
at lines 19 and 34, the uppercase string literal DEFAULT under the backend key
should be enclosed in quotes for consistency with other string fields. Update
both lines to use backend: "DEFAULT" instead of backend: DEFAULT.
@@ -11,7 +11,7 @@ context_servers:
     pipeline_parallel_size: 1
     enable_attention_dp: True
     cache_transceiver_config:
-      backend: default
+      backend: DEFAULT
💡 Verification agent
🧩 Analysis chain
Backend value updated; consider quoting and verify no lowercase remains elsewhere.
- Uppercase `DEFAULT` matches the updated literals; looks good.
- For stylistic consistency with other quoted strings in the same file, consider quoting.
Repo-wide check:
Patch:
- backend: DEFAULT
+ backend: "DEFAULT"
Also applies to: 23-23
🏁 Script executed:
#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'
Length of output: 194
To catch any unquoted or lowercase `default` occurrences, let's retry with a simpler pattern. Please run:
🏁 Script executed:
#!/bin/bash
# Search for any backend: DEFAULT, default, ucx, nixl, or mpi (quoted or unquoted)
rg -n "backend:\s*(\"|')?(DEFAULT|default|ucx|nixl|mpi)(\"|')?"
Length of output: 9011
Quote `DEFAULT` in YAML backend entries for consistency
Multiple test config files use an unquoted `DEFAULT` value for the `backend` key. To match the style of other quoted literals (e.g. `"ucx"`, `"nixl"`, `"mpi"`), please wrap `DEFAULT` in quotes.
Files to update (unquoted `backend: DEFAULT` occurrences):
- tests/integration/defs/disaggregated/test_configs/**/*.yaml
- examples/disaggregated/disagg_config.yaml (lines 14, 22)
Example patch:
- backend: DEFAULT
+ backend: "DEFAULT"
📝 Committable suggestion
-      backend: DEFAULT
+      backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
at line 14, the backend value DEFAULT is unquoted. To maintain consistency with
other YAML files where backend values like "ucx" or "mpi" are quoted, update
this line to quote DEFAULT as "DEFAULT". Also, apply this quoting style to all
other unquoted DEFAULT backend entries in the specified test config YAML files
and examples/disaggregated/disagg_config.yaml at lines 14 and 22.
@@ -16,7 +16,7 @@ context_servers:
     batch_sizes: [1,3000]
     disable_overlap_scheduler: True
     cache_transceiver_config:
-      backend: default
+      backend: DEFAULT
💡 Verification agent
🧩 Analysis chain
Uppercase backend adopted; add quotes for consistency and verify global sweep.
- Matches PR objective to standardize backend literals.
- Optional: quote the value to align with other quoted scalars (e.g., "pytorch").
Sweep for lowercase literals:
Localized change:
- backend: DEFAULT
+ backend: "DEFAULT"
Also applies to: 37-37
🏁 Script executed:
#!/bin/bash
rg -n -S $'backend:\\s*(\"|\\\')?(default|ucx|nixl|mpi)(\"|\\\')?'
Length of output: 194
🏁 Script executed:
#!/bin/bash
# Scan all YAML files for lowercase backend values
for val in default ucx nixl mpi; do
echo "Searching for backend: ${val}"
rg -n -S "backend:\s*${val}" --glob '*.yaml'
done
Length of output: 7846
Quote uppercase backend scalars for consistency
A global sweep found only uppercase backend: values—no lowercase literals—so please wrap all of them in quotes to align with other YAML scalars.
• Files to update:
- All tests/integration/defs/disaggregated/test_configs/*.yaml entries of backend: DEFAULT
- examples/disaggregated/slurm/*.yaml entries of backend: UCX
Example diff:
- backend: DEFAULT
+ backend: "DEFAULT"
- backend: UCX
+ backend: "UCX"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-backend: DEFAULT
+backend: "DEFAULT"
🤖 Prompt for AI Agents
In
tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
at line 19, the backend scalar DEFAULT is unquoted. To maintain consistency with
other YAML scalars, wrap the value DEFAULT in double quotes, changing backend:
DEFAULT to backend: "DEFAULT". Apply this quoting style to all similar uppercase
backend values in the specified YAML files.
PR_Github #14673 [ run ] triggered by Bot
PR_Github #14673 [ run ] completed with state
ea3715d
to
321327a
Compare
Actionable comments posted: 2
🔭 Outside diff range comments (2)
tests/integration/defs/accuracy/test_disaggregated_serving.py (2)
475-477: Fix lowercase backend in guided_decoding_with_eagle3

Normalize to "DEFAULT" to match the new allowed values. Apply this diff:

 "cache_transceiver_config": {
-    "backend": "default"
+    "backend": "DEFAULT"
 }
 ...
 "cache_transceiver_config": {
-    "backend": "default"
+    "backend": "DEFAULT"
 }

Also applies to: 486-488
427-429: Update all remaining lowercase backend entries to “DEFAULT”

The grep results show four leftover "backend": "default" instances in tests/integration/defs/accuracy/test_disaggregated_serving.py. They need to be uppercased:

• Line 427
• Line 433
• Line 475
• Line 486

Apply this diff at each location:

- "backend": "default"
+ "backend": "DEFAULT"
🧹 Nitpick comments (5)
tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (1)
21-21: Quote YAML scalars for consistency

Unquoted DEFAULT parses fine in YAML, but other configs quote their backend values. Quote for consistency and to avoid accidental type interpretation by some linters.

- backend: DEFAULT
+ backend: "DEFAULT"

Also applies to: 36-36
tests/unittest/llmapi/test_llm_args.py (2)
669-672: Use uppercase backend in tests — LGTM; silence F405 for wildcard import usage

Ruff flags F405 on names coming from import *. Either explicitly import CacheTransceiverConfig or add an inline noqa here.

-config = CacheTransceiverConfig(backend="UCX",
-                                max_tokens_in_buffer=1024)
+config = CacheTransceiverConfig(backend="UCX",  # noqa: F405
+                                max_tokens_in_buffer=1024)

Optional (preferred): replace the wildcard import at the top with explicit imports so you don’t need the noqa.
677-677: Silence F405 here as well or avoid wildcard imports

Same linter warning; add noqa or import explicitly.

- CacheTransceiverConfig(backend="UCX", invalid_config="should_fail")
+ CacheTransceiverConfig(backend="UCX",  # noqa: F405
+                        invalid_config="should_fail")

Optional (preferred): replace from tensorrt_llm.llmapi.llm_args import * with explicit imports at file top to comply with the project’s import guidelines.

examples/cpp/executor/README.md (2)
125-134: Add language to fenced code block for linting and readability

markdownlint reports MD040; add a language specifier. Open the block with a bash language tag, keeping the contents unchanged:

export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n <num_ranks> --allow-run-as-root --oversubscribe ./executorExampleDisaggregated --context_engine_dir <path_to_context_engine_dir> --context_rank_size <num_ranks_for_context> --generation_engine_dir <path_to_generation_engine_dir> --generation_rank_size <num_ranks_for_generation> --input_tokens ../inputTokens.csv
135-135: Typo: missing space after comma

Minor readability fix: change "for the generation engine,the context engine" to "for the generation engine, the context engine" in the sentence describing `<num_ranks_for_context>`, `<num_ranks_for_generation>`, and `<num_ranks>`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (48)
- benchmarks/cpp/README.md (1 hunks)
- docs/source/advanced/disaggregated-service.md (0 hunks)
- examples/cpp/executor/README.md (1 hunks)
- examples/disaggregated/README.md (3 hunks)
- examples/disaggregated/disagg_config.yaml (1 hunks)
- examples/disaggregated/slurm/gen_yaml.py (2 hunks)
- tensorrt_llm/llmapi/llm_args.py (1 hunks)
- tests/integration/defs/accuracy/test_disaggregated_serving.py (10 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_conditional.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml (2 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_disaggregated_etcd.py (1 hunks)
- tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (3 hunks)
- tests/unittest/llmapi/test_llm_args.py (1 hunks)
💤 Files with no reviewable changes (1)
- docs/source/advanced/disaggregated-service.md
✅ Files skipped from review due to trivial changes (3)
- examples/disaggregated/README.md
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_gentp2.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one_mtp.yaml
🚧 Files skipped from review as they are similar to previous changes (34)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite.yaml
- examples/disaggregated/disagg_config.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2pp2_gentp2pp2.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse_deepseek_v3.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp_attention_dp_overlap.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_reuse.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1_trt_backend.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_one.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only_trt_backend.yaml
- tensorrt_llm/llmapi/llm_args.py
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_torch_sampler.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp4_genpp4.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_one_mtp.yaml
- tests/integration/defs/disaggregated/test_disaggregated_etcd.py
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_genpp2.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_load_balance.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_diff_max_tokens.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_overlap_cuda_graph.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_conditional_deepseek_v3.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_overlap.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_cuda_graph_padding.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_attention_dp_overlap_cuda_graph.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_cache_aware_balance_deepseek_v3.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxpp2_genpp2.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_mixed.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_gen_only.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_trt_backend.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp1_gentp1_deepseek_v3_lite_two_mtp.yaml
- tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp1.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/slurm/gen_yaml.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/unittest/llmapi/test_llm_args.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
examples/disaggregated/slurm/gen_yaml.py
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py
tests/integration/defs/accuracy/test_disaggregated_serving.py
tests/unittest/llmapi/test_llm_args.py
🪛 markdownlint-cli2 (0.17.2)
benchmarks/cpp/README.md
346-346: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
examples/cpp/executor/README.md
129-129: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/test_llm_args.py
669-669: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
677-677: CacheTransceiverConfig
may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit Check
- GitHub Check: Pre-commit Check
🔇 Additional comments (13)
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_ucx.yaml (1)
12-12: Uppercase backend literal “UCX” — LGTM

Change aligns with repo-wide standardization. No further issues spotted.
Also applies to: 20-20
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_mpi.yaml (1)
12-12: Uppercase backend literal “MPI” — LGTM

Matches the updated Literal type. Good.
Also applies to: 20-20
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2_gentp2_deepseek_v3_lite_nixl.yaml (1)
12-12: Uppercase backend literal “NIXL” — LGTM

Consistent with the new contract.
Also applies to: 20-20
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
134-135: Switch to uppercase “DEFAULT” — LGTM

This follows the updated allowed backend values. No further issues on these lines.
Also applies to: 277-278, 380-381
tests/integration/defs/accuracy/test_disaggregated_serving.py (8)
246-248: Backend normalization to "DEFAULT" looks good

Consistent with updated LlmArgs backend Literal types and broader PR direction.
Also applies to: 255-257
291-292: LGTM: Standardized backend to "DEFAULT"

Matches the new accepted backend tokens.
330-333: LGTM: NGram test configs use "DEFAULT"

Consistent with the normalization efforts.
Also applies to: 338-341
381-384: LGTM: Eagle3 test configs use "DEFAULT"

Aligned with llm_args backend literals.
Also applies to: 395-397
535-536: LGTM: Llama4ScoutInstruct uses "DEFAULT"

Backend normalization is correct here as well.
575-576: LGTM: DeepSeekV3Lite uses "DEFAULT"

Consistent backend token.
624-626: LGTM: Gemma3 test configs use "DEFAULT"

Good consistency across context/generation configs.
Also applies to: 631-633
675-677: LGTM: Qwen3 test configs use "DEFAULT"

Backend normalization looks complete for this test.
Also applies to: 682-684
examples/cpp/executor/README.md (1)
127-131: Good update: switch to TRTLLM_USE_UCX_KVCACHE

This aligns the disaggregated executor example with the UCX-based KV cache guidance used elsewhere.
export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
    --generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}
```
This command will launch m context engines and n generation engines. You need to ensure `proc` is equal to the sum of the number of processes required for each engine plus 1. Since we use orchestrator mode for `disaggServerBenchmark`, we need an additional process as the orchestrator. For example, if there are two context engines (one is TP2_PP1, another is TP1_PP1) and two generation engines (one is TP2_PP1, another is TP1_PP1), then the `proc` value should be set to 7.

for example:
```
-export TRTLLM_USE_MPI_KVCACHE=1
+export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}
```
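The rank arithmetic above (each engine needs `tp*pp` ranks, plus one rank for the orchestrator) can be sketched as a small helper. This is illustrative only; the benchmark does not ship such a function:

```python
def required_ranks(context_engines, generation_engines):
    """Total MPI ranks for disaggServerBenchmark in orchestrator mode.

    Each engine contributes tp * pp ranks; one extra rank runs the
    orchestrator process. Engines are given as (tp, pp) tuples.
    """
    engine_ranks = sum(tp * pp for tp, pp in context_engines)
    engine_ranks += sum(tp * pp for tp, pp in generation_engines)
    return engine_ranks + 1


# Two context engines (TP2_PP1, TP1_PP1) and two generation engines
# (TP2_PP1, TP1_PP1): 2 + 1 + 2 + 1 + 1 = 7, matching `mpirun -n 7`.
print(required_ranks([(2, 1), (1, 1)], [(2, 1), (1, 1)]))  # -> 7
```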
💡 Verification agent
🧩 Analysis chain
Env var switch to UCX — LGTM; fix fenced-code languages (MD040)
Change to TRTLLM_USE_UCX_KVCACHE is correct. Add a language to fenced blocks to satisfy markdownlint (MD040).
-```
+```bash
export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n ${proc} benchmarks/disaggServerBenchmark --context_engine_dirs ${context_engine_0},${context_engine_1}...,${context_engine_{m-1}} \
--generation_engine_dirs ${generation_engine_0},${generation_engine_1}...,${generation_engine_{n-1}} --dataset ${dataset_path}
```diff
-```
+```bash
export TRTLLM_USE_UCX_KVCACHE=1
mpirun -n 7 benchmarks/disaggServerBenchmark --context_engine_dirs ${llama_7b_tp2_pp1_dir},${llama_7b_tp1_pp1_dir} --generation_engine_dirs ${llama_7b_tp1_pp1_dir},${llama_7b_tp2_pp1_dir} --dataset ${dataset_path}
# need 6 gpus and 7 processes to launch the benchmark.
To catch any stale references to the old env var or lowercase backends:
---
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "Searching for old env var name..."
rg -n --hidden --glob '!.git' 'TRTLLM_USE_MPI_KVCACHE'
echo "Searching for lowercase backend literals in code/configs..."
rg -n --hidden --glob '!.git' -E 'backend:\s*"(?:default|ucx|mpi|nixl)"|backend:\s*(?:default|ucx|mpi|nixl)|CacheTransceiverConfig\(\s*backend\s*=\s*"(?:default|ucx|mpi|nixl)"'
Length of output: 2484
Add bash to fenced code blocks in benchmarks/cpp/README.md for MD040

The switch to TRTLLM_USE_UCX_KVCACHE is correct. References to TRTLLM_USE_MPI_KVCACHE in tests and code are intentional for MPI-backend support and can remain unchanged. Please update the fenced code blocks at lines 339–349 to specify the bash language:

- File: benchmarks/cpp/README.md, lines 339–349: open both mpirun example blocks (the ${proc} template and the `mpirun -n 7` example, including the "# need 6 gpus and 7 processes to launch the benchmark." comment) with a bash language tag instead of a bare fence.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
346-346: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In benchmarks/cpp/README.md around lines 339 to 349, the fenced code blocks are
missing the language tag which triggers MD040; update both fenced code blocks in
that range to start with ```bash (preserve all existing lines and content inside
each block, including the TRTLLM_USE_UCX_KVCACHE export, mpirun command, and the
trailing comment) so the blocks are explicitly marked as bash.
'backend': 'DEFAULT',
},
🛠️ Refactor suggestion
Emit uppercase backend — LGTM; consider making it configurable + add NVIDIA header
- Emitting 'DEFAULT' is correct per the new Literal type.
- Optional: add a CLI flag to choose the cache transceiver backend (DEFAULT/UCX/NIXL/MPI) instead of hardcoding.
- Compliance: this Python file appears to be missing the required NVIDIA copyright header.
Minimal change within this hunk if you add a variable cache_transceiver_backend:
- 'backend': 'DEFAULT',
+ 'backend': cache_transceiver_backend,
…and similarly for generation_servers.
Outside this hunk, suggested additions:
# Add near function signature
def gen_config_file(..., cache_transceiver_max_num_tokens: int = 4608,
cache_transceiver_backend: str = "DEFAULT") -> None:
...
# In argparse (main)
parser.add_argument("--cache_transceiver_backend",
type=str,
choices=["DEFAULT", "UCX", "NIXL", "MPI"],
default="DEFAULT",
help="Backend for cache transceiver")
# Pass through to gen_config_file(...)
..., args.server_port, args.cache_transceiver_max_num_tokens,
args.cache_transceiver_backend)
Add header at top of file (exact wording may vary per repo template):
# Copyright (c) 2025, NVIDIA CORPORATION.
# All rights reserved.
Also applies to: 228-229
🤖 Prompt for AI Agents
In examples/disaggregated/slurm/gen_yaml.py around lines 200-201 (and similarly
lines 228-229), the backend value is hardcoded to 'DEFAULT' and the file is
missing the NVIDIA copyright header; add a new parameter
cache_transceiver_backend (default "DEFAULT") to gen_config_file's signature and
thread it through from argparse (add a --cache_transceiver_backend flag with
choices ["DEFAULT","UCX","NIXL","MPI"]), replace the hardcoded 'DEFAULT'
occurrences for cache_transceiver_backend (and generation_servers equivalent)
with that variable, and prepend the repository's required NVIDIA copyright
header at the top of the file.
@@ -1055,7 +1055,7 @@ class CacheTransceiverConfig(StrictBaseModel, PybindMirror):
     Configuration for the cache transceiver.
     """

-    backend: Optional[Literal["default", "ucx", "nixl", "mpi"]] = Field(
+    backend: Optional[Literal["DEFAULT", "UCX", "NIXL", "MPI"]] = Field(
This is a breaking change; is there a specific reason for making them capitalized?
This is a flaw in the document, which was discussed in PR 6323. If it's a choice between uppercase and lowercase, I think uppercase is more appropriate since they are acronyms.
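If backward compatibility were a concern, one option is to upper-case legacy lowercase values before checking them against the Literal. This is a sketch only, not the PR's actual implementation, and the helper name is hypothetical:

```python
from typing import Literal, Optional, get_args

CacheTransceiverBackend = Literal["DEFAULT", "UCX", "NIXL", "MPI"]


def normalize_backend(value: Optional[str]) -> Optional[str]:
    """Accept legacy lowercase spellings by upper-casing before validation."""
    if value is None:
        return None
    candidate = value.upper()
    if candidate not in get_args(CacheTransceiverBackend):
        raise ValueError(
            f"backend must be one of {get_args(CacheTransceiverBackend)}, "
            f"got {value!r}")
    return candidate


print(normalize_backend("ucx"))  # -> UCX
```

In a pydantic model such as CacheTransceiverConfig, this logic would typically live in a field validator so that existing configs with lowercase values keep working.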
321327a
to
ea3715d
Compare
Summary by CodeRabbit
Documentation
Refactor
Tests