
Conversation

@Tabrizian (Member) commented Jul 25, 2025

Summary by CodeRabbit

  • New Features

    • Session-based KV-cache transfers with multi-connection support; Python binding to pin per-request cache blocks (pin_blocks).
  • Performance

    • Selective, hash-driven block reuse and lookup to reduce KV-cache transfer and allocation overhead.
  • Behavior Changes

    • Executor flag now influences the block-reuse lifecycle (during context transmission, requests may be terminated or retained, depending on the flag).
  • API Changes

    • Environment toggle for selective transfer removed; transceiver/transfer interfaces reorganized.
  • Tests

    • Added/updated tests for reuse-tree lookups and transceiver/cache transfer flows.
  • Chores

    • Build and header cleanup to align with new transfer architecture.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
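
For example (the stage and GPU names below are illustrative values reused from the option descriptions above, not a prescribed configuration):

/bot run
/bot run --stage-list "A10-PyTorch-1"
/bot run --gpu-type "A30, H100_PCIe" --disable-fail-fast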

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

coderabbitai bot (Contributor) commented Jul 25, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.

📝 Walkthrough

Renames DataResponder/DataRequester → CacheSender/CacheReceiver, deletes legacy impl files, introduces TransferSession and hash-driven block-range transfers, adds pinBlocks and hash-based reuse lookups, updates related headers, tests, Python binding, and serialization for BlockKey.

Changes

Cohort / File(s) | Summary
CacheTransceiver refactor
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h, cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
Replace DataResponder/DataRequester with CacheSender/CacheReceiver; rename members (mDataResponder → mCacheSender, mDataRequester → mCacheReceiver, mResponderFutures → mSenderFutures); update comm-state sourcing and async send/receive calls; add a commented-out mCacheServer.
Public transceiver API & wrappers
cpp/tensorrt_llm/batch_manager/dataTransceiver.h, cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
Add CacheSender/CacheReceiver wrappers (PIMPL), aliases (SizeType32, TransferSession, BaseCacheFormatter), TransceiverTag, async send/receive APIs and request/session helpers; remove old DataSender/DataReceiver/DataResponder/DataRequester and KvCacheMeasureHelper.
Legacy impl removal & build update
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, .../dataTransceiverImpl.h, cpp/tensorrt_llm/batch_manager/CMakeLists.txt
Delete legacy dataTransceiverImpl.* (DataSenderImpl/DataReceiverImpl and helpers) and remove dataTransceiverImpl.cpp from CMake SRCS.
Formatter & session model
cpp/tensorrt_llm/batch_manager/cacheFormatter.h, .../cacheFormatter.cpp, .../mlaCacheFormatter.cpp
Introduce TransferSession and KvCacheMeasureHelper; change BaseCacheFormatter interface to session-based format/unformat; extend getBlockRangeForSending to accept allBlockKeys and indexFromEnd; wire hashes through sending/receiving paths.
KV-cache manager & utils
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h, .../kvCacheManager.cpp, .../kvCacheUtils.h
Add BlockKey::hash, BlockKey equality implementation, buildBlockKeys declaration, BlockRange::fromReuseTree, pinning API pinBlocks(...), and 1/2-arg reuse lookup overloads across WindowBlockManager/BlockManager/KVCacheManager/BaseKVCacheManager.
LLM request API
cpp/include/tensorrt_llm/batch_manager/llmRequest.h
Remove mRequestedBlockHashes and its getter/setter from GenericLlmRequest.
Agent / UCX connection tweaks
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h, .../connection.cpp, cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
Rename the sender-side memory descriptor member and its corresponding parameter to mCacheReceiverBufferDesc; update AgentConnection usage; replace the include of the removed impl header with dataTransceiver.h in the UCX path.
Serialization & utils
cpp/include/tensorrt_llm/executor/serialization.h, cpp/tensorrt_llm/executor/serialization.cpp, cpp/tensorrt_llm/executor/serializeUtils.h
Add BlockKey (de)serialization API and include; extend generic serialize utilities to support std::array and std::pair.
Env & Python/Executor integration
cpp/tensorrt_llm/common/envUtils.cpp, cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, tensorrt_llm/_torch/pyexecutor/py_executor.py
Remove getEnvDisableSelectiveCacheTransfer(); expose pin_blocks binding for BaseKVCacheManager::pinBlocks (an illustrative binding sketch follows this table); add block_reuse_enabled flag to PyExecutor and adjust response termination behavior.
Tests
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp, cpp/tests/unit_tests/executor/ucxCommTest.cpp, cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
Update tests/mocks and usages to CacheSender/CacheReceiver; remove dataTransceiverImpl.h includes; add duplicated FindBlocksInReuseTreeByHashesTest.
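
Illustrative only: a minimal pybind11 binding along the lines summarized above might look as follows. The stub class, the request-id type, and the exact pinBlocks signature are assumptions made for the sketch, not code taken from the PR.

#include <pybind11/pybind11.h>

#include <cstdint>

namespace py = pybind11;

// Stub standing in for the real BaseKVCacheManager; only the binding pattern is the point here.
struct BaseKVCacheManager
{
    void pinBlocks(std::uint64_t requestId)
    {
        // Pin the KV-cache blocks owned by requestId so they survive context transmission.
        (void) requestId;
    }
};

PYBIND11_MODULE(kv_cache_example, m)
{
    py::class_<BaseKVCacheManager>(m, "BaseKVCacheManager")
        .def(py::init<>())
        .def("pin_blocks", &BaseKVCacheManager::pinBlocks, py::arg("request_id"),
            "Pin a request's KV-cache blocks so they are retained for later reuse.");
}

On the Python side this would allow something like manager.pin_blocks(request_id) once a context transfer completes, matching the block_reuse_enabled flow described for PyExecutor.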

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor PyExec as PyExecutor
  participant CT as CacheTransceiver
  participant CS as CacheSender
  participant CR as CacheReceiver
  participant CM as ConnectionManager
  participant F as BaseCacheFormatter
  participant KM as KVCacheManager

  rect rgb(245,248,255)
  note over PyExec,CT: Send path (async)
  PyExec->>CT: respondAndSendAsync(req)
  CT->>CS: sendAsync(req)
  CS->>CM: resolve connections/peers
  CS->>F: format(TransferSession{allBlockKeys, indexFromEnd})
  F->>KM: getBlockRangeForSending(..., allBlockKeys, indexFromEnd)
  F-->>CS: formatted buffers
  CS-->>CT: future<void> (completion)
  end

  rect rgb(245,255,245)
  note over PyExec,CT: Receive path (async)
  PyExec->>CT: requestAndReceiveAsync(req)
  CT->>CR: receiveAsync(req)
  CR->>CM: send RequestInfo (state + allBlockKeys + indexFromEnd)
  CR->>F: unformat(TransferSession)
  F->>KM: allocate or reuse blocks by hashes
  CR-->>CT: future<void> (completion)
  end

  rect rgb(255,248,240)
  note over PyExec,KM: On completion (if block reuse enabled)
  PyExec->>KM: pinBlocks(requestId)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

KV-Cache Management

Suggested reviewers

  • chuangz0
  • nv-guomingz
  • pcastonguay
  • Shixiaowei02

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (4)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)

18-18: Use preprocessor guard instead of pragma once.

According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.

Replace #pragma once with a proper header guard:

-#pragma once
+#ifndef TRTLLM_CONNECTION_H
+#define TRTLLM_CONNECTION_H

And add at the end of the file:

+#endif // TRTLLM_CONNECTION_H
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2)

17-17: Use preprocessor guard instead of pragma once.

According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.

Replace #pragma once with a proper header guard:

-#pragma once
+#ifndef TRTLLM_CACHETRANSCEIVER_H
+#define TRTLLM_CACHETRANSCEIVER_H

And add at the end of the file:

+#endif // TRTLLM_CACHETRANSCEIVER_H

2-2: Update copyright year to include 2025.

According to coding guidelines, all TensorRT-LLM code should contain an NVIDIA copyright header that includes the current year (2025).

- * Copyright (c) 2023-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION.  All rights reserved.
cpp/tests/batch_manager/cacheTransceiverTest.cpp (1)

655-658: Remove unused variable assignment.

The variable cacheType is assigned but never used. This appears to be dead code.

-        if (kvFactor == 1)
-        {
-            auto cacheType = CacheType::kSELFKONLY;
-        }
🧹 Nitpick comments (2)
examples/disaggregated/clients/disagg_client.py (1)

81-126: Consider adding timing measurements to send_chat_request for consistency.

The send_chat_request function lacks the same TTFT and ITL timing measurements that were added to send_request. This creates inconsistent behavior between the two endpoints.

Consider adding similar timing measurements to send_chat_request or extracting the timing logic into a shared helper function to maintain consistency across both endpoints.

examples/disaggregated/clients/prompts.json (1)

1-3: Consider maintaining diverse test prompts for comprehensive testing.

While simplifying to a single prompt reduces complexity, having diverse prompts helps test various scenarios (different lengths, complexity, languages, etc.). Consider whether this reduction impacts test coverage adequately.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f2f11f and eed674f.

📒 Files selected for processing (13)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
  • examples/disaggregated/clients/disagg_client.py (3 hunks)
  • examples/disaggregated/clients/prompts.json (1 hunks)
💤 Files with no reviewable changes (3)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • examples/disaggregated/clients/disagg_client.py
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/clients/disagg_client.py
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🔇 Additional comments (24)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)

178-178: Consistent renaming verified

I performed a repository-wide search and confirmed there are no remaining occurrences of mReceiverBufferDesc. All references in

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

have been updated to mCacheReceiverBufferDesc.

LGTM!

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)

84-84: LGTM! Consistent implementation of the renamed member variable.

The changes correctly implement the renamed mCacheReceiverBufferDesc member variable throughout the implementation, maintaining consistency with the header file updates.

Also applies to: 140-142


468-468: LGTM! Proper namespace closing comment.

The namespace closing comment follows the coding guidelines correctly.

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (1)

37-38: Renaming Verification Complete

All references to the old class names have been removed:

  • No matches for DataResponder
  • No matches for DataRequester
  • No matches for mDataSender or mDataReceiver

The renaming to CacheSender/CacheReceiver is consistent and complete. No further action required.

cpp/tests/batch_manager/cacheTransceiverTest.cpp (6)

159-185: LGTM! Mock class properly renamed and implemented.

The mock class follows appropriate testing patterns with proper use of Google Mock features.


186-192: LGTM! Mock receiver class properly implemented.

The mock methods are correctly defined for testing cache receiving functionality.


211-230: LGTM! Test properly updated to use renamed classes.

The test correctly uses CacheSender instead of DataResponder while maintaining the original test logic.


232-252: LGTM! Test properly uses CacheReceiver.

The test correctly replaces DataRequester with CacheReceiver while maintaining test functionality.


396-404: LGTM! Cache transceiver setup properly updated.

The setup correctly instantiates CacheSender and CacheReceiver with proper implementations using CacheSenderImpl and CacheReceiverImpl.


762-768: LGTM! Asymmetric cache test properly updated.

The test correctly uses CacheSender and CacheReceiver with their implementation classes.

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (3)

196-200: LGTM! Proper initialization of cache sender and receiver.

The constructor correctly initializes CacheSender and CacheReceiver with appropriate parameters and formatter creation.


236-298: LGTM! Methods properly updated to use new cache classes.

All async send/receive operations correctly use mCacheSender and mCacheReceiver with proper future handling.


369-436: LGTM! Context transfer status checking properly updated.

The method correctly uses mSenderFutures (renamed from mResponderFutures) while maintaining the original synchronization logic.

cpp/tensorrt_llm/batch_manager/cacheFormatter.h (2)

45-122: LGTM! Well-designed TransferSession class.

The class provides a clean abstraction for managing transfer sessions with proper const-correctness, precondition checks, and encapsulation.


163-222: LGTM! Performance measurement helper properly implemented.

The class provides thread-safe performance measurement with proper file output handling. Good use of RAII pattern for automatic output on destruction.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

115-183: LGTM! Clean interface design with pImpl pattern.

The CacheSender and CacheReceiver classes provide clean interfaces with proper use of the pImpl idiom for implementation hiding and ABI stability.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (8)

1-32: LGTM: Copyright header and includes look good.

The copyright header contains the current year (2025) as required, and the new include for agent connection utilities aligns with the refactoring objectives.


38-57: Good RAII design and helper functions.

The new helper structures and functions are well-designed:

  • Using declarations improve readability
  • ReceiveCacheResource follows RAII principles correctly
  • tagFromRequestId implements proper bit manipulation for tag generation

The magic number kDATA_TAG{43} is appropriately named as a constexpr constant.
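
As a rough illustration of that tag derivation (the kDATA_TAG value is quoted from the review above; the shift width and return type are assumptions, not the PR's exact code):

#include <cstdint>

namespace
{

// Constant named in the review above; small enough to fit in the low byte.
constexpr std::uint64_t kDATA_TAG{43};

// Fold the request id together with kDATA_TAG so that concurrent transfers for
// different requests end up with distinct, non-colliding tags.
std::uint64_t tagFromRequestId(std::uint64_t requestId)
{
    return (requestId << 8) | kDATA_TAG;
}

} // namespace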


115-174: Well-designed constructor and core methods.

The implementation demonstrates good practices:

  • Constructor properly validates inputs with TLLM_CHECK
  • Resource management uses smart pointers appropriately
  • Thread safety is maintained with proper mutex usage
  • Dependency injection pattern is correctly implemented

The new methods getCounterpartsCount and release provide clean session management.


176-228: Complex but well-structured request info handling.

The method correctly handles both agent and direct connection scenarios:

  • Appropriate use of dynamic_cast for type discrimination
  • Comprehensive error handling with informative messages
  • Proper thread safety with mutex protection
  • Session management is correctly implemented

The method's complexity is justified given the dual connection handling requirements.


230-345: Excellent thread management and synchronization.

The implementation demonstrates solid concurrent programming practices:

  • Clean separation between sync and async operations
  • Proper use of condition variables and mutexes for thread coordination
  • Exception handling ensures promises are always resolved
  • Thread naming (dataTransResp) aids in debugging
  • Producer-consumer pattern is correctly implemented

The variable renaming from responder to sender terminology is consistent with the refactor.


405-529: Solid CacheReceiver implementation with good async handling.

The receiver implementation mirrors the sender's good practices:

  • Constructor properly validates inputs and manages resources
  • Async operations are correctly implemented with std::async
  • Complex connection handling for both agent and direct scenarios
  • Proper resource management with ReceiveCacheResource
  • Thread safety maintained throughout

The symmetry with CacheSender provides a clean, consistent API.


560-698: Well-implemented async resource management.

The async processing implementation demonstrates good concurrent design:

  • AsyncResource properly encapsulates thread synchronization primitives
  • Worker thread pattern correctly processes queued requests
  • Proper termination handling prevents resource leaks
  • Exception handling ensures all promises are resolved
  • Thread naming (dataTransRequest) aids debugging
  • Use of std::deque for request queue is appropriate

The cleanup logic in the destructor properly terminates all worker threads.


700-736: Clean public interface implementation using pimpl idiom.

The public interfaces are well-designed:

  • Proper use of pimpl idiom with std::unique_ptr<Impl>
  • Constructor delegation to implementation is clean
  • Method delegation is straightforward and maintains const correctness
  • Defaulted destructors are appropriate with smart pointer members

The namespace closing comment follows the coding guidelines correctly.

@Tabrizian force-pushed the user/imant/useHashIds branch from eed674f to 376d131 on July 25, 2025 02:26
coderabbitai bot requested a review from DomBrown on July 25, 2025 02:26
@Tabrizian
Member Author

/bot run

@Tabrizian force-pushed the user/imant/useHashIds branch from 376d131 to 90d8774 on July 25, 2025 02:28
coderabbitai bot requested a review from nv-guomingz on July 25, 2025 02:30
@tensorrt-cicd
Collaborator

PR_Github #12929 [ run ] triggered by Bot

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

420-424: Consider thread pool instead of frequent thread creation.

The TODO comment indicates awareness of the performance issue with frequent thread creation. The current implementation uses std::async with std::launch::async which may create new threads frequently.

Consider implementing a thread pool for better performance:

- return std::async(std::launch::async, &CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
+ // Use thread pool implementation instead of direct async call
+ return mThreadPool.submit(&CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
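
If a pool is desired, a minimal self-contained sketch could look like the one below; ThreadPool and submit are illustrative names (the repository is not shown to have such a class), and a real fix would also need to settle pool sizing and shutdown semantics.

#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(std::size_t numThreads)
    {
        for (std::size_t i = 0; i < numThreads; ++i)
        {
            mWorkers.emplace_back(
                [this]
                {
                    while (true)
                    {
                        std::function<void()> task;
                        {
                            std::unique_lock<std::mutex> lk(mMutex);
                            mCv.wait(lk, [this] { return mStop || !mTasks.empty(); });
                            if (mStop && mTasks.empty())
                            {
                                return;
                            }
                            task = std::move(mTasks.front());
                            mTasks.pop_front();
                        }
                        task();
                    }
                });
        }
    }

    ~ThreadPool()
    {
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mStop = true;
        }
        mCv.notify_all();
        for (auto& worker : mWorkers)
        {
            worker.join();
        }
    }

    // Enqueue a callable and get a future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<decltype(f())>
    {
        using ResultT = decltype(f());
        auto task = std::make_shared<std::packaged_task<ResultT()>>(std::forward<F>(f));
        auto future = task->get_future();
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mTasks.emplace_back([task] { (*task)(); });
        }
        mCv.notify_one();
        return future;
    }

private:
    std::vector<std::thread> mWorkers;
    std::deque<std::function<void()>> mTasks;
    std::mutex mMutex;
    std::condition_variable mCv;
    bool mStop{false};
};

With something along these lines, the receiver could hold a single pool member and replace the per-request std::async call with something like mThreadPool.submit([this, &llmRequest] { requestSync(llmRequest); }).
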
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 376d131 and 90d8774.

📒 Files selected for processing (11)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
💤 Files with no reviewable changes (3)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🚧 Files skipped from review as they are similar to previous changes (6)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (48)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (17)

24-24: LGTM: Include added for new cache formatter abstraction.

The new include for cacheFormatter.h is correctly added to support the refactored architecture that uses BaseCacheFormatter and TransferSession from the kv_cache_manager namespace.


42-44: LGTM: Type aliases properly updated for the refactored architecture.

The type aliases correctly reference the new abstractions:

  • SizeType32 properly aliased from runtime namespace
  • TransferSession correctly aliased from kv_cache_manager namespace, replacing the previous internal definition

45-58: LGTM: TransceiverTag struct provides proper message tagging.

The TransceiverTag struct correctly defines:

  • Enumerated message types for REQUEST_SEND and TERMINATION
  • Appropriate tag constants for ID, INFO_SIZE, and INFO message components
  • Uses constexpr for compile-time constants as per coding guidelines

The naming follows the coding guidelines with uppercase snake_case for constants prefixed with 'k'.


58-58: LGTM: BaseCacheFormatter type alias correctly defined.

The type alias correctly references the BaseCacheFormatter from the kv_cache_manager namespace, maintaining consistency with the new architecture.


115-121: LGTM: CacheSender constructor signature properly designed.

The constructor correctly accepts:

  • ConnectionManager* for managing connections
  • CacheState for cache state information
  • SizeType32 for self index
  • std::unique_ptr<BaseCacheFormatter> for cache formatting operations

The use of std::unique_ptr follows C++ best practices for single resource ownership as specified in the coding guidelines.


122-126: LGTM: sendAsync method signature correctly updated.

The method signature properly:

  • Returns std::future<void> for asynchronous operation
  • Takes LlmRequest& as parameter
  • Uses [[nodiscard]] attribute appropriately
  • Includes comprehensive documentation

130-134: LGTM: Communication state accessors properly updated.

The getCommState() and setCommState() methods correctly use the new executor::kv_cache::CommState type, maintaining the same interface pattern as before but with updated types.


136-150: LGTM: Additional methods properly defined for the refactored architecture.

The new methods correctly provide:

  • recvRequestInfo() for receiving request information
  • sendSync() for synchronous sending
  • sendRequestInfo() returning TransferSession
  • receiveSync() accepting TransferSession&

All methods use appropriate parameter types and return types for the new architecture.


153-153: LGTM: Destructor properly declared.

The destructor is correctly declared and will be implemented to properly clean up resources.


160-177: LGTM: CacheReceiver class properly designed.

The CacheReceiver class correctly:

  • Has matching constructor signature with CacheSender
  • Provides receiveAsync() method for asynchronous reception
  • Includes sendRequestInfo() and receiveSync() methods for coordination
  • Uses proper parameter and return types
  • Follows the same design patterns as CacheSender

24-24: LGTM! Proper include for new dependency.

The inclusion of cacheFormatter.h aligns with the refactoring to use kv_cache_manager types.


42-44: LGTM! Clean type aliasing approach.

The type aliases consolidate dependencies and the TransferSession alias properly delegates to the kv_cache_manager namespace as intended by the refactor.


45-57: LGTM! Well-structured communication protocol.

The TransceiverTag struct provides a clean enumeration of message types and associated tag constants for the communication protocol.


58-58: LGTM! Consistent type aliasing.

The BaseCacheFormatter alias maintains consistency with the delegation pattern to kv_cache_manager.


160-182: LGTM! Well-designed receiver interface.

The CacheReceiver class interface mirrors the sender pattern appropriately, with proper constructor signature and async receive capability.


184-184: LGTM! Proper namespace closure comment.

The namespace closure comment follows the coding guidelines requirement for namespace comments.


115-158: Constructor Validation Confirmed – No Action Needed

The Impl constructor already enforces non-null and consistent state checks:

  • Line 127: TLLM_CHECK(mManager); ensures manager is not null.
  • Line 128: TLLM_CHECK(mManager->getCommState().getSelfIdx() == selfIndex); verifies selfIndex matches the manager’s state.

All public methods simply forward to mImpl, maintaining interface consistency.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (31)

26-26: LGTM: Include added for agent connection utilities.

The new include for agent connection utilities is correctly added to support the refactored connection management functionality.


38-45: LGTM: Helper function and type aliases properly implemented.

The code correctly:

  • Defines type aliases for the agent connection manager and data context
  • Implements tagFromRequestId() helper function with proper bit manipulation to create unique tags
  • Uses constexpr for the constant as per coding guidelines

47-57: LGTM: ReceiveCacheResource struct properly designed.

The struct correctly:

  • Manages buffer manager and CUDA event resources
  • Uses move semantics in the constructor
  • Follows proper resource management patterns
  • Uses member initializer list as per C++ best practices

115-132: LGTM: CacheSender::Impl constructor properly implemented.

The constructor correctly:

  • Initializes all member variables using member initializer list
  • Performs proper null pointer checks with TLLM_CHECK
  • Validates self index consistency
  • Sets up CUDA device context
  • Launches the response thread asynchronously

The use of std::make_shared for CudaStream follows smart pointer best practices.


134-149: LGTM: sendAsync method properly implemented with thread synchronization.

The method correctly:

  • Creates promise/future pair for asynchronous operation
  • Uses proper mutex locking for thread safety
  • Adds requests to the ready responses queue
  • Notifies waiting threads via condition variable
  • Returns the future for caller synchronization

The nested locking approach prevents race conditions.
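
For readers less familiar with this hand-off, here is a stripped-down sketch of the promise/future plus condition-variable pattern being described; Response, mReadyResponses, and the method bodies are simplified placeholders, not the PR's actual members.

#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>

struct Response
{
    std::promise<void> promise; // resolved by the worker thread once the send completes
};

class Sender
{
public:
    // Producer side: enqueue the work and hand the caller a future to wait on.
    std::future<void> sendAsync()
    {
        Response response;
        auto future = response.promise.get_future();
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mReadyResponses.emplace_back(std::move(response));
        }
        mCv.notify_one();
        return future;
    }

    // Consumer side: the worker thread pops entries and resolves their promises.
    void workerLoopOnce()
    {
        std::unique_lock<std::mutex> lk(mMutex);
        mCv.wait(lk, [this] { return !mReadyResponses.empty(); });
        auto response = std::move(mReadyResponses.front());
        mReadyResponses.pop_front();
        lk.unlock();
        // ... perform the actual transfer here ...
        response.promise.set_value();
    }

private:
    std::mutex mMutex;
    std::condition_variable mCv;
    std::deque<Response> mReadyResponses;
};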


230-237: LGTM: sendSync method properly implemented.

The method correctly:

  • Looks up the transfer session by request ID
  • Sets the LLM request on the session
  • Delegates formatting to the formatter
  • Includes proper error checking

251-265: LGTM: sendAndRemoveResponse method with proper exception handling.

The method correctly:

  • Uses CUDA device setting for proper context
  • Calls synchronous send and cleanup
  • Provides proper exception handling with promise notification
  • Uses noexcept specification appropriately

267-345: LGTM: Response thread implementation with comprehensive synchronization.

The response thread correctly:

  • Sets thread name for debugging
  • Uses proper CUDA device context
  • Implements condition variable wait with proper predicates
  • Handles termination gracefully
  • Manages request counting and cleanup
  • Supports both parallel and sequential sending modes
  • Includes comprehensive error handling and logging

The thread synchronization logic appears sound with proper use of mutexes and condition variables.


408-418: LGTM: CacheReceiver::Impl constructor properly implemented.

The constructor follows the same pattern as CacheSender::Impl with:

  • Proper member initialization
  • Null pointer validation
  • Self index consistency check
  • CUDA device setup

460-463: LGTM: receiveSync method properly delegates to formatter.

The method correctly delegates the unformatting operation to the formatter using the transfer session, maintaining proper separation of concerns.


465-529: LGTM: sendRequestInfo method with comprehensive connection handling.

The method correctly:

  • Validates formatter support for cache states
  • Handles selective cache transfer optimization
  • Supports both agent and non-agent connection managers
  • Manages buffer index assignment for agent connections
  • Creates transfer session with proper parameters
  • Includes comprehensive error checking and validation

The dual path handling for agent vs. non-agent connections is properly implemented.


531-547: LGTM: getReceiveCacheResource method with proper resource management.

The method correctly:

  • Uses scoped lock for thread safety
  • Implements per-process resource isolation when concurrent mode is enabled
  • Creates resources on-demand with proper RAII
  • Uses default process string for non-concurrent mode

549-559: LGTM: sendRequestInfo helper method properly implemented.

The helper method correctly:

  • Serializes request info using the established serialization utilities
  • Sends data in the proper order (ID, size, then data)
  • Uses appropriate data contexts for different message components
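
A schematic version of that three-step hand-off, with placeholder Connection and DataContext types standing in for the real executor interfaces (the tag values and the send signature are assumptions, not the repository's API):

#include <cstddef>
#include <cstdint>
#include <vector>

struct DataContext
{
    int tag;
};

// Placeholder connection; the real send signature in the executor is an assumption here.
struct Connection
{
    void send(DataContext const& ctx, void const* data, std::size_t size) const
    {
        (void) ctx;
        (void) data;
        (void) size;
    }
};

void sendRequestInfo(Connection const& connection, std::uint64_t requestId, std::vector<char> const& serializedInfo)
{
    // 1. Request id first, so the peer can associate the payload with a session.
    connection.send(DataContext{0}, &requestId, sizeof(requestId));

    // 2. Payload size next, so the peer can size its receive buffer.
    std::uint64_t const infoSize = serializedInfo.size();
    connection.send(DataContext{1}, &infoSize, sizeof(infoSize));

    // 3. Finally, the serialized request info itself.
    connection.send(DataContext{2}, serializedInfo.data(), serializedInfo.size());
}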

700-704: LGTM: CacheSender public constructor properly implemented.

The constructor correctly creates the implementation with perfect forwarding of the formatter parameter using std::move.


706-721: LGTM: CacheSender public interface methods properly implemented.

All public methods correctly delegate to the implementation:

  • sendAsync() forwards the request properly
  • getCommState() and setCommState() maintain the interface contract
  • Destructor is properly defaulted

723-734: LGTM: CacheReceiver public interface properly implemented.

The constructor and methods correctly:

  • Create implementation with proper parameter forwarding
  • Delegate receiveAsync() to the multi-threaded implementation
  • Use defaulted destructor for proper cleanup

176-228: Dynamic_cast usage is safe – no action needed

Verified that ConnectionManager is indeed polymorphic (it declares virtual methods, including recvConnect, getConnections, and getCommState), so the dynamic_cast in recvRequestInfo() will behave as intended. The interface is stable and handles both agent and non-agent scenarios correctly.


26-26: LGTM! Updated include for agent utilities.

The include path update aligns with the refactored cache transmission structure.


38-45: LGTM! Clean helper function with proper naming.

The tagFromRequestId function follows proper naming conventions and provides a clean abstraction for generating unique tags from request IDs.


47-57: LGTM! Well-structured resource management.

The ReceiveCacheResource struct properly encapsulates buffer manager and CUDA event resources with appropriate constructor.


134-149: LGTM! Proper async pattern with promise/future.

The sendAsync method correctly implements the promise/future pattern with proper synchronization using mutexes and condition variables.


176-228: LGTM! Comprehensive request info handling with proper validation.

The recvRequestInfo method properly handles both agent and direct connection scenarios, with appropriate validation of cache state compatibility and connection management.


230-237: LGTM! Clean synchronous send implementation.

The sendSync method properly retrieves the session and delegates to the formatter, with appropriate error checking.


405-418: LGTM! Consistent constructor pattern.

The CacheReceiver::Impl constructor follows the same validation and initialization pattern as the sender.


465-529: LGTM! Comprehensive request info sending with agent support.

The sendRequestInfo method properly handles both agent and direct connection scenarios, with appropriate buffer management and connection selection logic.


531-547: LGTM! Thread-safe resource management with proper scoping.

The getReceiveCacheResource method uses proper scoped locking and resource management patterns for concurrent access.


549-559: LGTM! Clean request info serialization.

The sendRequestInfo helper method properly serializes and sends request information using the defined communication protocol.


700-735: LGTM! Clean public interface implementation.

The public interface methods properly delegate to the pImpl pattern implementations with appropriate parameter forwarding and resource management.


736-736: LGTM! Proper namespace closure comment.

The namespace closure comment follows the coding guidelines requirement.


267-345: Thread name and exception handling verified

The thread name “dataTransResp” is only set here and does not conflict with other threads. Exception propagation via set_exception(std::current_exception()) matches the pattern used elsewhere (e.g., in workerPool.h). No further changes are needed.


115-132: Ensure ConnectionManager Access Is Thread-Safe

I didn’t find any explicit synchronization (mutexes/locks) around calls to mManager in this class. Since you spawn a background thread via std::async for Impl::response, concurrent calls to mManager could race unless ConnectionManager itself is fully thread-safe.

Please confirm or address this:

• Constructor (lines 115–132): validates and reads from mManager
• Impl::response (launched async): invokes mManager->recvConnect(...) and other methods
• receiveAsync() and other methods: call mManager->getConnections(...)

If executor::kv_cache::ConnectionManager is not inherently thread-safe, wrap all shared calls in a mutex or otherwise serialize access to mManager.
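
If it turns out ConnectionManager is not thread-safe, one straightforward option is a thin guard that serializes every call site, roughly as sketched below; the stub interface and method names are placeholders for the real executor::kv_cache::ConnectionManager.

#include <mutex>

// Stub for the real connection manager; methods reduced to no-ops for the sketch.
class ConnectionManager
{
public:
    void recvConnect() {}
    void getConnections() {}
};

// Thin wrapper that serializes all access from the sender/receiver threads.
class GuardedConnectionManager
{
public:
    explicit GuardedConnectionManager(ConnectionManager* manager)
        : mManager{manager}
    {
    }

    void recvConnect()
    {
        std::lock_guard<std::mutex> lk(mMutex); // one lock shared by every mManager call
        mManager->recvConnect();
    }

    void getConnections()
    {
        std::lock_guard<std::mutex> lk(mMutex);
        mManager->getConnections();
    }

private:
    ConnectionManager* mManager;
    std::mutex mMutex;
};

Alternatively, a single mutex member inside the Impl class guarding every mManager call would achieve the same without a wrapper.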

Comment on lines 161 to 174
[[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
{
auto it = mRequestToSession.find(requestId);
TLLM_CHECK(it != mRequestToSession.end());
return it->second.getConnections().size();
}

void release(LlmRequest::RequestIdType requestId)
{
auto it = mRequestToSession.find(requestId);
TLLM_CHECK(it != mRequestToSession.end());
std::unique_lock<std::mutex> lk(mMtxForMap);
mRequestToSession.erase(it);
}
The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Check thread safety of session map access.

The getCounterpartsCount and release methods access mRequestToSession map. Verify that all map accesses are properly synchronized.

 [[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
 {
+    std::unique_lock<std::mutex> lk(mMtxForMap);
     auto it = mRequestToSession.find(requestId);
     TLLM_CHECK(it != mRequestToSession.end());
     return it->second.getConnections().size();
 }
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp around lines 161 to 174,
the methods getCounterpartsCount and release access the mRequestToSession map
without synchronization, which can cause race conditions. To fix this, ensure
that all accesses to mRequestToSession are protected by acquiring the mutex
mMtxForMap before accessing or modifying the map. Add a
std::unique_lock<std::mutex> lock(mMtxForMap) at the start of
getCounterpartsCount and confirm the existing lock in release covers the map
access properly.

@tensorrt-cicd
Collaborator

PR_Github #12929 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9638 completed with status: 'FAILURE'

@Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from ddc66c6 to 7d3e452 on July 25, 2025 19:27
@Tabrizian
Copy link
Member Author

/bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

18-18: Replace #pragma once with required preprocessor guard.

The coding guidelines specifically require preprocessor guards in header files with the format TRTLLM_ followed by the filename in caps.

Replace the current directive with the required guard format:

-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H

And add the closing directive at the end of the file:

 } // namespace tensorrt_llm::batch_manager
+
+#endif // TRTLLM_DATATRANSCEIVER_H
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

225-261: Fix member variable naming convention.

The member variable kvCacheMeasureHelper should follow the coding guidelines for class member variables.

Apply this fix:

-    KvCacheMeasureHelper kvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};
+    KvCacheMeasureHelper mKvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};

Class member variables should use camelCase prefixed with 'm' according to the coding guidelines.

♻️ Duplicate comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2)

161-174: Thread safety issue with session map access.

The getCounterpartsCount method accesses mRequestToSession without proper synchronization, while release uses mutex protection. This creates a race condition.

This is the same thread safety issue identified in previous reviews. The getCounterpartsCount method needs mutex protection:

 [[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
 {
+    std::unique_lock<std::mutex> lk(mMtxForMap);
     auto it = mRequestToSession.find(requestId);
     TLLM_CHECK(it != mRequestToSession.end());
     return it->second.getConnections().size();
 }

420-424: TODO: Address inefficient thread creation pattern.

The method creates a new thread for each async receive operation, which is inefficient as noted in the TODO comment.

This matches the previous review feedback about frequent thread creation. The TODO comment correctly identifies this as needing improvement. Consider using the existing requestAndReceiveAsyncMultiThreads pattern or implementing a thread pool.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90d8774 and 7d3e452.

📒 Files selected for processing (13)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
  • cpp/tests/unit_tests/executor/ucxCommTest.cpp (0 hunks)
💤 Files with no reviewable changes (4)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tests/unit_tests/executor/ucxCommTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
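
To make several of the rules above concrete, here is a small hypothetical snippet (not from this PR) showing Allman braces, const locals, brace-delimited bodies, a k-prefixed constant, and a namespace closing comment:

```cpp
#include <vector>

namespace tensorrt_llm::example // hypothetical namespace used only for illustration
{

// Prefer constexpr over #define; constants use the k-prefixed upper snake case style.
constexpr int kMAX_RETRIES = 3;

int clampedSum(std::vector<int> const& values)
{
    int total = 0; // local variables use lowerCamelCase
    for (int const value : values) // variables not modified after initialization are const
    {
        total += value;
    }
    if (total < 0)
    {
        total = 0; // if/else bodies are always brace-delimited, even for a single statement
    }
    return total;
}

} // namespace tensorrt_llm::example
```
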
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🔇 Additional comments (24)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (6)

1-16: LGTM! Copyright header complies with coding guidelines.

The NVIDIA copyright header includes the current year (2025) as required by the coding guidelines for TensorRT-LLM Open Source Software.


24-24: Good addition of cacheFormatter.h include.

The new include aligns with the refactored design that moves TransferSession definition to the kv_cache_manager namespace.


42-58: Well-designed TransceiverTag structure follows coding guidelines.

The type aliases and TransceiverTag struct properly follow C++ naming conventions:

  • SizeType32 alias uses camelCase as required
  • TransferSession alias provides clear abstraction boundary
  • Enum values use uppercase naming (REQUEST_SEND, TERMINATION)
  • Static constexpr constants use kCONSTANT_NAME pattern with proper k prefix
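
As a rough illustration of the conventions noted above, a tag structure of this shape might look like the following. Only the enum values REQUEST_SEND and TERMINATION come from the review; the constant names and values are placeholders, not the header's actual contents.

```cpp
#include <cstdint>

// Illustrative only: a tag struct following the conventions described above.
struct TransceiverTag
{
    enum class Id : uint64_t
    {
        REQUEST_SEND = 1,
        TERMINATION = 2
    };

    // Hypothetical constants; the real header defines its own k-prefixed values.
    static constexpr int32_t kID_TAG{19};
    static constexpr int32_t kINFO_SIZE_TAG{22};
};
```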

115-158: CacheSender class design follows best practices.

The refactored class demonstrates good design principles:

  • Proper UpperCamelCase naming for class name
  • Method names use lowerCamelCase convention
  • Smart pointer usage with unique_ptr for resource management
  • Pimpl pattern reduces compilation dependencies
  • Constructor parameters use appropriate types (ConnectionManager*, CacheState, etc.)
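
A schematic of what such a pimpl-style declaration looks like is sketched below. The forward-declared types, constructor parameters, and method name are approximations for illustration; the real constructor also takes cache-state and rank arguments.

```cpp
#include <future>
#include <memory>

// Forward declarations standing in for the real TensorRT-LLM types.
class ConnectionManager;
class BaseCacheFormatter;
class LlmRequest;

// Schematic pimpl-style declaration (parameter list simplified).
class CacheSender
{
public:
    CacheSender(ConnectionManager* connectionManager, std::unique_ptr<BaseCacheFormatter> formatter);
    ~CacheSender(); // declared here, defined in the .cpp where Impl is a complete type

    [[nodiscard]] std::future<void> sendAsync(LlmRequest& llmRequest);

private:
    class Impl;                  // implementation details stay out of the header
    std::unique_ptr<Impl> mImpl; // member follows the 'm' prefix convention
};
```
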

160-182: CacheReceiver class maintains design consistency.

The class follows the same excellent design patterns as CacheSender:

  • Consistent naming conventions and interface design
  • Symmetric constructor signature with CacheSender
  • Appropriate method names for receiving operations
  • Proper use of pimpl pattern for implementation hiding

184-184: Perfect namespace closing comment.

The closing brace comment follows the coding guidelines requirement exactly: } // namespace tensorrt_llm::batch_manager

cpp/tensorrt_llm/batch_manager/cacheFormatter.h (4)

1-16: Copyright header meets coding guidelines.

The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.


45-121: TransferSession class demonstrates excellent C++ design practices.

The class properly follows all coding guidelines:

  • UpperCamelCase class name
  • Member variables with proper 'm' prefix (mConnections, mDataContext, etc.)
  • Method names use lowerCamelCase convention
  • Proper const-correctness with const methods and const references
  • Good use of TLLM_CHECK for assertions
  • Clean initializer list in constructor
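
A self-contained sketch of a session object in this style follows. The member names mConnections and mDataContext come from the review; the stand-in types, accessor names, and the use of assert (in place of the codebase's TLLM_CHECK) are assumptions made to keep the example compilable.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class Connection; // stand-in for the real connection type
class DataContext
{
};

// Illustrative shape of a session object with 'm'-prefixed members and const accessors.
class TransferSession
{
public:
    TransferSession(std::vector<Connection const*> connections, DataContext dataContext)
        : mConnections(std::move(connections))
        , mDataContext(std::move(dataContext))
    {
        // the real code uses TLLM_CHECK; a plain assert keeps this sketch self-contained
        assert(!mConnections.empty());
    }

    [[nodiscard]] std::vector<Connection const*> const& getConnections() const
    {
        return mConnections;
    }

    [[nodiscard]] DataContext const& getDataContext() const
    {
        return mDataContext;
    }

private:
    std::vector<Connection const*> mConnections;
    DataContext mDataContext;
};
```
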

123-161: BaseCacheFormatter interface demonstrates clean design.

The interface properly follows coding guidelines:

  • Method names use lowerCamelCase convention
  • Proper virtual destructor
  • Clean signature with TransferSession& parameter
  • Appropriate const-correctness
  • Good use of [[nodiscard]] attribute for return values
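
A minimal sketch of an interface in this shape is shown below. The format/unformat split mirrors what the review describes (unformat is mentioned later as the receive-side entry point); the third method name and the exact signatures are placeholders, not the header's real API.

```cpp
class TransferSession; // defined elsewhere (see cacheFormatter.h comments above)

// Illustrative interface shape only; real method names, parameters, and return types may differ.
class BaseCacheFormatter
{
public:
    virtual ~BaseCacheFormatter() = default;

    // Sender side: serialize cache blocks into the transfer session.
    virtual void format(TransferSession& session) = 0;

    // Receiver side: reconstruct cache blocks from the transfer session.
    virtual void unformat(TransferSession& session) = 0;

    // Placeholder name: report whether a transfer between two cache layouts is supported.
    [[nodiscard]] virtual bool supportsTransfer() const = 0;
};
```
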

266-266: Perfect namespace closing comment.

The closing brace comment properly follows coding guidelines with the complete namespace path: } // namespace tensorrt_llm::batch_manager::kv_cache_manager

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (14)

1-16: Copyright header complies with coding guidelines.

The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.


38-57: Well-designed utility functions and structures.

The code follows C++ coding guidelines properly:

  • tagFromRequestId function uses lowerCamelCase and static linkage
  • ReceiveCacheResource struct uses UpperCamelCase naming
  • Member variables correctly use 'm' prefix (mBufferManager, mCudaEvent)
  • Constant kDATA_TAG follows uppercase with 'k' prefix convention
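
A tiny sketch of these two patterns (a k-prefixed constant and a file-local lowerCamelCase helper with internal linkage) is shown below; the constant's value and the derivation logic are placeholders, not the PR's actual scheme.

```cpp
#include <cstdint>

// Constants use the 'k' prefix with upper snake case (the value here is a placeholder).
static constexpr uint64_t kDATA_TAG{0x9A3B};

// File-local helpers use lowerCamelCase and static (internal) linkage.
static uint64_t tagFromRequestId(uint64_t requestId)
{
    return kDATA_TAG ^ requestId; // placeholder derivation, not the actual mapping
}
```
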

115-133: CacheSender::Impl constructor follows best practices.

The constructor demonstrates good C++ practices:

  • Proper member initialization list
  • Smart pointer usage with unique_ptr
  • Appropriate validation with TLLM_CHECK
  • Member variables follow 'm' prefix convention

176-228: Complex but well-structured request info handling.

The method demonstrates good practices:

  • Proper type checking with dynamic_cast for agent connections
  • Clean lambda usage for agent-specific logic
  • Correct mutex synchronization for mRequestToSession access
  • Good validation of formatter compatibility between cache states
  • Appropriate session creation and connection management
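
For readers unfamiliar with the pattern, a generic sketch of dynamic_cast-based dispatch plus a mutex-guarded request map follows. The class names, the AgentConnection hook, and the map's value type are illustrative stand-ins, not the PR's types.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

class Connection
{
public:
    virtual ~Connection() = default;
};

// Hypothetical agent-specific connection type, used only to show the dispatch.
class AgentConnection : public Connection
{
public:
    void setupAgentTransfer() {}
};

class SessionRegistry
{
public:
    void registerSession(uint64_t requestId, Connection* connection)
    {
        // Agent connections need extra setup; detect them via dynamic_cast.
        if (auto* agentConnection = dynamic_cast<AgentConnection*>(connection))
        {
            agentConnection->setupAgentTransfer();
        }
        // Guard the request-to-session map against concurrent access.
        std::scoped_lock lock(mMutex);
        mRequestToSession.emplace(requestId, connection);
    }

private:
    std::mutex mMutex;
    std::map<uint64_t, Connection*> mRequestToSession;
};
```
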

230-237: Clean and focused sendSync implementation.

The method properly delegates to formatter and manages the session state appropriately.


251-403: Robust threading and response management.

The response handling demonstrates excellent threading practices:

  • Proper exception handling with promise/future pattern
  • Correct CUDA device management across threads
  • Thread naming for debugging (dataTransResp)
  • Proper condition variable usage for synchronization
  • Clean termination handling in destructor
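
As a reference for the promise/future pattern described here, a stripped-down sketch is shown below: any exception thrown on the worker thread is captured and rethrown when the caller waits on the future. Thread naming, CUDA device pinning, and the destructor-side join are noted in comments rather than implemented; everything else is generic C++, not the PR's code.

```cpp
#include <exception>
#include <future>
#include <thread>

// Run work on a dedicated thread and surface any exception through the returned future.
std::future<void> runResponseWork()
{
    auto promise = std::make_shared<std::promise<void>>();
    auto future = promise->get_future();

    std::thread worker(
        [promise]
        {
            // The real code also names the thread (e.g. "dataTransResp") and sets the CUDA device here.
            try
            {
                // ... perform the response/transfer work ...
            }
            catch (...)
            {
                promise->set_exception(std::current_exception()); // propagate to the waiting caller
                return;
            }
            promise->set_value();
        });
    worker.detach(); // simplified; the PR's implementation joins its thread during destruction

    return future;
}
```
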

405-418: CacheReceiver::Impl constructor follows consistent design.

The constructor maintains the same excellent practices as CacheSender::Impl with proper initialization and validation.


426-458: Excellent async request handling with thread pool pattern.

This method demonstrates the proper solution for efficient async operations:

  • Resource pooling per process to avoid thread creation overhead
  • Queue-based task distribution with worker threads
  • Proper synchronization with mutex and condition variables
  • Good exception handling and future management
  • This addresses the inefficiency noted in the receiveAsync TODO

460-463: Clean receiveSync delegation.

Simple and appropriate delegation to the formatter's unformat method.


465-559: Comprehensive sendRequestInfo implementation.

The method handles complex scenarios well:

  • Proper validation of formatter support between cache states
  • Clean handling of both agent and regular connection managers
  • Appropriate serialization of request information
  • Good resource management with TransferSession creation
  • Proper exception handling and validation throughout

531-559: Solid resource management with thread safety.

The resource management demonstrates good practices:

  • Thread-safe access with scoped_lock<mutex>
  • Lazy initialization of per-process resources
  • Proper RAII with unique_ptr and BufferManager
  • Clean helper method for connection communication
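
A compact sketch of lazy, per-process resource creation behind a scoped_lock follows; the struct is an empty stand-in for the real buffer-manager/CUDA-event bundle, and the class and method names are illustrative.

```cpp
#include <map>
#include <memory>
#include <mutex>

// Stand-in for the per-process receive resources (buffer manager, CUDA event, ...).
struct ReceiveCacheResource
{
};

class ResourcePool
{
public:
    // Lazily create one resource bundle per remote process, guarded by a mutex.
    ReceiveCacheResource& getResource(int processId)
    {
        std::scoped_lock lock(mMutex);
        auto& resource = mProcessToResource[processId];
        if (!resource)
        {
            resource = std::make_unique<ReceiveCacheResource>();
        }
        return *resource;
    }

private:
    std::mutex mMutex;
    std::map<int, std::unique_ptr<ReceiveCacheResource>> mProcessToResource;
};
```
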

561-698: Excellent async resource management and cleanup.

The implementation demonstrates robust design:

  • Proper RAII in destructor with resource cleanup and future synchronization
  • Good CUDA device management and logging in requestSync
  • Well-designed AsyncResource structure for thread coordination
  • Proper move semantics in RequestAndPromise
  • Comprehensive error handling with exception propagation
  • Thread naming for debugging (dataTransRequest)

700-734: Clean public interface following pimpl pattern.

The public interface demonstrates excellent design:

  • Proper delegation to implementation classes
  • Clean constructor forwarding with smart pointer management
  • Appropriate method delegation maintaining interface contracts
  • Correctly defaulted destructors for pimpl pattern
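
For completeness, the .cpp side of the pimpl declaration sketched earlier typically looks like the following: construction forwards to the Impl, the destructor is defaulted where Impl is complete, and public methods delegate. This is a generic illustration that assumes an Impl with matching methods; the real signatures differ.

```cpp
// Illustrative .cpp-side delegation for the header sketch shown earlier.
CacheSender::CacheSender(ConnectionManager* connectionManager, std::unique_ptr<BaseCacheFormatter> formatter)
    : mImpl(std::make_unique<Impl>(connectionManager, std::move(formatter)))
{
}

CacheSender::~CacheSender() = default; // defaulted here, where Impl is a complete type

std::future<void> CacheSender::sendAsync(LlmRequest& llmRequest)
{
    return mImpl->sendAsync(llmRequest); // the public API simply forwards to the Impl
}
```
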

736-737: Perfect namespace closing comment.

The closing brace comment properly follows coding guidelines: } // namespace tensorrt_llm::batch_manager

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13033 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13033 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9736 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 7d3e452 to 1a21dba on August 12, 2025 20:06
@Tabrizian Tabrizian requested a review from a team as a code owner on August 12, 2025 20:06
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from f728821 to 5a46db5 on September 16, 2025 19:18
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@Tabrizian
Copy link
Member Author

@pcastonguay / @Shixiaowei02 I addressed the review comments. Please review when you get a chance. Thanks.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18822 [ run ] triggered by Bot

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from 6a094b9 to f3db06d on September 16, 2025 20:04
@tensorrt-cicd
Copy link
Collaborator

PR_Github #18822 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14114 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from 9adbf1c to 1189980 on September 17, 2025 00:45
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18845 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18845 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14126 completed with status: 'FAILURE'

Signed-off-by: Iman Tabrizian <[email protected]>

Add unittest for findBlocksInReuseTreeByHashes

Signed-off-by: Iman Tabrizian <[email protected]>

fixes

Signed-off-by: Iman Tabrizian <[email protected]>

Switch from hash id to block key

Signed-off-by: Iman Tabrizian <[email protected]>

Add support for blockKeys

Signed-off-by: Iman Tabrizian <[email protected]>

Fix bugs

Signed-off-by: Iman Tabrizian <[email protected]>

Fix accuracy bug and add tests

Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 1189980 to 5c39f4f on September 18, 2025 14:16
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 08ffe67 to b91e12e on September 18, 2025 16:15
@tensorrt-cicd
Copy link
Collaborator

PR_Github #19211 [ run ] triggered by Bot

@pcastonguay
Copy link
Collaborator

@chuangz0 @Shixiaowei02 could you please review the kvCacheTransceiver changes? Thanks.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19211 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14422 completed with status: 'FAILURE'

Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from b91e12e to 0e69eb5 on September 19, 2025 17:46
Signed-off-by: Iman Tabrizian <[email protected]>