
Conversation

@Tabrizian (Member) commented Jul 25, 2025

Summary by CodeRabbit

  • New Features

    • Session-based KV-cache transfers with multi-connection support; Python binding to pin per-request cache blocks (pin_blocks).
  • Performance

    • Selective, hash-driven block reuse and lookup to reduce KV-cache transfer and allocation overhead.
  • Behavior Changes

    • Executor flag now influences the block-reuse lifecycle (during context transmission, requests may be terminated or retained, depending on the flag).
  • API Changes

    • Environment toggle for selective transfer removed; transceiver/transfer interfaces reorganized.
  • Tests

    • Added/updated tests for reuse-tree lookups and transceiver/cache transfer flows.
  • Chores

    • Build and header cleanup to align with new transfer architecture.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
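
For example (the stage and GPU names below are illustrative values reused from the option descriptions above, not a prescribed configuration):

/bot run
/bot run --stage-list "A10-PyTorch-1"
/bot run --gpu-type "A30, H100_PCIe" --disable-fail-fast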

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can cause the top of tree to break.

coderabbitai bot (Contributor) commented Jul 25, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.

📝 Walkthrough

Renames DataResponder/DataRequester → CacheSender/CacheReceiver, deletes legacy impl files, introduces TransferSession and hash-driven block-range transfers, adds pinBlocks and hash-based reuse lookups, updates related headers, tests, Python binding, and serialization for BlockKey.

Changes

Cohort / File(s) | Summary
CacheTransceiver refactor
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h, cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
Replace DataResponder/DataRequester with CacheSender/CacheReceiver; rename members (mDataResponder → mCacheSender, mDataRequester → mCacheReceiver, mResponderFutures → mSenderFutures); update comm-state sourcing and async send/receive calls; add a commented-out mCacheServer.
Public transceiver API & wrappers
cpp/tensorrt_llm/batch_manager/dataTransceiver.h, cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
Add CacheSender/CacheReceiver wrappers (PIMPL), aliases (SizeType32, TransferSession, BaseCacheFormatter), TransceiverTag, async send/receive APIs and request/session helpers; remove old DataSender/DataReceiver/DataResponder/DataRequester and KvCacheMeasureHelper.
Legacy impl removal & build update
cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp, .../dataTransceiverImpl.h, cpp/tensorrt_llm/batch_manager/CMakeLists.txt
Delete legacy dataTransceiverImpl.* (DataSenderImpl/DataReceiverImpl and helpers) and remove dataTransceiverImpl.cpp from CMake SRCS.
Formatter & session model
cpp/tensorrt_llm/batch_manager/cacheFormatter.h, .../cacheFormatter.cpp, .../mlaCacheFormatter.cpp
Introduce TransferSession and KvCacheMeasureHelper; change BaseCacheFormatter interface to session-based format/unformat; extend getBlockRangeForSending to accept allBlockKeys and indexFromEnd; wire hashes through sending/receiving paths.
KV-cache manager & utils
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h, .../kvCacheManager.cpp, .../kvCacheUtils.h
Add BlockKey::hash, BlockKey equality implementation, buildBlockKeys declaration, BlockRange::fromReuseTree, pinning API pinBlocks(...), and 1/2-arg reuse lookup overloads across WindowBlockManager/BlockManager/KVCacheManager/BaseKVCacheManager.
LLM request API
cpp/include/tensorrt_llm/batch_manager/llmRequest.h
Remove mRequestedBlockHashes and its getter/setter from GenericLlmRequest.
Agent / UCX connection tweaks
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h, .../connection.cpp, cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
Rename the sender-side memory descriptor member and its corresponding parameter to mCacheReceiverBufferDesc; update AgentConnection usage; replace the include of the removed impl header with dataTransceiver.h in the UCX path.
Serialization & utils
cpp/include/tensorrt_llm/executor/serialization.h, cpp/tensorrt_llm/executor/serialization.cpp, cpp/tensorrt_llm/executor/serializeUtils.h
Add BlockKey (de)serialization API and include; extend generic serialize utilities to support std::array and std::pair.
Env & Python/Executor integration
cpp/tensorrt_llm/common/envUtils.cpp, cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, tensorrt_llm/_torch/pyexecutor/py_executor.py
Remove getEnvDisableSelectiveCacheTransfer(); expose pin_blocks binding for BaseKVCacheManager::pinBlocks (an illustrative binding sketch follows this table); add block_reuse_enabled flag to PyExecutor and adjust response termination behavior.
Tests
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp, cpp/tests/unit_tests/executor/ucxCommTest.cpp, cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
Update tests/mocks and usages to CacheSender/CacheReceiver; remove dataTransceiverImpl.h includes; add duplicated FindBlocksInReuseTreeByHashesTest.
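
Illustrative only: a minimal pybind11 binding along the lines summarized above might look as follows. The stub class, the request-id type, and the exact pinBlocks signature are assumptions made for the sketch, not code taken from the PR.

#include <pybind11/pybind11.h>

#include <cstdint>

namespace py = pybind11;

// Stub standing in for the real BaseKVCacheManager; only the binding pattern is the point here.
struct BaseKVCacheManager
{
    void pinBlocks(std::uint64_t requestId)
    {
        // Pin the KV-cache blocks owned by requestId so they survive context transmission.
        (void) requestId;
    }
};

PYBIND11_MODULE(kv_cache_example, m)
{
    py::class_<BaseKVCacheManager>(m, "BaseKVCacheManager")
        .def(py::init<>())
        .def("pin_blocks", &BaseKVCacheManager::pinBlocks, py::arg("request_id"),
            "Pin a request's KV-cache blocks so they are retained for later reuse.");
}

On the Python side this would allow something like manager.pin_blocks(request_id) once a context transfer completes, matching the block_reuse_enabled flow described for PyExecutor.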

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor PyExec as PyExecutor
  participant CT as CacheTransceiver
  participant CS as CacheSender
  participant CR as CacheReceiver
  participant CM as ConnectionManager
  participant F as BaseCacheFormatter
  participant KM as KVCacheManager

  rect rgb(245,248,255)
  note over PyExec,CT: Send path (async)
  PyExec->>CT: respondAndSendAsync(req)
  CT->>CS: sendAsync(req)
  CS->>CM: resolve connections/peers
  CS->>F: format(TransferSession{allBlockKeys, indexFromEnd})
  F->>KM: getBlockRangeForSending(..., allBlockKeys, indexFromEnd)
  F-->>CS: formatted buffers
  CS-->>CT: future<void> (completion)
  end

  rect rgb(245,255,245)
  note over PyExec,CT: Receive path (async)
  PyExec->>CT: requestAndReceiveAsync(req)
  CT->>CR: receiveAsync(req)
  CR->>CM: send RequestInfo (state + allBlockKeys + indexFromEnd)
  CR->>F: unformat(TransferSession)
  F->>KM: allocate or reuse blocks by hashes
  CR-->>CT: future<void> (completion)
  end

  rect rgb(255,248,240)
  note over PyExec,KM: On completion (if block reuse enabled)
  PyExec->>KM: pinBlocks(requestId)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

KV-Cache Management

Suggested reviewers

  • chuangz0
  • nv-guomingz
  • pcastonguay
  • Shixiaowei02

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

🔭 Outside diff range comments (4)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)

18-18: Use preprocessor guard instead of pragma once.

According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.

Replace #pragma once with a proper header guard:

-#pragma once
+#ifndef TRTLLM_CONNECTION_H
+#define TRTLLM_CONNECTION_H

And add at the end of the file:

+#endif // TRTLLM_CONNECTION_H
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2)

17-17: Use preprocessor guard instead of pragma once.

According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.

Replace #pragma once with a proper header guard:

-#pragma once
+#ifndef TRTLLM_CACHETRANSCEIVER_H
+#define TRTLLM_CACHETRANSCEIVER_H

And add at the end of the file:

+#endif // TRTLLM_CACHETRANSCEIVER_H

2-2: Update copyright year to include 2025.

According to coding guidelines, all TensorRT-LLM code should contain an NVIDIA copyright header that includes the current year (2025).

- * Copyright (c) 2023-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION.  All rights reserved.
cpp/tests/batch_manager/cacheTransceiverTest.cpp (1)

655-658: Remove unused variable assignment.

The variable cacheType is assigned but never used. This appears to be dead code.

-        if (kvFactor == 1)
-        {
-            auto cacheType = CacheType::kSELFKONLY;
-        }
🧹 Nitpick comments (2)
examples/disaggregated/clients/disagg_client.py (1)

81-126: Consider adding timing measurements to send_chat_request for consistency.

The send_chat_request function lacks the same TTFT and ITL timing measurements that were added to send_request. This creates inconsistent behavior between the two endpoints.

Consider adding similar timing measurements to send_chat_request or extracting the timing logic into a shared helper function to maintain consistency across both endpoints.

examples/disaggregated/clients/prompts.json (1)

1-3: Consider maintaining diverse test prompts for comprehensive testing.

While simplifying to a single prompt reduces complexity, having diverse prompts helps test various scenarios (different lengths, complexity, languages, etc.). Consider whether this reduction impacts test coverage adequately.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f2f11f and eed674f.

📒 Files selected for processing (13)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
  • examples/disaggregated/clients/disagg_client.py (3 hunks)
  • examples/disaggregated/clients/prompts.json (1 hunks)
💤 Files with no reviewable changes (3)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • examples/disaggregated/clients/disagg_client.py
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • examples/disaggregated/clients/disagg_client.py
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🔇 Additional comments (24)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)

178-178: Consistent renaming verified

I performed a repository-wide search and confirmed there are no remaining occurrences of mReceiverBufferDesc. All references in

  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp

have been updated to mCacheReceiverBufferDesc.

LGTM!

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)

84-84: LGTM! Consistent implementation of the renamed member variable.

The changes correctly implement the renamed mCacheReceiverBufferDesc member variable throughout the implementation, maintaining consistency with the header file updates.

Also applies to: 140-142


468-468: LGTM! Proper namespace closing comment.

The namespace closing comment follows the coding guidelines correctly.

cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (1)

37-38: Renaming Verification Complete

All references to the old class names have been removed:

  • No matches for DataResponder
  • No matches for DataRequester
  • No matches for mDataSender or mDataReceiver

The renaming to CacheSender/CacheReceiver is consistent and complete. No further action required.

cpp/tests/batch_manager/cacheTransceiverTest.cpp (6)

159-185: LGTM! Mock class properly renamed and implemented.

The mock class follows appropriate testing patterns with proper use of Google Mock features.


186-192: LGTM! Mock receiver class properly implemented.

The mock methods are correctly defined for testing cache receiving functionality.


211-230: LGTM! Test properly updated to use renamed classes.

The test correctly uses CacheSender instead of DataResponder while maintaining the original test logic.


232-252: LGTM! Test properly uses CacheReceiver.

The test correctly replaces DataRequester with CacheReceiver while maintaining test functionality.


396-404: LGTM! Cache transceiver setup properly updated.

The setup correctly instantiates CacheSender and CacheReceiver with proper implementations using CacheSenderImpl and CacheReceiverImpl.


762-768: LGTM! Asymmetric cache test properly updated.

The test correctly uses CacheSender and CacheReceiver with their implementation classes.

cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (3)

196-200: LGTM! Proper initialization of cache sender and receiver.

The constructor correctly initializes CacheSender and CacheReceiver with appropriate parameters and formatter creation.


236-298: LGTM! Methods properly updated to use new cache classes.

All async send/receive operations correctly use mCacheSender and mCacheReceiver with proper future handling.


369-436: LGTM! Context transfer status checking properly updated.

The method correctly uses mSenderFutures (renamed from mResponderFutures) while maintaining the original synchronization logic.

cpp/tensorrt_llm/batch_manager/cacheFormatter.h (2)

45-122: LGTM! Well-designed TransferSession class.

The class provides a clean abstraction for managing transfer sessions with proper const-correctness, precondition checks, and encapsulation.


163-222: LGTM! Performance measurement helper properly implemented.

The class provides thread-safe performance measurement with proper file output handling. Good use of RAII pattern for automatic output on destruction.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

115-183: LGTM! Clean interface design with pImpl pattern.

The CacheSender and CacheReceiver classes provide clean interfaces with proper use of the pImpl idiom for implementation hiding and ABI stability.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (8)

1-32: LGTM: Copyright header and includes look good.

The copyright header contains the current year (2025) as required, and the new include for agent connection utilities aligns with the refactoring objectives.


38-57: Good RAII design and helper functions.

The new helper structures and functions are well-designed:

  • Using declarations improve readability
  • ReceiveCacheResource follows RAII principles correctly
  • tagFromRequestId implements proper bit manipulation for tag generation

The magic number kDATA_TAG{43} is appropriately named as a constexpr constant.
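
As a rough illustration of that tag derivation (the kDATA_TAG value is quoted from the review above; the shift width and return type are assumptions, not the PR's exact code):

#include <cstdint>

namespace
{

// Constant named in the review above; small enough to fit in the low byte.
constexpr std::uint64_t kDATA_TAG{43};

// Fold the request id together with kDATA_TAG so that concurrent transfers for
// different requests end up with distinct, non-colliding tags.
std::uint64_t tagFromRequestId(std::uint64_t requestId)
{
    return (requestId << 8) | kDATA_TAG;
}

} // namespace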


115-174: Well-designed constructor and core methods.

The implementation demonstrates good practices:

  • Constructor properly validates inputs with TLLM_CHECK
  • Resource management uses smart pointers appropriately
  • Thread safety is maintained with proper mutex usage
  • Dependency injection pattern is correctly implemented

The new methods getCounterpartsCount and release provide clean session management.


176-228: Complex but well-structured request info handling.

The method correctly handles both agent and direct connection scenarios:

  • Appropriate use of dynamic_cast for type discrimination
  • Comprehensive error handling with informative messages
  • Proper thread safety with mutex protection
  • Session management is correctly implemented

The method's complexity is justified given the dual connection handling requirements.


230-345: Excellent thread management and synchronization.

The implementation demonstrates solid concurrent programming practices:

  • Clean separation between sync and async operations
  • Proper use of condition variables and mutexes for thread coordination
  • Exception handling ensures promises are always resolved
  • Thread naming (dataTransResp) aids in debugging
  • Producer-consumer pattern is correctly implemented

The variable renaming from responder to sender terminology is consistent with the refactor.


405-529: Solid CacheReceiver implementation with good async handling.

The receiver implementation mirrors the sender's good practices:

  • Constructor properly validates inputs and manages resources
  • Async operations are correctly implemented with std::async
  • Complex connection handling for both agent and direct scenarios
  • Proper resource management with ReceiveCacheResource
  • Thread safety maintained throughout

The symmetry with CacheSender provides a clean, consistent API.


560-698: Well-implemented async resource management.

The async processing implementation demonstrates good concurrent design:

  • AsyncResource properly encapsulates thread synchronization primitives
  • Worker thread pattern correctly processes queued requests
  • Proper termination handling prevents resource leaks
  • Exception handling ensures all promises are resolved
  • Thread naming (dataTransRequest) aids debugging
  • Use of std::deque for request queue is appropriate

The cleanup logic in the destructor properly terminates all worker threads.


700-736: Clean public interface implementation using pimpl idiom.

The public interfaces are well-designed:

  • Proper use of pimpl idiom with std::unique_ptr<Impl>
  • Constructor delegation to implementation is clean
  • Method delegation is straightforward and maintains const correctness
  • Defaulted destructors are appropriate with smart pointer members

The namespace closing comment follows the coding guidelines correctly.

@Tabrizian force-pushed the user/imant/useHashIds branch from eed674f to 376d131 on July 25, 2025 02:26
coderabbitai bot requested a review from DomBrown on July 25, 2025 02:26
@Tabrizian
Member Author

/bot run

@Tabrizian force-pushed the user/imant/useHashIds branch from 376d131 to 90d8774 on July 25, 2025 02:28
coderabbitai bot requested a review from nv-guomingz on July 25, 2025 02:30
@tensorrt-cicd
Collaborator

PR_Github #12929 [ run ] triggered by Bot

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

420-424: Consider thread pool instead of frequent thread creation.

The TODO comment indicates awareness of the performance issue with frequent thread creation. The current implementation uses std::async with std::launch::async which may create new threads frequently.

Consider implementing a thread pool for better performance:

- return std::async(std::launch::async, &CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
+ // Use thread pool implementation instead of direct async call
+ return mThreadPool.submit(&CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
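
If a pool is desired, a minimal self-contained sketch could look like the one below; ThreadPool and submit are illustrative names (the repository is not shown to have such a class), and a real fix would also need to settle pool sizing and shutdown semantics.

#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

class ThreadPool
{
public:
    explicit ThreadPool(std::size_t numThreads)
    {
        for (std::size_t i = 0; i < numThreads; ++i)
        {
            mWorkers.emplace_back(
                [this]
                {
                    while (true)
                    {
                        std::function<void()> task;
                        {
                            std::unique_lock<std::mutex> lk(mMutex);
                            mCv.wait(lk, [this] { return mStop || !mTasks.empty(); });
                            if (mStop && mTasks.empty())
                            {
                                return;
                            }
                            task = std::move(mTasks.front());
                            mTasks.pop_front();
                        }
                        task();
                    }
                });
        }
    }

    ~ThreadPool()
    {
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mStop = true;
        }
        mCv.notify_all();
        for (auto& worker : mWorkers)
        {
            worker.join();
        }
    }

    // Enqueue a callable and get a future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<decltype(f())>
    {
        using ResultT = decltype(f());
        auto task = std::make_shared<std::packaged_task<ResultT()>>(std::forward<F>(f));
        auto future = task->get_future();
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mTasks.emplace_back([task] { (*task)(); });
        }
        mCv.notify_one();
        return future;
    }

private:
    std::vector<std::thread> mWorkers;
    std::deque<std::function<void()>> mTasks;
    std::mutex mMutex;
    std::condition_variable mCv;
    bool mStop{false};
};

With something along these lines, the receiver could hold a single pool member and replace the per-request std::async call with something like mThreadPool.submit([this, &llmRequest] { requestSync(llmRequest); }).
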
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 376d131 and 90d8774.

📒 Files selected for processing (11)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
💤 Files with no reviewable changes (3)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🚧 Files skipped from review as they are similar to previous changes (6)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (48)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (17)

24-24: LGTM: Include added for new cache formatter abstraction.

The new include for cacheFormatter.h is correctly added to support the refactored architecture that uses BaseCacheFormatter and TransferSession from the kv_cache_manager namespace.


42-44: LGTM: Type aliases properly updated for the refactored architecture.

The type aliases correctly reference the new abstractions:

  • SizeType32 properly aliased from runtime namespace
  • TransferSession correctly aliased from kv_cache_manager namespace, replacing the previous internal definition

45-58: LGTM: TransceiverTag struct provides proper message tagging.

The TransceiverTag struct correctly defines:

  • Enumerated message types for REQUEST_SEND and TERMINATION
  • Appropriate tag constants for ID, INFO_SIZE, and INFO message components
  • Uses constexpr for compile-time constants as per coding guidelines

The naming follows the coding guidelines with uppercase snake_case for constants prefixed with 'k'.


58-58: LGTM: BaseCacheFormatter type alias correctly defined.

The type alias correctly references the BaseCacheFormatter from the kv_cache_manager namespace, maintaining consistency with the new architecture.


115-121: LGTM: CacheSender constructor signature properly designed.

The constructor correctly accepts:

  • ConnectionManager* for managing connections
  • CacheState for cache state information
  • SizeType32 for self index
  • std::unique_ptr<BaseCacheFormatter> for cache formatting operations

The use of std::unique_ptr follows C++ best practices for single resource ownership as specified in the coding guidelines.


122-126: LGTM: sendAsync method signature correctly updated.

The method signature properly:

  • Returns std::future<void> for asynchronous operation
  • Takes LlmRequest& as parameter
  • Uses [[nodiscard]] attribute appropriately
  • Includes comprehensive documentation

130-134: LGTM: Communication state accessors properly updated.

The getCommState() and setCommState() methods correctly use the new executor::kv_cache::CommState type, maintaining the same interface pattern as before but with updated types.


136-150: LGTM: Additional methods properly defined for the refactored architecture.

The new methods correctly provide:

  • recvRequestInfo() for receiving request information
  • sendSync() for synchronous sending
  • sendRequestInfo() returning TransferSession
  • receiveSync() accepting TransferSession&

All methods use appropriate parameter types and return types for the new architecture.


153-153: LGTM: Destructor properly declared.

The destructor is correctly declared and will be implemented to properly clean up resources.


160-177: LGTM: CacheReceiver class properly designed.

The CacheReceiver class correctly:

  • Has matching constructor signature with CacheSender
  • Provides receiveAsync() method for asynchronous reception
  • Includes sendRequestInfo() and receiveSync() methods for coordination
  • Uses proper parameter and return types
  • Follows the same design patterns as CacheSender

24-24: LGTM! Proper include for new dependency.

The inclusion of cacheFormatter.h aligns with the refactoring to use kv_cache_manager types.


42-44: LGTM! Clean type aliasing approach.

The type aliases consolidate dependencies and the TransferSession alias properly delegates to the kv_cache_manager namespace as intended by the refactor.


45-57: LGTM! Well-structured communication protocol.

The TransceiverTag struct provides a clean enumeration of message types and associated tag constants for the communication protocol.


58-58: LGTM! Consistent type aliasing.

The BaseCacheFormatter alias maintains consistency with the delegation pattern to kv_cache_manager.


160-182: LGTM! Well-designed receiver interface.

The CacheReceiver class interface mirrors the sender pattern appropriately, with proper constructor signature and async receive capability.


184-184: LGTM! Proper namespace closure comment.

The namespace closure comment follows the coding guidelines requirement for namespace comments.


115-158: Constructor Validation Confirmed – No Action Needed

The Impl constructor already enforces non-null and consistent state checks:

  • Line 127: TLLM_CHECK(mManager); ensures manager is not null.
  • Line 128: TLLM_CHECK(mManager->getCommState().getSelfIdx() == selfIndex); verifies selfIndex matches the manager’s state.

All public methods simply forward to mImpl, maintaining interface consistency.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (31)

26-26: LGTM: Include added for agent connection utilities.

The new include for agent connection utilities is correctly added to support the refactored connection management functionality.


38-45: LGTM: Helper function and type aliases properly implemented.

The code correctly:

  • Defines type aliases for the agent connection manager and data context
  • Implements tagFromRequestId() helper function with proper bit manipulation to create unique tags
  • Uses constexpr for the constant as per coding guidelines

47-57: LGTM: ReceiveCacheResource struct properly designed.

The struct correctly:

  • Manages buffer manager and CUDA event resources
  • Uses move semantics in the constructor
  • Follows proper resource management patterns
  • Uses member initializer list as per C++ best practices

115-132: LGTM: CacheSender::Impl constructor properly implemented.

The constructor correctly:

  • Initializes all member variables using member initializer list
  • Performs proper null pointer checks with TLLM_CHECK
  • Validates self index consistency
  • Sets up CUDA device context
  • Launches the response thread asynchronously

The use of std::make_shared for CudaStream follows smart pointer best practices.


134-149: LGTM: sendAsync method properly implemented with thread synchronization.

The method correctly:

  • Creates promise/future pair for asynchronous operation
  • Uses proper mutex locking for thread safety
  • Adds requests to the ready responses queue
  • Notifies waiting threads via condition variable
  • Returns the future for caller synchronization

The nested locking approach prevents race conditions.
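
For readers less familiar with this hand-off, here is a stripped-down sketch of the promise/future plus condition-variable pattern being described; Response, mReadyResponses, and the method bodies are simplified placeholders, not the PR's actual members.

#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>

struct Response
{
    std::promise<void> promise; // resolved by the worker thread once the send completes
};

class Sender
{
public:
    // Producer side: enqueue the work and hand the caller a future to wait on.
    std::future<void> sendAsync()
    {
        Response response;
        auto future = response.promise.get_future();
        {
            std::unique_lock<std::mutex> lk(mMutex);
            mReadyResponses.emplace_back(std::move(response));
        }
        mCv.notify_one();
        return future;
    }

    // Consumer side: the worker thread pops entries and resolves their promises.
    void workerLoopOnce()
    {
        std::unique_lock<std::mutex> lk(mMutex);
        mCv.wait(lk, [this] { return !mReadyResponses.empty(); });
        auto response = std::move(mReadyResponses.front());
        mReadyResponses.pop_front();
        lk.unlock();
        // ... perform the actual transfer here ...
        response.promise.set_value();
    }

private:
    std::mutex mMutex;
    std::condition_variable mCv;
    std::deque<Response> mReadyResponses;
};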


230-237: LGTM: sendSync method properly implemented.

The method correctly:

  • Looks up the transfer session by request ID
  • Sets the LLM request on the session
  • Delegates formatting to the formatter
  • Includes proper error checking

251-265: LGTM: sendAndRemoveResponse method with proper exception handling.

The method correctly:

  • Uses CUDA device setting for proper context
  • Calls synchronous send and cleanup
  • Provides proper exception handling with promise notification
  • Uses noexcept specification appropriately

267-345: LGTM: Response thread implementation with comprehensive synchronization.

The response thread correctly:

  • Sets thread name for debugging
  • Uses proper CUDA device context
  • Implements condition variable wait with proper predicates
  • Handles termination gracefully
  • Manages request counting and cleanup
  • Supports both parallel and sequential sending modes
  • Includes comprehensive error handling and logging

The thread synchronization logic appears sound with proper use of mutexes and condition variables.


408-418: LGTM: CacheReceiver::Impl constructor properly implemented.

The constructor follows the same pattern as CacheSender::Impl with:

  • Proper member initialization
  • Null pointer validation
  • Self index consistency check
  • CUDA device setup

460-463: LGTM: receiveSync method properly delegates to formatter.

The method correctly delegates the unformatting operation to the formatter using the transfer session, maintaining proper separation of concerns.


465-529: LGTM: sendRequestInfo method with comprehensive connection handling.

The method correctly:

  • Validates formatter support for cache states
  • Handles selective cache transfer optimization
  • Supports both agent and non-agent connection managers
  • Manages buffer index assignment for agent connections
  • Creates transfer session with proper parameters
  • Includes comprehensive error checking and validation

The dual path handling for agent vs. non-agent connections is properly implemented.


531-547: LGTM: getReceiveCacheResource method with proper resource management.

The method correctly:

  • Uses scoped lock for thread safety
  • Implements per-process resource isolation when concurrent mode is enabled
  • Creates resources on-demand with proper RAII
  • Uses default process string for non-concurrent mode

549-559: LGTM: sendRequestInfo helper method properly implemented.

The helper method correctly:

  • Serializes request info using the established serialization utilities
  • Sends data in the proper order (ID, size, then data)
  • Uses appropriate data contexts for different message components
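
A schematic version of that three-step hand-off, with placeholder Connection and DataContext types standing in for the real executor interfaces (the tag values and the send signature are assumptions, not the repository's API):

#include <cstddef>
#include <cstdint>
#include <vector>

struct DataContext
{
    int tag;
};

// Placeholder connection; the real send signature in the executor is an assumption here.
struct Connection
{
    void send(DataContext const& ctx, void const* data, std::size_t size) const
    {
        (void) ctx;
        (void) data;
        (void) size;
    }
};

void sendRequestInfo(Connection const& connection, std::uint64_t requestId, std::vector<char> const& serializedInfo)
{
    // 1. Request id first, so the peer can associate the payload with a session.
    connection.send(DataContext{0}, &requestId, sizeof(requestId));

    // 2. Payload size next, so the peer can size its receive buffer.
    std::uint64_t const infoSize = serializedInfo.size();
    connection.send(DataContext{1}, &infoSize, sizeof(infoSize));

    // 3. Finally, the serialized request info itself.
    connection.send(DataContext{2}, serializedInfo.data(), serializedInfo.size());
}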

700-704: LGTM: CacheSender public constructor properly implemented.

The constructor correctly creates the implementation with perfect forwarding of the formatter parameter using std::move.


706-721: LGTM: CacheSender public interface methods properly implemented.

All public methods correctly delegate to the implementation:

  • sendAsync() forwards the request properly
  • getCommState() and setCommState() maintain the interface contract
  • Destructor is properly defaulted

723-734: LGTM: CacheReceiver public interface properly implemented.

The constructor and methods correctly:

  • Create implementation with proper parameter forwarding
  • Delegate receiveAsync() to the multi-threaded implementation
  • Use defaulted destructor for proper cleanup

176-228: Dynamic_cast usage is safe – no action needed

Verified that ConnectionManager is indeed polymorphic (it declares virtual methods, including recvConnect, getConnections, and getCommState), so the dynamic_cast in recvRequestInfo() will behave as intended. The interface is stable and handles both agent and non-agent scenarios correctly.


26-26: LGTM! Updated include for agent utilities.

The include path update aligns with the refactored cache transmission structure.


38-45: LGTM! Clean helper function with proper naming.

The tagFromRequestId function follows proper naming conventions and provides a clean abstraction for generating unique tags from request IDs.


47-57: LGTM! Well-structured resource management.

The ReceiveCacheResource struct properly encapsulates buffer manager and CUDA event resources with appropriate constructor.


134-149: LGTM! Proper async pattern with promise/future.

The sendAsync method correctly implements the promise/future pattern with proper synchronization using mutexes and condition variables.


176-228: LGTM! Comprehensive request info handling with proper validation.

The recvRequestInfo method properly handles both agent and direct connection scenarios, with appropriate validation of cache state compatibility and connection management.


230-237: LGTM! Clean synchronous send implementation.

The sendSync method properly retrieves the session and delegates to the formatter, with appropriate error checking.


405-418: LGTM! Consistent constructor pattern.

The CacheReceiver::Impl constructor follows the same validation and initialization pattern as the sender.


465-529: LGTM! Comprehensive request info sending with agent support.

The sendRequestInfo method properly handles both agent and direct connection scenarios, with appropriate buffer management and connection selection logic.


531-547: LGTM! Thread-safe resource management with proper scoping.

The getReceiveCacheResource method uses proper scoped locking and resource management patterns for concurrent access.


549-559: LGTM! Clean request info serialization.

The sendRequestInfo helper method properly serializes and sends request information using the defined communication protocol.


700-735: LGTM! Clean public interface implementation.

The public interface methods properly delegate to the pImpl pattern implementations with appropriate parameter forwarding and resource management.


736-736: LGTM! Proper namespace closure comment.

The namespace closure comment follows the coding guidelines requirement.


267-345: Thread name and exception handling verified

The thread name “dataTransResp” is only set here and does not conflict with other threads. Exception propagation via set_exception(std::current_exception()) matches the pattern used elsewhere (e.g., in workerPool.h). No further changes are needed.


115-132: Ensure ConnectionManager Access Is Thread-Safe

I didn’t find any explicit synchronization (mutexes/locks) around calls to mManager in this class. Since you spawn a background thread via std::async for Impl::response, concurrent calls to mManager could race unless ConnectionManager itself is fully thread-safe.

Please confirm or address this:

• Constructor (lines 115–132): validates and reads from mManager
• Impl::response (launched async): invokes mManager->recvConnect(...) and other methods
• receiveAsync() and other methods: call mManager->getConnections(...)

If executor::kv_cache::ConnectionManager is not inherently thread-safe, wrap all shared calls in a mutex or otherwise serialize access to mManager.
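
If it turns out ConnectionManager is not thread-safe, one straightforward option is a thin guard that serializes every call site, roughly as sketched below; the stub interface and method names are placeholders for the real executor::kv_cache::ConnectionManager.

#include <mutex>

// Stub for the real connection manager; methods reduced to no-ops for the sketch.
class ConnectionManager
{
public:
    void recvConnect() {}
    void getConnections() {}
};

// Thin wrapper that serializes all access from the sender/receiver threads.
class GuardedConnectionManager
{
public:
    explicit GuardedConnectionManager(ConnectionManager* manager)
        : mManager{manager}
    {
    }

    void recvConnect()
    {
        std::lock_guard<std::mutex> lk(mMutex); // one lock shared by every mManager call
        mManager->recvConnect();
    }

    void getConnections()
    {
        std::lock_guard<std::mutex> lk(mMutex);
        mManager->getConnections();
    }

private:
    ConnectionManager* mManager;
    std::mutex mMutex;
};

Alternatively, a single mutex member inside the Impl class guarding every mManager call would achieve the same without a wrapper.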

Comment on lines 161 to 174
[[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
{
auto it = mRequestToSession.find(requestId);
TLLM_CHECK(it != mRequestToSession.end());
return it->second.getConnections().size();
}

void release(LlmRequest::RequestIdType requestId)
{
auto it = mRequestToSession.find(requestId);
TLLM_CHECK(it != mRequestToSession.end());
std::unique_lock<std::mutex> lk(mMtxForMap);
mRequestToSession.erase(it);
}
The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Check thread safety of session map access.

The getCounterpartsCount and release methods access mRequestToSession map. Verify that all map accesses are properly synchronized.

 [[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
 {
+    std::unique_lock<std::mutex> lk(mMtxForMap);
     auto it = mRequestToSession.find(requestId);
     TLLM_CHECK(it != mRequestToSession.end());
     return it->second.getConnections().size();
 }
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp around lines 161 to 174,
the methods getCounterpartsCount and release access the mRequestToSession map
without synchronization, which can cause race conditions. To fix this, ensure
that all accesses to mRequestToSession are protected by acquiring the mutex
mMtxForMap before accessing or modifying the map. Add a
std::unique_lock<std::mutex> lock(mMtxForMap) at the start of
getCounterpartsCount and confirm the existing lock in release covers the map
access properly.

@tensorrt-cicd
Collaborator

PR_Github #12929 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9638 completed with status: 'FAILURE'

@Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from ddc66c6 to 7d3e452 on July 25, 2025 19:27
@Tabrizian
Copy link
Member Author

/bot run

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

18-18: Replace #pragma once with required preprocessor guard.

The coding guidelines specifically require preprocessor guards in header files with the format TRTLLM_ followed by the filename in caps.

Replace the current directive with the required guard format:

-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H

And add the closing directive at the end of the file:

 } // namespace tensorrt_llm::batch_manager
+
+#endif // TRTLLM_DATATRANSCEIVER_H
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

225-261: Fix member variable naming convention.

The member variable kvCacheMeasureHelper should follow the coding guidelines for class member variables.

Apply this fix:

-    KvCacheMeasureHelper kvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};
+    KvCacheMeasureHelper mKvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};

Class member variables should use camelCase prefixed with 'm' according to the coding guidelines.

♻️ Duplicate comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2)

161-174: Thread safety issue with session map access.

The getCounterpartsCount method accesses mRequestToSession without proper synchronization, while release uses mutex protection. This creates a race condition.

This is the same thread safety issue identified in previous reviews. The getCounterpartsCount method needs mutex protection:

 [[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
 {
+    std::unique_lock<std::mutex> lk(mMtxForMap);
     auto it = mRequestToSession.find(requestId);
     TLLM_CHECK(it != mRequestToSession.end());
     return it->second.getConnections().size();
 }

420-424: TODO: Address inefficient thread creation pattern.

The method creates a new thread for each async receive operation, which is inefficient as noted in the TODO comment.

This matches the previous review feedback about frequent thread creation. The TODO comment correctly identifies this as needing improvement. Consider using the existing requestAndReceiveAsyncMultiThreads pattern or implementing a thread pool.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90d8774 and 7d3e452.

📒 Files selected for processing (13)
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
  • cpp/tests/unit_tests/executor/ucxCommTest.cpp (0 hunks)
💤 Files with no reviewable changes (4)
  • cpp/tensorrt_llm/batch_manager/CMakeLists.txt
  • cpp/tests/unit_tests/executor/ucxCommTest.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tests/batch_manager/cacheTransceiverTest.cpp
  • cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
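
To make several of the rules above concrete, here is a small hypothetical snippet (not from this PR) showing Allman braces, const locals, brace-delimited bodies, a k-prefixed constant, and a namespace closing comment:

```cpp
#include <vector>

namespace tensorrt_llm::example // hypothetical namespace used only for illustration
{

// Prefer constexpr over #define; constants use the k-prefixed upper snake case style.
constexpr int kMAX_RETRIES = 3;

int clampedSum(std::vector<int> const& values)
{
    int total = 0; // local variables use lowerCamelCase
    for (int const value : values) // variables not modified after initialization are const
    {
        total += value;
    }
    if (total < 0)
    {
        total = 0; // if/else bodies are always brace-delimited, even for a single statement
    }
    return total;
}

} // namespace tensorrt_llm::example
```
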
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/batch_manager/cacheFormatter.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🔇 Additional comments (24)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (6)

1-16: LGTM! Copyright header complies with coding guidelines.

The NVIDIA copyright header includes the current year (2025) as required by the coding guidelines for TensorRT-LLM Open Source Software.


24-24: Good addition of cacheFormatter.h include.

The new include aligns with the refactored design that moves TransferSession definition to the kv_cache_manager namespace.


42-58: Well-designed TransceiverTag structure follows coding guidelines.

The type aliases and TransceiverTag struct properly follow C++ naming conventions:

  • SizeType32 alias uses camelCase as required
  • TransferSession alias provides clear abstraction boundary
  • Enum values use uppercase naming (REQUEST_SEND, TERMINATION)
  • Static constexpr constants use kCONSTANT_NAME pattern with proper k prefix
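
As a rough illustration of the conventions noted above, a tag structure of this shape might look like the following. Only the enum values REQUEST_SEND and TERMINATION come from the review; the constant names and values are placeholders, not the header's actual contents.

```cpp
#include <cstdint>

// Illustrative only: a tag struct following the conventions described above.
struct TransceiverTag
{
    enum class Id : uint64_t
    {
        REQUEST_SEND = 1,
        TERMINATION = 2
    };

    // Hypothetical constants; the real header defines its own k-prefixed values.
    static constexpr int32_t kID_TAG{19};
    static constexpr int32_t kINFO_SIZE_TAG{22};
};
```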

115-158: CacheSender class design follows best practices.

The refactored class demonstrates good design principles:

  • Proper UpperCamelCase naming for class name
  • Method names use lowerCamelCase convention
  • Smart pointer usage with unique_ptr for resource management
  • Pimpl pattern reduces compilation dependencies
  • Constructor parameters use appropriate types (ConnectionManager*, CacheState, etc.)
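
A schematic of what such a pimpl-style declaration looks like is sketched below. The forward-declared types, constructor parameters, and method name are approximations for illustration; the real constructor also takes cache-state and rank arguments.

```cpp
#include <future>
#include <memory>

// Forward declarations standing in for the real TensorRT-LLM types.
class ConnectionManager;
class BaseCacheFormatter;
class LlmRequest;

// Schematic pimpl-style declaration (parameter list simplified).
class CacheSender
{
public:
    CacheSender(ConnectionManager* connectionManager, std::unique_ptr<BaseCacheFormatter> formatter);
    ~CacheSender(); // declared here, defined in the .cpp where Impl is a complete type

    [[nodiscard]] std::future<void> sendAsync(LlmRequest& llmRequest);

private:
    class Impl;                  // implementation details stay out of the header
    std::unique_ptr<Impl> mImpl; // member follows the 'm' prefix convention
};
```
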

160-182: CacheReceiver class maintains design consistency.

The class follows the same excellent design patterns as CacheSender:

  • Consistent naming conventions and interface design
  • Symmetric constructor signature with CacheSender
  • Appropriate method names for receiving operations
  • Proper use of pimpl pattern for implementation hiding

184-184: Perfect namespace closing comment.

The closing brace comment follows the coding guidelines requirement exactly: } // namespace tensorrt_llm::batch_manager

cpp/tensorrt_llm/batch_manager/cacheFormatter.h (4)

1-16: Copyright header meets coding guidelines.

The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.


45-121: TransferSession class demonstrates excellent C++ design practices.

The class properly follows all coding guidelines:

  • UpperCamelCase class name
  • Member variables with proper 'm' prefix (mConnections, mDataContext, etc.)
  • Method names use lowerCamelCase convention
  • Proper const-correctness with const methods and const references
  • Good use of TLLM_CHECK for assertions
  • Clean initializer list in constructor
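
A self-contained sketch of a session object in this style follows. The member names mConnections and mDataContext come from the review; the stand-in types, accessor names, and the use of assert (in place of the codebase's TLLM_CHECK) are assumptions made to keep the example compilable.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class Connection; // stand-in for the real connection type
class DataContext
{
};

// Illustrative shape of a session object with 'm'-prefixed members and const accessors.
class TransferSession
{
public:
    TransferSession(std::vector<Connection const*> connections, DataContext dataContext)
        : mConnections(std::move(connections))
        , mDataContext(std::move(dataContext))
    {
        // the real code uses TLLM_CHECK; a plain assert keeps this sketch self-contained
        assert(!mConnections.empty());
    }

    [[nodiscard]] std::vector<Connection const*> const& getConnections() const
    {
        return mConnections;
    }

    [[nodiscard]] DataContext const& getDataContext() const
    {
        return mDataContext;
    }

private:
    std::vector<Connection const*> mConnections;
    DataContext mDataContext;
};
```
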

123-161: BaseCacheFormatter interface demonstrates clean design.

The interface properly follows coding guidelines:

  • Method names use lowerCamelCase convention
  • Proper virtual destructor
  • Clean signature with TransferSession& parameter
  • Appropriate const-correctness
  • Good use of [[nodiscard]] attribute for return values
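
A minimal sketch of an interface in this shape is shown below. The format/unformat split mirrors what the review describes (unformat is mentioned later as the receive-side entry point); the third method name and the exact signatures are placeholders, not the header's real API.

```cpp
class TransferSession; // defined elsewhere (see cacheFormatter.h comments above)

// Illustrative interface shape only; real method names, parameters, and return types may differ.
class BaseCacheFormatter
{
public:
    virtual ~BaseCacheFormatter() = default;

    // Sender side: serialize cache blocks into the transfer session.
    virtual void format(TransferSession& session) = 0;

    // Receiver side: reconstruct cache blocks from the transfer session.
    virtual void unformat(TransferSession& session) = 0;

    // Placeholder name: report whether a transfer between two cache layouts is supported.
    [[nodiscard]] virtual bool supportsTransfer() const = 0;
};
```
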

266-266: Perfect namespace closing comment.

The closing brace comment properly follows coding guidelines with the complete namespace path: } // namespace tensorrt_llm::batch_manager::kv_cache_manager

cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (14)

1-16: Copyright header complies with coding guidelines.

The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.


38-57: Well-designed utility functions and structures.

The code follows C++ coding guidelines properly:

  • tagFromRequestId function uses lowerCamelCase and static linkage
  • ReceiveCacheResource struct uses UpperCamelCase naming
  • Member variables correctly use 'm' prefix (mBufferManager, mCudaEvent)
  • Constant kDATA_TAG follows uppercase with 'k' prefix convention
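
A tiny sketch of these two patterns (a k-prefixed constant and a file-local lowerCamelCase helper with internal linkage) is shown below; the constant's value and the derivation logic are placeholders, not the PR's actual scheme.

```cpp
#include <cstdint>

// Constants use the 'k' prefix with upper snake case (the value here is a placeholder).
static constexpr uint64_t kDATA_TAG{0x9A3B};

// File-local helpers use lowerCamelCase and static (internal) linkage.
static uint64_t tagFromRequestId(uint64_t requestId)
{
    return kDATA_TAG ^ requestId; // placeholder derivation, not the actual mapping
}
```
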

115-133: CacheSender::Impl constructor follows best practices.

The constructor demonstrates good C++ practices:

  • Proper member initialization list
  • Smart pointer usage with unique_ptr
  • Appropriate validation with TLLM_CHECK
  • Member variables follow 'm' prefix convention

176-228: Complex but well-structured request info handling.

The method demonstrates good practices:

  • Proper type checking with dynamic_cast for agent connections
  • Clean lambda usage for agent-specific logic
  • Correct mutex synchronization for mRequestToSession access
  • Good validation of formatter compatibility between cache states
  • Appropriate session creation and connection management
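
For readers unfamiliar with the pattern, a generic sketch of dynamic_cast-based dispatch plus a mutex-guarded request map follows. The class names, the AgentConnection hook, and the map's value type are illustrative stand-ins, not the PR's types.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

class Connection
{
public:
    virtual ~Connection() = default;
};

// Hypothetical agent-specific connection type, used only to show the dispatch.
class AgentConnection : public Connection
{
public:
    void setupAgentTransfer() {}
};

class SessionRegistry
{
public:
    void registerSession(uint64_t requestId, Connection* connection)
    {
        // Agent connections need extra setup; detect them via dynamic_cast.
        if (auto* agentConnection = dynamic_cast<AgentConnection*>(connection))
        {
            agentConnection->setupAgentTransfer();
        }
        // Guard the request-to-session map against concurrent access.
        std::scoped_lock lock(mMutex);
        mRequestToSession.emplace(requestId, connection);
    }

private:
    std::mutex mMutex;
    std::map<uint64_t, Connection*> mRequestToSession;
};
```
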

230-237: Clean and focused sendSync implementation.

The method properly delegates to formatter and manages the session state appropriately.


251-403: Robust threading and response management.

The response handling demonstrates excellent threading practices:

  • Proper exception handling with promise/future pattern
  • Correct CUDA device management across threads
  • Thread naming for debugging (dataTransResp)
  • Proper condition variable usage for synchronization
  • Clean termination handling in destructor
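
As a reference for the promise/future pattern described here, a stripped-down sketch is shown below: any exception thrown on the worker thread is captured and rethrown when the caller waits on the future. Thread naming, CUDA device pinning, and the destructor-side join are noted in comments rather than implemented; everything else is generic C++, not the PR's code.

```cpp
#include <exception>
#include <future>
#include <thread>

// Run work on a dedicated thread and surface any exception through the returned future.
std::future<void> runResponseWork()
{
    auto promise = std::make_shared<std::promise<void>>();
    auto future = promise->get_future();

    std::thread worker(
        [promise]
        {
            // The real code also names the thread (e.g. "dataTransResp") and sets the CUDA device here.
            try
            {
                // ... perform the response/transfer work ...
            }
            catch (...)
            {
                promise->set_exception(std::current_exception()); // propagate to the waiting caller
                return;
            }
            promise->set_value();
        });
    worker.detach(); // simplified; the PR's implementation joins its thread during destruction

    return future;
}
```
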

405-418: CacheReceiver::Impl constructor follows consistent design.

The constructor maintains the same excellent practices as CacheSender::Impl with proper initialization and validation.


426-458: Excellent async request handling with thread pool pattern.

This method demonstrates the proper solution for efficient async operations:

  • Resource pooling per process to avoid thread creation overhead
  • Queue-based task distribution with worker threads
  • Proper synchronization with mutex and condition variables
  • Good exception handling and future management
  • This addresses the inefficiency noted in the receiveAsync TODO

460-463: Clean receiveSync delegation.

Simple and appropriate delegation to the formatter's unformat method.


465-559: Comprehensive sendRequestInfo implementation.

The method handles complex scenarios well:

  • Proper validation of formatter support between cache states
  • Clean handling of both agent and regular connection managers
  • Appropriate serialization of request information
  • Good resource management with TransferSession creation
  • Proper exception handling and validation throughout

531-559: Solid resource management with thread safety.

The resource management demonstrates good practices:

  • Thread-safe access with scoped_lock<mutex>
  • Lazy initialization of per-process resources
  • Proper RAII with unique_ptr and BufferManager
  • Clean helper method for connection communication
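
A compact sketch of lazy, per-process resource creation behind a scoped_lock follows; the struct is an empty stand-in for the real buffer-manager/CUDA-event bundle, and the class and method names are illustrative.

```cpp
#include <map>
#include <memory>
#include <mutex>

// Stand-in for the per-process receive resources (buffer manager, CUDA event, ...).
struct ReceiveCacheResource
{
};

class ResourcePool
{
public:
    // Lazily create one resource bundle per remote process, guarded by a mutex.
    ReceiveCacheResource& getResource(int processId)
    {
        std::scoped_lock lock(mMutex);
        auto& resource = mProcessToResource[processId];
        if (!resource)
        {
            resource = std::make_unique<ReceiveCacheResource>();
        }
        return *resource;
    }

private:
    std::mutex mMutex;
    std::map<int, std::unique_ptr<ReceiveCacheResource>> mProcessToResource;
};
```
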

561-698: Excellent async resource management and cleanup.

The implementation demonstrates robust design:

  • Proper RAII in destructor with resource cleanup and future synchronization
  • Good CUDA device management and logging in requestSync
  • Well-designed AsyncResource structure for thread coordination
  • Proper move semantics in RequestAndPromise
  • Comprehensive error handling with exception propagation
  • Thread naming for debugging (dataTransRequest)

700-734: Clean public interface following pimpl pattern.

The public interface demonstrates excellent design:

  • Proper delegation to implementation classes
  • Clean constructor forwarding with smart pointer management
  • Appropriate method delegation maintaining interface contracts
  • Correctly defaulted destructors for pimpl pattern
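
For completeness, the .cpp side of the pimpl declaration sketched earlier typically looks like the following: construction forwards to the Impl, the destructor is defaulted where Impl is complete, and public methods delegate. This is a generic illustration that assumes an Impl with matching methods; the real signatures differ.

```cpp
// Illustrative .cpp-side delegation for the header sketch shown earlier.
CacheSender::CacheSender(ConnectionManager* connectionManager, std::unique_ptr<BaseCacheFormatter> formatter)
    : mImpl(std::make_unique<Impl>(connectionManager, std::move(formatter)))
{
}

CacheSender::~CacheSender() = default; // defaulted here, where Impl is a complete type

std::future<void> CacheSender::sendAsync(LlmRequest& llmRequest)
{
    return mImpl->sendAsync(llmRequest); // the public API simply forwards to the Impl
}
```
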

736-737: Perfect namespace closing comment.

The closing brace comment properly follows coding guidelines: } // namespace tensorrt_llm::batch_manager

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13033 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #13033 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9736 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 7d3e452 to 1a21dba on August 12, 2025 20:06
@Tabrizian Tabrizian requested a review from a team as a code owner on August 12, 2025 20:06
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from f728821 to 5a46db5 on September 16, 2025 19:18
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@Tabrizian
Copy link
Member Author

@pcastonguay / @Shixiaowei02 I addressed the review comments. Please review when you get a chance. Thanks.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18822 [ run ] triggered by Bot

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from 6a094b9 to f3db06d on September 16, 2025 20:04
@tensorrt-cicd
Copy link
Collaborator

PR_Github #18822 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14114 completed with status: 'FAILURE'

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch 2 times, most recently from 9adbf1c to 1189980 on September 17, 2025 00:45
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18845 [ run ] triggered by Bot

@tensorrt-cicd
Copy link
Collaborator

PR_Github #18845 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14126 completed with status: 'FAILURE'

Signed-off-by: Iman Tabrizian <[email protected]>

Add unittest for findBlocksInReuseTreeByHashes

Signed-off-by: Iman Tabrizian <[email protected]>

fixes

Signed-off-by: Iman Tabrizian <[email protected]>

Switch from hash id to block key

Signed-off-by: Iman Tabrizian <[email protected]>

Add support for blockKeys

Signed-off-by: Iman Tabrizian <[email protected]>

Fix bugs

Signed-off-by: Iman Tabrizian <[email protected]>

Fix accuracy bug and add tests

Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 1189980 to 5c39f4f on September 18, 2025 14:16
Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian
Copy link
Member Author

/bot run --disable-fail-fast

@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from 08ffe67 to b91e12e on September 18, 2025 16:15
@tensorrt-cicd
Copy link
Collaborator

PR_Github #19211 [ run ] triggered by Bot

@pcastonguay
Copy link
Collaborator

@chuangz0 @Shixiaowei02 could you please review the kvCacheTransceiver changes? Thanks.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #19211 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #14422 completed with status: 'FAILURE'

Signed-off-by: Iman Tabrizian <[email protected]>
@Tabrizian Tabrizian force-pushed the user/imant/useHashIds branch from b91e12e to 0e69eb5 on September 19, 2025 17:46
Signed-off-by: Iman Tabrizian <[email protected]>