[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path #6348
base: main
Conversation
Caution: Review failed. An error occurred during the review process. Please try again later.
📝 Walkthrough
Renames DataResponder/DataRequester → CacheSender/CacheReceiver, deletes legacy impl files, introduces TransferSession and hash-driven block-range transfers, adds pinBlocks and hash-based reuse lookups, and updates related headers, tests, the Python binding, and serialization for BlockKey.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor PyExec as PyExecutor
participant CT as CacheTransceiver
participant CS as CacheSender
participant CR as CacheReceiver
participant CM as ConnectionManager
participant F as BaseCacheFormatter
participant KM as KVCacheManager
rect rgb(245,248,255)
note over PyExec,CT: Send path (async)
PyExec->>CT: respondAndSendAsync(req)
CT->>CS: sendAsync(req)
CS->>CM: resolve connections/peers
CS->>F: format(TransferSession{allBlockKeys, indexFromEnd})
F->>KM: getBlockRangeForSending(..., allBlockKeys, indexFromEnd)
F-->>CS: formatted buffers
CS-->>CT: future<void> (completion)
end
rect rgb(245,255,245)
note over PyExec,CT: Receive path (async)
PyExec->>CT: requestAndReceiveAsync(req)
CT->>CR: receiveAsync(req)
CR->>CM: send RequestInfo (state + allBlockKeys + indexFromEnd)
CR->>F: unformat(TransferSession)
F->>KM: allocate or reuse blocks by hashes
CR-->>CT: future<void> (completion)
end
rect rgb(255,248,240)
note over PyExec,KM: On completion (if block reuse enabled)
PyExec->>KM: pinBlocks(requestId)
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Actionable comments posted: 3
🔭 Outside diff range comments (4)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)
18-18: Use preprocessor guard instead of pragma once.
According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.
Replace #pragma once with a proper header guard:
-#pragma once
+#ifndef TRTLLM_CONNECTION_H
+#define TRTLLM_CONNECTION_H
And add at the end of the file:
+#endif // TRTLLM_CONNECTION_H
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2)
17-17: Use preprocessor guard instead of pragma once.
According to coding guidelines, header files should use a preprocessor guard with prefix TRTLLM_ followed by the filename in caps.
Replace #pragma once with a proper header guard:
-#pragma once
+#ifndef TRTLLM_CACHETRANSCEIVER_H
+#define TRTLLM_CACHETRANSCEIVER_H
And add at the end of the file:
+#endif // TRTLLM_CACHETRANSCEIVER_H
2-2: Update copyright year to include 2025.
According to coding guidelines, all TensorRT-LLM code should contain an NVIDIA copyright header that includes the current year (2025).
- * Copyright (c) 2023-2024, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
cpp/tests/batch_manager/cacheTransceiverTest.cpp (1)
655-658: Remove unused variable assignment.
The variable cacheType is assigned but never used. This appears to be dead code.
- if (kvFactor == 1)
- {
-     auto cacheType = CacheType::kSELFKONLY;
- }
🧹 Nitpick comments (2)
examples/disaggregated/clients/disagg_client.py (1)
81-126: Consider adding timing measurements to send_chat_request for consistency.
The send_chat_request function lacks the same TTFT and ITL timing measurements that were added to send_request. This creates inconsistent behavior between the two endpoints.
Consider adding similar timing measurements to send_chat_request or extracting the timing logic into a shared helper function to maintain consistency across both endpoints.
examples/disaggregated/clients/prompts.json (1)
1-3: Consider maintaining diverse test prompts for comprehensive testing.
While simplifying to a single prompt reduces complexity, having diverse prompts helps test various scenarios (different lengths, complexity, languages, etc.). Consider whether this reduction impacts test coverage adequately.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
- cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
- cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
- examples/disaggregated/clients/disagg_client.py (3 hunks)
- examples/disaggregated/clients/prompts.json (1 hunks)
💤 Files with no reviewable changes (3)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}
: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...
Files:
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tests/batch_manager/cacheTransceiverTest.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
examples/disaggregated/clients/disagg_client.py
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tests/batch_manager/cacheTransceiverTest.cpp
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
examples/disaggregated/clients/disagg_client.py
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🔇 Additional comments (24)
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1)
178-178: Consistent renaming verified
I performed a repository-wide search and confirmed there are no remaining occurrences of mReceiverBufferDesc. All references in
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
have been updated to mCacheReceiverBufferDesc.
LGTM!
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2)
84-84: LGTM! Consistent implementation of the renamed member variable.
The changes correctly implement the renamed mCacheReceiverBufferDesc member variable throughout the implementation, maintaining consistency with the header file updates.
Also applies to: 140-142
468-468: LGTM! Proper namespace closing comment.
The namespace closing comment follows the coding guidelines correctly.
cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (1)
37-38: Renaming Verification Complete
All references to the old class names have been removed:
- No matches for DataResponder
- No matches for DataRequester
- No matches for mDataSender or mDataReceiver
The renaming to CacheSender/CacheReceiver is consistent and complete. No further action required.
cpp/tests/batch_manager/cacheTransceiverTest.cpp (6)
159-185: LGTM! Mock class properly renamed and implemented.
The mock class follows appropriate testing patterns with proper use of Google Mock features.
186-192: LGTM! Mock receiver class properly implemented.
The mock methods are correctly defined for testing cache receiving functionality.
211-230: LGTM! Test properly updated to use renamed classes.
The test correctly uses CacheSender instead of DataResponder while maintaining the original test logic.
232-252: LGTM! Test properly uses CacheReceiver.
The test correctly replaces DataRequester with CacheReceiver while maintaining test functionality.
396-404: LGTM! Cache transceiver setup properly updated.
The setup correctly instantiates CacheSender and CacheReceiver with proper implementations using CacheSenderImpl and CacheReceiverImpl.
762-768: LGTM! Asymmetric cache test properly updated.
The test correctly uses CacheSender and CacheReceiver with their implementation classes.
cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (3)
196-200: LGTM! Proper initialization of cache sender and receiver.
The constructor correctly initializes CacheSender and CacheReceiver with appropriate parameters and formatter creation.
236-298: LGTM! Methods properly updated to use new cache classes.
All async send/receive operations correctly use mCacheSender and mCacheReceiver with proper future handling.
369-436: LGTM! Context transfer status checking properly updated.
The method correctly uses mSenderFutures (renamed from mResponderFutures) while maintaining the original synchronization logic.
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (2)
45-122: LGTM! Well-designed TransferSession class.
The class provides a clean abstraction for managing transfer sessions with proper const-correctness, precondition checks, and encapsulation.
163-222: LGTM! Performance measurement helper properly implemented.
The class provides thread-safe performance measurement with proper file output handling. Good use of RAII pattern for automatic output on destruction.
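For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of an RAII measure-and-dump helper of the kind described above; the class and member names are illustrative, not the actual KvCacheMeasureHelper API.

```cpp
// Illustrative sketch only: samples are recorded under a mutex and flushed to a
// file when the helper is destroyed (RAII). Not the real KvCacheMeasureHelper.
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

class RaiiMeasureHelper
{
public:
    explicit RaiiMeasureHelper(std::string outputPath)
        : mOutputPath(std::move(outputPath))
    {
    }

    // Thread-safe: multiple transfer threads may record concurrently.
    void record(double delayMs, double durationMs, double bandwidthGBps)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mSamples.push_back({delayMs, durationMs, bandwidthGBps});
    }

    // RAII: samples are written to the output file automatically on destruction.
    ~RaiiMeasureHelper()
    {
        if (mOutputPath.empty() || mSamples.empty())
        {
            return;
        }
        std::ofstream out(mOutputPath);
        out << "delay_ms,duration_ms,bandwidth_gbps\n";
        for (auto const& s : mSamples)
        {
            out << s.delayMs << ',' << s.durationMs << ',' << s.bandwidthGBps << '\n';
        }
    }

private:
    struct Sample
    {
        double delayMs;
        double durationMs;
        double bandwidthGBps;
    };

    std::string mOutputPath;
    std::mutex mMutex;
    std::vector<Sample> mSamples;
};

int main()
{
    RaiiMeasureHelper helper{"kv_cache_transfer_times.csv"}; // hypothetical output path
    helper.record(1.5, 12.0, 35.2);
    return 0; // samples are flushed when helper goes out of scope
}
```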
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
115-183: LGTM! Clean interface design with pImpl pattern.
The CacheSender and CacheReceiver classes provide clean interfaces with proper use of the pImpl idiom for implementation hiding and ABI stability.
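As background, a generic sketch of the pImpl idiom referenced here; Widget is a placeholder class, not the actual CacheSender declaration. Because the public header only forward-declares Impl, the implementation can change without recompiling callers, which is the ABI-stability benefit mentioned above.

```cpp
// Generic pImpl sketch (Widget is a placeholder, not the real CacheSender interface).
// "Header" part: exposes no implementation details.
#include <memory>

class Widget
{
public:
    Widget();
    ~Widget(); // defined below, where Impl is a complete type
    void doWork();

private:
    class Impl;                  // forward declaration only
    std::unique_ptr<Impl> mImpl; // all state lives behind the pointer
};

// "Source" part: implementation can change freely without touching the public header.
class Widget::Impl
{
public:
    void doWork()
    {
        // heavy lifting and private members go here
    }
};

Widget::Widget()
    : mImpl(std::make_unique<Impl>())
{
}

Widget::~Widget() = default; // Impl is complete here, so unique_ptr can delete it

void Widget::doWork()
{
    mImpl->doWork();
}

int main()
{
    Widget w;
    w.doWork();
    return 0;
}
```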
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (8)
1-32: LGTM: Copyright header and includes look good.
The copyright header contains the current year (2025) as required, and the new include for agent connection utilities aligns with the refactoring objectives.
38-57: Good RAII design and helper functions.
The new helper structures and functions are well-designed:
- Using declarations improve readability
- ReceiveCacheResource follows RAII principles correctly
- tagFromRequestId implements proper bit manipulation for tag generation
The magic number kDATA_TAG{43} is appropriately named as a constexpr constant.
: Well-designed constructor and core methods.The implementation demonstrates good practices:
- Constructor properly validates inputs with
TLLM_CHECK
- Resource management uses smart pointers appropriately
- Thread safety is maintained with proper mutex usage
- Dependency injection pattern is correctly implemented
The new methods
getCounterpartsCount
andrelease
provide clean session management.
176-228
: Complex but well-structured request info handling.The method correctly handles both agent and direct connection scenarios:
- Appropriate use of
dynamic_cast
for type discrimination- Comprehensive error handling with informative messages
- Proper thread safety with mutex protection
- Session management is correctly implemented
The method's complexity is justified given the dual connection handling requirements.
230-345
: Excellent thread management and synchronization.
The implementation demonstrates solid concurrent programming practices:
- Clean separation between sync and async operations
- Proper use of condition variables and mutexes for thread coordination
- Exception handling ensures promises are always resolved
- Thread naming (dataTransResp) aids in debugging
- Producer-consumer pattern is correctly implemented
The variable renaming from responder to sender terminology is consistent with the refactor.
405-529: Solid CacheReceiver implementation with good async handling.
The receiver implementation mirrors the sender's good practices:
- Constructor properly validates inputs and manages resources
- Async operations are correctly implemented with std::async
- Complex connection handling for both agent and direct scenarios
- Proper resource management with ReceiveCacheResource
- Thread safety maintained throughout
The symmetry with CacheSender provides a clean, consistent API.
560-698: Well-implemented async resource management.
The async processing implementation demonstrates good concurrent design:
- AsyncResource properly encapsulates thread synchronization primitives
- Worker thread pattern correctly processes queued requests
- Proper termination handling prevents resource leaks
- Exception handling ensures all promises are resolved
- Thread naming (dataTransRequest) aids debugging
- Use of std::deque for request queue is appropriate
The cleanup logic in the destructor properly terminates all worker threads.
700-736: Clean public interface implementation using pimpl idiom.
The public interfaces are well-designed:
- Proper use of pimpl idiom with std::unique_ptr<Impl>
- Constructor delegation to implementation is clean
- Method delegation is straightforward and maintains const correctness
- Defaulted destructors are appropriate with smart pointer members
The namespace closing comment follows the coding guidelines correctly.
eed674f to 376d131 (Compare)
/bot run
376d131 to 90d8774 (Compare)
PR_Github #12929 [ run ] triggered by Bot
Actionable comments posted: 2
🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)
420-424: Consider thread pool instead of frequent thread creation.
The TODO comment indicates awareness of the performance issue with frequent thread creation. The current implementation uses std::async with std::launch::async, which may create new threads frequently.
Consider implementing a thread pool for better performance:
- return std::async(std::launch::async, &CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
+ // Use thread pool implementation instead of direct async call
+ return mThreadPool.submit(&CacheReceiver::Impl::requestSync, this, std::ref(llmRequest));
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
- cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
- cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
💤 Files with no reviewable changes (3)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
🚧 Files skipped from review as they are similar to previous changes (6)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
- cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
- cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
- cpp/tensorrt_llm/batch_manager/cacheFormatter.h
- cpp/tests/batch_manager/cacheTransceiverTest.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}
: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...
Files:
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (48)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (17)
24-24: LGTM: Include added for new cache formatter abstraction.
The new include for cacheFormatter.h is correctly added to support the refactored architecture that uses BaseCacheFormatter and TransferSession from the kv_cache_manager namespace.
42-44: LGTM: Type aliases properly updated for the refactored architecture.
The type aliases correctly reference the new abstractions:
- SizeType32 properly aliased from runtime namespace
- TransferSession correctly aliased from kv_cache_manager namespace, replacing the previous internal definition
45-58: LGTM: TransceiverTag struct provides proper message tagging.
The TransceiverTag struct correctly defines:
- Enumerated message types for REQUEST_SEND and TERMINATION
- Appropriate tag constants for ID, INFO_SIZE, and INFO message components
- Uses constexpr for compile-time constants as per coding guidelines
The naming follows the coding guidelines with uppercase snake_case for constants prefixed with 'k'.
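An approximate shape of such a struct, for orientation only; the exact enum name, constant names, and values in dataTransceiver.h may differ.

```cpp
// Approximate sketch of the TransceiverTag struct described above; names and
// values are assumptions, not copied from dataTransceiver.h.
#include <cstdint>

struct TransceiverTag
{
    enum class Id : uint64_t
    {
        REQUEST_SEND = 1,
        TERMINATION = 2
    };

    // Distinct tags for the id, the serialized-info size, and the info payload itself.
    static constexpr int32_t kID_TAG{19};
    static constexpr int32_t kINFO_SIZE_TAG{22};
    static constexpr int32_t kINFO_TAG{32};
};

int main()
{
    static_assert(TransceiverTag::kID_TAG != TransceiverTag::kINFO_TAG, "tags must be distinct");
    return 0;
}
```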
58-58: LGTM: BaseCacheFormatter type alias correctly defined.
The type alias correctly references the BaseCacheFormatter from the kv_cache_manager namespace, maintaining consistency with the new architecture.
115-121: LGTM: CacheSender constructor signature properly designed.
The constructor correctly accepts:
- ConnectionManager* for managing connections
- CacheState for cache state information
- SizeType32 for self index
- std::unique_ptr<BaseCacheFormatter> for cache formatting operations
The use of std::unique_ptr follows C++ best practices for single resource ownership as specified in the coding guidelines.
122-126: LGTM: sendAsync method signature correctly updated.
The method signature properly:
- Returns std::future<void> for asynchronous operation
- Takes LlmRequest& as parameter
- Uses [[nodiscard]] attribute appropriately
- Includes comprehensive documentation
130-134: LGTM: Communication state accessors properly updated.
The getCommState() and setCommState() methods correctly use the new executor::kv_cache::CommState type, maintaining the same interface pattern as before but with updated types.
136-150: LGTM: Additional methods properly defined for the refactored architecture.
The new methods correctly provide:
- recvRequestInfo() for receiving request information
- sendSync() for synchronous sending
- sendRequestInfo() returning TransferSession
- receiveSync() accepting TransferSession&
All methods use appropriate parameter types and return types for the new architecture.
153-153: LGTM: Destructor properly declared.
The destructor is correctly declared and will be implemented to properly clean up resources.
160-177: LGTM: CacheReceiver class properly designed.
The CacheReceiver class correctly:
- Has matching constructor signature with CacheSender
- Provides receiveAsync() method for asynchronous reception
- Includes sendRequestInfo() and receiveSync() methods for coordination
- Uses proper parameter and return types
- Follows the same design patterns as CacheSender
24-24: LGTM! Proper include for new dependency.
The inclusion of cacheFormatter.h aligns with the refactoring to use kv_cache_manager types.
42-44: LGTM! Clean type aliasing approach.
The type aliases consolidate dependencies and the TransferSession alias properly delegates to the kv_cache_manager namespace as intended by the refactor.
45-57: LGTM! Well-structured communication protocol.
The TransceiverTag struct provides a clean enumeration of message types and associated tag constants for the communication protocol.
58-58: LGTM! Consistent type aliasing.
The BaseCacheFormatter alias maintains consistency with the delegation pattern to kv_cache_manager.
160-182: LGTM! Well-designed receiver interface.
The CacheReceiver class interface mirrors the sender pattern appropriately, with proper constructor signature and async receive capability.
184-184: LGTM! Proper namespace closure comment.
The namespace closure comment follows the coding guidelines requirement for namespace comments.
115-158: Constructor Validation Confirmed – No Action Needed
The Impl constructor already enforces non-null and consistent state checks:
- Line 127: TLLM_CHECK(mManager); ensures manager is not null.
- Line 128: TLLM_CHECK(mManager->getCommState().getSelfIdx() == selfIndex); verifies selfIndex matches the manager’s state.
All public methods simply forward to mImpl, maintaining interface consistency.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (31)
26-26
: LGTM: Include added for agent connection utilities.The new include for agent connection utilities is correctly added to support the refactored connection management functionality.
38-45: LGTM: Helper function and type aliases properly implemented.
The code correctly:
- Defines type aliases for the agent connection manager and data context
- Implements tagFromRequestId() helper function with proper bit manipulation to create unique tags
- Uses constexpr for the constant as per coding guidelines
47-57
: LGTM: ReceiveCacheResource struct properly designed.The struct correctly:
- Manages buffer manager and CUDA event resources
- Uses move semantics in the constructor
- Follows proper resource management patterns
- Uses member initializer list as per C++ best practices
115-132: LGTM: CacheSender::Impl constructor properly implemented.
The constructor correctly:
- Initializes all member variables using member initializer list
- Performs proper null pointer checks with TLLM_CHECK
- Validates self index consistency
- Sets up CUDA device context
- Launches the response thread asynchronously
The use of std::make_shared for CudaStream follows smart pointer best practices.
134-149
: LGTM: sendAsync method properly implemented with thread synchronization.
The method correctly:
- Creates promise/future pair for asynchronous operation
- Uses proper mutex locking for thread safety
- Adds requests to the ready responses queue
- Notifies waiting threads via condition variable
- Returns the future for caller synchronization
The nested locking approach prevents race conditions.
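A stripped-down, self-contained sketch of this enqueue-promise / worker-fulfills pattern is shown below; it is not the actual CacheSender::Impl code, just the general shape of the synchronization described above.

```cpp
// Illustrative producer/consumer sketch: sendAsync enqueues a promise under a
// lock and notifies a worker thread, which performs the work and fulfills it.
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

class AsyncSender
{
public:
    AsyncSender()
        : mWorker([this] { workerLoop(); })
    {
    }

    ~AsyncSender()
    {
        {
            std::unique_lock<std::mutex> lock(mMutex);
            mTerminate = true;
        }
        mCond.notify_all();
        mWorker.join();
    }

    // Producer side: register the work item and hand back a future to wait on.
    std::future<void> sendAsync(int requestId)
    {
        std::promise<void> promise;
        auto future = promise.get_future();
        {
            std::unique_lock<std::mutex> lock(mMutex);
            mReady.emplace_back(requestId, std::move(promise));
        }
        mCond.notify_one();
        return future;
    }

private:
    void workerLoop()
    {
        while (true)
        {
            std::pair<int, std::promise<void>> item;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCond.wait(lock, [this] { return mTerminate || !mReady.empty(); });
                if (mTerminate && mReady.empty())
                {
                    return; // drained all pending work before exiting
                }
                item = std::move(mReady.front());
                mReady.pop_front();
            }
            // ... perform the actual transfer for item.first here ...
            item.second.set_value(); // unblock the caller waiting on the future
        }
    }

    std::mutex mMutex;
    std::condition_variable mCond;
    std::deque<std::pair<int, std::promise<void>>> mReady;
    bool mTerminate{false};
    std::thread mWorker;
};

int main()
{
    AsyncSender sender;
    auto done = sender.sendAsync(42);
    done.get(); // blocks until the worker has handled request 42
    return 0;
}
```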
230-237
: LGTM: sendSync method properly implemented.The method correctly:
- Looks up the transfer session by request ID
- Sets the LLM request on the session
- Delegates formatting to the formatter
- Includes proper error checking
251-265
: LGTM: sendAndRemoveResponse method with proper exception handling.The method correctly:
- Uses CUDA device setting for proper context
- Calls synchronous send and cleanup
- Provides proper exception handling with promise notification
- Uses noexcept specification appropriately
267-345
: LGTM: Response thread implementation with comprehensive synchronization.The response thread correctly:
- Sets thread name for debugging
- Uses proper CUDA device context
- Implements condition variable wait with proper predicates
- Handles termination gracefully
- Manages request counting and cleanup
- Supports both parallel and sequential sending modes
- Includes comprehensive error handling and logging
The thread synchronization logic appears sound with proper use of mutexes and condition variables.
408-418
: LGTM: CacheReceiver::Impl constructor properly implemented.The constructor follows the same pattern as
CacheSender::Impl
with:
- Proper member initialization
- Null pointer validation
- Self index consistency check
- CUDA device setup
460-463
: LGTM: receiveSync method properly delegates to formatter.The method correctly delegates the unformatting operation to the formatter using the transfer session, maintaining proper separation of concerns.
465-529
: LGTM: sendRequestInfo method with comprehensive connection handling.The method correctly:
- Validates formatter support for cache states
- Handles selective cache transfer optimization
- Supports both agent and non-agent connection managers
- Manages buffer index assignment for agent connections
- Creates transfer session with proper parameters
- Includes comprehensive error checking and validation
The dual path handling for agent vs. non-agent connections is properly implemented.
531-547
: LGTM: getReceiveCacheResource method with proper resource management.
The method correctly:
- Uses scoped lock for thread safety
- Implements per-process resource isolation when concurrent mode is enabled
- Creates resources on-demand with proper RAII
- Uses default process string for non-concurrent mode
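A minimal sketch of this lazily-created, per-key resource lookup guarded by a scoped lock; the types below are placeholders, not the real buffer-manager/CUDA-event bundle.

```cpp
// Illustrative per-process resource cache: created on demand under a scoped_lock,
// with a single "default" entry when concurrent mode is disabled.
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct Resource
{
    // Stands in for the per-process buffer manager / CUDA event bundle.
    int id{0};
};

class ResourceMap
{
public:
    Resource& getOrCreate(std::string const& processKey, bool concurrencyEnabled)
    {
        std::scoped_lock<std::mutex> lock(mMutex);
        // Non-concurrent mode collapses all callers onto one default entry.
        std::string const key = concurrencyEnabled ? processKey : "default";
        auto it = mResources.find(key);
        if (it == mResources.end())
        {
            it = mResources.emplace(key, std::make_unique<Resource>()).first;
        }
        return *it->second;
    }

private:
    std::mutex mMutex;
    std::map<std::string, std::unique_ptr<Resource>> mResources;
};

int main()
{
    ResourceMap resources;
    auto& r1 = resources.getOrCreate("process-a", true);
    auto& r2 = resources.getOrCreate("process-a", true);
    return &r1 == &r2 ? 0 : 1; // same key returns the same cached resource
}
```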
549-559
: LGTM: sendRequestInfo helper method properly implemented.The helper method correctly:
- Serializes request info using the established serialization utilities
- Sends data in the proper order (ID, size, then data)
- Uses appropriate data contexts for different message components
700-704: LGTM: CacheSender public constructor properly implemented.
The constructor correctly creates the implementation with perfect forwarding of the formatter parameter using std::move.
706-721: LGTM: CacheSender public interface methods properly implemented.
All public methods correctly delegate to the implementation:
- sendAsync() forwards the request properly
- getCommState() and setCommState() maintain the interface contract
- Destructor is properly defaulted
723-734: LGTM: CacheReceiver public interface properly implemented.
The constructor and methods correctly:
- Create implementation with proper parameter forwarding
- Delegate receiveAsync() to the multi-threaded implementation
- Use defaulted destructor for proper cleanup
176-228: Dynamic_cast usage is safe – no action needed
Verified that ConnectionManager is indeed polymorphic (it declares virtual methods, including recvConnect, getConnections, and getCommState), so the dynamic_cast in recvRequestInfo() will behave as intended. The interface is stable and handles both agent and non-agent scenarios correctly.
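As a generic illustration of why this is well-defined, dynamic_cast on a polymorphic base returns nullptr when the object is not of the target type; the types below are placeholders, not the real ConnectionManager hierarchy.

```cpp
// Placeholder hierarchy illustrating dynamic_cast-based type discrimination.
#include <iostream>

class ConnectionManagerBase
{
public:
    virtual ~ConnectionManagerBase() = default; // polymorphic base
};

class AgentConnectionManager : public ConnectionManagerBase
{
public:
    void agentOnlyOperation() const
    {
        std::cout << "agent path" << std::endl;
    }
};

void handle(ConnectionManagerBase* manager)
{
    // Returns nullptr for non-agent managers, so both paths are handled safely.
    if (auto* agent = dynamic_cast<AgentConnectionManager*>(manager))
    {
        agent->agentOnlyOperation();
    }
    else
    {
        std::cout << "direct path" << std::endl;
    }
}

int main()
{
    AgentConnectionManager agent;
    ConnectionManagerBase direct;
    handle(&agent);
    handle(&direct);
    return 0;
}
```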
26-26: LGTM! Updated include for agent utilities.
The include path update aligns with the refactored cache transmission structure.
38-45: LGTM! Clean helper function with proper naming.
The tagFromRequestId function follows proper naming conventions and provides a clean abstraction for generating unique tags from request IDs.
47-57: LGTM! Well-structured resource management.
The ReceiveCacheResource struct properly encapsulates buffer manager and CUDA event resources with appropriate constructor.
134-149: LGTM! Proper async pattern with promise/future.
The sendAsync method correctly implements the promise/future pattern with proper synchronization using mutexes and condition variables.
176-228: LGTM! Comprehensive request info handling with proper validation.
The recvRequestInfo method properly handles both agent and direct connection scenarios, with appropriate validation of cache state compatibility and connection management.
230-237: LGTM! Clean synchronous send implementation.
The sendSync method properly retrieves the session and delegates to the formatter, with appropriate error checking.
405-418: LGTM! Consistent constructor pattern.
The CacheReceiver::Impl constructor follows the same validation and initialization pattern as the sender.
465-529: LGTM! Comprehensive request info sending with agent support.
The sendRequestInfo method properly handles both agent and direct connection scenarios, with appropriate buffer management and connection selection logic.
531-547: LGTM! Thread-safe resource management with proper scoping.
The getReceiveCacheResource method uses proper scoped locking and resource management patterns for concurrent access.
549-559: LGTM! Clean request info serialization.
The sendRequestInfo helper method properly serializes and sends request information using the defined communication protocol.
700-735: LGTM! Clean public interface implementation.
The public interface methods properly delegate to the pImpl pattern implementations with appropriate parameter forwarding and resource management.
736-736: LGTM! Proper namespace closure comment.
The namespace closure comment follows the coding guidelines requirement.
267-345: Thread name and exception handling verified
The thread name “dataTransResp” is only set here and does not conflict with other threads. Exception propagation via set_exception(std::current_exception()) matches the pattern used elsewhere (e.g., in workerPool.h). No further changes are needed.
115-132: Ensure ConnectionManager Access Is Thread-Safe
I didn’t find any explicit synchronization (mutexes/locks) around calls to mManager in this class. Since you spawn a background thread via std::async for Impl::response, concurrent calls to mManager could race unless ConnectionManager itself is fully thread-safe.
Please confirm or address this:
• Constructor (lines 115–132): validates and reads from mManager
• Impl::response (launched async): invokes mManager->recvConnect(...) and other methods
• receiveAsync() / other methods: call mManager->getConnections(...)
If executor::kv_cache::ConnectionManager is not inherently thread-safe, wrap all shared calls in a mutex or otherwise serialize access to mManager.
[[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
{
    auto it = mRequestToSession.find(requestId);
    TLLM_CHECK(it != mRequestToSession.end());
    return it->second.getConnections().size();
}

void release(LlmRequest::RequestIdType requestId)
{
    auto it = mRequestToSession.find(requestId);
    TLLM_CHECK(it != mRequestToSession.end());
    std::unique_lock<std::mutex> lk(mMtxForMap);
    mRequestToSession.erase(it);
}
🛠️ Refactor suggestion
Check thread safety of session map access.
The getCounterpartsCount and release methods access the mRequestToSession map. Verify that all map accesses are properly synchronized.
[[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
{
+ std::unique_lock<std::mutex> lk(mMtxForMap);
auto it = mRequestToSession.find(requestId);
TLLM_CHECK(it != mRequestToSession.end());
return it->second.getConnections().size();
}
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp around lines 161 to 174,
the methods getCounterpartsCount and release access the mRequestToSession map
without synchronization, which can cause race conditions. To fix this, ensure
that all accesses to mRequestToSession are protected by acquiring the mutex
mMtxForMap before accessing or modifying the map. Add a
std::unique_lock<std::mutex> lock(mMtxForMap) at the start of
getCounterpartsCount and confirm the existing lock in release covers the map
access properly.
PR_Github #12929 [ run ] completed with state
ddc66c6 to 7d3e452 (Compare)
/bot run
Actionable comments posted: 2
🔭 Outside diff range comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
18-18: Replace #pragma once with required preprocessor guard.
The coding guidelines specifically require preprocessor guards in header files with the format TRTLLM_ followed by the filename in caps.
Replace the current directive with the required guard format:
-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H
And add the closing directive at the end of the file:
 } // namespace tensorrt_llm::batch_manager
+
+#endif // TRTLLM_DATATRANSCEIVER_H
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)
225-261: Fix member variable naming convention.
The member variable kvCacheMeasureHelper should follow the coding guidelines for class member variables.
Apply this fix:
- KvCacheMeasureHelper kvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};
+ KvCacheMeasureHelper mKvCacheMeasureHelper{common::getEnvKVCacheTransferOutputPath()};
Class member variables should use camelCase prefixed with 'm' according to the coding guidelines.
♻️ Duplicate comments (2)
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (2)
161-174: Thread safety issue with session map access.
The getCounterpartsCount method accesses mRequestToSession without proper synchronization, while release uses mutex protection. This creates a race condition.
This is the same thread safety issue identified in previous reviews. The getCounterpartsCount method needs mutex protection:
[[nodiscard]] size_t getCounterpartsCount(LlmRequest::RequestIdType requestId) const
{
+   std::unique_lock<std::mutex> lk(mMtxForMap);
    auto it = mRequestToSession.find(requestId);
    TLLM_CHECK(it != mRequestToSession.end());
    return it->second.getConnections().size();
}
420-424: TODO: Address inefficient thread creation pattern.
The method creates a new thread for each async receive operation, which is inefficient as noted in the TODO comment.
This matches the previous review feedback about frequent thread creation. The TODO comment correctly identifies this as needing improvement. Consider using the existing
requestAndReceiveAsyncMultiThreads
pattern or implementing a thread pool.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h (2 hunks)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt (0 hunks)
- cpp/tensorrt_llm/batch_manager/cacheFormatter.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp (7 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (15 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h (3 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp (0 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h (0 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (2 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (2 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
- cpp/tests/batch_manager/cacheTransceiverTest.cpp (11 hunks)
- cpp/tests/unit_tests/executor/ucxCommTest.cpp (0 hunks)
💤 Files with no reviewable changes (4)
- cpp/tensorrt_llm/batch_manager/CMakeLists.txt
- cpp/tests/unit_tests/executor/ucxCommTest.cpp
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.h
- cpp/tensorrt_llm/batch_manager/dataTransceiverImpl.cpp
✅ Files skipped from review due to trivial changes (1)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
- cpp/include/tensorrt_llm/batch_manager/cacheTransceiver.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
- cpp/tests/batch_manager/cacheTransceiverTest.cpp
- cpp/tensorrt_llm/batch_manager/cacheTransceiver.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}
: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...
Files:
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
cpp/tensorrt_llm/batch_manager/cacheFormatter.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
🧠 Learnings (3)
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache()
and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🔇 Additional comments (24)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (6)
1-16
: LGTM! Copyright header complies with coding guidelines.The NVIDIA copyright header includes the current year (2025) as required by the coding guidelines for TensorRT-LLM Open Source Software.
24-24
: Good addition of cacheFormatter.h include.The new include aligns with the refactored design that moves TransferSession definition to the kv_cache_manager namespace.
42-58
: Well-designed TransceiverTag structure follows coding guidelines.The type aliases and TransceiverTag struct properly follow C++ naming conventions:
- SizeType32 alias uses camelCase as required
- TransferSession alias provides clear abstraction boundary
- Enum values use uppercase naming (REQUEST_SEND, TERMINATION)
- Static constexpr constants use kCONSTANT_NAME pattern with proper k prefix
115-158
: CacheSender class design follows best practices.The refactored class demonstrates good design principles:
- Proper UpperCamelCase naming for class name
- Method names use lowerCamelCase convention
- Smart pointer usage with unique_ptr for resource management
- Pimpl pattern reduces compilation dependencies
- Constructor parameters use appropriate types (ConnectionManager*, CacheState, etc.)
160-182
: CacheReceiver class maintains design consistency.The class follows the same excellent design patterns as CacheSender:
- Consistent naming conventions and interface design
- Symmetric constructor signature with CacheSender
- Appropriate method names for receiving operations
- Proper use of pimpl pattern for implementation hiding
184-184
: Perfect namespace closing comment.The closing brace comment follows the coding guidelines requirement exactly:
} // namespace tensorrt_llm::batch_manager
cpp/tensorrt_llm/batch_manager/cacheFormatter.h (4)
1-16
: Copyright header meets coding guidelines.The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.
45-121
: TransferSession class demonstrates excellent C++ design practices.The class properly follows all coding guidelines:
- UpperCamelCase class name
- Member variables with proper 'm' prefix (mConnections, mDataContext, etc.)
- Method names use lowerCamelCase convention
- Proper const-correctness with const methods and const references
- Good use of TLLM_CHECK for assertions
- Clean initializer list in constructor
123-161
: BaseCacheFormatter interface demonstrates clean design.The interface properly follows coding guidelines:
- Method names use lowerCamelCase convention
- Proper virtual destructor
- Clean signature with TransferSession& parameter
- Appropriate const-correctness
- Good use of [[nodiscard]] attribute for return values
266-266
: Perfect namespace closing comment.The closing brace comment properly follows coding guidelines with the complete namespace path:
} // namespace tensorrt_llm::batch_manager::kv_cache_manager
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (14)
1-16
: Copyright header complies with coding guidelines.The NVIDIA copyright header correctly includes the current year (2025) as required for TensorRT-LLM Open Source Software.
38-57
: Well-designed utility functions and structures.
The code follows C++ coding guidelines properly:
- tagFromRequestId function uses lowerCamelCase and static linkage
- ReceiveCacheResource struct uses UpperCamelCase naming
- Member variables correctly use 'm' prefix (mBufferManager, mCudaEvent)
- Constant kDATA_TAG follows uppercase with 'k' prefix convention
115-133
: CacheSender::Impl constructor follows best practices.The constructor demonstrates good C++ practices:
- Proper member initialization list
- Smart pointer usage with unique_ptr
- Appropriate validation with TLLM_CHECK
- Member variables follow 'm' prefix convention
176-228
: Complex but well-structured request info handling.
The method demonstrates good practices:
- Proper type checking with dynamic_cast for agent connections
- Clean lambda usage for agent-specific logic
- Correct mutex synchronization for mRequestToSession access
- Good validation of formatter compatibility between cache states
- Appropriate session creation and connection management
230-237
: Clean and focused sendSync implementation.The method properly delegates to formatter and manages the session state appropriately.
251-403
: Robust threading and response management.
The response handling demonstrates excellent threading practices:
- Proper exception handling with promise/future pattern
- Correct CUDA device management across threads
- Thread naming for debugging (dataTransResp)
- Proper condition variable usage for synchronization
- Clean termination handling in destructor
405-418
: CacheReceiver::Impl constructor follows consistent design.The constructor maintains the same excellent practices as CacheSender::Impl with proper initialization and validation.
426-458
: Excellent async request handling with thread pool pattern.This method demonstrates the proper solution for efficient async operations:
- Resource pooling per process to avoid thread creation overhead
- Queue-based task distribution with worker threads
- Proper synchronization with mutex and condition variables
- Good exception handling and future management
- This addresses the inefficiency noted in the
receiveAsync
TODO
460-463
: Clean receiveSync delegation.Simple and appropriate delegation to the formatter's unformat method.
465-559
: Comprehensive sendRequestInfo implementation.The method handles complex scenarios well:
- Proper validation of formatter support between cache states
- Clean handling of both agent and regular connection managers
- Appropriate serialization of request information
- Good resource management with TransferSession creation
- Proper exception handling and validation throughout
531-559
: Solid resource management with thread safety.The resource management demonstrates good practices:
- Thread-safe access with
scoped_lock<mutex>
- Lazy initialization of per-process resources
- Proper RAII with unique_ptr and BufferManager
- Clean helper method for connection communication
561-698
: Excellent async resource management and cleanup.
The implementation demonstrates robust design:
- Proper RAII in destructor with resource cleanup and future synchronization
- Good CUDA device management and logging in requestSync
- Well-designed AsyncResource structure for thread coordination
- Proper move semantics in RequestAndPromise
- Comprehensive error handling with exception propagation
- Thread naming for debugging (dataTransRequest)
700-734
: Clean public interface following pimpl pattern.The public interface demonstrates excellent design:
- Proper delegation to implementation classes
- Clean constructor forwarding with smart pointer management
- Appropriate method delegation maintaining interface contracts
- Correctly defaulted destructors for pimpl pattern
736-737
: Perfect namespace closing comment.The closing brace comment properly follows coding guidelines:
} // namespace tensorrt_llm::batch_manager
PR_Github #13033 [ run ] triggered by Bot
PR_Github #13033 [ run ] completed with state
7d3e452 to 1a21dba (Compare)
f728821 to 5a46db5 (Compare)
/bot run --disable-fail-fast
@pcastonguay / @Shixiaowei02 I addressed the review comments. Please review when you get a chance. Thanks.
PR_Github #18822 [ run ] triggered by Bot
6a094b9 to f3db06d (Compare)
PR_Github #18822 [ run ] completed with state
9adbf1c to 1189980 (Compare)
/bot run --disable-fail-fast
PR_Github #18845 [ run ] triggered by Bot
PR_Github #18845 [ run ] completed with state
Signed-off-by: Iman Tabrizian <[email protected]>
Add unittest for findBlocksInReuseTreeByHashes
Signed-off-by: Iman Tabrizian <[email protected]>
fixes
Signed-off-by: Iman Tabrizian <[email protected]>
Switch from hash id to block key
Signed-off-by: Iman Tabrizian <[email protected]>
Add support for blockKeys
Signed-off-by: Iman Tabrizian <[email protected]>
Fix bugs
Signed-off-by: Iman Tabrizian <[email protected]>
Fix accuracy bug and add tests
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
Signed-off-by: Iman Tabrizian <[email protected]>
1189980 to 5c39f4f (Compare)
Signed-off-by: Iman Tabrizian <[email protected]>
/bot run --disable-fail-fast
08ffe67 to b91e12e (Compare)
PR_Github #19211 [ run ] triggered by Bot
@chuangz0 @Shixiaowei02 could you please review the kvCacheTransceiver changes? Thanks.
PR_Github #19211 [ run ] completed with state
Signed-off-by: Iman Tabrizian <[email protected]>
b91e12e to 0e69eb5 (Compare)
Signed-off-by: Iman Tabrizian <[email protected]>
Summary by CodeRabbit
New Features
Performance
Behavior Changes
API Changes
Tests
Chores
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]
to print this help message. See details below for each supported subcommand.
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id
(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test
(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
--disable-fail-fast
(OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test
(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx"
(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe"
(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp"
(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test
(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test
(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test
(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
--post-merge
(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"
(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log
(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug
(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"
is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.