chore: Improve the AutoTuner log information. #6368
Conversation
📝 Walkthrough

The changes enhance logging and error reporting in the autotuning process for TensorRT-LLM's PyTorch integration.
/bot run
Actionable comments posted: 0
🧹 Nitpick comments (4)
tensorrt_llm/_torch/pyexecutor/model_engine.py (1)

701-706: Guard verbose cache dump behind DEBUG log level

The cache-size message and `print_profiling_cache()` execute on every warm-up. For large workloads this may emit thousands of lines at INFO level, cluttering logs and costing I/O even when users are not actively debugging.

```diff
-import logging
+import logging  # at top-of-file (if not imported already)

-logger.info(
-    f"[Autotuner]: Cache size after warmup is "
-    f"{len(AutoTuner.get().profiling_cache)}"
-)
-AutoTuner.get().print_profiling_cache()
+if logger.isEnabledFor(logging.DEBUG):
+    logger.debug(
+        "[Autotuner] Cache size after warmup: "
+        f"{len(AutoTuner.get().profiling_cache)}"
+    )
+    AutoTuner.get().print_profiling_cache()
```

Keeps default logs tidy while retaining full diagnostics when `--log-level=debug` is enabled.

tensorrt_llm/_torch/autotuner.py (3)
394-397: Enhanced error handling with appropriate log levels

The logging improvements properly separate warning-level information from debug-level details. The warning message provides actionable guidance to enable debug logs, while exception details are logged at debug level to avoid cluttering production logs.

However, there are line length violations that need to be addressed.

Fix the line length violations:

```diff
-logger.warning(
-    f"[Autotuner]: Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for details."
-)
+logger.warning(
+    f"[Autotuner]: Failed when profiling runner={r}, tactic={tac}, "
+    f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for details."
+)
```
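For context, the warning-plus-debug split described above can be sketched with the standard library alone. This is a minimal illustration, not the actual TensorRT-LLM code: the `profile_single_kernel` helper, the variable names, and the use of Python's built-in `logging` module (instead of tensorrt_llm's own logger) are assumptions.

```python
import logging
import traceback

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("autotuner_demo")


def profile_single_kernel(runner, tactic, shapes):
    # Stand-in for the real profiling call; it always fails here to show the log flow.
    raise RuntimeError("kernel launch failed")


runner, tactic, shapes = "runner_0", 3, [(16, 1024)]
try:
    time_us = profile_single_kernel(runner, tactic, shapes)
except RuntimeError as e:
    # User-facing summary at WARNING level, pointing at the debug switch.
    logger.warning(
        "[Autotuner] Failed when profiling runner=%s, tactic=%s, shapes=%s. "
        "Set TLLM_LOG_LEVEL=DEBUG for more details.", runner, tactic, shapes)
    # Full technical detail only at DEBUG level, so default logs stay clean.
    logger.debug("[Autotuner] Exception captured: %s\n%s", e, traceback.format_exc())
```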
427-430: Critical error logging for autotuning failures

This addition properly logs critical situations where no valid runner/tactic combination can be found, which is essential for debugging autotuning issues. The error message provides sufficient context for troubleshooting.

However, there's a line length violation that should be addressed.

Fix the line length violation:

```diff
-logger.error(
-    f"[Autotuner]: No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. Please check the implementation of the runner for the custom_op."
-)
+logger.error(
+    f"[Autotuner]: No valid runner/tactic was found for "
+    f"custom_op={custom_op}, input_shapes={input_shapes}. "
+    f"At least one valid (runner, tactic) pair is required. "
+    f"Please check the implementation of the runner for the custom_op."
+)
```
652-660: New method for profiling cache visibility

The `print_profiling_cache` method provides valuable debugging capability by allowing inspection of cached profiling results. The implementation correctly iterates through cache entries and formats them for readability.

However, there's a line length violation in the header comment that should be addressed.

Fix the line length violation:

```diff
-logger.debug(
-    f"[Autotuner]: Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
-)
+logger.debug(
+    f"[Autotuner]: Cache contents: (custom_op, runner, hash(attributes), "
+    f"shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (5 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/pyexecutor/model_engine.py
tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/pyexecutor/model_engine.py
tensorrt_llm/_torch/autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
395-395: Line too long (150 > 120)
(E501)
429-429: Line too long (247 > 120)
(E501)
656-656: Line too long (143 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/autotuner.py (3)

353-357: Improved fallback warning logic with appropriate conditions

The changes correctly implement the PR objective to show fallback warnings only once and only when the profiling cache is not empty. The use of `logger.warning_once` with a key parameter ensures no spam in logs during inference mode.
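To illustrate the keyed warn-once behavior, here is a minimal, self-contained sketch using a plain set and the standard `logging` module; tensorrt_llm's own `logger.warning_once` signature may differ, and the message text and key value below are only examples.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("autotuner_demo")

_seen_warning_keys = set()


def warning_once(message, key):
    # Emit the warning only the first time a given key is seen, so repeated
    # cache misses for the same custom op do not flood the log.
    if key not in _seen_warning_keys:
        _seen_warning_keys.add(key)
        logger.warning(message)


# Three cache misses for the same op produce a single log line.
for _ in range(3):
    warning_once(
        "[Autotuner] Using the fallback tactic, due to cache miss on "
        "input shapes=[(8, 4096)]",
        key="my_custom_op",
    )
```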
425-426: Improved debug message formatting

The reformatted debug message provides clearer information about the profiling process by splitting the message across multiple lines for better readability.

490-490: Consistent debug message formatting

The reformatted debug message maintains consistency with other logging improvements and provides clear profiling results information.
PR_Github #13017 [ run ] triggered by Bot
PR_Github #13017 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #13106 [ run ] triggered by Bot
PR_Github #13106 [ run ] completed with state
aac330d to 2550815
Actionable comments posted: 2
🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)
396-399: Well-structured error handling with appropriate log levels.

The separation of WARNING for user-facing messages and DEBUG for technical details is appropriate. This aligns with the review feedback that individual tactic failures are warnings, not errors.

Consider breaking the long warning message across multiple lines for better readability:

```diff
-logger.warning(
-    f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
-)
+logger.warning(
+    f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. "
+    f"Set TLLM_LOG_LEVEL=DEBUG for more details."
+)
```
430-436: Informative warning message, but needs formatting.

The detailed warning helps users understand the issue and provides actionable guidance.

Break the long lines to improve readability:

```diff
-logger.warning(
-    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-    f"At least one valid (runner, tactic) pair is required. "
-    f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-    f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-)
+logger.warning(
+    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+    f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+    f"If get_valid_tactics is intended to return empty list, please ensure that this profile "
+    f"is not valid for the custom_op and should not occur during the inference stage, "
+    f"or fallback tactic is implemented. Otherwise, the tuning process will crash."
+)
```

Also note: "should not occurs" should be "should not occur" and "the the" should be "the".
668-675: Useful debug method for cache inspection.

The new `print_profiling_cache` method provides valuable visibility into the autotuner's cache state.

Break the long line for better readability:

```diff
-logger.debug(
-    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
-)
+logger.debug(
+    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+    f"(runner_id, tactic, shape_profile(ignored))"
+)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (8 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
397-397: Line too long (154 > 120)
(E501)
432-432: Line too long (128 > 120)
(E501)
434-434: Line too long (145 > 120)
(E501)
435-435: Line too long (157 > 120)
(E501)
441-441: Line too long (169 > 120)
(E501)
671-671: Line too long (142 > 120)
(E501)
🔇 Additional comments (2)
tensorrt_llm/_torch/autotuner.py (2)
200-206: LGTM! Clean log formatting.

The removal of colons from the autotuning start/end messages improves consistency.

353-357: Good improvement to cache miss logging behavior.

The conditional warning only when cache is non-empty and the use of `warning_once` with custom op as key prevents log spam while still alerting users to performance-impacting cache misses.
Actionable comments posted: 0
♻️ Duplicate comments (3)
tensorrt_llm/_torch/autotuner.py (3)
368-369: Fix the typo in variable name.

The variable name contains a spelling error that was flagged in previous reviews.

Apply this diff to fix the typo:

```diff
-new_tuning_failure_occured = False
+new_tuning_failure_occurred = False
```

This typo also appears on lines 402 and 439 and needs to be fixed consistently.
402-402: Fix the typo in variable name (continued).

This is the same typo that needs to be fixed consistently.

Apply this diff:

```diff
-new_tuning_failuer_occured = True
+new_tuning_failure_occurred = True
```
439-445: Fix the typo and improve error message formatting.

The error summary is a good addition, but there are issues to address:

- The typo in the variable name (flagged in previous reviews)
- Incorrect display of failed count (should use `len()` on the set)
- Line length violations

Apply this diff to fix all issues:

```diff
-if new_tuning_failuer_occured:
+if new_tuning_failure_occurred:
     logger.error(
-        f"[Autotuner] New tuning error occurs: Total failed profiling tactics occurs: {self.stats.failed_profiling_count[custom_op]} for custom_op={custom_op}. "
-        f"This will not block the tuning process. "
-        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+        f"[Autotuner] New tuning error occurs: Total failed profiling tactics: "
+        f"{len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
+        f"This will not block the tuning process. "
+        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
     )
```
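As a small aside, the reason `len()` matters here can be seen with a toy version of the counter. The structure below (each custom op mapping to a set of failed (runner, tactic) keys) is an assumption inferred from the snippet quoted above, not the exact layout in autotuner.py.

```python
from collections import defaultdict

# Toy stand-in: each custom op maps to the set of (runner, tactic) pairs that failed.
failed_profiling_count = defaultdict(set)
failed_profiling_count["my_custom_op"].add(("runner_0", 3))
failed_profiling_count["my_custom_op"].add(("runner_1", 0))

# Logging the raw value prints the whole set; len() gives the intended count.
print(failed_profiling_count["my_custom_op"])       # {('runner_0', 3), ('runner_1', 0)}
print(len(failed_profiling_count["my_custom_op"]))  # 2
```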
🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)
396-399: Appropriate log level adjustment for profiling exceptions.

The change from ERROR to WARNING for profiling failures is correct since these are expected failures during the tuning process, not actual errors. The two-level logging approach (WARNING with summary, DEBUG with details) provides good user experience.

However, the line length violation should be addressed:

```diff
-logger.warning(
-    f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
-)
+logger.warning(
+    f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, "
+    f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
+)
```
430-436: Good addition of explicit warning for missing valid tactics.

This warning message provides valuable feedback when no valid runner/tactic pairs are found, helping developers debug their `get_valid_tactics` implementations.

However, the long lines should be broken up for better readability:

```diff
-logger.warning(
-    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-    f"At least one valid (runner, tactic) pair is required. "
-    f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-    f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-)
+logger.warning(
+    f"[Autotuner] No valid runner/tactic was found for "
+    f"custom_op={custom_op}, input_shapes={input_shapes}. "
+    f"At least one valid (runner, tactic) pair is required. "
+    f"If get_valid_tactics is intended to return empty list, please ensure "
+    f"that this profile is not valid for the custom_op and should not occur "
+    f"during the inference stage, or fallback tactic is implemented. "
+    f"Otherwise, the tuning process will crash."
+)
```
668-675: Useful addition for debugging profiling cache contents.

The new `print_profiling_cache` method is a valuable debugging tool that provides clear visibility into the cache state. The implementation correctly iterates through cache entries and formats the output clearly.

However, the long line should be broken up:

```diff
 logger.debug(
-    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), "
+    f"shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
 )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (8 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
397-397: Line too long (154 > 120)
(E501)
432-432: Line too long (128 > 120)
(E501)
434-434: Line too long (145 > 120)
(E501)
435-435: Line too long (157 > 120)
(E501)
441-441: Line too long (169 > 120)
(E501)
671-671: Line too long (142 > 120)
(E501)
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)
200-200: LGTM! Improved log message consistency.

Removing the colons from the autotuning start/end messages improves consistency with other log messages in the codebase.

Also applies to: 206-206

353-357: Good improvement to cache miss logging behavior.

The logic correctly:

- Only logs when the profiling cache is non-empty (avoiding noise during initial runs)
- Uses `logger.warning_once()` to prevent log spam for repeated cache misses
- Provides useful context with input shapes and custom operation name

428-429: LGTM! Improved debug message formatting.

The debug message formatting is clearer and more consistent with other log messages.

505-505: LGTM! Consistent debug message formatting.

The profiling result message formatting is improved and consistent.

575-575: LGTM! Clear debug message for generated profiles.

The debug message clearly indicates when a new optimization profile is generated.
2550815 to cb802f5
Actionable comments posted: 0
♻️ Duplicate comments (3)
tensorrt_llm/_torch/autotuner.py (3)
438-446: Fix typo and line length violation in error summary.

The error summary logic is good, but there are formatting issues that need attention:

- The variable name contains the typo identified in previous reviews (line 439)
- Line 442 exceeds the 120-character limit

Apply this diff to fix the line length violation:

```diff
 logger.error(
-    f"[Autotuner] New tuning error occurs:"
-    f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-    f"This will not block the tuning process. "
-    f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-    f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+    f"[Autotuner] New tuning error occurs: "
+    f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+    f"for custom_op={custom_op}. This will not block the tuning process. "
+    f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+    f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
 )
```
368-368: Fix the typo in variable name.

The variable name contains typos that were identified in previous reviews: `new_tuning_failuer_occured` should be `new_tuning_failure_occurred`.

Apply this diff to fix the typo:

```diff
-new_tuning_failure_occured = False
+new_tuning_failure_occurred = False
```

This typo also appears on lines 402 and 439 and needs to be fixed consistently.
402-402: Fix the typo in variable name (duplicate issue).

The variable name contains the same typo identified in previous reviews: `new_tuning_failuer_occured` should be `new_tuning_failure_occurred`.
🧹 Nitpick comments (2)
tensorrt_llm/_torch/autotuner.py (2)
430-436: Address line length violations in warning message.

The warning message and logic are appropriate, but several lines exceed the 120-character limit flagged by static analysis.

Apply this diff to fix the line length violations:

```diff
 else:
     logger.warning(
-        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-        f"At least one valid (runner, tactic) pair is required. "
-        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
+        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+        f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+        f"If get_valid_tactics is intended to return empty list, please ensure that this profile "
+        f"is not valid for the custom_op and should not occurs during the inference stage, "
+        f"or fallback tactic is implemented. Otherwise, the tuning process will crash."
     )
```
669-676: Good addition but fix line length violation.

The new `print_profiling_cache` method provides useful debugging capability, but line 672 exceeds the 120-character limit.

Apply this diff to fix the line length violation:

```diff
 logger.debug(
-    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+    f"(runner_id, tactic, shape_profile(ignored))"
 )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (8 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
397-397: Line too long (154 > 120)
(E501)
432-432: Line too long (128 > 120)
(E501)
434-434: Line too long (145 > 120)
(E501)
435-435: Line too long (157 > 120)
(E501)
442-442: Line too long (137 > 120)
(E501)
672-672: Line too long (142 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)
200-200: LGTM! Consistent log message formatting.

Removing the colons from the autotune start/end messages improves consistency in the logging format.

Also applies to: 206-206

353-357: Good improvement to cache miss logging.

The conditional logging (only when cache is non-empty) and use of `warning_once` prevents false alarms and log spam while still providing useful information about fallback usage.

396-399: Good improvement to exception logging levels.

Changing from error to warning+debug logging is appropriate since profiling failures are expected behavior rather than actual errors. The warning provides guidance while debug gives details when needed.

428-429: LGTM! Improved debug message formatting.

The reformatted debug message is clearer and more consistent with other logging in the file.

506-506: LGTM! Improved debug message formatting.

The reformatted debug messages for profiling results and generated profiles are clearer and more consistent with other logging in the file.

Also applies to: 576-576
/bot run --disable-fail-fast
PR_Github #13169 [ run ] triggered by Bot
PR_Github #13169 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #13208 [ run ] triggered by Bot
PR_Github #13208 [ run ] completed with state
cb802f5 to 8c65167
Actionable comments posted: 0
🧹 Nitpick comments (5)
tensorrt_llm/_torch/autotuner.py (5)
353-357: Fix typo in log message and approve the improved cache miss logic.

The conditional logic and `warning_once` usage effectively prevents log spam and false alarms when the cache is empty. However, there's a typo in the log message.

Apply this diff to fix the typo:

```diff
-    f"[AutoTunner] Using the fallback tactic, due to cache miss on input shapes={input_shapes}",
+    f"[AutoTuner] Using the fallback tactic, due to cache miss on input shapes={input_shapes}",
```
368-368: Fix spelling error in variable name.

The logic for tracking tuning failures is correct, but there's a spelling error in the variable name.

Apply this diff to fix the spelling:

```diff
-new_tuning_failure_occured = False
+new_tuning_failure_occurred = False
```

```diff
-new_tuning_failure_occured = True
+new_tuning_failure_occurred = True
```

Also applies to: 402-402
430-436: Fix line length violations while maintaining the good warning logic.

The warning logic and messaging are excellent for user guidance when no valid runner/tactic pairs are found. However, several lines exceed the 120-character limit.

Apply this diff to fix line length violations:

```diff
-logger.warning(
-    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-    f"At least one valid (runner, tactic) pair is required. "
-    f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-    f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-)
+logger.warning(
+    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+    f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+    f"If get_valid_tactics is intended to return empty list, please ensure that this "
+    f"profile is not valid for the custom_op and should not occurs during the inference "
+    f"stage, or fallback tactic is implemented. Otherwise, the tuning process will crash."
+)
```
438-446: Fix spelling error and line length violations in the error summary.

The error summary logic is excellent and aligns with the agreed-upon compromise from past discussions. However, there are formatting issues to address.

Apply this diff to fix the spelling error and line length violations:

```diff
-if new_tuning_failure_occured:
+if new_tuning_failure_occurred:
     logger.error(
-        f"[Autotuner] New tuning error occurs:"
-        f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-        f"This will not block the tuning process. "
-        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+        f"[Autotuner] New tuning error occurs: "
+        f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+        f"for custom_op={custom_op}. This will not block the tuning process. "
+        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
     )
```
668-676: LGTM! Valuable debugging method with minor formatting fix needed.

The new `print_profiling_cache` method provides excellent visibility into the autotuner's internal state, which aligns perfectly with the PR objectives for enhanced debugging capabilities.

Apply this diff to fix the line length violation:

```diff
 logger.debug(
-    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), "
+    f"shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
 )
```
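To make the shape of the new helper concrete, here is a minimal sketch of a cache-dump method along the lines described in this review. The class name, cache value layout, and logger setup are assumptions based on the log format quoted above, not the actual implementation in autotuner.py.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("autotuner_demo")


class AutoTunerSketch:
    def __init__(self):
        # Key: (custom_op, runner, hash(attributes), shape_profiles)
        # Value: (runner_id, tactic, shape_profile) -- shape_profile ignored when printing.
        self.profiling_cache = {
            ("my_custom_op", "GemmRunner", 0x1234, ((8, 4096),)): (0, 3, None),
        }

    def print_profiling_cache(self):
        logger.debug(
            "[Autotuner] Cache contents: (custom_op, runner, hash(attributes), "
            "shape_profiles) -> (runner_id, tactic, shape_profile(ignored))")
        for key, (runner_id, tactic, _) in self.profiling_cache.items():
            logger.debug("[Autotuner] %s -> (runner_id=%s, tactic=%s)", key, runner_id, tactic)


AutoTunerSketch().print_profiling_cache()
```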
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (8 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py
: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
397-397: Line too long (154 > 120)
(E501)
432-432: Line too long (128 > 120)
(E501)
434-434: Line too long (145 > 120)
(E501)
435-435: Line too long (157 > 120)
(E501)
442-442: Line too long (137 > 120)
(E501)
672-672: Line too long (142 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/autotuner.py (4)
200-200: LGTM! Improved logging consistency.

The removal of colons from the autotune context manager log messages creates uniform formatting across the autotuning process.

Also applies to: 206-206

396-399: LGTM! Excellent implementation of graduated logging levels.

The change from a single ERROR log to WARNING (with user guidance) + DEBUG (with technical details) perfectly aligns with the PR objectives and provides the right balance of information at appropriate log levels.

428-429: LGTM! Improved debug log formatting.

The reformatted debug log message enhances readability while maintaining the same informational content.

576-576: LGTM! Consistent debug log formatting improvement.

The reformatted debug log message maintains consistency with other logging improvements in this PR.
/bot run --disable-fail-fast
PR_Github #13611 [ run ] triggered by Bot
PR_Github #13611 [ run ] completed with state
* Change fallback alert from DEBUG level to WARNING level and only do once.
* Add debug information for profiling cache right after warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING. And all exception details are pushed to DEBUG.
* Other trivial refinements.

Signed-off-by: Yukun He <[email protected]>
8c65167 to 5af6f15
/bot run --disable-fail-fast
PR_Github #13691 [ run ] triggered by Bot
Actionable comments posted: 0
♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (2)
368-368: Variable name still contains typo from past review.

The variable `new_tuning_failure_occured` contains a typo that was flagged in previous reviews but hasn't been fixed.

Apply this diff to fix the typo:

```diff
-new_tuning_failure_occured = False
+new_tuning_failure_occurred = False
```
402-402: Fix variable name typo.

Same typo issue as line 368.

Apply this diff:

```diff
-new_tuning_failure_occured = True
+new_tuning_failure_occurred = True
```
🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)
430-436: Comprehensive warning message for critical scenario.

This warning appropriately addresses the scenario where no valid runner/tactic pairs are found. The detailed explanation helps developers understand the implications and next steps.

Consider breaking the long lines to improve readability:

```diff
-logger.warning(
-    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-    f"At least one valid (runner, tactic) pair is required. "
-    f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-    f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-)
+logger.warning(
+    f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+    f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+    f"If get_valid_tactics is intended to return empty list, please ensure that this "
+    f"profile is not valid for the custom_op and should not occur during the inference "
+    f"stage, or fallback tactic is implemented. Otherwise, the tuning process will crash."
+)
```
439-446: Well-designed summary logging with actionable guidance.

The summary warning provides valuable information about tuning failures with clear instructions for users to get more details. This addresses the concerns from past reviews about providing appropriate guidance.

Fix the variable name typo and address line length issues:

```diff
-if new_tuning_failure_occured:
+if new_tuning_failure_occurred:
     logger.warning(
-        f"[Autotuner] New tuning error occurs:"
-        f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-        f"This will not block the tuning process. "
-        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+        f"[Autotuner] New tuning error occurs: "
+        f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+        f"for custom_op={custom_op}. This will not block the tuning process. "
+        f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+        f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
     )
```
669-676: Valuable addition for debugging profiling cache state.

The new `print_profiling_cache` method provides excellent visibility into the cache contents, which aligns with the PR objective of improving AutoTuner log information.

Address the line length issue:

```diff
 logger.debug(
-    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+    f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+    f"(runner_id, tactic, shape_profile(ignored))"
 )
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/_torch/autotuner.py (8 hunks)
- tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py
397-397: Line too long (154 > 120)
(E501)
432-432: Line too long (128 > 120)
(E501)
434-434: Line too long (145 > 120)
(E501)
435-435: Line too long (157 > 120)
(E501)
442-442: Line too long (137 > 120)
(E501)
672-672: Line too long (142 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)
200-206: LGTM! Consistent log message formatting.

The removal of colons from the `[Autotuner]:` prefixes creates consistent formatting across the codebase.

353-357: Good improvement to reduce log noise.

The change from multiple debug logs to a single `warning_once` per custom operation is a significant improvement. The condition to only warn when the cache is non-empty prevents false alarms during initial runs.

396-399: Excellent improvement to log level hierarchy.

The two-level logging approach is well-implemented:

- WARNING level provides actionable guidance without overwhelming users
- DEBUG level contains technical details for deeper investigation

This addresses the concerns raised in past reviews about not treating expected failures as errors.

428-429: Improved debug log clarity.

The reformatted debug message is cleaner and more readable.

576-576: Improved debug log formatting.

The consistent formatting enhances readability of debug messages.
PR_Github #13691 [ run ] completed with state
* Change the fallback alert from DEBUG to WARNING level and only do it once.
* Add debug information for profiling cache right after the warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level.
* Other trivial refinements and cleanups.

Signed-off-by: Yukun He <[email protected]>
Signed-off-by: Lanyu Liao <[email protected]>
* Change the fallback alert from DEBUG to WARNING level and only do it once.
* Add debug information for profiling cache right after the warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level.
* Other trivial refinements and cleanups.

Signed-off-by: Yukun He <[email protected]>
Summary by CodeRabbit
New Features
Improvements