
Conversation

hyukn
Collaborator

@hyukn hyukn commented Jul 25, 2025

  • Change the fallback alert from DEBUG to WARNING level and emit it only once (see the sketch after this list).
  • Add debug information for the profiling cache right after the warmup phase.
  • Change the exception messages emitted during tactic profiling from ERROR to WARNING level; all exception details are pushed to the DEBUG level.
  • Other trivial refinements and cleanups.
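
To make the first bullet concrete, here is a minimal, self-contained sketch of the intended behavior. It is not the actual AutoTuner code: the cache layout, the helper names, and the standalone warning_once emulation are assumptions for illustration only (TensorRT-LLM's own logger.warning_once(msg, key=...) is referenced in the review comments below).

import logging

logger = logging.getLogger("autotuner_sketch")
_warned_keys = set()  # emulates a warning_once(..., key=...) style logger for this sketch


def warning_once(msg, key):
    """Emit a WARNING only the first time the given key is seen."""
    if key not in _warned_keys:
        _warned_keys.add(key)
        logger.warning(msg)


def lookup_tactic(profiling_cache, custom_op, input_shapes):
    """Return a cached (runner, tactic) pair, or fall back and warn once per custom op."""
    entry = profiling_cache.get((custom_op, tuple(input_shapes)))
    if entry is not None:
        return entry
    # Only warn when the cache is already populated (i.e. warmup has run);
    # an empty cache just means tuning has not happened yet, so stay quiet.
    if profiling_cache:
        warning_once(
            f"[Autotuner] Using the fallback tactic, due to cache miss on input shapes={input_shapes}",
            key=custom_op,
        )
    return ("fallback_runner", "fallback_tactic")  # placeholder fallback pair

Repeated misses for the same custom op then produce a single WARNING instead of one DEBUG line per call, which keeps inference-time logs readable while still flagging the performance impact.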

Summary by CodeRabbit

  • New Features

    • Added a method to display the contents of the profiling cache for debugging purposes.
  • Improvements

    • Enhanced logging for cache misses, profiling failures, and tuning errors to provide clearer and more actionable information.
    • Reformatted and clarified log messages related to profiling and autotuner cache status.
    • Added detailed warnings and summaries to assist users in understanding tuning and inference issues.
    • Standardized autotuner cache size reporting and included profiling cache details after warmup.

@hyukn hyukn requested review from litaotju and djns99 July 25, 2025 14:11
@hyukn hyukn requested review from a team as code owners July 25, 2025 14:11
@hyukn hyukn requested a review from Naveassaf July 25, 2025 14:11
Contributor

coderabbitai bot commented Jul 25, 2025

📝 Walkthrough

The changes enhance logging and error reporting in the autotuning process for TensorRT-LLM's PyTorch integration. The AutoTuner class now provides improved warning and debug messages, explicit error logs for tuning failures, and a new method to print the profiling cache. The model engine's warmup procedure now logs cache size and prints cache contents after warmup.

Changes

Cohort / File(s) — Change Summary

Autotuner Enhancements (tensorrt_llm/_torch/autotuner.py)
  Refined logging and error handling in AutoTuner.choose_one: replaced some debug logs with warnings, added warnings for cache misses and profiling failures, introduced a flag to track profiling failures, reformatted log messages, and added a print_profiling_cache method.

Model Engine Logging Update (tensorrt_llm/_torch/pyexecutor/model_engine.py)
  Removed the batch size 1 warmup start info log, reformatted the cache size log message after warmup, and added a call to AutoTuner.get().print_profiling_cache() to output the profiling cache contents (see the usage sketch after this table).
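
For reference, the model-engine side of this change reduces to the two calls sketched below. The snippet paraphrases the diff quoted later in this review; the import paths are inferred from the file locations and should be treated as assumptions rather than the exact code.

from tensorrt_llm._torch.autotuner import AutoTuner  # module path inferred from the file name
from tensorrt_llm.logger import logger               # assumed logger import


def report_autotuner_cache_after_warmup():
    """Sketch of the tail of the warmup routine in model_engine.py."""
    cache_size = len(AutoTuner.get().profiling_cache)
    logger.info(f"[Autotuner] Cache size after warmup is {cache_size}")
    # New debug hook added by this PR: dumps every profiling-cache entry at DEBUG level.
    AutoTuner.get().print_profiling_cache()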

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes


@hyukn
Collaborator Author

hyukn commented Jul 25, 2025

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
tensorrt_llm/_torch/pyexecutor/model_engine.py (1)

701-706: Guard verbose cache dump behind DEBUG log level

The cache-size message and print_profiling_cache() execute on every warm-up.
For large workloads this may emit thousands of lines at INFO level, cluttering logs and costing I/O even when users are not actively debugging.

-import logging
+import logging   # at top-of-file (if not imported already)

-logger.info(
-    f"[Autotuner]: Cache size after warmup is "
-    f"{len(AutoTuner.get().profiling_cache)}"
-)
-AutoTuner.get().print_profiling_cache()
+if logger.isEnabledFor(logging.DEBUG):
+    logger.debug(
+        "[Autotuner] Cache size after warmup: "
+        f"{len(AutoTuner.get().profiling_cache)}"
+    )
+    AutoTuner.get().print_profiling_cache()

Keeps default logs tidy while retaining full diagnostics when --log-level=debug is enabled.

tensorrt_llm/_torch/autotuner.py (3)

394-397: Enhanced error handling with appropriate log levels

The logging improvements properly separate warning-level information from debug-level details. The warning message provides actionable guidance to enable debug logs, while exception details are logged at debug level to avoid cluttering production logs.

However, there are line length violations that need to be addressed.

Fix the line length violations:

-            logger.warning(
-                f"[Autotuner]: Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for details."
-            )
+            logger.warning(
+                f"[Autotuner]: Failed when profiling runner={r}, tactic={tac}, "
+                f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for details."
+            )
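
A self-contained illustration of the two-level pattern described above; only the WARNING-summary-plus-DEBUG-details split is taken from the review, while the runner/tactic/shape objects and the failing profile_tactic stub are stand-ins.

import logging
import traceback
from typing import Optional

logger = logging.getLogger("autotuner_sketch")


def profile_tactic(runner, tactic, shapes) -> float:
    """Stand-in for one profiling run; raises to simulate a failing tactic."""
    raise RuntimeError("illustrative profiling failure")


def try_profile(runner, tactic, shapes) -> Optional[float]:
    try:
        return profile_tactic(runner, tactic, shapes)
    except Exception as e:  # broad on purpose: any tactic failure should not abort tuning
        # Short, actionable summary at WARNING level...
        logger.warning(
            f"[Autotuner] Failed when profiling runner={runner}, tactic={tactic}, "
            f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
        )
        # ...full exception details only at DEBUG level.
        logger.debug(f"[Autotuner] Profiling exception: {e}\n{traceback.format_exc()}")
        return None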

427-430: Critical error logging for autotuning failures

This addition properly logs critical situations where no valid runner/tactic combination can be found, which is essential for debugging autotuning issues. The error message provides sufficient context for troubleshooting.

However, there's a line length violation that should be addressed.

Fix the line length violation:

-                    logger.error(
-                        f"[Autotuner]: No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. Please check the implementation of the runner for the custom_op."
-                    )
+                    logger.error(
+                        f"[Autotuner]: No valid runner/tactic was found for "
+                        f"custom_op={custom_op}, input_shapes={input_shapes}. "
+                        f"At least one valid (runner, tactic) pair is required. "
+                        f"Please check the implementation of the runner for the custom_op."
+                    )

652-660: New method for profiling cache visibility

The print_profiling_cache method provides valuable debugging capability by allowing inspection of cached profiling results. The implementation correctly iterates through cache entries and formats them for readability.

However, there's a line length violation in the header comment that should be addressed.

Fix the line length violation:

-        logger.debug(
-            f"[Autotuner]: Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
-        )
+        logger.debug(
+            f"[Autotuner]: Cache contents: (custom_op, runner, hash(attributes), "
+            f"shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+        )
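
Based on the header line above and the description of the method ("iterates through cache entries and formats them for readability"), a plausible shape of print_profiling_cache is sketched below; the key/value layout of the cache is an assumption, not the actual implementation.

import logging

logger = logging.getLogger("autotuner_sketch")


class AutoTunerSketch:
    """Toy stand-in for AutoTuner, used only to illustrate the cache dump."""

    def __init__(self):
        # Assumed layout: (custom_op, runner, attr_hash, shape_profile) -> (runner_id, tactic, profile)
        self.profiling_cache = {}

    def print_profiling_cache(self):
        logger.debug(
            "[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
            "(runner_id, tactic, shape_profile(ignored))"
        )
        for key, value in self.profiling_cache.items():
            logger.debug(f"[Autotuner] {key} -> {value}")
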
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3805976 and 3d4dd80.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (5 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/pyexecutor/model_engine.py
  • tensorrt_llm/_torch/autotuner.py
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

395-395: Line too long (150 > 120)

(E501)


429-429: Line too long (247 > 120)

(E501)


656-656: Line too long (143 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tensorrt_llm/_torch/autotuner.py (3)

353-357: Improved fallback warning logic with appropriate conditions

The changes correctly implement the PR objective to show fallback warnings only once and only when the profiling cache is not empty. The use of logger.warning_once with a key parameter ensures no spam in logs during inference mode.


425-426: Improved debug message formatting

The reformatted debug message provides clearer information about the profiling process by splitting the message across multiple lines for better readability.


490-490: Consistent debug message formatting

The reformatted debug message maintains consistency with other logging improvements and provides clear profiling results information.

@tensorrt-cicd
Collaborator

PR_Github #13017 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13017 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9720 completed with status: 'FAILURE'

@hyukn
Collaborator Author

hyukn commented Jul 27, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13106 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13106 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9803 completed with status: 'FAILURE'

@hyukn hyukn force-pushed the chore/refine_tuning_log branch 2 times, most recently from aac330d to 2550815 Compare July 28, 2025 05:32
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)

396-399: Well-structured error handling with appropriate log levels.

The separation of WARNING for user-facing messages and DEBUG for technical details is appropriate. This aligns with the review feedback that individual tactic failures are warnings, not errors.

Consider breaking the long warning message across multiple lines for better readability:

-                            logger.warning(
-                                f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
-                            )
+                            logger.warning(
+                                f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. "
+                                f"Set TLLM_LOG_LEVEL=DEBUG for more details."
+                            )

430-436: Informative warning message, but needs formatting.

The detailed warning helps users understand the issue and provides actionable guidance.

Break the long lines to improve readability:

-                    logger.warning(
-                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-                        f"At least one valid (runner, tactic) pair is required. "
-                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-                        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-                    )
+                    logger.warning(
+                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+                        f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile "
+                        f"is not valid for the custom_op and should not occur during the inference stage, "
+                        f"or fallback tactic is implemented. Otherwise, the tuning process will crash."
+                    )

Also note: "should not occurs" should be "should not occur" and "the the" should be "the".


668-675: Useful debug method for cache inspection.

The new print_profiling_cache method provides valuable visibility into the autotuner's cache state.

Break the long line for better readability:

-        logger.debug(
-            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
-        )
+        logger.debug(
+            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+            f"(runner_id, tactic, shape_profile(ignored))"
+        )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d4dd80 and aac330d.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (8 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

397-397: Line too long (154 > 120)

(E501)


432-432: Line too long (128 > 120)

(E501)


434-434: Line too long (145 > 120)

(E501)


435-435: Line too long (157 > 120)

(E501)


441-441: Line too long (169 > 120)

(E501)


671-671: Line too long (142 > 120)

(E501)

🔇 Additional comments (2)
tensorrt_llm/_torch/autotuner.py (2)

200-206: LGTM! Clean log formatting.

The removal of colons from the autotuning start/end messages improves consistency.


353-357: Good improvement to cache miss logging behavior.

The conditional warning only when cache is non-empty and the use of warning_once with custom op as key prevents log spam while still alerting users to performance-impacting cache misses.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
tensorrt_llm/_torch/autotuner.py (3)

368-369: Fix the typo in variable name.

The variable name contains a spelling error that was flagged in previous reviews.

Apply this diff to fix the typo:

-        new_tuning_failure_occured = False
+        new_tuning_failure_occurred = False

This typo also appears on lines 402 and 439 and needs to be fixed consistently.


402-402: Fix the typo in variable name (continued).

This is the same typo that needs to be fixed consistently.

Apply this diff:

-                            new_tuning_failuer_occured = True
+                            new_tuning_failure_occurred = True

439-445: Fix the typo and improve error message formatting.

The error summary is a good addition, but there are issues to address:

  1. The typo in the variable name (flagged in previous reviews)
  2. Incorrect display of failed count (should use len() on the set)
  3. Line length violations

Apply this diff to fix all issues:

-        if new_tuning_failuer_occured:
+        if new_tuning_failure_occurred:
             logger.error(
-                f"[Autotuner] New tuning error occurs: Total failed profiling tactics occurs: {self.stats.failed_profiling_count[custom_op]} for custom_op={custom_op}. "
-                f"This will not block the tuning process. "
-                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+                f"[Autotuner] New tuning error occurs: Total failed profiling tactics: "
+                f"{len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
+                f"This will not block the tuning process. "
+                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
             )
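
The len() fix above quietly assumes that stats.failed_profiling_count maps each custom op to a collection (the review calls it a set) of failed entries rather than to an integer counter; that layout is inferred from the review, not shown in the diff. A toy illustration of why len() is the right thing to print:

from collections import defaultdict

# Assumed bookkeeping: one set of failed (runner, tactic) pairs per custom op.
failed_profiling_count = defaultdict(set)
failed_profiling_count["my_custom_op"].add(("runner_a", "tactic_3"))
failed_profiling_count["my_custom_op"].add(("runner_b", "tactic_1"))

custom_op = "my_custom_op"
print(
    f"[Autotuner] Total failed profiling tactics: "
    f"{len(failed_profiling_count[custom_op])} for custom_op={custom_op}."
)  # reports 2 instead of dumping the raw set
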
🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)

396-399: Appropriate log level adjustment for profiling exceptions.

The change from ERROR to WARNING for profiling failures is correct since these are expected failures during the tuning process, not actual errors. The two-level logging approach (WARNING with summary, DEBUG with details) provides good user experience.

However, the line length violation should be addressed:

-                            logger.warning(
-                                f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
-                            )
+                            logger.warning(
+                                f"[Autotuner] Failed when profiling runner={r}, tactic={tac}, "
+                                f"shapes={shapes}. Set TLLM_LOG_LEVEL=DEBUG for more details."
+                            )

430-436: Good addition of explicit warning for missing valid tactics.

This warning message provides valuable feedback when no valid runner/tactic pairs are found, helping developers debug their get_valid_tactics implementations.

However, the long lines should be broken up for better readability:

-                    logger.warning(
-                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-                        f"At least one valid (runner, tactic) pair is required. "
-                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-                        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-                    )
+                    logger.warning(
+                        f"[Autotuner] No valid runner/tactic was found for "
+                        f"custom_op={custom_op}, input_shapes={input_shapes}. "
+                        f"At least one valid (runner, tactic) pair is required. "
+                        f"If get_valid_tactics is intended to return empty list, please ensure "
+                        f"that this profile is not valid for the custom_op and should not occur "
+                        f"during the inference stage, or fallback tactic is implemented. "
+                        f"Otherwise, the tuning process will crash."
+                    )

668-675: Useful addition for debugging profiling cache contents.

The new print_profiling_cache method is a valuable debugging tool that provides clear visibility into the cache state. The implementation correctly iterates through cache entries and formats the output clearly.

However, the long line should be broken up:

         logger.debug(
-            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), "
+            f"shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
         )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aac330d and 2550815.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (8 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

397-397: Line too long (154 > 120)

(E501)


432-432: Line too long (128 > 120)

(E501)


434-434: Line too long (145 > 120)

(E501)


435-435: Line too long (157 > 120)

(E501)


441-441: Line too long (169 > 120)

(E501)


671-671: Line too long (142 > 120)

(E501)

🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)

200-200: LGTM! Improved log message consistency.

Removing the colons from the autotuning start/end messages improves consistency with other log messages in the codebase.

Also applies to: 206-206


353-357: Good improvement to cache miss logging behavior.

The logic correctly:

  • Only logs when the profiling cache is non-empty (avoiding noise during initial runs)
  • Uses logger.warning_once() to prevent log spam for repeated cache misses
  • Provides useful context with input shapes and custom operation name

428-429: LGTM! Improved debug message formatting.

The debug message formatting is clearer and more consistent with other log messages.


505-505: LGTM! Consistent debug message formatting.

The profiling result message formatting is improved and consistent.


575-575: LGTM! Clear debug message for generated profiles.

The debug message clearly indicates when a new optimization profile is generated.

@hyukn hyukn force-pushed the chore/refine_tuning_log branch from 2550815 to cb802f5 Compare July 28, 2025 08:08
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
tensorrt_llm/_torch/autotuner.py (3)

438-446: Fix typo and line length violation in error summary.

The error summary logic is good, but there are formatting issues that need attention:

  1. The variable name contains the typo identified in previous reviews (line 439)
  2. Line 442 exceeds the 120-character limit

Apply this diff to fix the line length violation:

             logger.error(
-                f"[Autotuner] New tuning error occurs:"
-                f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-                f"This will not block the tuning process. "
-                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+                f"[Autotuner] New tuning error occurs: "
+                f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+                f"for custom_op={custom_op}. This will not block the tuning process. "
+                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
             )

368-368: Fix the typo in variable name.

The variable name contains typos that were identified in previous reviews: new_tuning_failuer_occured should be new_tuning_failure_occurred.

Apply this diff to fix the typo:

-        new_tuning_failure_occured = False
+        new_tuning_failure_occurred = False

This typo also appears on lines 402 and 439 and needs to be fixed consistently.


402-402: Fix the typo in variable name (duplicate issue).

The variable name contains the same typo identified in previous reviews: new_tuning_failuer_occured should be new_tuning_failure_occurred.

🧹 Nitpick comments (2)
tensorrt_llm/_torch/autotuner.py (2)

430-436: Address line length violations in warning message.

The warning message and logic are appropriate, but several lines exceed the 120-character limit flagged by static analysis.

Apply this diff to fix the line length violations:

                 else:
                     logger.warning(
-                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-                        f"At least one valid (runner, tactic) pair is required. "
-                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-                        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
+                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+                        f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile "
+                        f"is not valid for the custom_op and should not occurs during the inference stage, "
+                        f"or fallback tactic is implemented. Otherwise, the tuning process will crash."
                     )

669-676: Good addition but fix line length violation.

The new print_profiling_cache method provides useful debugging capability, but line 672 exceeds the 120-character limit.

Apply this diff to fix the line length violation:

         logger.debug(
-            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+            f"(runner_id, tactic, shape_profile(ignored))"
         )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2550815 and cb802f5.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (8 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

397-397: Line too long (154 > 120)

(E501)


432-432: Line too long (128 > 120)

(E501)


434-434: Line too long (145 > 120)

(E501)


435-435: Line too long (157 > 120)

(E501)


442-442: Line too long (137 > 120)

(E501)


672-672: Line too long (142 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)

200-200: LGTM! Consistent log message formatting.

Removing the colons from the autotune start/end messages improves consistency in the logging format.

Also applies to: 206-206


353-357: Good improvement to cache miss logging.

The conditional logging (only when cache is non-empty) and use of warning_once prevents false alarms and log spam while still providing useful information about fallback usage.


396-399: Good improvement to exception logging levels.

Changing from error to warning+debug logging is appropriate since profiling failures are expected behavior rather than actual errors. The warning provides guidance while debug gives details when needed.


428-429: LGTM! Improved debug message formatting.

The reformatted debug message is clearer and more consistent with other logging in the file.


506-506: LGTM! Improved debug message formatting.

The reformatted debug messages for profiling results and generated profiles are clearer and more consistent with other logging in the file.

Also applies to: 576-576

@hyukn
Collaborator Author

hyukn commented Jul 28, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13169 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13169 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9862 completed with status: 'FAILURE'

@hyukn
Collaborator Author

hyukn commented Jul 28, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13208 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13208 [ run ] completed with state FAILURE

@hyukn hyukn force-pushed the chore/refine_tuning_log branch from cb802f5 to 8c65167 Compare July 31, 2025 03:15
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
tensorrt_llm/_torch/autotuner.py (5)

353-357: Fix typo in log message and approve the improved cache miss logic.

The conditional logic and warning_once usage effectively prevents log spam and false alarms when the cache is empty. However, there's a typo in the log message.

Apply this diff to fix the typo:

-                    f"[AutoTunner] Using the fallback tactic, due to cache miss on input shapes={input_shapes}",
+                    f"[AutoTuner] Using the fallback tactic, due to cache miss on input shapes={input_shapes}",

368-368: Fix spelling error in variable name.

The logic for tracking tuning failures is correct, but there's a spelling error in the variable name.

Apply this diff to fix the spelling:

-        new_tuning_failure_occured = False
+        new_tuning_failure_occurred = False
-                            new_tuning_failure_occured = True
+                            new_tuning_failure_occurred = True

Also applies to: 402-402


430-436: Fix line length violations while maintaining the good warning logic.

The warning logic and messaging are excellent for user guidance when no valid runner/tactic pairs are found. However, several lines exceed the 120-character limit.

Apply this diff to fix line length violations:

-                    logger.warning(
-                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-                        f"At least one valid (runner, tactic) pair is required. "
-                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-                        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-                    )
+                    logger.warning(
+                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+                        f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile "
+                        f"is not valid for the custom_op and should not occurs during the inference stage, "
+                        f"or fallback tactic is implemented. Otherwise, the tuning process will crash."
+                    )

438-446: Fix spelling error and line length violations in the error summary.

The error summary logic is excellent and aligns with the agreed-upon compromise from past discussions. However, there are formatting issues to address.

Apply this diff to fix the spelling error and line length violations:

-        if new_tuning_failure_occured:
+        if new_tuning_failure_occurred:
             logger.error(
-                f"[Autotuner] New tuning error occurs:"
-                f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-                f"This will not block the tuning process. "
-                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+                f"[Autotuner] New tuning error occurs: "
+                f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+                f"for custom_op={custom_op}. This will not block the tuning process. "
+                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
             )

668-676: LGTM! Valuable debugging method with minor formatting fix needed.

The new print_profiling_cache method provides excellent visibility into the autotuner's internal state, which aligns perfectly with the PR objectives for enhanced debugging capabilities.

Apply this diff to fix the line length violation:

         logger.debug(
-            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+            f"(runner_id, tactic, shape_profile(ignored))"
         )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb802f5 and 8c65167.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (8 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/autotuner.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/autotuner.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

397-397: Line too long (154 > 120)

(E501)


432-432: Line too long (128 > 120)

(E501)


434-434: Line too long (145 > 120)

(E501)


435-435: Line too long (157 > 120)

(E501)


442-442: Line too long (137 > 120)

(E501)


672-672: Line too long (142 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/autotuner.py (4)

200-200: LGTM! Improved logging consistency.

The removal of colons from the autotune context manager log messages creates uniform formatting across the autotuning process.

Also applies to: 206-206


396-399: LGTM! Excellent implementation of graduated logging levels.

The change from a single ERROR log to WARNING (with user guidance) + DEBUG (with technical details) perfectly aligns with the PR objectives and provides the right balance of information at appropriate log levels.


428-429: LGTM! Improved debug log formatting.

The reformatted debug log message enhances readability while maintaining the same informational content.


576-576: LGTM! Consistent debug log formatting improvement.

The reformatted debug log message maintains consistency with other logging improvements in this PR.

@hyukn
Collaborator Author

hyukn commented Jul 31, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13611 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13611 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10210 completed with status: 'FAILURE'

* Change the fallback alert from DEBUG level to WARNING level and only do it once.
* Add debug information for profiling cache right after warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING. All exception details are pushed to DEBUG.
* Other trivial refinements.

Signed-off-by: Yukun He <[email protected]>
@hyukn hyukn force-pushed the chore/refine_tuning_log branch from 8c65167 to 5af6f15 Compare July 31, 2025 14:48
@hyukn
Collaborator Author

hyukn commented Jul 31, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13691 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
tensorrt_llm/_torch/autotuner.py (2)

368-368: Variable name still contains typo from past review.

The variable new_tuning_failure_occured contains a typo that was flagged in previous reviews but hasn't been fixed.

Apply this diff to fix the typo:

-        new_tuning_failure_occured = False
+        new_tuning_failure_occurred = False

402-402: Fix variable name typo.

Same typo issue as line 368.

Apply this diff:

-                            new_tuning_failure_occured = True
+                            new_tuning_failure_occurred = True
🧹 Nitpick comments (3)
tensorrt_llm/_torch/autotuner.py (3)

430-436: Comprehensive warning message for critical scenario.

This warning appropriately addresses the scenario where no valid runner/tactic pairs are found. The detailed explanation helps developers understand the implications and next steps.

Consider breaking the long lines to improve readability:

-                    logger.warning(
-                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, input_shapes={input_shapes}. "
-                        f"At least one valid (runner, tactic) pair is required. "
-                        f"If get_valid_tactics is intended to return empty list, please ensure that this profile is not valid for the custom_op "
-                        f"and should not occurs during the inference stage, or fallback tactic is implemented. Otherwise, the the tuning process will crash."
-                    )
+                    logger.warning(
+                        f"[Autotuner] No valid runner/tactic was found for custom_op={custom_op}, "
+                        f"input_shapes={input_shapes}. At least one valid (runner, tactic) pair is required. "
+                        f"If get_valid_tactics is intended to return empty list, please ensure that this "
+                        f"profile is not valid for the custom_op and should not occur during the inference "
+                        f"stage, or fallback tactic is implemented. Otherwise, the tuning process will crash."
+                    )

439-446: Well-designed summary logging with actionable guidance.

The summary warning provides valuable information about tuning failures with clear instructions for users to get more details. This addresses the concerns from past reviews about providing appropriate guidance.

Fix the variable name typo and address line length issues:

-        if new_tuning_failure_occured:
+        if new_tuning_failure_occurred:
             logger.warning(
-                f"[Autotuner] New tuning error occurs:"
-                f"Total failed profiling tactics occurs: {len(self.stats.failed_profiling_count[custom_op])} for custom_op={custom_op}. "
-                f"This will not block the tuning process. "
-                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
-                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
+                f"[Autotuner] New tuning error occurs: "
+                f"Total failed profiling tactics: {len(self.stats.failed_profiling_count[custom_op])} "
+                f"for custom_op={custom_op}. This will not block the tuning process. "
+                f"Please set TLLM_LOG_LEVEL=WARNING to find out when the tactic profiling fails. "
+                f"Set TLLM_LOG_LEVEL=DEBUG to get more details of the failures."
             )

669-676: Valuable addition for debugging profiling cache state.

The new print_profiling_cache method provides excellent visibility into the cache contents, which aligns with the PR objective of improving AutoTuner log information.

Address the line length issue:

         logger.debug(
-            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> (runner_id, tactic, shape_profile(ignored))"
+            f"[Autotuner] Cache contents: (custom_op, runner, hash(attributes), shape_profiles) -> "
+            f"(runner_id, tactic, shape_profile(ignored))"
         )
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c65167 and 5af6f15.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/autotuner.py (8 hunks)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tensorrt_llm/_torch/pyexecutor/model_engine.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/autotuner.py

397-397: Line too long (154 > 120)

(E501)


432-432: Line too long (128 > 120)

(E501)


434-434: Line too long (145 > 120)

(E501)


435-435: Line too long (157 > 120)

(E501)


442-442: Line too long (137 > 120)

(E501)


672-672: Line too long (142 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/_torch/autotuner.py (5)

200-206: LGTM! Consistent log message formatting.

The removal of colons from the [Autotuner]: prefixes creates consistent formatting across the codebase.


353-357: Good improvement to reduce log noise.

The change from multiple debug logs to a single warning_once per custom operation is a significant improvement. The condition to only warn when the cache is non-empty prevents false alarms during initial runs.


396-399: Excellent improvement to log level hierarchy.

The two-level logging approach is well-implemented:

  • WARNING level provides actionable guidance without overwhelming users
  • DEBUG level contains technical details for deeper investigation

This addresses the concerns raised in past reviews about not treating expected failures as errors.


428-429: Improved debug log clarity.

The reformatted debug message is cleaner and more readable.


576-576: Improved debug log formatting.

The consistent formatting enhances readability of debug messages.

@tensorrt-cicd
Collaborator

PR_Github #13691 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10283 completed with status: 'SUCCESS'

@hyukn hyukn merged commit 00059de into NVIDIA:main Aug 1, 2025
3 checks passed
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025
* Change the fallback alert from DEBUG to WARNING level and only do it once.
* Add debug information for profiling cache right after the warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level.
* Other trivial refinements and cleanups.

Signed-off-by: Yukun He <[email protected]>
Signed-off-by: Lanyu Liao <[email protected]>
jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025
* Change the fallback alert from DEBUG to WARNING level and only do it once.
* Add debug information for profiling cache right after the warmup phase.
* Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level.
* Other trivial refinements and cleanups.

Signed-off-by: Yukun He <[email protected]>