
Conversation

@chzblych (Collaborator) commented Aug 12, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved CUDA compatibility with guarded imports for runtime/driver bindings across modules.
    • Clearer error handling for CUDA runtime and driver failures.
  • Chores

    • Pinned cuda-python to 12.x for consistent environment support.
  • Tests

    • Updated test utilities and microbenchmarks to use robust CUDA import paths.
    • Ensured test suites work across environments with different CUDA binding layouts.

Signed-off-by: Yiqing Yan <[email protected]>
Signed-off-by: Yanchao Lu <[email protected]>
Co-authored-by: Yanchao Lu <[email protected]>
@chzblych chzblych requested review from a team as code owners August 12, 2025 02:23
@chzblych chzblych requested a review from lfr-0531 August 12, 2025 02:23
coderabbitai bot (Contributor) commented Aug 12, 2025

Walkthrough

Adds guarded CUDA import paths preferring cuda.bindings with legacy fallbacks across core and test modules, constrains cuda-python to >=12,<13, and updates _ipc_utils._raise_if_error to handle both cudart runtime and driver errors with a unified helper and an adjusted type signature.

Changes

Cohort / File(s) — Summary

  • CUDA import guard (core)
    tensorrt_llm/_mnnvl_utils.py, tensorrt_llm/auto_parallel/cluster_info.py, tensorrt_llm/runtime/generation.py, tensorrt_llm/runtime/multimodal_model_runner.py
    Switch to try/except imports preferring `cuda.bindings.(driver|runtime)` with fallbacks to the legacy `cuda` imports.
  • CUDA import guard (tests)
    tests/integration/defs/sysinfo/get_sysinfo.py, tests/microbenchmarks/*, tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py, tests/unittest/trt/functional/*, tests/unittest/utils/util.py
    Apply the same guarded import pattern for driver/runtime/nvrtc across test utilities, integration, and benchmarks.
  • CUDA import guard (cpp kernel test)
    cpp/kernels/fmha_v2/fmha_test.py
    Guarded import for cuda and nvrtc via cuda.bindings with legacy fallback.
  • IPC error handling update
    tensorrt_llm/_ipc_utils.py
    Dual-path import for driver/runtime; _raise_if_error now accepts `cudart.cudaError_t | cuda.CUresult` and raises distinct errors for runtime vs. driver failures.
  • Dependency constraint
    requirements.txt
    Constrain cuda-python to >=12,<13; retain other deps unchanged.

Sequence Diagram(s)

sequenceDiagram
  participant Module
  participant ImportSystem as Python Import System
  Module->>ImportSystem: import cuda.bindings.runtime as cudart
  alt bindings available
    ImportSystem-->>Module: cudart (bindings)
  else ImportError
    Module->>ImportSystem: from cuda import cudart
    ImportSystem-->>Module: cudart (legacy)
  end
sequenceDiagram
  participant Caller
  participant Utils as _raise_if_error
  participant CUDA as CUDA Enums
  Caller->>Utils: _raise_if_error(error)
  alt error is cudart.cudaError_t
    Utils->>CUDA: compare to cudart.cudaSuccess
    alt not success
      Utils-->>Caller: raise RuntimeError("CUDA Runtime API error")
    else success
      Utils-->>Caller: return
    end
  else error is cuda.CUresult
    Utils->>CUDA: compare to CUDA_SUCCESS
    alt not success
      Utils-->>Caller: raise RuntimeError("CUDA Driver API error")
    else success
      Utils-->>Caller: return
    end
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested reviewers

  • schetlur-nv
  • dcampora
  • chuangz0

Poem

I twitch my whiskers at imports anew,
Hop from bindings to legacy, clean and true.
Errors corralled, both runtime and drive,
Versioned carrots keep the garden alive.
With gentle thumps, I test and I run—
CUDA paths aligned, my sprint is done. 🥕🐇


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (2)
tensorrt_llm/runtime/multimodal_model_runner.py (1)

1-1: Add NVIDIA SPDX header to comply with repository policy

This source file appears to be missing the NVIDIA copyright header required for non-test Python files.

Apply at the top of the file:

+ # SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
tensorrt_llm/_ipc_utils.py (1)

31-38: Fix Python 3.8 incompatibility and handle raw int error codes in _raise_if_error.

  • Using the PEP 604 union (cudart.cudaError_t | cuda.CUresult) requires Python 3.10+, but our standard is Python 3.8+. This will fail to parse on 3.8/3.9.
  • The function currently no-ops for raw int error codes (returned by some cuda-python versions), silently suppressing errors.

Update the annotation to typing.Union and add a fallback path for int codes.

Apply this diff:

-def _raise_if_error(error: cudart.cudaError_t | cuda.CUresult):
-    if isinstance(error, cudart.cudaError_t):
-        if error != cudart.cudaError_t.cudaSuccess:
-            raise RuntimeError(f"CUDA Runtime API error: {repr(error)}")
-    if isinstance(error, cuda.CUresult):
-        if error != cuda.CUresult.CUDA_SUCCESS:
-            raise RuntimeError(f"CUDA Driver API error: {repr(error)}")
+def _raise_if_error(error: Union[cudart.cudaError_t, cuda.CUresult, int]) -> None:
+    if isinstance(error, cudart.cudaError_t):
+        if error != cudart.cudaError_t.cudaSuccess:
+            raise RuntimeError(f"CUDA Runtime API error: {error!r}")
+    elif isinstance(error, cuda.CUresult):
+        if error != cuda.CUresult.CUDA_SUCCESS:
+            raise RuntimeError(f"CUDA Driver API error: {error!r}")
+    elif isinstance(error, int):
+        if error != 0:
+            raise RuntimeError(f"Non-zero CUDA error code: {error}")
+    else:
+        raise TypeError(f"Unsupported CUDA error type: {type(error)!r}")

Additionally, update the typing import to include Union:

from typing import List, Tuple, Union
🧹 Nitpick comments (4)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)

19-22: Remove the no-op try/except — it does nothing and adds confusion

This try/except doesn’t guard any import and has no side effects. It should be removed to avoid misleading future readers and failing lint rules (e.g., W0702/E722-style policies).

Apply this diff to remove the no-op block:

-try:
-    pass
-except ImportError:
-    pass

19-22: File is missing the required NVIDIA copyright header

Per repo guidelines, all .py files should include the NVIDIA copyright header with the current year. Please add it at the top of the file.

Add this header at the top of the file (outside the shown range):

# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
tensorrt_llm/auto_parallel/cluster_info.py (1)

9-12: File is missing the required NVIDIA copyright header

This module lacks the repository-standard NVIDIA copyright header with the current year.

Add this header at the top of the file (outside the shown range):

# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
tensorrt_llm/_ipc_utils.py (1)

20-24: Guarded CUDA imports look good; consider a tighter exception and minimal diagnostics.

Catching ImportError is fine. Optionally:

  • Catch ModuleNotFoundError specifically to avoid swallowing unrelated ImportErrors from within cuda.bindings.
  • Emit a debug log indicating which CUDA import path was selected (helps diagnose mixed environments). This would require moving the logger import above this try/except.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 751d5f1 and bf46ac8.

📒 Files selected for processing (17)
  • cpp/kernels/fmha_v2/fmha_test.py (1 hunks)
  • requirements.txt (1 hunks)
  • tensorrt_llm/_ipc_utils.py (1 hunks)
  • tensorrt_llm/_mnnvl_utils.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
  • tensorrt_llm/auto_parallel/cluster_info.py (1 hunks)
  • tensorrt_llm/runtime/generation.py (1 hunks)
  • tensorrt_llm/runtime/multimodal_model_runner.py (1 hunks)
  • tests/integration/defs/sysinfo/get_sysinfo.py (1 hunks)
  • tests/microbenchmarks/all_reduce.py (1 hunks)
  • tests/microbenchmarks/build_time_benchmark.py (1 hunks)
  • tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1 hunks)
  • tests/unittest/trt/functional/test_allreduce_norm.py (1 hunks)
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1 hunks)
  • tests/unittest/trt/functional/test_nccl.py (1 hunks)
  • tests/unittest/trt/functional/test_pp_reduce_scatter.py (1 hunks)
  • tests/unittest/utils/util.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/unittest/trt/functional/test_pp_reduce_scatter.py
  • tensorrt_llm/_mnnvl_utils.py
  • cpp/kernels/fmha_v2/fmha_test.py
  • tensorrt_llm/auto_parallel/cluster_info.py
  • tensorrt_llm/runtime/generation.py
  • tests/unittest/trt/functional/test_nccl.py
  • tests/unittest/trt/functional/test_allreduce_norm.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/integration/defs/sysinfo/get_sysinfo.py
  • tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
  • tests/unittest/utils/util.py
  • tests/microbenchmarks/build_time_benchmark.py
  • tensorrt_llm/runtime/multimodal_model_runner.py
  • tests/microbenchmarks/all_reduce.py
  • tensorrt_llm/_ipc_utils.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/unittest/trt/functional/test_pp_reduce_scatter.py
  • tensorrt_llm/_mnnvl_utils.py
  • cpp/kernels/fmha_v2/fmha_test.py
  • tensorrt_llm/auto_parallel/cluster_info.py
  • tensorrt_llm/runtime/generation.py
  • tests/unittest/trt/functional/test_nccl.py
  • tests/unittest/trt/functional/test_allreduce_norm.py
  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • tests/integration/defs/sysinfo/get_sysinfo.py
  • tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
  • tests/unittest/utils/util.py
  • tests/microbenchmarks/build_time_benchmark.py
  • tensorrt_llm/runtime/multimodal_model_runner.py
  • tests/microbenchmarks/all_reduce.py
  • tensorrt_llm/_ipc_utils.py
🧠 Learnings (2)
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/unittest/utils/util.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/unittest/utils/util.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (12)
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1)

24-27: LGTM: Guarded CUDA runtime import

The guarded import pattern (prefer cuda.bindings.runtime, fallback to legacy cuda.cudart) looks correct and consistent with the PR’s goals.

tests/microbenchmarks/build_time_benchmark.py (1)

11-14: LGTM: Guarded cudart import

The fallback from cuda.bindings.runtime to cuda.cudart is appropriate. This file does not use driver APIs, so no additional guarded driver import is necessary here.

tests/unittest/trt/functional/test_pp_reduce_scatter.py (1)

24-27: LGTM: Guarded CUDA runtime import

The import guard improves portability across CUDA Python layouts without changing test logic.

tensorrt_llm/runtime/generation.py (1)

32-35: LGTM: Guarded CUDA runtime import

Switching to a guarded cudart import is appropriate and consistent with broader changes. No behavioral changes introduced.

tensorrt_llm/auto_parallel/cluster_info.py (1)

9-12: LGTM: Guarded cudart import

New import guard for cudart is consistent with the rest of the PR and enables compatibility across CUDA Python layouts.

tensorrt_llm/runtime/multimodal_model_runner.py (1)

16-21: Guarded cudart import improves portability — LGTM

The try/except fallback to legacy cuda.cudart is correct and scoped to ImportError. No further changes needed.

tests/unittest/trt/functional/test_nccl.py (1)

24-27: Guarded cudart import — LGTM

The fallback strategy to from cuda import cudart ensures compatibility across environments.

tensorrt_llm/_mnnvl_utils.py (1)

22-25: Guarded CUDA driver import — LGTM

The cuda.bindings.driver import with fallback to cuda is correct and localized to ImportError.

tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1)

37-40: Guarded CUDA driver import inside worker — LGTM

Localizing the guarded import within run_single_rank is appropriate for multi-process execution.

cpp/kernels/fmha_v2/fmha_test.py (1)

5-9: Guarded CUDA driver and NVRTC imports — LGTM

This mirrors the pattern used elsewhere and keeps the existing ASSERT helpers working across layouts.

tests/unittest/utils/util.py (1)

11-16: Guarded CUDA driver and NVRTC imports — LGTM

Matches the repo-wide pattern and keeps ASSERT_DRV robust.

tests/unittest/trt/functional/test_allreduce_norm.py (1)

24-27: Guarded cudart import — LGTM

The fallback to legacy cuda.cudart ensures test portability across CUDA Python layouts.

@chzblych chzblych force-pushed the cherry-pick-from-main branch from bf46ac8 to fe345cd Compare August 12, 2025 02:56
@chzblych chzblych force-pushed the cherry-pick-from-main branch from fe345cd to 9a7e8be Compare August 12, 2025 03:04
@chzblych (Collaborator, Author) commented:

/bot run

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (2)
tests/integration/defs/sysinfo/get_sysinfo.py (2)

71-91: Respect the device_id parameter in get_compute_capability.

The function ignores the provided device_id and always queries device 0.

 def get_compute_capability(device_id=0):
     # Init
     err, = cuda.cuInit(0)
     ASSERT_DRV(err)
 
     # Device
-    err, cuDevice = cuda.cuDeviceGet(0)
+    err, cuDevice = cuda.cuDeviceGet(device_id)
     ASSERT_DRV(err)

169-190: Guard against uninitialized variables (chip, pci_device_id) when deriving GPU name.

chip and pci_device_id may be referenced before assignment if PCI info is None or mapping misses the entity, leading to UnboundLocalError.

-        full_name = pynvml.nvmlDeviceGetName(handle)
+        full_name = pynvml.nvmlDeviceGetName(handle)
         if "Graphics Device" in full_name:
-            # Use chip name to get the GPU name
+            # Use chip name to get the GPU name
+            chip = None
+            pci_device_id = None
             pci_info = pynvml.nvmlDeviceGetPciInfo(handle)
             if pci_info is None:
                 logger.warning(
                     "Failed to get PCI information. PCI parameters and chip name won't be reported"
                 )
             else:
-                pci_device_id = hex(pci_info.pciDeviceId)
-            if pci_device_id is not None and chip_name_mapping:
-                entity = str(pci_device_id).lower()
-                if entity in chip_name_mapping:
-                    chip = str(chip_name_mapping[entity]).lower()
-                else:
-                    err_msg = r"Could not find a chip name associated with this device id - {}. Chip name won't be reported".format(
-                        entity)
-                    logger.warning(err_msg)
-            full_name = "{} Bring-up Board".format(chip.upper())
+                pci_device_id = hex(pci_info.pciDeviceId)
+            if pci_device_id and chip_name_mapping:
+                entity = str(pci_device_id).lower()
+                chip = str(chip_name_mapping.get(entity, "")).lower()
+                if not chip:
+                    logger.warning(
+                        r"Could not find a chip name associated with this device id - {}. Chip name won't be reported".format(
+                            entity))
+            if chip:
+                full_name = f"{chip.upper()} Bring-up Board"
♻️ Duplicate comments (1)
tests/integration/defs/sysinfo/get_sysinfo.py (1)

28-31: Import nvrtc to match ASSERT_DRV usage; also narrow except to ModuleNotFoundError.

ASSERT_DRV references nvrtc.nvrtcResult, but nvrtc isn’t imported. This will raise NameError on the NVRTC path. Prior feedback already flagged this; applying here with a precise exception type.

-try:
-    from cuda.bindings import driver as cuda
-except ImportError:
-    from cuda import cuda
+try:
+    from cuda.bindings import driver as cuda
+    from cuda.bindings import nvrtc
+except ModuleNotFoundError:
+    from cuda import cuda, nvrtc
🧹 Nitpick comments (2)
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (2)

24-27: Narrow the import exception to ModuleNotFoundError to avoid masking unrelated ImportErrors.

Catching ImportError can hide real errors thrown inside cuda.bindings.runtime; ModuleNotFoundError limits the scope to missing-module cases, per our guidelines on precise exception handling.

-try:
-    from cuda.bindings import runtime as cudart
-except ImportError:
-    from cuda import cudart
+try:
+    from cuda.bindings import runtime as cudart
+except ModuleNotFoundError:
+    from cuda import cudart

1-1: Update header year to include 2025 per coding guidelines.

Guidelines require the NVIDIA copyright header to include the current year.

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe345cd and 9a7e8be.

📒 Files selected for processing (15)
  • cpp/kernels/fmha_v2/fmha_test.py (1 hunks)
  • tensorrt_llm/_ipc_utils.py (1 hunks)
  • tensorrt_llm/_mnnvl_utils.py (1 hunks)
  • tensorrt_llm/auto_parallel/cluster_info.py (1 hunks)
  • tensorrt_llm/runtime/generation.py (1 hunks)
  • tensorrt_llm/runtime/multimodal_model_runner.py (1 hunks)
  • tests/integration/defs/sysinfo/get_sysinfo.py (1 hunks)
  • tests/microbenchmarks/all_reduce.py (1 hunks)
  • tests/microbenchmarks/build_time_benchmark.py (1 hunks)
  • tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1 hunks)
  • tests/unittest/trt/functional/test_allreduce_norm.py (1 hunks)
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1 hunks)
  • tests/unittest/trt/functional/test_nccl.py (1 hunks)
  • tests/unittest/trt/functional/test_pp_reduce_scatter.py (1 hunks)
  • tests/unittest/utils/util.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
  • tests/unittest/trt/functional/test_pp_reduce_scatter.py
  • tests/microbenchmarks/build_time_benchmark.py
  • tensorrt_llm/runtime/multimodal_model_runner.py
  • tests/microbenchmarks/all_reduce.py
  • tensorrt_llm/runtime/generation.py
  • tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
  • tensorrt_llm/_mnnvl_utils.py
  • tensorrt_llm/auto_parallel/cluster_info.py
  • cpp/kernels/fmha_v2/fmha_test.py
  • tensorrt_llm/_ipc_utils.py
  • tests/unittest/trt/functional/test_nccl.py
  • tests/unittest/trt/functional/test_allreduce_norm.py
  • tests/unittest/utils/util.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

(Same **/*.py guidelines as listed in the previous review.)

Files:

  • tests/integration/defs/sysinfo/get_sysinfo.py
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/sysinfo/get_sysinfo.py
  • tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

@tensorrt-cicd (Collaborator) commented:

PR_Github #14890 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:

PR_Github #14890 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #268 completed with status: 'SUCCESS'

@chzblych chzblych changed the title [None][fix] Cherry pick #6700 and #6702 from main [https://nvbugs/5450855][fix] Cherry pick #6700 and #6702 from main Aug 12, 2025
@chzblych chzblych merged commit b4b1185 into NVIDIA:release/0.21 Aug 12, 2025
3 checks passed
@chzblych chzblych deleted the cherry-pick-from-main branch August 12, 2025 12:11
brb-nv pushed a commit to brb-nv/TensorRT-LLM that referenced this pull request Aug 22, 2025
…from main (NVIDIA#6808)

Signed-off-by: Yiqing Yan <[email protected]>
Signed-off-by: Yanchao Lu <[email protected]>
Co-authored-by: Yiqing Yan <[email protected]>