[https://nvbugs/5450855][fix] Cherry pick #6700 and #6702 from main #6808
Conversation
Signed-off-by: Yiqing Yan <[email protected]>
Signed-off-by: Yanchao Lu <[email protected]>
Co-authored-by: Yanchao Lu <[email protected]>
Walkthrough

Adds guarded CUDA import paths preferring cuda.bindings with legacy fallbacks across core and test modules, constrains cuda-python to >=12,<13, and updates _ipc_utils._raise_if_error to handle both cudart runtime and driver errors with a unified helper and adjusted type signature.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Module
    participant ImportSystem as Python Import System
    Module->>ImportSystem: import cuda.bindings.runtime as cudart
    alt bindings available
        ImportSystem-->>Module: cudart (bindings)
    else ImportError
        Module->>ImportSystem: from cuda import cudart
        ImportSystem-->>Module: cudart (legacy)
    end
```

```mermaid
sequenceDiagram
    participant Caller
    participant Utils as _raise_if_error
    participant CUDA as CUDA Enums
    Caller->>Utils: _raise_if_error(error)
    alt error is cudart.cudaError_t
        Utils->>CUDA: compare to cudart.cudaSuccess
        alt not success
            Utils-->>Caller: raise RuntimeError("CUDA Runtime API error")
        else success
            Utils-->>Caller: return
        end
    else error is cuda.CUresult
        Utils->>CUDA: compare to CUDA_SUCCESS
        alt not success
            Utils-->>Caller: raise RuntimeError("CUDA Driver API error")
        else success
            Utils-->>Caller: return
        end
    end
```
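The fallback described in the first diagram can be exercised without a GPU. The helper below is a sketch, not the code from the PR (which writes the try/except inline); `load_first_module` is a hypothetical name, and the stdlib `json` module stands in for the CUDA bindings so the snippet runs on a machine without cuda-python installed:

```python
import importlib


def load_first_module(*names):
    """Return the first importable module from names, mirroring the
    try/except ImportError fallback shown in the diagram."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {names!r} could be imported")


# The real code tries "cuda.bindings.runtime" first, then "cuda.cudart";
# here a missing name falls through to the stdlib json module instead.
mod = load_first_module("cuda.bindings.runtime_missing", "json")
print(mod.__name__)
```

The same shape generalizes to the driver (`cuda.bindings.driver`) and NVRTC paths touched by this PR.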
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs
Actionable comments posted: 3
🔭 Outside diff range comments (2)
tensorrt_llm/runtime/multimodal_model_runner.py (1)

Lines 1-1: Add NVIDIA SPDX header to comply with repository policy

This source file appears to be missing the NVIDIA copyright header required for non-test Python files.
Apply at the top of the file:
```diff
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
```

tensorrt_llm/_ipc_utils.py (1)

Lines 31-38: Fix Python 3.8 incompatibility and handle raw int error codes in _raise_if_error.
- Using the PEP 604 union (cudart.cudaError_t | cuda.CUresult) requires Python 3.10+, but our standard is Python 3.8+. This will fail to parse on 3.8/3.9.
- The function currently no-ops for raw int error codes (returned by some cuda-python versions), silently suppressing errors.
Update the annotation to typing.Union and add a fallback path for int codes.
Apply this diff:
```diff
-def _raise_if_error(error: cudart.cudaError_t | cuda.CUresult):
-    if isinstance(error, cudart.cudaError_t):
-        if error != cudart.cudaError_t.cudaSuccess:
-            raise RuntimeError(f"CUDA Runtime API error: {repr(error)}")
-    if isinstance(error, cuda.CUresult):
-        if error != cuda.CUresult.CUDA_SUCCESS:
-            raise RuntimeError(f"CUDA Driver API error: {repr(error)}")
+def _raise_if_error(error: Union[cudart.cudaError_t, cuda.CUresult, int]) -> None:
+    if isinstance(error, cudart.cudaError_t):
+        if error != cudart.cudaError_t.cudaSuccess:
+            raise RuntimeError(f"CUDA Runtime API error: {error!r}")
+    elif isinstance(error, cuda.CUresult):
+        if error != cuda.CUresult.CUDA_SUCCESS:
+            raise RuntimeError(f"CUDA Driver API error: {error!r}")
+    elif isinstance(error, int):
+        if error != 0:
+            raise RuntimeError(f"Non-zero CUDA error code: {error}")
+    else:
+        raise TypeError(f"Unsupported CUDA error type: {type(error)!r}")
```

Additionally, update the typing import to include Union:

```python
from typing import List, Tuple, Union
```
🧹 Nitpick comments (4)
tensorrt_llm/_torch/pyexecutor/py_executor.py (2)

Lines 19-22: Remove the no-op try/except; it does nothing and adds confusion

This try/except doesn't guard any import and has no side effects. It should be removed to avoid misleading future readers and failing lint rules (e.g., W0702/E722-style policies).

Apply this diff to remove the no-op block:

```diff
-try:
-    pass
-except ImportError:
-    pass
```
Lines 19-22: File is missing the required NVIDIA copyright header

Per repo guidelines, all .py files should include the NVIDIA copyright header with the current year. Please add it at the top of the file.

Add this header at the top of the file (outside the shown range):

```python
# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```

tensorrt_llm/auto_parallel/cluster_info.py (1)
Lines 9-12: File is missing the required NVIDIA copyright header

This module lacks the repository-standard NVIDIA copyright header with the current year.

Add this header at the top of the file (outside the shown range):

```python
# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```

tensorrt_llm/_ipc_utils.py (1)
Lines 20-24: Guarded CUDA imports look good; consider a tighter exception and minimal diagnostics

Catching ImportError is fine. Optionally:
- Catch ModuleNotFoundError specifically to avoid swallowing unrelated ImportErrors from within cuda.bindings.
- Emit a debug log indicating which CUDA import path was selected (helps diagnose mixed environments). This would require moving the logger import above this try/except.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- cpp/kernels/fmha_v2/fmha_test.py (1 hunks)
- requirements.txt (1 hunks)
- tensorrt_llm/_ipc_utils.py (1 hunks)
- tensorrt_llm/_mnnvl_utils.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
- tensorrt_llm/auto_parallel/cluster_info.py (1 hunks)
- tensorrt_llm/runtime/generation.py (1 hunks)
- tensorrt_llm/runtime/multimodal_model_runner.py (1 hunks)
- tests/integration/defs/sysinfo/get_sysinfo.py (1 hunks)
- tests/microbenchmarks/all_reduce.py (1 hunks)
- tests/microbenchmarks/build_time_benchmark.py (1 hunks)
- tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1 hunks)
- tests/unittest/trt/functional/test_allreduce_norm.py (1 hunks)
- tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1 hunks)
- tests/unittest/trt/functional/test_nccl.py (1 hunks)
- tests/unittest/trt/functional/test_pp_reduce_scatter.py (1 hunks)
- tests/unittest/utils/util.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/unittest/trt/functional/test_pp_reduce_scatter.py
tensorrt_llm/_mnnvl_utils.py
cpp/kernels/fmha_v2/fmha_test.py
tensorrt_llm/auto_parallel/cluster_info.py
tensorrt_llm/runtime/generation.py
tests/unittest/trt/functional/test_nccl.py
tests/unittest/trt/functional/test_allreduce_norm.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/integration/defs/sysinfo/get_sysinfo.py
tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
tests/unittest/utils/util.py
tests/microbenchmarks/build_time_benchmark.py
tensorrt_llm/runtime/multimodal_model_runner.py
tests/microbenchmarks/all_reduce.py
tensorrt_llm/_ipc_utils.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/unittest/trt/functional/test_pp_reduce_scatter.py
tensorrt_llm/_mnnvl_utils.py
cpp/kernels/fmha_v2/fmha_test.py
tensorrt_llm/auto_parallel/cluster_info.py
tensorrt_llm/runtime/generation.py
tests/unittest/trt/functional/test_nccl.py
tests/unittest/trt/functional/test_allreduce_norm.py
tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/integration/defs/sysinfo/get_sysinfo.py
tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
tests/unittest/utils/util.py
tests/microbenchmarks/build_time_benchmark.py
tensorrt_llm/runtime/multimodal_model_runner.py
tests/microbenchmarks/all_reduce.py
tensorrt_llm/_ipc_utils.py
🧠 Learnings (2)
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
Applied to files:
tests/unittest/utils/util.py
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/unittest/utils/util.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (12)
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1)

Lines 24-27: LGTM: Guarded CUDA runtime import

The guarded import pattern (prefer `cuda.bindings.runtime`, fall back to legacy `cuda.cudart`) looks correct and consistent with the PR's goals.

tests/microbenchmarks/build_time_benchmark.py (1)

Lines 11-14: LGTM: Guarded cudart import

The fallback from `cuda.bindings.runtime` to `cuda.cudart` is appropriate. This file does not use driver APIs, so no additional guarded driver import is necessary here.

tests/unittest/trt/functional/test_pp_reduce_scatter.py (1)

Lines 24-27: LGTM: Guarded CUDA runtime import

The import guard improves portability across CUDA Python layouts without changing test logic.

tensorrt_llm/runtime/generation.py (1)

Lines 32-35: LGTM: Guarded CUDA runtime import

Switching to a guarded `cudart` import is appropriate and consistent with broader changes. No behavioral changes introduced.

tensorrt_llm/auto_parallel/cluster_info.py (1)

Lines 9-12: LGTM: Guarded cudart import

The new import guard for `cudart` is consistent with the rest of the PR and enables compatibility across CUDA Python layouts.

tensorrt_llm/runtime/multimodal_model_runner.py (1)

Lines 16-21: LGTM: Guarded cudart import improves portability

The try/except fallback to legacy `cuda.cudart` is correct and scoped to `ImportError`. No further changes needed.

tests/unittest/trt/functional/test_nccl.py (1)

Lines 24-27: LGTM: Guarded cudart import

The fallback strategy to `from cuda import cudart` ensures compatibility across environments.

tensorrt_llm/_mnnvl_utils.py (1)

Lines 22-25: LGTM: Guarded CUDA driver import

The `cuda.bindings.driver` import with fallback to `cuda` is correct and localized to `ImportError`.

tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1)

Lines 37-40: LGTM: Guarded CUDA driver import inside worker

Localizing the guarded import within `run_single_rank` is appropriate for multi-process execution.

cpp/kernels/fmha_v2/fmha_test.py (1)

Lines 5-9: LGTM: Guarded CUDA driver and NVRTC imports

This mirrors the pattern used elsewhere and keeps the existing ASSERT helpers working across layouts.

tests/unittest/utils/util.py (1)

Lines 11-16: LGTM: Guarded CUDA driver and NVRTC imports

Matches the repo-wide pattern and keeps `ASSERT_DRV` robust.

tests/unittest/trt/functional/test_allreduce_norm.py (1)

Lines 24-27: LGTM: Guarded cudart import

The fallback to legacy `cuda.cudart` ensures test portability across CUDA Python layouts.
Force-pushed from bf46ac8 to fe345cd, then from fe345cd to 9a7e8be.
Signed-off-by: Yanchao Lu <[email protected]>
/bot run
Actionable comments posted: 0
🔭 Outside diff range comments (2)
tests/integration/defs/sysinfo/get_sysinfo.py (2)

Lines 71-91: Respect the device_id parameter in get_compute_capability

The function ignores the provided device_id and always queries device 0.

```diff
 def get_compute_capability(device_id=0):
     # Init
     err, = cuda.cuInit(0)
     ASSERT_DRV(err)
     # Device
-    err, cuDevice = cuda.cuDeviceGet(0)
+    err, cuDevice = cuda.cuDeviceGet(device_id)
     ASSERT_DRV(err)
```
Lines 169-190: Guard against uninitialized variables (chip, pci_device_id) when deriving GPU name

chip and pci_device_id may be referenced before assignment if PCI info is None or the mapping misses the entity, leading to UnboundLocalError.

```diff
     full_name = pynvml.nvmlDeviceGetName(handle)
     if "Graphics Device" in full_name:
         # Use chip name to get the GPU name
+        chip = None
+        pci_device_id = None
         pci_info = pynvml.nvmlDeviceGetPciInfo(handle)
         if pci_info is None:
             logger.warning(
                 "Failed to get PCI information. PCI parameters and chip name won't be reported"
             )
         else:
             pci_device_id = hex(pci_info.pciDeviceId)
-            if pci_device_id is not None and chip_name_mapping:
-                entity = str(pci_device_id).lower()
-                if entity in chip_name_mapping:
-                    chip = str(chip_name_mapping[entity]).lower()
-                else:
-                    err_msg = r"Could not find a chip name associated with this device id - {}. Chip name won't be reported".format(
-                        entity)
-                    logger.warning(err_msg)
-            full_name = "{} Bring-up Board".format(chip.upper())
+            if pci_device_id and chip_name_mapping:
+                entity = str(pci_device_id).lower()
+                chip = str(chip_name_mapping.get(entity, "")).lower()
+                if not chip:
+                    logger.warning(
+                        r"Could not find a chip name associated with this device id - {}. Chip name won't be reported".format(
+                            entity))
+            if chip:
+                full_name = f"{chip.upper()} Bring-up Board"
```
♻️ Duplicate comments (1)
tests/integration/defs/sysinfo/get_sysinfo.py (1)

Lines 28-31: Import nvrtc to match ASSERT_DRV usage; also narrow except to ModuleNotFoundError

ASSERT_DRV references nvrtc.nvrtcResult, but nvrtc isn't imported. This will raise NameError on the NVRTC path. Prior feedback already flagged this; applying here with a precise exception type.

```diff
-try:
-    from cuda.bindings import driver as cuda
-except ImportError:
-    from cuda import cuda
+try:
+    from cuda.bindings import driver as cuda
+    from cuda.bindings import nvrtc
+except ModuleNotFoundError:
+    from cuda import cuda, nvrtc
```
🧹 Nitpick comments (2)
tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (2)

Lines 24-27: Narrow the import exception to ModuleNotFoundError to avoid masking unrelated ImportErrors

Catching ImportError can hide real errors thrown inside cuda.bindings.runtime; ModuleNotFoundError limits the scope to missing-module cases, per our guidelines on precise exception handling.

```diff
-try:
-    from cuda.bindings import runtime as cudart
-except ImportError:
-    from cuda import cudart
+try:
+    from cuda.bindings import runtime as cudart
+except ModuleNotFoundError:
+    from cuda import cudart
```
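The distinction this comment relies on is that `ModuleNotFoundError` is the `ImportError` subclass raised only when the module itself is absent; an `ImportError` raised from code inside an installed package escapes the narrower except instead of being masked. A quick self-contained check (the module name below is deliberately nonexistent):

```python
# ModuleNotFoundError is a strict subclass of ImportError.
assert issubclass(ModuleNotFoundError, ImportError)

# A genuinely missing module raises ModuleNotFoundError, carrying the
# missing module's name on the exception.
try:
    import definitely_not_installed_xyz  # hypothetical missing package
except ModuleNotFoundError as exc:
    missing = exc.name

# So the narrowed handler still catches the fallback case, while a plain
# ImportError raised from inside cuda.bindings would propagate.
print(missing)
```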
Lines 1-1: Update header year to include 2025 per coding guidelines

Guidelines require the NVIDIA copyright header to include the current year.

```diff
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- cpp/kernels/fmha_v2/fmha_test.py (1 hunks)
- tensorrt_llm/_ipc_utils.py (1 hunks)
- tensorrt_llm/_mnnvl_utils.py (1 hunks)
- tensorrt_llm/auto_parallel/cluster_info.py (1 hunks)
- tensorrt_llm/runtime/generation.py (1 hunks)
- tensorrt_llm/runtime/multimodal_model_runner.py (1 hunks)
- tests/integration/defs/sysinfo/get_sysinfo.py (1 hunks)
- tests/microbenchmarks/all_reduce.py (1 hunks)
- tests/microbenchmarks/build_time_benchmark.py (1 hunks)
- tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py (1 hunks)
- tests/unittest/trt/functional/test_allreduce_norm.py (1 hunks)
- tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py (1 hunks)
- tests/unittest/trt/functional/test_nccl.py (1 hunks)
- tests/unittest/trt/functional/test_pp_reduce_scatter.py (1 hunks)
- tests/unittest/utils/util.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
- tests/unittest/trt/functional/test_pp_reduce_scatter.py
- tests/microbenchmarks/build_time_benchmark.py
- tensorrt_llm/runtime/multimodal_model_runner.py
- tests/microbenchmarks/all_reduce.py
- tensorrt_llm/runtime/generation.py
- tests/unittest/_torch/multi_gpu/test_lowprecision_allreduce.py
- tensorrt_llm/_mnnvl_utils.py
- tensorrt_llm/auto_parallel/cluster_info.py
- cpp/kernels/fmha_v2/fmha_test.py
- tensorrt_llm/_ipc_utils.py
- tests/unittest/trt/functional/test_nccl.py
- tests/unittest/trt/functional/test_allreduce_norm.py
- tests/unittest/utils/util.py
🧰 Additional context used
📓 Path-based instructions (2): **/*.py and **/*.{cpp,h,hpp,cc,cxx,cu,py}, per CodeRabbit Inference Engine (CODING_GUIDELINES.md); the same guidelines are quoted in full in the first review above.
Files:
- tests/integration/defs/sysinfo/get_sysinfo.py
- tests/unittest/trt/functional/test_allreduce_prepost_residual_norm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
PR_Github #14890 [ run ] triggered by Bot
PR_Github #14890 [ run ] completed with state
…from main (NVIDIA#6808)
Signed-off-by: Yiqing Yan <[email protected]>
Signed-off-by: Yanchao Lu <[email protected]>
Co-authored-by: Yiqing Yan <[email protected]>
Summary by CodeRabbit
Bug Fixes
Chores
Tests