Skip to content

Commit 89eacb3

Browse files
fix __assert_fail() declaration mismatch error (pytorch#73040) (#920)
Summary: Pull Request resolved: pytorch#73040 This patch fixes a compilation error in PyTorch with ROCm when `NDEBUG` is passed. ## Problem Forward declaration of `__host__ __device__ __assert_fail()` is used in `c10/macros/Macros.h` for HIP compilation when `NDEBUG` is set However, HIP has `__device__ __assert_fail()` in `hip/amd_detail/amd_device_functions.h`, causing a function type error. This issue does not appear in ROCm CI tests since it happens only when `NDEBUG` is passed. ## Solution [EDIT] After the discussion on GitHub, we chose to entirely disable `CUDA_KERNEL_ASSERT()` for ROCm. --- To solve this compilation error, this patch disables `CUDA_KERNEL_ASSERT()`, which uses `__assert_fail()` when 1. `c10/macros/Macros.h` is included for `*.hip` (precisely speaking, `__HIP__` or `__HIP_ARCH__` is defined), and 2. `NDEBUG` is passed. Note that there's no impact on default compilation because, without a special compilation flag, those HIP files are compiled without `-NDEBUG`. And that's why this issue has not been found. ### Justification [1] We cannot declare one host-and-device function for two separate host and device functions. ``` __device__ int func() {return 0}; __host__ int func() {return 0}; // Compile error (hipcc) // __device__ __host__ int func(); ``` [2] Forward declaration of a correct `__device__` only `__assert_fail()` for `__HIP__` causes the following error: ``` pytorch/c10/util/TypeCast.h:135:7: error: reference to __device__ function '__assert_fail' in __host__ __device__ function ERROR_UNSUPPORTED_CAST ^ pytorch/c10/util/TypeCast.h:118:32: note: expanded from macro 'ERROR_UNSUPPORTED_CAST' #define ERROR_UNSUPPORTED_CAST CUDA_KERNEL_ASSERT(false); ^ pytorch/c10/macros/Macros.h:392:5: note: expanded from macro 'CUDA_KERNEL_ASSERT' __assert_fail( ``` [3] Maybe there's a way to properly define `__assert_fail()` for HIP + NDEBUG, but this might be too much. Please let me just disable it. ### Technical details Error ``` pytorch/c10/macros/Macros.h:368:5: error: __host__ __device__ function '__assert_fail' cannot overload __device__ function '__assert_fail' __assert_fail( ^ /opt/rocm/hip/include/hip/amd_detail/amd_device_functions.h:1173:6: note: previous declaration is here void __assert_fail(const char *assertion, ``` CUDA definition (9.x) of `__assert_fail()` ``` #elif defined(__GNUC__) extern __host__ __device__ __cudart_builtin__ void __assert_fail( const char *, const char *, unsigned int, const char *) __THROW; ``` ROCm definition (the latest version) ``` // https://github.com/ROCm-Developer-Tools/hipamd/blob/2b59661f3e51060338e0cd1e488a03b1aefdeb0c/include/hip/amd_detail/amd_device_functions.h#L1172-L1177 extern "C" __device__ __attribute__((noinline)) __attribute__((weak)) void __assert_fail(const char *assertion, const char *file, unsigned int line, const char *function); ``` Test Plan: CI + reproducer ``` python3 tools/amd_build/build_amd.py python3 setup.py develop --cmake-only cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build cmake --build build ``` Reviewed By: xw285cornell Differential Revision: D34310555 fbshipit-source-id: 7542288912590533ced3f20afd2e704b6551991b (cherry picked from commit 9e52196) Co-authored-by: Shintaro Iwasaki <[email protected]>
1 parent 663c718 commit 89eacb3

File tree

3 files changed

+7
-11
lines changed

3 files changed

+7
-11
lines changed

c10/macros/Macros.h

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -322,14 +322,14 @@ constexpr uint32_t CUDA_THREADS_PER_BLOCK_FALLBACK = 256;
322322
// even when NDEBUG is defined. This is useful for important assertions in CUDA
323323
// code that would otherwise be suppressed when building Release.
324324
#if defined(__ANDROID__) || defined(__APPLE__) || defined(__XROS__) || \
325-
(defined(USE_ROCM) && ROCM_VERSION < 40100)
325+
defined(USE_ROCM)
326326
// Those platforms do not support assert()
327327
#define CUDA_KERNEL_ASSERT(cond)
328328
#elif defined(_MSC_VER)
329329
#if defined(NDEBUG)
330330
extern "C" {
331331
C10_IMPORT
332-
#if defined(__CUDA_ARCH__) || defined(__HIP_ARCH__) || defined(__HIP__)
332+
#if defined(__CUDA_ARCH__)
333333
__host__ __device__
334334
#endif // __CUDA_ARCH__
335335
void
@@ -350,8 +350,7 @@ extern SYCL_EXTERNAL void __assert_fail(
350350
unsigned int line,
351351
const char* func);
352352
#else // __SYCL_DEVICE_ONLY__
353-
#if (defined(__CUDA_ARCH__) && !(defined(__clang__) && defined(__CUDA__))) || \
354-
defined(__HIP_ARCH__) || defined(__HIP__)
353+
#if (defined(__CUDA_ARCH__) && !(defined(__clang__) && defined(__CUDA__)))
355354
__host__ __device__
356355
#endif // __CUDA_ARCH__
357356
void

cmake/public/LoadHIP.cmake

Lines changed: 2 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -221,13 +221,8 @@ if(HIP_FOUND)
221221
find_package_and_print_version(hipcub REQUIRED)
222222
find_package_and_print_version(rocthrust REQUIRED)
223223

224-
if(ROCM_VERSION_DEV VERSION_GREATER_EQUAL "4.1.0")
225-
message("ROCm version >= 4.1; enabling asserts")
226-
else()
227-
# Disable Asserts In Code (Can't use asserts on HIP stack.)
228-
add_definitions(-DNDEBUG)
229-
message("ROCm version < 4.1; disablng asserts")
230-
endif()
224+
# Disable Asserts In Code (Can't use asserts on HIP stack.)
225+
add_definitions(-DNDEBUG)
231226

232227
if(HIP_COMPILER STREQUAL clang)
233228
set(hip_library_name amdhip64)

test/test_cuda.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1988,6 +1988,8 @@ def worker(rank):
19881988
t2.start()
19891989
"""])
19901990

1991+
# ROCm doesn't support device side asserts
1992+
@skipIfRocm
19911993
def test_fixed_cuda_assert_async(self):
19921994
with self.assertRaisesRegex(RuntimeError, "Boolean value of Tensor with no values is ambiguous"):
19931995
torch._assert_async(torch.tensor([], device="cuda"))

0 commit comments

Comments
 (0)