From 600383219291dc5ad6ffe6129798bc4dc143aca9 Mon Sep 17 00:00:00 2001
From: Nicolas Hug
Date: Thu, 27 May 2021 01:41:48 -0700
Subject: [PATCH] Fix DeformConvTester::test_backward_cuda

Summary:
`test_backward_cuda_contiguous` and `test_backward_cuda_non_contiguous` have been failing on fbcode for a while with the error `too many resources requested for launch`, which suggests that too many threads per block are requested.

This issue was already causing problems in the original PR https://github.com/pytorch/vision/pull/2791#issuecomment-711268155, where the author decided that CC >= 6 was a good threshold because GPUs with CC >= 6 have more registers. (CC = Compute Capability)

However, I'm not certain that this is actually true: if we look at https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, it's clear that 6.2 has fewer registers per thread block than 6.0. So I'm not sure this threshold completely makes sense.

Moreover, note that the current tests (as of `master`):

- **pass** on OSS linux CI, which relied on a P4 GPU (up to last week), i.e. **CC = 6.1**
- **pass** on OSS windows CI, which relies on a T4 GPU, i.e. **CC = 7.5**
- **fail** on the AWS cluster, which relies on a V100 GPU, i.e. **CC = 7.0**

It is quite unclear to me what kind of resource is "enough" for the tests to pass on both 6.1 and 7.5 but not on 7.0. As a result, I think it's safer to just reduce the number of threads per block, irrespective of the CC.

ngimel, fmassa suggested that I tag you here since you could have some valuable insight for us. Thanks!

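For illustration only, here is a rough sketch of why a smaller block size is the conservative choice, under the assumption (not confirmed above) that registers are the resource being exhausted. The helper `max_threads_for_registers` is hypothetical and not part of torchvision; its inputs correspond to values one could read from `cudaDeviceProp::regsPerBlock` and `cudaFuncAttributes::numRegs`. It ignores the hardware's register allocation granularity, so treat the result as an upper bound rather than an exact limit.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of torchvision): largest block size, rounded
// down to a whole number of warps, whose threads fit in the per-block
// register budget. Simplified: real hardware allocates registers per warp
// with some granularity, so the true limit can be slightly lower.
constexpr std::uint32_t kWarpSize = 32;

inline std::uint32_t max_threads_for_registers(
    std::uint32_t regs_per_block,   // e.g. cudaDeviceProp::regsPerBlock
    std::uint32_t regs_per_thread,  // e.g. cudaFuncAttributes::numRegs
    std::uint32_t hw_max_threads = 1024) {
  if (regs_per_thread == 0) {
    return hw_max_threads;  // nothing to constrain on
  }
  // How many threads the register file can serve at this per-thread usage.
  std::uint32_t fit = regs_per_block / regs_per_thread;
  // Round down to a multiple of the warp size.
  fit = (fit / kWarpSize) * kWarpSize;
  return std::min(fit, hw_max_threads);
}
```

With a 64K-register budget per block, `max_threads_for_registers(65536, 64)` still allows 1024 threads, but a kernel compiled to use 65 registers per thread no longer fits at that size, whereas 512 threads fit up to 128 registers per thread. Since the number of registers a kernel uses can differ between target architectures, a fixed block size of 512 leaves more headroom than a threshold on the CC.
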
Reviewed By: fmassa

Differential Revision: D28641626

fbshipit-source-id: 2618c366c5d18bbb7ebafc33032e7ac6c0404d0b
---
 torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu b/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
index 6f257322b85..2ea5e43f146 100644
--- a/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
+++ b/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu
@@ -85,10 +85,7 @@ inline unsigned int GET_THREADS() {
 #ifdef __HIP_PLATFORM_HCC__
   return 256;
 #endif
-  if (at::cuda::getCurrentDeviceProperties()->major >= 6) {
-    return 1024;
-  }
-  return 512;
+  return 512;
 }
 
 inline unsigned int GET_BLOCKS(