Single Channel GaussianBlur over 23x23 kernels fails on Windows #5464

Open
malfet opened this issue Feb 24, 2022 · 3 comments

@malfet
Contributor

malfet commented Feb 24, 2022

🐛 Describe the bug

See https://app.circleci.com/pipelines/github/pytorch/vision/14951/workflows/dd6ed737-f31d-4baa-8f42-9e69d55650aa/jobs/1205316

For some reason the same code works with CUDA-11.1 but fails with 11.3, which makes me strongly suspect a bug on the cuDNN side.

cuda-memcheck reports an invalid memory access in the `cudnnConvolutionForward` call:
(C:\Users\circleci\project\env) C:\Users\circleci\project\test>"c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cuda-memcheck.exe" pytest test_transforms_tensor.py -k test_gaussian_blur[1-meth_kwargs1
========= CUDA-MEMCHECK
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: C:\Users\circleci\project, configfile: pytest.ini
plugins: cov-3.0.0, mock-3.6.1
collected 3066 items / 3064 deselected / 2 selected

test_transforms_tensor.py .FE                                                                                                                                                       [100%]

========================================================================================= ERRORS =========================================================================================
______________________________________________________________ ERROR at teardown of test_gaussian_blur[1-meth_kwargs1-cuda] ______________________________________________________________
Traceback (most recent call last):
  File "C:\Users\circleci\project\test\conftest.py", line 104, in prevent_leaking_rng
    torch.cuda.set_rng_state(cuda_rng_state)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\random.py", line 64, in set_rng_state
    _lazy_call(cb)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\__init__.py", line 155, in _lazy_call
    callable()
  File "C:\Users\circleci\project\env\lib\site-packages\torch\cuda\random.py", line 62, in cb
    default_generator.set_state(new_state_copy)
RuntimeError: CUDA error: unspecified launch failure
======================================================================================== FAILURES ========================================================================================
________________________________________________________________________ test_gaussian_blur[1-meth_kwargs1-cuda] _________________________________________________________________________
Traceback (most recent call last):
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 963, in test_gaussian_blur
    _test_class_op(
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 85, in _test_class_op
    _test_transform_vs_scripted_on_batch(f, scripted_fn, batch_tensors)
  File "C:\Users\circleci\project\test\test_transforms_tensor.py", line 36, in _test_transform_vs_scripted_on_batch
    transformed_batch = transform(batch_tensors)
  File "C:\Users\circleci\project\env\lib\site-packages\torch\nn\modules\module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "c:\users\circleci\project\torchvision\transforms\transforms.py", line 1817, in forward
    return F.gaussian_blur(img, self.kernel_size, [sigma, sigma])
  File "c:\users\circleci\project\torchvision\transforms\functional.py", line 1326, in gaussian_blur
    output = F_t.gaussian_blur(t_img, kernel_size, sigma)
  File "c:\users\circleci\project\torchvision\transforms\functional_tensor.py", line 774, in gaussian_blur
    img = conv2d(img, kernel, groups=img.shape[-3])
RuntimeError: CUDA error: unspecified launch failure
================================================================================ short test summary info =================================================================================
ERROR test_transforms_tensor.py::test_gaussian_blur[1-meth_kwargs1-cuda] - RuntimeError: CUDA error: unspecified launch failure
FAILED test_transforms_tensor.py::test_gaussian_blur[1-meth_kwargs1-cuda] - RuntimeError: CUDA error: unspecified launch failure
================================================================= 1 failed, 1 passed, 3064 deselected, 1 error in 35.57s =================================================================
========= Invalid __shared__ read of size 4
=========     at 0x00001d10 in volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1
=========     by thread (95,0,0) in block (24,0,0)
=========     Address 0x0000250c is out of bounds
=========     Device Frame:volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1 (volta_scudnn_128x32_3dconv_fprop_xregs_large_nn_v1 : 0x1d10)
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x76888]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x76bb1]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x7b0da]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuProfilerStop + 0x11cc6a) [0x33d9ea]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x17069d]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuProfilerStop + 0xf0c72) [0x3119f2]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x38bdb]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x390af]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll [0x39394]
=========     Host Frame:C:\Windows\system32\DriverStore\FileRepository\nv_dispswi.inf_amd64_8fb2f986cb3224d8\nvcuda64.dll (cuLaunchKernel + 0x234) [0x20fc44]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll [0x3896]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll [0x26fd]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::getPlatform + 0xe9) [0x1d54529]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::transformTensor + 0x1bc1) [0x1dc0651]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::transformTensor + 0xbe6a) [0x1dca8fa]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::ConvDgradShader::isSplitK + 0x49b) [0x1ddcd9b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Descriptor::initialize_internal + 0x618e) [0x5c67ce]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Descriptor::initialize_internal + 0x6eb1) [0x5c74f1]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineContainer<1012,113664>::execute_internal_impl + 0x2a) [0x54f27a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cask_cudnn::TensorDesc::operator== + 0x2d2) [0x544612]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineContainer<1,4096>::execute_internal_impl + 0xd241) [0x55c4d1]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::EngineInterface::execute + 0x7e) [0x4e163e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::execute + 0x103f) [0x54eebf]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Tensor::Tensor + 0x18b6) [0x5ab246]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::backend::Tensor::Tensor + 0xbe1) [0x5aa571]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnn::cnn::convolutionForward + 0x10b) [0x65609b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll (cudnnConvolutionForward + 0x331) [0x657081]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x4263) [0x48863]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x83a7) [0x4c9a7]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x7736) [0x4bd36]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x1ae5) [0x460e5]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_transpose + 0x752f) [0x4bb2f]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution_add_relu + 0x16ec) [0x43fec]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cpp.dll (at::native::cudnn_convolution + 0xc5) [0x428e5]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cu.dll (at::cuda::view_as_real + 0x14adc) [0x456680c]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cuda_cu.dll (at::cuda::bucketize_outf + 0x3df7a) [0x450361a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::cudnn_convolution::call + 0x242) [0x70175b2]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::_convolution + 0xf5e) [0x692064e]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::xlogy_ + 0x40e) [0x72d1bee]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::bmm + 0x1a1ed) [0x72975bd]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::_convolution::call + 0x2d6) [0x6d5a226]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::convolution + 0x164) [0x6928914]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::xlogy_ + 0xc6b) [0x72d244b]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeexplicitautograd::bmm + 0x1a2ca) [0x729769a]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::TensorMaker::make_tensor + 0x88e49) [0x6d40779]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::convolution::redispatch + 0x123) [0x6dc39a3]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (torch::autograd::GraphRoot::apply + 0x157b1) [0x7bfd851]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (torch::autograd::GraphRoot::apply + 0xc6c8) [0x7bf4768]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::convolution::call + 0x26f) [0x6d71b6f]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::native::conv2d + 0x1be) [0x69277be]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeimplicitautograd::where + 0x1db4) [0x73ac8e4]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::compositeimplicitautograd::broadcast_to + 0x2a7a3) [0x738d953]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::_ops::conv2d::call + 0x219) [0x70ba239]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_cpu.dll (at::conv2d + 0x64) [0x67106d4]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_python.dll (torch::FunctionSignature::operator= + 0x1096fc) [0x14d77c]
=========     Host Frame:C:\Users\circleci\project\env\lib\site-packages\torch\lib\torch_python.dll (torch::FunctionSignature::operator= + 0x12f7ab) [0x17382b]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyMethodDef_RawFastCallKeywords + 0x410) [0x126fe0]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyObject_MakeTpCall + 0x106) [0x125fa6]
=========     Host Frame:C:\Users\circleci\project\env\python38.dll (PyEval_GetFuncDesc + 0x408) [0x2036b8]
=========
...

cc @peterjc123 @nbcsm @guyang3532 @maxluk @gunandrose4u @mszhanyi @vfdev-5 @datumbox

@malfet malfet added the windows label Feb 24, 2022
@malfet malfet changed the title Single GaussianBlur over 23x23 kernels fails on Windows Single Channel GaussianBlur over 23x23 kernels fails on Windows Feb 24, 2022
@datumbox
Contributor

@malfet Thanks for your investigation at #5451 (comment). GaussianBlur uses conv2d, so this bug could be quite significant. From your analysis, you seem to be familiar with similar previous bugs. Could you link to some of them to give more context on what you suspect?

@NicolasHug @jdsgomes @vfdev-5 FYI, Nikita disabled the `test_gaussian_blur` test for this specific configuration to restore CI while the issue is being investigated. We should re-enable it once the issue is fixed.
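
For readers wondering what disabling a test "for the specific configuration" could look like, here is a minimal sketch. It is not the actual change that was made. The helper name `is_known_bad_config` and the exact condition (Windows, CUDA, one channel, 23x23 kernel) are assumptions taken from the issue title and the failing test ID.

```python
import sys


def is_known_bad_config(device, channels, kernel_size, platform=sys.platform):
    """Hypothetical predicate: True for the configuration reported in this issue.

    Assumption: the failure is limited to Windows + CUDA + a single channel
    + a 23x23 kernel, as suggested by the issue title and the test ID.
    """
    return (
        platform == "win32"
        and device == "cuda"
        and channels == 1
        and tuple(kernel_size) == (23, 23)
    )


# Possible use inside a parametrized pytest test (not executed here):
#     if is_known_bad_config(device, channels, kernel_size):
#         pytest.skip("conv2d launch failure on Windows, see #5464")
```

Keeping the condition in one small predicate makes it easy to delete again once the underlying issue is fixed.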

@malfet
Contributor Author

malfet commented Feb 24, 2022

@datumbox the move just exposed a bug that has existed for a while, but indeed the test code indicates that the problem is somewhere in PyTorch core, as the test simply calls `torch.nn.functional.conv2d`.
@ptrblck, @ngimel do you recall anything of this nature?
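
Since the failing frame is a plain `conv2d` call, a standalone script without torchvision may help narrow things down. The following is a minimal sketch, not a confirmed reproducer. The sigma, batch size, and image size are assumptions (the issue does not state them), and the function names are hypothetical. The kernel construction mirrors a normalized Gaussian, and the padding and `groups` argument follow the `conv2d(img, kernel, groups=img.shape[-3])` line in the traceback.

```python
import math


def gaussian_kernel1d(kernel_size, sigma):
    """Normalized 1D Gaussian weights centered on the middle tap (pure Python)."""
    half = (kernel_size - 1) * 0.5
    pdf = [math.exp(-0.5 * ((i - half) / sigma) ** 2) for i in range(kernel_size)]
    total = sum(pdf)
    return [p / total for p in pdf]


def try_blur_conv(device="cuda", kernel_size=23, sigma=2.0, batch=4, size=64):
    """Run a single-channel 23x23 blur through conv2d.

    Assumptions: sigma, batch and size are illustrative values only;
    the failing test may use different ones.
    """
    import torch
    import torch.nn.functional as F

    k1d = torch.tensor(gaussian_kernel1d(kernel_size, sigma), device=device)
    kernel = torch.outer(k1d, k1d)[None, None]  # shape (1, 1, k, k)
    img = torch.rand(batch, 1, size, size, device=device)
    pad = kernel_size // 2
    img = F.pad(img, [pad, pad, pad, pad], mode="reflect")
    out = F.conv2d(img, kernel, groups=img.shape[-3])
    if device == "cuda":
        # Kernel launches are asynchronous; synchronize so that a launch
        # failure surfaces here rather than in a later, unrelated call.
        torch.cuda.synchronize()
    return out
```

Calling `try_blur_conv("cuda")` on an affected machine may or may not trigger the failure. Comparing against `try_blur_conv("cpu")` on the same input would at least show whether the CUDA path produces a different result.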

@ptrblck
Contributor

ptrblck commented Feb 24, 2022

Based on the compute-sanitizer output it's clearly an illegal memory access. I haven't seen one in this kernel so far, but will try to repro and forward to cuDNN.

@datumbox datumbox added the cuda label Feb 25, 2022
3 participants