Skip to content

🐛 [Bug] torchvision.ops.roi_align is throwing an internal assert failed #1157

Closed
@BrettRyland

Description

@BrettRyland

Bug Description

torchvision.ops.roi_align is throwing an internal assert failure when compiling a model using it with torch_tensorrt.compile.

To Reproduce

Repro script: trt_bug3.py

brett@br-workhorse:~/repos/Autosensor/NN/tmp$ python3 trt_bug3.py 
torch.Size([20, 3, 6, 9])
torch.Size([20, 3, 6, 9])
WARNING: [Torch-TensorRT] - Cannot infer input type from calcuations in graph for input x.1. Assuming it is Float32. If not, specify input type explicity
WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT] - Truncating graph input type from at::kLong to at::kInt
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Unused Input: input_1
WARNING: [Torch-TensorRT TorchScript Conversion Context] - [RemoveDeadLayers] Input Tensor input_1 is unused or used only at compile-time, but is not being removed.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Unused Input: input_3
WARNING: [Torch-TensorRT TorchScript Conversion Context] - [RemoveDeadLayers] Input Tensor input_3 is unused or used only at compile-time, but is not being removed.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Unused Input: input_1
WARNING: [Torch-TensorRT TorchScript Conversion Context] - [RemoveDeadLayers] Input Tensor input_1 is unused or used only at compile-time, but is not being removed.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Unused Input: input_3
WARNING: [Torch-TensorRT TorchScript Conversion Context] - [RemoveDeadLayers] Input Tensor input_3 is unused or used only at compile-time, but is not being removed.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 111.0.1
Traceback (most recent call last):
  File "/home/brett/repos/Autosensor/NN/tmp/trt_bug3.py", line 79, in <module>
    trt_m = torch_tensorrt.compile(scr_m, inputs=[torch_tensorrt.Input((2, 3, 10, 10)), ], truncate_long_and_double=True)
  File "/home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/home/brett/.local/lib/python3.10/site-packages/torch_tensorrt/ts/_compiler.py", line 113, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/brett/.local/lib/python3.10/site-packages/torchvision/ops/_utils.py", line 30, in forward
def check_roi_boxes_shape(boxes: Union[Tensor, List[Tensor]]):
    if isinstance(boxes, (list, tuple)):
        for _tensor in boxes:
        ~~~~~~~~~~~~~~~~~~~~~
            assert (
            ~~~~~~~~
                _tensor.size(1) == 4
                ~~~~~~~~~~~~~~~~~~~~
            ), "The shape of the tensor in the boxes list is not correct as List[Tensor[L, 4]]"
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    elif isinstance(boxes, torch.Tensor):
        assert boxes.size(1) == 5, "The boxes tensor shape is not correct as Tensor[K, 5]"
RuntimeError: isList()INTERNAL ASSERT FAILED at "/home/brett/github/pytorch/aten/src/ATen/core/ivalue_inl.h":1884, please report a bug to PyTorch. Expected GenericList but got Tensor

Expected behavior

Expected graph compilation to succeed.

Environment

brett@br-workhorse:~/repos/Autosensor/NN/tmp$ python3 -c "from torch.utils.collect_env import main; main()"
Collecting environment information...
PyTorch version: 1.11.0a0+gitbc2c6ed
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04 LTS (x86_64)
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.4 (main, Apr  2 2022, 09:04:19) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-37-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.48.07
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] numpy-quaternion==2022.4.1
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.11.0
[pip3] torch-optimizer==0.1.0
[pip3] torch-tensorrt==1.2.0a0+22d91f5e
[pip3] torchmetrics==0.7.3
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.12.0
[conda] Could not collect

TensorRT is built from 91a92ca with PR #1067 merged in

brett@br-workhorse:~/github/TensorRT$ git log --oneline --graph
* 22d91f5e (HEAD -> master) fix: fix the bug that tag Constant node as fallback node
*   ccb826e7 Merge remote-tracking branch 'origin' into refactor_segmentation
|\  
| * 91a92ca4 (origin/master, origin/HEAD) docs: [Automated] Regenerating documenation for dcf3386
| *   dcf3386e Merge pull request #1057 from pytorch/anuragd/reduce_nox_sessions

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions