
[OpenCL] Convolution NHWC vs. NCHW mismatch #3815


Closed
pjaaskel opened this issue Nov 23, 2019 · 4 comments

@pjaaskel
Contributor

After fixing issue #3802, inception_v1 now fails with a mismatching-layouts error. Should an automatic layout-conversion operation be inserted in such a case?
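
For reference, a minimal sketch of what such an automatic insertion could look like at the graph level, assuming Glow's `createTranspose` builder (the helper name `toNCHW` is illustrative, not from the Glow sources):

```cpp
#include "glow/Graph/Graph.h"

// Sketch: put an explicit NHWC -> NCHW transpose in front of a node whose
// input arrives in the wrong layout. The shuffle {0, 3, 1, 2} maps output
// dims (N, C, H, W) to input dims (N, H, W, C).
glow::NodeValue toNCHW(glow::Function *F, glow::NodeValue nhwcInput) {
  auto *T = F->createTranspose("nhwc_to_nchw", nhwcInput, {0u, 3u, 1u, 2u});
  return T->getResult();
}
```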

./bin/image-classifier -backend=OpenCL tests/images/imagenet/dog_207.png -expected-labels=207 -image-mode=0to255 -m=../build-llvm-7/inception_v1 -model-input-name=data 
Model: ../build-llvm-7/inception_v1
Running 1 thread(s).



In 'conv1_7x7_s2__6' From '../build-llvm-7/inception_v1'
input 0
Convolution
name : conv1_7x7_s2__6
Input : float<1 x 3 x 224 x 224>
Filter : float<64 x 3 x 7 x 7>
Bias : float<64>
Kernels : [7, 7]
Strides : [2, 2]
Pads : [3, 3, 3, 3]
Group : 1
Dilation : 1
Layout : NCHW
FusedActivation : 
users : 1
Result : float<1 x 64 x 112 x 112>

Mismatching layouts:
Provided layout
Layout: NHWC [name = N : alignment = 1 : index = 0, name = H : alignment = 1 : index = 1, name = W : alignment = 1 : index = 2, name = C : alignment = 1 : index = 3]
Expected layout
Layout: NCHW [name = N : alignment = 1 : index = 0, name = C : alignment = 1 : index = 1, name = H : alignment = 1 : index = 2, name = W : alignment = 1 : index = 3]
From '../build-llvm-7/inception_v1'
Expected correct backend-specific layouts for the graph
For comparison `LHS Equal RHS` with:
LHS: 0
RHS: 1
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1123 11:26:22.983803 15371 Error.cpp:119] exitOnError(Error) got an unexpected ErrorValue: 
Error code: COMPILE_UNSUPPORTED_NODE_AFTER_OPTIMIZE
Error message: Unsupported node(s) found after optimizing Function ../build-llvm-7/inception_v1 for backend OpenCL
Error return stack:
../lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp:3293
../lib/Partitioner/Partitioner.cpp:401
../tools/loader/Loader.cpp:505
*** Check failure stack trace: ***
Aborted (core dumped)

Any pointers? We can take a look if someone points us in the right direction.
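
(For context on the mismatch above: the same element lands at a different linear offset in each layout. A minimal illustration, not Glow code:)

```cpp
#include <cstddef>

// Linear offset of element (n, c, h, w) for a tensor with logical dims
// N x C x H x W, e.g. the float<1 x 3 x 224 x 224> input above.
size_t offsetNCHW(size_t n, size_t c, size_t h, size_t w,
                  size_t C, size_t H, size_t W) {
  return ((n * C + c) * H + h) * W + w;
}
size_t offsetNHWC(size_t n, size_t c, size_t h, size_t w,
                  size_t C, size_t H, size_t W) {
  return ((n * H + h) * W + w) * C + c;
}
```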

@pjaaskel
Contributor Author

This is actually a regression introduced with commit ec46f24 by @shajrawi, and it also affects SqueezeNet. Reverting this commit fixes the issue.

@pjaaskel
Contributor Author

pjaaskel commented Dec 7, 2019

This issue is still there with OpenCL and can be reproduced with SqueezeNet.

@XiZiler

XiZiler commented Dec 26, 2019

The C++ ResNet demo is also affected. Here is what I did:

1. Compile for OpenCL, Release version:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DGLOW_WITH_CPU=1 -DGLOW_WITH_OPENCL=1 ../glow
ninja all -j 64
2. Check GPU info:
clinfo

Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce RTX 2080 Ti
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 430.26
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 65:00.0

3. Run the ResNet50 demo:
./bin/resnet-runtime -backend=OpenCL -num-devices=1 -image-layout=NCHW

and get the result like this:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I1226 15:26:25.958235 21561 resnet-runtime.cpp:130] Initializing 1 OpenCL devices on HostManager.
I1226 15:26:26.116268 21561 resnet-runtime.cpp:78] Loading resnet50 model.

In 'gpu_0_conv1__5' From 'resnet500'
input 0
Convolution
name : gpu_0_conv1__5
Input : float<1 x 3 x 224 x 224>
Filter : float<64 x 3 x 7 x 7>
Bias : float<64>
Kernels : [7, 7]
Strides : [2, 2]
Pads : [3, 3, 3, 3]
Group : 1
Dilation : 1
Layout : NCHW
FusedActivation :
users : 1
Result : float<1 x 64 x 112 x 112>

Mismatching layouts:
Provided layout
Layout: NHWC [name = N : alignment = 1 : index = 0, name = H : alignment = 1 : index = 1, name = W : alignment = 1 : index = 2, name = C : alignment = 1 : index = 3]
Expected layout
Layout: NCHW [name = N : alignment = 1 : index = 0, name = C : alignment = 1 : index = 1, name = H : alignment = 1 : index = 2, name = W : alignment = 1 : index = 3]
From 'resnet500'
Expected correct backend-specific layouts for the graph
For comparison LHS Equal RHS with:
LHS: 0
RHS: 1
F1226 15:26:27.040071 21561 Error.cpp:119] exitOnError(Error) got an unexpected ErrorValue:
Error code: COMPILE_UNSUPPORTED_NODE_AFTER_OPTIMIZE
Error message: Unsupported node(s) found after optimizing Function resnet500 for backend OpenCL
Error return stack:
/home/cambricon/workspace_xi/ATC/glow/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp:3523
/home/cambricon/workspace_xi/ATC/glow/lib/Partitioner/Partitioner.cpp:405
/home/cambricon/workspace_xi/ATC/glow/examples/resnet-runtime.cpp:165
*** Check failure stack trace: ***
./run.sh: line 1: 21561 Aborted (core dumped) ./bin/resnet-runtime -backend=OpenCL -num-devices=1 -image-layout=NCHW

4. Revert Glow to the parent of ec46f24; the parent's commit ID is bd69664:
git reset --hard bd69664e1aae6f96ce84071bdcb9bef9180d6743
5. Build and run:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DGLOW_WITH_CPU=1 -DGLOW_WITH_OPENCL=1 ../glow
ninja all -j 64

./bin/resnet-runtime -backend=OpenCL -num-devices=1 -image-layout=NCHW

It crashed after it finished classifying the images. So sad.

I1226 15:38:27.883890 23036 resnet-runtime.cpp:78] Loading resnet50 model.
I1226 15:38:28.876335 23036 resnet-runtime.cpp:164] Loading files from ../glow/tests/images/imagenet/
I1226 15:38:28.882439 23036 resnet-runtime.cpp:122] Started run ID: 0
I1226 15:38:28.887506 23036 resnet-runtime.cpp:122] Started run ID: 1
I1226 15:38:28.892138 23036 resnet-runtime.cpp:122] Started run ID: 2
(0) ../glow/tests/images/imagenet/cat_285.png: 281
(1) ../glow/tests/images/imagenet/dog_207.png: 207
(2) ../glow/tests/images/imagenet/zebra_340.png: 340
I1226 15:38:28.935782 23036 resnet-runtime.cpp:215] Finished classifying 3 images.

Thread 2 "resnet-runtime" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5301700 (LWP 23040)]
0x00007fffee574560 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
(gdb) bt
#0 0x00007fffee574560 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
#1 0x00007fffee385373 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
#2 0x00007fffee3736f5 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
#3 0x00007fffee373ae7 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
#4 0x00007fffee384720 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
#5 0x0000000001fbf8fe in glow::runtime::OpenCLBuffer::~OpenCLBuffer() ()
#6 0x0000000001fc73cd in std::_Sp_counted_deleter<glow::runtime::OpenCLBuffer*, std::__shared_ptr<glow::runtime::OpenCLBuffer, (__gnu_cxx::_Lock_policy)2>::_Deleter<std::allocatorglow::runtime::OpenCLBuffer >, std::allocatorglow::runtime::OpenCLBuffer, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#7 0x0000000001fc26dc in glow::runtime::OpenCLDeviceManager::evictNetworkImpl(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::function<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator >, glow::detail::GlowError)>) ()
#8 0x000000000065ef48 in glow::runtime::QueueBackedDeviceManager::evictNetwork(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::function<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator >, glow::detail::GlowError)>)::{lambda()#1}::operator()() const ()
#9 0x000000000065ee2a in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<glow::runtime::QueueBackedDeviceManager::evictNetwork(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::function<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator >, glow::detail::GlowError)>)::{lambda()#1}, std::allocator, void ()>::_M_run()::{lambda()#1}, void> >::_M_invoke(std::_Any_data const&) ()
#10 0x00000000004b7257 in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) ()
#11 0x00007ffff6b00827 in __pthread_once_slow (once_control=0x9d59408, init_routine=0x7ffff602d830 <__once_proxy>) at pthread_once.c:116
#12 0x000000000065ecd1 in std::__future_base::_Task_state<glow::runtime::QueueBackedDeviceManager::evictNetwork(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::function<void (std::__cxx11::basic_string<char, std::char_traits, std::allocator >, glow::detail::GlowError)>)::{lambda()#1}, std::allocator, void ()>::_M_run() ()
#13 0x00000000021c81b0 in glow::ThreadExecutor::threadPoolWorkerMain() ()
#14 0x00007ffff602e66f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007ffff6af86db in start_thread (arg=0x7ffff5301700) at pthread_create.c:463
#16 0x00007ffff5a8988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
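
(A guess at where this points: frame #5 is `OpenCLBuffer`'s destructor, which presumably releases its `cl_mem` via `clReleaseMemObject`. A defensive sketch of that shape, with the member name `buffer_` assumed rather than taken from the source:)

```cpp
#include <CL/cl.h>

class OpenCLBuffer {
  cl_mem buffer_{nullptr};

public:
  ~OpenCLBuffer() {
    // The driver faults around here in the backtrace; a double release, or a
    // release after the owning cl_context is destroyed, would look like this.
    if (buffer_) {
      clReleaseMemObject(buffer_);
      buffer_ = nullptr;
    }
  }
};
```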

@XiZiler

XiZiler commented Dec 28, 2019

[Update] I just removed the layout-verification code; it works and gets the same result as the CPU backend. I don't know if this will lead to other problems.

This is what I removed in the file ./lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp:

Error glow::optimizeFunction(Function *F, const Backend &B,
                             CompilationContext &cctx) {
  ...

  // if (!B.verify(*F)) {
  //   return MAKE_ERR(
  //       ErrorValue::ErrorCode::COMPILE_UNSUPPORTED_NODE_AFTER_OPTIMIZE,
  //       "Unsupported node(s) found after optimizing Function " +
  //           F->getName().str() + " for backend " + B.getBackendName());
  // }
  return Error::success();
}

It seems to have done all the lowering and optimization, and crashed during the very last layout verification. So I removed that final verification and it still works.

vdantu pushed a commit to vdantu/glow that referenced this issue Jul 12, 2020
…osing the input to NCHW (pytorch#3951)

Summary:
See the in-source comment for workaround details. In short: we load a model from the outside with no way of knowing the constant/placeholder input layout. The default assumption for 4-D tensors (images) is NHWC, which is the canonical Glow format, and PNG files are in NHWC format.
Our image loader, when using the `image-layout` flag, transposes the image outside the Glow graph. Since there's no easy way to propagate that information, weaken the OpenCL verifier, not the canonical verifier: for placeholders and constants, assume that the loader knows what it is doing and that they are in the right format.

Fixes pytorch#3815
Pull Request resolved: pytorch#3951

Test Plan: `ninja test`

Differential Revision: D19252774

Pulled By: shajrawi

fbshipit-source-id: f850c504245ee947794446b144b00df635a68497
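
(Roughly, the weakening described above can be sketched as a storage check in the backend-specific layout verifier; `layoutSatisfied` is an illustrative name, but `glow::Storage` is the real common base class of `Constant` and `Placeholder`:)

```cpp
#include "glow/Graph/Nodes.h"

// Sketch: trust loader-provided tensors. If an input comes straight from a
// Constant or Placeholder, skip the strict NHWC/NCHW comparison and assume
// the loader already transposed it to the layout the backend expects.
static bool layoutSatisfied(const glow::NodeValue &in) {
  if (llvm::isa<glow::Storage>(in.getNode())) {
    return true;
  }
  // ... otherwise fall through to the normal layout comparison ...
  return false;
}
```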