[Layout] Layout verification failed #3804

Closed

mciprian13 opened this issue Nov 21, 2019 · 3 comments

@mciprian13
Contributor

Now I don't know why this happens (a previous version of Glow did not exhibit it), but I get an error during layout verification in this portion of the graph:

[graph screenshot]

I get the following error:

```
In 'dense_1' From 'model.onnx'
input 0
MatMul
name : dense_1
LHS : float<10 x 1280 x 1 x 1>
RHS : float<1280 x 50>
users : 1
Result : float<10 x 50>

Mismatching layouts:
Provided layout
Layout: NCHW [name = N : alignment = 1 : index = 0, name = C : alignment = 1 : index = 1, name = H : alignment = 1 : index = 2, name = W : alignment = 1 : index = 3]
Expected layout
Layout: NHWC [name = N : alignment = 1 : index = 0, name = H : alignment = 1 : index = 1, name = W : alignment = 1 : index = 2, name = C : alignment = 1 : index = 3]
From 'model/keurig.onnx'
Expected correct Glow canonical layouts for the graph
For comparison `LHS Equal RHS` with:
LHS: 0
RHS: 1
```

I'm new to this layout verification feature and I don't understand the problem here or the solution.
Can you please hint at the problem/solution? (I think @shajrawi is the author of this.)

Thanks!

@shajrawi
Contributor

Hmm, not a lot to go on, but: there are two stages in the Glow compilation pipeline: pre-backend-specific optimizations and post-lowering. Layout verification is disabled by default, so a private backend should not fail with this unless said backend enabled it (by setting bool enabled_ to true). This leads me to assume that you most probably hit one of the following:

  1. Pre-lowering: this is the stage wherein Glow uses its "canonical" NHWC layout everywhere. Based on the error message you got, one of the inputs is given in NCHW format, so the verifier fails.
    Unless there's an implementation bug somewhere, one should never see an NCHW layout pre-lowering, with the exception of placeholders/constants imported into Glow in that format. If such a storage location is imported in NCHW, it is currently the loader's responsibility to add a transpose/reshape layer that converts it to NHWC, and to mark said placeholder/constant as NCHW instead of the default NHWC value (see the sketch after this list).
  2. Post-lowering on the OpenCL backend: OpenCL is the only 'real' upstream backend that uses non-canonical layouts; NHWC could have been transformed into NCHW during lowering. Verification is enabled by default on OpenCL.
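To make the loader-side fix in (1) concrete, here is a minimal sketch using Glow's C++ graph API. It is illustrative only: the helper, the placeholder name, and the dimensions (borrowed from the error message) are assumptions rather than code from the failing model, and it relies on the createPlaceholder layout argument and the NCHW2NHWC shuffle constant being available in your tree.

```cpp
#include "glow/Graph/Graph.h"

using namespace glow;

// Hypothetical helper: import a 4-D storage location that arrives in NCHW
// order and hand back an NHWC value for the rest of the graph to consume.
NodeValue importNCHWInput(Module &mod, Function *F) {
  // Mark the placeholder itself as NCHW instead of the NHWC default.
  auto *in = mod.createPlaceholder(ElemKind::FloatTy, {10, 1280, 1, 1},
                                   "input_nchw", /* isTrainable */ false,
                                   /* layout */ "NCHW");
  // Insert the transpose that moves the data into Glow's canonical NHWC
  // order; NCHW2NHWC is the {0, 2, 3, 1} shuffle.
  return F->createTranspose("input_to_nhwc", in, NCHW2NHWC)->getResult();
}
```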

Whether it is 1 or 2 above, what's happening here is that we're searching for the input layout of the operation, walking back through the graph until we find a node that is layout-aware and/or layout-modifying. For some reason, said input (in the bigger graph/picture that I don't have in this issue), which is the output of another node, is in NCHW format, which should never happen - the verifier caught a bug. Without a test case, my gut tells me the bug is either:

  1. In a weird pipeline pass that's not upstream
  2. In an upstream Graph Optimizer pass
  3. In the graph loader introducing an inconsistency for a constant/placeholder

Without a reproducible test-case, I cannot narrow it down further.

@mciprian13
Contributor Author

mciprian13 commented Nov 21, 2019

I forgot to mention that I hit the problem while profiling the model with the image-classifier in preparation for quantization. The problem is the same for both the standard CPU and Interpreter backends (I'm not using a private backend).

I cannot share the ONNX model since it's something private/custom; that is why I cropped a part of the model to show you.

@shajrawi
Contributor

I assume it is pre-lowering then, during the canonical tensor layout phase.

Most likely, either our ONNX loader has a bug (not adding the layout information to an input storage or adding a bad reshape/transpose) or you caught a real bug in our pre-lowering graph optimizer.

Without access to a (trimmed-down?) test case, it is next to impossible for me to figure it out. I'd suggest adding a breakpoint on TensorLayoutCommon::getNthInputLayoutRequirements (line 445 in lib/Graph/TensorLayout.cpp) and tracking down where we got that NCHW from.
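If attaching a debugger is inconvenient, a throwaway logging patch near that spot answers the same question. This is only a rough sketch: the parameter names (node, n) and the point where the computed layout string is in hand are assumptions about the surrounding code, not a patch against a specific revision.

```cpp
#include "llvm/Support/raw_ostream.h"

// Hypothetical snippet to drop into
// TensorLayoutCommon::getNthInputLayoutRequirements right before it returns,
// assuming `node` and `n` are its parameters and `layout` holds the layout
// string computed for the n-th input.
llvm::errs() << "layout for input " << n << " of '" << node->getName()
             << "': " << layout << "\n"
             << node->getDebugDesc() << "\n";
```

Re-running the failing image-classifier invocation with that in place and grepping the output for NCHW should point at the producer that first introduces the bad layout.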

@shajrawi shajrawi self-assigned this Dec 2, 2019
vdantu pushed a commit to vdantu/glow that referenced this issue Jul 12, 2020
…3832)

Summary:
While they are not data parallel, they can operate on any order of dimensions.

Fixes pytorch#3804
Pull Request resolved: pytorch#3832

Test Plan:
```
model-compiler -backend=CPU -model=model1.onnx -emit-bundle=build
model-compiler -backend=CPU -model=model2.onnx -emit-bundle=build
```

Differential Revision: D18765346

Pulled By: shajrawi

fbshipit-source-id: c44713036b00918ac14ef62f117982d0dfcfc062