
[TensorLayout] Propagate the input layout requirements for convertTo nodes #3831


Closed
wants to merge 1 commit into from

Conversation

Contributor

@shajrawi shajrawi commented Dec 2, 2019

Summary:
The layout requirements for ConvertTo nodes should be the same as the requirements of their inputs — the exact same behavior as quantization nodes. For example, if the input is in NCHW format, we should propagate that information to the output.

Fixes #3826

Test Plan:

Test-case from the issue:
```
model-runner -m predict_net.pb -m init_net.pb -backend=Interpreter -convert-to-fp16
Model: predict_net.pb
shape: ( 1 64 112 112 )
max: 328.750  min: -342.750
[[[[-4.305, -1.395, -1.973, -55.750, -28.594, 30.094, 1.790, -25.000, 6.543, 0.682, -17.844, 63.406, -13.500, -17.703, -63.969, 9.273, 28.141, 12.992, -30.406, -8.078, 3.312, 24.797, 2.342, 17.484, 13.156, -3.906, -52.469, -41.531, -4.105, 15.273, -9.305, -13.758, 28.453, -6.852, 13.828, -7.699, 3.133, -10.570, -12.523, -52.469, 7.520, -5.211, 9.406, -14.516, 0.541, -2.070, -1.676, -32.312, -4.531, 12.336, -26.359, 43.219, 42.219, 30.422, 3.301, -38.750, 14.617, 38.750, -29.219, -50.719, 0.854, -0.113, 4.035, -1.172, -23.875, -15.938, -8.805, 67.875, 7.152, -16.422, 56.875, -3.996, -42.562, 27.516, 6.699, -33.281, 28.078, -1.342, -6.727, -3.949, -23.953, 11.305, -29.656, -32.094, -67.562, 34.406, -38.656, 40.719, 31.188, 22.047, -83.938, -20.734, -5.492, -10.516, -11.422, 10.211, -13.719, 14.133, -31.797, 3.926, ...]
```

Contributor

mciprian13 commented Dec 2, 2019

Just a question: I confess I do not quite understand the nature of this problem, but I wonder whether this logic for propagating layout information must be "manually" implemented for each newly added node? If yes, is there a way to enforce this when a new node is added, so that we don't end up with nodes that won't work because this mechanism is missing?

Contributor

opti-mix commented Dec 2, 2019

@shajrawi The PR looks fine. Thanks for the quick fix!

But I agree with @mciprian13's comment. It would be nice to have a more straightforward way to specify any required layout propagation for new nodes. We could at least mention the need for adding custom logic somewhere in the *.md docs. We could also add layout-related attributes in NodeGen (e.g. NodeDef.layoutRequirements(Agnostic) or NodeDef.layoutRequirements(PropagateInputLayoutToOutputLayout)). WDYT?

Contributor Author

shajrawi commented Dec 2, 2019

@opti-mix I agree - such a refactoring would be nice - I created an issue for it: #3834

Basically, we need a way to specify that even though a node is not data parallel, it can accept any layout (see #3832), plus a way to specify that we can propagate the layout information (the bug fix in this PR).

Contributor Author

shajrawi commented Dec 2, 2019

@mciprian13 I just updated the docs, good point!

Contributor

opti-mix commented Dec 2, 2019

@shajrawi Thanks for updating the docs! But could you also mention/refer to the tensor layout considerations in Backends.md, where it describes how new nodes are defined in NodeGen? Otherwise there is a risk that people never read your docs about TensorLayouts.

Contributor Author

shajrawi commented Dec 2, 2019

@opti-mix Done! I added such a note in NewOperators.md, which describes how new nodes are defined for all backends :)

Contributor

@opti-mix opti-mix left a comment


LGTM!


@facebook-github-bot facebook-github-bot left a comment


@shajrawi has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

vdantu pushed a commit to vdantu/glow that referenced this pull request Jul 12, 2020
…3831)

Summary:
The layout requirements for ConvertTo nodes should be the same as the requirements of their inputs — the exact same behavior as quantization nodes. For example, if the input is in NCHW format, we should propagate that information to the output.

Fixes pytorch#3826
Pull Request resolved: pytorch#3831

Test Plan:
```
Test-case from the issue:
model-runner -m predict_net.pb -m init_net.pb -backend=Interpreter -convert-to-fp16
Model: predict_net.pb
shape: ( 1 64 112 112 )
max: 328.750  min: -342.750
[[[[-4.305, -1.395, -1.973, -55.750, -28.594, 30.094, 1.790, -25.000, 6.543, 0.682, -17.844, 63.406, -13.500, -17.703, -63.969, 9.273, 28.141, 12.992, -30.406, -8.078, 3.312, 24.797, 2.342, 17.484, 13.156, -3.906, -52.469, -41.531, -4.105, 15.273, -9.305, -13.758, 28.453, -6.852, 13.828, -7.699, 3.133, -10.570, -12.523, -52.469, 7.520, -5.211, 9.406, -14.516, 0.541, -2.070, -1.676, -32.312, -4.531, 12.336, -26.359, 43.219, 42.219, 30.422, 3.301, -38.750, 14.617, 38.750, -29.219, -50.719, 0.854, -0.113, 4.035, -1.172, -23.875, -15.938, -8.805, 67.875, 7.152, -16.422, 56.875, -3.996, -42.562, 27.516, 6.699, -33.281, 28.078, -1.342, -6.727, -3.949, -23.953, 11.305, -29.656, -32.094, -67.562, 34.406, -38.656, 40.719, 31.188, 22.047, -83.938, -20.734, -5.492, -10.516, -11.422, 10.211, -13.719, 14.133, -31.797, 3.926, ...]
```

Differential Revision: D18765965

Pulled By: shajrawi

fbshipit-source-id: 17f2bd511754f0055259cb6a6cb8fe094097e7e7

Successfully merging this pull request may close these issues.

[Regression] model-runner not able to execute convolution single node graph
4 participants