Teach Glow how to support layout requirements for nodes and tensors #3452


Closed
shajrawi opened this issue Aug 22, 2019 · 6 comments

@shajrawi
Contributor

Splitting the tensor-layouts portion out of #2686, since the alignment requirements have already been implemented and this is a separate issue.

Quoting @opti-mix:

Different operations (e.g. convolutions, gemms, etc.) may require each of their inputs and outputs to use a specific memory layout, both in terms of alignments and logical ordering of dimensions (e.g. NHWC or NCHW). Moreover, these requirements may depend on the properties of the operation, e.g. the dimensions of its operands. For example, a convolution with a small filter may need its input operands in a different format than a convolution with a big filter.

Today, the Glow core and custom backends implicitly hard-code this knowledge about the operations into (backend-specific) nodes and the code that works with them, i.e. the code that transforms Glow nodes, which are in the canonical NHWC format, into backend-specific nodes. This is pretty fragile and involves a lot of boilerplate code.

What if Glow were able to ask a backend, for each node, what the requirements are for each of the node's inputs and results in terms of alignments and ordering of dimensions?

  • The backend would return this information and the Glow core could insert all the required layout transformations using a new node called LayoutTransform (maybe we could extend/subsume Transpose for this purpose); a hypothetical sketch of such a query interface follows this list.

  • This new LayoutTransform node could be applied to Constant weights at compile-time. It can also be optimized in almost the same ways as Glow optimizes Transpose nodes, e.g. multiple LayoutTransforms can be combined into one, opposite LayoutTransforms can eliminate each other, etc.

  • This way all of the functionality to insert the required layout transforms is handled by the Glow core, which removes a lot of code duplication from backends. It also allows for better optimizations as the optimizer would understand the semantics of LayoutTransform. And it also allows for better verification, as Glow can now check that layouts of all inputs/outputs of each node satisfy the requirements of this operation as reported by backends.
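
To make the proposal concrete, here is a minimal sketch of what such a per-node layout query could look like. All names here (`LayoutRequirement`, `Backend::getInputLayoutRequirement`, etc.) are hypothetical and only illustrate the shape of the idea, not Glow's actual API:

```cpp
// Hypothetical sketch of a per-node layout-requirement query. None of these
// names are Glow's real API; they only illustrate the shape of the idea.
#include <cstddef>
#include <string>
#include <vector>

// A layout requirement: logical dimension order plus per-dimension alignment.
struct LayoutRequirement {
  std::string dimOrder;           // e.g. "NHWC" or "NCHW".
  std::vector<size_t> alignments; // Alignment (in elements) for each dimension.
};

struct Node; // Stand-in for a Glow graph node.

// Backends answer layout questions; the Glow core then inserts LayoutTransform
// nodes wherever a producer's layout does not match a consumer's requirement.
class Backend {
public:
  virtual ~Backend() = default;
  // Requirement for the i-th input of a node. A backend may return different
  // answers depending on node properties (e.g. filter size of a convolution).
  virtual LayoutRequirement getInputLayoutRequirement(const Node &node,
                                                      unsigned i) const = 0;
  // Requirement for the i-th result of a node.
  virtual LayoutRequirement getResultLayoutRequirement(const Node &node,
                                                       unsigned i) const = 0;
};
```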

To expand on this, I'd like to share an idea from @artemrakhov:

First start with a "raw" state of non-compliance, where we insert LayoutTransformation nodes wherever needed. Then we have a loop to sink and combine layout transformations, similarly to what we do with transposes. In fact, LayoutTransformation is a generalization of Transpose, since we can change not only the order of dimensions but also their alignments.

Basically, we need to port/extend the transpose sinking/elimination optimizations to support layouts.
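
As a rough illustration of the two peephole rules mentioned above (combining consecutive LayoutTransforms and eliminating ones that cancel out), here is a self-contained sketch. It is not Glow code, and it ignores alignments for brevity, treating a layout as just a dimension-order string:

```cpp
// Hypothetical sketch (not Glow code) of two peephole rules on a chain of
// LayoutTransforms: merge consecutive transforms, and drop a transform that
// ends in the same layout it started from. Alignments are ignored for brevity.
#include <cassert>
#include <string>
#include <vector>

struct LayoutTransform {
  std::string srcLayout; // e.g. "NHWC"
  std::string dstLayout; // e.g. "NCHW"
};

// Collapse a linear chain of transforms into the minimal equivalent chain.
std::vector<LayoutTransform>
optimizeChain(const std::vector<LayoutTransform> &chain) {
  std::vector<LayoutTransform> out;
  for (const auto &lt : chain) {
    if (!out.empty()) {
      assert(out.back().dstLayout == lt.srcLayout && "broken chain");
      out.back().dstLayout = lt.dstLayout; // Combine with the previous one.
    } else {
      out.push_back(lt);
    }
    // A transform back to its own source layout is a no-op; eliminate it.
    if (!out.empty() && out.back().srcLayout == out.back().dstLayout) {
      out.pop_back();
    }
  }
  return out;
}
```

For example, the chain NHWC→NCHW followed by NCHW→NHWC collapses to nothing, while NHWC→NCHW, NCHW→NHWC, NHWC→NCHW collapses to a single NHWC→NCHW transform.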

Additionally, I think that if we pursue this direction, it should also include a verifier that makes sure a node that expects certain layout constraints on its inputs actually has those constraints satisfied after the layout transformations are applied.

This would clean up node creation in Graph.cpp and create a central location for satisfying layout constraints. Consider `static void assertConv3DDims` for example: as the name suggests, it checks that the tensor layout and dimensions make sense. Such checks could be moved into a central location: the proposed layout verifier.
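
A hypothetical sketch of what the proposed verifier could check; the types and names below are illustrative only, not Glow's API. For every input of every node, the layout actually produced for that input must match what the backend requires, with a wildcard for operations that accept any layout:

```cpp
// Hypothetical sketch of the proposed layout verifier (illustrative names).
// "*" means the operation accepts any layout for that input.
#include <cstdio>
#include <string>
#include <vector>

struct NodeLayouts {
  std::string name;
  std::vector<std::string> inputLayouts;    // Layout actually produced for each input.
  std::vector<std::string> requiredLayouts; // Layout the backend requires for it.
};

bool verifyLayouts(const std::vector<NodeLayouts> &nodes) {
  bool ok = true;
  for (const auto &n : nodes) {
    for (size_t i = 0; i < n.inputLayouts.size(); ++i) {
      if (n.requiredLayouts[i] != "*" &&
          n.inputLayouts[i] != n.requiredLayouts[i]) {
        std::fprintf(stderr, "%s: input %zu is %s, expected %s\n",
                     n.name.c_str(), i, n.inputLayouts[i].c_str(),
                     n.requiredLayouts[i].c_str());
        ok = false;
      }
    }
  }
  return ok;
}
```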

Another useful outcome of this work is simplifying model import: we currently add a transpose silently before some nodes because their inputs may be in a layout Glow does not support. Take Bucketize from Caffe2 for example: the Caffe2 operator supports the NCHW layout while Glow does not, so when importing from Caffe2 we check for NCHW and add a transpose. We can make the import process for a large number of operators much more intuitive if, instead of hard-coding a transpose node, we add them via layout requirements.

If we support such a change for model loaders, a tricky situation we'd need to tackle is knowing the original layout we imported. We have that knowledge when we import from, say, Caffe2, but we don't keep track of it in Glow. One possible solution is to include that information in Placeholders (and Constants?).
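
For illustration, a minimal sketch of that idea, with hypothetical `Placeholder`/`setLayout` names: the importer records the model's original layout on the placeholder instead of hard-coding a transpose, and a later layout pass decides whether a transform is actually needed:

```cpp
// Hypothetical sketch: record the original (importer-side) layout on a
// placeholder at load time instead of hard-coding a Transpose. The
// Placeholder/setLayout names are illustrative, not Glow's actual API.
#include <string>

struct Placeholder {
  std::string name;
  std::string layout; // e.g. "NCHW", "NHWC", or "*" for unknown / any.
  void setLayout(const std::string &l) { layout = l; }
};

// Caffe2-style importer: remember the layout the model was written in; a later
// layout pass can compare this annotation against each consumer's requirement
// and insert a LayoutTransform only when one is actually needed.
Placeholder importInput(const std::string &name, bool modelIsNCHW) {
  Placeholder ph{name, "*"};
  ph.setLayout(modelIsNCHW ? "NCHW" : "NHWC");
  return ph;
}
```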

@shajrawi self-assigned this Aug 22, 2019
@shajrawi changed the title from "Teach Glow how to support layout requirements for tensors" to "Teach Glow how to support layout requirements for nodes and tensors" Aug 22, 2019
@ayermolo
Contributor

The way I see it, there are two types of Transposes. One converts the layout to whatever the backend supports, and the other massages the data for subsequent layers, for example in object-detection post-processing to feed into the NMS layer. The latter are part of the fundamental graph; the former are needed for correctness on the backend.

Would it make more sense to load the graph in whatever the default layout is, and then, after all the optimizations, have a LayoutLegalization pass that inserts LayoutTransforms as necessary for that backend? Backends could override either the whole pass or just the node-layout query. I think introducing LayoutTransforms during loading just pollutes the graph and complicates GraphOptimizer and transformPostLowering.
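
A rough sketch of what such an overridable hook could look like (illustrative names only, not Glow's API): a backend can either override the whole legalization pass, or just answer the per-node layout query that the default implementation relies on:

```cpp
// Hypothetical sketch (illustrative names only) of a late LayoutLegalization
// hook: a backend can override the whole pass, or just the per-node layout
// query used by the default implementation.
#include <string>

struct Function; // Stand-in for a Glow function (graph).
struct Node;     // Stand-in for a Glow node.

class Backend {
public:
  virtual ~Backend() = default;

  // Per-node query used by the default legalization below.
  virtual std::string getNodeLayout(const Node & /*node*/,
                                    unsigned /*input*/) const {
    return "NHWC"; // Canonical default layout.
  }

  // Whole-pass hook; backends with holistic layout strategies override this.
  virtual void legalizeLayouts(Function & /*F*/) const {
    // Default behaviour: walk the graph and insert a LayoutTransform wherever
    // a producer's layout differs from what getNodeLayout() reports for the
    // consumer. (The graph traversal itself is elided in this sketch.)
  }
};
```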

@shajrawi
Contributor Author

Making GraphOptimizer aware of LayoutTransforms does add a bit of code to the optimizer; however, I believe the potential performance gains outweigh the cost:
Even if we look at it as two types of transposes, almost any optimization that can be applied to one type can almost certainly be applied to the other. When looking at a large graph, combining multiple LayoutTransforms into one, or eliminating opposing LayoutTransforms, can give us a performance boost for a relatively tiny amount of extra work.

@shajrawi
Contributor Author

Just to clarify: I am not proposing that we manually add these nodes during load time; there should be an automated pass that adds them to the graph. I think we are on the same page there.
We do, however, need to optimize whatever nodes that transformation pass inserts into the graph, and I think GraphOptimizer is the natural place to perform those optimizations.

@ayermolo
Contributor

Ah OK, yeah, I think we are on the same page. I guess I am mainly arguing for doing it later in the compilation flow, after the bulk of the other optimizations is done. Of course, unnecessary Transposes/LayoutTransforms can be removed either in a separate pass or as part of LayoutLegalization; a layout transform is a pretty expensive operation.

One benefit of giving more control to the backend is that it can take a more holistic approach to the graph depending on its needs, effectively combining the naive node-at-a-time insertion of LayoutTransformations for legalization purposes with the kind of optimization GraphOptimizer does.

So maybe both the node-layout query and the entire pass could be overridden by the backend.

@shajrawi
Contributor Author

I definitely see the backend overriding the node-layout query. Per @opti-mix's original comment:

Glow would be able to ask a backend, for each node, what the requirements are for each of the node's inputs and results in terms of alignments and ordering of dimensions.
The backend would return this information and the Glow core could insert all the required layout transformations using a new node called LayoutTransform (maybe we could extend/subsume Transpose for this purpose).

So the backend should be given control over how we transform and optimize the graph.

I can see a backend having additional, backend-specific passes for optimizing Transposes/LayoutTransforms. But some basic transformations are backend-agnostic, given the (possibly overridden) node-layout query.

@tlepley-cadence
Contributor

IMO, the memory layout choice is something very target-specific, and it may depend not only on the node but also on the graph connectivity.

Therefore, I'm not convinced that doing such a transformation early in the flow is meaningful, because the answer to the memory-layout question may only be available after nodes have been lowered and potentially transformed into backend-specific nodes.

On the other hand, I'm also not convinced that trying to do such a transformation in a generic way later in the flow (after post-lowering, for instance) is meaningful, because at that stage the IR can contain lots of backend-specific nodes and a generic Glow optimization will hardly be able to optimize out Transpose (or equivalent) nodes.

My preference would then be to make the high-level IR actually canonical (taking *NCHW only*, for instance), which is not the case today because the layout of some nodes (convolution is NHWC, for instance) was chosen for the CPU backend. This pollutes the IR with plenty of Transpose nodes at model-load time, while the actual backend may not even want this transformation.

shajrawi added a commit to shajrawi/glow that referenced this issue Nov 6, 2019
Fixes pytorch#3452

Also Fixes pytorch#3493 and Fixes pytorch#3500, GraphOptimizer bugs which were found after adding the layout verifier.

Provides a workaround for the pytorch#3499 issue which was also found via the verifier.

Note: I did not want to break the `enum ConvolutionLayout` introduced in 5074a72; as such, I used it in the verifier and did not change the creation of those nodes.
HOWEVER: we should use the more generic string-based layout, which I introduce for the Transpose node in this commit. It is basically an extendable enum that can be used in backends without touching the generic code base. As a bonus, it makes differentiation easier: see how it is done for Transpose now in `Function *glow::differentiate`.

Getting rid of said enum is a proposed TODO / follow-up.

Also note that some nodes *need* layout requirements, which have been added; namely, we need to know the layout for placeholders and constants (obviously) and for reshapes (in case we optimized a transpose into a reshape).

An additional nice-to-have feature of the string-based layout is the wildcard / any-layout option. Some operations, such as data parallel nodes, might accept any layout.

A potential follow-up is to create a "Solver" that automatically inserts transposes when layouts do not match. This might greatly simplify the loader: we would no longer need to insert transposes based on whether we are importing NHWC or NCHW (for example); we would just annotate the placeholder with the layout information we get at load time, which we currently "forget" afterwards.

The verifier is useful even without said solver: it exposed a couple of bugs, which are mentioned in this commit, so any proposed solver is not a must-have to demonstrate the usefulness of this commit.
vdantu pushed a commit to vdantu/glow that referenced this issue Jul 12, 2020
…pytorch#3503)

Pull Request resolved: pytorch#3503

Test Plan: `ninja test`

Differential Revision: D18357369

Pulled By: shajrawi

fbshipit-source-id: 45f91fbe120b234c2a85879cee9ee0de6c100b50