Teach Glow how to support layout requirements and fix uncovered bugs. #3503

Closed
wants to merge 6 commits from the tesnor_layout branch

Conversation

@shajrawi (Contributor):

Summary:

Note: I did not want to break the `enum ConvolutionLayout` introduced in 5074a72. As such, I used it in the verifier and did not change the creation of said nodes.
HOWEVER: We should use the more generic string-based layout, which I introduce for the Transpose node in this commit: it is basically an extendable enum that can be used in the backends without touching the generic code base. As a bonus, it makes differentiation easier: see how it is done for transpose now in `Function *glow::differentiate`.

Getting rid of said enum is a proposed TODO / follow-up.

Also note that some nodes *need* layout requirements, which have been added: namely, we need to know the layout for placeholders and constants (obviously) and for reshapes (in case we optimized a transpose into a reshape).

An additional nice-to-have feature of the string-based layout is the wildcard / any-layout option. Some operations, such as data parallel nodes, might accept any layout.
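
To make this concrete, here is a minimal sketch of how the string-based layout looks at node-creation time (the shapes, variable names `mod`/`F`, and node names are illustrative; the layout parameter on `createTranspose` and the `ANY_LAYOUT` / `"*"` wildcard are the pieces this PR adds):

```cpp
// Sketch: a 4-D NHWC placeholder transposed to NCHW, with the layout
// carried as a plain string instead of a fixed enum.
auto *input = mod.createPlaceholder(ElemKind::FloatTy, {1, 28, 28, 3},
                                    "input", /* isTrainable */ false);
// NHWC -> NCHW shuffle; the string documents the result's layout.
auto *toNCHW = F->createTranspose("to_nchw", input, {0u, 3u, 1u, 2u}, "NCHW");
// Data parallel nodes such as Relu accept any layout, i.e. the "*" wildcard.
auto *relu = F->createRELU("relu", toNCHW);
```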

A potential follow-up is to create a "Solver" that automatically inserts transposes if the layouts do not match. This might greatly simplify the loader: we would no longer need to insert transposes based on whether we are importing NHWC or NCHW (for example). We would just need to annotate the placeholder with the layout information we get at load time, which we currently "forget" afterwards.
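
As a rough illustration of that idea, a solver pass could look something like the sketch below. The helpers `isSatisfiedBy` and `computeShuffle` are assumed names, not part of this PR; the layout-requirement queries are the ones the PR introduces:

```cpp
#include "glow/Graph/Graph.h" // assumed include for Function/Node/NodeValue

// Hypothetical layout solver: whenever a producer's layout does not satisfy
// the consumer's requirement, splice in a transpose that reconciles them.
static void solveLayouts(glow::Function *F) {
  for (auto &node : F->getNodes()) {
    for (unsigned i = 0, e = node.getNumInputs(); i < e; ++i) {
      glow::NodeValue in = node.getNthInput(i);
      std::string have = CanonicalTensorLayout::getInstance()
                             .getNthResultLayoutRequirements(in.getNode(),
                                                             in.getResNo());
      std::string want = CanonicalTensorLayout::getInstance()
                             .getNthInputLayoutRequirements(&node, i);
      if (!isSatisfiedBy(have, want)) {           // assumed compatibility check
        auto shuffle = computeShuffle(have, want); // assumed helper
        node.setNthInput(i, F->createTranspose("solver.transpose", in,
                                               shuffle, want));
      }
    }
  }
}
```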

The verifier is useful even without said solver: it exposed a couple of bugs, which are mentioned in this commit. As such, a solver is not a must-have to demonstrate the usefulness of this commit.

Fixes #3452

Also Fixes #3493 and Fixes #3500, GraphOptimizer bugs which were found after adding the layout verifier.

Provides a workaround for the #3499 issue, which was also found via the verifier.

Test Plan:
`ninja test`

@opti-mix (Contributor) left a comment:

@shajrawi Very nice! I'm glad to see this feature being implemented!

I've done the first quick round of reviewing. Please find my comments below.

@@ -1788,6 +1789,7 @@ TEST(Graph, testDumpStructure) {
std::string mesN = K->toString();
std::string expectMes = R"(Placeholder
name : "input"
layout : *
Contributor:

`*` is used to express a layout with any number of dimensions and any names? Have you considered using e.g. `****` for 4-D layouts? Would it make anything easier or more complex compared to your approach?

Contributor Author:

More complex. Currently we have `*` as the default value when constructing the placeholder. We would need to change that and/or modify the creation method to count the number of dimensions; same for constants. We can usually figure the layout out from `*`, but I am open to making this change if we decide it is best.

Contributor:

I don't have a strong opinion here. My main concern was that if, e.g., two tensors have a different number of dimensions, and one of them has the `*` layout while the other has a more concrete layout like NCHW, we should be able to handle this correctly in our layout-compatibility checking logic. As long as that is the case, I'm fine with this short way of designating the any-layout.
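
A minimal sketch of the compatibility rule being discussed (assumed semantics only, and ignoring any per-dimension attribute syntax; not the exact implementation):

```cpp
#include "llvm/ADT/StringRef.h"

// "*" as a whole layout is compatible with anything, regardless of rank;
// otherwise the layouts must agree dimension by dimension, with a
// per-dimension '*' matching any dimension name.
static bool isCompatible(llvm::StringRef lhs, llvm::StringRef rhs) {
  if (lhs == "*" || rhs == "*") {
    return true;
  }
  if (lhs.size() != rhs.size()) {
    return false;
  }
  for (size_t i = 0, e = lhs.size(); i < e; ++i) {
    if (lhs[i] != rhs[i] && lhs[i] != '*' && rhs[i] != '*') {
      return false;
    }
  }
  return true;
}
```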

@shajrawi force-pushed the tesnor_layout branch 3 times, most recently from 21b1504 to 18ca0e3 (September 12, 2019)
@shajrawi (Contributor Author):

Thanks for the initial review @opti-mix! I addressed your comments and fixed the CMake / shared library issue.

@opti-mix (Contributor) left a comment:

Thanks for the quick iterations. Below are some further comments.

In particular, the GLOW_DEFAULT_4D_LAYOUT change is a bit controversial as it turns out. What do you think about it? Should we undo it completely? Or should we just use GLOW_DEFAULT_4D_LAYOUT in fewer places? Maybe you have a better idea?

@nadavrot (Contributor):

@shajrawi Could you please add a section to /docs that describes the tensor layout proposal?

@shajrawi (Contributor Author):

I've amended the PR to address the review, including adding a small description in docs/Backends, but I can work on creating a separate document with the proposal and/or expanding the section in said MD file.

@@ -299,6 +299,50 @@ llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const Node *node) {
node->dump(os);
return os;
}

bool isDataParallel(const Node *node) {
Contributor:

Long term we probably want to support the isDataParallel predicate in NodeGen, just like we do for InstrGen, because we may need to support it for backend-specific nodes and this approach would not work properly in such cases. If you don't want to submit a small PR for it right now, please file an issue about it.

Contributor Author:

Will do.

@nadavrot (Contributor) commented Sep 13, 2019:

@shajrawi Joe, I don't yet understand the proposal. At the moment we have docs in /docs that describe the IR, the Graph, and how they work. This PR looks like a major change to the type system, but it does not explain why it exists and how it works. Questions: Is this an optional addition to the IR? Is this metadata, or does it have a functional use? Is this required for correctness? Is the graph always in a legal state? None of these things are explained in the PR or in the docs section.

Could you please start with a short document that explains the motivation?

@shajrawi force-pushed the tesnor_layout branch 2 times, most recently from 113400b to 2e802e0 (September 13, 2019)
@shajrawi (Contributor Author):

@nadavrot I amended the PR with a document that explains the motivation, interface, related work, and future directions. See docs/TensorLayout.md.

@chadrosier, @pfierek, @tlepley-cadence, @mciprian13, and @vuzelac-cadence: I would appreciate any feedback you might have on this document and the changes it proposes to make the life of custom backends easier.

@shajrawi force-pushed the tesnor_layout branch 3 times, most recently from efb06fa to 9bd5dca (September 13, 2019)
@shajrawi force-pushed the tesnor_layout branch 4 times, most recently from 05a6ff6 to edd80ad (September 16, 2019)
@shajrawi force-pushed the tesnor_layout branch 2 times, most recently from 12f2bf9 to cd0c215 (November 8, 2019)

@jfix71 (Contributor) left a comment:

@shajrawi Seems like it's pretty flexible, and so should be useful and usable by other backends. Added some comments, nothing major.


1. A mandatory one character representing the current dimension. Either an alphabetic letter or `*` (any layout).
2. An optional token for the start of the current dimension's information: `[`.
3. An optional namespace identifier for non-standard information, such as tiling, followed by `:`. Must have `[` from 2. in place. following said identifier, all subsequent data is considered as a "black box" until `]` is encountered.
Contributor:

nit: capitalize "following"

Contributor Author:

done
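
For reference, a few illustrative strings that conform to the grammar quoted above (the namespaced payload is a made-up example, not any real backend's syntax):

```cpp
// Illustrative layout strings for the grammar above:
const char *plain    = "NHWC";                // four named dimensions
const char *extended = "N[custom:tile=4]HWC"; // "custom:" black box until ']'
const char *anyDim   = "*";                   // wildcard / any layout
```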

@@ -1376,6 +1397,8 @@ llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const Function &F);

llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const Function *F);

#define NHWC2HWNC \
Contributor:

nit: why is this one not colocated with the others?

Contributor Author:

done

}
// Dynamically form the layout description for transposes.
auto input = TN->getInput();
auto inputLayout = getNthInputLayoutRequirements(node, 0);
Contributor:

Without thinking a ton about this, why is it safe to always use the 0th input here?

Contributor Author:

Because there's only one input for transpose?

Contributor:

Ah yeah, I misread it -- in that case I'd use TransposeNode::InputIdx here.

Contributor Author:

No worries - good suggestion: changed that for clarity :)
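
The resulting one-liner, assuming the generated `TransposeNode::InputIdx` index constant:

```cpp
// Named index constant instead of a literal 0:
auto inputLayout = getNthInputLayoutRequirements(node, TransposeNode::InputIdx);
```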

CN->getInputs().front().getNode()->getName(), newCN,
CN->getResult().dims(),
CanonicalTensorLayout::getInstance().getNthResultLayoutRequirements(CN, 0));
Contributor:

nit: ConcatNode::ResultIdx

Contributor Author:

done


auto *input =
mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "input", false);
auto IH = bindings_.allocate(input)->getHandle();
Contributor:

nit: I don't think we need the bindings_ or need to allocate or set any values in this whole file, since we're not running anything, right?

@@ -684,12 +684,14 @@ int main(int argc, char **argv) {
BB.newNode("Reshape")
.addInput("Input")
.addMember(MemberType::VectorSizeT, "Dims")
.addMember(MemberType::String, "Layout")
Contributor:

nit: no big deal, but we could special case this to .addLayout() or something in NodeBuilder as it will be the same across all Nodes.

Contributor Author:

Our plan is to move the layout information from nodes to Glow types. This can be done in a follow-up commit, even as a first task for someone motivated: during an operator's construction it would propagate the layout string (a pointer into a global hash of strings, to save on size), so we would not add a member called layout.
Said follow-up is straightforward, just a lot of manual labor, if we are fine with the design - no point in doing something fancy here :)
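
For the record, a sketch of the suggested (but, per the above, not adopted) sugar, assuming it would simply wrap the shared member definition inside NodeBuilder:

```cpp
// Hypothetical NodeBuilder helper: every layout-aware node would share the
// same string member, so the builder call could collapse to one name.
NodeBuilder &addLayout() {
  return addMember(MemberType::String, "Layout");
}

// Usage in NodeGen would then read:
BB.newNode("Reshape")
    .addInput("Input")
    .addMember(MemberType::VectorSizeT, "Dims")
    .addLayout();
```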


TransposeNode *createTranspose(llvm::StringRef name, NodeValue input,
llvm::ArrayRef<unsigned_t> shuffle);
llvm::ArrayRef<unsigned_t> shuffle,
const std::string &layout = ANY_LAYOUT);
Contributor:

nit: seems like it's inconsistent whether you're using llvm::StringRef layout (above) vs. const std::string &layout (here)? No big deal but I'd say pick one and stick with it.

lastWordIdx =
F_->createReshape("decoder.reshape", topK->getIndices(), {batchSize_});
lastWordIdx = F_->createReshape("decoder.reshape", topK->getIndices(),
{batchSize_}, "N");
Contributor:

For my own understanding: "N" is just a descriptor that you chose to use here because of batchSize_, right? We aren't really going to use this anywhere in here. And it doesn't seem like it will make much of a difference for verification here, right? I'm wondering why you are using it here in general.

Contributor Author:

Currently we don't make extensive use of named tensors, so "N" is probably not mandatory for the current backends and current verifications. But let's say we add a different 1-D layout, for example a private backend adds one during lowering, and it is NOT "N": then the verifier should fail.

outputs.push_back(lastWordIdx);
}

Node *concat = F_->createConcat("decoder.output.concat", outputs, 0);
Node *reshape = F_->createReshape("decoder.output.reshape", concat,
{MAX_LENGTH, batchSize_});
{MAX_LENGTH, batchSize_}, "*");
Contributor:

Curious why you've added "*" in many places, since it's already the default? Also wondering about the benefit of ANY_LAYOUT if we're only going to use it sometimes. Unless there's a reason I'm missing, I'd suggest either always or never using it; otherwise it creates confusion about why they're both being used.

Contributor Author:

good catch - fixed


@shajrawi (Contributor Author) commented Nov 9, 2019:

Thanks @jfix71 ! I added a new commit to address the nits


- This function checks if `ty` satisfies `destLayout` layout requirements, if `srcLayout` is provided for `ty`, take that into account.

- `virtual std::array<TensorLayoutDescription, max_tensor_dimensions + 1> &getLayoutsForDims() const`
Contributor:

The return type needs to be updated to return ArrayRef, after you changed the signature.

Contributor Author:

Good catch - done! (I already changed the signature and used makeArrayRef.)
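
The corrected documentation entry would then describe something like the following (assumed final signature, per the review comment above):

```cpp
virtual llvm::ArrayRef<TensorLayoutDescription> getLayoutsForDims() const;
```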


@opti-mix (Contributor):

@shajrawi Looking good! I think it is ready to be merged.

@opti-mix (Contributor):

@jfix71 Do you have any further comments or are you OK with merging it?

@jfix71 (Contributor) left a comment:

Good by me!

@facebook-github-bot:

@shajrawi merged this pull request in ec46f24.

pjaaskel added a commit to parmance/glow that referenced this pull request Dec 7, 2019
vdantu pushed a commit to vdantu/glow that referenced this pull request Jul 12, 2020
Summary:
Cherry-pick of the isDataParallel change from pytorch#3503: it is an independent feature that we can merge, per the offline discussion with opti-mix.
Pull Request resolved: pytorch#3715

Test Plan: ninja test

Differential Revision: D18263215

Pulled By: shajrawi

fbshipit-source-id: 52947ba5419c55eaf76048411d09a40a862fda1f
vdantu pushed a commit to vdantu/glow that referenced this pull request Jul 12, 2020
…pytorch#3503)

Summary: (same as the PR description above)
Pull Request resolved: pytorch#3503

Test Plan: `ninja test`

Differential Revision: D18357369

Pulled By: shajrawi

fbshipit-source-id: 45f91fbe120b234c2a85879cee9ee0de6c100b50