diff --git a/docs/Backends.md b/docs/Backends.md index 82a3e2326c..79d79ceb3a 100644 --- a/docs/Backends.md +++ b/docs/Backends.md @@ -73,6 +73,10 @@ Additionally, there are virtual functions that backends can override: - Verifies that `IRFunction &IR` conforms to the backend-specific constraints. +- `virtual TensorLayoutCommon &getTensorLayoutRequirements() const;` + + - Gets the backend-specific tensor layout requirements. + - `virtual bool shouldLower(const Node *N) const;` - Allow the backend to prevent lowering for some `Node *N`. For example, if a diff --git a/docs/TensorLayout.md b/docs/TensorLayout.md new file mode 100644 index 0000000000..ac96b1c2d7 --- /dev/null +++ b/docs/TensorLayout.md @@ -0,0 +1,236 @@ +## Tensor Layout + +This document describes the design of the tensor layout requirements in Glow. + +Certain operations (e.g. convolutions, gemms, etc.) need to know the semantic layout of their tensors, i.e. the logical ordering of their dimensions (e.g. `NHWC`). Some backends enforce additional backend-specific requirements on said operations (e.g. tensor alignment). + +A sufficiently clever backend might even go a step further and have said layout requirements depend on the properties of the operation: a convolution with a small filter may need the input operands in a format different from a convolution with a big filter. + +Tensor layout is a property of the operation. Some operations, such as element-wise operations, may not care about their input layout; we avoid adding a layout field for such operations to reduce the dynamic memory consumption of the compiler. + +For operations that do have layout requirements, Glow has an easily extendable string-based layout field. This allows backends to override Glow's default requirements without the hassle of creating a custom, backend-specific node. + +Glow's string-based layout format is encoded as follows: + +1. A mandatory single character representing the current dimension: either an alphabetic letter or `*` (any layout). +2. An optional token for the start of the current dimension's information: `[`. +3. An optional namespace identifier for non-standard information, such as tiling, followed by `:`. Must have `[` from 2. in place. Following said identifier, all subsequent data is considered a "black box" until `]` is encountered. +4. Given that we have `[` from 2. in place, the closing bracket `]` for it. +5. Optionally go back to 2. + +As an example of this encoding, here's how we add alignment information, which is an officially supported extension, thus not requiring a namespace, followed by a backend-specific extension: +`N[a=32][namespace_for_unsupported:]HWC` would represent a 4-D tensor wherein +`N` needs an alignment of 32, plus some private backend requirements we don't know about. +The `H`, `W` and `C` dimensions have no layout restrictions. +We can, of course, combine "any" dimensions in there; for example, `N[a=32]*H*[a=64]` would represent "any" for the second dimension, with no restrictions whatsoever, while we have an alignment restriction of 64 on the 4th. +The sketch after the notes below shows how such a string can be parsed. + +Notes: + +1. For each dimension, the identifier can be either a single English alphabet letter, either upper or lower case, or the star symbol. +2. We assume that a single letter is enough for each dimension; it makes parsing easier and avoids adding delimiters in the serialized format. +However, we do have a constructor that (theoretically) accepts multi-letter dimensions. +If we decide to expand the current support, we will need to add delimiters to the serialized form.
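To make the encoding above concrete, here is a minimal, self-contained sketch of how such a serialized layout string could be parsed and queried. It is illustrative only and is not Glow's implementation (Glow's parser lives in `TensorLayoutDescription::parse`, added later in this patch); the `DimInfo` struct, the `parseLayout` helper, and the `ocl:tile8` extension string are made up for the example.

```cpp
// Standalone sketch (not Glow's actual parser): walk a serialized layout
// string such as "N[a=32][ocl:tile8]HWC" and print each dimension's name
// and alignment. Unknown namespaced extensions are skipped as opaque blobs.
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct DimInfo {
  char name;        // 'N', 'H', ..., or '*' for "any".
  size_t alignment; // 1 when no "[a=...]" extension is present.
};

static std::vector<DimInfo> parseLayout(const std::string &layout) {
  std::vector<DimInfo> dims;
  size_t i = 0;
  while (i < layout.size()) {
    char c = layout[i++];
    if (!(std::isalpha(static_cast<unsigned char>(c)) || c == '*'))
      continue; // Ill-formed input; a real parser would report an error.
    DimInfo d{c, 1};
    // Consume any number of "[...]" extensions attached to this dimension.
    while (i < layout.size() && layout[i] == '[') {
      size_t close = layout.find(']', i);
      std::string ext = layout.substr(i + 1, close - i - 1);
      if (ext.rfind("a=", 0) == 0) // Official alignment extension.
        d.alignment = std::stoul(ext.substr(2));
      // Namespaced extensions ("ocl:...") are treated as opaque and ignored.
      i = close + 1;
    }
    dims.push_back(d);
  }
  return dims;
}

int main() {
  for (const DimInfo &d : parseLayout("N[a=32][ocl:tile8]HWC"))
    std::cout << d.name << " align=" << d.alignment << "\n";
  // Prints: N align=32, then H, W, C with align=1, one per line.
  return 0;
}
```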
+ +## Layout Requirements Interface + +Backends in Glow *may* derive from [base class `TensorLayoutCommon`](https://github.com/pytorch/glow/blob/master/include/glow/Graph/TensorLayout.h), +which includes the following virtual methods they can override: + +- `virtual std::string getDefaultNDLayout(unsigned dims) const` + + - This helper function takes an `unsigned dims` and returns the (current) default n-D layout. + +- `virtual std::string getNthInputLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth input `n`. + +- `virtual std::string getNthResultLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth result `n`. + +- ``` +virtual bool isSatisfiedBy(TypeRef ty, + const TensorLayoutDescription &destLayout, + const TensorLayoutDescription *srcLayout) const + ``` + - This function checks whether `ty` satisfies the `destLayout` layout requirements; if `srcLayout` is provided for `ty`, it is taken into account as well. + +- `virtual llvm::ArrayRef<TensorLayoutDescription> getLayoutsForDims() const` + + - This helper function returns an array of predefined layouts for all dimensions from `0-D` to Glow's max tensor layout dimension. + +- `bool isEnabled() const` + - Indicates whether checking for layout requirements is enabled. The default is off. + +An example of why backends may want to override such methods can be seen in the `OpenCL` backend: +convolutions are more efficient in `NCHW` format; as such, we may lower a `ConvolutionNode` into a `NHWC`-to-`NCHW` transpose + convolution. +The `OpenCL` verifier should then expect `NCHW` for the input/output of the convolution instead of `NHWC`. +`OpenCL` opts in to post-lowering verifications; a sketch of such a backend-specific override appears at the end of the next section. + +## Canonical Tensor Layout + +Before lowering a Glow graph for a specific backend, we introduce a "Canonical" representation that we expect for certain operations. +This allows us to verify the graph after every transformation and may expose `GraphOptimizer` bugs [^tl0]. +[class `CanonicalTensorLayout`](https://github.com/pytorch/glow/blob/master/include/glow/Graph/TensorLayout.h) +derives from `TensorLayoutCommon` and overrides the following functions: + +- `std::string getDefaultNDLayout(unsigned dims) const` + + - Overrides the default `n-D` layout from "any" into something else, e.g. 4-D any into `NHWC`. + +- `std::string getNthInputLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth input `n`. + - It returns common layout constraints; for example, the layout of a `TransposeNode`'s input is the same as the layout of the result that produces that input. + +- `std::string getNthResultLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth result `n`. + - It returns common layout constraints; for example, the result of a `ConvolutionNode` should be in `NHWC` format.
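Putting the interface and the canonical layout together, here is a hedged sketch of what a backend-specific override can look like. It is modeled on the `OpenCLTensorLayout` class added later in this patch; the `MyBackend*` names, the `NCHW` policy shown, and the exact node check are illustrative assumptions rather than real Glow code, and the snippet assumes the Glow headers introduced by this patch are available.

```cpp
// Sketch only: a hypothetical backend's layout requirements, mirroring the
// OpenCLTensorLayout pattern from this patch. "MyBackend*" names are made up.
#include "glow/Graph/TensorLayout.h"

namespace glow {

class MyBackendTensorLayout final
    : public TensorLayoutCommon,
      public TensorLayoutSingleton<MyBackendTensorLayout> {
public:
  // Opt in to layout verification (isEnabled() will now return true).
  MyBackendTensorLayout(token_) { enabled_ = true; }

  std::string getNthInputLayoutRequirements(const Node *node,
                                            size_t n) override {
    // Illustrative policy: this backend prefers convolution data in NCHW.
    // (InputIdx is assumed to be the auto-generated input index name.)
    if (llvm::isa<ConvolutionNode>(node) &&
        n == ConvolutionNode::InputIndices::InputIdx) {
      return "NCHW";
    }
    // Fall back to the common requirements otherwise.
    return TensorLayoutCommon::getNthInputLayoutRequirements(node, n);
  }
};

} // namespace glow

// The backend then exposes the singleton through the new Backend hook:
// TensorLayoutCommon &MyBackend::getTensorLayoutRequirements() const {
//   return MyBackendTensorLayout::getInstance();
// }
```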
+ +## Placeholders and Constants + +An important thing to note is that some operators may have a `Placeholder` or a `Constant` as their input. We may need to know a specific layout for said storage. For example, a `Placeholder` may need to be in `NHWC` format for a `ConvolutionNode`. +However, we do not want to pollute the code by making this a hard requirement, especially since the canonical layout may accept anything for certain tensors (e.g. a `1-D` tensor). As such, we introduce the notion of `ANY_LAYOUT` and initialize `Placeholder`s and `Constant`s with this wildcard by default. +Note that loaders have the ability to specify the layout based on the network description, e.g. they might accept either `NCHW` or `NHWC` as an input for an operator, and they can propagate that information to Glow. + + +## Related Work + +Other machine learning frameworks have introduced similar concepts; this proposal is not unique to Glow. Here are some notable mentions: + +### PlaidML + +Provides layout requirement information as a parameter to operations that need to know tensor layouts, instead of setting a global layout that would apply to every operation, allowing users to mix layouts throughout their network. + +PlaidML made the conscious decision to make the layout a property of the operation instead of the tensor, making the implementation of certain operations more intuitive [^tl1]. + +### TVM + +TOPI is the operator collection library for TVM [^tl2]. Certain TOPI operations include their layout requirements as a string. Here's the layout section of `topi.nn.pool` taken from version 0.6 of the document: + +> layout (string) – Layout of the input data. The layout is supposed to be composed +> of upper cases, lower cases and numbers, where upper case indicates a dimension +> and the corresponding lower case with factor size indicates the split dimension. +> For example, NCHW16c can describe a 5-D tensor of [batch_size, channel, height, +> width, channel_block], in which channel_block=16 is a split of dimension channel. + + +### XLA + +XLA adds backend-specific layout constraints. Their CPU backend requires constant arrays to be column major when all of their users are dot operations [^tl3], while their GPU backend adds layout constraints on the cudnn custom-call instruction [^tl4]. + +It is also worth taking a look at XLA's layout optimizer [^tl5], part of their effort to improve out-of-the-box TensorFlow performance [^tl6]. + +Another thing to note is that the alter-layout pass [^tl7] is similar, in function, to the "Solver" that we propose, in the future work section of this document, to automatically legalize layouts. + +### MLIR + +Does not currently have such support, but there are ongoing discussions to add such support to the MLIR Tensor Type [^tl8]. + +## Future Work + +There are a few neat things we can, and probably should, do to expand this support: + +### Remove `enum ConvolutionLayout` + +Our string-based representation is more generic and extensible, as it is basically an extendable enum that can be used in the backends without touching the generic code base. + +### Remove shuffle arrays + +Some operations, such as `TransposeNode`, have a shuffle that tells them what to do. +This can be deprecated and automatically deduced by specifying layout constraints (see the sketch below). + +There is some discrepancy in the fact that we currently use both typed tensors, with named dimensions, and explicitly indexed dimensions, as we do everywhere in the code base (shuffle arrays being an example of that). This may lead to inconsistency in certain cases. +We should gradually migrate towards typed tensors in the long run. 
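The following standalone sketch shows the direction this section proposes: deriving a transpose's shuffle from a pair of layout strings instead of hard-coding it. The `deduceShuffle` helper is hypothetical and not part of this patch.

```cpp
// Standalone sketch: derive a transpose shuffle from two layout strings
// instead of hard-coding it. Assumes single-letter, non-repeating dims.
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

static std::vector<unsigned> deduceShuffle(const std::string &from,
                                           const std::string &to) {
  assert(from.size() == to.size() && "layouts must have the same rank");
  std::vector<unsigned> shuffle;
  shuffle.reserve(to.size());
  for (char d : to) {
    auto pos = from.find(d);
    assert(pos != std::string::npos && "destination dim missing in source");
    shuffle.push_back(static_cast<unsigned>(pos));
  }
  return shuffle;
}

int main() {
  // NCHW -> NHWC is the {0, 2, 3, 1} shuffle used throughout Glow (NCHW2NHWC).
  for (unsigned idx : deduceShuffle("NCHW", "NHWC"))
    std::cout << idx << ' ';
  std::cout << '\n'; // prints: 0 2 3 1
  return 0;
}
```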
+ +### Introduce a "Solver" that automatically legalizes layouts + +Said solver will drastically reduce the complexity of loading models from other frameworks: +we no longer need to insert transposes based on whether we are importing `NHWC` or `NCHW`. +We just need to annotate the `Placeholder` with the layout information we get at load time (which we currently "forget" afterwards) and let the solver transpose said `Placeholder` to our canonical layout. + +First, we start with a "raw" state of non-compliance; then we run a loop that sinks and combines layout transformations. + +### Remove backend-specific nodes + +Today, Glow core and custom backends implicitly hard-code this knowledge about the operations into (backend-specific) nodes and code that works with them. This is pretty fragile and involves a lot of boilerplate code. + +Combining the proposed solver with the backend-specified layout constraints would improve this situation considerably: + +- The backend would return this information and Glow core could insert all the required layout transformations + +- The transformations can also be optimized "for free": Glow currently optimizes `TransposeNode`: + - Multiple transposes can be combined into one + - Opposite transposes can eliminate each other + +- The functionality to insert the required layout transforms is handled by the Glow core, +which removes a lot of code duplication from backends. + +[^tl0]: [Glow Issue: Fix bug in constant folding optimization](https://github.com/pytorch/glow/issues/3500) + +[^tl1]: [Tensor Layout Design Decision in PlaidML](https://github.com/plaidml/plaidml/blob/master/plaidml2/op/lib/design.md#tensor-layout) + +[^tl2]: [TVM Operator Inventory](https://docs.tvm.ai/api/python/topi.html) + +[^tl3]: [XLA CPU Layout Assignment](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/cpu/cpu_layout_assignment.cc) + +[^tl4]: [XLA GPU Layout Assignment](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/gpu/gpu_layout_assignment.cc) + +[^tl5]: [XLA Layout optimizer](https://github.com/tensorflow/tensorflow/blob/b6f7ce2b98b496886be4d900a6f88c24ae730f2c/tensorflow/core/grappler/optimizers/layout_optimizer.cc) + +[^tl6]: [TensorFlow Graph Optimizations](https://web.stanford.edu/class/cs245/slides/TFGraphOptimizationsStanford.pdf) + +[^tl7]: [XLA Alter Layout](https://github.com/dmlc/tvm/blob/025a6c8077cd1914bdd4132c6b86de007151344e/src/relay/pass/alter_op_layout.cc) + +[^tl8]: [Proposal to add layout attribute to MLIR Tensor Type](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/sCaIEKm2RxA) diff --git a/examples/fr2en.cpp b/examples/fr2en.cpp index f6c3a48a42..6b93daaeed 100644 --- a/examples/fr2en.cpp +++ b/examples/fr2en.cpp @@ -277,14 +277,15 @@ void Model::loadEncoder() { {0, step, 0}, {batchSize_, step + 1, EMBEDDING_SIZE}); Node *reshape = F_->createReshape("encoder." 
+ std::to_string(step) + ".reshape", - inputSlice, {batchSize_, EMBEDDING_SIZE}); + inputSlice, {batchSize_, EMBEDDING_SIZE}, ANY_LAYOUT); hidden = createPyTorchGRUCell(F_, reshape, hidden, wIh, bIh, wHh, bHh); outputs.push_back(hidden); } Node *output = F_->createConcat("encoder.output", outputs, 1); - Node *r2 = F_->createReshape("encoder.output.r2", output, - {MAX_LENGTH * batchSize_, EMBEDDING_SIZE}); + Node *r2 = + F_->createReshape("encoder.output.r2", output, + {MAX_LENGTH * batchSize_, EMBEDDING_SIZE}, ANY_LAYOUT); encoderHiddenOutput_ = F_->createGather("encoder.outputNth", r2, seqLength_); } @@ -339,14 +340,14 @@ void Model::loadDecoder() { Node *FC = F_->createFullyConnected("decoder.outFC", hidden, outW, outB); auto *topK = F_->createTopK("decoder.topK", FC, 1); - lastWordIdx = - F_->createReshape("decoder.reshape", topK->getIndices(), {batchSize_}); + lastWordIdx = F_->createReshape("decoder.reshape", topK->getIndices(), + {batchSize_}, "N"); outputs.push_back(lastWordIdx); } Node *concat = F_->createConcat("decoder.output.concat", outputs, 0); Node *reshape = F_->createReshape("decoder.output.reshape", concat, - {MAX_LENGTH, batchSize_}); + {MAX_LENGTH, batchSize_}, ANY_LAYOUT); auto *save = F_->createSave("decoder.output", reshape); output_ = save->getPlaceholder(); bindings.allocate(output_); diff --git a/include/glow/Backend/Backend.h b/include/glow/Backend/Backend.h index 93ed9702a1..33cc0a31f7 100644 --- a/include/glow/Backend/Backend.h +++ b/include/glow/Backend/Backend.h @@ -31,6 +31,7 @@ class Node; class PlaceholderBindings; class IRGenVisitor; class FunctionPassPipeline; +class TensorLayoutCommon; namespace runtime { @@ -121,6 +122,11 @@ class Backend { /// has a good reason not to call IRFunction::verify(). virtual bool verify(const IRFunction &IR) const; + /// \returns a reference to the backend-specific tensor layout requirements + /// singleton. If not overridden, the default requirement is Glow's + /// "canonical" form. + virtual TensorLayoutCommon &getTensorLayoutRequirements() const; + /// \returns true if the supplied Node \N should be lowered. By default, all /// Nodes are candidates for lowering. virtual bool shouldLower(const Node *N) const { return true; } diff --git a/include/glow/Graph/Graph.h b/include/glow/Graph/Graph.h index 80ea893a32..c87e04e7a6 100644 --- a/include/glow/Graph/Graph.h +++ b/include/glow/Graph/Graph.h @@ -56,6 +56,9 @@ enum class FunctionState { FuncLoaded, }; +/// Helper names for common tensor layouts. +#define ANY_LAYOUT "*" + class Module final { /// Stores the functions in the module. 
FunctionList functions_; @@ -173,26 +176,34 @@ class Module final { ///@{ Placeholder *createPlaceholder(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name, bool isTrainable); + llvm::StringRef name, bool isTrainable, + const std::string &layout = ANY_LAYOUT); Placeholder *createPlaceholder(TypeRef T, llvm::StringRef name, - bool isTrainable); + bool isTrainable, + const std::string &layout = ANY_LAYOUT); Placeholder *createPlaceholder(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name, bool isTrainable); + llvm::StringRef name, bool isTrainable, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(TypeRef T, llvm::StringRef name); + Constant *createConstant(TypeRef T, llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); Constant *createConstant(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name); + llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); Constant *createConstant(ElemKind T, llvm::ArrayRef dims, float scale, - int32_t offset, llvm::StringRef name); + int32_t offset, llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(llvm::StringRef name, const Tensor &tensor); + Constant *createConstant(llvm::StringRef name, const Tensor &tensor, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(llvm::StringRef name, Tensor &&tensor); + Constant *createConstant(llvm::StringRef name, Tensor &&tensor, + const std::string &layout = ANY_LAYOUT); ///@} @@ -250,6 +261,10 @@ class Module final { Module &operator=(PlaceholderBindings &&) = delete; }; +// Forward Declaration for verify's optional parameter +class Backend; +struct CompilationContext; + /// Represents the compute graph. class Function final : public Named { /// A list of nodes that the Function owns. @@ -597,10 +612,12 @@ class Function final : public Named { NodeValue targets); ReshapeNode *createReshape(llvm::StringRef name, NodeValue input, - UnsignedArrayRef shape); + UnsignedArrayRef shape, + llvm::StringRef layout = ANY_LAYOUT); TransposeNode *createTranspose(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shuffle); + llvm::ArrayRef shuffle, + const std::string &layout = ANY_LAYOUT); /// Create a series of nodes that implement a Broadcast operation. The \p /// input Tensor is broadcasted based on \p newShape and along the \p axis, @@ -1302,9 +1319,11 @@ class Function final : public Named { Function *clone(llvm::StringRef newName, llvm::DenseMap *map = nullptr); - /// Verify the correctness of the Function. - /// \returns true when the function is valid. False otherwise. - bool verify() const; + /// Verify the correctness of the Function. If \p backend is provided, checks + /// backend-specific layout requirements. Else checks the requirements based + /// on Glow's "canonical" layout. \returns true when the function is valid. + /// False otherwise. + bool verify(const Backend *backend = nullptr) const; /// Dump a textual representation of the Function into provided output stream. 
void dump() const; @@ -1367,6 +1386,10 @@ Node *recursiveClone(Function *newF, Node *node, NodeMap &currToNew); { 0u, 2u, 3u, 1u } #define NHWC2NCHW \ { 0u, 3u, 1u, 2u } +#define HWCN2NHWC \ + { 3u, 0u, 1u, 2u } +#define NHWC2HWNC \ + { 1u, 2u, 0u, 3u } llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const Module &mod); diff --git a/include/glow/Graph/Nodes.h b/include/glow/Graph/Nodes.h index e53a4da6c6..adb71cfc9a 100644 --- a/include/glow/Graph/Nodes.h +++ b/include/glow/Graph/Nodes.h @@ -36,7 +36,8 @@ class Storage : public Node { OutputIdx = 0, }; - Storage(Kinded::Kind k, llvm::StringRef name) : Node(k, name) {} + Storage(Kinded::Kind k, llvm::StringRef name, const std::string &layout) + : Node(k, name), layout_(layout) {} /// \return the single output value of the node. NodeValue getOutput() { return getNthResult(0); } @@ -68,6 +69,13 @@ class Storage : public Node { return k->getKind() == Kinded::Kind::ConstantKind || k->getKind() == Kinded::Kind::PlaceholderKind; } + + /// \return the layout of the storage. + const std::string &getLayout() const { return layout_; } + +private: + /// Specifies the Storage's layout + const std::string layout_; }; class Constant : public Storage { @@ -76,14 +84,14 @@ class Constant : public Storage { public: /// Create a new constant and initialize its payload. - Constant(llvm::StringRef name, TypeRef Ty) - : Storage(Kinded::Kind::ConstantKind, name) { + Constant(llvm::StringRef name, TypeRef Ty, const std::string &layout) + : Storage(Kinded::Kind::ConstantKind, name, layout) { addResult(Ty); payload_.reset(*Ty); } - Constant(llvm::StringRef name, Tensor &&payload) - : Storage(Kinded::Kind::ConstantKind, name), + Constant(llvm::StringRef name, Tensor &&payload, const std::string &layout) + : Storage(Kinded::Kind::ConstantKind, name, layout), payload_(std::move(payload)) { addResult(&payload_.getType()); } @@ -145,8 +153,9 @@ class Placeholder : public Storage { public: /// Create a new placeholder. - Placeholder(llvm::StringRef name, TypeRef Ty, bool isTrainable) - : Storage(Kinded::Kind::PlaceholderKind, name), + Placeholder(llvm::StringRef name, TypeRef Ty, bool isTrainable, + const std::string &layout) + : Storage(Kinded::Kind::PlaceholderKind, name, layout), isTrainable_(isTrainable) { addResult(Ty); } diff --git a/include/glow/Graph/TensorLayout.h b/include/glow/Graph/TensorLayout.h new file mode 100644 index 0000000000..8944256224 --- /dev/null +++ b/include/glow/Graph/TensorLayout.h @@ -0,0 +1,192 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef GLOW_GRAPH_TENSORLAYOUT_H +#define GLOW_GRAPH_TENSORLAYOUT_H + +#include +#include + +#include "glow/Graph/Nodes.h" +#include "glow/Support/Error.h" + +namespace glow { + +/// Layout requirements's Singleton. +template class TensorLayoutSingleton { +public: + /// This is how the verifier, Backend and post-loading canonicalizer can + /// access layout constraints. 
+ static T &getInstance() { + // The Ctor will only be called once. + static const std::unique_ptr instance{new T{token_{}}}; + return *instance; + } + +protected: + /// Allow the base class to call any subclass's constructor. + struct token_ {}; + + /// Default Ctor. + TensorLayoutSingleton() {} + + /// Dtor. + virtual ~TensorLayoutSingleton() {} + +private: + /// Delete copy constructor. + TensorLayoutSingleton(const TensorLayoutSingleton &) = delete; + + /// Delete move constructor. + TensorLayoutSingleton(TensorLayoutSingleton &&) = delete; + + /// Delete copy assignment. + TensorLayoutSingleton &operator=(const TensorLayoutSingleton &) = delete; + + /// Delete move assignment. + TensorLayoutSingleton &operator=(TensorLayoutSingleton &&) = delete; +}; + +/// TensorLayoutDescription - optional helper class for parsing string-based +/// layout. +class TensorLayoutDescription { + /// Tensor dimensions descriptions for all dimensions. + std::string dims_[max_tensor_dimensions]; + /// The serialization of the layout. + std::string serializedLayout_; + /// Expected number of dimensions. + size_t numDims_; + +public: + virtual ~TensorLayoutDescription() = default; + /// Constructs this helper class from a serialized string representation. + TensorLayoutDescription(const std::string &layoutStr); + /// Constructs this helper class from an array of strings representing each + /// individual / pre-separated dimension. + TensorLayoutDescription(llvm::ArrayRef dims); + /// \returns the alignment of a dimension \p n. + size_t getAlignment(size_t n) const; + /// \returns the alignment by parsing dimension string \p s. + size_t getAlignment(const std::string &s) const; + /// \returns true if both tensor layouts are the same. + bool isSameLayout(const TensorLayoutDescription &rhs) const; + /// \returns description of the dimension \p n. + const llvm::StringRef getNthDimDescription(size_t n) const; + /// \returns the description of all dimensions. + llvm::ArrayRef getDims() const; + /// \returns number of dimensions. + size_t getNumDims() const { return numDims_; } + /// \returns layout name. + llvm::StringRef getSerializedLayout() const { return serializedLayout_; } + /// \returns true if the layout is "*" in all dimensions. + bool isAnyLayout(); + std::string getDebugDesc() const; + +protected: + /// parse helper: get the custom extensions information. the default, virtual, + /// implementation just ignores all the data until the end token. + virtual void parseCustomExtensions(llvm::StringRef &text, unsigned idx); + +private: + /// Constructor helper: Parses the serialized string. + void parse(llvm::StringRef text); + + /// parse helper: get the official extensions information. + void parseOfficialExtensions(llvm::StringRef &text, unsigned idx); +}; + +/// Interface for finding out layout requirements. +class TensorLayoutCommon { +public: + /// \return the default n-D layout for Glow. + virtual std::string getDefaultNDLayout(unsigned dims) const; + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + virtual std::string getNthInputLayoutRequirements(const Node *node, size_t n); + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + virtual std::string getNthResultLayoutRequirements(const Node *node, + size_t n); + + /// \returns true if type \p ty satisfies the \p destLayout layout. If \p + /// srcLayout is provided, it is taken into account as well. 
+ virtual bool isSatisfiedBy(TypeRef ty, + const TensorLayoutDescription &destLayout, + const TensorLayoutDescription *srcLayout) const; + + /// \return layouts for all tensor dimensions. + virtual llvm::ArrayRef getLayoutsForDims() const; + + /// \returns true if layout equirement verification is enabled. + bool isEnabled() const { return enabled_; } + +protected: + TensorLayoutCommon(); + TensorLayoutCommon(TensorLayoutCommon &&) = delete; + TensorLayoutCommon &operator=(const TensorLayoutCommon &) = delete; + TensorLayoutCommon &operator=(TensorLayoutCommon &&) = delete; + virtual ~TensorLayoutCommon(); + +protected: + bool enabled_; + +private: + std::unordered_map + layoutNameToLayoutDescription_; +}; + +class CanonicalTensorLayout final + : public TensorLayoutCommon, + public TensorLayoutSingleton { +public: + CanonicalTensorLayout(token_) {} + + /// \return the default n-D layout for Glow. + std::string getDefaultNDLayout(unsigned dims) const override; + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + /// NOTE: Certain nodes are layout agnostic. Others expect their + /// inputs/outputs to have a canonical format. For some layout agnostic nodes + /// we need to look at the layout of their inputs to determine the layout of + /// their outputs, e.g. a batch norm. node, in the canonical representation, + /// accepts any input layout such as NCHW or NHWC, but, the output is a + /// propoagation of said layout. + std::string getNthInputLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + std::string getNthResultLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns true of the node accepts any layout. + bool acceptsAnyLayout(const Node *node) const; +}; + +/// Checks if two layout descriptions \p lhs and \p rhs describe the same layout +/// for a value of the type \p ty \returns true if layouts are the same. if \p +/// verbose then print out verbose report. +bool checkSameLayout(llvm::StringRef srcLayoutStr, + llvm::StringRef destLayoutStr, TypeRef ty, + const Node *parent, const std::string &prefix, + const TensorLayoutCommon &TLC, bool verbose = true); + +/// Verifies the correctness of tensor layouts in the function \p F using layout +/// requirements interface \p TLC. if \p verbose then print out verbose report. 
+bool verifyLayouts(const Function &F, TensorLayoutCommon &TLC, + bool verbose = true); + +} // end namespace glow + +#endif // GLOW_GRAPH_TENSORLAYOUT_H diff --git a/lib/Backend/Backend.cpp b/lib/Backend/Backend.cpp index 235ac054a8..0a9dce96e8 100644 --- a/lib/Backend/Backend.cpp +++ b/lib/Backend/Backend.cpp @@ -19,6 +19,7 @@ #include "glow/Graph/Graph.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/IR/Instrs.h" #include "glow/Optimizer/GraphOptimizer/CompilationContext.h" #include "glow/Optimizer/GraphOptimizerPipeline/Pipeline.h" @@ -172,7 +173,7 @@ bool Backend::checkAllNodesSupported(const Function &F) const { } bool Backend::verify(const Function &F) const { - return F.verify() && checkAllNodesSupported(F); + return F.verify(this) && checkAllNodesSupported(F); } bool Backend::verify(const IRFunction &IR) const { @@ -180,6 +181,10 @@ bool Backend::verify(const IRFunction &IR) const { return true; } +TensorLayoutCommon &Backend::getTensorLayoutRequirements() const { + return CanonicalTensorLayout::getInstance(); +} + FunctionPassPipeline Backend::getOptimizationPipeline() const { auto p = createDefaultGraphOptimizationPassPipeline(); // Fold Tile followed by Add into BatchedAdd. Currently this is not part of diff --git a/lib/Backends/Interpreter/Interpreter.cpp b/lib/Backends/Interpreter/Interpreter.cpp index 2a7f53b5b2..52b0810a27 100644 --- a/lib/Backends/Interpreter/Interpreter.cpp +++ b/lib/Backends/Interpreter/Interpreter.cpp @@ -541,7 +541,7 @@ static bool checkLayoutForNode(const Node &N) { } bool Interpreter::verify(const Function &F) const { - if (!F.verify()) { + if (!F.verify(this)) { return false; } if (!checkAllNodesSupported(F)) { diff --git a/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp b/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp new file mode 100644 index 0000000000..bdf4d1d0f6 --- /dev/null +++ b/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp @@ -0,0 +1,20 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "tests/unittests/BackendTestUtils.h" + +using namespace glow; + +std::set glow::backendTestBlacklist = {}; diff --git a/lib/Backends/OpenCL/CMakeLists.txt b/lib/Backends/OpenCL/CMakeLists.txt index 34a99c52ec..2339c82187 100644 --- a/lib/Backends/OpenCL/CMakeLists.txt +++ b/lib/Backends/OpenCL/CMakeLists.txt @@ -38,6 +38,7 @@ add_library(OpenCLBackend OpenCL.cpp OpenCLDeviceManager.cpp OpenCLFactory.cpp + OpenCLTensorLayout.cpp Transforms.cpp) target_link_libraries(OpenCLBackend diff --git a/lib/Backends/OpenCL/OpenCL.cpp b/lib/Backends/OpenCL/OpenCL.cpp index 46c388a447..63fbe6a9a3 100644 --- a/lib/Backends/OpenCL/OpenCL.cpp +++ b/lib/Backends/OpenCL/OpenCL.cpp @@ -22,6 +22,7 @@ #include "OpenCL.h" #include "OpenCLDeviceManager.h" +#include "OpenCLTensorLayout.h" #include "glow/Backend/BackendUtils.h" #include "glow/CodeGen/MemoryAllocator.h" @@ -1720,7 +1721,7 @@ template static bool checkSquare(const T &I) { } bool OCLBackend::verify(const Function &F) const { - if (!F.verify()) { + if (!F.verify(this)) { return false; } if (!checkAllNodesSupported(F)) { @@ -1879,6 +1880,10 @@ OCLBackend::createDeviceManager(const runtime::DeviceConfig &deviceConfig) { return createOCLDeviceManager(deviceConfig); } +TensorLayoutCommon &OCLBackend::getTensorLayoutRequirements() const { + return OpenCLTensorLayout::getInstance(); +} + TraceInfo OCLBackend::buildManualTraceInfo(Function *F) const { TraceInfo info(false, getTraceEventDataSize()); diff --git a/lib/Backends/OpenCL/OpenCL.h b/lib/Backends/OpenCL/OpenCL.h index 418e25f90f..6f1437f4ef 100644 --- a/lib/Backends/OpenCL/OpenCL.h +++ b/lib/Backends/OpenCL/OpenCL.h @@ -213,6 +213,8 @@ class OCLBackend final : public BackendUsingGlowIR { bool verify(const Function &F) const override; bool verify(const IRFunction &IR) const override; + TensorLayoutCommon &getTensorLayoutRequirements() const override; + bool shouldLower(const Node *N) const override { // The group convolution is supported in OpenCL slow convolution kernel. if (N->getKind() == Kinded::Kind::ConvolutionNodeKind) diff --git a/lib/Backends/OpenCL/OpenCLTensorLayout.cpp b/lib/Backends/OpenCL/OpenCLTensorLayout.cpp new file mode 100644 index 0000000000..6bd60da173 --- /dev/null +++ b/lib/Backends/OpenCL/OpenCLTensorLayout.cpp @@ -0,0 +1,122 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include "OpenCLTensorLayout.h" +#include "glow/Optimizer/GraphOptimizer/CompilationContext.h" + +using namespace glow; + +/// Definitions of different tensor layouts. 
+static std::string oclDimsNHWC[] = { + {"N"}, + {"H"}, + {"W"}, + {"C"}, +}; +static std::string oclDimsNCHW[] = { + {"N"}, + {"C"}, + {"H"}, + {"W"}, +}; +static TensorLayoutDescription oclLayoutNHWC(oclDimsNHWC); +static TensorLayoutDescription oclLayoutNCHW(oclDimsNCHW); + +static std::string returnBaseReqOrNHWC(TensorLayoutDescription &baseReq, + const Node *node) { + if (!baseReq.isSameLayout( + CanonicalTensorLayout::getInstance().getLayoutsForDims()[4])) { + return baseReq.getSerializedLayout(); + } + if (CanonicalTensorLayout::getInstance().acceptsAnyLayout(node)) { + // These nodes accept any 4-D layout. + return baseReq.getSerializedLayout(); + } + + return CanonicalTensorLayout::getInstance().getDefaultNDLayout(4); +} + +/// Helper function, \returns either NHWC or NCHW layout based on the +/// instruction's layout enum. This will be removed and refactored if/when we +/// move to using strings for all layout specifications and get rid of the enum. +template +static const TensorLayoutDescription *getLayoutFromEnum(const N &node) { + if (node->getLayout() == NCHW) { + return &oclLayoutNCHW; + } + return &oclLayoutNHWC; +} + +/// \returns either NHWC or NCHW layout based on the instruction's layout enum +/// if it has one. Else returns nullptr. This will be removed and refactored +/// if/when we move to using strings for all layout specifications and get rid +/// of the enum. +static const TensorLayoutDescription * +getLayoutForTempEnumRep(size_t n, const Node *node) { + if (const auto MP = llvm::dyn_cast(node)) { + return getLayoutFromEnum(MP); + } + if (const auto MPG = llvm::dyn_cast(node)) { + return getLayoutFromEnum(MPG); + } + if (const auto AP = llvm::dyn_cast(node)) { + return getLayoutFromEnum(AP); + } + if (const auto APG = llvm::dyn_cast(node)) { + return getLayoutFromEnum(APG); + } + + if (const auto *CN = llvm::dyn_cast(node)) { + switch (n) { + case ConvolutionNode::InputIndices::BiasIdx: + return &CanonicalTensorLayout::getInstance().getLayoutsForDims()[1]; + default: { return getLayoutFromEnum(CN); } + } + } + return nullptr; +} + +std::string OpenCLTensorLayout::getNthInputLayoutRequirements(const Node *node, + size_t n) { + DCHECK_LT(n, node->getNumInputs()) << "Wrong input number"; + auto inputNode = node->getNthInput(n); + auto dims = inputNode.getType()->dims(); + DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions"; + // TODO: Remove ->getLayout() enum and take a string like transpose. Refactor + // the following after doing so. + const auto *layout = getLayoutForTempEnumRep(n, node); + if (layout) { + return layout->getSerializedLayout(); + } + auto baseReq = TensorLayoutCommon::getNthInputLayoutRequirements(node, n); + auto baseReqHelper = TensorLayoutDescription(baseReq); + return returnBaseReqOrNHWC(baseReqHelper, node); +} + +std::string OpenCLTensorLayout::getNthResultLayoutRequirements(const Node *node, + size_t n) { + DCHECK_LT(n, node->getNumResults()) << "Wrong output number"; + auto dims = node->getNthResult(n).getType()->dims(); + DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions"; + // TODO: Remove ->getLayout() enum and take a string like transpose. Refactor + // the following after doing so. 
+ const auto *layout = getLayoutForTempEnumRep(n, node); + if (layout) { + return layout->getSerializedLayout(); + } + auto baseReq = TensorLayoutCommon::getNthResultLayoutRequirements(node, n); + auto baseReqHelper = TensorLayoutDescription(baseReq); + return returnBaseReqOrNHWC(baseReqHelper, node); +} diff --git a/lib/Backends/OpenCL/OpenCLTensorLayout.h b/lib/Backends/OpenCL/OpenCLTensorLayout.h new file mode 100644 index 0000000000..fa3531097c --- /dev/null +++ b/lib/Backends/OpenCL/OpenCLTensorLayout.h @@ -0,0 +1,40 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H +#define GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H + +#include "glow/Graph/TensorLayout.h" + +namespace glow { + +class OpenCLTensorLayout final + : public TensorLayoutCommon, + public TensorLayoutSingleton { +public: + OpenCLTensorLayout(token_) { enabled_ = true; } + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + std::string getNthInputLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + std::string getNthResultLayoutRequirements(const Node *node, + size_t n) override; +}; + +} // end namespace glow + +#endif // GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H diff --git a/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp b/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp new file mode 100644 index 0000000000..bdf4d1d0f6 --- /dev/null +++ b/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp @@ -0,0 +1,20 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "tests/unittests/BackendTestUtils.h" + +using namespace glow; + +std::set glow::backendTestBlacklist = {}; diff --git a/lib/Base/CMakeLists.txt b/lib/Base/CMakeLists.txt index 04ae417f6a..01c4243045 100644 --- a/lib/Base/CMakeLists.txt +++ b/lib/Base/CMakeLists.txt @@ -18,3 +18,5 @@ if(PNG_FOUND) PRIVATE ${PNG_LIBRARY}) endif() + +add_dependencies(Base AutoGen) diff --git a/lib/Graph/CMakeLists.txt b/lib/Graph/CMakeLists.txt index 2aa24d3cc8..c7a331970f 100644 --- a/lib/Graph/CMakeLists.txt +++ b/lib/Graph/CMakeLists.txt @@ -27,6 +27,7 @@ add_library(Graph NodeValue.cpp Log.cpp PlaceholderBindings.cpp + TensorLayout.cpp Graph.cpp Grad.cpp VerifierHelper.cpp) diff --git a/lib/Graph/Grad.cpp b/lib/Graph/Grad.cpp index f576bcbe3d..33b2ad9db4 100644 --- a/lib/Graph/Grad.cpp +++ b/lib/Graph/Grad.cpp @@ -138,7 +138,7 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, // Swap the src and dest. auto *X = new ReshapeNode(N->getName(), inputW.getType(), outputG, - inputW.getType()->dims()); + inputW.getType()->dims(), RN->getLayout()); toAppend.push_back(X); map.addGradient(RN->getInput(), X); continue; @@ -164,8 +164,9 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, auto *BRAInputType = F->getParent()->uniqueTypeWithNewShape(TNInputType, BRAInputDims); - auto *RN = new ReshapeNode(TN->getName().str() + ".grad.reshape", - BRAInputType, outputG, BRAInputType->dims()); + auto *RN = + new ReshapeNode(TN->getName().str() + ".grad.reshape", BRAInputType, + outputG, BRAInputType->dims(), "*"); auto *BRA = new BatchedReduceAddNode(TN->getName().str() + ".grad.bra", TN->getInput().getType(), RN, TN->getAxis()); @@ -195,14 +196,18 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, // Generate the reverse shuffle. auto shuffle = TN->getShuffle(); + auto layout = TN->getLayout(); + std::string reverseLayout; + reverseLayout.resize(TN->getLayout().size()); std::vector reverseShuffle(shuffle.begin(), shuffle.end()); for (unsigned int i = 0; i < shuffle.size(); i++) { reverseShuffle[shuffle[i]] = i; + reverseLayout[shuffle[i]] = layout[i]; } // Swap the src and dest. auto *X = new TransposeNode(N->getName(), inputW.getType(), outputG, - reverseShuffle); + reverseShuffle, reverseLayout); toAppend.push_back(X); map.addGradient(TN->getInput(), X); continue; diff --git a/lib/Graph/Graph.cpp b/lib/Graph/Graph.cpp index faaf852c66..3b9f05f2b1 100644 --- a/lib/Graph/Graph.cpp +++ b/lib/Graph/Graph.cpp @@ -14,8 +14,10 @@ * limitations under the License. 
*/ #include "glow/Graph/Graph.h" +#include "glow/Backend/Backend.h" #include "glow/Graph/Nodes.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Graph/VerifierHelper.h" #include "glow/Quantization/Base/Base.h" #include "glow/Support/Support.h" @@ -495,9 +497,10 @@ static ShapeVector getNewShapeWithoutAxes(llvm::ArrayRef dims, //===----------------------------------------------------------------------===// Placeholder *Module::createPlaceholder(TypeRef T, llvm::StringRef name, - bool isTrainable) { + bool isTrainable, + const std::string &layout) { auto FT = uniqueType(*T); - auto *ph = new Placeholder(name, FT, isTrainable); + auto *ph = new Placeholder(name, FT, isTrainable, layout); ph->setName(uniqueName(ph->getName(), usedNodeNames_, usedStorageNames_)); placeholders_.push_back(ph); logStorageCreation(functions_, ph); @@ -505,44 +508,51 @@ Placeholder *Module::createPlaceholder(TypeRef T, llvm::StringRef name, } Placeholder *Module::createPlaceholder(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name, bool isTrainable) { + llvm::StringRef name, bool isTrainable, + const std::string &layout) { auto FT = uniqueType(T, dims); - return createPlaceholder(FT, name, isTrainable); + return createPlaceholder(FT, name, isTrainable, layout); } Placeholder *Module::createPlaceholder(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name, bool isTrainable) { + llvm::StringRef name, bool isTrainable, + const std::string &layout) { auto FT = uniqueType(T, dims, scale, offset); - return createPlaceholder(FT, name, isTrainable); + return createPlaceholder(FT, name, isTrainable, layout); } -Constant *Module::createConstant(TypeRef T, llvm::StringRef name) { +Constant *Module::createConstant(TypeRef T, llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(*T); - return addConstant(new Constant(name, FT)); + return addConstant(new Constant(name, FT, layout)); } Constant *Module::createConstant(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name) { + llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(T, dims); - return createConstant(FT, name); + return createConstant(FT, name, layout); } Constant *Module::createConstant(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name) { + llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(T, dims, scale, offset); - return createConstant(FT, name); + return createConstant(FT, name, layout); } -Constant *Module::createConstant(llvm::StringRef name, const Tensor &tensor) { - auto *V = createConstant(&tensor.getType(), name); +Constant *Module::createConstant(llvm::StringRef name, const Tensor &tensor, + const std::string &layout) { + auto *V = createConstant(&tensor.getType(), name, layout); V->assign(&tensor); return V; } -Constant *Module::createConstant(llvm::StringRef name, Tensor &&tensor) { - return addConstant(new Constant(name, std::move(tensor))); +Constant *Module::createConstant(llvm::StringRef name, Tensor &&tensor, + const std::string &layout) { + return addConstant(new Constant(name, std::move(tensor), layout)); } std::string Module::getPrefix(llvm::StringRef name) { @@ -956,23 +966,46 @@ Function::createSigmoidCrossEntropyWithLogits(llvm::StringRef name, } ReshapeNode *Function::createReshape(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shape) { + llvm::ArrayRef shape, + llvm::StringRef layout) { auto TR = getParent()->uniqueTypeWithNewShape(input.getType(), 
shape); DCHECK_EQ(TR->size(), input.getType()->size()) << "Reshape to a different size"; - return addNode(new ReshapeNode(name, TR, input, shape.vec())); + return addNode(new ReshapeNode(name, TR, input, shape.vec(), layout)); } TransposeNode *Function::createTranspose(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shuffle) { + llvm::ArrayRef shuffle, + const std::string &layout) { ShapeVector shape; auto dims = input.dims(); for (size_t i = 0; i < dims.size(); i++) { shape.push_back(dims[shuffle[i]]); } + // If the layout is known, check that it matches the shuffle: + auto compareShuffle = [&](const std::vector targetShuffle) { + auto shuffleVec = shuffle.vec(); + return targetShuffle.size() == dims.size() && + std::equal(shuffleVec.begin(), shuffleVec.end(), + targetShuffle.begin()); + }; + + auto currLayout = layout; + if (currLayout == ANY_LAYOUT) { + // If layout got a default value, change it based on shuffle: + // TODO: remove the shuffle and replace it with layout. + if (compareShuffle(NCHW2NHWC) || compareShuffle(HWCN2NHWC)) { + currLayout = "NHWC"; + } else if (compareShuffle(NHWC2NCHW)) { + currLayout = "NCHW"; + } else if (compareShuffle(NHWC2HWNC)) { + currLayout = "HWNC"; + } + } + auto NT = getParent()->uniqueTypeWithNewShape(input.getType(), shape); - return addNode(new TransposeNode(name, NT, input, shuffle.vec())); + return addNode(new TransposeNode(name, NT, input, shuffle.vec(), currLayout)); } Node *Function::createBroadcast(llvm::StringRef name, NodeValue input, @@ -3140,8 +3173,21 @@ insertAndReport(std::unordered_map &nameToNode, return true; } -bool Function::verify() const { +bool Function::verify(const Backend *backend) const { bool isValid = true; + if (backend) { + if (backend->getTensorLayoutRequirements().isEnabled()) { + isValid &= expectCompareTrue( + "Expected correct backend-specific layouts for the graph", + verifyLayouts(*this, backend->getTensorLayoutRequirements()), true, + this); + } + } else { + // Always run verification pre-lowering / when we don't have backend: + isValid &= expectCompareTrue( + "Expected correct Glow canonical layouts for the graph", + verifyLayouts(*this, CanonicalTensorLayout::getInstance()), true, this); + } std::unordered_map nameToNode; for (auto *V : getParent()->getConstants()) { diff --git a/lib/Graph/Nodes.cpp b/lib/Graph/Nodes.cpp index 25ac56cee2..7c96e75f93 100644 --- a/lib/Graph/Nodes.cpp +++ b/lib/Graph/Nodes.cpp @@ -86,6 +86,7 @@ Node *Storage::clone() const { llvm_unreachable("Storage can't be cloned."); } std::string Constant::getDebugDesc() const { DescriptionBuilder db(getKindName()); db.addParam("name", quote(getName())) + .addParam("layout", getLayout()) .addParam("output", *getType()) .addParam("users", getNumUsers()); return db; @@ -94,6 +95,7 @@ std::string Constant::getDebugDesc() const { std::string Placeholder::getDebugDesc() const { DescriptionBuilder db(getKindName()); db.addParam("name", quote(getName())) + .addParam("layout", getLayout()) .addParam("output", *getType()) .addParam("users", getNumUsers()) .addParam("trainable", isTraining()); diff --git a/lib/Graph/TensorLayout.cpp b/lib/Graph/TensorLayout.cpp new file mode 100644 index 0000000000..af4374dde2 --- /dev/null +++ b/lib/Graph/TensorLayout.cpp @@ -0,0 +1,581 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include + +#include "glow/Graph/Graph.h" +#include "glow/Graph/TensorLayout.h" +#include "glow/Graph/VerifierHelper.h" + +using namespace glow; + +/// Checks if two layout descriptions \p lhs and \p rhs describe the same layout +/// for a value of the type \p ty \returns true if layouts are the same. +bool glow::checkSameLayout(llvm::StringRef srcLayoutStr, + llvm::StringRef destLayoutStr, TypeRef ty, + const Node *parent, const std::string &prefix, + const TensorLayoutCommon &TLC, bool verbose) { + auto srcLayout = TensorLayoutDescription(srcLayoutStr); + auto destLayout = TensorLayoutDescription(destLayoutStr); + // Are layouts literally the same? + if (srcLayout.isSameLayout(destLayout)) { + return true; + } + // Does the type satisfy the dest layout? + if (TLC.isSatisfiedBy(ty, destLayout, &srcLayout)) { + return true; + } + if (verbose) { + report("\n\n\n"); + reportContext(parent); + report("\n"); + report(prefix); + report("\n"); + report(parent->getDebugDesc()); + report("\nMismatching layouts:\n"); + report("Provided layout\n"); + report(srcLayout.getDebugDesc()); + report("\n"); + report("Expected layout\n"); + report(destLayout.getDebugDesc()); + report("\n"); + } + return false; +} + +/// Verifies the correctness of tensor layouts in the function \p F using layout +/// requirements interface \p TLC. +bool glow::verifyLayouts(const Function &F, TensorLayoutCommon &TLC, + bool verbose) { + bool isValid = true; + for (const auto &N : F.getNodes()) { + for (unsigned idx = 0, e = N.getNumInputs(); idx < e; ++idx) { + auto input = N.getNthInput(idx); + auto producerLayout = + TLC.getNthResultLayoutRequirements(input.getNode(), input.getResNo()); + auto consumerLayout = TLC.getNthInputLayoutRequirements(&N, idx); + std::string inputName = strFormat("input %d", idx); + isValid &= checkSameLayout(producerLayout, consumerLayout, + input.getType(), &N, inputName, TLC, verbose); + } + } + return isValid; +} + +TensorLayoutDescription::TensorLayoutDescription(const std::string &layoutStr) { + if (layoutStr.empty()) { + // 0-D output + numDims_ = 0; + return; + } + parse(layoutStr); +} + +static bool isCustomExtension(llvm::StringRef text) { + auto nsPos = text.find(':'); + if (nsPos == llvm::StringRef::npos) { + return false; + } + auto bracketPos = text.find(']'); + assert(bracketPos != llvm::StringRef::npos && "Expected a closing bracket."); + return (bracketPos > nsPos); +} + +// Serialization format - +// The form for each dimension is as follows: +// 1. (mandatory) one char representing the current dimension. Either an +// alphabetic letter or '*'. +// 2. (optional) token for the start of optional dimension information: '[' +// 3. (optional, must have 2. in place) namespace of the extension followed by +// ':'. must be provided for non-official backends. example: ocl: +// 4. (optional, must have 2. in place) end of the current default extension +// ']' +// 5. (optional) go to 2. 
+// NOTE: To add alignment information, the format is: a= +// Example: N[a=32][namespace_for_unsupported:]HWC would represent 4-D +// tensor wherein N needs an alignment of 32 + some closed-backend requirements +// we don't know about. HWC have no restrictions. +// NOTES: +// 1. For each dimension, the identifier can be either a single english alphabet +// letter, either upper or lower case, or the star symbol. +// 2. We assume that a single letter is enough for each dimension, it makes +// parsing easier and avoids adding delimiters in the serialized format, +// however, we do have a constructor that (theoretically) accepts multi-letter +// dimensions. If we decide to expand the current support, we will need to add +// delimiters to the serialized form. +void TensorLayoutDescription::parse(llvm::StringRef text) { + unsigned idx = 0; + while (!text.empty()) { + char curr = text.front(); + text = text.drop_front(); + if (curr == '\0' || isblank(curr)) { + continue; + } + switch (curr) { + case '[': { + assert(idx > 0 && "Expected at least one parsed entry."); + if (isCustomExtension(text)) { + parseCustomExtensions(text, idx - 1); + } else { + parseOfficialExtensions(text, idx - 1); + } + break; + } + default: { + DCHECK(isalpha(curr) || curr == '*') + << "Expected an alphabetic letter or '*'., got: " << curr + << " in string: " << text.str(); + std::string currStr(1, curr); + dims_[idx].append(currStr); + serializedLayout_.append(dims_[idx]); + ++idx; + assert(idx <= max_tensor_dimensions && "Too many tensor dimensions"); + break; + } + } + } + numDims_ = idx; +} + +void TensorLayoutDescription::parseCustomExtensions(llvm::StringRef &text, + unsigned idx) { + char curr = '['; + dims_[idx].append("["); + for (curr = text.front(); curr != ']' && !text.empty(); curr = text.front()) { + dims_[idx].append(std::string(1, curr)); + text = text.drop_front(); + } + assert(curr == ']' && "Expected closing ']' bracket."); + text = text.drop_front(); + dims_[idx].append("]"); +} + +void TensorLayoutDescription::parseOfficialExtensions(llvm::StringRef &text, + unsigned idx) { + // Only alignment so far - very simple parser: + if (!text.consume_front("a=")) { + llvm_unreachable("Unsupported layout extension."); + } + size_t align; + if (text.consumeInteger(10, align)) { + llvm_unreachable("Expected alignment info."); + } + if (!text.consume_front("]")) { + llvm_unreachable("Expected closing ']'"); + } + dims_[idx].append("[a="); + dims_[idx].append(std::to_string(align)); + dims_[idx].append("]"); +} + +TensorLayoutDescription::TensorLayoutDescription( + llvm::ArrayRef dims) { + assert(dims.size() <= max_tensor_dimensions && "Too many tensor dimensions"); + numDims_ = dims.size(); + for (unsigned idx = 0; idx < numDims_; ++idx) { + dims_[idx] = dims[idx]; + serializedLayout_.append(dims_[idx]); + } +} + +const llvm::StringRef +TensorLayoutDescription::getNthDimDescription(size_t n) const { + assert(n < numDims_ && "Wrong dimension number"); + return dims_[n]; +} + +size_t TensorLayoutDescription::getAlignment(size_t n) const { + assert(n < numDims_ && "Wrong dimension number"); + return getAlignment(dims_[n]); +} + +size_t TensorLayoutDescription::getAlignment(const std::string &s) const { + std::string alignPrefix = "a="; + size_t pos = s.find(alignPrefix); + if (pos == std::string::npos) { + // Default alignment: + return 1; + } + auto align = s.substr(pos + alignPrefix.size()); + size_t ret; + std::istringstream(align) >> ret; + return ret; +} + +llvm::ArrayRef TensorLayoutDescription::getDims() 
const { + return llvm::makeArrayRef(dims_, numDims_); +} + +std::string TensorLayoutDescription::getDebugDesc() const { + std::string desc = "Layout: " + getSerializedLayout().str() + " ["; + for (unsigned idx = 0; idx < numDims_; idx++) { + if (idx > 0) { + desc += ", "; + } + desc += "name = "; + desc += dims_[idx]; + desc += " : alignment = "; + desc += std::to_string(getAlignment(idx)); + desc += " : index = "; + desc += std::to_string(idx); + } + desc += "]"; + return desc; +} + +bool TensorLayoutDescription::isSameLayout( + const TensorLayoutDescription &rhs) const { + if (numDims_ != rhs.numDims_) { + return false; + } + if (serializedLayout_ != rhs.serializedLayout_) { + return false; + } + return true; +} + +static bool isAnyHelper(llvm::StringRef layout) { + for (unsigned idx = 0, e = layout.size(); idx < e; ++idx) { + if (layout[idx] != '*') { + return false; + } + } + return true; +} + +bool TensorLayoutDescription::isAnyLayout() { + return (isAnyHelper(getSerializedLayout())); +} + +/// Definitions of different tensor layouts. +static std::string dimsNHWC[] = { + {"N"}, + {"H"}, + {"W"}, + {"C"}, +}; +static std::string dimsNCHW[] = { + {"N"}, + {"C"}, + {"H"}, + {"W"}, +}; +static std::string dimsHWNC[] = { + {"H"}, + {"W"}, + {"N"}, + {"C"}, +}; +static std::string dims0D[]{ + {""}, +}; +static std::string dims1D[] = { + {"N"}, +}; +static std::string dims2D[] = { + {"*"}, + {"*"}, +}; +static std::string dims3D[] = { + {"*"}, + {"*"}, + {"*"}, +}; +static std::string dims4D[] = { + {"*"}, + {"*"}, + {"*"}, + {"*"}, +}; +static std::string dims5D[] = { + {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, +}; +static std::string dims6D[] = { + {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, +}; + +static TensorLayoutDescription layoutNHWC(dimsNHWC); +static TensorLayoutDescription layoutNCHW(dimsNCHW); +static TensorLayoutDescription layoutHWNC(dimsHWNC); +static TensorLayoutDescription layout0D(dims0D); +static TensorLayoutDescription layout1D(dims1D); +static TensorLayoutDescription layout2D(dims2D); +static TensorLayoutDescription layout3D(dims3D); +static TensorLayoutDescription layout4D(dims4D); +static TensorLayoutDescription layout5D(dims5D); +static TensorLayoutDescription layout6D(dims6D); + +/// Glow layouts for any specific number of dimensions. 
+static TensorLayoutDescription layoutsForDims[] = {
+    layout0D, layout1D, layout2D, layout3D, layout4D, layout5D, layout6D,
+};
+
+TensorLayoutCommon::TensorLayoutCommon() : enabled_(false) {
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("NCHW", new TensorLayoutDescription("NCHW")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("NHWC", new TensorLayoutDescription("NHWC")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("HWNC", new TensorLayoutDescription("HWNC")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("N", new TensorLayoutDescription("N")));
+}
+
+TensorLayoutCommon::~TensorLayoutCommon() {
+  while (!layoutNameToLayoutDescription_.empty()) {
+    auto curr = layoutNameToLayoutDescription_.begin();
+    auto *tld = curr->second;
+    layoutNameToLayoutDescription_.erase(curr);
+    delete tld;
+  }
+}
+
+llvm::ArrayRef<TensorLayoutDescription>
+TensorLayoutCommon::getLayoutsForDims() const {
+  return llvm::makeArrayRef(layoutsForDims);
+}
+
+static TensorLayoutDescription *
+getLayoutFromName(const std::string &name,
+                  std::unordered_map<std::string, TensorLayoutDescription *>
+                      &layoutNameToLayoutDescription) {
+  if (isAnyHelper(name)) {
+    return nullptr;
+  }
+  auto it = layoutNameToLayoutDescription.find(name);
+  if (it != layoutNameToLayoutDescription.end()) {
+    return it->second;
+  }
+  // Add new layout to map:
+  auto *ret = new TensorLayoutDescription(name);
+  if (ret->getNumDims() == 0) {
+    // empty / any layout.
+    delete ret;
+    ret = nullptr;
+  }
+  layoutNameToLayoutDescription.insert(std::make_pair(name, ret));
+  return ret;
+}
+
+std::string TensorLayoutCommon::getDefaultNDLayout(unsigned dims) const {
+  DCHECK_LE(dims, max_tensor_dimensions) << "Too many dimensions";
+  return getLayoutsForDims()[dims].getSerializedLayout();
+}
+
+std::string TensorLayoutCommon::getNthInputLayoutRequirements(const Node *node,
+                                                               size_t n) {
+  DCHECK_LT(n, node->getNumInputs()) << "Wrong input number";
+  auto dims = node->getNthInput(n).getType()->dims();
+  DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions";
+  if (const auto *TN = llvm::dyn_cast<TransposeNode>(node)) {
+    // The layout for the input of transpose is the same as the layout of the
+    // operation's result producing this input.
+    auto input = TN->getInput();
+    return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+  }
+  if (const auto *QN = llvm::dyn_cast<QuantizeNode>(node)) {
+    auto input = QN->getInput();
+    return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+  }
+  if (const auto *QPN = llvm::dyn_cast<QuantizationProfileNode>(node)) {
+    switch (n) {
+    case QuantizationProfileNode::InputIndices::InputIdx: {
+      auto input = QPN->getInput();
+      return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+    }
+    default:
+      return getLayoutsForDims()[dims.size()].getSerializedLayout();
+    }
+  }
+  return getLayoutsForDims()[dims.size()].getSerializedLayout();
+}
+
+/// \returns The index of node \p N input \p in. NumInputs if not found.
+static unsigned getInputIdx(const Node *N, NodeValue in) {
+  for (unsigned idx = 0, e = N->getNumInputs(); idx < e; ++idx) {
+    if (N->getNthInput(idx) == in) {
+      return idx;
+    }
+  }
+  return N->getNumInputs();
+}
+
+std::string TensorLayoutCommon::getNthResultLayoutRequirements(const Node *node,
+                                                                size_t n) {
+  DCHECK_LT(n, node->getNumResults()) << "Wrong output number";
+  auto dims = node->getNthResult(n).getType()->dims();
+  DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions";
+  if (auto *TN = llvm::dyn_cast<TransposeNode>(node)) {
+    // If the result of Transpose is a concrete layout, try to use this
+    // specific layout.
+    if (auto *layout = getLayoutFromName(TN->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+    // Dynamically form the layout description for transposes.
+    auto input = TN->getInput();
+    auto inputLayout =
+        getNthInputLayoutRequirements(node, TransposeNode::InputIdx);
+    auto inputLayoutHelper = TensorLayoutDescription(inputLayout);
+    llvm::SmallVector<std::string, max_tensor_dimensions> dims(
+        input.dims().size());
+    auto shuffle = TN->getShuffle();
+    for (unsigned idx = 0, e = inputLayoutHelper.getNumDims(); idx < e; ++idx) {
+      dims[shuffle[idx]] = inputLayoutHelper.getNthDimDescription(idx);
+    }
+    TensorLayoutDescription tld(dims);
+    return tld.getSerializedLayout();
+  }
+  if (auto *C = llvm::dyn_cast<Constant>(node)) {
+    if (auto *layout =
+            getLayoutFromName(C->getLayout(), layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+  }
+  if (auto *PH = llvm::dyn_cast<Placeholder>(node)) {
+    if (auto *layout = getLayoutFromName(PH->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+  }
+  if (auto *RN = llvm::dyn_cast<ReshapeNode>(node)) {
+    if (auto *layout = getLayoutFromName(RN->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+    auto result = node->getNthResult(n);
+    auto *user = (*result.getUsers().begin()).getUser();
+    int inputIdx = getInputIdx(user, result);
+    if (inputIdx >= user->getNumInputs() || llvm::isa<SaveNode>(user)) {
+      return getLayoutsForDims()[dims.size()].getSerializedLayout();
+    }
+    auto layout = getNthInputLayoutRequirements(user, inputIdx);
+    if (auto *layoutDesc =
+            getLayoutFromName(layout, layoutNameToLayoutDescription_)) {
+      return layoutDesc->getSerializedLayout();
+    }
+  }
+  return getLayoutsForDims()[dims.size()].getSerializedLayout();
+}
+
+bool TensorLayoutCommon::isSatisfiedBy(
+    TypeRef ty, const TensorLayoutDescription &destLayout,
+    const TensorLayoutDescription *srcLayout) const {
+  // Strides of the type (in elements).
+  auto strides = ty->strides();
+  if (strides.size() != destLayout.getNumDims()) {
+    return false;
+  }
+  unsigned idx = 0;
+  for (const auto &dim : destLayout.getDims()) {
+    // dim.alignment is in bytes, but strides are in elements.
+    if (strides[idx] * ty->getElementSize() % destLayout.getAlignment(dim) !=
+        0) {
+      return false;
+    }
+    idx++;
+  }
+  if (!srcLayout) {
+    return true;
+  }
+  if (destLayout.getNumDims() != srcLayout->getNumDims()) {
+    return false;
+  }
+  // Names should be compatible. * is compatible to anything.
+  if (srcLayout->getSerializedLayout().size() !=
+      destLayout.getSerializedLayout().size()) {
+    return false;
+  }
+  for (unsigned idx = 0, e = destLayout.getSerializedLayout().size(); idx < e;
+       ++idx) {
+    // '*' is compatible with anything.
+    if (destLayout.getSerializedLayout()[idx] == '*' ||
+        srcLayout->getSerializedLayout()[idx] == '*') {
+      continue;
+    }
+    // Non-'*' are only compatible with themselves.
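+    // For example, a source layout of "NHWC" satisfies a destination layout
+    // of "N*WC" or "****", but not "NCHW".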
+ if (srcLayout->getSerializedLayout()[idx] == + destLayout.getSerializedLayout()[idx]) { + continue; + } + return false; + } + return true; +} + +static std::string returnBaseReqOrNHWC(std::string baseReq) { + auto baseReqHelper = TensorLayoutDescription(baseReq); + if (!baseReqHelper.isSameLayout( + CanonicalTensorLayout::getInstance().getLayoutsForDims()[4])) { + return baseReq; + } + // NHWC is the canonical default + return CanonicalTensorLayout::getInstance().getDefaultNDLayout(4); +} + +std::string +CanonicalTensorLayout::getNthInputLayoutRequirements(const Node *node, + size_t n) { + auto baseReq = TensorLayoutCommon::getNthInputLayoutRequirements(node, n); + if (acceptsAnyLayout(node)) { + return baseReq; + } + return returnBaseReqOrNHWC(baseReq); +} + +std::string +CanonicalTensorLayout::getNthResultLayoutRequirements(const Node *node, + size_t n) { + auto baseReq = TensorLayoutCommon::getNthResultLayoutRequirements(node, n); + return returnBaseReqOrNHWC(baseReq); +} + +std::string CanonicalTensorLayout::getDefaultNDLayout(unsigned dims) const { + if (dims == 4) { + return layoutNHWC.getSerializedLayout(); + } + return TensorLayoutCommon::getDefaultNDLayout(dims); +} + +static bool acceptsAnyInputLayout(const glow::Node *node) { + switch (node->getKind()) { + case Kinded::Kind::ConcatNodeKind: + case Kinded::Kind::BatchedReduceMeanNodeKind: + case Kinded::Kind::BatchedAddNodeKind: + case Kinded::Kind::BatchedReduceMinNodeKind: + case Kinded::Kind::BatchNormalizationNodeKind: + case Kinded::Kind::BatchNormalizationGradNodeKind: + case Kinded::Kind::ReshapeNodeKind: + case Kinded::Kind::MeanVarNormalizationNodeKind: + case Kinded::Kind::SGDNodeKind: { + return true; + } + default: { return false; } + } +} + +bool CanonicalTensorLayout::acceptsAnyLayout(const Node *node) const { + if (node->isDataParallel()) { + return true; + } + // In the canonical representation, some nodes are input layout agnostic even + // if they are not necessarily data parallel: + return acceptsAnyInputLayout(node); +} diff --git a/lib/IR/IRGen.cpp b/lib/IR/IRGen.cpp index 2a8e1e07e6..47ddcb043a 100644 --- a/lib/IR/IRGen.cpp +++ b/lib/IR/IRGen.cpp @@ -443,7 +443,7 @@ void IRGenVisitor::post(Node *parent, Node *N) { } void IRFunction::generateIR(const Backend &B) { - assert(G_->verify() && "Invalid function"); + assert(G_->verify(&B) && "Invalid function"); // Schedule the nodes. NodesPtrList ScheduledNodes; scheduleGraph(ScheduledNodes); diff --git a/lib/Importer/Caffe2ModelLoader.cpp b/lib/Importer/Caffe2ModelLoader.cpp index 1f910f5b01..9a1cd16976 100644 --- a/lib/Importer/Caffe2ModelLoader.cpp +++ b/lib/Importer/Caffe2ModelLoader.cpp @@ -333,7 +333,7 @@ Error Caffe2ModelLoader::loadConv(const caffe2::OperatorDef &op, // Caffe2 "Conv" op always stores the weight as CKRS. Tensor wT; w->getPayload().transpose(&wT, NCHW2NHWC); - w = G_.getParent()->createConstant(w->getName(), std::move(wT)); + w = G_.getParent()->createConstant(w->getName(), std::move(wT), "NHWC"); // The structure of the conv weights is: CRSK. We take the C, which is the // number of filters. We use this value to calculate the size of the bias @@ -434,7 +434,7 @@ Error Caffe2ModelLoader::loadConvQuantized(const caffe2::OperatorDef &op, if (order != "NHWC") { Tensor wT; w->getPayload().transpose(&wT, NCHW2NHWC); - w = G_.getParent()->createConstant(w->getName(), std::move(wT)); + w = G_.getParent()->createConstant(w->getName(), std::move(wT), "NHWC"); } // The structure of the conv weights is: CRSK. 
We take the C, which is the
diff --git a/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp b/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
index bbaa0dff5a..9c4c1358a7 100644
--- a/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
+++ b/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
@@ -22,6 +22,7 @@
 #include "glow/Graph/Node.h"
 #include "glow/Graph/Nodes.h"
 #include "glow/Graph/PlaceholderBindings.h"
+#include "glow/Graph/TensorLayout.h"
 #include "glow/Graph/Utils.h"
 #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h"
@@ -113,6 +114,44 @@ void run(Backend &backend, CompiledFunction &compiledF,
   context.movePlaceholderBindings().release();
 }
+static bool isCanonicalLayout(const NodeValue &RN, Backend &backend,
+                              Node *clonedC, size_t idx) {
+  auto resultLayoutStr =
+      backend.getTensorLayoutRequirements().getNthResultLayoutRequirements(
+          clonedC, idx);
+  auto resultLayout = TensorLayoutDescription(resultLayoutStr);
+  auto &canInstance = CanonicalTensorLayout::getInstance();
+  auto default4DStr = canInstance.getDefaultNDLayout(4);
+  auto default4D = TensorLayoutDescription(default4DStr);
+  if (resultLayout.getDims().size() == 4 &&
+      !canInstance.isSatisfiedBy(RN.getType(), default4D, &resultLayout)) {
+    return false;
+  }
+  return true;
+}
+
+// Bail on constant folding post-lowering for backends that break assumptions.
+static void bailOnNonCanonicalLayout(
+    Function *constEvaluationF, Module &mod,
+    const llvm::SmallVectorImpl<SaveNode *> &savedResults) {
+  // Some results may be in a non-canonical format post-lowering.
+  // For example, we may be trying to constant fold an OpenCL 'Reshape' that
+  // has NCHW layout. We cannot transpose it back to canonical layout for
+  // two reasons: 1) Need to add a solver that supports weird non-NCHW2NHWC
+  // backends. 2) Even if we get a constant tensor as a new "save" of the
+  // transpose, the new constant tensor will have the wrong shape. We'd
+  // actually need to transpose it back to its pre-modification shape. These
+  // issues may be solved in the future (TODO), for now bail on such corner
+  // cases. Clean-up before bailing:
+  for (auto *SN : savedResults) {
+    // Now erase the Placeholder that we created for the SaveNode.
+    auto &vars = mod.getPlaceholders();
+    mod.erasePlaceholder(
+        std::find(vars.begin(), vars.end(), SN->getPlaceholder()));
+  }
+  mod.eraseFunction(constEvaluationF);
+}
+
 /// Evaluates a provided constant operation \p C using the provided \p backend
 /// and using the compilation context \p cctx.
 /// \returns constant results.
@@ -134,8 +173,12 @@ evaluateConstantOperation(Backend &backend, CompilationContext &cctx, Node *C) {
   // Create save nodes for each of the results.
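+  // If any result would not be in the canonical layout (e.g. an NCHW result
+  // coming from a backend-specific lowering), bail out instead of folding.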
llvm::SmallVector savedResults; for (size_t idx = 0, e = clonedC->getNumResults(); idx < e; ++idx) { - auto *SN = constEvaluationF->createSave(clonedC->getName(), - clonedC->getNthResult(idx)); + auto RN = clonedC->getNthResult(idx); + auto *SN = constEvaluationF->createSave(clonedC->getName(), RN); + if (!isCanonicalLayout(RN, backend, clonedC, idx)) { + bailOnNonCanonicalLayout(constEvaluationF, mod, savedResults); + return {}; + } savedResults.emplace_back(SN); bindings.allocate(SN->getPlaceholder()); } diff --git a/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp b/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp index 4b32b8e271..9010822778 100644 --- a/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp +++ b/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp @@ -23,6 +23,7 @@ #include "glow/Graph/Node.h" #include "glow/Graph/Nodes.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Graph/Utils.h" #include "glow/Graph/VerifierHelper.h" #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h" @@ -287,7 +288,8 @@ static bool sinkTranposeBelowChannelShuffle(Function *F, TR->getShuffle()[CS->getKernel()]); // Create a copy of sinkingTR and insert after newChannelShuffle. - auto *newTR = F->createTranspose(TR->getName(), newCS, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), newCS, TR->getShuffle(), + TR->getLayout()); CS->getResult().replaceAllUsesOfWith(newTR); @@ -320,7 +322,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { BN->getMean(), BN->getVar(), newChannelIdx, BN->getEpsilon(), BN->getMomentum()); NewBN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NewBN, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NewBN, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); BN->getResult().replaceAllUsesOfWith(newTR); @@ -342,7 +345,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { RL->getResult().getType(), TR->getInput().dims()); auto *NRL = F->createRELU(RL->getName(), TR->getInput(), reluOutTy); NRL->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NRL, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NRL, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); RL->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -359,7 +363,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *NSI = F->createSigmoid(SI->getName(), TR->getInput()); NSI->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NSI, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NSI, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); SI->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -417,7 +422,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *NTN = F->createTanh(TN->getName(), TR->getInput()); NTN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NTN, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NTN, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); TN->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -485,7 +491,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { dyn_cast(node->getNthInput(ArithmeticNode::LHSIdx)); auto *NS = F->createSplat("splat", RTR->getInput().getType(), 
SN->getValue()); - LTR = F->createTranspose("transpose", NS, RTR->getShuffle()); + LTR = F->createTranspose("transpose", NS, RTR->getShuffle(), + RTR->getLayout()); changed = true; } else if (isa(node->getNthInput(ArithmeticNode::RHSIdx)) && LTR) { @@ -494,7 +501,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { dyn_cast(node->getNthInput(ArithmeticNode::RHSIdx)); auto *NS = F->createSplat("splat", LTR->getInput().getType(), SN->getValue()); - RTR = F->createTranspose("transpose", NS, LTR->getShuffle()); + RTR = F->createTranspose("transpose", NS, LTR->getShuffle(), + LTR->getLayout()); changed = true; } else if (isa(node->getNthInput(ArithmeticNode::LHSIdx)) && RTR) { @@ -552,8 +560,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { newAN->setPredicate(node->getPredicate()); changed = true; - auto *newTR = - F->createTranspose(LTR->getName(), newAN, LTR->getShuffle()); + auto *newTR = F->createTranspose(LTR->getName(), newAN, LTR->getShuffle(), + LTR->getLayout()); newTR->setPredicate(node->getPredicate()); node->getNthResult(ArithmeticNode::ResultIdx).replaceAllUsesOfWith(newTR); } @@ -587,7 +595,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { RQ->getResult().getType(), TR->getInput().getType()->dims()); auto *newRQ = F->createRescaleQuantized(RQ->getName(), TR->getInput(), newRQType); - auto *newTR = F->createTranspose(TR->getName(), newRQ, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), newRQ, TR->getShuffle(), + TR->getLayout()); RQ->getResult().replaceAllUsesOfWith(newTR); changed = true; } @@ -646,8 +655,9 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *newCN = F->createConcat(CN->getName(), transVector, newChannelIdx); newCN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(firstInput->getName(), newCN, - firstInput->getShuffle()); + auto *newTR = + F->createTranspose(firstInput->getName(), newCN, + firstInput->getShuffle(), firstInput->getLayout()); newTR->setPredicate(node->getPredicate()); CN->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -933,7 +943,8 @@ bool MergeTransposeIntoMatMulOrFC::run(Function *F, F->getParent()->uniqueTypeWithNewShape(W->getType(), newShape); // New reordered weights. - auto *newW = F->getParent()->createConstant(W->getType(), W->getName()); + auto *newW = F->getParent()->createConstant(W->getType(), W->getName(), + W->getLayout()); Tensor reshapedSrc(W->getPayload().getUnsafePtr(), reshapedWTy); Tensor reshapedDst(newW->getPayload().getUnsafePtr(), reshapedNewWTy); reshapedSrc.transpose(&reshapedDst, shuffle); @@ -1252,13 +1263,14 @@ bool OptimizeReduceMean::run(Function *F, const CompilationContext &cctx) { std::vector strides = {1, 1}; std::vector pads = {0, 0, 0, 0}; + // TODO: Fix bad assumption? See issue 3499, for now workaround it. // In Glow, AvgPool expects NHWC. auto *TR1 = F->createTranspose( - RM->getName().str() + ".transposeNCHW2NHWC", in, NCHW2NHWC); + RM->getName().str() + ".transposeNCHW2NHWC", in, NCHW2NHWC, "NHWC"); auto *AP = F->createAvgPool(RM->getName().str() + ".avgPool", TR1, kernels, strides, pads); auto *TR2 = F->createTranspose( - RM->getName().str() + ".transposeNHWC2NCHW", AP, NHWC2NCHW); + RM->getName().str() + ".transposeNHWC2NCHW", AP, NHWC2NCHW, "NCHW"); // AvgPool keeps original shape. Add reshape to match expected output. 
std::vector shape = TR2->getResult().dims(); @@ -1298,7 +1310,8 @@ static Constant *getUniquelyUsedConstant(Module *M, Node &node) { } // If constant has more than one use, duplicate it and return the duplicate. - auto *NC = M->createConstant(constant->getType(), constant->getName()); + auto *NC = M->createConstant(constant->getType(), constant->getName(), + constant->getLayout()); NC->getPayloadMutable().assign(&constant->getPayload()); return NC; } @@ -1594,8 +1607,11 @@ static NodeValue tryToOptimizeConcatOfRehapes(Function *F, ConcatNode *CN) { return NodeValue(nullptr); } auto *newCN = F->createConcat(CN->getName(), newConcatInputs, dim); - return F->createReshape(CN->getInputs().front().getNode()->getName(), newCN, - CN->getResult().dims()); + return F->createReshape( + CN->getInputs().front().getNode()->getName(), newCN, + CN->getResult().dims(), + CanonicalTensorLayout::getInstance().getNthResultLayoutRequirements( + CN, ConcatNode::ResultIdx)); } /// Simplify concat node. @@ -1796,8 +1812,8 @@ bool TransposeConstants::run(Function *F, const CompilationContext &cctx) { continue; } // Create a new Constant NC to hold the transposed result. - auto *NC = - F->getParent()->createConstant(TN->getResult().getType(), C->getName()); + auto *NC = F->getParent()->createConstant(TN->getResult().getType(), + C->getName(), TN->getLayout()); // Transpose the value of C into NC. genericTranspose(&C->getPayload(), &NC->getPayloadMutable(), TN->getShuffle()); @@ -2059,7 +2075,8 @@ bool OptimizeTransposeIntoReshape::run(Function *F, if (inDims != outDims) { continue; } - auto *RS = F->createReshape(TR->getName(), inputNode, outputDims); + auto *RS = + F->createReshape(TR->getName(), inputNode, outputDims, TR->getLayout()); TR->getResult().replaceAllUsesOfWith(RS); changed = true; } @@ -2115,9 +2132,9 @@ bool OptimizeReshape::run(Function *F, const CompilationContext &cctx) { // Reshape(Reshape(x)) -> Reshape(x). auto *reshapeNodeInput = dyn_cast(inputNode); if (reshapeNodeInput && reshapeNodeInput->hasOneUse()) { - auto *newReshape = - F->createReshape(reshapeNode->getName(), reshapeNodeInput->getInput(), - reshapeNode->getResult().dims()); + auto *newReshape = F->createReshape( + reshapeNode->getName(), reshapeNodeInput->getInput(), + reshapeNode->getResult().dims(), reshapeNode->getLayout()); reshapeNode->getResult().replaceAllUsesOfWith(newReshape); changed = true; continue; @@ -2128,8 +2145,11 @@ bool OptimizeReshape::run(Function *F, const CompilationContext &cctx) { auto *C = dyn_cast(inputNode); if (C && C->hasOneUse()) { // Create a new Constant with the type of the reshape. + auto layout = + CanonicalTensorLayout::getInstance().getNthResultLayoutRequirements( + reshapeNode, ReshapeNode::ResultIndices::ResultIdx); auto *newC = F->getParent()->createConstant( - reshapeNode->getResult().getType(), C->getName()); + reshapeNode->getResult().getType(), C->getName(), layout); // Create an unowned view of the original tensor with the correct shape, // and assign it to the new Constant. 
Tensor reshapedT = C->getPayload().getUnowned(reshapeNode->getDims()); @@ -2264,7 +2284,8 @@ static NodeValue convertConstant(Module &mod, Constant &constant, if (dstTy->getElementType() != ElemKind::UInt8FusedFP16QTy) { return NodeValue(); } - auto *NC = mod.createConstant(dstTy, constant.getName()); + auto *NC = + mod.createConstant(dstTy, constant.getName(), constant.getLayout()); NC->getPayloadMutable() = tensor.getCopyConvertedToType(dstTy->getElementType()); return NC->getOutput(); @@ -2520,8 +2541,9 @@ static bool sinkRescaleQuantizedNode(Function *F) { continue; } - auto *newReshape = F->createReshape( - reshape->getName(), rescale->getInput(), reshape->getResult().dims()); + auto *newReshape = + F->createReshape(reshape->getName(), rescale->getInput(), + reshape->getResult().dims(), reshape->getLayout()); auto *newRescale = F->createRescaleQuantized( rescale->getName(), newReshape, reshape->getResult().getType()); reshape->getResult().replaceAllUsesOfWith(newRescale); @@ -2558,8 +2580,9 @@ static bool sinkRescaleQuantizedNode(Function *F) { continue; } - auto *newTranspose = F->createTranspose( - transpose->getName(), rescale->getInput(), transpose->getShuffle()); + auto *newTranspose = + F->createTranspose(transpose->getName(), rescale->getInput(), + transpose->getShuffle(), transpose->getLayout()); auto rescaleOutTy = F->getParent()->uniqueTypeWithNewShape( rescale->getResult().getType(), transpose->getResult().dims()); auto *newRescale = F->createRescaleQuantized(rescale->getName(), @@ -2852,7 +2875,7 @@ void glow::convertPlaceholdersToConstants(Function *F, if (!tensor) { continue; } - auto *constant = M->createConstant(PH->getName(), *tensor); + auto *constant = M->createConstant(PH->getName(), *tensor, PH->getLayout()); PH->getOutput().replaceAllUsesOfWith(constant, F); } } diff --git a/lib/Optimizer/GraphOptimizer/Lower.cpp b/lib/Optimizer/GraphOptimizer/Lower.cpp index 566f42261a..41a0cd90ba 100644 --- a/lib/Optimizer/GraphOptimizer/Lower.cpp +++ b/lib/Optimizer/GraphOptimizer/Lower.cpp @@ -18,6 +18,7 @@ #include "glow/Graph/Graph.h" #include "glow/Graph/Node.h" #include "glow/Graph/Nodes.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h" #include "glow/Optimizer/GraphOptimizer/GraphOptimizer.h" @@ -171,7 +172,10 @@ static void lowerFullyConnectedGradNode(Function *F, CompilationContext &cctx, // dx = dout * w.T auto *wT = F->createTranspose("fcg.wT", FCG.getWeights(), {1, 0}); auto *dx2 = F->createMatMul("fcg.dot", dout, wT); - auto *dx = F->createReshape("fcg.inG", dx2, FCG.getInput().getType()->dims()); + auto *dx = F->createReshape( + "fcg.inG", dx2, FCG.getInput().getType()->dims(), + CanonicalTensorLayout::getInstance().getNthInputLayoutRequirements( + &FCG, FullyConnectedGradNode::InputIdx)); replaceAllUsesOfWith(cctx.loweredInfoMap, FCG.getGradOfInputNamedInput(), dx); // dw = xT * dout. 
@@ -675,7 +679,7 @@ static void lowerBucketizeNode(Function *F, CompilationContext &cctx, auto *oneSplat = F->createSplat("oneSplat", boundariesConst->getType(), 1.0); auto *reshapedInput = F->createReshape(baseStr + ".reshape.input", B.getInput(), - {B.getInput().getType()->size()}); + {B.getInput().getType()->size()}, "N"); std::vector results; for (size_t i = 0, e = reshapedInput->getResult().getType()->size(); i < e; i++) { @@ -877,10 +881,11 @@ static void lowerChannelShuffleNode(Function *F, CompilationContext &cctx, transpose[i] = i; } std::swap(transpose[kernel], transpose[kernel + 1]); - auto *T = - F->createTranspose(CSN.getName().str() + ".transpose", R1, transpose); + auto *T = F->createTranspose(CSN.getName().str() + ".transpose", R1, + transpose, R1->getLayout()); - auto *R2 = F->createReshape(CSN.getName().str() + ".reshape2", T, inDims); + auto *R2 = F->createReshape(CSN.getName().str() + ".reshape2", T, inDims, + T->getLayout()); replaceAllUsesOfWith(cctx.loweredInfoMap, CSN.getResult(), R2); } diff --git a/lib/Partitioner/Partitioner.cpp b/lib/Partitioner/Partitioner.cpp index 81123567d5..8eb906d933 100644 --- a/lib/Partitioner/Partitioner.cpp +++ b/lib/Partitioner/Partitioner.cpp @@ -75,14 +75,9 @@ void Partitioner::init() { Error Partitioner::finalize(const DAGListTy &partitions, const NodeToFunctionMap &mapping) { - // Validate the functions after partitioning. - for (Function *subF : module_->getFunctions()) { - if (!subF->verify()) { - return MAKE_ERR(ErrorValue::ErrorCode::PARTITIONER_ERROR, - "Conversion led to invalid function " + - subF->getName().str()); - } - } + // NOTE: Cannot validate the functions after partitioning here. The validation + // needs the backend specific verifier. Tensor layouts, for example, might + // have gone from canonical form to backend specific form. if (logPartition) { LOG(INFO) << "The number of partitions is : " diff --git a/tests/unittests/BackendTestUtils.cpp b/tests/unittests/BackendTestUtils.cpp index 766a18a9cc..30f9f0d9ca 100644 --- a/tests/unittests/BackendTestUtils.cpp +++ b/tests/unittests/BackendTestUtils.cpp @@ -58,9 +58,10 @@ namespace { // Helpers for creating and intializing placeholders from tensors. 
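+// The optional trailing layout string (defaulting to ANY_LAYOUT) is forwarded
+// to Module::createPlaceholder, so tests can tag inputs as e.g. "NHWC" or
+// "NCHW".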
static Placeholder *createPlaceholder(Module &mod, PlaceholderBindings &bindings, - Tensor *tensor, llvm::StringRef name) { + Tensor *tensor, llvm::StringRef name, + const std::string layout = ANY_LAYOUT) { auto *P = mod.createPlaceholder(tensor->getElementType(), tensor->dims(), - name, false); + name, false, layout); auto *PTensor = bindings.allocate(P); PTensor->assign(tensor); @@ -682,7 +683,7 @@ void inferSmallConv(Tensor *inputs, Tensor *out, llvm::StringRef kind) { ExecutionEngine EE(kind); auto &mod = EE.getModule(); auto *F = mod.createFunction("main"); - auto *in = createPlaceholder(mod, bindings, inputs, "in"); + auto *in = createPlaceholder(mod, bindings, inputs, "in", "NHWC"); auto *C = F->createConv(bindings, "conv2a", in, 64, 1, 1, 0, 1); bindings.get(cast(C->getFilter()))->getHandle().clear(0.3); bindings.get(cast(C->getBias()))->getHandle().clear(0.4); @@ -981,7 +982,7 @@ void inferBasicConvNet(Tensor *inputs, Tensor *out, llvm::StringRef kind, ExecutionEngine EE(kind); auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = createPlaceholder(mod, bindings, inputs, "var"); + auto *var = createPlaceholder(mod, bindings, inputs, "var", "NCHW"); auto *tr = F->createTranspose("tr", var, NCHW2NHWC); auto *conv = F->createConv(bindings, "conv", tr, convDepth, {5, 5}, {2, 2}, {1, 1, 1, 1}, 1); @@ -1004,8 +1005,8 @@ FunctionTensorPair createAndInitBasicFCNet(PlaceholderBindings &bindings, auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = - mod.createPlaceholder(ElemKind::FloatTy, {2, 3, 16, 16}, "var", false); + auto *var = mod.createPlaceholder(ElemKind::FloatTy, {2, 3, 16, 16}, "var", + false, "NCHW"); auto *tr = F->createTranspose("tr", var, NCHW2NHWC); auto *fc = F->createFullyConnected(bindings, "fc", tr, 16); auto *rl0 = F->createRELU("relu", fc); @@ -1027,7 +1028,7 @@ void inferMixedNet(Tensor *inputs, Tensor *out, llvm::StringRef kind) { ExecutionEngine EE(kind); auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = createPlaceholder(mod, bindings, inputs, "var"); + auto *var = createPlaceholder(mod, bindings, inputs, "var", "NCHW"); auto *selected = mod.createPlaceholder(ElemKind::Int64ITy, {2, 1}, "selected", false); @@ -1069,20 +1070,20 @@ void inferComplexNet1(Tensor *inputs1, Tensor *inputs2, Tensor *inputs3, auto *sigmoid1 = F->createSigmoid("sigmoid1", conv1); auto *fc1 = F->createFullyConnected(bindings, "fc1", var2, 2352); bindings.get(cast(fc1->getWeights()))->getHandle().clear(0.6); - auto *reshape1 = F->createReshape("reshape1", fc1, {8, 14, 28, 6}); + auto *reshape1 = F->createReshape("reshape1", fc1, {8, 14, 28, 6}, "NHWC"); auto *relu1 = F->createRELU("relu1", reshape1); auto *pool1 = F->createMaxPool("pool1", relu1, 2, 2, 1); auto *add = F->createAdd("add", sigmoid1, pool1->getResult()); auto *tanh = F->createTanh("tanh", add); auto *fc2 = F->createFullyConnected(bindings, "fc2", var3, 720); bindings.get(cast(fc2->getWeights()))->getHandle().clear(1.1); - auto *reshape2 = F->createReshape("reshape2", fc2, {8, 8, 15, 6}); + auto *reshape2 = F->createReshape("reshape2", fc2, {8, 8, 15, 6}, "NHWC"); auto *mul = F->createMul("mul", tanh, reshape2); auto *sigmoid2 = F->createSigmoid("sigmoid2", mul); auto *conv2 = F->createConv(bindings, "conv2", sigmoid2, 7, 3, 2, 1, 1); bindings.get(cast(conv2->getFilter()))->getHandle().clear(0.3); bindings.get(cast(conv2->getBias()))->getHandle().clear(1.3); - auto *reshape3 = F->createReshape("reshape3", conv2, {8, 8, 7, 4}); + auto 
*reshape3 = F->createReshape("reshape3", conv2, {8, 8, 7, 4}, "NHWC"); auto *sub = F->createSub("sub", reshape3, var4); auto *relu2 = F->createRELU("relu2", sub); auto *pool2 = F->createAvgPool("pool2", relu2, 3, 2, 1); @@ -1114,7 +1115,7 @@ void inferTinyResnet(Tensor *input, Tensor *out, std::vector &weights, auto &mod = EE.getModule(); auto *F = mod.createFunction("main"); - auto *in = createPlaceholder(mod, bindings, input, "in"); + auto *in = createPlaceholder(mod, bindings, input, "in", "NHWC"); auto *conv1 = F->createConv(bindings, "conv1", in, 256, 1, 1, 0, 1); auto *conv2a = F->createConv(bindings, "conv2a", conv1, 64, 1, 1, 0, 1); auto *relu2a = F->createRELU("relu2a", conv2a); diff --git a/tests/unittests/CMakeLists.txt b/tests/unittests/CMakeLists.txt index fc601766fb..04c3efc2c7 100755 --- a/tests/unittests/CMakeLists.txt +++ b/tests/unittests/CMakeLists.txt @@ -36,6 +36,8 @@ add_executable(BasicIRTest BasicIRTest.cpp) target_link_libraries(BasicIRTest PRIVATE + Backend + Backends Graph IR gtest @@ -165,6 +167,8 @@ add_executable(GraphSchedulerTest GraphSchedulerTest.cpp) target_link_libraries(GraphSchedulerTest PRIVATE + Backend + Backends Graph IR gtest @@ -373,6 +377,7 @@ foreach(backend ${GLOW_BACKENDS}) add_backend_test(TEST MLTest BACKEND "${backend}" UNOPT) add_backend_test(TEST OperatorGradTest BACKEND "${backend}" UNOPT) add_backend_test(TEST OperatorTest BACKEND "${backend}" UNOPT) + add_backend_test(TEST TensorLayoutTest BACKEND "${backend}" UNOPT) add_backend_test(TEST RecommendationSystemTest BACKEND @@ -471,6 +476,8 @@ add_executable(TensorPoolTest TensorPoolTest.cpp) target_link_libraries(TensorPoolTest PRIVATE + Backend + Backends Graph TensorPool gtest diff --git a/tests/unittests/GradCheckTest.cpp b/tests/unittests/GradCheckTest.cpp index 874bb05e02..580810ed2f 100644 --- a/tests/unittests/GradCheckTest.cpp +++ b/tests/unittests/GradCheckTest.cpp @@ -942,7 +942,8 @@ TEST_P(GradCheck, gradientCheckTranspose) { auto &mod = EE->getModule(); bindings.clear(); Function *F = mod.createFunction("main"); - A = mod.createPlaceholder(ElemKind::FloatTy, {1, 5, 10, 5}, "input", false); + A = mod.createPlaceholder(ElemKind::FloatTy, {1, 5, 10, 5}, "input", false, + "NHWC"); Exp = mod.createPlaceholder(ElemKind::FloatTy, {1, numOutputElem}, "exp", false); Node *TA = F->createTranspose("transpose", A, NHWC2NCHW); diff --git a/tests/unittests/GraphOptzTest.cpp b/tests/unittests/GraphOptzTest.cpp index 3dececfb0c..15cda60d90 100644 --- a/tests/unittests/GraphOptzTest.cpp +++ b/tests/unittests/GraphOptzTest.cpp @@ -355,7 +355,7 @@ TEST_F(GraphOptz, optimizeBatchNormAfterConvWithReshapeConst) { mod_.createPlaceholder(ElemKind::FloatTy, {5, 5, 3, 1}, "filter", false); auto *bias = mod_.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); - auto *TN = F_->createTranspose("transpose", filter, {3, 0, 1, 2}); + auto *TN = F_->createTranspose("transpose", filter, HWCN2NHWC); auto *CV = F_->createConv("conv", input, TN, bias, mod_.uniqueType(ElemKind::FloatTy, {1, 10, 20, 1}), 5, 1, 2, 1); diff --git a/tests/unittests/GraphTest.cpp b/tests/unittests/GraphTest.cpp index 507f923992..0df2d69859 100644 --- a/tests/unittests/GraphTest.cpp +++ b/tests/unittests/GraphTest.cpp @@ -888,7 +888,8 @@ TEST(Graph, parentLink) { ExecutionEngine EE; auto &mod = EE.getModule(); - Constant *V = new Constant("V", mod.uniqueType(ElemKind::FloatTy, {3, 32})); + Constant *V = + new Constant("V", mod.uniqueType(ElemKind::FloatTy, {3, 32}), ANY_LAYOUT); // Variables don't belong to any function... 
EXPECT_EQ(V->getParent(), nullptr); @@ -1814,6 +1815,7 @@ TEST(Graph, testDumpStructure) { std::string mesN = K->toString(); std::string expectMes = R"(Placeholder name : "input" +layout : * output : float<4 x 320 x 200 x 100 x 3> users : 0 trainable : 1 @@ -1857,6 +1859,7 @@ Indices : index64<10 x 3> std::string expectMesM = R"(Module structure: Constant name : "dummy" +layout : * output : float<1 x 1> users : 0 diff --git a/tests/unittests/OperatorTest.cpp b/tests/unittests/OperatorTest.cpp index 127f9aa7d8..c7dc859158 100644 --- a/tests/unittests/OperatorTest.cpp +++ b/tests/unittests/OperatorTest.cpp @@ -43,10 +43,10 @@ class OperatorTest : public BackendTest { /// dummy scale and offset, otherwise it will not. static Placeholder *createPlaceholderConditionallyQuantized( Module &mod, ElemKind T, llvm::ArrayRef dims, llvm::StringRef name, - bool isTrainable) { + bool isTrainable, llvm::StringRef layout = ANY_LAYOUT) { return isQuantizedElemKind(T) - ? mod.createPlaceholder(T, dims, 1.0, 0, name, isTrainable) - : mod.createPlaceholder(T, dims, name, isTrainable); + ? mod.createPlaceholder(T, dims, 1.0, 0, name, isTrainable, layout) + : mod.createPlaceholder(T, dims, name, isTrainable, layout); } /// Helper to get a unique Type; if \p T is quantized, then it will include a @@ -623,10 +623,11 @@ static void testSpaceToDepthBlock3(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { unsigned blockSize = 3; auto *in = createPlaceholderConditionallyQuantized(mod, DTy, {1, 2, 6, 6}, - "in", false); - auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}); + "in", false, "NHWC"); + auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}, "NHWC"); auto *stdn = F->createSpaceToDepth("spacetodepth", tri, blockSize); - auto *tro = F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}); + auto *tro = + F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}, "NCHW"); auto *save = F->createSave("save", tro); auto *result = bindings.allocate(save->getPlaceholder()); @@ -777,10 +778,11 @@ static void testSpaceToDepth(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { unsigned blockSize = 2; auto *in = createPlaceholderConditionallyQuantized(mod, DTy, {2, 2, 4, 4}, - "in", false); - auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}); + "in", false, "NHWC"); + auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}, "NHWC"); auto *stdn = F->createSpaceToDepth("spacetodepth", tri, blockSize); - auto *tro = F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}); + auto *tro = + F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}, "NCHW"); auto *save = F->createSave("save", tro); auto *result = bindings.allocate(save->getPlaceholder()); @@ -886,7 +888,7 @@ static void testResizeNearest(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {1, 2, 2, 1}, - "input", false); + "input", false, "NHWC"); bindings.allocate(input)->getHandle() = {2, 4, 8, 16}; auto heightScaleUp = 2.0f; @@ -1640,7 +1642,7 @@ static void testBatchedReduceZeroDimResult(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { auto *batch = createPlaceholderConditionallyQuantized( - mod, DTy, {4}, "batch", /* isTrainable */ false); + mod, DTy, {4}, "batch", /* isTrainable */ false, "N"); bindings.allocate(batch)->getHandle() = {2, 4, 6, 8}; auto OT = 
uniqueTypeConditionallyQuantized(mod, DTy, {}); @@ -1941,7 +1943,8 @@ TEST_P(OperatorTest, batchedReduceMeanUsingAvgPool) { std::vector dims = {3, 20, 4, 8}; - auto *batch = mod_.createPlaceholder(ElemKind::FloatTy, dims, "batch", false); + auto *batch = + mod_.createPlaceholder(ElemKind::FloatTy, dims, "batch", false, "NHWC"); auto IH = bindings_.allocate(batch)->getHandle(); IH.randomize(1.0, 100.0, mod_.getPRNG()); @@ -2344,9 +2347,9 @@ static void testArgMaxKeepDim(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {2, 3, 2, 2}, - "input", false); - auto *argmax = - mod.createPlaceholder(ElemKind::Int64ITy, {1, 3, 2, 2}, "argmax", false); + "input", false, "NHWC"); + auto *argmax = mod.createPlaceholder(ElemKind::Int64ITy, {1, 3, 2, 2}, + "argmax", false, "NHWC"); bindings.allocate(input)->getHandle() = { 11, 24, 33, 41, 15, 26, 37, 48, 12, 28, 31, 42, @@ -2389,7 +2392,7 @@ static void testArgMaxNoKeepDim(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {2, 3, 2, 2}, - "input", false); + "input", false, "NHWC"); auto *argmax = mod.createPlaceholder(ElemKind::Int64ITy, {2, 2, 2}, "argmax", false); @@ -2788,8 +2791,8 @@ void gatherRangesTest(glow::PlaceholderBindings &bindings_, glow::Module &mod_, OUTPUT = [1, 3, 4, 5, 6] LENGTHS = [3, 2] */ - auto *data = - createPlaceholderConditionallyQuantized(mod_, DTy, {6}, "data", false); + auto *data = createPlaceholderConditionallyQuantized(mod_, DTy, {6}, "data", + false, "N"); auto *ranges = mod_.createPlaceholder(ITy, {2, 2, 2}, "ranges", false); bindings_.allocate(data)->getHandle() = {1, 2, 3, 4, 5, 6}; @@ -3005,14 +3008,14 @@ TEST_P(OperatorTest, Transpose3Dims_Int8) { /// Test that Transpose optimization into Reshape yields expected results. TEST_P(OperatorTest, TransposeIntoReshapeOptim) { CHECK_IF_ENABLED(); - auto *batch = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 2, 4}, "batch", false); + auto *batch = mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 2, 4}, "batch", + false, "NHWC"); auto IH = bindings_.allocate(batch)->getHandle(); for (size_t i = 0; i < 24; i++) { IH.raw(i) = i + 1; } - Node *T = F_->createTranspose("transpose", batch, {1, 2, 0, 3}); + Node *T = F_->createTranspose("transpose", batch, {1, 2, 0, 3}, "HWNC"); Node *R = F_->createBatchedReduceMean("reduce.mean", T, {2, 3}); SaveNode *O = F_->createSave("ret", R); bindings_.allocate(mod_.getPlaceholders()); @@ -5624,15 +5627,15 @@ TEST_P(OperatorTest, GroupConv3D) { TEST_P(OperatorTest, NonSquarePaddingConvolution) { CHECK_IF_ENABLED(); - auto *input = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 4, 4, 1}, "input", false); + auto *input = mod_.createPlaceholder(ElemKind::FloatTy, {1, 4, 4, 1}, "input", + false, "NHWC"); auto IH = bindings_.allocate(input)->getHandle(); for (size_t i = 0; i < 4 * 4; i++) { IH.raw(i) = i + 1; } - auto filter = - mod_.createPlaceholder(ElemKind::FloatTy, {2, 2, 2, 1}, "filter", false); + auto filter = mod_.createPlaceholder(ElemKind::FloatTy, {2, 2, 2, 1}, + "filter", false, "NHWC"); auto FH = bindings_.allocate(filter)->getHandle(); for (size_t i = 0; i < 2 * 2 * 2; i++) { FH.raw(i) = pow(2.0, i); @@ -5655,8 +5658,8 @@ TEST_P(OperatorTest, NonSquarePaddingConvolution) { // Create the reference conv operator whose input is the same as the // after-padding-input above. 
- auto *input1 = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 5, 9, 1}, "input1", false); + auto *input1 = mod_.createPlaceholder(ElemKind::FloatTy, {1, 5, 9, 1}, + "input1", false, "NHWC"); bindings_.allocate(input1)->zero(); auto IH1 = bindings_.get(input1)->getHandle(); for (size_t i = 0; i < 4; i++) @@ -6270,7 +6273,7 @@ static void testMaxPoolWithArgmax(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {1, 3, 3, 1}, - "input", false); + "input", false, "NHWC"); bindings.allocate(input)->getHandle() = {0, 3, 7, 6, 5, 1, 2, 8, 4}; auto *pool = F->createMaxPool("pool", input, {2, 2}, {1, 1}, {0, 0, 0, 0}); auto *SResult = F->createSave("save_result", pool->getResult()); @@ -6310,7 +6313,7 @@ testMaxPoolWithArgmaxTransposed(glow::PlaceholderBindings &bindings, // Show that sequence Tensor(NCHW) -> Transpose(NCHWtoNHWC) -> // MaxPoolWithArgmax -> Transpose(NHWCtoNCHW) produces correct linearization. auto *inputNCHW = createPlaceholderConditionallyQuantized( - mod, DTy, {1, 3, 4, 4}, "input", false); + mod, DTy, {1, 3, 4, 4}, "input", false, "NCHW"); auto inHandle = bindings.allocate(inputNCHW)->getHandle(); inHandle.clear(0.); inHandle.at({0, 0, 2, 2}) = 11; @@ -6319,15 +6322,15 @@ testMaxPoolWithArgmaxTransposed(glow::PlaceholderBindings &bindings, // Input NCHW to NHWC conversion. auto *inputNHWC = - F->createTranspose("transposeInput", inputNCHW, {0, 2, 3, 1}); + F->createTranspose("transposeInput", inputNCHW, {0, 2, 3, 1}, "NHWC"); auto *pool = F->createMaxPool("pool", inputNHWC, {4, 4}, {4, 4}, {0, 0, 0, 0}); // NHWC to NCHW conversion. - auto *resultNCHW = - F->createTranspose("transposeRes", pool->getResult(), {0, 3, 1, 2}); - auto *argmaxNCHW = - F->createTranspose("transposeArgmax", pool->getArgmax(), {0, 3, 1, 2}); + auto *resultNCHW = F->createTranspose("transposeRes", pool->getResult(), + {0, 3, 1, 2}, "NCHW"); + auto *argmaxNCHW = F->createTranspose("transposeArgmax", pool->getArgmax(), + {0, 3, 1, 2}, "NCHW"); auto *SResult = F->createSave("save_result", resultNCHW); auto *SArgmax = F->createSave("save_argmax", argmaxNCHW); @@ -8854,7 +8857,7 @@ static void testFlatten(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *tensor4D = createPlaceholderConditionallyQuantized( - mod, DTy, {3, 2, 4, 3}, "4D", false); + mod, DTy, {3, 2, 4, 3}, "4D", false, "NHWC"); bindings.allocate(tensor4D)->getHandle().randomize(0, 100, mod.getPRNG()); @@ -8886,7 +8889,7 @@ static void testFlatten(glow::PlaceholderBindings &bindings, glow::Module &mod, // again because flattening is supported for every axis up and including the // rank of a tensor, 1D vector means we can flatten it on axis 1. 
auto *tensor1D = - createPlaceholderConditionallyQuantized(mod, DTy, {15}, "1D", false); + createPlaceholderConditionallyQuantized(mod, DTy, {15}, "1D", false, "N"); bindings.allocate(tensor1D)->getHandle().randomize(0, 100, mod.getPRNG()); @@ -9414,9 +9417,9 @@ void batchOneHotTest(glow::PlaceholderBindings &bindings, glow::Module &mod, auto *data = createPlaceholderConditionallyQuantized(mod, DTy, {3, 2}, "data", false); auto *lengths = - mod.createPlaceholder(ElemKind::Int32ITy, {2}, "lengths", false); - auto *values = - createPlaceholderConditionallyQuantized(mod, DTy, {6}, "values", false); + mod.createPlaceholder(ElemKind::Int32ITy, {2}, "lengths", false, "N"); + auto *values = createPlaceholderConditionallyQuantized(mod, DTy, {6}, + "values", false, "N"); bindings.allocate(data)->getHandle() = {5, 0, 11, 3, 0, 5}; bindings.allocate(lengths)->getHandle() = {4, 2}; @@ -9596,9 +9599,9 @@ static void testDotProduct1D(glow::PlaceholderBindings &bindings, // Input tensors. constexpr std::size_t kDataSize = 10; auto *X = createPlaceholderConditionallyQuantized(mod, DTy, {kDataSize}, "X", - false); + false, "N"); auto *Y = createPlaceholderConditionallyQuantized(mod, DTy, {kDataSize}, "Y", - false); + false, "N"); auto XH = bindings.allocate(X)->getHandle(); auto YH = bindings.allocate(Y)->getHandle(); @@ -9608,7 +9611,7 @@ static void testDotProduct1D(glow::PlaceholderBindings &bindings, // Compute expected output. auto *expected = createPlaceholderConditionallyQuantized( - mod, DTy, {kDataSize}, "expected", false); + mod, DTy, {kDataSize}, "expected", false, "N"); auto expectedH = bindings.allocate(expected)->getHandle(); for (std::size_t i = 0; i < kDataSize; ++i) { @@ -9732,8 +9735,8 @@ static void testDotProduct2D(glow::PlaceholderBindings &bindings, YH.randomize(-3.0, 3.0, mod.getPRNG()); // Compute expected output. - auto *expected = createPlaceholderConditionallyQuantized(mod, DTy, {kRows}, - "expected", false); + auto *expected = createPlaceholderConditionallyQuantized( + mod, DTy, {kRows}, "expected", false, "N"); auto expectedH = bindings.allocate(expected)->getHandle(); for (std::size_t i = 0; i < kRows; ++i) { diff --git a/tests/unittests/TensorLayoutTest.cpp b/tests/unittests/TensorLayoutTest.cpp new file mode 100644 index 0000000000..a7c0962368 --- /dev/null +++ b/tests/unittests/TensorLayoutTest.cpp @@ -0,0 +1,162 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "BackendTestUtils.h" + +#include "glow/Backend/Backend.h" +#include "glow/Graph/Graph.h" +#include "glow/Graph/TensorLayout.h" +#include "llvm/Support/raw_ostream.h" + +#include "gtest/gtest.h" + +#include + +using namespace glow; + +class TensorLayoutTest : public BackendTest { +protected: + PlaceholderBindings bindings_; +}; + +// Check CanonicalTensorLayout for conv works default values: +TEST_P(TensorLayoutTest, convDefault) { + CHECK_IF_ENABLED(); + + auto *input = + mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "input", false); + auto IH = bindings_.allocate(input)->getHandle(); + IH = {1, 1, 1, 1, 1, 1, 1, 1, 1}; + + auto filter = + mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "filter", false); + auto FH = bindings_.allocate(filter)->getHandle(); + FH = {0, 0, 0, 1, 1, 1, 0, 0, 0}; + + auto *zeroBias = + mod_.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); + bindings_.allocate(zeroBias)->zero(); + + auto outTy = mod_.uniqueType(ElemKind::FloatTy, {1, 3, 3, 1}); + + ConvolutionNode *CN = + F_->createConv("Conv", input, filter, zeroBias, outTy, 3, 1, 1, 1); + SaveNode *S = F_->createSave("save", CN); + bindings_.allocate(S->getPlaceholder()); + + EXPECT_TRUE(verifyLayouts(*F_, CanonicalTensorLayout::getInstance())); +} + +static void buildBadConv(PlaceholderBindings &bindings, Module &mod, + Function *F) { + auto *input = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "input", + false, "NWCH"); + auto IH = bindings.allocate(input)->getHandle(); + IH = {1, 1, 1, 1, 1, 1, 1, 1, 1}; + + auto filter = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "filter", + false, "NWCH"); + auto FH = bindings.allocate(filter)->getHandle(); + FH = {0, 0, 0, 1, 1, 1, 0, 0, 0}; + + auto *zeroBias = mod.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); + bindings.allocate(zeroBias)->zero(); + + auto outTy = mod.uniqueType(ElemKind::FloatTy, {1, 3, 3, 1}); + + ConvolutionNode *CN = + F->createConv("Conv", input, filter, zeroBias, outTy, 3, 1, 1, 1); + SaveNode *S = F->createSave("save", CN); + bindings.allocate(S->getPlaceholder()); +} + +// Check CanonicalTensorLayout for conv fails verification with bad layout: +TEST_P(TensorLayoutTest, convBadLayout) { + CHECK_IF_ENABLED(); + + buildBadConv(bindings_, mod_, F_); + + EXPECT_FALSE(verifyLayouts(*F_, CanonicalTensorLayout::getInstance(), false)); +} + +// Check TensorLayoutDescription's parser with simple input. +TEST_P(TensorLayoutTest, parseTestSimple) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription simple("NHWC"); + EXPECT_FALSE(simple.isAnyLayout()); + EXPECT_EQ(simple.getNumDims(), 4); + EXPECT_EQ(simple.getDims()[0], "N"); + EXPECT_EQ(simple.getDims()[1], "H"); + EXPECT_EQ(simple.getDims()[2], "W"); + EXPECT_EQ(simple.getDims()[3], "C"); + for (size_t i = 0; i < simple.getNumDims(); ++i) { + EXPECT_EQ(simple.getAlignment(i), 1); + } +} + +// Check TensorLayoutDescription's parser with alignment. 
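+// For example, "N[a=32]HW[a=64]C" parses into four dimensions where N is
+// 32-aligned, W is 64-aligned, and H and C keep the default alignment of 1.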
+TEST_P(TensorLayoutTest, parseTestAlignment) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription alignment("N[a=32]HW[a=64]C"); + EXPECT_FALSE(alignment.isAnyLayout()); + EXPECT_EQ(alignment.getNumDims(), 4); + EXPECT_EQ(alignment.getDims()[0], "N[a=32]"); + EXPECT_EQ(alignment.getDims()[1], "H"); + EXPECT_EQ(alignment.getDims()[2], "W[a=64]"); + EXPECT_EQ(alignment.getDims()[3], "C"); + EXPECT_EQ(alignment.getAlignment(0), 32); + EXPECT_EQ(alignment.getAlignment(1), 1); + EXPECT_EQ(alignment.getAlignment(2), 64); + EXPECT_EQ(alignment.getAlignment(3), 1); +} + +// Check TensorLayoutDescription's parser with custom extensions. +TEST_P(TensorLayoutTest, parseTestCustom) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription custom("N[a=32][after:align]C[mal:reynolds][answer:" + "42]HW[before:alignment][a=64]"); + EXPECT_FALSE(custom.isAnyLayout()); + EXPECT_EQ(custom.getNumDims(), 4); + EXPECT_EQ(custom.getDims()[0], "N[a=32][after:align]"); + EXPECT_EQ(custom.getDims()[1], "C[mal:reynolds][answer:42]"); + EXPECT_EQ(custom.getDims()[2], "H"); + EXPECT_EQ(custom.getDims()[3], "W[before:alignment][a=64]"); + EXPECT_EQ(custom.getAlignment(0), 32); + EXPECT_EQ(custom.getAlignment(1), 1); + EXPECT_EQ(custom.getAlignment(2), 1); + EXPECT_EQ(custom.getAlignment(3), 64); +} + +// Check TensorLayoutDescription's parser with star dims. +TEST_P(TensorLayoutTest, parseTestStar) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription custom("N[a=32]*H*[a=64]"); + EXPECT_FALSE(custom.isAnyLayout()); + EXPECT_EQ(custom.getNumDims(), 4); + EXPECT_EQ(custom.getDims()[0], "N[a=32]"); + EXPECT_EQ(custom.getDims()[1], "*"); + EXPECT_EQ(custom.getDims()[2], "H"); + EXPECT_EQ(custom.getDims()[3], "*[a=64]"); + EXPECT_EQ(custom.getAlignment(0), 32); + EXPECT_EQ(custom.getAlignment(1), 1); + EXPECT_EQ(custom.getAlignment(2), 1); + EXPECT_EQ(custom.getAlignment(3), 64); +} + +INSTANTIATE_BACKEND_TEST(TensorLayoutTest); diff --git a/tests/unittests/ThreadPoolExecutorTest.cpp b/tests/unittests/ThreadPoolExecutorTest.cpp index 6ff89a783d..84a6f0f0d4 100644 --- a/tests/unittests/ThreadPoolExecutorTest.cpp +++ b/tests/unittests/ThreadPoolExecutorTest.cpp @@ -622,8 +622,8 @@ TEST_F(ThreadPoolExecutorTest, EmptyDAG) { // compare the returned PlaceholderBindings with. PseudoRNG rng; auto type = std::unique_ptr(new Type(ElemKind::FloatTy, {1, 2, 2})); - auto placeholder = glow::make_unique("a", type.get(), - /*trainable=*/false); + auto placeholder = glow::make_unique( + "a", type.get(), /*trainable=*/false, ANY_LAYOUT); auto testContext = glow::make_unique(); auto refContext = glow::make_unique(); diff --git a/tools/ClassGen/NodeBuilder.cpp b/tools/ClassGen/NodeBuilder.cpp index 403e95b398..d9290aa44a 100644 --- a/tools/ClassGen/NodeBuilder.cpp +++ b/tools/ClassGen/NodeBuilder.cpp @@ -575,6 +575,7 @@ NodeBuilder &NodeBuilder::addGradient() { // The new 'Grad' class will have all of the fields of the current class. GN.members_ = members_; GN.enum_ = enum_; + GN.isDataParallel_ = isDataParallel_; // Add the inputs that we'll use in the grad instruction. 
for (const std::string &in : nodeInputs_) { diff --git a/tools/ClassGen/NodeGen.cpp b/tools/ClassGen/NodeGen.cpp index 0c19981308..b0706ffb32 100644 --- a/tools/ClassGen/NodeGen.cpp +++ b/tools/ClassGen/NodeGen.cpp @@ -684,12 +684,14 @@ int main(int argc, char **argv) { BB.newNode("Reshape") .addInput("Input") .addMember(MemberType::VectorSizeT, "Dims") + .addMember(MemberType::String, "Layout") .addResultFromCtorArg() .setDocstring("Reshape the Input tensor to shape Dims."); BB.newNode("Transpose") .addInput("Input") .addMember(MemberType::VectorUnsigned, "Shuffle") + .addMember(MemberType::String, "Layout") .addResultFromCtorArg() .setDocstring("Transpose the Input tensor based on the vector Shuffle, " "which assigns a new axis for each dimension in Input.");
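
For reference, here is a minimal sketch (not part of the patch itself) of how the pieces above compose from client code, assuming only calls that appear in the diffs: `TensorLayoutDescription` parsing and alignment queries, `CanonicalTensorLayout` defaults, the layout-aware `createPlaceholder`/`createTranspose` overloads, and `verifyLayouts`. The function name `layoutSketch`, the include set, and the concrete shapes are illustrative only.

```cpp
#include "glow/Graph/Graph.h"
#include "glow/Graph/TensorLayout.h"

#include <cassert>
#include <string>

using namespace glow;

// Illustrative sketch: parse an aligned layout string, query the canonical
// defaults, and build a tiny graph whose transpose carries an explicit
// layout annotation.
static void layoutSketch(Module &mod, Function *F) {
  // "N" must be 32-aligned; the second and fourth dimensions are "any",
  // with a 64-alignment requirement on the fourth.
  TensorLayoutDescription desc("N[a=32]*H*[a=64]");
  assert(desc.getNumDims() == 4);
  assert(desc.getAlignment(0) == 32 && desc.getAlignment(3) == 64);

  // The canonical default layout for 4-D tensors is "NHWC".
  std::string canonical4D =
      CanonicalTensorLayout::getInstance().getDefaultNDLayout(4);
  (void)canonical4D;

  // Placeholders, constants and shape-changing nodes now carry a layout
  // string; transposing an NCHW input into NHWC records both layouts.
  auto *in = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "in",
                                   /* isTrainable */ false, "NCHW");
  auto *tr = F->createTranspose("to_nhwc", in, NCHW2NHWC, "NHWC");
  F->createSave("save", tr);

  // Check the function against the canonical layout rules.
  bool ok = verifyLayouts(*F, CanonicalTensorLayout::getInstance());
  (void)ok;
}
```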