diff --git a/docs/Backends.md b/docs/Backends.md index 82a3e2326c..79d79ceb3a 100644 --- a/docs/Backends.md +++ b/docs/Backends.md @@ -73,6 +73,10 @@ Additionally, there are virtual functions that backends can override: - Verifies that `IRFunction &IR` conforms to the backend-specific constraints. +- `virtual TensorLayoutCommon &getTensorLayoutRequirements() const;` + + - Gets the backend-specific tensor layout requirements. + - `virtual bool shouldLower(const Node *N) const;` - Allow the backend to prevent lowering for some `Node *N`. For example, if a diff --git a/docs/TensorLayout.md b/docs/TensorLayout.md new file mode 100644 index 0000000000..ac96b1c2d7 --- /dev/null +++ b/docs/TensorLayout.md @@ -0,0 +1,236 @@ +## Tensor Layout + +This document describes the design of the tensor layout requirements in Glow. + +Certain operations (e.g. convolutions, gemms, etc.) need to know the semantic layout of their tensors, i.e. the logical ordering of their dimensions (e.g. `NHWC`). Some backends enforce additional backend-specific requirements on said operations (e.g. tensor alignment). + +A sufficiently clever backend might even go a step further and have said layout requirements depend on the properties of the operation: a convolution with a small filter may need the input operands in a format different from a convolution with a big filter. + +Tensor layout is a property of the operation. Some operations, such as element-wise operations, may not care about their input layout; we avoid adding a layout field for such operations to reduce the dynamic memory consumption of the compiler. + +For operations that do have layout requirements, Glow has an easily extendable string-based layout field. This allows backends to override Glow's default requirements without the hassle of creating a custom, backend-specific node. + +Glow's string-based layout format is encoded as follows: + +1. A mandatory single character representing the current dimension: either an alphabetic letter or `*` (any layout). +2. An optional token for the start of the current dimension's information: `[`. +3. An optional namespace identifier for non-standard information, such as tiling, followed by `:`. Must have `[` from 2. in place. Following said identifier, all subsequent data is considered a "black box" until `]` is encountered. +4. Given that we have `[` from 2. in place, the closing bracket `]` for it. +5. Optionally go back to 2. + +As an example of this encoding, here's how we add alignment information, which is an officially supported extension, thus not requiring a namespace, followed by a backend-specific extension: +`N[a=32][namespace_for_unsupported:]HWC` would represent a 4-D tensor wherein +`N` needs an alignment of 32, plus some private backend requirements we don't know about. +The `H`, `W` and `C` dimensions have no layout restrictions. +We can, of course, combine "any" dimensions in there; for example, `N[a=32]*H*[a=64]` would represent "any" for the second dimension, with no restrictions whatsoever, while we have an alignment restriction of 64 on the 4th. +The sketch after the notes below shows how such a string can be parsed. + +Notes: + +1. For each dimension, the identifier can be either a single English alphabet letter, either upper or lower case, or the star symbol. +2. We assume that a single letter is enough for each dimension; it makes parsing easier and avoids adding delimiters in the serialized format. +However, we do have a constructor that (theoretically) accepts multi-letter dimensions. +If we decide to expand the current support, we will need to add delimiters to the serialized form.
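To make the encoding above concrete, here is a minimal, self-contained sketch of how such a serialized layout string could be parsed and queried. It is illustrative only and is not Glow's implementation (Glow's parser lives in `TensorLayoutDescription::parse`, added later in this patch); the `DimInfo` struct, the `parseLayout` helper, and the `ocl:tile8` extension string are made up for the example.

```cpp
// Standalone sketch (not Glow's actual parser): walk a serialized layout
// string such as "N[a=32][ocl:tile8]HWC" and print each dimension's name
// and alignment. Unknown namespaced extensions are skipped as opaque blobs.
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct DimInfo {
  char name;        // 'N', 'H', ..., or '*' for "any".
  size_t alignment; // 1 when no "[a=...]" extension is present.
};

static std::vector<DimInfo> parseLayout(const std::string &layout) {
  std::vector<DimInfo> dims;
  size_t i = 0;
  while (i < layout.size()) {
    char c = layout[i++];
    if (!(std::isalpha(static_cast<unsigned char>(c)) || c == '*'))
      continue; // Ill-formed input; a real parser would report an error.
    DimInfo d{c, 1};
    // Consume any number of "[...]" extensions attached to this dimension.
    while (i < layout.size() && layout[i] == '[') {
      size_t close = layout.find(']', i);
      std::string ext = layout.substr(i + 1, close - i - 1);
      if (ext.rfind("a=", 0) == 0) // Official alignment extension.
        d.alignment = std::stoul(ext.substr(2));
      // Namespaced extensions ("ocl:...") are treated as opaque and ignored.
      i = close + 1;
    }
    dims.push_back(d);
  }
  return dims;
}

int main() {
  for (const DimInfo &d : parseLayout("N[a=32][ocl:tile8]HWC"))
    std::cout << d.name << " align=" << d.alignment << "\n";
  // Prints: N align=32, then H, W, C with align=1, one per line.
  return 0;
}
```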
+ +## Layout Requirements Interface + +Backends in Glow *may* derive from [base class `TensorLayoutCommon`](https://github.com/pytorch/glow/blob/master/include/glow/Graph/TensorLayout.h), +which includes the following virtual methods they can override: + +- `virtual std::string getDefaultNDLayout(unsigned dims) const` + + - This helper function takes an `unsigned dims` and returns the (current) default n-D layout. + +- `virtual std::string getNthInputLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth input `n`. + +- `virtual std::string getNthResultLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth result `n`. + +- ``` +virtual bool isSatisfiedBy(TypeRef ty, + const TensorLayoutDescription &destLayout, + const TensorLayoutDescription *srcLayout) const + ``` + - This function checks whether `ty` satisfies the `destLayout` layout requirements; if `srcLayout` is provided for `ty`, it is taken into account as well. + +- `virtual llvm::ArrayRef<TensorLayoutDescription> getLayoutsForDims() const` + + - This helper function returns an array of predefined layouts for all dimensions from `0-D` to Glow's max tensor layout dimension. + +- `bool isEnabled() const` + - Indicates whether checking for layout requirements is enabled. The default is off. + +An example of why backends may want to override such methods can be seen in the `OpenCL` backend: +convolutions are more efficient in `NCHW` format; as such, we may lower a `ConvolutionNode` into a `NHWC`-to-`NCHW` transpose + convolution. +The `OpenCL` verifier should then expect `NCHW` for the input/output of the convolution instead of `NHWC`. +`OpenCL` opts in to post-lowering verifications; a sketch of such a backend-specific override appears at the end of the next section. + +## Canonical Tensor Layout + +Before lowering a Glow graph for a specific backend, we introduce a "Canonical" representation that we expect for certain operations. +This allows us to verify the graph after every transformation and may expose `GraphOptimizer` bugs [^tl0]. +[class `CanonicalTensorLayout`](https://github.com/pytorch/glow/blob/master/include/glow/Graph/TensorLayout.h) +derives from `TensorLayoutCommon` and overrides the following functions: + +- `std::string getDefaultNDLayout(unsigned dims) const` + + - Overrides the default `n-D` layout from "any" into something else, e.g. 4-D any into `NHWC`. + +- `std::string getNthInputLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth input `n`. + - It returns common layout constraints; for example, the layout of a `TransposeNode`'s input is the same as the layout of the result that produces that input. + +- `std::string getNthResultLayoutRequirements(const Node *node, size_t n)` + + - This function takes an operator `Node *node` and returns the layout requirements of the Nth result `n`. + - It returns common layout constraints; for example, the result of a `ConvolutionNode` should be in `NHWC` format.
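Putting the interface and the canonical layout together, here is a hedged sketch of what a backend-specific override can look like. It is modeled on the `OpenCLTensorLayout` class added later in this patch; the `MyBackend*` names, the `NCHW` policy shown, and the exact node check are illustrative assumptions rather than real Glow code, and the snippet assumes the Glow headers introduced by this patch are available.

```cpp
// Sketch only: a hypothetical backend's layout requirements, mirroring the
// OpenCLTensorLayout pattern from this patch. "MyBackend*" names are made up.
#include "glow/Graph/TensorLayout.h"

namespace glow {

class MyBackendTensorLayout final
    : public TensorLayoutCommon,
      public TensorLayoutSingleton<MyBackendTensorLayout> {
public:
  // Opt in to layout verification (isEnabled() will now return true).
  MyBackendTensorLayout(token_) { enabled_ = true; }

  std::string getNthInputLayoutRequirements(const Node *node,
                                            size_t n) override {
    // Illustrative policy: this backend prefers convolution data in NCHW.
    // (InputIdx is assumed to be the auto-generated input index name.)
    if (llvm::isa<ConvolutionNode>(node) &&
        n == ConvolutionNode::InputIndices::InputIdx) {
      return "NCHW";
    }
    // Fall back to the common requirements otherwise.
    return TensorLayoutCommon::getNthInputLayoutRequirements(node, n);
  }
};

} // namespace glow

// The backend then exposes the singleton through the new Backend hook:
// TensorLayoutCommon &MyBackend::getTensorLayoutRequirements() const {
//   return MyBackendTensorLayout::getInstance();
// }
```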
+ +## Placeholders and Constants + +An important thing to note is that some operators may have a `Placeholder` or a `Constant` as their input. We may need to know a specific layout for said storage. For example, a `Placeholder` may need to be in `NHWC` format for a `ConvolutionNode`. +However, we do not want to pollute the code by making this a hard requirement, especially since the canonical layout may accept anything for certain tensors (e.g. a `1-D` tensor). As such, we introduce the notion of `ANY_LAYOUT` and initialize `Placeholder`s and `Constant`s with this wildcard by default. +Note that loaders have the ability to specify the layout based on the network description, e.g. they might accept either `NCHW` or `NHWC` as an input for an operator, and they can propagate that information to Glow. + + +## Related Work + +Other machine learning frameworks have introduced similar concepts; this proposal is not unique to Glow. Here are some notable mentions: + +### PlaidML + +Provides layout requirement information as a parameter to operations that need to know tensor layouts, instead of setting a global layout that would apply to every operation, allowing users to mix layouts throughout their network. + +PlaidML made the conscious decision to make the layout a property of the operation instead of the tensor, making the implementation of certain operations more intuitive [^tl1]. + +### TVM + +TOPI is the operator collection library for TVM [^tl2]. Certain TOPI operations include their layout requirements as a string. Here's the layout section of `topi.nn.pool` taken from version 0.6 of the document: + +> layout (string) – Layout of the input data. The layout is supposed to be composed +> of upper cases, lower cases and numbers, where upper case indicates a dimension +> and the corresponding lower case with factor size indicates the split dimension. +> For example, NCHW16c can describe a 5-D tensor of [batch_size, channel, height, +> width, channel_block], in which channel_block=16 is a split of dimension channel. + + +### XLA + +XLA adds backend-specific layout constraints. Their CPU backend requires constant arrays to be column major when all of their users are dot operations [^tl3], while their GPU backend adds layout constraints on the cudnn custom-call instruction [^tl4]. + +It is also worth taking a look at XLA's layout optimizer [^tl5], part of their effort to improve out-of-the-box TensorFlow performance [^tl6]. + +Another thing to note is that the alter-layout pass [^tl7] is similar, in function, to the "Solver" that we propose, in the future work section of this document, to automatically legalize layouts. + +### MLIR + +Does not currently have such support, but there are ongoing discussions to add such support to the MLIR Tensor Type [^tl8]. + +## Future Work + +There are a few neat things we can, and probably should, do to expand this support: + +### Remove `enum ConvolutionLayout` + +Our string-based representation is more generic and extensible, as it is basically an extendable enum that can be used in the backends without touching the generic code base. + +### Remove shuffle arrays + +Some operations, such as `TransposeNode`, have a shuffle that tells them what to do. +This can be deprecated and automatically deduced by specifying layout constraints (see the sketch below). + +There is some discrepancy in the fact that we currently use both typed tensors, with named dimensions, and explicitly indexed dimensions, as we do everywhere in the code base (shuffle arrays being an example of that). This may lead to inconsistency in certain cases. +We should gradually migrate towards typed tensors in the long run. 
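The following standalone sketch shows the direction this section proposes: deriving a transpose's shuffle from a pair of layout strings instead of hard-coding it. The `deduceShuffle` helper is hypothetical and not part of this patch.

```cpp
// Standalone sketch: derive a transpose shuffle from two layout strings
// instead of hard-coding it. Assumes single-letter, non-repeating dims.
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

static std::vector<unsigned> deduceShuffle(const std::string &from,
                                           const std::string &to) {
  assert(from.size() == to.size() && "layouts must have the same rank");
  std::vector<unsigned> shuffle;
  shuffle.reserve(to.size());
  for (char d : to) {
    auto pos = from.find(d);
    assert(pos != std::string::npos && "destination dim missing in source");
    shuffle.push_back(static_cast<unsigned>(pos));
  }
  return shuffle;
}

int main() {
  // NCHW -> NHWC is the {0, 2, 3, 1} shuffle used throughout Glow (NCHW2NHWC).
  for (unsigned idx : deduceShuffle("NCHW", "NHWC"))
    std::cout << idx << ' ';
  std::cout << '\n'; // prints: 0 2 3 1
  return 0;
}
```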
+ +### Introduce a "Solver" that automatically legalizes layouts + +Said solver will drastically reduce the complexity of loading models from other frameworks: +we no longer need to insert transposes based on whether we are importing `NHWC` or `NCHW`. +We just need to annotate the `Placeholder` with the layout information we get at load time (which we currently "forget" afterwards) and let the solver transpose said `Placeholder` to our canonical layout. + +First, we start with a "raw" state of non-compliance; then we run a loop that sinks and combines layout transformations. + +### Remove backend-specific nodes + +Today, Glow core and custom backends implicitly hard-code this knowledge about the operations into (backend-specific) nodes and code that works with them. This is pretty fragile and involves a lot of boilerplate code. + +Combining the proposed solver with the backend-specified layout constraints would improve this situation considerably: + +- The backend would return this information and Glow core could insert all the required layout transformations + +- The transformations can also be optimized "for free": Glow currently optimizes `TransposeNode`: + - Multiple transposes can be combined into one + - Opposite transposes can eliminate each other + +- The functionality to insert the required layout transforms is handled by the Glow core, +which removes a lot of code duplication from backends. + +[^tl0]: [Glow Issue: Fix bug in constant folding optimization](https://github.com/pytorch/glow/issues/3500) + +[^tl1]: [Tensor Layout Design Decision in PlaidML](https://github.com/plaidml/plaidml/blob/master/plaidml2/op/lib/design.md#tensor-layout) + +[^tl2]: [TVM Operator Inventory](https://docs.tvm.ai/api/python/topi.html) + +[^tl3]: [XLA CPU Layout Assignment](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/cpu/cpu_layout_assignment.cc) + +[^tl4]: [XLA GPU Layout Assignment](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/gpu/gpu_layout_assignment.cc) + +[^tl5]: [XLA Layout optimizer](https://github.com/tensorflow/tensorflow/blob/b6f7ce2b98b496886be4d900a6f88c24ae730f2c/tensorflow/core/grappler/optimizers/layout_optimizer.cc) + +[^tl6]: [TensorFlow Graph Optimizations](https://web.stanford.edu/class/cs245/slides/TFGraphOptimizationsStanford.pdf) + +[^tl7]: [XLA Alter Layout](https://github.com/dmlc/tvm/blob/025a6c8077cd1914bdd4132c6b86de007151344e/src/relay/pass/alter_op_layout.cc) + +[^tl8]: [Proposal to add layout attribute to MLIR Tensor Type](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/sCaIEKm2RxA) diff --git a/examples/fr2en.cpp b/examples/fr2en.cpp index f6c3a48a42..6b93daaeed 100644 --- a/examples/fr2en.cpp +++ b/examples/fr2en.cpp @@ -277,14 +277,15 @@ void Model::loadEncoder() { {0, step, 0}, {batchSize_, step + 1, EMBEDDING_SIZE}); Node *reshape = F_->createReshape("encoder." 
+ std::to_string(step) + ".reshape", - inputSlice, {batchSize_, EMBEDDING_SIZE}); + inputSlice, {batchSize_, EMBEDDING_SIZE}, ANY_LAYOUT); hidden = createPyTorchGRUCell(F_, reshape, hidden, wIh, bIh, wHh, bHh); outputs.push_back(hidden); } Node *output = F_->createConcat("encoder.output", outputs, 1); - Node *r2 = F_->createReshape("encoder.output.r2", output, - {MAX_LENGTH * batchSize_, EMBEDDING_SIZE}); + Node *r2 = + F_->createReshape("encoder.output.r2", output, + {MAX_LENGTH * batchSize_, EMBEDDING_SIZE}, ANY_LAYOUT); encoderHiddenOutput_ = F_->createGather("encoder.outputNth", r2, seqLength_); } @@ -339,14 +340,14 @@ void Model::loadDecoder() { Node *FC = F_->createFullyConnected("decoder.outFC", hidden, outW, outB); auto *topK = F_->createTopK("decoder.topK", FC, 1); - lastWordIdx = - F_->createReshape("decoder.reshape", topK->getIndices(), {batchSize_}); + lastWordIdx = F_->createReshape("decoder.reshape", topK->getIndices(), + {batchSize_}, "N"); outputs.push_back(lastWordIdx); } Node *concat = F_->createConcat("decoder.output.concat", outputs, 0); Node *reshape = F_->createReshape("decoder.output.reshape", concat, - {MAX_LENGTH, batchSize_}); + {MAX_LENGTH, batchSize_}, ANY_LAYOUT); auto *save = F_->createSave("decoder.output", reshape); output_ = save->getPlaceholder(); bindings.allocate(output_); diff --git a/include/glow/Backend/Backend.h b/include/glow/Backend/Backend.h index 93ed9702a1..33cc0a31f7 100644 --- a/include/glow/Backend/Backend.h +++ b/include/glow/Backend/Backend.h @@ -31,6 +31,7 @@ class Node; class PlaceholderBindings; class IRGenVisitor; class FunctionPassPipeline; +class TensorLayoutCommon; namespace runtime { @@ -121,6 +122,11 @@ class Backend { /// has a good reason not to call IRFunction::verify(). virtual bool verify(const IRFunction &IR) const; + /// \returns a reference to the backend-specific tensor layout requirements + /// singleton. If not overridden, the default requirement is Glow's + /// "canonical" form. + virtual TensorLayoutCommon &getTensorLayoutRequirements() const; + /// \returns true if the supplied Node \N should be lowered. By default, all /// Nodes are candidates for lowering. virtual bool shouldLower(const Node *N) const { return true; } diff --git a/include/glow/Graph/Graph.h b/include/glow/Graph/Graph.h index 80ea893a32..c87e04e7a6 100644 --- a/include/glow/Graph/Graph.h +++ b/include/glow/Graph/Graph.h @@ -56,6 +56,9 @@ enum class FunctionState { FuncLoaded, }; +/// Helper names for common tensor layouts. +#define ANY_LAYOUT "*" + class Module final { /// Stores the functions in the module. 
FunctionList functions_; @@ -173,26 +176,34 @@ class Module final { ///@{ Placeholder *createPlaceholder(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name, bool isTrainable); + llvm::StringRef name, bool isTrainable, + const std::string &layout = ANY_LAYOUT); Placeholder *createPlaceholder(TypeRef T, llvm::StringRef name, - bool isTrainable); + bool isTrainable, + const std::string &layout = ANY_LAYOUT); Placeholder *createPlaceholder(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name, bool isTrainable); + llvm::StringRef name, bool isTrainable, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(TypeRef T, llvm::StringRef name); + Constant *createConstant(TypeRef T, llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); Constant *createConstant(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name); + llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); Constant *createConstant(ElemKind T, llvm::ArrayRef dims, float scale, - int32_t offset, llvm::StringRef name); + int32_t offset, llvm::StringRef name, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(llvm::StringRef name, const Tensor &tensor); + Constant *createConstant(llvm::StringRef name, const Tensor &tensor, + const std::string &layout = ANY_LAYOUT); - Constant *createConstant(llvm::StringRef name, Tensor &&tensor); + Constant *createConstant(llvm::StringRef name, Tensor &&tensor, + const std::string &layout = ANY_LAYOUT); ///@} @@ -250,6 +261,10 @@ class Module final { Module &operator=(PlaceholderBindings &&) = delete; }; +// Forward Declaration for verify's optional parameter +class Backend; +struct CompilationContext; + /// Represents the compute graph. class Function final : public Named { /// A list of nodes that the Function owns. @@ -597,10 +612,12 @@ class Function final : public Named { NodeValue targets); ReshapeNode *createReshape(llvm::StringRef name, NodeValue input, - UnsignedArrayRef shape); + UnsignedArrayRef shape, + llvm::StringRef layout = ANY_LAYOUT); TransposeNode *createTranspose(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shuffle); + llvm::ArrayRef shuffle, + const std::string &layout = ANY_LAYOUT); /// Create a series of nodes that implement a Broadcast operation. The \p /// input Tensor is broadcasted based on \p newShape and along the \p axis, @@ -1302,9 +1319,11 @@ class Function final : public Named { Function *clone(llvm::StringRef newName, llvm::DenseMap *map = nullptr); - /// Verify the correctness of the Function. - /// \returns true when the function is valid. False otherwise. - bool verify() const; + /// Verify the correctness of the Function. If \p backend is provided, checks + /// backend-specific layout requirements. Else checks the requirements based + /// on Glow's "canonical" layout. \returns true when the function is valid. + /// False otherwise. + bool verify(const Backend *backend = nullptr) const; /// Dump a textual representation of the Function into provided output stream. 
void dump() const; @@ -1367,6 +1386,10 @@ Node *recursiveClone(Function *newF, Node *node, NodeMap &currToNew); { 0u, 2u, 3u, 1u } #define NHWC2NCHW \ { 0u, 3u, 1u, 2u } +#define HWCN2NHWC \ + { 3u, 0u, 1u, 2u } +#define NHWC2HWNC \ + { 1u, 2u, 0u, 3u } llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const Module &mod); diff --git a/include/glow/Graph/Nodes.h b/include/glow/Graph/Nodes.h index e53a4da6c6..adb71cfc9a 100644 --- a/include/glow/Graph/Nodes.h +++ b/include/glow/Graph/Nodes.h @@ -36,7 +36,8 @@ class Storage : public Node { OutputIdx = 0, }; - Storage(Kinded::Kind k, llvm::StringRef name) : Node(k, name) {} + Storage(Kinded::Kind k, llvm::StringRef name, const std::string &layout) + : Node(k, name), layout_(layout) {} /// \return the single output value of the node. NodeValue getOutput() { return getNthResult(0); } @@ -68,6 +69,13 @@ class Storage : public Node { return k->getKind() == Kinded::Kind::ConstantKind || k->getKind() == Kinded::Kind::PlaceholderKind; } + + /// \return the layout of the storage. + const std::string &getLayout() const { return layout_; } + +private: + /// Specifies the Storage's layout + const std::string layout_; }; class Constant : public Storage { @@ -76,14 +84,14 @@ class Constant : public Storage { public: /// Create a new constant and initialize its payload. - Constant(llvm::StringRef name, TypeRef Ty) - : Storage(Kinded::Kind::ConstantKind, name) { + Constant(llvm::StringRef name, TypeRef Ty, const std::string &layout) + : Storage(Kinded::Kind::ConstantKind, name, layout) { addResult(Ty); payload_.reset(*Ty); } - Constant(llvm::StringRef name, Tensor &&payload) - : Storage(Kinded::Kind::ConstantKind, name), + Constant(llvm::StringRef name, Tensor &&payload, const std::string &layout) + : Storage(Kinded::Kind::ConstantKind, name, layout), payload_(std::move(payload)) { addResult(&payload_.getType()); } @@ -145,8 +153,9 @@ class Placeholder : public Storage { public: /// Create a new placeholder. - Placeholder(llvm::StringRef name, TypeRef Ty, bool isTrainable) - : Storage(Kinded::Kind::PlaceholderKind, name), + Placeholder(llvm::StringRef name, TypeRef Ty, bool isTrainable, + const std::string &layout) + : Storage(Kinded::Kind::PlaceholderKind, name, layout), isTrainable_(isTrainable) { addResult(Ty); } diff --git a/include/glow/Graph/TensorLayout.h b/include/glow/Graph/TensorLayout.h new file mode 100644 index 0000000000..8944256224 --- /dev/null +++ b/include/glow/Graph/TensorLayout.h @@ -0,0 +1,192 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef GLOW_GRAPH_TENSORLAYOUT_H +#define GLOW_GRAPH_TENSORLAYOUT_H + +#include +#include + +#include "glow/Graph/Nodes.h" +#include "glow/Support/Error.h" + +namespace glow { + +/// Layout requirements's Singleton. +template class TensorLayoutSingleton { +public: + /// This is how the verifier, Backend and post-loading canonicalizer can + /// access layout constraints. 
+ static T &getInstance() { + // The Ctor will only be called once. + static const std::unique_ptr instance{new T{token_{}}}; + return *instance; + } + +protected: + /// Allow the base class to call any subclass's constructor. + struct token_ {}; + + /// Default Ctor. + TensorLayoutSingleton() {} + + /// Dtor. + virtual ~TensorLayoutSingleton() {} + +private: + /// Delete copy constructor. + TensorLayoutSingleton(const TensorLayoutSingleton &) = delete; + + /// Delete move constructor. + TensorLayoutSingleton(TensorLayoutSingleton &&) = delete; + + /// Delete copy assignment. + TensorLayoutSingleton &operator=(const TensorLayoutSingleton &) = delete; + + /// Delete move assignment. + TensorLayoutSingleton &operator=(TensorLayoutSingleton &&) = delete; +}; + +/// TensorLayoutDescription - optional helper class for parsing string-based +/// layout. +class TensorLayoutDescription { + /// Tensor dimensions descriptions for all dimensions. + std::string dims_[max_tensor_dimensions]; + /// The serialization of the layout. + std::string serializedLayout_; + /// Expected number of dimensions. + size_t numDims_; + +public: + virtual ~TensorLayoutDescription() = default; + /// Constructs this helper class from a serialized string representation. + TensorLayoutDescription(const std::string &layoutStr); + /// Constructs this helper class from an array of strings representing each + /// individual / pre-separated dimension. + TensorLayoutDescription(llvm::ArrayRef dims); + /// \returns the alignment of a dimension \p n. + size_t getAlignment(size_t n) const; + /// \returns the alignment by parsing dimension string \p s. + size_t getAlignment(const std::string &s) const; + /// \returns true if both tensor layouts are the same. + bool isSameLayout(const TensorLayoutDescription &rhs) const; + /// \returns description of the dimension \p n. + const llvm::StringRef getNthDimDescription(size_t n) const; + /// \returns the description of all dimensions. + llvm::ArrayRef getDims() const; + /// \returns number of dimensions. + size_t getNumDims() const { return numDims_; } + /// \returns layout name. + llvm::StringRef getSerializedLayout() const { return serializedLayout_; } + /// \returns true if the layout is "*" in all dimensions. + bool isAnyLayout(); + std::string getDebugDesc() const; + +protected: + /// parse helper: get the custom extensions information. the default, virtual, + /// implementation just ignores all the data until the end token. + virtual void parseCustomExtensions(llvm::StringRef &text, unsigned idx); + +private: + /// Constructor helper: Parses the serialized string. + void parse(llvm::StringRef text); + + /// parse helper: get the official extensions information. + void parseOfficialExtensions(llvm::StringRef &text, unsigned idx); +}; + +/// Interface for finding out layout requirements. +class TensorLayoutCommon { +public: + /// \return the default n-D layout for Glow. + virtual std::string getDefaultNDLayout(unsigned dims) const; + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + virtual std::string getNthInputLayoutRequirements(const Node *node, size_t n); + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + virtual std::string getNthResultLayoutRequirements(const Node *node, + size_t n); + + /// \returns true if type \p ty satisfies the \p destLayout layout. If \p + /// srcLayout is provided, it is taken into account as well. 
+ virtual bool isSatisfiedBy(TypeRef ty, + const TensorLayoutDescription &destLayout, + const TensorLayoutDescription *srcLayout) const; + + /// \return layouts for all tensor dimensions. + virtual llvm::ArrayRef getLayoutsForDims() const; + + /// \returns true if layout equirement verification is enabled. + bool isEnabled() const { return enabled_; } + +protected: + TensorLayoutCommon(); + TensorLayoutCommon(TensorLayoutCommon &&) = delete; + TensorLayoutCommon &operator=(const TensorLayoutCommon &) = delete; + TensorLayoutCommon &operator=(TensorLayoutCommon &&) = delete; + virtual ~TensorLayoutCommon(); + +protected: + bool enabled_; + +private: + std::unordered_map + layoutNameToLayoutDescription_; +}; + +class CanonicalTensorLayout final + : public TensorLayoutCommon, + public TensorLayoutSingleton { +public: + CanonicalTensorLayout(token_) {} + + /// \return the default n-D layout for Glow. + std::string getDefaultNDLayout(unsigned dims) const override; + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + /// NOTE: Certain nodes are layout agnostic. Others expect their + /// inputs/outputs to have a canonical format. For some layout agnostic nodes + /// we need to look at the layout of their inputs to determine the layout of + /// their outputs, e.g. a batch norm. node, in the canonical representation, + /// accepts any input layout such as NCHW or NHWC, but, the output is a + /// propoagation of said layout. + std::string getNthInputLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + std::string getNthResultLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns true of the node accepts any layout. + bool acceptsAnyLayout(const Node *node) const; +}; + +/// Checks if two layout descriptions \p lhs and \p rhs describe the same layout +/// for a value of the type \p ty \returns true if layouts are the same. if \p +/// verbose then print out verbose report. +bool checkSameLayout(llvm::StringRef srcLayoutStr, + llvm::StringRef destLayoutStr, TypeRef ty, + const Node *parent, const std::string &prefix, + const TensorLayoutCommon &TLC, bool verbose = true); + +/// Verifies the correctness of tensor layouts in the function \p F using layout +/// requirements interface \p TLC. if \p verbose then print out verbose report. 
+bool verifyLayouts(const Function &F, TensorLayoutCommon &TLC, + bool verbose = true); + +} // end namespace glow + +#endif // GLOW_GRAPH_TENSORLAYOUT_H diff --git a/lib/Backend/Backend.cpp b/lib/Backend/Backend.cpp index 235ac054a8..0a9dce96e8 100644 --- a/lib/Backend/Backend.cpp +++ b/lib/Backend/Backend.cpp @@ -19,6 +19,7 @@ #include "glow/Graph/Graph.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/IR/Instrs.h" #include "glow/Optimizer/GraphOptimizer/CompilationContext.h" #include "glow/Optimizer/GraphOptimizerPipeline/Pipeline.h" @@ -172,7 +173,7 @@ bool Backend::checkAllNodesSupported(const Function &F) const { } bool Backend::verify(const Function &F) const { - return F.verify() && checkAllNodesSupported(F); + return F.verify(this) && checkAllNodesSupported(F); } bool Backend::verify(const IRFunction &IR) const { @@ -180,6 +181,10 @@ bool Backend::verify(const IRFunction &IR) const { return true; } +TensorLayoutCommon &Backend::getTensorLayoutRequirements() const { + return CanonicalTensorLayout::getInstance(); +} + FunctionPassPipeline Backend::getOptimizationPipeline() const { auto p = createDefaultGraphOptimizationPassPipeline(); // Fold Tile followed by Add into BatchedAdd. Currently this is not part of diff --git a/lib/Backends/Interpreter/Interpreter.cpp b/lib/Backends/Interpreter/Interpreter.cpp index 2a7f53b5b2..52b0810a27 100644 --- a/lib/Backends/Interpreter/Interpreter.cpp +++ b/lib/Backends/Interpreter/Interpreter.cpp @@ -541,7 +541,7 @@ static bool checkLayoutForNode(const Node &N) { } bool Interpreter::verify(const Function &F) const { - if (!F.verify()) { + if (!F.verify(this)) { return false; } if (!checkAllNodesSupported(F)) { diff --git a/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp b/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp new file mode 100644 index 0000000000..bdf4d1d0f6 --- /dev/null +++ b/lib/Backends/Interpreter/tests/InterpreterTensorLayoutTest.cpp @@ -0,0 +1,20 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "tests/unittests/BackendTestUtils.h" + +using namespace glow; + +std::set glow::backendTestBlacklist = {}; diff --git a/lib/Backends/OpenCL/CMakeLists.txt b/lib/Backends/OpenCL/CMakeLists.txt index 34a99c52ec..2339c82187 100644 --- a/lib/Backends/OpenCL/CMakeLists.txt +++ b/lib/Backends/OpenCL/CMakeLists.txt @@ -38,6 +38,7 @@ add_library(OpenCLBackend OpenCL.cpp OpenCLDeviceManager.cpp OpenCLFactory.cpp + OpenCLTensorLayout.cpp Transforms.cpp) target_link_libraries(OpenCLBackend diff --git a/lib/Backends/OpenCL/OpenCL.cpp b/lib/Backends/OpenCL/OpenCL.cpp index 46c388a447..63fbe6a9a3 100644 --- a/lib/Backends/OpenCL/OpenCL.cpp +++ b/lib/Backends/OpenCL/OpenCL.cpp @@ -22,6 +22,7 @@ #include "OpenCL.h" #include "OpenCLDeviceManager.h" +#include "OpenCLTensorLayout.h" #include "glow/Backend/BackendUtils.h" #include "glow/CodeGen/MemoryAllocator.h" @@ -1720,7 +1721,7 @@ template static bool checkSquare(const T &I) { } bool OCLBackend::verify(const Function &F) const { - if (!F.verify()) { + if (!F.verify(this)) { return false; } if (!checkAllNodesSupported(F)) { @@ -1879,6 +1880,10 @@ OCLBackend::createDeviceManager(const runtime::DeviceConfig &deviceConfig) { return createOCLDeviceManager(deviceConfig); } +TensorLayoutCommon &OCLBackend::getTensorLayoutRequirements() const { + return OpenCLTensorLayout::getInstance(); +} + TraceInfo OCLBackend::buildManualTraceInfo(Function *F) const { TraceInfo info(false, getTraceEventDataSize()); diff --git a/lib/Backends/OpenCL/OpenCL.h b/lib/Backends/OpenCL/OpenCL.h index 418e25f90f..6f1437f4ef 100644 --- a/lib/Backends/OpenCL/OpenCL.h +++ b/lib/Backends/OpenCL/OpenCL.h @@ -213,6 +213,8 @@ class OCLBackend final : public BackendUsingGlowIR { bool verify(const Function &F) const override; bool verify(const IRFunction &IR) const override; + TensorLayoutCommon &getTensorLayoutRequirements() const override; + bool shouldLower(const Node *N) const override { // The group convolution is supported in OpenCL slow convolution kernel. if (N->getKind() == Kinded::Kind::ConvolutionNodeKind) diff --git a/lib/Backends/OpenCL/OpenCLTensorLayout.cpp b/lib/Backends/OpenCL/OpenCLTensorLayout.cpp new file mode 100644 index 0000000000..6bd60da173 --- /dev/null +++ b/lib/Backends/OpenCL/OpenCLTensorLayout.cpp @@ -0,0 +1,122 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include "OpenCLTensorLayout.h" +#include "glow/Optimizer/GraphOptimizer/CompilationContext.h" + +using namespace glow; + +/// Definitions of different tensor layouts. 
+static std::string oclDimsNHWC[] = { + {"N"}, + {"H"}, + {"W"}, + {"C"}, +}; +static std::string oclDimsNCHW[] = { + {"N"}, + {"C"}, + {"H"}, + {"W"}, +}; +static TensorLayoutDescription oclLayoutNHWC(oclDimsNHWC); +static TensorLayoutDescription oclLayoutNCHW(oclDimsNCHW); + +static std::string returnBaseReqOrNHWC(TensorLayoutDescription &baseReq, + const Node *node) { + if (!baseReq.isSameLayout( + CanonicalTensorLayout::getInstance().getLayoutsForDims()[4])) { + return baseReq.getSerializedLayout(); + } + if (CanonicalTensorLayout::getInstance().acceptsAnyLayout(node)) { + // These nodes accept any 4-D layout. + return baseReq.getSerializedLayout(); + } + + return CanonicalTensorLayout::getInstance().getDefaultNDLayout(4); +} + +/// Helper function, \returns either NHWC or NCHW layout based on the +/// instruction's layout enum. This will be removed and refactored if/when we +/// move to using strings for all layout specifications and get rid of the enum. +template +static const TensorLayoutDescription *getLayoutFromEnum(const N &node) { + if (node->getLayout() == NCHW) { + return &oclLayoutNCHW; + } + return &oclLayoutNHWC; +} + +/// \returns either NHWC or NCHW layout based on the instruction's layout enum +/// if it has one. Else returns nullptr. This will be removed and refactored +/// if/when we move to using strings for all layout specifications and get rid +/// of the enum. +static const TensorLayoutDescription * +getLayoutForTempEnumRep(size_t n, const Node *node) { + if (const auto MP = llvm::dyn_cast(node)) { + return getLayoutFromEnum(MP); + } + if (const auto MPG = llvm::dyn_cast(node)) { + return getLayoutFromEnum(MPG); + } + if (const auto AP = llvm::dyn_cast(node)) { + return getLayoutFromEnum(AP); + } + if (const auto APG = llvm::dyn_cast(node)) { + return getLayoutFromEnum(APG); + } + + if (const auto *CN = llvm::dyn_cast(node)) { + switch (n) { + case ConvolutionNode::InputIndices::BiasIdx: + return &CanonicalTensorLayout::getInstance().getLayoutsForDims()[1]; + default: { return getLayoutFromEnum(CN); } + } + } + return nullptr; +} + +std::string OpenCLTensorLayout::getNthInputLayoutRequirements(const Node *node, + size_t n) { + DCHECK_LT(n, node->getNumInputs()) << "Wrong input number"; + auto inputNode = node->getNthInput(n); + auto dims = inputNode.getType()->dims(); + DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions"; + // TODO: Remove ->getLayout() enum and take a string like transpose. Refactor + // the following after doing so. + const auto *layout = getLayoutForTempEnumRep(n, node); + if (layout) { + return layout->getSerializedLayout(); + } + auto baseReq = TensorLayoutCommon::getNthInputLayoutRequirements(node, n); + auto baseReqHelper = TensorLayoutDescription(baseReq); + return returnBaseReqOrNHWC(baseReqHelper, node); +} + +std::string OpenCLTensorLayout::getNthResultLayoutRequirements(const Node *node, + size_t n) { + DCHECK_LT(n, node->getNumResults()) << "Wrong output number"; + auto dims = node->getNthResult(n).getType()->dims(); + DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions"; + // TODO: Remove ->getLayout() enum and take a string like transpose. Refactor + // the following after doing so. 
+ const auto *layout = getLayoutForTempEnumRep(n, node); + if (layout) { + return layout->getSerializedLayout(); + } + auto baseReq = TensorLayoutCommon::getNthResultLayoutRequirements(node, n); + auto baseReqHelper = TensorLayoutDescription(baseReq); + return returnBaseReqOrNHWC(baseReqHelper, node); +} diff --git a/lib/Backends/OpenCL/OpenCLTensorLayout.h b/lib/Backends/OpenCL/OpenCLTensorLayout.h new file mode 100644 index 0000000000..fa3531097c --- /dev/null +++ b/lib/Backends/OpenCL/OpenCLTensorLayout.h @@ -0,0 +1,40 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#ifndef GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H +#define GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H + +#include "glow/Graph/TensorLayout.h" + +namespace glow { + +class OpenCLTensorLayout final + : public TensorLayoutCommon, + public TensorLayoutSingleton { +public: + OpenCLTensorLayout(token_) { enabled_ = true; } + + /// \returns layout requirements of the Nth input \p n of a Node \p node. + std::string getNthInputLayoutRequirements(const Node *node, + size_t n) override; + + /// \returns layout requirements of the Nth result \p n of a Node \p node. + std::string getNthResultLayoutRequirements(const Node *node, + size_t n) override; +}; + +} // end namespace glow + +#endif // GLOW_BACKENDS_OPENCL_TENSORLAYOUT_H diff --git a/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp b/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp new file mode 100644 index 0000000000..bdf4d1d0f6 --- /dev/null +++ b/lib/Backends/OpenCL/tests/OpenCLTensorLayoutTest.cpp @@ -0,0 +1,20 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "tests/unittests/BackendTestUtils.h" + +using namespace glow; + +std::set glow::backendTestBlacklist = {}; diff --git a/lib/Base/CMakeLists.txt b/lib/Base/CMakeLists.txt index 04ae417f6a..01c4243045 100644 --- a/lib/Base/CMakeLists.txt +++ b/lib/Base/CMakeLists.txt @@ -18,3 +18,5 @@ if(PNG_FOUND) PRIVATE ${PNG_LIBRARY}) endif() + +add_dependencies(Base AutoGen) diff --git a/lib/Graph/CMakeLists.txt b/lib/Graph/CMakeLists.txt index 2aa24d3cc8..c7a331970f 100644 --- a/lib/Graph/CMakeLists.txt +++ b/lib/Graph/CMakeLists.txt @@ -27,6 +27,7 @@ add_library(Graph NodeValue.cpp Log.cpp PlaceholderBindings.cpp + TensorLayout.cpp Graph.cpp Grad.cpp VerifierHelper.cpp) diff --git a/lib/Graph/Grad.cpp b/lib/Graph/Grad.cpp index f576bcbe3d..33b2ad9db4 100644 --- a/lib/Graph/Grad.cpp +++ b/lib/Graph/Grad.cpp @@ -138,7 +138,7 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, // Swap the src and dest. auto *X = new ReshapeNode(N->getName(), inputW.getType(), outputG, - inputW.getType()->dims()); + inputW.getType()->dims(), RN->getLayout()); toAppend.push_back(X); map.addGradient(RN->getInput(), X); continue; @@ -164,8 +164,9 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, auto *BRAInputType = F->getParent()->uniqueTypeWithNewShape(TNInputType, BRAInputDims); - auto *RN = new ReshapeNode(TN->getName().str() + ".grad.reshape", - BRAInputType, outputG, BRAInputType->dims()); + auto *RN = + new ReshapeNode(TN->getName().str() + ".grad.reshape", BRAInputType, + outputG, BRAInputType->dims(), "*"); auto *BRA = new BatchedReduceAddNode(TN->getName().str() + ".grad.bra", TN->getInput().getType(), RN, TN->getAxis()); @@ -195,14 +196,18 @@ Function *glow::differentiate(Function *F, const TrainingConfig &conf, // Generate the reverse shuffle. auto shuffle = TN->getShuffle(); + auto layout = TN->getLayout(); + std::string reverseLayout; + reverseLayout.resize(TN->getLayout().size()); std::vector reverseShuffle(shuffle.begin(), shuffle.end()); for (unsigned int i = 0; i < shuffle.size(); i++) { reverseShuffle[shuffle[i]] = i; + reverseLayout[shuffle[i]] = layout[i]; } // Swap the src and dest. auto *X = new TransposeNode(N->getName(), inputW.getType(), outputG, - reverseShuffle); + reverseShuffle, reverseLayout); toAppend.push_back(X); map.addGradient(TN->getInput(), X); continue; diff --git a/lib/Graph/Graph.cpp b/lib/Graph/Graph.cpp index faaf852c66..3b9f05f2b1 100644 --- a/lib/Graph/Graph.cpp +++ b/lib/Graph/Graph.cpp @@ -14,8 +14,10 @@ * limitations under the License. 
*/ #include "glow/Graph/Graph.h" +#include "glow/Backend/Backend.h" #include "glow/Graph/Nodes.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Graph/VerifierHelper.h" #include "glow/Quantization/Base/Base.h" #include "glow/Support/Support.h" @@ -495,9 +497,10 @@ static ShapeVector getNewShapeWithoutAxes(llvm::ArrayRef dims, //===----------------------------------------------------------------------===// Placeholder *Module::createPlaceholder(TypeRef T, llvm::StringRef name, - bool isTrainable) { + bool isTrainable, + const std::string &layout) { auto FT = uniqueType(*T); - auto *ph = new Placeholder(name, FT, isTrainable); + auto *ph = new Placeholder(name, FT, isTrainable, layout); ph->setName(uniqueName(ph->getName(), usedNodeNames_, usedStorageNames_)); placeholders_.push_back(ph); logStorageCreation(functions_, ph); @@ -505,44 +508,51 @@ Placeholder *Module::createPlaceholder(TypeRef T, llvm::StringRef name, } Placeholder *Module::createPlaceholder(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name, bool isTrainable) { + llvm::StringRef name, bool isTrainable, + const std::string &layout) { auto FT = uniqueType(T, dims); - return createPlaceholder(FT, name, isTrainable); + return createPlaceholder(FT, name, isTrainable, layout); } Placeholder *Module::createPlaceholder(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name, bool isTrainable) { + llvm::StringRef name, bool isTrainable, + const std::string &layout) { auto FT = uniqueType(T, dims, scale, offset); - return createPlaceholder(FT, name, isTrainable); + return createPlaceholder(FT, name, isTrainable, layout); } -Constant *Module::createConstant(TypeRef T, llvm::StringRef name) { +Constant *Module::createConstant(TypeRef T, llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(*T); - return addConstant(new Constant(name, FT)); + return addConstant(new Constant(name, FT, layout)); } Constant *Module::createConstant(ElemKind T, llvm::ArrayRef dims, - llvm::StringRef name) { + llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(T, dims); - return createConstant(FT, name); + return createConstant(FT, name, layout); } Constant *Module::createConstant(ElemKind T, llvm::ArrayRef dims, float scale, int32_t offset, - llvm::StringRef name) { + llvm::StringRef name, + const std::string &layout) { auto FT = uniqueType(T, dims, scale, offset); - return createConstant(FT, name); + return createConstant(FT, name, layout); } -Constant *Module::createConstant(llvm::StringRef name, const Tensor &tensor) { - auto *V = createConstant(&tensor.getType(), name); +Constant *Module::createConstant(llvm::StringRef name, const Tensor &tensor, + const std::string &layout) { + auto *V = createConstant(&tensor.getType(), name, layout); V->assign(&tensor); return V; } -Constant *Module::createConstant(llvm::StringRef name, Tensor &&tensor) { - return addConstant(new Constant(name, std::move(tensor))); +Constant *Module::createConstant(llvm::StringRef name, Tensor &&tensor, + const std::string &layout) { + return addConstant(new Constant(name, std::move(tensor), layout)); } std::string Module::getPrefix(llvm::StringRef name) { @@ -956,23 +966,46 @@ Function::createSigmoidCrossEntropyWithLogits(llvm::StringRef name, } ReshapeNode *Function::createReshape(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shape) { + llvm::ArrayRef shape, + llvm::StringRef layout) { auto TR = getParent()->uniqueTypeWithNewShape(input.getType(), 
shape); DCHECK_EQ(TR->size(), input.getType()->size()) << "Reshape to a different size"; - return addNode(new ReshapeNode(name, TR, input, shape.vec())); + return addNode(new ReshapeNode(name, TR, input, shape.vec(), layout)); } TransposeNode *Function::createTranspose(llvm::StringRef name, NodeValue input, - llvm::ArrayRef shuffle) { + llvm::ArrayRef shuffle, + const std::string &layout) { ShapeVector shape; auto dims = input.dims(); for (size_t i = 0; i < dims.size(); i++) { shape.push_back(dims[shuffle[i]]); } + // If the layout is known, check that it matches the shuffle: + auto compareShuffle = [&](const std::vector targetShuffle) { + auto shuffleVec = shuffle.vec(); + return targetShuffle.size() == dims.size() && + std::equal(shuffleVec.begin(), shuffleVec.end(), + targetShuffle.begin()); + }; + + auto currLayout = layout; + if (currLayout == ANY_LAYOUT) { + // If layout got a default value, change it based on shuffle: + // TODO: remove the shuffle and replace it with layout. + if (compareShuffle(NCHW2NHWC) || compareShuffle(HWCN2NHWC)) { + currLayout = "NHWC"; + } else if (compareShuffle(NHWC2NCHW)) { + currLayout = "NCHW"; + } else if (compareShuffle(NHWC2HWNC)) { + currLayout = "HWNC"; + } + } + auto NT = getParent()->uniqueTypeWithNewShape(input.getType(), shape); - return addNode(new TransposeNode(name, NT, input, shuffle.vec())); + return addNode(new TransposeNode(name, NT, input, shuffle.vec(), currLayout)); } Node *Function::createBroadcast(llvm::StringRef name, NodeValue input, @@ -3140,8 +3173,21 @@ insertAndReport(std::unordered_map &nameToNode, return true; } -bool Function::verify() const { +bool Function::verify(const Backend *backend) const { bool isValid = true; + if (backend) { + if (backend->getTensorLayoutRequirements().isEnabled()) { + isValid &= expectCompareTrue( + "Expected correct backend-specific layouts for the graph", + verifyLayouts(*this, backend->getTensorLayoutRequirements()), true, + this); + } + } else { + // Always run verification pre-lowering / when we don't have backend: + isValid &= expectCompareTrue( + "Expected correct Glow canonical layouts for the graph", + verifyLayouts(*this, CanonicalTensorLayout::getInstance()), true, this); + } std::unordered_map nameToNode; for (auto *V : getParent()->getConstants()) { diff --git a/lib/Graph/Nodes.cpp b/lib/Graph/Nodes.cpp index 25ac56cee2..7c96e75f93 100644 --- a/lib/Graph/Nodes.cpp +++ b/lib/Graph/Nodes.cpp @@ -86,6 +86,7 @@ Node *Storage::clone() const { llvm_unreachable("Storage can't be cloned."); } std::string Constant::getDebugDesc() const { DescriptionBuilder db(getKindName()); db.addParam("name", quote(getName())) + .addParam("layout", getLayout()) .addParam("output", *getType()) .addParam("users", getNumUsers()); return db; @@ -94,6 +95,7 @@ std::string Constant::getDebugDesc() const { std::string Placeholder::getDebugDesc() const { DescriptionBuilder db(getKindName()); db.addParam("name", quote(getName())) + .addParam("layout", getLayout()) .addParam("output", *getType()) .addParam("users", getNumUsers()) .addParam("trainable", isTraining()); diff --git a/lib/Graph/TensorLayout.cpp b/lib/Graph/TensorLayout.cpp new file mode 100644 index 0000000000..af4374dde2 --- /dev/null +++ b/lib/Graph/TensorLayout.cpp @@ -0,0 +1,581 @@ +/** + * Copyright (c) 2017-present, Facebook, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include + +#include + +#include "glow/Graph/Graph.h" +#include "glow/Graph/TensorLayout.h" +#include "glow/Graph/VerifierHelper.h" + +using namespace glow; + +/// Checks if two layout descriptions \p lhs and \p rhs describe the same layout +/// for a value of the type \p ty \returns true if layouts are the same. +bool glow::checkSameLayout(llvm::StringRef srcLayoutStr, + llvm::StringRef destLayoutStr, TypeRef ty, + const Node *parent, const std::string &prefix, + const TensorLayoutCommon &TLC, bool verbose) { + auto srcLayout = TensorLayoutDescription(srcLayoutStr); + auto destLayout = TensorLayoutDescription(destLayoutStr); + // Are layouts literally the same? + if (srcLayout.isSameLayout(destLayout)) { + return true; + } + // Does the type satisfy the dest layout? + if (TLC.isSatisfiedBy(ty, destLayout, &srcLayout)) { + return true; + } + if (verbose) { + report("\n\n\n"); + reportContext(parent); + report("\n"); + report(prefix); + report("\n"); + report(parent->getDebugDesc()); + report("\nMismatching layouts:\n"); + report("Provided layout\n"); + report(srcLayout.getDebugDesc()); + report("\n"); + report("Expected layout\n"); + report(destLayout.getDebugDesc()); + report("\n"); + } + return false; +} + +/// Verifies the correctness of tensor layouts in the function \p F using layout +/// requirements interface \p TLC. +bool glow::verifyLayouts(const Function &F, TensorLayoutCommon &TLC, + bool verbose) { + bool isValid = true; + for (const auto &N : F.getNodes()) { + for (unsigned idx = 0, e = N.getNumInputs(); idx < e; ++idx) { + auto input = N.getNthInput(idx); + auto producerLayout = + TLC.getNthResultLayoutRequirements(input.getNode(), input.getResNo()); + auto consumerLayout = TLC.getNthInputLayoutRequirements(&N, idx); + std::string inputName = strFormat("input %d", idx); + isValid &= checkSameLayout(producerLayout, consumerLayout, + input.getType(), &N, inputName, TLC, verbose); + } + } + return isValid; +} + +TensorLayoutDescription::TensorLayoutDescription(const std::string &layoutStr) { + if (layoutStr.empty()) { + // 0-D output + numDims_ = 0; + return; + } + parse(layoutStr); +} + +static bool isCustomExtension(llvm::StringRef text) { + auto nsPos = text.find(':'); + if (nsPos == llvm::StringRef::npos) { + return false; + } + auto bracketPos = text.find(']'); + assert(bracketPos != llvm::StringRef::npos && "Expected a closing bracket."); + return (bracketPos > nsPos); +} + +// Serialization format - +// The form for each dimension is as follows: +// 1. (mandatory) one char representing the current dimension. Either an +// alphabetic letter or '*'. +// 2. (optional) token for the start of optional dimension information: '[' +// 3. (optional, must have 2. in place) namespace of the extension followed by +// ':'. must be provided for non-official backends. example: ocl: +// 4. (optional, must have 2. in place) end of the current default extension +// ']' +// 5. (optional) go to 2. 
+// NOTE: To add alignment information, the format is: a= +// Example: N[a=32][namespace_for_unsupported:]HWC would represent 4-D +// tensor wherein N needs an alignment of 32 + some closed-backend requirements +// we don't know about. HWC have no restrictions. +// NOTES: +// 1. For each dimension, the identifier can be either a single english alphabet +// letter, either upper or lower case, or the star symbol. +// 2. We assume that a single letter is enough for each dimension, it makes +// parsing easier and avoids adding delimiters in the serialized format, +// however, we do have a constructor that (theoretically) accepts multi-letter +// dimensions. If we decide to expand the current support, we will need to add +// delimiters to the serialized form. +void TensorLayoutDescription::parse(llvm::StringRef text) { + unsigned idx = 0; + while (!text.empty()) { + char curr = text.front(); + text = text.drop_front(); + if (curr == '\0' || isblank(curr)) { + continue; + } + switch (curr) { + case '[': { + assert(idx > 0 && "Expected at least one parsed entry."); + if (isCustomExtension(text)) { + parseCustomExtensions(text, idx - 1); + } else { + parseOfficialExtensions(text, idx - 1); + } + break; + } + default: { + DCHECK(isalpha(curr) || curr == '*') + << "Expected an alphabetic letter or '*'., got: " << curr + << " in string: " << text.str(); + std::string currStr(1, curr); + dims_[idx].append(currStr); + serializedLayout_.append(dims_[idx]); + ++idx; + assert(idx <= max_tensor_dimensions && "Too many tensor dimensions"); + break; + } + } + } + numDims_ = idx; +} + +void TensorLayoutDescription::parseCustomExtensions(llvm::StringRef &text, + unsigned idx) { + char curr = '['; + dims_[idx].append("["); + for (curr = text.front(); curr != ']' && !text.empty(); curr = text.front()) { + dims_[idx].append(std::string(1, curr)); + text = text.drop_front(); + } + assert(curr == ']' && "Expected closing ']' bracket."); + text = text.drop_front(); + dims_[idx].append("]"); +} + +void TensorLayoutDescription::parseOfficialExtensions(llvm::StringRef &text, + unsigned idx) { + // Only alignment so far - very simple parser: + if (!text.consume_front("a=")) { + llvm_unreachable("Unsupported layout extension."); + } + size_t align; + if (text.consumeInteger(10, align)) { + llvm_unreachable("Expected alignment info."); + } + if (!text.consume_front("]")) { + llvm_unreachable("Expected closing ']'"); + } + dims_[idx].append("[a="); + dims_[idx].append(std::to_string(align)); + dims_[idx].append("]"); +} + +TensorLayoutDescription::TensorLayoutDescription( + llvm::ArrayRef dims) { + assert(dims.size() <= max_tensor_dimensions && "Too many tensor dimensions"); + numDims_ = dims.size(); + for (unsigned idx = 0; idx < numDims_; ++idx) { + dims_[idx] = dims[idx]; + serializedLayout_.append(dims_[idx]); + } +} + +const llvm::StringRef +TensorLayoutDescription::getNthDimDescription(size_t n) const { + assert(n < numDims_ && "Wrong dimension number"); + return dims_[n]; +} + +size_t TensorLayoutDescription::getAlignment(size_t n) const { + assert(n < numDims_ && "Wrong dimension number"); + return getAlignment(dims_[n]); +} + +size_t TensorLayoutDescription::getAlignment(const std::string &s) const { + std::string alignPrefix = "a="; + size_t pos = s.find(alignPrefix); + if (pos == std::string::npos) { + // Default alignment: + return 1; + } + auto align = s.substr(pos + alignPrefix.size()); + size_t ret; + std::istringstream(align) >> ret; + return ret; +} + +llvm::ArrayRef TensorLayoutDescription::getDims() 
const { + return llvm::makeArrayRef(dims_, numDims_); +} + +std::string TensorLayoutDescription::getDebugDesc() const { + std::string desc = "Layout: " + getSerializedLayout().str() + " ["; + for (unsigned idx = 0; idx < numDims_; idx++) { + if (idx > 0) { + desc += ", "; + } + desc += "name = "; + desc += dims_[idx]; + desc += " : alignment = "; + desc += std::to_string(getAlignment(idx)); + desc += " : index = "; + desc += std::to_string(idx); + } + desc += "]"; + return desc; +} + +bool TensorLayoutDescription::isSameLayout( + const TensorLayoutDescription &rhs) const { + if (numDims_ != rhs.numDims_) { + return false; + } + if (serializedLayout_ != rhs.serializedLayout_) { + return false; + } + return true; +} + +static bool isAnyHelper(llvm::StringRef layout) { + for (unsigned idx = 0, e = layout.size(); idx < e; ++idx) { + if (layout[idx] != '*') { + return false; + } + } + return true; +} + +bool TensorLayoutDescription::isAnyLayout() { + return (isAnyHelper(getSerializedLayout())); +} + +/// Definitions of different tensor layouts. +static std::string dimsNHWC[] = { + {"N"}, + {"H"}, + {"W"}, + {"C"}, +}; +static std::string dimsNCHW[] = { + {"N"}, + {"C"}, + {"H"}, + {"W"}, +}; +static std::string dimsHWNC[] = { + {"H"}, + {"W"}, + {"N"}, + {"C"}, +}; +static std::string dims0D[]{ + {""}, +}; +static std::string dims1D[] = { + {"N"}, +}; +static std::string dims2D[] = { + {"*"}, + {"*"}, +}; +static std::string dims3D[] = { + {"*"}, + {"*"}, + {"*"}, +}; +static std::string dims4D[] = { + {"*"}, + {"*"}, + {"*"}, + {"*"}, +}; +static std::string dims5D[] = { + {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, +}; +static std::string dims6D[] = { + {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, {"*"}, +}; + +static TensorLayoutDescription layoutNHWC(dimsNHWC); +static TensorLayoutDescription layoutNCHW(dimsNCHW); +static TensorLayoutDescription layoutHWNC(dimsHWNC); +static TensorLayoutDescription layout0D(dims0D); +static TensorLayoutDescription layout1D(dims1D); +static TensorLayoutDescription layout2D(dims2D); +static TensorLayoutDescription layout3D(dims3D); +static TensorLayoutDescription layout4D(dims4D); +static TensorLayoutDescription layout5D(dims5D); +static TensorLayoutDescription layout6D(dims6D); + +/// Glow layouts for any specific number of dimensions. 
+static TensorLayoutDescription layoutsForDims[] = {
+    layout0D, layout1D, layout2D, layout3D, layout4D, layout5D, layout6D,
+};
+
+TensorLayoutCommon::TensorLayoutCommon() : enabled_(false) {
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("NCHW", new TensorLayoutDescription("NCHW")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("NHWC", new TensorLayoutDescription("NHWC")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("HWNC", new TensorLayoutDescription("HWNC")));
+  layoutNameToLayoutDescription_.insert(
+      std::make_pair("N", new TensorLayoutDescription("N")));
+}
+
+TensorLayoutCommon::~TensorLayoutCommon() {
+  while (!layoutNameToLayoutDescription_.empty()) {
+    auto curr = layoutNameToLayoutDescription_.begin();
+    auto *tld = curr->second;
+    layoutNameToLayoutDescription_.erase(curr);
+    delete tld;
+  }
+}
+
+llvm::ArrayRef<TensorLayoutDescription>
+TensorLayoutCommon::getLayoutsForDims() const {
+  return llvm::makeArrayRef(layoutsForDims);
+}
+
+static TensorLayoutDescription *
+getLayoutFromName(const std::string &name,
+                  std::unordered_map<std::string, TensorLayoutDescription *>
+                      &layoutNameToLayoutDescription) {
+  if (isAnyHelper(name)) {
+    return nullptr;
+  }
+  auto it = layoutNameToLayoutDescription.find(name);
+  if (it != layoutNameToLayoutDescription.end()) {
+    return it->second;
+  }
+  // Add new layout to map:
+  auto *ret = new TensorLayoutDescription(name);
+  if (ret->getNumDims() == 0) {
+    // empty / any layout.
+    delete ret;
+    ret = nullptr;
+  }
+  layoutNameToLayoutDescription.insert(std::make_pair(name, ret));
+  return ret;
+}
+
+std::string TensorLayoutCommon::getDefaultNDLayout(unsigned dims) const {
+  DCHECK_LE(dims, max_tensor_dimensions) << "Too many dimensions";
+  return getLayoutsForDims()[dims].getSerializedLayout();
+}
+
+std::string TensorLayoutCommon::getNthInputLayoutRequirements(const Node *node,
+                                                               size_t n) {
+  DCHECK_LT(n, node->getNumInputs()) << "Wrong input number";
+  auto dims = node->getNthInput(n).getType()->dims();
+  DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions";
+  if (const auto *TN = llvm::dyn_cast<TransposeNode>(node)) {
+    // The layout for the input of transpose is the same as the layout of the
+    // operation's result producing this input.
+    auto input = TN->getInput();
+    return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+  }
+  if (const auto *QN = llvm::dyn_cast<QuantizeNode>(node)) {
+    auto input = QN->getInput();
+    return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+  }
+  if (const auto *QPN = llvm::dyn_cast<QuantizationProfileNode>(node)) {
+    switch (n) {
+    case QuantizationProfileNode::InputIndices::InputIdx: {
+      auto input = QPN->getInput();
+      return getNthResultLayoutRequirements(input.getNode(), input.getResNo());
+    }
+    default:
+      return getLayoutsForDims()[dims.size()].getSerializedLayout();
+    }
+  }
+  return getLayoutsForDims()[dims.size()].getSerializedLayout();
+}
+
+/// \returns The index of node \p N input \p in. NumInputs if not found.
+static unsigned getInputIdx(const Node *N, NodeValue in) {
+  for (unsigned idx = 0, e = N->getNumInputs(); idx < e; ++idx) {
+    if (N->getNthInput(idx) == in) {
+      return idx;
+    }
+  }
+  return N->getNumInputs();
+}
+
+std::string TensorLayoutCommon::getNthResultLayoutRequirements(const Node *node,
+                                                                size_t n) {
+  DCHECK_LT(n, node->getNumResults()) << "Wrong output number";
+  auto dims = node->getNthResult(n).getType()->dims();
+  DCHECK_LE(dims.size(), max_tensor_dimensions) << "Too many dimensions";
+  if (auto *TN = llvm::dyn_cast<TransposeNode>(node)) {
+    // If the result of Transpose is a concrete layout, try to use this
+    // specific layout.
+    if (auto *layout = getLayoutFromName(TN->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+    // Dynamically form the layout description for transposes.
+    auto input = TN->getInput();
+    auto inputLayout =
+        getNthInputLayoutRequirements(node, TransposeNode::InputIdx);
+    auto inputLayoutHelper = TensorLayoutDescription(inputLayout);
+    llvm::SmallVector<std::string, max_tensor_dimensions> dims(
+        input.dims().size());
+    auto shuffle = TN->getShuffle();
+    for (unsigned idx = 0, e = inputLayoutHelper.getNumDims(); idx < e; ++idx) {
+      dims[shuffle[idx]] = inputLayoutHelper.getNthDimDescription(idx);
+    }
+    TensorLayoutDescription tld(dims);
+    return tld.getSerializedLayout();
+  }
+  if (auto *C = llvm::dyn_cast<Constant>(node)) {
+    if (auto *layout =
+            getLayoutFromName(C->getLayout(), layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+  }
+  if (auto *PH = llvm::dyn_cast<Placeholder>(node)) {
+    if (auto *layout = getLayoutFromName(PH->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+  }
+  if (auto *RN = llvm::dyn_cast<ReshapeNode>(node)) {
+    if (auto *layout = getLayoutFromName(RN->getLayout(),
+                                         layoutNameToLayoutDescription_)) {
+      return layout->getSerializedLayout();
+    }
+    auto result = node->getNthResult(n);
+    auto *user = (*result.getUsers().begin()).getUser();
+    int inputIdx = getInputIdx(user, result);
+    if (inputIdx >= user->getNumInputs() || llvm::isa<SaveNode>(user)) {
+      return getLayoutsForDims()[dims.size()].getSerializedLayout();
+    }
+    auto layout = getNthInputLayoutRequirements(user, inputIdx);
+    if (auto *layoutDesc =
+            getLayoutFromName(layout, layoutNameToLayoutDescription_)) {
+      return layoutDesc->getSerializedLayout();
+    }
+  }
+  return getLayoutsForDims()[dims.size()].getSerializedLayout();
+}
+
+bool TensorLayoutCommon::isSatisfiedBy(
+    TypeRef ty, const TensorLayoutDescription &destLayout,
+    const TensorLayoutDescription *srcLayout) const {
+  // Strides of the type (in elements).
+  auto strides = ty->strides();
+  if (strides.size() != destLayout.getNumDims()) {
+    return false;
+  }
+  unsigned idx = 0;
+  for (const auto &dim : destLayout.getDims()) {
+    // dim.alignment is in bytes, but strides are in elements.
+    if (strides[idx] * ty->getElementSize() % destLayout.getAlignment(dim) !=
+        0) {
+      return false;
+    }
+    idx++;
+  }
+  if (!srcLayout) {
+    return true;
+  }
+  if (destLayout.getNumDims() != srcLayout->getNumDims()) {
+    return false;
+  }
+  // Names should be compatible. * is compatible to anything.
+  if (srcLayout->getSerializedLayout().size() !=
+      destLayout.getSerializedLayout().size()) {
+    return false;
+  }
+  for (unsigned idx = 0, e = destLayout.getSerializedLayout().size(); idx < e;
+       ++idx) {
+    // '*' is compatible with anything.
+    if (destLayout.getSerializedLayout()[idx] == '*' ||
+        srcLayout->getSerializedLayout()[idx] == '*') {
+      continue;
+    }
+    // Non-'*' are only compatible with themselves.
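+    // For example, a source layout of "NHWC" satisfies a destination layout
+    // of "N*WC" or "****", but not "NCHW".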
+ if (srcLayout->getSerializedLayout()[idx] == + destLayout.getSerializedLayout()[idx]) { + continue; + } + return false; + } + return true; +} + +static std::string returnBaseReqOrNHWC(std::string baseReq) { + auto baseReqHelper = TensorLayoutDescription(baseReq); + if (!baseReqHelper.isSameLayout( + CanonicalTensorLayout::getInstance().getLayoutsForDims()[4])) { + return baseReq; + } + // NHWC is the canonical default + return CanonicalTensorLayout::getInstance().getDefaultNDLayout(4); +} + +std::string +CanonicalTensorLayout::getNthInputLayoutRequirements(const Node *node, + size_t n) { + auto baseReq = TensorLayoutCommon::getNthInputLayoutRequirements(node, n); + if (acceptsAnyLayout(node)) { + return baseReq; + } + return returnBaseReqOrNHWC(baseReq); +} + +std::string +CanonicalTensorLayout::getNthResultLayoutRequirements(const Node *node, + size_t n) { + auto baseReq = TensorLayoutCommon::getNthResultLayoutRequirements(node, n); + return returnBaseReqOrNHWC(baseReq); +} + +std::string CanonicalTensorLayout::getDefaultNDLayout(unsigned dims) const { + if (dims == 4) { + return layoutNHWC.getSerializedLayout(); + } + return TensorLayoutCommon::getDefaultNDLayout(dims); +} + +static bool acceptsAnyInputLayout(const glow::Node *node) { + switch (node->getKind()) { + case Kinded::Kind::ConcatNodeKind: + case Kinded::Kind::BatchedReduceMeanNodeKind: + case Kinded::Kind::BatchedAddNodeKind: + case Kinded::Kind::BatchedReduceMinNodeKind: + case Kinded::Kind::BatchNormalizationNodeKind: + case Kinded::Kind::BatchNormalizationGradNodeKind: + case Kinded::Kind::ReshapeNodeKind: + case Kinded::Kind::MeanVarNormalizationNodeKind: + case Kinded::Kind::SGDNodeKind: { + return true; + } + default: { return false; } + } +} + +bool CanonicalTensorLayout::acceptsAnyLayout(const Node *node) const { + if (node->isDataParallel()) { + return true; + } + // In the canonical representation, some nodes are input layout agnostic even + // if they are not necessarily data parallel: + return acceptsAnyInputLayout(node); +} diff --git a/lib/IR/IRGen.cpp b/lib/IR/IRGen.cpp index 2a8e1e07e6..47ddcb043a 100644 --- a/lib/IR/IRGen.cpp +++ b/lib/IR/IRGen.cpp @@ -443,7 +443,7 @@ void IRGenVisitor::post(Node *parent, Node *N) { } void IRFunction::generateIR(const Backend &B) { - assert(G_->verify() && "Invalid function"); + assert(G_->verify(&B) && "Invalid function"); // Schedule the nodes. NodesPtrList ScheduledNodes; scheduleGraph(ScheduledNodes); diff --git a/lib/Importer/Caffe2ModelLoader.cpp b/lib/Importer/Caffe2ModelLoader.cpp index 1f910f5b01..9a1cd16976 100644 --- a/lib/Importer/Caffe2ModelLoader.cpp +++ b/lib/Importer/Caffe2ModelLoader.cpp @@ -333,7 +333,7 @@ Error Caffe2ModelLoader::loadConv(const caffe2::OperatorDef &op, // Caffe2 "Conv" op always stores the weight as CKRS. Tensor wT; w->getPayload().transpose(&wT, NCHW2NHWC); - w = G_.getParent()->createConstant(w->getName(), std::move(wT)); + w = G_.getParent()->createConstant(w->getName(), std::move(wT), "NHWC"); // The structure of the conv weights is: CRSK. We take the C, which is the // number of filters. We use this value to calculate the size of the bias @@ -434,7 +434,7 @@ Error Caffe2ModelLoader::loadConvQuantized(const caffe2::OperatorDef &op, if (order != "NHWC") { Tensor wT; w->getPayload().transpose(&wT, NCHW2NHWC); - w = G_.getParent()->createConstant(w->getName(), std::move(wT)); + w = G_.getParent()->createConstant(w->getName(), std::move(wT), "NHWC"); } // The structure of the conv weights is: CRSK. 
We take the C, which is the
diff --git a/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp b/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
index bbaa0dff5a..9c4c1358a7 100644
--- a/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
+++ b/lib/Optimizer/GraphOptimizer/ConstantFolding.cpp
@@ -22,6 +22,7 @@
 #include "glow/Graph/Node.h"
 #include "glow/Graph/Nodes.h"
 #include "glow/Graph/PlaceholderBindings.h"
+#include "glow/Graph/TensorLayout.h"
 #include "glow/Graph/Utils.h"
 #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h"
@@ -113,6 +114,44 @@ void run(Backend &backend, CompiledFunction &compiledF,
   context.movePlaceholderBindings().release();
 }
+static bool isCanonicalLayout(const NodeValue &RN, Backend &backend,
+                              Node *clonedC, size_t idx) {
+  auto resultLayoutStr =
+      backend.getTensorLayoutRequirements().getNthResultLayoutRequirements(
+          clonedC, idx);
+  auto resultLayout = TensorLayoutDescription(resultLayoutStr);
+  auto &canInstance = CanonicalTensorLayout::getInstance();
+  auto default4DStr = canInstance.getDefaultNDLayout(4);
+  auto default4D = TensorLayoutDescription(default4DStr);
+  if (resultLayout.getDims().size() == 4 &&
+      !canInstance.isSatisfiedBy(RN.getType(), default4D, &resultLayout)) {
+    return false;
+  }
+  return true;
+}
+
+// Bail on constant folding post-lowering for backends that break assumptions.
+static void bailOnNonCanonicalLayout(
+    Function *constEvaluationF, Module &mod,
+    const llvm::SmallVectorImpl<SaveNode *> &savedResults) {
+  // Some results may be in a non-canonical format post-lowering.
+  // For example, we may be trying to constant fold an OpenCL 'Reshape' that
+  // has NCHW layout. We cannot transpose it back to canonical layout for
+  // two reasons: 1) Need to add a solver that supports weird non-NCHW2NHWC
+  // backends. 2) Even if we get a constant tensor as a new "save" of the
+  // transpose, the new constant tensor will have the wrong shape. We'd
+  // actually need to transpose it back to its pre-modification shape. These
+  // issues may be solved in the future (TODO), for now bail on such corner
+  // cases. Clean-up before bailing:
+  for (auto *SN : savedResults) {
+    // Now erase the Placeholder that we created for the SaveNode.
+    auto &vars = mod.getPlaceholders();
+    mod.erasePlaceholder(
+        std::find(vars.begin(), vars.end(), SN->getPlaceholder()));
+  }
+  mod.eraseFunction(constEvaluationF);
+}
+
 /// Evaluates a provided constant operation \p C using the provided \p backend
 /// and using the compilation context \p cctx.
 /// \returns constant results.
@@ -134,8 +173,12 @@ evaluateConstantOperation(Backend &backend, CompilationContext &cctx, Node *C) {
   // Create save nodes for each of the results.
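+  // If any result would not be in the canonical layout (e.g. an NCHW result
+  // coming from a backend-specific lowering), bail out instead of folding.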
llvm::SmallVector savedResults; for (size_t idx = 0, e = clonedC->getNumResults(); idx < e; ++idx) { - auto *SN = constEvaluationF->createSave(clonedC->getName(), - clonedC->getNthResult(idx)); + auto RN = clonedC->getNthResult(idx); + auto *SN = constEvaluationF->createSave(clonedC->getName(), RN); + if (!isCanonicalLayout(RN, backend, clonedC, idx)) { + bailOnNonCanonicalLayout(constEvaluationF, mod, savedResults); + return {}; + } savedResults.emplace_back(SN); bindings.allocate(SN->getPlaceholder()); } diff --git a/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp b/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp index 4b32b8e271..9010822778 100644 --- a/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp +++ b/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp @@ -23,6 +23,7 @@ #include "glow/Graph/Node.h" #include "glow/Graph/Nodes.h" #include "glow/Graph/PlaceholderBindings.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Graph/Utils.h" #include "glow/Graph/VerifierHelper.h" #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h" @@ -287,7 +288,8 @@ static bool sinkTranposeBelowChannelShuffle(Function *F, TR->getShuffle()[CS->getKernel()]); // Create a copy of sinkingTR and insert after newChannelShuffle. - auto *newTR = F->createTranspose(TR->getName(), newCS, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), newCS, TR->getShuffle(), + TR->getLayout()); CS->getResult().replaceAllUsesOfWith(newTR); @@ -320,7 +322,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { BN->getMean(), BN->getVar(), newChannelIdx, BN->getEpsilon(), BN->getMomentum()); NewBN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NewBN, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NewBN, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); BN->getResult().replaceAllUsesOfWith(newTR); @@ -342,7 +345,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { RL->getResult().getType(), TR->getInput().dims()); auto *NRL = F->createRELU(RL->getName(), TR->getInput(), reluOutTy); NRL->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NRL, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NRL, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); RL->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -359,7 +363,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *NSI = F->createSigmoid(SI->getName(), TR->getInput()); NSI->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NSI, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NSI, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); SI->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -417,7 +422,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *NTN = F->createTanh(TN->getName(), TR->getInput()); NTN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(TR->getName(), NTN, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), NTN, TR->getShuffle(), + TR->getLayout()); newTR->setPredicate(node->getPredicate()); TN->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -485,7 +491,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { dyn_cast(node->getNthInput(ArithmeticNode::LHSIdx)); auto *NS = F->createSplat("splat", RTR->getInput().getType(), 
SN->getValue()); - LTR = F->createTranspose("transpose", NS, RTR->getShuffle()); + LTR = F->createTranspose("transpose", NS, RTR->getShuffle(), + RTR->getLayout()); changed = true; } else if (isa(node->getNthInput(ArithmeticNode::RHSIdx)) && LTR) { @@ -494,7 +501,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { dyn_cast(node->getNthInput(ArithmeticNode::RHSIdx)); auto *NS = F->createSplat("splat", LTR->getInput().getType(), SN->getValue()); - RTR = F->createTranspose("transpose", NS, LTR->getShuffle()); + RTR = F->createTranspose("transpose", NS, LTR->getShuffle(), + LTR->getLayout()); changed = true; } else if (isa(node->getNthInput(ArithmeticNode::LHSIdx)) && RTR) { @@ -552,8 +560,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { newAN->setPredicate(node->getPredicate()); changed = true; - auto *newTR = - F->createTranspose(LTR->getName(), newAN, LTR->getShuffle()); + auto *newTR = F->createTranspose(LTR->getName(), newAN, LTR->getShuffle(), + LTR->getLayout()); newTR->setPredicate(node->getPredicate()); node->getNthResult(ArithmeticNode::ResultIdx).replaceAllUsesOfWith(newTR); } @@ -587,7 +595,8 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { RQ->getResult().getType(), TR->getInput().getType()->dims()); auto *newRQ = F->createRescaleQuantized(RQ->getName(), TR->getInput(), newRQType); - auto *newTR = F->createTranspose(TR->getName(), newRQ, TR->getShuffle()); + auto *newTR = F->createTranspose(TR->getName(), newRQ, TR->getShuffle(), + TR->getLayout()); RQ->getResult().replaceAllUsesOfWith(newTR); changed = true; } @@ -646,8 +655,9 @@ bool SinkCode::run(Function *F, const CompilationContext &cctx) { auto *newCN = F->createConcat(CN->getName(), transVector, newChannelIdx); newCN->setPredicate(node->getPredicate()); - auto *newTR = F->createTranspose(firstInput->getName(), newCN, - firstInput->getShuffle()); + auto *newTR = + F->createTranspose(firstInput->getName(), newCN, + firstInput->getShuffle(), firstInput->getLayout()); newTR->setPredicate(node->getPredicate()); CN->getResult().replaceAllUsesOfWith(newTR); changed = true; @@ -933,7 +943,8 @@ bool MergeTransposeIntoMatMulOrFC::run(Function *F, F->getParent()->uniqueTypeWithNewShape(W->getType(), newShape); // New reordered weights. - auto *newW = F->getParent()->createConstant(W->getType(), W->getName()); + auto *newW = F->getParent()->createConstant(W->getType(), W->getName(), + W->getLayout()); Tensor reshapedSrc(W->getPayload().getUnsafePtr(), reshapedWTy); Tensor reshapedDst(newW->getPayload().getUnsafePtr(), reshapedNewWTy); reshapedSrc.transpose(&reshapedDst, shuffle); @@ -1252,13 +1263,14 @@ bool OptimizeReduceMean::run(Function *F, const CompilationContext &cctx) { std::vector strides = {1, 1}; std::vector pads = {0, 0, 0, 0}; + // TODO: Fix bad assumption? See issue 3499, for now workaround it. // In Glow, AvgPool expects NHWC. auto *TR1 = F->createTranspose( - RM->getName().str() + ".transposeNCHW2NHWC", in, NCHW2NHWC); + RM->getName().str() + ".transposeNCHW2NHWC", in, NCHW2NHWC, "NHWC"); auto *AP = F->createAvgPool(RM->getName().str() + ".avgPool", TR1, kernels, strides, pads); auto *TR2 = F->createTranspose( - RM->getName().str() + ".transposeNHWC2NCHW", AP, NHWC2NCHW); + RM->getName().str() + ".transposeNHWC2NCHW", AP, NHWC2NCHW, "NCHW"); // AvgPool keeps original shape. Add reshape to match expected output. 
std::vector shape = TR2->getResult().dims(); @@ -1298,7 +1310,8 @@ static Constant *getUniquelyUsedConstant(Module *M, Node &node) { } // If constant has more than one use, duplicate it and return the duplicate. - auto *NC = M->createConstant(constant->getType(), constant->getName()); + auto *NC = M->createConstant(constant->getType(), constant->getName(), + constant->getLayout()); NC->getPayloadMutable().assign(&constant->getPayload()); return NC; } @@ -1594,8 +1607,11 @@ static NodeValue tryToOptimizeConcatOfRehapes(Function *F, ConcatNode *CN) { return NodeValue(nullptr); } auto *newCN = F->createConcat(CN->getName(), newConcatInputs, dim); - return F->createReshape(CN->getInputs().front().getNode()->getName(), newCN, - CN->getResult().dims()); + return F->createReshape( + CN->getInputs().front().getNode()->getName(), newCN, + CN->getResult().dims(), + CanonicalTensorLayout::getInstance().getNthResultLayoutRequirements( + CN, ConcatNode::ResultIdx)); } /// Simplify concat node. @@ -1796,8 +1812,8 @@ bool TransposeConstants::run(Function *F, const CompilationContext &cctx) { continue; } // Create a new Constant NC to hold the transposed result. - auto *NC = - F->getParent()->createConstant(TN->getResult().getType(), C->getName()); + auto *NC = F->getParent()->createConstant(TN->getResult().getType(), + C->getName(), TN->getLayout()); // Transpose the value of C into NC. genericTranspose(&C->getPayload(), &NC->getPayloadMutable(), TN->getShuffle()); @@ -2059,7 +2075,8 @@ bool OptimizeTransposeIntoReshape::run(Function *F, if (inDims != outDims) { continue; } - auto *RS = F->createReshape(TR->getName(), inputNode, outputDims); + auto *RS = + F->createReshape(TR->getName(), inputNode, outputDims, TR->getLayout()); TR->getResult().replaceAllUsesOfWith(RS); changed = true; } @@ -2115,9 +2132,9 @@ bool OptimizeReshape::run(Function *F, const CompilationContext &cctx) { // Reshape(Reshape(x)) -> Reshape(x). auto *reshapeNodeInput = dyn_cast(inputNode); if (reshapeNodeInput && reshapeNodeInput->hasOneUse()) { - auto *newReshape = - F->createReshape(reshapeNode->getName(), reshapeNodeInput->getInput(), - reshapeNode->getResult().dims()); + auto *newReshape = F->createReshape( + reshapeNode->getName(), reshapeNodeInput->getInput(), + reshapeNode->getResult().dims(), reshapeNode->getLayout()); reshapeNode->getResult().replaceAllUsesOfWith(newReshape); changed = true; continue; @@ -2128,8 +2145,11 @@ bool OptimizeReshape::run(Function *F, const CompilationContext &cctx) { auto *C = dyn_cast(inputNode); if (C && C->hasOneUse()) { // Create a new Constant with the type of the reshape. + auto layout = + CanonicalTensorLayout::getInstance().getNthResultLayoutRequirements( + reshapeNode, ReshapeNode::ResultIndices::ResultIdx); auto *newC = F->getParent()->createConstant( - reshapeNode->getResult().getType(), C->getName()); + reshapeNode->getResult().getType(), C->getName(), layout); // Create an unowned view of the original tensor with the correct shape, // and assign it to the new Constant. 
Tensor reshapedT = C->getPayload().getUnowned(reshapeNode->getDims()); @@ -2264,7 +2284,8 @@ static NodeValue convertConstant(Module &mod, Constant &constant, if (dstTy->getElementType() != ElemKind::UInt8FusedFP16QTy) { return NodeValue(); } - auto *NC = mod.createConstant(dstTy, constant.getName()); + auto *NC = + mod.createConstant(dstTy, constant.getName(), constant.getLayout()); NC->getPayloadMutable() = tensor.getCopyConvertedToType(dstTy->getElementType()); return NC->getOutput(); @@ -2520,8 +2541,9 @@ static bool sinkRescaleQuantizedNode(Function *F) { continue; } - auto *newReshape = F->createReshape( - reshape->getName(), rescale->getInput(), reshape->getResult().dims()); + auto *newReshape = + F->createReshape(reshape->getName(), rescale->getInput(), + reshape->getResult().dims(), reshape->getLayout()); auto *newRescale = F->createRescaleQuantized( rescale->getName(), newReshape, reshape->getResult().getType()); reshape->getResult().replaceAllUsesOfWith(newRescale); @@ -2558,8 +2580,9 @@ static bool sinkRescaleQuantizedNode(Function *F) { continue; } - auto *newTranspose = F->createTranspose( - transpose->getName(), rescale->getInput(), transpose->getShuffle()); + auto *newTranspose = + F->createTranspose(transpose->getName(), rescale->getInput(), + transpose->getShuffle(), transpose->getLayout()); auto rescaleOutTy = F->getParent()->uniqueTypeWithNewShape( rescale->getResult().getType(), transpose->getResult().dims()); auto *newRescale = F->createRescaleQuantized(rescale->getName(), @@ -2852,7 +2875,7 @@ void glow::convertPlaceholdersToConstants(Function *F, if (!tensor) { continue; } - auto *constant = M->createConstant(PH->getName(), *tensor); + auto *constant = M->createConstant(PH->getName(), *tensor, PH->getLayout()); PH->getOutput().replaceAllUsesOfWith(constant, F); } } diff --git a/lib/Optimizer/GraphOptimizer/Lower.cpp b/lib/Optimizer/GraphOptimizer/Lower.cpp index 566f42261a..41a0cd90ba 100644 --- a/lib/Optimizer/GraphOptimizer/Lower.cpp +++ b/lib/Optimizer/GraphOptimizer/Lower.cpp @@ -18,6 +18,7 @@ #include "glow/Graph/Graph.h" #include "glow/Graph/Node.h" #include "glow/Graph/Nodes.h" +#include "glow/Graph/TensorLayout.h" #include "glow/Optimizer/GraphOptimizer/FunctionPasses.h" #include "glow/Optimizer/GraphOptimizer/GraphOptimizer.h" @@ -171,7 +172,10 @@ static void lowerFullyConnectedGradNode(Function *F, CompilationContext &cctx, // dx = dout * w.T auto *wT = F->createTranspose("fcg.wT", FCG.getWeights(), {1, 0}); auto *dx2 = F->createMatMul("fcg.dot", dout, wT); - auto *dx = F->createReshape("fcg.inG", dx2, FCG.getInput().getType()->dims()); + auto *dx = F->createReshape( + "fcg.inG", dx2, FCG.getInput().getType()->dims(), + CanonicalTensorLayout::getInstance().getNthInputLayoutRequirements( + &FCG, FullyConnectedGradNode::InputIdx)); replaceAllUsesOfWith(cctx.loweredInfoMap, FCG.getGradOfInputNamedInput(), dx); // dw = xT * dout. 
@@ -675,7 +679,7 @@ static void lowerBucketizeNode(Function *F, CompilationContext &cctx, auto *oneSplat = F->createSplat("oneSplat", boundariesConst->getType(), 1.0); auto *reshapedInput = F->createReshape(baseStr + ".reshape.input", B.getInput(), - {B.getInput().getType()->size()}); + {B.getInput().getType()->size()}, "N"); std::vector results; for (size_t i = 0, e = reshapedInput->getResult().getType()->size(); i < e; i++) { @@ -877,10 +881,11 @@ static void lowerChannelShuffleNode(Function *F, CompilationContext &cctx, transpose[i] = i; } std::swap(transpose[kernel], transpose[kernel + 1]); - auto *T = - F->createTranspose(CSN.getName().str() + ".transpose", R1, transpose); + auto *T = F->createTranspose(CSN.getName().str() + ".transpose", R1, + transpose, R1->getLayout()); - auto *R2 = F->createReshape(CSN.getName().str() + ".reshape2", T, inDims); + auto *R2 = F->createReshape(CSN.getName().str() + ".reshape2", T, inDims, + T->getLayout()); replaceAllUsesOfWith(cctx.loweredInfoMap, CSN.getResult(), R2); } diff --git a/lib/Partitioner/Partitioner.cpp b/lib/Partitioner/Partitioner.cpp index 81123567d5..8eb906d933 100644 --- a/lib/Partitioner/Partitioner.cpp +++ b/lib/Partitioner/Partitioner.cpp @@ -75,14 +75,9 @@ void Partitioner::init() { Error Partitioner::finalize(const DAGListTy &partitions, const NodeToFunctionMap &mapping) { - // Validate the functions after partitioning. - for (Function *subF : module_->getFunctions()) { - if (!subF->verify()) { - return MAKE_ERR(ErrorValue::ErrorCode::PARTITIONER_ERROR, - "Conversion led to invalid function " + - subF->getName().str()); - } - } + // NOTE: Cannot validate the functions after partitioning here. The validation + // needs the backend specific verifier. Tensor layouts, for example, might + // have gone from canonical form to backend specific form. if (logPartition) { LOG(INFO) << "The number of partitions is : " diff --git a/tests/unittests/BackendTestUtils.cpp b/tests/unittests/BackendTestUtils.cpp index 766a18a9cc..30f9f0d9ca 100644 --- a/tests/unittests/BackendTestUtils.cpp +++ b/tests/unittests/BackendTestUtils.cpp @@ -58,9 +58,10 @@ namespace { // Helpers for creating and intializing placeholders from tensors. 
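+// The optional trailing layout string (defaulting to ANY_LAYOUT) is forwarded
+// to Module::createPlaceholder, so tests can tag inputs as e.g. "NHWC" or
+// "NCHW".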
static Placeholder *createPlaceholder(Module &mod, PlaceholderBindings &bindings, - Tensor *tensor, llvm::StringRef name) { + Tensor *tensor, llvm::StringRef name, + const std::string layout = ANY_LAYOUT) { auto *P = mod.createPlaceholder(tensor->getElementType(), tensor->dims(), - name, false); + name, false, layout); auto *PTensor = bindings.allocate(P); PTensor->assign(tensor); @@ -682,7 +683,7 @@ void inferSmallConv(Tensor *inputs, Tensor *out, llvm::StringRef kind) { ExecutionEngine EE(kind); auto &mod = EE.getModule(); auto *F = mod.createFunction("main"); - auto *in = createPlaceholder(mod, bindings, inputs, "in"); + auto *in = createPlaceholder(mod, bindings, inputs, "in", "NHWC"); auto *C = F->createConv(bindings, "conv2a", in, 64, 1, 1, 0, 1); bindings.get(cast(C->getFilter()))->getHandle().clear(0.3); bindings.get(cast(C->getBias()))->getHandle().clear(0.4); @@ -981,7 +982,7 @@ void inferBasicConvNet(Tensor *inputs, Tensor *out, llvm::StringRef kind, ExecutionEngine EE(kind); auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = createPlaceholder(mod, bindings, inputs, "var"); + auto *var = createPlaceholder(mod, bindings, inputs, "var", "NCHW"); auto *tr = F->createTranspose("tr", var, NCHW2NHWC); auto *conv = F->createConv(bindings, "conv", tr, convDepth, {5, 5}, {2, 2}, {1, 1, 1, 1}, 1); @@ -1004,8 +1005,8 @@ FunctionTensorPair createAndInitBasicFCNet(PlaceholderBindings &bindings, auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = - mod.createPlaceholder(ElemKind::FloatTy, {2, 3, 16, 16}, "var", false); + auto *var = mod.createPlaceholder(ElemKind::FloatTy, {2, 3, 16, 16}, "var", + false, "NCHW"); auto *tr = F->createTranspose("tr", var, NCHW2NHWC); auto *fc = F->createFullyConnected(bindings, "fc", tr, 16); auto *rl0 = F->createRELU("relu", fc); @@ -1027,7 +1028,7 @@ void inferMixedNet(Tensor *inputs, Tensor *out, llvm::StringRef kind) { ExecutionEngine EE(kind); auto &mod = EE.getModule(); Function *F = mod.createFunction("main"); - auto *var = createPlaceholder(mod, bindings, inputs, "var"); + auto *var = createPlaceholder(mod, bindings, inputs, "var", "NCHW"); auto *selected = mod.createPlaceholder(ElemKind::Int64ITy, {2, 1}, "selected", false); @@ -1069,20 +1070,20 @@ void inferComplexNet1(Tensor *inputs1, Tensor *inputs2, Tensor *inputs3, auto *sigmoid1 = F->createSigmoid("sigmoid1", conv1); auto *fc1 = F->createFullyConnected(bindings, "fc1", var2, 2352); bindings.get(cast(fc1->getWeights()))->getHandle().clear(0.6); - auto *reshape1 = F->createReshape("reshape1", fc1, {8, 14, 28, 6}); + auto *reshape1 = F->createReshape("reshape1", fc1, {8, 14, 28, 6}, "NHWC"); auto *relu1 = F->createRELU("relu1", reshape1); auto *pool1 = F->createMaxPool("pool1", relu1, 2, 2, 1); auto *add = F->createAdd("add", sigmoid1, pool1->getResult()); auto *tanh = F->createTanh("tanh", add); auto *fc2 = F->createFullyConnected(bindings, "fc2", var3, 720); bindings.get(cast(fc2->getWeights()))->getHandle().clear(1.1); - auto *reshape2 = F->createReshape("reshape2", fc2, {8, 8, 15, 6}); + auto *reshape2 = F->createReshape("reshape2", fc2, {8, 8, 15, 6}, "NHWC"); auto *mul = F->createMul("mul", tanh, reshape2); auto *sigmoid2 = F->createSigmoid("sigmoid2", mul); auto *conv2 = F->createConv(bindings, "conv2", sigmoid2, 7, 3, 2, 1, 1); bindings.get(cast(conv2->getFilter()))->getHandle().clear(0.3); bindings.get(cast(conv2->getBias()))->getHandle().clear(1.3); - auto *reshape3 = F->createReshape("reshape3", conv2, {8, 8, 7, 4}); + auto 
*reshape3 = F->createReshape("reshape3", conv2, {8, 8, 7, 4}, "NHWC"); auto *sub = F->createSub("sub", reshape3, var4); auto *relu2 = F->createRELU("relu2", sub); auto *pool2 = F->createAvgPool("pool2", relu2, 3, 2, 1); @@ -1114,7 +1115,7 @@ void inferTinyResnet(Tensor *input, Tensor *out, std::vector &weights, auto &mod = EE.getModule(); auto *F = mod.createFunction("main"); - auto *in = createPlaceholder(mod, bindings, input, "in"); + auto *in = createPlaceholder(mod, bindings, input, "in", "NHWC"); auto *conv1 = F->createConv(bindings, "conv1", in, 256, 1, 1, 0, 1); auto *conv2a = F->createConv(bindings, "conv2a", conv1, 64, 1, 1, 0, 1); auto *relu2a = F->createRELU("relu2a", conv2a); diff --git a/tests/unittests/CMakeLists.txt b/tests/unittests/CMakeLists.txt index fc601766fb..04c3efc2c7 100755 --- a/tests/unittests/CMakeLists.txt +++ b/tests/unittests/CMakeLists.txt @@ -36,6 +36,8 @@ add_executable(BasicIRTest BasicIRTest.cpp) target_link_libraries(BasicIRTest PRIVATE + Backend + Backends Graph IR gtest @@ -165,6 +167,8 @@ add_executable(GraphSchedulerTest GraphSchedulerTest.cpp) target_link_libraries(GraphSchedulerTest PRIVATE + Backend + Backends Graph IR gtest @@ -373,6 +377,7 @@ foreach(backend ${GLOW_BACKENDS}) add_backend_test(TEST MLTest BACKEND "${backend}" UNOPT) add_backend_test(TEST OperatorGradTest BACKEND "${backend}" UNOPT) add_backend_test(TEST OperatorTest BACKEND "${backend}" UNOPT) + add_backend_test(TEST TensorLayoutTest BACKEND "${backend}" UNOPT) add_backend_test(TEST RecommendationSystemTest BACKEND @@ -471,6 +476,8 @@ add_executable(TensorPoolTest TensorPoolTest.cpp) target_link_libraries(TensorPoolTest PRIVATE + Backend + Backends Graph TensorPool gtest diff --git a/tests/unittests/GradCheckTest.cpp b/tests/unittests/GradCheckTest.cpp index 874bb05e02..580810ed2f 100644 --- a/tests/unittests/GradCheckTest.cpp +++ b/tests/unittests/GradCheckTest.cpp @@ -942,7 +942,8 @@ TEST_P(GradCheck, gradientCheckTranspose) { auto &mod = EE->getModule(); bindings.clear(); Function *F = mod.createFunction("main"); - A = mod.createPlaceholder(ElemKind::FloatTy, {1, 5, 10, 5}, "input", false); + A = mod.createPlaceholder(ElemKind::FloatTy, {1, 5, 10, 5}, "input", false, + "NHWC"); Exp = mod.createPlaceholder(ElemKind::FloatTy, {1, numOutputElem}, "exp", false); Node *TA = F->createTranspose("transpose", A, NHWC2NCHW); diff --git a/tests/unittests/GraphOptzTest.cpp b/tests/unittests/GraphOptzTest.cpp index 3dececfb0c..15cda60d90 100644 --- a/tests/unittests/GraphOptzTest.cpp +++ b/tests/unittests/GraphOptzTest.cpp @@ -355,7 +355,7 @@ TEST_F(GraphOptz, optimizeBatchNormAfterConvWithReshapeConst) { mod_.createPlaceholder(ElemKind::FloatTy, {5, 5, 3, 1}, "filter", false); auto *bias = mod_.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); - auto *TN = F_->createTranspose("transpose", filter, {3, 0, 1, 2}); + auto *TN = F_->createTranspose("transpose", filter, HWCN2NHWC); auto *CV = F_->createConv("conv", input, TN, bias, mod_.uniqueType(ElemKind::FloatTy, {1, 10, 20, 1}), 5, 1, 2, 1); diff --git a/tests/unittests/GraphTest.cpp b/tests/unittests/GraphTest.cpp index 507f923992..0df2d69859 100644 --- a/tests/unittests/GraphTest.cpp +++ b/tests/unittests/GraphTest.cpp @@ -888,7 +888,8 @@ TEST(Graph, parentLink) { ExecutionEngine EE; auto &mod = EE.getModule(); - Constant *V = new Constant("V", mod.uniqueType(ElemKind::FloatTy, {3, 32})); + Constant *V = + new Constant("V", mod.uniqueType(ElemKind::FloatTy, {3, 32}), ANY_LAYOUT); // Variables don't belong to any function... 
EXPECT_EQ(V->getParent(), nullptr); @@ -1814,6 +1815,7 @@ TEST(Graph, testDumpStructure) { std::string mesN = K->toString(); std::string expectMes = R"(Placeholder name : "input" +layout : * output : float<4 x 320 x 200 x 100 x 3> users : 0 trainable : 1 @@ -1857,6 +1859,7 @@ Indices : index64<10 x 3> std::string expectMesM = R"(Module structure: Constant name : "dummy" +layout : * output : float<1 x 1> users : 0 diff --git a/tests/unittests/OperatorTest.cpp b/tests/unittests/OperatorTest.cpp index 127f9aa7d8..c7dc859158 100644 --- a/tests/unittests/OperatorTest.cpp +++ b/tests/unittests/OperatorTest.cpp @@ -43,10 +43,10 @@ class OperatorTest : public BackendTest { /// dummy scale and offset, otherwise it will not. static Placeholder *createPlaceholderConditionallyQuantized( Module &mod, ElemKind T, llvm::ArrayRef dims, llvm::StringRef name, - bool isTrainable) { + bool isTrainable, llvm::StringRef layout = ANY_LAYOUT) { return isQuantizedElemKind(T) - ? mod.createPlaceholder(T, dims, 1.0, 0, name, isTrainable) - : mod.createPlaceholder(T, dims, name, isTrainable); + ? mod.createPlaceholder(T, dims, 1.0, 0, name, isTrainable, layout) + : mod.createPlaceholder(T, dims, name, isTrainable, layout); } /// Helper to get a unique Type; if \p T is quantized, then it will include a @@ -623,10 +623,11 @@ static void testSpaceToDepthBlock3(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { unsigned blockSize = 3; auto *in = createPlaceholderConditionallyQuantized(mod, DTy, {1, 2, 6, 6}, - "in", false); - auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}); + "in", false, "NHWC"); + auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}, "NHWC"); auto *stdn = F->createSpaceToDepth("spacetodepth", tri, blockSize); - auto *tro = F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}); + auto *tro = + F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}, "NCHW"); auto *save = F->createSave("save", tro); auto *result = bindings.allocate(save->getPlaceholder()); @@ -777,10 +778,11 @@ static void testSpaceToDepth(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { unsigned blockSize = 2; auto *in = createPlaceholderConditionallyQuantized(mod, DTy, {2, 2, 4, 4}, - "in", false); - auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}); + "in", false, "NHWC"); + auto *tri = F->createTranspose("sptdTransposeIn", in, {0, 2, 3, 1}, "NHWC"); auto *stdn = F->createSpaceToDepth("spacetodepth", tri, blockSize); - auto *tro = F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}); + auto *tro = + F->createTranspose("sptdTransposeOut", stdn, {0, 3, 1, 2}, "NCHW"); auto *save = F->createSave("save", tro); auto *result = bindings.allocate(save->getPlaceholder()); @@ -886,7 +888,7 @@ static void testResizeNearest(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {1, 2, 2, 1}, - "input", false); + "input", false, "NHWC"); bindings.allocate(input)->getHandle() = {2, 4, 8, 16}; auto heightScaleUp = 2.0f; @@ -1640,7 +1642,7 @@ static void testBatchedReduceZeroDimResult(glow::PlaceholderBindings &bindings, glow::ExecutionEngine &EE, ElemKind DTy) { auto *batch = createPlaceholderConditionallyQuantized( - mod, DTy, {4}, "batch", /* isTrainable */ false); + mod, DTy, {4}, "batch", /* isTrainable */ false, "N"); bindings.allocate(batch)->getHandle() = {2, 4, 6, 8}; auto OT = 
uniqueTypeConditionallyQuantized(mod, DTy, {}); @@ -1941,7 +1943,8 @@ TEST_P(OperatorTest, batchedReduceMeanUsingAvgPool) { std::vector dims = {3, 20, 4, 8}; - auto *batch = mod_.createPlaceholder(ElemKind::FloatTy, dims, "batch", false); + auto *batch = + mod_.createPlaceholder(ElemKind::FloatTy, dims, "batch", false, "NHWC"); auto IH = bindings_.allocate(batch)->getHandle(); IH.randomize(1.0, 100.0, mod_.getPRNG()); @@ -2344,9 +2347,9 @@ static void testArgMaxKeepDim(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {2, 3, 2, 2}, - "input", false); - auto *argmax = - mod.createPlaceholder(ElemKind::Int64ITy, {1, 3, 2, 2}, "argmax", false); + "input", false, "NHWC"); + auto *argmax = mod.createPlaceholder(ElemKind::Int64ITy, {1, 3, 2, 2}, + "argmax", false, "NHWC"); bindings.allocate(input)->getHandle() = { 11, 24, 33, 41, 15, 26, 37, 48, 12, 28, 31, 42, @@ -2389,7 +2392,7 @@ static void testArgMaxNoKeepDim(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {2, 3, 2, 2}, - "input", false); + "input", false, "NHWC"); auto *argmax = mod.createPlaceholder(ElemKind::Int64ITy, {2, 2, 2}, "argmax", false); @@ -2788,8 +2791,8 @@ void gatherRangesTest(glow::PlaceholderBindings &bindings_, glow::Module &mod_, OUTPUT = [1, 3, 4, 5, 6] LENGTHS = [3, 2] */ - auto *data = - createPlaceholderConditionallyQuantized(mod_, DTy, {6}, "data", false); + auto *data = createPlaceholderConditionallyQuantized(mod_, DTy, {6}, "data", + false, "N"); auto *ranges = mod_.createPlaceholder(ITy, {2, 2, 2}, "ranges", false); bindings_.allocate(data)->getHandle() = {1, 2, 3, 4, 5, 6}; @@ -3005,14 +3008,14 @@ TEST_P(OperatorTest, Transpose3Dims_Int8) { /// Test that Transpose optimization into Reshape yields expected results. TEST_P(OperatorTest, TransposeIntoReshapeOptim) { CHECK_IF_ENABLED(); - auto *batch = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 2, 4}, "batch", false); + auto *batch = mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 2, 4}, "batch", + false, "NHWC"); auto IH = bindings_.allocate(batch)->getHandle(); for (size_t i = 0; i < 24; i++) { IH.raw(i) = i + 1; } - Node *T = F_->createTranspose("transpose", batch, {1, 2, 0, 3}); + Node *T = F_->createTranspose("transpose", batch, {1, 2, 0, 3}, "HWNC"); Node *R = F_->createBatchedReduceMean("reduce.mean", T, {2, 3}); SaveNode *O = F_->createSave("ret", R); bindings_.allocate(mod_.getPlaceholders()); @@ -5624,15 +5627,15 @@ TEST_P(OperatorTest, GroupConv3D) { TEST_P(OperatorTest, NonSquarePaddingConvolution) { CHECK_IF_ENABLED(); - auto *input = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 4, 4, 1}, "input", false); + auto *input = mod_.createPlaceholder(ElemKind::FloatTy, {1, 4, 4, 1}, "input", + false, "NHWC"); auto IH = bindings_.allocate(input)->getHandle(); for (size_t i = 0; i < 4 * 4; i++) { IH.raw(i) = i + 1; } - auto filter = - mod_.createPlaceholder(ElemKind::FloatTy, {2, 2, 2, 1}, "filter", false); + auto filter = mod_.createPlaceholder(ElemKind::FloatTy, {2, 2, 2, 1}, + "filter", false, "NHWC"); auto FH = bindings_.allocate(filter)->getHandle(); for (size_t i = 0; i < 2 * 2 * 2; i++) { FH.raw(i) = pow(2.0, i); @@ -5655,8 +5658,8 @@ TEST_P(OperatorTest, NonSquarePaddingConvolution) { // Create the reference conv operator whose input is the same as the // after-padding-input above. 
- auto *input1 = - mod_.createPlaceholder(ElemKind::FloatTy, {1, 5, 9, 1}, "input1", false); + auto *input1 = mod_.createPlaceholder(ElemKind::FloatTy, {1, 5, 9, 1}, + "input1", false, "NHWC"); bindings_.allocate(input1)->zero(); auto IH1 = bindings_.get(input1)->getHandle(); for (size_t i = 0; i < 4; i++) @@ -6270,7 +6273,7 @@ static void testMaxPoolWithArgmax(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *input = createPlaceholderConditionallyQuantized(mod, DTy, {1, 3, 3, 1}, - "input", false); + "input", false, "NHWC"); bindings.allocate(input)->getHandle() = {0, 3, 7, 6, 5, 1, 2, 8, 4}; auto *pool = F->createMaxPool("pool", input, {2, 2}, {1, 1}, {0, 0, 0, 0}); auto *SResult = F->createSave("save_result", pool->getResult()); @@ -6310,7 +6313,7 @@ testMaxPoolWithArgmaxTransposed(glow::PlaceholderBindings &bindings, // Show that sequence Tensor(NCHW) -> Transpose(NCHWtoNHWC) -> // MaxPoolWithArgmax -> Transpose(NHWCtoNCHW) produces correct linearization. auto *inputNCHW = createPlaceholderConditionallyQuantized( - mod, DTy, {1, 3, 4, 4}, "input", false); + mod, DTy, {1, 3, 4, 4}, "input", false, "NCHW"); auto inHandle = bindings.allocate(inputNCHW)->getHandle(); inHandle.clear(0.); inHandle.at({0, 0, 2, 2}) = 11; @@ -6319,15 +6322,15 @@ testMaxPoolWithArgmaxTransposed(glow::PlaceholderBindings &bindings, // Input NCHW to NHWC conversion. auto *inputNHWC = - F->createTranspose("transposeInput", inputNCHW, {0, 2, 3, 1}); + F->createTranspose("transposeInput", inputNCHW, {0, 2, 3, 1}, "NHWC"); auto *pool = F->createMaxPool("pool", inputNHWC, {4, 4}, {4, 4}, {0, 0, 0, 0}); // NHWC to NCHW conversion. - auto *resultNCHW = - F->createTranspose("transposeRes", pool->getResult(), {0, 3, 1, 2}); - auto *argmaxNCHW = - F->createTranspose("transposeArgmax", pool->getArgmax(), {0, 3, 1, 2}); + auto *resultNCHW = F->createTranspose("transposeRes", pool->getResult(), + {0, 3, 1, 2}, "NCHW"); + auto *argmaxNCHW = F->createTranspose("transposeArgmax", pool->getArgmax(), + {0, 3, 1, 2}, "NCHW"); auto *SResult = F->createSave("save_result", resultNCHW); auto *SArgmax = F->createSave("save_argmax", argmaxNCHW); @@ -8854,7 +8857,7 @@ static void testFlatten(glow::PlaceholderBindings &bindings, glow::Module &mod, glow::Function *F, glow::ExecutionEngine &EE, ElemKind DTy) { auto *tensor4D = createPlaceholderConditionallyQuantized( - mod, DTy, {3, 2, 4, 3}, "4D", false); + mod, DTy, {3, 2, 4, 3}, "4D", false, "NHWC"); bindings.allocate(tensor4D)->getHandle().randomize(0, 100, mod.getPRNG()); @@ -8886,7 +8889,7 @@ static void testFlatten(glow::PlaceholderBindings &bindings, glow::Module &mod, // again because flattening is supported for every axis up and including the // rank of a tensor, 1D vector means we can flatten it on axis 1. 
auto *tensor1D = - createPlaceholderConditionallyQuantized(mod, DTy, {15}, "1D", false); + createPlaceholderConditionallyQuantized(mod, DTy, {15}, "1D", false, "N"); bindings.allocate(tensor1D)->getHandle().randomize(0, 100, mod.getPRNG()); @@ -9414,9 +9417,9 @@ void batchOneHotTest(glow::PlaceholderBindings &bindings, glow::Module &mod, auto *data = createPlaceholderConditionallyQuantized(mod, DTy, {3, 2}, "data", false); auto *lengths = - mod.createPlaceholder(ElemKind::Int32ITy, {2}, "lengths", false); - auto *values = - createPlaceholderConditionallyQuantized(mod, DTy, {6}, "values", false); + mod.createPlaceholder(ElemKind::Int32ITy, {2}, "lengths", false, "N"); + auto *values = createPlaceholderConditionallyQuantized(mod, DTy, {6}, + "values", false, "N"); bindings.allocate(data)->getHandle() = {5, 0, 11, 3, 0, 5}; bindings.allocate(lengths)->getHandle() = {4, 2}; @@ -9596,9 +9599,9 @@ static void testDotProduct1D(glow::PlaceholderBindings &bindings, // Input tensors. constexpr std::size_t kDataSize = 10; auto *X = createPlaceholderConditionallyQuantized(mod, DTy, {kDataSize}, "X", - false); + false, "N"); auto *Y = createPlaceholderConditionallyQuantized(mod, DTy, {kDataSize}, "Y", - false); + false, "N"); auto XH = bindings.allocate(X)->getHandle(); auto YH = bindings.allocate(Y)->getHandle(); @@ -9608,7 +9611,7 @@ static void testDotProduct1D(glow::PlaceholderBindings &bindings, // Compute expected output. auto *expected = createPlaceholderConditionallyQuantized( - mod, DTy, {kDataSize}, "expected", false); + mod, DTy, {kDataSize}, "expected", false, "N"); auto expectedH = bindings.allocate(expected)->getHandle(); for (std::size_t i = 0; i < kDataSize; ++i) { @@ -9732,8 +9735,8 @@ static void testDotProduct2D(glow::PlaceholderBindings &bindings, YH.randomize(-3.0, 3.0, mod.getPRNG()); // Compute expected output. - auto *expected = createPlaceholderConditionallyQuantized(mod, DTy, {kRows}, - "expected", false); + auto *expected = createPlaceholderConditionallyQuantized( + mod, DTy, {kRows}, "expected", false, "N"); auto expectedH = bindings.allocate(expected)->getHandle(); for (std::size_t i = 0; i < kRows; ++i) { diff --git a/tests/unittests/TensorLayoutTest.cpp b/tests/unittests/TensorLayoutTest.cpp new file mode 100644 index 0000000000..a7c0962368 --- /dev/null +++ b/tests/unittests/TensorLayoutTest.cpp @@ -0,0 +1,162 @@ +/** + * Copyright (c) Glow Contributors. See CONTRIBUTORS file. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include "BackendTestUtils.h" + +#include "glow/Backend/Backend.h" +#include "glow/Graph/Graph.h" +#include "glow/Graph/TensorLayout.h" +#include "llvm/Support/raw_ostream.h" + +#include "gtest/gtest.h" + +#include + +using namespace glow; + +class TensorLayoutTest : public BackendTest { +protected: + PlaceholderBindings bindings_; +}; + +// Check CanonicalTensorLayout for conv works default values: +TEST_P(TensorLayoutTest, convDefault) { + CHECK_IF_ENABLED(); + + auto *input = + mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "input", false); + auto IH = bindings_.allocate(input)->getHandle(); + IH = {1, 1, 1, 1, 1, 1, 1, 1, 1}; + + auto filter = + mod_.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "filter", false); + auto FH = bindings_.allocate(filter)->getHandle(); + FH = {0, 0, 0, 1, 1, 1, 0, 0, 0}; + + auto *zeroBias = + mod_.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); + bindings_.allocate(zeroBias)->zero(); + + auto outTy = mod_.uniqueType(ElemKind::FloatTy, {1, 3, 3, 1}); + + ConvolutionNode *CN = + F_->createConv("Conv", input, filter, zeroBias, outTy, 3, 1, 1, 1); + SaveNode *S = F_->createSave("save", CN); + bindings_.allocate(S->getPlaceholder()); + + EXPECT_TRUE(verifyLayouts(*F_, CanonicalTensorLayout::getInstance())); +} + +static void buildBadConv(PlaceholderBindings &bindings, Module &mod, + Function *F) { + auto *input = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "input", + false, "NWCH"); + auto IH = bindings.allocate(input)->getHandle(); + IH = {1, 1, 1, 1, 1, 1, 1, 1, 1}; + + auto filter = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "filter", + false, "NWCH"); + auto FH = bindings.allocate(filter)->getHandle(); + FH = {0, 0, 0, 1, 1, 1, 0, 0, 0}; + + auto *zeroBias = mod.createPlaceholder(ElemKind::FloatTy, {1}, "bias", false); + bindings.allocate(zeroBias)->zero(); + + auto outTy = mod.uniqueType(ElemKind::FloatTy, {1, 3, 3, 1}); + + ConvolutionNode *CN = + F->createConv("Conv", input, filter, zeroBias, outTy, 3, 1, 1, 1); + SaveNode *S = F->createSave("save", CN); + bindings.allocate(S->getPlaceholder()); +} + +// Check CanonicalTensorLayout for conv fails verification with bad layout: +TEST_P(TensorLayoutTest, convBadLayout) { + CHECK_IF_ENABLED(); + + buildBadConv(bindings_, mod_, F_); + + EXPECT_FALSE(verifyLayouts(*F_, CanonicalTensorLayout::getInstance(), false)); +} + +// Check TensorLayoutDescription's parser with simple input. +TEST_P(TensorLayoutTest, parseTestSimple) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription simple("NHWC"); + EXPECT_FALSE(simple.isAnyLayout()); + EXPECT_EQ(simple.getNumDims(), 4); + EXPECT_EQ(simple.getDims()[0], "N"); + EXPECT_EQ(simple.getDims()[1], "H"); + EXPECT_EQ(simple.getDims()[2], "W"); + EXPECT_EQ(simple.getDims()[3], "C"); + for (size_t i = 0; i < simple.getNumDims(); ++i) { + EXPECT_EQ(simple.getAlignment(i), 1); + } +} + +// Check TensorLayoutDescription's parser with alignment. 
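+// For example, "N[a=32]HW[a=64]C" parses into four dimensions where N is
+// 32-aligned, W is 64-aligned, and H and C keep the default alignment of 1.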
+TEST_P(TensorLayoutTest, parseTestAlignment) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription alignment("N[a=32]HW[a=64]C"); + EXPECT_FALSE(alignment.isAnyLayout()); + EXPECT_EQ(alignment.getNumDims(), 4); + EXPECT_EQ(alignment.getDims()[0], "N[a=32]"); + EXPECT_EQ(alignment.getDims()[1], "H"); + EXPECT_EQ(alignment.getDims()[2], "W[a=64]"); + EXPECT_EQ(alignment.getDims()[3], "C"); + EXPECT_EQ(alignment.getAlignment(0), 32); + EXPECT_EQ(alignment.getAlignment(1), 1); + EXPECT_EQ(alignment.getAlignment(2), 64); + EXPECT_EQ(alignment.getAlignment(3), 1); +} + +// Check TensorLayoutDescription's parser with custom extensions. +TEST_P(TensorLayoutTest, parseTestCustom) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription custom("N[a=32][after:align]C[mal:reynolds][answer:" + "42]HW[before:alignment][a=64]"); + EXPECT_FALSE(custom.isAnyLayout()); + EXPECT_EQ(custom.getNumDims(), 4); + EXPECT_EQ(custom.getDims()[0], "N[a=32][after:align]"); + EXPECT_EQ(custom.getDims()[1], "C[mal:reynolds][answer:42]"); + EXPECT_EQ(custom.getDims()[2], "H"); + EXPECT_EQ(custom.getDims()[3], "W[before:alignment][a=64]"); + EXPECT_EQ(custom.getAlignment(0), 32); + EXPECT_EQ(custom.getAlignment(1), 1); + EXPECT_EQ(custom.getAlignment(2), 1); + EXPECT_EQ(custom.getAlignment(3), 64); +} + +// Check TensorLayoutDescription's parser with star dims. +TEST_P(TensorLayoutTest, parseTestStar) { + CHECK_IF_ENABLED(); + + TensorLayoutDescription custom("N[a=32]*H*[a=64]"); + EXPECT_FALSE(custom.isAnyLayout()); + EXPECT_EQ(custom.getNumDims(), 4); + EXPECT_EQ(custom.getDims()[0], "N[a=32]"); + EXPECT_EQ(custom.getDims()[1], "*"); + EXPECT_EQ(custom.getDims()[2], "H"); + EXPECT_EQ(custom.getDims()[3], "*[a=64]"); + EXPECT_EQ(custom.getAlignment(0), 32); + EXPECT_EQ(custom.getAlignment(1), 1); + EXPECT_EQ(custom.getAlignment(2), 1); + EXPECT_EQ(custom.getAlignment(3), 64); +} + +INSTANTIATE_BACKEND_TEST(TensorLayoutTest); diff --git a/tests/unittests/ThreadPoolExecutorTest.cpp b/tests/unittests/ThreadPoolExecutorTest.cpp index 6ff89a783d..84a6f0f0d4 100644 --- a/tests/unittests/ThreadPoolExecutorTest.cpp +++ b/tests/unittests/ThreadPoolExecutorTest.cpp @@ -622,8 +622,8 @@ TEST_F(ThreadPoolExecutorTest, EmptyDAG) { // compare the returned PlaceholderBindings with. PseudoRNG rng; auto type = std::unique_ptr(new Type(ElemKind::FloatTy, {1, 2, 2})); - auto placeholder = glow::make_unique("a", type.get(), - /*trainable=*/false); + auto placeholder = glow::make_unique( + "a", type.get(), /*trainable=*/false, ANY_LAYOUT); auto testContext = glow::make_unique(); auto refContext = glow::make_unique(); diff --git a/tools/ClassGen/NodeBuilder.cpp b/tools/ClassGen/NodeBuilder.cpp index 403e95b398..d9290aa44a 100644 --- a/tools/ClassGen/NodeBuilder.cpp +++ b/tools/ClassGen/NodeBuilder.cpp @@ -575,6 +575,7 @@ NodeBuilder &NodeBuilder::addGradient() { // The new 'Grad' class will have all of the fields of the current class. GN.members_ = members_; GN.enum_ = enum_; + GN.isDataParallel_ = isDataParallel_; // Add the inputs that we'll use in the grad instruction. 
for (const std::string &in : nodeInputs_) { diff --git a/tools/ClassGen/NodeGen.cpp b/tools/ClassGen/NodeGen.cpp index 0c19981308..b0706ffb32 100644 --- a/tools/ClassGen/NodeGen.cpp +++ b/tools/ClassGen/NodeGen.cpp @@ -684,12 +684,14 @@ int main(int argc, char **argv) { BB.newNode("Reshape") .addInput("Input") .addMember(MemberType::VectorSizeT, "Dims") + .addMember(MemberType::String, "Layout") .addResultFromCtorArg() .setDocstring("Reshape the Input tensor to shape Dims."); BB.newNode("Transpose") .addInput("Input") .addMember(MemberType::VectorUnsigned, "Shuffle") + .addMember(MemberType::String, "Layout") .addResultFromCtorArg() .setDocstring("Transpose the Input tensor based on the vector Shuffle, " "which assigns a new axis for each dimension in Input.");
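
For reference, here is a minimal sketch (not part of the patch itself) of how the pieces above compose from client code, assuming only calls that appear in the diffs: `TensorLayoutDescription` parsing and alignment queries, `CanonicalTensorLayout` defaults, the layout-aware `createPlaceholder`/`createTranspose` overloads, and `verifyLayouts`. The function name `layoutSketch`, the include set, and the concrete shapes are illustrative only.

```cpp
#include "glow/Graph/Graph.h"
#include "glow/Graph/TensorLayout.h"

#include <cassert>
#include <string>

using namespace glow;

// Illustrative sketch: parse an aligned layout string, query the canonical
// defaults, and build a tiny graph whose transpose carries an explicit
// layout annotation.
static void layoutSketch(Module &mod, Function *F) {
  // "N" must be 32-aligned; the second and fourth dimensions are "any",
  // with a 64-alignment requirement on the fourth.
  TensorLayoutDescription desc("N[a=32]*H*[a=64]");
  assert(desc.getNumDims() == 4);
  assert(desc.getAlignment(0) == 32 && desc.getAlignment(3) == 64);

  // The canonical default layout for 4-D tensors is "NHWC".
  std::string canonical4D =
      CanonicalTensorLayout::getInstance().getDefaultNDLayout(4);
  (void)canonical4D;

  // Placeholders, constants and shape-changing nodes now carry a layout
  // string; transposing an NCHW input into NHWC records both layouts.
  auto *in = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 3, 1}, "in",
                                   /* isTrainable */ false, "NCHW");
  auto *tr = F->createTranspose("to_nhwc", in, NCHW2NHWC, "NHWC");
  F->createSave("save", tr);

  // Check the function against the canonical layout rules.
  bool ok = verifyLayouts(*F, CanonicalTensorLayout::getInstance());
  (void)ok;
}
```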