Implement IRGen for the GraphIR #25

Merged
merged 4 commits into from
Oct 8, 2017
125 changes: 125 additions & 0 deletions docs/IR.md
@@ -0,0 +1,125 @@
## Design of the Glow IR

### Introduction

This document describes the motivation behind the Glow intermediate
representation and some implementation details.

Glow is a retargetable compiler that supports a number of different backends.
This means that the first few layers of the compiler are target-independent, but
as you get closer to the different backends things start to diverge. The first
two levels of IR are shared between all targets. Different backends may have
additional layers of IR.

### High-level Graph

The high-level IR is a graph-based representation that's similar to the graph
that you may find inside Caffe. When we load the model from a file we construct
this graph in a direct translation of one operator to one node. It's a simple
graph that allows basic transformations such as swapping the order of nodes and
removing nodes. The graph is strongly typed, which means that inputs and outputs
have a known tensor type (dimension and element type), and that the types must
match. The compiler has a debug method, called 'dumpDAG', that dumps a graphical
representation of the graph into a dotty file. The textual representation of the
graph is less informative and it looks like this:

```
pool
name : "pool"
input : float<8 x 28 x 28 x 16>
output : float<8 x 9 x 9 x 16>
kernel : 3
stride : 3
pad : 0
kind : max

convolution
name : "conv"
input : float<8 x 9 x 9 x 16>
output : float<8 x 9 x 9 x 16>
filter : float<16 x 5 x 5 x 16>
bias : float<16>
kernel : 5
stride : 1
pad : 2
depth : 16

relu
name : "conv"
input : float<8 x 9 x 9 x 16>
```
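
A graph like the one dumped above can also be built programmatically. The
following is a minimal sketch using the Graph builder API visible in this diff;
the exact 'createVariable' and 'createFullyConnected' signatures, the
'ElemKind::FloatTy' name, and the IR header paths are inferred from the calls
shown here and may differ slightly.

```cpp
// Hedged sketch: builder signatures and header paths below are inferred from
// the calls visible in this diff and are not guaranteed to match exactly.
#include "glow/Graph/Graph.h"
#include "glow/IR/IR.h" // assumed location of Module/ElemKind/WeightVar decls

using namespace glow;

void buildSmallGraph(Module &M) {
  Graph G(M);

  // Variables carry a full tensor type: element kind plus dimensions.
  auto *input = G.createVariable(ElemKind::FloatTy, {8, 28 * 28}, "input",
                                 WeightVar::InitKind::Broadcast, 0.0);

  // One operator in the model maps to one node in the graph.
  auto *fc = G.createFullyConnected("fc", input, 10);
  (void)fc;

  // Print the textual form shown above; dumpDAG() writes a dotty file instead.
  G.dump();
}
```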

After optimizing the graph with target-independent optimizations, the graph is
lowered into the mid-level IR in a phase called "IRGen" (short for IR
generation). This is a one-to-many translation where each operator is translated
into one or more instructions.
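
The entry point for this lowering is the 'generateIR' method that this change
adds to the Graph class. Below is a minimal sketch of driving it; the setup of
the graph and of the 'result' node is assumed, and only 'createReturn',
'generateIR' and 'NodeToInstrTy' come from this diff.

```cpp
// Hedged sketch: lower a graph into the module's instruction stream and keep
// the node-to-value mapping around for later passes.
#include "glow/Graph/Graph.h"

using namespace glow;

void lowerGraph(Graph &G, Node *result) {
  // Mark the value that the program produces.
  G.createReturn("return", result);

  // One-to-many translation: each node becomes one or more IR instructions
  // inside the module that the graph was constructed with.
  Graph::NodeToInstrTy nodeToValue = G.generateIR();

  // Later passes can look up the IR value generated for a given graph node
  // (assuming the node was materialized into the map).
  Value *resultValue = nodeToValue[result];
  (void)resultValue;
}
```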

### Mid-level IR

The mid-level IR enables a kind of target-independent optimization that is not
possible with the high-level graph format. For example, the ability to share
memory buffers during the forward pass can't be expressed in the graph form
because buffers are not explicit there.

The mid-level IR is a sequence of instructions that perform operations such as
copying memory and performing convolution. The IR is not a Static Single
Assignment (SSA) based representation, because the IR does not support control
flow. The IR is strongly typed and each instruction operand kind has known
parameter types. The IR is designed to be used as an in-memory form, but it can
be dumped to a human-readable, assembly-like format.

The IR has two sections: 'declare' and 'program'. In the first section of the IR
we declare a number of memory regions that live throughout the lifetime of the
program. This is similar to global variables in C++. The second part of the IR
is a list of instructions. Each variable is annotated with the kind of
initialization that the program should perform.

There are two kinds of memory regions: global memory regions and locally
allocated regions. The locally allocated memory regions are similar to 'alloca'
in C/C++ and in LLVM. Memory regions are strongly typed, which means that the
type of the tensor that the region represents is known.

Instructions operate on either global variables or locally allocated buffers.
Each operand is annotated with one of the qualifiers '@in', '@out' or '@inout'.
'@in' means that the buffer is read from, '@out' means that the buffer is
written into, and '@inout' means that the instruction may both read and write
the buffer. These operand qualifiers help the optimizer decide when it is legal
to share buffers. Instructions may have other attributes that specify the
legality of some optimizations. For example, some operands require that the data
from the forward pass be kept around for the backward pass, so if the program is
not optimized for inference-only mode then certain memory optimizations can't
happen.


This is an example of an unoptimized IR.

```
declare {
%input = weight float<8 x 28 x 28 x 1>, broadcast, 0.0
%filter = weight float<16 x 5 x 5 x 1>, xavier, 25.0
%filter0 = weight float<16>, broadcast, 0.100
%weights = weight float<10 x 144>, xavier, 144.0
%bias = weight float<10>, broadcast, 0.100
%selected = weight index<8 x 1>
...
%result = weight float<8 x 10>
}

program {
%allo = alloc float<8 x 28 x 28 x 16>
%conv = convolution [5 1 2 16] @out %allo, @in %input, @in %filter3, @in %bias0
%allo0 = alloc float<8 x 28 x 28 x 16>
%relu = relu @out %allo0, @in %allo
%allo1 = alloc index<8 x 9 x 9 x 16 x 2>
%allo2 = alloc float<8 x 9 x 9 x 16>
%pool = pool max [3 3 0] @out %allo2, @in %allo0, @inout %allo1
...
%deal6 = dealloc @out %allo6
%deal7 = dealloc @out %allo7
%deal8 = dealloc @out %allo8
%deal9 = dealloc @out %allo9
}
```
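
For comparison, here is a hand-written sketch (not actual compiler output) of
what the buffer-sharing optimization described above could do to the first few
instructions: the relu writes into the convolution's output buffer in place, so
one allocation disappears.

```
%allo = alloc float<8 x 28 x 28 x 16>
%conv = convolution [5 1 2 16] @out %allo, @in %input, @in %filter3, @in %bias0
%relu = relu @out %allo, @in %allo
...
```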



29 changes: 29 additions & 0 deletions docs/Testing.md
@@ -0,0 +1,29 @@
## Testing the Glow compiler

The Glow test suite contains four major categories: unit tests, regression
tests, example programs, and the model loader. Unit tests are small tests that
stress specific parts of the compiler. These tests are added to the compiler
when developing a feature. For example, we train a number of small networks and
perform a gradient check on the operators. We also compile networks to IR and
look for specific patterns. Regression tests are tests that are added when we
fix bugs. Both regression tests and feature tests are found under the "test/"
directory. To run the feature and regression tests, run "ninja test".

## Example test suites

We rely on external test suites to test the compiler. We use the data sets
CIFAR10 and MNIST (located in the "example/" directory) to test the correctness
of the whole system. The script under 'utils/' downloads and extracts the data
sets.

## Model Loader

We test the correctness of the Glow implementation by loading Caffe2 models and
executing them end-to-end. The program 'loader' loads a model and a png file,
and runs a single pass of inference. If everything goes right, the output of the
program is identical to the output of the Caffe2 model. Unfortunately, the
Caffe2 model does not describe what the input format should be. Should the
pixels be between zero and one, or between negative 128 and positive 128? The
user needs to be aware of these things when running the models. The script in
the 'utils/' directory downloads a number of pre-trained networks that we can
use for testing.

2 changes: 2 additions & 0 deletions examples/CMakeLists.txt
Expand Up @@ -5,6 +5,7 @@ target_link_libraries(cifar10
PRIVATE
Interpreter
Network
Graph
IR
Support)

Expand All @@ -14,6 +15,7 @@ target_link_libraries(mnist
PRIVATE
Interpreter
Network
Graph
IR
Support)

11 changes: 11 additions & 0 deletions include/glow/Graph/Graph.h
Expand Up @@ -5,6 +5,7 @@

#include "llvm/ADT/ArrayRef.h"

#include <unordered_map>
#include <vector>

namespace glow {
Expand All @@ -25,6 +26,7 @@ class ConcatNode;
class BatchNormalizationNode;
class LocalResponseNormalizationNode;
class ArithmeticNode;
class ReturnNode;

/// Represents the compute graph.
class Graph final {
Expand All @@ -48,6 +50,9 @@ class Graph final {
}

public:
/// Holds the mapping between graph nodes and IR values.
using NodeToInstrTy = std::unordered_map<Node *, Value *>;

Graph(Module &M) : M_(M) {}
~Graph();

Expand Down Expand Up @@ -105,8 +110,14 @@ class Graph final {

ArithmeticNode *createArithmetic(llvm::StringRef name, Node *LHS, Node *RHS,
ArithmeticInst::OpKind op);

ReturnNode *createReturn(llvm::StringRef name, Node *input);

/// @}

/// Generate IR from the nodes in the graph into the module.
NodeToInstrTy generateIR();

/// Dumps the textual representation of the network.
void dump();

27 changes: 25 additions & 2 deletions include/glow/Graph/Nodes.h
Expand Up @@ -128,6 +128,9 @@ class ReluNode final : public Node {
public:
ReluNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::ReluInstKind, in->getType(), name), in_(in) {}
static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::ReluInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand All @@ -140,7 +143,9 @@ class SigmoidNode final : public Node {
public:
SigmoidNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::SigmoidInstKind, in->getType(), name), in_(in) {}

static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::SigmoidInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand All @@ -153,7 +158,9 @@ class TanhNode final : public Node {
public:
TanhNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::TanhInstKind, in->getType(), name), in_(in) {}

static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::TanhInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand Down Expand Up @@ -355,6 +362,22 @@ class LocalResponseNormalizationNode final : public Node {
void visit(Node *parent, NodeVisitor *visitor) override;
};

class ReturnNode final : public Node {
Node *in_;

public:
ReturnNode(llvm::StringRef name, Node *input)
: Node(Kinded::Kind::ReturnInstKind, input->getType(), name), in_(input) {
}
static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::ReturnInstKind;
}
Node *getInput() const { return in_; }

std::string getDebugDesc() const override;
void visit(Node *parent, NodeVisitor *visitor) override;
};

} // namespace glow

#endif // GLOW_GRAPH_NODES_H
5 changes: 5 additions & 0 deletions include/glow/IR/Instrs.def
Expand Up @@ -16,5 +16,10 @@ DEF_INSTR(ConcatInst, concat)
DEF_INSTR(BatchNormalizationInst, batchnormalization)
DEF_INSTR(LocalResponseNormalizationInst, localresponsenormalization)
DEF_INSTR(ArithmeticInst, arithmetic)

// Pseudo instructions (exist only in node form):
DEF_NODE(ReturnInst, return)

// Variables (exist as memory/variable declarations):
DEF_VALUE(WeightVar, weight)

4 changes: 4 additions & 0 deletions include/glow/IR/Traits.h
Expand Up @@ -48,18 +48,22 @@ class Kinded {
enum class Kind {
#define DEF_INSTR(CLASS, NAME) CLASS##Kind,
#define DEF_VALUE(CLASS, NAME) CLASS##Kind,
#define DEF_NODE(CLASS, NAME) CLASS##Kind,
#include "glow/IR/Instrs.def"
#undef DEF_INSTR
#undef DEF_VALUE
#undef DEF_NODE
};

static const char *getKindName(Kind IK) {
const char *names[] = {
#define DEF_INSTR(CLASS, NAME) #NAME,
#define DEF_VALUE(CLASS, NAME) #NAME,
#define DEF_NODE(CLASS, NAME) #NAME,
#include "glow/IR/Instrs.def"
#undef DEF_INSTR
#undef DEF_VALUE
#undef DEF_NODE
nullptr};
return names[(int)IK];
}
1 change: 1 addition & 0 deletions include/glow/Interpreter/Interpreter.h
Expand Up @@ -114,6 +114,7 @@ class Interpreter final {
/// used by the interpreter to dispatch different instructions.
///@{
#define DEF_VALUE(CLASS, NAME)
#define DEF_NODE(CLASS, NAME)
#define DEF_INSTR(CLASS, NAME) \
void fwd##CLASS(Context *ctx, bool isTrain, const CLASS *I); \
void bwd##CLASS(Context *ctx, const CLASS *I);
1 change: 1 addition & 0 deletions src/glow/Graph/CMakeLists.txt
@@ -1,5 +1,6 @@

add_library(Graph
IRGen.cpp
Nodes.cpp
Graph.cpp)
target_link_libraries(Graph
10 changes: 7 additions & 3 deletions src/glow/Graph/Graph.cpp
Expand Up @@ -91,7 +91,7 @@ FullyConnectedNode *Graph::createFullyConnected(llvm::StringRef name,
"weights", WeightVar::InitKind::Xavier, fanIn);

auto *B = createVariable(T->getElementType(), {outDepth}, "bias",
WeightVar::InitKind::Xavier, 0.1);
WeightVar::InitKind::Broadcast, 0.1);

auto OT = M_.uniqueType(T->getElementType(), {idim.first, outDepth});
return addNode(new FullyConnectedNode(input, OT, name, W, B, outDepth));
Expand Down Expand Up @@ -188,14 +188,18 @@ Graph::createLocalResponseNormalization(llvm::StringRef name, Node *input,

// The output tensor is of the same shape as the input tensor.
return addNode(new LocalResponseNormalizationNode(
input, "LRN", scale, halfWindowSize, alpha, beta, k));
input, name, scale, halfWindowSize, alpha, beta, k));
}

ArithmeticNode *Graph::createArithmetic(llvm::StringRef name, Node *LHS,
Node *RHS, ArithmeticInst::OpKind op) {
assert(LHS->dims() == RHS->dims() && "Invalid operand shapes");
// The output tensor is of the same shape as the input tensor.
return addNode(new ArithmeticNode("Arithmetic", LHS, RHS, op));
return addNode(new ArithmeticNode(name, LHS, RHS, op));
}

ReturnNode *Graph::createReturn(llvm::StringRef name, Node *input) {
return addNode(new ReturnNode(name, input));
}

//===----------------------------------------------------------------------===//