Implement IRGen for the GraphIR #25

Merged
merged 4 commits into from
Oct 8, 2017
125 changes: 125 additions & 0 deletions docs/IR.md
@@ -0,0 +1,125 @@
## Design of the Glow IR

### Introduction

This document describes the motivation behind the Glow intermediate
representation and some implementation details.

Glow is a retargetable compiler that supports a number of different backends.
This means that the first few layers of the compiler are target-independent, but
as you get closer to the different backends things start to diverge. The first
two levels of IR are shared between all targets. Different backends may have
additional layers of IR.

### High-level Graph

The high-level IR is a graph-based representation that's similar to the graph
that you may find inside Caffe. When we load the model from a file we construct
this graph in a direct translation of one operator to one node. It's a simple
graph that allows basic transformations such as swapping the order of nodes and
removing nodes. The graph is strongly typed, which means that inputs and outputs
have a known tensor type (dimension and element type), and that the types must
match. The compiler has a debug method, called 'dumpDAG', that dumps a graphical
representation of the graph into a dotty file. The textual representation of the
graph is less informative and it looks like this:

```
pool
name : "pool"
input : float<8 x 28 x 28 x 16>
output : float<8 x 9 x 9 x 16>
kernel : 3
stride : 3
pad : 0
kind : max

convolution
name : "conv"
input : float<8 x 9 x 9 x 16>
output : float<8 x 9 x 9 x 16>
filter : float<16 x 5 x 5 x 16>
bias : float<16>
kernel : 5
stride : 1
pad : 2
depth : 16

relu
name : "conv"
input : float<8 x 9 x 9 x 16>
```
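
A graph like the one dumped above can also be built programmatically. The
following is a minimal sketch using the Graph builder API visible in this diff;
the exact 'createVariable' and 'createFullyConnected' signatures, the
'ElemKind::FloatTy' name, and the IR header paths are inferred from the calls
shown here and may differ slightly.

```cpp
// Hedged sketch: builder signatures and header paths below are inferred from
// the calls visible in this diff and are not guaranteed to match exactly.
#include "glow/Graph/Graph.h"
#include "glow/IR/IR.h" // assumed location of Module/ElemKind/WeightVar decls

using namespace glow;

void buildSmallGraph(Module &M) {
  Graph G(M);

  // Variables carry a full tensor type: element kind plus dimensions.
  auto *input = G.createVariable(ElemKind::FloatTy, {8, 28 * 28}, "input",
                                 WeightVar::InitKind::Broadcast, 0.0);

  // One operator in the model maps to one node in the graph.
  auto *fc = G.createFullyConnected("fc", input, 10);
  (void)fc;

  // Print the textual form shown above; dumpDAG() writes a dotty file instead.
  G.dump();
}
```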

After optimizing the graph with target-independent optimizations, the graph is
lowered into the mid-level IR in a phase called "IRGen" (short for IR
generation). This is a one-to-many translation where each operator is translated
into one or more instructions.
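
The entry point for this lowering is the 'generateIR' method that this change
adds to the Graph class. Below is a minimal sketch of driving it; the setup of
the graph and of the 'result' node is assumed, and only 'createReturn',
'generateIR' and 'NodeToInstrTy' come from this diff.

```cpp
// Hedged sketch: lower a graph into the module's instruction stream and keep
// the node-to-value mapping around for later passes.
#include "glow/Graph/Graph.h"

using namespace glow;

void lowerGraph(Graph &G, Node *result) {
  // Mark the value that the program produces.
  G.createReturn("return", result);

  // One-to-many translation: each node becomes one or more IR instructions
  // inside the module that the graph was constructed with.
  Graph::NodeToInstrTy nodeToValue = G.generateIR();

  // Later passes can look up the IR value generated for a given graph node
  // (assuming the node was materialized into the map).
  Value *resultValue = nodeToValue[result];
  (void)resultValue;
}
```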

### Mid-level IR

The mid-level IR enables a kind of target-independent optimization that is not
possible with the high-level graph format. For example, the ability to share
memory buffers during the forward pass can't be expressed in the graph form
because buffers are not explicit there.

The mid-level IR is a sequence of instructions that perform operations such as
copying memory and performing convolution. The IR is not a Static Single
Assignment (SSA) based representation, because the IR does not support control
flow. The IR is strongly typed and each instruction operand kind has known
parameter types. The IR is designed to be used as an in-memory form, but it can
be dumped to a human-readable, assembly-like format.

The IR has two sections: 'declare' and 'program'. In the first section of the IR
we declare a number of memory regions that live throughout the lifetime of the
program. This is similar to global variables in C++. The second part of the IR
is a list of instructions. Each variable is annotated with the kind of
initialization that the program should perform.

There are two kinds of memory regions: global memory regions and locally
allocated regions. The locally allocated memory regions are similar to 'alloca'
in C/C++ and in LLVM. Memory regions are strongly typed, which means that the
type of the tensor that the region represents is known.

Instructions operate on either global variables or locally allocated buffers.
Each operand is annotated with one of the qualifiers '@in', '@out' or '@inout'.
'@in' means that the buffer is read from, '@out' means that the buffer is
written into, and '@inout' means that the instruction may both read and write
the buffer. These operand qualifiers help the optimizer decide when it is legal
to share buffers. Instructions may have other attributes that specify the
legality of some optimizations. For example, some operands require that the data
from the forward pass be kept around for the backward pass, so if the program is
not optimized for inference-only mode then certain memory optimizations can't
happen.


This is an example of an unoptimized IR.

```
declare {
%input = weight float<8 x 28 x 28 x 1>, broadcast, 0.0
%filter = weight float<16 x 5 x 5 x 1>, xavier, 25.0
%filter0 = weight float<16>, broadcast, 0.100
%weights = weight float<10 x 144>, xavier, 144.0
%bias = weight float<10>, broadcast, 0.100
%selected = weight index<8 x 1>
...
%result = weight float<8 x 10>
}

program {
%allo = alloc float<8 x 28 x 28 x 16>
%conv = convolution [5 1 2 16] @out %allo, @in %input, @in %filter3, @in %bias0
%allo0 = alloc float<8 x 28 x 28 x 16>
%relu = relu @out %allo0, @in %allo
%allo1 = alloc index<8 x 9 x 9 x 16 x 2>
%allo2 = alloc float<8 x 9 x 9 x 16>
%pool = pool max [3 3 0] @out %allo2, @in %allo0, @inout %allo1
...
%deal6 = dealloc @out %allo6
%deal7 = dealloc @out %allo7
%deal8 = dealloc @out %allo8
%deal9 = dealloc @out %allo9
}
```
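
For comparison, here is a hand-written sketch (not actual compiler output) of
what the buffer-sharing optimization described above could do to the first few
instructions: the relu writes into the convolution's output buffer in place, so
one allocation disappears.

```
%allo = alloc float<8 x 28 x 28 x 16>
%conv = convolution [5 1 2 16] @out %allo, @in %input, @in %filter3, @in %bias0
%relu = relu @out %allo, @in %allo
...
```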



29 changes: 29 additions & 0 deletions docs/Testing.md
@@ -0,0 +1,29 @@
## Testing the Glow compiler

The Glow test suite contains four major categories: unit tests, regression
tests, example programs, and the model loader. Unit tests are small tests that
stress specific parts of the compiler. These tests are added to the compiler
when developing a feature. For example, we train a number of small networks and
perform a gradient check on the operators. We also compile networks to IR and
look for specific patterns. Regression tests are tests that are added when we
fix bugs. Both regression tests and feature tests are found under the "test/"
directory. To run the feature and regression tests, run "ninja test".

## Example test suites

We rely on external test suites to test the compiler. We use the data sets
CIFAR10 and MNIST (located in the "example/" directory) to test the correctness
of the whole system. The script under 'utils/' downloads and extracts the data
sets.

## Model Loader

We test the correctness of the Glow implementation by loading Caffe2 models and
executing them end-to-end. The program 'loader' loads a model and a png file,
and runs a single pass of inference. If everything goes right, the output of the
program is identical to the output of the Caffe2 model. Unfortunately, the
Caffe2 model does not describe what the input format should be. Should the
pixels be between zero and one, or between negative 128 and positive 128? The
user needs to be aware of these things when running the models. The script in
the 'utils/' directory downloads a number of pre-trained networks that we can
use for testing.

2 changes: 2 additions & 0 deletions examples/CMakeLists.txt
Expand Up @@ -5,6 +5,7 @@ target_link_libraries(cifar10
PRIVATE
Interpreter
Network
Graph
IR
Support)

Expand All @@ -14,6 +15,7 @@ target_link_libraries(mnist
PRIVATE
Interpreter
Network
Graph
IR
Support)

11 changes: 11 additions & 0 deletions include/glow/Graph/Graph.h
Expand Up @@ -5,6 +5,7 @@

#include "llvm/ADT/ArrayRef.h"

#include <unordered_map>
#include <vector>

namespace glow {
Expand All @@ -25,6 +26,7 @@ class ConcatNode;
class BatchNormalizationNode;
class LocalResponseNormalizationNode;
class ArithmeticNode;
class ReturnNode;

/// Represents the compute graph.
class Graph final {
Expand All @@ -48,6 +50,9 @@ class Graph final {
}

public:
/// Holds the mapping between graph nodes and IR values.
using NodeToInstrTy = std::unordered_map<Node *, Value *>;

Graph(Module &M) : M_(M) {}
~Graph();

Expand Down Expand Up @@ -105,8 +110,14 @@ class Graph final {

ArithmeticNode *createArithmetic(llvm::StringRef name, Node *LHS, Node *RHS,
ArithmeticInst::OpKind op);

ReturnNode *createReturn(llvm::StringRef name, Node *input);

/// @}

/// Generate IR from the nodes in the graph into the module.
NodeToInstrTy generateIR();

/// Dumps the textual representation of the network.
void dump();

27 changes: 25 additions & 2 deletions include/glow/Graph/Nodes.h
Expand Up @@ -128,6 +128,9 @@ class ReluNode final : public Node {
public:
ReluNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::ReluInstKind, in->getType(), name), in_(in) {}
static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::ReluInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand All @@ -140,7 +143,9 @@ class SigmoidNode final : public Node {
public:
SigmoidNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::SigmoidInstKind, in->getType(), name), in_(in) {}

static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::SigmoidInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand All @@ -153,7 +158,9 @@ class TanhNode final : public Node {
public:
TanhNode(Node *in, llvm::StringRef name)
: Node(Kinded::Kind::TanhInstKind, in->getType(), name), in_(in) {}

static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::TanhInstKind;
}
Node *getInput() { return in_; }

std::string getDebugDesc() const override;
Expand Down Expand Up @@ -355,6 +362,22 @@ class LocalResponseNormalizationNode final : public Node {
void visit(Node *parent, NodeVisitor *visitor) override;
};

class ReturnNode final : public Node {
Node *in_;

public:
ReturnNode(llvm::StringRef name, Node *input)
: Node(Kinded::Kind::ReturnInstKind, input->getType(), name), in_(input) {
}
static bool classof(const Kinded *k) {
return k->getKind() == Kinded::Kind::ReturnInstKind;
}
Node *getInput() const { return in_; }

std::string getDebugDesc() const override;
void visit(Node *parent, NodeVisitor *visitor) override;
};

} // namespace glow

#endif // GLOW_GRAPH_NODES_H
5 changes: 5 additions & 0 deletions include/glow/IR/Instrs.def
Expand Up @@ -16,5 +16,10 @@ DEF_INSTR(ConcatInst, concat)
DEF_INSTR(BatchNormalizationInst, batchnormalization)
DEF_INSTR(LocalResponseNormalizationInst, localresponsenormalization)
DEF_INSTR(ArithmeticInst, arithmetic)

// Pseudo instructions (exist only in node form):
DEF_NODE(ReturnInst, return)

// Variables (exist as memory/variable declarations):
DEF_VALUE(WeightVar, weight)

4 changes: 4 additions & 0 deletions include/glow/IR/Traits.h
Expand Up @@ -48,18 +48,22 @@ class Kinded {
enum class Kind {
#define DEF_INSTR(CLASS, NAME) CLASS##Kind,
#define DEF_VALUE(CLASS, NAME) CLASS##Kind,
#define DEF_NODE(CLASS, NAME) CLASS##Kind,
#include "glow/IR/Instrs.def"
#undef DEF_INSTR
#undef DEF_VALUE
#undef DEF_NODE
};

static const char *getKindName(Kind IK) {
const char *names[] = {
#define DEF_INSTR(CLASS, NAME) #NAME,
#define DEF_VALUE(CLASS, NAME) #NAME,
#define DEF_NODE(CLASS, NAME) #NAME,
#include "glow/IR/Instrs.def"
#undef DEF_INSTR
#undef DEF_VALUE
#undef DEF_NODE
nullptr};
return names[(int)IK];
}
1 change: 1 addition & 0 deletions include/glow/Interpreter/Interpreter.h
Expand Up @@ -114,6 +114,7 @@ class Interpreter final {
/// used by the interpreter to dispatch different instructions.
///@{
#define DEF_VALUE(CLASS, NAME)
#define DEF_NODE(CLASS, NAME)
#define DEF_INSTR(CLASS, NAME) \
void fwd##CLASS(Context *ctx, bool isTrain, const CLASS *I); \
void bwd##CLASS(Context *ctx, const CLASS *I);
1 change: 1 addition & 0 deletions src/glow/Graph/CMakeLists.txt
@@ -1,5 +1,6 @@

add_library(Graph
IRGen.cpp
Nodes.cpp
Graph.cpp)
target_link_libraries(Graph
10 changes: 7 additions & 3 deletions src/glow/Graph/Graph.cpp
Expand Up @@ -91,7 +91,7 @@ FullyConnectedNode *Graph::createFullyConnected(llvm::StringRef name,
"weights", WeightVar::InitKind::Xavier, fanIn);

auto *B = createVariable(T->getElementType(), {outDepth}, "bias",
WeightVar::InitKind::Xavier, 0.1);
WeightVar::InitKind::Broadcast, 0.1);

auto OT = M_.uniqueType(T->getElementType(), {idim.first, outDepth});
return addNode(new FullyConnectedNode(input, OT, name, W, B, outDepth));
Expand Down Expand Up @@ -188,14 +188,18 @@ Graph::createLocalResponseNormalization(llvm::StringRef name, Node *input,

// The output tensor is of the same shape as the input tensor.
return addNode(new LocalResponseNormalizationNode(
input, "LRN", scale, halfWindowSize, alpha, beta, k));
input, name, scale, halfWindowSize, alpha, beta, k));
}

ArithmeticNode *Graph::createArithmetic(llvm::StringRef name, Node *LHS,
Node *RHS, ArithmeticInst::OpKind op) {
assert(LHS->dims() == RHS->dims() && "Invalid operand shapes");
// The output tensor is of the same shape as the input tensor.
return addNode(new ArithmeticNode("Arithmetic", LHS, RHS, op));
return addNode(new ArithmeticNode(name, LHS, RHS, op));
}

ReturnNode *Graph::createReturn(llvm::StringRef name, Node *input) {
return addNode(new ReturnNode(name, input));
}

//===----------------------------------------------------------------------===//