Commit 3660118

Docs: Document the Glow IR.
1 parent ea084ed commit 3660118

File tree

1 file changed

+125
-0
lines changed

1 file changed

+125
-0
lines changed

docs/IR.md

Lines changed: 125 additions & 0 deletions
## Design of the Glow IR

### Introduction

This document describes the motivation behind the Glow intermediate
representation and some implementation details.

Glow is a retargetable compiler that supports a number of different backends.
This means that the first few layers of the compiler are target-independent, but
as you get closer to the different backends things start to diverge. The first
two levels of IR are shared between all targets. Different backends may have
additional layers of IR.

### High-level Graph

The high-level IR is a graph-based representation that's similar to the graph
that you may find inside Caffe. When we load the model from a file we construct
this graph in a direct translation of one operator to one node. It's a simple
graph that allows basic transformations such as swapping the order of nodes and
removing nodes. The graph is strongly typed, which means that inputs and outputs
have a known tensor type (dimension and element type), and that the types must
match. The compiler has a debug method for dumping a graphical representation of
the graph into a dotty file. The method is called 'dumpDAG'. The textual
representation of the graph is less informative and it looks like this:

```
pool
  name : "pool"
  input : float<8 x 28 x 28 x 16>
  output : float<8 x 9 x 9 x 16>
  kernel : 3
  stride : 3
  pad : 0
  kind : max

convolution
  name : "conv"
  input : float<8 x 9 x 9 x 16>
  output : float<8 x 9 x 9 x 16>
  filter : float<16 x 5 x 5 x 16>
  bias : float<16>
  kernel : 5
  stride : 1
  pad : 2
  depth : 16

relu
  name : "conv"
  input : float<8 x 9 x 9 x 16>
```
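
To make the "strongly typed" property above concrete, the following is a
minimal, self-contained C++ sketch of a toy node whose connection is only legal
when the producer's tensor type matches the type the consumer expects. The
`TensorType`, `Node`, and `connect` names are invented for this illustration and
are not Glow's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Element kinds are reduced to a single case for the purpose of the sketch.
enum class ElemKind { Float };

// A tensor type: an element kind plus a shape, e.g. float<8 x 28 x 28 x 16>.
struct TensorType {
  ElemKind elemKind;
  std::vector<std::size_t> dims;

  bool operator==(const TensorType &other) const {
    return elemKind == other.elemKind && dims == other.dims;
  }
};

// A toy graph node: a name, the type of its result, and its typed inputs.
struct Node {
  std::string name;
  TensorType resultType;
  std::vector<const Node *> inputs;
};

// Connecting two nodes is only legal when the producer's result type matches
// the type the consumer expects for that input ("the types must match").
void connect(Node &consumer, const Node &producer, const TensorType &expected) {
  assert(producer.resultType == expected && "graph is strongly typed");
  consumer.inputs.push_back(&producer);
}

int main() {
  TensorType poolOut{ElemKind::Float, {8, 9, 9, 16}};
  Node pool{"pool", poolOut, {}};
  Node conv{"conv", {ElemKind::Float, {8, 9, 9, 16}}, {}};

  // The convolution consumes the pool's result, as in the dump above.
  connect(conv, pool, poolOut);
  return 0;
}
```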

After optimizing the graph with target-independent optimizations the code is
lowered into the mid-level IR in a phase that's called "IRGen" (short for IR
generation). This is a one-to-many translation where each operator is translated
into one or more instructions.
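
The one-to-many nature of this translation can be pictured as a builder that
emits several instructions per operator. Below is a small, self-contained C++
sketch; the `IRBuilder`, `createAlloc`, and `createRelu` names are hypothetical
and are not Glow's actual classes.

```cpp
#include <iostream>
#include <string>
#include <vector>

// A single low-level instruction, kept as plain text for the sketch.
struct Instruction {
  std::string text;
};

// A tiny instruction builder that appends to a linear program.
struct IRBuilder {
  std::vector<Instruction> program;

  // Emit an allocation for a destination buffer and return its name.
  std::string createAlloc(const std::string &name, const std::string &type) {
    program.push_back({"%" + name + " = alloc " + type});
    return "%" + name;
  }

  void createRelu(const std::string &dest, const std::string &src) {
    program.push_back({"%relu = relu @out " + dest + ", @in " + src});
  }
};

// IRGen for a single high-level 'relu' operator: one node becomes two
// instructions, an alloc for the result buffer followed by the relu itself.
void irgenRelu(IRBuilder &builder, const std::string &input,
               const std::string &type) {
  std::string dest = builder.createAlloc("allo0", type);
  builder.createRelu(dest, input);
}

int main() {
  IRBuilder builder;
  irgenRelu(builder, "%allo", "float<8 x 28 x 28 x 16>");
  for (const Instruction &inst : builder.program)
    std::cout << inst.text << "\n";
  return 0;
}
```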

### Mid-level Graph

The mid-level IR enables a different kind of target-independent optimization,
one that is not possible with the high-level graph format. For example, the
ability to share memory buffers during the forward pass can't be expressed in
the graph form because buffers are not explicit.

The mid-level IR is structured as a sequence of instructions that perform
operations such as copying memory and performing convolution. The IR is not a
Static Single Assignment (SSA) based representation, because the IR does not
support control flow. The IR is strongly typed and each instruction operand
kind has known parameter types. The IR representation is designed to be used as
an in-memory form, and it can be dumped to a human-readable, assembly-like
format.

The IR has two sections: 'declare' and 'program'. In the first section of the
IR we declare a number of memory regions that live throughout the lifetime of
the program. This is similar to global variables in C++. The second part of the
IR is a list of instructions. Each variable is annotated with the kind of
initialization that the program should do.

There are two kinds of memory regions: global memory regions and locally
allocated regions. The locally allocated memory regions are similar to 'alloca'
in C++ and in LLVM. Memory regions are strongly typed, which means that the
type of tensor that the region represents is known.

Instructions operate on either global variables or locally allocated buffers.
Each operand is annotated with one of the qualifiers '@in'/'@out'/'@inout'.
'@in' means that the buffer is read from, '@out' means that the buffer is
written into, and '@inout' means that the instruction may both read from and
write into the buffer. These operand qualifiers help the optimizer decide when
it is legal to share buffers. Instructions may have other attributes that
specify the legality of some optimizations. For example, some operands require
that the data from the forward pass be kept around for the backward pass, so if
the program is not optimized for inference-only mode then certain memory
optimizations can't happen.
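
The sketch below shows how the operand qualifiers can feed a buffer-sharing
decision: an '@out' buffer may reuse an '@in' buffer when that input is never
read again later in the program. The `OperandKind` and `Instruction` types and
the `canShareBuffers` helper are hypothetical and deliberately much simpler
than the real optimizer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class OperandKind { In, Out, InOut };

struct Operand {
  std::string buffer;
  OperandKind kind;
};

struct Instruction {
  std::string name;
  std::vector<Operand> operands;
};

// Returns true if 'buffer' is read (@in or @inout) by any instruction after
// index 'start' in the program.
bool isReadLater(const std::vector<Instruction> &program, std::size_t start,
                 const std::string &buffer) {
  for (std::size_t i = start + 1; i < program.size(); i++) {
    for (const Operand &op : program[i].operands) {
      if (op.buffer == buffer && op.kind != OperandKind::Out)
        return true;
    }
  }
  return false;
}

// An @out operand of program[idx] may share storage with one of that
// instruction's @in operands when the input buffer is dead afterwards.
bool canShareBuffers(const std::vector<Instruction> &program, std::size_t idx,
                     const std::string &inBuffer) {
  return !isReadLater(program, idx, inBuffer);
}
```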

This is an example of an unoptimized IR:
```
declare {
  %input = weight float<8 x 28 x 28 x 1>, broadcast, 0.0
  %filter = weight float<16 x 5 x 5 x 1>, xavier, 25.0
  %filter0 = weight float<16>, broadcast, 0.100
  %weights = weight float<10 x 144>, xavier, 144.0
  %bias = weight float<10>, broadcast, 0.100
  %selected = weight index<8 x 1>
  ...
  %result = weight float<8 x 10>
}

program {
  %allo = alloc float<8 x 28 x 28 x 16>
  %conv = convolution [5 1 2 16] @out %allo, @in %input, @in %filter3, @in %bias0
  %allo0 = alloc float<8 x 28 x 28 x 16>
  %relu = relu @out %allo0, @in %allo
  %allo1 = alloc index<8 x 9 x 9 x 16 x 2>
  %allo2 = alloc float<8 x 9 x 9 x 16>
  %pool = pool max [3 3 0] @out %allo2, @in %allo0, @inout %allo1
  ...
  %deal6 = dealloc @out %allo6
  %deal7 = dealloc @out %allo7
  %deal8 = dealloc @out %allo8
  %deal9 = dealloc @out %allo9
}
```
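
Because allocations and deallocations are explicit instructions, a pass can
recover each buffer's live range with a single scan over the program, which is
exactly the information that the graph form cannot express. The following is a
minimal sketch under that assumption; the types are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Instruction {
  bool isAlloc = false;              // true for 'alloc' instructions
  std::string buffer;                // the buffer an alloc defines
  std::vector<std::string> operands; // buffers this instruction reads/writes
};

struct LiveRange {
  std::size_t start = 0; // index of the alloc
  std::size_t end = 0;   // index of the last instruction touching the buffer
};

std::map<std::string, LiveRange>
computeLiveRanges(const std::vector<Instruction> &program) {
  std::map<std::string, LiveRange> ranges;
  for (std::size_t i = 0; i < program.size(); i++) {
    // An alloc opens a live range for the buffer it defines.
    if (program[i].isAlloc)
      ranges[program[i].buffer] = {i, i};
    // Any later use of a buffer extends its live range.
    for (const std::string &buf : program[i].operands) {
      auto it = ranges.find(buf);
      if (it != ranges.end())
        it->second.end = i;
    }
  }
  return ranges;
}
```

Buffers whose live ranges do not overlap are candidates for sharing the same
memory.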