inference. Then, we recompile the network using this profile information to
convert the network into a quantized form, allowing for static optimization of
the quantized graph. We convert portions of the network into islands of integer
computation and aim to generate outputs in the range that the original
floating-point network produces. During the conversion, for the following types
of quantized nodes, we ignore the output's quantization params (if they are
provided) and force the output to have the same quantization params as the
input, for performance reasons:
```
LocalResponseNormalizationNode
SliceNode
ReshapeNode
TopKNode
GatherNode
MaxPoolNode
```
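
To make this concrete, here is a minimal sketch (hypothetical types and helper,
not Glow's actual API) of what forcing the output's quantization params to
match the input's means:

```
#include <cstdint>

// Hypothetical stand-in for a node's quantization parameters.
struct QuantParams {
  float scale;
  int32_t offset;
};

// For selection/shuffle nodes such as Slice, Reshape, Gather, TopK and
// MaxPool, every output element is copied from some input element, so reusing
// the input's (scale, offset) keeps the op a pure int8 data movement and
// avoids an extra rescale pass over the output.
QuantParams outputParams(const QuantParams &inputParams) {
  return inputParams;
}
```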
72
72

By default, the target quantization precision is int8. However, the precision
can be controlled via the command-line parameter `quantization-precision`.
There are two supported values: `Int8` and `Int16`.

## Caffe2 Quantized Model Support

Glow is able to support the Caffe2 Resnet50 quantized model:
https://github.com/caffe2/models/tree/master/resnet50_quantized

To support Caffe2 quantized models, Glow has:

```
Int8GivenTensorFill
```

- Supported int32 quantized bias.

In most cases, the bias is quantized in int32 to improve precision
(the partial sum of the matrix-matrix multiplication is accumulated into int32,
so the int32 bias can be added to the int32 partial sum for better accuracy).
Glow now supports int32 quantized bias in `Convolution`, `FullyConnected`
and `RowwiseQuantizedFullyConnected` nodes.
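
As a rough illustration of why this works, here is a standalone sketch (not
Glow's actual kernel; it assumes zero input/weight offsets and a single
combined input-times-weight scale) of one output element of a quantized
FullyConnected with an int32 bias:

```
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Dot product of one weights row with one input column, plus an int32 bias.
// inScale combines the input and weight scales; outScale/outOffset describe
// the output's int8 quantization.
int8_t quantizedDotWithBias(const int8_t *row, const int8_t *col, size_t k,
                            int32_t bias, float inScale, float outScale,
                            int32_t outOffset) {
  int32_t acc = 0;
  for (size_t i = 0; i < k; ++i)
    acc += int32_t(row[i]) * int32_t(col[i]); // int8 * int8 -> int32
  acc += bias; // the int32 bias is added at full accumulator precision
  // Rescale the int32 result into the output's int8 representation.
  int32_t q = int32_t(std::lround(float(acc) * inScale / outScale)) + outOffset;
  return int8_t(std::min(127, std::max(-128, q)));
}
```
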
- Supported the conversion from uint8 quantized activations to int8 quantized activations.

For the quantized Caffe2 ops, the activations are quantized to uint8, while in
Glow the activations are quantized to int8. Therefore, from the offset read
from a quantized Caffe2 model, we need to subtract 128 (i.e., add INT8_MIN) so
that the activations become int8.
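
As a small sketch (hypothetical helper names), the shift keeps the dequantized
value unchanged, because `scale * (q - offset) == scale * ((q - 128) - (offset - 128))`:

```
#include <cstdint>

// Shift a uint8 quantized value and its offset into int8 range. Subtracting
// 128 from both leaves scale * (q - offset), the dequantized value, unchanged.
int8_t toInt8Value(uint8_t q) { return int8_t(int32_t(q) - 128); }

int32_t toInt8Offset(int32_t uint8Offset) {
  return uint8Offset - 128; // equivalently, uint8Offset + INT8_MIN
}
```
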
## Compiler Optimizations

For more specific graph optimizations check [here](Optimizations.md#quantization).

## Row-wise Quantization

Row-wise (or channel-wise) quantization is an important way to minimize accuracy
drop. Glow supports the row-wise quantized FullyConnected node
`RowwiseQuantizedFullyConnected`, which is enabled by the
image-classifier/loader option `-enable-rowwise`.

For the regular quantized FC, we quantize the whole weights tensor with the
same scale and offset, which are computed based on the max and min of the
entire tensor. But for row-wise quantization, after getting `min_i` and
`max_i` for each row `i`, we compute the pair `(scale_i, offset_i)` to
quantize each element in row `i`. The figure below shows the quantized FC node
and the RowwiseQuantizedFullyConnected node. Instead of using only one tensor
to represent the quantized weights, we need two extra vectors, `Scales` and
`Offsets`, to store the `(scale, offset)` for each row.

![](rowwise_quantized_fc.png)
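
As a sketch of the per-row computation (assuming the usual int8 affine mapping
`real = scale_i * (q - offset_i)`; not Glow's exact code), the parameters for
one row follow directly from that row's range:

```
#include <cmath>
#include <cstdint>

struct RowParams {
  float scale;
  int32_t offset;
};

// Map a row's float range [min, max] onto the int8 range [-128, 127].
RowParams rowParamsFromRange(float min, float max) {
  float scale = (max - min) / 255.0f;
  if (scale == 0.0f)
    scale = 1.0f; // guard against constant rows (max == min)
  // Choose offset so that min dequantizes back to (approximately) min:
  // min == scale * (-128 - offset)  =>  offset == -128 - min / scale
  int32_t offset = int32_t(std::lround(-128.0f - min / scale));
  return {scale, offset};
}
```
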
Row-wise quantized SparseLengthsWeightedSum is also supported. Similar to the
above, we compute scales and offsets per row, to be used with the `Data` input
for the `RowwiseQuantizedSparseLengthsSumNode`. Scales and Offsets are inputs to
the node. The output of this node is float, matching the Caffe2 implementation.
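
For illustration, here is a rough sketch of that computation (a hypothetical
free function, not Glow's implementation): each selected row of `Data` is
dequantized with its own `(scale, offset)`, weighted, and accumulated into a
float output, one output row per segment of `Lengths`:

```
#include <cstddef>
#include <cstdint>

// out has shape [numSegments, rowLen]; indices and weights hold
// sum(lengths[0..numSegments-1]) entries.
void rowwiseQuantizedSLWS(const int8_t *data, size_t rowLen,
                          const float *scales, const int32_t *offsets,
                          const float *weights, const int64_t *indices,
                          const int32_t *lengths, size_t numSegments,
                          float *out) {
  size_t cur = 0; // running position in indices/weights
  for (size_t s = 0; s < numSegments; ++s) {
    float *dst = out + s * rowLen;
    for (size_t j = 0; j < rowLen; ++j)
      dst[j] = 0.0f;
    for (int32_t n = 0; n < lengths[s]; ++n, ++cur) {
      const int8_t *row = data + size_t(indices[cur]) * rowLen;
      const float scale = scales[indices[cur]];
      const int32_t offset = offsets[indices[cur]];
      // Dequantize with this row's params, then weight and accumulate.
      for (size_t j = 0; j < rowLen; ++j)
        dst[j] += weights[cur] * scale * float(int32_t(row[j]) - offset);
    }
  }
}
```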