@@ -18,7 +18,7 @@ taken on the topic, and is not a general reference.

The primary quantization mechanism supported by MLIR is a scheme which can
express fixed point and affine transformations via uniformly spaced points on the
- Real number line.
+ [Real](https://en.wikipedia.org/wiki/Real_number) number line.

Further, the scheme can be applied:
@@ -30,11 +30,11 @@ Further, the scheme can be applied:

[Fixed point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) values are a
[Real](https://en.wikipedia.org/wiki/Real_number) number divided by a *scale*.
- We will call the result of the divided Real the *scaled value*.
+ We will call the result of this division the *scaled value*.

$$ real\_value = scaled\_value * scale $$

- The scale can be interpreted as the distance, in Real units, between neighboring
+ The scale can be interpreted as the distance, in real units, between neighboring
scaled values. For example, if the scale is $$ \pi $$, then fixed point values
with this scale can only represent multiples of $$ \pi $$, and nothing in
between. The maximum rounding error to convert an arbitrary Real to a fixed
@@ -43,10 +43,10 @@ previous example, when $$ scale = \pi $$, the maximum rounding error will be $$
\frac{\pi}{2} $$.

Multiplication can be performed on scaled values with different scales, using
- the same algorithm as multiplication of Real values (note that product scaled
+ the same algorithm as multiplication of real values (note that the product's scaled
value has $$ scale_{product} = scale_{left \mbox{ } operand} * scale_{right
- \mbox{ } operand} $$). Addition can be performed on scaled values, as long as
- they have the same scale, using the same algorithm as addition of Real values.
+ \mbox{ } operand} $$). Addition can be performed on scaled values, so long as
+ they have the same scale, using the same algorithm for addition of real values.
This makes it convenient to represent scaled values on a computer as signed
integers, and perform arithmetic on those signed integers, because the results
will be correct scaled values.
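
To make these rules concrete, here is a minimal C++ sketch (names and types are invented for illustration; this is not an MLIR API) of arithmetic on scaled values stored as signed integers:

```c++
#include <cassert>
#include <cstdint>

// A fixed point value: real_value = value * scale. Only `value` would be
// stored and computed on; `scale` is carried along here for illustration.
struct Fixed {
  int32_t value;
  double scale;
};

// Multiplication works across scales; the product's scale is the product of
// the operand scales.
Fixed mul(Fixed a, Fixed b) { return {a.value * b.value, a.scale * b.scale}; }

// Addition requires both operands to share the same scale.
Fixed add(Fixed a, Fixed b) {
  assert(a.scale == b.scale);
  return {a.value + b.value, a.scale};
}
```

For example, multiplying scaled values 3 and 2 (both with scale $$ \pi $$) yields scaled value 6 with scale $$ \pi^2 $$.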
@@ -55,31 +55,31 @@ will be correct scaled values.

Mathematically speaking, affine values are the result of
[adding a Real-valued *zero point* to a scaled value](https://en.wikipedia.org/wiki/Affine_transformation#Representation).
- Or equivalently, subtracting a zero point from an affine value results in a
+ Alternatively (and equivalently), subtracting a zero point from an affine value results in a
scaled value:

$$ real\_value = scaled\_value * scale = (affine\_value - zero\_point) * scale $$

- Essentially, affine values are a shifting of the scaled values by some constant
+ Essentially, affine values are a shift of the scaled values by some constant
amount. Arithmetic (i.e., addition, subtraction, multiplication, division)
- cannot, in general, be directly performed on affine values; you must first
- [convert](#affine-to-fixed-point) them to the equivalent scaled values.
+ cannot, in general, be directly performed on affine values; they must first be
+ [converted](#affine-to-fixed-point) to the equivalent scaled values.
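
A short worked derivation (added here for illustration) shows why. Adding two affine values that share a scale and zero point gives

$$ real_1 + real_2 = (affine_1 - zero\_point) * scale + (affine_2 - zero\_point) * scale = (affine_1 + affine_2 - 2 * zero\_point) * scale $$

so the integer sum $$ affine_1 + affine_2 $$ carries the zero point twice, and one extra $$ zero\_point $$ must be subtracted before the result is a valid affine value again.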

As alluded to above, the motivation for using affine values is to more
- efficiently represent the Real values that will actually be encountered during
- computation. Frequently, the Real values that will be encountered are not
- symmetric around the Real zero. We also make the assumption that the Real zero
+ efficiently represent the real values that will actually be encountered during
+ computation. Frequently, the real values that will be encountered are not
+ symmetric around the real zero. We also make the assumption that the real zero
is encountered during computation, and should thus be represented.

- In this case, it's inefficient to store scaled values represented by signed
- integers, as some of the signed integers will never be used. The bit patterns
+ In this case, it is inefficient to store scaled values represented by signed
+ integers, as some of the signed integers will never be used. In effect, the bit patterns
corresponding to those signed integers are going to waste.

- In order to exactly represent the Real zero with an integral-valued affine
+ In order to exactly represent the real zero with an integral-valued affine
value, the zero point must be an integer between the minimum and maximum affine
value (inclusive). For example, given an affine value represented by an 8 bit
unsigned integer, we have: $$ 0 \leq zero\_point \leq 255 $$. This is important,
- because in deep neural networks' convolution-like operations, we frequently
+ because in convolution-like operations of deep neural networks, we frequently
need to zero-pad inputs and outputs, so zero must be exactly representable, or
the result will be biased.
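
For instance (illustrative numbers only): with $$ scale = 0.1 $$ and $$ zero\_point = 128 $$, the real value $$ 0.0 $$ quantizes exactly to the affine value $$ 128 $$, so zero padding introduces no error; a zero point outside $$ [0, 255] $$ could not make that guarantee.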
@@ -99,14 +99,14 @@ scope of this document, and it is safe to assume unless otherwise stated that
rounding should be according to the IEEE754 default of RNE (where hardware
permits).

- ### Converting between Real and fixed point or affine
+ ### Converting between real and fixed point or affine

- To convert a Real value to a fixed point value, you must know the scale. To
- convert a Real value to an affine value, you must know the scale and zero point.
+ To convert a real value to a fixed point value, we must know the scale. To
+ convert a real value to an affine value, we must know the scale and the zero point.

#### Real to affine

- To convert an input tensor of Real-valued elements (usually represented by a
+ To convert an input tensor of real-valued elements (usually represented by a
floating point format, frequently
[Single precision](https://en.wikipedia.org/wiki/Single-precision_floating-point_format))
to a tensor of affine elements represented by an integral type (e.g. 8-bit
@@ -121,16 +121,16 @@ af&fine\_value_{uint8 \, or \, uint16} \\
$$

In the above, we assume that $$ real\_value $$ is a Single, $$ scale $$ is a Single,
- $$ roundToNearestInteger $$ returns a signed 32 bit integer, and $$ zero\_point $$
- is an unsigned 8 or 16 bit integer. Note that bit depth and number of fixed
+ $$ roundToNearestInteger $$ returns a signed 32-bit integer, and $$ zero\_point $$
+ is an unsigned 8-bit or 16-bit integer. Note that bit depth and number of fixed
point values are indicative of common types on typical hardware but are not
constrained to particular bit depths or a requirement that the entire range of
an N-bit integer is used.
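
A minimal C++ sketch of this conversion for uint8 storage (the function name and signature are this document's invention, not an MLIR API):

```c++
#include <algorithm>
#include <cmath>
#include <cstdint>

// affine = clampToTargetSize(roundToNearestInteger(real / scale) + zero_point)
// std::nearbyint follows the current floating point rounding mode, which
// defaults to round-to-nearest-even (the RNE default discussed above).
uint8_t quantizeToUint8(float real, float scale, int32_t zero_point) {
  const int32_t rounded = static_cast<int32_t>(std::nearbyint(real / scale));
  return static_cast<uint8_t>(std::clamp(rounded + zero_point, 0, 255));
}
```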

- #### Affine to Real
+ #### Affine to real

To convert an output tensor of affine elements represented by uint8
- or uint16 to a tensor of Real-valued elements (usually represented with a
+ or uint16 to a tensor of real-valued elements (usually represented with a
floating point format, frequently Single precision), the following conversion
can be performed:
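
The formula itself falls outside this hunk; as a companion to the quantize sketch above, a hypothetical helper assuming the usual $$ real\_value = (affine\_value - zero\_point) * scale $$ relation:

```c++
#include <cstdint>

// real_value = (affine_value - zero_point) * scale
float dequantizeFromUint8(uint8_t affine, float scale, int32_t zero_point) {
  return static_cast<float>(static_cast<int32_t>(affine) - zero_point) * scale;
}
```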
@@ -186,10 +186,10 @@ MLIR:

* The TFLite op-set natively supports uniform-quantized variants.
* Passes and tools exist to convert directly from the *TensorFlow* dialect
- to the TFLite quantized op-set.
+ to the TFLite quantized operation set.

* [*FxpMath* dialect](#fxpmath-dialect) containing (experimental) generalized
- representations of fixed-point math ops and conversions:
+ representations of fixed-point math operations and conversions:

* [Real math ops](#real-math-ops) representing common combinations of
arithmetic operations that closely match corresponding fixed-point math
@@ -198,16 +198,16 @@ MLIR:
* [Fixed-point math ops](#fixed-point-math-ops) for carrying out
computations on integers, as are typically needed by uniform
quantization schemes.
- * Passes to lower from real math ops to fixed-point math ops.
+ * Passes to lower from real math operations to fixed-point math operations.

* [Solver tools](#solver-tools) which can (experimentally and generically)
operate on computations expressed in the *FxpMath* dialect in order to
convert from floating point types to appropriate *QuantizedTypes*, allowing
- the computation to be further lowered to integral math ops.
+ the computation to be further lowered to integral math operations.

- Not every application of quantization will use all facilities. Specifically, the
+ Not every application of quantization will use all of these facilities. Specifically, the
TensorFlow to TensorFlow Lite conversion uses the QuantizedTypes but has its own
- ops for type conversion and expression of the backing math.
+ operations for type conversion and expression of the supporting math.

## Quantization Dialect
@@ -218,20 +218,20 @@ TODO: Flesh this section out.
* QuantizedType base class
* UniformQuantizedType

- ### Quantized type conversion ops
+ ### Quantized type conversion operations

* qcast: Convert from an expressed type to QuantizedType
* dcast: Convert from a QuantizedType to its expressed type
* scast: Convert between a QuantizedType and its storage type
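
As a loose C++ analogy (these are IR operations, not runtime functions; the sketch reuses the hypothetical quantize/dequantize helpers from earlier sections):

```c++
#include <cstdint>

// Hypothetical runtime stand-in for a value of UniformQuantizedType.
struct QuantizedU8 {
  uint8_t storage;
  float scale;
  int32_t zero_point;
};

// qcast: expressed type (float) -> QuantizedType; performs real math.
QuantizedU8 qcast(float expressed, float scale, int32_t zero_point) {
  return {quantizeToUint8(expressed, scale, zero_point), scale, zero_point};
}

// dcast: QuantizedType -> expressed type; performs real math.
float dcast(const QuantizedU8 &q) {
  return dequantizeFromUint8(q.storage, q.scale, q.zero_point);
}

// scast: QuantizedType <-> storage type; a pure reinterpretation, no math.
uint8_t scast(const QuantizedU8 &q) { return q.storage; }
```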

- ### Instrumentation and constraint ops
+ ### Instrumentation and constraint operations

* const_fake_quant: Emulates the logic of the historic TensorFlow
- fake_quant_with_min_max_args op.
+ fake_quant_with_min_max_args operation.
* stats_ref: Declares that statistics should be gathered at this point with a
unique key and made available to future passes of the solver.
* stats: Declares inline statistics (per layer and per axis) for the point in
- the computation. stats_ref ops are generally converted to stats ops once
+ the computation. stats_ref operations are generally converted to stats operations once
trial runs have been performed.
* coupled_ref: Declares points in the computation to be coupled from a type
inference perspective based on a unique key.
@@ -246,23 +246,23 @@ As originally implemented, TensorFlow Lite was the primary user of such
operations at inference time. When quantized inference was enabled, if every
eligible tensor passed through an appropriate fake_quant node (the rules of
which tensors can have fake_quant applied are somewhat involved), then
- TensorFlow Lite would use the attributes of the fake_quant ops to make a
- judgment about how to convert to use kernels from its quantized ops subset.
+ TensorFlow Lite would use the attributes of the fake_quant operations to make a
+ judgment about how to convert to use kernels from its quantized operations subset.

- In MLIR-based quantization, fake_quant_\* ops are handled by converting them to
+ In MLIR-based quantization, fake_quant_\* operations are handled by converting them to
a sequence of *qcast* (quantize) followed by *dcast* (dequantize) with an
appropriate *UniformQuantizedType* as the target of the qcast operation.
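
Numerically, the qcast/dcast pair reproduces roughly what fake_quant simulates (fake_quant derives the scale and zero point from its min/max arguments); in terms of the hypothetical helpers sketched earlier:

```c++
// Fake quantization = quantize then immediately dequantize: the value stays
// in float but snaps onto the grid representable by the quantized type.
float fakeQuant(float real, float scale, int32_t zero_point) {
  return dequantizeFromUint8(quantizeToUint8(real, scale, zero_point),
                             scale, zero_point);
}
```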

This allows subsequent compiler passes to preserve the knowledge that
- quantization was simulated in a certain way while giving the compiler
+ quantization was simulated in a certain way, while giving the compiler
flexibility to move the casts as it simplifies the computation and converts it
to a form based on integral arithmetic.

This scheme also naturally allows computations that are *partially quantized*
- where the parts which could not be reduced to integral ops are still carried out
+ where the parts which could not be reduced to integral operations are still carried out
in floating point with appropriate conversions at the boundaries.

- ## TFLite Native Quantization
+ ## TFLite native quantization

TODO: Flesh this out

@@ -280,16 +280,16 @@ TODO: Flesh this out
-> tfl.Q) and replaces with (op). Also replace (constant_float -> tfl.Q)
with (constant_quant).

- ## FxpMath Dialect
+ ## FxpMath dialect

- ### Real math ops
+ ### Real math operations

Note that these all support explicit clamps, which allows for simple fusions and
representation of some common sequences of quantization-compatible math. In
addition, some support explicit biases, which are often represented as separate
adds in source dialects.

- TODO: This op set is still evolving and needs to be completed.
+ TODO: This operation set is still evolving and needs to be completed.

* RealBinaryOp
* RealAddEwOp
@@ -312,9 +312,9 @@ TODO: This op set is still evolving and needs to be completed.
* CMPLZ
* CMPGZ

- ### Fixed-point math ops
+ ### Fixed-point math operations

- TODO: This op set only has enough ops to lower a simple power-of-two
+ TODO: This operation set only has enough operations to lower a simple power-of-two
RealAddEwOp.

* RoundingDivideByPotFxpOp
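
For orientation, one plausible semantics for such an operation, sketched in C++ after the well-known gemmlowp `RoundingDivideByPOT` primitive (the actual semantics of the MLIR op may differ):

```c++
#include <cstdint>

// Divide by 2^exponent, rounding half away from zero. Assumes arithmetic
// right shift of negative values (guaranteed from C++20 onward).
int32_t roundingDivideByPOT(int32_t x, int exponent) {
  const int32_t mask = (int32_t{1} << exponent) - 1;
  const int32_t remainder = x & mask;
  const int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
  return (x >> exponent) + (remainder > threshold ? 1 : 0);
}
```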
@@ -331,26 +331,26 @@ adjacent areas such as solving for transformations to other kinds of lower
precision types (e.g. bfloat16 or fp16).

Solver tools are expected to operate in several modes, depending on the
- computation and the manner in which it was trained:
+ computation and the training characteristics of the model:

* *Transform*: With all available information in the MLIR computation, infer
boundaries where the computation can be carried out with integral math and
change types accordingly to appropriate QuantizedTypes:

* For passthrough ops which do not perform active math, change them to
operate directly on the storage type, converting in and out at the edges
- via scast ops.
- * For ops that have the *Quantizable* trait, the type can be set directly.
- This includes ops from the [real math ops set](#real-math-ops).
- * For others, encase them in appropriate dcast/qcast ops, presuming that
+ via scast operations.
+ * For operations that have the *Quantizable* trait, the type can be set directly.
+ This includes operations from the [real math ops set](#real-math-ops).
+ * For others, encase them in appropriate dcast/qcast operations, presuming that
some follow-on pass will know what to do with them.

* *Instrument*: Most of the time, there are not sufficient implied
constraints within a computation to perform many transformations. For this
- reason, the solver can insert instrumentation ops at points where additional
+ reason, the solver can insert instrumentation operations at points where additional
runtime statistics may yield solutions. It is expected that such
computations will be lowered as-is for execution, run over an appropriate
- eval set, and statistics at each instrumentation point made available for a
+ evaluation set, and statistics at each instrumentation point made available for a
future invocation of the solver.

* *Simplify*: A variety of passes and simplifications are applied once
0 commit comments