|
| 1 | +TensorFlow |
| 2 | +=============== |
| 3 | + |
| 4 | + |
| 5 | +1. [Introduction](#introduction) |
| 6 | +2. [API for TensorFlow](#api-for-tensorflow) |
| 7 | +3. [Support Matrix](#support-matrix) |
| 8 | + 3.1 [Quantization Scheme](#quantization-scheme) |
| 9 | + 3.2 [Quantization Approaches](#quantization-approaches) |
| 10 | + 3.3 [Backend and Device](#backend-and-device) |
| 11 | + |
| 12 | +## Introduction |
| 13 | + |
| 14 | +<div align="center"> |
| 15 | + <img src="https://www.tensorflow.org/images/tf_logo_horizontal.png"> |
| 16 | +</div> |
| 17 | + |
| 18 | +[TensorFlow](https://www.tensorflow.org/) is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of [tools](https://www.tensorflow.org/resources/tools), [libraries](https://www.tensorflow.org/resources/libraries-extensions), and [community](https://www.tensorflow.org/community) resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. It provides stable [Python](https://www.tensorflow.org/api_docs/python) and [C++](https://www.tensorflow.org/api_docs/cc) APIs, as well as APIs for [other languages](https://www.tensorflow.org/api_docs) that come without backward-compatibility guarantees. |
| 19 | + |
| 20 | +Keras is a multi-backend deep learning framework that supports JAX, TensorFlow, and PyTorch. It ships as a dependency of TensorFlow and provides a high-level API for building and training models for computer vision, natural language processing, audio processing, time-series forecasting, recommender systems, and more. |
| 21 | + |
| 22 | + |
| 23 | + |
| 24 | +## API for TensorFlow |
| 25 | + |
| 26 | +Intel(R) Neural Compressor provides `quantize_model` and `autotune` as the main interfaces for the supported algorithms on the TensorFlow framework. |
| 27 | + |
| 28 | + |
| 29 | +**quantize_model** |
| 30 | + |
| 31 | +The `quantize_model` interface is designed for ease of use. It takes only a few parameters (`model`, `quant_config`, `calib_dataloader`, and `calib_iteration`) and quantizes a TF model in one shot. |
| 32 | + |
| 33 | +```python |
| 34 | +def quantize_model( |
| 35 | + model: Union[str, tf.keras.Model, BaseModel], |
| 36 | + quant_config: Union[BaseConfig, list], |
| 37 | + calib_dataloader: Callable = None, |
| 38 | + calib_iteration: int = 100, |
| 39 | +): |
| 40 | +``` |
| 41 | +`model` should be a string holding the model's location, a Keras model object, or an instance of the INC TF model wrapper class. |
| 42 | + |
| 43 | +`quant_config` is either a `StaticQuantConfig` object or a list containing a `SmoothQuantConfig` and a `StaticQuantConfig`. It indicates which algorithm to use and which quantization rules to apply. |
| 44 | + |
| 45 | +`calib_dataloader` loads the data samples for the calibration phase. In most cases, a subset of the evaluation dataset is sufficient. |
| 46 | + |
| 47 | +`calib_iteration` sets how many iterations the calibration process runs. |
| 48 | + |
| 49 | +Here is a simple example of using the `quantize_model` interface with a dummy calibration dataloader and the default `StaticQuantConfig`: |
| 50 | +```python |
| 51 | +from neural_compressor.tensorflow import StaticQuantConfig, quantize_model |
| 52 | +from neural_compressor.tensorflow.utils import DummyDataset |
| 53 | + |
| 54 | +dataset = DummyDataset(shape=(100, 32, 32, 3), label=True) |
| 55 | +calib_dataloader = MyDataLoader(dataset=dataset)  # MyDataLoader is a user-defined dataloader class |
| 56 | +quant_config = StaticQuantConfig() |
| 57 | + |
| 58 | +qmodel = quantize_model("fp32_model.pb", quant_config, calib_dataloader) |
| 59 | +``` |
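
The example above leaves `MyDataLoader` undefined. A minimal sketch of what such a class could look like follows. It is a hypothetical stand-in, not part of Neural Compressor, and it assumes that calibration only needs an iterable yielding `(inputs, labels)` batches plus a `batch_size` attribute. A real loader would usually stack the samples into arrays instead of plain lists.

```python
class MyDataLoader:
    """Hypothetical minimal calibration dataloader (not a Neural Compressor class)."""

    def __init__(self, dataset, batch_size=1):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        # Walk the dataset in fixed-size steps and yield one (inputs, labels) batch at a time.
        total = len(self.dataset)
        for start in range(0, total, self.batch_size):
            stop = min(start + self.batch_size, total)
            samples = [self.dataset[i] for i in range(start, stop)]
            inputs = [sample[0] for sample in samples]
            labels = [sample[1] for sample in samples]
            yield inputs, labels


# Stand-in dataset: ten (input, label) pairs; a real one would hold image tensors.
toy_dataset = [([float(i)], i % 2) for i in range(10)]
loader = MyDataLoader(dataset=toy_dataset, batch_size=4)
batches = list(loader)
```

The last batch is simply shorter when the dataset size is not a multiple of `batch_size`; whether partial batches are acceptable depends on the model's input signature.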
| 60 | +**autotune** |
| 61 | + |
| 62 | +The `autotune` interface, on the other hand, provides greater flexibility and power. It's particularly useful when accuracy is a critical factor. If the initial quantization doesn't meet the tolerance of accuracy loss, `autotune` will iteratively try quantization rules according to the `tune_config`. |
| 63 | + |
| 64 | +Like `quantize_model`, `autotune` takes `model`, `calib_dataloader`, and `calib_iteration`. In addition, `tune_config` defines the quantization rules to try, and `eval_fn` together with `eval_args` defines the evaluation process. |
| 65 | + |
| 66 | + |
| 67 | + |
| 68 | +```python |
| 69 | +def autotune( |
| 70 | + model: Union[str, tf.keras.Model, BaseModel], |
| 71 | + tune_config: TuningConfig, |
| 72 | + eval_fn: Callable, |
| 73 | + eval_args: Optional[Tuple[Any]] = None, |
| 74 | + calib_dataloader: Callable = None, |
| 75 | + calib_iteration: int = 100, |
| 76 | +) -> Optional[BaseModel]: |
| 77 | +``` |
| 78 | +`model` should be a string holding the model's location, a Keras model object, or an instance of the INC TF model wrapper class. |
| 79 | + |
| 80 | +`tune_config` is the `TuningConfig` object that contains multiple quantization rules. |
| 81 | + |
| 82 | +`eval_fn` is the evaluation function that measures the accuracy of a model. |
| 83 | + |
| 84 | +`eval_args` holds the supplemental arguments required by the evaluation function. |
| 85 | + |
| 86 | +`calib_dataloader` loads the data samples for the calibration phase. In most cases, a subset of the evaluation dataset is sufficient. |
| 87 | + |
| 88 | +`calib_iteration` sets how many iterations the calibration process runs. |
| 89 | + |
| 90 | +Here is a simple example of using the `autotune` interface with different quantization rules defined by a list of `StaticQuantConfig` objects: |
| 91 | +```python |
| 92 | +from neural_compressor.common.base_tuning import TuningConfig |
| 93 | +from neural_compressor.tensorflow import StaticQuantConfig, autotune |
| 94 | + |
| 95 | +calib_dataloader = MyDataLoader(dataset=Dataset())  # MyDataLoader and Dataset are user-defined |
| 96 | +custom_tune_config = TuningConfig( |
| 97 | + config_set=[ |
| 98 | + StaticQuantConfig(weight_sym=True, act_sym=True), |
| 99 | + StaticQuantConfig(weight_sym=False, act_sym=False), |
| 100 | + ] |
| 101 | +) |
| 102 | +best_model = autotune( |
| 103 | + model="baseline_model", |
| 104 | + tune_config=custom_tune_config, |
| 105 | + eval_fn=eval_acc_fn,  # user-defined function that returns the model's accuracy |
| 106 | + calib_dataloader=calib_dataloader, |
| 107 | +) |
| 108 | +``` |
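
The iterative search that `autotune` performs can be pictured with the conceptual sketch below. It is not the library's implementation: the `tune` helper, the relative-loss criterion, the 1% default tolerance, and the toy accuracy table are all assumptions made for illustration. The real search order and stopping rule are controlled by `TuningConfig`.

```python
def tune(config_set, quantize, evaluate, baseline_acc, tolerable_loss=0.01):
    """Try each config in order; return the first (config, model) within tolerance, else None."""
    for config in config_set:
        candidate = quantize(config)
        accuracy = evaluate(candidate)
        # Relative accuracy drop against the fp32 baseline.
        if (baseline_acc - accuracy) / baseline_acc <= tolerable_loss:
            return config, candidate
    return None


# Toy stand-ins: "quantizing" just tags the config name, and accuracy comes from a lookup table.
fake_accuracy = {"sym": 0.70, "asym": 0.758}


def fake_quantize(config):
    return f"model[{config}]"


def fake_evaluate(model):
    # Strip the "model[" prefix and "]" suffix to recover the config name.
    return fake_accuracy[model[6:-1]]


result = tune(["sym", "asym"], fake_quantize, fake_evaluate, baseline_acc=0.76)
no_match = tune(["sym"], fake_quantize, fake_evaluate, baseline_acc=0.76)
```

Returning `None` when no configuration meets the tolerance mirrors the `Optional[BaseModel]` return type in the `autotune` signature.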
| 109 | + |
| 110 | +### Support Matrix |
| 111 | + |
| 112 | +#### Quantization Scheme |
| 113 | + |
| 114 | +| Framework | Backend Library | Symmetric Quantization | Asymmetric Quantization | |
| 115 | +| :-------------- |:---------------:| ---------------:|---------------:| |
| 116 | +| TensorFlow | [oneDNN](https://github.com/oneapi-src/oneDNN) | Activation (int8/uint8), Weight (int8) | - | |
| 117 | +| Keras | [ITEX](https://github.com/intel/intel-extension-for-tensorflow) | Activation (int8/uint8), Weight (int8) | - | |
| 118 | + |
| 119 | + |
| 120 | ++ Symmetric Quantization |
| 121 | + + int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1) |
| 122 | + + uint8: scale = max(rmin, rmax) / (max(uint8) - min(uint8)) |
| 123 | + |
| 124 | + |
| 125 | ++ oneDNN: [Lower Numerical Precision Deep Learning Inference and Training](https://software.intel.com/content/www/us/en/develop/articles/lower-numerical-precision-deep-learning-inference-and-training.html) |
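
The two symmetric scale formulas listed above translate directly into code. The sketch below is only an illustration: the function names are hypothetical, and the rounding and clamping in `quantize_value` are assumptions about how a scale is typically applied, not a description of the oneDNN or ITEX kernels.

```python
INT8_MIN, INT8_MAX = -128, 127
UINT8_MIN, UINT8_MAX = 0, 255


def symmetric_scale_int8(rmin, rmax):
    # scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)
    return 2 * max(abs(rmin), abs(rmax)) / (INT8_MAX - INT8_MIN - 1)


def symmetric_scale_uint8(rmin, rmax):
    # scale = max(rmin, rmax) / (max(uint8) - min(uint8))
    return max(rmin, rmax) / (UINT8_MAX - UINT8_MIN)


def quantize_value(value, scale, qmin, qmax):
    # Assumed application of the scale: divide, round to nearest, clamp to the integer range.
    return max(qmin, min(qmax, round(value / scale)))


int8_scale = symmetric_scale_int8(-1.0, 0.5)   # denominator is 127 - (-128) - 1 = 254
uint8_scale = symmetric_scale_uint8(0.0, 5.1)  # denominator is 255 - 0 = 255
q = quantize_value(-1.0, int8_scale, -127, 127)
```

With the int8 formula the denominator is 254, so the scale reduces to `max(abs(rmin), abs(rmax)) / 127` and the largest magnitude maps to ±127.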
| 126 | + |
| 127 | +#### Quantization Approaches |
| 128 | + |
| 129 | +The supported quantization methods for TensorFlow and Keras are listed below: |
| 130 | +<table class="center"> |
| 131 | + <thead> |
| 132 | + <tr> |
| 133 | + <th>Types</th> |
| 134 | + <th>Quantization</th> |
| 135 | + <th>Dataset Requirements</th> |
| 136 | + <th>Framework</th> |
| 137 | + <th>Backend</th> |
| 138 | + </tr> |
| 139 | + </thead> |
| 140 | + <tbody> |
| 141 | + <tr> |
| 142 | + <td rowspan="2" align="center">Post-Training Static Quantization (PTQ)</td> |
| 143 | + <td rowspan="2" align="center">weights and activations</td> |
| 144 | + <td rowspan="2" align="center">calibration</td> |
| 145 | + <td align="center">Keras</td> |
| 146 | + <td align="center"><a href="https://github.com/intel/intel-extension-for-tensorflow">ITEX</a></td> |
| 147 | + </tr> |
| 148 | + <tr> |
| 149 | + <td align="center">TensorFlow</td> |
| 150 | + <td align="center"><a href="https://github.com/tensorflow/tensorflow">TensorFlow</a>/<a href="https://github.com/Intel-tensorflow/tensorflow">Intel TensorFlow</a></td> |
| 151 | + </tr> |
| 152 | + <tr> |
| 153 | + <td rowspan="2" align="center">Smooth Quantization (SQ)</td> |
| 154 | + <td rowspan="2" align="center">weights</td> |
| 155 | + <td rowspan="2" align="center">calibration</td> |
| 156 | + <td align="center">TensorFlow</td> |
| 157 | + <td align="center"><a href="https://github.com/tensorflow/tensorflow">TensorFlow</a>/<a href="https://github.com/Intel-tensorflow/tensorflow">Intel TensorFlow</a></td> |
| 158 | + </tr> |
| 159 | + </tbody> |
| 160 | +</table> |
| 161 | +<br> |
| 162 | +<br> |
| 163 | + |
| 164 | +##### Post Training Static Quantization |
| 165 | + |
| 166 | +The min/max ranges of weights and activations are collected offline on a so-called `calibration` dataset. This dataset should represent the data distribution of the unseen inference data. The `calibration` process runs on the original fp32 model and dumps the tensor distributions used for the `Scale` and `ZeroPoint` calculations. Usually, 100 samples are enough for calibration. |
| 167 | + |
| 168 | +Refer to the [PTQ Guide](./TF_Quant.md) for detailed information. |
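
As a rough picture of the calibration step, the sketch below tracks a running min/max over a few batches of one tensor and derives a symmetric int8 scale from it. This is a simplified illustration with made-up values. The `collect_range` helper is hypothetical, and the real calibration observes tensors inside the fp32 graph instead of plain Python lists.

```python
def collect_range(batches):
    """Track the running min/max over every value seen during calibration."""
    rmin, rmax = float("inf"), float("-inf")
    for batch in batches:
        rmin = min(rmin, min(batch))
        rmax = max(rmax, max(batch))
    return rmin, rmax


# Made-up activation values from three calibration batches of a single tensor.
calib_batches = [[-0.2, 0.4, 1.5], [-2.0, 0.1, 0.9], [0.3, 0.7, 1.1]]
rmin, rmax = collect_range(calib_batches)

# Symmetric int8 quantization: the zero point is fixed at 0 and only the scale is derived.
scale = max(abs(rmin), abs(rmax)) / 127
zero_point = 0
```

A calibration set that misses the extreme values seen at inference time yields a range that is too narrow, which is why the dataset should be representative.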
| 169 | + |
| 170 | +##### Smooth Quantization |
| 171 | + |
| 172 | +Smooth Quantization (SQ) is an advanced quantization technique designed to optimize model performance while maintaining high accuracy. Unlike traditional quantization methods, which can lead to significant accuracy loss, SQ strikes a balance between the scales of activations and weights. |
| 173 | + |
| 174 | +Refer to the [SQ Guide](./TF_SQ.md) for detailed information. |
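
The balancing idea can be sketched with the per-channel smoothing factor from the SmoothQuant method, `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`. The code below is a pure-Python illustration on a toy layer. The helper names and `alpha = 0.5` are assumptions, and it does not reproduce how Neural Compressor applies the transformation to a TensorFlow graph.

```python
def column_absmax(matrix):
    # Largest magnitude per input channel of the activations (columns of X).
    return [max(abs(row[j]) for row in matrix) for j in range(len(matrix[0]))]


def row_absmax(matrix):
    # Largest magnitude per input channel of the weights (rows of W).
    return [max(abs(v) for v in row) for row in matrix]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smooth(x, w, alpha=0.5):
    """Divide activations by s and multiply weights by s, channel by channel."""
    s = [a ** alpha / wm ** (1 - alpha) for a, wm in zip(column_absmax(x), row_absmax(w))]
    x_smooth = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_smooth = [[v * s[j] for v in row] for j, row in enumerate(w)]
    return x_smooth, w_smooth


# Toy layer: the second input channel carries an activation outlier and tiny weights.
x = [[1.0, 100.0], [2.0, -50.0]]
w = [[0.5, -1.0], [0.02, 0.01]]
x_smooth, w_smooth = smooth(x, w)
original = matmul(x, w)
smoothed = matmul(x_smooth, w_smooth)
```

The layer output is mathematically unchanged, while the activation outlier is scaled down and the corresponding weights are scaled up, so both tensors become easier to quantize.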
| 175 | + |
| 176 | +#### Backend and Device |
| 177 | +Intel(R) Neural Compressor supports TensorFlow on GPU through [ITEX-XPU](https://github.com/intel/intel-extension-for-tensorflow). When ITEX-XPU is installed, the model runs on the GPU automatically. |
| 178 | + |
| 179 | +<table class="center"> |
| 180 | + <thead> |
| 181 | + <tr> |
| 182 | + <th>Framework</th> |
| 183 | + <th>Backend</th> |
| 184 | + <th>Backend Library</th> |
| 185 | + <th>Backend Value</th> |
| 186 | + <th>Supported Device (cpu as default)</th> |
| 187 | + </tr> |
| 188 | + </thead> |
| 189 | + <tbody> |
| 190 | + <tr> |
| 191 | + <td rowspan="2" align="left">TensorFlow</td> |
| 192 | + <td align="left">TensorFlow</td> |
| 193 | + <td align="left">oneDNN</td> |
| 194 | + <td align="left">"default"</td> |
| 195 | + <td align="left">cpu</td> |
| 196 | + </tr> |
| 197 | + <tr> |
| 198 | + <td align="left">ITEX</td> |
| 199 | + <td align="left">oneDNN</td> |
| 200 | + <td align="left">"itex"</td> |
| 201 | + <td align="left">cpu | gpu</td> |
| 202 | + </tr> |
| 203 | + </tbody> |
| 204 | +</table> |
| 205 | +<br> |
| 206 | +<br> |