Re-quantize some models from per_channel mode to per_tensor mode (#90)

WanliZhong · web-flow · commit df8b973a5b3d · 2023-01-06T14:17:39.000+08:00
* re-quantize some models from per_channel mode to per_tensor mode

* remove the description about per_channel
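
For readers unfamiliar with the two granularities, the following is a minimal pure-Python sketch of asymmetric min-max int8 quantization, the combination the `model_wise` settings in this commit select (`scheme: asym`, `algorithm: minmax`). It is not the code Intel's neural compressor runs; all function names and the sample weights are hypothetical and chosen only for illustration. Per-tensor mode shares one (scale, zero_point) pair across the whole weight tensor, while per-channel mode computes one pair per output channel.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed int8 range


def minmax_asym_params(values: List[float]) -> Tuple[float, int]:
    """Return (scale, zero_point) from the min/max of `values` (asymmetric scheme)."""
    lo = min(min(values), 0.0)  # include 0.0 so it maps exactly to an integer
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # fall back to 1.0 for an all-zero input
    zero_point = round(QMIN - lo / scale)
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to int8, clamping to the representable range."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


def per_tensor_params(weights: List[List[float]]) -> List[Tuple[float, int]]:
    """One (scale, zero_point) pair shared by every channel."""
    flat = [v for row in weights for v in row]
    return [minmax_asym_params(flat)] * len(weights)


def per_channel_params(weights: List[List[float]]) -> List[Tuple[float, int]]:
    """One (scale, zero_point) pair per channel (row)."""
    return [minmax_asym_params(row) for row in weights]


def max_abs_error(row: List[float], scale: float, zero_point: int) -> float:
    """Largest round-trip error for one channel under the given parameters."""
    restored = dequantize(quantize(row, scale, zero_point), scale, zero_point)
    return max(abs(a - b) for a, b in zip(row, restored))


# Hypothetical weights: channel 0 has a wide range, channel 1 a narrow one.
weights = [[-1.0, 0.5, 1.0], [-0.01, 0.0, 0.02]]
pt = per_tensor_params(weights)
pc = per_channel_params(weights)
err_pt = max_abs_error(weights[1], *pt[1])
err_pc = max_abs_error(weights[1], *pc[1])
print(f"narrow channel error: per_tensor={err_pt:.6f}, per_channel={err_pc:.6f}")
```

In this toy case the narrow channel is reconstructed more precisely with its own parameters, which matches the accuracy note removed from the model READMEs. The shared parameters are what allow the faster NPU execution reported in the updated benchmark table.
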
diff --git a/README.md b/README.md
@@ -19,7 +19,7 @@ Guidelines:
 | ---------------------------------------------------- | ----------------------------- | ---------- | -------------- | ------------ | --------------- | ------------ | ----------- |
 | [YuNet](./models/face_detection_yunet)                  | Face Detection                | 160x120    | 1.45           | 6.22         | 12.18           | 4.04         | 86.69       |
 | [SFace](./models/face_recognition_sface)                | Face Recognition              | 112x112    | 8.65           | 99.20        | 24.88           | 46.25        | ---         |
-| [LPD-YuNet](./models/license_plate_detection_yunet/)    | License Plate Detection       | 320x240    | ---            | 168.03       | 56.12           | 154.20\*     |             |
+| [LPD-YuNet](./models/license_plate_detection_yunet/)    | License Plate Detection       | 320x240    | ---            | 168.03       | 56.12           | 29.53        |             |
 | [DB-IC15](./models/text_detection_db)                   | Text Detection                | 640x480    | 142.91         | 2835.91      | 208.41          | ---          | ---         |
 | [DB-TD500](./models/text_detection_db)                  | Text Detection                | 640x480    | 142.91         | 2841.71      | 210.51          | ---          | ---         |
 | [CRNN-EN](./models/text_recognition_crnn)               | Text Recognition              | 100x32     | 50.21          | 234.32       | 196.15          | 125.30       | ---         |
@@ -31,8 +31,8 @@ Guidelines:
 | [WeChatQRCode](./models/qrcode_wechatqrcode)            | QR Code Detection and Parsing | 100x100    | 7.04           | 37.68        | ---             | ---          | ---         |
 | [DaSiamRPN](./models/object_tracking_dasiamrpn)         | Object Tracking               | 1280x720   | 36.15          | 705.48       | 76.82           | ---          | ---         |
 | [YoutuReID](./models/person_reid_youtureid)             | Person Re-Identification      | 128x256    | 35.81          | 521.98       | 90.07           | 44.61        | ---         |
-| [MP-PalmDet](./models/palm_detection_mediapipe)         | Palm Detection                | 256x256    | 15.57          | 168.37       | 50.64           | 145.56\*     | ---         |
-| [MP-HandPose](./models/handpose_estimation_mediapipe)   | Hand Pose Estimation          | 256x256    | 20.16          | 148.24       | 156.30          | 663.77\*     | ---         |
+| [MP-PalmDet](./models/palm_detection_mediapipe)         | Palm Detection                | 256x256    | 15.57          | 168.37       | 50.64           | 62.45        | ---         |
+| [MP-HandPose](./models/handpose_estimation_mediapipe)   | Hand Pose Estimation          | 256x256    | 20.16          | 148.24       | 156.30          | 42.70        | ---         |
 
 \*: Models are quantized in per-channel mode, which run slower than per-tensor quantized models on NPU.
 
diff --git a/models/handpose_estimation_mediapipe/README.md b/models/handpose_estimation_mediapipe/README.md
@@ -9,8 +9,6 @@ This model is converted from Tensorflow-JS to ONNX using following tools:
 - tf_saved_model to ONNX: https://github.com/onnx/tensorflow-onnx
 - simplified by [onnx-simplifier](https://github.com/daquexian/onnx-simplifier)
 
-Also note that the model is quantized in per-channel mode with [Intel's neural compressor](https://github.com/intel/neural-compressor), which gives better accuracy but may lose some speed.
-
 ## Demo
 
 Run the following commands to try the demo:
diff --git a/models/handpose_estimation_mediapipe/handpose_estimation_mediapipe_2022may-int8-quantized.onnx b/models/handpose_estimation_mediapipe/handpose_estimation_mediapipe_2022may-int8-quantized.onnx
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:05a0cc7d3f4dfa135795173c2458f5ac01c8c93e16596b1ac02c144e3d236e77
-size 1607095
+oid sha256:2ebaf701aa5f13de101a6d27ae5b1b011201f0b4d177e06d57e4f7f5970e985b
+size 1559235
diff --git a/models/license_plate_detection_yunet/license_plate_detection_lpd_yunet_2022may-int8-quantized.onnx b/models/license_plate_detection_yunet/license_plate_detection_lpd_yunet_2022may-int8-quantized.onnx
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:26c4769e86df6a079f538f9daf4a9c7b1f386dccab9bd1292c75fdf9a37ff240
-size 1129229
+oid sha256:933f8332152718d2b7b30ba40cf07fbbb4099d7ecc3709708862e3d36b5661a9
+size 1087947
diff --git a/models/palm_detection_mediapipe/README.md b/models/palm_detection_mediapipe/README.md
@@ -6,8 +6,6 @@ This model detects palm bounding boxes and palm landmarks, and is converted from
 - tf_saved_model to ONNX: https://github.com/onnx/tensorflow-onnx
 - simplified by [onnx-simplifier](https://github.com/daquexian/onnx-simplifier)
 
-Also note that the model is quantized in per-channel mode with [Intel's neural compressor](https://github.com/intel/neural-compressor), which gives better accuracy but may lose some speed.
-
 ## Demo
 
 Run the following commands to try the demo:
diff --git a/models/palm_detection_mediapipe/palm_detection_mediapipe_2022may-int8-quantized.onnx b/models/palm_detection_mediapipe/palm_detection_mediapipe_2022may-int8-quantized.onnx
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e0430ef470cceb826446585a6c7af6911bba58affb416186cb134e6ddc8a76de
-size 3182222
+oid sha256:4f634e62a9f4a838c953c8d25389c340568234c473aff383986aa854ab1b36f4
+size 3120401
diff --git a/tools/quantize/inc_configs/lpd_yunet.yaml b/tools/quantize/inc_configs/lpd_yunet.yaml
@@ -32,6 +32,18 @@ quantization:                                        # optional. tuning constrai
           dtype: float32
           label: True
 
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    weight:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+    activation:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+
 tuning:
   accuracy_criterion:
     relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
diff --git a/tools/quantize/inc_configs/mp_handpose.yaml b/tools/quantize/inc_configs/mp_handpose.yaml
@@ -32,6 +32,18 @@ quantization:                                        # optional. tuning constrai
           dtype: float32
           label: True
 
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    weight:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+    activation:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+
 tuning:
   accuracy_criterion:
     relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.
diff --git a/tools/quantize/inc_configs/mp_palmdet.yaml b/tools/quantize/inc_configs/mp_palmdet.yaml
@@ -32,6 +32,18 @@ quantization:                                        # optional. tuning constrai
           dtype: float32
           label: True
 
+  model_wise:                                        # optional. tuning constraints on model-wise for advance user to reduce tuning space.
+    weight:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+    activation:
+      granularity: per_tensor
+      scheme: asym
+      dtype: int8
+      algorithm: minmax
+
 tuning:
   accuracy_criterion:
     relative:  0.02                                  # optional. default value is relative, other value is absolute. this example allows relative accuracy loss: 1%.