Update on "[Quant] Enable XNNPACK ops in QNNPACK BackendConfig"

andrewor14 · andrewor14 · commit a7f62c5db999 · 2022-09-29T15:20:17.000-07:00
**Summary:** This commit enforces the following constraints on the QNNPACK BackendConfig:

- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight

These constraints will enable users to use this BackendConfig with faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of #74396.

Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric.

Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well.

**Test Plan:**
python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

[ghstack-poisoned]
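For illustration only, here is a minimal, torch-free sketch of how bounds like the three listed in the summary could be checked against a quantization setting. The names `Constraints`, `ObserverSettings`, and `satisfies` are hypothetical and are not the PyTorch API. Representing the smallest allowed scale as an `eps` field is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Constraints:
    """Hypothetical container for the bounds named in the summary."""
    quant_min_lower_bound: Optional[int] = None
    quant_max_upper_bound: Optional[int] = None
    scale_min_lower_bound: Optional[float] = None


@dataclass(frozen=True)
class ObserverSettings:
    """Hypothetical stand-in for the relevant observer settings."""
    quant_min: int
    quant_max: int
    eps: float  # assumed to be the smallest scale the observer can produce


def satisfies(settings: ObserverSettings, constraints: Constraints) -> bool:
    """Return True if the settings stay within every bound that is set."""
    c = constraints
    if c.quant_min_lower_bound is not None and settings.quant_min < c.quant_min_lower_bound:
        return False
    if c.quant_max_upper_bound is not None and settings.quant_max > c.quant_max_upper_bound:
        return False
    if c.scale_min_lower_bound is not None and settings.eps < c.scale_min_lower_bound:
        return False
    return True


# Bounds from the summary for qint8 weights.
QINT8_WEIGHT = Constraints(
    quant_min_lower_bound=-127,
    quant_max_upper_bound=127,
    scale_min_lower_bound=2 ** -12,
)

symmetric = ObserverSettings(quant_min=-127, quant_max=127, eps=2 ** -12)
full_range = ObserverSettings(quant_min=-128, quant_max=127, eps=2 ** -12)
tiny_scale = ObserverSettings(quant_min=-127, quant_max=127, eps=2 ** -20)

print(satisfies(symmetric, QINT8_WEIGHT))   # within all three bounds
print(satisfies(full_range, QINT8_WEIGHT))  # quant_min of -128 is below -127
print(satisfies(tiny_scale, QINT8_WEIGHT))  # eps is below 2 ** -12
```

In this sketch a bound left as `None` is treated as unconstrained, so a setting is rejected only when it violates a bound that was explicitly set.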
diff --git a/torch/ao/quantization/fx/utils.py b/torch/ao/quantization/fx/utils.py
@@ -964,7 +964,7 @@ def _get_observer_from_activation_post_process(
     if isinstance(activation_post_process, ObserverBase):
         return activation_post_process
     else:
-        return activation_post_process.activation_post_process
+        return activation_post_process.activation_post_process  # type: ignore[return-value]
 
 def _qconfig_satisfies_dtype_config_constraints(
         qconfig: QConfigAny,