[quant] Add default symmetric qconfig for qnnpack #74396
Summary:

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.

## Restrictions on weights

The restrictions on weights are:
1. The weight zero point is forced to zero.
2. The 8-bit signed quantized weight values are limited to [-127, +127], excluding the value -128.

This is driven, in part, by the desire to achieve better performance from XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

The qconfig returned by this function allows us to use the faster XNNPACK quantized ops for CPUs, subject to the restrictions above. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by `get_default_qconfig()`) is WIP.

## Updated EPS value

* From PyTorch, eps:

  ```
  >>> import torch
  >>> torch.finfo(torch.float32).eps
  1.1920928955078125e-07
  >>> torch.finfo(torch.float32).eps.hex()
  '0x1.0000000000000p-23'
  ```

  All scale values are float32 and `scale = max(scale, eps)`.

* Requirement from XNNPACK: for both the fp32 and the rndnu requantization schemes,

  ```
  0x1p-32 <= requantization_scale < 256.0
  ```

  where `requantization_scale = (input_scale * kernel_scale) / output_scale`.

* New minimum allowed scale value: with the current float32 eps (= 0x1p-23) as the minimum, the XNNPACK lower bound is the problem. We have not observed upper-bound issues so far when assuming a max scale value of 256. Focusing on the lower bound, to conservatively cover all possible requantization values, the minimum possible scale value must satisfy:

  ```
  minimum_requantization_value = xnnpack_lower_threshold
  input_scale * kernel_scale / output_scale = 0x1p-32
  min_scale_value * min_scale_value / max_scale_value = 0x1p-32
  min_scale_value * new_eps / 256 = 0x1p-32
  min_scale_value**2 = 0x1p-24
  min_scale_value = 0x1p-12
  ```

  With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on the requantization scale imposed by the XNNPACK kernels. This worst case is very unlikely in practice, so we could probably get away with a much smaller EPS than `0x1p-12`, but it is not easy to choose a smaller value empirically.

* The impact on accuracy is unclear as of this writing.

Reviewed By: kimishpatel

Differential Revision: D34625300

fbshipit-source-id: f8ddea2ec3c2d31aae03096d8851e9893344a0fc
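For illustration, here is a minimal sketch of how a qconfig with these properties could be assembled from the standard observer APIs. The specific observer classes and arguments below are assumptions for illustration; the actual implementation of `default_symmetric_qnnpack_qconfig` may use different observers or defaults.

```python
import torch
from torch.ao.quantization import QConfig, HistogramObserver, MinMaxObserver

# Sketch only: signed 8-bit activations, symmetric signed 8-bit weights
# restricted to [-127, +127], and eps raised so that scale >= 2 ** -12.
activation_observer = HistogramObserver.with_args(
    dtype=torch.qint8,   # signed activations
    reduce_range=False,
    eps=2 ** -12,        # scale = max(scale, eps) >= 0x1p-12
)
weight_observer = MinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,  # zero point forced to 0
    quant_min=-127,      # exclude -128, per the XNNPACK restriction
    quant_max=127,
    eps=2 ** -12,
)
symmetric_qnnpack_qconfig = QConfig(
    activation=activation_observer,
    weight=weight_observer,
)
```

A per_channel variant would swap the weight observer for `PerChannelMinMaxObserver` with `qscheme=torch.per_channel_symmetric` and the same `quant_min`/`quant_max`/`eps`.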
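As a quick sanity check on the derivation above, the worst-case requantization scale at the new minimum lands exactly on the XNNPACK lower threshold (plain Python, written with `2.0 ** -12` since Python has no hex float literals):

```python
min_scale = 2.0 ** -12                 # proposed new eps (0x1p-12)
max_scale = 256.0                      # assumed upper bound on scale values
xnnpack_lower_threshold = 2.0 ** -32   # 0x1p-32

# Worst case: smallest possible input and kernel scales, largest output scale.
requantization_scale = (min_scale * min_scale) / max_scale
assert requantization_scale == xnnpack_lower_threshold
print(requantization_scale.hex())      # '0x1.0000000000000p-32'
```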
**Summary:** This commit enforces the following constraints on the QNNPACK BackendConfig:

- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight

These constraints will enable users to use this BackendConfig with the faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of #74396.

Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric.

Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well.

**Test Plan:** `python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config`

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Pull Request resolved: #85863

Approved by: https://github.com/jerryzh168
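For reference, a minimal sketch of how such constraints can be expressed with `DTypeWithConstraints` inside a `DTypeConfig`. The field names follow the bullet list above, but the surrounding BackendConfig wiring is elided, so treat this as an assumed fragment rather than the exact code in `get_qnnpack_backend_config()`.

```python
import torch
from torch.ao.quantization.backend_config import DTypeConfig, DTypeWithConstraints

# qint8 activations: only the minimum-scale constraint applies.
act_qint8 = DTypeWithConstraints(
    dtype=torch.qint8,
    scale_min_lower_bound=2 ** -12,
)

# qint8 weight: restricted range (excluding -128) plus the minimum-scale constraint.
weight_qint8 = DTypeWithConstraints(
    dtype=torch.qint8,
    quant_min_lower_bound=-127,
    quant_max_upper_bound=127,
    scale_min_lower_bound=2 ** -12,
)

# DTypeConfig for weighted ops such as linear and conv.
weighted_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=act_qint8,
    output_dtype=act_qint8,
    weight_dtype=weight_qint8,
    bias_dtype=torch.float,
)
```

During prepare, a QConfig whose settings fall outside these bounds would be expected not to match this DTypeConfig, leaving the corresponding pattern unquantized.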