[quant] Add default symmetric qconfig for qnnpack #74396
Summary:

# New qconfig `default_symmetric_qnnpack_qconfig`

Returns a qconfig with signed activations and symmetric weights with range restrictions. Also adds a per_channel variant of the same.

## Restrictions on weights

The restrictions on weights are:
1. The weight zero point is forced to zero.
2. The 8-bit signed quantized weight values are limited to [-127, +127], excluding the value -128.

This is driven, in part, by the desire to achieve better performance from XNNPACK ops.

## qengine/backend = `qnnpack` and XNNPACK ops

The qconfig returned by this function allows us to use the faster XNNPACK quantized ops for CPUs, subject to the restrictions above. Although we are using XNNPACK ops, the qengine is still `qnnpack`, and there are no plans to introduce a new qengine for XNNPACK ops. Support for using XNNPACK ops with the asymmetric qconfig (returned by `get_default_qconfig()`) is WIP.

## Updated EPS value

* From PyTorch, eps:

  ```
  >>> import torch
  >>> torch.finfo(torch.float32).eps
  1.1920928955078125e-07
  >>> torch.finfo(torch.float32).eps.hex()
  '0x1.0000000000000p-23'
  ```

  All scale values are float32 and `scale = max(scale, eps)`.

* Requirement from XNNPACK: for both the fp32 and the rndnu requantization schemes,

  ```
  0x1p-32 <= requantization_scale < 256.0
  ```

  where `requantization_scale = (input_scale * kernel_scale) / output_scale`.

* New minimum allowed scale value: with the current float32 eps (= 0x1p-23) as the minimum, the XNNPACK lower bound is the problem. We have not observed upper-bound issues so far when assuming a max scale value of 256. Focusing on the lower bound, to conservatively cover all possible requantization values, the minimum possible scale value must satisfy:

  ```
  minimum_requantization_value = xnnpack_lower_threshold
  input_scale * kernel_scale / output_scale = 0x1p-32
  min_scale_value * min_scale_value / max_scale_value = 0x1p-32
  min_scale_value * new_eps / 256 = 0x1p-32
  min_scale_value**2 = 0x1p-24
  min_scale_value = 0x1p-12
  ```

  With `scale_value >= 0x1p-12`, we should be able to avoid the lower threshold on the requantization scale imposed by the XNNPACK kernels. This worst case is very unlikely in practice, so we could probably get away with a much smaller EPS than `0x1p-12`, but it is not easy to choose a smaller value empirically.

* The impact on accuracy is unclear as of this writing.

Reviewed By: kimishpatel

Differential Revision: D34625300

fbshipit-source-id: f8ddea2ec3c2d31aae03096d8851e9893344a0fc
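For illustration, here is a minimal sketch of how a qconfig with these properties could be assembled from the standard observer APIs. The specific observer classes and arguments below are assumptions for illustration; the actual implementation of `default_symmetric_qnnpack_qconfig` may use different observers or defaults.

```python
import torch
from torch.ao.quantization import QConfig, HistogramObserver, MinMaxObserver

# Sketch only: signed 8-bit activations, symmetric signed 8-bit weights
# restricted to [-127, +127], and eps raised so that scale >= 2 ** -12.
activation_observer = HistogramObserver.with_args(
    dtype=torch.qint8,   # signed activations
    reduce_range=False,
    eps=2 ** -12,        # scale = max(scale, eps) >= 0x1p-12
)
weight_observer = MinMaxObserver.with_args(
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,  # zero point forced to 0
    quant_min=-127,      # exclude -128, per the XNNPACK restriction
    quant_max=127,
    eps=2 ** -12,
)
symmetric_qnnpack_qconfig = QConfig(
    activation=activation_observer,
    weight=weight_observer,
)
```

A per_channel variant would swap the weight observer for `PerChannelMinMaxObserver` with `qscheme=torch.per_channel_symmetric` and the same `quant_min`/`quant_max`/`eps`.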
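As a quick sanity check on the derivation above, the worst-case requantization scale at the new minimum lands exactly on the XNNPACK lower threshold (plain Python, written with `2.0 ** -12` since Python has no hex float literals):

```python
min_scale = 2.0 ** -12                 # proposed new eps (0x1p-12)
max_scale = 256.0                      # assumed upper bound on scale values
xnnpack_lower_threshold = 2.0 ** -32   # 0x1p-32

# Worst case: smallest possible input and kernel scales, largest output scale.
requantization_scale = (min_scale * min_scale) / max_scale
assert requantization_scale == xnnpack_lower_threshold
print(requantization_scale.hex())      # '0x1.0000000000000p-32'
```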
**Summary:** This commit enforces the following constraints on the QNNPACK BackendConfig:

- `quant_min_lower_bound` = -127 for qint8 weight
- `quant_max_upper_bound` = 127 for qint8 weight
- `scale_min_lower_bound` = 2 ** -12 for qint8 activations and weight

These constraints will enable users to use this BackendConfig with the faster XNNPACK quantized ops. They are also consistent with the existing settings in `default_symmetric_qnnpack_qconfig` and its per_channel and QAT variants. For more detail on why these exact values were chosen, please see the description of #74396.

Note that there are currently no restrictions on the qscheme in DTypeConfig. This should be added in the future to further enforce the restriction that the weights must be quantized with either per_tensor_symmetric or per_channel_symmetric.

Existing default QConfigs such as `get_default_qconfig("qnnpack")` and `get_default_qat_qconfig("qnnpack")` will continue to be supported, but only for the existing dtypes, e.g. quint8 activations for weighted ops like linear and conv. In the future, we should revisit whether to enable XNNPACK ops using these QConfigs as well.

**Test Plan:** `python test/test_quantization.py TestQuantizeFx.test_qnnpack_backend_config`

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Pull Request resolved: #85863

Approved by: https://github.com/jerryzh168
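For reference, a minimal sketch of how such constraints can be expressed with `DTypeWithConstraints` inside a `DTypeConfig`. The field names follow the bullet list above, but the surrounding BackendConfig wiring is elided, so treat this as an assumed fragment rather than the exact code in `get_qnnpack_backend_config()`.

```python
import torch
from torch.ao.quantization.backend_config import DTypeConfig, DTypeWithConstraints

# qint8 activations: only the minimum-scale constraint applies.
act_qint8 = DTypeWithConstraints(
    dtype=torch.qint8,
    scale_min_lower_bound=2 ** -12,
)

# qint8 weight: restricted range (excluding -128) plus the minimum-scale constraint.
weight_qint8 = DTypeWithConstraints(
    dtype=torch.qint8,
    quant_min_lower_bound=-127,
    quant_max_upper_bound=127,
    scale_min_lower_bound=2 ** -12,
)

# DTypeConfig for weighted ops such as linear and conv.
weighted_op_qint8_symmetric_dtype_config = DTypeConfig(
    input_dtype=act_qint8,
    output_dtype=act_qint8,
    weight_dtype=weight_qint8,
    bias_dtype=torch.float,
)
```

During prepare, a QConfig whose settings fall outside these bounds would be expected not to match this DTypeConfig, leaving the corresponding pattern unquantized.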