Unify Quantization APIs for add, pool and relu #26335
Conversation
Summary: Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan: python test/test_quantized.py TestQNNPACKOps
"qnnpack_maxpool(): Expected padding to be 2-dimensional: got ", | ||
padding.size()); | ||
|
||
Tensor input_contig = input.contiguous(); |
Do you need a check for the input layout here? How do you know that the input is NHWC contiguous?
I'll make the change similar to #26242 once that lands for all qnnpack ops.
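For context, the distinction the review is raising can be sketched in Python (current API names; note that a plain .contiguous() call only guarantees the default NCHW layout):

import torch

x = torch.randn(1, 3, 8, 8)  # a default (NCHW-contiguous) tensor

# .contiguous() with no arguments returns the default NCHW layout,
# so it does not by itself establish the NHWC ordering QNNPACK expects.
nchw = x.contiguous()
print(nchw.is_contiguous(memory_format=torch.channels_last))  # False

# An explicit layout check/conversion would look like this:
if not x.is_contiguous(memory_format=torch.channels_last):
    x = x.contiguous(memory_format=torch.channels_last)
print(x.is_contiguous(memory_format=torch.channels_last))  # True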
It might well be me, but on raspbian (32bit armv7 hf) and
I do get some torch.uint8 tensor when I do the same with a vanilla build on x86 (i.e. with FBGEMM).
Did you try setting torch.backends.quantized.engine = torch.qnnpack? You can set the same on server side (x86) as well to confirm qnnpack works.
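A minimal sketch of that suggestion (in current releases the engine is spelled as a string, and QNNPACK availability depends on the build):

import torch

# Guard on availability: QNNPACK is only compiled into some builds.
if 'qnnpack' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'qnnpack'

x = torch.randn(1, 3, 4, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

# Quantized ops now dispatch to the selected engine.
qy = torch.relu(qx)
print(qy.dtype)  # torch.quint8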
Indeed, this fixes the example case and seems to return something in the quantized model. Thank you! At some point,
int64_t inW = input.size(3);
// TODO: change it to contiguous(MemoryFormat::ChannelsLast) once a perf
// regression of it is fixed.
Tensor input_contig = input.permute({0, 2, 3, 1}).contiguous();
@dzhulgakov please review this. I changed it to permute the input tensor to NHWC, similar to your changes for qconv.
Yep, it should be fine
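In Python terms, the permute-plus-contiguous in this diff and the channels-last alternative mentioned in the TODO produce the same physical element order (a sketch for illustration, not the PR's code):

import torch

x = torch.randn(2, 3, 8, 8)  # logical NCHW

# What the diff does: materialize an explicit NHWC copy.
nhwc = x.permute(0, 2, 3, 1).contiguous()  # shape (N, H, W, C)

# The TODO's alternative: keep the NCHW shape, store data channels-last.
cl = x.contiguous(memory_format=torch.channels_last)  # shape (N, C, H, W)

# Reading cl's storage in physical order matches the NHWC copy.
physical = torch.as_strided(cl, (cl.numel(),), (1,))
assert torch.equal(nhwc.flatten(), physical)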
Looks good, I'll let Jerry review the details.
Would be really nice to unify the tests too and extend the benchmarks in benchmarks/op_benchmarks to go through both engines.
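Something like the following would exercise both engines (a hypothetical micro-benchmark for illustration, not the op_benchmarks harness):

import time
import torch

x = torch.randn(16, 64, 56, 56)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

for engine in torch.backends.quantized.supported_engines:
    if engine == 'none':
        continue  # 'none' is a fallback entry, not a real backend
    torch.backends.quantized.engine = engine
    start = time.perf_counter()
    for _ in range(100):
        torch.nn.quantized.functional.max_pool2d(qx, kernel_size=2)
    print(f'{engine}: {time.perf_counter() - start:.3f}s')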
LGTM
Summary: Pull Request resolved: pytorch/pytorch#26335
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan: python test/test_quantized.py TestQNNPACKOps
Imported from OSS
Differential Revision: D17504331
fbshipit-source-id: 35cb2189067ac5cc6a7307179ef0335d1cec7b8f
This pull request has been merged in f337459.
Stack from ghstack:
Summary:
Use the backend engine flag to call QNNPACK for quantized ops.
Test Plan:
python test/test_quantized.py TestQNNPACKOps
Differential Revision: D17504331