Qualcomm AI Engine Direct - Enable custom operator #8726
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8726
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit de02835 with merge base c6c3616.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, this PR adds support for custom kernels in the QNN backend.
@impl(my_op_lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
I may not know enough details, but can you tell me how the QNN backend knows this custom op can be consumed? Previously I was thinking QNN custom ops would be registered in a specific namespace, but this looks a bit different to me.
Certainly. QNN uses the op package mechanism to create custom ops. Once the op package is prepared, it is registered using qnn_backend_register_op_package. In the op builder, you provide the corresponding op_package_name, qnn_op_type_name, and schema to create a QNN node.
In ExecuTorch, through compile_spec, you pass QnnExecuTorchOpPackageOptions, which includes the op package info and custom_op_name. Using the custom_op_name, such as my_op_lib.mul3.default, a CustomOp builder is created to consume the custom op.
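A minimal sketch of this flow, reusing the mul3 example from the diff; the dict fields below are illustrative stand-ins for QnnExecuTorchOpPackageOptions, not its exact schema:

```python
import torch
from torch.library import Library, impl

# Torch-side registration, mirroring the mul3 example in this PR's diff.
my_op_lib = Library("my_op_lib", "DEF")
my_op_lib.define("mul3(Tensor input) -> Tensor")

@impl(my_op_lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
    return a * 3  # assumed body; the PR's actual example may differ

# Compile-spec side: the real option type is QnnExecuTorchOpPackageOptions;
# these dict fields are illustrative stand-ins for its schema.
op_package_options = {
    "op_package_name": "MyOpPackage",            # registered via qnn_backend_register_op_package
    "qnn_op_type_name": "Mul3",                  # QNN node type provided by the package
    "custom_op_name": "my_op_lib.mul3.default",  # selects the CustomOp builder for this node
}
```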
Sorry, a bit late on this; I need some time to read it.
Interesting approach: the custom op is authored as a torch custom op and then lowered through ET, with annotation and partitioning driven through compile_spec.
Curious, did you consider other design choices?
For example, if someone were to use a custom op, they have to write all this. An alternative would be to take the set of "already partitioned" nodes and convert them into a custom op package in the preprocess function.
I have more questions but I haven't read the PR in detail yet :(
> For example, if someone were to use a custom op, they have to write all this. An alternative would be to take the set of "already partitioned" nodes and convert them into a custom op package in the preprocess function.

Let me clarify your point about "already partitioned" nodes. Do you mean defining a namespace for QNN custom ops, returning True for that namespace during partitioning, and then creating the corresponding builder during the preprocess stage?
If so, my initial design choice to create the custom op builder during partitioning was because QNN provides an op validation API. When dealing with custom ops, users can implement how to validate the op within the op package. Therefore, I chose to create the custom op builder during partitioning so the op can be validated there.
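Roughly, that design choice might look like this hypothetical sketch, where validate() stands in for the op package's validation API:

```python
# Hypothetical sketch of partition-time validation: the custom-op builder
# is created early so the op package's validation hook can accept or
# reject the node while partitioning, rather than later in preprocess.
def is_node_supported(node, custom_op_builders) -> bool:
    builder = custom_op_builders.get(node.target.__name__)
    if builder is None:
        return False  # no op package info was supplied for this op
    # validate() stands in for QNN's op validation API exposed by the package.
    return builder.validate(node)
```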
> I have more questions but I haven't read the PR in detail yet :(

Feel free to ask them whenever you're ready. :)
Can we have HTP x86 simulator support to make the test?
> Can we have HTP x86 simulator support to make the test?

Sure. Let me add it.
The README is nice, I will try to reproduce this. Thanks!
Force-pushed from 064d12f to 5f65810
Apologies, this is still pending. I will also help review this next week.
Will finish the review by the end of this week, thank you for being patient.
@@ -44,7 +49,11 @@ def __init__(
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
    ):
        self.node_visitors = node_visitor.get_node_visitors(edge_program)
        python_options = flatbuffer_to_option(compiler_specs[0].value)
What is python_options, and why do we need it?
If there is a custom op that users wrote for QNN and they failed to tag it, are we able to error out?
These python options are used to create the custom op builder. If the user didn't add the custom op package info into compile_spec and the graph contains a custom op, an error will happen here:
op_wrapper = self.node_visitors[node.target.__name__].define_node(
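To make that failure mode concrete, a hypothetical fragment in the context of the quoted method; the error message is illustrative, not the PR's actual one:

```python
# Node visitors are built from the compile spec's op package options, so a
# custom op with no matching package info has no visitor and the lookup fails.
visitor = self.node_visitors.get(node.target.__name__)
if visitor is None:
    raise KeyError(
        f"No op builder for {node.target.__name__}; was the op package "
        "info passed through compile_spec?"
    )
op_wrapper = visitor.define_node(node)  # argument list abbreviated
```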
Finally got time to finish reading; a couple of questions (can be follow-ups as well).
Approving, as I generally like the flexibility of this approach. The only thing I'm worried about is the user experience, and I would like us to improve it.
Originally, our thought on supporting custom ops was to force users to register the custom ops under a specific namespace, like a qnn namespace. It's less flexible, but easier to debug, given that users are required to write the QNN op package anyway for QNN custom ops. It's similar to how you support the AI Hub models. What do you think?

That's fine with me. What are your thoughts if I add a check for the namespace based on the current design?
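For what it's worth, such a namespace check could be as small as this hypothetical sketch (the "qnn" namespace name is an assumption, not fixed by this PR):

```python
import torch

QNN_CUSTOM_OP_NAMESPACE = "qnn"  # assumed convention for QNN custom ops

def is_qnn_custom_op(node: torch.fx.Node) -> bool:
    # OpOverload targets expose the namespace they were registered under,
    # so ops outside the agreed namespace can be rejected (or error out) early.
    target = node.target
    return (
        isinstance(target, torch._ops.OpOverload)
        and target.namespace == QNN_CUSTOM_OP_NAMESPACE
    )
```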
Force-pushed from 5f65810 to eaf8aa6
@cccclai I have rebased this PR. Thanks!
Force-pushed from 5b04add to 3bbb5c0
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The CI is currently failing because we pin to 338393f8 in flatbuffers internally, which is different from the one in ExecuTorch open source... https://github.com/google/flatbuffers/commits/338393f8/
Thanks for enabling x86 in the latest commit. Would you be able to help fix the issue due to the flatbuffers version mismatch? We may need to downgrade the flatbuffers version in open source, unfortunately.
Let me make sure I'm clear on the following.
++ @haowhsu-quic for awareness
The mismatch issue is an existing issue, and this PR uses a newer flatbuffers feature, causing the build to fail when compiling with the old flatbuffers version. If you downgrade flatbuffers to this older version (https://github.com/google/flatbuffers/commits/338393f8/), the error message is like the following:
When I check out 338393f8 in flatbuffers, I get the following error with ./backends/qualcomm/scripts/build.sh.

Error log (truncated):
Error while generating /local3/mnt/workspace/shewu/executorch/executorch_shewu/executorch/build-android/executorch_srcs.cmake. Exit code: 1
Output: Error: Traceback (most recent call last): ...
The above exception was the direct cause of the following exception: Traceback (most recent call last): ...
Caused by: headers such as "flatbuffers/include/flatbuffers/buffer_ref.h", "flatbuffers/include/flatbuffers/code_generator.h", "flatbuffers/include/flatbuffers/default_allocator.h", "flatbuffers/include/flatbuffers/detached_buffer.h", "flatbuffers/include/flatbuffers/file_manager.h", ...
CMake Error at tools/cmake/Utils.cmake:109 (message):
-- Configuring incomplete, errors occurred!
Whoops, I got it. That commit is very different from the current one; it is missing some header files...
Force-pushed from eff8673 to a22cdd2
It seems flatbuffers::Vector only takes one template argument in that version. I've made the adjustments. Could you please try again?
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
It seems like the flatbuffers::Vector issue is resolved now.
But there is a different issue...

Error log:
from executorch.backends.qualcomm.serialization.qc_schema import (
  File "/data/sandcastle/boxes/eden-trunk-hg-full-fbsource/buck-out/v2/gen/fbcode/192ecd433d810457/executorch/examples/models/llama/fb/__test_llama_qnn__/test_llama_qnn#link-tree/executorch/backends/qualcomm/serialization/qc_schema.py", line 178, in <module>
    @dataclass
     ^^^^^^^^^
  File "/usr/local/fbcode/platform010/lib/python3.12/dataclasses.py", line 1275, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/usr/local/fbcode/platform010/lib/python3.12/dataclasses.py", line 1265, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/fbcode/platform010/lib/python3.12/dataclasses.py", line 994, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/fbcode/platform010/lib/python3.12/dataclasses.py", line 852, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default for field op_package_options is not allowed: use default_factory
Force-pushed from a22cdd2 to f28ab0e
It seems that a dataclass cannot use a mutable value as a class attribute default.
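For reference, a minimal sketch of the problem and the standard default_factory fix; the surrounding class here is a stand-in for the real qc_schema definition:

```python
from dataclasses import dataclass, field

@dataclass
class QnnExecuTorchOpPackageOptions:
    op_package_name: str = ""  # illustrative field only

@dataclass
class QnnExecuTorchOptions:  # stand-in for the dataclass at qc_schema.py line 178
    # A class-level default like `= QnnExecuTorchOpPackageOptions()` is
    # unhashable (dataclasses with eq=True set __hash__ to None), so dataclasses
    # rejects it with "mutable default ... use default_factory";
    # default_factory builds a fresh instance per object instead.
    op_package_options: QnnExecuTorchOpPackageOptions = field(
        default_factory=QnnExecuTorchOpPackageOptions
    )
```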
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hmm, it seems like importing failed; can you rebase?
Force-pushed from f28ab0e to 655e416
Done. Thanks :)
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hmm, our infra was a bit flaky last week; can you rebase again? It should have recovered now.
Summary:
- Support registering op packages in the QNN backend
- Add an example script to run a torch custom op with a QNN op package
- Allow an op package to override a torch built-in operator
- Add an op package example
- Move test_custom_op to TestUtilScript
- Modify the dlopen flags for the QNN library
- Generate the custom op based on the meta and _schema.arguments of torch.fx.Node (see the sketch below)
- Add a README for the custom op
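A rough, hypothetical sketch of that generation bullet, assuming the usual export metadata (node.meta["val"] carrying a FakeTensor, and the OpOverload schema); the helper name and return shape are illustrative:

```python
import torch

def describe_custom_op(node: torch.fx.Node):
    # node.meta["val"] carries the output shape/dtype, and OpOverload
    # targets expose their argument schema via _schema.arguments.
    out = node.meta["val"]
    arg_names = [arg.name for arg in node.target._schema.arguments]
    return node.target.__name__, arg_names, tuple(out.shape), out.dtype
```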
Force-pushed from 655e416 to de02835
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
Reproduce commands: