
Add aot example with Neutron Backend #10871


Draft: wants to merge 1 commit into main from upstream/release-mcux-25.03-full/aot-example

Conversation

robert-kalmar
Collaborator

@robert-kalmar commented May 14, 2025

Summary

This PR adds an AoT example with the eIQ Neutron Backend. The backend is demonstrated on a tiny CNN model named CifarNet, trained on the CIFAR-10 dataset, which is part of the PR.
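For orientation, below is a minimal sketch of the ExecuTorch AoT lowering flow such an example typically follows, using the public export API. The stand-in model and the commented-out Neutron partitioner step are assumptions for illustration, not code from this PR.

```python
import torch
from executorch.exir import to_edge

# Stand-in CNN with a CIFAR-10 sized input; the PR's actual CifarNet differs.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 32 * 32, 10),
).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

exported = torch.export.export(model, example_inputs, strict=True)  # 1. capture
edge = to_edge(exported)                                            # 2. Edge dialect
# edge = edge.to_backend(NeutronPartitioner(...))  # 3. delegate to Neutron (name assumed)
exec_prog = edge.to_executorch()                    # 4. emit the ExecuTorch program

with open("cifarnet.pte", "wb") as f:
    f.write(exec_prog.buffer)  # serialize the PTE consumed by the runtime
```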

Test plan

Manual testing: executing the example based on the steps in the Readme.md and validating the PTE on the i.MX RT700 platform with the Neutron Backend runtime.

Resolves #10898

cc @digantdesai, @JakeStevens, @skywall, @jirioc


pytorch-bot bot commented May 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10871

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit c7d4b49 with merge base 77f16dc:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on May 14, 2025
@robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from e5ed112 to 46c2a58 on May 14, 2025 11:56
@robert-kalmar
Collaborator Author

robert-kalmar commented May 14, 2025

@pytorchbot label "module: nxp" "release notes: nxp"


pytorch-bot bot commented May 14, 2025

Didn't find following labels among repository labels: ,,label

@pytorch-bot added the module: nxp (Issues related to NXP Neutron NPU delegation and code under backends/nxp/) and release notes: nxp (Changes to the NXP Neutron backend delegate) labels on May 14, 2025
    action="store_true",
    required=False,
    default=False,
    help="Flag for producing ArmBackend delegated model",
Contributor

Suggested change:
- help="Flag for producing ArmBackend delegated model",
+ help="Flag for producing NeutronBackend delegated model",


model, example_inputs, strict=True
)

# TODO: Add Neutron ATen Passes, once https://github.com/pytorch/executorch/pull/10579 is merged
Contributor

nit: file a task so we can track and not lose this

Collaborator Author

@robert-kalmar May 15, 2025

#10898

Contributor

#10579 is now merged!

"_portable_lib.cpython* using --portable_lib CLI options. \n"
"This is required for running quantized models with unquantized input."
)
sys.exit(-1)
Contributor

Can you either (1) drop the sys.exit entirely and let it fail loudly later when it hits the runtime exception, or (2) add a CLI arg to allow skipping this part, and the part below with the torch.ops.load_library calls?

In internal infra, these libraries are loaded a slightly different way: I do not actually pass the .so on the command line, and it is not loaded a few lines below.
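A minimal sketch of option (2), assuming a hypothetical --skip_lib_load flag (the flag name is mine, not from the PR); the two torch.ops.load_library calls mirror the ones discussed later in this thread:

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--portable_lib", required=False)
parser.add_argument("-s", "--so_library", required=False)
parser.add_argument(
    "--skip_lib_load",  # hypothetical flag, for infra that registers kernels differently
    action="store_true",
    help="Skip explicit .so loading; assume kernels are already registered",
)
args = parser.parse_args()

if not args.skip_lib_load:
    # Load order matters: _portable_lib must be loaded before the quantized ops
    # lib, since dlopen does not resolve that dependency on its own (see below).
    torch.ops.load_library(args.portable_lib)
    torch.ops.load_library(args.so_library)
```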

Collaborator Author

@robert-kalmar May 15, 2025


OK, so I reverted back to our original solution. Only a warning is raised, and it normally fails later when exporting to the ExecuTorch program:

# 6. Export to ExecuTorch program
try:
    exec_prog = edge_program.to_executorch(
        config=ExecutorchBackendConfig(extract_delegate_segments=False)
    )
except RuntimeError as e:
    if "Missing out variants" in str(e.args[0]):
        raise RuntimeError(
            e.args[0]
            + ".\nThis is likely due to an external .so library not being loaded. Supply a path to it with the "
            "--portable_lib flag."
        ).with_traceback(e.__traceback__) from None
    else:
        raise e

x = self.conv3(x)
x = self.pool2(x)

# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
Contributor

Suggested change:
- # The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
+ # The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in Neutron IR). When running


x = self.pool2(x)

# The output of the previous MaxPool has shape [batch, 64, 4, 4] ([batch, 4, 4, 64] in TFLite). When running
# inference of the `FullyConnected`, TFlite will automatically collapse the channels and spatial dimensions and
Contributor

Suggested change:
- # inference of the `FullyConnected`, TFlite will automatically collapse the channels and spatial dimensions and
+ # inference of the `FullyConnected`, Neutron IR will automatically collapse the channels and spatial dimensions and
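To make the comment concrete, here is a small self-contained check (shapes taken from the comment above) showing that flattening channels-first data directly yields a different feature order than permuting to channels-last first, which is why the dimensions must be collapsed consistently before the `FullyConnected`:

```python
import torch

# [batch, channels, height, width], as produced by the MaxPool in PyTorch.
x = torch.arange(2 * 64 * 4 * 4, dtype=torch.float32).reshape(2, 64, 4, 4)

flat_nchw = x.flatten(1)                      # PyTorch order: channels vary slowest
flat_nhwc = x.permute(0, 2, 3, 1).flatten(1)  # channels-last order ([batch, 4, 4, 64])

# Same 1024 features per sample, but in a different order.
print(flat_nchw.shape, torch.equal(flat_nchw, flat_nhwc))  # torch.Size([2, 1024]) False
```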


parser.add_argument(
    "-p",
    "--portable_lib",
    required=True,
Collaborator

This probably shouldn't be required, because the portable library is loaded only when --quantize=True.

Collaborator Author

@robert-kalmar May 15, 2025

✅ Thanks, fixed in latest push.


# For quantization we need to build the quantized_ops_aot_lib.so and _portable_lib.*.so
# Use these CMake options:
# -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON
Collaborator

Is this documentation up to date? Is the portable lib built just by specifying these two flags?

Collaborator Author

The quantized_ops_aot_lib links to portable_lib:

$ ldd ./venv3.10/lib/python3.10/site-packages/executorch/kernels/quantized/libquantized_ops_aot_lib.so
        _portable_lib.cpython-310d-x86_64-linux-gnu.so => not found
        ....

For some reason we must load the portable_lib manually prior to libquantized_ops_aot_lib.so; dlopen does not find it on its own.

Collaborator Author

@robert-kalmar May 23, 2025


FYI @skywall, we do not need any custom library loading for the quantized kernels' out variants. There are already Python packages for this:

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Thanks to @digantdesai for the review items, which helped me figure this out.
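A sketch of how the example can then rely on the packaged kernels (usage assumed from the modules named above): the imports register the kernels as a side effect, so no explicit torch.ops.load_library calls are needed before to_executorch() on a quantized model.

```python
# Importing these modules registers the portable kernels and the quantized
# out-variant kernels with torch as a side effect of the import.
import executorch.extension.pybindings.portable_lib  # noqa: F401
import executorch.kernels.quantized  # noqa: F401

# After these imports, edge_program.to_executorch() should no longer raise
# the "Missing out variants" RuntimeError on quantized graphs.
```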

@robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from 46c2a58 to 2397cb0 on May 15, 2025 11:10
2. After building ExecuTorch you should have `libquantized_ops_aot_lib.so` and `_portable_lib.<python_version>.so` located in the `pip-out/lib` folder. We will need these libraries when generating the quantized CifarNet ExecuTorch model, so as a first step we find them:
$ find . -name "libquantized_ops_aot_lib.so"
./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/kernels/quantized/libquantized_ops_aot_lib.so
Contributor

FYI, I added an optimized Cortex-M q/dq int8 op if you want to use that; it is still quite early days for that lib.

./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/kernels/quantized/libquantized_ops_aot_lib.so

$ find . -name "_portable_lib.cpython-310d-x86_64-linux-gnu.so"
./pip-out/lib.linux-x86_64-cpython-310-pydebug/executorch/extension/pybindings/_portable_lib.cpython-310d-x86_64-linux-gnu.so
Contributor

Is this using selective build?

Collaborator Author

Not sure what you mean.

Collaborator Author

@robert-kalmar May 23, 2025

OK, I understand where you are heading. We needed the quantized_ops_aot_lib to get the out variants for the quantize/dequantize_per_tensor operators.
I found there are already Python bindings and modules to solve this:

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Comment on lines 255 to 256
torch.ops.load_library(args.portable_lib)
torch.ops.load_library(args.so_library)
Contributor

Why do we need these? Just import the Python modules, perhaps?

Collaborator Author

You are right (obviously), we don't. Importing the Python modules instead:

import executorch.extension.pybindings.portable_lib
import executorch.kernels.quantized 

Thanks for the finding; it helped me locate these Python modules.

logger.info(f"Using pre-trained weights from `{state_dict_file}`.")
cifar_net.load_state_dict(torch.load(state_dict_file, weights_only=True))

if train:
Contributor

Nice!

Contributor

@digantdesai left a comment

Looks great. Thanks.

@digantdesai
Contributor

Ready to merge? Fix linter please?

@robert-kalmar
Collaborator Author

> Ready to merge? Fix linter please?

Not yet; updating the quantizer to the recent changes: moving from torch.ao to torchao.
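For reference, a hedged sketch of the pt2e quantization flow after that move; the torchao import path follows the post-#10294 convention as I understand it, and the NeutronQuantizer import path is an assumption, not taken from this PR.

```python
import torch
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

# Import path assumed; the quantizer lives under backends/nxp in this PR.
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer

# Stand-in model and input; the example uses CifarNet with CIFAR-10 data.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

graph = torch.export.export(model, example_inputs, strict=True).module()
prepared = prepare_pt2e(graph, NeutronQuantizer())
prepared(*example_inputs)           # calibrate with representative input(s)
quantized = convert_pt2e(prepared)  # quantized GraphModule, ready for to_edge()
```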

@robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch 2 times, most recently from 3018aea to 2941a74 on May 23, 2025 11:33
@robert-kalmar
Copy link
Collaborator Author

- Linting error: fixed.
- Quantizer invocation (using torchao instead of torch.ao) in the aot_neutron_example, to align with the updates in #10294: fixed.
- Importing the quantized operators instead of loading the *.so library: fixed.

Now it is ready to merge.

@robert-kalmar
Collaborator Author

3 checks failed, all due to the missing "llm" preset, which was added in a later commit (c256723#diff-fc10486ef573a9c92fe4a135b8a1b20157154af6e83dacfd1ea046bda7814c84). I guess those failures are unrelated to the changes in this PR.

Although I wonder why those tests even got triggered, as they are not in the .github/workflows of this codebase.

@digantdesai
Contributor

Let's re-merge the CI PR, and then we can merge this, so we have some confidence in this and know we won't be regressing. Thanks.

@robert-kalmar force-pushed the upstream/release-mcux-25.03-full/aot-example branch from 2941a74 to c7d4b49 on June 10, 2025 09:10

2. Now run the `aot_neutron_compile.py` example with the `cifar10` model
$ python -m examples.nxp.aot_neutron_compile --quantize \
Contributor

Should we include this in the CI?

Collaborator Author

I was thinking about it, but as the simulator is not yet ready, the only reasonable check is whether the example does not crash and produces some output.

I can include it in the CI. No preference here.

Contributor

@digantdesai left a comment

Looks good. Is the setup.sh empty for a reason?

@robert-kalmar
Collaborator Author

robert-kalmar commented Jun 10, 2025

> Looks good. Is the setup.sh empty for a reason?

It is not empty, just its content has not changed: https://github.com/pytorch/executorch/blob/2941a74be7f4d49198087d3983d591911c614260/examples/nxp/setup.sh
The change is the file mode: adding the execute bit with chmod +x.

The WebUI is misleading here. By "empty file" it evidently means empty diff 🙃

@robert-kalmar marked this pull request as draft June 18, 2025 12:10
@robert-kalmar
Collaborator Author

Converting to draft until the NXP Backend CI is back (#11756).

Labels
- ciflow/trunk
- CLA Signed (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
- module: nxp (Issues related to NXP Neutron NPU delegation and code under backends/nxp/)
- release notes: nxp (Changes to the NXP Neutron backend delegate)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NeutronBackend: Add Neutron ATen Passes to Neutron aot example
5 participants