Add _clone_dim_order portable kernel #12974
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12974
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures: as of commit e28b9b9 with merge base 2d4533a, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: none"
Good start! Thanks for your great work!
Beyond what we have now, it would be awesome if we could add a runtime test for the new runtime operator. https://github.com/pytorch/executorch/blob/main/kernels/test/op__to_dim_order_copy_test.cpp is the test for to_dim_order_copy, and you can use it as an example.
Also, please mark this PR as Open instead of Draft once it is ready for review. Thanks!
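For readers following the thread, a minimal sketch of what such a runtime test could look like is below. It is modeled loosely on op__to_dim_order_copy_test.cpp; the op__clone_dim_order_out wrapper name, its argument order, and the headers are assumptions, not the final code in this PR.

```cpp
// Minimal sketch of a runtime test for _clone_dim_order, modeled on
// op__to_dim_order_copy_test.cpp. The op__clone_dim_order_out wrapper and its
// exact signature are assumptions, not the final code in this PR.
#include <executorch/runtime/core/exec_aten/testing_util/tensor_factory.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_util.h>
#include <gtest/gtest.h>

using torch::executor::ScalarType;
using torch::executor::Tensor;
using torch::executor::testing::TensorFactory;

TEST(OpCloneDimOrderTest, IdentityCloneKeepsValues) {
  TensorFactory<ScalarType::Float> tf;

  // Input and output both use the default contiguous dim order.
  Tensor x = tf.make({2, 2}, {1.0, 2.0, 3.0, 4.0});
  Tensor out = tf.zeros({2, 2});

  // Hypothetical generated wrapper for _clone_dim_order.out; the real test
  // would call whatever FunctionHeaderWrapper.h exposes for the new op.
  op__clone_dim_order_out(x, /*non_blocking=*/false, /*dim_order=*/{}, out);

  EXPECT_TENSOR_EQ(out, x);
}
```

The tests actually added in this PR also cover layout conversions and dynamic shapes, per the test plan at the bottom of the thread.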
@@ -28,6 +28,14 @@
    "_empty_dim_order.out(int[] size, *, int[]? dim_order=None, Tensor(a!) out) -> Tensor(a!)"
)

lib.define(
It is OK to leave this here since we are going to need it in the future, but when we talk about adding portable kernels we mainly focus on the kernels in the runtime, specifically under executorch/kernels/portable.
That makes sense, but I needed to register the operator here; otherwise the tests I added fail since there is no Python-side reference to _clone_dim_order.
Oh, should all the tests for this PR have been on the kernel side and not in test_memory_format_ops_pass.py?
This PR should only focus on our runtime changes, so no Python-side code will refer to our new operator.
"should all the tests for this PR have been on the kernel side and not in test_memory_format_ops_pass.py"
Yes, absolutely correct! This PR is only for the portable kernel and its tests. Sorry for any confusion!
I've added the runtime test and all tests pass locally. I couldn't run the DynamicShapeUnbound test since it depends on SupportedFeatures, and supported_features.h doesn't seem to be generated in OSS builds.
Please disregard my previous comment about the missing SupportedFeatures dependency; the issue was with my local build setup. All tests pass now.
LGTM! Some minor feedback, but the majority looks good!
}
}

/* %python
Please remove comments unrelated to the tests.
void test_dynamic_shape(
    const std::vector<int32_t>& out_shape,
    enum torch::executor::TensorShapeDynamism dynamism) {
  /* %python
Same here
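As a rough illustration of what is being asked for, the helper might read something like the sketch below once the %python generator comments are dropped (shapes, values, and the op wrapper name are placeholders, not the exact diff):

```cpp
// Sketch of test_dynamic_shape with the %python generator comments removed.
// Values and the op__clone_dim_order_out wrapper are illustrative placeholders.
void test_dynamic_shape(
    const std::vector<int32_t>& out_shape,
    enum torch::executor::TensorShapeDynamism dynamism) {
  TensorFactory<ScalarType::Float> tf;

  // Fixed input; the output starts with out_shape and the requested dynamism,
  // and the kernel resizes it to the input's shape when the dynamism allows.
  Tensor x = tf.ones({2, 3, 4});
  Tensor expected = tf.ones({2, 3, 4});
  Tensor out = tf.zeros(out_shape, dynamism);

  op__clone_dim_order_out(x, /*non_blocking=*/false, /*dim_order=*/{}, out);

  EXPECT_TENSOR_EQ(out, expected);
}
```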
TEST_F(OpDimOrderCloneTest, ContiguousToChannelsLast) {
  TensorFactory<ScalarType::Float> tf;

  Tensor x = tf.make_with_dimorder(
x here is using the contiguous dim order (0, 1, 2, 3) by default. Please add a comment to clarify.
      0.3597, 0.0911, 0.7719, 0.8151, 0.4296, 0.5552},
      /*dim_order=*/{0, 2, 3, 1});

  Tensor expected = tf.make_with_dimorder(
Same here; please add a comment.
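To make the request concrete, the kind of clarifying comments being asked for might read roughly like this (placeholder sizes and values, not the ones in the diff):

```cpp
// Illustrative only; sizes and values are placeholders.
TensorFactory<ScalarType::Float> tf;

// Input uses the default contiguous (NCHW) dim order (0, 1, 2, 3).
Tensor x = tf.make(
    /*sizes=*/{1, 2, 2, 2},
    /*data=*/{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8});

// Expected output holds the same logical values but is stored in the
// channels_last (NHWC) dim order (0, 2, 3, 1), so the data below is listed
// in that permuted memory order.
Tensor expected = tf.make_with_dimorder(
    /*sizes=*/{1, 2, 2, 2},
    /*data=*/{0.1, 0.5, 0.2, 0.6, 0.3, 0.7, 0.4, 0.8},
    /*dim_order=*/{0, 2, 3, 1});
```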
@Gasoonjia Everything runs fine locally, but CI failed because of a missing dependency in copy_ops_util.h. This happened after I refactored a function into copy_ops_util.h and added the import.
I think I got it. Can we try CI again?
@keyprocedure So glad you've fixed the issue! Sorry for the late review. Restarting CI.
No worries, I appreciate all the support :) Progress with CI: do you have any recommendations on which targets I should build locally to ensure that everything relying on the new ops will build successfully, or is there a Docker image available to run CI locally? I've tried to build entire dirs such as
@@ -1329,6 +1329,13 @@ ATEN_OPS = (
    "//executorch/kernels/portable/cpu/util:copy_ops_util",
We can remove broadcast_util here, right? Since copy_ops_util will depend on it.
Good catch. I'll remove it here and in op__to_dim_order_copy.cpp, then push once CI finishes.
Should I still push the changes removing the unused broadcast_util dep, or save them for a follow-up?
Thanks for your great work! @keyprocedure
BTW, have you tried this: https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md#testing ? @keyprocedure
Thanks for sharing this! After stopping some warnings from causing the build to fail, I ran the test script and everything passes except the PyTree EmptySpec test, which seems unrelated. I'll run this script to validate future changes.
CI looks good. Stamped!
Summary
This is PR 1 of 3 implementing a dim order aware clone op.
Currently, clone ops are removed during export as no-ops, causing memory layout (dim order) changes to be lost. This can cause backend failures, incorrect outputs when ops expect specific layouts, and performance degradation. This set of PRs introduces a dim order aware clone op, _clone_dim_order, which preserves memory layout changes by explicitly storing dim order information. This is implemented by replacing standard clone ops with this variant during export and updating the clone removal transform to preserve clones that change layout.

This PR adds the portable CPU kernel for the _clone_dim_order op, implementing a clone variant that preserves dim order at runtime. The portable kernel validates dtype and layout compatibility, resizes the output tensor if needed, and performs an element-wise clone of the tensors.

Note: A future PR will add the ATen kernel for _clone_dim_order.
Related PRs:
- _clone_dim_order op and map aten.clone
Fixes #12645
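For context on the kernel described in the summary, a much-simplified sketch of that validate/resize/copy flow is below. It is only illustrative: the signature is assumed from the op schema, the checks are trimmed, and the layout-converting copy (which the review discussion indicates lives in copy_ops_util.h) is reduced to a plain byte copy.

```cpp
// Simplified sketch of the _clone_dim_order.out flow; not the PR's actual code.
#include <cstring>

#include <executorch/runtime/kernel/kernel_includes.h>

using exec_aten::OptionalArrayRef;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
using torch::executor::resize_tensor;

Tensor& _clone_dim_order_out(
    KernelRuntimeContext& ctx,
    const Tensor& self,
    bool non_blocking,
    OptionalArrayRef<int64_t> dim_order,
    Tensor& out) {
  // 1. Validate: dtypes must match and only blocking transfer is supported.
  //    (The real kernel also checks that dim_order, when given, matches out.)
  ET_KERNEL_CHECK(
      ctx, self.scalar_type() == out.scalar_type(), InvalidArgument, out);
  ET_KERNEL_CHECK(ctx, non_blocking == false, InvalidArgument, out);
  (void)dim_order;

  // 2. Resize the output to the input's shape (a no-op for static shapes).
  ET_KERNEL_CHECK(
      ctx, resize_tensor(out, self.sizes()) == Error::Ok, InvalidArgument, out);

  // 3. Element-wise clone. With matching dim orders this is a straight byte
  //    copy; converting layouts goes through the shared copy utilities.
  std::memcpy(out.mutable_data_ptr(), self.const_data_ptr(), self.nbytes());
  return out;
}
```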
Test plan
Added kernel runtime tests to verify:
- non_blocking=true fails, since the portable kernel only supports blocking data transfer.
- Layout conversions: contiguous to channels_last, channels_last to contiguous, and channels_last is preserved.

All runtime tests pass via:
build-ninja/kernels/test/portable_kernels_test