Add _clone_dim_order portable kernel #12974
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12974
Note: Links to docs will display an error until the docs builds have been completed. ❌ 2 New Failures: as of commit e28b9b9 with merge base 2d4533a, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "release notes: none"
Good start! Thanks for your great work!
Beyond what we have now, it would be awesome if we could add a runtime test for the new runtime operator. https://github.com/pytorch/executorch/blob/main/kernels/test/op__to_dim_order_copy_test.cpp is the test for to_dim_order_copy, and you can use it as an example.
Also, please mark this PR as Open instead of Draft once it is ready for review. Thanks!
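For readers following the thread, a minimal sketch of what such a runtime test could look like is below. It is modeled loosely on op__to_dim_order_copy_test.cpp; the op__clone_dim_order_out wrapper name, its argument order, and the headers are assumptions, not the final code in this PR.

```cpp
// Minimal sketch of a runtime test for _clone_dim_order, modeled on
// op__to_dim_order_copy_test.cpp. The op__clone_dim_order_out wrapper and its
// exact signature are assumptions, not the final code in this PR.
#include <executorch/runtime/core/exec_aten/testing_util/tensor_factory.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_util.h>
#include <gtest/gtest.h>

using torch::executor::ScalarType;
using torch::executor::Tensor;
using torch::executor::testing::TensorFactory;

TEST(OpCloneDimOrderTest, IdentityCloneKeepsValues) {
  TensorFactory<ScalarType::Float> tf;

  // Input and output both use the default contiguous dim order.
  Tensor x = tf.make({2, 2}, {1.0, 2.0, 3.0, 4.0});
  Tensor out = tf.zeros({2, 2});

  // Hypothetical generated wrapper for _clone_dim_order.out; the real test
  // would call whatever FunctionHeaderWrapper.h exposes for the new op.
  op__clone_dim_order_out(x, /*non_blocking=*/false, /*dim_order=*/{}, out);

  EXPECT_TENSOR_EQ(out, x);
}
```

The tests actually added in this PR also cover layout conversions and dynamic shapes, per the test plan at the bottom of the thread.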
@@ -28,6 +28,14 @@
    "_empty_dim_order.out(int[] size, *, int[]? dim_order=None, Tensor(a!) out) -> Tensor(a!)"
)

lib.define(
It is OK to leave this here since we are going to need it in the future, but when we talk about adding portable kernels we mainly focus on the kernels in the runtime, specifically under executorch/kernels/portable.
That makes sense, but I needed to register the operator here; otherwise the tests I added fail since there is no Python-side reference to _clone_dim_order.
Oh, should all the tests for this PR have been on the kernel side and not in test_memory_format_ops_pass.py?
This PR should only focus on our runtime changes, so no Python-side code will refer to our new operator.
"should all the tests for this PR have been on the kernel side and not in test_memory_format_ops_pass.py"
Yes, absolutely correct! This PR is only for the portable kernel and its tests. Sorry for any confusion!
I've added the runtime test and all tests pass locally. I couldn't run the DynamicShapeUnbound test since it depends on SupportedFeatures, and supported_features.h doesn't seem to be generated in OSS builds.
Please disregard my previous comment about the missing SupportedFeatures dependency; the issue was with my local build setup. All tests pass now.
LGTM! Some minor feedback, but the majority looks good!
}
}

/* %python
Please remove comments unrelated to the tests.
void test_dynamic_shape(
    const std::vector<int32_t>& out_shape,
    enum torch::executor::TensorShapeDynamism dynamism) {
  /* %python
Same here
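As a rough illustration of what is being asked for, the helper might read something like the sketch below once the %python generator comments are dropped (shapes, values, and the op wrapper name are placeholders, not the exact diff):

```cpp
// Sketch of test_dynamic_shape with the %python generator comments removed.
// Values and the op__clone_dim_order_out wrapper are illustrative placeholders.
void test_dynamic_shape(
    const std::vector<int32_t>& out_shape,
    enum torch::executor::TensorShapeDynamism dynamism) {
  TensorFactory<ScalarType::Float> tf;

  // Fixed input; the output starts with out_shape and the requested dynamism,
  // and the kernel resizes it to the input's shape when the dynamism allows.
  Tensor x = tf.ones({2, 3, 4});
  Tensor expected = tf.ones({2, 3, 4});
  Tensor out = tf.zeros(out_shape, dynamism);

  op__clone_dim_order_out(x, /*non_blocking=*/false, /*dim_order=*/{}, out);

  EXPECT_TENSOR_EQ(out, expected);
}
```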
TEST_F(OpDimOrderCloneTest, ContiguousToChannelsLast) {
  TensorFactory<ScalarType::Float> tf;

  Tensor x = tf.make_with_dimorder(
x here is using the contiguous dim order (0, 1, 2, 3) by default. Please add a comment to clarify.
      0.3597, 0.0911, 0.7719, 0.8151, 0.4296, 0.5552},
      /*dim_order=*/{0, 2, 3, 1});

  Tensor expected = tf.make_with_dimorder(
Same here; please add a comment.
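To make the request concrete, the kind of clarifying comments being asked for might read roughly like this (placeholder sizes and values, not the ones in the diff):

```cpp
// Illustrative only; sizes and values are placeholders.
TensorFactory<ScalarType::Float> tf;

// Input uses the default contiguous (NCHW) dim order (0, 1, 2, 3).
Tensor x = tf.make(
    /*sizes=*/{1, 2, 2, 2},
    /*data=*/{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8});

// Expected output holds the same logical values but is stored in the
// channels_last (NHWC) dim order (0, 2, 3, 1), so the data below is listed
// in that permuted memory order.
Tensor expected = tf.make_with_dimorder(
    /*sizes=*/{1, 2, 2, 2},
    /*data=*/{0.1, 0.5, 0.2, 0.6, 0.3, 0.7, 0.4, 0.8},
    /*dim_order=*/{0, 2, 3, 1});
```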
@Gasoonjia Everything runs fine locally, but CI failed because of a missing dependency in copy_ops_util.h. This happened after I refactored a function into copy_ops_util.h and added the import.
I think I got it. Can we try CI again?
@keyprocedure So glad you've fixed the issue! Sorry for the late review. Restarting CI.
No worries, I appreciate all the support :) Progress with CI: do you have any recommendations on which targets I should build locally to ensure that everything relying on the new ops will build successfully, or is there a Docker image available to run CI locally? I've tried to build entire dirs such as
@@ -1329,6 +1329,13 @@ ATEN_OPS = (
    "//executorch/kernels/portable/cpu/util:copy_ops_util",
We can remove broadcast_util here, right? Since copy_ops_util will depend on it.
Good catch. I'll remove it here and in op__to_dim_order_copy.cpp, then push once CI finishes.
Should I still push the changes removing the unused broadcast_util dep, or save them for a follow-up?
Thanks for your great work! @keyprocedure
BTW, have you tried this: https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md#testing ? @keyprocedure
Thanks for sharing this! After stopping some warnings from causing the build to fail, I ran the test script and everything passes except the PyTree EmptySpec test, which seems unrelated. I'll run this script to validate future changes.
CI looks good. Stamped!
Summary
This is PR 1 of 3 implementing a dim order aware clone op.
Currently, clone ops are removed during export as no-ops, causing memory layout (dim order) changes to be lost. This can cause backend failures, incorrect outputs when ops expect specific layouts, and performance degradation. This set of PRs introduces a dim order aware clone op, _clone_dim_order, which preserves memory layout changes by explicitly storing dim order information. This is implemented by replacing standard clone ops with this variant during export and updating the clone removal transform to preserve clones that change layout.

This PR adds the portable CPU kernel for the _clone_dim_order op, implementing a clone variant that preserves dim order at runtime. The portable kernel validates dtype and layout compatibility, resizes the output tensor if needed, and performs an element-wise clone of the tensors.

Note: A future PR will add the ATen kernel for _clone_dim_order.
Related PRs:
- _clone_dim_order op and map aten.clone
Fixes #12645
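For context on the kernel described in the summary, a much-simplified sketch of that validate/resize/copy flow is below. It is only illustrative: the signature is assumed from the op schema, the checks are trimmed, and the layout-converting copy (which the review discussion indicates lives in copy_ops_util.h) is reduced to a plain byte copy.

```cpp
// Simplified sketch of the _clone_dim_order.out flow; not the PR's actual code.
#include <cstring>

#include <executorch/runtime/kernel/kernel_includes.h>

using exec_aten::OptionalArrayRef;
using exec_aten::Tensor;
using torch::executor::Error;
using torch::executor::KernelRuntimeContext;
using torch::executor::resize_tensor;

Tensor& _clone_dim_order_out(
    KernelRuntimeContext& ctx,
    const Tensor& self,
    bool non_blocking,
    OptionalArrayRef<int64_t> dim_order,
    Tensor& out) {
  // 1. Validate: dtypes must match and only blocking transfer is supported.
  //    (The real kernel also checks that dim_order, when given, matches out.)
  ET_KERNEL_CHECK(
      ctx, self.scalar_type() == out.scalar_type(), InvalidArgument, out);
  ET_KERNEL_CHECK(ctx, non_blocking == false, InvalidArgument, out);
  (void)dim_order;

  // 2. Resize the output to the input's shape (a no-op for static shapes).
  ET_KERNEL_CHECK(
      ctx, resize_tensor(out, self.sizes()) == Error::Ok, InvalidArgument, out);

  // 3. Element-wise clone. With matching dim orders this is a straight byte
  //    copy; converting layouts goes through the shared copy utilities.
  std::memcpy(out.mutable_data_ptr(), self.const_data_ptr(), self.nbytes());
  return out;
}
```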
Test plan
Added kernel runtime tests to verify:
- non_blocking=true fails, since the portable kernel only supports blocking data transfer.
- Layout conversions: contiguous to channels_last, channels_last to contiguous, and channels_last is preserved.

All runtime tests pass via:
build-ninja/kernels/test/portable_kernels_test