Re-reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11802

swolchok · 2025-06-18T23:11:54Z

Stack from ghstack (oldest at bottom):

- -> Re-reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11802
- Enable sleef in xplat_exported_deps of aten_headers_for_executorch #11801
- ET_HAS_EXCEPTIONS: require defined(_MSC_VER) to conclude _HAS_EXCEPTIONS implies exceptions are on #11770

The stack was reverted again due to internal CI failures: I bypassed
some broken jobs, and it turns out this stack re-broke them. Reapplying
as an exported internal diff so that we make sure to catch any more
failures of that kind.

New fixes in first reapply:

- straightforward op_sub build fixes
- s/EXPECT_EQ/EXPECT_FLOAT_EQ/ in vectorized_math_test
- define ET_USE_PYTORCH_HEADERS to detect whether exceptions are
  enabled, and use `#if` instead of `#ifdef` to check the macro so
  that we don't use PyTorch headers if exceptions are
  disabled (otherwise, we might have problems with e.g. TORCH_CHECK)
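
To illustrate why the `#if` versus `#ifdef` distinction matters for a macro that can be defined to 0, here is a minimal sketch. The macro `DEMO_FEATURE_ENABLED` and the two constants are hypothetical names made up for this example; they are not the actual ET_USE_PYTORCH_HEADERS definition.

```cpp
#include <cassert>

// Hypothetical feature macro, defined to 0 to mean "feature disabled".
#define DEMO_FEATURE_ENABLED 0

// #ifdef only asks whether the macro exists, so a value of 0 still passes.
#ifdef DEMO_FEATURE_ENABLED
constexpr bool kSeenByIfdef = true;
#else
constexpr bool kSeenByIfdef = false;
#endif

// #if evaluates the macro's value, so 0 takes the #else branch.
#if DEMO_FEATURE_ENABLED
constexpr bool kSeenByIf = true;
#else
constexpr bool kSeenByIf = false;
#endif

static_assert(kSeenByIfdef, "#ifdef treats a macro defined as 0 as present");
static_assert(!kSeenByIf, "#if treats a macro defined as 0 as disabled");
```

With a macro that is always defined and carries its state in its value, only the `#if` form keeps the guarded code out of the build when the value is 0.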

New fixes in second reapply:

- So far, none; D76843086 and D76857541 fix things up in preparation for this diff. (There are some rebase conflict fixes, though.)

Original summary for #11204:
Set of math functions that work on both scalars and at::vec::Vectorized,
to be used in #9432.
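
As a rough illustration of "math functions that work on both scalars and at::vec::Vectorized", the sketch below uses a plain overload set. `Vec4`, `demo_exp`, and `demo_exp_twice` are hypothetical names for this example only; `Vec4` is a toy stand-in, not the real `at::vec::Vectorized`, and this is not the actual content of vectorized_math.h.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 4-lane stand-in for at::vec::Vectorized<float>; not the real type.
struct Vec4 {
  std::array<float, 4> lanes;
};

// Scalar overload: forwards to the standard library.
inline float demo_exp(float x) {
  return std::exp(x);
}

// Vector overload: applies the scalar function lane by lane.
inline Vec4 demo_exp(Vec4 v) {
  for (float& lane : v.lanes) {
    lane = std::exp(lane);
  }
  return v;
}

// Generic caller: the same source compiles for float and for Vec4,
// because overload resolution picks the matching demo_exp.
template <typename T>
T demo_exp_twice(T x) {
  return demo_exp(demo_exp(x));
}
```

The point of such an overload set is that a caller written once against `demo_exp` can be instantiated with either a scalar or a vector argument.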

Original summary for #11205:
Make sure we test the optimized versions of portable kernels even if
they are shadowed by optimized implementations. Intended to support
#9432.

Original summary for #9432:

This is a first cut at #9241. In this PR I've vectorized a small
initial set of ops: atan2, clamp, fmod_Scalar, maximum, minimum, mul,
pow, and sigmoid. In addition, the following ops should have gotten
vectorized automatically because they already used generic lambdas: add,
div, rsub, sub. I've left covering ops that use the `unary_ufunc_*`
utilities in pattern.h (kernels/portable/cpu/pattern/pattern.h) for a
follow-up push, because pattern.h and elementwise_util need some
work before we can migrate pattern.h's utilities to be backed by
elementwise_util.
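
One way to see why ops written with generic lambdas can pick up a vectorized path without source changes is a compile-time check on whether the lambda accepts vector arguments. The sketch below assumes C++17 and uses hypothetical names (`Vec4`, `can_vectorize_v`, `generic_add`, `scalar_add`); it is not taken from elementwise_util.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical 4-lane stand-in for at::vec::Vectorized<float>; not the real type.
struct Vec4 {
  float lanes[4];
};

inline Vec4 operator+(Vec4 a, Vec4 b) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i) {
    r.lanes[i] = a.lanes[i] + b.lanes[i];
  }
  return r;
}

// Hypothetical trait: true when Op can be called with two vector arguments.
template <typename Op>
constexpr bool can_vectorize_v = std::is_invocable_v<Op, Vec4, Vec4>;

// Generic lambda: the same body compiles for float and for Vec4.
inline constexpr auto generic_add = [](auto a, auto b) { return a + b; };

// Scalar-only lambda: Vec4 does not convert to float, so the check fails.
inline constexpr auto scalar_add = [](float a, float b) { return a + b; };

static_assert(can_vectorize_v<decltype(generic_add)>,
              "generic lambda accepts Vec4");
static_assert(!can_vectorize_v<decltype(scalar_add)>,
              "float-only lambda rejects Vec4");
```

A caveat of this style of check: a generic lambda with a deduced return type whose body does not compile for the vector type produces a hard error rather than a clean `false`, so such a check only tells you the lambda is callable, not that it computes the same thing on vectors.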

This PR adds an interesting testing problem: in theory, all operators
might need test cases long enough to tickle vectorization, because we
might accidentally vectorize ops unexpectedly and break their lambdas
due to anticipated differences in semantics. I address this issue by
using Vectorized for the scalar prologue/epilogue in debug mode (we run
tests in both debug and release) so that we can detect broken lambdas. I
additionally intentionally introduced a bug in the vectorized path in
elementwise_util and manually verified that we saw test failures for
each vectorized op called out above.
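
The testing idea above can be sketched as follows. This is a simplified illustration under stated assumptions, not the elementwise_util code: `Vec4` is a hypothetical toy vector type, `apply_binary` is a made-up helper, the loop has only a leftover tail (no alignment prologue), and a runtime flag `force_vector_tail` stands in for the debug-mode switch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane stand-in for at::vec::Vectorized<float>; not the real type.
struct Vec4 {
  static constexpr std::size_t kSize = 4;
  std::array<float, kSize> lanes{};

  // Loads count elements; lanes beyond count stay zero.
  static Vec4 load(const float* p, std::size_t count = kSize) {
    Vec4 v;
    std::copy(p, p + count, v.lanes.begin());
    return v;
  }

  // Stores only the first count lanes.
  void store(float* p, std::size_t count = kSize) const {
    std::copy(lanes.begin(), lanes.begin() + count, p);
  }
};

inline Vec4 operator*(Vec4 a, Vec4 b) {
  Vec4 r;
  for (std::size_t i = 0; i < Vec4::kSize; ++i) {
    r.lanes[i] = a.lanes[i] * b.lanes[i];
  }
  return r;
}

// Applies op elementwise. Full chunks always go through Vec4. The leftover
// tail goes through Vec4 as well when force_vector_tail is set, so a lambda
// that misbehaves on vectors is exercised even by inputs shorter than a chunk.
template <typename Op>
void apply_binary(const Op& op, const float* a, const float* b, float* out,
                  std::size_t n, bool force_vector_tail) {
  std::size_t i = 0;
  for (; i + Vec4::kSize <= n; i += Vec4::kSize) {
    op(Vec4::load(a + i), Vec4::load(b + i)).store(out + i);
  }
  const std::size_t tail = n - i;
  if (tail == 0) {
    return;
  }
  if (force_vector_tail) {
    op(Vec4::load(a + i, tail), Vec4::load(b + i, tail)).store(out + i, tail);
  } else {
    for (; i < n; ++i) {
      out[i] = op(a[i], b[i]);
    }
  }
}
```

With a correct generic lambda such as `[](auto x, auto y) { return x * y; }`, both settings of the flag produce the same output. With an op whose vector behavior differs from its scalar behavior, a 3-element input only reveals the difference when the tail is forced through the vector type, which is the motivation for doing so in debug builds.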

Differential Revision: D76754826

pytorch-bot · 2025-06-18T23:11:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11802

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1c7d063 with merge base 44d2643:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-moshi-linux / linux-job (gh) (trunk failure)
test_exported_decoder_xnnpack

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-06-18T23:12:25Z

This pull request was exported from Phabricator. Differential Revision: D76754826

… optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" ghstack PR number: #11802 Please see that original PR for details; this is a manual cherry-pick because mergebot failed. ghstack-source-id: e54d27c ghstack-comment-id: 3001228646 Pull-Request-resolved: #11912

swolchok requested review from JacobSzwejbka, kirklandsign, larryliu0820, lucylq and manuelcandales as code owners June 18, 2025 23:11

This was referenced Jun 18, 2025

ET_HAS_EXCEPTIONS: require defined(_MSC_VER) to conclude _HAS_EXCEPTIONS implies exceptions are on #11770

Merged

Enable sleef in xplat_exported_deps of aten_headers_for_executorch #11801

Merged

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 18, 2025

facebook-github-bot added the fb-exported label Jun 18, 2025

swolchok added the release notes: ops & kernels Changes to the opset and any new / changed kernel implementations label Jun 20, 2025

manuelcandales approved these changes Jun 23, 2025

facebook-github-bot merged commit bdf7003 into gh/swolchok/465/base Jun 24, 2025
168 of 175 checks passed

facebook-github-bot deleted the gh/swolchok/465/head branch June 24, 2025 00:06

facebook-github-bot had a problem deploying to cherry-pick-bot June 24, 2025 00:06 — with GitHub Actions Failure

swolchok mentioned this pull request Jun 24, 2025

Manual cherry-pick: Re-reapply "Add vectorized_math.h (#11204)", "Add optimized_portable_kernels test (#11205)", and "Add vectorization in elementwise_util (#9432)" #11912

Merged
