[AArch64] Combine store (trunc X to <3 x i8>) to sequence of ST1.b #8052

Merged: fhahn merged 5 commits into swiftlang:stable/20230725 from vec3-trunc-store on Jan 26, 2024

@fhahn fhahn commented Jan 25, 2024

Improve codegen for a store of (trunc X to <3 x i8>) by lowering it to a
sequence of three ST1.b instructions: the truncate operand is first converted
to either v8i8 or v16i8, then the lanes holding the truncate results are
extracted and stored individually.
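
To illustrate the lane selection described above, here is a minimal Python model of the idea, not the actual AArch64 lowering code. It assumes a little-endian layout and source elements of 16 or 32 bits; the function name `trunc_store_v3i8` and the zero-filled fourth lane are hypothetical choices made for this sketch (in IR the padding lane would be undefined).

```python
import struct

def trunc_store_v3i8(src, elem_bits, memory, addr):
    """Model a store of (trunc <3 x iN> src to <3 x i8>) as three byte stores.

    Assumptions for this sketch: little-endian lanes, N is 16 or 32.
    """
    assert len(src) == 3 and elem_bits in (16, 32)
    elem_bytes = elem_bits // 8
    fmt = {16: "<4H", 32: "<4I"}[elem_bits]
    mask = (1 << elem_bits) - 1

    # Widen the 3-element operand to 4 lanes; the extra lane is never stored.
    wide = [x & mask for x in src] + [0]

    # Reinterpret the 4-lane vector as bytes: 8 bytes (v8i8) for 16-bit
    # elements, 16 bytes (v16i8) for 32-bit elements.
    byte_vec = struct.pack(fmt, *wide)

    # On little-endian, the low byte of element i sits at byte lane
    # i * elem_bytes. Each iteration stands in for one single-lane byte store.
    for i in range(3):
        memory[addr + i] = byte_vec[i * elem_bytes]

mem = bytearray(4)
trunc_store_v3i8([0x11223344, 0xAABBCCDD, 0x01020304], 32, mem, 0)
```

For 32-bit elements the model picks byte lanes 0, 4 and 8 of the 16-byte view; for 16-bit elements it picks lanes 0, 2 and 4 of the 8-byte view. The fourth byte of `mem` is left untouched, which is the point of storing exactly three bytes.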

At the moment, there are almost no cases in which such vector operations
will be generated automatically. The motivating case is non-power-of-2
SLP vectorization: llvm#77790

PR: llvm#78637

(cherry-picked from eb678d8)

fhahn added 5 commits January 25, 2024 18:58

(cherry-picked from 8336515)

Extra tests for
  llvm#78637
  llvm#78632

(cherry-picked from ff1cde5)

Extra tests for
  llvm#78637
  llvm#78632

(cherry-picked from e7b4ff8)

Add extra tests with different load/store alignments for
llvm#78637.

(cherry-picked from 98509c7)

…lvm#78637)

(cherry-picked from eb678d8)
fhahn commented Jan 25, 2024

@swift-ci please test

fhahn commented Jan 25, 2024

@swift-ci please test llvm

fhahn commented Jan 26, 2024

@swift-ci please test macos

@fhahn fhahn merged commit b0f6f4f into swiftlang:stable/20230725 Jan 26, 2024
@fhahn fhahn deleted the vec3-trunc-store branch January 26, 2024 20:12