AArch64: sdot/udot not generated with variable or high trip counts #81516

Open
rscottmanley opened this issue Feb 12, 2024 · 3 comments

@rscottmanley
Contributor

Clang does not generate sdot/udot (depending on the element types) for the following test when the trip count is variable, or when the trip count is constant but the loop is not fully unrolled (the cutoff is at 60 trips, though it may differ per target).

#include <stdint.h>
int32_t f(int8_t * restrict x, int8_t * restrict y, int n)
{
  int32_t r = 0;
  for (int j = 0; j < n; ++j) {
    r += x[j] * y[j];
  }
  return r;
}

clang (does not generate sdot): https://godbolt.org/z/KznKr1Kh8

gcc (generates sdot): https://godbolt.org/z/h1xqM1xMc
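
For readers less familiar with the instruction, here is a minimal scalar sketch of what one 128-bit sdot does and how the loop above maps onto it. The names sdot_model and dot_with_sdot_model are hypothetical helpers introduced for illustration; they are not LLVM, GCC, or ACLE functions, and this is not a claim about the exact code either compiler emits.

```c
#include <stdint.h>

/* Scalar model of one 128-bit SDOT: each of the four 32-bit accumulator
   lanes gains the sum of four adjacent int8 products.
   sdot_model is a hypothetical name used only for this sketch. */
static void sdot_model(int32_t acc[4], const int8_t a[16], const int8_t b[16])
{
    for (int lane = 0; lane < 4; ++lane) {
        for (int k = 0; k < 4; ++k) {
            acc[lane] += (int32_t)a[4 * lane + k] * (int32_t)b[4 * lane + k];
        }
    }
}

/* Dot product built on the model: 16 elements per step, a final
   cross-lane sum, then a scalar loop for the leftover elements. */
static int32_t dot_with_sdot_model(const int8_t *x, const int8_t *y, int n)
{
    int32_t acc[4] = {0, 0, 0, 0};
    int j = 0;
    for (; j + 16 <= n; j += 16) {
        sdot_model(acc, x + j, y + j);
    }
    int32_t r = acc[0] + acc[1] + acc[2] + acc[3];
    for (; j < n; ++j) {
        r += x[j] * y[j];
    }
    return r;
}
```

The udot case has the same shape with uint8_t inputs and unsigned accumulation.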

If you replace 'n' with a constant value < 60, the instructions are generated. The same problem occurs with unsigned types. This appears to affect neoverse-n1, neoverse-v1 and neoverse-v2.
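
The two variants mentioned above could be written roughly as follows. This is only a sketch: the names f_fixed and f_unsigned, the trip count of 32, and the choice of uint8_t/uint32_t for the unsigned case are assumptions made for illustration and do not come from the linked godbolt examples.

```c
#include <stdint.h>

/* Hypothetical constant trip count; any value below the reported cutoff of 60. */
#define TRIPS 32

/* Same reduction as f, with the trip count fixed at compile time.
   Per the report, this is the form where sdot is generated. */
int32_t f_fixed(int8_t * restrict x, int8_t * restrict y)
{
    int32_t r = 0;
    for (int j = 0; j < TRIPS; ++j) {
        r += x[j] * y[j];
    }
    return r;
}

/* Unsigned variant with a variable trip count (the udot case).
   The element and accumulator types here are assumed. */
uint32_t f_unsigned(uint8_t * restrict x, uint8_t * restrict y, int n)
{
    uint32_t r = 0;
    for (int j = 0; j < n; ++j) {
        r += x[j] * y[j];
    }
    return r;
}
```

Compiling each function for one of the CPUs listed above and comparing the output against the original f is one way to check where the cutoff lies on a given target.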

@llvmbot
Member

llvmbot commented Feb 12, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Scott Manley (rscottmanley)

@paulwalker-arm
Collaborator

You might be interested in #69587 and #69583. They're SVE-focused, but the plan is to expand the coverage to fixed-length vectors as well.
