-
Notifications
You must be signed in to change notification settings - Fork 288
Add arm neon vector types and fixes ARM tests #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mod v7; | ||
#[cfg(any(target_arch = "aarch64", target_feature = "v7"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite a newbie when it comes to ARM intrinsics, but how come the #[cfg]
is needed here?
It seems like aarch64
implies the v7
feature (although rustc doesn't print that out?) but is it a bug in rustc/llvm that we can only define the module when the v7
feature is compiled in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, AArch64 always have a v7 feature enabled, so the LLVM backend does not advertise it as a feature (the v7 feature for AArch64 does not exist).
The cfg checks that either we are targeting for AArch64, or we have v7 enabled (that is, we are targetting ARM, but with v7).
I don't think this is a bug. There are arm-unknown...
and armv7-unknown...
targets. I think this feature can only be enabled at compile time, and maybe linking between binaries compiled w/o this feature is not allowed ? (they might be different targets for a reason?). @japaric ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hm ok so does v7 behave differently from like AVX on x86_64? It's not enabled in the baseline target but we still provide it all the time, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for Aarch64
it is always enabled.
For arm
AFAICT it must be enabled in the baseline target. That is, arm-unknown...
does not support it at all, but armv7-unknown
does. It is also not detectable at run-time AFAICT from looking at the linux hwcap files. So it looks to me that it is an option that v7
must be enabled statically for the hole binary for arm
targets. It might well be that arm-unknown-... + target-feature=v7
is not equal to armv7-unknown-...
because we are missing some features that are necessary (hence why the armv7-..
target exists).
pub use self::v7::*; | ||
|
||
#[cfg(target_feature = "neon")] | ||
// NEON is supported on AArch64, and on ARM when built with the v7 and neon | ||
// features. Building ARM without neon produces incorrect codegen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is probably related to the above comment, but what's the incorrect codegen here? Is that an LLVM/rustc bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an LLVM/rustc bug AFAICT.
On AArch64 we can compile without enabling the neon
feature globally, use #[target_feature(enable="neon")]
to enable neon for some functions, and call these with run-time feature detection without issues. The code generated looks pretty good.
ARMv6 and older do not support neon at all, so for ARM neon is only allowed if the v7
feature is enabled (probably should check v8
here as well but that's not supported by rustc yet?).
Using the armv7-... target (with the v7 feature enabled) and tagging functions with #[target_feature(enable="v7,neon")]
and calling them using run-time feature detection did work, but it produced horrible code (as in, way too many instructions). Enabling neon
at compile-time produced the code I expected. So... maybe there is an LLVM/rustc bug where using#[target_feature(enable = "v7,neon")]
is not enough for armv7 targets to generate good code? Maybe we need to enable some more features for these, I don't know yet (i'd like to know though!).
Note that before, these modules where guarded with #[cfg(target_feature = "neon")]
, so if these weren't enabled at compile-time we were not including them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, so it sounds like v7 is very different from AVX in x86_64, but it's more like the 64-bit intrinsics on x86_64 wrt the i686 platform? (fundamentally not possible).
If that's the case, do you have some examples of the v7+neon code generation that are bad? I'd be curious to dig in there and see what's going on!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm ok, so it sounds like v7 is very different from AVX in x86_64, but it's more like the 64-bit intrinsics on x86_64 wrt the i686 platform? (fundamentally not possible).
I think arm
, armv7
, and aarch64
are just different targets, as different at least as x86
and x86_64
.
If that's the case, do you have some examples of the v7+neon code generation that are bad? I'd be curious to dig in there and see what's going on!
So branched this, removed those checks, and disabled neon in ci/run.sh on this branch (gnzlbg@9c9b2cf), which results in this travis failure(https://travis-ci.org/gnzlbg/stdsimd/jobs/355738848#L3159) .
Now I remember what was wrong: the run-time tests were not passing! The results generated are incorrect! This might well be an LLVM or rustc bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @petrochenkov @parched @Amanieu
This also might be related to #139 where it was discussed that we might need to treat arm, armv7, armv8, as different architectures, some of which have a 64-bit mode.
|
||
#[cfg_attr(feature = "cargo-clippy", allow(expl_impl_clone_on_copy))] | ||
impl ::clone::Clone for $name { | ||
#[inline] // currently needed for correctness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually drop this now (the manual impl and just use #[derive]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have these in at least 3 places throughout the library. I'll submit a PR that removes all of them.
Ok thanks for all the info! I think it's good to land this in the meantime, so let's do so! |
NEON is supported on:
v7
andneon
features. Building only withv7
and trying to use run-time feature detection result in bad codegen. This might be a bug?This PR adds ARM/AArch64 NEON architecture-specific vector types:
ARM:
AArch64:
It also adds appropriate conversions from/to portable vector types, and updates all the ARM/AArch64 intrinsics to the new vector types.
Other fixes:
assert_instr
for all armv7 intrinsicsvrsqrte_f32
intrinsic fails to lower on ARM - this PR disables the intrinsic and filled vrsqrte_f32 fails to lower on ARMv7 NEON #383core::intrinsic::bitreverse