Skip to content

Add arm neon vector types and fixes ARM tests #384

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 20, 2018

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Mar 19, 2018

NEON is supported on:

  • AArch64
  • ARM when built with the v7 and neon features. Building only with v7 and trying to use run-time feature detection result in bad codegen. This might be a bug?

This PR adds ARM/AArch64 NEON architecture-specific vector types:

  • ARM:

    • 64-bit wide: float32x2_t, int16x4_t, int32x2_t, int64x1_t, int8x8_t, poly16x4_t, poly8x8_t, uint16x4_t, uint32x2_t, uint64x1_t, uint8x8_t
    • 128-bit wide: float32x4_t, int16x8_t, int32x4_t, int64x2_t, int8x16_t, poly16x8_t, poly8x16_t, uint16x8_t, uint32x4_t, uint64x2_t, uint8x16_t
  • AArch64:

    • 64-bit wide: float32x2_t, float64x1_t, int16x4_t, int32x2_t, int64x1_t, int8x8_t, poly16x4_t, poly8x8_t, uint16x4_t, uint32x2_t, uint64x1_t, uint8x8_t
    • 128-bit wide: float32x4_t, float64x2_t, int16x8_t, int32x4_t, int64x2_t, int8x16_t, poly16x8_t, poly8x16_t, uint16x8_t, uint32x4_t, uint64x2_t, uint8x16_t

It also adds appropriate conversions from/to portable vector types, and updates all the ARM/AArch64 intrinsics to the new vector types.


Other fixes:

  • the armv7 tests weren't properly running before: this PR activates these tests and adds assert_instr for all armv7 intrinsics
  • on armv7 clz intrinsics are broken, so this PR filled clz fails to produce correct assembly on armv7 #382 and disabled these tests temporarily. The problem might be due to the core::intrinsic not being inlined here.
  • the arm v7 neon vrsqrte_f32 intrinsic fails to lower on ARM - this PR disables the intrinsic and filled vrsqrte_f32 fails to lower on ARMv7 NEON #383
  • rbit now uses the core::intrinsic::bitreverse
  • properly error at compile-time when trying to perform run-time feature detection on ARM features that can't be detected at run-time

mod v7;
#[cfg(any(target_arch = "aarch64", target_feature = "v7"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite a newbie when it comes to ARM intrinsics, but how come the #[cfg] is needed here?

It seems like aarch64 implies the v7 feature (although rustc doesn't print that out?) but is it a bug in rustc/llvm that we can only define the module when the v7 feature is compiled in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AArch64 always have a v7 feature enabled, so the LLVM backend does not advertise it as a feature (the v7 feature for AArch64 does not exist).

The cfg checks that either we are targeting for AArch64, or we have v7 enabled (that is, we are targetting ARM, but with v7).

I don't think this is a bug. There are arm-unknown... and armv7-unknown... targets. I think this feature can only be enabled at compile time, and maybe linking between binaries compiled w/o this feature is not allowed ? (they might be different targets for a reason?). @japaric ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hm ok so does v7 behave differently from like AVX on x86_64? It's not enabled in the baseline target but we still provide it all the time, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for Aarch64 it is always enabled.

For arm AFAICT it must be enabled in the baseline target. That is, arm-unknown... does not support it at all, but armv7-unknown does. It is also not detectable at run-time AFAICT from looking at the linux hwcap files. So it looks to me that it is an option that v7 must be enabled statically for the hole binary for arm targets. It might well be that arm-unknown-... + target-feature=v7 is not equal to armv7-unknown-... because we are missing some features that are necessary (hence why the armv7-.. target exists).

pub use self::v7::*;

#[cfg(target_feature = "neon")]
// NEON is supported on AArch64, and on ARM when built with the v7 and neon
// features. Building ARM without neon produces incorrect codegen.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is probably related to the above comment, but what's the incorrect codegen here? Is that an LLVM/rustc bug?

Copy link
Contributor Author

@gnzlbg gnzlbg Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be an LLVM/rustc bug AFAICT.

On AArch64 we can compile without enabling the neon feature globally, use #[target_feature(enable="neon")] to enable neon for some functions, and call these with run-time feature detection without issues. The code generated looks pretty good.

ARMv6 and older do not support neon at all, so for ARM neon is only allowed if the v7 feature is enabled (probably should check v8 here as well but that's not supported by rustc yet?).

Using the armv7-... target (with the v7 feature enabled) and tagging functions with #[target_feature(enable="v7,neon")] and calling them using run-time feature detection did work, but it produced horrible code (as in, way too many instructions). Enabling neon at compile-time produced the code I expected. So... maybe there is an LLVM/rustc bug where using#[target_feature(enable = "v7,neon")] is not enough for armv7 targets to generate good code? Maybe we need to enable some more features for these, I don't know yet (i'd like to know though!).

Note that before, these modules where guarded with #[cfg(target_feature = "neon")], so if these weren't enabled at compile-time we were not including them at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok, so it sounds like v7 is very different from AVX in x86_64, but it's more like the 64-bit intrinsics on x86_64 wrt the i686 platform? (fundamentally not possible).

If that's the case, do you have some examples of the v7+neon code generation that are bad? I'd be curious to dig in there and see what's going on!

Copy link
Contributor Author

@gnzlbg gnzlbg Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ok, so it sounds like v7 is very different from AVX in x86_64, but it's more like the 64-bit intrinsics on x86_64 wrt the i686 platform? (fundamentally not possible).

I think arm, armv7, and aarch64 are just different targets, as different at least as x86 and x86_64.

If that's the case, do you have some examples of the v7+neon code generation that are bad? I'd be curious to dig in there and see what's going on!

So branched this, removed those checks, and disabled neon in ci/run.sh on this branch (gnzlbg@9c9b2cf), which results in this travis failure(https://travis-ci.org/gnzlbg/stdsimd/jobs/355738848#L3159) .

Now I remember what was wrong: the run-time tests were not passing! The results generated are incorrect! This might well be an LLVM or rustc bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @petrochenkov @parched @Amanieu


This also might be related to #139 where it was discussed that we might need to treat arm, armv7, armv8, as different architectures, some of which have a 64-bit mode.


#[cfg_attr(feature = "cargo-clippy", allow(expl_impl_clone_on_copy))]
impl ::clone::Clone for $name {
#[inline] // currently needed for correctness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can actually drop this now (the manual impl and just use #[derive]

Copy link
Contributor Author

@gnzlbg gnzlbg Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have these in at least 3 places throughout the library. I'll submit a PR that removes all of them.

@alexcrichton
Copy link
Member

Ok thanks for all the info!

I think it's good to land this in the meantime, so let's do so!

@alexcrichton alexcrichton merged commit 8b5843e into rust-lang:master Mar 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants