Skip to content

sse 4.1 #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 17, 2017
Merged

sse 4.1 #183

merged 13 commits into from
Nov 17, 2017

Conversation

p32blo
Copy link
Contributor

@p32blo p32blo commented Nov 7, 2017

  • _mm_cvtepi32_epi64
  • _mm_cvtepu8_epi16
  • _mm_cvtepu8_epi32
  • _mm_cvtepu8_epi64
  • _mm_cvtepu16_epi32
  • _mm_cvtepu16_epi64
  • _mm_cvtepu32_epi64
  • _mm_mul_epi32
  • _mm_mullo_epi32
  • _mm_testz_si128
  • _mm_testc_si128
  • _mm_testnzc_si128
  • _mm_test_all_zeros
  • _mm_test_all_ones
  • _mm_test_mix_ones_zeros
  • _mm_minpos_epu16
  • _mm_mpsadbw_epu8

@BurntSushi
Copy link
Member

Looks like this has some overlap with #181?

@p32blo
Copy link
Contributor Author

p32blo commented Nov 7, 2017

All the test intrinsics receive a i64x2 but should work for any type that is 128-bit wide. How should this be handled?

The _mm_mpsadbw_epu8 intrinsic uses unsigned types but the llvm intrinsic works with signed ones, is changing it to unsigned ok?

@BurntSushi
Copy link
Member

All the test intrinsics receive a i64x2 but should work for any type that is 128-bit wide.

Use the __m128i type alias, which I defined exactly for this purpose: https://docs.rs/stdsimd/0.0.3/stdsimd/vendor/type.__m128i.html

The _mm_mpsadbw_epu8 intrinsic uses unsigned types but the llvm intrinsic works with signed ones, is changing it to unsigned ok?

Yes, use unsigned, like all the other epu intrinsics. :-) llvm doesn't distinguish between signed and unsigned in the types IIRC, but the Clang docs clearly call it unsigned: https://github.com/llvm-mirror/clang/blob/a38ba9770a70056f9fdd6c71f819e5db45a105e4/lib/Headers/smmintrin.h#L1514

@p32blo
Copy link
Contributor Author

p32blo commented Nov 7, 2017

Interestingly #181 uses an intrinsic instead of simd_shuffle. Where are these intrinsics listed? Can't find them in the intrinsics file at #40. Also clang doesn't use them when implementing the intrinsics.

@BurntSushi
Copy link
Member

@p32blo That's interesting. Usually if Clang doesn't use an intrinsic it's because the intrinsic isn't available. Over time, intrinsics are removed in favor of better codegen, so we should probably emulate Clang here. cc @c410-f3r

@p32blo
Copy link
Contributor Author

p32blo commented Nov 7, 2017

@BurntSushi Regarding _m128i, llvm intrinsic only accepts i64x2. So I changed the test intrinsics to recive _m128i and then used .into() to transmute them and be able to call the llvm intrinsic

@p32blo p32blo force-pushed the sse4.1 branch 2 times, most recently from 9c6061f to c96b603 Compare November 13, 2017 10:16
@p32blo
Copy link
Contributor Author

p32blo commented Nov 13, 2017

I'm getting an error with TARGET=i586-unknown-linux-gnu:

  • test x86::sse41::tests::_mm_test_all_ones ... error: An unknown error occurred

Are the instructions not supported? I'm not sure what to do here...

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 13, 2017

@p32blo try adding --verbose -- --nocapture to the cargo calls in ci/run.sh and retrying, that might give you a bit more information about the problem.

@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 17, 2017

@p32blo I've sent you a PR with the fix, I think we can merge this afterwards :)

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
P32blo sse41
@p32blo
Copy link
Contributor Author

p32blo commented Nov 17, 2017

@gnzlbg It looks like it worked. Thank you!

@gnzlbg gnzlbg merged commit de05999 into rust-lang:master Nov 17, 2017
@gnzlbg
Copy link
Contributor

gnzlbg commented Nov 17, 2017

So @BurntSushi this LGTM now. The sketchy // i586 lines are not @p32blo's fault but mine. Adding intrinsics that work for i586 and up, or x86_64 and up is easy, but adding intrinsics for i686 and up is hard.

This PR got delayed ~1 week because of it, and while fixing that I've found some cases where we say "this requires x86_64" but we should be saying "this requires i686" instead.

I don't want to spread checks for all this stuff all over the place and have to think about it all the time, so I am going to split the x86 modules into 3 submodules, i586, i686, and x86_64, and put the intrinsics in the proper one so that we can just have the target macros at the top level.

The PR #175 adds an ia32 module, but I think that afterwards I am going to move that into an i386 module for consistency.

That should make it trivial to fix these things in the future. For example if you add a new intrinsic to the i586 module and the travis build bots for i586 break but the ones for i686 pass then you have probably added the intrinsic to the wrong module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants