Implement all x86 vendor intrinsics

This is a tracking issue for implementing all vendor intrinsics in this repository.
It also serves as a guide to the process of adding new vendor intrinsics to this crate.

If you decide to implement a set of vendor intrinsics, please check the list below to make sure somebody else isn't already working on them. If an intrinsic is not checked off and has no name next to it, feel free to comment that you'd like to implement it!

At a high level, each vendor intrinsic should correspond to a single exported Rust function with an appropriate `target_feature` attribute. Here's an example for `_mm_adds_epi16`:

```rust
/// Add packed 16-bit integers in `a` and `b` using saturation.
#[inline]
#[target_feature(enable = "sse2")]
#[cfg_attr(test, assert_instr(paddsw))]
pub unsafe fn _mm_adds_epi16(a: __m128i, b: __m128i) -> __m128i {
    // `paddsw` is the compiler intrinsic, brought into scope by an `extern` block.
    unsafe { paddsw(a, b) }
}
```

Let's break this down:

* The `#[inline]` attribute is added because vendor intrinsic functions should generally always be inlined: the intent of a vendor intrinsic is to correspond to a single particular CPU instruction, and one that is compiled into an actual function call could be quite disastrous for performance.
* The `#[target_feature(enable = "sse2")]` attribute instructs the compiler to generate code with the `sse2` target feature enabled, *regardless* of the target platform. That is, even if you're compiling for a platform that doesn't support `sse2`, the compiler will still generate code for `_mm_adds_epi16` *as if* `sse2` support existed. Without this attribute, the compiler might not generate the intended CPU instruction.
* The `#[cfg_attr(test, assert_instr(paddsw))]` attribute indicates that when we're testing the crate we'll assert that the `paddsw` instruction is generated inside this function, ensuring that the SIMD intrinsic truly is an intrinsic for the instruction!
* The types of the vectors given to the intrinsic should match exactly the types provided in the vendor interface (with things like `int64_t` translated to `i64` in Rust).
* The implementation of the vendor intrinsic is generally very simple. Remember, the goal is to compile a call to `_mm_adds_epi16` down to a single particular CPU instruction. As such, the implementation typically defers to a compiler intrinsic (in this case, `paddsw`) when one is available. More on this below as well.
* The intrinsic itself is `unsafe` due to the usage of `#[target_feature]`: calling it on a CPU that lacks the feature is undefined behavior.
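To illustrate why the intrinsic is `unsafe` and what a caller has to guarantee, here is a minimal sketch of a safe wrapper. It assumes the intrinsic is reachable through `std::arch` (where these intrinsics were later stabilized) rather than through this crate's own paths. The helper names `adds_i16x8`, `adds_sse2` and `adds_scalar` are hypothetical and exist only for this example.

```rust
/// Lane-wise saturating add of eight `i16` values.
/// Hypothetical helper for illustration; not part of this crate.
pub fn adds_i16x8(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: the runtime check above confirmed SSE2 is available.
            return unsafe { adds_sse2(a, b) };
        }
    }
    // Portable fallback for CPUs (or architectures) without SSE2.
    adds_scalar(a, b)
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn adds_sse2(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let mut out = [0i16; 8];
    unsafe {
        // Unaligned loads and stores, since arrays carry no 16-byte alignment.
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let sum = _mm_adds_epi16(va, vb);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
    }
    out
}

/// Scalar model of the same operation: each lane clamps to the `i16` range.
fn adds_scalar(a: [i16; 8], b: [i16; 8]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for i in 0..8 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}
```

The scalar fallback also documents the expected semantics: lanes that would overflow clamp to `i16::MAX` or `i16::MIN` instead of wrapping.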

Once a function has been added, you should also add at least one test for basic functionality. Here's an example for `_mm_adds_epi16`:

```rust
#[simd_test = "sse2"]
unsafe fn test_mm_adds_epi16() {
    let a = _mm_set_epi16(0, 1, 2, 3, 4, 5, 6, 7);
    let b = _mm_set_epi16(8, 9, 10, 11, 12, 13, 14, 15);
    let r = _mm_adds_epi16(a, b);
    let e = _mm_set_epi16(8, 10, 12, 14, 16, 18, 20, 22);
    assert_eq_m128i(r, e);
}
```

Note that `#[simd_test]` behaves like `#[test]`. It is a custom macro that enables the target feature in the test and generates a wrapper that checks the feature is available on the local CPU before running it.
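As a rough picture of what such a wrapper has to do, the following hand-written sketch separates the feature-enabled test body from a plain function that only runs it when the CPU supports the feature. This is not the macro's actual expansion. The names `check_adds_epi16_saturates` and `run_sse2_check` are hypothetical, and the intrinsics are taken from `std::arch` as an assumption for the sake of a self-contained example.

```rust
/// Test body compiled with SSE2 enabled; only sound to call on an SSE2 CPU.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn check_adds_epi16_saturates() {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;

    let a = _mm_set1_epi16(i16::MAX);
    let b = _mm_set1_epi16(1);
    let r = _mm_adds_epi16(a, b);
    // Every lane should have clamped to i16::MAX instead of wrapping.
    let eq = _mm_cmpeq_epi16(r, _mm_set1_epi16(i16::MAX));
    // All 16 byte-mask bits set means all eight lanes compared equal.
    assert_eq!(_mm_movemask_epi8(eq), 0xFFFF);
}

/// Wrapper: runs the body only if the local CPU reports SSE2.
/// Returns `true` if the body ran and `false` if it was skipped.
fn run_sse2_check() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 availability was just verified at runtime.
            unsafe { check_adds_epi16_saturates() };
            return true;
        }
    }
    false
}
```

Skipping instead of failing keeps the test suite usable on machines that lack the feature under test.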

Finally, once that's done, send a PR!

## Writing the implementation

An implementation of an intrinsic (so far) generally has one of three shapes:

1. The vendor intrinsic does not have any corresponding compiler intrinsic, so you must write the implementation in such a way that the compiler will recognize it and produce the desired codegen. For example, the `_mm_add_epi16` intrinsic (note the missing `s` in `add`) is implemented via `simd_add(a, b)`, which lowers to LLVM's cross-platform SIMD vector API.
2. The vendor intrinsic *does* have a corresponding compiler intrinsic, so you must write an `extern` block to bring that intrinsic into scope and then call it. The example above (`_mm_adds_epi16`) uses this approach.
3. The vendor intrinsic has a parameter that must be a *constant* value when given to the CPU instruction, where that constant is often a parameter that impacts the operation of the intrinsic. This means the implementation of the vendor intrinsic must guarantee that a particular parameter be a constant. This is tricky because Rust doesn't (yet) have a stable way of doing this, so we have to do it ourselves. How you do it can vary, but one particularly gnarly example is [`_mm_cmpestri`](https://github.com/BurntSushi/stdsimd/blob/ff6021b72e8cc1e7db942847d99278fe0056c245/src/x86/sse42.rs#L286) (make sure to look at the `constify_imm8!` macro).
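The third shape can be sketched in miniature. The example below is a two-bit analogue of the `constify_imm8!` idea: a `match` over the runtime value expands a caller-supplied macro once per possible constant, so each arm passes a literal. The names `constify_imm2!`, `lane_const` and `extract_lane` are hypothetical, and `lane_const` is an ordinary function standing in for a compiler intrinsic that would require a constant argument.

```rust
/// Hypothetical two-bit analogue of `constify_imm8!`: expands `$expand!`
/// with a literal in every arm, so the callee only ever sees constants.
macro_rules! constify_imm2 {
    ($imm:expr, $expand:ident) => {
        match ($imm) & 0b11 {
            0 => $expand!(0),
            1 => $expand!(1),
            2 => $expand!(2),
            _ => $expand!(3),
        }
    };
}

/// Stand-in for a compiler intrinsic whose `lane` argument must be constant.
fn lane_const(v: [i32; 4], lane: i32) -> i32 {
    v[lane as usize]
}

/// Public function taking a runtime `imm`; only the low two bits are used.
pub fn extract_lane(v: [i32; 4], imm: i32) -> i32 {
    // Local macro capturing `v`; `constify_imm2!` invokes it with a literal.
    macro_rules! call {
        ($lane:expr) => {
            lane_const(v, $lane)
        };
    }
    constify_imm2!(imm, call)
}
```

With an 8-bit immediate the same pattern needs 256 arms, which is why the real macro is large. After inlining with a constant argument, the expectation is that the optimizer keeps only the matching arm.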

## References

All Intel intrinsics can be found here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5236

The compiler intrinsics available to us through LLVM can be found here: https://gist.github.com/anonymous/a25d3e3b4c14ee68d63bd1dcb0e1223c

The Intel vendor intrinsic API can be found here: https://gist.github.com/anonymous/25d752fda8521d29699a826b980218fc

The Clang header files for vendor intrinsics can also be incredibly useful. When in doubt, Do What Clang Does:
https://github.com/llvm-mirror/clang/tree/master/lib/Headers


## TODO


<details><summary>["AVX2"]</summary>

 * [ ] [`_mm256_stream_load_si256`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_stream_load_si256&expand=5236)
 * [ ] [`_mm_broadcastsi128_si256`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_broadcastsi128_si256&expand=5236)
</details>

<details><summary>["MMX"]</summary>

 * [ ] [`_mm_srli_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_pi16&expand=5236)
 * [ ] [`_mm_srl_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_pi16&expand=5236)
 * [ ] [`_mm_mullo_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mullo_pi16&expand=5236)
 * [ ] [`_mm_slli_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_si64&expand=5236)
 * [ ] [`_mm_mulhi_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_mulhi_pi16&expand=5236)
 * [ ] [`_mm_srai_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srai_pi16&expand=5236)
 * [ ] [`_mm_srli_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_si64&expand=5236)
 * [ ] [`_mm_and_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_and_si64&expand=5236)
 * [ ] [`_mm_cvtsi32_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi32_si64&expand=5236)
 * [ ] [`_mm_cvtm64_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtm64_si64&expand=5236)
 * [ ] [`_mm_andnot_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_andnot_si64&expand=5236)
 * [ ] [`_mm_packs_pu16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_packs_pu16&expand=5236)
 * [ ] [`_mm_madd_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_madd_pi16&expand=5236)
 * [ ] [`_mm_cvtsi64_m64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi64_m64&expand=5236)
 * [ ] [`_mm_cmpeq_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi16&expand=5236)
 * [ ] [`_mm_sra_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sra_pi32&expand=5236)
 * [ ] [`_mm_cvtsi64_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cvtsi64_si32&expand=5236)
 * [ ] [`_mm_cmpeq_pi8`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi8&expand=5236)
 * [ ] [`_mm_srai_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srai_pi32&expand=5236)
 * [ ] [`_mm_sll_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_pi16&expand=5236)
 * [ ] [`_mm_srli_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srli_pi32&expand=5236)
 * [ ] [`_mm_slli_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_pi16&expand=5236)
 * [ ] [`_mm_srl_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_si64&expand=5236)
 * [ ] [`_mm_empty`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_empty&expand=5236)
 * [ ] [`_mm_srl_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_srl_pi32&expand=5236)
 * [ ] [`_mm_slli_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_slli_pi32&expand=5236)
 * [ ] [`_mm_or_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_or_si64&expand=5236)
 * [ ] [`_mm_sll_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_si64&expand=5236)
 * [ ] [`_mm_sra_pi16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sra_pi16&expand=5236)
 * [ ] [`_mm_sll_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_sll_pi32&expand=5236)
 * [ ] [`_mm_xor_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_xor_si64&expand=5236)
 * [ ] [`_mm_cmpeq_pi32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_cmpeq_pi32&expand=5236)
</details>

<details><summary>["SSE"]</summary>

 * [ ] [`_mm_free`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_free&expand=5236)
 * [ ] [`_mm_storeu_si16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si16&expand=5236)
 * [ ] [`_mm_loadu_si16`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si16&expand=5236)
 * [x] [`_mm_loadu_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si64&expand=5236)
 * [ ] [`_mm_malloc`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_malloc&expand=5236)
 * [ ] [`_mm_storeu_si64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si64&expand=5236)
</details>


<details><summary>["SSE2"]</summary>

 * [ ] [`_mm_loadu_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_si32&expand=5236)
 * [ ] [`_mm_storeu_si32`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_storeu_si32&expand=5236)
</details>


<details><summary>["SSE4.1"]</summary>

 * [ ] [`_mm_stream_load_si128`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_stream_load_si128&expand=5236)
</details>


---

[previous description of this issue](https://gist.github.com/alexcrichton/58838cc127838da9d9584446b95aa1b4)
