pub use core::simd; #89167
Merged
+7,631
−0
307 commits
db7d986
Add cross-links to Zulip, etc.
workingjubilee 285fff0
Merge pull request #37 from rust-lang/link-more
KodrAus 3870633
Add rounding mode test
calebzulawski 4baa8c2
Merge pull request #34 from rust-lang/feature/round
calebzulawski a5bdb8b
Document size/align of SIMD types
workingjubilee 3aec4a2
Merge pull request #43 from rust-lang/docs/layout
workingjubilee 7538ff8
Revert "Disable riscv64gc"
workingjubilee e9cc306
Remove round, trunc tests
workingjubilee 3d9bbf9
Use simd_{floor,ceil} intrinsics for round.rs
workingjubilee a69c441
Use platform intrinsics, not LLVM, for floor & ceil
workingjubilee cebc2ca
Add opaque masks
calebzulawski 5bc5d7f
Add comparison ops
calebzulawski 78a8d61
Implement missing traits on opaque masks, fix tests
calebzulawski 35b9ab9
Simplify some formatting
calebzulawski 27e9442
Begin changing vectors to const generics
calebzulawski 22576bb
Implement additional functions
calebzulawski 25c7640
Reenable ops and fix tests
calebzulawski 9cc3dea
Finish refactoring vector types
calebzulawski 0ddf7ac
Reenable rounding ops
calebzulawski 2720ccc
Fix masks
calebzulawski 62d98e3
Remove obsolete macros
calebzulawski 9b8cb18
Remove obsolete files
calebzulawski 5994771
Add workaround for rust-lang/rust#80108
calebzulawski cd36c98
Deploy documentation to GitHub Pages
calebzulawski c7d031c
Merge pull request #52 from rust-lang/feature/deploy-docs
KodrAus d72927c
Switch docs deploy to GITHUB_TOKEN
calebzulawski b931c15
Merge pull request #49 from rust-lang/feature/const-generics
KodrAus cb036b5
min_const_generics ride the train to stable
workingjubilee 3717408
Merge pull request #57 from rust-lang/stable-min-const
calebzulawski c67fc2e
Add guards/tests for div,rem overflow cases
miguelraz 9801540
Merge pull request #55 from miguelraz/min-panics
workingjubilee e4cdd15
Add issue templates
workingjubilee 9b6b5d7
Merge pull request #58 from rust-lang/issue-templates
workingjubilee fd6179b
Add a blank issue template
workingjubilee 8bea634
AsRef -> as_slices()
miguelraz acbde03
Merge pull request #60 from miguelraz/as-slice-prettify
workingjubilee 4ad53da
Merge pull request #59 from rust-lang/blank-ish
workingjubilee 5424140
Add SIMD shuffles for SimdType{2,4,8,16,32,64}
workingjubilee 55f0efc
add a link to docs to the readme
KodrAus 38ae9bd
Merge pull request #65 from rust-lang/KodrAus-patch-1
KodrAus 1b0c231
Merge pull request #62 from rust-lang/feature/shuffle-self
KodrAus 92293af
Add bitmask that supports up to 64 lanes. Simplify mask op API.
calebzulawski 9e96c8a
Add missing From implementation, add simple mask API tests
calebzulawski 26061b4
Fix wasm tests
calebzulawski 8aa7ba7
Merge pull request #61 from rust-lang/feature/masks
calebzulawski 6362540
Limit all types to 64 lanes
calebzulawski 16904eb
Add missing type bounds
calebzulawski faae170
Remove glob import
calebzulawski 08ee338
Add to glossary: vectorize, scalar, vector register
workingjubilee cbca211
Merge pull request #67 from rust-lang/limit-lanes
workingjubilee d3c58da
Merge pull request #73 from rust-lang/scalar-docs
calebzulawski d5c2279
Add proptest float tests
calebzulawski 0ac057a
Add integer tests
calebzulawski 5b0818a
Remove old integer tests
calebzulawski 223daea
Update supported lane counts
calebzulawski b38d342
Simplify test creation
calebzulawski 38b1890
Remove obsolete helpers
calebzulawski 8d5702e
Fix performance issues
calebzulawski 976fafc
Fix wasm tests
calebzulawski 0ec3ecf
Split ops tests
calebzulawski 714ad63
Fix MulAssign typo in tests, move panic tests
calebzulawski 8c378d3
Add documentation
calebzulawski 15dd0ae
Disable CI on branches without PR
calebzulawski 2b3f4b2
Add LanesAtMost64 bounds
calebzulawski f85bd24
Merge pull request #72 from rust-lang/feature/proptest
workingjubilee 2f2a463
Remove From<Scalar> for SimdTy impl
workingjubilee e3b729c
Merge pull request #75 from rust-lang/no-scalar-from
calebzulawski 1a19ad4
Reorg vectors into crate::vector::*;
workingjubilee 27f094f
Nominate base files
workingjubilee ca15e4f
cat vector types by kind
workingjubilee 39fb223
Partially carve macros.rs into other files
workingjubilee a2302da
Move macros.rs to first.rs
workingjubilee 8ad4f14
Merge pull request #77 from rust-lang/reorg-vectors
workingjubilee 8cb1fe0
Fix wasm-bindgen dependency
calebzulawski d95433d
Merge pull request #82 from rust-lang/bugfix/wasm-dependencies
workingjubilee fa77b19
Add std cargo feature
calebzulawski 65c3ce9
Merge pull request #81 from rust-lang/feature/std-cargo-feature
workingjubilee 4a6b4c0
Introduce saturating math
workingjubilee 6620015
Merge pull request #86 from rust-lang/feat/saturating
workingjubilee dd1a5e4
Add saturating abs/neg
workingjubilee 331230f
Explain why to use saturation
workingjubilee 4e6d440
Merge pull request #87 from rust-lang/feat/sat-abs-neg
calebzulawski 93ce1c1
Add floating-point classification functions
calebzulawski 07247a0
Various bug fixes
calebzulawski 97bbe2d
Fix normal and subnormal classification
calebzulawski e6a5309
Reduce maximum lanes from 64 to 32
calebzulawski 0682c31
Merge pull request #80 from rust-lang/feature/comparisons
workingjubilee b0a005d
Add floating-point classification functions
calebzulawski d7649f4
Various bug fixes
calebzulawski 926cf3a
Add intrinsics
calebzulawski 875b31c
Implement reductions
calebzulawski a7b82ad
Add tests
calebzulawski 193cd14
Enable special handling of zero
calebzulawski 02608d4
Fix mask ops
calebzulawski 64f5648
Update documentation and fix i586 inaccuracy
calebzulawski 4b8cbd5
Fix i586 detection
calebzulawski b51febb
Revert i586 fix, fix test instead
calebzulawski 3fae09b
Revert "Revert i586 fix, fix test instead"
calebzulawski 3cf970f
Fix test sum/product implementation
calebzulawski e2fa502
Enable i586 workaround for both f32 and f64
calebzulawski e127586
Improve function names and docs
calebzulawski e3f0124
Silence warnings
workingjubilee 894062f
Burn Chrome again
workingjubilee 81c9633
Drop wasm SIMD tests
workingjubilee 87b7207
Use neg intrinsics
workingjubilee 1c3d957
Merge pull request #96 from rust-lang/burning-chrome
workingjubilee 01d78aa
Update docs
calebzulawski e73985f
Merge pull request #89 from rust-lang/intrinsic-neg
calebzulawski 977f26f
Add some common shuffles
calebzulawski 1999c54
Clarify concatenation order
calebzulawski 9acc112
Use fabs intrinsic
workingjubilee b2e25bc
Merge pull request #95 from rust-lang/intrinsic-fabs
calebzulawski 828b274
Rename sum, product to horizontal_{sum,product}
calebzulawski 7028a58
Attempt to clarify interleave/deinterleave
calebzulawski 2fa62b9
Merge pull request #98 from rust-lang/feature/common-shuffles
workingjubilee 04ee107
Remove wrapping from sum/product fns
calebzulawski 24ebae8
Merge pull request #83 from rust-lang/feature/reductions
workingjubilee 1f4e902
Fix saturating math docs
calebzulawski e8b6bca
Finish fixing up abs docs
workingjubilee 91134e6
Branchless abs
workingjubilee f06427f
Move lanes_at_most_64 to _32
workingjubilee 92d643b
Remove Simd{U,I}128
workingjubilee b4fda6e
Give rounding intrinsics their own modules
workingjubilee 6ea08d8
Add SIMD round, trunc, fract
workingjubilee a9a1c9d
Merge pull request #100 from rust-lang/fix-sat-math
workingjubilee 5751179
Merge pull request #107 from rust-lang/feat/simd-round
workingjubilee da42aa5
Begin reducing mask API
calebzulawski eec4280
Update bitmask API
calebzulawski 98dad13
Make implementation more scalable by using a helper trait to determin…
calebzulawski 589fce0
Attempt to workaround MIPS bug
calebzulawski 9a063bc
Merge pull request #99 from rust-lang/feature/simplify-masks
workingjubilee 563d2a2
Add select function
calebzulawski dfebaf9
Merge pull request #103 from rust-lang/feature/select
workingjubilee 0bf5eb5
Add select for masks
calebzulawski e8cae87
Fix rustfmt
calebzulawski 45d7e80
Clarify documentation
calebzulawski ce92300
Merge pull request #117 from rust-lang/feature/mask-select
workingjubilee d679581
add simd_fsqrt intrinsic
miguelraz 1c18f8f
Add byte conversions
calebzulawski 20c3b8e
Merge pull request #120 from miguelraz/simd_fsqrt
calebzulawski e52d51c
nbody example
miguelraz 2591c59
fix imports
miguelraz ab6af37
Simdf64 from attempt
miguelraz 8bea362
replace sum() with horizontal_sum()
miguelraz 9926218
Remove extended_key_value_attributes feature
calebzulawski 93ee641
Merge pull request #125 from rust-lang/remove-stable-feature
workingjubilee 3c05cee
Update crates/core_simd/examples/nbody.rs
miguelraz 83dc5b7
don't need clippy
miguelraz 5605056
don't use turbofish on run
miguelraz f24110a
collapse run_k into run
miguelraz 5557907
rewrite unaligned slice, fix output const array
miguelraz 4e86aeb
finish nbody
miguelraz 70305c5
add main to avoid CI crash
miguelraz c042f33
clean up code, fudge approx true
miguelraz 435d1cf
Update crates/core_simd/examples/nbody.rs
miguelraz be121c9
clean code vis. Jubilee's comments
miguelraz 4311c06
Merge pull request #122 from miguelraz/nbodyexample
workingjubilee 3032a62
add helloworld to README (#134)
miguelraz 68393aa
Add mask width conversion (#127)
calebzulawski 57e67c9
add doctests for shuffle (#130)
miguelraz bdcccba
Implement Sum/Product traits
calebzulawski 96f0f5d
Implement Sum/Product over references
calebzulawski b936f34
Add various special functions (recip, signum, copysign)
calebzulawski 74e6262
Add min/max/clamp
calebzulawski f102de7
Add mul_add
calebzulawski 7b66032
Fix test typo
calebzulawski 15b4e28
Add from_bitmask (#136)
calebzulawski 708ae61
Remove scalar Sum/Product over questionable order of operations
calebzulawski b0a9fe5
Extract constant from scalar to_radians as well
calebzulawski 16765a1
Introduce SimdArray trait
workingjubilee 2f99cc8
Add pointer vectors: SimdConstPtr, SimdMutPtr
workingjubilee 128b6f5
Add SimdArray::gather_{or,or_default,select}
workingjubilee 81ceda8
Add SimdArray::scatter{,_select}
workingjubilee f38659a
Add assoc const SimdArray::LANES
workingjubilee 1529ed4
Document and test doubled writes in scatter
workingjubilee 3872723
Merge pull request #138 from rust-lang/feature/various-fns
workingjubilee b5ba195
Merge pull request #139 from rust-lang/feat/gather
workingjubilee 715f9ac
Fix typo. Closes #140
calebzulawski 871d588
Add 32-bit SIMD types i/u16x2 and i/u8x4. (#145)
adamgreig ac749a1
add matrix_inversion example (#131)
miguelraz 3954b27
Add conversions between vendor intrinsics (#144)
calebzulawski be96995
Add portable_simd unstable feature gate (#141)
calebzulawski 732b7ed
Add fmt and clippy to CI (#147)
calebzulawski c077bf3
Rename SimdArray to Vector, remove its generic parameter, and remove …
calebzulawski f178dda
Add as_slice/as_mut_slice to Vector
calebzulawski fdd7d6e
Change as_slice to as_array
calebzulawski 529ffe0
Use new module naming
calebzulawski f93bef3
Move vector implementation
calebzulawski 97c25dd
Add lane count marker type
calebzulawski 82e3405
Merge pull request #142 from rust-lang/feature/traits
workingjubilee 34384b7
Add const_evaluatable_checked feature, change to_bitmask to use it, a…
calebzulawski 1f69bc4
Add CI for testing cargo features
calebzulawski 9ab0507
Fix feature flag in CI
calebzulawski cca9102
Change bitmasks to use less opaque type
calebzulawski c36d17d
Merge pull request #152 from rust-lang/feature/const_eval_checked
workingjubilee 2acf204
Rename to portable-simd and remove other names
workingjubilee 50eb35e
Merge pull request #153 from rust-lang/death-of-the-author
calebzulawski 054f25f
Convert all vectors to a single type
calebzulawski dc4dc99
Change to various generic impls
calebzulawski 8cc38ae
Remove Vector trait
calebzulawski ddc67e3
Remove Mask trait
calebzulawski de13b20
Convert all masks to a single type
calebzulawski ea02805
Implement select generically
calebzulawski e6d95e4
Implement comparisons generically
calebzulawski e11286a
Remove unused transmute file
calebzulawski 5ed57b4
Remove most usage of type aliases
calebzulawski 88f79d4
Remove aliases from op trait impls
calebzulawski f7f2968
Remove aliases from most tests
calebzulawski 275889f
Remove remaining usage of aliases
calebzulawski 40142ac
Remove aliases
calebzulawski 00165ed
Remove mask aliases
calebzulawski cf653c7
Update crates/core_simd/src/vector.rs
calebzulawski 4aafd8e
Rename element type variable
calebzulawski d428753
Merge pull request #154 from rust-lang/feature/generic-element-type
workingjubilee 8cf7a62
Fix cargo features for nightly (#155)
calebzulawski b25ed7f
Restructure crate as core module
workingjubilee 8342fe7
Cleanup more for std::simd also
workingjubilee 6d3d07a
Feature-flag doc tests so they run for core
workingjubilee c2f5948
Feature-flag fused mul-add to block libcalls
workingjubilee afd7c5a
Make sure MaskElement is in bitmasks.rs
workingjubilee 4fbccaf
Add lanes()
mulimoen ec05dfb
Add associated LANES const
mulimoen b506e3e
Renovate for Edition 2021
workingjubilee 436ca7f
Add lanes() and associated LANES const
workingjubilee 6d23662
Add {gather,scatter}_select_unchecked
workingjubilee 01e9816
docs: fix typo gather -> scatter
workingjubilee 9be2665
Rewrite gather/scatter docs
workingjubilee a16b481
Simplify language for scatter/gather
workingjubilee 10168fb
Add new swizzle API
calebzulawski 98e4fca
Fix macro in core
calebzulawski 37797d9
simd_shuffle -> simd_swizzle
calebzulawski cd7ecba
Remove adt_const_params feature
calebzulawski 765bee6
Update crates/core_simd/src/swizzle.rs
calebzulawski 5b4282e
Improve docs
calebzulawski ab8eec7
Fixup import pathing for core
workingjubilee 7c2d295
Hide mask impl details in sealed trait.
calebzulawski 772bf20
Hide select impl in sealed trait
calebzulawski 4e00aa6
rotate_{left,right} -> rotate_lanes_{left,right}
calebzulawski d2e8728
add `Simd::from_slice` (#177)
pro465 0ecf987
Merge pull request #181 from rust-lang/rotate_lanes
calebzulawski 349a611
Delete travis config, move tests to github actions.
calebzulawski 081240a
Merge pull request #175 from rust-lang/feature/more-actions
workingjubilee c52083e
Use the right name for AVX512F
workingjubilee 949f71c
Deny warnings in CI and fix
workingjubilee 7d91357
Dynamically detect AVX512 in CI
workingjubilee 6ddf7ad
Restrict Arm types to Arm v7+
workingjubilee 1ce1c64
Rewrite Arm transmutes, reading std::arch closer
workingjubilee fdee059
Add 'library/portable-simd/' from commit '1ce1c645cf27c4acdefe6ec8a11…
workingjubilee 39cb863
Expose portable-simd as core::simd
workingjubilee 7c3d72d
Test core::simd works
workingjubilee
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
use core::simd::f32x4; | ||
|
||
#[test] | ||
fn testing() { | ||
let x = f32x4::from_array([1.0, 1.0, 1.0, 1.0]); | ||
let y = -x; | ||
|
||
let h = x * 0.5; | ||
|
||
let r = y.abs(); | ||
assert_eq!(x, r); | ||
assert_eq!(h, f32x4::splat(0.5)); | ||
} |
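The test above needs a nightly toolchain, so the lane-wise behavior it checks can be sketched on stable Rust with plain arrays. This is a scalar model under stated assumptions, not how `core::simd` is implemented: the alias `F32x4` and the helpers `splat`, `neg`, `abs`, and `scale` are hypothetical stand-ins for the vector operations the test uses.

```rust
/// Hypothetical scalar stand-in for a 4-lane `f32` vector.
pub type F32x4 = [f32; 4];

/// Broadcast one value to every lane (models `f32x4::splat`).
pub fn splat(v: f32) -> F32x4 {
    [v; 4]
}

/// Negate each lane independently (models unary `-x`).
pub fn neg(x: F32x4) -> F32x4 {
    x.map(|lane| -lane)
}

/// Take the absolute value of each lane (models `y.abs()`).
pub fn abs(x: F32x4) -> F32x4 {
    x.map(f32::abs)
}

/// Multiply each lane by the same scalar (models `x * 0.5`).
pub fn scale(x: F32x4, k: f32) -> F32x4 {
    x.map(|lane| lane * k)
}
```

With these helpers, `abs(neg(splat(1.0)))` equals `splat(1.0)` and `scale(splat(1.0), 0.5)` equals `splat(0.5)`, which are the two properties the test asserts.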
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
name: Blank Issue | ||
about: Create a blank issue. | ||
--- |
50 changes: 50 additions & 0 deletions
library/portable-simd/.github/ISSUE_TEMPLATE/bug_report.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
name: Bug Report | ||
about: Create a bug report for Rust. | ||
labels: C-bug | ||
--- | ||
<!-- | ||
Thank you for filing a bug report! 🐛 Please provide a short summary of the bug, | ||
along with any information you feel relevant to replicating the bug. | ||
--> | ||
|
||
I tried this code: | ||
|
||
```rust | ||
<code> | ||
``` | ||
|
||
I expected to see this happen: *explanation* | ||
|
||
Instead, this happened: *explanation* | ||
|
||
### Meta | ||
|
||
`rustc --version --verbose`: | ||
``` | ||
<version> | ||
``` | ||
|
||
|
||
`crate version in Cargo.toml`: | ||
```toml | ||
[dependencies] | ||
stdsimd = | ||
``` | ||
<!-- If this specifies the repo at HEAD, please include the latest commit. --> | ||
|
||
|
||
<!-- | ||
If a backtrace is available, please include a backtrace in the code block by | ||
setting `RUST_BACKTRACE=1` in your environment. e.g. | ||
`RUST_BACKTRACE=1 cargo build`. | ||
--> | ||
<details><summary>Backtrace</summary> | ||
<p> | ||
|
||
``` | ||
<backtrace> | ||
``` | ||
|
||
</p> | ||
</details> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# This only controls whether a tiny, hard-to-find "open a blank issue" link appears at the end of | ||
# the template list. | ||
blank_issues_enabled: true | ||
contact_links: | ||
- name: Intrinsic Support | ||
url: https://github.com/rust-lang/stdarch/issues | ||
about: Please direct issues about Rust's support for vendor intrinsics to core::arch | ||
- name: Internal Compiler Error | ||
url: https://github.com/rust-lang/rust/issues | ||
about: Please report ICEs to the rustc repository |
14 changes: 14 additions & 0 deletions
library/portable-simd/.github/ISSUE_TEMPLATE/feature_request.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
--- | ||
name: Feature Request | ||
about: Request an addition to the core::simd API | ||
labels: C-feature-request | ||
--- | ||
<!-- | ||
Hello! | ||
We are very interested in any feature requests you may have. | ||
However, please be aware that core::simd exists to address concerns with creating a portable SIMD API for Rust. | ||
Requests for extensions to compiler features, such as `target_feature`, binary versioning for SIMD APIs, or | ||
improving specific compilation issues in general should be discussed at https://internals.rust-lang.org/ | ||
--> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Hello, welcome to `std::simd`! | ||
|
||
It seems this pull request template checklist was created while a lot of vector math ops were being implemented, and only really applies to ops. Feel free to delete everything here if it's not applicable, or ask for help if you're not sure what it means! | ||
|
||
For a given vector math operation on TxN, please add tests for interactions with: | ||
- [ ] `T::MAX` | ||
- [ ] `T::MIN` | ||
- [ ] -1 | ||
- [ ] 1 | ||
- [ ] 0 | ||
|
||
|
||
For a given vector math operation on TxN where T is a float, please add tests for interactions with: | ||
- [ ] a really large number, larger than the mantissa | ||
- [ ] a really small "subnormal" number | ||
- [ ] NaN | ||
- [ ] Infinity | ||
- [ ] Negative Infinity |
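The checklist above can be turned into a small edge-case harness. The following is a minimal sketch on stable Rust, assuming a hypothetical lane-wise operation `saturating_add_lanes` over four `i8` lanes as the op under test. None of these names come from the crate, and the real tests in this PR use their own helpers.

```rust
/// Integer edge cases from the checklist: `T::MAX`, `T::MIN`, -1, 1, 0 (with `T = i8`).
pub const INT_EDGE_CASES: [i8; 5] = [i8::MAX, i8::MIN, -1, 1, 0];

/// Float edge cases from the checklist (with `T = f32`).
pub fn float_edge_cases() -> [f32; 5] {
    [
        1.0e20,                  // far beyond 2^24, where f32 can no longer represent every integer
        f32::MIN_POSITIVE / 2.0, // half the smallest normal value, so subnormal
        f32::NAN,
        f32::INFINITY,
        f32::NEG_INFINITY,
    ]
}

/// Hypothetical op under test: saturating addition applied to each of four lanes.
pub fn saturating_add_lanes(a: [i8; 4], b: [i8; 4]) -> [i8; 4] {
    let mut out = [0i8; 4];
    for i in 0..4 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Runs the op on every pair of integer edge cases and compares each result
/// with a widened scalar reference. Returns the number of pairs checked.
pub fn check_all_int_pairs() -> usize {
    let mut checked = 0;
    for &a in &INT_EDGE_CASES {
        for &b in &INT_EDGE_CASES {
            let got = saturating_add_lanes([a; 4], [b; 4]);
            // Reference: add in i16 so nothing overflows, then clamp back to i8.
            let wide = a as i16 + b as i16;
            let expected = wide.clamp(i8::MIN as i16, i8::MAX as i16) as i8;
            assert_eq!(got, [expected; 4], "a = {}, b = {}", a, b);
            checked += 1;
        }
    }
    checked
}
```

Checking every pair rather than each value alone is a design choice of this sketch: overflow cases such as `T::MAX + 1` and `T::MIN + -1` only appear when edge cases are combined.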
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,260 @@ | ||
name: CI | ||
|
||
on: | ||
pull_request: | ||
push: | ||
branches: | ||
- master | ||
|
||
env: | ||
CARGO_NET_RETRY: 10 | ||
RUSTUP_MAX_RETRIES: 10 | ||
|
||
jobs: | ||
rustfmt: | ||
name: "rustfmt" | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
rustup component add rustfmt | ||
- name: Run rustfmt | ||
run: cargo fmt --all -- --check | ||
|
||
clippy: | ||
name: "clippy on ${{ matrix.target }}" | ||
runs-on: ubuntu-latest | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
target: | ||
# We shouldn't really have any OS-specific code, so think of this as a list of architectures | ||
- x86_64-unknown-linux-gnu | ||
- i686-unknown-linux-gnu | ||
- i586-unknown-linux-gnu | ||
- aarch64-unknown-linux-gnu | ||
- armv7-unknown-linux-gnueabihf | ||
- mips-unknown-linux-gnu | ||
- mips64-unknown-linux-gnuabi64 | ||
- powerpc-unknown-linux-gnu | ||
- powerpc64-unknown-linux-gnu | ||
- riscv64gc-unknown-linux-gnu | ||
- s390x-unknown-linux-gnu | ||
- sparc64-unknown-linux-gnu | ||
- wasm32-unknown-unknown | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
rustup target add ${{ matrix.target }} | ||
rustup component add clippy | ||
- name: Run Clippy | ||
run: cargo clippy --all-targets --target ${{ matrix.target }} | ||
|
||
x86-tests: | ||
name: "${{ matrix.target_feature }} on ${{ matrix.target }}" | ||
runs-on: ${{ matrix.os }} | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
target: [x86_64-pc-windows-msvc, i686-pc-windows-msvc, i586-pc-windows-msvc, x86_64-unknown-linux-gnu, x86_64-apple-darwin] | ||
# `default` means we use the default target config for the target, | ||
# `native` means we run with `-Ctarget-cpu=native`, and anything else is | ||
# an arg to `-Ctarget-feature` | ||
target_feature: [default, native, +sse3, +ssse3, +sse4.1, +sse4.2, +avx, +avx2] | ||
|
||
exclude: | ||
# The macos runners seem to only reliably support up to `avx`. | ||
- { target: x86_64-apple-darwin, target_feature: +avx2 } | ||
# These features are statically known to be present for all 64 bit | ||
# macs, and thus are covered by the `default` test | ||
- { target: x86_64-apple-darwin, target_feature: +sse3 } | ||
- { target: x86_64-apple-darwin, target_feature: +ssse3 } | ||
# -Ctarget-cpu=native sounds like bad-news if target != host | ||
- { target: i686-pc-windows-msvc, target_feature: native } | ||
- { target: i586-pc-windows-msvc, target_feature: native } | ||
|
||
include: | ||
# Populate the `matrix.os` field | ||
- { target: x86_64-apple-darwin, os: macos-latest } | ||
- { target: x86_64-unknown-linux-gnu, os: ubuntu-latest } | ||
- { target: x86_64-pc-windows-msvc, os: windows-latest } | ||
- { target: i686-pc-windows-msvc, os: windows-latest } | ||
- { target: i586-pc-windows-msvc, os: windows-latest } | ||
|
||
# These are globally available on all the other targets. | ||
- { target: i586-pc-windows-msvc, target_feature: +sse, os: windows-latest } | ||
- { target: i586-pc-windows-msvc, target_feature: +sse2, os: windows-latest } | ||
|
||
# Annoyingly, the x86_64-unknown-linux-gnu runner *almost* always has | ||
# avx512vl, but occasionally doesn't. Maybe one day we can enable it. | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
rustup target add ${{ matrix.target }} | ||
- name: Configure RUSTFLAGS | ||
shell: bash | ||
run: | | ||
case "${{ matrix.target_feature }}" in | ||
default) | ||
echo "RUSTFLAGS=-Dwarnings" >> $GITHUB_ENV;; | ||
native) | ||
echo "RUSTFLAGS=-Dwarnings -Ctarget-cpu=native" >> $GITHUB_ENV | ||
;; | ||
*) | ||
echo "RUSTFLAGS=-Dwarnings -Ctarget-feature=${{ matrix.target_feature }}" >> $GITHUB_ENV | ||
;; | ||
esac | ||
# Super useful for debugging why a SIGILL occurred. | ||
- name: Dump target configuration and support | ||
run: | | ||
rustc -Vv | ||
echo "Caveat: not all target features are expected to be logged" | ||
echo "## Requested target configuration (RUSTFLAGS=$RUSTFLAGS)" | ||
rustc --print=cfg --target=${{ matrix.target }} $RUSTFLAGS | ||
echo "## Supported target configuration for --target=${{ matrix.target }}" | ||
rustc --print=cfg --target=${{ matrix.target }} -Ctarget-cpu=native | ||
echo "## Natively supported target configuration" | ||
rustc --print=cfg -Ctarget-cpu=native | ||
- name: Test (debug) | ||
run: cargo test --verbose --target=${{ matrix.target }} | ||
|
||
- name: Test (release) | ||
run: cargo test --verbose --target=${{ matrix.target }} --release | ||
|
||
wasm-tests: | ||
name: "wasm (firefox, ${{ matrix.name }})" | ||
runs-on: ubuntu-latest | ||
strategy: | ||
matrix: | ||
include: | ||
- { name: default, RUSTFLAGS: "" } | ||
- { name: simd128, RUSTFLAGS: "-C target-feature=+simd128" } | ||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
- name: Install wasm-pack | ||
run: curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh | ||
- name: Test (debug) | ||
run: wasm-pack test --firefox --headless crates/core_simd | ||
env: | ||
RUSTFLAGS: ${{ matrix.rustflags }} | ||
- name: Test (release) | ||
run: wasm-pack test --firefox --headless crates/core_simd --release | ||
env: | ||
RUSTFLAGS: ${{ matrix.rustflags }} | ||
|
||
cross-tests: | ||
name: "${{ matrix.target }} (via cross)" | ||
runs-on: ubuntu-latest | ||
strategy: | ||
fail-fast: false | ||
# TODO: Sadly, we can't configure target-feature in a meaningful way | ||
# because `cross` doesn't tell qemu to enable any non-default cpu | ||
# features, nor does it give us a way to do so. | ||
# | ||
# Ultimately, we'd like to do something like [rust-lang/stdarch][stdarch]. | ||
# This is a lot more complex... but in practice it's likely that we can just | ||
# snarf the docker config from around [here][1000-dockerfiles]. | ||
# | ||
# [stdarch]: https://github.com/rust-lang/stdarch/blob/a5db4eaf/.github/workflows/main.yml#L67 | ||
# [1000-dockerfiles]: https://github.com/rust-lang/stdarch/tree/a5db4eaf/ci/docker | ||
|
||
matrix: | ||
target: | ||
- i586-unknown-linux-gnu | ||
# 32-bit arm has a few idiosyncrasies like having subnormal flushing | ||
# to zero on by default. Ideally we'd set | ||
- armv7-unknown-linux-gnueabihf | ||
- aarch64-unknown-linux-gnu | ||
# Note: The issue above means neither of these mips targets will use | ||
# MSA (mips simd) but MIPS uses a nonstandard binary representation | ||
# for NaNs which makes it worth testing on despite that. | ||
- mips-unknown-linux-gnu | ||
- mips64-unknown-linux-gnuabi64 | ||
- riscv64gc-unknown-linux-gnu | ||
# TODO this test works, but it appears to time out | ||
# - powerpc-unknown-linux-gnu | ||
# TODO this test is broken, but it appears to be a problem with QEMU, not us. | ||
# - powerpc64le-unknown-linux-gnu | ||
# TODO enable this once a new version of cross is released | ||
# - powerpc64-unknown-linux-gnu | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
rustup target add ${{ matrix.target }} | ||
rustup component add rust-src | ||
- name: Install Cross | ||
# Equivalent to `cargo install cross`, but downloading a prebuilt | ||
# binary. Ideally we wouldn't hardcode a version, but the version number | ||
# being part of the tarball means we can't just use the download/latest | ||
# URL :( | ||
run: | | ||
CROSS_URL=https://github.com/rust-embedded/cross/releases/download/v0.2.1/cross-v0.2.1-x86_64-unknown-linux-gnu.tar.gz | ||
mkdir -p "$HOME/.bin" | ||
curl -sfSL --retry-delay 10 --retry 5 "${CROSS_URL}" | tar zxf - -C "$HOME/.bin" | ||
echo "$HOME/.bin" >> $GITHUB_PATH | ||
- name: Test (debug) | ||
run: cross test --verbose --target=${{ matrix.target }} | ||
|
||
- name: Test (release) | ||
run: cross test --verbose --target=${{ matrix.target }} --release | ||
|
||
features: | ||
name: "Check cargo features (${{ matrix.simd }} × ${{ matrix.features }})" | ||
runs-on: ubuntu-latest | ||
strategy: | ||
fail-fast: false | ||
matrix: | ||
simd: | ||
- "" | ||
- "avx512" | ||
features: | ||
- "" | ||
- "--features std" | ||
- "--features generic_const_exprs" | ||
- "--features std --features generic_const_exprs" | ||
|
||
steps: | ||
- uses: actions/checkout@v2 | ||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
- name: Detect AVX512 | ||
run: echo "CPU_FEATURE=$(lscpu | grep -o avx512[a-z]* | sed s/avx/+avx/ | tr '\n' ',' )" >> $GITHUB_ENV | ||
- name: Check build | ||
if: ${{ matrix.simd == '' }} | ||
run: RUSTFLAGS="-Dwarnings" cargo check --all-targets --no-default-features ${{ matrix.features }} | ||
- name: Check AVX | ||
if: ${{ matrix.simd == 'avx512' && contains(env.CPU_FEATURE, 'avx512') }} | ||
run: | | ||
echo "Found AVX features: $CPU_FEATURE" | ||
RUSTFLAGS="-Dwarnings -Ctarget-feature=$CPU_FEATURE" cargo check --all-targets --no-default-features ${{ matrix.features }} |
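Two pieces of shell logic in the workflow above are easy to misread: the `case` statement that chooses `RUSTFLAGS`, and the `lscpu | grep | sed | tr` pipeline that builds `CPU_FEATURE`. The sketch below models both as pure Rust functions so their output can be inspected. The function names are hypothetical, and the second function assumes ASCII `lscpu` output in which `[a-z]` matches only lowercase ASCII letters.

```rust
/// Models the `case` statement in the "Configure RUSTFLAGS" step:
/// `default` and `native` are special, anything else is passed to `-Ctarget-feature`.
pub fn rustflags_for(target_feature: &str) -> String {
    match target_feature {
        "default" => String::from("-Dwarnings"),
        "native" => String::from("-Dwarnings -Ctarget-cpu=native"),
        other => format!("-Dwarnings -Ctarget-feature={}", other),
    }
}

/// Models `grep -o avx512[a-z]* | sed s/avx/+avx/ | tr '\n' ','`:
/// every `avx512` followed by lowercase letters becomes `+avx512...,`.
/// As in the shell pipeline, the result keeps a trailing comma.
pub fn avx512_feature_list(lscpu_output: &str) -> String {
    const PREFIX: &str = "avx512";
    let mut out = String::new();
    let mut rest = lscpu_output;
    while let Some(start) = rest.find(PREFIX) {
        let tail = &rest[start + PREFIX.len()..];
        // Length of the run of lowercase ASCII letters after the prefix.
        let suffix_len = tail.bytes().take_while(|b| b.is_ascii_lowercase()).count();
        out.push('+');
        out.push_str(PREFIX);
        out.push_str(&tail[..suffix_len]);
        out.push(',');
        rest = &tail[suffix_len..];
    }
    out
}
```

For example, `rustflags_for("+avx2")` yields `-Dwarnings -Ctarget-feature=+avx2`, and an input with no AVX512 flags yields an empty string, which is why the "Check AVX" step is guarded by `contains(env.CPU_FEATURE, 'avx512')`.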
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
name: Documentation | ||
|
||
on: | ||
push: | ||
branches: | ||
- master | ||
|
||
jobs: | ||
release: | ||
name: Deploy Documentation | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout Repository | ||
uses: actions/checkout@v1 | ||
|
||
- name: Setup Rust | ||
run: | | ||
rustup update nightly --no-self-update | ||
rustup default nightly | ||
- name: Build Documentation | ||
run: cargo doc --no-deps | ||
|
||
- name: Deploy Documentation | ||
uses: peaceiris/actions-gh-pages@v3 | ||
with: | ||
github_token: ${{ secrets.GITHUB_TOKEN }} | ||
publish_branch: gh-pages | ||
publish_dir: ./target/doc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
/target | ||
Cargo.lock |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Contributing to `std::simd` | ||
|
||
Simple version: | ||
1. Fork it and `git clone` it | ||
2. Create your feature branch: `git checkout -b my-branch` | ||
3. Write your changes. | ||
4. Test it: `cargo test`. Remember to enable whatever SIMD features you intend to test by setting `RUSTFLAGS`. | ||
5. Commit your changes: `git add ./path/to/changes && git commit -m 'Fix some bug'` | ||
6. Push the branch: `git push --set-upstream origin my-branch` | ||
7. Submit a pull request! | ||
|
||
## Taking on an Issue | ||
|
||
SIMD can be quite complex, and even a "simple" issue can be huge. If an issue is organized like a tracking issue, with an itemized list of items that don't necessarily have to be done in a specific order, please take the issue one item at a time. This will help by letting work proceed apace on the rest of the issue. If it's a (relatively) small issue, feel free to announce your intention to solve it on the issue tracker and take it in one go! | ||
|
||
## CI | ||
|
||
We currently have 2 CI matrices through Travis CI and GitHub Actions that will automatically build and test your change in order to verify that `std::simd`'s portable API is, in fact, portable. If your change builds locally, but does not build on either, this is likely due to a platform-specific concern that your code has not addressed. Please consult the build logs and address the error, or ask for help if you need it. | ||
|
||
## Beyond stdsimd | ||
|
||
A large amount of the core SIMD implementation is found in the rustc_codegen_* crates in the [main rustc repo](https://github.com/rust-lang/rust). In addition, actual platform-specific functions are implemented in [stdarch]. Not all changes to `std::simd` require interacting with either of these, but if you're wondering where something is and it doesn't seem to be in this repository, those might be where to start looking. | ||
|
||
## Questions? Concerns? Need Help? | ||
|
||
Please feel free to ask in the [#project-portable-simd][zulip-portable-simd] stream on the [rust-lang Zulip][zulip] for help with making changes to `std::simd`! | ||
If your changes include directly modifying the compiler, it might also be useful to ask in [#t-compiler/help][zulip-compiler-help]. | ||
|
||
[zulip-portable-simd]: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd | ||
[zulip-compiler-help]: https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp | ||
[zulip]: https://rust-lang.zulipchat.com | ||
[stdarch]: https://github.com/rust-lang/stdarch |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
[workspace] | ||
|
||
members = [ | ||
"crates/core_simd", | ||
"crates/test_helpers", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,202 @@ | ||
|
||
Apache License | ||
Version 2.0, January 2004 | ||
http://www.apache.org/licenses/ | ||
|
||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION | ||
|
||
1. Definitions. | ||
|
||
"License" shall mean the terms and conditions for use, reproduction, | ||
and distribution as defined by Sections 1 through 9 of this document. | ||
|
||
"Licensor" shall mean the copyright owner or entity authorized by | ||
the copyright owner that is granting the License. | ||
|
||
"Legal Entity" shall mean the union of the acting entity and all | ||
other entities that control, are controlled by, or are under common | ||
control with that entity. For the purposes of this definition, | ||
"control" means (i) the power, direct or indirect, to cause the | ||
direction or management of such entity, whether by contract or | ||
otherwise, or (ii) ownership of fifty percent (50%) or more of the | ||
outstanding shares, or (iii) beneficial ownership of such entity. | ||
|
||
"You" (or "Your") shall mean an individual or Legal Entity | ||
exercising permissions granted by this License. | ||
|
||
"Source" form shall mean the preferred form for making modifications, | ||
including but not limited to software source code, documentation | ||
source, and configuration files. | ||
|
||
"Object" form shall mean any form resulting from mechanical | ||
transformation or translation of a Source form, including but | ||
not limited to compiled object code, generated documentation, | ||
and conversions to other media types. | ||
|
||
"Work" shall mean the work of authorship, whether in Source or | ||
Object form, made available under the License, as indicated by a | ||
copyright notice that is included in or attached to the work | ||
(an example is provided in the Appendix below). | ||
|
||
"Derivative Works" shall mean any work, whether in Source or Object | ||
form, that is based on (or derived from) the Work and for which the | ||
editorial revisions, annotations, elaborations, or other modifications | ||
represent, as a whole, an original work of authorship. For the purposes | ||
of this License, Derivative Works shall not include works that remain | ||
separable from, or merely link (or bind by name) to the interfaces of, | ||
the Work and Derivative Works thereof. | ||
|
||
"Contribution" shall mean any work of authorship, including | ||
the original version of the Work and any modifications or additions | ||
to that Work or Derivative Works thereof, that is intentionally | ||
submitted to Licensor for inclusion in the Work by the copyright owner | ||
or by an individual or Legal Entity authorized to submit on behalf of | ||
the copyright owner. For the purposes of this definition, "submitted" | ||
means any form of electronic, verbal, or written communication sent | ||
to the Licensor or its representatives, including but not limited to | ||
communication on electronic mailing lists, source code control systems, | ||
and issue tracking systems that are managed by, or on behalf of, the | ||
Licensor for the purpose of discussing and improving the Work, but | ||
excluding communication that is conspicuously marked or otherwise | ||
designated in writing by the copyright owner as "Not a Contribution." | ||
|
||
"Contributor" shall mean Licensor and any individual or Legal Entity | ||
on behalf of whom a Contribution has been received by Licensor and | ||
subsequently incorporated within the Work. | ||
|
||
2. Grant of Copyright License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
copyright license to reproduce, prepare Derivative Works of, | ||
publicly display, publicly perform, sublicense, and distribute the | ||
Work and such Derivative Works in Source or Object form. | ||
|
||
3. Grant of Patent License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
(except as stated in this section) patent license to make, have made, | ||
use, offer to sell, sell, import, and otherwise transfer the Work, | ||
where such license applies only to those patent claims licensable | ||
by such Contributor that are necessarily infringed by their | ||
Contribution(s) alone or by combination of their Contribution(s) | ||
with the Work to which such Contribution(s) was submitted. If You | ||
institute patent litigation against any entity (including a | ||
cross-claim or counterclaim in a lawsuit) alleging that the Work | ||
or a Contribution incorporated within the Work constitutes direct | ||
or contributory patent infringement, then any patent licenses | ||
granted to You under this License for that Work shall terminate | ||
as of the date such litigation is filed. | ||
|
||
4. Redistribution. You may reproduce and distribute copies of the | ||
Work or Derivative Works thereof in any medium, with or without | ||
modifications, and in Source or Object form, provided that You | ||
meet the following conditions: | ||
|
||
(a) You must give any other recipients of the Work or | ||
Derivative Works a copy of this License; and | ||
|
||
(b) You must cause any modified files to carry prominent notices | ||
stating that You changed the files; and | ||
|
||
(c) You must retain, in the Source form of any Derivative Works | ||
that You distribute, all copyright, patent, trademark, and | ||
attribution notices from the Source form of the Work, | ||
excluding those notices that do not pertain to any part of | ||
the Derivative Works; and | ||
|
||
(d) If the Work includes a "NOTICE" text file as part of its | ||
distribution, then any Derivative Works that You distribute must | ||
include a readable copy of the attribution notices contained | ||
within such NOTICE file, excluding those notices that do not | ||
pertain to any part of the Derivative Works, in at least one | ||
of the following places: within a NOTICE text file distributed | ||
as part of the Derivative Works; within the Source form or | ||
documentation, if provided along with the Derivative Works; or, | ||
within a display generated by the Derivative Works, if and | ||
wherever such third-party notices normally appear. The contents | ||
of the NOTICE file are for informational purposes only and | ||
do not modify the License. You may add Your own attribution | ||
notices within Derivative Works that You distribute, alongside | ||
or as an addendum to the NOTICE text from the Work, provided | ||
that such additional attribution notices cannot be construed | ||
as modifying the License. | ||
|
||
You may add Your own copyright statement to Your modifications and | ||
may provide additional or different license terms and conditions | ||
for use, reproduction, or distribution of Your modifications, or | ||
for any such Derivative Works as a whole, provided Your use, | ||
reproduction, and distribution of the Work otherwise complies with | ||
the conditions stated in this License. | ||
|
||
5. Submission of Contributions. Unless You explicitly state otherwise, | ||
any Contribution intentionally submitted for inclusion in the Work | ||
by You to the Licensor shall be under the terms and conditions of | ||
this License, without any additional terms or conditions. | ||
Notwithstanding the above, nothing herein shall supersede or modify | ||
the terms of any separate license agreement you may have executed | ||
with Licensor regarding such Contributions. | ||
|
||
6. Trademarks. This License does not grant permission to use the trade | ||
names, trademarks, service marks, or product names of the Licensor, | ||
except as required for reasonable and customary use in describing the | ||
origin of the Work and reproducing the content of the NOTICE file. | ||
|
||
7. Disclaimer of Warranty. Unless required by applicable law or | ||
agreed to in writing, Licensor provides the Work (and each | ||
Contributor provides its Contributions) on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
implied, including, without limitation, any warranties or conditions | ||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A | ||
PARTICULAR PURPOSE. You are solely responsible for determining the | ||
appropriateness of using or redistributing the Work and assume any | ||
risks associated with Your exercise of permissions under this License. | ||
|
||
8. Limitation of Liability. In no event and under no legal theory, | ||
whether in tort (including negligence), contract, or otherwise, | ||
unless required by applicable law (such as deliberate and grossly | ||
negligent acts) or agreed to in writing, shall any Contributor be | ||
liable to You for damages, including any direct, indirect, special, | ||
incidental, or consequential damages of any character arising as a | ||
result of this License or out of the use or inability to use the | ||
Work (including but not limited to damages for loss of goodwill, | ||
work stoppage, computer failure or malfunction, or any and all | ||
other commercial damages or losses), even if such Contributor | ||
has been advised of the possibility of such damages. | ||
|
||
9. Accepting Warranty or Additional Liability. While redistributing | ||
the Work or Derivative Works thereof, You may choose to offer, | ||
and charge a fee for, acceptance of support, warranty, indemnity, | ||
or other liability obligations and/or rights consistent with this | ||
License. However, in accepting such obligations, You may act only | ||
on Your own behalf and on Your sole responsibility, not on behalf | ||
of any other Contributor, and only if You agree to indemnify, | ||
defend, and hold each Contributor harmless for any liability | ||
incurred by, or claims asserted against, such Contributor by reason | ||
of your accepting any such warranty or additional liability. | ||
|
||
END OF TERMS AND CONDITIONS | ||
|
||
APPENDIX: How to apply the Apache License to your work. | ||
|
||
To apply the Apache License to your work, attach the following | ||
boilerplate notice, with the fields enclosed by brackets "[]" | ||
replaced with your own identifying information. (Don't include | ||
the brackets!) The text should be enclosed in the appropriate | ||
comment syntax for the file format. We also recommend that a | ||
file or class name and description of purpose be included on the | ||
same "printed page" as the copyright notice for easier | ||
identification within third-party archives. | ||
|
||
Copyright [yyyy] [name of copyright owner] | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Copyright (c) 2020 The Rust Project Developers | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# The Rust standard library's portable SIMD API | ||
[Travis CI build status](https://travis-ci.com/rust-lang/portable-simd) | ||
|
||
Code repository for the [Portable SIMD Project Group](https://github.com/rust-lang/project-portable-simd). | ||
Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for our contributing guidelines. | ||
|
||
The docs for this crate are published from the main branch. | ||
You can [read them here][docs]. | ||
|
||
If you have questions about SIMD, we have begun writing a [guide][simd-guide]. | ||
We can also be found on [Zulip][zulip-project-portable-simd]. | ||
|
||
If you are interested in support for a specific architecture, you may want [stdarch] instead. | ||
|
||
## Hello World | ||
|
||
Now we're gonna dip our toes into this world with a small SIMD "Hello, World!" example. Make sure your compiler is up to date and using `nightly`. We can do that by running | ||
|
||
```bash | ||
rustup update -- nightly | ||
``` | ||
|
||
or by setting up `rustup default nightly` or else with `cargo +nightly {build,test,run}`. After updating, run | ||
```bash | ||
cargo new hellosimd | ||
``` | ||
to create a new crate. Edit `hellosimd/Cargo.toml` to be | ||
```toml | ||
[package] | ||
name = "hellosimd" | ||
version = "0.1.0" | ||
edition = "2018" | ||
[dependencies] | ||
core_simd = { git = "https://github.com/rust-lang/portable-simd" } | ||
``` | ||
|
||
and finally write this in `src/main.rs`: | ||
```rust | ||
use core_simd::*; | ||
fn main() { | ||
let a = f32x4::splat(10.0); | ||
let b = f32x4::from_array([1.0, 2.0, 3.0, 4.0]); | ||
println!("{:?}", a + b); | ||
} | ||
``` | ||
|
||
Explanation: We import all the bindings from the crate with the first line. Then, we construct our SIMD vectors with methods like `splat` or `from_array`. Finally, we can use operators on them like `+` and the appropriate SIMD instructions will be carried out. When we run `cargo run` you should get `[11.0, 12.0, 13.0, 14.0]`. | ||
|
||
## Code Organization | ||
|
||
Currently the crate is organized so that each element type is a file, and then the 64-bit, 128-bit, 256-bit, and 512-bit vectors using those types are contained in said file. | ||
|
||
All types are then exported as a single, flat module. | ||
|
||
Depending on the size of the primitive type, the number of lanes the vector will have varies. For example, 128-bit vectors have four `f32` lanes and two `f64` lanes. | ||
|
||
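The relationship between vector width and lane count is plain division. Here is a minimal sketch of it, using only `std::mem::size_of`. The `lanes` helper is hypothetical and written for illustration; it is not part of this crate's API.

```rust
use std::mem::size_of;

/// Number of lanes of element type `T` that fit in a vector of
/// `vector_bits` bits. Hypothetical helper, for illustration only.
fn lanes<T>(vector_bits: usize) -> usize {
    vector_bits / (size_of::<T>() * 8)
}

fn main() {
    println!("128-bit vector of f32: {} lanes", lanes::<f32>(128));
    println!("128-bit vector of f64: {} lanes", lanes::<f64>(128));
    println!("256-bit vector of i16: {} lanes", lanes::<i16>(256));
}
```

This matches the naming scheme used by the vector types: `f32x4` is four `f32` lanes in 128 bits, and `i16x8` is eight `i16` lanes in the same 128 bits.
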
The supported element types are as follows: | ||
* **Floating Point:** `f32`, `f64` | ||
* **Signed Integers:** `i8`, `i16`, `i32`, `i64`, `i128`, `isize` | ||
* **Unsigned Integers:** `u8`, `u16`, `u32`, `u64`, `u128`, `usize` | ||
* **Masks:** `mask8`, `mask16`, `mask32`, `mask64`, `mask128`, `masksize` | ||
|
||
Floating point, signed integers, and unsigned integers are the [primitive types](https://doc.rust-lang.org/core/primitive/index.html) you're already used to. | ||
The `mask` types are "truthy" values, but they use the number of bits in their name instead of just 1 bit like a normal `bool` uses. | ||
|
||
[simd-guide]: ./beginners-guide.md | ||
[zulip-project-portable-simd]: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd | ||
[stdarch]: https://github.com/rust-lang/stdarch | ||
[docs]: https://rust-lang.github.io/portable-simd/core_simd |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
|
||
# Beginner's Guide To SIMD | ||
|
||
Hello and welcome to our SIMD basics guide! | ||
|
||
Because SIMD is a subject that many programmers haven't worked with before, we thought that it's best to outline some terms and other basics for you to get started with. | ||
|
||
## Quick Background | ||
|
||
**SIMD** stands for *Single Instruction, Multiple Data*. In other words, SIMD is when the CPU performs a single action on more than one logical piece of data at the same time. Instead of adding two registers that each contain one `f32` value and getting an `f32` as the result, you might add two registers that each contain `f32x4` (128 bits of data) and then you get an `f32x4` as the output. | ||
|
||
This might seem a tiny bit weird at first, but there's a good reason for it. Back in the day, as CPUs got faster and faster, eventually they got so fast that the CPU would just melt itself. The heat management (heat sinks, fans, etc) simply couldn't keep up with how much electricity was going through the metal. Two main strategies were developed to help get around the limits of physics. | ||
* One of them you're probably familiar with: Multi-core processors. By giving a processor more than one core, each core can do its own work, and because they're physically distant (at least on the CPU's scale) the heat can still be managed. Unfortunately, not all tasks can just be split up across cores in an efficient way. | ||
* The second strategy is SIMD. If you can't make the register go any faster, you can still make the register *wider*. This lets you process more data at a time, which is *almost* as good as just having a faster CPU. As with multi-core programming, SIMD doesn't fit every kind of task, so you have to know when it will improve your program. | ||
|
||
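To make the "wider, not faster" idea concrete, here is a rough sketch in plain stable Rust. It does not use any SIMD types: ordinary slices stand in for registers, and the function names are made up for this example. The second function does the same work as the first, but groups it into steps of four independent additions, which is the shape of work that a 128-bit `f32x4` add performs in one instruction.

```rust
/// Adds `a` and `b` one element per step (scalar style).
fn add_scalar(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    for i in 0..out.len() {
        out[i] = a[i] + b[i];
    }
}

/// Same result, but grouped into steps of four elements, the way a
/// 128-bit `f32x4` addition would process them.
fn add_four_wide(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    // Largest multiple of 4 that fits in the slice.
    let full = out.len() - out.len() % 4;
    for start in (0..full).step_by(4) {
        // One "vector" step: four independent lane additions.
        for lane in 0..4 {
            out[start + lane] = a[start + lane] + b[start + lane];
        }
    }
    // Leftover elements that do not fill a whole group of four.
    for i in full..out.len() {
        out[i] = a[i] + b[i];
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    let mut scalar = [0.0; 6];
    let mut wide = [0.0; 6];
    add_scalar(&a, &b, &mut scalar);
    add_four_wide(&a, &b, &mut wide);
    println!("{:?}\n{:?}", scalar, wide);
}
```

Note the leftover loop at the end: data rarely comes in exact multiples of the vector width, so real SIMD code usually needs some way to handle the tail.
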
## Terms | ||
|
||
SIMD has a few special vocabulary terms you should know: | ||
|
||
* **Vector:** A SIMD value is called a vector. This shouldn't be confused with the `Vec<T>` type. A SIMD vector has a fixed size, known at compile time. All of the elements within the vector are of the same type. This makes vectors *similar to* arrays. One difference is that a vector is generally aligned to its *entire* size (eg: 16 bytes, 32 bytes, etc), not just the size of an individual element. Sometimes vector data is called "packed" data. | ||
|
||
* **Vectorize**: An operation that uses SIMD instructions to operate over a vector is often referred to as "vectorized". | ||
|
||
* **Autovectorization**: Also known as _implicit vectorization_. This is when a compiler can automatically recognize a situation where scalar instructions may be replaced with SIMD instructions, and use those instead. | ||
|
||
* **Scalar:** "Scalar" in mathematical contexts refers to values that can be represented as a single element, mostly numbers like 6, 3.14, or -2. It can also be used to describe "scalar operations" that use strictly scalar values, like addition. This term is mostly used to differentiate between vectorized operations that use SIMD instructions and scalar operations that don't. | ||
|
||
* **Lane:** A single element position within a vector is called a lane. If you have `N` lanes available then they're numbered from `0` to `N-1` when referring to them, again like an array. The biggest difference between an array element and a vector lane is that in general it is *relatively costly* to access an individual lane value. On most architectures, the vector has to be pushed out of the SIMD register onto the stack, then an individual lane is accessed while it's on the stack (and possibly the stack value is read back into a register). For this reason, when working with SIMD you should avoid reading or writing the value of an individual lane during hot loops. | ||
|
||
* **Bit Widths:** When talking about SIMD, the bit widths used are the bit size of the vectors involved, *not* the individual elements. So "128-bit SIMD" has 128-bit vectors, and that might be `f32x4`, `i32x4`, `i16x8`, or other variations. While 128-bit SIMD is the most common, there's also 64-bit, 256-bit, and even 512-bit on the newest CPUs. | ||
|
||
* **Vector Register:** The extra-wide registers that are used for SIMD operations are commonly called vector registers, though you may also see "SIMD registers", vendor names for specific features, or even "floating-point register" as it is common for the same registers to be used with both scalar and vectorized floating-point operations. | ||
|
||
* **Vertical:** When an operation is "vertical", each lane processes individually without regard to the other lanes in the same vector. For example, a "vertical add" between two vectors would add lane 0 in `a` with lane 0 in `b`, with the total in lane 0 of `out`, and then the same thing for lanes 1, 2, etc. Most SIMD operations are vertical operations, so if your problem is a vertical problem then you can probably solve it with SIMD. | ||
|
||
* **Horizontal:** When an operation is "horizontal", the lanes within a single vector interact in some way. A "horizontal add" might add up lane 0 of `a` with lane 1 of `a`, with the total in lane 0 of `out`. | ||
|
||
* **Target Feature:** Rust calls a CPU architecture extension a `target_feature`. Proper SIMD requires various CPU extensions to be enabled (details below). Don't confuse this with `feature`, which is a Cargo crate concept. | ||
|
||
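The difference between vertical and horizontal operations can be sketched with plain arrays. This is only a scalar model: `Lanes4` is a hypothetical stand-in for a four-lane vector, and both functions are written for this example rather than taken from the crate.

```rust
/// Hypothetical stand-in for a four-lane `f32` vector.
type Lanes4 = [f32; 4];

/// Vertical: lane `i` of the output depends only on lane `i` of each input.
fn vertical_add(a: Lanes4, b: Lanes4) -> Lanes4 {
    let mut out = [0.0; 4];
    for lane in 0..4 {
        out[lane] = a[lane] + b[lane];
    }
    out
}

/// Horizontal: the lanes of a single vector are combined with each other.
fn horizontal_sum(a: Lanes4) -> f32 {
    a.iter().sum()
}

fn main() {
    let a = [10.0; 4];
    let b = [1.0, 2.0, 3.0, 4.0];
    println!("vertical:   {:?}", vertical_add(a, b));
    println!("horizontal: {}", horizontal_sum(b));
}
```

In the vertical case no lane ever looks at its neighbours, which is why such operations map so directly onto SIMD hardware. The horizontal case has to move data across lanes, which tends to cost more.
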
## Target Features | ||
|
||
When using SIMD, you should be familiar with the CPU feature set that you're targeting. | ||
|
||
On `arm` and `aarch64` it's fairly simple. There's just one CPU feature that controls if SIMD is available: `neon` (or "NEON", all caps, as the ARM docs often put it). Neon registers can be used as 64-bit or 128-bit. When doing 128-bit operations it just uses two 64-bit registers as a single 128-bit register. | ||
|
||
> By default, the `aarch64`, `arm`, and `thumb` Rust targets generally do not enable `neon` unless it's in the target string. | ||
|
||
On `x86` and `x86_64` it's slightly more complicated. The SIMD support is split into many levels: | ||
* 128-bit: `sse`, `sse2`, `sse3`, `ssse3` (not a typo!), `sse4.1`, `sse4.2`, `sse4a` (AMD only) | ||
* 256-bit (mostly): `avx`, `avx2`, `fma` | ||
* 512-bit (mostly): a *wide* range of `avx512` variations | ||
|
||
The list notes the bit widths available at each feature level, though the operations of the more advanced features can generally be used with the smaller register sizes as well. For example, new operations introduced in `avx` generally have a 128-bit form as well as a 256-bit form. This means that even if you only do 128-bit work you can still benefit from the later feature levels. | ||
|
||
> By default, the `i686` and `x86_64` Rust targets enable `sse` and `sse2`. | ||
### Selecting Additional Target Features | ||
|
||
If you want to enable support for a target feature within your build, generally you should use a [target-feature](https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html#target-feature) setting within your `RUSTFLAGS` setting. | ||
|
||
If you know that you're targeting a specific CPU you can instead use the [target-cpu](https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html#target-cpu) flag and the compiler will enable the correct set of features for that CPU. | ||
|
||
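One way to see which target features a build actually enabled is to ask the compiler with `cfg!(target_feature = "...")`. The sketch below checks a handful of feature names chosen for this example; `compiled_features` is a hypothetical helper, not something provided by this crate.

```rust
/// Returns the names of a few target features that were enabled when this
/// program was compiled. The list of names checked is just a sample.
fn compiled_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        found.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        found.push("neon");
    }
    found
}

fn main() {
    println!("enabled at compile time: {:?}", compiled_features());
}
```

Running this once with a plain `cargo run` and once with something like `RUSTFLAGS="-C target-feature=+avx2" cargo run` should show the difference on an `x86_64` machine. Keep in mind this reports what the compiler was told to assume, not what the CPU running the program supports.
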
The [Steam Hardware Survey](https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam) is one of the few places with data on how common various CPU features are. The dataset is limited to "the kinds of computers owned by people who play computer games", so the info only covers `x86`/`x86_64`, and it also probably skews to slightly higher quality computers than average. Still, we can see that the `sse` levels have very high support, `avx` and `avx2` are quite common as well, and the `avx-512` family is still so early in adoption you can barely find it in consumer grade stuff. | ||
|
||
## Running a program compiled for a CPU feature level that the CPU doesn't support is automatic undefined behavior. | ||
|
||
This means that if you build your program with `avx` support enabled and run it on a CPU without `avx` support, it's **instantly** undefined behavior. | ||
|
||
Even without an `unsafe` block in sight. | ||
|
||
This is no bug in Rust, or soundness hole in the type system. You just plain can't make a CPU do what it doesn't know how to do. | ||
|
||
This is why the various Rust targets *don't* enable many CPU feature flags by default: requiring a more advanced CPU makes the final binary *less* portable. | ||
|
||
So please select an appropriate CPU feature level when building your programs. | ||
|
||
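If a binary has to run on CPUs with different feature levels, one common approach (not covered further in this guide) is to build for a conservative baseline and check for extra features at run time. The sketch below assumes the standard library's `is_x86_feature_detected!` macro, which only exists on `x86`/`x86_64`; the function names and the `"avx2"`/`"fallback"` labels are made up for this example.

```rust
/// Reports whether the CPU we are running on supports AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_available() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On other architectures there is no AVX2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_available() -> bool {
    false
}

/// Picks which implementation to use. A real program would call an
/// AVX2-specific function in the first branch instead of returning a label.
fn pick_code_path() -> &'static str {
    if avx2_available() {
        "avx2"
    } else {
        "fallback"
    }
}

fn main() {
    println!("using the {} code path", pick_code_path());
}
```

The important part is that the decision happens while the program runs, so the binary itself never requires more than the baseline it was compiled for.
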
## Size, Alignment, and Unsafe Code | ||
|
||
Most of the portable SIMD API is designed to allow the user to gloss over the details of different architectures and avoid using unsafe code. However, there are plenty of reasons to want to use unsafe code with these SIMD types, such as using an intrinsic function from `core::arch` to further accelerate particularly specialized SIMD operations on a given platform, while still using the portable API elsewhere. For these cases, there are some rules to keep in mind. | ||
|
||
Fortunately, most SIMD types have a fairly predictable size. `i32x4` is bit-equivalent to `[i32; 4]` and so can be bitcast to it, e.g. using [`mem::transmute`], though the API usually offers a safe cast you can use instead. | ||
|
||
However, this is not the same as alignment. Computer architectures generally prefer aligned accesses, especially when moving data between memory and vector registers, and while some support specialized operations that can bend the rules to help with this, unaligned access is still typically slow, or even undefined behavior. In addition, different architectures can require different alignments when interacting with their native SIMD types. For this reason, any `#[repr(simd)]` type has a non-portable alignment. If it is necessary to directly interact with the alignment of these types, it should be via [`mem::align_of`]. | ||
|
||
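The size and alignment points above can be checked on stable Rust with a stand-in type. `Wide` below is hypothetical: it is an ordinary struct given a 16-byte alignment by hand, chosen only to imitate a 128-bit vector. A real SIMD type's alignment depends on the target, which is exactly why it should be queried with `mem::align_of` rather than assumed.

```rust
use std::mem::{align_of, size_of, transmute};

/// Hypothetical stand-in for a 128-bit SIMD type: the same bits as
/// `[i32; 4]`, but with a larger, hand-picked alignment.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct Wide([i32; 4]);

/// Reinterprets the bits of `Wide` as a plain array.
fn to_array(v: Wide) -> [i32; 4] {
    // Sound for this type: both sides are 16 bytes, and `repr(C)` puts
    // the array at offset 0 with no padding.
    unsafe { transmute::<Wide, [i32; 4]>(v) }
}

fn main() {
    println!("size:  {} vs {}", size_of::<Wide>(), size_of::<[i32; 4]>());
    println!("align: {} vs {}", align_of::<Wide>(), align_of::<[i32; 4]>());
    println!("bits:  {:?}", to_array(Wide([1, 2, 3, 4])));
}
```

The sizes match, so a by-value bitcast works, but the alignments differ. That means a `&[i32; 4]` cannot in general be reinterpreted as a reference to the more strictly aligned type.
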
[`mem::transmute`]: https://doc.rust-lang.org/core/mem/fn.transmute.html | ||
[`mem::align_of`]: https://doc.rust-lang.org/core/mem/fn.align_of.html |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
[package] | ||
name = "core_simd" | ||
version = "0.1.0" | ||
edition = "2021" | ||
homepage = "https://github.com/rust-lang/portable-simd" | ||
repository = "https://github.com/rust-lang/portable-simd" | ||
keywords = ["core", "simd", "intrinsics"] | ||
categories = ["hardware-support", "no-std"] | ||
license = "MIT OR Apache-2.0" | ||
|
||
[features] | ||
default = ["std", "generic_const_exprs"] | ||
std = [] | ||
generic_const_exprs = [] | ||
|
||
[target.'cfg(target_arch = "wasm32")'.dev-dependencies.wasm-bindgen] | ||
version = "0.2" | ||
|
||
[dev-dependencies.wasm-bindgen-test] | ||
version = "0.3" | ||
|
||
[dev-dependencies.proptest] | ||
version = "0.10" | ||
default-features = false | ||
features = ["alloc"] | ||
|
||
[dev-dependencies.test_helpers] | ||
path = "../test_helpers" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,202 @@ | ||
|
||
Apache License | ||
Version 2.0, January 2004 | ||
http://www.apache.org/licenses/ | ||
|
||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION | ||
|
||
1. Definitions. | ||
|
||
"License" shall mean the terms and conditions for use, reproduction, | ||
and distribution as defined by Sections 1 through 9 of this document. | ||
|
||
"Licensor" shall mean the copyright owner or entity authorized by | ||
the copyright owner that is granting the License. | ||
|
||
"Legal Entity" shall mean the union of the acting entity and all | ||
other entities that control, are controlled by, or are under common | ||
control with that entity. For the purposes of this definition, | ||
"control" means (i) the power, direct or indirect, to cause the | ||
direction or management of such entity, whether by contract or | ||
otherwise, or (ii) ownership of fifty percent (50%) or more of the | ||
outstanding shares, or (iii) beneficial ownership of such entity. | ||
|
||
"You" (or "Your") shall mean an individual or Legal Entity | ||
exercising permissions granted by this License. | ||
|
||
"Source" form shall mean the preferred form for making modifications, | ||
including but not limited to software source code, documentation | ||
source, and configuration files. | ||
|
||
"Object" form shall mean any form resulting from mechanical | ||
transformation or translation of a Source form, including but | ||
not limited to compiled object code, generated documentation, | ||
and conversions to other media types. | ||
|
||
"Work" shall mean the work of authorship, whether in Source or | ||
Object form, made available under the License, as indicated by a | ||
copyright notice that is included in or attached to the work | ||
(an example is provided in the Appendix below). | ||
|
||
"Derivative Works" shall mean any work, whether in Source or Object | ||
form, that is based on (or derived from) the Work and for which the | ||
editorial revisions, annotations, elaborations, or other modifications | ||
represent, as a whole, an original work of authorship. For the purposes | ||
of this License, Derivative Works shall not include works that remain | ||
separable from, or merely link (or bind by name) to the interfaces of, | ||
the Work and Derivative Works thereof. | ||
|
||
"Contribution" shall mean any work of authorship, including | ||
the original version of the Work and any modifications or additions | ||
to that Work or Derivative Works thereof, that is intentionally | ||
submitted to Licensor for inclusion in the Work by the copyright owner | ||
or by an individual or Legal Entity authorized to submit on behalf of | ||
the copyright owner. For the purposes of this definition, "submitted" | ||
means any form of electronic, verbal, or written communication sent | ||
to the Licensor or its representatives, including but not limited to | ||
communication on electronic mailing lists, source code control systems, | ||
and issue tracking systems that are managed by, or on behalf of, the | ||
Licensor for the purpose of discussing and improving the Work, but | ||
excluding communication that is conspicuously marked or otherwise | ||
designated in writing by the copyright owner as "Not a Contribution." | ||
|
||
"Contributor" shall mean Licensor and any individual or Legal Entity | ||
on behalf of whom a Contribution has been received by Licensor and | ||
subsequently incorporated within the Work. | ||
|
||
2. Grant of Copyright License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
copyright license to reproduce, prepare Derivative Works of, | ||
publicly display, publicly perform, sublicense, and distribute the | ||
Work and such Derivative Works in Source or Object form. | ||
|
||
3. Grant of Patent License. Subject to the terms and conditions of | ||
this License, each Contributor hereby grants to You a perpetual, | ||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable | ||
(except as stated in this section) patent license to make, have made, | ||
use, offer to sell, sell, import, and otherwise transfer the Work, | ||
where such license applies only to those patent claims licensable | ||
by such Contributor that are necessarily infringed by their | ||
Contribution(s) alone or by combination of their Contribution(s) | ||
with the Work to which such Contribution(s) was submitted. If You | ||
institute patent litigation against any entity (including a | ||
cross-claim or counterclaim in a lawsuit) alleging that the Work | ||
or a Contribution incorporated within the Work constitutes direct | ||
or contributory patent infringement, then any patent licenses | ||
granted to You under this License for that Work shall terminate | ||
as of the date such litigation is filed. | ||
|
||
4. Redistribution. You may reproduce and distribute copies of the | ||
Work or Derivative Works thereof in any medium, with or without | ||
modifications, and in Source or Object form, provided that You | ||
meet the following conditions: | ||
|
||
(a) You must give any other recipients of the Work or | ||
Derivative Works a copy of this License; and | ||
|
||
(b) You must cause any modified files to carry prominent notices | ||
stating that You changed the files; and | ||
|
||
(c) You must retain, in the Source form of any Derivative Works | ||
that You distribute, all copyright, patent, trademark, and | ||
attribution notices from the Source form of the Work, | ||
excluding those notices that do not pertain to any part of | ||
the Derivative Works; and | ||
|
||
(d) If the Work includes a "NOTICE" text file as part of its | ||
distribution, then any Derivative Works that You distribute must | ||
include a readable copy of the attribution notices contained | ||
within such NOTICE file, excluding those notices that do not | ||
pertain to any part of the Derivative Works, in at least one | ||
of the following places: within a NOTICE text file distributed | ||
as part of the Derivative Works; within the Source form or | ||
documentation, if provided along with the Derivative Works; or, | ||
within a display generated by the Derivative Works, if and | ||
wherever such third-party notices normally appear. The contents | ||
of the NOTICE file are for informational purposes only and | ||
do not modify the License. You may add Your own attribution | ||
notices within Derivative Works that You distribute, alongside | ||
or as an addendum to the NOTICE text from the Work, provided | ||
that such additional attribution notices cannot be construed | ||
as modifying the License. | ||
|
||
You may add Your own copyright statement to Your modifications and | ||
may provide additional or different license terms and conditions | ||
for use, reproduction, or distribution of Your modifications, or | ||
for any such Derivative Works as a whole, provided Your use, | ||
reproduction, and distribution of the Work otherwise complies with | ||
the conditions stated in this License. | ||
|
||
5. Submission of Contributions. Unless You explicitly state otherwise, | ||
any Contribution intentionally submitted for inclusion in the Work | ||
by You to the Licensor shall be under the terms and conditions of | ||
this License, without any additional terms or conditions. | ||
Notwithstanding the above, nothing herein shall supersede or modify | ||
the terms of any separate license agreement you may have executed | ||
with Licensor regarding such Contributions. | ||
|
||
6. Trademarks. This License does not grant permission to use the trade | ||
names, trademarks, service marks, or product names of the Licensor, | ||
except as required for reasonable and customary use in describing the | ||
origin of the Work and reproducing the content of the NOTICE file. | ||
|
||
7. Disclaimer of Warranty. Unless required by applicable law or | ||
agreed to in writing, Licensor provides the Work (and each | ||
Contributor provides its Contributions) on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
implied, including, without limitation, any warranties or conditions | ||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A | ||
PARTICULAR PURPOSE. You are solely responsible for determining the | ||
appropriateness of using or redistributing the Work and assume any | ||
risks associated with Your exercise of permissions under this License. | ||
|
||
8. Limitation of Liability. In no event and under no legal theory, | ||
whether in tort (including negligence), contract, or otherwise, | ||
unless required by applicable law (such as deliberate and grossly | ||
negligent acts) or agreed to in writing, shall any Contributor be | ||
liable to You for damages, including any direct, indirect, special, | ||
incidental, or consequential damages of any character arising as a | ||
result of this License or out of the use or inability to use the | ||
Work (including but not limited to damages for loss of goodwill, | ||
work stoppage, computer failure or malfunction, or any and all | ||
other commercial damages or losses), even if such Contributor | ||
has been advised of the possibility of such damages. | ||
|
||
9. Accepting Warranty or Additional Liability. While redistributing | ||
the Work or Derivative Works thereof, You may choose to offer, | ||
and charge a fee for, acceptance of support, warranty, indemnity, | ||
or other liability obligations and/or rights consistent with this | ||
License. However, in accepting such obligations, You may act only | ||
on Your own behalf and on Your sole responsibility, not on behalf | ||
of any other Contributor, and only if You agree to indemnify, | ||
defend, and hold each Contributor harmless for any liability | ||
incurred by, or claims asserted against, such Contributor by reason | ||
of your accepting any such warranty or additional liability. | ||
|
||
END OF TERMS AND CONDITIONS | ||
|
||
APPENDIX: How to apply the Apache License to your work. | ||
|
||
To apply the Apache License to your work, attach the following | ||
boilerplate notice, with the fields enclosed by brackets "[]" | ||
replaced with your own identifying information. (Don't include | ||
the brackets!) The text should be enclosed in the appropriate | ||
comment syntax for the file format. We also recommend that a | ||
file or class name and description of purpose be included on the | ||
same "printed page" as the copyright notice for easier | ||
identification within third-party archives. | ||
|
||
Copyright [yyyy] [name of copyright owner] | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Copyright (c) 2020 The Rust Project Developers | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
316 changes: 316 additions & 0 deletions
library/portable-simd/crates/core_simd/examples/matrix_inversion.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,316 @@ | ||
//! 4x4 matrix inverse | ||
// Code ported from the `packed_simd` crate | ||
// Run this code with `cargo test --example matrix_inversion` | ||
#![feature(array_chunks, portable_simd)] | ||
use core_simd::simd::*; | ||
use Which::*; | ||
|
||
// Gotta define our own 4x4 matrix since Rust doesn't ship multidim arrays yet :^) | ||
#[derive(Copy, Clone, Debug, PartialEq, PartialOrd)] | ||
pub struct Matrix4x4([[f32; 4]; 4]); | ||
|
||
#[allow(clippy::too_many_lines)] | ||
pub fn scalar_inv4x4(m: Matrix4x4) -> Option<Matrix4x4> { | ||
let m = m.0; | ||
|
||
#[rustfmt::skip] | ||
let mut inv = [ | ||
// row 0: | ||
[ | ||
// 0,0: | ||
m[1][1] * m[2][2] * m[3][3] - | ||
m[1][1] * m[2][3] * m[3][2] - | ||
m[2][1] * m[1][2] * m[3][3] + | ||
m[2][1] * m[1][3] * m[3][2] + | ||
m[3][1] * m[1][2] * m[2][3] - | ||
m[3][1] * m[1][3] * m[2][2], | ||
// 0,1: | ||
-m[0][1] * m[2][2] * m[3][3] + | ||
m[0][1] * m[2][3] * m[3][2] + | ||
m[2][1] * m[0][2] * m[3][3] - | ||
m[2][1] * m[0][3] * m[3][2] - | ||
m[3][1] * m[0][2] * m[2][3] + | ||
m[3][1] * m[0][3] * m[2][2], | ||
// 0,2: | ||
m[0][1] * m[1][2] * m[3][3] - | ||
m[0][1] * m[1][3] * m[3][2] - | ||
m[1][1] * m[0][2] * m[3][3] + | ||
m[1][1] * m[0][3] * m[3][2] + | ||
m[3][1] * m[0][2] * m[1][3] - | ||
m[3][1] * m[0][3] * m[1][2], | ||
// 0,3: | ||
-m[0][1] * m[1][2] * m[2][3] + | ||
m[0][1] * m[1][3] * m[2][2] + | ||
m[1][1] * m[0][2] * m[2][3] - | ||
m[1][1] * m[0][3] * m[2][2] - | ||
m[2][1] * m[0][2] * m[1][3] + | ||
m[2][1] * m[0][3] * m[1][2], | ||
], | ||
// row 1 | ||
[ | ||
// 1,0: | ||
-m[1][0] * m[2][2] * m[3][3] + | ||
m[1][0] * m[2][3] * m[3][2] + | ||
m[2][0] * m[1][2] * m[3][3] - | ||
m[2][0] * m[1][3] * m[3][2] - | ||
m[3][0] * m[1][2] * m[2][3] + | ||
m[3][0] * m[1][3] * m[2][2], | ||
// 1,1: | ||
m[0][0] * m[2][2] * m[3][3] - | ||
m[0][0] * m[2][3] * m[3][2] - | ||
m[2][0] * m[0][2] * m[3][3] + | ||
m[2][0] * m[0][3] * m[3][2] + | ||
m[3][0] * m[0][2] * m[2][3] - | ||
m[3][0] * m[0][3] * m[2][2], | ||
// 1,2: | ||
-m[0][0] * m[1][2] * m[3][3] + | ||
m[0][0] * m[1][3] * m[3][2] + | ||
m[1][0] * m[0][2] * m[3][3] - | ||
m[1][0] * m[0][3] * m[3][2] - | ||
m[3][0] * m[0][2] * m[1][3] + | ||
m[3][0] * m[0][3] * m[1][2], | ||
// 1,3: | ||
m[0][0] * m[1][2] * m[2][3] - | ||
m[0][0] * m[1][3] * m[2][2] - | ||
m[1][0] * m[0][2] * m[2][3] + | ||
m[1][0] * m[0][3] * m[2][2] + | ||
m[2][0] * m[0][2] * m[1][3] - | ||
m[2][0] * m[0][3] * m[1][2], | ||
], | ||
// row 2 | ||
[ | ||
// 2,0: | ||
m[1][0] * m[2][1] * m[3][3] - | ||
m[1][0] * m[2][3] * m[3][1] - | ||
m[2][0] * m[1][1] * m[3][3] + | ||
m[2][0] * m[1][3] * m[3][1] + | ||
m[3][0] * m[1][1] * m[2][3] - | ||
m[3][0] * m[1][3] * m[2][1], | ||
// 2,1: | ||
-m[0][0] * m[2][1] * m[3][3] + | ||
m[0][0] * m[2][3] * m[3][1] + | ||
m[2][0] * m[0][1] * m[3][3] - | ||
m[2][0] * m[0][3] * m[3][1] - | ||
m[3][0] * m[0][1] * m[2][3] + | ||
m[3][0] * m[0][3] * m[2][1], | ||
// 2,2: | ||
m[0][0] * m[1][1] * m[3][3] - | ||
m[0][0] * m[1][3] * m[3][1] - | ||
m[1][0] * m[0][1] * m[3][3] + | ||
m[1][0] * m[0][3] * m[3][1] + | ||
m[3][0] * m[0][1] * m[1][3] - | ||
m[3][0] * m[0][3] * m[1][1], | ||
// 2,3: | ||
-m[0][0] * m[1][1] * m[2][3] + | ||
m[0][0] * m[1][3] * m[2][1] + | ||
m[1][0] * m[0][1] * m[2][3] - | ||
m[1][0] * m[0][3] * m[2][1] - | ||
m[2][0] * m[0][1] * m[1][3] + | ||
m[2][0] * m[0][3] * m[1][1], | ||
], | ||
// row 3 | ||
[ | ||
// 3,0: | ||
-m[1][0] * m[2][1] * m[3][2] + | ||
m[1][0] * m[2][2] * m[3][1] + | ||
m[2][0] * m[1][1] * m[3][2] - | ||
m[2][0] * m[1][2] * m[3][1] - | ||
m[3][0] * m[1][1] * m[2][2] + | ||
m[3][0] * m[1][2] * m[2][1], | ||
// 3,1: | ||
m[0][0] * m[2][1] * m[3][2] - | ||
m[0][0] * m[2][2] * m[3][1] - | ||
m[2][0] * m[0][1] * m[3][2] + | ||
m[2][0] * m[0][2] * m[3][1] + | ||
m[3][0] * m[0][1] * m[2][2] - | ||
m[3][0] * m[0][2] * m[2][1], | ||
// 3,2: | ||
-m[0][0] * m[1][1] * m[3][2] + | ||
m[0][0] * m[1][2] * m[3][1] + | ||
m[1][0] * m[0][1] * m[3][2] - | ||
m[1][0] * m[0][2] * m[3][1] - | ||
m[3][0] * m[0][1] * m[1][2] + | ||
m[3][0] * m[0][2] * m[1][1], | ||
// 3,3: | ||
m[0][0] * m[1][1] * m[2][2] - | ||
m[0][0] * m[1][2] * m[2][1] - | ||
m[1][0] * m[0][1] * m[2][2] + | ||
m[1][0] * m[0][2] * m[2][1] + | ||
m[2][0] * m[0][1] * m[1][2] - | ||
m[2][0] * m[0][2] * m[1][1], | ||
], | ||
]; | ||
|
||
let det = m[0][0] * inv[0][0] + m[0][1] * inv[1][0] + m[0][2] * inv[2][0] + m[0][3] * inv[3][0]; | ||
if det == 0. { | ||
return None; | ||
} | ||
|
||
let det_inv = 1. / det; | ||
|
||
for row in &mut inv { | ||
for elem in row.iter_mut() { | ||
*elem *= det_inv; | ||
} | ||
} | ||
|
||
Some(Matrix4x4(inv)) | ||
} | ||
|
||
pub fn simd_inv4x4(m: Matrix4x4) -> Option<Matrix4x4> { | ||
let m = m.0; | ||
let m_0 = f32x4::from_array(m[0]); | ||
let m_1 = f32x4::from_array(m[1]); | ||
let m_2 = f32x4::from_array(m[2]); | ||
let m_3 = f32x4::from_array(m[3]); | ||
|
||
const SHUFFLE01: [Which; 4] = [First(0), First(1), Second(0), Second(1)]; | ||
const SHUFFLE02: [Which; 4] = [First(0), First(2), Second(0), Second(2)]; | ||
const SHUFFLE13: [Which; 4] = [First(1), First(3), Second(1), Second(3)]; | ||
const SHUFFLE23: [Which; 4] = [First(2), First(3), Second(2), Second(3)]; | ||
|
||
let tmp = simd_swizzle!(m_0, m_1, SHUFFLE01); | ||
let row1 = simd_swizzle!(m_2, m_3, SHUFFLE01); | ||
|
||
let row0 = simd_swizzle!(tmp, row1, SHUFFLE02); | ||
let row1 = simd_swizzle!(row1, tmp, SHUFFLE13); | ||
|
||
let tmp = simd_swizzle!(m_0, m_1, SHUFFLE23); | ||
let row3 = simd_swizzle!(m_2, m_3, SHUFFLE23); | ||
let row2 = simd_swizzle!(tmp, row3, SHUFFLE02); | ||
let row3 = simd_swizzle!(row3, tmp, SHUFFLE13); | ||
|
||
let tmp = (row2 * row3).reverse().rotate_lanes_right::<2>(); | ||
let minor0 = row1 * tmp; | ||
let minor1 = row0 * tmp; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
let minor0 = (row1 * tmp) - minor0; | ||
let minor1 = (row0 * tmp) - minor1; | ||
let minor1 = minor1.rotate_lanes_right::<2>(); | ||
|
||
let tmp = (row1 * row2).reverse().rotate_lanes_right::<2>(); | ||
let minor0 = (row3 * tmp) + minor0; | ||
let minor3 = row0 * tmp; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
|
||
let minor0 = minor0 - row3 * tmp; | ||
let minor3 = row0 * tmp - minor3; | ||
let minor3 = minor3.rotate_lanes_right::<2>(); | ||
|
||
let tmp = (row3 * row1.rotate_lanes_right::<2>()) | ||
.reverse() | ||
.rotate_lanes_right::<2>(); | ||
let row2 = row2.rotate_lanes_right::<2>(); | ||
let minor0 = row2 * tmp + minor0; | ||
let minor2 = row0 * tmp; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
let minor0 = minor0 - row2 * tmp; | ||
let minor2 = row0 * tmp - minor2; | ||
let minor2 = minor2.rotate_lanes_right::<2>(); | ||
|
||
let tmp = (row0 * row1).reverse().rotate_lanes_right::<2>(); | ||
let minor2 = minor2 + row3 * tmp; | ||
let minor3 = row2 * tmp - minor3; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
let minor2 = row3 * tmp - minor2; | ||
let minor3 = minor3 - row2 * tmp; | ||
|
||
let tmp = (row0 * row3).reverse().rotate_lanes_right::<2>(); | ||
let minor1 = minor1 - row2 * tmp; | ||
let minor2 = row1 * tmp + minor2; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
let minor1 = row2 * tmp + minor1; | ||
let minor2 = minor2 - row1 * tmp; | ||
|
||
let tmp = (row0 * row2).reverse().rotate_lanes_right::<2>(); | ||
let minor1 = row3 * tmp + minor1; | ||
let minor3 = minor3 - row1 * tmp; | ||
let tmp = tmp.rotate_lanes_right::<2>(); | ||
let minor1 = minor1 - row3 * tmp; | ||
let minor3 = row1 * tmp + minor3; | ||
|
||
let det = row0 * minor0; | ||
let det = det.rotate_lanes_right::<2>() + det; | ||
let det = det.reverse().rotate_lanes_right::<2>() + det; | ||
|
||
if det.horizontal_sum() == 0. { | ||
return None; | ||
} | ||
// calculate the reciprocal, then apply one Newton-Raphson refinement step: x' = 2x - det * x * x | ||
let tmp = f32x4::splat(1.0) / det; | ||
let det = tmp + tmp - det * tmp * tmp; | ||
|
||
let res0 = minor0 * det; | ||
let res1 = minor1 * det; | ||
let res2 = minor2 * det; | ||
let res3 = minor3 * det; | ||
|
||
let mut m = m; | ||
|
||
m[0] = res0.to_array(); | ||
m[1] = res1.to_array(); | ||
m[2] = res2.to_array(); | ||
m[3] = res3.to_array(); | ||
|
||
Some(Matrix4x4(m)) | ||
} | ||
|
||
#[cfg(test)] | ||
#[rustfmt::skip] | ||
mod tests { | ||
use super::*; | ||
|
||
#[test] | ||
fn test() { | ||
let tests: &[(Matrix4x4, Option<Matrix4x4>)] = &[ | ||
// Identity: | ||
(Matrix4x4([ | ||
[1., 0., 0., 0.], | ||
[0., 1., 0., 0.], | ||
[0., 0., 1., 0.], | ||
[0., 0., 0., 1.], | ||
]), | ||
Some(Matrix4x4([ | ||
[1., 0., 0., 0.], | ||
[0., 1., 0., 0.], | ||
[0., 0., 1., 0.], | ||
[0., 0., 0., 1.], | ||
])) | ||
), | ||
// None: | ||
(Matrix4x4([ | ||
[1., 2., 3., 4.], | ||
[12., 11., 10., 9.], | ||
[5., 6., 7., 8.], | ||
[16., 15., 14., 13.], | ||
]), | ||
None | ||
), | ||
// Other: | ||
(Matrix4x4([ | ||
[1., 1., 1., 0.], | ||
[0., 3., 1., 2.], | ||
[2., 3., 1., 0.], | ||
[1., 0., 2., 1.], | ||
]), | ||
Some(Matrix4x4([ | ||
[-3., -0.5, 1.5, 1.0], | ||
[ 1., 0.25, -0.25, -0.5], | ||
[ 3., 0.25, -1.25, -0.5], | ||
[-3., 0.0, 1.0, 1.0], | ||
])) | ||
), | ||
|
||
|
||
]; | ||
|
||
for &(input, output) in tests { | ||
assert_eq!(scalar_inv4x4(input), output); | ||
assert_eq!(simd_inv4x4(input), output); | ||
} | ||
} | ||
} | ||
|
||
fn main() { | ||
// Empty main to make cargo happy | ||
} |
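The two-input shuffles used in `simd_inv4x4` can be hard to read at first. The following is a scalar sketch of what a constant such as `SHUFFLE01` selects, assuming that `First(i)` picks lane `i` of the first vector and `Second(i)` picks lane `i` of the second. The `Which` enum and `swizzle2` function here are local, hypothetical mirrors written so the example compiles on stable Rust; they are not the `core_simd` implementation.

```rust
/// Local mirror of the example's `Which` selector (an assumption for this
/// sketch; defined here so no nightly features are needed).
#[derive(Clone, Copy)]
enum Which {
    First(usize),
    Second(usize),
}

/// Scalar model of a two-input swizzle: output lane `i` is read from `a`
/// or from `b`, depending on `idx[i]`.
fn swizzle2(a: [f32; 4], b: [f32; 4], idx: [Which; 4]) -> [f32; 4] {
    let mut out = [0.0; 4];
    for (o, w) in out.iter_mut().zip(idx) {
        *o = match w {
            Which::First(i) => a[i],
            Which::Second(i) => b[i],
        };
    }
    out
}

fn main() {
    // Same lane pattern as `SHUFFLE01` in the example above.
    let shuffle01 = [
        Which::First(0),
        Which::First(1),
        Which::Second(0),
        Which::Second(1),
    ];
    let r = swizzle2([0., 1., 2., 3.], [10., 11., 12., 13.], shuffle01);
    println!("{:?}", r);
}
```

Under this model, `SHUFFLE01` concatenates the low halves of its two inputs, which is the first step of the 4x4 transpose at the top of `simd_inv4x4`.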
193 changes: 193 additions & 0 deletions
library/portable-simd/crates/core_simd/examples/nbody.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
#![cfg_attr(feature = "std", feature(portable_simd))] | ||
|
||
/// Benchmarks game nbody code | ||
/// Taken from the `packed_simd` crate | ||
/// Run this benchmark with `cargo test --example nbody` | ||
#[cfg(feature = "std")] | ||
mod nbody { | ||
use core_simd::*; | ||
|
||
use std::f64::consts::PI; | ||
const SOLAR_MASS: f64 = 4.0 * PI * PI; | ||
const DAYS_PER_YEAR: f64 = 365.24; | ||
|
||
#[derive(Debug, Clone, Copy)] | ||
struct Body { | ||
pub x: f64x4, | ||
pub v: f64x4, | ||
pub mass: f64, | ||
} | ||
|
||
const N_BODIES: usize = 5; | ||
const BODIES: [Body; N_BODIES] = [ | ||
// sun: | ||
Body { | ||
x: f64x4::from_array([0., 0., 0., 0.]), | ||
v: f64x4::from_array([0., 0., 0., 0.]), | ||
mass: SOLAR_MASS, | ||
}, | ||
// jupiter: | ||
Body { | ||
x: f64x4::from_array([ | ||
4.84143144246472090e+00, | ||
-1.16032004402742839e+00, | ||
-1.03622044471123109e-01, | ||
0., | ||
]), | ||
v: f64x4::from_array([ | ||
1.66007664274403694e-03 * DAYS_PER_YEAR, | ||
7.69901118419740425e-03 * DAYS_PER_YEAR, | ||
-6.90460016972063023e-05 * DAYS_PER_YEAR, | ||
0., | ||
]), | ||
mass: 9.54791938424326609e-04 * SOLAR_MASS, | ||
}, | ||
// saturn: | ||
Body { | ||
x: f64x4::from_array([ | ||
8.34336671824457987e+00, | ||
4.12479856412430479e+00, | ||
-4.03523417114321381e-01, | ||
0., | ||
]), | ||
v: f64x4::from_array([ | ||
-2.76742510726862411e-03 * DAYS_PER_YEAR, | ||
4.99852801234917238e-03 * DAYS_PER_YEAR, | ||
2.30417297573763929e-05 * DAYS_PER_YEAR, | ||
0., | ||
]), | ||
mass: 2.85885980666130812e-04 * SOLAR_MASS, | ||
}, | ||
// uranus: | ||
Body { | ||
x: f64x4::from_array([ | ||
1.28943695621391310e+01, | ||
-1.51111514016986312e+01, | ||
-2.23307578892655734e-01, | ||
0., | ||
]), | ||
v: f64x4::from_array([ | ||
2.96460137564761618e-03 * DAYS_PER_YEAR, | ||
2.37847173959480950e-03 * DAYS_PER_YEAR, | ||
-2.96589568540237556e-05 * DAYS_PER_YEAR, | ||
0., | ||
]), | ||
mass: 4.36624404335156298e-05 * SOLAR_MASS, | ||
}, | ||
// neptune: | ||
Body { | ||
x: f64x4::from_array([ | ||
1.53796971148509165e+01, | ||
-2.59193146099879641e+01, | ||
1.79258772950371181e-01, | ||
0., | ||
]), | ||
v: f64x4::from_array([ | ||
2.68067772490389322e-03 * DAYS_PER_YEAR, | ||
1.62824170038242295e-03 * DAYS_PER_YEAR, | ||
-9.51592254519715870e-05 * DAYS_PER_YEAR, | ||
0., | ||
]), | ||
mass: 5.15138902046611451e-05 * SOLAR_MASS, | ||
}, | ||
]; | ||
|
||
fn offset_momentum(bodies: &mut [Body; N_BODIES]) { | ||
let (sun, rest) = bodies.split_at_mut(1); | ||
let sun = &mut sun[0]; | ||
for body in rest { | ||
let m_ratio = body.mass / SOLAR_MASS; | ||
sun.v -= body.v * m_ratio; | ||
} | ||
} | ||
|
||
fn energy(bodies: &[Body; N_BODIES]) -> f64 { | ||
let mut e = 0.; | ||
for i in 0..N_BODIES { | ||
let bi = &bodies[i]; | ||
e += bi.mass * (bi.v * bi.v).horizontal_sum() * 0.5; | ||
for bj in bodies.iter().take(N_BODIES).skip(i + 1) { | ||
let dx = bi.x - bj.x; | ||
e -= bi.mass * bj.mass / (dx * dx).horizontal_sum().sqrt() | ||
} | ||
} | ||
e | ||
} | ||
|
||
fn advance(bodies: &mut [Body; N_BODIES], dt: f64) { | ||
const N: usize = N_BODIES * (N_BODIES - 1) / 2; | ||
|
||
// compute distance between bodies: | ||
let mut r = [f64x4::splat(0.); N]; | ||
{ | ||
let mut i = 0; | ||
for j in 0..N_BODIES { | ||
for k in j + 1..N_BODIES { | ||
r[i] = bodies[j].x - bodies[k].x; | ||
i += 1; | ||
} | ||
} | ||
} | ||
|
||
let mut mag = [0.0; N]; | ||
for i in (0..N).step_by(2) { | ||
let d2s = f64x2::from_array([ | ||
(r[i] * r[i]).horizontal_sum(), | ||
(r[i + 1] * r[i + 1]).horizontal_sum(), | ||
]); | ||
let dmags = f64x2::splat(dt) / (d2s * d2s.sqrt()); | ||
mag[i] = dmags[0]; | ||
mag[i + 1] = dmags[1]; | ||
} | ||
|
||
let mut i = 0; | ||
for j in 0..N_BODIES { | ||
for k in j + 1..N_BODIES { | ||
let f = r[i] * mag[i]; | ||
bodies[j].v -= f * bodies[k].mass; | ||
bodies[k].v += f * bodies[j].mass; | ||
i += 1 | ||
} | ||
} | ||
for body in bodies { | ||
body.x += dt * body.v | ||
} | ||
} | ||
|
||
pub fn run(n: usize) -> (f64, f64) { | ||
let mut bodies = BODIES; | ||
offset_momentum(&mut bodies); | ||
let energy_before = energy(&bodies); | ||
for _ in 0..n { | ||
advance(&mut bodies, 0.01); | ||
} | ||
let energy_after = energy(&bodies); | ||
|
||
(energy_before, energy_after) | ||
} | ||
} | ||
|
||
#[cfg(feature = "std")] | ||
#[cfg(test)] | ||
mod tests { | ||
// Good enough for demonstration purposes, not going for strictness here. | ||
fn approx_eq_f64(a: f64, b: f64) -> bool { | ||
(a - b).abs() < 0.00001 | ||
} | ||
#[test] | ||
fn test() { | ||
const OUTPUT: [f64; 2] = [-0.169075164, -0.169087605]; | ||
let (energy_before, energy_after) = super::nbody::run(1000); | ||
assert!(approx_eq_f64(energy_before, OUTPUT[0])); | ||
assert!(approx_eq_f64(energy_after, OUTPUT[1])); | ||
} | ||
} | ||
|
||
fn main() { | ||
#[cfg(feature = "std")] | ||
{ | ||
let (energy_before, energy_after) = nbody::run(1000); | ||
println!("Energy before: {}", energy_before); | ||
println!("Energy after: {}", energy_after); | ||
} | ||
} |
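The `advance` function relies on two small pieces of arithmetic: visiting each unordered pair of bodies exactly once (hence `N = N_BODIES * (N_BODIES - 1) / 2`), and scaling each separation vector by `dt / |r|^3`. The scalar sketch below isolates both. The helper names `pair_indices` and `pair_scale` are hypothetical and introduced only for illustration.

```rust
/// Enumerate unordered pairs (j, k) with j < k, in the same order as the
/// nested loops in `advance`.
fn pair_indices(n: usize) -> Vec<(usize, usize)> {
    let mut pairs = Vec::with_capacity(n * n.saturating_sub(1) / 2);
    for j in 0..n {
        for k in j + 1..n {
            pairs.push((j, k));
        }
    }
    pairs
}

/// Scalar form of the per-pair scale factor dt / |r|^3, written as
/// dt / (d2 * sqrt(d2)) where d2 = |r|^2.
fn pair_scale(d2: f64, dt: f64) -> f64 {
    dt / (d2 * d2.sqrt())
}

fn main() {
    let pairs = pair_indices(5);
    println!("{} pairs, first {:?}, last {:?}", pairs.len(), pairs[0], pairs[pairs.len() - 1]);
    println!("scale for d2 = 4, dt = 1: {}", pair_scale(4.0, 1.0));
}
```

For five bodies this yields ten pairs, which is an even count; the `step_by(2)` loop in `advance` depends on that when it reads `r[i + 1]`.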
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
use crate::simd::intrinsics; | ||
use crate::simd::{LaneCount, Mask, Simd, SimdElement, SupportedLaneCount}; | ||
|
||
impl<T, const LANES: usize> Simd<T, LANES> | ||
where | ||
T: SimdElement + PartialEq, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
/// Test if each lane is equal to the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_eq(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_eq(self, other)) } | ||
} | ||
|
||
/// Test if each lane is not equal to the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_ne(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_ne(self, other)) } | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> Simd<T, LANES> | ||
where | ||
T: SimdElement + PartialOrd, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
/// Test if each lane is less than the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_lt(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_lt(self, other)) } | ||
} | ||
|
||
/// Test if each lane is greater than the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_gt(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_gt(self, other)) } | ||
} | ||
|
||
/// Test if each lane is less than or equal to the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_le(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_le(self, other)) } | ||
} | ||
|
||
/// Test if each lane is greater than or equal to the corresponding lane in `other`. | ||
#[inline] | ||
pub fn lanes_ge(self, other: Self) -> Mask<T::Mask, LANES> { | ||
unsafe { Mask::from_int_unchecked(intrinsics::simd_ge(self, other)) } | ||
} | ||
} |
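The lanewise comparisons above return a mask rather than a single `bool`. As a hedged illustration of that shape, here is a scalar model over plain arrays; `lanes_lt_scalar` is a hypothetical helper for this sketch and says nothing about how the intrinsics are lowered.

```rust
/// Scalar model of a lanewise "less than": one boolean per lane.
fn lanes_lt_scalar<T: PartialOrd + Copy, const N: usize>(a: [T; N], b: [T; N]) -> [bool; N] {
    let mut mask = [false; N];
    for i in 0..N {
        mask[i] = a[i] < b[i];
    }
    mask
}

fn main() {
    let mask = lanes_lt_scalar([1, 5, 3, 7], [2, 4, 3, 8]);
    println!("{:?}", mask);
}
```

Each output lane depends only on the matching input lanes, which is why the result is a mask with the same lane count as the operands.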
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Portable SIMD module. | ||
|
||
This module offers a portable abstraction for SIMD operations | ||
that is not bound to any particular hardware architecture. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
use crate::simd::{LaneCount, Simd, SimdElement, SupportedLaneCount}; | ||
use core::fmt; | ||
|
||
macro_rules! impl_fmt_trait { | ||
{ $($trait:ident,)* } => { | ||
$( | ||
impl<T, const LANES: usize> fmt::$trait for Simd<T, LANES> | ||
where | ||
LaneCount<LANES>: SupportedLaneCount, | ||
T: SimdElement + fmt::$trait, | ||
{ | ||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
#[repr(transparent)] | ||
struct Wrapper<'a, T: fmt::$trait>(&'a T); | ||
|
||
impl<T: fmt::$trait> fmt::Debug for Wrapper<'_, T> { | ||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
self.0.fmt(f) | ||
} | ||
} | ||
|
||
f.debug_list() | ||
.entries(self.as_array().iter().map(|x| Wrapper(x))) | ||
.finish() | ||
} | ||
} | ||
)* | ||
} | ||
} | ||
|
||
impl_fmt_trait! { | ||
Debug, | ||
Binary, | ||
LowerExp, | ||
UpperExp, | ||
Octal, | ||
LowerHex, | ||
UpperHex, | ||
} |
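Review note: the `Wrapper` trick in the macro above works because `debug_list` only accepts `Debug` entries, so each lane is wrapped in a type whose `Debug` impl forwards to the formatting trait that was actually requested. A small sketch for a single trait, assuming a hypothetical `Lanes` newtype over `[u8; 4]` in place of `Simd<T, LANES>`:

```rust
use core::fmt;

/// Hypothetical stand-in for a four-lane vector of `u8` (not the crate's type).
struct Lanes([u8; 4]);

impl fmt::LowerHex for Lanes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Wrap each lane so that its `Debug` impl is really `LowerHex`.
        struct Wrapper<'a>(&'a u8);

        impl fmt::Debug for Wrapper<'_> {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                fmt::LowerHex::fmt(self.0, f)
            }
        }

        // `debug_list` supplies the brackets and separators.
        f.debug_list().entries(self.0.iter().map(Wrapper)).finish()
    }
}
```

With this in place, `format!("{:x}", Lanes([10, 255, 0, 16]))` yields `[a, ff, 0, 10]`.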
115 changes: 115 additions & 0 deletions
library/portable-simd/crates/core_simd/src/intrinsics.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
//! This module contains the LLVM intrinsics bindings that provide the functionality for this | ||
//! crate. | ||
//! | ||
//! The LLVM assembly language is documented here: <https://llvm.org/docs/LangRef.html> | ||
/// These intrinsics aren't linked directly from LLVM and are mostly undocumented; however, they are | ||
/// simply lowered to the matching LLVM instructions by the compiler. The associated instruction | ||
/// is documented alongside each intrinsic. | ||
extern "platform-intrinsic" { | ||
/// add/fadd | ||
pub(crate) fn simd_add<T>(x: T, y: T) -> T; | ||
|
||
/// sub/fsub | ||
pub(crate) fn simd_sub<T>(x: T, y: T) -> T; | ||
|
||
/// mul/fmul | ||
pub(crate) fn simd_mul<T>(x: T, y: T) -> T; | ||
|
||
/// udiv/sdiv/fdiv | ||
pub(crate) fn simd_div<T>(x: T, y: T) -> T; | ||
|
||
/// urem/srem/frem | ||
pub(crate) fn simd_rem<T>(x: T, y: T) -> T; | ||
|
||
/// shl | ||
pub(crate) fn simd_shl<T>(x: T, y: T) -> T; | ||
|
||
/// lshr/ashr | ||
pub(crate) fn simd_shr<T>(x: T, y: T) -> T; | ||
|
||
/// and | ||
pub(crate) fn simd_and<T>(x: T, y: T) -> T; | ||
|
||
/// or | ||
pub(crate) fn simd_or<T>(x: T, y: T) -> T; | ||
|
||
/// xor | ||
pub(crate) fn simd_xor<T>(x: T, y: T) -> T; | ||
|
||
/// fptoui/fptosi/uitofp/sitofp | ||
pub(crate) fn simd_cast<T, U>(x: T) -> U; | ||
|
||
/// neg/fneg | ||
pub(crate) fn simd_neg<T>(x: T) -> T; | ||
|
||
/// fabs | ||
pub(crate) fn simd_fabs<T>(x: T) -> T; | ||
|
||
pub(crate) fn simd_eq<T, U>(x: T, y: T) -> U; | ||
pub(crate) fn simd_ne<T, U>(x: T, y: T) -> U; | ||
pub(crate) fn simd_lt<T, U>(x: T, y: T) -> U; | ||
pub(crate) fn simd_le<T, U>(x: T, y: T) -> U; | ||
pub(crate) fn simd_gt<T, U>(x: T, y: T) -> U; | ||
pub(crate) fn simd_ge<T, U>(x: T, y: T) -> U; | ||
|
||
// shufflevector | ||
pub(crate) fn simd_shuffle<T, U, V>(x: T, y: T, idx: U) -> V; | ||
|
||
pub(crate) fn simd_gather<T, U, V>(val: T, ptr: U, mask: V) -> T; | ||
pub(crate) fn simd_scatter<T, U, V>(val: T, ptr: U, mask: V); | ||
|
||
// {s,u}add.sat | ||
pub(crate) fn simd_saturating_add<T>(x: T, y: T) -> T; | ||
|
||
// {s,u}sub.sat | ||
pub(crate) fn simd_saturating_sub<T>(x: T, y: T) -> T; | ||
|
||
// reductions | ||
pub(crate) fn simd_reduce_add_ordered<T, U>(x: T, y: U) -> U; | ||
pub(crate) fn simd_reduce_mul_ordered<T, U>(x: T, y: U) -> U; | ||
#[allow(unused)] | ||
pub(crate) fn simd_reduce_all<T>(x: T) -> bool; | ||
#[allow(unused)] | ||
pub(crate) fn simd_reduce_any<T>(x: T) -> bool; | ||
pub(crate) fn simd_reduce_max<T, U>(x: T) -> U; | ||
pub(crate) fn simd_reduce_min<T, U>(x: T) -> U; | ||
pub(crate) fn simd_reduce_and<T, U>(x: T) -> U; | ||
pub(crate) fn simd_reduce_or<T, U>(x: T) -> U; | ||
pub(crate) fn simd_reduce_xor<T, U>(x: T) -> U; | ||
|
||
// truncate integer vector to bitmask | ||
#[allow(unused)] | ||
pub(crate) fn simd_bitmask<T, U>(x: T) -> U; | ||
|
||
// select | ||
pub(crate) fn simd_select<M, T>(m: M, a: T, b: T) -> T; | ||
#[allow(unused)] | ||
pub(crate) fn simd_select_bitmask<M, T>(m: M, a: T, b: T) -> T; | ||
} | ||
|
||
#[cfg(feature = "std")] | ||
mod std { | ||
extern "platform-intrinsic" { | ||
// ceil | ||
pub(crate) fn simd_ceil<T>(x: T) -> T; | ||
|
||
// floor | ||
pub(crate) fn simd_floor<T>(x: T) -> T; | ||
|
||
// round | ||
pub(crate) fn simd_round<T>(x: T) -> T; | ||
|
||
// trunc | ||
pub(crate) fn simd_trunc<T>(x: T) -> T; | ||
|
||
// fsqrt | ||
pub(crate) fn simd_fsqrt<T>(x: T) -> T; | ||
|
||
// fma | ||
pub(crate) fn simd_fma<T>(x: T, y: T, z: T) -> T; | ||
} | ||
} | ||
|
||
#[cfg(feature = "std")] | ||
pub(crate) use crate::simd::intrinsics::std::*; |
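Review note: `simd_reduce_add_ordered` and `simd_reduce_mul_ordered` take an extra accumulator argument `y`. The diff does not document their semantics, so the following is an assumption based on the name and on LLVM's sequential vector reductions: the lanes are folded left to right starting from the accumulator. A scalar sketch (the function below is hypothetical, not the intrinsic) shows why the order matters for floats:

```rust
/// Hypothetical scalar model of an ordered add reduction:
/// start from `acc` and add the lanes strictly left to right.
fn reduce_add_ordered<const LANES: usize>(lanes: [f32; LANES], acc: f32) -> f32 {
    lanes.iter().fold(acc, |sum, &x| sum + x)
}
```

For example, reducing `[1e20, -1e20, 1.0]` from `0.0` gives `1.0`, while `[1e20, 1.0, -1e20]` gives `0.0`, because `1.0` is lost when it is added to `1e20` in `f32`.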
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
use crate::simd::{LaneCount, Simd, SupportedLaneCount}; | ||
use core::{ | ||
iter::{Product, Sum}, | ||
ops::{Add, Mul}, | ||
}; | ||
|
||
macro_rules! impl_traits { | ||
{ $type:ty } => { | ||
impl<const LANES: usize> Sum<Self> for Simd<$type, LANES> | ||
where | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn sum<I: Iterator<Item = Self>>(iter: I) -> Self { | ||
iter.fold(Simd::splat(0 as $type), Add::add) | ||
} | ||
} | ||
|
||
impl<const LANES: usize> Product<Self> for Simd<$type, LANES> | ||
where | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn product<I: Iterator<Item = Self>>(iter: I) -> Self { | ||
iter.fold(Simd::splat(1 as $type), Mul::mul) | ||
} | ||
} | ||
|
||
impl<'a, const LANES: usize> Sum<&'a Self> for Simd<$type, LANES> | ||
where | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn sum<I: Iterator<Item = &'a Self>>(iter: I) -> Self { | ||
iter.fold(Simd::splat(0 as $type), Add::add) | ||
} | ||
} | ||
|
||
impl<'a, const LANES: usize> Product<&'a Self> for Simd<$type, LANES> | ||
where | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn product<I: Iterator<Item = &'a Self>>(iter: I) -> Self { | ||
iter.fold(Simd::splat(1 as $type), Mul::mul) | ||
} | ||
} | ||
} | ||
} | ||
|
||
impl_traits! { f32 } | ||
impl_traits! { f64 } | ||
impl_traits! { u8 } | ||
impl_traits! { u16 } | ||
impl_traits! { u32 } | ||
impl_traits! { u64 } | ||
impl_traits! { usize } | ||
impl_traits! { i8 } | ||
impl_traits! { i16 } | ||
impl_traits! { i32 } | ||
impl_traits! { i64 } | ||
impl_traits! { isize } |
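Review note: the `Sum` and `Product` impls above fold an iterator of vectors lane by lane, starting from the splatted identity (0 for sums, 1 for products). A scalar sketch of the same pattern, assuming a hypothetical `Lanes4` newtype over `[i32; 4]` with wrapping lanewise arithmetic in place of `Simd<$type, LANES>`:

```rust
use core::iter::{Product, Sum};
use core::ops::{Add, Mul};

/// Hypothetical four-lane vector, used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lanes4([i32; 4]);

impl Lanes4 {
    fn splat(value: i32) -> Self {
        Lanes4([value; 4])
    }
}

impl Add for Lanes4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..4 {
            // Wrapping, to mirror the wrapping `+` in the PR's doc examples.
            out[i] = out[i].wrapping_add(rhs.0[i]);
        }
        Lanes4(out)
    }
}

impl Mul for Lanes4 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self {
        let mut out = self.0;
        for i in 0..4 {
            out[i] = out[i].wrapping_mul(rhs.0[i]);
        }
        Lanes4(out)
    }
}

impl Sum for Lanes4 {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        // Additive identity in every lane.
        iter.fold(Lanes4::splat(0), Add::add)
    }
}

impl Product for Lanes4 {
    fn product<I: Iterator<Item = Self>>(iter: I) -> Self {
        // Multiplicative identity in every lane.
        iter.fold(Lanes4::splat(1), Mul::mul)
    }
}
```

The result is still a vector: lane `i` of the output is the sum (or product) of lane `i` across all inputs, and an empty iterator returns the identity.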
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
mod sealed { | ||
pub trait Sealed {} | ||
} | ||
use sealed::Sealed; | ||
|
||
/// A type representing a vector lane count. | ||
pub struct LaneCount<const LANES: usize>; | ||
|
||
impl<const LANES: usize> LaneCount<LANES> { | ||
/// The number of bytes in a bitmask with this many lanes. | ||
pub const BITMASK_LEN: usize = (LANES + 7) / 8; | ||
} | ||
|
||
/// Helper trait for vector lane counts. | ||
pub trait SupportedLaneCount: Sealed { | ||
#[doc(hidden)] | ||
type BitMask: Copy + Default + AsRef<[u8]> + AsMut<[u8]>; | ||
|
||
#[doc(hidden)] | ||
type IntBitMask; | ||
} | ||
|
||
impl<const LANES: usize> Sealed for LaneCount<LANES> {} | ||
|
||
impl SupportedLaneCount for LaneCount<1> { | ||
type BitMask = [u8; 1]; | ||
type IntBitMask = u8; | ||
} | ||
impl SupportedLaneCount for LaneCount<2> { | ||
type BitMask = [u8; 1]; | ||
type IntBitMask = u8; | ||
} | ||
impl SupportedLaneCount for LaneCount<4> { | ||
type BitMask = [u8; 1]; | ||
type IntBitMask = u8; | ||
} | ||
impl SupportedLaneCount for LaneCount<8> { | ||
type BitMask = [u8; 1]; | ||
type IntBitMask = u8; | ||
} | ||
impl SupportedLaneCount for LaneCount<16> { | ||
type BitMask = [u8; 2]; | ||
type IntBitMask = u16; | ||
} | ||
impl SupportedLaneCount for LaneCount<32> { | ||
type BitMask = [u8; 4]; | ||
type IntBitMask = u32; | ||
} |
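Review note: `BITMASK_LEN` rounds the lane count up to whole bytes, which is why the 1, 2, 4 and 8 lane impls all use `[u8; 1]` while 16 and 32 lanes use 2 and 4 bytes. A sketch of the arithmetic as a free `const fn` (the function name is hypothetical):

```rust
/// Bytes needed for a bitmask holding one bit per lane, rounded up.
const fn bitmask_len(lanes: usize) -> usize {
    (lanes + 7) / 8
}
```

For every supported lane count this also equals the size of the matching `IntBitMask` integer (`u8`, `u16`, `u32`), which is what the `size_of` assertions elsewhere in the PR rely on.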
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
#![cfg_attr(not(feature = "std"), no_std)] | ||
#![feature( | ||
const_fn_trait_bound, | ||
decl_macro, | ||
platform_intrinsics, | ||
repr_simd, | ||
simd_ffi, | ||
staged_api, | ||
stdsimd | ||
)] | ||
#![cfg_attr(feature = "generic_const_exprs", feature(generic_const_exprs))] | ||
#![cfg_attr(feature = "generic_const_exprs", allow(incomplete_features))] | ||
#![warn(missing_docs)] | ||
#![deny(unsafe_op_in_unsafe_fn)] | ||
#![unstable(feature = "portable_simd", issue = "86656")] | ||
//! Portable SIMD module. | ||
#[path = "mod.rs"] | ||
mod core_simd; | ||
pub use self::core_simd::simd; | ||
pub use simd::*; |
Large diffs are not rendered by default.
220 changes: 220 additions & 0 deletions
library/portable-simd/crates/core_simd/src/masks/bitmask.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
use super::MaskElement; | ||
use crate::simd::intrinsics; | ||
use crate::simd::{LaneCount, Simd, SupportedLaneCount}; | ||
use core::marker::PhantomData; | ||
|
||
/// A mask where each lane is represented by a single bit. | ||
#[repr(transparent)] | ||
pub struct Mask<T, const LANES: usize>( | ||
<LaneCount<LANES> as SupportedLaneCount>::BitMask, | ||
PhantomData<T>, | ||
) | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount; | ||
|
||
impl<T, const LANES: usize> Copy for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
} | ||
|
||
impl<T, const LANES: usize> Clone for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn clone(&self) -> Self { | ||
*self | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> PartialEq for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn eq(&self, other: &Self) -> bool { | ||
self.0.as_ref() == other.0.as_ref() | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> PartialOrd for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> { | ||
self.0.as_ref().partial_cmp(other.0.as_ref()) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> Eq for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
} | ||
|
||
impl<T, const LANES: usize> Ord for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn cmp(&self, other: &Self) -> core::cmp::Ordering { | ||
self.0.as_ref().cmp(other.0.as_ref()) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
#[inline] | ||
pub fn splat(value: bool) -> Self { | ||
let mut mask = <LaneCount<LANES> as SupportedLaneCount>::BitMask::default(); | ||
if value { | ||
mask.as_mut().fill(u8::MAX) | ||
} else { | ||
mask.as_mut().fill(u8::MIN) | ||
} | ||
if LANES % 8 > 0 { | ||
*mask.as_mut().last_mut().unwrap() &= u8::MAX >> (8 - LANES % 8); | ||
} | ||
Self(mask, PhantomData) | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn test_unchecked(&self, lane: usize) -> bool { | ||
(self.0.as_ref()[lane / 8] >> (lane % 8)) & 0x1 > 0 | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn set_unchecked(&mut self, lane: usize, value: bool) { | ||
unsafe { | ||
self.0.as_mut()[lane / 8] ^= ((value ^ self.test_unchecked(lane)) as u8) << (lane % 8) | ||
} | ||
} | ||
|
||
#[inline] | ||
pub fn to_int(self) -> Simd<T, LANES> { | ||
unsafe { | ||
let mask: <LaneCount<LANES> as SupportedLaneCount>::IntBitMask = | ||
core::mem::transmute_copy(&self); | ||
intrinsics::simd_select_bitmask(mask, Simd::splat(T::TRUE), Simd::splat(T::FALSE)) | ||
} | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn from_int_unchecked(value: Simd<T, LANES>) -> Self { | ||
// TODO remove the transmute when rustc is more flexible | ||
assert_eq!( | ||
core::mem::size_of::<<LaneCount::<LANES> as SupportedLaneCount>::BitMask>(), | ||
core::mem::size_of::<<LaneCount::<LANES> as SupportedLaneCount>::IntBitMask>(), | ||
); | ||
unsafe { | ||
let mask: <LaneCount<LANES> as SupportedLaneCount>::IntBitMask = | ||
intrinsics::simd_bitmask(value); | ||
Self(core::mem::transmute_copy(&mask), PhantomData) | ||
} | ||
} | ||
|
||
#[cfg(feature = "generic_const_exprs")] | ||
#[inline] | ||
pub fn to_bitmask(self) -> [u8; LaneCount::<LANES>::BITMASK_LEN] { | ||
// Safety: these are the same type and we are laundering the generic | ||
unsafe { core::mem::transmute_copy(&self.0) } | ||
} | ||
|
||
#[cfg(feature = "generic_const_exprs")] | ||
#[inline] | ||
pub fn from_bitmask(bitmask: [u8; LaneCount::<LANES>::BITMASK_LEN]) -> Self { | ||
// Safety: these are the same type and we are laundering the generic | ||
Self(unsafe { core::mem::transmute_copy(&bitmask) }, PhantomData) | ||
} | ||
|
||
#[inline] | ||
pub fn convert<U>(self) -> Mask<U, LANES> | ||
where | ||
U: MaskElement, | ||
{ | ||
unsafe { core::mem::transmute_copy(&self) } | ||
} | ||
|
||
#[inline] | ||
pub fn any(self) -> bool { | ||
self != Self::splat(false) | ||
} | ||
|
||
#[inline] | ||
pub fn all(self) -> bool { | ||
self == Self::splat(true) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitAnd for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
<LaneCount<LANES> as SupportedLaneCount>::BitMask: AsRef<[u8]> + AsMut<[u8]>, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitand(mut self, rhs: Self) -> Self { | ||
for (l, r) in self.0.as_mut().iter_mut().zip(rhs.0.as_ref().iter()) { | ||
*l &= r; | ||
} | ||
self | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitOr for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
<LaneCount<LANES> as SupportedLaneCount>::BitMask: AsRef<[u8]> + AsMut<[u8]>, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitor(mut self, rhs: Self) -> Self { | ||
for (l, r) in self.0.as_mut().iter_mut().zip(rhs.0.as_ref().iter()) { | ||
*l |= r; | ||
} | ||
self | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitXor for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitxor(mut self, rhs: Self) -> Self::Output { | ||
for (l, r) in self.0.as_mut().iter_mut().zip(rhs.0.as_ref().iter()) { | ||
*l ^= r; | ||
} | ||
self | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::Not for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn not(mut self) -> Self::Output { | ||
for x in self.0.as_mut() { | ||
*x = !*x; | ||
} | ||
if LANES % 8 > 0 { | ||
*self.0.as_mut().last_mut().unwrap() &= u8::MAX >> (8 - LANES % 8); | ||
} | ||
self | ||
} | ||
} |
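Review note: three details of the bit-packed mask are easy to miss. Lane `i` lives in bit `i % 8` of byte `i / 8`, `set_unchecked` flips a bit by XOR only when it differs from the requested value, and `splat` and `not` clear the unused high bits of the last byte so that `==` (and therefore `any`/`all`) only sees real lanes. A scalar sketch under stated assumptions: the `BitMask` type below is hypothetical and takes the byte count as a second const parameter to stay on stable Rust.

```rust
/// Hypothetical bit-packed mask; `BYTES` must equal `(LANES + 7) / 8`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct BitMask<const LANES: usize, const BYTES: usize>([u8; BYTES]);

impl<const LANES: usize, const BYTES: usize> BitMask<LANES, BYTES> {
    /// Zero the padding bits of the last byte.
    fn clear_padding(&mut self) {
        if LANES % 8 > 0 {
            if let Some(last) = self.0.last_mut() {
                *last &= u8::MAX >> (8 - LANES % 8);
            }
        }
    }

    fn splat(value: bool) -> Self {
        assert_eq!(BYTES, (LANES + 7) / 8, "BYTES must be the rounded-up byte count");
        let mut mask = Self([if value { u8::MAX } else { 0 }; BYTES]);
        mask.clear_padding();
        mask
    }

    fn test(&self, lane: usize) -> bool {
        (self.0[lane / 8] >> (lane % 8)) & 1 == 1
    }

    fn set(&mut self, lane: usize, value: bool) {
        // 1 only when the stored bit differs from `value`; XOR then flips it.
        let flip = (value ^ self.test(lane)) as u8;
        self.0[lane / 8] ^= flip << (lane % 8);
    }

    fn not(mut self) -> Self {
        for byte in self.0.iter_mut() {
            *byte = !*byte;
        }
        // Inverting also set the padding bits, so clear them again.
        self.clear_padding();
        self
    }
}
```

Without the padding step, `!splat(false)` for four lanes would be `0b1111_1111` rather than `0b0000_1111` and would not compare equal to `splat(true)`.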
228 changes: 228 additions & 0 deletions
library/portable-simd/crates/core_simd/src/masks/full_masks.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,228 @@ | ||
//! Masks that take up full SIMD vector registers. | ||
use super::MaskElement; | ||
use crate::simd::intrinsics; | ||
use crate::simd::{LaneCount, Simd, SupportedLaneCount}; | ||
|
||
#[repr(transparent)] | ||
pub struct Mask<T, const LANES: usize>(Simd<T, LANES>) | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount; | ||
|
||
impl<T, const LANES: usize> Copy for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
} | ||
|
||
impl<T, const LANES: usize> Clone for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
#[inline] | ||
fn clone(&self) -> Self { | ||
*self | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> PartialEq for Mask<T, LANES> | ||
where | ||
T: MaskElement + PartialEq, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn eq(&self, other: &Self) -> bool { | ||
self.0.eq(&other.0) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> PartialOrd for Mask<T, LANES> | ||
where | ||
T: MaskElement + PartialOrd, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> { | ||
self.0.partial_cmp(&other.0) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> Eq for Mask<T, LANES> | ||
where | ||
T: MaskElement + Eq, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
} | ||
|
||
impl<T, const LANES: usize> Ord for Mask<T, LANES> | ||
where | ||
T: MaskElement + Ord, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn cmp(&self, other: &Self) -> core::cmp::Ordering { | ||
self.0.cmp(&other.0) | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
pub fn splat(value: bool) -> Self { | ||
Self(Simd::splat(if value { T::TRUE } else { T::FALSE })) | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn test_unchecked(&self, lane: usize) -> bool { | ||
T::eq(self.0[lane], T::TRUE) | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn set_unchecked(&mut self, lane: usize, value: bool) { | ||
self.0[lane] = if value { T::TRUE } else { T::FALSE } | ||
} | ||
|
||
#[inline] | ||
pub fn to_int(self) -> Simd<T, LANES> { | ||
self.0 | ||
} | ||
|
||
#[inline] | ||
pub unsafe fn from_int_unchecked(value: Simd<T, LANES>) -> Self { | ||
Self(value) | ||
} | ||
|
||
#[inline] | ||
pub fn convert<U>(self) -> Mask<U, LANES> | ||
where | ||
U: MaskElement, | ||
{ | ||
unsafe { Mask(intrinsics::simd_cast(self.0)) } | ||
} | ||
|
||
#[cfg(feature = "generic_const_exprs")] | ||
#[inline] | ||
pub fn to_bitmask(self) -> [u8; LaneCount::<LANES>::BITMASK_LEN] { | ||
unsafe { | ||
// TODO remove the transmute when rustc can use arrays of u8 as bitmasks | ||
assert_eq!( | ||
core::mem::size_of::<<LaneCount::<LANES> as SupportedLaneCount>::IntBitMask>(), | ||
LaneCount::<LANES>::BITMASK_LEN, | ||
); | ||
let bitmask: <LaneCount<LANES> as SupportedLaneCount>::IntBitMask = | ||
intrinsics::simd_bitmask(self.0); | ||
let mut bitmask: [u8; LaneCount::<LANES>::BITMASK_LEN] = | ||
core::mem::transmute_copy(&bitmask); | ||
|
||
// There is a bug where LLVM appears to implement this operation with the wrong | ||
// bit order. | ||
// TODO fix this in a better way | ||
if cfg!(target_endian = "big") { | ||
for x in bitmask.as_mut() { | ||
*x = x.reverse_bits(); | ||
} | ||
} | ||
|
||
bitmask | ||
} | ||
} | ||
|
||
#[cfg(feature = "generic_const_exprs")] | ||
#[inline] | ||
pub fn from_bitmask(mut bitmask: [u8; LaneCount::<LANES>::BITMASK_LEN]) -> Self { | ||
unsafe { | ||
// There is a bug where LLVM appears to implement this operation with the wrong | ||
// bit order. | ||
// TODO fix this in a better way | ||
if cfg!(target_endian = "big") { | ||
for x in bitmask.as_mut() { | ||
*x = x.reverse_bits(); | ||
} | ||
} | ||
|
||
// TODO remove the transmute when rustc can use arrays of u8 as bitmasks | ||
assert_eq!( | ||
core::mem::size_of::<<LaneCount::<LANES> as SupportedLaneCount>::IntBitMask>(), | ||
LaneCount::<LANES>::BITMASK_LEN, | ||
); | ||
let bitmask: <LaneCount<LANES> as SupportedLaneCount>::IntBitMask = | ||
core::mem::transmute_copy(&bitmask); | ||
|
||
Self::from_int_unchecked(intrinsics::simd_select_bitmask( | ||
bitmask, | ||
Self::splat(true).to_int(), | ||
Self::splat(false).to_int(), | ||
)) | ||
} | ||
} | ||
|
||
#[inline] | ||
pub fn any(self) -> bool { | ||
unsafe { intrinsics::simd_reduce_any(self.to_int()) } | ||
} | ||
|
||
#[inline] | ||
pub fn all(self) -> bool { | ||
unsafe { intrinsics::simd_reduce_all(self.to_int()) } | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::convert::From<Mask<T, LANES>> for Simd<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
fn from(value: Mask<T, LANES>) -> Self { | ||
value.0 | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitAnd for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitand(self, rhs: Self) -> Self { | ||
unsafe { Self(intrinsics::simd_and(self.0, rhs.0)) } | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitOr for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitor(self, rhs: Self) -> Self { | ||
unsafe { Self(intrinsics::simd_or(self.0, rhs.0)) } | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::BitXor for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn bitxor(self, rhs: Self) -> Self { | ||
unsafe { Self(intrinsics::simd_xor(self.0, rhs.0)) } | ||
} | ||
} | ||
|
||
impl<T, const LANES: usize> core::ops::Not for Mask<T, LANES> | ||
where | ||
T: MaskElement, | ||
LaneCount<LANES>: SupportedLaneCount, | ||
{ | ||
type Output = Self; | ||
#[inline] | ||
fn not(self) -> Self::Output { | ||
Self::splat(true) ^ self | ||
} | ||
} |
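Review note: `to_bitmask` and `from_bitmask` convert between a full-width mask and the packed byte form. A scalar sketch of the intended mapping follows, assuming the usual convention that a true lane is all ones (`-1`) and a false lane is `0` (the `T::TRUE`/`T::FALSE` definitions are not shown in this diff), and using the same bit order as the bit-packed mask: lane `i` in bit `i % 8` of byte `i / 8`. The function names and the fixed 16-lane width are hypothetical.

```rust
/// Pack a 16-lane full-width mask into two bytes, least significant bit first.
fn to_bitmask(lanes: [i8; 16]) -> [u8; 2] {
    let mut out = [0u8; 2];
    for (i, &lane) in lanes.iter().enumerate() {
        if lane == -1 {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Expand two bytes back into a 16-lane full-width mask.
fn from_bitmask(bits: [u8; 2]) -> [i8; 16] {
    let mut lanes = [0i8; 16];
    for (i, lane) in lanes.iter_mut().enumerate() {
        if (bits[i / 8] >> (i % 8)) & 1 == 1 {
            *lane = -1;
        }
    }
    lanes
}
```

The big-endian workaround in the PR applies `reverse_bits` to each byte, which mirrors the bit order within a byte: `0b0000_0101u8.reverse_bits()` is `0b1010_0000`.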
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
use crate::simd::intrinsics::{simd_saturating_add, simd_saturating_sub}; | ||
use crate::simd::{LaneCount, Simd, SupportedLaneCount}; | ||
|
||
macro_rules! impl_uint_arith { | ||
($($ty:ty),+) => { | ||
$( impl<const LANES: usize> Simd<$ty, LANES> where LaneCount<LANES>: SupportedLaneCount { | ||
|
||
/// Lanewise saturating add. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::MAX;")] | ||
/// let x = Simd::from_array([2, 1, 0, MAX]); | ||
/// let max = Simd::splat(MAX); | ||
/// let unsat = x + max; | ||
/// let sat = x.saturating_add(max); | ||
/// assert_eq!(x - 1, unsat); | ||
/// assert_eq!(sat, max); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_add(self, second: Self) -> Self { | ||
unsafe { simd_saturating_add(self, second) } | ||
} | ||
|
||
/// Lanewise saturating subtract. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::MAX;")] | ||
/// let x = Simd::from_array([2, 1, 0, MAX]); | ||
/// let max = Simd::splat(MAX); | ||
/// let unsat = x - max; | ||
/// let sat = x.saturating_sub(max); | ||
/// assert_eq!(unsat, x + 1); | ||
/// assert_eq!(sat, Simd::splat(0)); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_sub(self, second: Self) -> Self { | ||
unsafe { simd_saturating_sub(self, second) } | ||
} | ||
})+ | ||
} | ||
} | ||
|
||
macro_rules! impl_int_arith { | ||
($($ty:ty),+) => { | ||
$( impl<const LANES: usize> Simd<$ty, LANES> where LaneCount<LANES>: SupportedLaneCount { | ||
|
||
/// Lanewise saturating add. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::{MIN, MAX};")] | ||
/// let x = Simd::from_array([MIN, 0, 1, MAX]); | ||
/// let max = Simd::splat(MAX); | ||
/// let unsat = x + max; | ||
/// let sat = x.saturating_add(max); | ||
/// assert_eq!(unsat, Simd::from_array([-1, MAX, MIN, -2])); | ||
/// assert_eq!(sat, Simd::from_array([-1, MAX, MAX, MAX])); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_add(self, second: Self) -> Self { | ||
unsafe { simd_saturating_add(self, second) } | ||
} | ||
|
||
/// Lanewise saturating subtract. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::{MIN, MAX};")] | ||
/// let x = Simd::from_array([MIN, -2, -1, MAX]); | ||
/// let max = Simd::splat(MAX); | ||
/// let unsat = x - max; | ||
/// let sat = x.saturating_sub(max); | ||
/// assert_eq!(unsat, Simd::from_array([1, MAX, MIN, 0])); | ||
/// assert_eq!(sat, Simd::from_array([MIN, MIN, MIN, 0])); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_sub(self, second: Self) -> Self { | ||
unsafe { simd_saturating_sub(self, second) } | ||
} | ||
|
||
/// Lanewise absolute value, implemented in Rust. | ||
/// Every lane becomes its absolute value. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::{MIN, MAX};")] | ||
/// let xs = Simd::from_array([MIN, MIN + 1, -5, 0]); | ||
/// assert_eq!(xs.abs(), Simd::from_array([MIN, MAX, 5, 0])); | ||
/// ``` | ||
#[inline] | ||
pub fn abs(self) -> Self { | ||
const SHR: $ty = <$ty>::BITS as $ty - 1; | ||
let m = self >> SHR; | ||
(self^m) - m | ||
} | ||
|
||
/// Lanewise saturating absolute value, implemented in Rust. | ||
/// As abs(), except the MIN value becomes MAX instead of itself. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::{MIN, MAX};")] | ||
/// let xs = Simd::from_array([MIN, -2, 0, 3]); | ||
/// let unsat = xs.abs(); | ||
/// let sat = xs.saturating_abs(); | ||
/// assert_eq!(unsat, Simd::from_array([MIN, 2, 0, 3])); | ||
/// assert_eq!(sat, Simd::from_array([MAX, 2, 0, 3])); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_abs(self) -> Self { | ||
// arith shift for -1 or 0 mask based on sign bit, giving 2s complement | ||
const SHR: $ty = <$ty>::BITS as $ty - 1; | ||
let m = self >> SHR; | ||
(self^m).saturating_sub(m) | ||
} | ||
|
||
/// Lanewise saturating negation, implemented in Rust. | ||
/// As neg(), except the MIN value becomes MAX instead of itself. | ||
/// | ||
/// # Examples | ||
/// ``` | ||
/// # #![feature(portable_simd)] | ||
/// # #[cfg(feature = "std")] use core_simd::Simd; | ||
/// # #[cfg(not(feature = "std"))] use core::simd::Simd; | ||
#[doc = concat!("# use core::", stringify!($ty), "::{MIN, MAX};")] | ||
/// let x = Simd::from_array([MIN, -2, 3, MAX]); | ||
/// let unsat = -x; | ||
/// let sat = x.saturating_neg(); | ||
/// assert_eq!(unsat, Simd::from_array([MIN, 2, -3, MIN + 1])); | ||
/// assert_eq!(sat, Simd::from_array([MAX, 2, -3, MIN + 1])); | ||
/// ``` | ||
#[inline] | ||
pub fn saturating_neg(self) -> Self { | ||
Self::splat(0).saturating_sub(self) | ||
} | ||
})+ | ||
} | ||
} | ||
|
||
impl_uint_arith! { u8, u16, u32, u64, usize } | ||
impl_int_arith! { i8, i16, i32, i64, isize } |
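Review note: `abs`, `saturating_abs` and `saturating_neg` above are branch-free. An arithmetic right shift by `BITS - 1` yields `-1` for negative lanes and `0` otherwise, and `(x ^ m) - m` then negates exactly the negative lanes. A scalar sketch on `i32` follows; the function names are hypothetical, and `wrapping_sub` stands in for the wrapping lanewise `-` because plain scalar subtraction would panic on overflow in debug builds.

```rust
/// Branch-free absolute value; `i32::MIN` wraps to itself.
fn abs_wrapping(x: i32) -> i32 {
    // -1 (all ones) for negative inputs, 0 otherwise.
    let m = x >> (i32::BITS - 1);
    // Negative x: (x ^ -1) - (-1) == !x + 1 == -x. Non-negative x: unchanged.
    (x ^ m).wrapping_sub(m)
}

/// Same as `abs_wrapping`, except that `i32::MIN` becomes `i32::MAX`.
fn abs_saturating(x: i32) -> i32 {
    let m = x >> (i32::BITS - 1);
    (x ^ m).saturating_sub(m)
}

/// Negation as `0 - x` with saturation, so `i32::MIN` becomes `i32::MAX`.
fn neg_saturating(x: i32) -> i32 {
    0i32.saturating_sub(x)
}
```

The only input on which the wrapping and saturating versions disagree is `MIN`: `MIN ^ -1` is `MAX`, and `MAX - (-1)` either wraps back to `MIN` or saturates at `MAX`.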
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
#[macro_use] | ||
mod reduction; | ||
|
||
#[macro_use] | ||
mod swizzle; | ||
|
||
pub(crate) mod intrinsics; | ||
|
||
#[cfg(feature = "generic_const_exprs")] | ||
mod to_bytes; | ||
|
||
mod comparisons; | ||
mod fmt; | ||
mod iter; | ||
mod lane_count; | ||
mod masks; | ||
mod math; | ||
mod ops; | ||
mod round; | ||
mod select; | ||
mod vector; | ||
mod vendor; | ||
|
||
#[doc = include_str!("core_simd_docs.md")] | ||
pub mod simd { | ||
pub(crate) use crate::core_simd::intrinsics; | ||
|
||
pub use crate::core_simd::lane_count::{LaneCount, SupportedLaneCount}; | ||
pub use crate::core_simd::masks::*; | ||
pub use crate::core_simd::select::Select; | ||
pub use crate::core_simd::swizzle::*; | ||
pub use crate::core_simd::vector::*; | ||
} |