Define portability #359

Merged 2 commits on Sep 10, 2023
35 changes: 35 additions & 0 deletions crates/core_simd/src/core_simd_docs.md
@@ -2,3 +2,38 @@ Portable SIMD module.

This module offers a portable abstraction for SIMD operations
that is not bound to any particular hardware architecture.

# What is "portable"?

This module provides a SIMD implementation that is fast and predictable on any target.

### Portable SIMD works on every target

Unlike target-specific SIMD in `std::arch`, portable SIMD compiles for every target.
In this regard, it is just like "regular" Rust.

### Portable SIMD is consistent between targets

A program using portable SIMD can expect identical behavior on any target.
In most regards, [`Simd<T, N>`] can be thought of as a parallelized `[T; N]` and operates like a sequence of `T`.
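
As a rough illustration of the "parallelized `[T; N]`" model, the sketch below applies a scalar operation to each lane of two arrays. It is a plain-Rust model that runs on stable, not the `std::simd` API, and `lanewise` is a hypothetical helper introduced only for this example.

```rust
/// Hypothetical scalar model of a lanewise vector operation (not part of
/// `std::simd`): lane `i` of the result depends only on lane `i` of each input.
fn lanewise<T: Copy, const N: usize>(
    a: [T; N],
    b: [T; N],
    op: impl Fn(T, T) -> T,
) -> [T; N] {
    // Build the output array one lane at a time.
    std::array::from_fn(|i| op(a[i], b[i]))
}
```

For instance, `lanewise([1.0f32, 2.0], [3.0, 4.0], |x, y| x + y)` yields `[4.0, 6.0]`, the same result as adding the elements one at a time.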

There is one notable exception: a handful of older architectures (e.g. `armv7` and `powerpc`) flush [subnormal](`f32::is_subnormal`) `f32` values to zero.
On these architectures, subnormal `f32` input values are replaced with zeros, and any operation producing subnormal `f32` values produces zeros instead.
This doesn't affect most architectures or programs.
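
The flushing behavior can be modeled in scalar code. The following is a minimal sketch, not how any target implements it: `flush_subnormal` and `mul_flushing` are hypothetical names, and keeping the sign of the flushed value is an assumption rather than something the text above specifies.

```rust
/// Hypothetical scalar model of flush-to-zero for a single `f32` lane.
/// Assumption: the resulting zero keeps the sign of the flushed value.
fn flush_subnormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// Models a multiplication on a flushing target: subnormal inputs are
/// replaced with zeros first, then a subnormal result is replaced as well.
fn mul_flushing(a: f32, b: f32) -> f32 {
    flush_subnormal(flush_subnormal(a) * flush_subnormal(b))
}
```

Under this model, `mul_flushing(f32::MIN_POSITIVE, 0.5)` returns zero because the exact product is subnormal, while values in the normal range pass through unchanged.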

### Operations use the best instructions available

Operations provided by this module compile to the best available SIMD instructions.

Portable SIMD is not a low-level vendor library, and operations in portable SIMD _do not_ necessarily map to a single instruction.
Instead, they map to a reasonable implementation of the operation for the target.

Consistency between targets is not compromised to use faster or fewer instructions.
In some cases, `std::arch` will provide a faster function that has slightly different behavior than the `std::simd` equivalent.
For example, [`_mm_min_ps`](`core::arch::x86_64::_mm_min_ps`)[^1] can be slightly faster than [`SimdFloat::simd_min`], but does not conform to the IEEE standard that [`f32::min`] also follows.
When necessary, [`Simd<T, N>`] can be converted to the types provided by `std::arch` to make use of target-specific functions.
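
The difference shows up with NaN inputs. The sketch below is a scalar model of one lane of the compare-and-select form given in the footnote; `sse_style_min` is a hypothetical name used only for illustration and is not a `std::arch` or `std::simd` function.

```rust
/// Hypothetical scalar model of one lane of `x.simd_lt(y).select(x, y)`:
/// returns `x` only when `x < y` holds, and `y` in every other case.
fn sse_style_min(x: f32, y: f32) -> f32 {
    // Any comparison involving NaN is false, so a NaN in either position
    // causes `y` to be returned.
    if x < y {
        x
    } else {
        y
    }
}
```

With `x = 1.0` and `y = f32::NAN` the comparison is false, so this model returns NaN, whereas `1.0_f32.min(f32::NAN)` returns `1.0`. For ordinary inputs such as `2.0` and `3.0`, both approaches return the smaller value.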

Many targets simply don't have SIMD, or don't support SIMD for a particular element type.
In those cases, regular scalar operations are generated instead.

[^1]: `_mm_min_ps(x, y)` is equivalent to `x.simd_lt(y).select(x, y)`.
3 changes: 2 additions & 1 deletion crates/core_simd/src/mod.rs
@@ -21,8 +21,9 @@ mod swizzle_dyn;
 mod vector;
 mod vendor;
 
-#[doc = include_str!("core_simd_docs.md")]
 pub mod simd {
+    #![doc = include_str!("core_simd_docs.md")]
+
     pub mod prelude;
 
     pub(crate) use crate::core_simd::intrinsics;