Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add more SIMD platform-intrinsics #117953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more SIMD platform-intrinsics #117953
Changes from all commits
97ae509
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be
bx.align_of(values_elem).bytes()
notvalues_ty
. Technically, the LLVM intrinsic accepts any power of two as alignment, so we could relax this down to 1.https://rust-lang.github.io/unsafe-code-guidelines/layout/packed-simd-vectors.html#packed-simd-vector-types
Am I understanding this correctly or does it not make a difference @workingjubilee?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having checked the Arm ARM, RISCV V spec, and Intel SDM, we should continue to require element alignment at least. We can expect any masked-load-supporting implementation to support masked loads and stores on element boundaries. However, they may reject accesses on byte boundaries (except where the element is u8, of course), or they may just implement the load/store inefficiently if unaligned, which is honestly about as bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are definitely reasons to do unaligned loads (deserialization, packet framing, etc) that benefit from vectorization, so I think we should at least make writing
{}_unaligned
versions possible. Though I can't remember, what's the requirement for scatter/gather?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, element alignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can adjust them to take a const parameter for alignment, if we want, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can only guess the intent, but I am fairly sure about what
Simd::load
currently does. :)If it is intended to allow entirely unaligned pointers, it should be implemented as
ptr.cast::<Self>().read_unaligned()
. Though I guess that would run into issues with non-power-of-2 vectors as the comment indicates, but those aren't properly supported currently anyway.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened rust-lang/portable-simd#382 for the load question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will open a PR to fix the alignment passed down to LLVM IR. The optimizer converts the masked load to an umasked, aligned load because the alignment is too high
https://rust.godbolt.org/z/KEeGbevbb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
I certainly wasn't intending on being freewheeling here, or I would not have asked a zillion small questions myself. :^) The problem is slightly of the "remembering all the questions I should be asking" nature, especially when there has already been multiple rounds of review.
I guess the main list of questions to keep in mind is:
[true, false, ..]
? (i.e. "what is the normal behavior?")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we do not ever treat uninit memory specially for memory operations in Rust. The UB on uninit derives entirely from the fact that uninit memory violates the validity invariant of most types. So the question that should be asked is: is this an untyped operation, working on raw abstract machine bytes, or is it a typed operation, and at which type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the meaning of "values" here? It seems strange that a load is told the values to load...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the fallback/passthrough values used for lanes that are masked off, so have the corresponding bits in
mask
all zerosThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the name
values
doesn't really convey that. Thanks!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the alignment requirements on the pointer for the store to be valid? (And same question for the load.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pointer should be aligned to the individual element, not the whole vector. So when loading
f32x2
, size_of is 8, but alignment of f32 is 4, therefore the pointer should be aligned to 4. This is both for load & storeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the mask is all-0, i.e., no load/store actually happens. Is the pointer still required to be aligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with alignment checking, masked elements never generate an access for the load, thus no hardware implementation requires it. If you have a strong reasoning for arguing otherwise, we could make it UB regardless?
Otherwise: no, it's a non-event, like
unsafe { invalid_ptr.add(0) }
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw x86-64 for these doesn't ever generate an alignment check exception even if you turn alignment checking on, while the others are explicit that the logic is that predication prevents the access, therefore the alignment check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe everything that's been raised so far has been addressed. Thanks @RalfJung and apologies if my answers earlier were unhelpful. I wasn't aware that the alignment requirement is only relaxed through the from_array/to_array path, relying on the optimizer to elide those later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@farnoy can you link to the PR(s) that address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#118864
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly I've been asking for documentation. #118853 addresses that, please double-check there that the docs for these new intrinsics match what you implemented now. :)