C11 Atomics: add relaxed load and store to avoid superfluous atomic instructions #10487
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The C11
_Atomic
qualifier requires accesses to such variables to be sequentially consistent [1], leading to unnecessary atomic operations in code paths where there cannot be concurrent accesses by other threads (e.g., initialization of data structures such asopal_object_t
and the non-atomic fallbacks in theopal_thread_*
functions). In fact, the whole shabeng about avoiding atomics inopal_thread_*
calls is moot with C11.This PR introduces
OPAL_ATOMIC_RELAXED_STORE
andOPAL_ATOMIC_RELAXED_LOAD
to avoid atomicoperations in paths where atomicity and memory ordering is not required. On my machine, a sequence of
irecv - send - wait
takes 0.22us on current main and 0.16us with this patch. I'm sure there are plenty of other places where we should useOPAL_ATOMIC_RELAXED_STORE
instead of simple assignment but finding them is tedious.Alternatively, we may consider removing
_Atomic
from theopal_atomic_*
types and usevolatile
instead (to make sure polling on variables is not optimized out). That way, we don't have to litter ugly macros likeOPAL_ATOMIC_RELAXED_STORE
throughout the code and keep the C11 atomic backend consistent with the other backends. Plus, we won't run the chance that there are stray atomic operations that were not caught. After all, all code in Open MPI relies on theopal_atomic*
functions to perform atomic operations and direct accesses to atomic variable are not expected to be atomic or ordered. I wasn't around when the decision was made to use_Atomic
so I don't know what the arguments were back then though.For the record: a simple atomic load/store interface for opal atomic was requested in #9722, which is what this PR adds.
[1] https://en.cppreference.com/w/c/language/atomic
Signed-off-by: Joseph Schuchart [email protected]