-
Notifications
You must be signed in to change notification settings - Fork 900
Clean up atomics interface #9901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@gpaulsen, @jjhursey, and @awlauria there are two changes in this series that need some special IBM review. First, we now provide LL/SC atomics for the C11 and Builtin cases. This is similar to what we did on ARM, and probably good for core atomics performance. You might want to see if this brings perf of the C11 atomics in close enough that you could use them. But even if not, it's worth a look that I didn't screw it up. Second, in the configure code I tried to replicate the behavior of enabling/not enabling the C11 atomics on POWER. I believe i have it right, but some checks that it works how you want would be good. Since that's POWER specific code, I can't test it locally. |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/d28c06d2627f1f7d1635d5be8f0f12ae |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/959a2d500e213614d17c1f43e7767dc2 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/a43f16cc2441ac0a04e9d3046bc3fe64 |
a41bbfe
to
cb4e9bb
Compare
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/05f55bfebba8f3877ca0eb6938f4d803 |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/093314a4dadca48168d15aba5208acaa |
cb4e9bb
to
ba3aa9d
Compare
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/df93ecaf70cf3de21751ba674f8c4ed9 |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/14620a5e2d2c18ec55637fc94cada061 |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/5ea839d06680566d298d067ebe9403e1 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/c62a4cbc3529bdc77db51031854688d5 |
ba3aa9d
to
dfca8e3
Compare
I pulled this down and verified that the powerPc atomics are still used with xl and gcc on our machines. |
a3fec07
to
539934e
Compare
result is commonly used as a scoped variable throughout OMPI's configure. This doesn't currently cause failures, because the assembly checks unconditionally unset result (after using it for its own things). But to clean up the assembly macros, we must clean up this usage. Signed-off-by: Brian Barrett <[email protected]>
As of ebce88b, we no longer have any non-inline assembly code in Open MPI. This patch removes the configure infrastructure to support non-inline assembly. Since the only assembly style supported is GCC inline assembly, remove a bunch of constant checks in the atomics and timing code around GCC inline assembly. Those headers won't be included unless GCC inline assembly is supported. Finally, be consistent in what constant we are checking for GCC inline assembly support and add a missing check in the ARM64 timer code. Signed-off-by: Brian Barrett <[email protected]>
In the early days of OMPI, we used MB/WMB/RMB macros instead of the opal_atomic_* functions that most of the code uses today. Remove the last few places that the MB defines were used and then remove the defines themselves. Signed-off-by: Brian Barrett <[email protected]>
The atomic memory barrier interface was de facto mandatory, as none of the code making opal_atomic_mb() calls bother with the ifdefs. Since all the supported platforms have atomic memory barrier implementations, just make them required. At the same time, clean up the code to reflect that there are no longer non-inline versions of the memory barrier assembly. Signed-off-by: Brian Barrett <[email protected]>
The compare and swap and swap atomic interfaces are de facto mandatory, as no code actually checks for their existance before using them (with the notable exception of the 128 bit cswap), so make those interfaces mandatory and remove all the complex code to detect implementations of them. At the same time, reorder the implementation files to all have consistent ordering of the cswap and swap implementations, to ease maintenance and move to a implementation includes the compatibility layer mode instead of the previous overly complex all in one software implementation model. Finally, remove the untyped macro wrappers as 1) no one uses them and 2) they're a pain to maintain. Signed-off-by: Brian Barrett <[email protected]>
With the change to make compare and swap mandatory, we can always implement spinlocks, so make them mandatory and simplify the interface. Signed-off-by: Brian Barrett <[email protected]>
The atomic math interface has become required based on usage, so embrace the requirement and remove all the gorp to make the interface optional. Split out the compatibility implementation into headers that can be included as required by implementations, saving us from a mess of #defines in atomic_impl.h. Reorganize the implementation headers so that the code is in the same order across implementations, for readability. Signed-off-by: Brian Barrett <[email protected]>
Unlike other interfaces, the LL/SC interface must be optional, as many platforms do not provide LL/SC support. Document the interface a bit better, and move the PowerPC LL/SC interface into its own header file so that it can be used with C11/GCC builtin atomics, similar to the ARM platform. Signed-off-by: Brian Barrett <[email protected]>
With the previous refactoring and breaking out the implementation code into seperate headers, it is no longer required that the atomic_stdc.h implementation be included "special" in the atomics header. Clean up the ordering of includes so that every implementation follows the same include ordering. The stdc implementations of the atomics is type independent and we don't enforce inline wrappers because that results in a massive number of warnings from opal_lifo due to its interesting use of volatile. This patch takes the do-no-harm approach, but we need to clean up opal_lifo in a future patch. Signed-off-by: Brian Barrett <[email protected]>
Finish removal of the old assembly logic that supported non-inline atomics as well as inline styles other than gcc. Rework the primary configure macro to make the flow more obvious (particularly around which atomics style should be used), and push all the decisions about using C11 atomics with certain compilers into the configure logic. Signed-off-by: Brian Barrett <[email protected]>
With the introduction of the portable platform header, all the OPAL_ASSEMBLY_ARCH code was redundant and kind of a pain to deal with. Remove all uses of OPAL_ASSEMBLY_ARCH and replace with the equivalent portable platform conditional. Signed-off-by: Brian Barrett <[email protected]>
539934e
to
d674d8b
Compare
bot:aws:retest (looks like an instance died mid-test) |
* @param delta Value to add (converted to <TYPE>). | ||
*/ | ||
#define opal_atomic_add(ADDR, VALUE) \ | ||
opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should VALUE
be of int32_t
here? I'd expect it to be either 32 or 64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always been an int32_t for the opal_atomic_add() variant's value argument. THe common use case is adding a relatively small value, and you don't want a function taking a double register argument because it's a 64 bit integer on a 32 bit machine. The right answer is to get rid of the inline assembly variants where typing is explicit, but the Power compilers need to get better first :).
opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR))) | ||
|
||
static inline void opal_atomic_add_xx(opal_atomic_intptr_t *addr, int32_t value, size_t length) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised that length is of size_t
since it should only either be 4
or 8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, the return type of sizeof() is a size_t. But you are correct that it will be either 4 or 8 on any reasonable platform. Since a size_t is (on any rational platform) be storable in a single register, it doesn't really matter.
|
||
# else | ||
OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(add, size_t, size_t, +) | ||
OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(sub, size_t, size_t, -) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want/need ATOMIC_STDC FETCH_OPs for size_t
&
, |
, and ^
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, we've only needed addition and subtraction (and we only need subtraction because you can't/shouldn't represent negative numbers in a size_t). If we are going to add them, we would need to add them for all the cases, which would be a bunch of work.
@@ -1,39 +0,0 @@ | |||
#!/bin/sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏽
So this started with trying to add a new math atomic (logical and), and turned into a bunch of code and I still haven't gotten to adding a logical and atomic.
The atomics interface was always a mess of complexity because of the need to support non-inline assembly. As that was partially removed and other interfaces added, it had become really complex to add new functionality. We also weren't consistent with interfaces from platform to platform and even answering the question (for the configure summary) of what version of atomics we were going to use was near impossible. Finally, everyone had started assuming optional parts of the interface were actually required (although they were there for all implementations today, so no harm).
This patch series tries to greatly simplify all that, and get us back to atomic.h being the source of compiler-enforced truth for the interface specification.
One remaining open is the compare and swap functions for the C11 atomics. There's a note in atomic.h about it, but it all comes down to handling the volatile usage in the counted pointer implementation for opal_lifo and opal_fifo. I'm not convinced that the volatile usage is correct in the first place, but it would definitely generate a long list of warnings if we tightly enforced the current interface specification. Other implementations are not a problem because the
opal_atomic_<type>_t
types for other implementations are already volatile.