Clean up atomics interface #9901

bwbarrett · 2022-01-20T20:41:23Z

So this started with trying to add a new math atomic (logical and), and turned into a bunch of code and I still haven't gotten to adding a logical and atomic.

The atomics interface was always a mess of complexity because of the need to support non-inline assembly. As that was partially removed and other interfaces added, it had become really complex to add new functionality. We also weren't consistent with interfaces from platform to platform and even answering the question (for the configure summary) of what version of atomics we were going to use was near impossible. Finally, everyone had started assuming optional parts of the interface were actually required (although they were there for all implementations today, so no harm).

This patch series tries to greatly simplify all that, and get us back to atomic.h being the source of compiler-enforced truth for the interface specification.

One remaining open is the compare and swap functions for the C11 atomics. There's a note in atomic.h about it, but it all comes down to handling the volatile usage in the counted pointer implementation for opal_lifo and opal_fifo. I'm not convinced that the volatile usage is correct in the first place, but it would definitely generate a long list of warnings if we tightly enforced the current interface specification. Other implementations are not a problem because the opal_atomic_<type>_t types for other implementations are already volatile.

bwbarrett · 2022-01-20T20:44:12Z

@gpaulsen, @jjhursey, and @awlauria there are two changes in this series that need some special IBM review.

First, we now provide LL/SC atomics for the C11 and Builtin cases. This is similar to what we did on ARM, and probably good for core atomics performance. You might want to see if this brings perf of the C11 atomics in close enough that you could use them. But even if not, it's worth a look that I didn't screw it up.

Second, in the configure code I tried to replicate the behavior of enabling/not enabling the C11 atomics on POWER. I believe i have it right, but some checks that it works how you want would be good. Since that's POWER specific code, I can't test it locally.

ibm-ompi · 2022-01-20T21:02:16Z

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/d28c06d2627f1f7d1635d5be8f0f12ae

ibm-ompi · 2022-01-20T21:04:26Z

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/959a2d500e213614d17c1f43e7767dc2

ibm-ompi · 2022-01-20T21:04:38Z

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/a43f16cc2441ac0a04e9d3046bc3fe64

ibm-ompi · 2022-01-20T23:28:12Z

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/05f55bfebba8f3877ca0eb6938f4d803

ibm-ompi · 2022-01-20T23:32:30Z

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/093314a4dadca48168d15aba5208acaa

ibm-ompi · 2022-01-20T23:43:08Z

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/df93ecaf70cf3de21751ba674f8c4ed9

ibm-ompi · 2022-01-21T01:05:28Z

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/14620a5e2d2c18ec55637fc94cada061

ibm-ompi · 2022-01-21T01:13:12Z

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/5ea839d06680566d298d067ebe9403e1

ibm-ompi · 2022-01-21T01:45:25Z

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c62a4cbc3529bdc77db51031854688d5

awlauria · 2022-01-21T21:03:10Z

I pulled this down and verified that the powerPc atomics are still used with xl and gcc on our machines.

result is commonly used as a scoped variable throughout OMPI's configure. This doesn't currently cause failures, because the assembly checks unconditionally unset result (after using it for its own things). But to clean up the assembly macros, we must clean up this usage. Signed-off-by: Brian Barrett <[email protected]>

As of ebce88b, we no longer have any non-inline assembly code in Open MPI. This patch removes the configure infrastructure to support non-inline assembly. Since the only assembly style supported is GCC inline assembly, remove a bunch of constant checks in the atomics and timing code around GCC inline assembly. Those headers won't be included unless GCC inline assembly is supported. Finally, be consistent in what constant we are checking for GCC inline assembly support and add a missing check in the ARM64 timer code. Signed-off-by: Brian Barrett <[email protected]>

In the early days of OMPI, we used MB/WMB/RMB macros instead of the opal_atomic_* functions that most of the code uses today. Remove the last few places that the MB defines were used and then remove the defines themselves. Signed-off-by: Brian Barrett <[email protected]>

The atomic memory barrier interface was de facto mandatory, as none of the code making opal_atomic_mb() calls bother with the ifdefs. Since all the supported platforms have atomic memory barrier implementations, just make them required. At the same time, clean up the code to reflect that there are no longer non-inline versions of the memory barrier assembly. Signed-off-by: Brian Barrett <[email protected]>

The compare and swap and swap atomic interfaces are de facto mandatory, as no code actually checks for their existance before using them (with the notable exception of the 128 bit cswap), so make those interfaces mandatory and remove all the complex code to detect implementations of them. At the same time, reorder the implementation files to all have consistent ordering of the cswap and swap implementations, to ease maintenance and move to a implementation includes the compatibility layer mode instead of the previous overly complex all in one software implementation model. Finally, remove the untyped macro wrappers as 1) no one uses them and 2) they're a pain to maintain. Signed-off-by: Brian Barrett <[email protected]>

With the change to make compare and swap mandatory, we can always implement spinlocks, so make them mandatory and simplify the interface. Signed-off-by: Brian Barrett <[email protected]>

The atomic math interface has become required based on usage, so embrace the requirement and remove all the gorp to make the interface optional. Split out the compatibility implementation into headers that can be included as required by implementations, saving us from a mess of #defines in atomic_impl.h. Reorganize the implementation headers so that the code is in the same order across implementations, for readability. Signed-off-by: Brian Barrett <[email protected]>

Unlike other interfaces, the LL/SC interface must be optional, as many platforms do not provide LL/SC support. Document the interface a bit better, and move the PowerPC LL/SC interface into its own header file so that it can be used with C11/GCC builtin atomics, similar to the ARM platform. Signed-off-by: Brian Barrett <[email protected]>

With the previous refactoring and breaking out the implementation code into seperate headers, it is no longer required that the atomic_stdc.h implementation be included "special" in the atomics header. Clean up the ordering of includes so that every implementation follows the same include ordering. The stdc implementations of the atomics is type independent and we don't enforce inline wrappers because that results in a massive number of warnings from opal_lifo due to its interesting use of volatile. This patch takes the do-no-harm approach, but we need to clean up opal_lifo in a future patch. Signed-off-by: Brian Barrett <[email protected]>

Finish removal of the old assembly logic that supported non-inline atomics as well as inline styles other than gcc. Rework the primary configure macro to make the flow more obvious (particularly around which atomics style should be used), and push all the decisions about using C11 atomics with certain compilers into the configure logic. Signed-off-by: Brian Barrett <[email protected]>

With the introduction of the portable platform header, all the OPAL_ASSEMBLY_ARCH code was redundant and kind of a pain to deal with. Remove all uses of OPAL_ASSEMBLY_ARCH and replace with the equivalent portable platform conditional. Signed-off-by: Brian Barrett <[email protected]>

bwbarrett · 2022-01-24T20:33:21Z

bot:aws:retest (looks like an instance died mid-test)

gpaulsen · 2022-01-25T20:33:38Z

opal/include/opal/sys/atomic_impl_size_t_math.h

+ * @param delta         Value to add (converted to <TYPE>).
+ */
+#define opal_atomic_add(ADDR, VALUE) \
+    opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR)))


Should VALUE be of int32_t here? I'd expect it to be either 32 or 64?

It's always been an int32_t for the opal_atomic_add() variant's value argument. THe common use case is adding a relatively small value, and you don't want a function taking a double register argument because it's a 64 bit integer on a 32 bit machine. The right answer is to get rid of the inline assembly variants where typing is explicit, but the Power compilers need to get better first :).

gpaulsen · 2022-01-25T20:34:23Z

opal/include/opal/sys/atomic_impl_size_t_math.h

+    opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR)))
+
+static inline void opal_atomic_add_xx(opal_atomic_intptr_t *addr, int32_t value, size_t length)
+{


I'm surprised that length is of size_t since it should only either be 4 or 8?

Technically, the return type of sizeof() is a size_t. But you are correct that it will be either 4 or 8 on any reasonable platform. Since a size_t is (on any rational platform) be storable in a single register, it doesn't really matter.

gpaulsen · 2022-01-25T20:45:55Z

opal/include/opal/sys/atomic_stdc.h


-#    else
+OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(add, size_t, size_t, +)
+OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(sub, size_t, size_t, -)


Do we want/need ATOMIC_STDC FETCH_OPs for size_t &, |, and ^?

Historically, we've only needed addition and subtraction (and we only need subtraction because you can't/shouldn't represent negative numbers in a size_t). If we are going to add them, we would need to add them for all the cases, which would be a bunch of work.

gpaulsen · 2022-01-25T23:55:10Z

opal/include/opal/sys/powerpc/update.sh

@@ -1,39 +0,0 @@
-#!/bin/sh


bwbarrett requested review from hjelmn, ggouaillardet, gpaulsen and awlauria January 20, 2022 20:41

bwbarrett force-pushed the bugfix/asm-cleanup branch from a41bbfe to cb4e9bb Compare January 20, 2022 22:51

bwbarrett force-pushed the bugfix/asm-cleanup branch from cb4e9bb to ba3aa9d Compare January 20, 2022 23:32

bwbarrett force-pushed the bugfix/asm-cleanup branch from ba3aa9d to dfca8e3 Compare January 21, 2022 03:52

bwbarrett force-pushed the bugfix/asm-cleanup branch 2 times, most recently from a3fec07 to 539934e Compare January 23, 2022 00:28

bwbarrett added 10 commits January 24, 2022 16:37

Make spinlocks mandatory

e9d7bb9

With the change to make compare and swap mandatory, we can always implement spinlocks, so make them mandatory and simplify the interface. Signed-off-by: Brian Barrett <[email protected]>

bwbarrett force-pushed the bugfix/asm-cleanup branch from 539934e to d674d8b Compare January 24, 2022 16:38

gpaulsen approved these changes Jan 26, 2022

View reviewed changes

bwbarrett merged commit 717ce89 into open-mpi:master Jan 26, 2022

bwbarrett deleted the bugfix/asm-cleanup branch January 26, 2022 03:30

awlauria mentioned this pull request Jan 27, 2022

v5.0.x: Atomics updates #9936

Merged

gpaulsen mentioned this pull request Feb 3, 2022

opal/atomic: always inline load-link store-conditional #3988

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up atomics interface #9901

Clean up atomics interface #9901

bwbarrett commented Jan 20, 2022

bwbarrett commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 21, 2022

ibm-ompi commented Jan 21, 2022

ibm-ompi commented Jan 21, 2022

awlauria commented Jan 21, 2022

bwbarrett commented Jan 24, 2022

gpaulsen Jan 25, 2022

bwbarrett Jan 26, 2022

gpaulsen Jan 25, 2022

bwbarrett Jan 26, 2022

gpaulsen Jan 25, 2022

bwbarrett Jan 26, 2022 •

edited

Loading

gpaulsen Jan 25, 2022

Clean up atomics interface #9901

Clean up atomics interface #9901

Conversation

bwbarrett commented Jan 20, 2022

bwbarrett commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 20, 2022

ibm-ompi commented Jan 21, 2022

ibm-ompi commented Jan 21, 2022

ibm-ompi commented Jan 21, 2022

awlauria commented Jan 21, 2022

bwbarrett commented Jan 24, 2022

gpaulsen Jan 25, 2022

Choose a reason for hiding this comment

bwbarrett Jan 26, 2022

Choose a reason for hiding this comment

gpaulsen Jan 25, 2022

Choose a reason for hiding this comment

bwbarrett Jan 26, 2022

Choose a reason for hiding this comment

gpaulsen Jan 25, 2022

Choose a reason for hiding this comment

bwbarrett Jan 26, 2022 • edited Loading

Choose a reason for hiding this comment

gpaulsen Jan 25, 2022

Choose a reason for hiding this comment

bwbarrett Jan 26, 2022 •

edited

Loading