Skip to content

Clean up atomics interface #9901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 26, 2022
Merged

Conversation

bwbarrett
Copy link
Member

So this started with trying to add a new math atomic (logical and), and turned into a bunch of code and I still haven't gotten to adding a logical and atomic.

The atomics interface was always a mess of complexity because of the need to support non-inline assembly. As that was partially removed and other interfaces added, it had become really complex to add new functionality. We also weren't consistent with interfaces from platform to platform and even answering the question (for the configure summary) of what version of atomics we were going to use was near impossible. Finally, everyone had started assuming optional parts of the interface were actually required (although they were there for all implementations today, so no harm).

This patch series tries to greatly simplify all that, and get us back to atomic.h being the source of compiler-enforced truth for the interface specification.

One remaining open is the compare and swap functions for the C11 atomics. There's a note in atomic.h about it, but it all comes down to handling the volatile usage in the counted pointer implementation for opal_lifo and opal_fifo. I'm not convinced that the volatile usage is correct in the first place, but it would definitely generate a long list of warnings if we tightly enforced the current interface specification. Other implementations are not a problem because the opal_atomic_<type>_t types for other implementations are already volatile.

@bwbarrett
Copy link
Member Author

@gpaulsen, @jjhursey, and @awlauria there are two changes in this series that need some special IBM review.

First, we now provide LL/SC atomics for the C11 and Builtin cases. This is similar to what we did on ARM, and probably good for core atomics performance. You might want to see if this brings perf of the C11 atomics in close enough that you could use them. But even if not, it's worth a look that I didn't screw it up.

Second, in the configure code I tried to replicate the behavior of enabling/not enabling the C11 atomics on POWER. I believe i have it right, but some checks that it works how you want would be good. Since that's POWER specific code, I can't test it locally.

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/d28c06d2627f1f7d1635d5be8f0f12ae

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/959a2d500e213614d17c1f43e7767dc2

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/a43f16cc2441ac0a04e9d3046bc3fe64

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/05f55bfebba8f3877ca0eb6938f4d803

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/093314a4dadca48168d15aba5208acaa

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/df93ecaf70cf3de21751ba674f8c4ed9

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/14620a5e2d2c18ec55637fc94cada061

@ibm-ompi
Copy link

The IBM CI (XL) build failed! Please review the log, linked below.

Gist: https://gist.github.com/5ea839d06680566d298d067ebe9403e1

@ibm-ompi
Copy link

The IBM CI (PGI) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c62a4cbc3529bdc77db51031854688d5

@awlauria
Copy link
Contributor

I pulled this down and verified that the powerPc atomics are still used with xl and gcc on our machines.

@bwbarrett bwbarrett force-pushed the bugfix/asm-cleanup branch 2 times, most recently from a3fec07 to 539934e Compare January 23, 2022 00:28
result is commonly used as a scoped variable throughout OMPI's
configure.  This doesn't currently cause failures, because
the assembly checks unconditionally unset result (after using
it for its own things).  But to clean up the assembly macros,
we must clean up this usage.

Signed-off-by: Brian Barrett <[email protected]>
As of ebce88b, we no longer have any non-inline assembly code in
Open MPI.  This patch removes the configure infrastructure to
support non-inline assembly.

Since the only assembly style supported is GCC inline assembly,
remove a bunch of constant checks in the atomics and timing
code around GCC inline assembly.  Those headers won't be
included unless GCC inline assembly is supported.

Finally, be consistent in what constant we are checking
for GCC inline assembly support and add a missing check
in the ARM64 timer code.

Signed-off-by: Brian Barrett <[email protected]>
In the early days of OMPI, we used MB/WMB/RMB macros instead of the
opal_atomic_* functions that most of the code uses today.  Remove
the last few places that the MB defines were used and then remove
the defines themselves.

Signed-off-by: Brian Barrett <[email protected]>
The atomic memory barrier interface was de facto mandatory,
as none of the code making opal_atomic_mb() calls bother
with the ifdefs.  Since all the supported platforms have
atomic memory barrier implementations, just make them
required.

At the same time, clean up the code to reflect that there
are no longer non-inline versions of the memory barrier
assembly.

Signed-off-by: Brian Barrett <[email protected]>
The compare and swap and swap atomic interfaces are de facto
mandatory, as no code actually checks for their existance before
using them (with the notable exception of the 128 bit cswap), so
make those interfaces mandatory and remove all the complex code
to detect implementations of them.

At the same time, reorder the implementation files to all have
consistent ordering of the cswap and swap implementations, to
ease maintenance and move to a implementation includes the
compatibility layer mode instead of the previous overly complex
all in one software implementation model.

Finally, remove the untyped macro wrappers as 1) no one uses
them and 2) they're a pain to maintain.

Signed-off-by: Brian Barrett <[email protected]>
With the change to make compare and swap mandatory, we can always
implement spinlocks, so make them mandatory and simplify the interface.

Signed-off-by: Brian Barrett <[email protected]>
The atomic math interface has become required based on usage,
so embrace the requirement and remove all the gorp to make the
interface optional.

Split out the compatibility implementation into headers that can
be included as required by implementations, saving us from a
mess of #defines in atomic_impl.h.

Reorganize the implementation headers so that the code is in the same
order across implementations, for readability.

Signed-off-by: Brian Barrett <[email protected]>
Unlike other interfaces, the LL/SC interface must be optional, as
many platforms do not provide LL/SC support.  Document the interface
a bit better, and move the PowerPC LL/SC interface into its own
header file so that it can be used with C11/GCC builtin atomics,
similar to the ARM platform.

Signed-off-by: Brian Barrett <[email protected]>
With the previous refactoring and breaking out the implementation
code into seperate headers, it is no longer required that the
atomic_stdc.h implementation be included "special" in the
atomics header.  Clean up the ordering of includes so that
every implementation follows the same include ordering.

The stdc implementations of the atomics is type independent
and we don't enforce inline wrappers because that results in
a massive number of warnings from opal_lifo due to its interesting
use of volatile.  This patch takes the do-no-harm approach, but
we need to clean up opal_lifo in a future patch.

Signed-off-by: Brian Barrett <[email protected]>
Finish removal of the old assembly logic that supported non-inline
atomics as well as inline styles other than gcc.  Rework the
primary configure macro to make the flow more obvious (particularly
around which atomics style should be used), and push all the
decisions about using C11 atomics with certain compilers into
the configure logic.

Signed-off-by: Brian Barrett <[email protected]>
With the introduction of the portable platform header, all the
OPAL_ASSEMBLY_ARCH code was redundant and kind of a pain to
deal with.  Remove all uses of OPAL_ASSEMBLY_ARCH and replace
with the equivalent portable platform conditional.

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett
Copy link
Member Author

bot:aws:retest (looks like an instance died mid-test)

* @param delta Value to add (converted to <TYPE>).
*/
#define opal_atomic_add(ADDR, VALUE) \
opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should VALUE be of int32_t here? I'd expect it to be either 32 or 64?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always been an int32_t for the opal_atomic_add() variant's value argument. THe common use case is adding a relatively small value, and you don't want a function taking a double register argument because it's a 64 bit integer on a 32 bit machine. The right answer is to get rid of the inline assembly variants where typing is explicit, but the Power compilers need to get better first :).

opal_atomic_add_xx((opal_atomic_intptr_t *) (ADDR), (int32_t)(VALUE), sizeof(*(ADDR)))

static inline void opal_atomic_add_xx(opal_atomic_intptr_t *addr, int32_t value, size_t length)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that length is of size_t since it should only either be 4 or 8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the return type of sizeof() is a size_t. But you are correct that it will be either 4 or 8 on any reasonable platform. Since a size_t is (on any rational platform) be storable in a single register, it doesn't really matter.


# else
OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(add, size_t, size_t, +)
OPAL_ATOMIC_STDC_DEFINE_FETCH_OP(sub, size_t, size_t, -)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want/need ATOMIC_STDC FETCH_OPs for size_t &, |, and ^?

Copy link
Member Author

@bwbarrett bwbarrett Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, we've only needed addition and subtraction (and we only need subtraction because you can't/shouldn't represent negative numbers in a size_t). If we are going to add them, we would need to add them for all the cases, which would be a bunch of work.

@@ -1,39 +0,0 @@
#!/bin/sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

@bwbarrett bwbarrett merged commit 717ce89 into open-mpi:master Jan 26, 2022
@bwbarrett bwbarrett deleted the bugfix/asm-cleanup branch January 26, 2022 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants