Skip to content

opal/atomic: always inline load-link store-conditional #3988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Aug 1, 2017

Enabling debugging can cause the load-link store-conditional
atomic operations to hit a live-lock condition. To prevent the
live-lock always inline these atomics.

Fixes #3697

Signed-off-by: Nathan Hjelm [email protected]

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@PHHargrove This should fix the issue.

@PHHargrove
Copy link
Member

@hjelmn retesting all my ppc64 systems today

@shamisp
Copy link
Contributor

shamisp commented Aug 1, 2017

Do we have an idea why function call breaks it ?

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@shamisp Live-lock. When LL/SC are function calls the chance that we touch a cache line in such a way that the LL reservation is canceled goes up.

This is my mistake. I designed the lifo/fifo to avoid live-lock but always intended the LL/SC atomics to be always inlined. I forgot to add the keyword.

@shamisp
Copy link
Contributor

shamisp commented Aug 1, 2017 via email

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

I don't think an instruction barrier will help. The problem is that more memory may be touched by the LL/SC calls being full function calls (extra ld/st instructions). Each ld/st may touch a memory location that could be mapped to the reservation cache line. If this happens the SC fails and the lifo pop will never progress.

@PHHargrove
Copy link
Member

@hjelmn No joy with 4fb5fb3 ☹️
Still seeing opal_fifo hanging on a ppc64le.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

Ok, I will see if I can figure out what else might be happening.

@PHHargrove
Copy link
Member

@hjelmn I have a debugger attached if you want to have a look during a break today

@shamisp
Copy link
Contributor

shamisp commented Aug 1, 2017 via email

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@shamisp Hmm, a memory barrier might help as well. I see that we don't have one already. Lets see if moving the ghost read out of the path helps then I will try that.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@PHHargrove One more commit. I moved some of the code around. If this doesn't help the output of cc -S opal_fifo.c would be very helpful.

@shamisp
Copy link
Contributor

shamisp commented Aug 1, 2017 via email

@PHHargrove
Copy link
Member

@hjelmn opal_fifo.o (and thus .s) contains only opal_fifo_construct and opal_obj_run_constructors. Is that really what you are looking for?

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@PHHargrove I mean ompi/tests/class/opal_fifo.c

@PHHargrove
Copy link
Member

@hjelmn Duh!
Here you go: opal_fifo.s.txt
It doesn't look to be using the code you've been editing at all!

Also FYI:

/home/phargrov/OMPI/openmpi-3.0.0rc2-linux-ppc64el-gcc/openmpi-3.0.0rc2/test/class/opal_fifo.c:109:26: warning: assignment discards `volatile' qualifier from pointer target type [enabled by default]
     for (count = 0, item = fifo->opal_fifo_head.data.item ; item != &fifo->opal_fifo_ghost ;
                          ^

@PHHargrove
Copy link
Member

In case it matters:

*** Assembler
checking dependency style of gcc -std=gnu99... gcc3
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking for fgrep... /usr/bin/grep -F
checking for __atomic builtin atomics... yes
checking for processor support of __atomic builtin atomic compare-and-swap on 128-bit values... yes
checking if __int128 atomic compare-and-swap is always lock-free... yes
checking if .proc/endp is needed... no
checking directive for setting text section... .text
checking directive for exporting symbols... .globl
checking for objdump... objdump
checking if .note.GNU-stack is needed... yes
checking suffix for labels... :
checking prefix for global symbol labels...
checking prefix for lsym labels... .L
checking prefix for function in .type... @
checking if .size is needed... yes
checking if .align directive takes logarithmic value... yes
checking if PowerPC registers have r prefix... no
checking for assembly architecture... POWERPC64
checking for builtin atomics... BUILTIN_GCC
checking for atomic assembly filename... none

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

Well, thats interesting. Its using the cmpset-128 implementation. Did not see that coming... So there are two problems. One with LL/SC (which I think I fixed) and one with CSWAP128 with GCC builtins. I think I have an idea how to fix the later since i doubt CSWAP128 is lock-free on PPC64. It isn't on x86-64 with gcc 7.x either.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

Looks like I already put the lock-free check :). Hmmm, so something else is going on.

@shamisp
Copy link
Contributor

shamisp commented Aug 1, 2017 via email

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@shamisp This is a different code path. There is already a barrier in that one.

The code path in question was only intended to be used on x86-64 so its possible the barriers are wrong or in the wrong places.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@shamisp This remaining issue may not affect aarch64. Do you know if __atomic_always_lock_free (16, NULL) returns true or false on aarch64?

@PHHargrove
Copy link
Member

@hjelmn
At least my aarch64 system (gcc-7.1.0 on APM X-Gene Mustang board) lacks 128-bit CAS:

*** Assembler
checking dependency style of gcc... gcc3
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking for fgrep... /bin/grep -F
checking for __atomic builtin atomics... yes
checking for processor support of __atomic builtin atomic compare-and-swap on 128-bit values... no
checking for __atomic builtin atomic compare-and-swap on 128-bit values with -mcx16 flag... no
checking if .proc/endp is needed... no
checking directive for setting text section... .text
checking directive for exporting symbols... .globl
checking for objdump... objdump
checking if .note.GNU-stack is needed... yes
checking suffix for labels... :
checking prefix for global symbol labels...
checking prefix for lsym labels... .L
checking prefix for function in .type... @
checking if .size is needed... yes
checking if .align directive takes logarithmic value... yes
checking for assembly architecture... ARM64
checking for builtin atomics... BUILTIN_GCC
checking for atomic assembly filename... none

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@PHHargrove Thanks. That is good to know. I am a little surprised PPC64 has it.

@PHHargrove
Copy link
Member

Upgrading from gcc-4.8.5 to gcc-7.1.0 on the ppc64el host does not change the result.

@hjelmn
Copy link
Member Author

hjelmn commented Aug 1, 2017

@kawashima-fj Can you test this and see if it resolves the aarch64 issue?

@kawashima-fj
Copy link
Member

@hjelmn I confirmed the patch on AArch64 but it does not resolve the problem. make check (opal_lifo, opal_list, ompi_rb_tree) still hangs with -O0 + --disable-builtin-atomics. I'll investigate later today (in Japanese timezone).

@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/24416d8f956d9a9d234937bc8248c48a

This commit addresses a live-lock that can occur on LL/SC
architectures when turning off optimizations (-O0). In this case
opal_atomic_ll and opal_atomic_sc are not inlined. This adds
additional loads and stores between the load-linked and
store-conditional instructions in the LL/SC lifo and fifo
implemenations. The problem is addressed in two ways:

 - Re-work the LL/SC fifo code to reduce the chance that
   a load or store can cancel the load-linked reservation before
   the store-conditional can be executed. This rework involves
   moving the SC closer to the LL and using the register keyword
   to avoid additional load instructions.

 - Convert the LL/SC atomics from inline function to macros. The
   functions were changed to macros because the same behavior is
   observed with -O0 and always_inline.

Signed-off-by: Nathan Hjelm <[email protected]>
@ibm-ompi
Copy link

ibm-ompi commented Sep 1, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/242cb10b642d9718e6d066f7282e4af3

@hjelmn
Copy link
Member Author

hjelmn commented Sep 1, 2017

The PGI error looks like a compiler bug. @jjhursey Do you know who at PGI can take a look?

@jjhursey
Copy link
Member

jjhursey commented Sep 1, 2017

@hjelmn I used the support form here:
https://www.pgroup.com/support/support_request.php

You have to create an account to fill out the form. If you have a small reproducer, it might help to get a faster response.

CI is running the latest community edition (17.4).

@hjelmn
Copy link
Member Author

hjelmn commented Sep 14, 2017

@jjhursey Ok, I will put together a simple reproducer. Still need to see if I can work around the bug.

@hjelmn
Copy link
Member Author

hjelmn commented Dec 19, 2017

:bot:retest

@ibm-ompi
Copy link

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/e5591d0a2c36a608e2b52b5e61cd5dae

@jsquyres
Copy link
Member

jsquyres commented Dec 1, 2018

@hjelmn This PR is now a year old. Is it going to go anywhere?

@ibm-ompi
Copy link

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/7b660c9cdc77e221271a9e3aaf6d457d

@awlauria
Copy link
Contributor

@hjelmn it looks like we decided to remove atomics in master. Should this be closed?

@hjelmn
Copy link
Member Author

hjelmn commented Feb 20, 2020

Not all atomics. Just the __sync builtins. Still working on that. This PR is only relevant to the internal atomics.

@ibm-ompi
Copy link

The IBM CI (GNU/Scale) build failed! Please review the log, linked below.

Gist: https://gist.github.com/c69f2faf7c102d573864b18afa7d6a6b

@awlauria awlauria added this to the v5.0.0 milestone Mar 12, 2020
@awlauria
Copy link
Contributor

Re-ping. 5.0 branching is targeted for April 30th. If you want this in for 5.0, please target to get it in master by then. Thanks!

@lanl-ompi
Copy link
Contributor

Can one of the admins verify this patch?

@gpaulsen
Copy link
Member

@hjelmn What's the status of this PR? Did this get resolved with your other load-link store-conditional PR?

@awlauria
Copy link
Contributor

I think we're gonna want this in @gpaulsen for v5.0.x. @hjelmn I can try rebasing it this week.

@gpaulsen
Copy link
Member

@hjelmn Will you have time in next few weeks to rebase this?
We're shooting for v5.0 rc1 on Sept 23 (< 1 month!)

@hjelmn
Copy link
Member Author

hjelmn commented Aug 28, 2021

ok. will take a look this coming week.

@jsquyres
Copy link
Member

FYI @bosilca Do you know if this is still relevant?

@bosilca
Copy link
Member

bosilca commented Aug 31, 2021

A quick look at this indicates that some of the changes proposed here already made it in OMPI, but not all. What we miss without this patch is the lack of inlined atomics for some old compilers, something I'm nore sure we care that much about. Thus, ff the current version works on ppc, then we might want to drop this patch.

@gpaulsen
Copy link
Member

gpaulsen commented Feb 3, 2022

@bwbarrett @hjelmn @bosilca Can we close this now that #9901 (and others) have gone into master?

@bwbarrett
Copy link
Member

#9901 would not have changed this issue at all. It refactored some code, but f8dbf62 (which was after this PR, but still going on 4 years ago) was the patch that actually moved the LL/SC calls from inline functions to macros (which they kind of have to be, because LL/SC). The other parts of this patch have not been added to master, but I'm not convinced they're actually required. I think the important bits are already included, although I'm not 100% sure that we couldn't do more improvements on the opal_lifo LL/SC implementation. @gpaulsen if IBM has a performance expert who might be good at lifo/fifo stack implementations on POWER, it's probably worth a review.

I think the right answer is to close this PR; we haven't been getting reports of failures on POWER (right, @gpaulsen) or ARM64. We should open an issue to review the LL/SC LIFO/FIFO implementation, but this PR doesn't really help us with that, given how out of date it is.

@bwbarrett bwbarrett closed this Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hang in mca_mpool_hugepage_module_init() on ARM64