opal/atomic: always inline load-link store-conditional #3988
@PHHargrove This should fix the issue. |
@hjelmn retesting all my ppc64 systems today |
Do we have an idea why a function call breaks it? |
@shamisp Live-lock. When LL/SC are function calls, the chance that we touch a cache line in such a way that the LL reservation is canceled goes up. This is my mistake. I designed the lifo/fifo to avoid live-lock but always intended the LL/SC atomics to be inlined. I forgot to add the keyword. |
In this case, is an ISB (instruction barrier) missing inside the function call?
…On Tue, Aug 1, 2017 at 09:20 Nathan Hjelm ***@***.***> wrote:
@shamisp <https://github.com/shamisp> Live-lock. When LL/SC are function
calls the chance that we touch a cache line in such a way that the LL
reservation is canceled goes up.
|
I don't think an instruction barrier will help. The problem is that when the LL/SC calls are full function calls, more memory may be touched (extra ld/st instructions). Each ld/st may touch a memory location that could be mapped to the reservation cache line. If this happens, the SC fails and the lifo pop will never progress. |
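A minimal sketch of the pop loop being described, to make the LL-to-SC window concrete. Everything here is an assumption for illustration: `item_t`, `lifo_t`, `LL_PTR`, `SC_PTR`, `lifo_push`, and `lifo_pop` are hypothetical names, not the Open MPI definitions, and the LL/SC pair is emulated with portable C11 atomics (a load plus a compare-exchange). Real LL/SC hardware fails the SC whenever the reservation is lost, even if the value is unchanged, which this emulation cannot reproduce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the Open MPI definitions. */
typedef struct item {
    struct item *next;
} item_t;

typedef struct {
    _Atomic(item_t *) head;
} lifo_t;

/* ASSUMPTION: portable stand-in for load-linked. A real LL also places a
 * hardware reservation on the cache line that holds *addr. */
#define LL_PTR(addr) atomic_load_explicit((addr), memory_order_acquire)

/* ASSUMPTION: portable stand-in for store-conditional. It succeeds only if
 * *addr still equals `expected`; a real SC fails if the reservation was lost. */
#define SC_PTR(addr, expected, desired)                                      \
    atomic_compare_exchange_strong_explicit((addr), &(expected), (desired), \
                                            memory_order_acq_rel,           \
                                            memory_order_relaxed)

static void lifo_push(lifo_t *lifo, item_t *item)
{
    assert(item != NULL);
    item_t *old = atomic_load_explicit(&lifo->head, memory_order_relaxed);
    do {
        item->next = old; /* link the new item in front of the current head */
    } while (!atomic_compare_exchange_weak_explicit(&lifo->head, &old, item,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

static item_t *lifo_pop(lifo_t *lifo)
{
    for (;;) {
        item_t *item = LL_PTR(&lifo->head); /* LL: the window opens here */
        if (item == NULL) {
            return NULL; /* empty list */
        }
        /* Only one load sits between LL and SC. Extra loads and stores in
         * this window (for example from a non-inlined call) are what could
         * cancel the reservation on real hardware. */
        item_t *next = item->next;
        if (SC_PTR(&lifo->head, item, next)) { /* SC: the window closes */
            item->next = NULL;
            return item;
        }
        /* SC failed: retry. If the SC fails every time, this is the live-lock. */
    }
}
```

The point of the sketch is the shape of the loop: the fewer memory accesses between `LL_PTR` and `SC_PTR`, the lower the chance that the store-conditional fails for reasons unrelated to contention.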
Ok, I will see if I can figure out what else might be happening. |
@hjelmn I have a debugger attached if you want to have a look during a break today |
Nathan, I would suggest putting a memory barrier before invoking the AMO inside the function to see if it helps.
|
@shamisp Hmm, a memory barrier might help as well. I see that we don't have one already. Let's see if moving the ghost read out of the path helps, then I will try that. |
@PHHargrove One more commit. I moved some of the code around. If this doesn't help, the output of cc -S opal_fifo.c would be very helpful. |
I would think that inline vs. no inline just reduces the chances. To nail this down we have to use some sort of barrier.
|
@hjelmn opal_fifo.o (and thus .s) contains only |
@PHHargrove I mean |
@hjelmn Duh! Also FYI:
|
In case it matters:
|
Well, that's interesting. It's using the cmpset-128 implementation. Did not see that coming... So there are two problems. One with LL/SC (which I think I fixed) and one with CSWAP128 with GCC builtins. I think I have an idea how to fix the latter since I doubt CSWAP128 is lock-free on PPC64. It isn't on x86-64 with gcc 7.x either. |
Looks like I already put the lock-free check :). Hmmm, so something else is going on. |
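For readers unfamiliar with the lock-free check mentioned above, here is a minimal sketch of how such a check can be written with the GCC/Clang `__atomic_always_lock_free` builtin. The helper name `cswap128_always_lock_free` is hypothetical and this is not necessarily how Open MPI performs its check; it only shows one way to ask the compiler whether a 16-byte atomic is guaranteed to be lock-free on the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not an Open MPI function. Reports whether the
 * compiler guarantees lock-free atomics for a 16-byte object. */
static bool cswap128_always_lock_free(void)
{
#if defined(__SIZEOF_INT128__)
    static_assert(sizeof(unsigned __int128) == 16, "expected a 16-byte type");
    /* Second argument 0 means "assume typical alignment for this size". */
    return __atomic_always_lock_free(sizeof(unsigned __int128), 0);
#else
    /* ASSUMPTION: without a 128-bit integer type, treat it as not lock-free. */
    return false;
#endif
}
```

The result depends on the compiler, its version, and the target flags, so a caller would typically use it to choose between a 128-bit compare-and-swap path and a fallback.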
put the barrier ...
|
@shamisp This is a different code path. There is already a barrier in that one. The code path in question was only intended to be used on x86-64, so it's possible the barriers are wrong or in the wrong places. |
@shamisp This remaining issue may not affect aarch64. Do you know if |
@hjelmn
|
@PHHargrove Thanks. That is good to know. I am a little surprised PPC64 has it. |
Upgrading from gcc-4.8.5 to gcc-7.1.0 on the ppc64el host does not change the result. |
@kawashima-fj Can you test this and see if it resolves the aarch64 issue? |
@hjelmn I confirmed the patch on AArch64 but it does not resolve the problem. |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/24416d8f956d9a9d234937bc8248c48a |
This commit addresses a live-lock that can occur on LL/SC architectures when turning off optimizations (-O0). In this case opal_atomic_ll and opal_atomic_sc are not inlined. This adds additional loads and stores between the load-linked and store-conditional instructions in the LL/SC lifo and fifo implementations.

The problem is addressed in two ways:

 - Re-work the LL/SC fifo code to reduce the chance that a load or store can cancel the load-linked reservation before the store-conditional can be executed. This rework involves moving the SC closer to the LL and using the register keyword to avoid additional load instructions.
 - Convert the LL/SC atomics from inline functions to macros. The functions were changed to macros because the same behavior is observed with -O0 and always_inline.

Signed-off-by: Nathan Hjelm <[email protected]>
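A rough sketch of what the macro form described in the commit message can look like. The names `LL32`, `SC32`, and `fetch_add_llsc` are hypothetical, and the bodies are a portable emulation using C11 atomics rather than real load-linked and store-conditional instructions; the actual Open MPI macros wrap architecture-specific inline assembly. The only thing this is meant to show is that a macro expands at the call site regardless of optimization level, so no function call separates the two operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: stand-in for a 32-bit load-linked, written as a macro so it
 * always expands in place, even at -O0. */
#define LL32(addr, ret)                                                \
    do {                                                               \
        (ret) = atomic_load_explicit((addr), memory_order_acquire);    \
    } while (0)

/* ASSUMPTION: stand-in for a 32-bit store-conditional. Sets `ok` to nonzero
 * if *addr still held `expected` and was replaced by `newval`. */
#define SC32(addr, expected, newval, ok)                               \
    do {                                                               \
        int32_t sc_expected_ = (expected);                             \
        (ok) = atomic_compare_exchange_strong_explicit(                \
            (addr), &sc_expected_, (newval),                           \
            memory_order_acq_rel, memory_order_relaxed);               \
    } while (0)

/* Example user of the macros: an atomic fetch-and-add built from LL/SC. */
static int32_t fetch_add_llsc(_Atomic int32_t *addr, int32_t delta)
{
    int32_t old;
    int ok;

    assert(addr != NULL);
    do {
        LL32(addr, old);                  /* read the current value */
        SC32(addr, old, old + delta, ok); /* try to publish the update */
    } while (!ok);                        /* retry if the store did not land */
    return old;
}
```

For instance, with a counter holding 5, `fetch_add_llsc(&counter, 3)` returns the previous value and leaves the counter at 8.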
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/242cb10b642d9718e6d066f7282e4af3 |
The PGI error looks like a compiler bug. @jjhursey Do you know who at PGI can take a look? |
@hjelmn I used the support form here: You have to create an account to fill out the form. If you have a small reproducer, it might help to get a faster response. CI is running the latest community edition (17.4). |
@jjhursey Ok, I will put together a simple reproducer. Still need to see if I can work around the bug. |
:bot:retest |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/e5591d0a2c36a608e2b52b5e61cd5dae |
@hjelmn This PR is now a year old. Is it going to go anywhere? |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/7b660c9cdc77e221271a9e3aaf6d457d |
@hjelmn it looks like we decided to remove atomics in master. Should this be closed? |
Not all atomics. Just the __sync builtins. Still working on that. This PR is only relevant to the internal atomics. |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/c69f2faf7c102d573864b18afa7d6a6b |
Re-ping. 5.0 branching is targeted for April 30th. If you want this in for 5.0, please target to get it in master by then. Thanks! |
Can one of the admins verify this patch? |
@hjelmn What's the status of this PR? Did this get resolved with your other load-link store-conditional PR? |
@hjelmn Will you have time in next few weeks to rebase this? |
ok. will take a look this coming week. |
FYI @bosilca Do you know if this is still relevant? |
A quick look at this indicates that some of the changes proposed here already made it into OMPI, but not all. What we are missing without this patch is inlined atomics for some old compilers, something I'm not sure we care that much about. Thus, if the current version works on ppc, then we might want to drop this patch. |
@bwbarrett @hjelmn @bosilca Can we close this now that #9901 (and others) have gone into |
#9901 would not have changed this issue at all. It refactored some code, but f8dbf62 (which was after this PR, but still going on 4 years ago) was the patch that actually moved the LL/SC calls from inline functions to macros (which they kind of have to be, because LL/SC). The other parts of this patch have not been added to master, but I'm not convinced they're actually required. I think the important bits are already included, although I'm not 100% sure that we couldn't do more improvements on the opal_lifo LL/SC implementation. @gpaulsen if IBM has a performance expert who might be good at lifo/fifo stack implementations on POWER, it's probably worth a review. I think the right answer is to close this PR; we haven't been getting reports of failures on POWER (right, @gpaulsen?) or ARM64. We should open an issue to review the LL/SC LIFO/FIFO implementation, but this PR doesn't really help us with that, given how out of date it is. |
Enabling debugging can cause the load-link store-conditional
atomic operations to hit a live-lock condition. To prevent the
live-lock, always inline these atomics.
Fixes #3697
Signed-off-by: Nathan Hjelm [email protected]