opal/sync: fix race condition #1816
Conversation
@artpol84 Getting this running through Jenkins. Will try to get it retested multiple times to see if it fails. In theory it shouldn't. |
bot:retest |
:bot:retest: |
:bot:retest: |
Retest after adding the original Jenkins command line with ob1 (allow UCX to load and unload). bot:retest |
I can make this even faster as the lock/unlock is not really needed for the condition signal here. Will leave that for later though. |
Looking good. Sending through again. :bot:retest: |
@hjelmn I guess that race condition is still possible here
Same error here. I also assume there will be problems with several independently running PML progress threads. So it seems that you need a memory fence here:

```c
static inline void wait_sync_update(ompi_wait_sync_t *sync, int updates, int status)
{
    sync->signalling = true;
    mem_fence(); // <---------
    if( OPAL_LIKELY(OPAL_SUCCESS == status) ) {
        if( 0 != (OPAL_THREAD_ADD32(&sync->count, -updates)) ) {
            return;
        }
    } else {
        /* this is an error path so just use the atomic */
        opal_atomic_swap_32 (&sync->count, 0);
        sync->status = OPAL_ERROR;
    }
    WAIT_SYNC_SIGNAL(sync);
}
```

This race will be much rarer than the original one, and harder to track. |
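For context, a minimal sketch of the waiter side that this fence protects — simplified, and the teardown shown is an assumption about the surrounding release macro, not the verbatim upstream code:

```c
/* Without the fence, the waiter can observe count == 0 while signalling
 * still reads false, destroy the condition variable, and leave
 * WAIT_SYNC_SIGNAL(sync) touching freed state. */
static inline int sync_wait_sketch (ompi_wait_sync_t *sync)
{
    while (sync->count > 0) {      /* may already be 0 on entry */
        opal_progress ();
    }
    while (sync->signalling) {     /* wait out an in-flight signaller */
        continue;
    }
    pthread_cond_destroy (&sync->condition);
    return sync->status;
}
```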
Instruction reordering will not happen. pthread calls are by definition effectively a memory barrier. We could add an isync just to be safe if you think it is warranted. We have isync for both ppc and arm, which are the supported platforms that do instruction reordering. |
Oh, wait. before the atomic. Sorry, didn't catch that part when looking at the email version. You are absolutely correct. |
An atomic wmb would be fine there. An isync as I suggested might be too weak. |
Setting the signal and changing the count don't go through pthreads, they are coming
On Saturday, 25 June 2016, Nathan Hjelm wrote:
Best regards, Artem Polyakov |
@artpol84 Yup. Just caught that after hitting comment :) |
Will add the barrier now. |
And correct to the more common signaling spelling :D |
(force-pushed from 7c30bff to d204f67)
@artpol84 Thanks for catching that. Should be fixed now. |
:bot:retest: |
@artpol84 Feel free to open a PR vs my branch and add the Mellanox copyright. I prefer that the commit adding a copyright line be from a member of the copyrighting organization. |
```diff
@@ -75,6 +89,9 @@ static inline int sync_wait_st (ompi_wait_sync_t *sync)
  */
 static inline void wait_sync_update(ompi_wait_sync_t *sync, int updates, int status)
 {
+    sync->signaling = true;
+    /* ensure the signaling value is committed before updating the count */
+    opal_atomic_wmb ();
```
@hjelmn I may be wrong, but it seems that wmb addresses the possibility of CPU-level reordering.
Do we address compiler reordering?
I mean, adding something like `asm volatile("" ::: "memory");` may be needed.
Ok, this is already done in wmb()
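For illustration, the kind of compiler-level barrier being discussed, as a sketch (that opal_atomic_wmb() already includes an equivalent clobber follows from the comment above):

```c
/* Sketch only: an empty asm with a "memory" clobber stops the compiler
 * from moving loads/stores across it, but emits no instruction. A full
 * write barrier additionally orders the stores at the CPU level (e.g.
 * lwsync on PowerPC); on x86 stores are already ordered, so a wmb can
 * reduce to exactly this clobber. */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")
```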
(force-pushed from d204f67 to fb455f0)
Looks like waitany and waitsome are broken. I filed the PR to @hjelmn's branch: Can be easily reproduced with the following modifications of the overlap test: |
@jladd-mlnx @miked-mellanox I think we (Mellanox) will need to include these tests in our jenkins/MTT suite. |
@jsquyres, sure, we can do that as well. |
@hjelmn, what do you mean? I don't see the waitsome/waitany fix in this PR. |
I think this patch fixes the original issue, and as such it is ready to go. The performance issues discovered meanwhile should be fixed by a subsequent ticket, once we have the correct fix. |
There was a) a bug and b) a performance issue. On Tuesday, 28 June 2016, bosilca wrote:
Best regards, Artem Polyakov |
hjelmn@42e6251 |
Without this bugfix the following tests (already mentioned above) hang 100% of the time. I think the PR must fix the issue without introducing new hangs. |
42e6251 introduces its own issues (I commented on it on the other issue about the possible race condition). A possible fix is to generate a sync_update (which will take care of the signaling field) in the wait* operation if there is nothing to wait for. |
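A rough sketch of that suggestion — the function shape and names here are illustrative assumptions, not the actual patch:

```c
/* Sketch of @bosilca's suggestion: when the wait* operation finds nothing
 * left to wait for (all requests already complete), synthesize a
 * wait_sync_update() for the remaining count instead of returning
 * directly, so the signaling field is handled through the normal
 * completion path. */
static inline int wait_fast_path_sketch (ompi_wait_sync_t *sync)
{
    if (0 != sync->count) {
        wait_sync_update (sync, sync->count, OPAL_SUCCESS);
    }
    return sync->status;
}
```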
@bosilca I think that your comment #1820 (comment) is reasonable for MPI_Waitany only. So in my understanding it is related to this PR, because I want commit 42e6251 to go along with this PR. |
@bosilca btw I solved the problem with Waitany with sync_sets/unsets instead of sync_update. I can do the sync_update, but with sync_sets/unsets we avoid one atomic. What do you think? |
ohh, and according to the backtrace I had when I was debugging: sync_wait will then try to signal the condition, which has an internal lock (see the backtrace of the second thread). |
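The "internal lock" in question: a sketch of the signal macro, modeled on the pattern in opal/threads/wait_sync.h rather than the verbatim upstream code:

```c
/* Sketch of WAIT_SYNC_SIGNAL: the condition variable is signalled under
 * the sync's own mutex, which is the internal lock visible in the second
 * thread's backtrace. Clearing signaling afterwards releases the waiter's
 * teardown spin. */
#define WAIT_SYNC_SIGNAL_SKETCH(sync)                \
    do {                                             \
        pthread_mutex_lock (&(sync)->lock);          \
        pthread_cond_signal (&(sync)->condition);    \
        pthread_mutex_unlock (&(sync)->lock);        \
        (sync)->signaling = false;                   \
    } while (0)
```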
I won't be able to join the call today. |
I think I've got all your points now. Will update the PRs later today. Thank you. On Tuesday, 28 June 2016, bosilca wrote:
Best regards, Artem Polyakov |
Again updated the PR hjelmn#9 with slightly improved MPI_Waitsome performance according to @bosilca's suggestion |
I don't think it's related. The problem is in MPI_Init. Unlikely request
On Tuesday, 28 June 2016, Joshua Ladd wrote:
Best regards, Artem Polyakov |
Well, it shouldn't be crashing. It should be exiting. This is the problem:
rdmacm is working for me. Not sure why it is failing to load on Mellanox Jenkins. |
Unless that is the other port again. Maybe the openib btl is not correctly disqualifying that port? |
BTW, if you need udcm to work with a router it wouldn't take much work. No need to use rdmacm for that :). |
I will check tomorrow. |
(request handling related)
Request handling: fix MPI_Waitany and MPI_Waitsome
@artpol84 Will merge after jenkins finishes. |
Sure. |
This commit fixes a race condition discovered by @artpol84. The race
happens when a signalling thread decrements the sync count to 0 then
goes to sleep. If the waiting thread runs and detects the count == 0
before going to sleep on the condition variable it will destroy the
condition variable while the signalling thread is potentially still
processing the completion. The fix is to add a non-atomic member to
the sync structure that indicates another process is handling
completion. Since the member will only be set to false by the
initiating thread and the completing thread the variable does not need
to be protected. When destroying a condition variable the waiting
thread needs to wait until the signalling thread is finished.
Thanks to @artpol84 for tracking this down.
Fixes #1813
Signed-off-by: Nathan Hjelm <[email protected]>
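To make the teardown ordering concrete, a hedged sketch of the waiter-side release the commit message describes (simplified; field names follow the diff above, not the verbatim upstream macro):

```c
/* Sketch: the waiting thread must not destroy the condition variable
 * while a signalling thread is still inside wait_sync_update(). The
 * signaller sets sync->signaling = true (followed by a wmb) before
 * decrementing the count and clears it only after pthread_cond_signal()
 * returns, so this spin makes the teardown safe. */
static inline void wait_sync_release_sketch (ompi_wait_sync_t *sync)
{
    while (sync->signaling) {
        continue;   /* wait for the signaller to finish with the condvar */
    }
    pthread_cond_destroy (&sync->condition);
}
```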