ompi request handling race condition fix (MT-case) #1815
@hjelmn @bosilca @jladd-mlnx @jsquyres @hppritcha This code still needs to be verified, but it would be good if @hjelmn and @bosilca could review it in advance.
I think this needs to be rethought. We do not want a lock in the critical path. Will take a look in a bit.
@hjelmn, sounds reasonable.

static inline void wait_sync_update(ompi_wait_sync_t *sync, int updates, int status)
{
    // Fast path:
    if( (sync->count - updates > 0) && OPAL_LIKELY(OPAL_SUCCESS == status) ){
        if( OPAL_ATOMIC_CMPSET_X(&sync->count, sync->count, sync->count - updates) ){
            return;
        }
    }

    // Slow path:
    WAIT_SYNC_LOCK(sync);
    if( OPAL_LIKELY(OPAL_SUCCESS == status) ) {
        if( 0 != (OPAL_THREAD_ADD32(&sync->count, -updates)) ) {
            goto unlock;
        }
    } else {
        /* this is an error path so just use the atomic */
        opal_atomic_swap_32 (&sync->count, 0);
        sync->status = OPAL_ERROR;
    }
    WAIT_SYNC_SIGNAL_UNLOCK(sync);
    return;

unlock:
    WAIT_SYNC_UNLOCK(sync);
}
if this is purely a yalla requirement, then perhaps it would be better to put the lock in the yalla path instead of in all paths?
Not sure. I don't see how this race is possible, but I am looking into it.
This race is possible for ALL PMLs. And I was able to reproduce it only on the jenkins server and not on other hosts, so it is quite rare. Which means you definitely don't want to debug this on the customer side.
But for sure we cannot have a lock in this function if there is any way to avoid it.
@hjelmn, the lock was there originally and I just proposed a comparable solution.
Ha! Jenkins likes my patch! |
I will update the PR with the fast path during the weekend.
@artpol84 The lock is fundamentally different with your change. If an MPI_Wait* call is waiting on multiple requests then the lock is now obtained on each and every request completion instead of just the last one. This will reduce multi-threaded message rates.
No, with this solution you only acquire the lock if sync->count is about to reach 0, which means that only the last thread is affected by the lock.
We go to this fast path whenever the count is not about to reach 0:

// Fast path:
if( (sync->count - updates > 0) && OPAL_LIKELY(OPAL_SUCCESS == status) ){
    if( OPAL_ATOMIC_CMPSET_X(&sync->count, sync->count, sync->count - updates) ){
        return;
    }
}
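For reference, here is a minimal generic sketch of the pattern being debated, written with C11 atomics and pthreads instead of the actual OPAL macros; the names (toy_sync_t, toy_sync_update) and the exact signaling scheme are illustrative assumptions, not Open MPI API:

#include <stdatomic.h>
#include <pthread.h>

/* Toy analogue of a wait-sync object: a counter of outstanding
 * completions plus a mutex/condvar used only to wake the waiter. */
typedef struct {
    atomic_int      count;
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} toy_sync_t;

static void toy_sync_update(toy_sync_t *sync, int updates)
{
    int cur = atomic_load(&sync->count);

    /* Fast path: while this update cannot drive the count to zero,
     * try a plain compare-and-swap and never touch the lock. */
    while (cur - updates > 0) {
        if (atomic_compare_exchange_weak(&sync->count, &cur, cur - updates)) {
            return;   /* some later completion will do the wake-up */
        }
        /* cur was reloaded by the failed CAS; re-check and retry */
    }

    /* Slow path: only the completion that reaches zero takes the lock
     * and signals the waiter. */
    pthread_mutex_lock(&sync->lock);
    if (0 == atomic_fetch_sub(&sync->count, updates) - updates) {
        pthread_cond_signal(&sync->cond);
    }
    pthread_mutex_unlock(&sync->lock);
}

In this toy version only the last completion pays for the mutex, which is the point being made above; the real code additionally has to handle error statuses and the count semantics of the various wait flavors.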
Having a cmpset will not help things if there is contention for the update. It will fail and be another locking atomic slowing the critical path. Not sure we can do better, but I want to give it some thought.
I vote we merge.
We can do iterations there; that should fix most cases.
@jladd-mlnx Please back off.
@hppritcha What's your vote?
@jsquyres what's your vote? This fixes the issue. We spent a lot of time debugging this at the community's request.
@hjelmn, I think this will solve your concerns:

static inline void wait_sync_update(ompi_wait_sync_t *sync, int updates, int status)
{
    // Fast path:
    while( (sync->count - updates > 0) && OPAL_LIKELY(OPAL_SUCCESS == status) ){
        if( OPAL_ATOMIC_CMPSET_X(&sync->count, sync->count, sync->count - updates) ){
            return;
        }
    }

    // Slow path:
    WAIT_SYNC_LOCK(sync);
    if( OPAL_LIKELY(OPAL_SUCCESS == status) ) {
        if( 0 != (OPAL_THREAD_ADD32(&sync->count, -updates)) ) {
            goto unlock;
        }
    } else {
        /* this is an error path so just use the atomic */
        opal_atomic_swap_32 (&sync->count, 0);
        sync->status = OPAL_ERROR;
    }
    WAIT_SYNC_SIGNAL_UNLOCK(sync);
    return;

unlock:
    WAIT_SYNC_UNLOCK(sync);
}
I spent 2 days debugging your code, @hjelmn. I want to own this fix |
@artpol84 Not my code. |
@hjelmn Then why the resistance to merge? |
@nysal - can you please review and comment also? |
@gpaulsen Considering the updates I mentioned in the comments.
@rhc54 sure!
@jsquyres, agree.
Once again, thanks, Mellanox Jenkins.
Let me update and test the code. And let's discuss afterwards. |
This is sufficient to defeat the race:
Fixed a couple typos. |
@hjelmn have you verified that? |
@artpol84 There is no way it wouldn't. The race is the sync getting destroyed before the signaling is finished. By marking it as being in the process of being signaled, it should be clear without using a single extra atomic. Extra atomics and locks were my resistance to the fix as presented. Otherwise it was fantastic work.
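A hedged sketch of the "mark it as being signaled" idea, again in generic C11/pthreads rather than the actual wait_sync code; the field name signaling and the two functions below are hypothetical illustrations of the mechanism, not the Open MPI implementation:

#include <stdatomic.h>
#include <pthread.h>

typedef struct {
    atomic_int      count;
    atomic_int      signaling;   /* non-zero while a wake-up is in flight */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
} toy_sync_t;

/* Called by the completion that wakes the waiter. */
static void toy_sync_signal(toy_sync_t *sync)
{
    atomic_store(&sync->signaling, 1);   /* raise the flag before signaling the waiter */
    pthread_mutex_lock(&sync->lock);
    pthread_cond_signal(&sync->cond);
    pthread_mutex_unlock(&sync->lock);
    atomic_store(&sync->signaling, 0);   /* done touching the sync object */
}

/* Called by the waiter before it recycles/destroys the sync. */
static void toy_sync_release(toy_sync_t *sync)
{
    /* Spin until any in-flight signal has fully left the object, so the
     * mutex/condvar are never destroyed under the signaler's feet. */
    while (atomic_load(&sync->signaling)) {
        /* busy-wait; a real implementation would pause/yield here */
    }
    pthread_cond_destroy(&sync->cond);
    pthread_mutex_destroy(&sync->lock);
}

The flag is only touched by the final, signaling completion, so the per-request fast path pays no extra atomics, which matches the objection to locks in the critical path.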
Once again, thanks to the overall community for developing a great code base with contributions from many parties. None of it would be possible without _all_ of us: those who write the code, those who review the code, those who test the code, those who track down / fix bugs in the code, ...etc. I think the technical discussions today have been excellent; I appreciate @artpol84 and @hjelmn working to figure out the root cause and come up with a good solution that works for everyone.
@hppritcha ok, will do that tomorrow. Too late for that now. |
Force-pushed from 127f701 to 689199c.
@hppritcha I have updated the PR as we agreed. Under stress testing now, will update later. |
Force-pushed from 6e60cf7 to 65f222b.
Sounds like the rdmacm problem @hjelmn was mentioning. However, I have the fix for it here.
bot:retest |
@jladd-mlnx @bosilca @hjelmn
@artpol84 Nice. That will do it. |
BTW, can you include the continue statement and the comment above WAIT_SYNC_RELEASE from my commit? My copyright is already up-to-date on this file.
Ok, will do tomorrow.
@artpol84 I have strong doubts about this patch. The count can be negative for waitany and waitsome.
@bosilca is right. The count can go negative in waitany and waitsome. There may be no way to do this without an extra member. |
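To make the waitany/waitsome concern concrete, here is a tiny standalone demonstration (hypothetical ordering and names, not the actual request code), assuming the waitany-style pattern in which the count starts at 1 while the sync remains attached to several requests:

#include <stdio.h>
#include <stdatomic.h>

int main(void)
{
    /* Waitany-style setup: only one completion is required, but the
     * sync is still reachable from all of the pending requests. */
    atomic_int count = 1;
    int finishing_order[] = { 2, 0, 3 };   /* requests completing in some order */

    for (int i = 0; i < 3; i++) {
        int newval = atomic_fetch_sub(&count, 1) - 1;   /* post-decrement value */
        printf("req[%d] completes: count -> %d%s\n",
               finishing_order[i], newval,
               0 == newval ? "  (waiter signaled)" : "  (negative)");
    }
    /* Prints 0, -1, -2: completions that race in after the first one
     * drive the count below zero, so a fast-path guard such as
     * (sync->count - updates > 0) never applies to them, and any logic
     * that treats "reached 0" as "last updater" needs extra care. */
    return 0;
}

This is only meant to show the arithmetic @bosilca is pointing at; whether an extra member is truly unavoidable is left to the discussion.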
Described in #1813