MPI_Waitsome performance improvement (version #2) #1821
@@ -384,12 +401,16 @@ int ompi_request_default_wait_some(size_t count,
/* If the request is completed go ahead and mark it as such */
assert( REQUEST_COMPLETE(request) );
num_requests_done++;
indices[i] = 1;
Instead of the memset and the assignment to indices[i], you can use the indices array to directly store the outcome of the atomic operation, so that the if on line 384 becomes: if( !(indices[i] = (int)OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, &sync)) ) {
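A minimal sketch of that suggestion, for illustration only. It uses C11 atomics in place of OPAL_ATOMIC_CMPSET_PTR, and every name starting with toy_ or REQ_ is a hypothetical stand-in, not the Open MPI definition. The assumption is that the compare-and-set returns 1 when the request was still pending (so the sync object got attached) and 0 when it had already completed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for REQUEST_PENDING / a "completed" marker. */
#define REQ_PENDING   ((void *)0)
#define REQ_COMPLETED ((void *)1)

/* Hypothetical, stripped-down request: only the completion slot. */
typedef struct {
    _Atomic(void *) req_complete;
} toy_request_t;

/* Returns 1 if *slot held `expected` and was replaced by `desired`, else 0. */
static int toy_cmpset_ptr(_Atomic(void *) *slot, void *expected, void *desired)
{
    return atomic_compare_exchange_strong(slot, &expected, desired) ? 1 : 0;
}

/*
 * Pre-wait loop: try to attach `sync` to every request and keep the result
 * of the atomic operation in flags[i].
 *   flags[i] == 1 : request was pending, sync is now attached
 *   flags[i] == 0 : request had already completed
 * Every slot is written, so no memset of flags is needed beforehand.
 * Returns the number of requests that were already complete.
 */
static int toy_attach_sync(toy_request_t *reqs, size_t count,
                           void *sync, int *flags)
{
    int num_done = 0;
    for (size_t i = 0; i < count; i++) {
        if (!(flags[i] = toy_cmpset_ptr(&reqs[i].req_complete,
                                        REQ_PENDING, sync))) {
            num_done++;
        }
    }
    return num_done;
}
```

Note that with this encoding a 0 in the array means "already complete", which is the inverse of the indices[i] = 1 marking in the diff above.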
35c3f0c to 4679ebf
@bosilca ready, under stress testing now.
4679ebf to fbf1363
:bot:retest:
fbf1363 to 76fd5c9
by avoiding extra atomic exchanges. Use the indices array to mark already completed requests in the pre-wait loop to avoid extra atomic exchanges in the after-wait loop.
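The after-wait loop described in that commit message could look roughly like the sketch below. It is not the code from the PR: it uses C11 atomics, all toy_ and REQ_ names are hypothetical, and it assumes the pre-wait loop left indices[i] == 0 for requests that were already complete and 1 for requests that got the sync object attached. A counter is included only to make the saved atomic operations visible:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for REQUEST_PENDING / a "completed" marker. */
#define REQ_PENDING   ((void *)0)
#define REQ_COMPLETED ((void *)1)

typedef struct {
    _Atomic(void *) req_complete;
} toy_request_t;

/* Counts compare-and-set calls so the saving can be observed. */
static int toy_atomic_ops = 0;

/* Returns 1 if *slot held `expected` and was replaced by `desired`, else 0. */
static int toy_cmpset_ptr(_Atomic(void *) *slot, void *expected, void *desired)
{
    toy_atomic_ops++;
    return atomic_compare_exchange_strong(slot, &expected, desired) ? 1 : 0;
}

/*
 * After-wait loop. On entry indices[i] is the flag left by the pre-wait
 * loop (0: already complete, 1: sync attached). On exit indices[0..n-1]
 * holds the positions of the completed requests, and n is returned.
 * Compacting in place is safe because the write position never exceeds i,
 * and flag i is read before anything is written at or beyond it.
 */
static int toy_detach_and_collect(toy_request_t *reqs, size_t count,
                                  void *sync, int *indices)
{
    int num_done = 0;
    for (size_t i = 0; i < count; i++) {
        if (!indices[i]) {
            /* Known complete from the pre-wait loop: no atomic needed. */
            indices[num_done++] = (int)i;
        } else if (!toy_cmpset_ptr(&reqs[i].req_complete, sync, REQ_PENDING)) {
            /* Slot no longer holds sync: it completed during the wait. */
            indices[num_done++] = (int)i;
        }
        /* Otherwise the request is still pending and sync was detached. */
    }
    return num_done;
}
```

Only the requests that actually had the sync object attached pay for a second atomic exchange; the ones found complete in the pre-wait loop are recorded with a plain array read.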
76fd5c9 to 732d890
Build Failed with XL compiler! Please review the log, and get in touch if you have questions.
Build Failed with GNU compiler! Please review the log, and get in touch if you have questions.
The IBM system rebooted unexpectedly while the Jenkins test was running. It's back up now - sorry about that.
bot:retest
@jsquyres I see that Travis results are outdated and I was unable to force it to rerun. Who supports it?
bot:retest
@artpol84 Travis doesn't respond to our "bot:" commands. But you can click through the Travis details link and you should be able to click on the "rebuild" button. Actually, I'm not 100% sure how the Travis permissions work. If you click through the Travis link on a random PR, do you see the rebuild button? (right now on the Travis build for this PR, it's an "X" instead of a "swoop", because it's currently re-building)
@artpol84 Is this PR still relevant?
We agreed with @bosilca on this fix some time ago. Maybe it's worth reviewing this again, but I believe it is still relevant.
This remains a good optimization as it has the potential to reduce the number of atomic operations in MPI_Waitsome. 👍
This is an alternative (to PR #1820) implementation of the MPI_Waitsome optimization, following @bosilca's suggestion.
@bosilca, please check if I understood you correctly. Consider only the last commit (35c3f0c). The first three are from PR #1816 because this PR depends on it. I'll rebase this one once the base #1816 is merged.
In terms of performance here (compared to #1820) we have to memset the array of indices at the beginning.
I didn't consider the case where we put signalled indexes compactly (back to back) in the indices array, because in this case the second loop would have to search. And even though the indexes are in increasing order and binary search can be applied, I don't think it will be faster than setting indices to 0 once at the beginning of the waitsome.
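To make the trade-off concrete, here is a small sketch of the two lookups being compared. It is illustrative only: the function names are hypothetical, and the flag layout assumes 1 marks a request found complete before the wait (as in commit 35c3f0c, where the array is zeroed first).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag layout: one memset up front, then a constant-time test per request. */
static int toy_is_done_flag(const int *flags, size_t i)
{
    return flags[i] != 0;
}

/*
 * Compact layout: done[] holds ndone request indexes in increasing order.
 * No memset is needed, but each query in the second loop is a binary
 * search over the entries recorded so far.
 */
static int toy_is_done_compact(const int *done, size_t ndone, int i)
{
    size_t lo = 0, hi = ndone;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (done[mid] < i) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo < ndone && done[lo] == i;
}
```

With count requests, the flag layout costs one pass to zero the array plus count constant-time checks, while the compact layout trades the memset for a logarithmic search on every check.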