MPI_Waitsome performance improvement (version #2) #1821

Merged: 1 commit merged into open-mpi:master on Sep 8, 2016

Conversation

artpol84
Contributor

This is an alternative (to PR #1820) implementation of the MPI_Waitsome optimization, following @bosilca's suggestion.

@bosilca, please check if I understood you correctly. Consider only the last commit (35c3f0c); the first three are from PR #1816 because this PR depends on it. I'll rebase this one once the base #1816 is merged.

In terms of performance, here (compared to #1820) we have to memset the array of indices at the beginning.

I didn't consider the case where we put signalled indexes compactly (back to back) in the indices array, like this:

{ 0, 10, 14, 25, ... }

because in that case you'll have to search in the second loop. And even though the indexes are in increasing order and you can apply binary search, I don't think it will be faster than setting indices to 0 once at the beginning of the waitsome.
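As a rough illustration of that trade-off, here is a hypothetical C comparison of the two bookkeeping schemes (not Open MPI code; the request count, the completed set, and the loop bodies are made up). Scheme A keeps a compact sorted list of completed indices and must search in the second loop; scheme B pays for one memset up front and gets an O(1) lookup per request:

    /* Hypothetical sketch, not Open MPI code. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        enum { COUNT = 32 };

        /* Scheme A: compact sorted list, e.g. { 0, 10, 14, 25, ... }. */
        int completed[] = { 0, 10, 14, 25 };
        size_t ncompleted = sizeof(completed) / sizeof(completed[0]);
        int pending_a = 0, pending_b = 0;

        for (int i = 0; i < COUNT; i++) {
            if (NULL == bsearch(&i, completed, ncompleted,
                                sizeof(int), cmp_int)) {
                pending_a++;    /* would still need the atomic exchange */
            }
        }

        /* Scheme B: one flag per request, zeroed up front (the memset cost). */
        int indices[COUNT];
        memset(indices, 0, sizeof(indices));
        indices[0] = indices[10] = indices[14] = indices[25] = 1;

        for (int i = 0; i < COUNT; i++) {
            if (!indices[i]) {
                pending_b++;    /* would still need the atomic exchange */
            }
        }

        printf("pending: scheme A = %d, scheme B = %d\n", pending_a, pending_b);
        return 0;
    }

The single memset is done once per waitsome call, while the search would be paid per still-pending request, which is the point made above.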

@@ -384,12 +401,16 @@ int ompi_request_default_wait_some(size_t count,
/* If the request is completed go ahead and mark it as such */
assert( REQUEST_COMPLETE(request) );
num_requests_done++;
indices[i] = 1;
bosilca
Member

Instead of the memset and the assignment to indices[i], you can use the indices array to store the outcome of the atomic operation directly, so that the if on line 384 transforms into if( !(indices[i] = (int)OPAL_ATOMIC_CMPSET_PTR(&request->req_complete, REQUEST_PENDING, &sync)) ) {
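A self-contained sketch of the suggested pattern, with C11 atomics standing in for OPAL_ATOMIC_CMPSET_PTR (fake_request_t, the sentinel values, and sync_obj below are invented for illustration and are not the Open MPI definitions):

    /* Sketch only: C11 atomics as a stand-in for OPAL_ATOMIC_CMPSET_PTR. */
    #include <stdatomic.h>
    #include <stdio.h>

    #define REQUEST_PENDING   ((void *)0)   /* hypothetical sentinels */
    #define REQUEST_COMPLETED ((void *)1)

    typedef struct { _Atomic(void *) req_complete; } fake_request_t;

    int main(void)
    {
        enum { COUNT = 4 };
        fake_request_t reqs[COUNT];
        int indices[COUNT];            /* note: no memset needed any more */
        int sync_obj;                  /* stand-in for the wait_sync object */
        size_t num_requests_done = 0;

        /* Pretend requests 1 and 3 completed before the wait was posted. */
        for (int i = 0; i < COUNT; i++) {
            atomic_store(&reqs[i].req_complete,
                         (i % 2) ? REQUEST_COMPLETED : REQUEST_PENDING);
        }

        /* Pre-wait loop: the CAS result goes straight into indices[i].
         * 1 -> sync attached, request still pending;
         * 0 -> already complete, counted now, no extra atomic swap later. */
        for (int i = 0; i < COUNT; i++) {
            void *expected = REQUEST_PENDING;
            indices[i] = (int)atomic_compare_exchange_strong(
                &reqs[i].req_complete, &expected, (void *)&sync_obj);
            if (!indices[i]) {
                num_requests_done++;
            }
        }

        printf("%zu of %d requests were already complete\n",
               num_requests_done, COUNT);
        return 0;
    }

Because indices[i] now records the CAS outcome itself, the memset disappears and a zero entry directly marks a request that had completed before the wait.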

@artpol84
Contributor Author

@bosilca ready, under stress testing now.

@hjelmn
Member

hjelmn commented Jun 29, 2016

:bot:retest:

@hjelmn
Member

hjelmn commented Jun 29, 2016

:bot:retest:

Use indices array to mark already completed connections
in the pre-wait loop to avoid extra atomic exchanges
in the after-wait loop.
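For the after-wait half mentioned in the commit message, a hypothetical continuation of the C11-atomics sketch above (drop it into the same main(), after the pre-wait loop; again a stand-in, not the Open MPI source):

    /* After-wait loop: only entries with indices[i] != 0 still have the
     * sync attached and need a second CAS to detach it; entries with
     * indices[i] == 0 were already counted in the pre-wait loop, saving
     * one atomic exchange per such request. */
    for (int i = 0; i < COUNT; i++) {
        if (!indices[i]) {
            continue;          /* completed before the wait, already counted */
        }
        void *expected = (void *)&sync_obj;
        if (!atomic_compare_exchange_strong(&reqs[i].req_complete,
                                            &expected, REQUEST_PENDING)) {
            /* CAS failed: the request completed while we were waiting. */
            num_requests_done++;
        }
    }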
@ibm-ompi

Build Failed with XL compiler! Please review the log, and get in touch if you have questions.

@ibm-ompi

Build Failed with GNU compiler! Please review the log, and get in touch if you have questions.

@jjhursey
Member

The IBM system rebooted unexpectedly while the Jenkins test was running. It's back up now - sorry about that.
bot:retest:

@artpol84
Contributor Author

artpol84 commented Jul 7, 2016

bot:retest

@artpol84
Contributor Author

artpol84 commented Jul 7, 2016

@jsquyres I see that the Travis results are outdated and I was unable to force a rerun. Who maintains it?

@artpol84
Contributor Author

artpol84 commented Jul 7, 2016

bot:retest

@jsquyres
Member

jsquyres commented Jul 7, 2016

@artpol84 Travis doesn't respond to our "bot:" commands. But you can click through the Travis details link and you should be able to click on the "rebuild" button:

[screenshot: Travis CI build page showing the rebuild button]

Actually, I'm not 100% sure how the Travis permissions work. If you click through the travis link on a random PR, do you see the rebuild button? (right now on the travis build for this PR, it's an "X" instead of a "swoop", because it's currently re-building)

@jsquyres
Member

jsquyres commented Sep 7, 2016

@artpol84 Is this PR still relevant?

@artpol84
Contributor Author

artpol84 commented Sep 7, 2016

@bosilca and I agreed on this fix some time ago. Maybe it's worth reviewing it again, but I believe it is still relevant.

@bosilca
Member

bosilca commented Sep 7, 2016

This remains a good optimization as it has the potential to reduce the number of atomic operations in MPI_Waitsome. 👍

artpol84 merged commit 84e178c into open-mpi:master on Sep 8, 2016