
MPI_Waitsome performance improvement #1820


Closed
wants to merge 4 commits

Conversation

artpol84
Contributor

@artpol84 artpol84 commented Jun 27, 2016

This PR makes sense only if

will need to rebase afterwards.

@bosilca @hjelmn please check.

I'm addressing the following drawback:
https://github.com/hjelmn/ompi/blob/request_perfm_regression/ompi/request/req_wait.c#L412:L417

Still checking but waitsome test seems to pass.

hjelmn and others added 2 commits June 26, 2016 20:14
This commit fixes a race condition discovered by @artpol84. The race
happens when a signalling thread decrements the sync count to 0 then
goes to sleep. If the waiting thread runs and detects the count == 0
before going to sleep on the condition variable it will destroy the
condition variable while the signalling thread is potentially still
processing the completion. The fix is to add a non-atomic member to
the sync structure that indicates another thread is still handling
the completion. Since the member will only be set to false by the
initiating thread and the completing thread, the variable does not
need to be protected. Before destroying the condition variable, the
waiting thread must wait until the signalling thread is finished.

Thanks to @artpol84 for tracking this down.

Fixes open-mpi#1813

Signed-off-by: Nathan Hjelm <[email protected]>
@artpol84 artpol84 changed the title Fix waitsome MPI_Waitsome performance improvement Jun 27, 2016
@artpol84
Contributor Author

Running a stress test against this now. Will update later.

@artpol84 artpol84 force-pushed the fix_waitsome branch 2 times, most recently from 432a849 to a85bd84 Compare June 27, 2016 14:57
@bosilca
Member

bosilca commented Jun 27, 2016

I have a few comments about this patch.

Introducing WAIT_SYNC_SIGNALLED has the drawback of bringing back the original issue. Now the sync primitive might have a few requests associated with it, but it will be released by the creator thread even while one of the progress threads is still trying to activate the synchronization primitive (because signalling no longer protects the path where a thread still has a hand on the synchronization primitive). I think a simpler solution would be to simply call sync_update to remove the initial value and, if necessary, set the signalling field to false.

Isn't sync_sets equal to count - num_requests_null_inactive - num_requests_done in the first part of wait_some, and then again to count - num_requests_null_inactive - num_requests_done in the second loop?

The patch improving MPI_Waitsome saves one atomic operation for each request that was already complete, in exchange for increasing the size of the request structure. Why not use the user-provided array (indices) instead of altering the request structure? I don't think anything in the standard prevents us from using the array as a temporary buffer, as long as it contains the expected values upon return, and we know it must have the right size (or the application is incorrect).

@artpol84
Contributor Author

@bosilca

  1. Your concerns only apply to MPI_Waitany, I believe. For MPI_Waitsome we call SIGNALLED only if we are 100% sure that nobody is waiting on the sync.
  2. Introducing a flag avoids an index search (otherwise you either have to keep the completed request indices compact as they are discovered, or rearrange the indices at the end of the function if you use the original positions).

(request handling related)
@artpol84
Contributor Author

@bosilca Ok, let me think more about that and try to consider your suggestions.

by avoiding extra atomic exchanges.

The fix is based on the MPI spec, §12.4.2 ("Multiple threads completing the same request"):
    "A program in which two threads block, waiting on the same request, is erroneous. Similarly, the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls. In MPI, a request can only be completed once. Any combination of wait or test that violates this rule is erroneous."

We add a "marked" flag to the request structure. Only the MPI_Waitsome thread will access it in any way;
PML threads will never see or touch it. Given that any particular request can appear in at most one
concurrent MPI_Waitsome, we are safe to make this change.
@artpol84
Contributor Author

Closing this in favor of PR #1821

@artpol84 artpol84 closed this Jun 28, 2016
@bosilca bosilca mentioned this pull request Jul 1, 2016