MPI_Waitsome performance improvement #1820
Conversation
This commit fixes a race condition discovered by @artpol84. The race happens when a signalling thread decrements the sync count to 0 and then goes to sleep. If the waiting thread runs and detects count == 0 before going to sleep on the condition variable, it will destroy the condition variable while the signalling thread is potentially still processing the completion. The fix is to add a non-atomic member to the sync structure that indicates another thread is handling the completion. Since the member will only be set to false by the initiating thread and the completing thread, the variable does not need to be protected. When destroying a condition variable the waiting thread needs to wait until the signalling thread is finished. Thanks to @artpol84 for tracking this down. Fixes open-mpi#1813 Signed-off-by: Nathan Hjelm <[email protected]>
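To make the ordering concrete, here is a minimal sketch of the pattern the commit message describes, written with plain pthreads and C11 atomics. The names wait_sync_t, wait_sync_update() and wait_sync_release(), and the field layout, are simplified stand-ins for illustration, not the actual Open MPI symbols.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int      count;      /* completions still outstanding */
    pthread_mutex_t lock;
    pthread_cond_t  condition;
    volatile bool   signaling;  /* deliberately non-atomic: only the waiter and
                                 * the last completer ever touch it */
} wait_sync_t;

/* At init: count = number of outstanding requests, signaling = (count > 0). */

/* Completing (signalling) thread. */
static void wait_sync_update(wait_sync_t *sync)
{
    /* atomic_fetch_sub returns the previous value */
    if ((atomic_fetch_sub(&sync->count, 1) - 1) != 0) {
        return;                          /* other completions still pending */
    }
    pthread_mutex_lock(&sync->lock);
    pthread_cond_signal(&sync->condition);
    pthread_mutex_unlock(&sync->lock);
    sync->signaling = false;             /* completer has let go of the sync */
}

/* Waiting thread, right before destroying the sync object. */
static void wait_sync_release(wait_sync_t *sync)
{
    while (sync->signaling) {
        /* busy-wait for the last completer to leave wait_sync_update() */
    }
    pthread_cond_destroy(&sync->condition);
    pthread_mutex_destroy(&sync->lock);
}
```

The key point is the spin on signaling in wait_sync_release(): the waiter may never sleep on the condition variable at all, so this spin is what keeps it from destroying the mutex and condition variable while the last completer is still between its atomic decrement and its unlock.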
Running a stress test against this now. Will update later.
Force-pushed from 432a849 to a85bd84
I have a few comments about this patch. Introducing WAIT_SYNC_SIGNALLED has the drawback of bringing back the original issue. Now the sync primitive might have a few requests associated with it, but it will be released by the creator thread even if one of the progress threads is still trying to activate the synchronization primitive (because signaling no longer protects the path where a thread still has a hand on the synchronization primitive). I think a simpler solution would be to simply call sync_update to remove the initial value and, if necessary, set the signalling field to false.

Isn't sync_sets equal to count - num_requests_null_inactive - num_requests_done in the first part of wait_some, and then to count - num_requests_null_inactive - num_requests_done in the second loop?

The patch improving MPI_Waitsome saves one atomic operation for each request that was already completed, in exchange for increasing the size of the request structure. Why not use the user-provided array (indices) instead of altering the request structure? I don't think there is anything in the standard that prevents us from using the array as a temporary buffer, as long as it contains the expected values upon return, and we know it must have the right size (or the application is incorrect).
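For illustration, a minimal sketch of the indices suggestion; the toy_request_t type, its fields, and waitsome_first_pass() are invented for the example and do not reflect the real ompi_request_t or req_wait.c.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented, simplified request type for the sketch. */
typedef struct {
    bool        active;    /* false for MPI_REQUEST_NULL / inactive requests */
    atomic_bool complete;  /* set by the progress/PML threads */
} toy_request_t;

/* First pass of a waitsome-like routine: record the positions of already
 * completed requests directly into the caller-supplied indices[] array,
 * so no extra field on the request structure is needed. */
static int waitsome_first_pass(int count, toy_request_t *requests[],
                               int indices[])
{
    int num_done = 0;
    for (int i = 0; i < count; ++i) {
        toy_request_t *req = requests[i];
        if (NULL == req || !req->active) {
            continue;                   /* skip null / inactive requests */
        }
        if (atomic_load(&req->complete)) {
            indices[num_done++] = i;    /* remember the slot for the second pass */
        }
    }
    return num_done;   /* 0 => the caller sets up a sync object and blocks */
}
```

Since MPI_Waitsome has to fill indices[] with the positions of the completed requests anyway, using it as scratch space costs nothing extra at return time.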
(request handling related)
@bosilca Ok, let me think more about that and try to consider your suggestions.
by avoiding extra atomic exchanges. The fix is based on the MPI spec, section 12.4.2 (Multiple threads completing the same request): "A program in which two threads block, waiting on the same request, is erroneous. Similarly, the same request cannot appear in the array of requests of two concurrent MPI_{WAIT|TEST}{ANY|SOME|ALL} calls. In MPI, a request can only be completed once. Any combination of wait or test that violates this rule is erroneous." We add a marked flag to the request structure. Only the MPI_Waitsome thread will use/access it by any means; PML threads will not see/touch it. So, given that any particular request can be used in at most one MPI_Waitsome, we are safe to make this change.
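A rough sketch of how such a marked flag might look, assuming a heavily simplified request type (the real ompi_request_t and the surrounding req_wait.c logic are more involved); only the thread calling MPI_Waitsome ever reads or writes marked, which is exactly the property MPI 12.4.2 guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool        active;
    atomic_bool complete;   /* set by the PML / progress threads */
    bool        marked;     /* touched only by the MPI_Waitsome caller: per MPI
                             * 12.4.2 a request appears in at most one concurrent
                             * WAIT/TEST call, so no atomic protection is needed */
} toy_request_t;

/* Pass 1: count completed requests, remembering which ones were already
 * done so pass 2 does not repeat the atomic exchange for them. */
static int mark_completed(int count, toy_request_t *requests[])
{
    int num_done = 0;
    for (int i = 0; i < count; ++i) {
        toy_request_t *req = requests[i];
        if (NULL == req || !req->active) continue;
        req->marked = atomic_load(&req->complete);
        if (req->marked) num_done++;
    }
    return num_done;
}

/* Pass 2: finalize the marked requests without touching their completion
 * state again; unmarked requests go through the normal completion path. */
static void collect_marked(int count, toy_request_t *requests[],
                           int indices[], int *outcount)
{
    int n = 0;
    for (int i = 0; i < count; ++i) {
        toy_request_t *req = requests[i];
        if (NULL != req && req->active && req->marked) {
            req->marked = false;     /* reset for a later waitsome call */
            indices[n++] = i;
        }
    }
    *outcount = n;
}
```

Unmarked active requests would still go through the regular atomic completion path in the second loop; the marked ones skip it, which is where the saved atomic exchanges come from.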
Closing this in favor of PR #1821
This PR makes sense only if
Will need to rebase afterwards.
@bosilca @hjelmn please check.
I'm addressing the following drawback:
https://github.com/hjelmn/ompi/blob/request_perfm_regression/ompi/request/req_wait.c#L412:L417
Still checking, but the waitsome test seems to pass.