Topic/osc pt2pt 1 thread fixes #2590

Merged
jjhursey merged 3 commits into open-mpi:master from topic/osc-pt2pt-1-thread-fixes
Dec 16, 2016

jjhursey
Member

Fixes Issue #2505 for the single-threaded case.

@hjelmn I would appreciate a review of these three commits before they go into master.

The multithreaded test case against the osc/pt2pt component is still failing with wrong answers, hangs, and segvs. We'll open another ticket for that.

Note that this change should go into the v2.x and v2.0.x branches. If it is included in the v2.0.2 release, we should also add a guard against using osc/pt2pt with MPI_THREAD_MULTIPLE, as that combination can currently lead to wrong answers. We'll need a separate, temporary v2.0.x-only commit for that. I can work on generating that change.

 * If the user uses PSCW synchronization after a Fence, the previous
   epoch is not reset, which can cause the PSCW to transfer data before
   it is ready, leading to wrong answers.
 * This commit resets the `eager_send_active` in the start call.

Signed-off-by: Joshua Hursey <[email protected]>
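
The Fence-then-PSCW interaction described in this commit message can be modelled without MPI. What follows is a minimal sketch, not Open MPI's code: `pscw_sync_t` and the `pscw_*` functions are hypothetical names, and two simplifications are assumed for illustration, namely that a fence leaves `eager_send_active` set and that the flag is set again once every expected post message has arrived. Only the flag name and the idea of resetting it in the start call come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a window's synchronization state.
 * Only the flag name eager_send_active comes from the commit message. */
typedef struct {
    bool eager_send_active; /* true: the origin may send data right away */
    int  posts_pending;     /* post messages still expected in a PSCW epoch */
} pscw_sync_t;

/* Assumption: a fence epoch leaves eager sends enabled. */
static void pscw_fence(pscw_sync_t *s)
{
    s->eager_send_active = true;
}

/* Start of a PSCW access epoch. With reset_flag == false this models the
 * behaviour before the fix: the flag left over from the fence survives. */
static void pscw_start(pscw_sync_t *s, int group_size, bool reset_flag)
{
    assert(group_size > 0);
    s->posts_pending = group_size;
    if (reset_flag) {
        s->eager_send_active = false;
    }
}

/* Assumption: once every target has posted, eager sends are enabled. */
static void pscw_post_arrived(pscw_sync_t *s)
{
    assert(s->posts_pending > 0);
    if (--s->posts_pending == 0) {
        s->eager_send_active = true;
    }
}

/* A put may be sent only while eager sends are enabled. */
static bool pscw_put_may_send(const pscw_sync_t *s)
{
    return s->eager_send_active;
}
```

In this model, calling `pscw_fence` and then `pscw_start(..., false)` leaves `pscw_put_may_send` true before any post has arrived, which corresponds to the early transfer described above. With `reset_flag` set to true, the put stays blocked until the last `pscw_post_arrived` call.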

 * When using `MPI_Lock`/`MPI_Unlock` with `MPI_Get` and non-contiguous
   datatypes it is possible that the unlock finishes too early, before
   the data is actually present in the recv buffer.
 * We need to wait for the irecv to complete before unlocking the target.
   This commit waits for the outgoing fragment counts to become equal
   before unlocking.

Signed-off-by: Joshua Hursey <[email protected]>
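
The "wait until the counts match" rule from this commit message can be sketched as a small counter model. This is an illustration under stated assumptions, not the component's implementation: `frag_counts_t`, its two fields, and the `frag_*` functions are hypothetical, and the sketch assumes that the two counts being compared are "fragments started" and "fragments completed". The `progress` callback stands in for whatever drives communication forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; the names are illustrative, not Open MPI fields. */
typedef struct {
    uint32_t frags_started;   /* fragments handed to the transport */
    uint32_t frags_completed; /* fragments known to have completed */
} frag_counts_t;

static void frag_start(frag_counts_t *c)
{
    c->frags_started++;
}

static void frag_complete(frag_counts_t *c)
{
    assert(c->frags_completed < c->frags_started);
    c->frags_completed++;
}

/* The unlock may finish only when nothing is still in flight. */
static bool frag_unlock_may_finish(const frag_counts_t *c)
{
    return c->frags_started == c->frags_completed;
}

/* Drive progress until the counts are equal. Returns the number of progress
 * calls made, or -1 if max_polls was reached first. */
static int frag_wait_before_unlock(frag_counts_t *c,
                                   void (*progress)(frag_counts_t *),
                                   int max_polls)
{
    int polls = 0;
    while (!frag_unlock_may_finish(c)) {
        if (polls == max_polls) {
            return -1;
        }
        progress(c);
        polls++;
    }
    return polls;
}
```

The bounded loop is a choice made for the sketch so that it cannot spin forever; a real implementation would more likely block on a condition variable or on its progress engine.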

 * When using `MPI_Put` with `MPI_Win_lock_all` a hang is possible since
   the `put` is waiting on `eager_send_active` to become `true`, but
   that variable might not be reset in the case of `MPI_Win_lock_all`
   depending on other incoming events (e.g., `post` or ACKs of lock
   requests).

Signed-off-by: Joshua Hursey <[email protected]>
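
The hang condition in this commit message can be illustrated with a toy event loop. The sketch below is hypothetical throughout: `lockall_sync_t`, `lockall_event_t`, and the `lockall_*` functions are invented for illustration, and it assumes that a post or a lock ACK is the only thing that sets `eager_send_active`. It models the failure mode only and does not attempt to reproduce the change made by the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical incoming events that may or may not arrive at the origin. */
typedef enum {
    LOCKALL_EV_NONE,     /* nothing relevant arrived on this poll */
    LOCKALL_EV_POST,     /* a post message from a target */
    LOCKALL_EV_LOCK_ACK  /* acknowledgement of a lock request */
} lockall_event_t;

/* Hypothetical state; only the flag name comes from the commit message. */
typedef struct {
    bool eager_send_active;
} lockall_sync_t;

/* Model of entering a lock_all epoch without touching the flag's value
 * beyond clearing it: nothing here enables eager sends. */
static void lockall_begin(lockall_sync_t *s)
{
    s->eager_send_active = false;
}

/* Assumption: only a post or a lock ACK enables eager sends. */
static void lockall_handle_event(lockall_sync_t *s, lockall_event_t ev)
{
    if (ev == LOCKALL_EV_POST || ev == LOCKALL_EV_LOCK_ACK) {
        s->eager_send_active = true;
    }
}

/* The put polls for eager_send_active. Returns true if the put could proceed
 * after consuming at most n_events events, false if it would still wait. */
static bool lockall_put_progresses(lockall_sync_t *s,
                                   const lockall_event_t *events,
                                   int n_events)
{
    assert(n_events >= 0);
    for (int i = 0; i < n_events && !s->eager_send_active; i++) {
        lockall_handle_event(s, events[i]);
    }
    return s->eager_send_active;
}
```

If no post or ACK ever shows up in the event sequence, `lockall_put_progresses` returns false however many events are consumed, which is the model's equivalent of the hang.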
Member

@hjelmn hjelmn left a comment

Looks good to me.

@jjhursey
Member Author

@hjelmn Thanks!

We have one more patch, regarding lock ordering, that I'm preparing now. We will bring it in as a separate PR since it might require some discussion (it might turn into an MPI standard interpretation issue).

@jjhursey jjhursey merged commit ced245d into open-mpi:master Dec 16, 2016
@jjhursey jjhursey deleted the topic/osc-pt2pt-1-thread-fixes branch December 20, 2016 20:48