Skip to content

PMIx_Fences - remove two during MPI initialization #11451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hppritcha
Copy link
Member

This patch removes redundant PMIx Fences in the initialization procedure for MPI when using the World Process Model (WPM). See chapter 11 sections 2 and 3 of the MPI-4 standard for a discussion of the WPM and new Sessions model.

The patch does, however, require that what should have been a local operation to support initialization of an MPI session, into a global one. Note this does not disable the sessions feature but just restricts when it will work at this point to use cases that are similar to MPI initialization using the WPM.

Refactoring to make ompi_mpi_instance_init_common purely local will require changes that would be too impactive for the current state of the 5.0.0 release cycle. See issue #11239.

Related to #11166

@hppritcha hppritcha marked this pull request as draft February 27, 2023 23:32
@hppritcha
Copy link
Member Author

bot:ibm:retest

@hppritcha hppritcha force-pushed the pmix_fence_removal_v4 branch from 5fc8716 to 6ea5e84 Compare February 28, 2023 21:19
This patch removes redundant PMIx Fences in the initialization
procedure for MPI when using the World Process Model (WPM).
See chapter 11 sections 2 and 3 of the MPI-4 standard for a discussion
of the WPM and new Sessions model.

The patch does, however, require that what should have been a local operation
to support initialization of an MPI session, into a global one.
Note this does not disable the sessions feature but just restricts when it
will work at this point to use cases that are similar to MPI initialization
using the WPM.

Refactoring to make ompi_mpi_instance_init_common  purely local will require changes that would
be too impactive for the current state of the 5.0.0 release cycle.
See issue open-mpi#11239.

Related to open-mpi#11166

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha force-pushed the pmix_fence_removal_v4 branch from 6ea5e84 to c241cb1 Compare February 28, 2023 21:23
@hppritcha hppritcha marked this pull request as ready for review February 28, 2023 21:23
@hppritcha hppritcha requested a review from jjhursey March 6, 2023 19:51
@hppritcha
Copy link
Member Author

@jjhursey could you review this PR when you have a chance?

@jjhursey
Copy link
Member

I feel uncomfortable reviewing this PR. Someone else should review that has time to trace the dependencies. The correctness of MPI_Init over different transports, especially at scale, is sensitive to the structure of the fences. As we saw with the earlier version of the PR, which was later reverted, it is difficult to see those implications through a review - and requires some deeper knowledge of the impact on lower-level transports. Unfortunately, I don't have the cycles to do that level of review.

@jjhursey jjhursey removed their request for review March 30, 2023 15:00
@hppritcha hppritcha mentioned this pull request Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants