Skip to content

Conversation

jjhursey
Copy link
Member

@jjhursey jjhursey commented Aug 5, 2022

Signed-off-by: Joshua Hursey [email protected]

@jjhursey
Copy link
Member Author

jjhursey commented Aug 5, 2022

General question about #if MPI_VERSION >= 4 protections. I added them in so that the MPI 4 code is isolated if we didn't want to include it while claiming to be "3.1" compliant. However, since this change is additive are they needed if we just put this in main?

@jsquyres
Copy link
Member

jsquyres commented Aug 5, 2022

I don't think we're isolating all MPI-4 features -- e.g., sessions support is built and included, regardless of MPI_VERSION.

My impression of using MPI_VERSION >= 4 kinds of comparisons was for doing things that are only relevant when we claim MPI-4 compliance. E.g., some things were deprecated, or some behaviors changed in MPI-4.0 compared to MPI-3.x.

As a more concrete example: we probably do not want to emit deprecation warnings about canceling an MPI_Send request while Open MPI claims MPI-3.x conformance, because that behavior is not deprecated until MPI-4.0. We could warn about the upcoming deprecation -- i.e., emit a message something like "Although this version of Open MPI is MPI-3.whatever conformant, future versions of Open MPI will be MPI-4.0 conformant, and therefore canceling a send request will be deprecated. You should update your code!"

But that seems pretty bulky / doesn't feel like it's worth it / might even end up confusing the user (i.e., warning about something that -- in the current version of Open MPI -- is perfectly valid). Hence, we put MPI_VERSION >= 4 kinds of protection around such functionality.

@jjhursey
Copy link
Member Author

jjhursey commented Aug 5, 2022

Good point. I'll take them out and repush.

@jjhursey jjhursey force-pushed the add-hw-guide-split branch from c93f38d to 3e50202 Compare August 5, 2022 19:50
@jjhursey
Copy link
Member Author

jjhursey commented Aug 5, 2022

Done. Ready for review.

Once this is merged, then I'll file an enhancement to improve the implementation of MPI_COMM_TYPE_HW_UNGUIDED (and see if I can get someone to work on it).

@@ -65,6 +65,8 @@ int MPI_Comm_split_type(MPI_Comm comm, int split_type, int key,
}

if ( MPI_COMM_TYPE_SHARED != split_type && // Same as OMPI_COMM_TYPE_NODE
MPI_COMM_TYPE_HW_UNGUIDED != split_type &&
MPI_COMM_TYPE_HW_GUIDED != split_type &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not iterating directly over ompi_comm_split_type_hw_guided_support ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ompi_comm_split_type_hw_guided_support doesn't include MPI_COMM_TYPE_HW_(UN)GUIDED or MPI_UNDEFINED So it wouldn't be a complete list.

We could make a little helper function to handle the set that is covered (like bool ompi_comm_is_valid_split_type(int split_type)) I'm not sure it buys us much though since this is the only place that would be using it.

@jjhursey jjhursey force-pushed the add-hw-guide-split branch from 3e50202 to 8804c04 Compare August 5, 2022 21:06
 * `MPI_COMM_TYPE_HW_GUIDED` supports all of the existing `OMPI_COMM_TYPE_`
   options.
 * `MPI_COMM_TYPE_HW_UNGUIDED` is recognized, but not supported so it returns
   `MPI_COMM_NULL` indidicating that the MPI library cannot split the
   communicator any further.

Signed-off-by: Joshua Hursey <[email protected]>
@jjhursey jjhursey force-pushed the add-hw-guide-split branch from 8804c04 to 19e24fa Compare August 5, 2022 21:08
* Stage 0: Recognized, but not implemented.
* Stage 1: Do better than that
*/
if (MPI_COMM_TYPE_HW_UNGUIDED == global_split_type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick and simple solution to this. Add a field to ompi_communicator_t after c_epoch (there is room for exactly 4 bytes) and store the type used on the communicator creation (inherit from the parent for duplication, and set to undefined otherwise). Then use this type to go down the ompi_comm_split_type_hw_guided_support for the next level in the hierarchy. It might work.

Copy link
Member Author

@jjhursey jjhursey Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I would like to experiment with that as a second effort (after this PR is merged). I'm going to talk with someone today that might have some cycles in the near term to work on it. Otherwise, I might take a crack at it.

(edit) I just filed Issue #10635 to track the "unguided" mode improvement.

@jjhursey jjhursey merged commit b1383f4 into open-mpi:main Aug 9, 2022
@jjhursey jjhursey deleted the add-hw-guide-split branch August 9, 2022 18:07
@dalcinl
Copy link
Contributor

dalcinl commented Aug 17, 2022

@jjhursey Something seems to be wrong. As per the MPI-4 standard, MPI_COMM_TYPE_HW_GUIDED with info={"mpi_hw_resource_type": "mpi_shared_memory"} is equivalent to using MPI_COMM_TYPE_SHARED.
The shared memory resource type is the only case mentioned explicitly in the standard, and required for consistency with the older MPI_COMM_TYPE_SHARED. Any chance you could implement support for at least this case? I guess the implementation should be fairly trivial, as MPI_COMM_TYPE_SHARED is already implemented.

Reproducer

from mpi4py import MPI

comm = MPI.COMM_SELF.Split_type(MPI.COMM_TYPE_SHARED, info=MPI.INFO_NULL)
comm.Free()

info = MPI.Info.Create()
info.Set("mpi_hw_resource_type", "mpi_shared_memory")
comm = MPI.COMM_SELF.Split_type(MPI.COMM_TYPE_HW_GUIDED, info=info)
comm.Free()

Output

$ mpiexec -n 1 python test.py
[kw61149:1134179] Error: in ompi_comm_split_type_get_part() unexpected split_type=13
Traceback (most recent call last):
  File "/home/dalcinl/Devel/mpi4py/tmp2.py", line 8, in <module>
    comm = MPI.COMM_SELF.Split_type(MPI.COMM_TYPE_HW_GUIDED, info=info)
  File "mpi4py/MPI/Comm.pyx", line 219, in mpi4py.MPI.Comm.Split_type
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind

@jjhursey
Copy link
Member Author

Humm. That was working fine in the test I was using (it was in C not Python, but I'm not sure that should matter). Let me investigate.

@jjhursey
Copy link
Member Author

I see the problem. It must have cropped up in the re-organization (it's looking at the original split type not the adjusted split type). I'm working on a patch now.

@jjhursey
Copy link
Member Author

@dalcinl I posted a fix in #10681

I posted the C test program to the ompi-tests-public repo in open-mpi/ompi-tests-public#19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

need to implement MPI_COMM_TYPE_HW_GUIDED/UNGUIDED for MPI 4.0
5 participants