Fix MPI_COMM_TYPE_HW_GUIDED split #10681

Merged: 1 commit, Aug 22, 2022

jjhursey
Member

  • The wrong split type was being passed to the
    ompi_comm_split_type_get_part function. It was passing the
    "original" (MPI_COMM_TYPE_HW_GUIDED) split type and not the
    "converted" (e.g., MPI_COMM_TYPE_SHARED) split type. This resulted
    in an error indicating that the split type was invalid.
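
As a rough illustration of this class of bug, here is a toy Python model. It is not Open MPI's code: `convert_guided`, `get_part`, `split_buggy`, and `split_fixed` are hypothetical names, and the mapping from the `mpi_hw_resource_type` info value to a concrete split type is an assumption made only for this sketch.

```python
# Toy model only; every name below is hypothetical, not Open MPI's API.
SHARED = "MPI_COMM_TYPE_SHARED"
HW_GUIDED = "MPI_COMM_TYPE_HW_GUIDED"

# Assumed mapping from the "mpi_hw_resource_type" info value to a concrete type.
GUIDED_TO_CONCRETE = {"mpi_shared_memory": SHARED}


def convert_guided(split_type, info):
    """Turn the guided split type into a concrete one; pass others through."""
    if split_type != HW_GUIDED:
        return split_type
    return GUIDED_TO_CONCRETE[info["mpi_hw_resource_type"]]


def get_part(split_type):
    """Stand-in for a partitioning helper that only accepts concrete types."""
    if split_type != SHARED:
        raise ValueError(f"invalid split type: {split_type}")
    return "shared-memory partition"


def split_buggy(split_type, info):
    convert_guided(split_type, info)  # converted value is computed, then dropped
    return get_part(split_type)       # the original (guided) type is passed on


def split_fixed(split_type, info):
    converted = convert_guided(split_type, info)
    return get_part(converted)        # the converted (concrete) type is passed on


info = {"mpi_hw_resource_type": "mpi_shared_memory"}
print(split_fixed(HW_GUIDED, info))
try:
    split_buggy(HW_GUIDED, info)
except ValueError as exc:
    print("buggy path:", exc)
```

In this model the only difference between the two paths is which value reaches the helper, which mirrors the one-argument nature of the fix described above.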

Signed-off-by: Joshua Hursey <[email protected]>
@dalcinl
Contributor

dalcinl commented Aug 17, 2022

@jjhursey All good from my side ([link]; ignore the failure, your branch is missing a fix that is already in main).

@jjhursey jjhursey merged commit 849985e into open-mpi:main Aug 22, 2022
@jjhursey jjhursey deleted the fix-hw-guided-split branch August 22, 2022 12:35
@dalcinl
Contributor

dalcinl commented Aug 22, 2022

@jjhursey Sorry, I got confused last time when I confirmed things were OK.

Running the mpi4py tests in one process is OK, but running them in two processes is not. See the [failure](https://github.com/mpi4py/mpi4py-testing/runs/7953805209?check_suite_focus=true#step:18:1236) yourself.

My best guess is that you are not handling MPI_UNDEFINED properly. Any process passing MPI_UNDEFINED must be filtered out of the new communicator, and the call must return MPI_COMM_NULL in that process.
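
The expected outcome can be sketched without MPI at all. The toy model below is not Open MPI or mpi4py code: `split_type_model` and the string constants are hypothetical stand-ins, and it assumes that all participating ranks sit in a single shared-memory domain. It only shows that ranks passing the undefined value are left out of the new group and get the null communicator back.

```python
# MPI-free toy model; all names are hypothetical stand-ins.
UNDEFINED = "MPI_UNDEFINED"
HW_GUIDED = "MPI_COMM_TYPE_HW_GUIDED"
COMM_NULL = None  # stands in for MPI_COMM_NULL


def split_type_model(split_types):
    """split_types[r] is what rank r passes to the split call.

    Returns, for each rank, the tuple of parent ranks that form its new
    communicator, or COMM_NULL if the rank passed UNDEFINED.
    Assumption: all participating ranks share one shared-memory domain.
    """
    members = [r for r, t in enumerate(split_types) if t != UNDEFINED]
    kinds = {split_types[r] for r in members}
    if len(kinds) > 1:
        # Participating ranks must agree on the split type.
        raise ValueError("participating ranks passed different split types")
    group = tuple(members)
    return [group if r in members else COMM_NULL
            for r in range(len(split_types))]


# Two ranks, each taking a turn as the only participant.
for root in range(2):
    requests = [HW_GUIDED if rank == root else UNDEFINED for rank in range(2)]
    print(root, split_type_model(requests))
```

With two ranks, each iteration should give the participating rank a group of size one and the other rank the null value, which is what the assertions in a real test would check.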

Reproducer

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

info = MPI.Info.Create()
info.Set("mpi_hw_resource_type", "mpi_shared_memory")

for root in range(size):
    if rank == root:
        split_type = MPI.COMM_TYPE_HW_GUIDED
    else:
        split_type = MPI.UNDEFINED

    comm = comm.Split_type(split_type, info=info)
    if rank == root:
        assert comm != MPI.COMM_NULL
        assert comm.size == 1
        assert comm.rank == 0
        comm.Free()
    else:
        assert comm == MPI.COMM_NULL

info.Free()

Output

$ mpiexec -n 1 python test.py 

$ mpiexec -n 2 python test.py 
[kw61149:1129015] Error: Mismatched info values for MPI_COMM_TYPE_HW_GUIDED
Traceback (most recent call last):
  File "/home/dalcinl/Devel/mpi4py/tmp3.py", line 16, in <module>
    comm = comm.Split_type(split_type, info=info)
  File "mpi4py/MPI/Comm.pyx", line 219, in mpi4py.MPI.Comm.Split_type
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
Traceback (most recent call last):
  File "/home/dalcinl/Devel/mpi4py/tmp3.py", line 16, in <module>
    comm = comm.Split_type(split_type, info=info)
  File "mpi4py/MPI/Comm.pyx", line 219, in mpi4py.MPI.Comm.Split_type
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind

@jjhursey
Member Author

I see the case now. I'll work on a fix and update the unit test in ompi-tests-public. Thanks for the update!
