mpi_assert_allow_overtaking breaks MPI_Issend #6559

Closed · devreal opened this issue Apr 2, 2019 · 4 comments

devreal (Contributor) commented Apr 2, 2019

I'm using MPI_Issend in one of my code paths to ensure that messages have actually been received before signalling that all outstanding transfers have completed. At some point I came across the communicator info key mpi_assert_allow_overtaking, which is available in Open MPI 4.0.1 (described in §6.4.4 of the current MPI standard draft), and thought I'd give it a try because I really don't care about message ordering in this particular code. Well, premature optimization is the root of all evil... It took me quite a while to figure out that adding that key a while back actually broke this code path: it causes transfers started with MPI_Issend to never complete.

The code below can be used to reliably trigger this issue:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank, size;
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  MPI_Comm comm;
  MPI_Comm_dup(MPI_COMM_WORLD, &comm);

  // signal MPI that we don't care about the order of messages
  MPI_Info info;
  MPI_Info_create(&info);
  // setting this key causes Issend transfers to never complete
  MPI_Info_set(info, "mpi_assert_allow_overtaking", "true");
  MPI_Comm_set_info(comm, info);
  MPI_Info_free(&info);

  // each rank sends one message to its right neighbor in a ring
  // and receives one message from anywhere (separate buffers, so the
  // in-flight send buffer is never written to)
  int dest = (rank + 1) % size;
  int sval = rank, rval;
  MPI_Request rreq, sreq;
  MPI_Irecv(&rval, 1, MPI_INT, MPI_ANY_SOURCE, 1000, comm, &rreq);
  MPI_Issend(&sval, 1, MPI_INT, dest, 1000, comm, &sreq);

  int sflag = 0, rflag = 0;
  printf("Starting testing\n");
  // test until both the send and the receive have completed
  do {
    if (!rflag)
      MPI_Test(&rreq, &rflag, MPI_STATUS_IGNORE);
    if (!sflag)
      MPI_Test(&sreq, &sflag, MPI_STATUS_IGNORE);
  } while (!sflag || !rflag);
  printf("Done with single message!\n");

  MPI_Comm_free(&comm);
  MPI_Finalize();
  return 0;
}

The code works if I

  • Change the MPI_Issend to MPI_Isend (which is not correct in my case; see the ack-based sketch below the output); or
  • Avoid setting the key mpi_assert_allow_overtaking to true.

Otherwise, the code keeps testing the send and receive requests, and neither ever completes:

$ mpirun -n 2 -N 1 ./test_mpiissend
Starting testing
Starting testing
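
For completeness, the Isend-based workaround from the first bullet could be made correct for my use case by adding an explicit zero-byte acknowledgment: completing the ack receive then plays the role of Issend completion. The following is only a rough, untested sketch against the variables of the reproducer above (rank, dest, comm); the ack tag 1001 is an arbitrary choice.

// hypothetical sketch, reusing rank/dest/comm from the reproducer above;
// a zero-byte ack on tag 1001 stands in for Issend completion
int sval = rank, rval;
MPI_Status rstatus;
MPI_Request reqs[3];
MPI_Irecv(&rval, 1, MPI_INT, MPI_ANY_SOURCE, 1000, comm, &reqs[0]);
MPI_Isend(&sval, 1, MPI_INT, dest, 1000, comm, &reqs[1]);
// pre-post the receive for the ack our destination will send back
MPI_Irecv(NULL, 0, MPI_BYTE, dest, 1001, comm, &reqs[2]);
// once the payload arrives, ack its sender on the dedicated tag
MPI_Wait(&reqs[0], &rstatus);
MPI_Send(NULL, 0, MPI_BYTE, rstatus.MPI_SOURCE, 1001, comm);
// reqs[2] completing now means the destination has received our payload,
// which is the guarantee that MPI_Issend completion would have provided
MPI_Waitall(2, &reqs[1], MPI_STATUSES_IGNORE);

Since every rank pre-posts its ack receive before blocking in the ack send, this should be deadlock-free, but it trades an extra round trip for independence from the synchronous-send path.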

I'm using Open MPI 4.0.1 (installed from the release tarball) and see this problem on both a Cray XC40 and an IB cluster (tested with and without UCX).

I guess this info key is still somewhat experimental, since it is not yet part of the official standard. My understanding is that it only relaxes the order in which messages may be matched; it should not change the completion semantics of MPI_Issend, right?
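
To illustrate my mental model (my reading of §6.4.4 of the draft, not normative text): the assertion only widens the set of legal matching orders. In the minimal two-rank sketch below, the wildcard receives may observe the two same-tag messages in either order, but all four operations should still complete.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Comm comm;
  MPI_Comm_dup(MPI_COMM_WORLD, &comm);

  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "mpi_assert_allow_overtaking", "true");
  MPI_Comm_set_info(comm, info);
  MPI_Info_free(&info);

  if (rank == 0) {
    int a = 1, b = 2;
    // two same-source, same-tag messages
    MPI_Send(&a, 1, MPI_INT, 1, 42, comm);
    MPI_Send(&b, 1, MPI_INT, 1, 42, comm);
  } else if (rank == 1) {
    int first, second;
    MPI_Recv(&first,  1, MPI_INT, MPI_ANY_SOURCE, 42, comm, MPI_STATUS_IGNORE);
    MPI_Recv(&second, 1, MPI_INT, MPI_ANY_SOURCE, 42, comm, MPI_STATUS_IGNORE);
    // without the assertion, (first, second) must be (1, 2);
    // with it, (2, 1) is also legal, but all four operations
    // must still complete either way
    printf("got %d then %d\n", first, second);
  }

  MPI_Comm_free(&comm);
  MPI_Finalize();
  return 0;
}

Run with mpirun -n 2: without the assertion the output must be "got 1 then 2"; with it, "got 2 then 1" would also be a legal outcome.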

Please let me know if I can provide any other information.

hppritcha (Member) commented

@devreal does it hang if you run both processes on the same node?

bosilca (Member) commented Apr 2, 2019

All BTLs exhibit this issue in 4.0.1, but the problem seems to be fixed in master. After investigation, it appears that 0263456 was never brought over to the 4.x branch.

devreal (Contributor, Author) commented Apr 3, 2019

@hppritcha Yes, I also saw the code hanging on a single node.

@bosilca Thanks for digging this up. I guess a simple search on GitHub would have done the trick for me ^^

devreal (Contributor, Author) commented Apr 24, 2019

This was fixed in the 4.0.x release branch through #6582, so I'm closing this issue. Thanks for the backport!

devreal closed this as completed Apr 24, 2019