mpi_assert_allow_overtaking breaks MPI_Issend #6559
Comments
@devreal does it hang if you run both processes on the same node?
All BTLs exhibit this issue in 4.0.1, but the problem seems to be fixed in master. After some investigation, it appears that 0263456 was never ported to the 4.x branch.
@hppritcha Yes, I also saw the code hang on a single node. @bosilca Thanks for digging this up. I guess a simple search on GitHub would have done the trick for me ^^
This issue was fixed in the 4.0.x release branch through #6582; closing this issue. Thanks for the backport!
I'm using MPI_Issend in one of my code paths to ensure that messages have been received before signalling that all outstanding transfers have completed. At some point I came across the communicator info key mpi_assert_allow_overtaking, which is available in Open MPI 4.0.1 (described in §6.4.4 of the current MPI standard draft), and thought I'd give it a try because I really don't care about message ordering in this particular code. Well, premature optimization is the root of all evil... It took me quite a while to figure out that adding that key a while ago had actually broken this code path: it causes transfers started with MPI_Issend to never complete. The code below reliably triggers this issue:
The code works if I either
- change MPI_Issend to MPI_Isend (which is not correct in my case); or
- do not set mpi_assert_allow_overtaking to true.
Otherwise, the code continues testing the send and receive requests without ever completing the message:
I'm using Open MPI 4.0.1 (installed from release tarball) and see this problem on both a Cray XC40 and an IB cluster (tested with and without UCX).
I guess this info key is still an experimental feature since it's not yet part of the official standard. My understanding is that this info key only changes the order in which messages are matched, which should not interfere with the way MPI_Issend works, right? Please let me know if I can provide any other information.