-
Notifications
You must be signed in to change notification settings - Fork 900
v4.0.x: Backport alltoall in place fixes #10509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The temporary buffer must be shifted by the true_extent on a per type basis (since the various datatypes might have different true_extent). Thanks Heiko Bauke for reporting this. Refs. open-mpi#9329 Signed-off-by: Gilles Gouaillardet <[email protected]> (cherry picked from commit 0041ce8) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit 8ff0a09)
This function can be used to compute the packed size of a datatype on a target architecture. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 74049fc) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit ab7d6bc)
Dont copy the datatype into a buffer with the same extent, but instead pack it and send it to the peer as packed. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 447b289) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit 71d061c)
Provide optimized variant for the homogeneous case. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit dc4e2ce) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit 18d5fca)
Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit b9012a3) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit 0524cfa)
Signed-off-by: Mikhail Kurnosov <[email protected]> (cherry picked from commit aba6765) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit be70f5d)
b9012a3 accidently set the message tags for the alltoall and alltoallv in place algorithms to the alltoallw tag. Undo that change. Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit f32dc49) (cherry picked from commit 4348b9a)
b9012a3 used the alltoallw interpretation of rdisps instead of the alltoall/alltoallv interpretation. According to the MPI standard, the byte displacement is recvbuf + rdispls[i] * extent(recvtype) for alltoall and alltoallv, but is recvbuf + rdispls[i] for alltoallw. Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit d8c50a5) (cherry picked from commit e06ae70)
Without waiting for the last receive before going to the next iteration we might overwrite data if the current left neighbor become the right at the next iteration. Start with an MPI_REQUEST_NULL request. If the request is not NULL and the first rcounts is 0, the ompi_request_wait will segfault. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 6802702) Signed-off-by: Brian Barrett <[email protected]> (cherry picked from commit 4eb03ae)
FWIW: do you want these fixes in v4.0.x? Or is "4.1.x has the fixes for these bugs" a suitable response? |
So doing some internal testing - it seems that only bringing over c7b7619 will fix the segv in the reported issue. The remaining commits are performance improvements, and a new algorithm, and patches to said new algorithm. So whether the whole series should come over, or just the single patch (c7b7619) - I'll leave the call up to @gpaulsen and @hppritcha. |
If only the bare minimum is wanted I'll blow the other commits away and leave only c7b7619. |
Are we wanting this for spectrum MPI? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I'd prefer to take the whole PR.
This went back to v4.1 a while ago here: #9672
Refs: #9329 and #9501