Skip to content

v4.0.x: Backport alltoall in place fixes #10509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 24, 2022

Conversation

awlauria
Copy link
Contributor

This went back to v4.1 a while ago here: #9672

Refs: #9329 and #9501

ggouaillardet and others added 9 commits June 23, 2022 09:51
The temporary buffer must be shifted by the true_extent on a
per type basis (since the various datatypes might have different
true_extent).

Thanks Heiko Bauke for reporting this.

Refs. open-mpi#9329

Signed-off-by: Gilles Gouaillardet <[email protected]>
(cherry picked from commit 0041ce8)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit 8ff0a09)
This function can be used to compute the packed size of a datatype on a
target architecture.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 74049fc)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit ab7d6bc)
Dont copy the datatype into a buffer with the same extent, but instead
pack it and send it to the peer as packed.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 447b289)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit 71d061c)
Provide optimized variant for the homogeneous case.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit dc4e2ce)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit 18d5fca)
Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit b9012a3)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit 0524cfa)
Signed-off-by: Mikhail Kurnosov <[email protected]>
(cherry picked from commit aba6765)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit be70f5d)
b9012a3 accidently set the message tags for the alltoall and
alltoallv in place algorithms to the alltoallw tag.  Undo that
change.

Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit f32dc49)
(cherry picked from commit 4348b9a)
b9012a3 used the alltoallw interpretation of rdisps instead of the
alltoall/alltoallv interpretation.  According to the MPI standard,
the byte displacement is recvbuf + rdispls[i] * extent(recvtype) for
alltoall and alltoallv, but is recvbuf + rdispls[i] for alltoallw.

Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit d8c50a5)
(cherry picked from commit e06ae70)
Without waiting for the last receive before going to the next iteration we
might overwrite data if the current left neighbor become the right at the next
iteration.

Start with an MPI_REQUEST_NULL request.
If the request is not NULL and the first rcounts is 0, the ompi_request_wait
will segfault.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 6802702)
Signed-off-by: Brian Barrett <[email protected]>
(cherry picked from commit 4eb03ae)
@awlauria awlauria requested review from hppritcha and gpaulsen June 23, 2022 13:55
@awlauria awlauria changed the base branch from main to v4.0.x June 23, 2022 13:55
@awlauria awlauria added this to the v4.0.8 milestone Jun 23, 2022
@jsquyres
Copy link
Member

FWIW: do you want these fixes in v4.0.x? Or is "4.1.x has the fixes for these bugs" a suitable response?

@awlauria
Copy link
Contributor Author

So doing some internal testing - it seems that only bringing over c7b7619 will fix the segv in the reported issue. The remaining commits are performance improvements, and a new algorithm, and patches to said new algorithm.

So whether the whole series should come over, or just the single patch (c7b7619) - I'll leave the call up to @gpaulsen and @hppritcha.

@awlauria
Copy link
Contributor Author

If only the bare minimum is wanted I'll blow the other commits away and leave only c7b7619.

@hppritcha
Copy link
Member

Are we wanting this for spectrum MPI?

Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'd prefer to take the whole PR.

@hppritcha hppritcha added the NEWS label Jun 24, 2022
@hppritcha hppritcha merged commit e771506 into open-mpi:v4.0.x Jun 24, 2022
@awlauria awlauria deleted the alltoall_backport_fixes branch June 24, 2022 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants