Skip to content

Pass NULL to the collective components for unused buffers. #11552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Apr 1, 2023

We want the underlying algorithms to know when they can use a buffer as a temporary, such as when one module delegate a collective to another. Instead of allocating temporaries everywhere, we allow the upper level module to handle the temporary buffers, and pass them into all the others. Thus, we need to make sure that the user provided buffers are screened such that they don't make it as temporaries.

This could help #11418 to streamline the call to the xhc reduction.

@bosilca bosilca added this to the v5.0.0 milestone Apr 1, 2023
@bosilca bosilca requested review from devreal and bwbarrett April 1, 2023 00:09
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor comments. Passing cleaned parameters seems sensible. But what about allreduce and alltoall?

@bosilca
Copy link
Member Author

bosilca commented Apr 2, 2023

Passing cleaned parameters seems sensible. But what about allreduce and alltoall?

Neither Allreduce or Alltoall provide the opportunity for the user to provide a buffer that shall not be used. That would be possible for the other flavors of MPI_Alltoall{v,w,x} but there the count will be NULL for all unused buffers, so we already have a way to see if the buffer is usable. Also, in these operations we are not allowed to change the user-provided arrays, which means we will always be forced to operate on temporary memory with all the associated overheads.

@bosilca bosilca force-pushed the topic/null_as_unused_buffers_for_collectives branch 4 times, most recently from 7ae2b93 to 88ff1d2 Compare April 2, 2023 16:50
@devreal
Copy link
Contributor

devreal commented Apr 3, 2023

Neither Allreduce or Alltoall provide the opportunity for the user to provide a buffer that shall not be used. That would be possible for the other flavors of MPI_Alltoall{v,w,x} but there the count will be NULL for all unused buffers, so we already have a way to see if the buffer is usable. Also, in these operations we are not allowed to change the user-provided arrays, which means we will always be forced to operate on temporary memory with all the associated overheads.

Right, I was thinking in terms of MPI_IN_PLACE but that is a known value already so no need to use NULL.

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to document this behavior in ompi/mca/coll/coll.h.

We want the underlying algorithms to know when they can use a buffer as
a temporary, such as when one module delegate a collective to another.
Instead of allocating temporaries everywhere, we allow the upper level
module to handle the temporary buffers, and pass them into all the
others. Thus, we need to make sure that the user provided buffers are
screened such that they don't make it as temporaries.

Signed-off-by: George Bosilca <[email protected]>
@bosilca bosilca force-pushed the topic/null_as_unused_buffers_for_collectives branch from 88ff1d2 to 1edc1b5 Compare April 3, 2023 16:19
@jsquyres jsquyres modified the milestones: v5.0.0, v5.0.1 Oct 30, 2023
@janjust janjust modified the milestones: v5.0.1, v5.0.2 Jan 8, 2024
@jsquyres jsquyres modified the milestones: v5.0.2, v5.0.3 Feb 13, 2024
gkatev added a commit to gkatev/ompi that referenced this pull request Mar 20, 2024
With open-mpi#11552, rbuf is *either* valid & appropriately sized, or NULL,
and collectives implementations downstream from HAN (e.g. XHC) rely
on this convention.

Signed-off-by: George Katevenis <[email protected]>
gkatev added a commit to gkatev/ompi that referenced this pull request Mar 26, 2024
Improve handling of the rbuf pointer. Previously, it was possible for the
pointer to become non-NULL, even though it does not point to valid memory.
This becomes a problem for the assurances that open-mpi#11552 tries to introduce.

Signed-off-by: George Katevenis <[email protected]>
@gkatev
Copy link
Contributor

gkatev commented Apr 1, 2024

Hi, were there any concerns or issues left to resolve with this? Is it good to be merged?

@bosilca bosilca merged commit 75b0e6d into open-mpi:main Apr 1, 2024
@bosilca bosilca deleted the topic/null_as_unused_buffers_for_collectives branch April 1, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants