Some multi-threading fixes #9302
Conversation
The second commit looks larger than it really is; it is mostly spacing adjustments.
/**
 * Register and open all available components, giving them a chance to access the MCA parameters.
 */
OPAL_THREAD_LOCK(&ompi_mpi_topo_bootstrap_mutex);
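The locking pattern shown in the diff can be sketched in isolation. This is a hypothetical standalone version using plain pthreads instead of Open MPI's OPAL_THREAD_LOCK macro; the names `framework_lazy_init`, `framework_open`, and the counter are illustrative placeholders, not actual OMPI code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sketch of mutex-guarded lazy initialization, using plain
 * pthreads rather than Open MPI's OPAL_THREAD_LOCK macro. */
static pthread_mutex_t bootstrap_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool framework_initialized = false;
static int init_calls = 0;   /* counts how many times the init body actually ran */

static void framework_open(void)
{
    init_calls++;   /* stands in for registering and opening components */
}

void framework_lazy_init(void)
{
    pthread_mutex_lock(&bootstrap_mutex);
    if (!framework_initialized) {
        framework_open();              /* only the first caller does the work */
        framework_initialized = true;
    }
    pthread_mutex_unlock(&bootstrap_mutex);
}
```

With the mutex held across the check and the initialization, two threads racing into the lazy-init path cannot both open the framework, which is the hazard the diff guards against.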
I would rather have this safety moved down into the framework itself, as I don't think that any lazily initialized framework supports, or needs to allow, threaded initialization.
Can you expand on this a bit? The io framework lazy init does something similar to what's done here.
As you mentioned, we have several frameworks that need to be initialized lazily, potentially in a multi-threaded way. I would like to have a consistent way to handle their lazy initialization, to avoid any discrepancies between them.
I opened #9307 for discussing the lazy framework initialization. I can work on that once we have consensus. We can remove this specific commit from the PR, so that the rest needn't be held up.
Thanks @nysal . Will remove the commit.
Removed the commit with the topo changes.
ompi/request: Add a read memory barrier to sync the receive buffer soon after wait completes.
We found an issue where, when using multiple threads, it is possible for the data
not to be in the buffer before MPI_Wait() returns. Without the rmb(), testing the
buffer later, after MPI_Wait() returned, would show that the data eventually arrives.
We have seen this issue intermittently on Power9 using PAMI, but in theory it could
happen with any transport.
Signed-off-by: Austen Lauria [email protected]
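The race described in the commit message can be sketched as a self-contained example. This is a hypothetical illustration using C11 atomics: the acquire fence plays the role of the read memory barrier (opal_atomic_rmb() in Open MPI) added by the patch, and the names `recv_buffer`, `request_complete`, and `wait_and_read` are illustrative, not real OMPI code.

```c
#include <pthread.h>
#include <stdatomic.h>

static int recv_buffer = 0;              /* stands in for the MPI receive buffer */
static atomic_int request_complete = 0;  /* stands in for the completion flag    */

static void *progress_thread(void *arg)
{
    (void)arg;
    recv_buffer = 42;                                    /* data lands in the buffer   */
    atomic_store_explicit(&request_complete, 1,
                          memory_order_release);         /* then the request completes */
    return NULL;
}

int wait_and_read(void)
{
    pthread_t t;
    pthread_create(&t, NULL, progress_thread, NULL);

    /* MPI_Wait-like spin: wait until the request is marked complete. */
    while (!atomic_load_explicit(&request_complete, memory_order_relaxed))
        ;
    /* The fix: a read memory barrier so the buffer contents written by the
     * progress thread are guaranteed visible before the caller reads them. */
    atomic_thread_fence(memory_order_acquire);

    int value = recv_buffer;
    pthread_join(t, NULL);
    return value;
}
```

Without the fence, the relaxed load of the completion flag does not order the subsequent buffer read, so on weakly ordered hardware such as Power9 the caller could observe the flag set while the buffer write is not yet visible.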