Fujitsu: MPI_GATHER (linear_sync) can be truncated with derived datatypes #134
Imported from trac issue 2981. Created by jsquyres on 2012-01-26T17:42:14, last modified: 2014-05-20T17:59:11

Per http://www.open-mpi.org/community/lists/devel/2012/01/10215.php, MPI_GATHER using the coll:tuned linear_sync algorithm can be truncated improperly.

I slightly modified the program that was originally sent and attached it here. It shows the problem for me on trunk and v1.5 (I assume it's also a problem on v1.4).

Many thanks for the bug report from Fujitsu.
Trac comment by jsquyres on 2012-01-26 17:43:12: Oops -- this is a DDT issue, and I meant to assign it to George. :-)

Trac comment by jsquyres on 2012-04-17 11:18:51: George -- can you have a look?

Trac comment by jsquyres on 2012-04-24 14:05:39: No fix provided yet -- pushing to 1.6.1.

Trac comment by bosilca on 2014-05-20 17:59:11: This is a more general issue we have in Open MPI with the tuned collectives. If the send and receive datatypes and counts are not identical, the message-splitting decision is wrong (it splits into repetitions of the entire datatype), leading to truncation in the best case and to corrupted messages in the worst. Without going through a packed version, there is no easy fix.
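The failure mode described above can be illustrated with a minimal sketch (hypothetical; not the attachment referenced in the report). Every rank contributes the same typemap, but the senders and the root describe it with different (datatype, count) signatures, which is exactly the case the segmented/pipelined gather algorithms split incorrectly:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK 4096  /* large enough that a segmented gather splits the message */

int main(int argc, char **argv)
{
    int rank, size, i;
    MPI_Datatype blocktype;
    double *sendbuf, *recvbuf = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(BLOCK * sizeof(double));
    for (i = 0; i < BLOCK; i++) {
        sendbuf[i] = rank + i * 1e-6;
    }

    /* Derived datatype whose typemap equals BLOCK doubles. */
    MPI_Type_contiguous(BLOCK, MPI_DOUBLE, &blocktype);
    MPI_Type_commit(&blocktype);

    if (rank == 0) {
        recvbuf = malloc((size_t)size * BLOCK * sizeof(double));
    }

    /* Senders use (BLOCK, MPI_DOUBLE); the root receives one instance of
     * the derived type per process.  The type signatures match element by
     * element, so this is legal MPI, but the counts differ -- the case the
     * comment above says is mishandled when the message is segmented. */
    MPI_Gather(sendbuf, BLOCK, MPI_DOUBLE,
               recvbuf, 1, blocktype,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("gather done, first/last: %f %f\n",
               recvbuf[0], recvbuf[(size_t)size * BLOCK - 1]);
        free(recvbuf);
    }

    free(sendbuf);
    MPI_Type_free(&blocktype);
    MPI_Finalize();
    return 0;
}
```

Forcing the linear_sync gather, as described in the mailing-list post linked above, is what exposes the truncation; with a non-segmented algorithm the same call completes correctly.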
@bosilca can this be closed?

This isn't fixed and is not going to be. The simplest solution for applications requiring collectives with different type signatures (but the same typemap) is to disable all pipelining for MPI collectives.

@bosilca Is there a way to just disable the pipelining for MPI collectives? I think the big hammer is disabling the entire tuned collective component, but perhaps there's a better approach?

First, all pipeline algorithms suffer from this issue, not only those in the tuned collectives. Second, disabling tuned, or more generally disabling pipelining, will have a drastic performance impact on most applications (and not only for DL). Last, tuned is the only collective component that supports MPI_T as a means to configure the collective decision per communicator (there are several examples on our mailing lists of how to achieve this for the tuned module).
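A hedged sketch of the MPI_T route mentioned above follows. The control-variable names (`coll_tuned_use_dynamic_rules`, `coll_tuned_gather_algorithm`), the algorithm value, and the ability to bind them per communicator are assumptions about the tuned component; verify them against `ompi_info --all` and the mailing-list examples referenced in the comment rather than treating this as an authoritative recipe.

```c
#include <mpi.h>
#include <stdio.h>

/* Try to set an integer MPI_T control variable, binding it to the given
 * communicator.  Returns 0 on success, -1 if the variable is not found. */
static int set_int_cvar(const char *name, int value, MPI_Comm comm)
{
    int index, count;
    MPI_T_cvar_handle handle;

    if (MPI_T_cvar_get_index(name, &index) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_T cvar '%s' not found\n", name);
        return -1;
    }
    MPI_T_cvar_handle_alloc(index, &comm, &handle, &count);
    MPI_T_cvar_write(handle, &value);
    MPI_T_cvar_handle_free(&handle);
    return 0;
}

int main(int argc, char **argv)
{
    int provided;

    /* The MPI_T interface may be initialized before MPI itself. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    /* Assumed cvar names -- check your Open MPI build.  Enable dynamic
     * rules, then force a non-segmented gather algorithm (value 1 is
     * commonly basic linear; treat the number as an assumption too). */
    set_int_cvar("coll_tuned_use_dynamic_rules", 1, MPI_COMM_WORLD);
    set_int_cvar("coll_tuned_gather_algorithm", 1, MPI_COMM_WORLD);

    /* ... application collectives on MPI_COMM_WORLD go here ... */

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```

The "big hammer" from the question above would be excluding the component altogether (e.g. `mpirun --mca coll ^tuned`), at the cost of the performance impact described in the reply.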