Fujitsu: MPI_GATHER (linear_sync) can be truncated with derived datatypes #134


Open

ompiteam opened this issue Oct 1, 2014 · 10 comments

@ompiteam
Contributor

ompiteam commented Oct 1, 2014

Per http://www.open-mpi.org/community/lists/devel/2012/01/10215.php, MPI_GATHER using coll:tuned's linear_sync algorithm can be improperly truncated when derived datatypes are used.

I slightly modified the program that was originally sent and attached it here. It shows the problem for me on trunk and v1.5 (I assume it's also a problem on v1.4).

Many thanks for the bug report from Fujitsu.
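For reference, here is a minimal sketch of the kind of reproducer described above (the attached gather.c is not reproduced here; the contiguous datatype and the COUNT value are assumptions, chosen only so that the message is large enough for tuned to pick a segmented gather). Each rank describes its send buffer with a derived datatype while the root receives plain MPI_INTs, so the type signatures match but the datatype/count pairs differ on the two sides:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT (64 * 1024)   /* ints per rank; assumed large enough to trigger segmentation */

int main(int argc, char **argv)
{
    int rank, size, i, errs = 0;
    MPI_Datatype contig;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf = malloc(COUNT * sizeof(int));
    int *recvbuf = NULL;
    for (i = 0; i < COUNT; i++) sendbuf[i] = rank;

    /* One "contig" covers COUNT ints: same type signature as COUNT x MPI_INT,
     * but a different datatype/count pair than the receive side uses. */
    MPI_Type_contiguous(COUNT, MPI_INT, &contig);
    MPI_Type_commit(&contig);

    if (rank == 0) recvbuf = malloc((size_t)size * COUNT * sizeof(int));

    MPI_Gather(sendbuf, 1, contig,        /* each rank sends 1 derived datatype     */
               recvbuf, COUNT, MPI_INT,   /* root receives COUNT basic ints per rank */
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < size * COUNT; i++)
            if (recvbuf[i] != i / COUNT) errs++;
        printf("%d mismatched elements\n", errs);
        free(recvbuf);
    }

    free(sendbuf);
    MPI_Type_free(&contig);
    MPI_Finalize();
    return 0;
}
```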

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Imported from trac issue 2981. Created by jsquyres on 2012-01-26T17:42:14, last modified: 2014-05-20T17:59:11

  • jsquyres attached gather.c on 2012-01-26 17:42:35

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2012-01-26 17:43:12:

Oops -- this is a DDT issue, and I meant to assign it to George. :-)

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2012-04-17 11:18:51:

George -- can you have a look?

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2012-04-24 14:05:39:

No fix provided yet -- pushing to 1.6.1.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by bosilca on 2014-05-20 17:59:11:

This is a more general issue we have in Open MPI with the tuned collectives. If the send and the receive datatypes and counts are not identical, the message splitting decision is wrong (each side splits the message into repetitions of its entire datatype), leading to truncation in the best case and to wrong messages in the worst one. Without going through a packed version, there is no easy fix.
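To make the splitting mismatch concrete, here is a back-of-the-envelope sketch with assumed numbers (the 1024-byte segment size and the 3-int derived type are illustrative, not Open MPI's actual defaults). The sender counts off whole MPI_INTs per segment while the receiver counts off whole derived datatypes, so the two sides disagree about where the segment boundaries fall:

```c
#include <stdio.h>

int main(void)
{
    const int segment_bytes = 1024;           /* assumed pipelining segment size            */
    const int int_bytes     = 4;              /* size of an MPI_INT on common platforms     */
    const int dtype_bytes   = 3 * int_bytes;  /* assumed derived type: contiguous of 3 ints */

    /* Sender side: count expressed in MPI_INT, so a segment holds 256 ints = 1024 bytes. */
    int send_ints_per_seg  = segment_bytes / int_bytes;       /* 256 */

    /* Receiver side: count expressed in whole derived datatypes, so a segment holds only
     * 85 datatypes = 255 ints = 1020 bytes. The boundaries drift apart by 4 bytes per
     * segment, so one side eventually posts a receive smaller than the incoming piece:
     * truncation at best, silently misplaced data at worst. */
    int recv_types_per_seg = segment_bytes / dtype_bytes;     /* 85 */

    printf("sender segment:   %d bytes\n", send_ints_per_seg * int_bytes);
    printf("receiver segment: %d bytes\n", recv_types_per_seg * dtype_bytes);
    return 0;
}
```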

yosefe pushed a commit to yosefe/ompi that referenced this issue Mar 5, 2015
Resolve thread safety in TCP BTL jenkins: threads, known_issues
lrrajesh added a commit to lrrajesh/ompi that referenced this issue Mar 19, 2015
@hppritcha
Member

@bosilca can this be closed?

@bosilca
Member

bosilca commented Feb 3, 2020

This isn't fixed and is not going to be. The simplest solution for applications calling collectives with different type signatures (but the same typemap) on the two sides is to disable all pipelining for MPI collectives.

@gpaulsen
Member

gpaulsen commented Feb 3, 2020

@bosilca Is there a way to disable just the pipelining for MPI collectives? I think the big hammer is disabling the entire tuned collective component, but perhaps there's a better approach?
I see you can force a non-pipelined algorithm for bcast and reduce individually, but is there a more general way?

@bosilca
Member

bosilca commented Feb 3, 2020

First, all pipelined algorithms suffer from this issue, not only those in the tuned collectives. Second, disabling tuned, or more generally disabling pipelining, will have a drastic performance impact on most applications (and not only for DL). Last, tuned is the only collective component that supports MPI_T as a means to configure the collective decision per communicator (and there are several examples on our mailing lists of how to achieve this for the tuned module).
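As a sketch of the MPI_T route mentioned above: the cvar names coll_tuned_use_dynamic_rules and coll_tuned_gather_algorithm, and the assumption that algorithm value 1 selects the non-segmented basic linear gather, should be checked against `ompi_info --all` on your installation; the code itself only uses the standard MPI_T control-variable calls.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Look up a control variable by name and write an integer value to it.
 * Returns MPI_SUCCESS, or MPI_ERR_UNKNOWN if no cvar with that name exists. */
static int set_int_cvar(const char *name, int value)
{
    int i, num;
    MPI_T_cvar_get_num(&num);
    for (i = 0; i < num; i++) {
        char cname[256], desc[1024];
        int nlen = sizeof(cname), dlen = sizeof(desc);
        int verbosity, bind, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;
        MPI_T_cvar_handle handle;

        MPI_T_cvar_get_info(i, cname, &nlen, &verbosity, &dtype,
                            &enumtype, desc, &dlen, &bind, &scope);
        if (strcmp(cname, name) != 0) continue;

        MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
        MPI_T_cvar_write(handle, &value);
        MPI_T_cvar_handle_free(&handle);
        return MPI_SUCCESS;
    }
    return MPI_ERR_UNKNOWN;
}

int main(int argc, char **argv)
{
    int provided;

    /* MPI_T can be initialized before MPI_Init, so the values are in place
     * when the tuned component sets up its decision rules. */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    set_int_cvar("coll_tuned_use_dynamic_rules", 1);  /* assumed cvar name */
    set_int_cvar("coll_tuned_gather_algorithm", 1);   /* assumed: 1 = basic linear (no segmentation) */

    MPI_Init(&argc, &argv);
    /* ... MPI_Gather calls here should now avoid the segmented linear_sync path ... */
    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```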

@kawashima-fj
Member

Related (but not the same): #199, #1763
