Closed
Description
Test Case: https://gist.github.com/jjhursey/508037aa535c7dd1fe2e64610675c280
The test case linked above performs collectives in which every rank passes the same type signature to the collective call, but builds that signature from different datatypes and counts. It uses three datatypes: MPI_LONG_LONG, a non-contiguous MPI_LONG_LONG followed by a gap, and a contiguous pair of MPI_LONG_LONG.
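To make "same type signature, different types and counts" concrete, here is a minimal sketch in plain Python. It is only a model, not the code from the linked gist, and it makes no MPI calls. The Datatype class, the signature and buffer_span helpers, and the 8-byte gap in the padded type are all assumptions made for illustration; the issue does not state the gap size.

```python
from dataclasses import dataclass

LONG_LONG_SIZE = 8  # assumed size of MPI_LONG_LONG in bytes


@dataclass(frozen=True)
class Datatype:
    """Hypothetical stand-in for an MPI datatype (not an MPI handle)."""
    name: str
    basics: tuple  # basic types carried by one element, in order
    extent: int    # bytes from one element to the next, gaps included


# The three datatypes described above. The 8-byte gap is an assumption.
plain = Datatype("MPI_LONG_LONG", ("MPI_LONG_LONG",), LONG_LONG_SIZE)
padded = Datatype("MPI_LONG_LONG + gap", ("MPI_LONG_LONG",), 2 * LONG_LONG_SIZE)
pair = Datatype("2 x MPI_LONG_LONG", ("MPI_LONG_LONG",) * 2, 2 * LONG_LONG_SIZE)


def signature(dtype, count):
    """Flat sequence of basic types described by `count` elements of `dtype`.

    Gaps and displacements are not part of a type signature.
    """
    return dtype.basics * count


def buffer_span(dtype, count):
    """Bytes of memory covered by `count` elements, gaps included."""
    return dtype.extent * count


if __name__ == "__main__":
    # Three ways to describe four long longs.
    for dtype, count in ((plain, 4), (padded, 4), (pair, 2)):
        print(dtype.name, count, len(signature(dtype, count)),
              buffer_span(dtype, count))
```

In this model, 4 elements of the plain type, 4 elements of the padded type, and 2 elements of the pair type all describe the same signature of four MPI_LONG_LONG values, even though the padded layout covers more memory. A collective has to move the same four values in each case, whichever (datatype, count) pair a rank supplies.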
When run with -np 4 (either on a single node or across two nodes), the MPI_Allgatherv test fails with a wrong answer. The test case also includes MPI_Bcast and MPI_Allgather tests, which pass.
shell$ mpirun -np 4 -mca coll ^hcoll,tuned ./coll_non_uniform_types
- testbcast 16
- testbcast 112
- testbcast 1008
- testbcast 10000
- testbcast 100000
- testbcast 1000000
- testallgather 16
- testallgather 112
- testallgather 1008
- testallgather 10000
- testallgather 100000
- testallgather 1000000
- testallgatherv 16
R0 buf[1] is 8897841259083430779, want 1 (from 0)
abort: wrong data(6)
R2 buf[1] is 8897841259083430779, want 1 (from 0)
abort: wrong data(6)
R1 buf[1] is 8897841259083430779, want 1 (from 0)
abort: wrong data(6)
R3 buf[1] is 8897841259083430779, want 1 (from 0)
abort: wrong data(6)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 16.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Note that this test passes when the tuned component is not excluded. Below is the expected output:
shell$ mpirun -np 4 -mca coll ^hcoll ./coll_non_uniform_types
- testbcast 16
- testbcast 112
- testbcast 1008
- testbcast 10000
- testbcast 100000
- testbcast 1000000
- testallgather 16
- testallgather 112
- testallgather 1008
- testallgather 10000
- testallgather 100000
- testallgather 1000000
- testallgatherv 16
- testallgatherv 112
- testallgatherv 1008
- testallgatherv 10000
- testallgatherv 100000
- testallgatherv 1000000