I'm a developer of the Intel Math Kernel Library, and for our cluster components we provide OpenMPI support for our customers.
Recently we ran into an issue with sending and receiving when the sender and receiver use different data types.
I tried several OpenMPI versions: 1.6.1 (hang), 1.8.1 and 2.1.1 (crash).
These versions were downloaded from the OpenMPI site and built from source.
System information:
OS: RHEL 7.2
CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Network type: N/A, within 1 node.
You can find the reproducer below. It works fine with other MPI implementations, such as Intel MPI and MPICH. I run it on 2 processes and observe the following error message:
[mkl:147075] *** An error occurred in MPI_Bcast
[mkl:147075] *** reported by process [3517710337,1]
[mkl:147075] *** on communicator MPI COMMUNICATOR 6 SPLIT FROM 3
[mkl:147075] *** MPI_ERR_TRUNCATE: message truncated
[mkl:147075] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mkl:147075] *** and potentially your MPI job)
Denis, we have a long-standing bug related to collective communications that use different datatypes when the tuned collective module is used and pipelining is enabled. Because the datatypes differ, the processes decide to pipeline the collective at different granularities, which in some cases leads to data truncation. The open issue related to this topic is #1763.
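To make the granularity mismatch concrete, here is a minimal sketch in plain C (no MPI calls) of how two processes that describe the same bytes with different datatypes can end up with different first-segment sizes. This is a simplified model, not Open MPI's actual code: the helper names, the 128 KiB segment size, and the rule that a single element is never split across segments are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of whole elements that fit into one pipeline
 * segment. Assumption: an element is never split, so a type larger than
 * the segment still travels as one element per segment. */
static size_t elems_per_segment(size_t segment_bytes, size_t type_bytes)
{
    assert(type_bytes > 0);
    size_t n = segment_bytes / type_bytes;
    return n > 0 ? n : 1;
}

/* Bytes carried by the first pipeline segment for a (count, type size) pair. */
static size_t first_segment_bytes(size_t segment_bytes, size_t count,
                                  size_t type_bytes)
{
    size_t per_segment = elems_per_segment(segment_bytes, type_bytes);
    if (per_segment > count) {
        per_segment = count; /* the whole message fits in one segment */
    }
    return per_segment * type_bytes;
}

/* Returns 1 if the sender's first segment is larger than the segment the
 * receiver expects, which is the situation that shows up as truncation. */
static int first_segment_truncates(size_t segment_bytes,
                                   size_t send_count, size_t send_type_bytes,
                                   size_t recv_count, size_t recv_type_bytes)
{
    size_t sent = first_segment_bytes(segment_bytes, send_count, send_type_bytes);
    size_t expected = first_segment_bytes(segment_bytes, recv_count, recv_type_bytes);
    return sent > expected;
}
```

For example, with an assumed 128 KiB segment, a root that broadcasts one element of a 1 MiB contiguous type sends the whole 1 MiB as its first segment, while a receiver that posts 1048576 single-byte elements expects only 131072 bytes in its first segment. In this model `first_segment_truncates(131072, 1, 1048576, 1048576, 1)` returns 1, whereas using the same type description on both sides returns 0.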
There is no known quick solution. You can try not using the tuned module (--mca coll ^tuned), or you can enable dynamic collective decisions and then disable pipelining for the collectives that use different datatypes (more info in our FAQ). Note that either option affects every use of that particular collective, not only the calls with mismatched datatypes.