Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
9704f0f (master)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
4d07260d9f79bb7f328b1fc9107b45e683cf2c4e ../../../../3rd-party/openpmix (v1.1.3-3319-g4d07260d)
9ac0b7ecee2c97c357bf6751fdaab7a10e62df14 ../../../../3rd-party/prrte (psrvr-v2.0.0rc1-4133-g9ac0b7ec)
Please describe the system on which you are running
- Operating system/version: Linux 4.16.3-301.fc28.x86_64
- Computer hardware:
- Network type: InfiniBand
Details of the problem
Dynamic selection provided via MCA parameters does not work for the simple algorithms. A simple algorithm (coll_han_use_simple_<op>) splits the global communicator into intra- and inter-node sub-communicators with the HAN component disabled (mca_coll_han_comm_create()):
opal_info_set(&comm_info, "ompi_comm_coll_preference", "tuned,^han");
For this reason, on the sub-communicators the simple algorithm uses the collective operation from whichever remaining component has the highest priority.
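The fallback described above can be sketched with a small model. This is not Open MPI's selection code: the struct, the function select_component, and the table example_components are hypothetical names introduced for illustration. The priorities are taken from the mpiexec command in this report, except for tuned, whose value of 30 is an assumption. The model only captures the exclusion of han and ignores which operations each component actually implements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a collective component; not an Open MPI type. */
struct coll_component {
    const char *name;
    int priority;
};

/* Simplified model: return the highest-priority component whose name
 * differs from `excluded`, or NULL if no component qualifies. */
const struct coll_component *
select_component(const struct coll_component *comps, size_t n,
                 const char *excluded)
{
    const struct coll_component *best = NULL;

    assert(comps != NULL && excluded != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(comps[i].name, excluded) == 0) {
            continue; /* component is disabled on this sub-communicator */
        }
        if (best == NULL || comps[i].priority > best->priority) {
            best = &comps[i];
        }
    }
    return best;
}

/* Priorities from the command in this report; the tuned value is assumed. */
const struct coll_component example_components[] = {
    {"han", 100}, {"basic", 90}, {"tuned", 30},
    {"libnbc", 10}, {"adapt", 0}, {"sm", 0},
};
const size_t example_count =
    sizeof(example_components) / sizeof(example_components[0]);
```

Calling select_component(example_components, example_count, "han") returns the entry named basic, which mirrors the behavior observed on the sub-communicators.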
In the following example, we want to select Bcast from the tuned component for both intra- and inter-node communication. Instead, the simple algorithm calls Bcast from the basic component, which has the highest priority once HAN is excluded.
mpiexec --host cn2:8,cn3:8,cn4:8,cn5:8,cn6:8 --n 40 \
--map-by core --bind-to core --mca pml ucx \
--mca coll_basic_priority 90 \
--mca coll_libnbc_priority 10 \
--mca coll_adapt_priority 0 \
--mca coll_sm_priority 0 \
--mca coll_han_priority 100 \
--mca coll_han_bcast_dynamic_intra_node_module 4 \
--mca coll_han_bcast_dynamic_inter_node_module 4 \
--mca coll_han_use_simple_bcast 1 \
./bcast_test