Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
9704f0f (master)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
4d07260d9f79bb7f328b1fc9107b45e683cf2c4e ../../../../3rd-party/openpmix (v1.1.3-3319-g4d07260d)
9ac0b7ecee2c97c357bf6751fdaab7a10e62df14 ../../../../3rd-party/prrte (psrvr-v2.0.0rc1-4133-g9ac0b7ec)
Please describe the system on which you are running
- Operating system/version: Linux 4.16.3-301.fc28.x86_64
- Computer hardware:
- Network type: InfiniBand
Details of the problem
Dynamic selection provided via MCA parameters does not work for the simple algorithms. A simple algorithm (coll_han_use_simple_<op>) splits the global communicator into intra- and inter-node sub-communicators with the HAN component disabled (see mca_coll_han_comm_create()):
opal_info_set(&comm_info, "ompi_comm_coll_preference", "tuned,^han");
For this reason, on the sub-communicators the simple algorithm uses the collective operation from the component with the highest priority.
In the following example we want to choose Bcast from the tuned component for both intra- and inter-node communication. Instead, the simple algorithm calls Bcast from the basic component (the component with the highest priority).
mpiexec --host cn2:8,cn3:8,cn4:8,cn5:8,cn6:8 --n 40 \
--map-by core --bind-to core --mca pml ucx \
--mca coll_basic_priority 90 \
--mca coll_libnbc_priority 10 \
--mca coll_adapt_priority 0 \
--mca coll_sm_priority 0 \
--mca coll_han_priority 100 \
--mca coll_han_bcast_dynamic_intra_node_module 4 \
--mca coll_han_bcast_dynamic_inter_node_module 4 \
--mca coll_han_use_simple_bcast 1 \
./bcast_test