
OMPI 5.0.x branch coll HAN introduces a circular dependency when disqualifying itself #11448

Closed
@wzamazon

Description


Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v5.0.x branch

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Compiled from source with the following configure options:

./configure --prefix=/xxx/openmpi/v5.0.x/install --with-sge --without-verbs --with-libfabric=/opt/amazon/efa --disable-man-pages --with-libevent=external --with-hwloc=external --enable-cuda --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/lib64/stubs --disable-builtin-atomics

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

7f6f8db13b42916b27b690b8a3f9e2757ec1417f 3rd-party/openpmix (v4.2.3-8-g7f6f8db1)
 c7b2c715f92495637c298249deb5493e86864ac8 3rd-party/prrte (v3.0.1rc1-36-gc7b2c715f9)
 237ceff1a8ed996d855d69f372be9aaea44919ea config/oac (237ceff)

Please describe the system on which you are running

  • Operating system/version: Amazon Linux 2
  • Computer hardware: AMD EPYC 7R13
  • Network type: EFA

Details of the problem

Running MPI_Allreduce() with a CUDA-enabled build of Open MPI and 1 process per node leads to a segfault.

To reproduce, compile the OSU Micro-Benchmarks with CUDA support enabled:

./configure --prefix=/openmpi-v5.0.0rc10/install CC=/openmpi/v5.0.x/install/bin/mpicc CXX=/openmpi/v5.0.x/install/bin/mpicxx --with-cuda=/usr/local/cuda --enable-cuda

Then run osu_allreduce with 1 process per node:

/openmpi/v5.0.x/install/bin/mpirun \
        -n 2 --hostfile 2instances \
        --map-by ppr:1:node \
        -x FI_HMEM_CUDA_ENABLE_XFER=1 \
        -x PATH \
        /omb/openmpi-v5.0.0rc10/install/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce
