Closed
Description
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v5.0.x branch
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Compiled from source with the following configure options:
./configure --prefix=/xxx/openmpi/v5.0.x/install --with-sge --without-verbs --with-libfabric=/opt/amazon/efa --disable-man-pages --with-libevent=external --with-hwloc=external --enable-cuda --with-cuda=/usr/local/cuda --with-cuda-libdir=/usr/local/cuda/lib64/stubs --disable-builtin-atomics
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
7f6f8db13b42916b27b690b8a3f9e2757ec1417f 3rd-party/openpmix (v4.2.3-8-g7f6f8db1)
c7b2c715f92495637c298249deb5493e86864ac8 3rd-party/prrte (v3.0.1rc1-36-gc7b2c715f9)
237ceff1a8ed996d855d69f372be9aaea44919ea config/oac (237ceff)
Please describe the system on which you are running
- Operating system/version: Amazon Linux 2
- Computer hardware: AMD EPYC 7R13
- Network type: EFA
Details of the problem
Running MPI_Allreduce() with a CUDA build of Open MPI and 1 process per node leads to a segfault.
To reproduce, compile the OSU Micro Benchmarks with CUDA support enabled:
./configure --prefix=/openmpi-v5.0.0rc10/install CC=/openmpi/v5.0.x/install/bin/mpicc CXX=/openmpi/v5.0.x/install/bin/mpicxx --with-cuda=/usr/local/cuda --enable-cuda
Then run osu_allreduce using 1 process per node:
/openmpi/v5.0.x/install/bin/mpirun \
-n 2 --hostfile 2instances \
--map-by ppr:1:node \
-x FI_HMEM_CUDA_ENABLE_XFER=1 \
-x PATH \
/omb/openmpi-v5.0.0rc10/install/libexec/osu-micro-benchmarks/mpi/collective/osu_allreduce
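When reproducing this, it may help to first confirm that the Open MPI install in use really was built with CUDA support, since the crash is reported against the CUDA build. The sketch below assumes the parsable `ompi_info` output contains a `mpi_built_with_cuda_support:value:true` line for CUDA-enabled builds; the helper name `has_cuda_support` is hypothetical and not part of Open MPI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): reads the output of
# `ompi_info --parsable --all` on stdin and succeeds only if the build
# reports CUDA support.
has_cuda_support() {
    grep -q 'mpi_built_with_cuda_support:value:true'
}

# Example use, assuming ompi_info from the install under test is on PATH:
#   ompi_info --parsable --all | has_cuda_support && echo "CUDA-aware build"
```

If the check fails, the install under test is probably not the CUDA build, and the run may not exercise the code path described here.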