Background information
We have observed a hang in mpirun after the application completes since at least August 2023 (and likely earlier). The issue occurs in roughly 5-10% of runs and is reliably reproducible.
The issue does not happen with the Open MPI 4.1.x branch.
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
Main and v5.0.x
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from source.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status

Tried the built-in submodule pointers and the prrte master branch.
Please describe the system on which you are running

Network type: single node, shared memory (--mca pml ob1)

Details of the problem
We can reproduce the issue with a few collective benchmarks from the OSU micro-benchmarks suite. It is easier to reproduce on a platform with a higher core count; we typically test on 64 cores.
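A representative invocation looks like this (the binary path is an assumption; -np 64 and the ob1 PML match the setup described above):

```shell
# Single node, shared memory, ob1 PML, 64 ranks.
# The path to osu_reduce is illustrative.
mpirun -np 64 --mca pml ob1 ./osu_reduce
```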
We observe that the application, i.e. osu_reduce, completes normally, and we verified that MPI_Finalize is called successfully by all participating processes. However, the mpirun command gets stuck at the end of the benchmark.
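Because the hang is intermittent, repeated runs in a loop are the easiest way to catch it; the sketch below is illustrative only (the iteration count, 300-second timeout, and binary path are assumptions, not our actual test harness):

```shell
# Illustrative loop: with a 5-10% failure rate, a few dozen iterations
# usually hit the hang. The timeout is only a guard so a hung mpirun
# does not block the loop indefinitely.
for i in $(seq 1 50); do
    timeout 300 mpirun -np 64 --mca pml ob1 ./osu_reduce > /dev/null \
        || echo "iteration $i: mpirun did not exit cleanly (possible hang)"
done
```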
References

For more details, see openpmix/prrte#1839.