I found the issue: I had a missing symbol in the port. It is really puzzling, though, that it would work even with UCX.
I'll open the fix after I've run it through a few more tests.
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.x head
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Source build
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
We are seeing segfaults with this commit: https://github.com/open-mpi/ompi/pull/12781/files#diff-750d0e8be09c5f4ee5f703b8ba2c735a3e1b8b807162936e55530ec721ec5b86
The backtrace is
We also get a segfault with the EFA network, but so far the issue appears to be within the CUDA memory copy.