I recently installed Open MPI 4.0.4, built with the GCC 7 compilers. I enabled UCX (version 1.8.0) support with "--with-ucx" in the ./configure step, and I also turned on the "--with-verbs" option. At runtime it complained "WARNING: There was an error initializing an OpenFabrics device.", but the run still finished with correct results instead of crashing.
I know that the same issue was reported in issue #6517, where @yosefe pointed out that "These error message are printed by openib BTL which is deprecated." Instead of "--with-verbs", we need "--without-verbs". Indeed, that solved my problem: after recompiling with "--without-verbs", the warning disappeared. Here, I'd like to understand more about "--with-verbs" and "--without-verbs". What does "verbs" really mean here? Please elaborate as much as you can. If we use "--without-verbs", can we still be sure that data transfer goes through InfiniBand (and not Ethernet)? Our GitHub documentation says "UCX currently support - OpenFabric verbs (including Infiniband and RoCE)". That confused me a bit, given that we configure with "--with-ucx" and "--without-verbs" at the same time.
Thanks,
Collin
@collinmines Let me try to answer your question from what I picked up over the last year or so: the verbs integration in Open MPI is essentially unmaintained and will no longer be included in Open MPI 5.0. It is still in the 4.0.x releases, but I found that it fails to work with newer IB devices (giving the error you are observing). The recommended way of using InfiniBand with Open MPI is through UCX, which is supported and developed by Mellanox. If you configure Open MPI with --with-ucx --without-verbs, you are telling Open MPI to ignore its internal support for libibverbs and use UCX instead. This does not affect how UCX works and should not affect performance.
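To make the suggestion above concrete, here is a minimal dry-run sketch of the build and launch commands. It only prints the commands rather than executing them. The install prefixes and the program name ./my_app are hypothetical placeholders, and requesting the UCX PML explicitly with "--mca pml ucx" is an addition of this sketch, not something stated earlier in the thread.

```shell
#!/bin/sh
# Dry-run sketch: the commands are only printed, never executed.
# Both prefixes are hypothetical placeholders; substitute your own paths.
UCX_PREFIX=/opt/ucx-1.8.0
OMPI_PREFIX=/opt/openmpi-4.0.4

# Build step: enable UCX and leave out Open MPI's own verbs support
# (the openib BTL that prints the OpenFabrics warning).
CONFIGURE_CMD="./configure --prefix=$OMPI_PREFIX --with-ucx=$UCX_PREFIX --without-verbs"

# Run step: ask for the UCX PML by name, so a build without working UCX
# should report an error instead of quietly picking another transport.
# ./my_app is a hypothetical MPI program.
RUN_CMD="mpirun --mca pml ucx -np 2 ./my_app"

echo "$CONFIGURE_CMD"
echo "$RUN_CMD"
```

After a real build, running "ompi_info" and searching its output for "ucx" is one way to check that the UCX components were compiled in. For an installation that was already built with verbs support, excluding the component at run time with "--mca btl ^openib" is often suggested as an alternative to recompiling, though I have not verified it on this particular setup.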
Background information
This may or may not be an issue, but I'd like to know more details about OpenFabrics verbs in terms of Open MPI terminology.
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
version 4.0.4
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
I installed v4.0.4 from a source tarball, not from a git clone.
If you are building/installing from a git clone, please copy-n-paste the output from "git submodule status".
Please describe the system on which you are running