WARNING: There was an error initializing OpenFabric device --with-verbs #7841

Open
collinmines opened this issue Jun 18, 2020 · 1 comment

@collinmines

Thank you for taking the time to submit an issue!

Background information

This may or may not be an issue, but I'd like to understand what OpenFabrics verbs means in Open MPI terminology.

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

version 4.0.4

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

I installed v4.0.4 from a source tarball, not from a git clone.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: CentOS 7.7 (kernel 3.10.0)
  • Computer hardware: Intel Xeon Sandy Bridge processors
  • Network type: Mellanox 4

Details of the problem

I recently installed Open MPI 4.0.4, built with the GCC 7 compilers. I enabled UCX (version 1.8.0) support with "--with-ucx" in the ./configure step and, at the same time, turned on the "--with-verbs" option. At runtime it complained "WARNING: There was an error initializing OpenFabric device", but the run did not crash and I still got correct results.

I know that the same problem was reported in issue #6517. @yosefe pointed out that "These error message are printed by openib BTL which is deprecated." Instead of "--with-verbs", we need "--without-verbs". Indeed, that solved my problem: after recompiling with "--without-verbs", the error disappeared. Here, I'd like to understand more about "--with-verbs" and "--without-verbs". What does "verbs" really mean here? Please elaborate as much as you can. If we use "--without-verbs", is data transfer still guaranteed to go through InfiniBand (and not Ethernet)? The GitHub documentation says "UCX currently support - OpenFabric verbs (including Infiniband and RoCE)", which left me a bit confused about configuring with "--with-ucx" and "--without-verbs" at the same time.
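
For reference, the rebuild described above might look roughly like the sketch below. The install prefix and the UCX location are hypothetical placeholders, not values from my system, and the `RUN` dry-run switch is only a convenience of this sketch: by default the command is printed rather than executed.

```shell
# Dry-run by default: the command is printed, not executed.
# Set RUN= (empty) inside an Open MPI 4.0.x source tree to really run it.
RUN=${RUN-echo}

# Hypothetical locations; adjust to your system.
PREFIX=${PREFIX:-$HOME/opt/openmpi-4.0.4}
UCX_DIR=${UCX_DIR:-/usr}

# Build with UCX support and leave out the verbs-based openib BTL.
CONFIGURE_ARGS="--prefix=$PREFIX --with-ucx=$UCX_DIR --without-verbs"
$RUN ./configure $CONFIGURE_ARGS
```

After `make` and `make install`, `ompi_info` lists the components that were actually built, so it should be possible to check there that UCX is present and the openib BTL is not.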

Thanks,
Collin

@devreal
Contributor

devreal commented Jul 17, 2020

@collinmines Let me try to answer your question from what I picked up over the last year or so: the verbs integration in Open MPI is essentially unmaintained and will no longer be included in Open MPI 5.0. It is still in the 4.0.x releases, but I found that it fails to work with newer IB devices (giving the error you are observing). The recommended way of using InfiniBand with Open MPI is through UCX, which is supported and developed by Mellanox. If you configure Open MPI with --with-ucx --without-verbs, you are telling Open MPI to ignore its internal support for libibverbs and use UCX instead. This does not affect how UCX works and should not affect performance.
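
If you want to check at runtime that traffic really goes through UCX, something along these lines may help. It is only a sketch: the application name and the device name `mlx5_0:1` are hypothetical and need to be replaced with whatever `ucx_info -d` reports on your nodes, and the `RUN` switch just makes the snippet print the commands instead of running them by default.

```shell
# Dry-run by default: commands are printed, not executed.
# Set RUN= (empty) on a node with Open MPI and UCX installed to really run them.
RUN=${RUN-echo}

# Hypothetical names; adjust to your system.
APP=${APP:-./my_mpi_app}
IB_DEVICE=${IB_DEVICE:-mlx5_0:1}

# 1. List the devices and transports that UCX itself detects.
$RUN ucx_info -d

# 2. Request the UCX PML explicitly, exclude the openib BTL,
#    and restrict UCX to a single device/port.
MPIRUN_CMD="mpirun -np 2 --mca pml ucx --mca btl ^openib -x UCX_NET_DEVICES=$IB_DEVICE $APP"
$RUN $MPIRUN_CMD
```

Because the UCX PML is requested explicitly, the job should abort rather than silently fall back to another transport if UCX cannot be used. Whether the selected port carries InfiniBand or RoCE depends on the port's link layer, which `ibv_devinfo` reports.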
