Newer versions of OpenMPI are unable to locate CUDA support. #12264
As @hppritcha pointed out, this is indeed documented at https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html.
@tmh97 Per the Webex today, could you provide a little more info? E.g.:
@jsquyres It seems Alternatively,
Do we know that that is correct?
Given that the docs were specifically written that way, is it correct to assume that there is a reason Alternatively, @edgargabriel stated today on the call that configuring @edgargabriel Can you confirm that this is correct / what is currently happening on
On Ubuntu 20.04, I need:
Note that I seem to need to specify paths for both cuda and cuda-libdir. Adding a path for libdir alone was not enough.
Yes, having to specify both
I went back to the cluster with the Lustre file system, and I can see clearly in bash_history that for a while I configured Open MPI with the However, as of right now, it looks like I don't need to set the
Ok, so then this question really is just about
The stubs point to a libcuda.so that allows linking CUDA applications using the driver API (such as OMPI) on platforms without GPUs. This is different from what other libraries require, but there are valid reasons. I'll vote for automatically checking for the stubs in
Cool. Can someone in NVIDIA look into this? Hint, hint. 😄
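As a rough illustration of the stub lookup being discussed, the shell sketch below probes a CUDA install root for a stubs directory that contains libcuda.so and prints matching configure arguments. The helper name cuda_configure_args and the candidate subdirectories lib64/stubs and lib/stubs are assumptions made for this example; this is not how Open MPI's configure script performs the check.

```shell
# Hypothetical helper: print configure arguments for a given CUDA root.
# The candidate stub locations below are assumptions for illustration.
cuda_configure_args() {
  cuda_root="${1:-/usr/local/cuda}"
  args="--with-cuda=${cuda_root}"
  for dir in "${cuda_root}/lib64/stubs" "${cuda_root}/lib/stubs"; do
    # Only add the libdir flag when a stub libcuda.so is actually present.
    if [ -e "${dir}/libcuda.so" ]; then
      args="${args} --with-cuda-libdir=${dir}"
      break
    fi
  done
  printf '%s\n' "${args}"
}
```

One possible use would be `./configure $(cuda_configure_args /usr/local/cuda)`, which passes only --with-cuda when no stub library is found.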
Finding CUDA libraries without having to specify both --with-cuda and --with-cuda-lib was requested in github issue open-mpi#12264 Signed-off-by: Nick Sarkauskas <[email protected]>
Finding CUDA libraries without having to specify both --with-cuda and --with-cuda-lib was requested in github issue #12264 Signed-off-by: Nick Sarkauskas <[email protected]> (cherry picked from commit cad3d9a)
Fixed with #12382.
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
The bug exists in 5.0.1; I don't know whether it also exists in earlier or later releases.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
This issue exists in both the source tarball and a git clone; I've tested both.
Please describe the system on which you are running
Two-node system
Details of the problem
I used to be able to get CUDA support with OpenMPI by simply providing the
--with-cuda=/usr/local/cuda
option at OMPI configure. Now it seems I also need the --with-cuda-libdir option.
Without this additional flag, it appears as if there is no support for NVIDIA devices:
CUDA support: no
I believe this will cause problems for users when they rebuild OMPI at a newer version and suddenly find that their CUDA support is gone.
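A quick way to check an installed build for the symptom described here is to look at the parsable ompi_info output. The sketch below assumes the mpi_built_with_cuda_support key described in the Open MPI documentation; the helper name built_with_cuda is hypothetical.

```shell
# Hypothetical helper: reads `ompi_info --parsable --all` output on stdin
# and succeeds only if the build reports CUDA support.
built_with_cuda() {
  grep -q 'mpi_built_with_cuda_support:value:true'
}
```

For example, `ompi_info --parsable --all | built_with_cuda && echo "CUDA support: yes"` prints the message only when the installed build was configured with CUDA support.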