Expand CUDA support and fix documentation to account for all CUDA-dependent components (#12279)

## Background information

### What version of Open MPI are you using?

v5.0.1

### Describe how Open MPI was installed

Open MPI was installed from the GitHub release tarball. Configuration was done using this command line:
```
../configure \
        --prefix="${prefix_dir}" \
        --without-psm2 \
        --without-ofi \
        --with-lustre \
        --with-slurm \
        --with-pmix \
        --with-ucx="${UCX_DIR}" \
        --with-cuda="${CUDA_ROOT}" \
        --with-cuda-libdir="${CUDA_ROOT}/lib64/stubs" \
        --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator-cuda,coll-cuda
```

Note that I added `coll-cuda` to the list of MCA DSO components. I'm not sure whether it is intentionally missing from the list [in the documentation](https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html#how-do-i-build-open-mpi-with-cuda-aware-support). I also tried without `coll-cuda` first, with the same outcome.

CUDA Toolkit version 12.3 was installed in `CUDA_ROOT`. UCX was built against that CUDA toolkit. On cluster nodes with the drivers installed, `ucx_info -d` reports the relevant CUDA and gdrcopy transports. 

Remark: The host used for compilation has the CUDA toolkit and runtime installed, but not the driver, so linking against the `stubs` directory appears to be the way to go in that case (see #12264).

### Please describe the system on which you are running

* Operating system/version: Rocky Linux 8.8
* Computer hardware: Intel Xeon 
* Network type: InfiniBand

-----------------------------

## Details of the problem

With Open MPI 4.1.4, I was able to build it such that applications could be compiled and run without the CUDA toolkit, runtime, or drivers being available on the node in use. However, with 5.0.1 configured as shown above, the linker warns about a missing libcudart when building any binary, even a basic `MPI_Init`/`MPI_Finalize` program:
```C
#include <stdio.h>
#include <stdlib.h>

#include "mpi.h"

int main(int argc, char* argv[])
{
        MPI_Init(&argc, &argv);
        MPI_Finalize();

        return EXIT_SUCCESS;
}
```

```
$ mpicc -show hw.c -o hw
gcc hw.c -o hw -I/path/to/openmpi/include -pthread -L/path/to/openmpi/lib -Wl,-rpath -Wl,/path/to/openmpi/lib -Wl,--enable-new-dtags -lmpi
$ mpicc hw.c -o hw
/usr/bin/ld: warning: libcudart.so.12, needed by /path/to/openmpi/lib/libmpi.so, not found (try using -rpath or -rpath-link)
$ mpirun -n1 ./hw
./hw: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
$ ldd hw
        linux-vdso.so.1 (0x00007ffc747da000)
        libmpi.so.40 => /path/to/openmpi/lib/libmpi.so.40 (0x000014ae23df9000)
        [...]
        libcudart.so.12 => not found
```

With 4.1.4 I am able to compile and launch without those warnings/errors while still having a CUDA-aware MPI. In 4.1.4, libmpi did not depend on libcudart, even though 4.1.4 was also configured with `--with-cuda=...`.

If I understood the SC'23 BoF slides correctly, Open MPI 5.x intends to integrate (link?) plugins directly into libmpi. However, with the `--enable-mca-dso` configure option I tried to build all CUDA-related components as DSOs and thus keep them out of libmpi. Nevertheless, libmpi has libcudart as a shared library dependency (see above). I also checked which symbols libmpi needs, and it does not appear to require any symbols from libcudart:

```
$ nm -D /path/to/openmpi/lib/libmpi.so.40 | grep -i cuda
000000000029cdb0 T mca_pml_ob1_rdma_cuda_btls
00000000002c7e20 T MPIX_Query_cuda_support
                 U opal_built_with_cuda_support
                 U opal_cuda_support
```

So it appears to me that libmpi depends on libcudart unnecessarily. Is there a bug in the configure/build process, or is it no longer possible to build the Open MPI libraries such that applications can be compiled without the CUDA runtime libraries being available? Given libmpi's dependency on libcudart, the statement from the documentation

> Open MPI supports building with CUDA libraries and running on systems without CUDA libraries or hardware.

does not appear to apply here. Or is there something wrong on my side?

Btw: the [test program from the documentation](https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/cuda.html#can-i-tell-at-compile-time-or-runtime-whether-i-have-cuda-aware-support) may also deserve a call to `MPI_Init` when following the DSO approach. Without it, the run-time check reports that there is no CUDA support (using OMPI v5.0.1 with CUDA toolkit 12.3 available for compilation and execution):

```
$ ./check  # with MPI_Init
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.
$ ./check-no-init # without MPI_Init
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.
```
