
CUDA build: make all fails with undefined references on master and v5.0.x #8656

Closed
@dbonner

Description

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

branch: master
hash: d18d3f6

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

(The job count below is for machine 1, which has 256 threads; machine 2 has 36 threads and machine 3 has 12.)
git clone --recursive https://github.com/open-mpi/ompi.git -j 256
cd ompi
export AUTOMAKE_JOBS=256
./autogen.pl
./configure --disable-picky --prefix=/usr/local --with-cuda=/usr/local/cuda-11.2 --with-ucx=/usr/local/ucx
make -j 256 all
---> ERROR
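
A minimal sketch of the same steps with the job count taken from `nproc` instead of being hard-coded, so that one script covers all three machines. The `JOBS` variable and the `build_ompi` function name are illustrative and not part of Open MPI; the URL, flags and paths are the ones used above.

```shell
#!/bin/sh
# Illustrative only: JOBS and build_ompi are hypothetical names.
# nproc (GNU coreutils) prints the number of available processing units.
JOBS=$(nproc)

# The body runs in a subshell so that `cd` does not affect the caller.
build_ompi() (
    git clone --recursive https://github.com/open-mpi/ompi.git -j "$JOBS" &&
    cd ompi &&
    export AUTOMAKE_JOBS="$JOBS" &&
    ./autogen.pl &&
    ./configure --disable-picky --prefix=/usr/local \
        --with-cuda=/usr/local/cuda-11.2 --with-ucx=/usr/local/ucx &&
    make -j "$JOBS" all
)
```

Calling `build_ompi` runs the sequence and stops at the first failing step.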

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

7145774 3rd-party/openpmix (v1.1.3-2852-g7145774e)
284d15d7b9be51c07ae3a3964b1567fde1a106e2 3rd-party/prrte (dev-31005-g284d15d7b9)

Please describe the system on which you are running

  • Operating system/version:
  • Computer hardware:
  • Network type:

I have tried this on three bare-metal machines, and all three show the same error:

  1. Dual AMD Epyc 7742, 8 x Nvidia A100 40 GB
  2. Intel i9-10980XE, Nvidia 2080 Ti
  3. Intel i7-9750H, Nvidia 2080 Max-Q

All machines are set up with the same software:

Ubuntu 20.10
gcc-10
CUDA 11.2 Update 2
nv_peer_memory built from latest source
gdrcopy built from latest source
ucx built from latest source
mlnx_ofed - latest version

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

Note: If you include verbatim output (or a code block), please use a GitHub Markdown code block like below:

shell$ make -j 256 install
make[2]: Entering directory '/home/daniel/ompi/opal/tools/wrappers'
  CC       opal_wrapper.o
  CCLD     opal_wrapper
/usr/bin/ld: /usr/local/lib/libmca_common_cuda.so.0: undefined reference to `opal_cuda_add_initialization_function'
/usr/bin/ld: ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_cuda_memmove'
/usr/bin/ld: ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_cuda_memcpy'
/usr/bin/ld: ../../../opal/.libs/libopen-pal.so: undefined reference to `mca_cuda_convertor_init'
/usr/bin/ld: ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_cuda_check_bufs'
/usr/bin/ld: ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_cuda_memcpy_sync'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:1443: opal_wrapper] Error 1
make[2]: Leaving directory '/home/daniel/ompi/opal/tools/wrappers'
make[1]: *** [Makefile:1868: all-recursive] Error 1
make[1]: Leaving directory '/home/daniel/ompi/opal'
make: *** [Makefile:1437: all-recursive] Error 1
Command exited with non-zero status 2
104.88user 28.62system 0:50.56elapsed 264%CPU (0avgtext+0avgdata 22904maxresident)k
3608inputs+327112outputs (0major+7489985minor)pagefaults 0swaps
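
To make the linker output above easier to scan, here is a small sketch that reduces each `undefined reference` line to the object that needs the symbol, followed by the symbol itself. `summarize_undefined` and `build.log` are hypothetical names, and the pattern assumes the message format shown in the log above.

```shell
#!/bin/sh
# Illustrative only: summarize_undefined is a hypothetical helper.
# Reads a build log on stdin and prints "<object> <missing symbol>"
# for every "undefined reference" line emitted by ld.
summarize_undefined() {
    sed -n "s/^.*ld: \(.*\): undefined reference to \`\(.*\)'\$/\1 \2/p"
}

# Example usage, assuming the make output was saved to build.log:
#   summarize_undefined < build.log | sort
```

Run on the log above, this separates the one reference that comes from the installed `/usr/local/lib/libmca_common_cuda.so.0` from the five that come from `../../../opal/.libs/libopen-pal.so` in the build tree.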
