Unable to build ompi with cuda support since commit e1df5de because of 'multiple definitions' #8772

Closed
dbonner opened this issue Apr 6, 2021 · 1 comment

dbonner commented Apr 6, 2021

Thank you for taking the time to submit an issue!

Background information

I have been unable to build ompi from source since commit e1df5de (the first commit on 26 March 2021).
I can build successfully from the earlier commit cbcb570.

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

I am trying to build from git branch master.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

The following is from the first failing commit. The build fails in the same way at every commit up to the latest.
7826fb8 3rd-party/openpmix (v1.1.3-2884-g7826fb8b)
b97c686b81e82f18e9b15ef1480cc9aca1331904 3rd-party/prrte (dev-31045-gb97c686b81)

Please describe the system on which you are running

  • Operating system/version:
    Ubuntu 20.10
    CUDA 11.2.2-1
  • Computer hardware:
    I get this error on all 3 of my machines (which run the same OS and software):
    Inspur Dual AMD 7742 8xNVIDIA A100 GPUs
    Intel i9-10980XE 1xNVIDIA 2080Ti
    Intel i7-9750H 1xNVIDIA 2080 Max Q
  • Network type:
    TCP (mellanox driver installed)

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

This is part of a script where $PW is my system password:

The commands I use to build ompi are:
export AUTOMAKE_JOBS=256
\time ./autogen.pl 2>&1 | tee my-autogen-pl-log-file.txt
\time ./configure --disable-picky --prefix=/usr/local --with-cuda=/usr/local/cuda-11.2 --with-ucx=/usr/local/ucx --with-pmix=internal --with-prrte=internal --with-hwloc=internal --with-libevent=internal 2>&1 | tee my-configure-log-file.txt
\time make -j 256 all 2>&1 | tee my-make-all-log-file.txt
echo $PW | sudo -S --user=root --login bash -c 'cd /home/daniel/localgpu/ompi; AUTOMAKE_JOBS=256 \time make --directory=/home/daniel/localgpu/ompi -j 256 install 2>&1 | tee my-sudo-s-make-install-log-file.txt && chown daniel:daniel my-sudo-s-make-install-log-file.txt'
echo $PW | sudo -S --user=root --login ldconfig

The error occurs at the end of the 'make all' command. The output is:

make[2]: Entering directory '/home/daniel/localgpu/ompi/opal'
  CC       class/opal_cstring.lo
  CC       class/opal_bitmap.lo
  CC       class/opal_list.lo
  CC       class/opal_hash_table.lo
  CC       class/opal_free_list.lo
  CC       class/opal_lifo.lo
  CC       class/opal_graph.lo
  CC       memoryhooks/memory.lo
  CC       class/opal_pointer_array.lo
  CC       class/opal_hotel.lo
  CC       class/opal_ring_buffer.lo
  CC       class/opal_object.lo
  CC       runtime/opal_finalize.lo
  CC       class/opal_value_array.lo
  CC       runtime/opal_progress.lo
  CC       class/opal_interval_tree.lo
  CC       class/opal_fifo.lo
  CC       class/opal_rb_tree.lo
  CC       runtime/opal_params.lo
  CC       runtime/opal_progress_threads.lo
  CC       runtime/opal_init.lo
  CC       runtime/opal_info_support.lo
  CCLD     libopen-pal.la
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_free':common_cuda.c:(.text+0x1e0): multiple definition of `mca_common_cuda_free'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1e0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_memcpy':common_cuda.c:(.text+0x230): multiple definition of `opal_cuda_memcpy'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x230): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x0): multiple definition of `opal_cuda_verbose'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_malloc':common_cuda.c:(.text+0x560): multiple definition of `mca_common_cuda_malloc'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x560): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x40): multiple definition of `mca_common_cuda_enabled'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x40): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.data+0x4): multiple definition of `cuda_event_max'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.data+0x4): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x38): multiple definition of `cuda_event_ipc_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x38): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x20): multiple definition of `cuda_event_ipc_frag_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x20): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x30): multiple definition of `cuda_event_dtoh_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x30): first defined here 
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x18): multiple definition of `cuda_event_dtoh_frag_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x18): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x28): multiple definition of `cuda_event_htod_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x28): first defined here 
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x10): multiple definition of `cuda_event_htod_frag_array'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x10): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_register_mca_variables':common_cuda.c:(.text+0xff0): multiple definition of `mca_common_cuda_register_mca_variables'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0xff0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_fini':common_cuda.c:(.text+0x1200): multiple definition of `mca_common_cuda_fini'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1200): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_construct_event_and_handle':common_cuda.c:(.text+0x1e90): multiple definition of `mca_common_cuda_construct_event_and_handle'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1e90): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_destruct_event':common_cuda.c:(.text+0x1f40): multiple definition of `mca_common_cuda_destruct_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1f40): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_wait_stream_synchronize':common_cuda.c:(.text+0x1fb0): multiple definition of `mca_common_wait_stream_synchronize'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1fb0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_memcpy':common_cuda.c:(.text+0x1fc0): multiple definition of `mca_common_cuda_memcpy'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1fc0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_record_dtoh_event':common_cuda.c:(.text+0x24c0): multiple definition of `mca_common_cuda_record_dtoh_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x24c0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_record_htod_event':common_cuda.c:(.text+0x26a0): multiple definition of `mca_common_cuda_record_htod_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x26a0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_get_dtoh_stream':common_cuda.c:(.text+0x2880): multiple definition of `mca_common_cuda_get_dtoh_stream'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2880): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_get_htod_stream':common_cuda.c:(.text+0x2890): multiple definition of `mca_common_cuda_get_htod_stream'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2890): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `progress_one_cuda_ipc_event':common_cuda.c:(.text+0x28a0): multiple definition of `progress_one_cuda_ipc_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x28a0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `progress_one_cuda_dtoh_event':common_cuda.c:(.text+0x2aa0): multiple definition of `progress_one_cuda_dtoh_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2aa0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `progress_one_cuda_htod_event':common_cuda.c:(.text+0x2cc0): multiple definition of `progress_one_cuda_htod_event'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2cc0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_memhandle_matches':common_cuda.c:(.text+0x2ee0): multiple definition of `mca_common_cuda_memhandle_matches'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2ee0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_get_device':common_cuda.c:(.text+0x2f80): multiple definition of `mca_common_cuda_get_device'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x2f80): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_device_can_access_peer':common_cuda.c:(.text+0x3000): multiple definition of `mca_common_cuda_device_can_access_peer'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3000): first defined here  
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_get_address_range':common_cuda.c:(.text+0x3040): multiple definition of `mca_common_cuda_get_address_range'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3040): first defined here       
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_previously_freed_memory':common_cuda.c:(.text+0x3110): multiple definition of `mca_common_cuda_previously_freed_memory'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3110): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_get_buffer_id':common_cuda.c:(.text+0x3210): multiple definition of `mca_common_cuda_get_buffer_id'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3210): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_add_initialization_function':common_cuda.c:(.text+0x3310): multiple definition of `opal_cuda_add_initialization_function'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3310): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_stage_one_init':common_cuda.c:(.text+0x3320): multiple definition of `mca_common_cuda_stage_one_init'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x3320): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_cuda_convertor_init':common_cuda.c:(.text+0x4530): multiple definition of `mca_cuda_convertor_init'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4530): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_check_bufs':common_cuda.c:(.text+0x4590): multiple definition of `opal_cuda_check_bufs'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4590): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_check_one_buf':common_cuda.c:(.text+0x4600): multiple definition of `opal_cuda_check_one_buf'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4600): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_malloc':common_cuda.c:(.text+0x4650): multiple definition of `opal_cuda_malloc'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4650): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_free':common_cuda.c:(.text+0x46a0): multiple definition of `opal_cuda_free'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x46a0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_memcpy_sync':common_cuda.c:(.text+0x46e0): multiple definition of `opal_cuda_memcpy_sync'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x46e0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_memmove':common_cuda.c:(.text+0x4730): multiple definition of `opal_cuda_memmove'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4730): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `opal_cuda_set_copy_function_async':common_cuda.c:(.text+0x4780): multiple definition of `opal_cuda_set_copy_function_async'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x4780): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:1738: libopen-pal.la] Error 1
make[2]: Leaving directory '/home/daniel/localgpu/ompi/opal'
make[1]: *** [Makefile:1866: all-recursive] Error 1
make[1]: Leaving directory '/home/daniel/localgpu/ompi/opal'
make: *** [Makefile:1435: all-recursive] Error 1
Command exited with non-zero status 2

I have also attached my logs.
my-sudo-s-make-install-log-file.txt
my-make-all-log-file.txt
my-configure-log-file.txt
my-autogen-pl-log-file.txt
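
One detail worth noting in the linker output above: on every line, the "first defined here" location names the same archive member (libmca_common_cuda_noinst.a(common_cuda.o)) as the duplicate. That would be consistent with the same archive being pulled into the libopen-pal.la link twice, rather than two different objects defining the same symbols, though the log alone does not confirm the cause. The following is a minimal sketch for checking this across the full log; `LD_DUP` and `summarize_duplicates` are hypothetical helper names, and the regular expression only assumes the GNU ld message shape seen above.

```python
import re

# Hypothetical helper: matches GNU ld "multiple definition" diagnostics
# shaped like the ones in the make output above.
LD_DUP = re.compile(
    r"ld: (?P<dup>[^\s:]+\([^)]+\)):.*?"
    r"multiple definition of `(?P<sym>[^']+)'; "
    r"(?P<first>[^\s:]+\([^)]+\)):"
)


def summarize_duplicates(log_text):
    """Split duplicated symbols by whether both definitions come from
    the same archive member or from two different ones."""
    same, different = [], []
    for line in log_text.splitlines():
        match = LD_DUP.search(line)
        if match is None:
            continue  # not a "multiple definition" line
        bucket = same if match["dup"] == match["first"] else different
        bucket.append(match["sym"])
    return same, different


# Two lines copied from the make output above.
SAMPLE_LOG = """\
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o): in function `mca_common_cuda_free':common_cuda.c:(.text+0x1e0): multiple definition of `mca_common_cuda_free'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):common_cuda.c:(.text+0x1e0): first defined here
/usr/bin/ld: mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x0): multiple definition of `opal_cuda_verbose'; mca/common/cuda/.libs/libmca_common_cuda_noinst.a(common_cuda.o):(.bss+0x0): first defined here
"""

if __name__ == "__main__":
    same, different = summarize_duplicates(SAMPLE_LOG)
    print(f"same archive member: {len(same)}, different members: {len(different)}")
```

To run it against the whole build log, pass the contents of my-make-all-log-file.txt to `summarize_duplicates` instead of `SAMPLE_LOG`. If every symbol lands in the first list, the duplicate-archive explanation becomes more plausible.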

bosilca commented Apr 6, 2021

Duplicate of #8764
