Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
3.1.2, tarball available here: https://www.open-mpi.org/software/ompi/v3.1/
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
From a tarball, configured with:
Configure command line: '--enable-mpirun-prefix-by-default' '--enable-debug' '--enable-mem-debug' '--enable-mpi-fortran=no'
Please describe the system on which you are running
- Operating system/version: CentOS Linux release 7.5.1804 (Core)
- Computer hardware: Haswell
- Network type: Infiniband controller: Mellanox Technologies MT27600 [Connect-IB]
- C compiler family name: GNU
- C compiler version: 4.8.5
Details of the problem
I'm running the OSU Micro-Benchmarks 5.3 intra-node accumulate latency test (osu_acc_latency); it works fine with Open MPI 3.1.0 but fails with 3.1.2.
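For context, the communication pattern the benchmark exercises looks roughly like the sketch below. This is my own minimal approximation based on the backtraces (MPI_Win_allocate window, passive-target shared lock on the peer, MPI_Accumulate followed by MPI_Win_flush), not the actual OSU source; buffer sizes and the timing loop are simplified.

```c
/* Minimal sketch (my approximation, not the OSU source): rank 0
 * accumulates into a window allocated by MPI_Win_allocate on rank 1
 * and synchronizes with MPI_Win_flush, mirroring the backtraces below. */
#include <mpi.h>

#define COUNT 4096

int main(int argc, char **argv)
{
    int     rank, i;
    double *base;                 /* window memory from MPI_Win_allocate */
    double  src[COUNT];           /* origin buffer */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with exactly 2 ranks */

    for (i = 0; i < COUNT; i++)
        src[i] = 1.0;

    /* Window creation: MPI_Win_allocate, as reported by the benchmark */
    MPI_Win_allocate(COUNT * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);

    if (rank == 0) {
        /* Passive-target epoch: the osc/ucx hang shows up inside
         * MPI_Win_lock, the default-path segfault inside MPI_Accumulate. */
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        MPI_Accumulate(src, COUNT, MPI_DOUBLE, 1, 0, COUNT, MPI_DOUBLE,
                       MPI_SUM, win);
        MPI_Win_flush(1, win);
        MPI_Win_unlock(1, win);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Running the actual benchmark with the default settings gives: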
[akvenkatesh@hsw225 build]$ mpirun -np 2 ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency
# OSU MPI_Accumulate latency Test v5.3
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size Latency (us)
0 0.23
[hsw225:63377:0:63377] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace ====
0 /home/akvenkatesh/ucx/build/lib/libucs.so.0(+0x232e9) [0x2b7dbd4922e9]
1 /home/akvenkatesh/ucx/build/lib/libucs.so.0(+0x2342f) [0x2b7dbd49242f]
2 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_btl_openib.so(mca_btl_openib_get+0x144) [0x2b7dbb251a05]
3 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_osc_rdma.so(ompi_osc_get_data_blocking+0x2a8) [0x2b7dc6b6ba11]
4 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_osc_rdma.so(+0xe851) [0x2b7dc6b73851]
5 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_osc_rdma.so(+0x10575) [0x2b7dc6b75575]
6 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_accumulate+0x14a) [0x2b7dc6b75f08]
7 /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(PMPI_Accumulate+0x438) [0x2b7da9cb5fa4]
8 ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency() [0x401cc0]
9 ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency() [0x4018d8]
10 /usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2b7daa1c6445]
11 ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency() [0x401479]
===================
[hsw225:63377:0:63377] Process frozen...
If I use --mca osc ucx explicitly, I see a hang:
[akvenkatesh@hsw225 build]$ mpirun -np 2 --mca osc ucx ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency
# OSU MPI_Accumulate latency Test v5.3
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size Latency (us)
These are the backtraces from the two ranks:
(gdb) bt
#0 0x00002ab3c3587976 in ucs_callbackq_leave (cbq=0x1b47290) at ../../../src/ucs/datastruct/callbackq.c:64
#1 0x00002ab3c3588b16 in ucs_callbackq_slow_proxy (arg=0x1b47290) at ../../../src/ucs/datastruct/callbackq.c:391
#2 0x00002ab3c2d42b9c in ucs_callbackq_dispatch (cbq=0x1b47290)
at /home/akvenkatesh/ucx/build/../src/ucs/datastruct/callbackq.h:208
#3 0x00002ab3c2d47d70 in uct_worker_progress (worker=0x1b47290) at /home/akvenkatesh/ucx/build/../src/uct/api/uct.h:1644
#4 ucp_worker_progress (worker=0x1b5ef80) at ../../../src/ucp/core/ucp_worker.c:1381
#5 0x00002ab3c2d4bd0e in ucp_rma_wait (worker=0x1b5ef80, user_req=0x312a8e0, op_name=0x2ab3c2dbde9a "atomic_fadd64")
at ../../../src/ucp/rma/rma.inl:49
#6 0x00002ab3c2d4e885 in ucp_atomic_fetch_b (ep=0x2ab3e2c0f0e0, opcode=UCP_ATOMIC_FETCH_OP_FADD, value=1, result=0x7ffde77b99f8,
size=8, remote_addr=53173648, rkey=0x3121b40, op_name=0x2ab3c2dbde9a "atomic_fadd64") at ../../../src/ucp/rma/amo_basic.c:263
#7 0x00002ab3c2d4ec55 in ucp_atomic_fadd64_inner (result=0x7ffde77b99f8, rkey=0x3121b40, remote_addr=53173648, add=1,
ep=0x2ab3e2c0f0e0) at ../../../src/ucp/rma/amo_basic.c:292
#8 ucp_atomic_fadd64 (ep=0x2ab3e2c0f0e0, add=1, remote_addr=53173648, rkey=0x3121b40, result=0x7ffde77b99f8)
at ../../../src/ucp/rma/amo_basic.c:288
#9 0x00002ab3cc634ece in start_shared (module=0x1ef6980, target=1) at ../../../../../ompi/mca/osc/ucx/osc_ucx_passive_target.c:28
#10 0x00002ab3cc63545f in ompi_osc_ucx_lock (lock_type=2, target=1, assert=0, win=0x1b38310)
at ../../../../../ompi/mca/osc/ucx/osc_ucx_passive_target.c:145
#11 0x00002ab3afd6b639 in PMPI_Win_lock (lock_type=2, rank=1, assert=0, win=0x1b38310) at pwin_lock.c:66
#12 0x0000000000401bf6 in run_acc_with_flush (rank=0, type=WIN_ALLOCATE) at ../../../mpi/one-sided/osu_acc_latency.c:207
#13 0x00000000004018d8 in main (argc=1, argv=0x7ffde77b9c88) at ../../../mpi/one-sided/osu_acc_latency.c:128
(gdb) bt
#0 progress_callback () at ../../../../../ompi/mca/osc/ucx/osc_ucx_component.c:107
#1 0x00002b66e1b65562 in opal_progress () at ../../opal/runtime/opal_progress.c:228
#2 0x00002b66e0edc326 in ompi_request_wait_completion (req=0x32cdde0) at ../../ompi/request/request.h:413
#3 0x00002b66e0edc360 in ompi_request_default_wait (req_ptr=0x7fffa3250a80, status=0x7fffa3250a60)
at ../../ompi/request/req_wait.c:42
#4 0x00002b66e0f82efd in ompi_coll_base_sendrecv_zero (dest=0, stag=-16, source=0, rtag=-16, comm=0x607800 <ompi_mpi_comm_world>)
at ../../../../ompi/mca/coll/base/coll_base_barrier.c:64
#5 0x00002b66e0f835b8 in ompi_coll_base_barrier_intra_two_procs (comm=0x607800 <ompi_mpi_comm_world>, module=0x2f1ad30)
at ../../../../ompi/mca/coll/base/coll_base_barrier.c:300
#6 0x00002b66fd4f7a3d in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x607800 <ompi_mpi_comm_world>, module=0x2f1ad30)
at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:196
#7 0x00002b66e0efdf5e in PMPI_Barrier (comm=0x607800 <ompi_mpi_comm_world>) at pbarrier.c:63
#8 0x0000000000401e2d in run_acc_with_flush (rank=1, type=WIN_ALLOCATE) at ../../../mpi/one-sided/osu_acc_latency.c:219
#9 0x00000000004018d8 in main (argc=1, argv=0x7fffa3250cc8) at ../../../mpi/one-sided/osu_acc_latency.c:128
If I disable the openib BTL, the intra-node case works and I get this:
[akvenkatesh@hsw225 build]$ mpirun -np 2 --mca btl ^openib ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency
# OSU MPI_Accumulate latency Test v5.3
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size Latency (us)
0 0.11
1 0.11
2 0.11
4 0.12
8 0.12
16 0.14
32 0.17
64 0.25
128 0.39
256 0.65
512 1.18
1024 2.19
2048 4.29
4096 8.36
8192 16.63
16384 34.97
32768 68.19
65536 133.99
131072 266.68
262144 533.38
524288 1057.40
1048576 2119.27
2097152 4233.57
4194304 8448.53
The same doesn't work for the inter-node case; it fails with the following non-blocking allreduce error stemming from MPI_Win_allocate:
[akvenkatesh@hsw225 build]$ mpirun -np 2 --hostfile $PWD/hostfile ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency
[hsw224:37186] ../../../../../ompi/mca/pml/ucx/pml_ucx.c:296 Error: ucp_ep_create(proc=0) failed: Destination is unreachable
[hsw224:37186] ../../../../../ompi/mca/pml/ucx/pml_ucx.c:362 Error: Failed to resolve UCX endpoint for rank 0
Error in MPI_Isend(22728420, 1, 0x2b4d32c1dfe0, 0, -26, 6322176) (-1)
osu_acc_latency: ../../../../../ompi/mca/coll/libnbc/nbc_iallreduce.c:185: ompi_coll_libnbc_iallreduce: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (schedule))->obj_magic_id' failed.
[hsw224:37186] *** Process received signal ***
[hsw224:37186] Signal: Aborted (6)
[hsw224:37186] Signal code: (-6)
[hsw224:37186] [ 0] /lib64/libpthread.so.0(+0xf6d0)[0x2b4d32c4d6d0]
[hsw224:37186] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b4d32e90277]
[hsw224:37186] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b4d32e91968]
[hsw224:37186] [ 3] /lib64/libc.so.6(+0x2f096)[0x2b4d32e89096]
[hsw224:37186] [ 4] /lib64/libc.so.6(+0x2f142)[0x2b4d32e89142]
[hsw224:37186] [ 5] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_coll_libnbc.so(ompi_coll_libnbc_iallreduce+0x610)[0x2b4d460dc82d]
[hsw224:37186] [ 6] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(+0x34546)[0x2b4d328ca546]
[hsw224:37186] [ 7] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(+0x338e0)[0x2b4d328c98e0]
[hsw224:37186] [ 8] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(+0x37945)[0x2b4d328cd945]
[hsw224:37186] [ 9] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libopen-pal.so.40(opal_progress+0x30)[0x2b4d33578562]
[hsw224:37186] [10] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(+0x32f0b)[0x2b4d328c8f0b]
[hsw224:37186] [11] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(ompi_comm_nextcid+0x6c)[0x2b4d328c970b]
[hsw224:37186] [12] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(ompi_comm_dup_with_info+0x10b)[0x2b4d328c622c]
[hsw224:37186] [13] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(ompi_comm_dup+0x25)[0x2b4d328c611f]
[hsw224:37186] [14] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/openmpi/mca_osc_rdma.so(+0x154e8)[0x2b4d477a64e8]
[hsw224:37186] [15] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(ompi_osc_base_select+0x155)[0x2b4d329abaa5]
[hsw224:37186] [16] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(ompi_win_allocate+0x7b)[0x2b4d328f71d3]
[hsw224:37186] [17] /home/akvenkatesh/ompi-non-git/openmpi-3.1.2/build-vanilla/lib/libmpi.so.40(MPI_Win_allocate+0x256)[0x2b4d3296cfa4]
[hsw224:37186] [18] ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency[0x40448c]
[hsw224:37186] [19] ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency[0x401ba3]
[hsw224:37186] [20] ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency[0x4018d8]
[hsw224:37186] [21] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4d32e7c445]
[hsw224:37186] [22] ./libexec/osu-micro-benchmarks/mpi/one-sided/osu_acc_latency[0x401479]
[hsw224:37186] *** End of error message ***
[1539818461.353517] [hsw224:37186:0] select.c:312 UCX ERROR no active messages transport to <no debug data>: Unsupported operation
# OSU MPI_Accumulate latency Test v5.3
# Window creation: MPI_Win_allocate
# Synchronization: MPI_Win_flush
# Size Latency (us)
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 37186 on node hsw224 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
Changing the window creation to MPI_Win_create or MPI_Win_create_dynamic doesn't change the problem, because all of these paths eventually go through ompi_comm_dup.
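For reference, these are the window-creation variants I tried in place of MPI_Win_allocate (sketch only; "buf" is a placeholder name, and the lock/accumulate/flush part is the same as in the first sketch above):

```c
/* Sketch of the alternative window-creation paths that were also tried.
 * "buf" is a placeholder user buffer, not the benchmark's variable name. */
#include <mpi.h>

#define COUNT 4096
static double buf[COUNT];

static void create_window_variant(MPI_Win *win)
{
    /* Variant 1: MPI_Win_create over user-allocated memory */
    MPI_Win_create(buf, COUNT * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, win);

    /* Variant 2 (used instead of the above, not in addition):
     *   MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, win);
     *   MPI_Win_attach(*win, buf, COUNT * sizeof(double));
     */

    /* Either way, window creation ends up duplicating the communicator
     * (ompi_osc_base_select -> ompi_comm_dup, as in the backtrace above),
     * so the inter-node failure is the same. */
}
```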
What am I missing?