osu_latency tests with CUDA segfault after mpi_memory_alloc_kinds is introduced #13096

Closed
jiaxiyan opened this issue Feb 13, 2025 · 5 comments

@jiaxiyan (Contributor)

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

main branch

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built the main branch from source.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

08e41ed 3rd-party/openpmix (v1.1.3-4067-g08e41ed5)
30cadc6746ebddd69ea42ca78b964398f782e4e3 3rd-party/prrte (psrvr-v2.0.0rc1-4839-g30cadc6746)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)

Please describe the system on which you are running

  • Operating system/version: Amazon Linux 2
  • Computer hardware: p4d.24xlarge
  • Network type: Elastic Fabric Adapter

Details of the problem

The osu-micro-benchmarks CUDA tests have been failing with a segfault since #13055 was merged.

mpirun --wdir . -n 2 --hostfile hostfile --map-by ppr:2:node --timeout 1800 -x LD_LIBRARY_PATH=/opt/amazon/efa/lib64 -x PATH  /home/osu-micro-benchmarks/mpi/pt2pt/osu_latency  --buffer-num multiple -d cuda H D

2025-02-12 18:03:27,068 - INFO - utils - mpirun output:
# OSU MPI-CUDA Latency Test
# Send Buffer on HOST (H) and Receive Buffer on DEVICE (D)
# Size          Latency (us)
0                       0.65
[ip-172-31-17-116:33408] *** Process received signal ***
[ip-172-31-17-116:33408] Signal: Segmentation fault (11)
[ip-172-31-17-116:33408] Signal code: Invalid permissions (2)
[ip-172-31-17-116:33408] Failing at address: 0x7f1303600000
[ip-172-31-17-116:33408] [ 0] /lib64/libpthread.so.0(+0x118e0)[0x7f133b5258e0]
[ip-172-31-17-116:33408] [ 1] /lib64/libc.so.6(+0x14dbeb)[0x7f133b2b4beb]
[ip-172-31-17-116:33408] [ 2] /opt/amazon/efa/lib64/libfabric.so.1(+0x1f672)[0x7f12e78cc672]
[ip-172-31-17-116:33408] [ 3] /opt/amazon/efa/lib64/libfabric.so.1(+0x1f627)[0x7f12e78cc627]
....
[ip-172-31-17-116:33408] *** End of error message ***
--------------------------------------------------------------------------
    This help section is empty because PRRTE was built without Sphinx.
--------------------------------------------------------------------------
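For context, the H D mode of osu_latency exercises roughly the following ping-pong, with rank 0 using a host buffer and rank 1 using a cudaMalloc'd device buffer. This is a minimal illustrative sketch, not the actual OSU source, and assumes the CUDA runtime is linked:

/* Minimal sketch of the osu_latency "H D" pattern (illustrative only):
 * rank 0 keeps its buffer on the host, rank 1 on the CUDA device. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    size_t size = 1;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        buf = malloc(size);               /* H: host buffer */
        MPI_Send(buf, size, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        MPI_Recv(buf, size, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        free(buf);
    } else {
        cudaMalloc((void **)&buf, size);  /* D: device buffer */
        MPI_Recv(buf, size, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* The pong below sends from device memory; a transport that treats
         * this buffer as ordinary host memory will fault here. */
        MPI_Send(buf, size, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
        cudaFree(buf);
    }

    MPI_Finalize();
    return 0;
}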

The backtrace shows that the segfault comes from a memcpy attempting to copy 1 byte from an inaccessible memory address.

(gdb) bt
#0  0x00007f91a9139be8 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
#1  0x00007f915561432f in ofi_memcpy (device=0, dest=0x7f9145412da0, src=0x7f9173200000, size=1)
    at ./include/ofi_hmem.h:263
#2  0x00007f91556144eb in ofi_copy_from_hmem (iface=FI_HMEM_SYSTEM, device=0, dest=0x7f9145412da0, src=0x7f9173200000,
    size=1) at ./include/ofi_hmem.h:405
#3  0x00007f9155614eb6 in ofi_copy_mr_iov (mr=0x0, iov=0x7ffd8b06c5f0, iov_count=1, offset=0, buf=0x7f9145412da0,
    size=191, dir=0) at src/hmem.c:458
#4  0x00007f9155614f53 in ofi_copy_from_mr_iov (dest=0x7f9145412da0, size=192, mr=0x0, iov=0x7ffd8b06c5f0, iov_count=1,
    iov_offset=0) at src/hmem.c:473
#5  0x00007f9155731e03 in smr_format_inline (cmd=0x7f9145412d60, mr=0x0, iov=0x7ffd8b06c5f0, count=1)
    at prov/shm/src/smr_ep.c:277
#6  0x00007f9155732e20 in smr_do_inline (ep=0x20ce8f20, peer_smr=0x7f9145396000, id=1, peer_id=0, op=1, tag=1, data=0,
    op_flags=131072, desc=0x0, iov=0x7ffd8b06c5f0, iov_count=1, total_len=1, context=0x0, cmd=0x7f9145412d60)
    at prov/shm/src/smr_ep.c:647
#7  0x00007f915572b559 in smr_generic_inject (ep_fid=0x20ce8f20, buf=0x7f9173200000, len=1, dest_addr=1, tag=1, data=0,
    op=1, op_flags=131072) at prov/shm/src/smr_msg.c:214
#8  0x00007f915572bb75 in smr_tinjectdata (ep_fid=0x20ce8f20, buf=0x7f9173200000, len=1, data=0, dest_addr=1, tag=1)
    at prov/shm/src/smr_msg.c:394
#9  0x00007f91556c47fa in fi_tinjectdata (ep=0x20ce8f20, buf=0x7f9173200000, len=1, data=0, dest_addr=1, tag=1)
    at ./include/rdma/fi_tagged.h:149
#10 0x00007f91556c6c0d in efa_rdm_msg_tinjectdata (ep_fid=0x20ce83c0, buf=0x7f9173200000, len=1, data=0, dest_addr=1,
    tag=1) at prov/efa/src/rdm/efa_rdm_msg.c:594
#11 0x00007f9154103d5a in fi_tinjectdata (ep=0x20ce83c0, buf=0x7f9173200000, len=1, data=0, dest_addr=1, tag=1)
    at /home/ec2-user/PortaFiducia/build/libraries/libfabric/v1.22.x/install/libfabric/include/rdma/fi_tagged.h:149
#12 0x00007f915410c12e in ompi_mtl_ofi_send_generic (ofi_cq_data=true, mode=MCA_PML_BASE_SEND_STANDARD,
    convertor=0x7ffd8b06df60, tag=1, dest=1, comm=0x62e960 <ompi_mpi_comm_world>, mtl=0x7f9154335260 <ompi_mtl_ofi>)
    at mtl_ofi.h:937
#13 ompi_mtl_ofi_send_true (mtl=0x7f9154335260 <ompi_mtl_ofi>, comm=0x62e960 <ompi_mpi_comm_world>, dest=1, tag=1,
    convertor=0x7ffd8b06df60, mode=MCA_PML_BASE_SEND_STANDARD) at mtl_ofi_send_opt.c:38
#14 0x00007f9154985256 in mca_pml_cm_send (buf=0x7f9173200000, count=1, datatype=0x62df60 <ompi_mpi_char>, dst=1, tag=1,
    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x62e960 <ompi_mpi_comm_world>) at pml_cm.h:347
#15 0x00007f91a98cecbd in PMPI_Send (buf=0x7f9173200000, count=1, type=0x62df60 <ompi_mpi_char>, dest=1, tag=1,
    comm=0x62e960 <ompi_mpi_comm_world>) at send.c:93
#16 0x00000000004029bc in main (argc=<optimized out>, argv=<optimized out>) at osu_latency.c:168
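Frames #0 through #2 amount to a plain host-side memcpy from the buffer after it has been classified as FI_HMEM_SYSTEM. The fault can be reproduced outside MPI with a sketch like the following (hypothetical standalone program, assuming a cudaMalloc'd pointer that the host CPU cannot dereference):

/* Standalone illustration: a CPU-side memcpy from a cudaMalloc'd pointer
 * (what ofi_memcpy does for FI_HMEM_SYSTEM buffers) segfaults; device
 * memory has to go through cudaMemcpy or a CUDA-aware copy path instead. */
#include <cuda_runtime.h>
#include <string.h>

int main(void)
{
    char host_dst[1];
    char *dev_src = NULL;

    cudaMalloc((void **)&dev_src, 1);

    /* CPU load from device memory -> SIGSEGV, mirroring frame #0 above. */
    memcpy(host_dst, dev_src, 1);

    /* Correct path for device memory: */
    /* cudaMemcpy(host_dst, dev_src, 1, cudaMemcpyDeviceToHost); */

    cudaFree(dev_src);
    return 0;
}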
@jiaxiyan (Contributor, Author)

@edgargabriel I tried adding --memory-alloc-kinds cuda to mpirun, but it doesn't help. Please advise.
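For reference, an application can ask the library which memory allocation kinds it reports as supported through the MPI 4.1 mpi_memory_alloc_kinds info key. A minimal sketch, assuming an MPI 4.1-level library; whether "cuda" is listed here is separate from the libfabric copy path that is crashing above:

/* Hedged sketch: print the memory allocation kinds reported on
 * MPI_COMM_WORLD via the MPI 4.1 "mpi_memory_alloc_kinds" info key. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Info info;
    char kinds[256];
    int buflen = sizeof(kinds);
    int flag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_info(MPI_COMM_WORLD, &info);
    MPI_Info_get_string(info, "mpi_memory_alloc_kinds", &buflen, kinds, &flag);
    if (flag) {
        printf("mpi_memory_alloc_kinds: %s\n", kinds);
    } else {
        printf("mpi_memory_alloc_kinds not set\n");
    }
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}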

@edgargabriel (Member)

I will try to see whether I can reproduce it, but your output doesn't actually show anything related to memkind in the backtrace.

@edgargabriel (Member)

Are you using the embedded PRRTE/PMIx versions, or do you have external ones?

@edgargabriel (Member)

@jiaxiyan could you please test whether #13097 fixes the issue for you? With this patch I was able to run osu_latency on CUDA devices (but it was with UCX, not libfabric).

@jiaxiyan (Contributor, Author)

@edgargabriel Yes, it is fixed. Thank you!
