OSHMEM segfault #11524

Closed
rzambre opened this issue Mar 23, 2023 · 5 comments

Comments

@rzambre
Contributor

rzambre commented Mar 23, 2023

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

main branch pointing to commit 118b95d.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

git clone with the following configure flags:

../../../configure --prefix=/path/to/install --with-ucx=/path/to/ucx --without-verbs --disable-man-pages --with-pmix=internal --with-hwloc=internal --enable-mpi-fortran=no --enable-oshmem-fortran=no

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

7d25bd0 3rd-party/openpmix (v1.1.3-3825-g7d25bd02)
4725d89abe53c52343eeb49c90986c4d407d6392 3rd-party/prrte (psrvr-v2.0.0rc1-4609-g4725d89abe)
237ceff1a8ed996d855d69f372be9aaea44919ea config/oac (237ceff)

Please describe the system on which you are running

HPCAC cluster. Both thor and helios partitions face this issue.


Details of the problem

Reproducer:

#include <stdio.h>
#include <shmem.h>
int main()
{
    shmem_init();
    int my_pe = shmem_my_pe();
    printf("My PE index is %d\n", my_pe);
    void *cache_line = shmem_malloc(64);
    shmem_free(cache_line);
    shmem_finalize();
    return 0;
}

Failure:

[rzambre@helios017 shmem]$ oshrun -n 2 -N 1 ./shmem_hello_world
[helios017:386741:0:386741] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 386741) ====
 0 0x0000000000012cf0 __funlockfile()  :0
 1 0x00000000000d00a9 __memmove_avx_unaligned_erms()  :0
 2 0x00000000000642da non_overlap_accelerator_copy_content_same_ddt()  opal_datatype_copy.c:0
 3 0x00000000000a1978 ompi_datatype_sndrcv()  ???:0
 4 0x00000000000fbd9b ompi_coll_base_allgatherv_intra_ring()  ???:0
 5 0x0000000000135d1e ompi_coll_tuned_allgatherv_intra_dec_fixed()  ???:0
 6 0x00000000000a3023 PMPI_Allgatherv()  ???:0
 7 0x0000000000043918 oshmem_shmem_allgatherv()  ???:0
 8 0x00000000001374e0 mca_memheap_modex_recv_all()  ???:0
 9 0x00000000000432eb oshmem_shmem_init()  ???:0
10 0x00000000000457a4 pshmem_init()  ???:0
11 0x0000000000400813 main()  /global/home/users/rzambre/play/shmem/./shmem_hello_world.c:7
12 0x000000000003ad85 __libc_start_main()  ???:0
13 0x000000000040074e _start()  ???:0
=================================

An MPI hello world example works successfully though. Code:

#include <stdio.h>
#include <mpi.h>
int main()
{
    int my_pe;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_pe);
    printf("My PE index is %d\n", my_pe);
    MPI_Finalize();
    return 0;
}

Successful run:

[rzambre@helios017 shmem]$ mpirun -n 2 -N 1 ./mpi_hello_world
My PE index is 0
My PE index is 1
@rzambre
Contributor Author

rzambre commented Mar 23, 2023

Looks similar to #11430.

@jsquyres
Member

@janjust Ping

@janjust
Contributor

janjust commented Mar 28, 2023

@jsquyres This will be fixed with #11525.

@janjust
Contributor

janjust commented Mar 28, 2023

I'll close this when the PR goes in.

rzambre added a commit to rzambre/ompi that referenced this issue Mar 31, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430
and open-mpi#11524.

Signed-off-by: Rohit Zambre <[email protected]>
rzambre added a commit to rzambre/ompi that referenced this issue Mar 31, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430
and open-mpi#11524.

Signed-off-by: Rohit Zambre <[email protected]>
(cherry picked from commit 974f0c3)
@janjust
Contributor

janjust commented Apr 12, 2023

Fixed with #11525.

@janjust closed this as completed Apr 12, 2023