[5.0.0rc10] oshmem segfault in mca_memheap_modex_recv_all() #11430

Closed
david-edwards-arm opened this issue Feb 21, 2023 · 6 comments

@david-edwards-arm

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

5.0.0rc10 with ucx 1.13.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Source tarball

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Ubuntu 18.04
  • Computer hardware: x86_64
  • Compiler: system gcc (7.5.0)

Details of the problem

The following oshmem program crashes in mca_memheap_modex_recv_all() in memheap_base_mkey.c. The cause is a type mismatch in the (third) size argument to the PMIX_DATA_BUFFER_UNLOAD() call, which expects size_t* but is given int*. An assert() in the preceding lines is meant to guard against this scenario, but it appears to be outdated.
The compiler also emits a type mismatch warning for the (second) send_buffer argument (char** vs. void**), though this is not the cause of the segfault.

#include <shmem.h>

int main()
{
    start_pes(0);
    return 0;
}

@david-edwards-arm
Author

The segfault occurs because the send_buffer address is corrupted as a result of the incorrect data type of size.

@david-edwards-arm
Author

An incomplete but illustrative patch:

--- a/oshmem/mca/memheap/base/memheap_base_mkey.c
+++ b/oshmem/mca/memheap/base/memheap_base_mkey.c
@@ -583,7 +583,8 @@
     assert(sizeof(int32_t) == sizeof(int));
 
     /* Do allgather */
-    PMIX_DATA_BUFFER_UNLOAD(msg, send_buffer, size);
+    PMIX_DATA_BUFFER_UNLOAD(msg, send_buffer, buffer_size);
+    size = (int)buffer_size;
     MEMHEAP_VERBOSE(1, "local keys packed into %d bytes, %d segments", size, memheap_map->n_segments);
 
     OPAL_TIMING_ENV_NEXT(recv_all, "serialize data");

rzambre added a commit to rzambre/ompi that referenced this issue Mar 23, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430 and

Signed-off-by: Rohit Zambre <[email protected]>
jsquyres added this to the v5.0.0 milestone Mar 23, 2023
@jsquyres
Member

@janjust Ping

@janjust
Contributor

janjust commented Mar 28, 2023

@jsquyres also fixed with #11525

@janjust
Contributor

janjust commented Mar 28, 2023

I'll close this when the PR goes in.

rzambre added a commit to rzambre/ompi that referenced this issue Mar 31, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430 and

Signed-off-by: Rohit Zambre <[email protected]>
rzambre added a commit to rzambre/ompi that referenced this issue Mar 31, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430
and open-mpi#11524.

Signed-off-by: Rohit Zambre <[email protected]>
rzambre added a commit to rzambre/ompi that referenced this issue Mar 31, 2023
Fixes segfaults during shmem_init() reported in open-mpi#11430
and open-mpi#11524.

Signed-off-by: Rohit Zambre <[email protected]>
(cherry picked from commit 974f0c3)
@janjust
Contributor

janjust commented Apr 12, 2023

Fixed with #11550

janjust closed this as completed Apr 12, 2023
yli137 pushed a commit to yli137/ompi that referenced this issue Jan 10, 2024
Fixes segfaults during shmem_init() reported in open-mpi#11430 and

Signed-off-by: Rohit Zambre <[email protected]>