segv in type_dup_fn_* Fortran tests #6346

Closed
jsquyres opened this issue Jan 31, 2019 · 3 comments · Fixed by #6348
@jsquyres
Member

jsquyres commented Jan 31, 2019

On master head, I see that the IBM tests for duplicating datatypes are consistently segv'ing:

$ cd ompi-tests/ibm/datatype
$ ./type_dup_fn_mpifh
...segv...

This happens in all 3 of the Fortran tests (type_dup_fn_[mpifh|usempi|usempif08]). It does not seem to happen in the C version of this same test (type_dup_fn), which is... weird.

Here's a stack trace from a resulting corefile:

#0  0x0000003fed232495 in raise () from /lib64/libc.so.6
#1  0x0000003fed233c75 in abort () from /lib64/libc.so.6
#2  0x0000003fed2703a7 in __libc_message () from /lib64/libc.so.6
#3  0x0000003fed275dee in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003fed278c3d in _int_free () from /lib64/libc.so.6
#5  0x00002aaaac3e029a in opal_datatype_destruct (datatype=0x731d80)
    at opal_datatype_create.c:83
#6  0x00002aaaab1f0de1 in opal_obj_run_destructors (object=0x731d80)
    at ../../opal/class/opal_object.h:462
#7  0x00002aaaab1f1269 in ompi_datatype_destroy (type=0x7fffffffcf10)
    at ompi_datatype_create.c:90
#8  0x00002aaaab2c3d67 in PMPI_Type_free (type=0x7fffffffcf10) at ptype_free.c:60
#9  0x00002aaaaaf1fdff in ompi_type_free_f (type=0x7fffffffcf6c, 
    ierr=0x7fffffffcf68) at type_free_f.c:76
#10 0x00002aaaaaf1fdcf in mpi_type_free_ (type=0x7fffffffcf6c, 
    ierr=0x7fffffffcf68) at type_free_f.c:56
#11 0x0000000000400dde in mpi_type_dup_fn_mpifh () at type_dup_fn_mpifh.f90:22
#12 0x0000000000400e58 in main (argc=1, argv=0x7fffffffd5e5)
    at type_dup_fn_mpifh.f90:27
#13 0x0000003fed21ed1d in __libc_start_main () from /lib64/libc.so.6
#14 0x0000000000400c29 in _start ()

I notice that the free() is failing because it is freeing datatype->ptypes, which appears to point at a non-malloc'ed object (a compound literal):

83              free(datatype->ptypes);
(gdb) p datatype->ptypes
$1 = (size_t *) 0x2aaaab516000 <__compound_literal.2>

@bosilca I tried to dig into this but couldn't figure out where ptypes came from...

@jsquyres
Member Author

@ggouaillardet @bosilca git bisect shows that this problem originated from commit 7c938f0, just a few days ago ("opal/datatype: plug a memory leak in opal_datatype_t destructor").

Can you please investigate?

@jsquyres jsquyres assigned bosilca and ggouaillardet and unassigned bosilca Jan 31, 2019
@ggouaillardet
Contributor

@jsquyres will do!

ggouaillardet added a commit to ggouaillardet/ompi that referenced this issue Feb 1, 2019
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.

This fixes a bug introduced in open-mpi/ompi@7c938f0

Fixes open-mpi#6346

Signed-off-by: Gilles Gouaillardet <[email protected]>
@ggouaillardet
Contributor

@jsquyres thanks for the report, this is fixed in #6348

FWIW, the C test uses MPI_INT but the Fortran test uses MPI_INTEGER.
ompi_mpi_int.dt.super.ptypes is NULL, but ompi_mpi_integer.dt.super.ptypes is not, hence the crash in Fortran but not in C: the clone inherits the non-NULL ptypes pointer, and the destructor then tries to free() it.
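
For readers who want to see the pattern in isolation, here is a minimal sketch of the failure mode and of the "reset ptypes when cloning" idea. It is not the Open MPI code: toy_datatype_t, toy_integer, toy_clone, toy_destruct and toy_dup_and_free are hypothetical names made up for illustration, and the real opal_datatype_t has many more fields. The assumption is simply that a predefined type's ptypes points at static storage (a file-scope compound literal, as the gdb output above suggests), that cloning starts with a shallow copy, and that the destructor calls free() on ptypes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for opal_datatype_t; only the relevant fields. */
typedef struct toy_datatype {
    size_t  size;
    size_t *ptypes;   /* per-primitive-type counts; may be NULL */
} toy_datatype_t;

/* A "predefined" type whose ptypes points at a file-scope compound literal,
 * i.e. static storage that must never be passed to free(). */
toy_datatype_t toy_integer = {
    .size   = 4,
    .ptypes = (size_t[2]){ 1, 0 },
};

/* Destructor: frees ptypes, assuming it is either NULL or malloc'ed. */
void toy_destruct(toy_datatype_t *t)
{
    free(t->ptypes);          /* free(NULL) is a no-op */
    t->ptypes = NULL;
}

/* Clone: shallow copy, then reset ptypes so the clone does not carry
 * (and later free) a pointer it does not own. Without the reset, the
 * destructor would call free() on the source's static array. */
toy_datatype_t *toy_clone(const toy_datatype_t *src)
{
    toy_datatype_t *dst = malloc(sizeof(*dst));
    if (NULL == dst) {
        return NULL;
    }
    memcpy(dst, src, sizeof(*dst));
    dst->ptypes = NULL;
    return dst;
}

/* Mimics the test: duplicate a predefined type, then free the duplicate.
 * Returns 0 if the predefined type's ptypes is left untouched. */
int toy_dup_and_free(void)
{
    toy_datatype_t *dup = toy_clone(&toy_integer);
    if (NULL == dup) {
        return -1;
    }
    toy_destruct(dup);
    free(dup);
    return (NULL != toy_integer.ptypes && 1 == toy_integer.ptypes[0]) ? 0 : -1;
}
```

In this sketch, a type whose ptypes is already NULL (the MPI_INT-like case) never hits the problem even without the reset, since free(NULL) does nothing; only a source with a non-NULL ptypes exposes it, which matches the C vs. Fortran difference described above.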

ggouaillardet added a commit to jsquyres/ompi that referenced this issue Feb 1, 2019
(cherry picked from commit open-mpi/ompi@b395342)
hppritcha pushed a commit to hppritcha/ompi that referenced this issue Mar 19, 2019
bosilca pushed a commit to bosilca/ompi that referenced this issue Sep 13, 2019
(cherry picked from commit open-mpi/ompi@b395342)