Skip to content

fcoll/two_phase: fix coverity errors #963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 2, 2015
Merged

Conversation

hjelmn
Copy link
Member

@hjelmn hjelmn commented Oct 1, 2015

Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks

Mulitple allocated arrays are going out of scope at the end of
mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed
the extraneous NULL checks since free (NULL) is safe in C.

Change returns to goto exit where the allocated resources are freed.

Fixes CIDs 72285-72292, 72297, 72298: Resource leaks

Change all appropriate return statements to goto exit to ensure that
all resources are freed. Also removed the NULL checks since free
(NULL) is safe in C.

Fixes CIDs 72295, 72296: Resource leaks

Moved free of requests and recv_types to after exit label. This will
ensure these are freed on error.

Also added a loop and statement to free send_buf which is going out of
scope at the end of the function.

Fixes CIDs 72336-72240, 735197, 735198: Resource leaks

Moved the exit label before to before the resources are released and
changed all appropriate return statements to goto exit. Also removed
extraneous NULL checks because free (NULL) is safe in C.

Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks

Free all resources after exit label and change return statements to
goto exit to ensure all resources are freed on error.

Fixes CID 1269973: Unused value

Check return code of ompi_request_wait_all. If it fails jump to the
exit.

Fixes CID 714119: Dereference before NULL check

Wrong value checked in conditional.

Signed-off-by: Nathan Hjelm [email protected]

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

@edgargabriel Please review. This fixes a large number of issues identified by coverity. There are others that remain but the fixes are not as straight-forward.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

Fixed typos.

@lanl-ompi
Copy link
Contributor

Test FAILed.

@edgargabriel
Copy link
Member

What is the reason that you removed all the checks whether the pointer is a NULL pointer before freeing the memory for all variables? I do not see any benefit on that, and in fact, I am pretty sure e.g. in the dynamic segmentation algorithm they required for correctness inside the while loop.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

free (NULL) is completely safe in C. free checks for NULL already and returns if the pointer is NULL. With the check before free you end up checking the conditional twice for the common case (non-NULL pointer).

@edgargabriel
Copy link
Member

ok, grabbing a copy of your repository and testing whether our tests still work.

@edgargabriel
Copy link
Member

the changes to dynamic module seem to be ok, the two_phase module generates now however a segfault in a free statement.

gabriel@salmon:~/ompi-tests/mpi2basic_tests/file> mpirun --mca fcoll two_phase -np 6 ./filetest
Checking for MPI_File_open:
using MPI_MODE_CREATE..................working
using MPI_MODE_WRONLY..................working
using MPI_MODE_RDONLY..................working
using MPI_MODE_RDWR....................working
using MPI_MODE_APPEND..................working
using MPI_MODE_EXCL....................working
using MPI_MODE_DELETE_ON_CLOSE.........working
Checking for MPI_File_close................working
Checking for MPI_File_delete...............working
Checking for MPI_File_write:
using single proc no fview.............working
verifying status fields................working
opening file in append mode............working
writing a large file...................working
[salmon:07724] *** Process received signal ***
[salmon:07724] Signal: Segmentation fault (11)
[salmon:07724] Signal code: Address not mapped (1)
[salmon:07724] Failing at address: 0xfffffffffffffff9
[salmon:07724] [ 0] /lib64/libpthread.so.0(+0xf890)[0x7f61c8306890]
[salmon:07724] [ 1] /lib64/libc.so.6(cfree+0x14)[0x7f61c7fcbd34]
[salmon:07724] [ 2] /home/gabriel/OpenMPI/lib64/libopen-pal.so.0(opal_free+0x1f)[0x7f61c79aee9d]
[salmon:07724] [ 3] /home/gabriel/OpenMPI/lib64/openmpi/mca_fcoll_two_phase.so(mca_fcoll_two_phase_file_write_all+0xc04)[0x7f61afdf9264]
[salmon:07724] [ 4] /home/gabriel/OpenMPI/lib64/openmpi/mca_io_ompio.so(ompio_io_ompio_file_write_at_all+0x70)[0x7f61bc5e5b4d]
[salmon:07724] [ 5] /home/gabriel/OpenMPI/lib64/openmpi/mca_io_ompio.so(mca_io_ompio_file_write_at_all+0x5a)[0x7f61bc5e5a29]
[salmon:07724] [ 6] /home/gabriel/OpenMPI/lib64/libmpi.so.0(PMPI_File_write_at_all+0x188)[0x7f61c85dbbdd]
[salmon:07724] [ 7] ./filetest[0x4093e9]
[salmon:07724] [ 8] ./filetest[0x40230a]
[salmon:07724] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f61c7f71b05]
[salmon:07724] [10] ./filetest[0x402199]

[salmon:07724] *** End of error message ***

mpirun noticed that process rank 0 with PID 0 on node salmon exited on signal 11 (Segmentation fault).

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

@edgargabriel Ok, I will double-check. Thanks for testing. I will let you know when I have determined what I did wrong.

@edgargabriel
Copy link
Member

I can also check, but it might be next week before I get to it. Thanks!

hjelmn added 2 commits October 1, 2015 14:38
Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks

Mulitple allocated arrays are going out of scope at the end of
mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed
the extraneous NULL checks since free (NULL) is safe in C.

Change returns to goto exit where the allocated resources are freed.

Fixes CIDs 72285-72292, 72297, 72298: Resource leaks

Change all appropriate return statements to goto exit to ensure that
all resources are freed. Also removed the NULL checks since free
(NULL) is safe in C.

Fixes CIDs 72295, 72296: Resource leaks

Moved free of requests and recv_types to after exit label. This will
ensure these are freed on error.

Also added a loop and statement to free send_buf which is going out of
scope at the end of the function.

Fixes CIDs 72336-72240, 735197, 735198: Resource leaks

Moved the exit label before to before the resources are released and
changed all appropriate return statements to goto exit. Also removed
extraneous NULL checks because free (NULL) is safe in C.

Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks

Free all resources after exit label and change return statements to
goto exit to ensure all resources are freed on error.

Fixes CID 1269973: Unused value

Check return code of ompi_request_wait_all. If it fails jump to the
exit.

Fixes CID 714119: Dereference before NULL check

Wrong value checked in conditional.

Signed-off-by: Nathan Hjelm <[email protected]>
Fixes CID 72320: Explicit NULL dereferenced

On error it is possible that the blocklen_per_process array is
NULL. Change the NULL check before the free to check for non-NULL on
the array not the array element. Also clean up allocation of this
array to use calloc instead of malloc + setting each element to NULL.

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

Found a typo in one of the changes. Fixed that error. Now I see another error but that also happens without any of my changes.

@edgargabriel
Copy link
Member

you mean the error in the append test? I see that too, not sure what is triggering that, will check. It used to pass.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

Seeing the error in the write shared test on my mac:

Checking for MPI_File_write_ordered........false
Process 42213 stopped
* thread #1: tid = 0xdaf8d, 0x0000000100051bc9 libmpi.0.dylib`ompi_request_default_wait_all(count=3, requests=0x00007fff5fbff170, statuses=0x00007fff5fbff190) + 105 at req_wait.c:233, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1100000088)
    frame #0: 0x0000000100051bc9 libmpi.0.dylib`ompi_request_default_wait_all(count=3, requests=0x00007fff5fbff170, statuses=0x00007fff5fbff190) + 105 at req_wait.c:233
   230      for (i = 0; i < count; i++) {
   231          request = *rptr++;
   232 
-> 233          if (request->req_complete == true) {
   234              if( OPAL_UNLIKELY( MPI_SUCCESS != request->req_status.MPI_ERROR ) ) {
   235                  failed++;
   236              }
(lldb) bt
* thread #1: tid = 0xdaf8d, 0x0000000100051bc9 libmpi.0.dylib`ompi_request_default_wait_all(count=3, requests=0x00007fff5fbff170, statuses=0x00007fff5fbff190) + 105 at req_wait.c:233, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1100000088)
  * frame #0: 0x0000000100051bc9 libmpi.0.dylib`ompi_request_default_wait_all(count=3, requests=0x00007fff5fbff170, statuses=0x00007fff5fbff190) + 105 at req_wait.c:233
    frame #1: 0x00000001000dd18c libmpi.0.dylib`MPI_Waitall(count=3, requests=0x00007fff5fbff170, statuses=0x00007fff5fbff190) + 444 at waitall.c:76
    frame #2: 0x000000010000c6a5 filetest`write_shared_test(comm=0x0000000100212a98, root=0) + 2917 at write_shared.c:228
    frame #3: 0x0000000100000a29 filetest`main(argc=1, argv=0x00007fff5fbff278) + 345 at filetest.c:84
    frame #4: 0x00007fff8265b5c9 libdyld.dylib`start + 1
    frame #5: 0x00007fff8265b5c9 libdyld.dylib`start + 1

@edgargabriel
Copy link
Member

so actually it works for me mostly, just changed the free to my_req_per_proc, correct? Which sharedfp component is selected in your case? sm or lockedfile (I don't have a mac to test unfortunately).

@edgargabriel
Copy link
Member

the write_shared test passes for both components on linux, so not sure where to start to look into that. can you post a complete traceback?

@hjelmn
Copy link
Member Author

hjelmn commented Oct 1, 2015

Output of rank 0 with -mca sharedfp_base_verbose 100: https://gist.github.com/hjelmn/9ffb5b42299058b89701

@hjelmn hjelmn added the bug label Oct 2, 2015
@hjelmn hjelmn added this to the Open MPI v2.0.0 milestone Oct 2, 2015
@edgargabriel
Copy link
Member

I looked at the file that you attached, and I think I have an idea on what might be going wrong on your system. But I will need somehow access to a MAC to test it :-( Anyway, it is not related to this PR, it has to do with creating the shared memory region through mmap inside of the sm component. So I think we should be good to go with this PR. I can open a new issue for the sm sharedfp component.

+1

hjelmn added a commit that referenced this pull request Oct 2, 2015
fcoll/two_phase: fix coverity errors
@hjelmn hjelmn merged commit eb79edf into open-mpi:master Oct 2, 2015
jsquyres added a commit to jsquyres/ompi that referenced this pull request Aug 23, 2016
Update the PMIx support to the PMIx 1.1.2 release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants