ggouaillardet
Contributor

No description provided.

so usock_peer_create_socket knows it must re-create the socket
/* assuming it is ever supposed to occur */
also fix a typo (peer->sd >= 0) in usock_peer_create_socket
@ggouaillardet
Contributor Author

@rhc54 can you please have a look at this?
the issue can be reproduced with loop_spawn from the ibm test suite:

mpirun -np 1 ./loop_spawn

with current master, a SIGPIPE occurs when mpirun tries to write to a peer that is already gone (and whose socket has therefore been closed)
/* not sure that should occur ... */
the first commit fixes that (plus a typo)
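
For readers unfamiliar with the failure mode, here is a minimal sketch of the two ideas behind the first commit, not the actual Open MPI code. It assumes a POSIX system and uses hypothetical names (`demo_peer_t`, `demo_peer_close`, `demo_peer_create_socket`, `demo_write_to_closed_peer`) that do not exist in the code base. It shows that writing to a stream socket whose other end is closed raises SIGPIPE (or fails with `EPIPE` once the signal is ignored), and that resetting the descriptor to -1 on close lets a create routine that tests `sd >= 0` know it must build a new socket.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a peer object; not the real Open MPI struct. */
typedef struct {
    int sd; /* socket descriptor, -1 when no socket is open */
} demo_peer_t;

/* Close the peer's socket and reset sd so later code can see it is gone. */
static void demo_peer_close(demo_peer_t *peer)
{
    assert(peer != NULL);
    if (peer->sd >= 0) {
        close(peer->sd);
        peer->sd = -1;
    }
}

/* Create a socket only if the peer does not already hold one.
 * Returns 1 if a new socket was created, 0 if the existing one is kept,
 * -1 if socket() failed. */
static int demo_peer_create_socket(demo_peer_t *peer)
{
    assert(peer != NULL);
    if (peer->sd >= 0) {
        return 0;
    }
    peer->sd = socket(AF_UNIX, SOCK_STREAM, 0);
    return (peer->sd < 0) ? -1 : 1;
}

/* Write to a socket whose other end has been closed.
 * SIGPIPE is ignored process-wide here so the write fails with an errno
 * value instead of killing the process. Returns that errno value, 0 if
 * the write unexpectedly succeeded, or -1 if socketpair() failed. */
static int demo_write_to_closed_peer(void)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    signal(SIGPIPE, SIG_IGN);
    close(fds[1]); /* the "peer" goes away */
    ssize_t n = write(fds[0], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[0]);
    return err;
}
```

Without the reset to -1 in the close path, the create routine would see a stale non-negative descriptor and skip re-creating the socket, which is the situation the first commit is described as addressing.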

then we run into another issue: when mpirun tries to write to a peer that is already gone,
it ends up sending the message to itself (!)
the second commit fixes that by simply discarding these messages instead of sending them.
the second commit might make the first one useless (except for the typo ...)
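
The discard approach of the second commit can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `sketch_peer_t`, `sketch_result_t` and `sketch_send` are hypothetical names, and the `alive` flag is an assumed way of tracking that a peer is gone. The point is only that a message for a departed peer is dropped up front rather than handed to any fallback path that could deliver it elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical peer record; the alive flag is an assumption of this sketch. */
typedef struct {
    int  sd;    /* socket descriptor, -1 when closed */
    bool alive; /* false once the peer is known to be gone */
} sketch_peer_t;

typedef enum {
    SKETCH_SENT,      /* whole message written to the peer's socket */
    SKETCH_DISCARDED, /* peer is gone, message dropped on purpose */
    SKETCH_ERROR      /* send() failed or was short */
} sketch_result_t;

/* Send msg to peer, or discard it if the peer is unknown or already gone. */
static sketch_result_t sketch_send(sketch_peer_t *peer, const char *msg)
{
    assert(msg != NULL);
    if (NULL == peer || !peer->alive || peer->sd < 0) {
        /* Drop the message: no write, no rerouting. */
        return SKETCH_DISCARDED;
    }
    size_t len = strlen(msg);
    ssize_t n = send(peer->sd, msg, len, 0);
    return (n == (ssize_t)len) ? SKETCH_SENT : SKETCH_ERROR;
}
```

Because the check happens before any write, this path never touches a closed socket, which is consistent with the remark that the second commit might make the first one unnecessary.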

bottom line, there is a real issue, and I am not convinced I fixed it in a proper way

@rhc54
Contributor

rhc54 commented Apr 4, 2016

I think it looks okay - thanks!

@rhc54
Contributor

rhc54 commented Apr 4, 2016

BTW: this needs to go over to 2.x as well, when you get a chance

@rhc54 rhc54 merged commit 74293bc into open-mpi:master Apr 4, 2016