Commit 2a6bcd3

orivej authored and jsquyres committed
Fix oob_tcp tcp_component_close segfault with active listeners

oob_tcp in non-HNP mode shares libevent event_base with oob_base [1].
orte_oob_base_close calls:
(1) oob_tcp component_shutdown, then
(2) opal_progress_thread_finalize, then
(3) oob_tcp tcp_component_close [2].
opal_progress_thread_finalize calls tracker_destructor [3] that frees
the event_base [4]. If any oob_tcp event listeners are active at this
time, oob_tcp will crash trying to delete them at [5] [6].

This change moves oob_tcp event listener cleanup from component_close
to component_shutdown so that it happens before the event_base is freed.

[1] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L160
[2] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/base/oob_base_frame.c#L95
[3] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L232
[4] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L65
[5] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_component.c#L192
[6] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L955

Signed-off-by: Orivej Desh <[email protected]>
(cherry picked from commit 78b7e34)

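The ordering problem described in the commit message can be illustrated with a small toy model. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins invented for illustration, not the real libevent or ORTE API, and a `freed` flag stands in for actually releasing memory so that the model stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical), NOT the real libevent or ORTE types. */
typedef struct {
    bool freed;            /* set once the progress thread releases the base */
} toy_event_base;

typedef struct {
    toy_event_base *base;  /* base the listener's event is registered with */
    bool registered;
} toy_listener;

/* Models deleting a listener event: only valid while the base is alive. */
bool toy_listener_destruct(toy_listener *l)
{
    if (l->registered && l->base->freed) {
        return false;      /* the real code would touch freed memory here */
    }
    l->registered = false;
    return true;
}

/* Models opal_progress_thread_finalize releasing the shared base. */
void toy_progress_thread_finalize(toy_event_base *base)
{
    base->freed = true;
}

/* Runs the three close steps in order; the flag selects whether listener
 * cleanup happens in step (1), as after this change, or in step (3),
 * as before it. Returns true if no listener outlived its base. */
bool toy_oob_base_close(bool cleanup_in_shutdown)
{
    toy_event_base base = { .freed = false };
    toy_listener listener = { .base = &base, .registered = true };
    bool ok = true;

    /* (1) component_shutdown */
    if (cleanup_in_shutdown) {
        ok = toy_listener_destruct(&listener) && ok;
    }
    /* (2) opal_progress_thread_finalize */
    toy_progress_thread_finalize(&base);
    /* (3) tcp_component_close */
    if (!cleanup_in_shutdown) {
        ok = toy_listener_destruct(&listener) && ok;
    }
    return ok;
}
```

In this model, `toy_oob_base_close(true)` succeeds because the listener is torn down while its base is still alive, whereas `toy_oob_base_close(false)` reports a failure at the point where the pre-change code would have crashed.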
1 parent 3ee28a5 commit 2a6bcd3

1 file changed

+3
-3
lines changed

orte/mca/oob/tcp/oob_tcp_component.c

Lines changed: 3 additions & 3 deletions
@@ -184,9 +184,6 @@ static int tcp_component_open(void)
  */
 static int tcp_component_close(void)
 {
-    /* cleanup listen event list */
-    OPAL_LIST_DESTRUCT(&mca_oob_tcp_component.listeners);
-
     OBJ_DESTRUCT(&mca_oob_tcp_component.peers);
 
     if (NULL != mca_oob_tcp_component.ipv4conns) {
@@ -736,6 +733,9 @@ static void component_shutdown(void)
                                                (void **) &peer, node, &node);
     }
 
+    /* cleanup listen event list */
+    OPAL_LIST_DESTRUCT(&mca_oob_tcp_component.listeners);
+
     opal_output_verbose(2, orte_oob_base_framework.framework_output,
                         "%s TCP SHUTDOWN done",
                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
