SPML/UCX: fixed hang in SHMEM_FINALIZE #6918
hoopoepg commented on Aug 21, 2019
- used MPI_Barrier to synchronize processes

Commit: "- used MPI_Barrier to synchronize processes"
Signed-off-by: Sergey Oblomov <[email protected]>
2fde838 to 182023f
oshmem/mca/spml/ucx/spml_ucx.c (outdated)
    ret = opal_common_ucx_del_procs_nofence(del_procs, nprocs, oshmem_my_proc_id(),
                                            mca_spml_ucx.num_disconnect,
                                            mca_spml_ucx_ctx_default.ucp_worker);
    /* Do not barrier here - barrier is called in _shmem_finalize */
No need for a barrier here - the barrier is called in _shmem_finalize.
fixed
@@ -323,6 +321,8 @@ int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs)
     free(wk_roffs);

     SPML_UCX_VERBOSE(50, "*** ADDED PROCS ***");

     opal_common_ucx_mca_proc_added();
why is this needed?
There is a check that detects a missing event handler; the original call was made from the wrong function.
Signed-off-by: Sergey Oblomov <[email protected]>
42b33ad to 01dacaa
bot:retest
bot:retest
@yosefe ok to merge?
@hoopoepg pls port to 4.0.x
This PR should not have been merged as it was: the commit message on the first commit directly contradicts the commit content. Mistakes happen, but this particular pattern of "make a commit, and then make another commit to fix the first commit" is a bit of a pet peeve of mine. Please make the final merged commits good, high-quality commits whenever possible. The two commits on this PR should have been squashed before merging so that we didn't have a bad commit on master in the first place (which then gets propagated out to the release branches). Thanks.
Oh, sorry, I should have caught that in review.