Skip to content

SPML/UCX: fixed hang in SHMEM_FINALIZE #6918

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 22, 2019

Conversation

hoopoepg
Copy link
Contributor

  • used MPI _Barrier to synchronize processes

- used MPI _Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <[email protected]>
@hoopoepg hoopoepg force-pushed the topic/fixed-hand-on-shmem-finalize branch from 2fde838 to 182023f Compare August 21, 2019 09:05
ret = opal_common_ucx_del_procs_nofence(del_procs, nprocs, oshmem_my_proc_id(),
mca_spml_ucx.num_disconnect,
mca_spml_ucx_ctx_default.ucp_worker);
/* Do not barrier here - barrier is called in _shmem_finalize */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to barrier here - barrier is called in _shmem_finalize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -323,6 +321,8 @@ int mca_spml_ucx_add_procs(ompi_proc_t** procs, size_t nprocs)
free(wk_roffs);

SPML_UCX_VERBOSE(50, "*** ADDED PROCS ***");

opal_common_ucx_mca_proc_added();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is detection of missing event handler. original call was from incorrect function

Signed-off-by: Sergey Oblomov <[email protected]>
@hoopoepg hoopoepg force-pushed the topic/fixed-hand-on-shmem-finalize branch from 42b33ad to 01dacaa Compare August 21, 2019 09:09
@yosefe
Copy link
Contributor

yosefe commented Aug 21, 2019

bot:retest

@hoopoepg
Copy link
Contributor Author

bot:retest

@hoopoepg
Copy link
Contributor Author

@yosefe ok to merge?

@yosefe yosefe merged commit 8933171 into open-mpi:master Aug 22, 2019
@yosefe
Copy link
Contributor

yosefe commented Aug 22, 2019

@hoopoepg pls port to 4.0.x

@yosefe yosefe deleted the topic/fixed-hand-on-shmem-finalize branch August 22, 2019 08:37
@jsquyres
Copy link
Member

This PR should not have been merged as it was -- the commit message message on the first commit directly contradicts the commit content.

Mistakes happen, but this particular pattern "make a commit, and then make another commit to fix the first commit" is a bit of a pet peeve of mine. Please make the final merged commits be good, high-quality commits whenever possible. The two commits on this PR should have been squashed before merging so that we didn't have to have a bad commit on master in the first place (which then gets propagated out to the release branches). Thanks.

@gpaulsen
Copy link
Member

Oh, sorry I should have caught that in review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants