Skip to content

ofi/common: fix code that broke sessions #12870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

hppritcha
Copy link
Member

With sessions initialization model (section 11.3 of MPI 4 standard) MPI may be initialized and finalized any number of times.

This patch refactors code that was assuming a one shot init/finalize sequence for initializing Open MPI and its MCA param space

The underlying problem with the replaced code was that when an app calls MPI_Session_finalize and there are no more sessions active, the MCA param space is destroyed. So if one does not build Open MPI to use dynamically load components, and a component is using static variables in a way that assumes the MCA param space is always preserved if a static variable is set to some value, then things break if a subsequent MPI_Session_init is invoked.

Related to #12869

With sessions initialization model (section 11.3 of MPI 4 standard)
MPI may be initialized and finalized any number of times.

This patch refactors code that was assuming a one shot init/finalize
sequence for initializing Open MPI and its MCA param space

The underlying problem with the replaced code was that when an app calls MPI_Session_finalize
and there are no more sessions active, the MCA param space is
destroyed.  So if one does not build Open MPI to use dynamically load components,
and a component is using static variables in a way that assumes the MCA param
space is always preserved if a static variable is set to some value,
then things break if a subsequent MPI_Session_init is invoked.

Related to open-mpi#12869

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha force-pushed the fix_common_ofi_for_sessions branch from a9a334d to 2f4ba1b Compare October 18, 2024 20:15
@hppritcha hppritcha requested review from jsquyres, a team, cpshereda and edgargabriel and removed request for a team, jsquyres and cpshereda October 18, 2024 20:20
}

if (0 > accelerator_rank_index) {

param = mca_base_var_find("opal", "opal_common", "ofi", "accelerator_rank");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, there is a case that could be made that this should not be an ofi parameter, but an mca parameter of the accelerator framework (for a later PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. I was just keeping things the way they were. I've no objection to moving it in the end to accel framework.

@hppritcha hppritcha merged commit c393881 into open-mpi:main Oct 23, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants