Skip to content

Set the refcnt to zero for new memory zones. #13089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2025

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Feb 11, 2025

Thanks @ymeur for the report and the proposed fix.

Fixes #12892.

Thanks @ymeur for the report and the proposed fix.

Fixes open-mpi#12892.

Signed-off-by: George Bosilca <[email protected]>
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this patch but maybe we should attribute the author of #12892 in the commit? (sorry, can't tag the name here)

@bosilca
Copy link
Member Author

bosilca commented Feb 11, 2025

It is. What exactly do you mean ?

@devreal
Copy link
Contributor

devreal commented Feb 11, 2025

I meant a Signed-off-by or Co-authored-by line but we don't have an email address i guess

@MamziB
Copy link
Contributor

MamziB commented Feb 11, 2025

@bosilca @devreal @janjust
I took a look at the code today. When we attach we increment the ref count and when we detach, we decrease it. Therefore, we only need to set the ref count to 0 in the osc init. This PR is not needed, unless there is a specific benchmark that fails/has mem leak with this design.

@bosilca
Copy link
Member Author

bosilca commented Feb 11, 2025

@MamziB the original issue reference both the need for this fix and the rationale. This looks correct to me, we need to set the refcount to zero when we create a new memory zone, and this is exactly what we do here (we make room into the existing allocations array, and as a result this allocation will inherit the refcount of the previous one).

@MamziB
Copy link
Contributor

MamziB commented Feb 11, 2025

@bosilca Yes, I am aware of that issue.

We have a maximum of OMPI_OSC_UCX_ATTACH_MAX dynamic windows that can be created, and the local_dynamic_win_info array of this size keeps track of any window that is allocated.

During initialization, component_select ensures that all reference counts are set to zero. As a result, we no longer need to manually reset them. When a new window is created, we can be certain that the assigned slot among the OMPI_OSC_UCX_ATTACH_MAX slots is available—either because it was previously detached or has never been used. In both cases, the reference count must already be zero.

@bosilca
Copy link
Member Author

bosilca commented Feb 11, 2025

That's a different story. The two memmove are creating room in the local_dynamic_win_info array for the new zone to be inserted. In a location that was occupied, therefore with a non zero refcount. If we don't reset the refcount of the insert_index location, this new memory zone will inherit the refcount of the prior zone.

@MamziB
Copy link
Contributor

MamziB commented Feb 11, 2025

@bosilca Thanks for the explanation, I see your point now and this PR is indeed needed.

@bosilca bosilca merged commit a477b22 into open-mpi:main Feb 12, 2025
15 checks passed
@bosilca bosilca deleted the fix/ucx_dyn_window_refcnt branch February 12, 2025 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug report : osc_ucx_component.c openmpi 5.0.x : dynamic windows attach/detach
3 participants