Skip to content

osc/ucx: Fix bugs with dynamic windows #11095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 24, 2022

Conversation

s417-lama
Copy link
Contributor

This PR contains three small bug fixes for dynamic windows in osc/ucx.

The first two fixes are for bugs introduced in this PR #10709 :

  • The get_dynamic_win_info() function always returned OMPI_SUCCESS even if it caused an error in the middle, which hid the next error below
  • OSC_UCX_STATE_DYNAMIC_LOCK_OFFSET was introduced but no actual dynamic_lock member was added, which resulted in corrupted dynamic_win_count value

The third issue is independent of the above PR, but it causes an error when we call MPI_Rget or MPI_Rput for dynamic windows.
As the remote_addr value for rget/rput was wrong, the dynamic window check always failed with rget/rput.
I fixed this issue by simply removing dynamic window check for rget/rput, because the request is eventually forwarded to ompi_osc_ucx_get/put, which correctly checks dynamic window status anyway.

@ompiteam-bot
Copy link

Can one of the admins verify this patch?

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

f988411: Remove unnecessary (and incorrect) dynamic window ...

  • check_signed_off: does not contain a valid Signed-off-by line

23ea49c: Add missing dynamic_lock member in ompi_osc_ucx_st...

  • check_signed_off: does not contain a valid Signed-off-by line

d5cc3ba: Correctly return error codes in get_dynamic_win_in...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@s417-lama
Copy link
Contributor Author

Sorry to mess it up. I added signed-off-by:.

@jsquyres
Copy link
Member

ok to test

@jsquyres
Copy link
Member

FYI @open-mpi/ucx team

@janjust janjust requested review from devreal and janjust November 21, 2022 12:24
@janjust
Copy link
Contributor

janjust commented Nov 24, 2022

@s417-lama cam you please open up the v5.0 of this PR

@janjust janjust merged commit 3808f3d into open-mpi:main Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants