-
Notifications
You must be signed in to change notification settings - Fork 901
common/ompio: implement pipelined file_iwrite and iread #11225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
common/ompio: implement pipelined file_iwrite and iread #11225
Conversation
Hello! The Git Commit Checker CI bot found a few problems with this PR: 7979d17: common/ompio: implement pipelined file_iwrite and ...
Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks! |
7979d17
to
116f17c
Compare
the failure in the pull request build checker is unrelated:
|
bot:retest |
116f17c
to
85017ee
Compare
@devreal thank you for your review, I think I addressed all your comments, if you have chance, can you please re-review? |
25251d9
to
dea57a4
Compare
@bosilca and @devreal I updated the pr to address the issues that you raised, if you have a chance can you please re-review? The multi-threading protection should work I think, but is not very nice, a better solution would probably require re-working some of the interfaces (especially fbtl), that could be for another pr however. |
if (mca_common_ompio_progress_is_registered) { | ||
OPAL_THREAD_UNLOCK (&mca_common_ompio_mutex); | ||
return; | ||
} | ||
opal_progress_register (mca_common_ompio_progress); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a place where this callback is unregistered (I can't find it in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's also a good question, ompio so far has not unregistered the progress function again. My assumption was that an application is not just using one non-blocking operations, but will most likely do that throughout the lifetime of the application, and hence we don't necessarily want to constantly register/unregister the progress function. I can add removing the progress function if the internal request list is empty.
Are other components doing that (i.e. unregistering their progress function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most components unregister their progress function during shutdown. That way you avoid getting called after shutdown.
Removing progress callbacks seems quite expensive (linear search and O(n) atomic swaps) so doing that frequently is not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks, I can do the same. I will add the unregister functionality to the component shutdown function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the register/unregister of progress in many places. Take a look at the non-blocking communicator creation (in communicator/comm_request.c) for a sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest a two step plan here?
-
I've added the progress_unregister into the shutdown step of the component, and have done a fairly extensive round of testing with our internal testsuite, the ompio file I/O testsuite, and the hdf5 testsuite, and everything still seems to work. Since ompio so far has also not unregistered the progress function at runtime, this pr does not change the behavior of ompio (if anything, it improves it a bit by doing an unregister at all).
-
After this pr, I can investigate the impact of moving the unregister operation to request finalize operation, i.e. if there is no pending I/o request anymore, we remove the progress function. I will look both at the impact on file I/O operations, but also on communication operations. We can than revisit the decision based on the data, which might point in either direction. (It might be however 1-2 weeks until I get to this task)
Would this be a reasonable path forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
dea57a4
to
927e2a8
Compare
Pipelined iread/iwrite operations require the notion of subrequests, i.e. a user level request can contain multiple internal subrequests that all have to complete before the user level operation is considered finished. This requires adjustments to the internal ompio progress engine and data structures. Note: this is purely just a pipelined algorithm, no overlap between different iterations. Signed-off-by: Edgar Gabriel <[email protected]>
927e2a8
to
ff7458a
Compare
bot:retest |
Pipelined iread/iwrite operations require the notion of subrequests, i.e. a user level request can contain multiple internal subrequests that all have to complete before the user level operation is considered finished. This requires adjustments
to the internal ompio progress engine and data structures.
Note: this is purely just a pipelined algorithm, no overlap between different iterations.