Skip to content

common/ompio: implement pipelined file_iwrite and iread #11225

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

edgargabriel
Copy link
Member

Pipelined iread/iwrite operations require the notion of subrequests, i.e. a user level request can contain multiple internal subrequests that all have to complete before the user level operation is considered finished. This requires adjustments
to the internal ompio progress engine and data structures.

Note: this is purely just a pipelined algorithm, no overlap between different iterations.

@github-actions
Copy link

Hello! The Git Commit Checker CI bot found a few problems with this PR:

7979d17: common/ompio: implement pipelined file_iwrite and ...

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

@edgargabriel edgargabriel force-pushed the topic/ompio-pipeline-iread-iwrite branch from 7979d17 to 116f17c Compare December 19, 2022 21:57
@edgargabriel
Copy link
Member Author

the failure in the pull request build checker is unrelated:

java.io.IOException: No space left on device

@edgargabriel
Copy link
Member Author

bot:retest

@edgargabriel edgargabriel requested a review from bosilca December 22, 2022 22:25
@edgargabriel edgargabriel force-pushed the topic/ompio-pipeline-iread-iwrite branch from 116f17c to 85017ee Compare January 17, 2023 15:44
@edgargabriel
Copy link
Member Author

@devreal thank you for your review, I think I addressed all your comments, if you have chance, can you please re-review?

@edgargabriel edgargabriel force-pushed the topic/ompio-pipeline-iread-iwrite branch 2 times, most recently from 25251d9 to dea57a4 Compare January 19, 2023 22:04
@edgargabriel
Copy link
Member Author

@bosilca and @devreal I updated the pr to address the issues that you raised, if you have a chance can you please re-review? The multi-threading protection should work I think, but is not very nice, a better solution would probably require re-working some of the interfaces (especially fbtl), that could be for another pr however.

if (mca_common_ompio_progress_is_registered) {
OPAL_THREAD_UNLOCK (&mca_common_ompio_mutex);
return;
}
opal_progress_register (mca_common_ompio_progress);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place where this callback is unregistered (I can't find it in this PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's also a good question, ompio so far has not unregistered the progress function again. My assumption was that an application is not just using one non-blocking operations, but will most likely do that throughout the lifetime of the application, and hence we don't necessarily want to constantly register/unregister the progress function. I can add removing the progress function if the internal request list is empty.

Are other components doing that (i.e. unregistering their progress function)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most components unregister their progress function during shutdown. That way you avoid getting called after shutdown.

Removing progress callbacks seems quite expensive (linear search and O(n) atomic swaps) so doing that frequently is not a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks, I can do the same. I will add the unregister functionality to the component shutdown function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the register/unregister of progress in many places. Take a look at the non-blocking communicator creation (in communicator/comm_request.c) for a sample.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I suggest a two step plan here?

  • I've added the progress_unregister into the shutdown step of the component, and have done a fairly extensive round of testing with our internal testsuite, the ompio file I/O testsuite, and the hdf5 testsuite, and everything still seems to work. Since ompio so far has also not unregistered the progress function at runtime, this pr does not change the behavior of ompio (if anything, it improves it a bit by doing an unregister at all).

  • After this pr, I can investigate the impact of moving the unregister operation to request finalize operation, i.e. if there is no pending I/o request anymore, we remove the progress function. I will look both at the impact on file I/O operations, but also on communication operations. We can than revisit the decision based on the data, which might point in either direction. (It might be however 1-2 weeks until I get to this task)

Would this be a reasonable path forward?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@edgargabriel edgargabriel force-pushed the topic/ompio-pipeline-iread-iwrite branch from dea57a4 to 927e2a8 Compare January 20, 2023 18:17
Pipelined iread/iwrite operations require the notion of subrequests, i.e. a user level request
can contain multiple internal subrequests that all have to complete before the user level operation
is considered finished. This requires adjustments
to the internal ompio progress engine and data structures.

Note: this is purely just a pipelined algorithm, no overlap between different iterations.

Signed-off-by: Edgar Gabriel <[email protected]>
@edgargabriel edgargabriel force-pushed the topic/ompio-pipeline-iread-iwrite branch from 927e2a8 to ff7458a Compare January 20, 2023 19:46
@edgargabriel
Copy link
Member Author

bot:retest

@edgargabriel edgargabriel merged commit 12857c3 into open-mpi:main Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants