Skip to content

Allow explicit use of the Active Message RDMA interface #9904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 26, 2022

Conversation

bwbarrett
Copy link
Member

The AM RDMA interface previously only could be used by updating a btl to fill in missing interfaces. However, it has the capability of providing stronger semantics than the underlying BTL (like providing remote completion even if the underlying BTL doesn't). Provide an external interface to the AM RDMA code that will be used by the RDMA OSC code when the underlying BTLs don't provide remote completion or the required put/get/atomic interfaces.

@bwbarrett
Copy link
Member Author

There's likely going to be a small conflict with #9903, but not sure which will get in first. Should be easy to resolve, and wanted to get this out in case @hjelmn had some time in the near future to have a look.

@bwbarrett bwbarrett force-pushed the bugfix/rdma-osc-am-refactor branch from 3b6e81a to 517424b Compare January 24, 2022 16:41
@bwbarrett
Copy link
Member Author

#9903 did, in fact, create some small conflicts that are resolved in the latest update.

@bwbarrett bwbarrett requested a review from gpaulsen January 24, 2022 20:32
@bwbarrett
Copy link
Member Author

bot:aws:retest (no idea; one of the test sets got killed by Jenkins for ?reasons?).

Remove the type-specific implementation of min in the btl
active message code and use opal_min instead.  While this loses
type safety in the general case, most compilers used in development
support C generics, so we'll still get type safety checks.

Signed-off-by: Brian Barrett <[email protected]>
Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, here's what I found. Please comment and then I can approve.

Rename internal functions in the btl am-rdma code to be consistently
prefixed by am_rdma_.  Previously, there was a mixture of prefixes,
all of which were unnecessarily long for private functions.  There
is no external-visible change in this commit.

Signed-off-by: Brian Barrett <[email protected]>
Disable the underlying flush() call when using the am-rdma interfce.
The current code does not properly handle flushing between the
am-rdma and underlying code properly, so it is quite easy to end
up with flush not behaving the way users expect.  Since there
are currently no callers that rely on a btl having both flush and
am-rdma interfaces, explicitly disable the case for now.

Signed-off-by: Brian Barrett <[email protected]>
Add per-btl storage for the am-rdma interface, allowing the interface
to make one-time decisions about whether or not to use the underlying
put/get interfaces.  This is a necessary step in allowing users of the
am-rdma interface to require different semantics (completion,
registration, etc.) than what is natively provided by the BTL.

Signed-off-by: Brian Barrett <[email protected]>
Add a mode for the am-rdma interface where the existing BTL is
not updated, but the wrapper functions are exported directly
to the caller.  This interface allows for specifying required
flags (although REMOTE_COMPLETION is the only one supported today)
and disabling memory registration requirements for the returned
interface.

Signed-off-by: Brian Barrett <[email protected]>
The front end functions for am-rdma atomic operations only allow
a 4 or 8 byte atomic.  Add an assert on the progress function
that the received size is less than a uint8_t can hold before
assignment from a size_t.

Signed-off-by: Brian Barrett <[email protected]>
@bwbarrett bwbarrett force-pushed the bugfix/rdma-osc-am-refactor branch from 517424b to 341f164 Compare January 25, 2022 20:59
Copy link
Member

@gpaulsen gpaulsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, look great.

@bwbarrett bwbarrett merged commit 47eee4d into open-mpi:master Jan 26, 2022
@bwbarrett bwbarrett deleted the bugfix/rdma-osc-am-refactor branch January 26, 2022 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants