-
Notifications
You must be signed in to change notification settings - Fork 900
osc/rdma: adjustment on btl selection logic #9696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osc/rdma: adjustment on btl selection logic #9696
Conversation
CI will fail because this PR depends on #9694 |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/ed6555c1b3042a027157d39ff4589bde |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/86e0709aa974c58a53976db5cf22c439 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/23fe703dd200c2de96ce74ec90ba8dbd |
* use of registration-based RDMA (these BTLs will not be used) and will use | ||
* any remaining BTL. By default the BTLs used will be tcp and sm but any single | ||
* (or pair) of BTLs may be used. | ||
* This function is used when there ompi_osc_rdm_query_btls() failed to find |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failed to find a single btl that can communicate with all peers and supports remote completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! fixed in new revision.
} | ||
|
||
++btls_found; | ||
if (module) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know Nathan did this wrong, but we explicitly compare to NULL in Open MPI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! fixed in new revision.
|
||
OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "found alternate btl %s", btls_to_use[i]); | ||
OSC_RDMA_VERBOSE(MCA_BASE_VERBOSE_INFO, "disabing btl's native support of RDMA and ATOMIC"); | ||
item->btl_module->btl_flags &= ~(MCA_BTL_FLAGS_RDMA | MCA_BTL_FLAGS_GET | MCA_BTL_FLAGS_PUT | MCA_BTL_FLAGS_ATOMIC_FOPS ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not comfortable with this approach; we should not be modifying the BTL module's flags. Those are owned by the BTL itself. Why do we need to do this (rather than use our own state for capabilities)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because mca_am_rdma_init
will only fill the gaps, e.g. if a btl does not support put/get/atomics, mca_am_rdma_init
add the support. mca_am_rdma_init
will not overwrite btl's native support.
but what osc/rdma want is for an alternate btl to always use AM rdma, do not use its native put/get/atomics.
Maybe we can add an flag to mca_am_rdma_init
to force it overwrite a btl's native implementation of rdma/atomics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go the flags to am_rdma_init route. I also don't know that we want to overwrite the btl's native implementation; remember that the PML is using the BTL at the same time.
I think we want a proxy btl for the one-sided to use, which is just the underlying BTL and all the AM structures. So less am_rdma_init() and more btl* btl_base_am_rdma_alloc(base_btl);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the advice. I will add a function btl_base_am_rdma_alloc
then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bwbarrett I tried to allocate a new btl module, but encountered an issue.
The newly created btls are stored in "module->selected_btls. Later when creating peer, osc/rdma need to pick a btl (from
module->selected_btls) for each peer. This is currently done by comparing pointer in
module->selected_btlswith pointer of btl registered with bml. Because the newly created btls are not registered with bml, osc/rdma cannot find a btl (and an endpoint) for the peer. (this happens in function
ompi_osc_rdma_peer_btl_endpoint`)
I wonder what is the best way to handle this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we'd need to go the whole way and import into the BML if we go down that path. I sent a long email to you and Nathan about some alternatives that might be less costly. Let's run that to ground and come back to this discussion.
b334866
to
59a4528
Compare
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/4978a3cec5b53fa9b1b3a979a2ceb2f9 |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/09c1eb4b879d0c01b6243d77e637922b |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/17465f4ecc97b9fe4cf924273b1608f8 |
|
bot:aws:retest |
bot:ibm:retest |
@gpaulsen I am not sure how to restart the ibm and mellanox CI. Can you help? |
bot:ibm:retest |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@awlauria Thank you! |
This patch makes active message (AM) to not use get to implement put, because AM put based on get does not support remote completion. By making this change, active message put will support remote completion. Other active message operations: get, fop and cswap support remote completion by the nature of the completion. Therefore, this patch ensures active message RDMA/atomics always support remote completion. Signed-off-by: Wei Zhang <[email protected]>
This patch introduced a new struct ompi_osc_rdma_btl_wrapper_t, which contains all the information osc/rdma uses in a mca_btl_base_module_t object. With this change, a ompi_osc_rdma_btl_wrapper_t struct will be created when a mca_btl_base_module was selected, and stored in "selected_btl" osc/rdma then operates on an ompi_osc_rdma_btl_wrapper_t object instead of the mca_btl_base_module_t object directly. This change is a preparation for an upcoming change on btl selection logic. Signed-off-by: Wei Zhang <[email protected]>
This patch makes the following adjustments to the btl selection logic. First, when selecting a primary btl, this patch requires the primary btl to support remote completion. Second, when selecting alternate btls, this patch allowes any btl to be used as an alternate btl. Prior to this patch, only a list of pre-defined btls can be used as alternate btls. Finally, when a btl is used as an atlernate btl, this patch create a btl wrapper such that osc/rdma will always use active message RDMA/atomics on this alternate btl. The reason for these changes are: First, these changes ensured the selected btls of osc/rdma always support remote completion (because active message RDMA/atomics supports remote completion). Remote completion is essential for several key components of osc/rdma: the usage of cpu atomics to update peer's state, the usage of local leader to update peer's state and, its fence implementation. Therefore the assurance of btl's support of remote completion greatly simplified osc/rdma's implementation. Second, these changes eliminated the need to save and exchange more than 1 memory reigstration, because active message RDMA/atomic does not require explicit memory registration. Third, these changes ensured the correctness of atomic operations. When multiple alternate btls are used, atomicity cannot be guarenteed accross each btl's native implementation of atomics. Finally, these changes allowed more btls to be used as alternate btls, especially btl/self. Signed-off-by: Wei Zhang <[email protected]>
59a4528
to
b98bba5
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
This PR contain 3 commits:
first one update btl's active message so it always support remote completion
The other 2 update the btl selection logic in osc/rdma.