-
Notifications
You must be signed in to change notification settings - Fork 900
pml/ob1: fixed local handle sent during PUT control message #6556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pml/ob1: fixed local handle sent during PUT control message #6556
Conversation
In case of using a btl_put in ob1, the handle of the locally registered memory is sent with a PUT control message. In the current master code the sent handle is necessary the handle in the frag but if the handle has been successfully registered in the request, the frag structure does not have any valid handle and all fragments use the request one. I suggest to check if the handle in the fragment is valid and if not to send the handle from the request. Signed-off-by: Brelle Emmanuel <[email protected]>
Can one of the admins verify this patch? |
ok to test |
I think about the following correction for the problem that diff --git a/ompi/mca/pml/ob1/pml_ob1_recvreq.c b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
index ea5fa6849d..3338c79f87 100644
--- a/ompi/mca/pml/ob1/pml_ob1_recvreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_recvreq.c
@@ -411,6 +411,7 @@ static int mca_pml_ob1_recv_request_put_frag (mca_pml_ob1_rdma_frag_t *frag)
mca_pml_ob1_rdma_hdr_t *hdr;
size_t reg_size;
int rc;
+ mca_btl_base_registration_handle_t *local_handle = NULL;
reg_size = bml_btl->btl->btl_registration_handle_size;
@@ -423,11 +424,33 @@ static int mca_pml_ob1_recv_request_put_frag (mca_pml_ob1_rdma_frag_t *frag)
}
ctl->des_cbfunc = mca_pml_ob1_recv_ctl_completion;
+ if(bml_btl->btl->btl_register_mem && NULL == frag->local_handle){
+ /* Check if the segment is already registered */
+ for (size_t i = 0 ; i < recvreq->req_rdma_cnt ; ++i) {
+ if (recvreq->req_rdma[i].bml_btl == frag->rdma_bml) {
+ /* do not copy the handle to the fragment to avoid deregistring it twice */
+ local_handle = recvreq->req_rdma[i].btl_reg;
+ break;
+ }
+ }
+ if (NULL == local_handle) {
+ /* Not already registered. Register the region with the BTL. */
+ mca_bml_base_register_mem (bml_btl, frag->local_address, frag->rdma_length, MCA_BTL_REG_FLAG_REMOTE_WRITE,
+ &frag->local_handle);
+ if (OPAL_UNLIKELY(NULL == frag->local_handle)) {
+ return OMPI_ERR_OUT_OF_RESOURCE;
+ }
+ local_handle = frag->local_handle;
+ }
+ }else{
+ local_handle = frag->local_handle;
+ }
+
/* fill in rdma header */
hdr = (mca_pml_ob1_rdma_hdr_t *) ctl->des_segments->seg_addr.pval;
mca_pml_ob1_rdma_hdr_prepare (hdr, (!recvreq->req_ack_sent) ? MCA_PML_OB1_HDR_TYPE_ACK : 0,
recvreq->remote_req_send.lval, frag, recvreq, frag->rdma_offset,
- frag->local_address, frag->rdma_length, frag->local_handle,
+ frag->local_address, frag->rdma_length, local_handle,
reg_size);
ob1_hdr_hton(hdr, MCA_PML_OB1_HDR_TYPE_PUT, proc);
@@ -1026,15 +1049,6 @@ int mca_pml_ob1_recv_request_schedule_once( mca_pml_ob1_recv_request_t* recvreq,
opal_convertor_get_current_pointer (&recvreq->req_recv.req_base.req_convertor, &data_ptr);
OPAL_THREAD_UNLOCK(&recvreq->lock);
- if (btl->btl_register_mem) {
- mca_bml_base_register_mem (bml_btl, data_ptr, size, MCA_BTL_REG_FLAG_REMOTE_WRITE,
- &frag->local_handle);
- if (OPAL_UNLIKELY(NULL == frag->local_handle)) {
- MCA_PML_OB1_RDMA_FRAG_RETURN(frag);
- continue;
- }
- }
-
/* fill in the minimum information needed to handle the fin message */
frag->cbfunc = mca_pml_ob1_put_completion;
frag->rdma_length = size;
diff --git a/ompi/mca/pml/ob1/pml_ob1_sendreq.c b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
index 1626e13e35..2bad26fd7c 100644
--- a/ompi/mca/pml/ob1/pml_ob1_sendreq.c
+++ b/ompi/mca/pml/ob1/pml_ob1_sendreq.c
@@ -1174,7 +1174,7 @@ int mca_pml_ob1_send_request_put_frag( mca_pml_ob1_rdma_frag_t *frag )
}
}
- if (NULL == frag->local_handle) {
+ if (NULL == local_handle) {
/* Not already registered. Register the region with the BTL. */
mca_bml_base_register_mem (bml_btl, frag->local_address, frag->rdma_length, 0,
&frag->local_handle);
(END) |
I plan to review this in the next couple of days. |
@zuzuki, if I am not mistaking the mca_pml_ob1_recv_request_put_frag function may also be called after a rget control message (if the btl_get failed with the error OMPI_ERR_NOT_AVAILABLE ). In this case the memory may have already been registered and the handle would be stored in the request. |
@hjelmn You said you were going to have a look a look. Do you have the cycles to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This patch brings the PUT path in sync with the GET path.
This PR and #6621 should go in the all branches. |
In case of using a btl_put in ob1, the handle of the locally registered
memory is sent with a PUT control message. In the current master code
the sent handle is necessary the handle in the frag but if the handle
has been successfully registered in the request, the frag structure does
not have any valid handle and all fragments use the request one.
I suggest to check if the handle in the fragment is valid and if not to
send the handle from the request.
Signed-off-by: Brelle Emmanuel [email protected]