Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Topic/v1.10/coll fixes #1262

Merged
merged 11 commits into from
Jul 28, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 124 additions & 0 deletions ompi/mca/coll/base/README.memory_management
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
/* This comment applies to all collectives (including the basic
* module) where we allocate a temporary buffer. For the next few
* lines of code, it's tremendously complicated how we decided that
* this was the Right Thing to do. Sit back and enjoy. And prepare
* to have your mind warped. :-)
*
* Recall some definitions (I always get these backwards, so I'm
* going to put them here):
*
* extent: the length from the lower bound to the upper bound -- may
* be considerably larger than the buffer required to hold the data
* (or smaller! But it's easiest to think about when it's larger).
*
* true extent: the exact number of bytes required to hold the data
* in the layout pattern in the datatype.
*
* For example, consider the following buffer (just talking about
* true_lb, extent, and true extent -- extrapolate for true_ub:
*
* A B C
* --------------------------------------------------------
* | | |
* --------------------------------------------------------
*
* There are multiple cases:
*
* 1. A is what we give to MPI_Send (and friends), and A is where
* the data starts, and C is where the data ends. In this case:
*
* - extent: C-A
* - true extent: C-A
* - true_lb: 0
*
* A C
* --------------------------------------------------------
* | |
* --------------------------------------------------------
* <=======================extent=========================>
* <======================true extent=====================>
*
* 2. A is what we give to MPI_Send (and friends), B is where the
* data starts, and C is where the data ends. In this case:
*
* - extent: C-A
* - true extent: C-B
* - true_lb: positive
*
* A B C
* --------------------------------------------------------
* | | User buffer |
* --------------------------------------------------------
* <=======================extent=========================>
* <===============true extent=============>
*
* 3. B is what we give to MPI_Send (and friends), A is where the
* data starts, and C is where the data ends. In this case:
*
* - extent: C-A
* - true extent: C-A
* - true_lb: negative
*
* A B C
* --------------------------------------------------------
* | | User buffer |
* --------------------------------------------------------
* <=======================extent=========================>
* <======================true extent=====================>
*
* 4. MPI_BOTTOM is what we give to MPI_Send (and friends), B is
* where the data starts, and C is where the data ends. In this
* case:
*
* - extent: C-MPI_BOTTOM
* - true extent: C-B
* - true_lb: [potentially very large] positive
*
* MPI_BOTTOM B C
* --------------------------------------------------------
* | | User buffer |
* --------------------------------------------------------
* <=======================extent=========================>
* <===============true extent=============>
*
* So in all cases, for a temporary buffer, all we need to malloc()
* is a buffer of size true_extent. We therefore need to know two
* pointer values: what value to give to MPI_Send (and friends) and
* what value to give to free(), because they might not be the same.
*
* Clearly, what we give to free() is exactly what was returned from
* malloc(). That part is easy. :-)
*
* What we give to MPI_Send (and friends) is a bit more complicated.
* Let's take the 4 cases from above:
*
* 1. If A is what we give to MPI_Send and A is where the data
* starts, then clearly we give to MPI_Send what we got back from
* malloc().
*
* 2. If B is what we get back from malloc, but we give A to
* MPI_Send, then the buffer range [A,B) represents "dead space"
* -- no data will be put there. So it's safe to give B-true_lb to
* MPI_Send. More specifically, the true_lb is positive, so B-true_lb is
* actually A.
*
* 3. If A is what we get back from malloc, and B is what we give to
* MPI_Send, then the true_lb is negative, so A-true_lb will actually equal
* B.
*
* 4. Although this seems like the weirdest case, it's actually
* quite similar to case #2 -- the pointer we give to MPI_Send is
* smaller than the pointer we got back from malloc().
*
* Hence, in all cases, we give (return_from_malloc - true_lb) to MPI_Send.
*
* This works fine and dandy if we only have (count==1), which we
* rarely do. ;-) So we really need to allocate (true_extent +
* ((count - 1) * extent)) to get enough space for the rest. This may
* be more than is necessary, but it's ok.
*
* Simple, no? :-)
*
*/


38 changes: 18 additions & 20 deletions ompi/mca/coll/basic/coll_basic_allgather.c
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
* Copyright (c) 2014 Research Organization for Information Science
* Copyright (c) 2014-2016 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* $COPYRIGHT$
*
Expand Down Expand Up @@ -91,9 +91,10 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
struct ompi_communicator_t *comm,
mca_coll_base_module_t *module)
{
int rank, root = 0, size, rsize, err, i;
char *tmpbuf = NULL, *ptmp;
ptrdiff_t rlb, slb, rextent, sextent, incr;
int rank, root = 0, size, rsize, i, err;
char *tmpbuf_free = NULL, *tmpbuf, *ptmp;
ptrdiff_t rlb, rextent, incr;
ptrdiff_t gap, span;
ompi_request_t *req;
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
ompi_request_t **reqs = basic_module->mccb_reqs;
Expand All @@ -116,17 +117,13 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
MCA_COLL_BASE_TAG_ALLGATHER,
MCA_PML_BASE_SEND_STANDARD, comm));
if (OMPI_SUCCESS != err) {
return err;
goto exit;
}
} else {
/* receive a msg. from all other procs. */
err = ompi_datatype_get_extent(rdtype, &rlb, &rextent);
if (OMPI_SUCCESS != err) {
return err;
}
err = ompi_datatype_get_extent(sdtype, &slb, &sextent);
if (OMPI_SUCCESS != err) {
return err;
goto exit;
}

/* Do a send-recv between the two root procs. to avoid deadlock */
Expand All @@ -135,14 +132,14 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
MCA_PML_BASE_SEND_STANDARD,
comm, &reqs[rsize]));
if (OMPI_SUCCESS != err) {
return err;
goto exit;
}

err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
MCA_COLL_BASE_TAG_ALLGATHER, comm,
&reqs[0]));
if (OMPI_SUCCESS != err) {
return err;
goto exit;
}

incr = rextent * rcount;
Expand All @@ -152,20 +149,21 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
MCA_COLL_BASE_TAG_ALLGATHER,
comm, &reqs[i]));
if (MPI_SUCCESS != err) {
return err;
goto exit;
}
}

err = ompi_request_wait_all(rsize + 1, reqs, MPI_STATUSES_IGNORE);
if (OMPI_SUCCESS != err) {
return err;
goto exit;
}

/* Step 2: exchange the resuts between the root processes */
tmpbuf = (char *) malloc(scount * size * sextent);
if (NULL == tmpbuf) {
return err;
}
span = opal_datatype_span(&sdtype->super, (int64_t)scount * (int64_t)size, &gap);
tmpbuf_free = (char *) malloc(span);
if (NULL == tmpbuf_free) { err = OMPI_ERR_OUT_OF_RESOURCE; goto exit; }

tmpbuf = tmpbuf_free - gap;

err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
MCA_COLL_BASE_TAG_ALLGATHER,
Expand Down Expand Up @@ -222,8 +220,8 @@ mca_coll_basic_allgather_inter(void *sbuf, int scount,
}

exit:
if (NULL != tmpbuf) {
free(tmpbuf);
if (NULL != tmpbuf_free) {
free(tmpbuf_free);
}

return err;
Expand Down
16 changes: 7 additions & 9 deletions ompi/mca/coll/basic/coll_basic_allreduce.c
Original file line number Diff line number Diff line change
Expand Up @@ -78,8 +78,8 @@ mca_coll_basic_allreduce_inter(void *sbuf, void *rbuf, int count,
struct ompi_communicator_t *comm,
mca_coll_base_module_t *module)
{
int err, i, rank, root = 0, rsize;
ptrdiff_t lb, extent;
int err, i, rank, root = 0, rsize, line;
ptrdiff_t extent, dsize, gap;
char *tmpbuf = NULL, *pml_buffer = NULL;
ompi_request_t *req[2];
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
Expand All @@ -98,16 +98,14 @@ mca_coll_basic_allreduce_inter(void *sbuf, void *rbuf, int count,
* simultaniously. */
/*****************************************************************/
if (rank == root) {
err = ompi_datatype_get_extent(dtype, &lb, &extent);
err = ompi_datatype_type_extent(dtype, &extent);
if (OMPI_SUCCESS != err) {
return OMPI_ERROR;
}

tmpbuf = (char *) malloc(count * extent);
if (NULL == tmpbuf) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
pml_buffer = tmpbuf - lb;
dsize = opal_datatype_span(&dtype->super, count, &gap);
tmpbuf = (char *) malloc(dsize);
if (NULL == tmpbuf) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto exit; }
pml_buffer = tmpbuf - gap;

/* Do a send-recv between the two root procs. to avoid deadlock */
err = MCA_PML_CALL(irecv(rbuf, count, dtype, 0,
Expand Down
23 changes: 14 additions & 9 deletions ompi/mca/coll/basic/coll_basic_alltoallw.c
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
* Copyright (c) 2013 Los Alamos National Security, LLC. All rights
* reserved.
* Copyright (c) 2013 FUJITSU LIMITED. All rights reserved.
* Copyright (c) 2014 Research Organization for Information Science
* Copyright (c) 2014-2016 Research Organization for Information Science
* and Technology (RIST). All rights reserved.
* Copyright (c) 2014 Cisco Systems, Inc. All rights reserved.
* $COPYRIGHT$
Expand Down Expand Up @@ -42,10 +42,10 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis
mca_coll_base_module_t *module)
{
mca_coll_basic_module_t *basic_module = (mca_coll_basic_module_t*) module;
int i, j, size, rank, err=MPI_SUCCESS, max_size;
MPI_Request *preq;
char *tmp_buffer;
ptrdiff_t ext;
int i, j, size, rank, err = MPI_SUCCESS, max_size;
MPI_Request *preq, *reqs = NULL;
char *tmp_buffer, *save_buffer = NULL;
ptrdiff_t ext, gap;

/* Initialize. */

Expand All @@ -59,17 +59,17 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis

/* Find the largest receive amount */
for (i = 0, max_size = 0 ; i < size ; ++i) {
ompi_datatype_type_extent (rdtypes[i], &ext);
ext *= rcounts[i];
ext = opal_datatype_span(&rdtypes[i]->super, rcounts[i], &gap);

max_size = ext > max_size ? ext : max_size;
}

/* Allocate a temporary buffer */
tmp_buffer = calloc (max_size, 1);
tmp_buffer = save_buffer = calloc (max_size, 1);
if (NULL == tmp_buffer) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
tmp_buffer -= gap;

/* in-place alltoallw slow algorithm (but works) */
for (i = 0 ; i < size ; ++i) {
Expand Down Expand Up @@ -129,7 +129,12 @@ mca_coll_basic_alltoallw_intra_inplace(void *rbuf, int *rcounts, const int *rdis

error_hndl:
/* Free the temporary buffer */
free (tmp_buffer);
free (save_buffer);
if( MPI_SUCCESS != err ) { /* Free the requests. */
if( NULL != reqs ) {
mca_coll_basic_free_reqs(basic_module->mccb_reqs, 2);
}
}

/* All done */

Expand Down
11 changes: 5 additions & 6 deletions ompi/mca/coll/basic/coll_basic_exscan.c
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ mca_coll_basic_exscan_intra(void *sbuf, void *rbuf, int count,
mca_coll_base_module_t *module)
{
int size, rank, err;
ptrdiff_t true_lb, true_extent, lb, extent;
ptrdiff_t dsize, gap;
char *free_buffer = NULL;
char *reduce_buffer = NULL;

Expand Down Expand Up @@ -81,15 +81,14 @@ mca_coll_basic_exscan_intra(void *sbuf, void *rbuf, int count,

/* Get a temporary buffer to perform the reduction into. Rationale
* for malloc'ing this size is provided in coll_basic_reduce.c. */
ompi_datatype_get_extent(dtype, &lb, &extent);
ompi_datatype_get_true_extent(dtype, &true_lb, &true_extent);
dsize = opal_datatype_span(&dtype->super, count, &gap);

free_buffer = (char*)malloc(true_extent + (count - 1) * extent);
free_buffer = (char*)malloc(dsize);
if (NULL == free_buffer) {
return OMPI_ERR_OUT_OF_RESOURCE;
}
reduce_buffer = free_buffer - lb;
err = ompi_datatype_copy_content_same_ddt(dtype, count,
reduce_buffer = free_buffer - gap;
err = ompi_datatype_copy_content_same_ddt(dtype, count,
reduce_buffer, (char*)sbuf);

/* Receive the reduced value from the prior rank */
Expand Down
Loading