Skip to content

Backport/3.1.x/6952 #6978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from
Closed

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Sep 13, 2019

This effort started as the backport of #6952 to the 3.1 branch, but due to a large number of conflicts it expanded to include everything necessary to bring the datatype engine of 3.1 in sync with 4.0 (and master).

I lightly tested it, and everything looks OK. However, what started as a simple backport became a gigantic patch and I certainly do not plan to spend more time on it. As such, I would advice against it's inclusion in the 3.1. Moving this functionality to 3.0 will be similarly complicated.

ggouaillardet and others added 18 commits September 13, 2019 11:13
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.

This fixes a bug introduced in open-mpi/ompi@7c938f0

Fixes open-mpi#6346

Signed-off-by: Gilles Gouaillardet <[email protected]>

(cherry picked from commit open-mpi/ompi@b395342)
…-destructor-leak

v4.0.x: opal/datatype: plug a memory leak in opal_datatype_t destructor
Always use size_t (instead of converting to an uint32_t) in order to
correctly support large datatypes.

Thanks Ben Menadue for the initial bug report

Refs open-mpi#6016

Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repertition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the dispacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <[email protected]>
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <[email protected]>
Merge contiguous iov in order to minimize the number of returned iovec.

Signed-off-by: George Bosilca <[email protected]>
Rework the to_self test to be able to be used as a benchmark.

Signed-off-by: George Bosilca <[email protected]>
- optimize handling of contiguous with gaps datatypes.
- fixes a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatype.
- optimize the case of blocklen == 1

Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Upon detecting a datatype loop representation skip the entire loop
according the the remaining space.

Signed-off-by: George Bosilca <[email protected]>
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.

Signed-off-by: George Bosilca <[email protected]>
Amazing how a bad instruction scheduling can have such a drastic impact
on the code performance. With this change, the get a boost of at least
50% on the performance of data with a small blocklen and/or count.

Signed-off-by: George Bosilca <[email protected]>
Start optimizing the code.

This commit divides the operations in 2 parts, the first, outside the
critical part, deals with partial blocks of predefined elements, and the
second, inside the critical path, only deals with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.

Signed-off-by: George Bosilca <[email protected]>
Fixes the convertor iovec description on the MPI-IO reported by Edgar.

Signed-off-by: George Bosilca <[email protected]>
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows for contiguous elements to have thir
blocklen increasing instead of the count. The idea is to always maximize
the blocklen, aka. the contiguous part of the datatype.

Signed-off-by: George Bosilca <[email protected]>
(cherry picked from commit 41e6f55)
Signed-off-by: George Bosilca <[email protected]>
@jsquyres
Copy link
Member

@bosilca We agree: many thanks for the effort, but let's not take this to v3.1.x. Can you give us a README-worthy description of what is broken? I.e., if we don't fix this in v3.0.x / v3.1.x, we at least want to describe the corner case that doesn't work (and therefore the user should upgrade to v4.0.x).

@jsquyres jsquyres added the NEWS label Sep 16, 2019
Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per @bosilca's recommendation, put a negative review on here so that we do not accidentally accept this PR. Leaving it open just so that we can get a README-worthy description of the bug that will not be fixed for v3.0.x and v3.1.x.

@bosilca
Copy link
Member Author

bosilca commented Sep 17, 2019

I don't know what bugs are or not fixed, because the datatype engine in master and 4.x is now drastically different that the one in 3.x. All the bugs the original patch fixed in master and 4.x, were the result of the new datatype engine that has not been yet backported in 3.x. The patch provided here would have done all that (bring the 3.x datatype engine in sync with the master and fix all known bugs), but as we discussed we will not get it in. So, the 3.x has the old datatype engine with its own set of bugs.

jsquyres added a commit to jsquyres/ompi that referenced this pull request Sep 30, 2019
Do not remove 0-extent instances while optimizing the datatype.  This
is a distillation of a larger set of DDT bug fixes and optimizations
from master.  George tried to cherry-pick these fixes+optimizations
back to the v3.0.x/v3.1.x branches and it resulted in a giant list of
heavily-modified cherry-picked commits (see
open-mpi#6978).  This did not seem like a
good idea.

George therefore distilled the critical bug fix down to this small
commit, suitable for both the v3.0.x/v3.1.x branches.  See
open-mpi#7019 (comment)
for details.

Signed-off-by: Jeff Squyres <[email protected]>
jsquyres added a commit to jsquyres/ompi that referenced this pull request Sep 30, 2019
Do not remove 0-extent instances while optimizing the datatype.  This
is a distillation of a larger set of DDT bug fixes and optimizations
from master.  George tried to cherry-pick these fixes+optimizations
back to the v3.0.x/v3.1.x branches and it resulted in a giant list of
heavily-modified cherry-picked commits (see
open-mpi#6978).  This did not seem like a
good idea.

George therefore distilled the critical bug fix down to this small
commit, suitable for both the v3.0.x/v3.1.x branches.  See
open-mpi#7019 (comment)
for details.

Signed-off-by: Jeff Squyres <[email protected]>
(cherry picked from commit b77ee56)
@jsquyres
Copy link
Member

Per #7019 (comment), this PR has been replaced with #7023.

@jsquyres jsquyres closed this Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants