-
Notifications
You must be signed in to change notification settings - Fork 900
Backport/3.1.x/6952 #6978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport/3.1.x/6952 #6978
Conversation
Reset ptypes when cloning a datatype in order to prevent a double free() in the opal_datatype_t destructor. This fixes a bug introduced in open-mpi/ompi@7c938f0 Fixes open-mpi#6346 Signed-off-by: Gilles Gouaillardet <[email protected]> (cherry picked from commit open-mpi/ompi@b395342)
…-destructor-leak v4.0.x: opal/datatype: plug a memory leak in opal_datatype_t destructor
Always use size_t (instead of converting to an uint32_t) in order to correctly support large datatypes. Thanks Ben Menadue for the initial bug report Refs open-mpi#6016 Signed-off-by: Gilles Gouaillardet <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Move toward a base type of vector (count, type, blocklen, extent, disp) with disp and extent applying toward the count repertition and blocklen being a contiguous memory of type type. Implement 2 optimizations on this description used during type_commit: - collapse: successive similar datatype descriptions are collapsed together with an increased count. - fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack. Fixes at the OMPI datatype level including: - Fix the create_hindexed and vector creation. - Fix the handling of [get|set]_elements and _count. - Correctly compute the dispacement for block indexed types. - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT. Signed-off-by: George Bosilca <[email protected]>
Update the comments to better reflect what is going on. Minor indentations. Signed-off-by: George Bosilca <[email protected]>
Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <[email protected]>
Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <[email protected]>
- optimize handling of contiguous with gaps datatypes. - fixes a performance issue for all datatypes with a count of 1. - optimize the pack/unpack of contiguous with gaps datatype. - optimize the case of blocklen == 1 Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Upon detecting a datatype loop representation skip the entire loop according the the remaining space. Signed-off-by: George Bosilca <[email protected]>
Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <[email protected]>
Amazing how a bad instruction scheduling can have such a drastic impact on the code performance. With this change, the get a boost of at least 50% on the performance of data with a small blocklen and/or count. Signed-off-by: George Bosilca <[email protected]>
Start optimizing the code. This commit divides the operations in 2 parts, the first, outside the critical part, deals with partial blocks of predefined elements, and the second, inside the critical path, only deals with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <[email protected]>
Fixes the convertor iovec description on the MPI-IO reported by Edgar. Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
This patch fixes the merge of contiguous elements into larger but more compact datatypes, and allows for contiguous elements to have thir blocklen increasing instead of the count. The idea is to always maximize the blocklen, aka. the contiguous part of the datatype. Signed-off-by: George Bosilca <[email protected]> (cherry picked from commit 41e6f55)
Signed-off-by: George Bosilca <[email protected]>
@bosilca We agree: many thanks for the effort, but let's not take this to v3.1.x. Can you give us a README-worthy description of what is broken? I.e., if we don't fix this in v3.0.x / v3.1.x, we at least want to describe the corner case that doesn't work (and therefore the user should upgrade to v4.0.x). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @bosilca's recommendation, put a negative review on here so that we do not accidentally accept this PR. Leaving it open just so that we can get a README-worthy description of the bug that will not be fixed for v3.0.x and v3.1.x.
I don't know what bugs are or not fixed, because the datatype engine in master and 4.x is now drastically different that the one in 3.x. All the bugs the original patch fixed in master and 4.x, were the result of the new datatype engine that has not been yet backported in 3.x. The patch provided here would have done all that (bring the 3.x datatype engine in sync with the master and fix all known bugs), but as we discussed we will not get it in. So, the 3.x has the old datatype engine with its own set of bugs. |
Do not remove 0-extent instances while optimizing the datatype. This is a distillation of a larger set of DDT bug fixes and optimizations from master. George tried to cherry-pick these fixes+optimizations back to the v3.0.x/v3.1.x branches and it resulted in a giant list of heavily-modified cherry-picked commits (see open-mpi#6978). This did not seem like a good idea. George therefore distilled the critical bug fix down to this small commit, suitable for both the v3.0.x/v3.1.x branches. See open-mpi#7019 (comment) for details. Signed-off-by: Jeff Squyres <[email protected]>
Do not remove 0-extent instances while optimizing the datatype. This is a distillation of a larger set of DDT bug fixes and optimizations from master. George tried to cherry-pick these fixes+optimizations back to the v3.0.x/v3.1.x branches and it resulted in a giant list of heavily-modified cherry-picked commits (see open-mpi#6978). This did not seem like a good idea. George therefore distilled the critical bug fix down to this small commit, suitable for both the v3.0.x/v3.1.x branches. See open-mpi#7019 (comment) for details. Signed-off-by: Jeff Squyres <[email protected]> (cherry picked from commit b77ee56)
Per #7019 (comment), this PR has been replaced with #7023. |
This effort started as the backport of #6952 to the 3.1 branch, but due to a large number of conflicts it expanded to include everything necessary to bring the datatype engine of 3.1 in sync with 4.0 (and master).
I lightly tested it, and everything looks OK. However, what started as a simple backport became a gigantic patch and I certainly do not plan to spend more time on it. As such, I would advice against it's inclusion in the 3.1. Moving this functionality to 3.0 will be similarly complicated.