Export/ulfm to ompi5 #7740
Can one of the admins verify this patch? |
yikes - can you squash this down? |
7844e02
to
73b6fa4
Compare
It's my understanding that this PR is the OMPI side companion of openpmix/prrte#542 in PRRTE. |
cf886d5
to
38a392d
Compare
bot:retest |
PRRTE has requested that the Open MPI community inform them if we're taking this for v5.0 or if we'll hold it for post v5.0. Everyone, please review this PR, and we can discuss and make a decision on 5/26 web-ex. |
38a392d
to
3b1bd67
Compare
3b1bd67
to
22522b0
Compare
Is this PR still slated to make it into 5.0? |
yes. the conflicts will be removed asap. |
22522b0
to
99aa09b
Compare
99aa09b
to
ac66558
Compare
A couple of minor changes/comments/questions/missing braces. I cannot vouch for the conceptual correctness, as I haven't tried to fully understand the algorithms (there is only so much time available), and I only glanced over the build system and Fortran changes, as I'm not familiar with either. Maybe someone else wants to look at these parts...
Thanks Joseph that's very thorough. I'll mark the items resolved as I go through it. |
…iew. 670301e7...69eb9cc5
Protect and warn against using fault-tolerance with malfunctioning select-based libevent. Signed-off-by: Aurélien Bouteiller <[email protected]>
Remove the disable-build implicit platform for FT
Make sure that we can run failure-free with PML that do not support FT Signed-off-by: Aurelien Bouteiller <[email protected]>
unused variable in errhandler Signed-off-by: Aurelien Bouteiller <[email protected]>
Update the ft-mpi tune file for prte detector default Signed-off-by: Aurelien Bouteiller <[email protected]>
Remove debug function OMPI_Comm_failure_inject Signed-off-by: Aurelien Bouteiller <[email protected]>
Do not leak the port_string in spawn error cases Signed-off-by: Aurelien Bouteiller <[email protected]>
Rbcast n2 would not send to the remote_group Signed-off-by: Aurelien Bouteiller <[email protected]>
Check for NULL flag arg in agree/is_revoked Signed-off-by: Aurelien Bouteiller <[email protected]>
Substitute MPI_Wtime with PMPI_Wtime. Signed-off-by: Aurelien Bouteiller <[email protected]>
General cleanup Address comments from Joseph Schuchart: open-mpi#7740 (review) Signed-off-by: Aurelien Bouteiller <[email protected]>
Remove dead code in detector Signed-off-by: Aurelien Bouteiller <[email protected]>
ftagree: Rewrite obfuscated for loops as such and explain why, when it cannot be done Signed-off-by: Aurelien Bouteiller <[email protected]>
Use normal datatype operations in ETA agree Signed-off-by: Aurelien Bouteiller <[email protected]>
Align the 'proc padding' by moving the 'proc active' field as it breaks openshmem otherwise. Signed-off-by: Aurelien Bouteiller <[email protected]>
Serialize access to the 'all_failed_procs' group, there was a risk of concurrent accesses to outdated aliases from different threads. Signed-off-by: Aurelien Bouteiller <[email protected]>
Errhandler should not call progress recursively; we did add a recursive call by mistake when activating PMIx event notification Signed-off-by: Aurelien Bouteiller <[email protected]>
Merge branch 'master' into export/ulfm-to-ompi5-expanded
do not modify the loop index in the inner loop Signed-off-by: Aurelien Bouteiller <[email protected]>
Uninitialized variable in ft_rbcast was harmless Signed-off-by: Aurelien Bouteiller <[email protected]>
Do the cancel-wait error path only for errors that are guaranteed to complete the wait. Other error types just request_free. Signed-off-by: Aurelien Bouteiller <[email protected]>
convert proc-failed any-source errors to single source in collectives Signed-off-by: Aurelien Bouteiller <[email protected]>
Merge branch 'master' into export/ulfm-to-ompi5-expanded
7830235
to
e1a0f14
Compare
bot:ibm:pgi:retest |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/9ee672eee7b1160d304dd4948f237c14 |
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/d636bd89cf5c13641c881fa46943c9c0 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/0bd7744a554a5e9f298db038a01de7b8 |
56b6663
to
bd8c781
Compare
The IBM CI (GNU/Scale) build failed! Please review the log, linked below. Gist: https://gist.github.com/1c5d49907771e1fd8ad92fe3737af72d |
The IBM CI (XL) build failed! Please review the log, linked below. Gist: https://gist.github.com/b875bd150564e4a03b71b59ba437fc06 |
The IBM CI (PGI) build failed! Please review the log, linked below. Gist: https://gist.github.com/777e273b28cb62aed863a6a32be927a7 |
CI runtime failure is due to missing openpmix/prrte#643 in prte. |
Discussion on 5 Jan 2021 webex:
|
bd8c781
to
77b2dc7
Compare
bot:ompi:retest |
I'm getting lots of warnings when I try to build this using gcc 4.8.5 on a RHEL7 system:
|
Did some testing using ompi-tests/ibm and things look nominal. Would be nice to get the compiler warnings fixed before merging this PR. Also would be nice to decide whether to remove the --no-orte OMPI Jenkins CI test or else modify so that it passes before this PR is merged. |
@@ -595,6 +595,9 @@ enum {
MPI_WIN_DISP_UNIT,
MPI_WIN_CREATE_FLAVOR,
MPI_WIN_MODEL,
I'd recommend removing lines 598 and 599 as unneeded.
Here is the CI error for the "--no-orte" (as mentioned this aliases to "--no-prrte")
|
77b2dc7
to
8c0dbbd
Compare
The issue when using an external prte has been resolved: ft gets compile-disabled if it was auto-enabled, and configure raises an error if ft was requested explicitly with the --with-ft configure flag. The --no-orte flag is useful, but should be renamed --no-internal-prte. Launching with fault tolerance has been simplified (see prte openpmix/prrte#726). |
There appears to be a misunderstanding of the We specifically stated in telecons that OMPI did not want the ability to build against an external PRRTE, and @bwbarrett wrote the 3rd-party configure code accordingly. Since that time, the downstream packagers have indicated they plan to release PRRTE separately, so we might want to revisit that decision. For now, however, |
I had a few minor comments that @abouteiller and I need to address.
8c0dbbd
to
680336e
Compare
The historical repositories contain the full history and attribution and are available from https://bitbucket.org/icldistcomp/ulfm2/src/ulfm/ and prior https://github.com/ICLDisco/ulfm-legacy

Signed-off-by: Aurelien Bouteiller <[email protected]>
Signed-off-by: George Bosilca <[email protected]>
Signed-off-by: Josh Hursey <[email protected]>
Signed-off-by: Thomas Herault <[email protected]>
Signed-off-by: Wesley Bland <[email protected]>
Signed-off-by: Nuria Losada <[email protected]>
Signed-off-by: Nathan T. Weeks <[email protected]>

Squashed commit of the following:

commit 262a7f750fc9a86cc936aa59b535936fb3406db6
Author: Aurelien Bouteiller <[email protected]>
Date: Mon Feb 8 11:03:35 2021 -0500
Fix a case where a send-request to a failed process would be started without an endpoint (issue open-mpi#7740 (comment))
Signed-off-by: Aurelien Bouteiller <[email protected]>
...
commit b8fbc4e200f8acd5bfbcd7c493ff04e674e2a023
Merge: c02ef875 724eb49
Author: Aurelien Bouteiller <[email protected]>
Date: Thu Jan 28 00:05:02 2021 -0500
Merge branch 'master' into export/ulfm-to-ompi5-expanded
...
commit 69ab6b8
Author: Aurélien Bouteiller <[email protected]>
Date: Thu Feb 18 20:03:12 2016 -0500
Importing ULFM ompi layer: snapshot of WIP
Missing BTL and COLL imports. Almost compiles w/o --with-ft
Signed-off-by: Aurelien Bouteiller <[email protected]>
680336e
to
6a406fb
Compare
ready to go. |
Features
This implementation conforms to the User Level Failure Mitigation (ULFM)
MPI Standard draft proposal. The ULFM proposal is developed by the MPI
Forum's Fault Tolerance Working Group to support the continued operation of
MPI programs after crashes (node failures) have impacted the execution. The key
principle is that no MPI call (point-to-point, collective, RMA, IO, ...) can
block indefinitely after a failure, but must either succeed or raise an MPI
error.
This implementation produces the three supplementary error codes and five
supplementary interfaces defined in the communicator section of the
ULFM chapter (http://fault-tolerance.org/wp-content/uploads/2012/10/20170221-ft.pdf)
of the standard draft document.
MPIX_ERR_PROC_FAILED
when a process failure prevents the completion of an MPI operation.
MPIX_ERR_PROC_FAILED_PENDING
when a potential sender matching a non-blocking wildcard source receive has failed.
MPIX_ERR_REVOKED
when one of the ranks in the application has invoked the MPI_Comm_revoke operation on the communicator.
MPIX_Comm_revoke(MPI_Comm comm)
interrupts any communication pending on the communicator at all ranks.
MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm* newcomm)
creates a new communicator where dead processes in comm were removed.
MPIX_Comm_agree(MPI_Comm comm, int *flag)
performs a consensus (i.e. fault tolerant allreduce operation) on flag (with the operation bitwise AND).
MPIX_Comm_failure_get_acked(MPI_Comm, MPI_Group*)
obtains the group of currently acknowledged failed processes.
MPIX_Comm_failure_ack(MPI_Comm)
acknowledges that the application intends to ignore the effect of currently known failures on wildcard receive
completions and agreement return values.
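To make the agree and shrink descriptions above concrete, here is a minimal toy model in plain C. It does not call MPI and is not how Open MPI implements these interfaces: a communicator is modelled as one array slot per rank, and the names model_agree and model_shrink are hypothetical. The model also assumes that survivors keep their relative rank order after a shrink, and it ignores error codes entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one array slot per rank of a hypothetical communicator.
 * These helpers are illustrative and do not exist in Open MPI. */

/* Models the value MPIX_Comm_agree leaves in *flag: the bitwise AND of the
 * flags contributed by the ranks that are still alive. */
int model_agree(const int *flags, const bool *alive, size_t n)
{
    assert(flags != NULL && alive != NULL);
    int result = ~0; /* all bits set: identity element of bitwise AND */
    for (size_t r = 0; r < n; r++) {
        if (alive[r]) {
            result &= flags[r];
        }
    }
    return result;
}

/* Models MPIX_Comm_shrink: stores the old rank of each survivor, in
 * increasing order (an assumption of this sketch), into new_to_old and
 * returns the size of the shrunken communicator. */
size_t model_shrink(const bool *alive, size_t n, size_t *new_to_old)
{
    assert(alive != NULL && new_to_old != NULL);
    size_t new_size = 0;
    for (size_t r = 0; r < n; r++) {
        if (alive[r]) {
            new_to_old[new_size++] = r;
        }
    }
    return new_size;
}
```

With four ranks contributing flags 7, 5, 0 and 13 and rank 2 dead, the model agrees on 7 & 5 & 13 = 5, and the shrunken communicator has three members mapping back to old ranks 0, 1 and 3.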
Supported Systems
There are several MPI engines available in Open MPI,
notably, PML "ob1", "cm", "ucx", and MTL "ofi", "portals4", "psm2".
At this point, only "ob1" is adapted to support fault tolerance.
"ob1" uses BTL ("Byte Transfer Layer") components for each supported
network. "ob1" supports a variety of networks that can be used in
combination with each other. Collective operations (blocking and
non-blocking) use an optimized implementation on top of "ob1".
More details are available in README.FT.ULFM.md
Performance tests
For p2p operations
Each solid line represents an IMB-MPI1 run with Open MPI master: ucx/ib between ranks located on two different nodes, and ucx/sm between two ranks located on sibling cores of a single node. The performance of pr7740 is overlaid with dashed lines and squares/circles.
Left column is latency (lower is better); Right column is bandwidth (higher is better).
Graph rows alternate between the default setting on that machine (pml=ucx), for validation only, and the mode that supports runtime ft=on (pml=ob1/uct).
The patch introduces no substantive changes to the UCX pml pathway, so we expect no difference (which is what we see), and we cannot run with ft=on.
The PML ob1 pathway supports FT, and we see no performance difference with ft=off, and maybe a very slight effect with ft=on.
For coll operations
In this set of tests we run a selection of IMB collective benchmarks. We present only the pml=ob1 case (ucx case shows no difference). sm+cma is used for communication between cores, uct/mlx5 is used between nodes, map-by slot (all other settings default, thus coll=tuned).
The results are a bit more variable, as illustrated by the fact that ft=on is sometimes faster than ft=off or master. We would need more statistics to clean up the look of the curves, but overall the performance impact appears non-measurable.
Aries systems
This is a set of tests that compares ULFM with its upstream master (an older version) on Cori (ugni). This graph aggregates a lot of data, as each violin point is the aggregate of hundreds of IMB runs. Blue is the master reference, orange is ulfm ft=on. We present the distribution of multiple runs per message size.
Again, the difference is minuscule, if any.