coll/HAN: Don't DQ HAN dynamic @ intra-node subcomm + typo fixes #10458

gkatev · 2022-06-08T14:29:14Z

HAN disables itself when running in a single node, but that shouldn't include the subcommunicator created by HAN-dynamic.
See also #10438.

Tested on v5.0.x
Signed-off-by: George Katevenis [email protected]

ompiteam-bot · 2022-06-08T14:29:17Z

Can one of the admins verify this patch?

gkatev · 2022-06-08T14:30:41Z

@FlorentGermain-Bull

awlauria · 2022-06-08T16:29:51Z

ok to test

bosilca · 2022-06-08T16:53:32Z

Fixing the typos is good, but enabling HAN at the node level needs to be backed by some evidence. Why is this necessary? In which cases does this lead to improved performance?

gkatev · 2022-06-08T17:04:04Z

This doesn't actually allow HAN to run at the node level, but rather on HAN's intra-node subcommunicator (note the added INTRA_NODE check in the diff).

When mca_coll_han_comm_create_new creates the inter/intra comms, it sets the INTRA_NODE/INTER_NODE info keys. These sub-comms also use HAN (#10456). HAN's dynamic functions detect if the current communicator is a sub-communicator, via the info key (topo_lvl), and they delegate to the respective component.
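The check being discussed can be sketched roughly as follows. This is an illustration only, not the code from the diff: `comm_summary_t`, `han_is_eligible`, and the field names are hypothetical stand-ins, and only the topology-level names come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Topology levels named in the thread; the enum itself is a hypothetical stand-in. */
typedef enum { INTRA_NODE, INTER_NODE, GLOBAL_COMMUNICATOR } topo_level_t;

/* Hypothetical summary of what the selection step knows about a communicator. */
typedef struct {
    bool has_remote_peers;   /* false when every rank lives on the same node */
    topo_level_t topo_level; /* level taken from the communicator's info key */
} comm_summary_t;

/* Returns true when HAN should stay in the running for this communicator. */
static bool han_is_eligible(const comm_summary_t *c)
{
    assert(c != NULL);
    /* Single-node communicators disqualify HAN, except for the intra-node
     * sub-communicator that HAN created itself. */
    if (!c->has_remote_peers && c->topo_level != INTRA_NODE) {
        return false;
    }
    return true;
}
```

Under these assumptions, a single-node user communicator (level GLOBAL_COMMUNICATOR) is still rejected, while HAN's own INTRA_NODE sub-communicator is not.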

bosilca · 2022-06-08T19:15:59Z

The first part of the check will only succeed if all processes are local (aka spawned by the same RTE daemon). I don't think that can happen for an INTRA comm.

gkatev · 2022-06-08T19:48:14Z

This is the path that I have in mind:

1. App calls Barrier
2. Enter mca_coll_han_barrier_intra_dynamic, take the 3rd if case (topo_lvl = GLOBAL_COMMUNICATOR)
3. Enter mca_coll_han_barrier_intra_simple
4. Enter mca_coll_han_comm_create_new
5. Two communicators are created, both preferring HAN (assuming #10456, "Coll han: fix allreduce dynamic calling internal han algo on sub_comm", is merged). One communicator has INTRA_NODE set via ompi_comm_coll_han_topo_level, the other INTER_NODE

   During low_comm's creation, HAN is considered. Without this change, it is disqualified because the communicator spans only one node.
   - With the change, the INTRA_NODE info key is detected, and HAN is chosen for the intra-node sub-comm

6. Control returns to mca_coll_han_barrier_intra_simple, low_comm's coll_barrier gets called
7. Enter mca_coll_han_barrier_intra_dynamic, take the 4th case (topo_lvl = INTRA_NODE)
8. Enter the actual submodule that handles the intra-node barrier

bosilca · 2022-06-08T21:24:14Z

In the current code, step 7 is skipped and control goes directly from your step 6 to your step 8. This brings me back to my original question: is there a realistic need to give control back to HAN on our own communicators? The only reason I can see is if we want to support multiple levels, but I might have missed something.

gkatev · 2022-06-08T21:52:15Z

Oh, I understand. I believe step 7 is necessary in order for HAN to decide which component/module should be called for the intra-node level. Unlike the non-dynamic path, where this is decided at split time, in the dynamic path it is decided when the collective is called. Without this change, the component that gets used for the intra-node level is the one with the next-highest priority, and the coll_han_<coll>_dynamic_intra_node_module parameter has no effect. I will perform some additional tests to make doubly sure that things are indeed the way I describe them.
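A toy model of that difference, under stated assumptions: `low_comm_t` and `low_comm_barrier_module` are hypothetical names, and the numeric IDs simply mirror the values quoted in this thread (1 = basic, 3 = tuned). It is not HAN's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Module IDs as quoted in the thread: 1 = basic, 3 = tuned. */
typedef enum { MODULE_BASIC = 1, MODULE_TUNED = 3 } module_id_t;

/* Hypothetical view of the intra-node sub-communicator (low_comm). */
typedef struct {
    bool han_selected;         /* did HAN survive selection on low_comm? */
    module_id_t configured;    /* stand-in for coll_han_barrier_dynamic_intra_node_module */
    module_id_t next_priority; /* component that wins when HAN is disqualified */
} low_comm_t;

/* Which module ends up handling the intra-node barrier. */
static module_id_t low_comm_barrier_module(const low_comm_t *low)
{
    assert(low != NULL);
    if (low->han_selected) {
        /* Step 7 runs: HAN's dynamic function honours the configured module. */
        return low->configured;
    }
    /* Step 7 is skipped: the next-highest-priority component is used directly. */
    return low->next_priority;
}
```

With `han_selected` false, the configured value is ignored, which is the "parameter has no effect" behaviour described above.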

gkatev · 2022-06-09T06:46:29Z

I did some extra tests for completeness (2 nodes, 2 ranks each)

Before change:

coll_han_barrier_dynamic_intra_node_module=3 (tuned)
rank 0: Base barrier intra rec.doubling @ INTRA_NODE
rank 0: Base barrier intra rec.doubling @ INTER_NODE
rank 0: Base barrier intra rec.doubling @ INTRA_NODE

coll_han_barrier_dynamic_intra_node_module=1 (basic)
rank 0: Base barrier intra rec.doubling @ INTRA_NODE
rank 0: Base barrier intra rec.doubling @ INTER_NODE
rank 0: Base barrier intra rec.doubling @ INTRA_NODE

After change:

coll_han_barrier_dynamic_intra_node_module=3 (tuned)
rank 0: Base barrier intra rec.doubling @ INTRA_NODE
rank 0: Base barrier intra rec.doubling @ INTER_NODE
rank 0: Base barrier intra rec.doubling @ INTRA_NODE

coll_han_barrier_dynamic_intra_node_module=1 (basic)
rank 0: Base barrier intra linear @ INTRA_NODE
rank 0: Base barrier intra rec.doubling @ INTER_NODE
rank 0: Base barrier intra linear @ INTRA_NODE

This raises the question of whether HAN's dynamic function should be invoked each time a collective is called, or whether the underlying module's function should be called directly instead. I'm not certain what kind of overhead the dynamic functions impose; some things do get cached, so they will only add overhead the first time. I believe that in HAN-dynamic, instead of just calling another function according to the rules, we could also update the communicator so that future calls do not go through HAN.

FlorentGermain-Bull · 2022-06-09T07:11:52Z

There are two types of configuration in HAN.

If we only use configuration through MCA parameters, then yes, we can update the c_coll function pointers in the communicator structure at the first dynamic call.

If we use a configuration file, the module called can vary with message size. In that case, updating the c_coll function pointers is not possible.
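A small sketch of why caching breaks down with size-dependent rules. Everything here is hypothetical (the 4096-byte threshold, `select_bcast`, `toy_comm_t`, and the two toy implementations); it only illustrates the argument, not HAN's actual rule format or data structures.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*bcast_fn_t)(size_t msg_size);

/* Two toy bcast implementations, each returning a tag identifying itself. */
static int bcast_A(size_t msg_size) { (void)msg_size; return 'A'; }
static int bcast_B(size_t msg_size) { (void)msg_size; return 'B'; }

/* Hypothetical rule, as a configuration file might express it:
 * small messages use A, large messages use B. */
#define SIZE_THRESHOLD 4096u
static bcast_fn_t select_bcast(size_t msg_size)
{
    return msg_size < SIZE_THRESHOLD ? bcast_A : bcast_B;
}

/* Hypothetical communicator holding one collective function pointer. */
typedef struct { bcast_fn_t coll_bcast; } toy_comm_t;

/* Dynamic path: re-evaluate the rule on every call. */
static int bcast_dynamic(toy_comm_t *comm, size_t msg_size)
{
    assert(comm != NULL);
    return select_bcast(msg_size)(msg_size);
}

/* Cached path: freeze whatever the first call selected. */
static int bcast_cached(toy_comm_t *comm, size_t msg_size)
{
    assert(comm != NULL);
    if (comm->coll_bcast == NULL) {
        comm->coll_bcast = select_bcast(msg_size);
    }
    return comm->coll_bcast(msg_size);
}
```

If the first call carries a small message, the cached path keeps using A even for later large messages, whereas the dynamic path switches to B.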

To add more value to HAN's dynamic choice, here is an example where it is useful:
Let A and B be two components providing bcast and reduce implementations.
As selection granularity is at module scale, choosing components by priority does not allow me to use bcast from A and reduce from B.

gkatev · 2022-06-09T07:19:26Z

Ah okay, that makes sense. I wasn't aware of the possibility of message-size-based selection.

jsquyres · 2022-07-17T12:15:33Z

Did the controversial parts of this PR become #10456? Should this PR be reduced to just the typo fixes, or closed if it is fully replaced by #10456?

gkatev · 2022-07-17T13:55:26Z

No, the two PRs don't currently overlap.

jsquyres · 2022-07-19T14:26:51Z

> No the two PRs don't currently overlap

Cool. Where are we on this PR, then?

gkatev · 2022-07-19T14:31:15Z

IMO, it's ready and good to go. @bosilca, did my explanations above address the concerns?

Edit: To clarify the relation of this PR with #10456:

- #10456 ("Coll han: fix allreduce dynamic calling internal han algo on sub_comm") reverts a change that effectively disabled HAN's "dynamic" path. At the same time, it fixes the original bug (a typo) that triggered the PR that disabled the dynamic path (#8250, "coll/han: fix coll preference selection in mca_coll_han_comm_create_new").
- This PR is for the dynamic path, and so is related to the aforementioned ones in that it is only meaningful when the path is not disabled.
  - The contained fix allows HAN's dynamic sub-module selection to function properly on the intra-node sub-comm.

awlauria · 2022-08-19T15:12:19Z

@bosilca / @devreal ping - please review.

jsquyres · 2022-09-28T15:46:46Z

@gkatev Can you rebase this PR so that it picks up the new CI? Thanks!

gkatev · 2022-10-17T08:12:07Z

Should we also apply this to 5.0.x? (let me know and I will make a PR)

gkatev mentioned this pull request Jul 14, 2022

Coll han: fix allreduce dynamic calling internal han algo on sub_comm #10456

Merged

gkatev mentioned this pull request Aug 18, 2022

Coll/HAN and Coll/Adapt not default on 5.0.x #10347

Closed

gpaulsen requested a review from bosilca August 18, 2022 22:08

awlauria requested a review from devreal August 19, 2022 15:13

coll/HAN: Don't DQ HAN dynamic @ intra-node subcomm + typo fixes

705f8ec

Signed-off-by: George Katevenis <[email protected]>

gkatev force-pushed the main branch from b331dbe to 705f8ec Compare September 29, 2022 08:43

github-actions bot added the Target: main label Sep 29, 2022

jsquyres requested review from bosilca and devreal and removed request for bosilca and devreal September 29, 2022 12:24

devreal approved these changes Sep 29, 2022

bosilca approved these changes Sep 29, 2022

awlauria merged commit 31953c1 into open-mpi:main Sep 29, 2022

FlorentGermain-Bull mentioned this pull request Nov 8, 2022

Coll han update file reading 5.0.x #11043

Merged
