
coll/han: set as default except if processes are consecutive across nodes #10963


Draft: wants to merge 1 commit into main

@devreal
Contributor

devreal commented Oct 21, 2022

coll/han provides better latency than coll/tuned when processes are mapped to nodes nonconsecutively, e.g., with --rank-by node, because in that case coll/han reduces the amount of cross-node traffic. Its benefits are less clear with linear (consecutive) process placements. We try to detect linear process placement and, if found, reduce the priority of coll/han to below that of coll/tuned. A new MCA parameter, coll_han_priority_penalty, controls the adjustment (10 by default).

This PR addresses #10347 for coll/han by increasing the default priority of coll/han to 35 (coll/tuned stands at 30).
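To make the priority arithmetic concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes the rank-to-node mapping is already available as a local array with nodes numbered in order of first appearance; the function names `placement_is_linear` and `han_effective_priority` are hypothetical, and how the actual patch gathers placement information is not shown here. Only the values 35, 30, and 10 come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values quoted in the PR description. */
#define HAN_DEFAULT_PRIORITY   35
#define TUNED_DEFAULT_PRIORITY 30
#define HAN_PRIORITY_PENALTY   10  /* default of coll_han_priority_penalty */

/*
 * Hypothetical helper (assumption, not an Open MPI function).
 * node_of_rank[r] is the index of the node hosting rank r, with nodes
 * numbered in order of first appearance. The placement counts as linear
 * if each rank sits on the same node as its predecessor or on the next one.
 */
bool placement_is_linear(const int *node_of_rank, size_t nranks)
{
    for (size_t r = 1; r < nranks; ++r) {
        int step = node_of_rank[r] - node_of_rank[r - 1];
        if (step != 0 && step != 1) {
            return false;  /* ranks jump back or skip nodes, e.g. --rank-by node */
        }
    }
    return true;
}

/* Hypothetical helper: subtract the penalty only for linear placements. */
int han_effective_priority(int base, int penalty, bool linear)
{
    return linear ? base - penalty : base;
}
```

With the defaults above, a linear placement leaves coll/han at 35 - 10 = 25, which is below coll/tuned at 30, while a round-robin placement keeps coll/han at 35.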

Signed-off-by: Joseph Schuchart [email protected]

bosilca previously approved these changes Oct 21, 2022
@bwbarrett
Member

I'm confused; I thought the plan was to always switch to HAN (assuming multiple ranks / node) and then adjust tuned so that the delta wasn't really important? I'm also not a huge fan of linear == good mapping, given that AWS has very few instances with power of two cores.

@awlauria
Contributor

@devreal can you fix the conflict? Any thoughts on Brian's questions?

@janjust
Contributor

janjust commented Dec 6, 2022

@devreal ping ^

coll/han provides better latency than coll/tuned if processes are mapped to nodes
nonconsecutively, e.g., using --rank-by node. In that case coll/han
reduces the amount of cross-node traffic. Its benefits are less clear
with linear process placements. We try to detect linear process placement
and, if found, reduce the priority of coll/han to below that of coll/tuned.
A new MCA parameter, coll_han_priority_penalty, controls the
adjustment (10 by default).


Signed-off-by: Joseph Schuchart <[email protected]>
@devreal devreal force-pushed the han_priority_penalty branch from 5b70a53 to acb0984 Compare December 6, 2022 19:43
@gkatev
Contributor

gkatev commented Dec 7, 2022

Is it okay to call ompi_coll_base_allreduce_intra_recursivedoubling during comm_query? (#9780)

@devreal
Contributor Author

devreal commented Dec 7, 2022

@bosilca and I are not sure that we really want this. I'll mark it as draft for now.

@devreal devreal marked this pull request as draft December 7, 2022 08:03