Coll/han Improvements on algorithm selection through MCA and configuration file #10828

Merged

Contributor

@FlorentGermain-Bull FlorentGermain-Bull commented Sep 21, 2022

Allow topological level to be named in configuration file

When reading the configuration file, the topological level is first read as a string (its name), then as a numeric id if that fails.
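
The name-then-id fallback can be sketched as follows. This is a minimal illustration, not the code from this PR: `parse_topo_level`, the `topo_levels` table, and all numeric ids are hypothetical, and only the `global_communicator` name is taken from the example file below.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical level table: only "global_communicator" appears in the
 * example file; the other names and every id are assumptions. */
typedef struct {
    int id;
    const char *name;
} topo_level_entry_t;

const topo_level_entry_t topo_levels[] = {
    {0, "intra_node"},
    {1, "inter_node"},
    {2, "global_communicator"},
};

#define TOPO_LEVEL_COUNT (sizeof(topo_levels) / sizeof(topo_levels[0]))

/* Return the level id for a token, or -1 if the token is neither a
 * known level name nor a valid numeric id. */
int parse_topo_level(const char *token)
{
    char *end = NULL;
    long value;
    size_t i;

    if (NULL == token || '\0' == token[0]) {
        return -1;
    }

    /* First attempt: treat the token as a level name. */
    for (i = 0; i < TOPO_LEVEL_COUNT; i++) {
        if (0 == strcmp(token, topo_levels[i].name)) {
            return topo_levels[i].id;
        }
    }

    /* Second attempt: treat the token as a numeric id. */
    errno = 0;
    value = strtol(token, &end, 10);
    if (0 != errno || end == token || '\0' != *end) {
        return -1;
    }
    for (i = 0; i < TOPO_LEVEL_COUNT; i++) {
        if (topo_levels[i].id == value) {
            return topo_levels[i].id;
        }
    }
    return -1;
}
```

With this sketch, `parse_topo_level("global_communicator")` and `parse_topo_level("2")` resolve to the same level, so files written with numeric ids would keep working alongside named levels.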

Improve algorithm management and choice

Unify the algorithm choice mechanism across collectives.
The translation table from algorithm name to function pointer is defined in ompi/mca/coll/han/coll_han_algos.c as mca_base_var_enum_value_t.
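
As a rough illustration of such a translation table, the sketch below pairs a name table with a function-pointer table. It uses a local stand-in type rather than the real Open MPI header, on the assumption that `mca_base_var_enum_value_t` pairs an integer value with a string. The ids, stub functions, and `lookup_allreduce_algo` are hypothetical; only the `intra` and `simple` names come from the example file.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in assumed to have the same shape as mca_base_var_enum_value_t:
 * an integer value paired with its string name. */
typedef struct {
    int value;
    const char *string;
} enum_value_t;

/* Hypothetical algorithm ids. */
enum { ALGO_INTRA = 1, ALGO_SIMPLE = 2, ALGO_COUNT = 3 };

typedef int (*allreduce_fn_t)(void);

/* Stub implementations that only report which algorithm ran. */
int allreduce_intra_stub(void)  { return ALGO_INTRA; }
int allreduce_simple_stub(void) { return ALGO_SIMPLE; }

/* Name table, terminated by an entry with a NULL string. */
const enum_value_t allreduce_algo_names[] = {
    {ALGO_INTRA,  "intra"},
    {ALGO_SIMPLE, "simple"},
    {0, NULL}
};

/* Function table indexed by the enum value. */
const allreduce_fn_t allreduce_algo_fns[ALGO_COUNT] = {
    [ALGO_INTRA]  = allreduce_intra_stub,
    [ALGO_SIMPLE] = allreduce_simple_stub,
};

/* Map an algorithm name to its function pointer, or NULL if unknown. */
allreduce_fn_t lookup_allreduce_algo(const char *name)
{
    const enum_value_t *e;

    if (NULL == name) {
        return NULL;
    }
    for (e = allreduce_algo_names; NULL != e->string; e++) {
        if (0 == strcmp(name, e->string)) {
            return allreduce_algo_fns[e->value];
        }
    }
    return NULL;
}
```

Keeping the names in an enum-style table means the same table could serve both the MCA parameter and the configuration file parser, which is the kind of unification this change describes.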

Allow algorithm selection (optional) in configuration file

The algorithm can optionally be chosen directly in the configuration file for the han component (see the configuration file example).
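
To make the optional `@algorithm` suffix concrete, here is a small sketch of how one rule line such as `8000 han @simple` might be split into its parts. It is an assumption-laden illustration, not the parser from this PR: `rule_line_t` and `parse_rule_line` are hypothetical names, and the handling of `#` comments is inferred from the example file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one message size rule line. */
typedef struct {
    unsigned long msg_size; /* lower bound of the message size range */
    char component[32];     /* e.g. "han" or "tuned" */
    char algorithm[32];     /* name after '@', empty if none was given */
} rule_line_t;

/* Parse "<size> <component> [@<algorithm>] [# comment]".
 * Returns 0 on success, -1 on a malformed line. */
int parse_rule_line(const char *line, rule_line_t *out)
{
    char buf[256];
    char token[33];
    char *hash;
    int fields;

    if (NULL == line || NULL == out) {
        return -1;
    }

    /* Work on a copy and cut off any trailing comment. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    hash = strchr(buf, '#');
    if (NULL != hash) {
        *hash = '\0';
    }

    out->algorithm[0] = '\0';
    fields = sscanf(buf, "%lu %31s %32s", &out->msg_size, out->component, token);
    if (fields < 2) {
        return -1;
    }
    if (3 == fields) {
        /* The third token is only valid as '@' followed by a name. */
        if ('@' != token[0] || '\0' == token[1]) {
            return -1;
        }
        strcpy(out->algorithm, token + 1);
    }
    return 0;
}
```

An empty `algorithm` field after parsing would correspond to a line like `1000 han`, where the component's default algorithm is used.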

Algorithm choice through MCA parameters simplification

Algorithms are selected by name through an enum.

Configuration file example

1 # Number of collectives described in this file
allreduce # Set of rules for allreduce collectives
    1 # How many topological levels are described in this file
    global_communicator # Topological level
        1 # Number of configurations
        1 # Configuration size (communicator size on this level)
            4 # Number of message size rules
            0 han @intra # For message sizes 0 to 999, use the intra algorithm of the han component
            1000 han # For message sizes 1000 to 7999, use the default algorithm of the han component
            8000 han @simple # For message sizes 8000 to 19999, use the simple algorithm of the han component
            20000 tuned # Fall back on tuned for message sizes of 20000 and above

Note: Han can only be used on the global_communicator level.
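
The comments in the example imply that each rule applies from its own threshold up to the next one. A minimal sketch of that lookup is shown here, assuming the rules are sorted by ascending message size. `msg_size_rule_t` and `select_rule` are hypothetical names, and the actual lookup in han may differ.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one message size rule. */
typedef struct {
    size_t min_msg_size;   /* rule applies from this size upward */
    const char *component; /* "han" or "tuned" */
    const char *algorithm; /* name after '@', or NULL for the default */
} msg_size_rule_t;

/* The four rules from the example file, in ascending order. */
const msg_size_rule_t example_rules[] = {
    {0,     "han",   "intra"},
    {1000,  "han",   NULL},
    {8000,  "han",   "simple"},
    {20000, "tuned", NULL},
};

/* Return the last rule whose threshold is <= msg_size, or NULL if
 * there is none. Assumes rules are sorted by ascending threshold. */
const msg_size_rule_t *select_rule(const msg_size_rule_t *rules,
                                   size_t count, size_t msg_size)
{
    const msg_size_rule_t *best = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (rules[i].min_msg_size > msg_size) {
            break;
        }
        best = &rules[i];
    }
    return best;
}
```

Under this reading, a message of size 999 still matches the `intra` rule, while a message of size 20000 is the first to fall back on tuned.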

Set of MCA parameters to read a han configuration file:

# Han must be selected to be used
export OMPI_MCA_coll_han_priority=100

# Activate file reading
export OMPI_MCA_coll_han_use_dynamic_file_rules=true

# Set file path
export OMPI_MCA_coll_han_dynamic_rules_filename=path/to/configuration_file

@ompiteam-bot

Can one of the admins verify this patch?

@jsquyres
Member

ok to test

@awlauria
Contributor

ok to test

@jsquyres jsquyres requested a review from bosilca September 21, 2022 14:13
@gpaulsen
Member

bot:ibm:retest

@gpaulsen
Member

@FlorentGermain-Bull Would you be able to rebase your branch on main somewhere after commit 7dbfbeea ("build: Use open-mpi/oac for oac submodule")? We're having an issue with the IBM CI when it tries to test a pull request that doesn't include that commit.

@jsquyres
Member

@FlorentGermain-Bull And be sure to see https://www.mail-archive.com/[email protected]/msg21421.html

@gpaulsen
Member

bot:ibm:retest

@FlorentGermain-Bull FlorentGermain-Bull force-pushed the coll_han_update_file_reading branch from ac4fd09 to 62bf950 Compare September 22, 2022 06:37
@gkatev
Contributor

gkatev commented Sep 22, 2022

FYI it looks like all changes proposed in #10456 are also included here

@FlorentGermain-Bull FlorentGermain-Bull force-pushed the coll_han_update_file_reading branch 2 times, most recently from bad049f to 4ae2355 Compare September 22, 2022 08:19
@gpaulsen
Member

gpaulsen commented Sep 22, 2022

It worked! Thanks. I've heard that Mellanox is working on their CI, so no action is needed on your part for that.

@gpaulsen gpaulsen requested a review from bosilca September 22, 2022 13:21
@bosilca bosilca changed the title from "Coll/han Improvements on algorithm gestion through MCA and configuration file" to "Coll/han Improvements on algorithm selection through MCA and configuration file" Sep 22, 2022
@FlorentGermain-Bull FlorentGermain-Bull force-pushed the coll_han_update_file_reading branch 2 times, most recently from 8e52500 to 8b21e0e Compare September 26, 2022 07:19
@bwbarrett
Member

bot:aws:retest

@gpaulsen
Member

bot:aws:retest

@awlauria
Contributor

@FlorentGermain-Bull Can you rebase this on top of current main, if it is still something you want to get in? Thanks.

        Allow topological level to be named in configuration file
        Improve algorithm management and choice
        Allow algorithm selection (optional) in configuration file
        Algorithm choice through MCA parameters simplification

Signed-off-by: Florent GERMAIN <[email protected]>
@FlorentGermain-Bull FlorentGermain-Bull force-pushed the coll_han_update_file_reading branch from 8b21e0e to 9245e27 Compare October 24, 2022 08:40
@awlauria
Contributor

@bosilca please review so we can get this into v5.

@bosilca bosilca merged commit d01a626 into open-mpi:main Oct 25, 2022
@devreal
Contributor

devreal commented Oct 26, 2022

@FlorentGermain-Bull Are you planning to bring this back to 5.0.x?

@FlorentGermain-Bull
Contributor Author

> @FlorentGermain-Bull Are you planning to bring this back to 5.0.x?

Sorry for the late reply, I'm on it.
