Improvements and fixes for coll/HAN #10438

Open
@gkatev

Hi, I have been looking into the HAN collective component and would like to suggest some usability improvements and some fixes. I plan to implement these improvements (or most of them) and submit PRs myself. So, in this issue, I'm looking for the "green light" that these suggestions are desirable, for any ideas or comments regarding them, and to learn whether someone else is already working on them or on something similar.

  1. Currently, module selection for the intra/inter-node communicators has a fixed range of selections, and the MCA parameters can influence this selection through numeric indices associated with components.

I suggest basing the component choice on the name (string) of the collective component to use, and removing the fixed selections. This would allow easier tuning (strings instead of IDs) and make it possible to use any component for each communicator without code modification. Example: --mca coll_han_bcast_up_module adapt --mca coll_han_bcast_low_module sm.
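A minimal sketch of what a name-based lookup could look like. The `avail_component_t` type and `find_module_by_name()` helper are hypothetical stand-ins for illustration only; they are not existing Open MPI or HAN symbols, and the real selection logic would work on the actual component and module structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a collective component that is available
 * on a sub-communicator; NOT the actual Open MPI type. */
typedef struct {
    const char *name;   /* component name, e.g. "sm", "adapt", "tuned" */
    void *module;       /* opaque handle to that component's module */
} avail_component_t;

/* Return the module whose component name matches `wanted`,
 * or NULL if `wanted` is unset or no such component is available. */
static void *find_module_by_name(const avail_component_t *avail, size_t n,
                                 const char *wanted)
{
    assert(avail != NULL || n == 0);

    if (wanted == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(avail[i].name, wanted) == 0) {
            return avail[i].module;
        }
    }
    return NULL;
}
```

With a lookup of this shape, any component that is available on the sub-communicator can be requested by name, and an unknown name is reported as "not found" rather than silently mapped to a fixed index.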

  2. To further improve ease of use, add MCA param(s) that control the component choice for all primitives (and possibly segsize and _use_simple?).

Currently, parameters take the form coll_han_<coll>_up_module, coll_han_<coll>_low_module, coll_han_<coll>_segsize, and coll_han_use_simple_<coll>. These would be kept; an example of the additions: coll_han_up_module, coll_han_low_module. A primitive-specific parameter, if set, would override the new non-primitive-specific one.
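The override rule described above could be sketched roughly as follows. `resolve_module_name()` is a hypothetical helper, not an existing HAN function, and the sketch assumes that an unset string parameter shows up as NULL or as an empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the component name for one collective (hypothetical helper):
 *   1. the primitive-specific value (e.g. coll_han_<coll>_up_module), if set;
 *   2. otherwise the generic value (e.g. coll_han_up_module), if set;
 *   3. otherwise the built-in default.
 * A parameter counts as unset when it is NULL or an empty string. */
static const char *resolve_module_name(const char *per_coll,
                                       const char *generic,
                                       const char *fallback)
{
    assert(fallback != NULL);

    if (per_coll != NULL && per_coll[0] != '\0') {
        return per_coll;
    }
    if (generic != NULL && generic[0] != '\0') {
        return generic;
    }
    return fallback;
}
```

Under this rule, setting coll_han_up_module to adapt and coll_han_bcast_up_module to sm would select sm for bcast and adapt for every other collective.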

In the context of (1) and (2), I would also seek to unify mca_coll_han_comm_create() and mca_coll_han_comm_create_new() (?).

  3. Look into the available dynamic functions, and possibly fix them. I'm not entirely sure how these work, and it's possible that they are actually working the way they are supposed to. I have left some notes regarding these here: coll/han: dynamic selection does not work for simple algorithms #9883 (comment)

FYI, for anyone working on HAN, I believe that #10335 also affects (?) the ompi_comm_coll_preference info key that is used to influence the component selection for each subcomm.
