-
Notifications
You must be signed in to change notification settings - Fork 900
coll/acoll: Remove use of cid as array index #12783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
coll/acoll: Remove use of cid as array index #12783
Conversation
So....what happens when the 11th communicator appears? |
When it exceeds 10 communicators, it falls back to base functions. |
And the user will know this happened - how? How will it impact them? What I'm getting at is that you are introducing a hard limit into the OMPI code base, and users are going to have a difficult time realizing what is happening and trying to figure out a way around it. Historically, we have found that such hard limits tend to generate user unhappiness - so perhaps some justification for why it is necessary here would help? You state that there is no performance benefit, so why do this? Something like (a) what problem is this solving, (b) how the 10 communicators are impacted by the change, (c) how a communicator/performance is impacted when you fall back to the base, (d) how a user can expand that number if the limit is hurting them, etc. |
@amd-nithyavs would it be possible to make the max. number of subcommunicators that you want to store an mca-parameter that the user could adjust, and allocate the array dynamically during acoll_module_enable() ? |
cdbc2b4
to
944dc79
Compare
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me, just a few minor nits
err = check_and_create_subc(comm, acoll_module, &subc); | ||
|
||
/* Fallback to linear if subcomms structure is not obtained */ | ||
if (subc == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: the Open MPI coding standard require to have the constant as the first argument of the comparison, so this should be
if (NULL == subc) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/* Obtain the subcomms structure */ | ||
err = check_and_create_subc(comm, acoll_module, &subc); | ||
/* Fallback to knomial if subcomms is not obtained */ | ||
if (subc == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dito here (there a few more later in this pr, I assume you will check all comparisons in this PR to adjust this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
/* Subcomms structure is not present, create one if within limit*/ | ||
if (num_subc == acoll_module->max_comms) { | ||
*subc_ptr = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add here an opal_output() warning to the user that we are using the fallback path and suggest that he can increase the max no. of communicators handled by the component using the mca parameter? This warning could be displayed with verbosity level turned on or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
The use of cid as array index is removed. Instead, now coll_acoll_subcomms_t is dynamically allocated for each new communicator. Signed-off-by: Nithya V S <[email protected]>
944dc79
to
a59c462
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR removes the use of context id as array index in
mca_coll_acoll_module_t
. Previously,coll_acoll_subcomms_t subc[]
was indexed through context id of the communicator. Now,subc
is dynamically allocated for each new communicator (for up to 10 communicators), thereby eliminating the need to use context id as array index.There is no change in performance of the collective APIs with this change.