ompi/coll/accelerator: implement reduce_local #12758
571c658 to 9b4af62
@Akshay-Venkatesh Why are you putting
Why does the commit message say
As far as I can tell, this commit/PR is not specific to CUDA.
Probably just old terminology misuse, since in the past anything accelerator-related was synonymous with CUDA.
@Akshay-Venkatesh please drop the bot:notacherrypick from the commit message. Commits in main don't need it, since the CI looks for commits in main when checking cherry-picks in other branches.
LGTM
9b4af62 to 4617d96
        dtype, op, root, comm,
        s->c_coll.coll_reduce_module);

    if ((comm == NULL) && (root == -1)) {
This is a horrible hijack of the reduction API! A collective module is always attached to a communicator (that is also where the module comes from). How can you end up here with a comm equal to NULL?
I don't disagree that the usage is hacky. Do you have an alternative that involves neither a NULL comm and -1 root nor duplicated code? Maybe there are other constants for comm and root that are more appropriate here?
cc @jsquyres
Yes, check my other comment. You are trying to do a local reduce inside the reduce collective by using a NULL communicator as a signal. There is an API for local reduce, in this same PR, so why not use it instead of redirecting the call into the reduce?
But that would effectively involve copying the code in mca_coll_accelerator_reduce into mca_coll_accelerator_reduce_local. I was hoping there was a way to avoid duplicated code.
Absolutely. Put the shared code into a separate function and invoke it everywhere it is needed.
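A minimal sketch of that suggestion, under stated assumptions: every name below (`toy_comm_t`, `toy_accel_sum_common`, `toy_reduce`, `toy_reduce_local`) is a hypothetical stand-in, not an Open MPI type or function, and an integer sum stands in for whatever work the two real entry points share. The point is only the shape: the shared logic lives in a helper that takes neither a communicator nor a root, so no sentinel values such as a NULL comm or a -1 root are needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a communicator; NOT an Open MPI type. */
typedef struct {
    int size; /* number of ranks */
} toy_comm_t;

/*
 * Shared helper: holds the logic both entry points need.
 * In this sketch that is an element-wise integer sum; in a real module it
 * would be whatever code is currently duplicated. It knows nothing about
 * communicators or roots.
 */
static int toy_accel_sum_common(const int *sbuf, int *rbuf, size_t count)
{
    assert(sbuf != NULL && rbuf != NULL);
    for (size_t i = 0; i < count; ++i) {
        rbuf[i] += sbuf[i];
    }
    return 0;
}

/* Collective-style entry point: always called with a valid communicator. */
static int toy_reduce(const int *sbuf, int *rbuf, size_t count,
                      int root, const toy_comm_t *comm)
{
    assert(comm != NULL); /* a collective is always attached to a communicator */
    if (root < 0 || root >= comm->size) {
        return -1; /* invalid root is an error, not a hidden mode switch */
    }
    return toy_accel_sum_common(sbuf, rbuf, count);
}

/* Local entry point: no communicator, no root, same shared helper. */
static int toy_reduce_local(const int *inbuf, int *inoutbuf, size_t count)
{
    return toy_accel_sum_common(inbuf, inoutbuf, count);
}
```

With this layout each entry point keeps its own signature and validation, and a change to the shared logic is made in one place.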
        struct ompi_op_t *op,
        mca_coll_base_module_t *module)
{
    return mca_coll_accelerator_reduce(sbuf, rbuf, count, dtype, op, -1, NULL,
Instead of the solution proposed here, copy the code from above into this function. From my perspective that is cleaner and more readable, and more in line with the OMPI approach to collective modules.
@Akshay-Venkatesh Ping.
Signed-off-by: Akshay Venkatesh <[email protected]>
Signed-off-by: Akshay Venkatesh <[email protected]>
6dbeab8 to 7f6f788
@bosilca can we get another look at this PR?
reduce_local implementation