Hi, while testing the new HAN+XHC integration method, I came upon a bug/issue in HAN's reduce, in this part of the code:
ompi/ompi/mca/coll/han/coll_han_reduce.c, lines 176 to 178 in 9216ad4
This condition will be true for non-root ranks in the same node as the root. But, for these ranks, `rbuf` has been previously initialized to NULL. Thus, `t->rbuf` will be set to something non-NULL, but won't point to valid memory. Later on, this influences the `rbuf` parameter to `coll_reduce`:
ompi/ompi/mca/coll/han/coll_han_reduce.c, lines 248 to 256 in 9216ad4
This is a problem when trying to detect (in XHC) if rbuf is valid or not, as this is done by checking if the pointer is NULL.
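To make the failure mode concrete, here is a minimal sketch. It is not the HAN code and not the patch mentioned in this issue. It assumes the non-NULL value comes from applying an offset to a NULL `rbuf`; `advance_or_null` is a hypothetical helper name used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper, not code from HAN: advance a buffer pointer by
 * `bytes`, but keep NULL as NULL so that a later `rbuf == NULL` check
 * can still identify ranks that have no receive buffer. */
static void *advance_or_null(void *buf, ptrdiff_t bytes)
{
    if (buf == NULL) {
        return NULL;             /* nothing to advance; stays detectable */
    }
    return (char *) buf + bytes; /* valid buffer: apply the offset */
}
```

With a guard of this kind, a rank that started with a NULL `rbuf` would still pass NULL further down, so a NULL check in the receiving component would keep working. Whether this is the right place for such a guard in HAN is exactly the open question here.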
Related: #11552 and the discussion in #11418
I'm posting this as an issue instead of a PR, with the hopes that someone will take it through the last mile, as I'm not fully sure what the desired fix for this would be, or if there are similar occurrences in other HAN collectives that should also be adjusted. Something like this does fix it: