-
Notifications
You must be signed in to change notification settings - Fork 900
opal/mca/accelerator: introduce get_device_pci_attr api #11687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
af4e7c6
to
9fd8ec7
Compare
@@ -476,6 +478,11 @@ static int mca_accelerator_rocm_get_device(int *dev_id) | |||
return OPAL_SUCCESS; | |||
} | |||
|
|||
static int mca_accelerator_rocm_get_device_pci_attr(int dev_id, opal_accelerator_pci_attr_t *pci_attr) | |||
{ | |||
return OPAL_ERR_NOT_IMPLEMENTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should implement this, too. https://rocm-developer-tools.github.io/HIP/group__Driver.html#gaaa961c8ec8047903617f4245fa50256a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgargabriel Hi could you please take a look at the change and let me know if it makes sense to you. Unfortunately I do not have access to rocm devices for testing, so if possible I very much appreciate if you could do a quick sanity test.🙏
9fd8ec7
to
0d8df2d
Compare
I do have to sigh as PMIx is supposed to provide you with such info. MPICH-4 uses it - guess it's just something OMPI hasn't caught up with yet. 🤷♂️ |
bot:ibm:retest |
@rhc54 Ooooh do you have a pointer to that? In theory we don't have to do this in accelerator, since PCI is not an accelerator concept(and this change implies that all accelerators must be PCI devices). But the winning motivation here is to abstract away the accelerator type, such that calling routines do not need to branch off cuda vs rocm vs etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fundamentally good for me, I can also confirm that the code also compiles correctly.
Introduce get_device_pci_attr api to query accelerator device PCI attributes. This enables intelligent selection of other PCI(e) devices based on affinity with the accelerator, e.g. NICs. Signed-off-by: Wenduo Wang <[email protected]>
0d8df2d
to
00a567e
Compare
Busy moving at the moment, so let me get back to you on it. For now, you'd need this as a backup method anyway, so no harm done. Just don't let me forget about it 😄 |
@bwbarrett @lrbison @edgargabriel Can we merge this change? I also need to backport it. |
Introduce get_device_pci_attr api to query accelerator device PCI attributes. This enables intelligent selection of other PCI(e) devices based on affinity with the accelerator, e.g. NICs.