How to initialize smsc when btl/sm is not used? #10342
The issue is that when OB1 is not the PML, nobody stores the modex info you are trying to read. The complete fix would be to also store the modex info in the open part of your component.
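To make the failure mode concrete, here is a minimal toy model in C, not Open MPI code. It assumes the modex behaves like a per-job key/value table: a lookup for a peer can only succeed if something published that peer's entry earlier. All names (`toy_modex_send`, `toy_modex_recv`, `toy_get_endpoint`, the `"smsc.info"` key) are hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the modex: a small key/value table shared by all ranks.
 * This is an illustrative assumption, not Open MPI's actual data structure. */
#define TOY_MAX_ENTRIES 16

struct toy_modex_entry {
    int  rank;       /* rank that published the entry */
    char key[32];    /* e.g. the publishing component's name */
    char value[64];  /* opaque connection info */
};

static struct toy_modex_entry toy_table[TOY_MAX_ENTRIES];
static int toy_n_entries = 0;

/* Hypothetical publish step: what "storing the modex info" amounts to here. */
int toy_modex_send(int rank, const char *key, const char *value)
{
    if (toy_n_entries >= TOY_MAX_ENTRIES) {
        return -1;
    }
    struct toy_modex_entry *e = &toy_table[toy_n_entries++];
    e->rank = rank;
    snprintf(e->key, sizeof e->key, "%s", key);
    snprintf(e->value, sizeof e->value, "%s", value);
    return 0;
}

/* Hypothetical lookup: returns NULL when nothing was published for that rank. */
const char *toy_modex_recv(int rank, const char *key)
{
    for (int i = 0; i < toy_n_entries; i++) {
        if (toy_table[i].rank == rank && strcmp(toy_table[i].key, key) == 0) {
            return toy_table[i].value;
        }
    }
    return NULL;
}

/* Hypothetical endpoint creation: it depends entirely on the peer's entry. */
int toy_get_endpoint(int peer_rank)
{
    return toy_modex_recv(peer_rank, "smsc.info") != NULL ? 0 : -1;
}
```

In this model, `toy_get_endpoint(1)` fails until some code path has run `toy_modex_send(1, "smsc.info", ...)`. That mirrors the situation described above: if the publishing step only happens as a side effect of one PML being selected, a component that reads the info under another PML has to do the publishing itself.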
Oh I see! Do you happen to have any readily available resource about the modex's workings? I see this modex send inside smsc/xpmem, that sends exactly what is needed in
Evidently this is not enough? Could you point to the code in pml/ob1 or btl/sm that does what I would need to do in my component? I also tried placing the call to
Are you certain that function is called on all processes?
Yes, I see that each process enters it exactly once.
Have you tried calling it from within the coll/sm component initialization? That is generally where the modex is sent.
Also, it might be worth chatting offline about your plans for coll/sm. I have been working on it as well to make it NUMA-aware. The first cut is to remove all existing collectives and re-implement them, starting with barrier. So far it is beating Intel MPI's NUMA-aware barrier on a 2-NUMA node. I haven't had time to work on broadcast (the next obvious one).
Also, smsc was split out of btl/sm specifically for coll/sm, so I am happy to help get that integration complete.
Where exactly is the component initialization? I have also tried placing the call in
I don't have any plans for coll/sm specifically; here I just used it as a testing base. However, we have developed a new component for topology-aware collectives, and from your description it appears that there is overlap. I believe the current plan is to open-source it at some point in the coming months.
I created #10897 to fix this. Please let me know how the proposed solution looks.
Hi, I'm using opal/smsc in a collectives component. When pml/ob1+btl/sm are used, all works correctly. However, if instead ucx is configured as the PML, smsc remains uninitialized. I noticed that btl/sm calls mca_smsc_base_select(), and tried calling that in my code, but even so, later calls to get_endpoint() fail.
So, how should I go about initializing smsc in my code when it's not initialized elsewhere?
In case the mca_smsc_base_select call is all that is needed, consider this a bug report :-)
I'm on v5.0.0rc6
To reproduce:
With pml=ob1, all works ok: