orted/pmix: fix spawn in singleton mode #2084
14de6b7 to 400f0df
@rhc54 could you please review this? Currently, spawn is broken on master with singleton. That can be evidenced by running … Once the patch is applied, it is then also possible to run … Unfortunately, …
I don't think that is actually going to do what it is supposed to do. The purpose of the transport key attribute is to record the key used by the original parent job (in this case, the singleton) so the child can be given the same key. This allows PSM2 to enable communication between the parent and child jobs. What this patch does is assign a new key to the parent job, and then pass that key to the child. It isn't the same key used by the parent singleton, and so communication won't work. What we need is for the singleton to pass the transport key to the HNP in the spawn request. We can add a PMIx attribute for this purpose. Then the patch would be to detect that PMIx attribute and set the transport key attribute in the parent job so that the PLM can do the right thing. Make sense?
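To make the objection concrete, here is a minimal sketch of the key-matching argument. It is a toy model, not ORTE code: `transport_key_t`, `job_record_t`, and every function name below are hypothetical, and the equality check only stands in for whatever the transport actually does with the key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a transport key. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} transport_key_t;

/* Hypothetical stand-in for the launcher's record of the parent job. */
typedef struct {
    transport_key_t key;
} job_record_t;

bool keys_match(transport_key_t a, transport_key_t b)
{
    return a.hi == b.hi && a.lo == b.lo;
}

/* The child is handed whatever key was recorded for its parent job. */
transport_key_t child_key_from(const job_record_t *parent)
{
    return parent->key;
}

/* True only if the key recorded for the parent is the one the
 * singleton process is really using. */
bool child_can_reach_singleton(transport_key_t singleton_key,
                               transport_key_t recorded_for_parent)
{
    job_record_t parent = { recorded_for_parent };
    return keys_match(child_key_from(&parent), singleton_key);
}
```

In this model, recording a freshly generated key for the parent makes `child_can_reach_singleton` return false, while recording the key the singleton already holds makes it return true.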
It does make sense. Would it be easier to pass the key on the orted command line?
Actually, I was thinking that the singleton can just add the transport key to the spawn command as another "info" key. So in the ompi/dpm, we would look up and add the transport key to the pmix.spawn attribute list.
400f0df to f3f8aa8
@rhc54 I gave it some more thought and came up with a fix I think is both correct and simpler. Unless I am missing a broader rationale, adding yet another info key to spawn only to fix the singleton case looks like overkill to me. Could you please double check it?
That should be okay so long as the ess/singleton code is adjusted to fit. You need to ensure that the ess code doesn't call precondition_transport, and that it properly handles another envar.
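A rough sketch of what that adjustment could look like on the singleton side: rather than generating its own key, the process adopts the key it was given and exports it through the environment. The envar name `DEMO_TRANSPORT_KEY` and the function `adopt_daemon_key` are made up for illustration; only `setenv` is a real (POSIX) call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict -std= modes */
#endif
#include <stdlib.h>

/* Hypothetical envar name, not the one ORTE uses. */
#define DEMO_KEY_ENVAR "DEMO_TRANSPORT_KEY"

/* Export the key received from the daemon instead of generating a new one.
 * Returns 0 on success, -1 if the key is missing or cannot be exported. */
int adopt_daemon_key(const char *key_from_daemon)
{
    if (key_from_daemon == NULL || key_from_daemon[0] == '\0') {
        return -1;  /* nothing to adopt: report it rather than invent a key */
    }
    return setenv(DEMO_KEY_ENVAR, key_from_daemon, 1);
}
```

The point of failing on an empty key, instead of falling back to a locally generated one, is that a locally generated key would silently bring back the mismatch discussed above.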
f3f8aa8 to a78b9e7
I updated the patch.
Yeah, I think it's okay too - might need to watch out to see if we have to increase the buffer size for the return string from the orted as we are adding things to it.
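The buffer-size concern can be illustrated with a small sketch that appends a key to a reply string and detects truncation through the return value of `snprintf`. The `uri[key]` layout and the name `format_daemon_reply` are assumptions made for this example, not the actual orted reply format.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "uri[key]" into buf (hypothetical layout).
 * Returns 0 on success, -1 if the reply did not fit in buflen bytes. */
int format_daemon_reply(char *buf, size_t buflen,
                        const char *uri, const char *key)
{
    int n = snprintf(buf, buflen, "%s[%s]", uri, key);

    /* snprintf returns the length it wanted to write, excluding the
     * terminating NUL; a value >= buflen means the output was truncated. */
    if (n < 0 || (size_t)n >= buflen) {
        return -1;
    }
    return 0;
}
```

Checking the return value this way turns an undersized buffer into an explicit error instead of a silently clipped key.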
do you mean …
in singleton mode, have the spawn'ed orted invoke orte_pre_condition_transports() and send the transport key back to the singleton
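As a sketch of the receiving end of that exchange, the singleton would need to separate the key from the rest of the daemon's reply. The example below assumes a hypothetical `uri[key]` layout and a made-up helper `split_daemon_reply`; the real wire format may differ.

```c
#include <stddef.h>
#include <string.h>

/* Split a reply of the assumed form "uri[key]" in place.
 * On success returns 0 and points *uri and *key into reply.
 * Returns -1 if the reply is malformed; the search starts from the
 * end so that a '[' inside the URI itself is tolerated. */
int split_daemon_reply(char *reply, char **uri, char **key)
{
    size_t len = strlen(reply);
    char *open = strrchr(reply, '[');

    if (open == NULL || len == 0 || reply[len - 1] != ']') {
        return -1;
    }
    /* Reject an empty URI ("[key]") or an empty key ("uri[]"). */
    if (open == reply || open + 1 == reply + len - 1) {
        return -1;
    }
    reply[len - 1] = '\0';  /* drop the closing bracket */
    *open = '\0';           /* terminate the URI */
    *uri = reply;
    *key = open + 1;
    return 0;
}
```

The validity checks run before the buffer is modified, so a malformed reply is left untouched for error reporting.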
a78b9e7 to 83399ad
You are correct - one thing I hate about this interface is that you don't see the changes in the full context of the file without lots of "clicks". We should be good to go. Thanks!
invoke orte_pre_condition_transports() in order to set the
ORTE_JOB_TRANSPORT_KEY attribute.