orted: fix tree-spawn when the node regex is too long #4637
note to myself:
Looks okay to me - thanks! You shouldn't need to retrieve the prefix because you were already given it on the cmd line that started the orted. There is indeed some dead code to remove. Now that the daemon calls remote_spawn itself, there is no longer a need for the "tree_spawn" cmd (odls/odls_types.h) or the associated cmd processing code in orted/orted_comm.c, as the HNP is no longer sending a tree-spawn message to the orted.
33d706b to 3cbf39b
@ggouaillardet @rhc54 Is this one ready to merge?
orte/mca/odls/odls_types.h
@@ -44,7 +46,7 @@ typedef uint8_t orte_daemon_cmd_flag_t;
 #define ORTE_DAEMON_KILL_LOCAL_PROCS (orte_daemon_cmd_flag_t) 2
 #define ORTE_DAEMON_SIGNAL_LOCAL_PROCS (orte_daemon_cmd_flag_t) 3
 #define ORTE_DAEMON_ADD_LOCAL_PROCS (orte_daemon_cmd_flag_t) 4
-#define ORTE_DAEMON_TREE_SPAWN (orte_daemon_cmd_flag_t) 5
+#define ORTE_DAEMON_TREE_SPAWN_UNUSED (orte_daemon_cmd_flag_t) 5
I think you can just remove this cmd - no need to renumber the rest.
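To illustrate the suggestion: the command flags are plain integer constants, so deleting one only leaves a gap in the numbering and the remaining values keep their meaning. A minimal sketch, using the values from the hunk above; `cmd_name` is a hypothetical helper for illustration and not an ORTE function.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t orte_daemon_cmd_flag_t;

/* Values copied from the hunk above; 5 is simply left unassigned. */
#define ORTE_DAEMON_KILL_LOCAL_PROCS   (orte_daemon_cmd_flag_t) 2
#define ORTE_DAEMON_SIGNAL_LOCAL_PROCS (orte_daemon_cmd_flag_t) 3
#define ORTE_DAEMON_ADD_LOCAL_PROCS    (orte_daemon_cmd_flag_t) 4

/* Hypothetical helper (not part of ORTE): map a command flag to a name.
 * A removed value such as 5 falls through to the default branch. */
static const char *cmd_name(orte_daemon_cmd_flag_t cmd)
{
    switch (cmd) {
    case ORTE_DAEMON_KILL_LOCAL_PROCS:
        return "KILL_LOCAL_PROCS";
    case ORTE_DAEMON_SIGNAL_LOCAL_PROCS:
        return "SIGNAL_LOCAL_PROCS";
    case ORTE_DAEMON_ADD_LOCAL_PROCS:
        return "ADD_LOCAL_PROCS";
    default:
        return "UNKNOWN";
    }
}
```

Because nothing is renumbered, a daemon that still receives the old value treats it as an unknown command instead of misreading it as a different one.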
I think it is, yes - but I defer to @ggouaillardet
since open-mpi/ompi@8f496b0, rsh_wait_daemon is invoked with an orte_wait_tracker_t *, which must be used to reach the orte_plm_rsh_caddy_t *.
Signed-off-by: Gilles Gouaillardet <[email protected]>
…aitpid_cb()
since open-mpi/ompi@8f496b0, sstore_stage_local_compress_waitpid_cb is invoked with an orte_wait_tracker_t *, which must be used to reach the orte_sstore_stage_local_app_snapshot_info_t *.
Signed-off-by: Gilles Gouaillardet <[email protected]>
This parameter can be used to set the maximum length of the node regex that can be passed on the orted command line. For testing purposes, it can be set to zero to force orted to retrieve the node regex from its parent.
Signed-off-by: Gilles Gouaillardet <[email protected]>
When the node regex is too long to be sent on the command line, retrieve it first from the parent, and then spawn the remote orted.
Signed-off-by: Gilles Gouaillardet <[email protected]>
Now that the daemon calls remote_spawn itself, there is no longer a need for the "tree_spawn" command or the associated command processing code, since the HNP is no longer sending a tree-spawn message to the orted. Thanks Ralph for the guidance!
Signed-off-by: Gilles Gouaillardet <[email protected]>
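The two callback fixes in the commits above follow the same pattern: the callback's opaque argument now points at the wait tracker, and the caller's own data is reached through it. A minimal sketch with stand-in types; `wait_tracker_t`, `rsh_caddy_t`, the `cbdata` field name, and `caddy_from_cbdata` are assumptions for illustration, not the actual ORTE definitions.

```c
#include <stddef.h>

/* Stand-in for orte_wait_tracker_t: only the field this sketch needs.
 * The field name `cbdata` is an assumption. */
typedef struct {
    void *cbdata;          /* user data registered with the wait callback */
} wait_tracker_t;

/* Stand-in for orte_plm_rsh_caddy_t. */
typedef struct {
    int daemon_vpid;       /* hypothetical payload */
} rsh_caddy_t;

/* Hypothetical helper: recover the caddy from the callback argument.
 * Casting the argument straight to rsh_caddy_t * would be the bug the
 * commits describe; the tracker has to be dereferenced first. */
static rsh_caddy_t *caddy_from_cbdata(void *cbdata)
{
    wait_tracker_t *tracker = (wait_tracker_t *)cbdata;
    if (tracker == NULL) {
        return NULL;
    }
    return (rsh_caddy_t *)tracker->cbdata;
}
```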
3cbf39b to 03da521
ready for primetime, merging now
When the node regex is too long to be sent on the command line,
retrieve it first from the parent, and then spawn the remote orted
Signed-off-by: Gilles Gouaillardet [email protected]
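The decision described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `regex_source_t` and `choose_regex_source` are hypothetical names, and the strict comparison is chosen so that a maximum length of zero always forces retrieval from the parent, matching the testing use case mentioned in the commit messages.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: where the remote orted gets the node regex from. */
typedef enum {
    REGEX_ON_CMDLINE,   /* short enough: pass it on the orted command line */
    REGEX_FROM_PARENT   /* too long: orted retrieves it from its parent first */
} regex_source_t;

/* Hypothetical helper: pick the source for a given regex and maximum length.
 * max_len stands in for the new parameter; 0 never allows the command line. */
static regex_source_t choose_regex_source(const char *regex, size_t max_len)
{
    if (regex != NULL && strlen(regex) < max_len) {
        return REGEX_ON_CMDLINE;
    }
    return REGEX_FROM_PARENT;
}
```

In the second case the daemon would first obtain the regex from its parent and only then spawn its own children, which is the ordering the description calls for.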