
Conversation

rhc54
Contributor

@rhc54 rhc54 commented Mar 25, 2020

Deprecate --am and --amca options

Avoid default param files on backend nodes
Any parameters in the PRRTE default or user param files will have been
picked up by prte and included in the environment sent to the prted, so
don't open those files on the backend.

Avoid picking up MCA param file info on backend
Avoid the scaling problem at PRRTE startup by only reading the system
and user param files on the frontend.

Complete revisions to cmd line parser for OMPI
Per specification, enforce the following precedence order (a brief illustration follows the list):

  1. system-level default parameter file
  2. user-level default parameter file
  3. Anything found in the environment
  4. "--tune" files. Note that "--amca" goes away and becomes equivalent to "--tune". Okay if it is provided more than once on a cmd line (we will aggregate the list of files, retaining order), but an error if a parameter is referenced in more than one file with a different value
  5. "--mca" options. Again, error if the same option appears more than once with a different value. Allowed to override a parameter referenced in a "tune" file
  6. "-x" options. Allowed to overwrite options given in a "tune" file, but cannot conflict with an explicit "--mca" option
  7. all other options
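
As a rough illustration of items 4-6 (a hedged sketch - the parameter and file names below are hypothetical, not taken from this PR):

# hypothetical tune file
$ cat my.tune
--mca btl_tcp_if_include eth0

# allowed: an explicit "--mca" on the cmd line overrides the value from the tune file
$ mpirun --tune my.tune --mca btl_tcp_if_include eth1 -np 2 ./a.out

# error: the same option given more than once on the cmd line with different values
$ mpirun --mca btl_tcp_if_include eth0 --mca btl_tcp_if_include eth1 -np 2 ./a.out

# error: the same parameter referenced with different values in two tune files
# (assume other.tune also sets btl_tcp_if_include)
$ mpirun --tune my.tune --tune other.tune -np 2 ./a.out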

Fix special handling of "-np"
Fixes #7565

Get agreement on jobid across the layers
Need all three pieces (PRRTE, PMIx, and OPAL) to agree on the nspace
conversion to jobid method

Ensure prte show_help messages get output
Print abnormal termination messages
Fixes #7564

Add extra libs to PRRTE binaries for external deps
libevent, hwloc, and pmix can be external and may require that their
libs be explicitly linked into the PRRTE binaries
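
For context, a hedged sketch of the build situation this addresses (install paths are hypothetical; the --with-* option names follow the usual PRRTE/OMPI configury): when the three packages are external rather than embedded, e.g.

./configure --with-libevent=/opt/libevent --with-hwloc=/opt/hwloc --with-pmix=/opt/pmix

the corresponding libraries (-levent, -lhwloc, -lpmix) have to be named explicitly on the link line of the PRRTE binaries rather than assumed to come in transitively.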

Fix scalable launch with rsh - resolve several issues in tree-spawn

Signed-off-by: Ralph Castain [email protected]

@rhc54 rhc54 requested a review from jsquyres March 25, 2020 22:46
@rhc54 rhc54 self-assigned this Mar 25, 2020
@rhc54
Contributor Author

rhc54 commented Mar 25, 2020

@artemry-mlnx I fully expect this to fail the Mellanox CI due to the changes in the cmd line parsing rules for the "--tune", "--amca", "--mca", and "-x" options. I'll need your help to work through the updates to your CI.

@artemry-nv

@rhc54 sure, I'll deal with the CI script update.

@open-mpi open-mpi deleted a comment from ibm-ompi Mar 26, 2020
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 26, 2020
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 26, 2020
@gpaulsen
Member

Well, host c656f6n03 is up, but address 172.18.0.3 must be a docker overlay network.
@jjhursey can you please take a look?

@rhc54
Contributor Author

rhc54 commented Mar 26, 2020

No, it has nothing to do with the IBM CI - the MCA params need to be properly prefixed in the OMPI case, and they aren't. I'm fixing it.

@jjhursey
Member

Well, host c656f6n03 is up, but address 172.18.0.3 must be a docker overlay network.

Just to follow up on this question: yeah, the 172.18.0.x network is the overlay inside the virtual cluster. It is private and routable only between the "virtual nodes" in that cluster. It shouldn't be getting in the way. It sounds like Ralph is working the real issue.

@open-mpi open-mpi deleted a comment from ibm-ompi Mar 26, 2020
@open-mpi open-mpi deleted a comment from ibm-ompi Mar 26, 2020
@jsquyres
Member

Inside an salloc job, I see that mpirun of a non-MPI job works fine, but mpirun of an MPI job fails:

$ mpirun hostname | sort | uniq
mpi028
mpi029

$ mpirun ring_c
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    savbu-usnic-a
  Remote host:   mpi028
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    savbu-usnic-a
  Remote host:   mpi029
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------

@rhc54
Contributor Author

rhc54 commented Mar 26, 2020

Not sure I understand that one - did the daemon on the remote host segfault? Otherwise, it's hard to understand why the PRRTE infrastructure would care about MPI vs. non-MPI apps.

@jsquyres
Member

I am now unable to reproduce the error. Weird, because it was 100% reproducible before. 🤷‍♂

Sooo... let's ignore my report, I guess!

@jsquyres
Member

What's the failure in the Mellanox CI? Is that some in-flight CLI change that the Mellanox CI needs to be updated for?

@rhc54
Contributor Author

rhc54 commented Mar 26, 2020

Yes and no. Yes, the specific test should return an error because they included the same envar in two different tune files and assigned it different values. This is no longer permitted.

No, in that there are still some bugs in the parser - it didn't error out as it should. I'm working on that one, and then we'll have to address the test itself.
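
For reference, a minimal sketch of the condition that should now error out, in the style of the existing CI test (the file names here are hypothetical):

echo "--mca mca_base_env_list \"XXX_A=1\"" > "$WORKSPACE/tune_one.conf"
echo "--mca mca_base_env_list \"XXX_A=2\"" > "$WORKSPACE/tune_two.conf"
# XXX_A is referenced in both tune files with different values, so mpirun should
# abort with a cmd line error rather than silently taking the later value
"${OMPI_HOME}/bin/mpirun" -np 2 --tune "$WORKSPACE/tune_one.conf" --tune "$WORKSPACE/tune_two.conf" "${abs_path}/env_mpi"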

@rhc54
Contributor Author

rhc54 commented Mar 26, 2020

@artemry-mlnx I believe I have this working now and reporting cmd line errors as it should. There are two tests that are going to fail:

echo "--mca mca_base_env_list \"XXX_A=1;XXX_B=2;XXX_C;XXX_D;XXX_E\"" > "$WORKSPACE/test_tune.conf"
# shellcheck disable=SC2086
val=$("${OMPI_HOME}/bin/mpirun" $mca -np 2 --tune "$WORKSPACE/test_tune.conf" --mca mca_base_env_list \
     "XXX_A=7;XXX_B=8" "${abs_path}/env_mpi" | sed -n -e 's/^XXX_.*=//p' | sed -e ':a;N;$!ba;s/\n/+/g' | bc)
# precedence goes left-to-right.
# A is set to 1 in "tune", and then reset to 7 in the --mca parameter
# B is set to 2 in "tune", and then reset to 8 in the --mca parameter
# C, D, E are taken from the environment as 3,4,5
# return (7+8+3+4+5)*2=54

val=$("${OMPI_HOME}/bin/mpirun" $mca --np 2 --tune "$WORKSPACE/test_tune.conf" \
    --am "$WORKSPACE/test_amca.conf" "${abs_path}/env_mpi" | sed -n -e 's/^XXX_.*=//p' | sed -e ':a;N;$!ba;s/\n/+/g' | bc)
# precedence goes left-to-right.
# A is first set to 1 in "tune", and then reset to 7 in "amca".
# B is first set to 2 in "tune", but then reset to 8 in "amca"
# C, D, E are taken from the environment as 3,4,5
# return (7+8+3+4+5)*2=54

These will fail because they "reset" values, which is no longer allowed. You can override values from a file by putting a new value on the cmd line itself, but values from within two files (or even within a single file) are not allowed to override each other. I'm not sure how you would like to fix those - we could just turn them off, I suppose, or you could fix the override. Up to you.

Deprecate --am and --amca options

Avoid default param files on backend nodes
Any parameters in the PRRTE default or user param files will have been
picked up by prte and included in the environment sent to the prted, so
don't open those files on the backend.

Avoid picking up MCA param file info on backend
Avoid the scaling problem at PRRTE startup by only reading the system
and user param files on the frontend.

Complete revisions to cmd line parser for OMPI
Per specification, enforce the following precedence order:

1. system-level default parameter file
2. user-level default parameter file
3. Anything found in the environment
4. "--tune" files. Note that "--amca" goes away and becomes equivalent to "--tune". Okay if it is provided more than once on a cmd line (we will aggregate the list of files, retaining order), but an error if a parameter is referenced in more than one file with a different value
5. "--mca" options. Again, error if the same option appears more than once with a different value. Allowed to override a parameter referenced in a "tune" file
6. "-x" options. Allowed to overwrite options given in a "tune" file, but cannot conflict with an explicit "--mca" option
7. all other options

Fix special handling of "-np"

Get agreement on jobid across the layers
Need all three pieces (PRRTE, PMIx, and OPAL) to agree on the nspace
conversion to jobid method

Ensure prte show_help messages get output
Print abnormal termination messages
Cleanup error reporting in persistent operations

Signed-off-by: Ralph Castain <[email protected]>

dd

Signed-off-by: Ralph Castain <[email protected]>
Member

@jsquyres jsquyres left a comment


These are working for me.

@rhc54
Contributor Author

rhc54 commented Mar 29, 2020

@artemry-mlnx Any estimate on when you will update the tests? We are hitting a bit of a problem in that PRRTE is now ahead of OMPI and cannot launch OMPI jobs without this PR being committed.

Properly mark/detect that a daemon sourced the event broadcast to avoid
reinjecting it into the PMIx server library. Correct the source field
for the event notify call on launcher ready.

Update event notification for tool support
Deal with a variety of race conditions related to tool reconnection to a
different server.

Signed-off-by: Ralph Castain <[email protected]>
@artemry-nv

@rhc54 working on the script update.
ETA: today/tomorrow.

libevent, hwloc, and pmix can be external and may require that their
libs be explicitly linked into the PRRTE binaries

Signed-off-by: Ralph Castain <[email protected]>
@artemry-nv

artemry-nv commented Mar 31, 2020

@rhc54
Could you please elaborate a bit on the new logic based on the scenarios above - who is not allowed to overwrite whom?
I've temporarily disabled the 2 tests you mentioned above, but there are still some failing test scenarios: log file - could you please take a look?

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx Perhaps the easiest approach is for me to give you a pull request with the required changes? That will give me a chance to manually check each test and determine what needs to change.

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx I believe mellanox-hpc/jenkins_scripts#96 will fix the CI tests. Can you please review and we can recheck here?

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@jladd-mlnx

@artemry-nv

@rhc54
Have you verified the changes?

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx I ran the updated tests by hand and they all worked, so I believe it should pass. If not, I can deal with any failures once I see them.

@artemry-nv

@rhc54
Can you please update the Azure Pipelines file in your feature branch to point at the new CI scripts (your topic branch in the jenkins_scripts repo) so they can be verified before merging:

ompi_jenkins_scripts_git_repo_url: https://github.com/mellanox-hpc/jenkins_scripts.git

ompi_jenkins_scripts_git_branch: master

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx Done - let's see how it does!

@artemry-nv

@rhc54 it passed - please revert the Azure Pipelines change and I'll proceed with the merge for jenkins_scripts.

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx Done!

@artemry-nv

@rhc54
Ideally we need to merge both of these PRs at the same time - are you going to merge this Open MPI PR after CI passes?

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx Yes - though it isn't clear that this PR is going to pass without my local CI change unless you commit the PR, true?

@artemry-nv

That's true - merged first.

@artemry-nv

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rhc54
Contributor Author

rhc54 commented Mar 31, 2020

@artemry-mlnx Thanks!

@rhc54 rhc54 merged commit 538d2de into open-mpi:master Mar 31, 2020
@rhc54 rhc54 deleted the topic/up2 branch March 31, 2020 17:29
Successfully merging this pull request may close these issues:

"mpirun -np 1 ..." emits warning
master: mpirun does not emit error message with non-existent executable