
mpirun --help needs updating #10705

Closed
awlauria opened this issue Aug 23, 2022 · 19 comments

Comments

@awlauria
Contributor

awlauria commented Aug 23, 2022

The output of mpirun --help is rather out of date, and needs to be carefully read over and updated to account for new, removed, and changed options. Descriptions for some of these are either wrong, or insufficient.

Additionally, some of the mpirun --help foo options do not display anything when they should. For example:

$ ./exports/bin/mpirun --help ppr
--------------------------------------------------------------------------
Help was requested for an unknown option:

  Option: ppr

Please use the "mpirun --help" command to obtain a list of all
supported options.
-------------------------------------------------------------------------- 

where this should print the ppr help message.
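A check like the one above can be scripted so that every help topic is probed the same way. The following is a minimal sketch, not part of Open MPI or PRRTE: the function names are hypothetical, the marker string is copied from the banner quoted above, and the topic list is only an example.

```python
import shutil
import subprocess

# Marker text copied from the error banner quoted in this issue.
UNKNOWN_MARKER = "Help was requested for an unknown option"


def help_is_missing(output: str) -> bool:
    """Return True if the output of `mpirun --help <topic>` is the unknown-option banner."""
    return UNKNOWN_MARKER in output


def check_topics(topics, mpirun="mpirun"):
    """Run `mpirun --help <topic>` for each topic and return the topics without help text.

    Returns None when the mpirun executable is not on PATH, so the sketch
    can be run on a machine without Open MPI installed.
    """
    exe = shutil.which(mpirun)
    if exe is None:
        return None
    missing = []
    for topic in topics:
        proc = subprocess.run([exe, "--help", topic], capture_output=True, text=True)
        # The banner may go to stdout or stderr, so inspect both.
        if help_is_missing(proc.stdout + proc.stderr):
            missing.append(topic)
    return missing


if __name__ == "__main__":
    # Topic names taken from this thread; the list is illustrative, not exhaustive.
    print(check_topics(["ppr", "map-by", "bind-to"]))
```

Matching on the banner text rather than the exit status is a deliberate choice here, since the issue does not say what exit code mpirun returns in this case.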

Refs: #10698

@jjhursey
Member

Notes:

  • --help should dump all of the options
    • Verify all of the options work with mpirun
  • --help ARG should dump information about just that argument
    • Make sure all of the category ARGs work (e.g., --help ppr)
      • Some categories might not work - just document those that work vs don't work for now.
    • Verify that the information is correct - clarify anything that needs clarification
  • All changes need to go into PRRTE (maybe in the schizo/ompi component, but likely will touch the core)
  • Once the CLI is fixed, then we need to look at the Open MPI docs

@drwootton
Contributor

drwootton commented Aug 24, 2022

@jjhursey asked me to check mpirun options to make sure that reasonable help text was displayed for each option and that the mpirun command recognized each option in running a simple test.

I tested options in order top-down as specified in schizo_ompi.c

I tracked this by creating a text file from the mpirun --help output and annotating each option I tested by adding lines starting with @@ after it. I ran a simple mpirun test for each option to see if it seemed to work. In some cases I wasn't sure exactly how to use the option or what it was supposed to do, so I may have reported false failures.
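The annotation scheme described above could be automated roughly as follows. This is a hedged sketch: the `annotate` function is hypothetical, and it assumes each option line in the help text begins with its spellings in the `-s|--long <arg0>` style seen elsewhere in this thread.

```python
def annotate(help_text: str, notes: dict, marker: str = "@@") -> str:
    """Insert '<marker> <note>' after each help line whose leading option has a note.

    `notes` maps an option spelling (for example "--host") to the tester's remark.
    """
    out = []
    for line in help_text.splitlines():
        out.append(line)
        # Split "-H|--host <arg0> ..." into tokens; only the leading tokens
        # that start with '-' are treated as option spellings.
        for token in line.replace("|", " ").split():
            if not token.startswith("-"):
                break
            if token in notes:
                out.append(f"{marker} {notes[token]}")
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical two-line excerpt; real mpirun --help text will differ.
    sample = "-H|--host <arg0>  List of hosts\n--app <arg0>  Provide an appfile"
    print(annotate(sample, {"--host": "works", "--app": "unclear appfile syntax"}))
```

Passing `marker="%%"` would distinguish a second round of comments from the first.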

These options were not properly recognized by mpirun --help commands

--omca: Option is recognized but no help text (mpirun --help omca says it is an invalid option)
--gomca: Option is recognized but no help text (mpirun --help gomca says it is an invalid option)
--parsable: Option is recognized but no help text (mpirun --help parsable says it is an invalid option)
--parseable: Option is recognized but no help text (mpirun --help parseable says it is an invalid option)
-n: Not included in mpirun --help text, but does have 'mpirun --help n' text and is recognized by the mpirun command.
-np: Not included in mpirun --help text, but does have 'mpirun --help np' text and is recognized by the mpirun command.
-c: Not included in mpirun --help text and not recognized by 'mpirun --help c', but is recognized by the mpirun command.
--app: Has help text for 'mpirun --help app' but no text for 'mpirun --help'. 'mpirun --app appfile hello' runs, but it's not clear how to specify options in the appfile or whether the appfile works.

The options where help text was displayed are annotated, along with mpirun command results, in the attached file.
help.txt

@drwootton
Contributor

I'm not expecting parameters like ppr to have individual help text since they are parameters to mpirun options such as --map-by or --bind-to which have their own help text.

@drwootton
Contributor

I tested the remaining mpirun options, and have results in two files.

I updated help.txt with more comments, flagged with '%%' to distinguish them from the first set, flagged with '@@'.

There is a list of deprecated mpirun options in schizo_ompi.c, which I tested. Most/all of them did not appear in the mpirun --help text, so I created a separate file, deprecated.txt, with the results of testing them.

All of these tests were run with a clean OpenMPI build, cloned around noon on 8/24.

help.txt
deprecated.txt

@jjhursey
Member

jjhursey commented Aug 30, 2022

Supported: Failed to run

  • --stop-in-app
    • does not accept the parameter mentioned in the help text, and if the parameter is omitted, mpirun doesn't seem to do anything.

    • ACTION: Needs fixing. Austen may have already fixed this.
  • --output proctable x
    • specifying '--output proctable x' writes to stdout/stderr and no file is created.

    • ACTION: Needs fixing or clarifying
  • --launch-agent <arg0>
    • if I specify an invalid executable, like 'zzz', then no error message is issued and the application runs on the remote node specified by --host.

    • ACTION: Needs an error message
  • --personality <arg0>
    • seems to accept anything, like --personality xxx, without error

    • ACTION: Needs an error message. Should we accept anything other than ompi? Probably not. Take this out of the --help list.
  • -v / -V
    • mpirun seems to accept the -v or -V options, but I'm not sure they do anything.

    • ACTION: Needs to be fixed
  • --output <arg0>
    • Output options work, but output for dir and file options still displays to terminal in addition to specified destination

    • ACTION: Needs to be fixed or clarified if there is a "don't echo to terminal option"
  • --stop-in-app
    • no mention of how the application-determined point is specified

    • ACTION: Needs fixing as it does not work correctly.
  • -s|--preload-binary
    • If -n is before --preload-binary then it works. If -n is after then it launched one per physical core. Unexpected ordering behavior.

    • ACTION: Fix this

Supported: Need better help messages

  • ACTION: Generally the help text printed by mpirun --help ARG should be more informative than the short text displayed in --help.
  • ACTION: Cleanup these help messages
  • mpirun --help output
    • Needs description of supported directives

  • mpirun --help display
    • Does not display much information

  • --debug-daemons-file
    • It doesn't give filename or filename pattern. Unclear default

    • ACTION: remove from --help
  • --leave-session-attached
    • Help text is confusing to me since I don't understand why asking to leave the session attached has anything to do with discarding stdout/stderr.

    • ACTION: remove from --help
  • --output <arg0>
    • It mentions qualifiers, but not what the allowed qualifiers are or how to specify them.

  • --default-hostfile
    • not sure what a default hostfile is

    • ACTION: Clarify difference between --hostfile and --default-hostfile
  • --rankfile <arg0>
    • is a deprecated option not marked as deprecated.

    • ACTION: Is this really deprecated? Mark as not deprecated.
  • --noprefix
    • 'automatic --prefix behavior' is not explained.

  • --prefix <arg0>
    • maybe it should say something about prefix being the root directory for the MPI installation.

  • --tune <arg0>
    • it is not clear from help text that the string must be 'parm=value'. File option?

  • --mca <arg0> <arg1>
    • mpirun recognizes this option but does not flag it as deprecated.

    • ACTION: Are we really deprecating MCA? I hope not. Clarify that it is not deprecated. Does this set PMIX/PRRTE/OMPI MCA versions, or just the OMPI MCA? We should clarify.

Supported: Not recognized by mpirun, but are listed in --help

  • -H (short form of --host)
    • Help is not displayed for '-H' option.

    • ACTION: Fix
  • ACTION: Remove from --help -- possibly deeper
    • --gpmixmca <arg0> <arg1>
  • ACTION: Remove the following from --help
    • --test-suicide <arg0>
    • --set-cwd-to-session-dir
    • --daemonize
    • --keepalive <arg0>
    • --singleton <arg0>
    • --no-ready-msg
    • --report-pid <arg0>
    • --report-uri <arg0>
    • --set-sid
    • --system-server

Supported: Reported as invalid option

  • ACTION: Add text for these. Make sure they have backing code.
    • mpirun --help initial-errhandler
    • mpirun --help soft
    • mpirun --help arch
    • mpirun --help file
    • mpirun --help with-ft
  • ACTION: Translate to the backend function
    • mpirun --help display-comm
    • mpirun --help display-comm-finalize
    • mpirun --help output-proctable

Deprecated: Need --help deprecated statement

  • ACTION: Add deprecated text (Verify with the MCA deprecation warning enabled)
    • --nolocal
    • --oversubscribe
    • --use-hwthread-cpus
    • --cpu-set
    • --cpu-list
    • --bind-to-core
    • --bynode
    • --bycore
    • --cpus-per-proc
    • --cpus-per-rank
    • --npernode
    • --pernode
    • --byslot
    • --npersocket
    • --ppr
    • --amca
    • --am
    • --debug

Deprecated: Other

  • --output-filename
    • command processing does not seem to match the help text description. If I specify '--output-filename xxx' I get output files per task and file descriptor prefixed with 'xxx'.

    • ACTION: Needs fixing
  • --display-topo
    • The mpirun command flags the option as requiring a parameter, but I don't know what a valid value is, so adding a parameter fails as well

    • ACTION: Needs fixing. Austen may have a fix posted
  • --display-devel-allocation
    • A mpirun --display-devel-allocation -2 hello command does not display any allocation text.

    • Adding a valid --host option results in mpirun hanging.

    • ACTION: This may be removed
  • --use-hwthread-cpus
    • mpirun fails telling me the bind-to directive has an invalid qualifier hwthread.

    • ACTION: Need to fix the translation
  • --debug
    • mpirun accepts the command but doesn't seem to do anything.

    • ACTION: Remove from --help
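Several items above concern translating a deprecated option into its current replacement and telling the user about it. A minimal sketch of that idea follows. The mapping table is an assumption made for illustration and was not copied from schizo_ompi.c, which holds the authoritative list; the `translate` function and its notice wording are hypothetical.

```python
# ASSUMPTION: illustrative subset of replacements, not copied from schizo_ompi.c.
NO_ARG = {
    "--bynode": ["--map-by", "node"],
    "--bycore": ["--map-by", "core"],
    "--byslot": ["--map-by", "slot"],
    "--pernode": ["--map-by", "ppr:1:node"],
    "--bind-to-core": ["--bind-to", "core"],
}
# Deprecated options that consume one value, with a template for the new form.
ONE_ARG = {
    "--npernode": ("--map-by", "ppr:{}:node"),
}


def translate(argv):
    """Return (new_argv, notices) with deprecated options rewritten to their replacements."""
    new_argv, notices = [], []
    args = iter(argv)
    for arg in args:
        if arg in NO_ARG:
            replacement = NO_ARG[arg]
        elif arg in ONE_ARG:
            value = next(args, None)
            if value is None:
                raise ValueError(f"{arg} requires a value")
            flag, template = ONE_ARG[arg]
            replacement = [flag, template.format(value)]
        else:
            # Not a deprecated option: pass it through unchanged.
            new_argv.append(arg)
            continue
        new_argv.extend(replacement)
        notices.append(f"{arg} is deprecated; use {' '.join(replacement)}")
    return new_argv, notices


if __name__ == "__main__":
    print(translate(["--npernode", "2", "./hello"]))
```

Keeping the notices separate from the rewritten argument list makes it straightforward to show the same text both at run time and in the --help output for the deprecated option.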

From #10698:

  • --app, deprecated translation is not working
  • -N doesn't appear in --help, possible removal candidate. Maps to --map-by ppr:1:node, so when used with --map-by this is confusing.
  • --get-stack-traces seems to fail at larger scales via a hang or crash
  • --mca mca_base_env_list does not work on recent prrte updates
  • --cpu-set/--cpu-list: Works, somewhat. If you ask for more ranks than cpus, you get the following error, even with --oversubscribe:
    PRTE ERROR: Unable to map job in file rmaps_rr.c at line 184
  • --show-progress, doesn't seem to do anything. Removal candidate?

@hppritcha
Member

I'll take a pass at the "Supported: Not recognized by mpirun, but are listed in --help" items.

@jjhursey jjhursey mentioned this issue Sep 13, 2022
14 tasks
hppritcha added a commit to hppritcha/prrte that referenced this issue Sep 15, 2022
from the mpirun help message

related to open-mpi/ompi#10705

Signed-off-by: Howard Pritchard <[email protected]>
@rhc54
Contributor

rhc54 commented Sep 16, 2022

You need to be careful here to distinguish between options that face the user and options that mpirun must accept due to other requirements. For example, the --keepalive option is used when PMIx is asked to fork/exec the DVM in response to a singleton comm_spawn, or is asked to fork/exec mpirun by a debugger tool (which is a use-case from the DDT debugger team). Likewise, --singleton is required by the singleton comm_spawn, as that is what passes the singleton's ID to the DVM so it knows who it is supporting.

There are a number of these "hidden" options that you can, if you wish, remove from the help file as they are not generally used directly by a user. However, a developer (e.g., writing a tool) might need to know they exist and how to use them.
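The distinction between "accepted by the parser" and "listed in the help output" can be illustrated with Python's argparse, which supports it directly through `argparse.SUPPRESS`. This is only an analogy under stated assumptions: it is not how PRRTE's CLI code is written, and the program name and metavars are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with one user-facing option and one hidden option."""
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    # User-facing option: shown in --help.
    parser.add_argument("--host", metavar="HOSTS", help="list of hosts to run on")
    # Hidden option: still parsed, but omitted from the --help output.
    parser.add_argument("--keepalive", metavar="ARG", help=argparse.SUPPRESS)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.format_help())
    print(parser.parse_args(["--keepalive", "7", "--host", "node1"]))
```

With this approach a tool can keep passing the hidden option while ordinary users never see it listed; developers who need it would have to find it in separate documentation.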

@rhc54
Contributor

rhc54 commented Sep 17, 2022

I'll try to provide some thoughts when/where I can.

--leave-session-attached is usually the very first thing we ask the user to do when they report launch problems so we can see if any error messages are coming from the daemons. It is also needed if you want to see the daemon output from any verbose options you set. Perhaps better to simply improve the help message on it.

--personality probably isn't something the user needs to set. However, you may be missing something for supporting OSHMEM apps based on OMPI. There are supporting elements in PMIx for open shmem applications based on input I received from Mellanox and SUNY, so you probably should pass the oshmem personality down to PMIx for those types of applications. Perhaps something like detecting that they used oshrun to start the job, and then add oshmem to the personality field of the prte_job_t when processing envars or some other entry point? Might need some investigation.

@rhc54
Contributor

rhc54 commented Sep 21, 2022

Do you guys want me to attend a Tues meeting, or perhaps a dedicated one, to review these? I fear that there is some misunderstanding here regarding the use of many of these options. It is probably okay to remove some from the help text, but we somehow have to maintain their usage or else other things that we regularly use will break and/or no longer be available.

@gpaulsen
Member

@rhc54 in talking to others, the v5 RMs WOULD like to meet with you for some clarification. Look in PMIx slack for Austen's message. We have a resource who can help with some implementation after we make some decisions, so let's do it! :)

@rhc54
Contributor

rhc54 commented Sep 26, 2022

Sure - happy to do so. Sorry I missed today's meeting - had a doctor's appt. Will work with Austen on alternate times.

@naughtont3
Contributor

Quick follow-up regarding mpirun with DVM... it seems like the best thing for users would be to add the ability to pass --dvm-uri down from the schizo/ompi options. Otherwise they would need to use prun, and all the other MPI-related options would be absent.

I thought this would be a matter of adding PMIX_OPTION_DEFINE(PRTE_CLI_DVM_URI, PMIX_ARG_REQD) to schizo/ompi, but a quick check didn't show the option in the resulting mpirun --help output. So I missed something.

@rhc54
Contributor

rhc54 commented Sep 27, 2022

OMPI folks decided they did not want mpirun to connect to a DVM, and therefore there is no option for doing so.

There is no problem with OMPI users using prun --personality ompi <bunch of OMPI options> to run an OMPI-based job on a DVM.

@naughtont3
Contributor

I was thinking that passing the DVM URI was a good compromise to include for mpirun, as it would give users an easy way to leverage the DVM without having to adjust their mpirun command. The question was whether it would be difficult to add this functionality, or whether we should just "document" how to do it with prun.

@rhc54
Contributor

rhc54 commented Sep 27, 2022

Trivial to add, if you folks decide you want to do so.

@naughtont3
Contributor

Q: Can we use prun --personality ompi <bunch of OMPI options> --dvm-uri file:dvm.uri ...? Restated, can we use options from two schizo personalities?

@rhc54
Contributor

rhc54 commented Sep 27, 2022

It isn't mixing personalities - just telling prun to use the OMPI personality to parse the cmd line

@naughtont3
Contributor

Ok, thanks for clarification Ralph.

Q: Can we use prun --personality ompi <bunch of OMPI options> --dvm-uri file:dvm.uri ...? Restated, can we use options from two schizo personalities?

And for notes on this ticket, I did the following for a quick test...

prte --prtemca prte_pmix_server_verbose 50 --report-uri dvm2.uri >& LOG.dvm2 &
prun --personality ompi --display-comm   --dvm-uri file:dvm2.uri --np 4 ./ring_c
tail LOG.dvm2

@awlauria
Contributor Author

awlauria commented Oct 17, 2022

Closing this as complete; the fix has percolated up to v5.0.x in the latest submodule update.

prrte -
main: openpmix/prrte#1542
v3.0: openpmix/prrte#1548

ompi -
main: #10928
v5.0.x: #10934
