Question about rank ordering of processes #6298

Closed
jsquyres opened this issue Jan 24, 2019 · 1 comment · Fixed by #6493

@jsquyres
Member

We've noticed a difference in MPI_COMM_WORLD (MCW) rank ordering behavior. It is easiest to describe using the two commands below, each run under two Open MPI versions and from two launch locations (i.e., 4 different cases):

$ cat foo.sh
#!/bin/sh
echo "`hostname`: MCW rank $OMPI_COMM_WORLD_RANK"
$ mpirun --host aaa,bbb ./foo.sh
[...output 1...]
$ mpirun --host bbb,aaa ./foo.sh
[...output 2...]

Case 1: OMPI v2.1.x + localhost

  • Open MPI v2.1.x
  • When launching mpirun from machine aaa (i.e., when launching on localhost)

In this case, the two outputs are:

# Output 1
aaa: MCW rank 0
bbb: MCW rank 1
# Output 2
aaa: MCW rank 1
bbb: MCW rank 0

Notice that the order of MCW ranks follows the order of the hosts in the --host argument.

Case 2: OMPI v2.1.x + no localhost

  • Open MPI v2.1.x
  • When launching mpirun from a third machine (i.e., when not launching on localhost)

In this case, the two outputs are:

# Output 1
aaa: MCW rank 0
bbb: MCW rank 1
# Output 2
aaa: MCW rank 1
bbb: MCW rank 0

Notice that -- just like case 1 -- the order of MCW ranks follows the order of the hosts in the --host argument.

Case 3: OMPI v3.0.x + localhost

  • Open MPI v3.0.x and beyond
  • When launching mpirun from machine aaa (i.e., when launching on localhost)

In this case, the two outputs are:

# Output 1
aaa: MCW rank 0
bbb: MCW rank 1
# Output 2
aaa: MCW rank 0
bbb: MCW rank 1

Notice that the order of MCW ranks does not follow the order of the hosts in the --host argument -- it stays constant.

Case 4: OMPI v3.0.x + no localhost

  • Open MPI v3.0.x and beyond
  • When launching mpirun from a third machine (i.e., when not launching on localhost)

In this case, the two outputs are:

# Output 1
aaa: MCW rank 0
bbb: MCW rank 1
# Output 2
aaa: MCW rank 1
bbb: MCW rank 0

Notice that -- just like cases 1 and 2, but unlike case 3 -- the order of MCW ranks follows the order of the hosts in the --host argument.


Do we know / remember if case 3 is intentional?

We ask because:

  • the behavior changed from v2.1.x to v3.0.x (and beyond)
  • the behavior is different depending on whether localhost is in the --host list or not (which, if this was a deliberate change in behavior, seems odd)

...or is rank ordering according to the ordering of hosts in --host not guaranteed? I.e., are cases 1, 2, and 4 just happenstance?
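To make the four cases easy to compare mechanically, here is a minimal sketch of a checker for the foo.sh output format shown above. The function name `ranks_follow_hosts` is hypothetical (it is not part of Open MPI). It assumes one process per host and that `hostname` prints exactly the names passed to --host.

```shell
#!/bin/sh
# ranks_follow_hosts HOSTLIST   (hypothetical helper, not an Open MPI tool)
#
# HOSTLIST is the comma-separated value given to --host, e.g. "bbb,aaa".
# stdin is foo.sh output, one line per process: "<hostname>: MCW rank <N>".
# Returns 0 if the i-th host in HOSTLIST (counting from 0) reported rank i,
# and 1 otherwise. Assumes one process per host.
ranks_follow_hosts() {
    awk -v hosts="$1" '
        BEGIN { n = split(hosts, want, ",") }
        # Match lines such as "aaa: MCW rank 0" and record the rank per host.
        $2 == "MCW" && $3 == "rank" {
            host = $1
            sub(/:$/, "", host)      # drop the trailing colon
            rank[host] = $4 + 0      # force a numeric value
        }
        END {
            for (i = 1; i <= n; i++) {
                if (!(want[i] in rank) || rank[want[i]] != i - 1) exit 1
            }
            exit 0
        }
    '
}
```

For example, `mpirun --host bbb,aaa ./foo.sh | ranks_follow_hosts bbb,aaa` should succeed in cases 1, 2, and 4 and fail in case 3, given the outputs listed above.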

FYI @bturrubiates

@rhc54
Contributor

rhc54 commented Jan 24, 2019

Heck if I know - the consensus has changed over the years. IIRC, the last time we went around on this, we decided that the ordering should follow the --host list. However, we always got in knots over the various cases (when resources are managed, a hostfile is provided, etc.).

The problem is that the ordering can be really important when you are on clusters with topological fabrics. Most users don't know how the nodes sit on the topology, but the scheduler does and assigns the hosts in the required order for best performance. In those cases, you really only want --host to act as a filter and not necessarily specify the ordering.

That said, we have had users complain about that behavior too, regardless of the possible performance impact. What you probably really need is a different "marker" in the -host option that indicates a request for rigid ordering. We already have markers for empty nodes, so adding another marker to indicate rigid ordering shouldn't be too hard. You then just need to ensure that the node ordering on the list of available nodes (as constructed in rmaps_base_support_fns.c) matches the requested ordering so the map gets constructed correctly.
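The two semantics discussed here (the host list as a filter versus the host list as a rigid ordering) can be contrasted with a small sketch. This is illustrative only and is not how Open MPI builds its node list; the function names `filter_nodes` and `order_nodes` are hypothetical, and both take plain comma-separated node lists.

```shell
#!/bin/sh
# Hypothetical helpers contrasting two readings of a host list.
# Neither reflects Open MPI's actual implementation.

# filter_nodes ALLOCATED REQUESTED
# Filter semantics: print the nodes of ALLOCATED that also appear in
# REQUESTED, one per line, keeping ALLOCATED (scheduler) order.
filter_nodes() {
    awk -v alloc="$1" -v req="$2" '
        BEGIN {
            m = split(req, r, ",")
            for (j = 1; j <= m; j++) keep[r[j]] = 1
            n = split(alloc, a, ",")
            for (i = 1; i <= n; i++) if (a[i] in keep) print a[i]
        }
    '
}

# order_nodes ALLOCATED REQUESTED
# Rigid-ordering semantics: print the nodes of REQUESTED that are present
# in ALLOCATED, one per line, keeping REQUESTED (user) order.
order_nodes() {
    awk -v alloc="$1" -v req="$2" '
        BEGIN {
            n = split(alloc, a, ",")
            for (i = 1; i <= n; i++) have[a[i]] = 1
            m = split(req, r, ",")
            for (j = 1; j <= m; j++) if (r[j] in have) print r[j]
        }
    '
}
```

With an allocation of `aaa,bbb,ccc` and a request of `bbb,aaa`, `filter_nodes` prints aaa then bbb (the constant ordering seen in case 3), while `order_nodes` prints bbb then aaa (the ordering seen in cases 1, 2, and 4).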
