
map-by ppr behavior change between 3.1.2 and later releases #6236

Closed
@hppritcha


A user is reporting a problem with a ppr mapping specification that worked in OMPI releases prior to 3.1.3.

Here are the (simplified) placement options the user was using, which work with Open MPI 3.1.2 and older:

```
mpirun -np 1 --map-by ppr:1:socket:pe=1 -bind-to core --report-bindings --mca btl ^openib ./hello_c
[cn800:171052] MCW rank 0 bound to socket 0[core 0[hwt 0-3]]: [BBBB/..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../....]
```

but launching the job with the same options under 3.1.3 (or master, or 4.0.x) fails with:

```
--------------------------------------------------------------------------
An invalid value was given for the number of processes
per resource (ppr) to be mapped on each node:

  PPR:  1:socket:pe=1

The specification must be a comma-separated list containing
combinations of number, followed by a colon, followed
by the resource type. For example, a value of "1:socket" indicates that
one process is to be mapped onto each socket. Values are supported
for hwthread, core, L1-3 caches, socket, numa, and node. Note that
enough characters must be provided to clearly specify the desired
resource (e.g., "nu" for "numa").
--------------------------------------------------------------------------
```
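The help text describes the ppr grammar as a comma-separated list of `<number>:<resource>` entries, where a resource name may be abbreviated as long as it stays unambiguous. The POSIX shell function below is a hypothetical sketch (`ppr_is_valid` is an illustrative name, not part of Open MPI) that implements only that literal rule. It assumes the cache levels are spelled `l1cache`, `l2cache` and `l3cache`, since the message only says "L1-3 caches". It is not mpirun's actual parser; it simply shows that the grammar as stated has no slot for a `pe=N` modifier, so a literal reading rejects both `1:socket:pe=1` and `1:socket,pe=1`.

```sh
# Hypothetical checker for the ppr grammar exactly as the help text states it.
# Assumption: cache resources are named l1cache, l2cache, l3cache.
# Returns 0 if the spec is valid under that literal reading, 1 otherwise.
ppr_is_valid() {
    spec=$1
    [ -n "$spec" ] || return 1

    # Split the spec on commas; disable globbing while splitting, then re-enable it.
    old_ifs=$IFS
    IFS=,
    set -f
    set -- $spec
    set +f
    IFS=$old_ifs

    for entry in "$@"; do
        # Each entry needs a colon separating the count from the resource.
        case $entry in *:*) ;; *) return 1 ;; esac
        num=${entry%%:*}
        res=${entry#*:}
        # The count must be all digits; the resource must be non-empty, colon-free.
        case $num in ''|*[!0-9]*) return 1 ;; esac
        case $res in ''|*:*) return 1 ;; esac

        # The (possibly abbreviated) resource must prefix exactly one known name.
        matches=0
        for name in hwthread core l1cache l2cache l3cache socket numa node; do
            case $name in "$res"*) matches=$((matches + 1)) ;; esac
        done
        [ "$matches" -eq 1 ] || return 1
    done
    return 0
}
```

Under this sketch, `ppr_is_valid 2:nu` succeeds, while `ppr_is_valid 2:n` fails because "n" matches both `numa` and `node`.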

If the user instead replaces the options above with

```
mpirun -np 1 --map-by ppr:1:socket,pe=1 -bind-to core --report-bindings --mca btl ^openib
```

the command appears to work.

However, if she uses the comma form with 3.1.2 or older, mpirun reports a similar error.

This change in behavior was introduced by commit 376d408.

So which notation is correct? It's not clear from the mpirun man page why one should use a `,` before `pe=` starting with 3.1.3, but a `:` with 3.1.2 and older releases.
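Until the intended notation is settled, a job script that has to run against both old and new installations could choose the separator from the Open MPI version. The POSIX shell helper below is a hypothetical sketch (`ppr_map_by` is an illustrative name) that encodes only what this report observed: `:pe=N` for 3.1.2 and older, `,pe=N` for 3.1.3 and later. It assumes every later series (4.0.x, master builds) behaves like 3.1.3, and, because it compares version numbers, it treats any 3.0.x release like 3.1.2; the report does not verify either assumption.

```sh
# Hypothetical helper: build a --map-by ppr value with the pe= separator the
# report found to work for a given version.
# Usage: ppr_map_by VERSION PPR PE   e.g. ppr_map_by 3.1.2 1:socket 1
# Assumption: versions >= 3.1.3 want ",pe=N"; older versions want ":pe=N".
ppr_map_by() {
    version=$1 ppr=$2 pe=$3

    # Split "X.Y.Z" into numeric fields; missing fields default to 0.
    old_ifs=$IFS
    IFS=.
    set -- $version
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    # Drop a pre-release suffix such as "rc1" ("0rc1" -> "0").
    patch=${patch%%[!0-9]*}
    patch=${patch:-0}

    # 3.1.3 and later: comma; anything older: colon.
    if [ "$major" -gt 3 ] ||
       { [ "$major" -eq 3 ] && [ "$minor" -gt 1 ]; } ||
       { [ "$major" -eq 3 ] && [ "$minor" -eq 1 ] && [ "$patch" -ge 3 ]; }; then
        sep=,
    else
        sep=:
    fi
    printf '%s\n' "ppr:${ppr}${sep}pe=${pe}"
}
```

For example, `mpirun -np 1 --map-by "$(ppr_map_by 3.1.2 1:socket 1)" -bind-to core --report-bindings ./hello_c` expands to the colon form used in the original command, while passing `4.0.0` yields `ppr:1:socket,pe=1`.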
