A user is reporting a problem with a ppr mapping specification that appeared to work before the Open MPI 3.1.3 release.
Here are the placement options the user was using (simplified); they work with Open MPI 3.1.2 and older
versions:
mpirun -np 1 --map-by ppr:1:socket:pe=1 -bind-to core --report-bindings --mca btl ^openib ./hello_c
[cn800:171052] MCW rank 0 bound to socket 0[core 0[hwt 0-3]]: [BBBB/..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../....][..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../..../....]
But launching the job with these options under 3.1.3 (or master, or 4.0.x) gives:
--------------------------------------------------------------------------
An invalid value was given for the number of processes
per resource (ppr) to be mapped on each node:
PPR: 1:socket:pe=1
The specification must be a comma-separated list containing
combinations of number, followed by a colon, followed
by the resource type. For example, a value of "1:socket" indicates that
one process is to be mapped onto each socket. Values are supported
for hwthread, core, L1-3 caches, socket, numa, and node. Note that
enough characters must be provided to clearly specify the desired
resource (e.g., "nu" for "numa").
--------------------------------------------------------------------------
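The rule stated in this help text can be modeled with a short validator. This is a hypothetical Python sketch, not Open MPI's parser; the resource names (in particular "l1cache" through "l3cache") and the unique-prefix matching are assumptions read off the message. It shows why "1:socket:pe=1" is rejected: "pe=1" becomes a third colon-separated field instead of a modifier.

```python
# Hypothetical validator modeled on the ppr help text above; NOT Open MPI code.
# The resource names are assumptions based on the message's wording.
RESOURCES = ["hwthread", "core", "l1cache", "l2cache", "l3cache",
             "socket", "numa", "node"]


def match_resource(prefix: str) -> str:
    """Return the single resource name that starts with `prefix`."""
    hits = [r for r in RESOURCES if r.startswith(prefix.lower())]
    if len(hits) != 1:
        raise ValueError(f"ambiguous or unknown resource: {prefix!r}")
    return hits[0]


def parse_ppr(spec: str) -> list[tuple[int, str]]:
    """Parse e.g. "1:socket" or "2:socket,1:core" into (count, resource) pairs."""
    pairs = []
    for item in spec.split(","):
        fields = item.split(":")
        # Each item must be exactly "<number>:<resource>".
        if len(fields) != 2 or not fields[0].isdigit():
            raise ValueError(f"invalid ppr item: {item!r}")
        pairs.append((int(fields[0]), match_resource(fields[1])))
    return pairs


def is_valid(spec: str) -> bool:
    """True if `spec` parses under the rules above."""
    try:
        parse_ppr(spec)
        return True
    except ValueError:
        return False


print(parse_ppr("1:socket"))
print(parse_ppr("2:nu"))           # "nu" is enough to select numa
print(is_valid("1:socket:pe=1"))   # "pe=1" is an extra third field
print(is_valid("1:n"))             # "n" matches both numa and node
```

Under these assumptions, "1:socket" and "2:nu" parse, while "1:socket:pe=1" and the ambiguous "1:n" do not, which matches the shape of the error mpirun reports.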
If the user instead replaces the command-line options above with
mpirun -np 1 --map-by ppr:1:socket,pe=1 -bind-to core --report-bindings --mca btl ^openib
the command seems to work.
However, if she uses the "," form with 3.1.2 or older, she gets a similar error message from mpirun.
This change in behavior was introduced by commit 376d408.
So the question is: which notation is correct? It's not clear from the mpirun man page why one should use "," starting with 3.1.3 but ":" for 3.1.2 and older releases.
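To make the two notations concrete, here is a hypothetical Python model of the behavior reported above. It is not Open MPI's parser and does not reflect what commit 376d408 actually changed; it only shows how splitting modifiers off at ":" versus at "," makes each release reject the other's spelling. The function names are invented for illustration, and multi-item ppr lists are deliberately ignored.

```python
# Hypothetical model of the reported behavior; NOT Open MPI's parser.
# Multi-item ppr lists (e.g. "2:socket,1:core") are deliberately ignored.


def check_ppr(ppr: str) -> str:
    """Require exactly "<count>:<resource>" with an alphanumeric resource."""
    count, sep, resource = ppr.partition(":")
    if not sep or not count.isdigit() or not resource.isalnum():
        raise ValueError(f"invalid ppr spec: {ppr!r}")
    return ppr


def strip_prefix(value: str) -> str:
    """Drop the leading "ppr:" from a --map-by value."""
    if not value.startswith("ppr:"):
        raise ValueError(f"not a ppr mapping: {value!r}")
    return value[len("ppr:"):]


def parse_colon_style(value: str) -> tuple[str, list[str]]:
    """As reported for 3.1.2 and older: modifiers follow another ':'."""
    fields = strip_prefix(value).split(":")
    return check_ppr(":".join(fields[:2])), fields[2:]


def parse_comma_style(value: str) -> tuple[str, list[str]]:
    """As reported for 3.1.3 and newer: modifiers follow a ','."""
    ppr, _, rest = strip_prefix(value).partition(",")
    return check_ppr(ppr), (rest.split(",") if rest else [])


for parser in (parse_colon_style, parse_comma_style):
    for value in ("ppr:1:socket:pe=1", "ppr:1:socket,pe=1"):
        try:
            print(parser.__name__, value, "->", parser(value))
        except ValueError as err:
            print(parser.__name__, value, "-> error:", err)
```

In this model the colon-style parser accepts only "ppr:1:socket:pe=1" and the comma-style parser accepts only "ppr:1:socket,pe=1"; in each rejected case the leftover "pe=1" ends up inside the ppr spec, much like the "PPR: 1:socket:pe=1" shown in the error above.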