OMPI version: v2.1

I was recently investigating an issue with PMIx_Get latency when using dstore. I was running on a single node and observed the numbers growing as the PPN count was increased. I was using the default binding policy, thinking that it defaults to bind-to core.

The bottleneck was attributed to the thread-shift part: openpmix/openpmix#665 (comment).

Debugging showed that the scheduler was placing the PMIx service thread on a different core, which was causing the performance issues. You can see on the plot that starting from 4 procs the performance degrades noticeably. This is because, IIRC, mpirun binds to core for up to 2 processes and to socket beyond that.

Perf confirmed that guess:
the cpu # is enclosed in brackets: [0004];
pmix_intra_perf[164802] is the main thread;
pmix_intra_perf[164807/164802] is the service thread.

In the 4 PPN case the procs stayed on their CPUs for the whole time (cpu4 and cpu8), but starting from 16 PPN they began to migrate actively, which caused the more rapid growth.

After forcing bind-to core, performance stabilized (yellow dashed curve): openpmix/openpmix#665 (comment)

I see this as additional input on the impact that the default binding policy may have. The suggestion is to consider this at the next OMPI dev meeting.
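For reference, the perf check described above can be reproduced roughly like this (a sketch only: the exact perf invocation and benchmark arguments aren't shown in this issue, so treat the options as placeholders):

```shell
# trace scheduler events while the benchmark runs
perf sched record -- mpirun -np 16 ./pmix_intra_perf

# print one line per event: task name, pid/tid, the CPU in brackets (e.g. [0004]), timestamp;
# the same tid showing up on different CPUs over time indicates migration
perf script -F comm,pid,tid,cpu,time
```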
@artpol84 IIRC, the rationale for binding to sockets (instead of cores) is to be friendly to those who run hybrid MPI+OpenMP applications but fail to ask for n cpus per MPI task.
And the rationale for binding to cores by default when there are 2 MPI tasks or fewer is simply to get better out-of-the-box performance when comparing Open MPI with another MPI library.
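For illustration, "asking for n cpus per MPI task" can look like this (the PE=4 value, rank count, and executable name are placeholders, and the exact --map-by modifier syntax differs a bit between Open MPI releases):

```shell
# give each rank 4 cores, bind it to them, and size the OpenMP team to match
export OMP_NUM_THREADS=4
mpirun -np 8 -x OMP_NUM_THREADS --map-by socket:PE=4 --bind-to core ./hybrid_app
```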
And when those decisions were made, the circumstances I highlighted here weren't taken into consideration.
I discussed this with @rhc54 in the context of PMIx performance and he suggested that OMPI defaults might need to be revisited considering these findings.
This was discussed at the devel meeting, and the conclusion was that the need to adequately support multi-threaded applications overrides this issue. We don't know of any way to force the kernel to keep one thread local to another so that the two follow each other around the socket.
For performance tests like the one you are running, you should override the default binding policy with bind-to core. For OMPI, we feel that the current defaults are the correct ones to use.
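For completeness, a minimal sketch of that override for a run like the one above (rank count and benchmark name are just taken from the example; --report-bindings prints the per-rank mask so you can confirm the placement):

```shell
# force one core per rank and report the resulting bindings
mpirun -np 16 --bind-to core --report-bindings ./pmix_intra_perf
```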