CPU frequency-dependent timer issues starting with 2.0.2 #3003

Closed
AndreasKempfNEC opened this issue Feb 17, 2017 · 72 comments

@AndreasKempfNEC

Dear Open MPI team,

A few days ago, my colleague Daniel Tameling noticed severe performance issues when running the HPCC benchmark with Open MPI. After spending quite some time tracking down the cause, we suspect that a regression was introduced between Open MPI 2.0.1 and 2.0.2. More specifically, the Open MPI 2.0.1 release tarball seems to be fine, while the 2.0.2 release shows issues that persist through the openmpi-v2.0.x-201702170256-5fa504b nightly build.

The issue is that the new releases seem to be severely affected by the CPU frequency set using the acpi-cpufreq driver. When the "ondemand" governor is active and allowed to vary the frequency between 1.20 GHz and 2.40 GHz, the performance difference between Open MPI 2.0.1 and versions >= 2.0.2 is almost a factor of two. Only when the "userspace" governor is used to pin the frequency to the maximum (with turbo) do the two versions show similar performance.

It does not seem to depend on the PML, BTL, MTL, or even the fabric. We tested FDR, EDR, openib, mxm, ob1, yalla, cm (on InfiniBand with Slurm), and cm, psm2 (on Omni-Path with PBS).

The following latencies were measured on two dual-socket Intel Xeon E5-2680 v4 nodes connected via InfiniBand EDR:

ompi-2.0.3-5fa504b, "ondemand" at 1.20 GHz-2.40 GHz:

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   WARNING:

You should always run with libnvidia-ml.so that is installed with your NVIDIA Display Driver. By default it's installed in /usr/lib and /usr/lib64. libnvidia-ml.so in TDK package is a stub library that is attached only for build purposes (e.g. machine that you build your application doesn't have to have Display Driver installed).
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[hsw006:37946] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:47684] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[1487330514.524672] [hsw006:37959:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2401.00
[1487330514.522429] [hsw007:47691:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2401.00
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       2.23
1                       2.23
2                       2.20
4                       2.19
8                       2.19
16                      2.27
32                      2.28
64                      2.37
128                     3.21
256                     3.37
512                     3.61
1024                    3.98
2048                    4.76
4096                    6.41
8192                   10.12
16384                  16.37
32768                  20.39
65536                  26.40
131072                 37.37
262144                 59.33
524288                102.96
1048576               191.14
2097152               364.29
4194304               711.79

ompi-2.0.1, "ondemand" at 1.20 GHz-2.40 GHz:

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
[snip warning]
[hsw006:37973] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:47714] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[1487330533.943496] [hsw006:37990:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2401.00
[1487330533.948853] [hsw007:47721:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 2401.00
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       1.15
1                       1.14
2                       1.13
4                       1.10
8                       1.10
16                      1.13
32                      1.14
64                      1.17
128                     1.58
256                     1.65
512                     1.77
1024                    1.96
2048                    2.35
4096                    3.23
8192                    5.05
16384                   8.17
32768                  10.21
65536                  13.27
131072                 18.74
262144                 29.59
524288                 51.32
1048576                95.44
2097152               182.00
4194304               355.76

ompi-2.0.3-5fa504b, "userspace" at 1.8 GHz:

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
[snip warning]
[hsw006:41654] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:51373] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       1.74
1                       1.78
2                       1.78
4                       1.78
8                       1.78
16                      1.83
32                      1.84
64                      1.93
128                     2.54
256                     2.70
512                     2.93
1024                    3.29
2048                    4.04
4096                    5.62
8192                    9.20
16384                  12.25
32768                  14.89
65536                  18.98
131072                 26.13
262144                 41.60
524288                 69.95
1048576               128.21
2097152               244.13
4194304               475.81

ompi-2.0.1, "userspace" at 1.8 GHz:

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
[snip warning]
[hsw006:41690] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:51407] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       1.27
1                       1.30
2                       1.30
4                       1.30
8                       1.30
16                      1.35
32                      1.35
64                      1.38
128                     1.86
256                     1.97
512                     2.14
1024                    2.43
2048                    2.99
4096                    4.23
8192                    6.81
16384                   9.12
32768                  11.11
65536                  14.12
131072                 19.51
262144                 30.47
524288                 52.76
1048576                95.91
2097152               182.81
4194304               356.55

ompi-2.0.3-5fa504b, "userspace" at 2.4 GHz (turbo on):

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
[snip warning]
[hsw006:45372] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:55141] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       1.09
1                       1.10
2                       1.10
4                       1.09
8                       1.09
16                      1.14
32                      1.14
64                      1.17
128                     1.60
256                     1.68
512                     1.79
1024                    1.98
2048                    2.37
4096                    3.21
8192                    5.07
16384                   8.20
32768                  10.20
65536                  13.22
131072                 18.69
262144                 29.67
524288                 51.43
1048576                95.50
2097152               182.12
4194304               355.87

ompi-2.0.1, "userspace" at 2.4 GHz (turbo on):

$ mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ~/opt/osu-5.3-ompi2/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency'
[snip warning]
[hsw006:45403] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw007:55175] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test v5.3
# Size          Latency (us)
0                       1.10
1                       1.11
2                       1.09
4                       1.09
8                       1.08
16                      1.13
32                      1.13
64                      1.16
128                     1.57
256                     1.65
512                     1.76
1024                    1.96
2048                    2.35
4096                    3.23
8192                    5.04
16384                   8.15
32768                  10.17
65536                  13.21
131072                 18.71
262144                 29.55
524288                 51.31
1048576                95.37
2097152               182.00
4194304               355.76

The Open MPI version (2.0.2a pre-release) in the HPC-X toolkit version 1.8.0 shows the same issues. Earlier releases (e.g., 1.10.2) also seem to be unaffected.

We are quite stumped as to what could be going on. (My gut feeling would be to blame the recent timer changes, but I really have no idea.)

In any case, thank you for your work on Open MPI!

@ggouaillardet
Contributor

That could be a side effect of the timer change, and what you observe could be either a genuinely higher latency, or incorrect timers suggesting a higher latency (!).

Can you measure the elapsed time of the osu_latency benchmark (ideally zero-size messages only, with a lot of iterations, so the elapsed time is ~1 minute), then compare it with the elapsed time under the other ompi release?
If both elapsed times are similar, but the reported performance differs by a factor of two, then the issue is that the timer is incorrect, not that performance has dropped.

@AndreasKempfNEC
Author

Funnily enough, the wall time does, indeed, seem to be roughly equal with the OSU test:

("userspace" at 1.2 GHz)

$ time ~/opt/ompi-2.0.3-5fa504b/bin/mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ./a.out -i 30000000'; time ~/opt/ompi-2.0.1-check/bin/mpirun -n 2 -mca pml yalla -mca rmaps_dist_device mlx5_0:1 -mca coll_hcoll_enable 0 -x MXM_IB_PORTS=mlx5_0:1 -x MXM_TLS=rc,self,shm -mca rmaps_base_mapping_policy dist:span -map-by node --report-bindings bash -c 'ulimit -s 10240; ./a.out -i 30000000';
[snip warning]
[hsw010:17030] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw011:00649] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test
# Size            Latency (us)
0                         3.04

real	1m32.585s
user	1m31.418s
sys	0m0.250s
[snip warning]
[hsw010:17091] MCW rank 0 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
[hsw011:00714] MCW rank 1 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../../../..][BB/../../../../../../../../../../../../..]
# OSU MPI Latency Test
# Size            Latency (us)
0                         1.50

real	1m31.565s
user	1m30.389s
sys	0m0.233s

From a cursory glance at the HPCC source code, its times all seem to be measured using MPI_Wtime as well. It could really be that the wall time never changed during the longer tests and we simply did not notice! But then, why would anyone doubt the notion of time?
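
For anyone hitting something similar, a minimal cross-check (a sketch, not taken from OSU or Open MPI) is to compare MPI_Wtime against CLOCK_MONOTONIC over a fixed sleep; a ratio far from 1 means the MPI timer is miscalibrated:

    #include <mpi.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        double w0 = MPI_Wtime();
        sleep(5);                              /* a known real-time interval */
        double w1 = MPI_Wtime();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double mono = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("MPI_Wtime: %.3f s  CLOCK_MONOTONIC: %.3f s  ratio: %.3f\n",
               w1 - w0, mono, (w1 - w0) / mono);

        MPI_Finalize();
        return 0;
    }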

@ggouaillardet
Contributor

My initial intuition was that the new timer is based on clock ticks, with time = ticks / frequency.
If the CPU clock increases, you get more ticks for the same elapsed time. Since we do not seem to adjust the frequency, MPI_Wtime() runs too fast, so the reported latency is higher than it really is.
For the time being, I suggest you force a timer based on gettimeofday().
(I am AFK and do not know how to do that offhand.)
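
To illustrate the suspected failure mode with made-up numbers (a sketch, not Open MPI code): ticks accumulate at one rate, but the conversion divides by a frequency sampled while the CPU was clocked down.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t tick_hz = 2400000000ULL;   /* rate at which ticks actually accumulate */
        const double calibrated_hz = 1.2e9;       /* frequency read once at startup (1.2 GHz) */

        uint64_t ticks = 24 * tick_hz;            /* ticks accumulated over 24 s of real time */
        double reported = ticks / calibrated_hz;  /* what a tick-based timer would report */

        printf("real: 24 s, reported: %.0f s\n", reported);  /* prints 48 s: 2x too long */
        return 0;
    }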

@hjelmn @bosilca can you please comment on that issue ?

@AndreasKempfNEC
Author

AndreasKempfNEC commented Feb 17, 2017

In opal/mca/timer/linux/timer_linux_component.c, the frequency seems to be taken from /proc/cpuinfo, whose "cpu MHz" field is modified by acpi-cpufreq. The comment in the code was probably true until the timer was changed:

    fp = fopen("/proc/cpuinfo", "r");
    ...
    if (0 == opal_timer_linux_freq) {
        /* find the CPU speed - most timers are 1:1 with CPU speed */
        loc = find_info(fp, "cpu MHz", buf, 1024);
        if (NULL != loc) {
            ret = sscanf(loc, "%f", &cpu_f);
            if (1 == ret) {
                /* number is in MHz - convert to Hz and make an integer */
                opal_timer_linux_freq = (opal_timer_t) (cpu_f * 1000000);
            }
        }
    }

Now that I think about it, this is particularly insidious when using "ondemand" with HPCC because all the tests are different and the CPU might run them at different frequencies.

@zzzoom
Contributor

zzzoom commented Feb 21, 2017

My changes to the timers aren't in 2.0.x yet, and they shouldn't affect this.
Newer processors increment the TSC at the same rate regardless of frequency. @AndreasKempfNEC, does your /proc/cpuinfo list constant_tsc in its flags? If it doesn't, the problem might be that the tick rate is queried at init, before the processor raises its frequency under load.
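
For reference, a quick way to check the flag (assuming the usual Linux /proc layout; no output means it is absent):

$ grep -m1 -o constant_tsc /proc/cpuinfo
constant_tsc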

@bosilca
Member

bosilca commented Feb 21, 2017

Here is a simple patch to disable TSC if it is not constant, and to fall back to a more costly but more accurate approach (clock_gettime if supported).

diff --git a/opal/mca/timer/linux/timer_linux_component.c b/opal/mca/timer/linux/timer_linux_component.c
index 5c16ac9..64a55aa 100644
--- a/opal/mca/timer/linux/timer_linux_component.c
+++ b/opal/mca/timer/linux/timer_linux_component.c
@@ -97,7 +97,7 @@ find_info(FILE* fp, char *str, char *buf, size_t buflen)
     return NULL;
 }

-static int opal_timer_linux_find_freq(void)
+static int opal_timer_linux_find_freq(int* constant_tsc)
 {
     FILE *fp;
     char *loc;
@@ -111,11 +111,15 @@ static int opal_timer_linux_find_freq(void)
     }

     opal_timer_linux_freq = 0;
-
+    *constant_tsc = 0;
 #if OPAL_ASSEMBLY_ARCH == OPAL_ARM64
-       opal_timer_linux_freq = opal_sys_timer_freq();
+    opal_timer_linux_freq = opal_sys_timer_freq();
 #endif
-
+    loc = find_info(fp, "flags", buf, 1024);
+    if( NULL != loc) {
+        if( NULL != strstr(loc, "constant_tsc") )
+            *constant_tsc = 1;
+    }
     if (0 == opal_timer_linux_freq) {
         /* first, look for a timebase field.  probably only on PPC,
            but one never knows */
@@ -164,10 +168,10 @@ static int opal_timer_linux_find_freq(void)

 int opal_timer_linux_open(void)
 {
-    int ret = OPAL_SUCCESS;
+    int ret = OPAL_SUCCESS, constant_tsc = 0;

     if (mca_timer_base_monotonic && !opal_sys_timer_is_monotonic ()) {
-#if OPAL_HAVE_CLOCK_GETTIME && (0 == OPAL_TIMER_MONOTONIC)
+#if OPAL_HAVE_CLOCK_GETTIME
         struct timespec res;
         if( 0 == clock_getres(CLOCK_MONOTONIC, &res)) {
             opal_timer_linux_freq = 1.e3;
@@ -175,14 +179,28 @@ int opal_timer_linux_open(void)
             opal_timer_base_get_usec = opal_timer_base_get_usec_clock_gettime;
             return ret;
         }
-#else
+#endif  /* OPAL_HAVE_CLOCK_GETTIME && (0 == OPAL_TIMER_MONOTONIC) */
         /* Monotonic time requested but cannot be found. Complain! */
         opal_show_help("help-opal-timer-linux.txt", "monotonic not supported", true);
-#endif  /* OPAL_HAVE_CLOCK_GETTIME && (0 == OPAL_TIMER_MONOTONIC) */
     }
-    ret = opal_timer_linux_find_freq();
-    opal_timer_base_get_cycles = opal_timer_base_get_cycles_sys_timer;
-    opal_timer_base_get_usec = opal_timer_base_get_usec_sys_timer;
+    ret = opal_timer_linux_find_freq(&constant_tsc);
+    if( ret >= 0 ) {
+        if( !(mca_timer_base_monotonic && (0 == constant_tsc)) ) {
+            opal_timer_base_get_cycles = opal_timer_base_get_cycles_sys_timer;
+            opal_timer_base_get_usec = opal_timer_base_get_usec_sys_timer;
+            return OPAL_SUCCESS;
+        }
+    }
+#if OPAL_HAVE_CLOCK_GETTIME
+    /* The frequency might not be constant, so we should not use it */
+    struct timespec res;
+    if( 0 == clock_getres(CLOCK_MONOTONIC, &res)) {
+        opal_timer_linux_freq = 1.e3;
+        opal_timer_base_get_cycles = opal_timer_base_get_cycles_clock_gettime;
+        opal_timer_base_get_usec = opal_timer_base_get_usec_clock_gettime;
+        return OPAL_SUCCESS;
+    }
+#endif  /* OPAL_HAVE_CLOCK_GETTIME */
     return ret;
 }

@AndreasKempfNEC
Author

Sorry, I do not have that much time to look into this, so I might get some stuff wrong.

First of all, the CPU does have the constant_tsc flag set. As far as I can see, the regression was introduced by #2596, because a checkout of b393c3b shows the issue and a5ab6eb does not.

My current understanding is that constant_tsc means all CPU cores tick at a constant rate (I would assume 2.4 GHz for the E5-2680 v4), independent of the frequency they are currently operating at, and that the commit referenced above makes use of this. I suspect the issue arises when converting these CPU ticks to seconds for MPI_Wtime.

From the few minutes I could spare looking at the code, I saw that the "cpu MHz" field in /proc/cpuinfo might be used for that conversion. However, /proc/cpuinfo is definitely not constant: it reflects the frequency set via acpi-cpufreq and can even change with CPU load when the "ondemand" governor is in use. If /proc/cpuinfo is used to convert ticks to seconds while the CPU ticks at a constant 2.4 GHz, setting the CPU to 1.2 GHz would make Open MPI count two seconds for every 2.4e9 ticks. This would be consistent with the results of the OSU tests I ran. I do not know whether the conversion factor changes during an MPI run; the reference value might be read only once during startup, but what is in /proc/cpuinfo depends on the current CPU load.

As I said, this could be incorrect as I am not familiar with the Open MPI code.

(The value in /proc/cpuinfo also depends on whether turbo is on or off. With turbo on, the frequency is written as 2401.000 MHz. I guess that would mean a drift of about 1.5 seconds per hour.)
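
(For the record, the arithmetic behind that drift estimate: if ticks accumulate at 2400 MHz but are divided by 2401 MHz, the timer runs slow by a factor of 2400/2401, losing 3600 s × 1/2401 ≈ 1.5 s per hour.)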

@hppritcha
Member

@jsquyres do we want a fix for this in 2.1.0?

@hjelmn
Member

hjelmn commented Feb 23, 2017

@hppritcha This is a bug. It looks like we have been using the wrong clock for some time. It is correct in most cases but if the user boosts the base clock I think we can get the wrong result.

@AndreasKempfNEC Does your BIOS boost the base clock? I don't see any other way the tsc calculation could be wrong.

@bosilca
Member

bosilca commented Feb 23, 2017

I understand that "cpu MHz" is a mere supposition. Basically, we read the frequency of the first proc at the moment the process starts up (not even of the core associated with the current process), and we assume this frequency remains unchanged for the rest of the application. The more we dig into this, the more obvious it becomes that we should restrict our usage of the TSC to only those cases where the scaling governor is set to performance and the TSC is constant. In all other cases, we should fall back to a safer, and certainly more expensive, solution (clock_gettime).

There is also the case where the governor is set to ondemand, in which case we could use the max frequency and assume that all cores running MPI processes will be escalated to it. Or we could just rely on a more expensive, but more accurate, timer.
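
A sketch of how the governor could be detected, assuming the standard Linux cpufreq sysfs layout (this is illustrative, not Open MPI code):

    #include <stdio.h>
    #include <string.h>

    /* Returns 1 if cpu0's scaling governor is "performance", 0 otherwise
     * (including when cpufreq is not available at all). */
    static int governor_is_performance(void)
    {
        char buf[64] = "";
        FILE *fp = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
        if (NULL == fp) return 0;
        if (NULL == fgets(buf, sizeof(buf), fp)) buf[0] = '\0';
        fclose(fp);
        return 0 == strncmp(buf, "performance", strlen("performance"));
    }

    int main(void)
    {
        printf("performance governor: %s\n", governor_is_performance() ? "yes" : "no");
        return 0;
    }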

@ggouaillardet
Contributor

ggouaillardet commented Feb 23, 2017

If I understand correctly, with the constant_tsc flag the ticks occur at the processor's nominal speed.
Could we try to parse the "model name" value of /proc/cpuinfo?
For example, on my VM:

model name	: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
cpu MHz		: 3591.947

So could using the 3.60 GHz frequency (instead of 3.591947 GHz) be good enough?
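
A sketch of that parsing idea (hypothetical helper; it only works for CPUs whose brand string actually embeds the nominal frequency):

    #include <stdio.h>
    #include <string.h>

    /* Extract the nominal frequency from a "model name" value such as
     * "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"; returns 0.0 if absent. */
    static double nominal_ghz(const char *model_name)
    {
        const char *at = strstr(model_name, "@");
        double ghz = 0.0;
        if (NULL != at && 1 == sscanf(at + 1, " %lfGHz", &ghz)) {
            return ghz;
        }
        return 0.0;
    }

    int main(void)
    {
        printf("%.2f GHz\n", nominal_ghz("Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"));
        return 0;
    }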

@bosilca
Member

bosilca commented Feb 23, 2017

We could, as this value is indeed supposed to be the maximum. But then again, this only works if the governor switches the cores to max performance.

@ggouaillardet
Contributor

ggouaillardet commented Feb 23, 2017

Did you mean "only works if the constant_tsc flag is set" instead?
Or am I missing something?

@zzzoom
Contributor

zzzoom commented Feb 23, 2017

Intel® 64 and IA-32 Architectures Software Developer’s Manual
18.18.3 Determining the Processor Base Frequency
For Intel processors in which the nominal core crystal clock frequency is enumerated in CPUID.15H.ECX and the core crystal clock ratio is encoded in CPUID.15H (see Table 3-8 “Information Returned by CPUID Instruction”), the nominal TSC frequency can be determined by using the following equation:
Nominal TSC frequency = ( CPUID.15H.ECX[31:0] * CPUID.15H.EBX[31:0] ) ÷ CPUID.15H.EAX[31:0]
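
For what it's worth, a minimal sketch of querying that leaf with GCC/Clang's <cpuid.h> (x86-only, and on many parts leaf 15h returns zeros, in which case this method cannot be used):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID.15H: EBX/EAX = TSC / crystal-clock ratio, ECX = crystal clock in Hz */
        if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx)
            || 0 == eax || 0 == ebx || 0 == ecx) {
            fprintf(stderr, "CPUID.15H does not enumerate the TSC frequency here\n");
            return 1;
        }
        printf("nominal TSC frequency: %llu Hz\n",
               (unsigned long long)ecx * ebx / eax);
        return 0;
    }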

@rhc54
Contributor

rhc54 commented Feb 23, 2017

Just as an FYI: there are several installations intending to field dynamic power mgmt in the next few years. These will definitely operate by adjusting freq on-the-fly, including adjustments on a per-cpu (or per-group-of-cpus) basis. So I don't think you'll be able to rely on a constant value.

@hjelmn
Member

hjelmn commented Feb 23, 2017

@rhc54 We can rely on a constant value for the TSC clock unless Intel is thinking about abandoning the constant TSC. If they do, we will detect it by the constant_tsc bit not being set in the CPUID register. @zzzoom has the right approach for getting the constant TSC frequency. I can take a crack at implementing that unless @zzzoom wants to do it.

@zzzoom
Contributor

zzzoom commented Feb 23, 2017

I can give it a shot over the weekend if it isn't urgent.

@bosilca
Member

bosilca commented Feb 23, 2017

What about all the other Linux-based machines (i.e., the ones that are not IA-32 based)?

@AndreasKempfNEC
Author

Just for reference, here is what the "cpu MHz" fields in /proc/cpuinfo look like on a dual-socket Xeon E5-2670 (Sandy Bridge) with hyperthreading enabled, using the Intel P-state driver (which does not report clean values, and is probably what ggouaillardet has active):

$ grep 'cpu MHz' /proc/cpuinfo 
cpu MHz		: 1873.523 [<- so this one would be taken for the conversion, I guess?]
cpu MHz		: 1550.757
cpu MHz		: 1898.914
cpu MHz		: 2189.890
cpu MHz		: 1282.226
cpu MHz		: 1212.757
cpu MHz		: 1481.085
cpu MHz		: 1398.921
cpu MHz		: 1641.046
cpu MHz		: 1516.429
cpu MHz		: 2069.437
cpu MHz		: 1900.640
cpu MHz		: 1294.007
cpu MHz		: 1497.234
cpu MHz		: 1255.210
cpu MHz		: 1428.781
cpu MHz		: 1510.539
cpu MHz		: 1220.171
cpu MHz		: 2089.445
cpu MHz		: 1801.007
cpu MHz		: 1221.796
cpu MHz		: 1202.398
cpu MHz		: 1674.257
cpu MHz		: 1301.929
cpu MHz		: 2600.000
cpu MHz		: 2115.242
cpu MHz		: 1320.921
cpu MHz		: 1201.789
cpu MHz		: 1252.773
cpu MHz		: 1199.960
cpu MHz		: 2338.375
cpu MHz		: 1214.789

But why did this only become a problem after b393c3b? Did the OPAL_TIMER_MONOTONIC stuff change the default timer?

Reading the nominal frequency from a CPU register sounds very much like the correct solution.

I will change the title of this report in the meantime.

@AndreasKempfNEC AndreasKempfNEC changed the title CPU frequency-dependent performance issues starting with 2.0.2 CPU frequency-dependent timer issues starting with 2.0.2 Feb 23, 2017
@rhc54
Contributor

rhc54 commented Feb 23, 2017

@hjelmn I wasn't trying to imply anything about future directions. I was only trying to indicate that you can't just take one value from /proc/cpuinfo and assume one can use it for the entire life of the process. I concur with the proposed register-based approach so long as you periodically update from it, at least for IA machines.

@zzzoom
Contributor

zzzoom commented Feb 27, 2017

Reading the TSC frequency is definitely the best (and only?) way to handle the constant_tsc case. However, I've been toying around with CPUID leaf 15h, and this method only works for a subset of processors; recent ones require a hardcoded table because the data isn't available on the processor to query.
A good implementation would need to be kept up to date, because Intel seems to change this every generation or so (e.g., the Linux kernel implementation has a new hardcoded entry for Skylake-X, which hasn't even been released yet), and that doesn't look to me like something Open MPI should need to be concerned about.
So yeah, tl;dr: it can be done, but it would be ugly, and we should probably be prodding Linux kernel/glibc/hardware-library devs to export this information (if it isn't exported somewhere already), because they need to maintain it anyway.

@ccaamad

ccaamad commented Mar 1, 2017

Hi there,

This problem is a deal breaker for us, but I need some of the other fixes that were made in 2.0.2. Is it sensible to go with the following revert until 2.0.3 (or 2.1.0) is out with the proper fix, please? I had to hack a couple of files about a bit...

ccaamad/ompi@0fdc34b

(a slow MPI_Wtime is better than a broken MPI_Wtime!)

Thanks,

Mark

@yqin

yqin commented Mar 15, 2017

@bosilca, is that patch considered production ready or simply a hack?

@ggouaillardet
Contributor

@yqin The patch above is a fix for CPUs on which the constant_tsc flag is not set.
In this case, the flag is set, and the root cause is that the CPU frequency is not nominal (the governor is ondemand or userspace).
So depending on your configuration, the patch might or might not help.

@rhc54
Contributor

rhc54 commented May 3, 2017

I think this is what has me so confused. The initial report targeted the 2.x series - yet somehow, that code seems to be the one being pushed here. So is the code in v2.x wrong? If so, why would we want to push it into 1.10? Or did someone modify master and then backport the correct fix to everything but 1.10?

@hjelmn
Member

hjelmn commented May 3, 2017

@jsquyres Correct. We cannot use clock_gettime() for the internal timer used in opal_progress(). In v1.10.0-v1.10.1 we used the TSC for both opal_progress() and wtime. A user noticed that the clock was non-monotonic on an older x86 system, so we disabled it in v1.10.2. I added a check for the monotonic TSC and re-enabled it for v1.10.3. This issue came up shortly after that, because the way we calculate the TSC frequency is wrong in some cases. In response we made a couple of changes to master (and v2.0.x, v2.x, etc.) to make wtime always use gettimeofday() or clock_gettime(), and to always use a cycle counter (if available) for opal_progress(). This last set of changes is apparently not in v1.10.x, and @sjeaugey is requesting that we make a v1.10.7 with it.

@rhc54
Contributor

rhc54 commented May 3, 2017

I see - and the dead code block was left for future developers to consider (as per the comment).

I have no objections to releasing a final 1.10.7 for cleanup purposes, if someone wants to assemble the PR.

sjeaugey pushed a commit to sjeaugey/ompi that referenced this issue May 4, 2017
 * See open-mpi#3003 for a discussion about
   this patch. Once we get a better version in place we can revert this
   change.

Signed-off-by: Joshua Hursey <[email protected]>
sjeaugey pushed a commit to sjeaugey/ompi that referenced this issue May 4, 2017
better solution needed later
 workaround for open-mpi#3003

Signed-off-by: Howard Pritchard <[email protected]>
@sjeaugey
Member

sjeaugey commented May 4, 2017

Assembling the PR, I took 1.10, then added 48d13aa and b933152 (visible at https://github.com/sjeaugey/ompi/tree/wtime-fix-v1.10).

@hjelmn I see 4009ba6 is a bug fix, but is it really related to this issue? I can add it too; I'm just trying to avoid changing too many things.

In any case, I'm not very comfortable with those "workaround patches" (48d13aa and b933152), and I'm wondering if simply reverting 5a44018 wouldn't roll back the situation to what it was in 1.10.5. @hjelmn, did we need that patch to fix something (or for some other code), or was it just meant as an improvement?

@hjelmn
Member

hjelmn commented May 5, 2017

@sjeaugey We absolutely cannot roll back 5a44018; that would re-introduce the performance regression. The work-around patches are OK and well contained. I don't think we need 4009ba6, but it is a nice-to-have.

@AndreasKempfNEC
Author

Okay, if the latest versions of all release branches are free from this problem, this report no longer needs to stay open.

I guess I will close it pretty soon, then. Or somebody else can do so.

@jsquyres
Member

jsquyres commented May 5, 2017

@AndreasKempfNEC Probably want to wait for @sjeaugey's PR for the v1.10 branch -- that should close out the issue on all the current release branches.

@yosefe
Contributor

yosefe commented May 16, 2017

@jsquyres @hppritcha We see on some systems that MPI_Wtime() reports wrong results (due to CPU frequency scaling). Why was it decided to use clock_gettime() on Linux, and not gettimeofday()?
Can we go with the initial suggestion to "nail down MPI_WTICK/MPI_WTIME to use gettimeofday(3)"?

@jsquyres
Member

There were user requests to make MPI_WTIME more accurate.

All the branches should now have -- at least temporarily? -- set MPI_WTICK/MPI_WTIME to use gettimeofday(3).

@yosefe
Contributor

yosefe commented May 16, 2017

@jsquyres According to https://github.com/open-mpi/ompi/blob/v2.x/ompi/mpi/c/wtime.c#L60, clock_gettime() is used on Linux - what am I missing?

@jsquyres
Member

@yosefe My mistake -- I was thinking about the fact that we commented out the opal_* calls to get the times, which definitely had problems with frequency scaling.

You're saying that clock_gettime() itself is showing frequency scaling issues. That is new information to me. Can you show the code where this is happening?

@yosefe
Contributor

yosefe commented May 18, 2017

@jsquyres We are running the osu_bandwidth benchmark, which reported unreasonable [too good to be true] results on a particular system (Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz); after changing MPI_Wtime() to gettimeofday(), the results are OK.
I can't point to the exact code in clock_gettime that is causing this, though...

@jsquyres
Member

@bwbarrett @bosilca @hjelmn Have any of you guys heard of clock_gettime() suffering from frequency scaling issues?

@rhc54
Contributor

rhc54 commented May 18, 2017

According to what I can find in Intel manuals, clock_gettime(CLOCK_MONOTONIC) is guaranteed to be independent of cpu freq starting with Nehalem in 2008.

@jsquyres
Member

@yosefe I don't doubt your results, but per Intel's documentation, I have to wonder if something else was going on when you ran your tests...? Can you make a small example that shows that clock_gettime(CLOCK_MONOTONIC) is dependent on CPU frequency?

@yosefe
Contributor

yosefe commented May 19, 2017

@jsquyres Thanks, I'll try to dig some more into this.

@yosefe
Contributor

yosefe commented May 24, 2017

@jsquyres We checked it again, and we had an older version of OMPI which measured cycle counts rather than using clock_gettime(). Also, a small clock_gettime() test showed the correct timing. So we are good; sorry for the confusion.

@kawashima-fj
Member

I created a Wiki page https://github.com/open-mpi/ompi/wiki/Timers to revisit this issue in the future.

@rivolt

rivolt commented Aug 24, 2020

Yalla
