Poor performance on Power-9 hardware with GCC and SMT enabled #2380

Open
@zephyr111

Hello,

I benchmarked the following simple dgemm call on 4096x4096 matrices (n=4096; a, b and c are the matrices) on an IBM LC922 machine with 2 POWER-9 processors (each with 22 cores and 88 hardware threads):
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, a, n, b, n, 1.0, c, n);

Performance is great when using exactly 1 thread per core (with thread places and binding specified). With GCC, however, it drops sharply to sequential performance when 2 or 4 threads per core are used, and we can see that only one thread is actually computing. With Clang there is also a drop, but it is clearly less significant and more than one thread is running.

With GCC 8.3.0:

$ OMP_NUM_THREADS=44 OMP_PLACES="cores(44)" OMP_PROC_BIND=close ./a.out
462.493 Gflops (time: 0.29717 s)
$ OMP_NUM_THREADS=176 OMP_PLACES="threads(176)" OMP_PROC_BIND=close ./a.out
22.1915 Gflops (time: 6.1933 s)
$ OMP_NUM_THREADS=1 OMP_PLACES="cores(1)" OMP_PROC_BIND=close ./a.out
22.6448 Gflops (time: 6.06934 s)

With Clang 9.0.0-2:

$ OMP_NUM_THREADS=176 OMP_PLACES="threads(176)" OMP_PROC_BIND=close ./a.out
219.556 Gflops (time: 0.625986 s)
$ OMP_NUM_THREADS=176 OMP_PLACES="threads(176)" OMP_PROC_BIND=close ./a.out
221.271 Gflops (time: 0.621134 s)
$ OMP_NUM_THREADS=88 OMP_PLACES="threads(88)" OMP_PROC_BIND=close ./a.out
138.701 Gflops (time: 0.990901 s)
$ OMP_NUM_THREADS=88 OMP_PLACES="threads(88)" OMP_PROC_BIND=spread ./a.out
135.868 Gflops (time: 1.01156 s)
$ OMP_NUM_THREADS=44 OMP_PLACES="threads(44)" OMP_PROC_BIND=spread ./a.out
160.299 Gflops (time: 0.857392 s)
$ OMP_NUM_THREADS=44 OMP_PLACES="cores(44)" OMP_PROC_BIND=spread ./a.out
381.88 Gflops (time: 0.359901 s)
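
For reference, the figures above are consistent with the usual 2·n³ floating-point operation count for an n×n dgemm divided by the wall time. The sketch below assumes that this is how the benchmark derives its Gflops value (the attached main.txt is not reproduced here); the function names `dgemm_gflops` and `matches_report` are hypothetical helpers, not part of OpenBLAS.

```cpp
#include <cassert>
#include <cmath>

// Gflop/s for a square n x n x n dgemm, assuming the conventional
// 2*n^3 operation count (one multiply and one add per inner step).
double dgemm_gflops(double n, double seconds) {
    return 2.0 * n * n * n / seconds / 1e9;
}

// True if a reported Gflops value agrees with the (n, time) pair,
// within a tolerance that absorbs rounding of the printed time.
bool matches_report(double n, double seconds, double reported,
                    double tol = 0.05) {
    return std::fabs(dgemm_gflops(n, seconds) - reported) < tol;
}

// Cross-check three of the measurements quoted in this issue.
void check_issue_numbers() {
    assert(matches_report(4096, 0.29717, 462.493));   // GCC, 44 threads
    assert(matches_report(4096, 6.1933, 22.1915));    // GCC, 176 threads
    assert(matches_report(4096, 0.625986, 219.556));  // Clang, 176 threads
}
```

Under that assumption, the GCC run with 176 threads is roughly 21 times slower than the run with 44 threads (6.1933 s against 0.29717 s), which matches the single-thread timing.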

All tests were run on an Ubuntu 18.04.1 system.

Here is the command used to compile the basic example code:
g++ -O3 -mcpu=native -ffast-math main.cpp -I./OpenBLAS -L./OpenBLAS -lopenblas -fopenmp

The OpenBLAS commit used is fairly recent: 8d2a796 (on origin/develop).

Note that this problem could also be related to possible issues in the OpenMP runtime implementation.
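
One way to separate the two possibilities might be to check how many threads the OpenMP runtime itself hands out under the same environment variables, independently of OpenBLAS. The following is a minimal sketch using only the standard `omp_get_num_threads` call; the helper name `count_team_threads` is hypothetical, and the code falls back to 1 when built without `-fopenmp`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the size of the thread team that the OpenMP runtime creates
// for a plain parallel region. Built without OpenMP support, it
// reports 1 because no parallel region is created.
int count_team_threads() {
    int team = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the implicit barrier at
        // the end of the single construct makes the write visible.
        #pragma omp single
        team = omp_get_num_threads();
    }
#endif
    return team;
}
```

If this reports 176 with `OMP_NUM_THREADS=176 OMP_PLACES="threads(176)"` while the dgemm call still uses a single thread, the runtime's thread creation is less likely to be the cause. This only checks the team size, not where the threads are placed.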

main.txt
