Slowdown when using openblas-pthreads alongside openmp based parallel code #3187
Most likely you get too many threads running in parallel in the non-OpenMP case.
For what it's worth, the number of running threads shown by htop only increases by 4 when I run this.
Profiling with Linux perf shows that most of the time comes from …
Would be worth checking this: …

Is it really …
My main concern is that the issue happens with OpenBLAS built with pthreads, not when it is built with OpenMP! When I execute my snippet above, libopenblas is loaded because numpy is imported, and libgomp is loaded because of the prange loop. It can be confirmed with threadpoolctl:

```python
from threadpoolctl import threadpool_info
threadpool_info()
```

```
[{'filepath': '/home/jeremie/miniconda/envs/tmp/lib/libopenblasp-r0.3.12.so',
  'prefix': 'libopenblas',
  'user_api': 'blas',
  'internal_api': 'openblas',
  'version': '0.3.12',
  'num_threads': 4,
  'threading_layer': 'pthreads'},
 {'filepath': '/home/jeremie/miniconda/envs/tmp/lib/libgomp.so.1.0.0',
  'prefix': 'libgomp',
  'user_api': 'openmp',
  'internal_api': 'openmp',
  'version': None,
  'num_threads': 4}]
```

If I replace prange by range, libgomp disappears from the list, and if I don't import numpy, libopenblas disappears. It also confirms that I have only one OpenMP runtime loaded (threadpoolctl takes symlinks into account).
Thanks for the info. That's to be expected, though. When running OpenMP, you create a few threads, and then each thread calls BLAS, which in turn creates more threads. In a multi-threaded environment, it's safest to just …
It's not an oversubscription issue here. They are not nested: I first call gemm and then call a function which executes a parallel loop (with no BLAS inside).
Also, I ran …
I see the same issue: a 30x slowdown with libgomp and a 2x slowdown with libomp.
OpenBLAS assumes each thread has a CPU to itself, with all of its outermost caches. A 30x slowdown means multiple threads are stuck on the same CPU and actually spill to main memory instead of staying in cache.
Sorry, I'm not sure I understand your answer. I agree that HT is useless in HPC most of the time, but it does not seem to be the only issue here, since the program is fast when OpenBLAS is built with pthreads and I run the loop in sequential mode. The issue only appears when I also run the loop in parallel with OpenMP. To repeat: they are not nested, but run one after the other. I'm posting a pure C reproducer here; hope it makes my concerns clearer.

```c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
#include "cblas.h"

// OpenMP parallel reduction over an array, with no BLAS inside.
double f2(double *X, int n){
    double v = 0.0;
    #pragma omp parallel for reduction(+:v)
    for(int i=0; i<n; i++){
        v += X[i];
    }
    return v;
}

int main(int argc, char **argv){
    int m = 10000,
        n = 10,
        k = 100;
    double *A = (double*) malloc(m * k * sizeof(double)),
           *B = (double*) malloc(n * k * sizeof(double)),
           *C = (double*) malloc(m * n * sizeof(double));
    for(int i=0; i<m*k; i++){
        A[i] = 0.1;
    }
    for(int i=0; i<n*k; i++){
        B[i] = 1.2;
    }
    double v = 0.0;
    for(int i=0; i<1000; i++){
        // BLAS call
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                    m, n, k, 1.0, A, k, B, k, 0.0, C, n); // C <- A @ B.T
        // Followed by parallel loop
        v += f2(C, m * n);
    }
    // Print the result so the reduction is not optimized away.
    printf("v = %f\n", v);
    free(A), free(B), free(C);
    return 0;
}
```
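For reference, assuming OpenBLAS and its cblas.h are installed in standard locations, a build command along the lines of `gcc -O2 -fopenmp repro.c -lopenblas -o repro` should work, linked against whichever OpenBLAS build (pthreads or OpenMP) is under test.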
Here are the timings on a 40-core machine (20 physical + HT): …

Focusing on the pthreads + parallel loop case: …
@jeremiedbb, can you also try with …
@isuruf it reduces the time from 21s to 2.6s. It's better, but still much slower than expected.
That just forces each new unnecessary OpenBLAS pthread swarm onto a single core, slightly better at data locality than completely unbound, but still bad.
OpenBLAS pthread behaviour should not be affected by what OpenMP does, since the OpenBLAS calls are made from the main thread and unrelated work happens in the OpenMP threads.
The OMP placement policy actually sets CPU affinity for OMP threads, so all pthreads started afterwards cannot escape it. There is no hidden side-step API that nobody else uses.
@jeremiedbb maybe there is a way to introspect the CPU affinity of new pthreads started before and after the OMP loop: https://man7.org/linux/man-pages/man3/pthread_getaffinity_np.3.html
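A minimal sketch of that kind of check (an illustration assuming glibc's `pthread_getaffinity_np` and a GOMP-based toolchain, not code from this thread; it should build with something like `gcc -fopenmp affinity.c`):

```c
// Print each thread's CPU affinity mask before, inside, and after an
// OpenMP parallel region. Output lines from different threads may interleave.
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <pthread.h>
#include <omp.h>

static void print_affinity(const char *label)
{
    cpu_set_t set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return;
    printf("%s (omp thread %d): ", label, omp_get_thread_num());
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            printf("%d ", cpu);
    printf("\n");
}

int main(void)
{
    print_affinity("before OMP region");
    #pragma omp parallel
    {
        print_affinity("inside OMP region");
    }
    print_affinity("after OMP region");  // mask any later pthread would inherit
    return 0;
}
```

If the mask printed after the region is narrower than before (for instance with OMP_PROC_BIND set), OpenBLAS pthreads spawned later would be confined to it.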
Here's the output of …

Sorry, but I've no idea how to interpret this :/
Put a one-minute sleep between OMP loops and check thread binding to CPU cores. It is actually documented in the GOMP manual pages on any Linux system.
I introspected the affinities for both the OpenMP and OpenBLAS threadpools, and it turns out that no affinity constraint is set (OpenBLAS is built with NO_AFFINITY). Here's the output of …
So affinity does not seem to be the reason for the bad interaction between openblas-pthreads and OpenMP.

However, I found that when the OpenMP loop ends, the threads keep waiting for work in an active way (OMP_WAIT_POLICY), which consumes resources and prevents OpenBLAS from starting its computations right away. By default, OpenMP makes waiting threads spin for a while. Unfortunately, setting OMP_WAIT_POLICY=passive does not really improve performance on a machine with many cores, for some reason that I don't understand yet.

The best solution I've found so far, besides building OpenBLAS with OpenMP of course, is to set the number of threads for both threadpools to half the number of cores. I guess this is a won't-fix from the OpenBLAS side: OpenMP programs do not interact well with other libraries managing their own threadpool. Feel free to close the issue if you think there's nothing more to add. Still, I wonder if there is the same kind of wait policy in OpenBLAS.
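For illustration (my sketch, not code from the thread), that half-and-half mitigation can be set programmatically, assuming OpenBLAS's `openblas_set_num_threads` runtime API; `OMP_NUM_THREADS` and `OPENBLAS_NUM_THREADS` are the equivalent environment knobs:

```c
#include <omp.h>

// Runtime API exported by OpenBLAS.
extern void openblas_set_num_threads(int num_threads);

// Give half of the cores to each threadpool so the spinning OpenMP workers
// and the OpenBLAS pthreads stop competing for the same CPUs. The 50/50
// split is the workaround found in this thread, not an official
// recommendation; n_cores is assumed to be the physical core count.
void split_threadpools(int n_cores)
{
    omp_set_num_threads(n_cores / 2);       // OpenMP loop threads
    openblas_set_num_threads(n_cores / 2);  // OpenBLAS pthreads
}
```

Setting OMP_WAIT_POLICY=passive in the environment targets the spinning itself, but as noted above it did not fully recover performance here.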
@isuruf I think this issue is a good reason to always try to use an OpenBLAS built with OpenMP for the scikit-learn builds on conda-forge (I noticed it was not always the case).
@jeremiedbb OpenBLAS does have a similar wait policy for its threads, governed by the value of THREAD_TIMEOUT at build time (or the environment variable OPENBLAS_THREAD_TIMEOUT at runtime), which defines the number of clock cycles to wait as …
Closing after copying the relevant information to the FAQ in the wiki.
Hi,

I have code which mixes BLAS calls (gemm) and OpenMP-based parallel loops (they are not nested). When OpenBLAS is built using OpenMP everything is fine, but when OpenBLAS is built with pthreads there's a huge slowdown. Below is a reproducible example (sorry, it's from Python/Cython): …

On my laptop (2 physical cores), when I use a sequential loop in (*), it runs in 0.26s. When I use a parallel loop, it runs in 2.6s (10x slower). This is with OpenBLAS 0.3.12 built with pthreads. This conda env reproduces it:

```
conda create -n tmp -c conda-forge python numpy cython ipython
```

However, if I use OpenBLAS built with OpenMP, it runs in 0.26s with and without prange. This is with OpenBLAS 0.3.9 built with OpenMP. This conda env reproduces it:

```
conda create -n tmp -c conda-forge python numpy cython ipython blas[build=openblas] libopenblas=0.3.9
```