Description
One of my simulation codes, written in Fortran 2008, loops over a number of signal frequencies. For each frequency, a system of linear equations is solved using calls to ZGBTRF or ZGBTRS in the loop body. Since generating the system matrix and right-hand-side vector for each frequency takes some effort, I decided to parallelize the loop using OpenMP, so that only a single thread solves the system of linear equations at a given signal frequency. This used to work fine with OpenBLAS 0.2.19 and with the Intel Math Kernel Library.
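The loop structure described above can be sketched roughly as follows. This is a minimal illustration and not the actual simulation code: the program name, the problem sizes, and the tridiagonal test matrix are all hypothetical placeholders. Only the ZGBTRF/ZGBTRS calls and the OpenMP directive reflect the setup described here. Each thread works on its own private copy of the band matrix, pivot array, and right-hand side.

```fortran
program freq_sweep
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical problem sizes, chosen only for illustration
  integer, parameter :: n = 200, kl = 1, ku = 1, nfreq = 64
  ! ZGBTRF needs kl extra rows of workspace for fill-in
  integer, parameter :: ldab = 2*kl + ku + 1
  complex(dp) :: ab(ldab, n), b(n), x(n, nfreq)
  integer :: ipiv(n), info, k, j
  real(dp) :: omega
  external :: zgbtrf, zgbtrs

  ! One frequency per iteration; every solve runs on a single thread
  !$omp parallel do private(ab, b, ipiv, info, omega, j) shared(x) schedule(dynamic)
  do k = 1, nfreq
     omega = real(k, dp)                      ! placeholder frequency value

     ! Placeholder tridiagonal system in LAPACK band storage:
     ! A(i,j) is stored in ab(kl+ku+1+i-j, j)
     ab = (0.0_dp, 0.0_dp)
     do j = 1, n
        ab(kl+ku+1, j) = cmplx(4.0_dp, omega, kind=dp)       ! diagonal
        if (j > 1) ab(kl+ku,   j) = (-1.0_dp, 0.0_dp)        ! superdiagonal
        if (j < n) ab(kl+ku+2, j) = (-1.0_dp, 0.0_dp)        ! subdiagonal
     end do
     b = (1.0_dp, 0.0_dp)                     ! placeholder right-hand side

     ! Factorize, then solve in place; b holds the solution on exit
     call zgbtrf(n, n, kl, ku, ab, ldab, ipiv, info)
     if (info == 0) call zgbtrs('N', n, kl, ku, 1, ab, ldab, ipiv, b, n, info)
     x(:, k) = b
  end do
  !$omp end parallel do

  print *, 'max |x| over all frequencies: ', maxval(abs(x))
end program freq_sweep
```

Assuming the file is saved as freq_sweep.f90, something like "gfortran -fopenmp freq_sweep.f90 -lopenblas" should build it. Comparing the printed value for OMP_NUM_THREADS=1 against larger thread counts is one way to check whether the results depend on the number of threads.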
I compiled OpenBLAS 0.2.20 with gcc and gfortran versions 6.3 or 7.2 on Ubuntu 17.10 using "make TARGET=HASWELL USE_OPENMP=1 BINARY=64 FC=gfortran". With this build, I have trouble obtaining correct matrix inversion results from the calls to ZGBTRF or ZGBTRS in the parallelized loop as soon as I use the OMP_NUM_THREADS variable to run the loop with more than one thread. This looks like a thread-safety problem in OpenBLAS 0.2.20. A workaround I tested is to set the number of OpenBLAS threads to 1 just before the start of the loop using "call openblas_set_num_threads(1)" in the Fortran source code. However, this sets the number of OpenMP threads to 1, too. So, even though the workaround gives correct results, it is not satisfactory, because the loop is then executed sequentially. Judging from the description in the current README.md file, this behaviour is somewhat unexpected.
I hope that this description helps you to localise the problem. I would be more than happy to assist by testing updates with my code.
By the way, do you have any plans to write parallelised versions of the Cholesky decomposition (DPPTRF and DPPTRS)?
Keep up the good work!