Setting CPP_THREAD_SAFETY_TEST=1 causes a crash on 0.3.19 #3503

Closed
arungiridhar opened this issue Dec 30, 2021 · 10 comments · Fixed by #3504
Comments

@arungiridhar

With OpenBLAS 0.3.19, setting CPP_THREAD_SAFETY_TEST=1 leads to a segfault on exit from dgemv_tester, which prevents the library from being built:

make -j 12 -C cpp_thread_test all
make[1]: Entering directory '/var/tmp/pamac-build-_/openblas-lapack/src/OpenBLAS-0.3.19/cpp_thread_test'
g++ -O2 -Wall -Wextra -Wshadow -fopenmp -std=c++11 dgemv_thread_safety.cpp ../libopenblas_zenp-r0.3.19.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../..  -lgfortran -lm -lgomp -lquadmath -lm -lpthread -lc   -o dgemv_tester
./dgemv_tester
*----------------------------*
| DGEMV thread safety tester |
*----------------------------*
Size of random matrices and vectors(N=M): 1024
Number of concurrent calls into OpenBLAS : 12
Number of testing rounds : 16
This test will need 96.1875 MiB of RAM

Initializing random number generator...done
Preparing to test CBLAS DGEMV thread safety
Allocating matrices...done
Allocating vectors...done
Filling matrices with random numbers...done
Filling vectors with random numbers...done
Testing CBLAS DGEMV thread safety
DGEMV round #0
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

DGEMV round #1
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

... (trimmed for brevity)

DGEMV round #15
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

CBLAS DGEMV thread safety test PASSED!

make[1]: *** [Makefile:8: dgemv_tester] Segmentation fault (core dumped)
make[1]: *** Deleting file 'dgemv_tester'
make[1]: Leaving directory '/var/tmp/pamac-build-_/openblas-lapack/src/OpenBLAS-0.3.19/cpp_thread_test'
make: *** [Makefile:149: tests] Error 2
==> ERROR: A failure occurred in check().
    Aborting...

With the git source (commit ee823b6), there is no such crash in the threading tests:

make -j 12 -C cpp_thread_test all
make[1]: warning: -j12 forced in submake: resetting jobserver mode.
make[1]: Entering directory '/home/_/src/OpenBLAS/cpp_thread_test'
g++ -O2 -Wall -Wextra -Wshadow -fopenmp -std=c++11 dgemv_thread_safety.cpp ../libopenblas_zenp-r0.3.19.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../..  -lgfortran -lm -lquadmath -lm -lc   -o dgemv_tester
./dgemv_tester
*----------------------------*
| DGEMV thread safety tester |
*----------------------------*
Size of random matrices and vectors(N=M): 1024
Number of concurrent calls into OpenBLAS : 12
Number of testing rounds : 16
This test will need 96.1875 MiB of RAM

Initializing random number generator...done
Preparing to test CBLAS DGEMV thread safety
Allocating matrices...done
Allocating vectors...done
Filling matrices with random numbers...done
Filling vectors with random numbers...done
Testing CBLAS DGEMV thread safety
DGEMV round #0
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

DGEMV round #1
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

... (trimmed for brevity)

DGEMV round #15
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

CBLAS DGEMV thread safety test PASSED!

g++ -O2 -Wall -Wextra -Wshadow -fopenmp -std=c++11 dgemm_thread_safety.cpp ../libopenblas_zenp-r0.3.19.dev.a -lm -lpthread -lgfortran -lm -lpthread -lgfortran -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../..  -lgfortran -lm -lquadmath -lm -lc   -o dgemm_tester
./dgemm_tester
*----------------------------*
| DGEMM thread safety tester |
*----------------------------*
Size of random matrices(N=M=K): 1024
Number of concurrent calls into OpenBLAS : 12
Number of testing rounds : 16
This test will need 288 MiB of RAM

Initializing random number generator...done
Preparing to test CBLAS DGEMM thread safety
Allocating matrices...done
Filling matrices with random numbers...done
Testing CBLAS DGEMM thread safety
DGEMM round #0
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

DGEMM round #1
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

... (trimmed for brevity)

DGEMM round #15
Launching 12 threads simultaneously using OpenMP...done
Waiting for threads to finish...done
Comparing results from different threads...OK!

CBLAS DGEMM thread safety test PASSED!

make[1]: Leaving directory '/home/_/src/OpenBLAS/cpp_thread_test'
make[1]: warning: -j12 forced in submake: resetting jobserver mode.
make[1]: Entering directory '/home/_/src/OpenBLAS/exports'
perl ./gensymbol linktest  x86_64 _ 0 0 0 0 0 0 "" "" 1 0 1 1 1 1 > linktest.c
cc -O2 -DSMALL_MATRIX_OPT -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.19.dev\" -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I..  -shared -o ../libopenblas_zenp-r0.3.19.dev.so \
-Wl,--whole-archive ../libopenblas_zenp-r0.3.19.dev.a -Wl,--no-whole-archive \
-Wl,-soname,libopenblas.so.0 -lm -lpthread -lgfortran -lm -lpthread -lgfortran
cc -O2 -DSMALL_MATRIX_OPT -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=12 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.19.dev\" -msse3 -mssse3 -msse4.1 -mavx -mavx2 -mavx2 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME= -DASMFNAME=_ -DNAME=_ -DCNAME= -DCHAR_NAME=\"_\" -DCHAR_CNAME=\"\" -DNO_AFFINITY -I..  -w -o linktest linktest.c ../libopenblas_zenp-r0.3.19.dev.so -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../..  -lgfortran -lm -lquadmath -lm -lc   && echo OK.
OK.
rm -f linktest
make[1]: Leaving directory '/home/_/src/OpenBLAS/exports'

 OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)

  OS               ... Linux             
  Architecture     ... x86_64               
  BINARY           ... 64bit                 
  C compiler       ... GCC  (cmd & version : cc (GCC) 11.1.0)
  Fortran compiler ... GFORTRAN  (cmd & version : GNU Fortran (GCC) 11.1.0)
  Library Name     ... libopenblas_zenp-r0.3.19.dev.a (Multi-threading; Max num-threads is 12)

0.3.19 was installed from the distro here: https://aur.archlinux.org/packages/openblas-lapack/

All build settings on both 0.3.19 and the git source were the same. Everything was left at default values except for setting CPP_THREAD_SAFETY_TEST=1.

This behavior causes a crash when exiting from GNU Octave as reported here: https://savannah.gnu.org/bugs/?61742

Hardware: AMD Ryzen 5 3600.
OS: Manjaro Linux.

@martin-frbg
Collaborator

There are no relevant commits between 0.3.19 and the current develop branch, so the crash would appear to be nondeterministic. (There were very few thread-related changes in 0.3.19 anyway, so this is probably an older bug and/or something brought up by gcc 11.1.)

@arungiridhar
Author

arungiridhar commented Dec 30, 2021

I notice that the distro-provided 0.3.19 build uses -lgomp and -lpthread to build dgemv_tester, but my git build does not use those two flags. Could they cause dgemv_tester to crash on exit? If so, how can I stop them from being added automatically?

(edited for clarity)

@martin-frbg
Collaborator

dgemv_tester uses OpenMP which relies on pthreads, so no way it could build without them.
(Problem not immediately reproducible on my 12-core Zen2)

@arungiridhar
Author

I found a way to make the distro-supplied 0.3.19 work. It turned out that the distro install script was automatically adding the following options to the OpenBLAS build:

_config="FC=gfortran USE_OPENMP=1 USE_THREAD=1 \
  USE_TLS=1 \
  NO_LAPACK=0 BUILD_LAPACK_DEPRECATED=1 \
  MAJOR_VERSION=${_lapackver:0:1} NO_STATIC=1"

I added CPP_THREAD_SAFETY_TEST=1 to the list and started deleting the others one at a time. I was able to isolate the cause of the crash to USE_TLS=1. I removed that option and left the others in, along with CPP_THREAD_SAFETY_TEST=1, and there is no crash with dgemv_tester and no crash with Octave.

Does this make sense to you? If yes, I will let the package maintainers know about this thread so they can update their PKGBUILD script as required.

@brada4
Contributor

brada4 commented Dec 30, 2021

Actually, the same workaround is mentioned in the Octave thread linked in your initial report.
PS: it was posted there by you; sorry for overlooking that.

@martin-frbg
Collaborator

Thanks, reproduced now with USE_TLS=1 USE_OPENMP=1 - looks to be some kind of race on shutdown where the master thread gets woken up with the (mis)information that there are zero threads available to partition the workload over. Not sure why this happens with USE_TLS, but this threading mode is certainly much less well tested.

@martin-frbg
Collaborator

Probably two bugs - one in keeping track of memory buffers to free, and another a race in getting/setting the maximum thread count. I think I have a fix for the race, but I am still trying to trace the TLS buffer logic.

@martin-frbg
Collaborator

Hmm. This actually used to work in 0.3.18... looks like my #3437 broke it but I do not yet understand why.

@arungiridhar
Author

Not sure if this is related, but if you look in drivers/others/memory.c, this shows up near line 86:

#if defined(USE_TLS) && defined(SMP)
#define COMPILE_TLS

and then it continues a bit, then line 218 says

#ifndef SMP
#define blas_cpu_number 1
#define blas_num_threads 1

It looks like that second ifndef won't get executed at all because it is masked by the first. Not sure if that makes any difference in this case.

To be clear, does SMP refer to literal multiple processors, or does it also apply to a single processor with multiple cores and threads? I've seen some authors draw a distinction between SMP and SMT, but from the code in memory.c it looks like they are rolled into one?

@martin-frbg
Collaborator

martin-frbg commented Dec 31, 2021

No distinction between SMP and SMT here - and if everything happens in a single thread anyway, the code is considerably simplified. The "COMPILE_TLS" exists because memory.c is actually two versions of the original memory.c combined into one file - one using thread-local storage, which was intended to replace the original code, and the other half based on K.Goto's original housekeeping of malloc'd buffers. What I did not get (before dinner) is why calling omp_get_num_places() would throw the entire TLS code into disarray - but it appears that call can actually return zero for the number of available cores/threads when OMP_PROC_BIND is not set. Ouch ☹️
