-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Setting CPP_THREAD_SAFETY_TEST=1 causes a crash on 0.3.19 #3503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
No relevant commits between 0.3.19 and current |
I notice that the distro-provided 0.3.19 build uses -lgomp and -lpthread to build dgemv_tester but my git build does not use those two flags. Could they cause a crash of dgemv_tester on exit? If so, how to suppress them from showing up automatically? (edited for clarity) |
dgemv_tester uses OpenMP which relies on pthreads, so no way it could build without them. |
I found a way to make the distro supplied 0.3.19 work. It turned out the distro install script was automatically adding the following options to OpenBlas:
I added CPP_THREAD_SAFETY_TEST=1 to the list and started deleting the others one at a time. I could isolate the cause of the crash to USE_TLS=1. I removed that option and left the others in, along with CPP_THREAD_SAFETY_TEST=1, and there is no crash with dgemv_test and no crash with Octave. Does this make sense to you? If yes, I will let the package maintainers know about this thread so they can update their PKGBUILD script as required. |
Actually same workaround is mentioned in octave thread linked in your initial report. |
Thanks, reproduced now with USE_TLS=1 USE_OPENMP=1 - looks to be some kind of race on shutdown where the master thread gets woken up with the (mis)information that there are zero threads available to partition the workload over. Not sure why this happens with USE_TLS, but this threading mode is certainly much less well tested. |
Probably two bugs - one in keeping track of memory buffers to free, and another a race in getting/setting the maximum thread count. I think I have a fix for the race, but I am still trying to trace the TLS buffer logic. |
Hmm. This actually used to work in 0.3.18... looks like my #3437 broke it but I do not yet understand why. |
Not sure if this is related, but if you look in drivers/others/memory.c, this shows up near line 86:
and then it continues a bit, then line 218 says
It looks like that second ifndef won't get executed at all because it is masked by the first. Not sure if that makes any difference in this case. To be clear, does SMP refer to literal multiple processors or does it also apply for a single processor with multiple cores and threads? I've seen some authors draw a distinction between SMP and SMT, but from the code in memory.c it looks like they are rolledi nto one? |
No distinction between SMP and SMT here - and if everything happens in a single thread anyway, the code is considerably simplified. The "COMPILE_TLS" exists because memory.c is actually two versions of the original memory.c combined into one file - one using thread-local storage, which was intended to replace the original code, and the other half based on K.Goto's original housekeeping of malloc'd buffers. What I did not get (before dinner) is why calling omp_get_num_places() would throw the entire TLS code in disarray - but it appears that call can actually return zero for the number of available cores/threads when OMP_PROC_BIND is not set. Ouch |
With OpenBlas 0.3.19 on setting CPP_THREAD_SAFETY_TEST=1 there is a segfault when exiting from dgemv_tester, preventing the library from being built:
With the git source (commit ee823b6), there is no such crash in the threading tests:
0.3.19 was installed from the distro here: https://aur.archlinux.org/packages/openblas-lapack/
All build settings on both 0.3.19 and the git source were the same. Everything was left at default values except for setting CPP_THREAD_SAFETY_TEST=1.
This behavior causes a crash when exiting from GNU Octave as reported here: https://savannah.gnu.org/bugs/?61742
Hardware: AMD Ryzen 5 3600.
OS: Manjaro Linux.
The text was updated successfully, but these errors were encountered: