OpenBLAS crashing for Julia with different threading options #2225
Comments
The problem is probably that Julia is trying to invoke (at least close to) 56 instances of OpenBLAS, and your libopenblas.so was built to preallocate only 28 (or thereabouts) memory buffers. In such a situation, even limiting OPENBLAS_NUM_THREADS is not likely to help, as the number of concurrent calls is imposed from outside. (NUM_THREADS is a compile-time parameter that takes its default from the number of cores, including hyperthreading ones, on the build host.)
We do indeed build Julia on Linux with NUM_THREADS of 16. Is there a way to make this a run-time setting rather than a compile-time one? Because of the high level of allocation for a large number of threads, we try to keep this setting on the lower side. However, people on larger machines are then unable to use all the cores without recompiling OpenBLAS.
Also, it would be nice if OpenBLAS could print an appropriate error message when it detects inconsistent compile-time and run-time settings. That way we can avoid a crash, and OpenBLAS can simply refuse to compute, or compute with fewer threads.
There is no easy way to turn this into a runtime setting, unfortunately. This is one
We can certainly provide a higher default, but then people on smaller machines don't like the extra memory allocated, which can be substantial. @staticfloat @vtjnash Should we try
The default is MAX(50, NCPU*2) regions allowed at compile time, i.e. with more than 50 real CPUs, whether threaded with OpenBLAS or called from multithreaded programs, you must set the CPU number higher. The rationale back then was to make the region-holding structure bigger to silence most bug reports of this kind, while still keeping the structure under one memory page so as not to raise TLB misses and the like.
The problem in a nutshell is that there is a limit to how many threads of OpenBLAS can be running at any given moment. If you set OpenBLAS to use N threads, and then you call a BLAS/LAPACK function from M threads, you use up N*M slots. If you run out of slots, you either get a crash or incorrect results. Please see the threading-related remarks in Makefile.rule. They are not perfect or exhaustive, but generally OK.
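To make the N*M arithmetic concrete, here is a minimal Python sketch. It assumes the simplified model described in this thread (slots used = OpenBLAS threads × calling threads, slots available = MAX(50, NCPU*2) with NCPU taken from the build configuration). The function names are hypothetical and are not part of OpenBLAS; the real allocator may behave differently in detail.

```python
def default_slots(build_ncpu: int) -> int:
    """Slots available, per the MAX(50, NCPU*2) rule quoted in this thread.

    build_ncpu is the CPU/thread count the library was compiled for
    (an assumption of this sketch, not a value read from OpenBLAS).
    """
    return max(50, build_ncpu * 2)


def slots_needed(blas_threads: int, caller_threads: int) -> int:
    """Slots consumed when M caller threads each run BLAS with N threads."""
    return blas_threads * caller_threads


def fits(blas_threads: int, caller_threads: int, build_ncpu: int) -> bool:
    """True if the simplified model predicts the calls stay within the limit."""
    return slots_needed(blas_threads, caller_threads) <= default_slots(build_ncpu)


if __name__ == "__main__":
    # Library built for 16 threads: max(50, 32) = 50 slots.
    print(default_slots(16))
    # 8 BLAS threads from 4 caller threads need 32 slots.
    print(fits(8, 4, 16))
    # 8 BLAS threads from 8 caller threads need 64 slots.
    print(fits(8, 8, 16))
```

Under this model, the only levers are lowering N (for example via OPENBLAS_NUM_THREADS), lowering M in the calling program, or rebuilding the library with a larger NUM_THREADS.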
As Andrew noted previously, the number of slots is a compile-time constant for OpenBLAS. So you have at least 50 slots. If the product
:-) I actually added that 50 in that line
Hi, I've posted this to the Julia issues but figured I would link here as well:
JuliaLang/julia#14857 (comment)
My machine has 56 cores, and the default for OPENBLAS_NUM_THREADS on my machine seems to be 8. I don't remember if I had any control over that, as I don't believe I built OpenBLAS from source.
Running with half the julia threads works for me:
But other settings segfault with the same error: `BLAS : Program is Terminated. Because you tried to allocate too many memory regions.`
Here's the test program I'm running: