BLAS : Program is Terminated. Because you tried to allocate too many memory regions. #1882
Well, you know, he may be right. But it would certainly help if we knew which version of OpenBLAS you are currently using, and a bit more about what erkale does. I guess erkale itself is multithreaded and thus calls into OpenBLAS from multiple threads, which causes problems that I am currently trying to fix. |
openblas-0.2.20_3,1 on FreeBSD |
Could you try with current |
Alternatively, rebuild the package with OpenMP; that build should be more moderate inside an OpenMP program and not try to spawn n^2 threads. |
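As a rough illustration of the n^2 concern, here is a small arithmetic sketch in C. It assumes the worst case, where every application thread makes a BLAS call that starts its own full team of worker threads; the function name and both parameters are made up for this example and are not read from OpenBLAS.

```c
#include <assert.h>
#include <stdio.h>

/* Worst-case number of threads when each application thread calls a BLAS
 * routine that starts its own team of workers. Both arguments are
 * illustrative inputs, not values queried from any library. */
static long nested_thread_count(long app_threads, long blas_threads_per_call) {
    assert(app_threads >= 0 && blas_threads_per_call >= 0);
    return app_threads * blas_threads_per_call;
}

/* Print the worst case for a machine where both levels default to the
 * number of logical CPUs. */
static void print_oversubscription(long logical_cpus) {
    long total = nested_thread_count(logical_cpus, logical_cpus);
    printf("%ld logical CPUs -> up to %ld threads\n", logical_cpus, total);
}
```

Calling print_oversubscription(8) reports up to 64 threads, while limiting the inner level to one thread per call keeps the total at 8.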
The message begs the question "How many regions were allocated?" Also, with ever-increasing computing power, what does "too many" really mean? Why is this limitation imposed? |
It is a fixed table of regions.... 1-2 are consumed per parallel thread |
Why can't you reallocate it dynamically when exceeded instead of fixing it once and for good during build? |
This limit is directly related to the NUM_THREADS parameter set at build time (which defaults to the number of cores detected on the build host). There has been a recent attempt to rewrite the memory allocation logic (which dates back to K. Goto's original libGotoBLAS of 10+ years ago) using thread-local storage. Unfortunately, the reimplementation ran into a number of unexpected corner cases, and it is unclear whether it is safe to use as the default in its current state. See option USE_TLS in current |
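To make the "fixed table" behaviour concrete, here is a minimal sketch in C. It is not OpenBLAS's actual allocator: MAX_SLOTS, claim_slot and release_slot are hypothetical names invented for this example, and the real limit is derived from NUM_THREADS at build time. The sketch is single-threaded and has no locking; it only shows what happens when more callers hold a slot at once than the table was built for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the build-time limit (not the real macro). */
#define MAX_SLOTS 4

/* One flag per slot; zero-initialized, so all slots start out free. */
static int slot_in_use[MAX_SLOTS];

/* Claim the first free slot. Returns its index, or -1 when every slot is
 * already taken, i.e. the "too many regions" situation. */
static int claim_slot(void) {
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Give a previously claimed slot back to the table. */
static void release_slot(int i) {
    assert(i >= 0 && i < MAX_SLOTS);
    slot_in_use[i] = 0;
}
```

With MAX_SLOTS set to 4, a fifth claim_slot() call made while the first four slots are still held returns -1. A table that cannot grow has to fail at that point, which is where the library prints its message and terminates.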
Indeed OpenBLAS FreeBSD port would be built with 16 malloc slots only. |
I'll change the limit in the port for now. But no matter what value the limit is set to, this problem will come back, because the number of threads shouldn't, even in theory, be tied to the number of CPUs in general (threads can be half-idle, for example). This needs to be solved. |
…k other ports NUM_THREADS= sets the build-time limit on the number of threads that apps can use with OpenBlas. This unbreaks at least science/erkale's tests, and possibly some other software. The upstream acknowledges the problem, recommended this solution for the port, and are working on the permanent solution: OpenMathLib/OpenBLAS#1882 Approved by: portmgr blanket (unbreak) git-svn-id: svn+ssh://svn.freebsd.org/ports/head@485641 35697150-7ecd-e111-bb59-0022644237b5
@yurivict You are wrong about the number of threads. The constraining resource here is CPU cache: OpenBLAS (or MKL, for that matter) operates on a limited amount of data that fits in the L1d/L2/L3/L4 caches. Obviously, if two such threads meet on the same core, they fall back to 10-20x slower accesses to main memory and performance drops by 10x. |
You assume that all threads are CPU-intense. But some threads might be idle. Some might work on separate data sets while using only 10% of CPU each. Some people create threads per connection, etc. All sorts of use models can take place. |
That 10% would break the compute kernels' assumption that the cache is for their exclusive use, and two compute kernels on the same core will slow down N times, rather than just by half as normal compiler-emitted code would. |
Change of |
@yurivict do they (tests) pass with OPENBLAS_NUM_THREADS=1 and/or with OpenMP OpenBLAS? |
They still fail. |
Summary: The change to
What helped: change to liblapack.so/libblas.so/libcblas.so. Tests pass with this implementation.
Testcase: The Erkale quantum chemistry project (https://github.com/susilehtola/erkale) built with |
You mean openblas.so fails completely? Or did you have to direct BLAS, cblas and lapack all to OpenBLAS at once? |
It triggers exception errors (see above), and the processes crash. |
The log? |
Is it the log from:
HOW MANY CPU CORES ARE THERE IN THE BUILD MACHINE? |
4 cores, 8 virtual CPUs.
I will try to get something out of Linux and erkale.
If all tests run continuously in the same program: some uninitialized values have been fixed, so it is probably worth waiting for 0.3.4 instead of rushing to 0.3.3.
Which project is to blame for allocating threads? So far I see just a slight misconfiguration, and probably an old version. |
Parts of the build process are serialized to avoid races - GNU make is not very sophisticated in this regard. |
The BLAS part has no inter-dependencies, so you can get 100+ cores utilized for a few seconds for each CPU generation, but with serialized parts (ar) in between. |
The patch has been committed to the FreeBSD port ( |
… safety issues This patch is recommended by the upstream: OpenMathLib/OpenBLAS#1882 (comment) Approved by: portmgr (unbreak) git-svn-id: svn+ssh://svn.freebsd.org/ports/head@485906 35697150-7ecd-e111-bb59-0022644237b5
Both options should go to non-threaded version too |
Ok, BLAS imports (probably some of L1 is masked by gsl cblas macros)
All BLAS have thread limits; it is a performance issue for particular functions with small inputs, not a crasher or anything.
There are some dangerous LAPACK functions getting imported, mandating -frecursive
Let me summarize: I think for now it is best to import pthread version in all circumstances in serial programs and single-threaded, safeguarded with -frecursive in threaded ones, and keep the GOMP version in the basement for programs that do not crash when built with the GCC world (as a disabled-by-default option, for example).
The only dangers are performance-related, i.e. an OMP program imports the threaded version and gets N^2 threads, which can be brought under control with variables, or a single-threaded program imports the single-threaded version, which is still faster than netlib but leaves a lot of room for improvement.
Improvements gained towards 0.3.4:
I see now that |
@martin-frbg No more need for -frecursive, it is now in the right place. There will be AVX-512 (Skylake-X) support; both FreeBSD clang and gcc can compile it, so a new option for that(?). In principle it builds with clang+flang too (once the latter works), if you want to experiment on the other side of the OMP world, but that is not required at all. I think we cannot improve anything here, but feel free to report if you stumble on anything similarly weird. |
Could you create a separate issue for that please (I assume with "early threading" you mean inefficient multithreading for tiny problem sizes (and not something leading to catastrophic failure), but I am guaranteed to lose my mind if I try to look into that today). |
@yurivict (not related to current issue at all) is it possible to get to FreeBSD something like linux pax-utils, i.e. lddtree to find 2 distinct OMP imports and symtree to quickly list imported functions per library? |
@brada4 Is it this package: https://www.freshports.org/sysutils/pax-utils ? |
FYI You can use Repology website to search for packages by name in different systems: https://repology.org/ |
Installed, thanks :-) |
…r non-openmp too Previously I added these options only to the openmp build which isn't a default. This change is requested by the upstream. Ref. OpenMathLib/OpenBLAS#1882 Approved by: portmgr (unbreak) git-svn-id: svn+ssh://svn.freebsd.org/ports/head@485947 35697150-7ecd-e111-bb59-0022644237b5
Closing as the crucial change, adding |
@brada4 Is there any 'harm' in increasing the number of memory buffers to a large number? From the
    local_memory_table = (struct alloc_t **)malloc(sizeof(struct alloc_t *) * NUM_BUFFERS);
    memset(local_memory_table, 0, sizeof(struct alloc_t *) * NUM_BUFFERS);
It seems that it's just
From the discussions above, I can't quite relate the |
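For a sense of scale, the allocation quoted above reserves one pointer per buffer. A minimal sketch in C of that arithmetic follows; pointer_table_bytes is a hypothetical helper written for this example, and struct alloc_t is only forward-declared here because the table stores pointers to it. The sketch covers the table itself and says nothing about the buffers the entries may later point to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct alloc_t; /* opaque in this sketch; only pointers to it are stored */

/* Bytes needed for a table of num_buffers pointers, mirroring the size
 * expression in the quoted malloc call. The assert guards against
 * overflow of the multiplication. */
static size_t pointer_table_bytes(size_t num_buffers) {
    assert(num_buffers <= SIZE_MAX / sizeof(struct alloc_t *));
    return sizeof(struct alloc_t *) * num_buffers;
}
```

On a platform with 8-byte pointers, pointer_table_bytes(1024) comes to 8192 bytes, so the table itself stays small even for a large NUM_BUFFERS.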
Very bad to steal closed unrelated thread... |
I used openblas for blas/lapack functions in the erkale project, and it fails. erkale's author says that openblas is broken, see susilehtola/erkale#29 (comment)