Program is Terminated. Because you tried to allocate too many memory regions. #539
Comments
Did you build OpenBLAS with -frecursive ? dspmv is part of Lapack and, with gcc, Lapack is thread safe only if built with -frecursive or -fopenmp. See https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.rule#L152. |
No we didn't. On Sun, Apr 12, 2015 at 12:25 PM, Jerome Robert [email protected]
|
Yes.
|
That doesn't seem to work right, because then the options Dan On Sun, Apr 12, 2015 at 2:17 PM, Jerome Robert [email protected]
|
In OpenBLAS, we manage a pool of memory buffers and determine the number of buffers as follows. Could you build OpenBLAS with larger |
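For readers unfamiliar with the idea, here is a minimal sketch of a fixed-size buffer pool and of what "too many memory regions" means in that model. It is a toy illustration in Python, not OpenBLAS's actual code: the `BufferPool` class, its method names, and the pool size of 4 are all hypothetical.

```python
import threading


class BufferPool:
    """Toy fixed-size pool (hypothetical; not OpenBLAS's real implementation)."""

    def __init__(self, num_buffers):
        self._lock = threading.Lock()
        # One flag per slot: True while a caller holds that buffer.
        self._in_use = [False] * num_buffers

    def acquire(self):
        """Return the index of a free slot, or raise if every slot is taken."""
        with self._lock:
            for slot, busy in enumerate(self._in_use):
                if not busy:
                    self._in_use[slot] = True
                    return slot
        # Models the condition behind the error message in this issue.
        raise RuntimeError("tried to allocate too many memory regions")

    def release(self, slot):
        """Mark a slot as free again."""
        with self._lock:
            self._in_use[slot] = False


pool = BufferPool(num_buffers=4)           # assumed size, for illustration only
held = [pool.acquire() for _ in range(4)]  # four callers each hold one buffer

try:
    pool.acquire()                         # a fifth concurrent caller
    exhausted = False
except RuntimeError:
    exhausted = True

pool.release(held[0])                      # once a caller finishes...
reused = pool.acquire()                    # ...its slot can be handed out again
```

The point of the sketch is only that the pool has a hard upper bound: once every slot is held at the same time, the next request fails instead of waiting.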
OK, thanks. On Mon, Apr 13, 2015 at 10:55 AM, Zhang Xianyi [email protected]
|
Add an item in FAQ https://github.com/xianyi/OpenBLAS/wiki/faq#allocmorebuffers |
Could we maybe just have |
Should add that I see the same behavior that @danpovey notes with |
Hi, I am facing the same problem despite the fact that these are my make flags for OpenBLAS: make TARGET=HASWELL F_COMPILER=GFORTRAN SHARED=1 DYNAMIC_THREADS=1 USE_OPENMP=1 NUM_THREADS=128 %{?_smp_mflags} I am using this spec file: The test that I'm running is: I have tried USE_OPENMP=0 USE_THREAD=1 NUM_THREADS=128 with the same result. Thanks a lot |
What kind of hardware are you trying to run this on ? You could try changing the calculation of NUM_BUFFERS in common.h (currently NUM_THREADS * 2) to see if it is as simple as that or |
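To make the sizing arithmetic concrete, here is a small sketch that takes the `NUM_THREADS * 2` rule mentioned above at face value. The helper names and the assumption that each concurrent caller holds exactly one buffer are illustrative guesses, not something taken from the OpenBLAS source.

```python
def num_buffers(num_threads, factor=2):
    """Pool size under the NUM_THREADS * 2 rule quoted in this thread."""
    return num_threads * factor


def pool_is_sufficient(num_threads, concurrent_callers, buffers_per_call=1):
    """True if every concurrent caller can hold its buffers at the same time.

    buffers_per_call=1 is an assumption made for illustration only.
    """
    return concurrent_callers * buffers_per_call <= num_buffers(num_threads)


small_pool = num_buffers(2)                         # build with NUM_THREADS=2
large_pool = num_buffers(128)                       # build with NUM_THREADS=128

fits_with_4_callers = pool_is_sufficient(2, 4)      # 4 callers, 4 buffers
fits_with_8_callers = pool_is_sufficient(2, 8)      # 8 callers, 4 buffers
fits_after_rebuild = pool_is_sufficient(128, 8)     # 8 callers, 256 buffers
```

Under these assumptions, a library built for 2 threads has room for 4 simultaneous buffer holders, so an application that calls into BLAS from more threads than that at once could exhaust the pool even though each individual call is small.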
@VictorRodriguez also which version are you using - the current "develop" branch from git, or some older release ? Some thread safety fixes for the traversal of the buffers list went in around the new year. |
Also see if adding "-frecursive" to the Fortran compiler options (by uncommenting the FCOMMON_OPT line in Makefile.rule) helps, as suggested above. |
@martin-frbg thanks for the feedback. I'm using version 0.2.19. Let me try what you suggest, thanks a lot |
FYI @VictorRodriguez had better luck with |
With the current code, I think a user-defined FCOMMON_OPT should only gain additional settings in Makefile.system. (In the past, you might have run into a situation where the default "-O2" optimization level was applied to the Fortran part only if FCOMMON_OPT was previously undefined, but this was fixed in early January as well, i.e. post 0.2.19.) |
What about |
I think these are only appended to whatever FCOMMON_OPT is seen by Makefile.system ( |
Hi team, thanks a lot for the help; however, I haven't been able to fix this issue. I have made these changes to my spec file: diff --git a/openblas.spec b/openblas.spec sed -i -e "s/-O2/-O3/g" Makefile* Then I reran this code from scipy (I am thinking of rebuilding scipy): https://github.com/scipy/scipy/blob/master/scipy/interpolate/tests/test_interpnd.py Is there a test for this case in the OpenBLAS source code (in C, for example)? Thanks a lot for the help |
I am not aware of any specific testcase in the source, my low tech approach would be
|
Part of the reason I started using |
It turns out that the issue @VictorRodriguez was having was due to someone cherry-picking 84b8170 into the source tree. This commit was effectively reverted in dd6212e - sorry for the noise. |
Hi,
I would appreciate it if you could explain why OpenBLAS crashes if you build it with OPENBLAS_NUM_THREADS=2 and then try to call it from multi-threaded code. Is this by design? Or maybe due to the limitations of the machine we're running it on?
Dan
/home/ubuntu/workspace/kaldi/src/online2bin/online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false --config=exp/nnet1/conf/online_nnet2_decoding.conf --max-active=3000 --beam=8.0 --lattice-beam=4.0 --acoustic-scale=0.07 --word-symbol-table=exp/nnet1/graph/words.txt exp/nnet1/final.mdl exp/nnet1/graph/HCLG.fst ark:data/test/split1/1/spk2utt 'ark,s,cs:wav-copy scp,p:data/test/split1/1/wav.scp ark:- |' ark:/dev/null
LOG (online2-wav-nnet2-latgen-faster:ComputeDerivedVars():ivector-extractor.cc:180) Computing derived variables for iVector extractor
[New Thread 0x7fffedfa1700 (LWP 26871)]
[New Thread 0x7fffed7a0700 (LWP 26872)]
[New Thread 0x7fffecf9f700 (LWP 26873)]
[Thread 0x7fffedfa1700 (LWP 26871) exited]
[New Thread 0x7fffdffff700 (LWP 26874)]
[Thread 0x7fffed7a0700 (LWP 26872) exited]
[New Thread 0x7fffdf7fe700 (LWP 26875)]
[Thread 0x7fffecf9f700 (LWP 26873) exited]
[New Thread 0x7fffed7a0700 (LWP 26876)]
[New Thread 0x7fffedfa1700 (LWP 26877)]
[New Thread 0x7fffecf9f700 (LWP 26878)]
[New Thread 0x7fffdeffd700 (LWP 26879)]
[Thread 0x7fffdffff700 (LWP 26874) exited]
[Thread 0x7fffdf7fe700 (LWP 26875) exited]
[New Thread 0x7fffde7fc700 (LWP 26880)]
[New Thread 0x7fffdffff700 (LWP 26881)]
[Thread 0x7fffed7a0700 (LWP 26876) exited]
[New Thread 0x7fffddffb700 (LWP 26882)]
[New Thread 0x7fffdf7fe700 (LWP 26883)]
[Thread 0x7fffedfa1700 (LWP 26877) exited]
[New Thread 0x7fffdd7fa700 (LWP 26884)]
[New Thread 0x7fffed7a0700 (LWP 26885)]
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed7a0700 (LWP 26885)]
0x00007ffff259f024 in dcopy_k () from /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas.so.0
(gdb) bt
#0 0x00007ffff259f024 in dcopy_k () from /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas.so.0
#1 0x00007ffff238cd1e in dspmv_U () from /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas.so.0
#2 0x00007ffff2355653 in cblas_dspmv () from /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas.so.0
#3 0x00007ffff53e15f9 in kaldi::cblas_Xspmv (dim=40, alpha=1, Mdata=0x16fb440, ydata=0x7ae370, ystride=100, beta=0, xdata=0x7fffc400a6a0, xstride=1)