norm(zeros(129,129)) causes Abort trap: 6 #14507
Comments
This is on El Capitan |
Can't reproduce on Ubuntu or Windows. |
Cannot reproduce on El Capitan, with a slightly newer version of Julia and a newer CPU:
Are you sure the original report uses El Capitan? I thought El Capitan is Darwin 15, not Darwin 13. |
Let me try updating my OS to 10.11.2, see if that fixes it.
|
See https://en.wikipedia.org/wiki/OS_X_El_Capitan. Some part of your system is still on Mavericks. As a wild guess I'd point to Xcode or the Command line tools, or left-over parts from a previous (Mavericks) Julia install. |
I’ve updated to 10.11.2 and still have the issue:
Let me try reinstalling Julia.
|
I’m using the downloadable binary, maybe the reported System information comes from the compilation machine?
|
Redownloaded the binary and the same bug is there. I’ll try making 0.4.2 from source now and see if that resolves it
|
OK I did the test on a built version of Julia. It no longer crashes, but I get the message
|
I can reproduce this on 10.10.5:
julia> versioninfo()
Julia Version 0.5.0-dev+1922
Commit af0668e* (2015-12-30 00:54 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
julia> norm(zeros(129,129))
BLAS : Bad memory unallocation! : 32 0x7fff5ad7a020
0.0 |
It seems unlikely that the OS version is to blame. An alternative explanation is a difference in the CPU architecture (Sandybridge vs. Haswell), which leads to different code paths in OpenBLAS. (In particular, the SIMD vector sizes are different.) Can you post here the output of
|
Don't the returns from |
Yep, it's the same as in versioninfo:
|
I can reproduce this on a Haswell machine if I launch Julia with OPENBLAS_CORETYPE=Sandybridge julia, so I guess there is an issue with OpenBLAS' Sandybridge kernels.
|
Cc @xianyi |
Yes, I can confirm @andreasnoack's report about Sandybridge. |
Hi all, my first time here! I can reproduce the error here with El Capitan version 10.11.2.
What I have seen so far is that the error occurs in the file /JuliaLang/julia/blob/master/base/linalg/svd.jl, on the line:
svdvals!{T<:BlasFloat}(A::StridedMatrix{T}) = any([size(A)...].==0) ? zeros(T, 0) : LAPACK.gesdd!('N', A)[2]
It calls the function LAPACK.gesdd!('N', A). The weirdest thing is that the error occurs only with the index 129,129:
The exception is raised in the file: In the function "gesdd!", there is a loop:
And the error is raised on the second pass of the loop (i=2). The error is raised in the OpenBLAS lib, and it looks like it occurs on a stack pointer subtract:
Hope this helps!! Sorry if I made mistakes with the English language :-) |
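For readers unfamiliar with why a LAPACK wrapper would loop twice: a plausible reading, assuming the gesdd! loop follows the usual LAPACK workspace-query convention, is that the first pass (lwork = -1) only asks the routine how much workspace it needs, and the second pass does the real computation. That would fit the crash showing up at i=2. The sketch below illustrates the pattern only; `mock_routine` and its size formula are hypothetical stand-ins, not the real dgesdd.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a LAPACK-style routine (NOT the real dgesdd).
 * With lwork == -1 it only reports the workspace length it wants in work[0].
 * Otherwise it uses the supplied workspace; returns 0 on success, -1 if the
 * workspace is too small. The size formula is made up for illustration. */
static int mock_routine(int m, int n, double *work, int lwork)
{
    int needed = 3 * (m < n ? m : n) + (m > n ? m : n);
    if (lwork == -1) {
        work[0] = (double)needed;      /* query pass: report size only */
        return 0;
    }
    if (lwork < needed) return -1;
    for (int k = 0; k < needed; k++)   /* compute pass: touch the workspace */
        work[k] = 0.0;
    return 0;
}

/* Two-pass driver: pass 1 queries the size, pass 2 does the work.
 * Returns the workspace length that was used, or -1 on failure. */
static int two_pass_call(int m, int n)
{
    assert(m > 0 && n > 0);
    double query[1];
    double *work = query;
    int lwork = -1;
    int status = 0;
    for (int i = 1; i <= 2; i++) {
        status = mock_routine(m, n, work, lwork);
        if (status != 0) break;
        if (i == 1) {                  /* allocate what the query asked for */
            lwork = (int)query[0];
            work = malloc((size_t)lwork * sizeof *work);
            if (work == NULL) return -1;
        }
    }
    if (work != query) free(work);
    return status == 0 ? lwork : -1;
}
```

Calling `two_pass_call(129, 129)` runs the query pass and then the compute pass; only the second pass touches the full workspace, so a memory fault there would surface at i=2.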
Nice debugging work @gomiero, thanks and welcome! Looks like we should report this as an openblas bug, maybe with a C or Fortran reproduction case. Worth testing against the develop branch of openblas. |
This seems to be fixed on the develop branch (OpenMathLib/OpenBLAS@3857581) |
Hello @tkelman and All, Thanks for the welcome and a Happy New Year to All! Sorry for the late answer, but until a few days ago, I had never programmed in FORTRAN, so it took a bit to learn it and get more information about this issue. I looked for a message in the OS X system's log and I found the following report:
I could confirm the call stack with the valgrind tool when the exception is raised:
I have found that the error 'Abort Trap: 6' is raised on a call of the function __stack_chk_fail located in the libsystem_c OS X library. You can reproduce the behavior of this function with a simple C program:
If you enter an index of 10 (above the limit of the allocated buffer), the program ends with the error 'Abort Trap: 6'. The same C program, compiled with gcc 5.1.0 on my Windows 8.1 system (64-bit - Intel(R) Core(TM) i7-5820K - Haswell-E/EP), only generates an ACCESS_VIOLATION error if you go beyond index 119 (120 or above). On the Windows system, I used the following compiler:
Regarding the stack failure, I found that the OpenBLAS library contains an if statement that decides whether the single-threaded (gemv) or the multi-threaded (gemv_thread) version of the gemv function is called. Since a fixed calculation on the size of the matrix (m * n) decides which version is called, this may explain why the error occurs with the index 129,129. In the first execution of the loop in the Julia code (i = 1), the single-threaded function (gemv) is called; in the second run (i = 2), the multi-threaded function (gemv_thread) is called.
I noticed that the __stack_chk_fail error occurs near the blas_memory_free function, so some of the threads may be reaching outside the bounds of the allocated buffer at the end of the code flow.
I have no experience debugging multithreaded code on OS X, but I'll try to learn how to do this and to compile a debug version of OpenBLAS (-g flag), because I believe it will be easier to follow the code flow with LLDB and valgrind when debug information is enabled.
It would help if someone with more experience than I have could confirm whether this analysis is correct and accurate. Please point out any mistakes in this analysis and, again, sorry for any mistakes in my English. Best Regards |
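The size-based choice between gemv and gemv_thread described above can be pictured roughly as follows. This is only a sketch of the kind of check being described: the cutoff value, the enum, and the function name are hypothetical and are not taken from the OpenBLAS source.

```c
#include <assert.h>

enum gemv_path { GEMV_SINGLE, GEMV_THREADED };

/* Hypothetical cutoff for illustration only; not OpenBLAS's real constant. */
#define HYPOTHETICAL_CUTOFF 10000L

/* Pick a code path from the matrix size (m * n) and the thread count. */
static enum gemv_path choose_gemv_path(long m, long n, int nthreads)
{
    assert(m >= 0 && n >= 0);
    if (nthreads <= 1 || m * n < HYPOTHETICAL_CUTOFF)
        return GEMV_SINGLE;        /* small problem: stay on one thread */
    return GEMV_THREADED;          /* large problem: split across threads */
}
```

With this made-up cutoff, a 64x64 matrix stays on the single-threaded path while 129x129 (16641 elements) takes the threaded one. A fixed cutoff of this kind would make a bug in the threaded path appear only above a certain matrix size.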
Your analysis makes sense so far (and your English is absolutely fine, I wouldn't worry about that), though given @andreasnoack's report it seems this may have already been fixed by openblas, just not yet included in a release version. We could try identifying which upstream commit fixed the problem, or ask if they consider the current develop state to be stable enough to tag a release that we could try upgrading to. |
I discovered this issue also, but in using eigfact() and svd() instead of norm(): for example, Is there anything one can do now to get around this problem? |
The issue is fixed upstream so when we update the OpenBLAS version then it will go away. Depending on your platform, it might be easy to upgrade your OpenBLAS. Do you compile your own Julia? |
No, I have just been downloading the precompiled versions. I was on 0.4.2; it looks like it is time to upgrade to 0.4.3, but I imagine that doesn't fix OpenBLAS or you would have mentioned it. The built-in |
Note that accelerate does not have the fast lapack functions that openblas has. |
Here's a workaround: recompile with
override USE_SYSTEM_BLAS = 1
override USE_SYSTEM_LAPACK = 0
override USE_BLAS64 = 0
override USE_QUIET = 0 |
@xianyi Do you have a planned date (roughly) for the next release of openblas? |
You could also set the OPENBLAS_CORETYPE environment variable to something slightly older. What came right before Sandy Bridge, Nehalem maybe? |
An issue with my fix is that it breaks special functions:
julia> airyai(5)
ERROR: error compiling airyai: error compiling airy: error compiling _airy: could not load library "libopenspecfun"
dlopen(libopenspecfun.dylib, 1): Library not loaded: /usr/local/Cellar/gfortran/4.8.2/gfortran/lib/libgfortran.3.dylib
Referenced from: /Users/solver/Projects/julia/usr/lib//libopenspecfun.dylib
Reason: image not found |
Never mind my last comment, I just hadn't properly reset the dependencies. Is there a reason that the OS X bundle can't be compiled with override USE_SYSTEM_BLAS = 1? Not only does it fix the bug, eigs is roughly 20x faster for small matrices. |
There seems to be an issue with GEMV in OpenBLAS for small matrices on OS X only. See JuliaLang/LinearAlgebra.jl#72. Hopefully, it can be fixed soon. We have discussed using VecLib by default on OS X before, but I think it is easier to use the same BLAS on all platforms and I don't think VecLib is uniformly faster than OpenBLAS. For GEMM, OpenBLAS is usually as fast as any other BLAS. |
Given that the issue JuliaLang/LinearAlgebra.jl#72 is over 2 years old, it doesn't look hopeful. But the bigger issue is that the current Julia bundle is essentially unusable for anything that requires svd. |
Is this closed now that we're using openblas 0.2.18? |
I think this is closed. Originally I would have julia bomb on norm(zeros(129,129)). This now has
|