segmentation fault in dgemm_otcopy #1694
#1137 could be related (misaligned input); in that case, changing the two instances of …
HASWELL was only added to OpenBLAS in 0.2.15 (https://packages.ubuntu.com/source/xenial/openblas). What the backtrace says is that liblapack.so.3 is redirected to the apt-provided OpenBLAS (0.2.18), which then calls an internal function from your own OpenBLAS build (lines 4 and 5 in your backtrace). Can you provide results from a consistent build, e.g. by adapting the instructions to your build of a known OpenBLAS version, or by reverting completely to the apt package?
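One way to see which builds a program actually picks up is to ask the dynamic loader and the alternatives system directly. This is a sketch assuming a Debian/Ubuntu system; the function name is hypothetical, and `/bin/ls` is only a placeholder default for the executable that crashed. Alternative names differ between Ubuntu releases, so the full selection list is filtered rather than naming `libblas.so.3` explicitly.

```sh
#!/bin/sh
# Sketch: show which BLAS/LAPACK libraries a program resolves at run time,
# and where the Debian/Ubuntu alternatives for them currently point.
# The default program path is only a placeholder.
show_blas_resolution() {
    prog=${1:-/bin/ls}
    echo "== ldd: $prog =="
    # Lines of the dynamic loader's resolution that mention blas/lapack.
    ldd "$prog" 2>/dev/null | grep -Ei 'blas|lapack' || echo "(no BLAS/LAPACK entries)"
    echo "== alternatives =="
    if command -v update-alternatives >/dev/null 2>&1; then
        # Alternative names vary by release, so filter the full list.
        update-alternatives --get-selections | grep -Ei 'blas|lapack' || echo "(none registered)"
    else
        echo "(update-alternatives not available)"
    fi
}

show_blas_resolution "$@"
```

If the `ldd` lines and the alternatives disagree about which OpenBLAS is in use, that is the kind of version mix described above.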
Fairly certain that 0.2.20 was meant here, as that was the last update that was linked on the openblas.net webpage before xianyi became mostly unavailable. (Martin Kroeker, Sat, Jul 21, 2018 at 11:52 AM)
Yep, my mistake: 0.2.20.
It is a problem of mixed versions.
Any success stories? (Andrew, Wed, Jul 25, 2018 at 1:04 AM)
Nothing yet. I've gone to the repo versions of both OpenBLAS and
SuiteSparse for Ubuntu 16.04 and 18.04, with the same segfault. Under
Valgrind it ran out of memory on a 256 GB machine. I'm now looking at
martin-frbg's idea. I'll let you know. Thanks for the suggestions!
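Can you show update-alternatives --list, to ensure consistent BLAS and LAPACK are used? The debug package (apt install libopenblas0-dbg) would help to decode code line numbers and function parameters. (Andrew, Wed, Jul 25, 2018 at 7:23 AM)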
I'll install the dbg version. See below. Thanks again!
$ update-alternatives --list libblas.so
/usr/lib/libblas/libblas.so
/usr/lib/libopenblasp-r0.2.18.so
/usr/lib/openblas-base/libblas.so
$ update-alternatives --list liblapack.so
/usr/lib/lapack/liblapack.so
/usr/lib/openblas-base/liblapack.so
Also,
$ update-alternatives --display libblas.so
libblas.so - auto mode
link best version is /usr/lib/libopenblasp-r0.2.18.so
link currently points to /usr/lib/libopenblasp-r0.2.18.so
link libblas.so is /usr/lib/libblas.so
slave blas.pc is /usr/lib/pkgconfig/blas.pc
slave libblas.a is /usr/lib/libblas.a
/usr/lib/libblas/libblas.so - priority 10
slave blas.pc: /usr/lib/pkgconfig/blas-netlib.pc
slave libblas.a: /usr/lib/libblas/libblas.a
/usr/lib/libopenblasp-r0.2.18.so - priority 40
slave libblas.a: /usr/lib/libopenblasp-r0.2.18.a
/usr/lib/openblas-base/libblas.so - priority 40
slave blas.pc: /usr/lib/pkgconfig/blas-openblas.pc
slave libblas.a: /usr/lib/openblas-base/libblas.a
$ update-alternatives --display liblapack.so
liblapack.so - auto mode
link best version is /usr/lib/openblas-base/liblapack.so
link currently points to /usr/lib/openblas-base/liblapack.so
link liblapack.so is /usr/lib/liblapack.so
slave lapack.pc is /usr/lib/pkgconfig/lapack.pc
slave liblapack.a is /usr/lib/liblapack.a
/usr/lib/lapack/liblapack.so - priority 20
slave lapack.pc: /usr/lib/pkgconfig/lapack-netlib.pc
slave liblapack.a: /usr/lib/lapack/liblapack.a
/usr/lib/openblas-base/liblapack.so - priority 40
slave lapack.pc: /usr/lib/pkgconfig/lapack-openblas.pc
slave liblapack.a: /usr/lib/openblas-base/liblapack.a
libopenblas-dbg has been removed from 16.04
Probably best to continue with the current develop branch. Did you try changing the movaps to movups yet?
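For anyone trying the movaps-to-movups change: movaps requires its memory operand to be 16-byte aligned (32-byte for the 256-bit AVX vmovaps) and faults otherwise, whereas movups accepts any address. A minimal sketch of the substitution follows; the helper name is hypothetical, and the kernel file to edit is deliberately not named here, so locate it first with the grep in the comment. movapd/movupd are the double-precision analogues, so adjust the pattern if that is what the file uses.

```sh
#!/bin/sh
# Hypothetical helper: replace aligned packed moves with their unaligned
# counterparts in one assembly source file, keeping a .bak copy.
# "movaps" -> "movups" also turns AVX "vmovaps" into "vmovups".
swap_aligned_moves() {
    file=$1
    cp "$file" "$file.bak"
    sed 's/movaps/movups/g' "$file.bak" > "$file"
    echo "$(grep -c movups "$file") line(s) now use movups in $file"
}

# List candidate files first (run from the top of an OpenBLAS checkout):
#   grep -rln movaps kernel/x86_64/
```

Rebuild OpenBLAS after editing; the .bak copy makes it easy to revert if the fault turns out to be unrelated to alignment.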
On my freshly installed Ubuntu 16.04 (xenial) there is no /usr/lib/libopenblasp-r0.2.18.so. Could you please uninstall all OpenBLAS apt packages and remove any spurious file(s) until the alternatives point to /usr/lib/lapack/*?
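To tell package-owned files from leftovers of a manual install, `dpkg -S` can be asked about each path; a file that no package owns is the likely spurious one. A sketch assuming a Debian/Ubuntu system; the helper name is hypothetical, and the paths are taken from the update-alternatives output earlier in this thread.

```sh
#!/bin/sh
# Sketch: report which Debian package (if any) owns each BLAS file.
# A file no package owns was most likely copied in by a manual
# "make install" and is a candidate for removal.
owner_of() {
    if ! command -v dpkg >/dev/null 2>&1; then
        echo "$1: dpkg not available"
        return 0
    fi
    if pkg=$(dpkg -S "$1" 2>/dev/null); then
        echo "$pkg"
    else
        echo "$1: not owned by any package"
    fi
}

for f in /usr/lib/libopenblasp-r0.2.18.so /usr/lib/openblas-base/libblas.so; do
    owner_of "$f"
done

# Once a stray file is confirmed, drop it from the alternatives group
# before deleting it (requires root):
#   sudo update-alternatives --remove libblas.so /usr/lib/libopenblasp-r0.2.18.so
```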
Any progress? (Martin Kroeker, Thu, Aug 16, 2018 at 4:51 AM)
I started with a clean install of Ubuntu 16.04 and pulled the repo versions
of libopenblas-dev and the suitesparse package: same segfault. So I
started fresh again (clean install of Ubuntu 16.04), cloned the latest
OpenBLAS and built the latest SuiteSparse (5.3.0). I statically linked
against the libs, and so far everything is working fine.
Thanks for your help!
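For reference, statically linking against a locally built OpenBLAS looks roughly like this. It is a sketch: the install prefix, program and source names are hypothetical, and the extra libraries assume OpenBLAS was built with gfortran (for its LAPACK part) and with pthreads. SuiteSparse's own static libraries would go before libopenblas.a on the link line.

```sh
#!/bin/sh
# Sketch: print the compile/link arguments for a static OpenBLAS
# installed under a given prefix. Assumes a gfortran-built OpenBLAS
# with pthreads; adjust the trailing libraries for your build.
openblas_static_flags() {
    prefix=$1
    echo "-I$prefix/include $prefix/lib/libopenblas.a -lgfortran -lpthread -lm"
}

PREFIX=${PREFIX:-$HOME/opt/openblas}   # hypothetical install prefix
echo "cc -o solver solver.c $(openblas_static_flags "$PREFIX")"

# Building and installing OpenBLAS into that prefix (from its source tree):
#   make && make PREFIX="$PREFIX" install
```

Linking statically sidesteps the alternatives system entirely, which is one plausible reason the mixed-version problem disappeared here.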
Ubuntu's SuiteSparse should link against a replacement BLAS without problems, if the older version is acceptable.
I've been using OpenBLAS on my Ubuntu 16.04 LTS systems for a few years with no issues. However, over the last month one of my problems has been hitting a segmentation fault in dgemm_otcopy. The analysis runs every week, and the matrix I am factoring gets a little bigger every week. The fault occurs only occasionally (twice in the last 5 weeks) and only on some of my computers (2 out of 4, all virtually identically configured). When it does fail on a system, it is easily reproducible. The matrix being factored (using CHOLMOD) is very large but very sparse: this week it is 12544654 by 12544654 with 71272674 nonzeros. The really strange thing is that it only segfaults when the executable is called from a (bash) shell script; I can even make it fail with a one-line script. It never faults when I run the executable from the command line, and then the answer is sensible.
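Since the only difference is how the executable is launched, one thing worth comparing is what the process inherits in each case. A sketch, with a hypothetical function name: source it in the interactive shell and run `snapshot_env /tmp/env.interactive`, add `snapshot_env /tmp/env.script` just before the solver line in the failing script, then `diff` the two files. Limits such as the stack size, and variables OpenBLAS reads such as OPENBLAS_NUM_THREADS or OMP_NUM_THREADS, are the obvious things to look at.

```sh
#!/bin/sh
# Sketch: record the resource limits and environment this shell would
# pass on to a child process, so interactive and scripted launches can
# be compared with diff.
snapshot_env() {
    out=$1
    {
        echo "## ulimit -a"
        ulimit -a
        echo "## environment"
        env | sort
    } > "$out"
}

snapshot_env "${1:-/tmp/env.snapshot}"
```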
The fault occurs regardless of which OpenBLAS version I use (currently 0.2.20). At first I suspected a stack overrun, so I set ulimit -s unlimited; that actually moved the fault from a call freeing memory (classic stack overrun) to dgemm_otcopy.
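If the stack limit is being raised in the interactive shell, it can also be raised inside the launching script itself, so the solver sees the same limit however the script is started. A minimal sketch; the function name and the commented-out solver invocation are hypothetical, and raising the soft limit to "unlimited" only succeeds if the hard limit allows it.

```sh
#!/bin/sh
# Sketch: raise the soft stack limit in the current shell (and hence for
# anything it launches), and print the limit that is actually in effect.
raise_stack_limit() {
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "could not raise stack limit (hard limit: $(ulimit -Hs))" >&2
    fi
    echo "stack limit: $(ulimit -s)"
}

raise_stack_limit
# exec ./solver "$@"   # hypothetical solver invocation
```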
Here's the backtrace,
Any thoughts or suggestions on where to look next? I have seen a few older reports of similar failures, with no resolutions.
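Because the crash only reproduces from a script, one option is to capture the backtrace from inside that same script with gdb in batch mode. A sketch with a hypothetical helper name; `info sharedlibrary` is included because it lists the libopenblas/liblapack files actually mapped at the time of the crash, which bears on the version-mixing question.

```sh
#!/bin/sh
# Sketch: run a program under gdb in batch mode, then print a backtrace
# and the list of loaded shared libraries. Falls back to a plain run
# when gdb is not installed.
run_with_backtrace() {
    if command -v gdb >/dev/null 2>&1; then
        gdb -batch -ex run -ex bt -ex "info sharedlibrary" --args "$@"
    else
        "$@"
    fi
}

# In the failing script, replace the solver line with e.g.:
#   run_with_backtrace ./solver input.mtx   # hypothetical names
```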
Kind regards,
B
Edit: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz - 6 cores plus hyper-threading
I'll try both 0.3.1 and the current development branch and get back to you.
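To compare 0.3.1 with the development branch without mixing libraries, each build can go into its own prefix. A sketch assuming git, make and a compiler toolchain are installed; the function name, source directory names and prefixes are hypothetical, the repository URL is the one in use at the time, and v0.3.1 and develop are the release tag and branch names.

```sh
#!/bin/sh
# Sketch: clone, build and install one OpenBLAS git ref into its own prefix.
build_openblas() {
    ref=$1      # git tag or branch, e.g. v0.3.1 or develop
    prefix=$2   # separate install location for this build
    src="openblas-src-$ref"
    if [ ! -d "$src" ]; then
        git clone https://github.com/xianyi/OpenBLAS.git "$src" || return 1
    fi
    git -C "$src" checkout "$ref" &&
        make -C "$src" &&
        make -C "$src" PREFIX="$prefix" install
}

# Usage (downloads and compiles, so not run here):
#   build_openblas v0.3.1  "$HOME/opt/openblas-0.3.1"
#   build_openblas develop "$HOME/opt/openblas-develop"
```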