Invalid read of size 4 in ctrmm_iutncopy_SANDYBRIDGE #1770

Closed
amigalemming opened this issue Sep 21, 2018 · 7 comments

@amigalemming

I get hard-to-reproduce crashes when calling ctrmm with 2x2 matrices on Sandy Bridge. Valgrind says:

==4642== Invalid read of size 4
==4642==    at 0x797BFD1: ctrmm_iutncopy_SANDYBRIDGE (in /usr/lib/libopenblasp-r0.2.18.so)
==4642==    by 0x69E9700: ctrmm_LNUN (in /usr/lib/libopenblasp-r0.2.18.so)
==4642==    by 0x5963BBA: ctrmm_ (in /usr/lib/openblas-base/libblas.so.3)
==4642==    by 0x711075: ??? (in .../haskell/lapack/dist/build/lapack-test/lapack-test)
==4642==    by 0x42000BE147: ???
==4642==    by 0x42001FFFCF: ???
==4642==    by 0x42001FFFB7: ???
==4642==    by 0x42000BE06F: ???
==4642==    by 0x42001FFFB7: ???
==4642==  Address 0x4200200000 is in a --- anonymous segment
==4642== 
==4642== 
==4642== Process terminating with default action of signal 11 (SIGSEGV)
==4642==  Bad permissions for mapped region at address 0x4200200000
==4642==    at 0x797BFD1: ctrmm_iutncopy_SANDYBRIDGE (in /usr/lib/libopenblasp-r0.2.18.so)
==4642==    by 0x69E9700: ctrmm_LNUN (in /usr/lib/libopenblasp-r0.2.18.so)
==4642==    by 0x5963BBA: ctrmm_ (in /usr/lib/openblas-base/libblas.so.3)
==4642==    by 0x711075: ??? (in .../haskell/lapack/dist/build/lapack-test/lapack-test)
==4642==    by 0x42000BE147: ???
==4642==    by 0x42001FFFCF: ???
==4642==    by 0x42001FFFB7: ???
==4642==    by 0x42000BE06F: ???
==4642==    by 0x42001FFFB7: ???

strmm, dtrmm and ztrmm do not seem to be affected.

Maybe related to #601?
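
For anyone who wants to exercise the same code path outside the Haskell test suite, here is a minimal sketch of a 2x2 ctrmm call through the Fortran-style `ctrmm_` entry point, using Python's ctypes. It assumes that ctypes can locate a system BLAS, that the BLAS uses 32-bit integers, and that the LNUN suffix in the backtrace stands for side=L, trans=N, uplo=U, diag=N. The helper names `load_blas` and `ctrmm_2x2` are made up for illustration, and nothing here is known to trigger the crash.

```python
import ctypes
import ctypes.util


def load_blas():
    """Return a handle to the system BLAS, or None if ctypes cannot find one."""
    for name in ("blas", "openblas"):
        path = ctypes.util.find_library(name)
        if not path:
            continue
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue
    return None


def ctrmm_2x2(lib):
    """Compute B := A * B for an upper-triangular complex 2x2 A via ctrmm_.

    Returns B as a column-major list of Python complex numbers, or None if
    the library does not export ctrmm_.
    """
    ctrmm = getattr(lib, "ctrmm_", None)
    if ctrmm is None:
        return None
    ctrmm.restype = None

    # Fortran passes every argument by reference.
    side, uplo = ctypes.c_char(b"L"), ctypes.c_char(b"U")
    trans, diag = ctypes.c_char(b"N"), ctypes.c_char(b"N")
    n = ctypes.c_int(2)  # assumption: BLAS built with 32-bit integers

    # Single-precision complex values are interleaved (re, im) floats.
    alpha = (ctypes.c_float * 2)(1.0, 0.0)
    # Column-major A = [[1, 2], [*, 3]]; the 99 below the diagonal
    # should never be read because uplo is 'U'.
    a = (ctypes.c_float * 8)(1, 0, 99, 0, 2, 0, 3, 0)
    # Column-major B = identity; ctrmm overwrites it with the product.
    b = (ctypes.c_float * 8)(1, 0, 0, 0, 0, 0, 1, 0)

    ctrmm(ctypes.byref(side), ctypes.byref(uplo), ctypes.byref(trans),
          ctypes.byref(diag), ctypes.byref(n), ctypes.byref(n),
          alpha, a, ctypes.byref(n), b, ctypes.byref(n))
    return [complex(b[i], b[i + 1]) for i in range(0, 8, 2)]


if __name__ == "__main__":
    blas = load_blas()
    print(ctrmm_2x2(blas) if blas is not None else "no BLAS found")
```

Since B starts as the identity, a correct BLAS should hand back the upper triangle of A, with a zero where the 99 was. Running the script under valgrind against the suspect library would be one way to check whether the invalid read shows up without the Haskell bindings in between.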

@martin-frbg
Collaborator

Could be related indeed, and your backtrace suggests you are using an old version (0.2.18) that still had this bug. Would it be possible for you to try a more recent version? The fix was committed on the develop branch in October 2017, and the first release to have it would be 0.3.0.

@amigalemming
Author

amigalemming commented Sep 21, 2018 via email

@brada4
Contributor

brada4 commented Sep 21, 2018

https://github.com/xianyi/OpenBLAS/wiki/faq#debianlts
You can completely remove the Ubuntu OpenBLAS packages.
EDIT:
libblas.so.3 is the runtime (ld.so) library. I strongly advise installing the blas-dev/lapack-dev packages so that gcc/ld read the reference BLAS (libblas.so) when compiling and you don't introduce a non-portable dependency on OpenBLAS PRIVATE symbols. Later, at runtime, ld.so will pick up the improved libblas.so.3 that OpenBLAS provides in place of the reference one.

@brada4
Contributor

brada4 commented Sep 22, 2018

@amigalemming if you use hmatrix, you have to disable its OpenBLAS usage so that it uses the system BLAS. Uninstalling libopenblas0 would probably force its hand...

@amigalemming
Author

amigalemming commented Sep 22, 2018 via email

@brada4
Contributor

brada4 commented Sep 22, 2018

Those are OpenBLAS internal symbols that appear to be missing. They should not be called directly.
The two letters at the end are the two transform parameters that should have been passed through the BLAS function.
You can find the dispatch functions in interface/*, which select the parallel or the serial path. Parallel is very bad for small (but how small?) data samples; more accurate calibration is always welcome...

Yes, you will use an ld that knows only about the functions in the (netlib) libblas.so; then at runtime libblas.so.3 will be libopenblas.so.
As a result, any further consumer will be free to use netlib/atlas/mkl/openblas/cublas/clblas etc.
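
To see which implementation actually ends up behind the generic BLAS name at runtime, something like the following sketch may help. It loads whatever the dynamic loader resolves for "blas" and then lists the BLAS-like shared objects mapped into the process. It assumes Linux (it reads /proc/self/maps), and the function name `loaded_blas_objects` is hypothetical.

```python
import ctypes
import ctypes.util
import os


def loaded_blas_objects():
    """List the BLAS-like shared objects mapped after loading the system BLAS.

    Returns an empty list when no BLAS is found or when /proc/self/maps is
    unavailable (it is Linux-specific).
    """
    name = ctypes.util.find_library("blas")
    if not name:
        return []
    try:
        ctypes.CDLL(name)  # resolved by the dynamic loader, like any consumer
    except OSError:
        return []

    found = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                fields = line.split(None, 5)
                # The sixth field, when present, is the backing file path.
                if len(fields) == 6:
                    path = fields[5].strip()
                    if "blas" in os.path.basename(path).lower():
                        found.add(path)
    except OSError:
        return []
    return sorted(found)


if __name__ == "__main__":
    for path in loaded_blas_objects():
        print(path)
```

On a setup like the one in the backtrace, one would expect the list to include both the libblas.so.3 front end and the libopenblas object behind it, although the exact paths depend on the distribution.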

EDIT: those private symbols are not maintained, let alone guaranteed, between versions. It can happen (and often does) that a broken parallel function gets disabled for a few releases, or that a broken serial function gets replaced with the parallel version running in one thread.
The official API is BLAS, CBLAS, LAPACK and LAPACKE, plus a few utility (non-math) functions in OpenBLAS's cblas.h that deal with detecting the build configuration, lowering the maximum thread count below the number of CPUs, reporting the detected CPU type, etc.
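
As a rough illustration of those utility functions, the sketch below calls a few of them through ctypes. It assumes a shared OpenBLAS that ctypes can find under the name "openblas" and that the symbols are exported without a prefix or suffix; the wrapper `openblas_info` and its dictionary keys are hypothetical, and symbols missing from older builds are simply reported as None.

```python
import ctypes
import ctypes.util


def openblas_info(max_threads=None):
    """Query OpenBLAS's utility functions; return None if it cannot be loaded."""
    name = ctypes.util.find_library("openblas")
    if not name:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None

    def call(symbol, restype, *args):
        fn = getattr(lib, symbol, None)  # older builds may lack some symbols
        if fn is None:
            return None
        fn.restype = restype
        return fn(*args)

    if max_threads is not None:
        # Cap the number of threads OpenBLAS may use.
        call("openblas_set_num_threads", None, ctypes.c_int(max_threads))

    config = call("openblas_get_config", ctypes.c_char_p)
    core = call("openblas_get_corename", ctypes.c_char_p)
    return {
        "config": config.decode() if config else None,
        "core": core.decode() if core else None,
        "threads": call("openblas_get_num_threads", ctypes.c_int),
        "procs": call("openblas_get_num_procs", ctypes.c_int),
    }


if __name__ == "__main__":
    print(openblas_info())
```

Calling `openblas_info(max_threads=1)` would be one way to rule out the threaded code path while chasing a crash like the one above.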

@amigalemming
Author

amigalemming commented Apr 16, 2019 via email
