
Issue with multi-threaded sgemm kernel (reading past end of block -> segmentation faults) #535


Closed
danpovey opened this issue Apr 7, 2015 · 17 comments

danpovey commented Apr 7, 2015

==4126== Invalid read of size 16
==4126== at 0x9D12F47: sgemm_kernel (in /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas_sandybridgep-r0.2.14.so)
==4126== Address 0x29fe95cc is 16,556 bytes inside a block of size 16,560 alloc'd

danpovey (Author) commented Apr 7, 2015

Sorry, more info:
Latest commit in our repo is:
commit 1e80b8b
CPU: model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
uname -a: Linux ip-10-180-56-32 3.16.0-33-generic #44-Ubuntu SMP Thu Mar 12 12:19:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Compiled with this command:
make PREFIX=`pwd`/OpenBLAS/install FC=gfortran BINARY=64 DEBUG=1 USE_THREAD=0 -C OpenBLAS all install

I couldn't get a stack trace, although it was compiled with debugging enabled; sorry.

xianyi (Collaborator) commented Apr 7, 2015

@danpovey, thank you for the feedback.
Does it cause a segfault? How can I reproduce this bug?

danpovey (Author) commented Apr 7, 2015

It sometimes causes a segfault, yes.
Reproduction is a little tricky; I am trying to figure out the sizes of the matrices involved.

danpovey (Author) commented Apr 7, 2015

...it's in a larger application. With other math libraries it's fine, and it's also fine when OpenBLAS is not threaded.

xianyi (Collaborator) commented Apr 7, 2015

Is sgemm called immediately after set_num_threads? Is this the same as #447?

danpovey (Author) commented Apr 7, 2015

No, I don't think we are calling set_num_threads at all.
Amit, if you just run "make openblas" from tools/, it builds with
make PREFIX=`pwd`/OpenBLAS/install FC=gfortran BINARY=64 DEBUG=1 USE_THREAD=0 -C OpenBLAS all install
which doesn't use threads at all (USE_THREAD=0), so maybe this has nothing to do with multi-threading.

Dan


danpovey (Author) commented Apr 7, 2015

Also, this is what we're linking with, which may tell you something:
libopenblas.so.0 => /home/ubuntu/workspace/kaldi/tools/OpenBLAS/install/lib/libopenblas.so.0 (0x00007fb276420000)

Amit, I'd like to know how you were compiling OpenBLAS.
Dan


danpovey (Author) commented Apr 7, 2015

Oh, I see this is how we were compiling OpenBLAS:

git clone git://github.com/xianyi/OpenBLAS
cd OpenBLAS
OPENBLAS_NUM_THREADS=2 make PREFIX=install all install

Dan


martin-frbg (Collaborator) commented Apr 7, 2015

The above is valgrind output, right? If so, perhaps you can get more information by running valgrind with the "--db-attach=yes" option, which lets you drop into the gdb debugger when an invalid access is trapped.

danpovey (Author) commented Apr 7, 2015

I tried that, but unfortunately could not get a stack trace. I'm trying to recompile with debugging enabled.
Dan


danpovey (Author) commented Apr 7, 2015

After recompiling with debugging enabled, I was still not able to get a stack trace, for some reason:
Dan

0x0000000009e1a141 in sgemm_kernel () at ../kernel/x86_64/sgemm_kernel_16x4_sandy.S:2263
2263 SAVE2x2
(gdb)
(gdb)
(gdb) bt
#0 0x0000000009e1a141 in sgemm_kernel () at ../kernel/x86_64/sgemm_kernel_16x4_sandy.S:2263
#1 0x0000000000000000 in ?? ()
(gdb) list
2258 ALIGN_4
2259
2260
2261 .L2_39:
2262
2263 SAVE2x2
2264
2265 #if (defined(TRMMKERNEL) && defined(LEFT) && defined(TRANSA)) ||
2266 (defined(TRMMKERNEL) && !defined(LEFT) && !defined(TRANSA))
2267 movq K, %rax
==16059== Thread 2:
==16059== Invalid read of size 16
==16059== at 0x9E1A141: sgemm_kernel (sgemm_kernel_16x4_sandy.S:2263)
==16059== Address 0x2a295148 is 16,552 bytes inside a block of size 16,560 alloc'd
==16059== at 0x4C2D136: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16059== by 0x4C2D251: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16059== by 0x762FD65: kaldi::Matrix::Init(int, int) (kaldi-matrix.cc:655)
==16059== by 0x762FA17: kaldi::Matrix::Resize(int, int, kaldi::MatrixResizeType) (kaldi-matrix.cc:701)
==16059== by 0x4F19F42: kaldi::Matrix::Matrix(int, int, kaldi::MatrixResizeType) (kaldi-matrix.h:696)
==16059== by 0x6C0D791: kaldi::CuMatrix::Resize(int, int, kaldi::MatrixResizeType) (cu-matrix.cc:77)
==16059== by 0x565033B: kaldi::nnet2::Component::Propagate(kaldi::nnet2::ChunkInfo const&, kaldi::nnet2::ChunkInfo const&, kaldi::CuMatrixBase const&, kaldi::CuMatrix) const (nnet-component.h:208)
==16059== by 0x5650B8B: kaldi::nnet2::NnetComputer::Propagate() (nnet-compute.cc:99)
==16059== by 0x565131C: kaldi::nnet2::NnetComputation(kaldi::nnet2::Nnet const&, kaldi::CuMatrixBase const&, bool, kaldi::CuMatrixBase) (nnet-compute.cc:164)
==16059== by 0x573B2A7: kaldi::nnet2::DecodableNnet2Online::ComputeForFrame(int) (online-nnet2-decodable.cc:127)
==16059== by 0x573ACF9: kaldi::nnet2::DecodableNnet2Online::LogLikelihood(int, int) (online-nnet2-decodable.cc:50)
==16059== by 0x696672B: kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*) (lattice-faster-online-decoder.cc:901)
==16059==
==16059==
==16059== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
==16059== starting debugger with cmd: /usr/bin/gdb -nw /proc/17502/fd/1024 17502
GNU gdb (Ubuntu 7.8-1ubuntu4) 7.8.0.20141001-cvs


xianyi (Collaborator) commented Apr 7, 2015

In the OpenBLAS gemm kernel, we save the frame register (rbp) on the stack and then use rbp as a general-purpose register for matrix B. As a result, the debugger may not be able to produce a stack trace from inside the kernel.

danpovey (Author) commented Apr 7, 2015

Let me know if there is a workaround I can use in gdb.
Dan


xianyi (Collaborator) commented Apr 7, 2015

@danpovey, please try the latest develop branch. I think I fixed this bug.

danpovey (Author) commented Apr 7, 2015

Thanks a lot for fixing it so fast! Yes, the issue seems to have gone away.
Dan


xianyi closed this as completed Apr 7, 2015

amitbeka commented Apr 8, 2015

Thanks for solving this so quickly - you guys are awesome!

xianyi (Collaborator) commented Apr 8, 2015

@amitbeka, thank you for choosing OpenBLAS.

4 participants