Multithreaded version of OpenBLAS causes intermittent segfaults when starting Julia #229

Closed
brorson opened this issue Jun 7, 2013 · 5 comments

@brorson

brorson commented Jun 7, 2013

I cloned and built the latest Julia on my FC13 machine (x64, AMD processor). When I start Julia, I get intermittent segfaults. Here's my machine info:

[sdb@localhost examples]$ uname -a
Linux localhost.localdomain 2.6.34.7-61.fc13.x86_64 #1 SMP Tue Oct 19 04:06:30 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

I ran valgrind --tool=drd on Julia. DRD is Valgrind's thread error detector: it reports data races and other conflicting accesses between threads. It finds many conflicts involving OpenBLAS threads. In the partial log below, these are conflicting loads and stores inside blas_memory_alloc on data in the BSS section of libopenblas.so.

When I set OPENBLAS_NUM_THREADS=1, Julia starts up correctly every time, with no segfaults.
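Because the crash only shows up on some launches, comparing the default threading with OPENBLAS_NUM_THREADS=1 takes many runs per setting. Here is a minimal sketch of a crash counter, assuming a POSIX shell such as bash or dash. The function name count_crashes is hypothetical and is not part of Julia or OpenBLAS.

```shell
# count_crashes RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times with stdin/stdout/stderr detached
# and prints how many of those runs were killed by SIGSEGV. Shells such as
# bash and dash report a child killed by signal N as exit status 128 + N,
# so SIGSEGV (signal 11) shows up as 139.
count_crashes() {
    cc_runs=$1
    shift
    cc_crashes=0
    cc_i=0
    while [ "$cc_i" -lt "$cc_runs" ]; do
        # Run the command as an if-condition so a failure does not trip `set -e`.
        if "$@" </dev/null >/dev/null 2>&1; then
            cc_status=0
        else
            cc_status=$?
        fi
        if [ "$cc_status" -eq 139 ]; then
            cc_crashes=$((cc_crashes + 1))
        fi
        cc_i=$((cc_i + 1))
    done
    echo "$cc_crashes"
}
```

For example, you could compare `count_crashes 20 ./julia -e 'exit()'` with `count_crashes 20 env OPENBLAS_NUM_THREADS=1 ./julia -e 'exit()'`. This assumes the build accepts `-e` and that the crash also happens during non-interactive startup. Neither point is confirmed in this thread.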

There appear to be multithreading issues in OpenBLAS that cause the intermittent Julia start-up failures on my machine. Please take a look at the valgrind log below and see whether it points to the problem.


[sdb@localhost examples]$ valgrind --tool=drd ../julia
==6424== drd, a thread error detector
==6424== Copyright (C) 2006-2009, and GNU GPL'd, by Bart Van Assche.
==6424== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==6424== Command: ../julia
==6424==
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
--6424-- Warning: DWARF2 CFI reader: unhandled DW_OP_ opcode 0x8
==6424== Thread 3:
==6424== Conflicting load by thread 3 at 0x0e4ecb14 size 4
==6424== at 0xDA2D45D: blas_memory_alloc (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0xDA2E0D4: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
==6424== Allocation context: BSS section of /usr/local/src/julia/usr/lib/libopenblas.so
==6424== Other segment start (thread 2)
==6424== at 0x345DCE14C1: clone (in /lib64/libc-2.12.1.so)
==6424== Other segment end (thread 2)
==6424== at 0x345DCC8897: sched_yield (in /lib64/libc-2.12.1.so)
==6424== by 0xDA2E11E: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
==6424==
==6424== Conflicting load by thread 3 at 0x0e4ecb14 size 4
==6424== at 0xDA2D4FF: blas_memory_alloc (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0xDA2E0D4: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
==6424== Allocation context: BSS section of /usr/local/src/julia/usr/lib/libopenblas.so
==6424== Other segment start (thread 2)
==6424== at 0x345DCE14C1: clone (in /lib64/libc-2.12.1.so)
==6424== Other segment end (thread 2)
==6424== at 0x345DCC8897: sched_yield (in /lib64/libc-2.12.1.so)
==6424== by 0xDA2E11E: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
==6424==
==6424== Conflicting store by thread 3 at 0x0e4ecb54 size 4
==6424== at 0xDA2D5C8: blas_memory_alloc (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0xDA2E0D4: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
==6424== Allocation context: BSS section of /usr/local/src/julia/usr/lib/libopenblas.so
==6424== Other segment start (thread 2)
==6424== at 0x345DCE14C1: clone (in /lib64/libc-2.12.1.so)
==6424== Other segment end (thread 2)
==6424== at 0x345DCC8897: sched_yield (in /lib64/libc-2.12.1.so)
==6424== by 0xDA2E11E: ??? (in /usr/local/src/julia/usr/lib/libopenblas.so)
==6424== by 0x4A0CE50: vgDrd_thread_wrapper (drd_pthread_intercepts.c:272)
==6424== by 0x345E007760: start_thread (in /lib64/libpthread-2.12.1.so)
==6424== by 0x345DCE14FC: clone (in /lib64/libc-2.12.1.so)
... etc ....
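A full DRD log repeats the same few reports many times, so it can help to tally them. Here is a minimal sketch, assuming the report layout shown in the log above: a "Conflicting load/store by thread" line followed by an "at 0x...: function" frame. The function name summarize_drd_conflicts is hypothetical.

```shell
# summarize_drd_conflicts [LOGFILE...]
# Hypothetical helper: reads DRD output (from the given files, or stdin if
# none) and counts "Conflicting load/store" reports by access kind and by
# the function named in the first "at" frame of each report.
summarize_drd_conflicts() {
    awk '
        /Conflicting (load|store) by thread/ {
            for (i = 1; i <= NF; i++)
                if ($i == "Conflicting") kind = $(i + 1)
            want_frame = 1
            next
        }
        # First stack frame of a report, e.g. "==6424== at 0xDA2D45D: blas_memory_alloc (in ...)"
        want_frame && $2 == "at" {
            count[kind " " $4]++
            want_frame = 0
        }
        END {
            for (k in count) print count[k], k
        }
    ' "$@" | sort -k2
}
```

Valgrind's `--log-file` option writes the report to a file, for example `valgrind --tool=drd --log-file=drd.log ../julia`, which you can then pass to `summarize_drd_conflicts drd.log`. On the three reports excerpted above, it would print `2 load blas_memory_alloc` and `1 store blas_memory_alloc`.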

@xianyi
Collaborator

xianyi commented Jun 9, 2013

Thank you for the report. I will test it.

Could you try to build OpenBLAS with NO_AFFINITY=1?

Xianyi
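Trying a build option such as NO_AFFINITY=1 means rebuilding OpenBLAS from a clean tree. Here is a minimal sketch, assuming a standalone OpenBLAS source checkout. The function name rebuild_openblas is hypothetical; NO_AFFINITY is the real OpenBLAS make variable suggested above. Setting MAKE=echo turns the function into a dry run.

```shell
# rebuild_openblas SRC_DIR [MAKE_VARIABLES...]
# Hypothetical helper: clean rebuild of an OpenBLAS checkout with extra make
# variables such as NO_AFFINITY=1. "make clean" runs first so objects built
# with different options are not mixed. Set MAKE (e.g. MAKE=echo) to print
# the make arguments instead of building.
rebuild_openblas() {
    rb_dir=$1
    shift
    # Subshell keeps the caller's working directory unchanged.
    (cd "$rb_dir" && ${MAKE:-make} clean && ${MAKE:-make} "$@")
}
```

This sketch does not show how Julia's own build passes these variables through. After rebuilding, Julia has to load the new library. In this thread, Julia loaded /usr/local/src/julia/usr/lib/libopenblas.so.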

@brorson
Author

brorson commented Jun 9, 2013

OK, I built it with NO_AFFINITY=1. I still get intermittent segfaults when starting Julia -- see below. Is there anything else I should try?

Cheers,

Stuart


[sdb@localhost julia]$ ./julia
[Julia startup banner: Version 0.2.0-1831.r026f394c.dirty, Commit 026f394c54 2013-06-05 11:17:41*]

julia>
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
Segmentation fault (core dumped)
[sdb@localhost julia]$ ./julia
[Julia startup banner: Version 0.2.0-1831.r026f394c.dirty, Commit 026f394c54 2013-06-05 11:17:41*]

julia>

@xianyi
Collaborator

xianyi commented Jun 13, 2013

Hi @brorson,

What's your AMD processor? Is it Bulldozer?

Could you try to build OpenBLAS with USE_OPENMP=1?

Xianyi

@brorson
Author

brorson commented Jun 17, 2013

OK, I just did the rebuild with USE_OPENMP=1. Julia starts reliably now -- no segfaults. See below.

Oh, and I was wrong about the AMD processor. My laptop has an Intel Core i7 Q 720 with 4 cores and 8 hardware threads (Hyper-Threading). Here's the cpuinfo:

[sdb@localhost AHS]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.38
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.13
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3191.85
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.80
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.21
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.43
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.69
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 30
model name : Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz
stepping : 5
cpu MHz : 933.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips : 3192.22
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
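The listing repeats one block per logical CPU, so the core and thread counts are easier to read off with a short script. Here is a minimal sketch, assuming the x86 Linux /proc/cpuinfo layout shown above. The function name cpu_topology is hypothetical.

```shell
# cpu_topology [CPUINFO_FILE]
# Hypothetical helper: counts physical cores (distinct "physical id"/"core id"
# pairs) and hardware threads ("processor" entries) in an x86 Linux
# /proc/cpuinfo-style file. Reads /proc/cpuinfo when no file is given.
cpu_topology() {
    awk -F ': *' '
        /^processor/   { threads++ }
        /^physical id/ { phys = $2 }
        /^core id/     { cores[phys "-" $2] = 1 }
        END {
            n = 0
            for (k in cores) n++
            printf "%d cores, %d hardware threads\n", n, threads
        }
    ' "${1:-/proc/cpuinfo}"
}
```

On the listing above (physical id 0 throughout, core ids 0 to 3 each appearing twice), this gives 4 cores, 8 hardware threads. That agrees with the `cpu cores : 4` and `siblings : 8` fields. The NUM_CORES=8 in the OpenBLAS config matches the hardware-thread count.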

Finally, here's my current Makefile.config_last in the openblas directory:

OSNAME=Linux
ARCH=x86_64
C_COMPILER=GCC
BINARY32=
BINARY64=1
CEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../.. -lgomp -lpthread -lc
F_COMPILER=GFORTRAN
FC=gfortran
BU=_
FEXTRALIB=-L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../.. -lgfortran -lm -lgfortran -lm -lgomp -lpthread -lc
CORE=NEHALEM
LIBCORE=nehalem
NUM_CORES=8
HAVE_MMX=1
HAVE_SSE=1
HAVE_SSE2=1
HAVE_SSE3=1
HAVE_SSSE3=1
HAVE_SSE4_1=1
HAVE_SSE4_2=1
MAKE += -j 8
SGEMM_UNROLL_M=4
SGEMM_UNROLL_N=8
DGEMM_UNROLL_M=2
DGEMM_UNROLL_N=8
QGEMM_UNROLL_M=2
QGEMM_UNROLL_N=2
CGEMM_UNROLL_M=2
CGEMM_UNROLL_N=4
ZGEMM_UNROLL_M=1
ZGEMM_UNROLL_N=4
XGEMM_UNROLL_M=1
XGEMM_UNROLL_N=1

Julia starts reliably now, with USE_OPENMP=1:

[sdb@localhost julia]$ ./julia
[Julia startup banner: Version 0.2.0-1831.r026f394c.dirty, Commit 026f394c54 2013-06-05 11:17:41*]

julia>
[sdb@localhost julia]$ ./julia
[Julia startup banner: Version 0.2.0-1831.r026f394c.dirty, Commit 026f394c54 2013-06-05 11:17:41*]

julia>
[sdb@localhost julia]$ ./julia
[Julia startup banner: Version 0.2.0-1831.r026f394c.dirty, Commit 026f394c54 2013-06-05 11:17:41*]

julia>

@ViralBShah
Contributor

Do you think this was due to #221?

@wernsaar closed this as completed Jun 4, 2014

4 participants