BLAS isn't multi-core #1883
What is your typical problem size? OpenBLAS will (or should) not use multiple threads for a very small matrix, where the administrative overhead would exceed any possible gain from working in parallel.
Running the script on one thread takes about 5-6 seconds; with multithreading it takes 0.8-1 second. It may be a misconfiguration of dlib, but in theory it should not be.
Reference BLAS is certainly not multicore. We will need a firm test case, or at least the output of `perf record ./detect; perf report`. Another thing: I suspect those are threads in the pictures?
Try to identify which thread belongs to main(), then make some guesses about the asymmetry. If you cannot pin it down, please edit out anything private from the file and attach it here.
Running:
@martin-frbg Ubuntu's libblas.so.3 alternatives by priority (the highest becomes the default when installed):
```
ldd detect
libopenblas.so.0 => /usr/local/lib/libopenblas.so.0 (0x0000007fa7397000)
libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007fa7353000)
libopencv_core.so.4.0 => /usr/local/lib/libopencv_core.so.4.0 (0x0000007fa7036000)
libopencv_imgproc.so.4.0 => /usr/local/lib/libopencv_imgproc.so.4.0 (0x0000007fa6c81000)
libopencv_calib3d.so.4.0 => /usr/local/lib/libopencv_calib3d.so.4.0 (0x0000007fa6b1c000)
libopencv_videoio.so.4.0 => /usr/local/lib/libopencv_videoio.so.4.0 (0x0000007fa6aba000)
libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007fa6923000)
libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007fa6869000)
libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007fa6845000)
libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007fa66ec000)
/lib/ld-linux-aarch64.so.1 (0x000000557e93b000)
libgfortran.so.4 => /usr/lib/aarch64-linux-gnu/libgfortran.so.4 (0x0000007fa65e8000)
libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007fa65d3000)
libopencv_features2d.so.4.0 => /usr/local/lib/libopencv_features2d.so.4.0 (0x0000007fa651b000)
libopencv_flann.so.4.0 => /usr/local/lib/libopencv_flann.so.4.0 (0x0000007fa64b9000)
libopencv_imgcodecs.so.4.0 => /usr/local/lib/libopencv_imgcodecs.so.4.0 (0x0000007fa62a9000)
libgstreamer-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0 (0x0000007fa6179000)
libgobject-2.0.so.0 => /usr/lib/aarch64-linux-gnu/libgobject-2.0.so.0 (0x0000007fa611b000)
libglib-2.0.so.0 => /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0 (0x0000007fa600c000)
libgstapp-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstapp-1.0.so.0 (0x0000007fa5fec000)
libgstriff-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstriff-1.0.so.0 (0x0000007fa5fce000)
libgstpbutils-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstpbutils-1.0.so.0 (0x0000007fa5f89000)
libdc1394.so.22 => /usr/lib/aarch64-linux-gnu/libdc1394.so.22 (0x0000007fa5f08000)
libavcodec.so.57 => /usr/lib/aarch64-linux-gnu/libavcodec.so.57 (0x0000007fa4cc2000)
libavformat.so.57 => /usr/lib/aarch64-linux-gnu/libavformat.so.57 (0x0000007fa4a8b000)
libavutil.so.55 => /usr/lib/aarch64-linux-gnu/libavutil.so.55 (0x0000007fa49fa000)
libswscale.so.4 => /usr/lib/aarch64-linux-gnu/libswscale.so.4 (0x0000007fa4986000)
libjpeg.so.8 => /usr/lib/aarch64-linux-gnu/libjpeg.so.8 (0x0000007fa493c000)
libpng16.so.16 => /usr/lib/aarch64-linux-gnu/libpng16.so.16 (0x0000007fa4901000)
libtiff.so.5 => /usr/lib/aarch64-linux-gnu/libtiff.so.5 (0x0000007fa4886000)
libgmodule-2.0.so.0 => /usr/lib/aarch64-linux-gnu/libgmodule-2.0.so.0 (0x0000007fa4870000)
libffi.so.6 => /usr/lib/aarch64-linux-gnu/libffi.so.6 (0x0000007fa4858000)
libpcre.so.3 => /lib/aarch64-linux-gnu/libpcre.so.3 (0x0000007fa47e6000)
libgstbase-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstbase-1.0.so.0 (0x0000007fa476f000)
libgstaudio-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstaudio-1.0.so.0 (0x0000007fa46fb000)
libgsttag-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgsttag-1.0.so.0 (0x0000007fa46b4000)
libgstvideo-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libgstvideo-1.0.so.0 (0x0000007fa461f000)
libraw1394.so.11 => /usr/lib/aarch64-linux-gnu/libraw1394.so.11 (0x0000007fa4603000)
libusb-1.0.so.0 => /lib/aarch64-linux-gnu/libusb-1.0.so.0 (0x0000007fa45dd000)
libswresample.so.2 => /usr/lib/aarch64-linux-gnu/libswresample.so.2 (0x0000007fa45b6000)
libwebp.so.6 => /usr/lib/aarch64-linux-gnu/libwebp.so.6 (0x0000007fa455e000)
libva.so.2 => /usr/lib/aarch64-linux-gnu/libva.so.2 (0x0000007fa452f000)
libzvbi.so.0 => /usr/lib/aarch64-linux-gnu/libzvbi.so.0 (0x0000007fa449e000)
libxvidcore.so.4 => /usr/lib/aarch64-linux-gnu/libxvidcore.so.4 (0x0000007fa43b7000)
libx265.so.146 => /usr/lib/aarch64-linux-gnu/libx265.so.146 (0x0000007fa4159000)
libx264.so.152 => /usr/lib/aarch64-linux-gnu/libx264.so.152 (0x0000007fa3ff5000)
libwebpmux.so.3 => /usr/lib/aarch64-linux-gnu/libwebpmux.so.3 (0x0000007fa3fdc000)
libwavpack.so.1 => /usr/lib/aarch64-linux-gnu/libwavpack.so.1 (0x0000007fa3faa000)
libvpx.so.5 => /usr/lib/aarch64-linux-gnu/libvpx.so.5 (0x0000007fa3dfd000)
libvorbisenc.so.2 => /usr/lib/aarch64-linux-gnu/libvorbisenc.so.2 (0x0000007fa3d4e000)
libvorbis.so.0 => /usr/lib/aarch64-linux-gnu/libvorbis.so.0 (0x0000007fa3d18000)
libtwolame.so.0 => /usr/lib/aarch64-linux-gnu/libtwolame.so.0 (0x0000007fa3ce9000)
libtheoraenc.so.1 => /usr/lib/aarch64-linux-gnu/libtheoraenc.so.1 (0x0000007fa3ca7000)
libtheoradec.so.1 => /usr/lib/aarch64-linux-gnu/libtheoradec.so.1 (0x0000007fa3c80000)
libspeex.so.1 => /usr/lib/aarch64-linux-gnu/libspeex.so.1 (0x0000007fa3c59000)
libsnappy.so.1 => /usr/lib/aarch64-linux-gnu/libsnappy.so.1 (0x0000007fa3c41000)
libshine.so.3 => /usr/lib/aarch64-linux-gnu/libshine.so.3 (0x0000007fa3c27000)
librsvg-2.so.2 => /usr/lib/aarch64-linux-gnu/librsvg-2.so.2 (0x0000007fa3be9000)
libcairo.so.2 => /usr/lib/aarch64-linux-gnu/libcairo.so.2 (0x0000007fa3aed000)
libopus.so.0 => /usr/lib/aarch64-linux-gnu/libopus.so.0 (0x0000007fa3aa0000)
libopenjp2.so.7 => /usr/lib/aarch64-linux-gnu/libopenjp2.so.7 (0x0000007fa3a43000)
libmp3lame.so.0 => /usr/lib/aarch64-linux-gnu/libmp3lame.so.0 (0x0000007fa39c7000)
libgsm.so.1 => /usr/lib/aarch64-linux-gnu/libgsm.so.1 (0x0000007fa39ad000)
liblzma.so.5 => /lib/aarch64-linux-gnu/liblzma.so.5 (0x0000007fa397d000)
libz.so.1 => /lib/aarch64-linux-gnu/libz.so.1 (0x0000007fa3950000)
libssh-gcrypt.so.4 => /usr/lib/aarch64-linux-gnu/libssh-gcrypt.so.4 (0x0000007fa38d8000)
libopenmpt.so.0 => /usr/lib/aarch64-linux-gnu/libopenmpt.so.0 (0x0000007fa3710000)
libbluray.so.2 => /usr/lib/aarch64-linux-gnu/libbluray.so.2 (0x0000007fa36b9000)
libgnutls.so.30 => /usr/lib/aarch64-linux-gnu/libgnutls.so.30 (0x0000007fa355b000)
libxml2.so.2 => /usr/lib/aarch64-linux-gnu/libxml2.so.2 (0x0000007fa33bc000)
libgme.so.0 => /usr/lib/aarch64-linux-gnu/libgme.so.0 (0x0000007fa3365000)
libchromaprint.so.1 => /usr/lib/aarch64-linux-gnu/libchromaprint.so.1 (0x0000007fa3343000)
libbz2.so.1.0 => /lib/aarch64-linux-gnu/libbz2.so.1.0 (0x0000007fa3321000)
libX11.so.6 => /usr/lib/aarch64-linux-gnu/libX11.so.6 (0x0000007fa31f7000)
libdrm.so.2 => /usr/lib/aarch64-linux-gnu/libdrm.so.2 (0x0000007fa31d8000)
libvdpau.so.1 => /usr/lib/aarch64-linux-gnu/libvdpau.so.1 (0x0000007fa31c4000)
libva-x11.so.2 => /usr/lib/aarch64-linux-gnu/libva-x11.so.2 (0x0000007fa31af000)
libva-drm.so.2 => /usr/lib/aarch64-linux-gnu/libva-drm.so.2 (0x0000007fa319c000)
libjbig.so.0 => /usr/lib/aarch64-linux-gnu/libjbig.so.0 (0x0000007fa317d000)
liborc-0.4.so.0 => /usr/lib/aarch64-linux-gnu/liborc-0.4.so.0 (0x0000007fa3104000)
libudev.so.1 => /lib/aarch64-linux-gnu/libudev.so.1 (0x0000007fa30da000)
libsoxr.so.0 => /usr/lib/aarch64-linux-gnu/libsoxr.so.0 (0x0000007fa3079000)
libnuma.so.1 => /usr/lib/aarch64-linux-gnu/libnuma.so.1 (0x0000007fa3059000)
libogg.so.0 => /usr/lib/aarch64-linux-gnu/libogg.so.0 (0x0000007fa3042000)
libgdk_pixbuf-2.0.so.0 => /usr/lib/aarch64-linux-gnu/libgdk_pixbuf-2.0.so.0 (0x0000007fa3013000)
libgio-2.0.so.0 => /usr/lib/aarch64-linux-gnu/libgio-2.0.so.0 (0x0000007fa2e97000)
libpangocairo-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libpangocairo-1.0.so.0 (0x0000007fa2e7b000)
libpangoft2-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libpangoft2-1.0.so.0 (0x0000007fa2e56000)
libpango-1.0.so.0 => /usr/lib/aarch64-linux-gnu/libpango-1.0.so.0 (0x0000007fa2dff000)
libfontconfig.so.1 => /usr/lib/aarch64-linux-gnu/libfontconfig.so.1 (0x0000007fa2daf000)
libcroco-0.6.so.3 => /usr/lib/aarch64-linux-gnu/libcroco-0.6.so.3 (0x0000007fa2d6b000)
libpixman-1.so.0 => /usr/lib/aarch64-linux-gnu/libpixman-1.so.0 (0x0000007fa2d07000)
libfreetype.so.6 => /usr/lib/aarch64-linux-gnu/libfreetype.so.6 (0x0000007fa2c5e000)
libxcb-shm.so.0 => /usr/lib/aarch64-linux-gnu/libxcb-shm.so.0 (0x0000007fa2c49000)
libxcb.so.1 => /usr/lib/aarch64-linux-gnu/libxcb.so.1 (0x0000007fa2c19000)
libxcb-render.so.0 => /usr/lib/aarch64-linux-gnu/libxcb-render.so.0 (0x0000007fa2bfe000)
libXrender.so.1 => /usr/lib/aarch64-linux-gnu/libXrender.so.1 (0x0000007fa2be5000)
libXext.so.6 => /usr/lib/aarch64-linux-gnu/libXext.so.6 (0x0000007fa2bc5000)
libgcrypt.so.20 => /lib/aarch64-linux-gnu/libgcrypt.so.20 (0x0000007fa2b08000)
libgssapi_krb5.so.2 => /usr/lib/aarch64-linux-gnu/libgssapi_krb5.so.2 (0x0000007fa2ab8000)
libmpg123.so.0 => /usr/lib/aarch64-linux-gnu/libmpg123.so.0 (0x0000007fa2a5b000)
libvorbisfile.so.3 => /usr/lib/aarch64-linux-gnu/libvorbisfile.so.3 (0x0000007fa2a43000)
libp11-kit.so.0 => /usr/lib/aarch64-linux-gnu/libp11-kit.so.0 (0x0000007fa2931000)
libidn2.so.0 => /usr/lib/aarch64-linux-gnu/libidn2.so.0 (0x0000007fa2905000)
libunistring.so.2 => /usr/lib/aarch64-linux-gnu/libunistring.so.2 (0x0000007fa2780000)
libtasn1.so.6 => /usr/lib/aarch64-linux-gnu/libtasn1.so.6 (0x0000007fa275f000)
libnettle.so.6 => /usr/lib/aarch64-linux-gnu/libnettle.so.6 (0x0000007fa271e000)
libhogweed.so.4 => /usr/lib/aarch64-linux-gnu/libhogweed.so.4 (0x0000007fa26dd000)
libgmp.so.10 => /usr/lib/aarch64-linux-gnu/libgmp.so.10 (0x0000007fa2660000)
libicuuc.so.60 => /usr/lib/aarch64-linux-gnu/libicuuc.so.60 (0x0000007fa248b000)
libXfixes.so.3 => /usr/lib/aarch64-linux-gnu/libXfixes.so.3 (0x0000007fa2473000)
libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007fa2436000)
libselinux.so.1 => /lib/aarch64-linux-gnu/libselinux.so.1 (0x0000007fa2403000)
libresolv.so.2 => /lib/aarch64-linux-gnu/libresolv.so.2 (0x0000007fa23de000)
libmount.so.1 => /lib/aarch64-linux-gnu/libmount.so.1 (0x0000007fa2381000)
libharfbuzz.so.0 => /usr/lib/aarch64-linux-gnu/libharfbuzz.so.0 (0x0000007fa22df000)
libthai.so.0 => /usr/lib/aarch64-linux-gnu/libthai.so.0 (0x0000007fa22c7000)
libexpat.so.1 => /lib/aarch64-linux-gnu/libexpat.so.1 (0x0000007fa2288000)
libXau.so.6 => /usr/lib/aarch64-linux-gnu/libXau.so.6 (0x0000007fa2275000)
libXdmcp.so.6 => /usr/lib/aarch64-linux-gnu/libXdmcp.so.6 (0x0000007fa2260000)
libgpg-error.so.0 => /lib/aarch64-linux-gnu/libgpg-error.so.0 (0x0000007fa223c000)
libkrb5.so.3 => /usr/lib/aarch64-linux-gnu/libkrb5.so.3 (0x0000007fa216d000)
libk5crypto.so.3 => /usr/lib/aarch64-linux-gnu/libk5crypto.so.3 (0x0000007fa212f000)
libcom_err.so.2 => /lib/aarch64-linux-gnu/libcom_err.so.2 (0x0000007fa211b000)
libkrb5support.so.0 => /usr/lib/aarch64-linux-gnu/libkrb5support.so.0 (0x0000007fa2101000)
libicudata.so.60 => /usr/lib/aarch64-linux-gnu/libicudata.so.60 (0x0000007fa0746000)
libblkid.so.1 => /lib/aarch64-linux-gnu/libblkid.so.1 (0x0000007fa06f1000)
libgraphite2.so.3 => /usr/lib/aarch64-linux-gnu/libgraphite2.so.3 (0x0000007fa06c0000)
libdatrie.so.1 => /usr/lib/aarch64-linux-gnu/libdatrie.so.1 (0x0000007fa06aa000)
libbsd.so.0 => /lib/aarch64-linux-gnu/libbsd.so.0 (0x0000007fa0688000)
libkeyutils.so.1 => /lib/aarch64-linux-gnu/libkeyutils.so.1 (0x0000007fa0672000)
libuuid.so.1 => /lib/aarch64-linux-gnu/libuuid.so.1 (0x0000007fa065b000)
```
So it appears to be using OpenBLAS all right. Is the overall running time too short to attach gdb to the running program and get a thread status report (with "1438" in brada's script replaced by the actual process ID of the running `detect`)?
I just ran it like this:
Then I entered `run` to start it.
Now run samples until the threads show asymmetric CPU consumption as in the pictures, then break with Ctrl-C inside gdb. perf, which just attaches and detaches, is somewhat less burdensome for the main program.
libgomp gets loaded later, so it is quite possible that the pthreads OpenBLAS is called from several OpenMP threads, yielding ncpu^2 worker threads. That should be worked around only in very recent develop versions, soon to become 0.3.4. EDIT: #1875 is not yet applied to develop.
I use the develop branch from GitHub.
Am I doing the right thing? |
You did the right thing. Some of the (idle) threads look like OpenMP threads in dlib and OpenCV code instead. EDIT: threads 2, 3 and 4 are OpenBLAS workers.
I don't have the perf command.
When I compiled OpenCV with multithreading, the configuration output said it would use pthreads. Before that I used TBB, but it made no noticeable difference.
All of them use pthreads, as you can see in the backtraces (otherwise libgomp.so would appear below libpthread.so), which is sort of good. The kernel should be v4.15 in Ubuntu 18.04; you should ask for perf wherever your custom kernel was made (they probably don't have one).
Compile the code of what? OpenCV? dlib? OpenBLAS? Or my application?
I suspect the problem lies in threads spinning pointlessly while doing almost nothing, with lots of CPU time spent in single-threaded code that splits the work among them.
I built OpenBLAS with -pg, but I do not know what to do next.
By default, gprof pollutes stdout.
Actually, the suspect code is shared between architectures, so perf might be easier to get on an x86_64 virtual machine, for example.
On another virtual machine, not on my computer, everything works. I cannot get started, because compiling with the -pg flag did not produce any results, even when linking against libopenblas.a.
That gprof build configuration is not widely used; I meant the other tool, with a normally built library:
I cannot do that. :(
profile.txt
When using OpenMP, the results are depressing.
OpenBLAS takes <1% of your processing time.
The first function is from reference BLAS, written entirely in Fortran, and does not call OpenBLAS. It is probably worth going after the top dlib calls in the profile and trying to eliminate them, or looking for parallelized versions.
The first few functions are from dlib. Can these functions be multithreaded?
That is within the scope of dlib programming. You already got some advice in the dlib issue; try to connect it with the top calls in the profile. OpenBLAS makes up an almost unnoticeable part of your computation. I will check whether those two functions can be improved, but you will not see any improvement in overall run time from that.
Since you already use OpenBLAS (the closest thing to MKL on an ARM CPU) for 1% of the code, it is probably worth reworking the remaining 99% so that it is either parallel itself or offloads more work to the parallel OpenBLAS, or to the GPU via OpenCL (CUDA translated into your SoC's parlance).
Please reopen if you get any clear evidence that this is a problem in OpenBLAS rather than your code or dlib. |
When using OpenBLAS with dlib, multi-core processing does not work.
Previously, on another system, everything worked:
davisking/dlib#1004 (comment)
As I understand it, dlib relies on multi-core OpenBLAS for multithreading.
OS: Ubuntu 18.04
Arch: arm64
Board: Orange pi
OpenBLAS version: git clone
OpenBLAS build:
```
make CC=aarch64-linux-gnu-gcc FC=aarch64-linux-gnu-gfortran HOSTCC=gcc TARGET=ARMV8 -j8
```
Config:
Dlib build:
```
cmake -DCMAKE_C_FLAGS="-O3 -fprofile-use " -DDLIB_USE_CUDA=NO -DCMAKE_TOOLCHAIN_FILE=/mnt/c/Users/StepanOFF/Desktop/face_detect/min_core_aarch/aarch64.cmake --build --config Release ..
```
Both builds finish without errors, and all libraries are present.
The picture shows that only one core is busy at startup.

No processing
