Problem with OpenBlas and Openmp #1193
Comments
Hmm. If you did the compile in several stages for some reason (e.g. did not have gfortran installed at first) be sure to do a |
That warning is issued when the pthread version is called from an OpenMP program. The Ubuntu OpenBLAS package has that warning patched out.
|
Thank you both for your fast reply. The problem seems to be solved. I rebuilt the library following @brada4's suggestion of passing USE_OPENMP=1 on the make command line instead of exporting it, and then adding it to the make install command too (maybe there was some problem with that). Now the error no longer appears, and the performance is not so bad: MKL 6 seconds, OpenBLAS 14. Still double, but not the exaggerated difference that existed before. @brada4, why should I use DYNAMIC_ARCH? Thank you both for your help. |
Because the Ubuntu package has it, and the Ubuntu OpenBLAS liblapack.so.0 imports it if touched. It is only needed for completeness, to avoid spurious problems if you really replace the system package. |
But I'm compiling from source; I don't have any Ubuntu OpenBLAS package installed as far as I know, and I'm not trying to replace any system package. By the way, I want to use this code on ARM, which is why I wanted to test it with OpenBLAS (because it has ARM support), and I still don't know which OS will be on that platform. Are you asking for the report because that performance difference is not expected? |
"Some" performance difference is to be expected, though hopefully it will not be a factor of two in general but rather dependent on matrix sizes and number of threads. Having perf numbers might enable someone to spot particular bottlenecks for your specific use case, but given the apparent lack of available developers that are also fluent in assembly I am not sure if Sandybridge-generation hardware |
Just looking for overly eager threading, nothing more. Some performance difference is OK, but if the missing half is related to the threading threshold, it should be easy to recalibrate and improve. |
Sorry for the delay, I was quite busy and not familiar with the perf tools. Here is the report (by the way, the program segfaults when I launch the record, so maybe I have a bug somewhere; there is no error in a normal launch). With the proper core affinity setup and OMP cores I am getting 1.9 s with MKL and 4.15 s with OpenBLAS: lower, but still double. So I would appreciate some advice on the library/environment setup to improve the timings. Thank you in advance.
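For readers reproducing the "core affinity setup and OMP cores" mentioned above, here is a minimal sketch of launching a program with explicit OpenMP settings. The build output in the original post states that an OpenMP build of OpenBLAS is controlled through OMP_NUM_THREADS; OMP_PROC_BIND and OMP_PLACES are standard OpenMP environment variables that the thread does not mention, so their use here is an assumption, as are the helper names `openmp_env` and `run_with_openmp` and the thread count of 12.

```python
import os
import subprocess
import sys


def openmp_env(num_threads, bind="close", places="cores"):
    """Return a copy of the current environment with OpenMP settings added."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)  # thread count for an OpenMP build
    env["OMP_PROC_BIND"] = bind                # assumption: pin threads to places
    env["OMP_PLACES"] = places                 # assumption: one place per core
    return env


def run_with_openmp(cmd, num_threads):
    """Run cmd (a list of arguments) with those settings and return its stdout."""
    result = subprocess.run(cmd, env=openmp_env(num_threads),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable it inherited;
    # replace it with the real benchmark binary.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_with_openmp(child, 12).strip())
```

Setting the variables per launch rather than exporting them globally keeps MKL and OpenBLAS runs comparable on a shared machine.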
|
Can you repeat same measuring but in ./common.h around line 360 |
Done, and this time it didn't segfault. I cannot evaluate the time now because it's a multi-user machine and someone else is running a job, so the times will be wrong (it is giving 5.5 s now). But in case the thing you are looking for is not affected by that, here is the report; I will repeat the measurements tomorrow if the machine is free. By the way, just out of curiosity and if it is not too complex to explain, what have I just done (the edited line)?
|
sched_yield() makes a system call, which is expensive for very short functions; you can see that _schedule is gone from the second picture. |
Thank you for the info. Here is the report of the execution. It's still double the MKL version, but removing the sched call has decreased the cost a little, by around 0.20 s out of 4 seconds (it now runs in between 3.95 and 4.05 s).
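The point about sched_yield() being a system call can be checked with a rough microbenchmark. The sketch below is an illustration only, not the code path OpenBLAS uses: it compares the average cost of a plain function call with that of `os.sched_yield()`, which Python exposes on platforms that provide the call. The helper names `per_call_seconds` and `noop` are made up for this example, and interpreter overhead inflates both numbers, so only the relative gap is meaningful.

```python
import os
import time


def per_call_seconds(func, calls=100_000):
    """Average wall-clock time of one func() call over `calls` iterations."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return (time.perf_counter() - start) / calls


def noop():
    """Baseline: a call that never leaves user space."""
    return None


if __name__ == "__main__":
    baseline = per_call_seconds(noop)
    print(f"plain call:    {baseline * 1e9:8.0f} ns")
    if hasattr(os, "sched_yield"):  # not available on every platform
        yielding = per_call_seconds(os.sched_yield)
        print(f"sched_yield(): {yielding * 1e9:8.0f} ns")
```

When the work done between yields is itself very short, the per-call overhead of entering the kernel can become a visible share of the run time.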
|
What is the (typical) matrix size in your test ? It is quite likely that MKL is a bit more sophisticated when it comes to picking the most appropriate algorithm. (For small sizes even the plain netlib implementation may have less overhead, and/or OpenBLAS may be splitting the workload on the "wrong" (smaller) index.) |
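To make the remark about splitting on the "wrong" (smaller) index concrete, here is a hypothetical sketch of what each thread receives when one dimension of a matrix is divided evenly among threads. It does not reproduce OpenBLAS's actual partitioning logic; the function `chunk_sizes` is invented for illustration, the 48 threads come from the build output in the original post, and the 750x130 shape is one of the sizes reported later in the thread.

```python
def chunk_sizes(extent, threads):
    """Split `extent` items into at most `threads` near-equal contiguous chunks."""
    if extent <= 0 or threads <= 0:
        return []
    used = min(extent, threads)          # never create empty chunks
    base, extra = divmod(extent, used)
    return [base + 1 if i < extra else base for i in range(used)]


if __name__ == "__main__":
    rows, cols, threads = 750, 130, 48   # shape and thread count from the thread
    by_rows = chunk_sizes(rows, threads)
    by_cols = chunk_sizes(cols, threads)
    print(f"split {rows} rows: {min(by_rows)}-{max(by_rows)} rows per thread")
    print(f"split {cols} cols: {min(by_cols)}-{max(by_cols)} cols per thread")
```

With these numbers, splitting the larger index gives each thread 15 or 16 rows, while splitting the smaller one leaves only 2 or 3 columns per thread, so fixed per-thread overhead weighs more heavily.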
@martin-frbg's observation is similar to what I see on a 20-core Broadwell (fewer threads, less yielding, less running time). I think one-to-all threading does not cope well with the situation where "all" is a lot. |
Sorry for the delay, I was traveling this past week. @martin-frbg, I am working with different matrix sizes: 750x130, 750x10000, 10000x130 and 10000x10000; some operations are vector-matrix. The code can be found at: https://gitlab.com/P.SanJuan/ASNA |
Since the initial problem was solved and the performance difference seems to be within normal parameters, I am closing the issue. Thanks everyone for the help. |
Smallest is <1MB, middle are 10 and 60MB, biggest is 800MB |
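The memory figures above follow directly from the matrix shapes given earlier, assuming double-precision elements (8 bytes each); the element type is an assumption, though it matches the quoted sizes. A small sketch of the arithmetic, with `matrix_megabytes` as an illustrative helper name:

```python
def matrix_megabytes(rows, cols, bytes_per_element=8):
    """Size of a dense rows x cols matrix in MB (10**6 bytes), doubles by default."""
    return rows * cols * bytes_per_element / 1e6


if __name__ == "__main__":
    shapes = [(750, 130), (750, 10000), (10000, 130), (10000, 10000)]
    for rows, cols in shapes:
        print(f"{rows} x {cols}: {matrix_megabytes(rows, cols):.2f} MB")
```

This gives 0.78 MB, 60 MB, 10.4 MB and 800 MB respectively, so only the smallest matrix is likely to fit in the caches of the CPU described in the original post.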
Hi, I'm having problems using OpenBLAS and OpenMP. I have a program that works with MKL and I wanted to try it with OpenBLAS to compare.
In some parts of my code I have BLAS/LAPACK calls inside an OpenMP parallel block, and in other parts the BLAS/LAPACK calls are outside any parallel region. So I don't want to build OpenBLAS sequentially, because I need it to be parallel for the sections that are not in OpenMP parallel regions.
The problem comes when installing OpenBLAS with USE_OPENMP=1, apparently with success:
` OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)
OS ... Linux
Architecture ... x86_64
BINARY ... 64bit
C compiler ... GCC (command line : gcc)
Fortran compiler ... GFORTRAN (command line : gfortran)
Library Name ... libopenblas_sandybridgep-r0.2.20.dev.a (Multi threaded; Max num-threads is 48)
Use OpenMP in the multithreading. Because of ignoring OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS flags,
you should use OMP_NUM_THREADS environment variable to control the number of threads.`
But when executing the code I get an endless stream of:
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.
By the way, I have another version without the OpenMP parallel regions, and it works but with very poor performance: 9 s with MKL vs 2860 s with OpenBLAS.
So clearly something is not working properly. Can anyone help me with these issues?
Additional info:
-OS: Ubuntu 16.04
-Processor: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy bridge)
-Gfortran installed due to requirement when building the library.
-Compiled with: export USE_OPENMP=1;make