Skip to content

Problem with OpenBlas and Openmp #1193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NonRepeat opened this issue May 31, 2017 · 17 comments
Closed

Problem with OpenBlas and Openmp #1193

NonRepeat opened this issue May 31, 2017 · 17 comments

Comments

@NonRepeat
Copy link

NonRepeat commented May 31, 2017

HI, i'm having problems using Openblas and openmp. I have a program that is working in mkl and i wanted to try it out with OpenBlas to compare.

In some parts of my code i have BLAS/LAPACK calls into an OpenMP prallel block and in other parts i have the BLAS/LAPACK calls outside from any parallel region. SO i dont want to build OpenBlas sequentially because i need it to be parallel for that sections that are not in openmp parallel regions.

The problem comes when instaling OpenBLAS with USE_OPENMP=1 apparently with bsuccess:

` OpenBLAS build complete. (BLAS CBLAS LAPACK LAPACKE)

OS ... Linux
Architecture ... x86_64
BINARY ... 64bit
C compiler ... GCC (command line : gcc)
Fortran compiler ... GFORTRAN (command line : gfortran)
Library Name ... libopenblas_sandybridgep-r0.2.20.dev.a (Multi threaded; Max num-threads is 48)

Use OpenMP in the multithreading. Because of ignoring OPENBLAS_NUM_THREADS and GOTO_NUM_THREADS flags,
you should use OMP_NUM_THREADS environment variable to control the number of threads.`

But when executing the code i obtain an infinite amout of:
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option.

By the way i have another version without the OpenMP parallel regions and it works but with a very poor performance: 9 secs MKL vs 2860 secs OpenBlas

So clearly something is not working properly. Caun anyone help me with these issues?

Additional info:
-S.O: Ubunto 16.04
-Processor: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy bridge)
-Gfortran installed due to requirement when building the library.
-Compiled with: export USE_OPENMP=1;make

@martin-frbg
Copy link
Collaborator

Hmm. If you did the compile in several stages for some reason (e.g. did not have gfortran installed at first) be sure to do a make clean before the make USE_OPENMP=1 as the build system is a bit involved and may not detect if some already present object modules were compiled with different options.
Poor performance usually comes at least in part from using too many threads - by default, OpenBLAS will try to run on all available cores including hyperthreading pseudocores that may actually lower throughput. Also, which version of gcc/gfortran are you using (Ubuntu 16 comes with a rather old one by default I believe) ? And silly as the suggestion may seem, have you made sure that your program is
actually using the newly built openblas rather than an ancient one that may have come with the OS ?

@brada4
Copy link
Contributor

brada4 commented May 31, 2017

That warning is issued when pthread version is called from openmp program. Ubuntu OpenBLAS has that warning patched out.
You can check openblas library and your program with ldd to confirm runtime import of conflicting APIs.

export X=Y; make will not work, best option for ubuntu is described here:
https://github.com/xianyi/OpenBLAS/wiki/faq#debianlts
You will need to use USE_OPENMP=1 flag for both make commands in addition to DYNAMIC_ARCH=1
Also you will have to relink your program against system -lblas
(bonus - guess how to supplant libblas.so.3 with MKL)

@NonRepeat
Copy link
Author

NonRepeat commented Jun 1, 2017

Thank you both for your fast reply. The problem seems to be solved.
@martin-frbg i did make clean between each installation , my gcc is 5.3.0 and well it was a stupid suggestion but i havent checked before you told me XD, but yes it links against the new built library.

I rebuilt the library using @brada4 sugestion of using USE_OPENMP=1 in the make comand instead of exporting it, and then adding it to the make install command too (maybe there was some problem with that).

Now the error is no more appearing, and the performance is not so bad: mkl 6 seconds OpenBLAS 14, still double but not the exagerated difference that existed before.

@brada4 why should I use the DYNAMIC_ARCH?

Thank you both for your help

@brada4
Copy link
Contributor

brada4 commented Jun 1, 2017

because ubuntu package has it and ubuntu-openblas liblapack.so.0 imports it if touched. it is just needed for completeness to avoid spurious problems if you really replace system package.
PPS normally you dont run into this situation but I tried to make FAQ paragraph as bulletproof as possible.
Can you run perf record ; perf report and paste just the text? (replace your symbols with XXX and kernel symbols with YYY if not willing to share that.

@NonRepeat
Copy link
Author

NonRepeat commented Jun 2, 2017

But im compiling from source, i dont have any ubuntu openblas package installed as far as i know. Im not trying to replace any system package. By the way I want to use this code in ARM and thats why I wanted to test it with openblas(because it has ARM support), and i still dont know wich O.S. will be in that platform.

Are you asking for the report because that performance difference is not expected?

@martin-frbg
Copy link
Collaborator

"Some" performance difference is to be expected, though hopefully it will not be a factor of two in general but rather dependent on matrix sizes and number of threads. Having perf numbers might enable someone to spot particular bottlenecks for your specific use case, but given the apparent lack of available developers that are also fluent in assembly I am not sure if Sandybridge-generation hardware
(no matter how prevalent) will receive much attention unless from someone directly benefiting from improved performance.

@brada4
Copy link
Contributor

brada4 commented Jun 2, 2017

Just looking for overly eager threading, nothing more. Some performance difference is OK, but if missing half is threading threshold related it should be easy to recalibrate and improve

@NonRepeat
Copy link
Author

NonRepeat commented Jun 6, 2017

sotty for the delay, i was quite bussy and not familiar with the perf tools, here is the report (by the way the program is segfaulting when i launch the record so maybe i have some bug around, no error in normal launch).

With the proper core affinity setup and omp cores I am getting 1,9 secs with mkl and 4,15 with OpenBLAS, lower but still double.

So I will appreciate if you gave me some advice fro the library/environment setup in order to improve the timings.

Thank you in advance

47,79%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_kernel                                                   
   5,27%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_itcopy                                                   
   3,62%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.1                                     
   3,21%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.6                                         
   3,21%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b7                                             
   2,54%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b9                                             
   2,11%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001954f                                             
   2,04%  mainAsnaGcc  [kernel.kallsyms]                        [k] __schedule                                                     
   1,93%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019551                                             
   1,73%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bb                                                1,51%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_fastpath                                         1,31%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_curr                                                       1,28%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019553                                                1,27%  mainAsnaGcc  libc-2.23.so                             [.] __sched_yield                                                     1,26%  mainAsnaGcc  [kernel.kallsyms]                        [k] clear_page_c_e                                                    1,12%  mainAsnaGcc  [kernel.kallsyms]                        [k] pick_next_task_fair                                               1,10%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_sched_clock                                                1,07%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_kernel_8                                                     1,04%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_after_swapgs                                  
   0,93%  mainAsnaGcc  [kernel.kallsyms]                        [k] _raw_spin_lock                                                 
   0,92%  mainAsnaGcc  mainAsnaGcc                              [.] dsub_x86._omp_fn.19                                            
   0,89%  mainAsnaGcc  mainAsnaGcc                              [.] ddiv_x86._omp_fn.17                                            
   0,87%  mainAsnaGcc  [kernel.kallsyms]                        [k] _raw_spin_lock_irq                                             
   0,77%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_beta                                                     
   0,67%  mainAsnaGcc  mainAsnaGcc                              [.] dMin_x86._omp_fn.22                                            
   0,65%  mainAsnaGcc  [kernel.kallsyms]                        [k] pick_next_entity                                               
   0,63%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_min_vruntime                                            
   0,49%  mainAsnaGcc  [kernel.kallsyms]                        [k] __calc_delta                                                   
   0,42%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_queued_spin_lock_slowpath                               
   0,39%  mainAsnaGcc  [kernel.kallsyms]                        [k] sys_sched_yield                                                
   0,37%  mainAsnaGcc  [kernel.kallsyms]                        [k] cpuacct_charge                                                 
   0,33%  mainAsnaGcc  mainAsnaGcc                              [.] dtrans._omp_fn.15                                              
   0,32%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dcopy_k                                                        
   0,32%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bf                                             
   0,31%  mainAsnaGcc  [kernel.kallsyms]                        [k] schedule                                                       
   0,27%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_alloc                                              
   0,27%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019557                                             
   0,26%  mainAsnaGcc  mainAsnaGcc                              [.] dmult_x86._omp_fn.18                                           
   0,18%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dasum_k                                                        
   0,17%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_rq_clock.part.80                                        
   0,17%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.7                                         
   0,15%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dnrm2_k                                                        
   0,15%  mainAsnaGcc  [kernel.kallsyms]                        [k] yield_task_fair                                                
   0,15%  mainAsnaGcc  [kernel.kallsyms]                        [k] rcu_note_context_switch                                        
   0,15%  mainAsnaGcc  [kernel.kallsyms]                        [k] clear_buddies                                                  
   0,14%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.10                                        
   0,13%  mainAsnaGcc  [kernel.kallsyms]                        [k] check_cfs_rq_runtime                                           
   0,13%  mainAsnaGcc  [kernel.kallsyms]                        [k] sched_clock_cpu                                                
   0,13%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] inner_thread                                                   
   0,11%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_incopy                                                   
   0,09%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_k                                                         
   0,08%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b4                                             
   0,08%  mainAsnaGcc  [kernel.kallsyms]                        [k] sched_clock                                                    
   0,07%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64                                               
   0,07%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000182ca                                             
   0,07%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006578d                                             
   0,07%  mainAsnaGcc  [kernel.kallsyms]                        [k] try_to_wake_up                                                 
   0,07%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006577d                                             
   0,07%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.2                                     
   0,06%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] sched_yield@plt                                                
   0,06%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065789                                             
   0,06%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065730                                             
   0,06%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065771                                             
   0,06%  mainAsnaGcc  libc-2.23.so                             [.] malloc                                                         
   0,06%  mainAsnaGcc  [kernel.kallsyms]                        [k] futex_wake                                                     
   0,05%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_free                                               
   0,05%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065791                                             
   0,05%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065769                                             
   0,05%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065765                                             
   0,05%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065699                                             
   0,04%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065781                                             
   0,04%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000656f3                                             
   0,04%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065725                                             
   0,04%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000655f0                                             
   0,04%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065845                                             
   0,04%  mainAsnaGcc  mainAsnaGcc                              [.] log@plt                                                        
   0,04%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_oncopy                                                   
   0,04%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019620                                             
   0,04%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key_refs.isra.12                                     
   0,04%  mainAsnaGcc  [kernel.kallsyms]                        [k] perf_pmu_sched_task                                            
   0,04%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key                    

@brada4
Copy link
Contributor

brada4 commented Jun 6, 2017

Can you repeat same measuring but in ./common.h around line 360
#define YIELDING sched_yield()
replace with
#define YIELDING {}
(full rebuild after)

@NonRepeat
Copy link
Author

NonRepeat commented Jun 6, 2017

Done, now it hasn't segfaulted. I cannot evaluate the time now because its a multi user machine and someone is executing, so the times will be wrong (giving 5.5 secs now). But just in case the thing you are looking for is not affected by that here is the report, I will repeat the measurements tomorrow if the machine is free.

By the way, only for curiosity and if it is not too complex to explain, what have I just done (the edited line)?

  47,39%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_kernel                                                   
  15,37%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] inner_thread                                                   
   5,46%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001954f                                             
   5,38%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_itcopy                                                   
   4,49%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019551                                             
   3,36%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.6                                         
   2,80%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019553                                             
   2,73%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b7                                             
   2,39%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_kernel_8                                                  
   2,06%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b9                                             
   1,32%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bb                                             
   1,07%  mainAsnaGcc  mainAsnaGcc                              [.] dsub_x86._omp_fn.19                                            
   0,84%  mainAsnaGcc  mainAsnaGcc                              [.] ddiv_x86._omp_fn.17                                            
   0,63%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_beta                                                     
   0,53%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019557                                             
   0,50%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.1                                     
   0,27%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dcopy_k                                                        
   0,26%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bf                                             
   0,25%  mainAsnaGcc  mainAsnaGcc                              [.] dmult_x86._omp_fn.18                                           
   0,23%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.10                                        
   0,20%  mainAsnaGcc  [kernel.kallsyms]                        [k] clear_page_c_e                                                 
   0,18%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_k                                                         
   0,16%  mainAsnaGcc  mainAsnaGcc                              [.] dMin_x86._omp_fn.22                                            
   0,09%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001954c                                             
   0,08%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] inner_thread                                                   
   0,08%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_alloc                                              
   0,07%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.7                                         
   0,07%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_queued_spin_lock_slowpath                               
   0,07%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000000a3d8                                             
   0,06%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000182ca                                             
   0,06%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dasum_k                                                        
   0,05%  mainAsnaGcc  libc-2.23.so                             [.] malloc                                                         
   0,04%  mainAsnaGcc  mainAsnaGcc                              [.] dtrans._omp_fn.15                                              
   0,04%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_oncopy                                                   
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key_refs.isra.12                                     
   0,03%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001942c                                             
   0,03%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019620                                             
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_after_swapgs                                  
   0,03%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019696                                             
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] futex_wake                                                     
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_fastpath                                      
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key                                                  
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_blocked_averages                                        
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_iltncopy                                                 
   0,02%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.2                                     
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemv_n                                                        
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_incopy                                                   
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] hash_futex                                                     
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_ilnncopy                                                 
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_kernel_RN                                                
   0,02%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b4                                             
   0,02%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001946b                                             
   0,02%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.8                                         
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dnrm2_k                                                        
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] LAPACKE_dtr_nancheck                                           
   0,01%  mainAsnaGcc  libc-2.23.so                             [.] 0x000000000007fee1                                             
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] free@plt                                                       
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] perf_pmu_sched_task                                            
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000655f0                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] do_futex                                                       
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006578d                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] sys_futex                                                      
   0,01%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_free                                               
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000000a3bc                                             
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019629                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] task_tick_fair                                                 
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] try_to_wake_up                                                 
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006577d                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065699                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065789                                             
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000170a9                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065730                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_write_msr_safe                                          
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065771                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065765                                             
   0,01%  mainAsnaGcc  libpthread-2.23.so                       [.] pthread_mutex_lock                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065725                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] copy_user_enhanced_fast_string                                 
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_cfs_shares                                              
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000656f3                                             
   0,01%  mainAsnaGcc  libc-2.23.so                             [.] 0x000000000007fd2e           

@brada4
Copy link
Contributor

brada4 commented Jun 6, 2017

sched_yield() makes system call, it is expensive for very short functions, you see _schedule is out of 2nd picture.
Look at #1051 for more inspiration.

@NonRepeat
Copy link
Author

Thank you for the info, here is the report of the execution, its still double of the mkl versi'on, but the sched removal has decreased the cost a little bit , around 0,20 from 4 seconds (now is executing between 3,95 and 4,05).

48,45%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_kernel                                                   
   9,29%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] inner_thread                                                   
   5,91%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_itcopy                                                   
   5,21%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019551                                                4,63%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001954f                                                3,48%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019553                                             
   3,47%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.6                                         
   2,89%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b9                                             
   2,82%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b7                                             
   2,78%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_kernel_8                                                  
   1,88%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bb                                             
   1,12%  mainAsnaGcc  mainAsnaGcc                              [.] dsub_x86._omp_fn.19                                            
   1,08%  mainAsnaGcc  mainAsnaGcc                              [.] ddiv_x86._omp_fn.17                                            
   1,01%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_beta                                                     
   0,83%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019557                                             
   0,45%  mainAsnaGcc  [kernel.kallsyms]                        [k] clear_page_c_e                                                 
   0,43%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193bf                                             
   0,36%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dcopy_k                                                        
   0,32%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.1                                     
   0,29%  mainAsnaGcc  mainAsnaGcc                              [.] dmult_x86._omp_fn.18                                           
   0,28%  mainAsnaGcc  mainAsnaGcc                              [.] dMin_x86._omp_fn.22                                            
   0,23%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.10                                        
   0,20%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] ddot_k                                                         
   0,14%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.7                                         
   0,10%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_queued_spin_lock_slowpath                               
   0,09%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000000a3d8                                             
   0,08%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000182ca                                             
   0,08%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_alloc                                              
   0,07%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dasum_k                                                        
   0,07%  mainAsnaGcc  libc-2.23.so                             [.] malloc                                                         
   0,06%  mainAsnaGcc  mainAsnaGcc                              [.] dtrans._omp_fn.15                                              
   0,05%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_oncopy                                                   
   0,05%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemm_incopy                                                   
   0,04%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019696                                             
   0,04%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001942c                                             
   0,04%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019620                                             
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key_refs.isra.12                                     
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_fastpath                                      
   0,03%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001946b                                             
   0,03%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_ilnncopy                                                 
   0,03%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dgemv_n                                                        
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] hash_futex                                                     
   0,03%  mainAsnaGcc  [kernel.kallsyms]                        [k] entry_SYSCALL_64_after_swapgs                                  
   0,03%  mainAsnaGcc  mainAsnaGcc                              [.] dasnaPar_cpu._omp_fn.8                                         
   0,03%  mainAsnaGcc  mainAsnaGcc                              [.] compInitialAtoms._omp_fn.2                                     
   0,03%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000193b4                                             
   0,03%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_iltncopy                                                 
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] futex_wake                                                     
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] update_blocked_averages                                        
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] perf_pmu_sched_task                                            
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] get_futex_key                                                  
   0,02%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065699                                             
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dtrsm_kernel_RN                                                
   0,02%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] LAPACKE_dtr_nancheck                                           
   0,02%  mainAsnaGcc  libc-2.23.so                             [.] 0x000000000007fee1                                             
   0,02%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000655f0                                             
   0,02%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065730                                             
   0,02%  mainAsnaGcc  libgomp.so.1.0.0                         [.] free@plt                                                       
   0,02%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000001954c                                             
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] do_futex                                                       
   0,02%  mainAsnaGcc  [kernel.kallsyms]                        [k] sys_futex                                                      
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006578d                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] try_to_wake_up                                                 
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065725                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000656a4                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] native_write_msr_safe                                          
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x00000000000170a9                                             
   0,01%  mainAsnaGcc  [kernel.kallsyms]                        [k] task_tick_fair                                                 
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x000000000000a3bc                                             
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019435                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065771                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065789                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000656f3                                             
   0,01%  mainAsnaGcc  libgomp.so.1.0.0                         [.] 0x0000000000019629                                             
   0,01%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] blas_memory_free                                               
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x00000000000655f9                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x0000000000065765                                             
   0,01%  mainAsnaGcc  libopenblas_sandybridgep-r0.2.20.dev.so  [.] dpotf2_L                                                       
   0,01%  mainAsnaGcc  libc-2.23.so                             [.] 0x000000000007fd2e                                             
   0,01%  mainAsnaGcc  libm-2.23.so                             [.] 0x000000000006564f           

@martin-frbg
Copy link
Collaborator

What is the (typical) matrix size in your test ? It is quite likely that MKL is a bit more sophisticated when it comes to picking the most appropriate algorithm. (For small sizes even the plain netlib implementation may have less overhead, and/or OpenBLAS may be splitting the workload on the "wrong" (smaller) index.)

@brada4
Copy link
Contributor

brada4 commented Jun 7, 2017

@martin-frbg observation is similar to 20-core broadwell (less threads, less yielding, less running time). I think one-to-all threading does not accommodate well the situation where all is a lot.

@NonRepeat
Copy link
Author

NonRepeat commented Jun 13, 2017

Sorry for the delay, i Ws traveling past week, @martin-frbg I am working with different matrix sizes, 750x130, 750x10000, 10000x130 and 10000x10000, some operations are vector matrix.

The code can be found in : https://gitlab.com/P.SanJuan/ASNA

@NonRepeat
Copy link
Author

Since the initial problem was solved, and the performance difference seems to be in normal parameters, I close the issue.

Thanks everyone for the help.

@brada4
Copy link
Contributor

brada4 commented Jun 19, 2017

Smallest is <1MB, middle are 10 and 60MB, biggest is 800MB
1st is under threading threshold, 2nd is above threading threshold but will not gain from multiple multi-core CPUs (!), last may or may not gain from lots of CPUs (do micro-benchmark, _gemm time is not input-dependent - maybe cpuset-ing it to one multi-core CPU improves matters)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants