My application extensively uses SGEMM kernels with sizes:
M=128, N=361, K=1152
M=32, N=361, K=288
(This is an im2col+SGEMM combo for DCNN computation)
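The three dimensions map onto the convolution in the usual im2col way. M is the number of output channels, N the number of output positions, and K is input channels × kernel height × kernel width. The sketch below assumes 3×3 kernels with stride 1 and zero padding 1 on a 19×19 grid (361 = 19 × 19, 1152 = 128 × 9, 288 = 32 × 9). That geometry is inferred from the sizes and is not stated in this issue. The naive `sgemm` helper is a hypothetical pure-Python stand-in for the BLAS call, not OpenBLAS's API.

```python
# Assumed layer geometry (inferred from the sizes, not stated in the issue):
# 19x19 grid, 3x3 kernels, stride 1, zero padding 1 (output size == input size).
BOARD = 19
KSIZE = 3
PAD = 1


def im2col(image, channels, size=BOARD):
    """Unfold a channels x size x size input (nested lists) into a K x N
    matrix, K = channels * KSIZE * KSIZE rows and N = size * size columns."""
    cols = []
    for c in range(channels):
        for ky in range(KSIZE):
            for kx in range(KSIZE):
                row = []
                for y in range(size):
                    for x in range(size):
                        iy, ix = y + ky - PAD, x + kx - PAD
                        inside = 0 <= iy < size and 0 <= ix < size
                        row.append(image[c][iy][ix] if inside else 0.0)
                cols.append(row)
    return cols


def sgemm(a, b):
    """Naive C = A * B on nested lists; a stand-in for the BLAS SGEMM call."""
    k, n = len(b), len(b[0])
    return [[sum(a_row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for a_row in a]


def conv3x3(image, weights, in_channels, size=BOARD):
    """Convolution as im2col + GEMM. `weights` has M rows (output channels),
    each of length K, flattened in the same (channel, ky, kx) order as im2col."""
    return sgemm(weights, im2col(image, in_channels, size))


def gemm_shape(in_channels, out_channels, size=BOARD):
    """(M, N, K) of the SGEMM issued for one such layer."""
    return out_channels, size * size, in_channels * KSIZE * KSIZE


print(gemm_shape(128, 128))  # (128, 361, 1152)
print(gemm_shape(32, 32))    # (32, 361, 288)
```

Only the shape bookkeeping is meant to carry over; running a full 128-channel layer through the pure-Python `sgemm` is far too slow to be a useful benchmark.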
Single-threaded OpenBLAS (the application itself is multithreaded), with OPENBLAS_CORETYPE=Haswell:
1000 predictions in 37.00 seconds -> 27 p/s
1000 evaluations in 4.29 seconds -> 233 p/s
Static build for Zen (dynamic dispatch is broken; see the previous issue):
1000 predictions in 40.23 seconds -> 24 p/s
1000 evaluations in 4.50 seconds -> 222 p/s
So performance drops by about 5% to 9%.
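The percentages follow directly from the timings above. This small check uses only the reported numbers and expresses the gap both as extra wall time and as lost throughput. Note that 1000 / 40.23 is about 24.9 items/s, so the "24 p/s" in the log looks truncated rather than rounded.

```python
# Wall-clock seconds per 1000 items, copied from the runs above:
# (OPENBLAS_CORETYPE=Haswell, static Zen build)
RUNS = {
    "predictions": (37.00, 40.23),
    "evaluations": (4.29, 4.50),
}


def slowdown(t_haswell, t_zen):
    """Return (% more wall time, % less throughput) of Zen vs Haswell."""
    more_time = (t_zen / t_haswell - 1.0) * 100.0
    less_throughput = (1.0 - t_haswell / t_zen) * 100.0
    return more_time, less_throughput


for name, (t_h, t_z) in RUNS.items():
    more_time, less_thr = slowdown(t_h, t_z)
    print(f"{name}: {1000 / t_h:.1f} -> {1000 / t_z:.1f} items/s, "
          f"+{more_time:.1f}% time, -{less_thr:.1f}% throughput")
# predictions: 27.0 -> 24.9 items/s, +8.7% time, -8.0% throughput
# evaluations: 233.1 -> 222.2 items/s, +4.9% time, -4.7% throughput
```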
In other words, "Zen" support in OpenBLAS actually makes performance on Zen worse than running the Haswell kernels.