Open
Description
When I compute C = transpose(A) * B, I see no speedup with OpenBLAS from calling cblas_dgemm with the flags that mark A as transposed, compared to transposing manually. By manual transposition I mean allocating a new matrix D, filling in D(i,j) = A(j,i), and only then computing C = D * B via cblas_dgemm. I observe the same behavior for A * transpose(B). With MKL BLAS, on the other hand, the first option (no manual transposition) runs 50% faster. Is this because OpenBLAS makes hidden copies when transA and transB are set to CblasTrans?
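A minimal sketch of the two variants being compared, for anyone who wants to reproduce this. It assumes row-major storage and square-free dimensions chosen by the caller; the function names `gemm_at_b_flag` and `gemm_at_b_manual` and the `USE_CBLAS` macro are hypothetical and not taken from the original code. Without `-DUSE_CBLAS` a naive triple loop stands in for the library call so the file compiles on its own; that fallback only checks that both variants give the same result and says nothing about performance or about what OpenBLAS does internally.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef USE_CBLAS
#include <cblas.h>   /* build with e.g. -DUSE_CBLAS -lopenblas */
#endif

/* Variant 1: C (m x n) = A^T * B, letting the BLAS handle the transpose.
   A is stored row-major as k x m, B as k x n, C as m x n. */
void gemm_at_b_flag(int m, int n, int k,
                    const double *A, const double *B, double *C)
{
    assert(m > 0 && n > 0 && k > 0);
#ifdef USE_CBLAS
    /* lda is the row length of A as stored (k x m), i.e. m. */
    cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                m, n, k, 1.0, A, m, B, n, 0.0, C, n);
#else
    /* Naive stand-in (assumption: only used when no BLAS is linked). */
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += A[p * m + i] * B[p * n + j];
            C[i * n + j] = s;
        }
#endif
}

/* Variant 2: allocate D = A^T explicitly, then C = D * B without flags.
   Returns 0 on success, -1 if the temporary could not be allocated. */
int gemm_at_b_manual(int m, int n, int k,
                     const double *A, const double *B, double *C)
{
    assert(m > 0 && n > 0 && k > 0);
    double *D = malloc((size_t)m * (size_t)k * sizeof *D);
    if (D == NULL)
        return -1;

    /* D(i,p) = A(p,i); D is m x k in row-major order. */
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            D[i * k + p] = A[p * m + i];

#ifdef USE_CBLAS
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, D, k, B, n, 0.0, C, n);
#else
    /* Same naive stand-in as above, now reading the explicit transpose. */
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < k; ++p)
                s += D[i * k + p] * B[p * n + j];
            C[i * n + j] = s;
        }
#endif
    free(D);
    return 0;
}
```

Timing each function separately over large matrices (and several repetitions) would show whether the flag-based call is any faster than the manual transposition on a given BLAS build.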