Performance of dgemm

When I perform calculations of the type C = (transpose(A))*B I noticed that with openBLAS I don't gain any speedup when I call cblas_dgemm with the right flags to indicate that A is transposed, compared to if I just do a manual transposition myself, i.e. D = transpose(A) which involves a memory allocation for D followed by manually filling in D(i,j)=A(j,i), and only then C = D*B via cblas_dgemm. I observe the same behavior when doing A*(transpose(B)). On the other hand using MKL BLAS, the first option which doesn't do a manual transposition results in 50% faster execution. Is this because OpenBLAS makes hidden copies when transA and transB are set to CBlasTrans?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance of dgemm #1840

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Performance of dgemm #1840

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions