Performance of dgemm #1840

Open
@jmsargado

When I compute C = transpose(A)*B with OpenBLAS, I see no speedup from calling cblas_dgemm with the flags that mark A as transposed, compared with doing the transposition manually, i.e. allocating memory for D = transpose(A), filling in D(i,j) = A(j,i), and only then computing C = D*B via cblas_dgemm. I observe the same behavior for A*transpose(B). With MKL BLAS, on the other hand, the first option (no manual transposition) runs 50% faster. Is this because OpenBLAS makes hidden copies when transA and transB are set to CblasTrans?
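
For concreteness, the two variants being compared can be sketched roughly as below. This is only an illustration, assuming row-major storage, alpha = 1 and beta = 0, with A stored as a k x m array so that transpose(A) is m x k. `ref_gemm_rowmajor`, `atb_with_flag` and `atb_with_copy` are hypothetical names; the first is a naive stand-in so the snippet compiles without linking a BLAS, and the comments show the `cblas_dgemm` call each variant would correspond to.

```c
#include <stdlib.h>

/* Hypothetical stand-in for cblas_dgemm, restricted to row-major storage,
   alpha = 1, beta = 0, and B never transposed.
   Computes C (m x n) = op(A) (m x k) * B (k x n), where op(A) = A^T if
   trans_a is nonzero and op(A) = A otherwise. */
static void ref_gemm_rowmajor(int trans_a, int m, int n, int k,
                              const double *a, int lda,
                              const double *b, int ldb,
                              double *c, int ldc)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                /* With trans_a set, element (i,p) of op(A) is A(p,i). */
                double aip = trans_a ? a[p * lda + i] : a[i * lda + p];
                sum += aip * b[p * ldb + j];
            }
            c[i * ldc + j] = sum;
        }
    }
}

/* Variant 1: let the GEMM routine handle the transpose.
   A is stored k x m (leading dimension m). With a real BLAS this would be:
   cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
               m, n, k, 1.0, a, m, b, n, 0.0, c, n); */
void atb_with_flag(int m, int n, int k,
                   const double *a, const double *b, double *c)
{
    ref_gemm_rowmajor(1, m, n, k, a, m, b, n, c, n);
}

/* Variant 2: allocate D = transpose(A) explicitly, then multiply.
   Returns 0 on success, -1 if the allocation fails.
   With a real BLAS the multiply would be:
   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
               m, n, k, 1.0, d, k, b, n, 0.0, c, n); */
int atb_with_copy(int m, int n, int k,
                  const double *a, const double *b, double *c)
{
    double *d = malloc((size_t)m * (size_t)k * sizeof *d);
    if (d == NULL)
        return -1;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < k; ++j)
            d[i * k + j] = a[j * m + i];   /* D(i,j) = A(j,i) */
    ref_gemm_rowmajor(0, m, n, k, d, k, b, n, c, n);
    free(d);
    return 0;
}
```

Both functions produce the same C; the only difference is whether the caller or the GEMM routine deals with the transposed layout, which is where the timing difference between the two libraries would show up.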
