ZEN kernels perform worse, give wrong results, compared to HASWELL kernels on Zen/Ryzen

My application extensively uses SGEMM kernels with sizes:

M=128, N=361, K=1152
M=32, N=361, K=288

(This is an im2col+SGEMM combo for DCNN computation)
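To put the two shapes in perspective, here is a rough sketch of the arithmetic work per call. It assumes the usual count of one multiply and one add per inner-product term (2*M*N*K); the helper name `sgemm_flops` is made up for illustration and is not part of OpenBLAS.

```python
def sgemm_flops(m, n, k):
    """Floating-point operations for one C = A*B SGEMM call (multiply + add per term)."""
    return 2 * m * n * k

# The two problem sizes quoted above.
shapes = [(128, 361, 1152), (32, 361, 288)]

for m, n, k in shapes:
    mflop = sgemm_flops(m, n, k) / 1e6
    print(f"M={m} N={n} K={k}: {mflop:.1f} MFLOP per call")
```

The larger shape does about 16 times the work of the smaller one, so kernel efficiency on it should dominate the total time if both are called equally often.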
OpenBLAS is used single-threaded (the application itself is multithreaded). With OPENBLAS_CORETYPE=Haswell:

1000 predictions in 37.00 seconds -> 27 p/s
1000 evaluations in  4.29 seconds -> 233 p/s
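For anyone wanting to repeat the comparison, a minimal sketch of forcing a kernel set per run is shown here. It assumes an OpenBLAS build with DYNAMIC_ARCH enabled, since OPENBLAS_CORETYPE is only consulted by such builds. The helper `run_with_coretype` and the probe command are hypothetical stand-ins; replace the probe with the real benchmark binary.

```python
import os
import subprocess
import sys

def run_with_coretype(cmd, coretype):
    """Run cmd with OPENBLAS_CORETYPE set in the child environment only."""
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)

# Stand-in child process that only echoes the variable it received.
probe = [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_CORETYPE'])"]

for coretype in ("Haswell", "Zen"):
    result = run_with_coretype(probe, coretype)
    print(coretype, "->", result.stdout.strip())
```

Setting the variable only in the child environment keeps the parent shell clean, so consecutive runs with different core types cannot leak into each other.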

Static build for Zen (see previous issue; dynamic dispatch is broken):
1000 predictions in 40.23 seconds -> 24 p/s
1000 evaluations in  4.50 seconds -> 222 p/s

So in these runs performance drops by roughly 5% (evaluations) to 9% (predictions).
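As a cross-check, the relative change can be computed directly from the four timings quoted above. This is only a sketch of the arithmetic; the function name `slowdown_pct` is made up for illustration.

```python
def slowdown_pct(baseline_s, candidate_s):
    """Percent increase in wall time of candidate over baseline."""
    return (candidate_s / baseline_s - 1.0) * 100.0

# Seconds per 1000 runs, copied from the post: (Haswell kernels, Zen kernels).
timings = {
    "predictions": (37.00, 40.23),
    "evaluations": (4.29, 4.50),
}

for name, (haswell_s, zen_s) in timings.items():
    pct = slowdown_pct(haswell_s, zen_s)
    print(f"{name}: Zen build takes {pct:.1f}% longer "
          f"({1000 / haswell_s:.1f} -> {1000 / zen_s:.1f} per second)")
```

For these two runs the wall-time increase works out to about 8.7% for predictions and 4.9% for evaluations.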

So, "Zen" support in OpenBLAS actually worsens performance on Zen.

ZEN kernels perform worse, give wrong results, compared to HASWELL kernels on Zen/Ryzen #1147
