nan in svd N for Haswell core #3318
Comments
Which OpenBLAS version is this, please?
@mattip - I'm very sorry - I lost track of the versions - this is for Numpy 1.21.0 - what version of OpenBLAS is this? What is the best way to test against OpenBLAS 0.3.17?
Guessing 1.21 would be using 0.3.15, so probably no improvement from going to 0.3.17. If it only happens on Haswell, it is less likely to be a bug in LAPACK, but the key to the different behaviour with and without UV (if that is what I think it is) may be that LAPACK DGESDD uses a different algorithm when the singular vectors are needed.
Wait for a build with PR numpy/numpy#19492 to appear on https://anaconda.org/scipy-wheels-nightly/numpy/files. Since we are revising the wheel building repo to use 64-bit interfaces, the builds are quite active now. Usually they only update once a week or so, on Sundays. You can check the OpenBLAS version with this code: https://github.com/numpy/numpy/blob/main/tools/openblas_support.py#L294
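For readers who do not want to open the linked helper, here is a minimal sketch of the same idea: ask the OpenBLAS library that NumPy loads for its configuration string and pull the version out of it. Several things here are assumptions rather than facts from this thread: the symbol names `openblas_get_config64_` and `openblas_get_config` (they depend on how the wheel was built, and newer builds may use other names), the module paths tried, and the layout of the returned string. `parse_openblas_config` and `query_openblas_config` are hypothetical helper names.

```python
import ctypes
import importlib
import re


def parse_openblas_config(config):
    """Split a config string into (version, flags).

    Assumed layout (not guaranteed): "OpenBLAS <version> <flag> <flag> ...".
    Returns (None, []) if the string does not look like that.
    """
    match = re.match(r"OpenBLAS\s+(\S+)\s*(.*)", config)
    if match is None:
        return None, []
    return match.group(1), match.group(2).split()


def query_openblas_config():
    """Best-effort lookup of the config string of NumPy's OpenBLAS.

    Returns None if NumPy is not installed or no known symbol is found.
    """
    # Module paths differ between NumPy releases; both are tried.
    for module_name in ("numpy._core._multiarray_umath",
                        "numpy.core._multiarray_umath"):
        try:
            module = importlib.import_module(module_name)
            dll = ctypes.CDLL(module.__file__)
        except (ImportError, OSError, AttributeError, TypeError):
            continue
        # Symbol names are an assumption; they depend on the build.
        for symbol in ("openblas_get_config64_", "openblas_get_config"):
            try:
                get_config = getattr(dll, symbol)
            except AttributeError:
                continue
            get_config.restype = ctypes.c_char_p
            return get_config().decode("utf8")
    return None
```

Calling `parse_openblas_config(query_openblas_config() or "")` then gives the version and the build flags when the lookup succeeds, and `(None, [])` when it does not. On builds made with runtime core detection, the flags are expected to include the core name that was selected.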
@martin-frbg - from @mattip's hint, the version tested above was:
Testing with the latest Numpy development build as per Matti's instructions (
This gives the same outputs for the SVDs (5 nan values for Haswell, 0 for the others).
So no recent regression at least ... the last relevant changes to Haswell (GEMM kernels) were around 0.3.8, I think... Will try to swap them back as soon as I have reproduced the problem on my hardware. (Probably not today though)
Bisected to abef2ea "Move -fma option setting to kernel/Makefile.L1", i.e. it is a problem with where and when to apply a compiler option that is (theoretically) required only for one or two BLAS kernels that use FMA in the form of intrinsics. (This also means it has been broken since the 0.3.13.dev snapshot of December 17.) No immediate idea how to handle this, as using
Interim conclusion is that the Haswell
Thanks very much for tracking that one down.
Wild guess: what happens if you compile (GCC) with
Thanks for the suggestion. Unfortunately this does not address the issue of FMA inexplicably being needed for some files (possibly just one) that do not use intrinsics. (My PR fixes that case for gmake builds, but changing the cmake build is much less straightforward.)
Thanks, I know that option, but the cmake rules for the kernel files are themselves generated by a cmake script...
Just a question: why is FMA inexplicably needed? For performance reasons?
See earlier in this thread... #3318 (comment) #3318 (comment)
Which file do you expect to compile with
DGEMV_T not DGEMV_N. And the "microk" is
This problem was already present in 0.2.15, the version that introduced the dgemv_t microkernel for Haswell. I fail to identify any formal problems with the inline assembly, and unfortunately I do not see where
Please forgive me for this bug-report - coming from Numpy - but I think we've hit a bug in the singular value calculation for the Haswell core type - see numpy/numpy#19473
Specifically, only on the Haswell core, we are getting unexpected nan values in the case where we compute the singular values without the UV matrices. The files to reproduce in Numpy are here: svd_check.zip
The check file is svd_check.py in the archive - and contains:
We run with many cores using the test_svds.sh script:
This gives 5 nan values for the Haswell core, and 0 otherwise. If we compute the UV matrices at the same time, we never get nan in the SVs, regardless of core.
Could this be a bug in the SV-only computation in OpenBLAS?
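The contents of svd_check.py and test_svds.sh are not reproduced in this thread, so the following is only a sketch of what such a check might look like, not the original scripts. It assumes a hypothetical input file holding a stack of 2-D matrices in `.npy` format, and it assumes an OpenBLAS build with runtime core selection, where the `OPENBLAS_CORETYPE` environment variable forces a particular kernel set. The core names in the usage note and the helper names `env_for_core` and `run_check` are illustrative.

```python
import os
import subprocess
import sys

# Child program, run once per core type. It counts NaNs in the singular
# values computed without U/V and with U/V, and prints both counts.
# The data file path is passed as the first command-line argument.
CHECK_SOURCE = """
import sys
import numpy as np

matrices = np.load(sys.argv[1])
without_uv = sum(int(np.isnan(np.linalg.svd(m, compute_uv=False)).sum())
                 for m in matrices)
with_uv = sum(int(np.isnan(np.linalg.svd(m)[1]).sum()) for m in matrices)
print(without_uv, with_uv)
"""


def env_for_core(core, base_env=None):
    """Return a copy of the environment with OPENBLAS_CORETYPE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_CORETYPE"] = core
    return env


def run_check(core, data_path):
    """Run the check in a fresh interpreter so the core choice takes effect.

    Returns (nan_count_without_uv, nan_count_with_uv).
    """
    result = subprocess.run(
        [sys.executable, "-c", CHECK_SOURCE, data_path],
        env=env_for_core(core),
        capture_output=True,
        text=True,
        check=True,
    )
    without_uv, with_uv = (int(x) for x in result.stdout.split())
    return without_uv, with_uv
```

A loop such as `for core in ("Haswell", "Sandybridge"): print(core, run_check(core, "matrices.npy"))` would then compare cores. A separate process per core is used because the core type is chosen when the library is loaded, so changing the variable inside an already running interpreter is not expected to have any effect.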