Empirical workaround for numpy SVD NaN problem from issue 3318 #3320
Please forgive my ignorance about the testing framework, but is it practical to add a test to exercise this, perhaps with the original array from #3318?
Only if I manage to convert it to C. Right now I am still wondering what this is trying to tell me (or if it is just a persistent gcc bug).
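For illustration only, a C regression check along the lines discussed above might look like the sketch below. It does not reproduce the array from #3318 and does not call OpenBLAS. The `gemv_fn` signature, the function names and the `MAX_ROWS` limit are all hypothetical, and the routine under test is assumed to be a row-major matrix-vector product that a real test would supply by wrapping the library call. The idea is simply to compare a candidate routine against a naive reference and fail on any NaN.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

#define MAX_ROWS 64 /* hypothetical upper bound for this sketch */

/* Hypothetical signature for the routine under test: y = A * x,
   with A stored row-major as m rows by n columns. */
typedef void (*gemv_fn)(size_t m, size_t n, const double *a,
                        const double *x, double *y);

/* Naive reference implementation used as ground truth. */
static void gemv_reference(size_t m, size_t n, const double *a,
                           const double *x, double *y)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Deliberately faulty stand-in that mimics the symptom: NaN in the output. */
static void gemv_nan_stub(size_t m, size_t n, const double *a,
                          const double *x, double *y)
{
    (void)n; (void)a; (void)x;
    for (size_t i = 0; i < m; i++)
        y[i] = NAN;
}

/* Returns 0 if the candidate output is NaN-free and within tol of the
   reference in every element, 1 otherwise. */
static int check_gemv(gemv_fn candidate, size_t m, size_t n,
                      const double *a, const double *x, double tol)
{
    double expected[MAX_ROWS], actual[MAX_ROWS];

    assert(m <= MAX_ROWS);
    gemv_reference(m, n, a, x, expected);
    candidate(m, n, a, x, actual);

    for (size_t i = 0; i < m; i++) {
        double diff = actual[i] - expected[i];
        if (isnan(actual[i]))
            return 1;
        if (diff < -tol || diff > tol)
            return 1;
    }
    return 0;
}
```

In a real test the placeholder input would be replaced by the failing data, and `candidate` would be a thin wrapper around the library routine being exercised. A check of this kind only detects the symptom and says nothing about which register, if any, was clobbered.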
@martin-frbg, using GCC one would very much prefer to use This behaviour is different to clang and MSVC. On the other hand, with GCC you have fine control over the instruction set. I'm able to prepare a PR for that if desired.
I am not convinced as the original issue appeared to have been caused by applying
Stupid question that probably is already present in testing: is there a way to check that the xmm/ymm registers are being properly saved/restored and wrap the kernel calls with it for testing?
Unfortunately not. So far it has always been retroactive fixing, as things blew up with the next smarter release of gcc and/or somebody shouted "oi, you need to save that". (I do believe most such bugs have been fixed in my time here, but I fear I still know just enough assembly to be dangerous. In fact, the last change to the Haswell DGEMV microkernel was to proactively save all xmm/ymm registers instead of just those directly touched by the code, but reverting that had no bearing on the issue.)
A wild theory (I have lots of these): maybe the problem is outside the kernel, and somewhere else something is not restoring a (different?) register. Adding the flag changes how the registers are used, so the problem does not appear. Not sure what would be the easiest way to verify this or how much effort it would take.
more Voodoo than fix, actual problem probably just papered over by the -mfma option
fixes #3318 (for now)