
Empirical workaround for numpy SVD NaN problem from issue 3318 #3320


Merged on Jul 22, 2021 (3 commits)

Conversation

martin-frbg
Collaborator

@martin-frbg martin-frbg commented Jul 18, 2021

more Voodoo than fix, actual problem probably just papered over by the -mfma option
fixes #3318 (for now)

@matthew-brett
Contributor

Please forgive my ignorance about the testing framework, but is it practical to add a test to exercise this, perhaps with the original array from #3318?

@martin-frbg
Collaborator Author

Only if I manage to convert it to C. Right now I am still wondering what this is trying to tell me (or if it is just a persistent gcc bug).
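Until a C reproducer exists, the kind of check being asked for could be sketched on the Python side. This is only an illustration, not part of the PR: the helper names `nonfinite_indices` and `check_svd_finite` are made up here, and the assumption that `compute_uv=False` exercises the same "svd N" code path named in the issue title has not been verified against #3318.

```python
import math


def nonfinite_indices(values):
    """Return the positions of NaN or infinite entries in a flat sequence."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


def check_svd_finite(matrix):
    """Compute singular values only and fail if any of them is NaN or inf.

    Hypothetical helper: assumes NumPy is installed and linked against the
    OpenBLAS build under test.
    """
    import numpy as np  # imported lazily so nonfinite_indices has no dependencies

    a = np.asarray(matrix, dtype=np.float64)
    # compute_uv=False requests singular values without the U/V factors
    s = np.linalg.svd(a, compute_uv=False)
    bad = nonfinite_indices(s.tolist())
    assert not bad, f"non-finite singular values at indices {bad}"
    return s
```

The array from #3318 is not reproduced in this thread, so it would have to be supplied separately, for example via `check_svd_finite(np.load("issue_3318.npy"))`, where the file name is a placeholder.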

@carlkl

carlkl commented Jul 19, 2021

@martin-frbg, with GCC one would very much prefer -march= options over -msse2, -mavx and so forth. For example, -mavx2 does not switch on -mfma; you have to use -march=haswell (corrected). The -march= flags control many more instructions.

This behaviour differs from clang and MSVC. On the other hand, with GCC you have fine control over the instruction set.

I'm able to prepare a PR for that if desired.
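The -mavx2 versus -march=haswell point can be checked locally by asking the compiler which feature macros a given set of flags defines. The following is a minimal sketch, assuming a `gcc` executable on the PATH; `parse_macros` and `gcc_macros` are illustrative names, not part of OpenBLAS or its build system.

```python
import subprocess


def parse_macros(dump):
    """Collect macro names from '#define NAME VALUE' lines of a macro dump."""
    names = set()
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def gcc_macros(flags, compiler="gcc"):
    """Return the predefined macro names for the given compiler flags.

    Runs the preprocessor on empty input; -dM -E makes it print every
    predefined macro instead of preprocessed source.
    """
    result = subprocess.run(
        [compiler, *flags, "-dM", "-E", "-x", "c", "-"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_macros(result.stdout)
```

Comparing `"__FMA__" in gcc_macros(["-mavx2"])` with `"__FMA__" in gcc_macros(["-march=haswell"])` should show the difference described above, provided the installed GCC behaves as stated in the comment.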

@martin-frbg
Collaborator Author

I am not convinced, as the original issue appeared to have been caused by applying -mfma to "too many" files, so I do not yet think an even less fine-grained application of options would help.

@mattip
Contributor

mattip commented Jul 21, 2021

Stupid question that probably is already present in testing: is there a way to check that the xmm/ymm registers are being properly saved/restored and wrap the kernel calls with it for testing?

@martin-frbg
Collaborator Author

Unfortunately not, so far it has always been retroactive fixing as things blew up with the next smarter release of gcc and/or somebody shouting "oi, you need to save that". (I do believe most such bugs have been fixed in my time here, but I fear I still know just enough assembly to be dangerous. And actually the last change to the Haswell DGEMV microkernel has been to proactively save all xmm/ymm registers instead of just those directly touched by the code - but reverting that had no bearing on the issue)

@mattip
Contributor

mattip commented Jul 21, 2021

A wild theory (I have lots of these): maybe it is outside the kernel, and somewhere else someone is not restoring a (different?) register. By adding the flag the registers are used differently and so the problem does not appear. Not sure what would be the easiest way to verify this and how much effort it would be.

Successfully merging this pull request may close these issues.

nan in svd N for Haswell core
4 participants