Issue with LAPACKE_sgesvd() on custom compiled v0.3.7 for Win64 (new x86_64-w64-mingw32 might be the cause) #2297
Comments
You are correct - one has to copy gfortran redist from cross-build system, that
Does anything change if you use TARGET=OPTERON or OPTERON_SSE3? Between the mentioned releases of OpenBLAS and GCC (you got 8.2.0?), the default gfortran ABI changed slightly, and recent GCC got pickier about assembly registers. |
No, it's 8.3
I'll try other things you've mentioned and post updates later. |
No change
No change /* #define OPENBLAS_HAVE_3DNOW
#define OPENBLAS_HAVE_3DNOWEX defined in
No change
Kind of no change. For the same input (rng is seeded with the same constant) outputs
or in bytes
|
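For readers who want to compare such outputs bit for bit, here is a minimal sketch of how a single-precision value can be viewed as a DWORD and classified as NaN, infinity or denormal. The helper names (`float32_to_dword`, `dword_to_float32`, `describe`) are hypothetical and not part of OpenBLAS or of the reporter's code; only the Python standard library is assumed.

```python
import struct


def float32_to_dword(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def dword_to_float32(bits: int) -> float:
    """Inverse of float32_to_dword: reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def describe(bits: int) -> str:
    """Classify a single-precision bit pattern by its exponent and mantissa fields."""
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    mantissa = bits & 0x7FFFFF       # 23 mantissa bits
    if exponent == 0xFF:
        return "inf" if mantissa == 0 else "nan"
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormal"
    return "normal"
```

Running `describe` over a dumped output buffer makes it easy to spot whether the junk consists of NaNs, infinities or denormals rather than merely wrong finite values.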
If you compile 0.2.20 with Debian 10 - does it work? |
I use neither numpy nor Python at all; it's a purely C++ project. I mentioned the Python script just to give an idea of what kind of work gesvd() is doing in the project (Python is easier to comprehend than the C++ code I linked to).
Have just downloaded the code and tried to compile: compilation breaks in the middle (and I don't know how to fix it - is it possible to make the error messages more verbose? there isn't even a line number here...)
|
IIRC, the cgemm_cr problem with 0.2.20 was a "funny" quirk of mingw where it misinterprets the |
Seems so. Tried to compile with MXE env looks intimidating 😁 |
0.3.7 already integrates that fflag, so there will be no change. What matters is that the numpy build has to use that flag too. |
v0.3.0 also fails at the same point. v0.3.1 compiled successfully, but no change for the issue when I run my code. @brada4 Did you read my previous comment about python? Numpy is completely irrelevant to the issue. And btw, just tried to compute on doubles instead of floats using my base v0.3.7. |
oh... If only I had known ahead that it would take almost two days to switch to a newer version... But finally I made it work. I've just managed to install mingw-w64 from the previous Debian release. The compiler from there has version 6.3.0 and links to the same -3 fortran ABI .dll... Now both functions, - single precision |
It is unlikely to be a Fortran issue. |
Have now learned that it is possible to rebuild MXE with a more recent gcc, will provide an updated |
That would be a good thing, if it's not too burdensome for you. Regarding the issue... I personally think it's very important that the issue is easily reproducible, because that makes it much easier to find the root cause. If it's actually an issue in OpenBLAS, it could easily be fixed. If the compiler is to blame, it would be very beneficial for the whole community if a reproducible compiler bug report were created. Should I write a short isolated program that reproduces the issue exactly as I saw it, so that some of you (who are sufficiently familiar with the OpenBLAS code) could debug the library and find the real cause? |
If it is not too much trouble for you, an isolated test code would be great - that way it could hopefully be established if this is a bug in OpenBLAS or in recent mingw ports of gcc. |
No problem, I'll post it soon |
@martin-frbg please take a look at the repo https://github.com/Arech/sgesvd_tester Note that in the conventional FP mode sgesvd() from a buggy binary is actually able to catch and return an error (however, it still produces NaNs in the output). It silently returns success with junk in the output when the FP rounding mode is set to "round towards zero". A proper binary works great in both modes. Feel free to ping me if you need more info/help. |
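Because a buggy binary can report success while returning junk, a caller may want to sanity-check the singular values independently of the return code. The following is a minimal sketch, not taken from the sgesvd_tester repository; the function name `singular_values_look_valid` is hypothetical. It relies only on the documented LAPACK contract that sgesvd returns singular values that are non-negative and sorted in non-increasing order.

```python
import math
from typing import Sequence


def singular_values_look_valid(s: Sequence[float]) -> bool:
    """Return True if s is finite, non-negative and sorted in non-increasing order.

    This is a necessary condition for a correct SVD result, not a sufficient one:
    wrong but well-ordered values would still pass.
    """
    previous = math.inf
    for value in s:
        # NaN and +/-inf fail isfinite(); negative or out-of-order values fail the rest.
        if not math.isfinite(value) or value < 0.0 or value > previous:
            return False
        previous = value
    return True
```

A check like this is cheap compared with the decomposition itself and would have flagged the NaN-filled output described above even when the routine returned 0.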
@Arech and how would reference BLAS and MKL react to your FPU compliance diversions? |
@brada4 Andrew, I'd like you to answer me the following two questions first before I answer yours:
|
Your code changes the FPU flags only on the calling thread; the others remain in the default state. You need to change the FPCSR code in blas_server_win32.c to propagate your setting to all threads; that is a limitation of mingw32. Maybe then it works properly with strange rounding modes too, though they certainly produce results different from those under standard conditions in all cases. |
Makes me wonder if (the effect of) setting CONSISTENT_FPCSR=1 could/should be done automatically at compile time |
MKL propagates user-set FPU config to all threads. I am puzzled why here even single-threaded case was failing. |
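The point about per-thread FPU state can be illustrated by analogy. The sketch below does not touch the hardware FPU control word and is not OpenBLAS code; it uses Python's `decimal` module, whose arithmetic context (including the rounding mode) is likewise per-thread, to show that a setting made in the calling thread is not seen by a worker unless it is explicitly handed over. The function name `rounding_seen_by_worker` is hypothetical.

```python
import decimal
import threading


def rounding_seen_by_worker(propagate: bool) -> str:
    """Set 'round towards zero' in the calling thread and report what a worker thread sees."""
    seen = []
    with decimal.localcontext() as ctx:
        ctx.rounding = decimal.ROUND_DOWN          # caller-only setting
        caller_ctx = decimal.getcontext().copy()   # snapshot to hand over

        def worker() -> None:
            if propagate:
                # Explicit propagation, analogous to copying the caller's
                # FPU configuration into each worker thread.
                decimal.setcontext(caller_ctx.copy())
            seen.append(decimal.getcontext().rounding)

        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    return seen[0]
```

Without propagation the worker reports the default rounding mode; with it, the worker reports the caller's mode. By the comments above, this is the kind of hand-over that CONSISTENT_FPCSR=1 is meant to provide for the FPU state.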
Now I think I got your idea. If I understand it correctly, there is a possibility that:
Well... that is fair point. I was going to test it, but
Indeed. Setting env variables Now, regarding the non-standard FPU state and whether it's necessary to propagate it to worker threads... Unfortunately, I don't remember exactly why I chose to use this setting in my main codebase. According to a comment, it seems that it was intended to prevent NaNs from occurring in the vectorized form of the ::std::exp() function (that is possibly a quirk of either the compiler or my hardware, because denormals were already disabled at that moment, but NaNs kept occurring from ::std::exp() anyway). Therefore, as long as OpenBLAS functions do not produce NaNs (and it seems that setting So, to reiterate, for me personally (I don't know about other use cases) there's no need for OpenBLAS to support non-conventional FPU configs as long as it doesn't produce denormal numbers. What really bothers me is that even in a totally standard FPU config sgesvd() always converges when it was compiled with v6.3 and always fails when it was compiled with v8.3. Any ideas why it happens? |
This injustice is not fixable. Just as with netlib BLAS, an IEEE 754 conformant FPU is expected here, with all NaN signalling etc. For those who care less, denormals can be disabled. |
@brada4 What exactly is "this injustice" you're talking about? @martin-frbg Did you understand Andrew's point? Do you agree with him? |
Probably something lost in translation... @brada4 could you rephrase your comment ? |
Things like There is a shortcoming in that FPU modes are not distributed to the threads on each call, as MKL does. |
Have same problem, can I somehow obtain the configuration/flags of the MXE environment that @martin-frbg uses? |
Same problem as which? |
Same problem as in the OP's post. Custom compiled 0.3.7 produces weird results in Flag set:
|
What is your CPU? Like off CPU-Z screenshot will do. |
Here it is (Coffee Lake, I think?):
|
Tried with 0.3.13, flag set
DYNAMIC_ARCH=1
DYNAMIC_OLDER=1
CONSISTENT_FPCSR=1
TARGET=CORE2
BINARY=64
USE_THREAD=0
USE_LOCKING=1
NUM_THREADS=200
HOSTCC=gcc
CC=x86_64-w64-mingw32-gcc
FC=x86_64-w64-mingw32-gfortran
same freeze:
I see 12 threads (I have 12 logical cores) started by openblas.dll, which is very weird, as I have Each of them has this stack trace:
|
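To make the build from the comment above easier to reproduce, the flag set can be turned into a single make invocation. This is a sketch under the assumption that the flags were passed as make variables on the command line (the usual way to configure an OpenBLAS build); the commenter's actual invocation is not shown in the thread, and the names `FLAGS` and `make_command` are hypothetical.

```python
import shlex
from typing import Dict, List, Optional

# Flag set copied from the comment above (0.3.13 cross-build for Win64).
FLAGS: Dict[str, str] = {
    "DYNAMIC_ARCH": "1",
    "DYNAMIC_OLDER": "1",
    "CONSISTENT_FPCSR": "1",
    "TARGET": "CORE2",
    "BINARY": "64",
    "USE_THREAD": "0",
    "USE_LOCKING": "1",
    "NUM_THREADS": "200",
    "HOSTCC": "gcc",
    "CC": "x86_64-w64-mingw32-gcc",
    "FC": "x86_64-w64-mingw32-gfortran",
}


def make_command(flags: Dict[str, str], target: Optional[str] = None) -> List[str]:
    """Build an argument list of the form: make KEY=VALUE ... [target]."""
    command = ["make"] + [f"{key}={value}" for key, value in flags.items()]
    if target is not None:
        command.append(target)
    return command


def as_shell_line(command: List[str]) -> str:
    """Render the argument list as one shell-quoted line for copy and paste."""
    return " ".join(shlex.quote(part) for part in command)
```

Keeping the flags in one place makes it straightforward to vary a single setting (for example USE_THREAD) between test builds while leaving everything else identical.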
Strange behaviour from fortran. |
@brada4 this helped, I no longer see
So there's nothing in other threads it could wait for, that's for sure. Some weird issue with |
Threads should not have happened at all in the first place. KeWhatever are driver functions; SBDSQR will just make use of a few library calls down the call chain and would not even read a file or display a pixel to touch OS drivers, let alone kernel mode. |
Very strange. All code that queries the environment for the number of threads at runtime should already be guarded by |
I've run |
Note: I didn't use |
Is your own test code multithreaded by any chance ? (Still trying to understand where the ntoskernel calls come from - SGESVD/SBDSQR is plain old single-threaded fortran from the netlib reference implementation, it will call the OpenBLAS SROT kernel which on x86_64 will try to run parallel tasks - but again only |
Thread backtraces would diverge wildly if outer threads called into the library. |
hmm. can you upload your Makefile.conf and config.h please ? Maybe one of these generated files contains a clue. And if you are building the 0.3.13 tag and not current |
Sure.
Yes, I've been building from |
Please disregard my last comments, looks like I was still copying from old location with 0.3.7. I'll get proper tests shortly. |
I don't see freezes now, but sgesvd output still doesn't match what was in binaries from MXE. Rogue threads problem also remains, and I'll try more tests with different flag sets and building from MSYS2 MinGW 5.x/6.x/8.x/10.x and Linux MinGW 5.x/6.x/8.x/10.x. |
probably you need to install |
What if you build the multi-processor version and constrain it with environment variables? Those Fortran-borne threads are not expected. |
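One way to try this suggestion is to launch the test program with the thread count capped through the environment. The sketch below assumes a multithreaded OpenBLAS build, which reads the `OPENBLAS_NUM_THREADS` environment variable at startup; per the comments above, a USE_THREAD=0 build should not consult it. The helper names `openblas_env` and `run_limited` are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional


def openblas_env(num_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OPENBLAS_NUM_THREADS set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ if base is None else base)  # never mutate the caller's mapping
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env


def run_limited(command: List[str], num_threads: int = 1) -> int:
    """Run command with the OpenBLAS thread count capped and return its exit code."""
    return subprocess.run(command, env=openblas_env(num_threads)).returncode
```

The variable has to be in place before the library is loaded, which is why the sketch sets it for a child process instead of changing it from inside an already running program.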
Ok, so here's the thing: in my case it boils down to Now, I always have Should I create another ticket for this issue? Don't want piggybacking here as in the end my issue seems to be unrelated to the OP's post. |
I checked and it's installed. There are both |
PS Keep this ticket, the USE_THREAD |
Looks like AVX disassembly to me. Intermediate details: Exception info
|
Is it possible to expand the libopenblas.dll .text section to see which function EIP is in? Like double-clicking on the section or so.
Also, for the 3 calls before the crash - do you see the args parsed, so as to match these three functions? I think it hits a guard page before allocation? @martin-frbg ? |
No idea so far, perhaps need a build with DEBUG=1 (or |
Tried building with
(It shows different |
Looking more and more like another instance of "BUFFERSIZE too small for the desired GEMM_P/Q/R so we write past the end of the GEMM buffer" |
Wild guess - could you try changing the |
Didn't help; I'm now trying to build more or less reproducible example and attach it here along with debug build of the library. |
Probably worth setting breakpoint at s/dgemm entry and just getting out size arguments, code path is not changed by the rest of content in matrices. |
Hi there.
I have a wrapper over LAPACKE_sgesvd() that works well with the supplied binary v0.2.19/20, custom compiled v0.2.20 and the supplied binary v0.3.7. However, the code doesn't work well with v0.3.7 compiled (with the same options as I used for v0.2.20) on a fresh Debian 10 with fresh compilers CC=x86_64-w64-mingw32-gcc FC=x86_64-w64-mingw32-gfortran (I'll describe the details of the compilation process later).
Testing environment: Windows 7 x64 with the latest updates on an AMD Phenom II X6 1090.
The code in question is the same as the following python script:
It takes a random N(0,1) matrix and performs SVD on it. Here are the first few floats of a sample input (colmajor matrix 64*785):
All tested versions of LAPACKE_sgesvd() work great, except the custom compiled v0.3.7, which, despite returning success (0), outputs the following junk:
Or in DWORDS
Note that I had to do a custom compilation, because the supplied binary still doesn't use the CONSISTENT_FPCSR=1 switch and I eventually get a lot of NaNs that seriously slow the computation down.
The compilation process is basically the same as described in the issue linked above ( #1237 ). I installed a fresh Debian 10 on a virtual machine, did all the boilerplate
and then ran
to obtain a kind of distro in /opt/OpenBLAS. Then I copy the contents of /opt/OpenBLAS to the Windows system, compile my project against it, copy libopenblas.dll to my exe's folder and get the issue with LAPACKE_sgesvd() when I run my code.
Note that I haven't found any issues with the CBLAS routines I use (mainly gemm, syrk and symm). Moreover, I'm glad to see some performance improvement over the older v0.2.20.
I've noticed one suspicious difference between the supplied binary libopenblas.dll and my compiled version. The supplied binary depends on libgfortran-3.dll and works great with a very old version of this lib dated 21.10.2014 (AFAIR I got it from some .zip on the project's SourceForge page long ago). However, the custom compiled version depends on the libgfortran-5.dll file, which I had to take (with all other necessary .dll dependencies) from Debian's installation folder /lib/gcc/x86_64-w64-mingw32/8.3-win32.
Any ideas how to fix the issue?
It is probably worth trying to change the compiler to some older version; however, I don't know how to do it (I'm a foreigner in the Linux world). Could someone please explain it a little if the idea is worth trying?