Description
Hi there.
I have a wrapper over LAPACKE_sgesvd() that works well with the supplied binaries v0.2.19/v0.2.20, a custom-compiled v0.2.20, and the supplied binary v0.3.7. However, the code doesn't work with v0.3.7 when I compile it myself (with the same options I used for v0.2.20) on a fresh Debian 10 with fresh compilers: CC=x86_64-w64-mingw32-gcc FC=x86_64-w64-mingw32-gfortran
(I describe the compilation process in detail below.)
Testing environment: Windows 7 x64 with latest updates on AMD Phenom II X6 1090.
The code in question is equivalent to the following Python method:
def sample(self, shape):
    if len(shape) < 2:
        raise RuntimeError("Only shapes of length 2 or more are "
                           "supported.")
    flat_shape = (shape[0], np.prod(shape[1:]))
    a = get_rng().normal(0.0, 1.0, flat_shape)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    # pick the one with the correct shape
    q = u if u.shape == flat_shape else v
    q = q.reshape(shape)
    return floatX(self.gain * q)
It takes a random N(0,1) matrix and performs SVD on it. Here are the first few floats of a sample input (column-major matrix, 64*785):
0x000000000D4C00C0 -0.909242570 -0.741646349 -0.169360474 -0.177789196
0x000000000D4C00D0 -0.341049701 -0.345561802 -0.421100467 -0.359291792
0x000000000D4C00E0 -0.0570527203 1.12855971 -1.45928419 0.384212315
...
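For reference, here is a minimal standalone sketch of the same operation in plain NumPy, together with a check of what a valid result should look like. The names `orthogonal_sample` and `looks_valid`, the fixed seed, and the tolerance are illustrative assumptions and are not part of my wrapper; NumPy's own LAPACK backend is used here, not my build.

```python
import numpy as np

def orthogonal_sample(shape, gain=1.0, seed=0):
    """Standalone variant of the method above (seed and gain are illustrative)."""
    if len(shape) < 2:
        raise RuntimeError("Only shapes of length 2 or more are supported.")
    flat_shape = (shape[0], int(np.prod(shape[1:])))
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 1.0, flat_shape).astype(np.float32)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    # pick the factor with the correct shape
    q = u if u.shape == flat_shape else v
    return (gain * q.reshape(shape)).astype(np.float32)

def looks_valid(q, tol=1e-4):
    """True if q is finite and its rows (or columns) are orthonormal."""
    q2 = q.reshape(q.shape[0], -1).astype(np.float64)
    if not np.all(np.isfinite(q2)):
        return False
    rows, cols = q2.shape
    # use the smaller Gram matrix: rows are orthonormal when rows <= cols
    gram = q2 @ q2.T if rows <= cols else q2.T @ q2
    return bool(np.allclose(gram, np.eye(gram.shape[0]), atol=tol))

q = orthogonal_sample((64, 785))
print(looks_valid(q))
```

A plain finiteness test is not enough on its own, because saturated values near the float32 limit are still finite; the orthonormality test is what rejects them.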
All tested versions of LAPACKE_sgesvd() work great, except the custom-compiled v0.3.7, which, despite returning success (0), outputs the following junk:
0x000000000D4C00C0 -3.40282347e+38 -3.40282347e+38 0.000000000 0.000000000
0x000000000D4C00D0 3.40282347e+38 3.40282347e+38 -3.40282347e+38 3.40282347e+38
0x000000000D4C00E0 -3.40282347e+38 0.000000000 0.000000000 -3.40282347e+38
...
Or, as DWORDs:
0x000000000D4C00C0 ff7fffff ff7fffff 00000000 00000000
0x000000000D4C00D0 7f7fffff 7f7fffff ff7fffff 7f7fffff
0x000000000D4C00E0 ff7fffff 00000000 00000000 ff7fffff
...
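These bit patterns are not arbitrary: 0x7f7fffff and 0xff7fffff are the IEEE 754 single-precision encodings of +FLT_MAX and -FLT_MAX, so every non-zero output value is saturated to the largest finite float. A short sketch to confirm the decoding (the helper name `dword_to_float` is just for illustration):

```python
import struct

def dword_to_float(dword):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", dword))[0]

# Largest finite float32: (2 - 2^-23) * 2^127
FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127

for d in (0xFF7FFFFF, 0x7F7FFFFF, 0x00000000):
    print(f"{d:08x} -> {dword_to_float(d)!r}")
```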
Note that I had to do a custom compilation because the supplied binary still doesn't use the CONSISTENT_FPCSR=1 switch, and I eventually get lots of NaNs that seriously slow the computation down.
The compilation process is basically the same as described in the issue linked above (#1237). I installed a fresh Debian 10 on a virtual machine and did the usual setup:
apt-get update
apt-get upgrade
apt-get install make cmake gcc mingw-w64 gfortran-mingw-w64
and then ran
make clean
make DYNAMIC_ARCH=0 CONSISTENT_FPCSR=1 CC=x86_64-w64-mingw32-gcc FC=x86_64-w64-mingw32-gfortran HOSTCC=gcc NUM_THREADS=6 TARGET=BARCELONA PREFIX=/opt/OpenBLAS
make install
to obtain a kind of distribution in /opt/OpenBLAS. Then I copy the contents of /opt/OpenBLAS to the Windows system, compile my project against it, copy libopenblas.dll to my exe's folder, and get the issue with LAPACKE_sgesvd() when I run my code.
Note that I haven't found any issues with the CBLAS routines I use (mainly gemm, syrk and symm). Moreover, I'm glad to see some performance improvement over the older v0.2.20.
I've noticed one suspicious difference between the supplied binary libopenblas.dll and my compiled version. The supplied binary depends on libgfortran-3.dll and works great with a very old version of this library dated 21.10.2014 (AFAIR I got it from some .zip on the project's SourceForge page long ago). However, the custom-compiled version depends on libgfortran-5.dll, which I had to take (with all other necessary .dll dependencies) from Debian's installation folder /lib/gcc/x86_64-w64-mingw32/8.3-win32.
Any ideas how to fix the issue?
It is probably worth trying an older compiler version, but I don't know how to do that (I'm a newcomer to the Linux world). Could someone please explain it briefly, if the idea is worth trying?