Cross compilation for Skylake X #2986

Closed
winpassuser opened this issue Nov 11, 2020 · 9 comments
Comments

@winpassuser

winpassuser commented Nov 11, 2020

I have run into an issue where OpenBLAS (both v0.3.12 and v0.3.6) gives wrong results on Skylake X when compiled with DYNAMIC_ARCH=1 on a machine with an older CPU. When I add NO_AVX512=1 or compile on the Skylake X machine, everything runs fine.
Compiler: gcc/gfortran 7.5.0 and 9.3.0 (the Skylake X machine has 7.5.0, I have tested both 7.5.0 and 9.3.0 on the older machine).

I have attached the compilation logs, plus the code and data that trigger the error. The code computes the Cholesky factorization of a 1107x1107 matrix with eigenvalues between 1 and 3.6 × 10^4. On Skylake X, the value of info from dpotrf_ is 65, wrongly indicating that the matrix is not positive definite.
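
For readers unfamiliar with the `info` convention: LAPACK documents a positive `info = i` from `dpotrf` as meaning that the leading minor of order i is not positive definite, so the factorization stopped at column i. The sketch below is a textbook unblocked Cholesky in plain Python that mimics that convention. It is not OpenBLAS's blocked, kernel-backed implementation, and the name `cholesky_info` is made up for illustration.

```python
import math


def cholesky_info(a):
    """Return (L, info) for a symmetric matrix given as a list of lists.

    info == 0 : factorization succeeded, a == L * L^T
    info == k : the pivot in column k (1-based) was not positive,
                mirroring the meaning of a positive LAPACK info value
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal entry minus the squares already placed in row j of L.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0 or math.isnan(d):
            return l, j + 1  # 1-based column index, as in LAPACK
        l[j][j] = math.sqrt(d)
        # Fill the rest of column j below the diagonal.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l, 0


if __name__ == "__main__":
    _, ok = cholesky_info([[4.0, 2.0], [2.0, 3.0]])   # positive definite
    _, bad = cholesky_info([[1.0, 2.0], [2.0, 1.0]])  # indefinite
    print(ok, bad)
```

Read this way, info = 65 says the pivot computed for column 65 came out non-positive. Since every eigenvalue of the test matrix is at least 1, that suggests a wrong intermediate result in an earlier update rather than a problem with the input.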

The compilation logs are from three different builds:
on the older machine: make DYNAMIC_ARCH=1 NUM_THREADS=64 >log_old_machine.txt 2>&1
on the Skylake X machine: make DYNAMIC_ARCH=1 NUM_THREADS=64 >log_skylake.txt 2>&1
on the Skylake X machine: make DYNAMIC_ARCH=1 NUM_THREADS=64 TARGET=SANDYBRIDGE >log_skylake_target_sandy.txt 2>&1

Statically linking the test code to OpenBLAS on the older machine and running the binary on the Skylake X machine triggers the error. There is no error when I use OPENBLAS_CORETYPE=Haswell, or use the library from log_skylake.txt or log_skylake_target_sandy.txt.
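
The OPENBLAS_CORETYPE workaround mentioned above can be scripted. This is a minimal sketch, assuming the test binary is launched from Python; the helper name `run_with_coretype` is hypothetical. Setting OPENBLAS_VERBOSE=2, which to my knowledge makes a DYNAMIC_ARCH build report the selected core on stderr, is included as an optional extra and should be treated as an assumption.

```python
import os
import subprocess
import sys


def run_with_coretype(cmd, coretype, verbose=False):
    """Run cmd with OPENBLAS_CORETYPE forced to the given kernel set."""
    env = dict(os.environ)
    env["OPENBLAS_CORETYPE"] = coretype
    if verbose:
        # Assumption: asks OpenBLAS to report the core it picked.
        env["OPENBLAS_VERBOSE"] = "2"
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for the real test binary: a child Python process that
    # echoes the variable, to show the override reaches the child.
    echo = [sys.executable, "-c",
            "import os; print(os.environ['OPENBLAS_CORETYPE'])"]
    demo = run_with_coretype(echo, "Haswell")
    print(demo.stdout.strip())
```

With the real program, the call would look like `run_with_coretype(["./test_binary"], "Haswell")`, where `./test_binary` stands for whatever the attached code was compiled to.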

log_old_machine.txt
log_skylake.txt
log_skylake_target_sandy.txt
code_and_data.zip

@martin-frbg
Collaborator

Thanks for the very comprehensive report. There was a long-standing bug in the CPU parameter assignment during DYNAMIC_ARCH builds that was fixed after 0.3.12, but I would not expect it to have caused this particular mess. Will try to reproduce this tomorrow.

@brada4
Contributor

brada4 commented Nov 11, 2020

Please make sure to patch BIOS/microcode past this bug:
https://lists.debian.org/debian-devel/2017/06/msg00308.html
Or just throw latest microcode at the CPU, with all spectre bunch of fixes.

Your code does fine: at least on Haswell (also when forced back to Sandybridge) it does not throw the error.
EDIT: after some waiting, Netlib LAPACK also agrees your test case is good.

@martin-frbg
Collaborator

@brada4 that theory does not seem to fit the report that it works when compiled on the SKX (and if I read you correctly, your test was on Haswell only?)

@brada4
Contributor

brada4 commented Nov 11, 2020

The only failures occur when the SkylakeX code is invoked on a compatible CPU... I just checked the rest as well as I could, plus the usual cold shower regarding microcode.

@winpassuser
Author

> Please make sure to patch BIOS/microcode past this bug:
> https://lists.debian.org/debian-devel/2017/06/msg00308.html
> Or just throw latest microcode at the CPU, with all spectre bunch of fixes.

Thank you for this suggestion. I am not root, but the machine has rebooted after this package was installed, and dmesg shows that it runs microcode from June 2020 (search for 0x2006906):

$ dmesg | grep microcode
[ 1.503570] microcode: sig=0x50654, pf=0x4, revision=0x2006906
[ 1.503637] microcode: Microcode Update Driver: v2.2.
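
For anyone checking several machines, the revision can be pulled out of that dmesg line mechanically. This is a small sketch that assumes the kernel prints the line in exactly the format shown above; the function name `parse_microcode` is hypothetical.

```python
import re

# Matches the "microcode: sig=..., pf=..., revision=..." line from dmesg.
MICROCODE_RE = re.compile(
    r"microcode: sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), "
    r"revision=(0x[0-9a-fA-F]+)"
)


def parse_microcode(dmesg_text):
    """Return sig, pf and revision as integers, or None if no line matches."""
    match = MICROCODE_RE.search(dmesg_text)
    if match is None:
        return None
    sig, pf, revision = (int(field, 16) for field in match.groups())
    return {"sig": sig, "pf": pf, "revision": revision}


if __name__ == "__main__":
    line = "[ 1.503570] microcode: sig=0x50654, pf=0x4, revision=0x2006906"
    info = parse_microcode(line)
    print(hex(info["revision"]))
```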

@brada4
Contributor

brada4 commented Nov 11, 2020

Thank you for confirming. Your microcode is way past that accuracy problem from 3 years ago.
It is one version behind the absolute latest, which fixes some power-meter related crypto dilemma unrelated to the numeric accuracy issue we are after.
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20201110

@martin-frbg
Collaborator

Reproduced with a build created on Sandybridge, not reproduced with a build made on Haswell even when setting the build TARGET for all common code to SANDYBRIDGE there. (Had to update binutils on the Sandybridge to get AVX512 code to compile, so it cannot be caused by an outdated assembler on the older machine). Quite disconcerting.

@martin-frbg
Collaborator

The problem appears to be unrelated to multithreading (though the falsely flagged matrix element does vary with thread count), and if it was already observed with 0.3.6, it cannot be the AVX512 DGEMM kernel itself, as that was disabled in that particular version.
Differences in SWITCH_RATIO, GEMM_PREFERRED_SIZE or DGEMM_UNROLL_MN play no role either.

@martin-frbg
Collaborator

Closing as fixed (or at least worked around for now) by #3469.

raspbian-autopush pushed a commit to raspbian-packages/openblas that referenced this issue Oct 11, 2023
…achine

Origin: upstream, OpenMathLib/OpenBLAS#3579
Bug: OpenMathLib/OpenBLAS#2986
     OpenMathLib/OpenBLAS#3454
     OpenMathLib/OpenBLAS#3557
Bug-Debian: https://bugs.debian.org/1025480
Applied-Upstream: 0.3.21
Reviewed-by: Sébastien Villemot
Last-Update: 2023-06-26

When building OpenBLAS with dynamic arch selection on x86-64 hardware
that does not support AVX2 (e.g. Intel Ivybridge or earlier), then
the AVX512 (SkylakeX) kernel for DGEMM would produce incorrect
results (of course when run on AVX512-capable hardware).

The problem was that the check for determining whether the compiler
is able to understand AVX512 assembly/intrinsics was doubly
incorrect: it would test the build machine capabilities (instead of
the compiler capabilities); and it would check for AVX2 instead of
AVX512. As a consequence, on pre-AVX2 hardware, the build system
would conclude that the compiler is not able to understand AVX512
primitives, and would create a broken AVX512 (SkylakeX) DGEMM kernel
(essentially a Haswell kernel, but with some wrong assumptions, hence
leading to incorrect numerical results).
Gbp-Pq: Name avx512-dgemm.patch
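
The commit message above separates two questions: whether the build machine's CPU supports an instruction set, and whether the compiler can emit code for it. A minimal sketch of the second kind of check follows, assuming a GCC- or Clang-compatible driver that accepts `-mavx512f`. The function name and probe source are illustrative, and this is not the check OpenBLAS's build system actually uses.

```python
import os
import subprocess
import tempfile

# Tiny C translation unit that only compiles if AVX512F intrinsics are known.
AVX512_PROBE = """
#include <immintrin.h>
void probe(double *out) {
    __m512d v = _mm512_set1_pd(1.0);
    v = _mm512_add_pd(v, v);
    _mm512_storeu_pd(out, v);
}
"""


def compiler_supports_avx512(cc="cc"):
    """True if cc can compile AVX512F code, whatever CPU runs the build."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as handle:
            handle.write(AVX512_PROBE)
        try:
            # Compile only (-c): the probe is never executed, so the
            # result does not depend on the host CPU's own features.
            result = subprocess.run(
                [cc, "-mavx512f", "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            return False  # compiler not found or not executable
        return result.returncode == 0


if __name__ == "__main__":
    print(compiler_supports_avx512())
```

Because the probe is only compiled, a Sandybridge or Ivybridge build host with a recent enough toolchain would report support here, which is the property the original check lacked.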