Multi-arch OpenBLAS with DYNAMIC_ARCH=1 yields wrong result when compiling on ivybridge and running on skylake #3454
Probably a duplicate of #2986 (Ivybridge is the same as Sandybridge as far as OpenBLAS is concerned), but back then my bisect went nowhere, and the problem was stated to have appeared in much earlier versions. So far I have no idea why Skylake kernels would get miscompiled specifically on Sandybridge (the same DYNAMIC_ARCH build with the same gcc version on Haswell did not cause any problems on SkylakeX).
I first did a "manual" bisect: I looked for the first version tag that showed the problem and then bisected from there. 0.3.10 was the first version I tried, because 0.3.0 did not compile (I think a note in v0.3.9 or so referred to this). After narrowing it down to somewhere between 0.3.12 and 0.3.13, I bisected from there. That may be why I got different bisect results than you... Because you noted that the problem seems to have occurred in earlier versions, and to be sure that I did not make an error during bisecting, I just checked out
Please set OPENBLAS_NUM_THREADS=1 and OMP_NUM_THREADS=1 and retry the failing sample. The code you point to is related to threading.
The trouble is that d71fe4e should only have affected parameters for the unrelated Haswell and Ryzen CPUs, not Sandybridge or SkylakeX.
Hmm. Part of the problem may be that the Ivybridge-compiled OpenBLAS actually uses Haswell kernels, or worse, on SkylakeX (due to a logic bug that confuses build-platform capability with runtime-platform capability). However, specifically undoing the changes from d71fe4e did not make the test case from #2986 work, and given that your code already ran fine on actual Haswell before you reverted that commit, I still think it should have been unrelated - at worst a heisenbug that does not show up on every run.
Bisecting again with the test case from #2986, but with an earlier starting point, now puts the blame on 081b188, which was part of PR #2384 by @wjc404. At least this is now something that targeted SKYLAKEX, and I can already see one design flaw that was not clear to me a year ago: the PR introduces CPU-specific code in the BLAS3 driver functions, but in DYNAMIC_ARCH builds these are only built once, with the settings of the designated TARGET (or build CPU).
(min_jj here is the N argument in a subsequent GEMM_ONCOPY or ...OTCOPY(M,N,...) call.) For a TARGET=GENERIC build, min_jj would be capped at 4 instead of the intended 12, as my #3026 removed the 3*GEMM_UNROLL_N line, which was seen to cause SYRK performance problems on Haswell. For a Sandybridge/Ivybridge build, GEMM_UNROLL_N would be 4, so min_jj would get capped at 8 instead of 24 (it would have been 12 had I not removed the 3*GEMM_UNROLL_N line, which was the second half of the PR to which d71fef4 belonged). So perhaps this is why reverting that part worked for you (assuming you reverted the entire PR and not just the two unrelated GEMM_UNROLL_MN lines in param.h).
Small correction: it is always the GEMM_UNROLL_N value of the build host that gets inserted into the level-3 GEMM driver code, regardless of TARGET. Also, the problem is reproducible with the Intel SDE (Software Development Emulator), so it is probably unrelated to the BIOS/microcode versions of the physical hardware.
Probably fixed in 0.3.19 by #3469.
Checking out v0.3.19 and compiling with the same flags as in the OP solved this problem for me. Setting OMP_NUM_THREADS and OPENBLAS_NUM_THREADS to 1 with v0.3.18 did not affect the issue. Thank you for all your work! |
…achine
Origin: upstream, OpenMathLib/OpenBLAS#3579
Bug: OpenMathLib/OpenBLAS#2986 OpenMathLib/OpenBLAS#3454 OpenMathLib/OpenBLAS#3557
Bug-Debian: https://bugs.debian.org/1025480
Applied-Upstream: 0.3.21
Reviewed-by: Sébastien Villemot <[email protected]>
Last-Update: 2023-06-26

When building OpenBLAS with dynamic arch selection on x86-64 hardware that does not support AVX2 (e.g. Intel Ivybridge or earlier), the AVX512 (SkylakeX) kernel for DGEMM would produce incorrect results (when run on AVX512-capable hardware, of course).

The problem was that the check for determining whether the compiler is able to understand AVX512 assembly/intrinsics was doubly incorrect: it tested the build machine's capabilities (instead of the compiler's), and it checked for AVX2 instead of AVX512. As a consequence, on pre-AVX2 hardware, the build system would conclude that the compiler is not able to understand AVX512 primitives, and would create a broken AVX512 (SkylakeX) DGEMM kernel (essentially a Haswell kernel, but with some wrong assumptions, hence leading to incorrect numerical results).

Gbp-Pq: Name avx512-dgemm.patch
I am running OpenBLAS-enabled R in a Singularity container. I used version 0.3.18 as packaged by Debian, but I noticed that on some of our SLURM nodes R gives a wrong result (a genomic kinship matrix is reported as not being positive definite when it actually is). I also saw really strange and clearly invalid PCA (principal component analysis) results on those nodes.
More specifically, on nodes with the CPU architecture "skylake" the result was wrong, whereas on nodes with the architectures "nehalem", "haswell" and "ivybridge" it was correct.
To follow up on this, I compiled OpenBLAS myself. The problem only occurs with the make argument `DYNAMIC_ARCH=1`. I included `TARGET=GENERIC`, but that didn't change anything. If I use `make TARGET=GENERIC` without `DYNAMIC_ARCH`, the problem does not occur. The problem also only occurs if I compile on ivybridge and run on skylake, not vice versa.

I did a `git bisect` using `make TARGET=GENERIC DYNAMIC_ARCH=1` and found that the first commit that shows this problem is d71fe4e.