Incorrect GEMM results with armv8 #1870
Comments
Could be related to #1821 (fastest way to check would probably be to replace just kernel/arm64/KERNEL.ARMV8 with an older version) |
I'm fixing some of the ARM issues on my fork: See if that branch works for you. |
Could you please post your test code? I cannot reproduce this on ARMV8 with current
|
@mdenna-nviso Are you by any chance using one of the inputs as the output? |
Hi all, thanks for the prompt support!
I repeated the test with the latest Thanks |
I'm wondering if I'm doing something wrong, but the same test program compiled with version 5a6a2be gives me correct results:
Is there any test I could run to help clarify the issue? |
That is odd, I cannot reproduce your results either (develop or cleanup branches). I don't have access to a (free) RPi3, so would you mind doing a git bisect between 5a6a2be and some known bad commit? A simple script that compiles, runs and greps for "56 62 68 74" would do for git-bisect. Then we would at least know which change altered the results for you, which makes it easier to start guessing what is going wrong. |
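The pass/fail check suggested above could also live in the test program itself rather than in a grep. The following is a minimal sketch, assuming the test program has the first row of its result available as a float array; the function name `bisect_status` and the tolerance parameter are hypothetical and not taken from the thread. The return values follow the exit-status convention of `git bisect run` (0 for good, 1 to 127 except 125 for bad, 125 for skip).

```c
#include <stddef.h>

/* Exit statuses understood by `git bisect run`:
 * 0 marks the commit good, 1..127 (except 125) mark it bad,
 * and 125 asks git to skip a commit that cannot be tested. */
enum { BISECT_GOOD = 0, BISECT_BAD = 1, BISECT_SKIP = 125 };

/* Compare n computed values against the expected ones.
 * tol is a hypothetical absolute tolerance chosen by the caller. */
int bisect_status(const float *got, const float *expected, size_t n, float tol)
{
    for (size_t i = 0; i < n; ++i) {
        float diff = got[i] - expected[i];
        if (diff < 0.0f)
            diff = -diff;
        /* Written as a negated <= so that a NaN result also counts as bad. */
        if (!(diff <= tol))
            return BISECT_BAD;
    }
    return BISECT_GOOD;
}
```

A test program could return this value from `main` after comparing its first output row with 56, 62, 68, 74. A commit that fails to build would be a candidate for the skip status, and a crash such as a bus fault would probably need a wrapper that maps it to a "bad" status, since `git bisect run` aborts on exit codes above 127.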
A few questions:
|
Both seconded :-) |
I ran the bisect. It looks like there is a commit that breaks the library (bus fault), and from that point on I always get either a bus fault or wrong results. BUS FAULT:
This is a commit somewhere in between the bad commit and the latest develop where the bus fault has been fixed but the results are incorrect: [5d42b6e] Merge pull request #1756 from martin-frbg/issue1754
FAIL I'm compiling the example using cmake.
Thanks |
Too bad, it looks like the (intermittent use of the) new thread-local storage allocation code messes up the bisect. Current |
@mdenna-nviso Small, few-line patches on top of 0.3.3 and older releases are highly recommended to improve the old (but good) serializing allocator. |
@brada4 That does not explain why it still goes wrong with current |
CMake seems to be picking a different (and apparently broken) sgemm_kernel for ARMV8 (the generic 2x2 one, as far as I can tell by now); possibly all the data obtained from the call to getarch_2nd goes unused. I cannot correlate that with any change since 5a6a2be though. |
For TARGET=ARMV8, 5a6a2be would choose sgemm_kernel_4x4.S for inclusion in kernel/CMakeFiles, whereas more recent versions pick the generic sgemm_kernel_2x2.c. (Both pick sgemm_kernel_16x4.S when allowed to autodetect a CortexA53 system, so the microkernel selection mechanism is not completely broken.) |
Actually, reversing the order of the if branches does look like it could provide an acceptable (interim) solution, except that on its own this change creates spurious references to the nonexistent sources "cgemm_kernel.S" and "zgemm_kernel.S", with corresponding ghost objects in the generated Makefiles, build and link.txt
Hi,
There seems to be a regression in cblas_sgemm() when compiled for aarch64 (single thread).
Version 5a6a2be was working correctly, but the latest one gives wrong results even for simple test cases:
NMK = 4,4,4
A = B =
0.0000, 1.0000, 2.0000, 3.0000,
4.0000, 5.0000, 6.0000, 7.0000,
8.0000, 9.0000, 10.0000, 11.0000,
12.0000, 13.0000, 14.0000, 15.0000,
Expected result:
56.0000, 62.0000, 68.0000, 74.0000,
152.0000, 174.0000, 196.0000, 218.0000,
248.0000, 286.0000, 324.0000, 362.0000,
344.0000, 398.0000, 452.0000, 506.0000,
Got:
28.0000, 34.0000, 76.0000, 82.0000,
76.0000, 98.0000, 252.0000, 274.0000,
124.0000, 162.0000, 428.0000, 466.0000,
172.0000, 226.0000, 604.0000, 658.0000,
Any suggestions?
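The expected matrix in the report can be checked independently of OpenBLAS with a naive triple loop. The following is a minimal sketch, assuming row-major storage and A = B filled with 0 through 15 as listed above; the helper names `ref_sgemm` and `fill_iota` are hypothetical and are not part of OpenBLAS or of the reporter's test program, which was not posted in full.

```c
#include <stddef.h>

/* Naive row-major reference: C (m x n) = A (m x k) * B (k x n).
 * Written for clarity, not speed; it does not call OpenBLAS. */
void ref_sgemm(size_t m, size_t n, size_t k,
               const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Fill a buffer with 0, 1, 2, ... to mirror the A = B input in the report. */
void fill_iota(float *x, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        x[i] = (float)i;
}
```

Filling two 16-element buffers with `fill_iota` and calling `ref_sgemm(4, 4, 4, a, b, c)` gives a first row of 56, 62, 68, 74, matching the "Expected result" block. The same buffers could be passed to cblas_sgemm() to compare the library output element by element.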