Fix iamax sse implementation and add utests #2414
Hehe, thanks. I read the diff for the full commit three times without spotting the p-q typo in the original...
Yep. To be honest, I found the issue while debugging in gdb: once I understood the code, I noticed the suspicious comparison instruction.
Looks as if you rattled two other cages with your new test case, though I am not sure I understand why arm32 would fail with clang only. I will try to reproduce that locally.
Apparently I had an issue in …
I've got it. There's an error in the CMake configuration for …
How so? I see it (on a Raspberry Pi 4 in 32-bit mode) linking to arm/iamax_vfp.S (as defined in KERNEL.ARMV6) with USE_ABS and USE_MIN defined, as it should be. All tests passed with clang7, …
Give me a minute, I've got a fix.
Got it: the problem is in the ismin definition, not in isamin. This is likely to have quite far-reaching implications; it would not surprise me if it (or something like it) was behind #2396.
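For readers following the ismin/isamin distinction, here is a minimal C sketch of what the USE_ABS and USE_MIN switches are meant to select, assuming the usual BLAS conventions (1-based result, first occurrence wins on ties). The names `iext`, `absf` and the `*_ref` wrappers are hypothetical illustrations, not OpenBLAS's kernel interface.

```c
/* Absolute value without libm; -0.0f maps to 0.0f. */
static float absf(float v) { return v < 0.0f ? -v : v; }

/* Hypothetical reference kernel (not OpenBLAS's interface): returns the
   1-based index of the first element holding the maximum (use_min == 0)
   or minimum (use_min != 0) of x[i] (use_abs == 0) or |x[i]| (use_abs != 0).
   Returns 0 for n <= 0 or incx <= 0, as reference BLAS does for i?amax. */
static long iext(long n, const float *x, long incx, int use_abs, int use_min)
{
    if (n <= 0 || incx <= 0) return 0;
    long best = 1;
    float bestval = use_abs ? absf(x[0]) : x[0];
    for (long i = 1; i < n; i++) {
        float v = use_abs ? absf(x[i * incx]) : x[i * incx];
        /* Strict comparison keeps the first occurrence on ties. */
        if (use_min ? (v < bestval) : (v > bestval)) {
            bestval = v;
            best = i + 1;
        }
    }
    return best;
}

/* The four variants, named after the BLAS-style routines they model. */
static long isamax_ref(long n, const float *x, long incx) { return iext(n, x, incx, 1, 0); }
static long isamin_ref(long n, const float *x, long incx) { return iext(n, x, incx, 1, 1); }
static long ismax_ref(long n, const float *x, long incx)  { return iext(n, x, incx, 0, 0); }
static long ismin_ref(long n, const float *x, long incx)  { return iext(n, x, incx, 0, 1); }
```

With x = {3, -5, 1, -2}, `isamin_ref` returns 3 (|1| is the smallest magnitude) while `ismin_ref` returns 2 (-5 is the smallest value). A kernel meant to be ismin but built with USE_ABS set would therefore return 3, the kind of "absolute value" result described later in the thread.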
Can you explain this change, please? Why do you believe the assembly kernels were "wrongly enabled" four years ago? If they failed the new utest, or if you benchmarked both and found the assembly inferior, this should go into a separate PR IMHO.
OK, so I noticed the failure once I added a unit test for this PR. Based on the results, it returned the value that would be expected for the absolute values of the input. Moreover, looking at …
Wouldn't it be using min.S rather than isamin_power8.S (which is just a precompiled version of isamin.c, kept as a workaround for old gcc versions that cannot process the C file)? (The same default would apply to other targets where the KERNEL.target does not specify SMINKERNEL.) Anyway, my main point was that I would like to avoid having tangentially related changes in a PR that does not mention them.
I don't know how … which I believe is a generic implementation.
Sure. I can remove the commit that's related to the POWER target. Will you merge this if it adds a new unit test that fails on POWER?
Certainly; we know that the failure is not actually caused by this PR, and I am already trying to track down and fix the POWER issue. While arm/imin.c and arm/imax.c are indeed the generic implementations used across several architectures, if the KERNEL.target lacks a definition, a fallback is taken from the respective Makefile.Lx, so more than just POWER8 could be affected. (Actually, imin.S/imax.S look as if they should work: they combine a subtraction of two candidate values with a selection of one or the other depending on whether the result of the subtraction is less than zero. However, I now see that Makefile.L1 defines iAmin.S/iAmax.S as the fallback, which does not feel correct...)
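The subtract-then-select scheme described for imin.S/imax.S can be sketched in C as follows. This is a hypothetical illustration (the name `imin_subselect` is invented), not a translation of the actual assembly; the ternaries stand in for its conditional selects.

```c
/* Hypothetical C rendering of the subtract-then-select idea: the running
   minimum is replaced only when (candidate - best) < 0. Returns the
   1-based index of the first minimum, or 0 for n <= 0 or incx <= 0. */
static long imin_subselect(long n, const float *x, long incx)
{
    if (n <= 0 || incx <= 0) return 0;
    float best = x[0];
    long idx = 1;
    for (long i = 1; i < n; i++) {
        float cand = x[i * incx];
        float diff = cand - best;   /* negative iff cand < best, for finite inputs */
        int take = diff < 0.0f;     /* the sign of the difference drives the choice */
        best = take ? cand : best;  /* select the value ...                         */
        idx  = take ? i + 1 : idx;  /* ... and its index, like a conditional move    */
    }
    return idx;
}
```

An imax variant would test `diff > 0.0f` instead. For finite inputs the sign of the difference agrees with a direct comparison; with infinities, inf - inf yields NaN, the comparison is false, and the candidate is not taken.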
There was a typo in iamax_sse.S where one of the comparisons was cmpeqps instead of cmpeqss. That caused the wrong index to be detected for sequences where the minimum value was 0.
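To illustrate why a cmpeqps/cmpeqss mix-up matters, the sketch below uses the SSE intrinsics `_mm_cmpeq_ps` (CMPEQPS) and `_mm_cmpeq_ss` (CMPEQSS) followed by `_mm_movemask_ps`. It assumes an x86 target with SSE; the helper names are hypothetical, and exactly how the wrong instruction misbehaved inside iamax_sse.S depends on the surrounding register contents, which this does not reproduce.

```c
#include <xmmintrin.h>

/* Movemask after a packed compare: all four lanes are compared with target,
   and each equal lane contributes a set bit. */
static int mask_cmpeq_ps(__m128 v, float target)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(v, _mm_set1_ps(target)));
}

/* Movemask after a scalar compare: only lane 0 is compared; lanes 1..3 are
   copied unchanged from v, so their bits are just the sign bits of the data. */
static int mask_cmpeq_ss(__m128 v, float target)
{
    return _mm_movemask_ps(_mm_cmpeq_ss(v, _mm_set1_ps(target)));
}
```

With v = {1, 0, 3, 0} (lane 0 first) and target 0, the packed compare reports lanes 1 and 3 (mask 10) while the scalar compare reports nothing (mask 0). A packed compare can thus match zeros in lanes that scalar code never meant to test, which is consistent with a bug that only shows up when the value involved is 0.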
Good. I've just removed the POWER-related change; I'm leaving that to you.
The typo in the implementation has been there since the beginning (the import from GotoBLAS).