Skip to content

Fix iamax sse implementation and add utests #2414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 17, 2020

Conversation

marxin
Copy link
Contributor

@marxin marxin commented Feb 13, 2020

The typo in the implementation is there since the beginning (import from GotoBLAS).

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 5f925de to 9c054f1 Compare February 13, 2020 13:47
@martin-frbg
Copy link
Collaborator

Hehe, thanks. I read the diff for the full commit three times without spotting the p-q typo in the original...

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 9c054f1 to 8a32853 Compare February 13, 2020 14:07
@marxin
Copy link
Contributor Author

marxin commented Feb 13, 2020

Hehe, thanks. I read the diff for the full commit three times without spotting the p-q typo in the original...

Yep. To be honest I found the issue while debugging in gdb. I understood the code and then I observed the suspicious comparison instruction.

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 8a32853 to 2aa3886 Compare February 13, 2020 16:02
@martin-frbg
Copy link
Collaborator

Looks as if you rattled two other cages with your new testcase - though I am not sure I understand why arm32 would fail with clang only - will try to reproduce that locally.

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 2aa3886 to 10c9942 Compare February 14, 2020 08:11
@marxin
Copy link
Contributor Author

marxin commented Feb 14, 2020

Looks as if you rattled two other cages with your new testcase - though I am not sure I understand why arm32 would fail with clang only - will try to reproduce that locally.

Apparently I had an issue in Makefile where I wrongly included '.c' and not '.o' file.
And I used VLA in C, which explains why older compilers rejected the code.

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 10c9942 to c1a4569 Compare February 14, 2020 08:28
@marxin
Copy link
Contributor Author

marxin commented Feb 14, 2020

I've got it. There's an error in CMake configuration for /home/marxin/Programming/OpenBLAS/objdir/kernel/CMakeFiles/isamin_k.S file.

@martin-frbg
Copy link
Collaborator

martin-frbg commented Feb 14, 2020

How so ? I see it (on raspberry pi4 in 32bit mode) linking to arm/iamax_vfp.S (as defined in KERNEL.ARMV6) with USE_ABS and USE_MIN defined as it should be. All tests passed with clang7,
have not found a quick way to install clang6 as used in the CI.

@marxin
Copy link
Contributor Author

marxin commented Feb 14, 2020

How so ? I see it (on raspberry pi4 in 32bit mode) linking to arm/iamax_vfp.S (as defined in KERNEL.ARMV6) with USE_ABS and USE_MIN defined as it should be. All tests passed with clang7,
have not found a quick way to install clang6 as used in the CI.

Give me a minute, I've got a fix.

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from c1a4569 to 2d46994 Compare February 14, 2020 09:37
@martin-frbg
Copy link
Collaborator

Got it - the problem is in ismin, not isamin, definition. This is likely to have quite far-reaching implications, would not surprise me if it -or something like it - was behing #2396

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 2d46994 to bcf4ab9 Compare February 14, 2020 11:01
@marxin marxin changed the title Fix iamax sse implementation Fix iamax sse implementation and add utests Feb 14, 2020
@martin-frbg martin-frbg added this to the 0.3.9 milestone Feb 14, 2020
Copy link
Collaborator

@martin-frbg martin-frbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this change please - why do you believe the assembly kernels were "wrongly enabled" four years ago ? If they failed the new utest - or if you benchmarked both and found the assembly inferior, this should go into a separate PR IMHO

@marxin
Copy link
Contributor Author

marxin commented Feb 15, 2020

Can you explain this change please - why do you believe the assembly kernels were "wrongly enabled" four years ago ? If they failed the new utest - or if you benchmarked both and found the assembly inferior, this should go into a separate PR IMHO

Ok, so I noticed the failure once I added a unit tests for this PR. Based on the results it returned value which is ought to be returned for absolute values of the input. Moreover, looking at OpenBLAS/kernel/power/isamin_power8.S, the fine contains quite some xvabssp instructions and one can't find usage of USE_ABS macro in the file. I can't read ppc64 assembly, but I'm pretty sure it gives bad values.

@martin-frbg
Copy link
Collaborator

martin-frbg commented Feb 15, 2020

Wouldn't it be using min.S rather than isamin_power8.S (which is just a precompiled version of isamin.c as a workaround for old gcc versions that cannot process the C file) ? (The same default would apply to other targets where the KERNEL.target does not specify SMINKERNEL.) Anyway my main point was that I would like to avoid having tangentially related changes in a PR that does not mention them.)

@marxin
Copy link
Contributor Author

marxin commented Feb 16, 2020

Wouldn't it be using min.S rather than isamin_power8.S (which is just a precompiled version of isamin.c as a workaround for old gcc versions that cannot process the C file)?

I don't know how isamin_power8.S is created. But one can see that ISMINKERNEL is implemented multiple times with arm/imin.c:

kernel/arm/KERNEL.ARMV5:ISMINKERNEL  = ../arm/imin.c
kernel/arm64/KERNEL.ARMV8:ISMINKERNEL  = ../arm/imin.c
kernel/arm64/KERNEL.CORTEXA57:ISMINKERNEL  = ../arm/imin.c
kernel/arm64/KERNEL.THUNDERX:ISMINKERNEL  = ../arm/imin.c
kernel/arm64/KERNEL.THUNDERX2T99:ISMINKERNEL  = ../arm/imin.c
kernel/arm64/KERNEL.TSV110:ISMINKERNEL  = ../arm/imin.c
kernel/power/KERNEL.POWER8:#ISMINKERNEL  = ../arm/imin.c
kernel/power/KERNEL.POWER9:#ISMINKERNEL  = ../arm/imin.c
kernel/x86/KERNEL.generic:ISMINKERNEL  = ../arm/imin.c
kernel/x86_64/KERNEL.generic:ISMINKERNEL  = ../arm/imin.c
kernel/zarch/KERNEL.Z13:ISMINKERNEL  = ../arm/imin.c
kernel/zarch/KERNEL.ZARCH_GENERIC:ISMINKERNEL  = ../arm/imin.c

which I believe is a generic implementation.

(The same default would apply to other targets where the KERNEL.target does not specify SMINKERNEL.) Anyway my main point was that I would like to avoid having tangentially related changes in a PR that does not mention them.)

Sure. I can remove the commit that's related to POWER target. Will you merge this if it has a new failing unit test that fails on POWER?

@martin-frbg
Copy link
Collaborator

Certainly - we know that the failure is not actually caused by this PR, and I am trying to track down and fix the POWER issue already. While arm/imin.c and arm/imax.c are indeed the generic implementations used across several architectures, in the event that the KERNEL.target lacks a definition there will be a fallback taken from the respective Makefile.Lx. So it could be that more than just POWER8 is affected. (Actually imin.S/imax.S look as if it should work, they combine a subtraction of two candidate values with a selection of one or the other depending on whether the result of the subtraction is less than zero. However I see now that Makefile.L1 actually defines iAmin.S/iAmax.S as the fallback, which does not feel correct...)

@marxin marxin force-pushed the fix-iamax_sse-implementation branch from bcf4ab9 to 5c3cdb7 Compare February 17, 2020 08:01
The was a typo in iamax_sse.S where one of the comparison
was cmpeqps instead of cmpeqss. That misdetected index
for sequences where the minimum value was 0.
@marxin marxin force-pushed the fix-iamax_sse-implementation branch from 5c3cdb7 to aeea14e Compare February 17, 2020 08:02
@marxin
Copy link
Contributor Author

marxin commented Feb 17, 2020

Good. I've just removed the POWER related change. I'm leaving that to you..

@marxin marxin requested a review from martin-frbg February 17, 2020 08:05
@martin-frbg martin-frbg merged commit 634f2bd into OpenMathLib:develop Feb 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants