[RFC] Adding softfp support for ARM by a inline assembly wrapper #853
Comments
To give a brief idea of what we have done, the following is an example of our implementation:
#ifdef __ARM_PCS_VFP
#define cblas_sgemm cblas_sgemm
#define cblas_sgemv cblas_sgemv
#define cblas_saxpy cblas_saxpy
#define cblas_saxpby cblas_saxpby
#define cblas_sasum cblas_sasum
#define cblas_sscal cblas_sscal
#define cblas_sdot cblas_sdot
#define cblas_scopy cblas_scopy
#else
#define cblas_sgemm "cblas_sgemm"
#define cblas_sgemv "cblas_sgemv"
#define cblas_saxpy "cblas_saxpy"
#define cblas_saxpby "cblas_saxpby"
#define cblas_sasum "cblas_sasum"
#define cblas_sscal "cblas_sscal"
#define cblas_sdot "cblas_sdot"
#endif
void cblas_saxpy_wrapper(const int N, const float a, const float *x, const int incX,
float *y, const int incY) {
#ifdef __ARM_PCS_VFP
// if compiled by -mfloat-abi=hard, directly call cblas_saxpy
return cblas_saxpy(N, a, x, incX, y, incY);
#else
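#ifdef __SOFTFP__
// GCC defines __SOFTFP__ for -mfloat-abi=soft, which has no VFP registers to pass arguments in
#error "Please build with the softfp or hard float ABI"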
#else
// if compiled by -mfloat-abi=softfp, run the assembly to prepare for the hardfp ABI call.
__asm__ __volatile__("sub sp, sp, #24 \n\t"
"mov r0, %0 \n\t"
"fmsr s0, %1 \n\t"
"mov r1, %2 \n\t"
"mov r2, %3 \n\t"
"mov r3, %4 \n\t"
"str %5, [sp] \n\t"
"bl " cblas_saxpy "(PLT)\n\t"
"add sp, sp, #24 \n\t"
:
: "r"(N), "r"(a), "r"(x), "r"(incX), "r"(y), "r"(incY)
: "cc", "memory", "r0", "r1", "r2", "r3", "ip", "lr", "s0", "sp"); // bl overwrites lr; a PLT veneer may use ip
return;
#endif
#endif
}
For functions with many parameters, such as
@erlv , thank you for the suggestion. Actually, I would prefer to modify the assembly kernels directly. I think it only requires editing the beginning of the source code.
Hi @erlv
Have you tried setting the ABI to This allows linking with OpenBLAS (hardfp), with hardfp Android libraries when available, and with any remaining softfp Android libraries, and it will run on any ARMv7 device. If your goal is to build for general Android softfp (armeabi) with OpenBLAS in hardfp, be aware that this will fail on older devices because OpenBLAS uses vfpv3-d32, which they won't have. It will only work on devices that do have floating-point registers, i.e. that support armeabi-v7a-hard.
Hi @xianyi ,
Hi @buffer51 ,
The Android NDK recommends a correct procedure for detecting CPU features. Part of the problem is #844: the assembly files are not marked with the co-processor they use.
@brada4 , do you know how to detect the CPU correctly on Android?
It is very similar to x86 cpuid; they even have library functions to detect it:
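As a rough illustration of runtime feature detection (not the code linked in the thread), the sketch below scans the "Features" line that the Linux kernel exposes in /proc/cpuinfo on ARM for flags such as vfpv3, vfpv3d16 and neon. The helper names `has_cpu_flag` and `read_cpu_features` are hypothetical and made up for this example; on Android, the NDK's cpufeatures library is the more usual route.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if `flag` appears as a whole
 * whitespace-separated token in `features` (e.g. a "Features" line). */
bool has_cpu_flag(const char *features, const char *flag) {
    size_t len = strlen(flag);
    const char *p = features;
    if (len == 0) return false;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must not be part of a longer token such as "vfpv3d16". */
        bool starts = (p == features) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char end = p[len];
        bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends) return true;
        p += len;
    }
    return false;
}

/* Hypothetical helper: copies the first "Features" line of /proc/cpuinfo
 * into buf. Returns false if the file or the line is not available
 * (for example on a non-ARM host). */
bool read_cpu_features(char *buf, size_t size) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    bool found = false;
    if (f == NULL) return false;
    while (fgets(buf, (int)size, f) != NULL) {
        if (strncmp(buf, "Features", 8) == 0) {
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A caller could read the line once at startup and choose a kernel only if, say, `has_cpu_flag(line, "vfpv3")` holds and `has_cpu_flag(line, "vfpv3d16")` does not, which is one way to guess that the 32 double-precision registers are present. That interpretation of the flags is an assumption and should be checked against the target kernel.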
Hey erlv! That's a great idea and I really appreciate that you want to contribute back to the community. I did test your example wrapper and it works like a charm. If you want to give the other wrappers you wrote to the openblas users, I would be very grateful!
@buffer51: With ndk-12, the armeabi-v7a-hard has been removed from the ndk. Therefore you cannot build with that anymore. See https://android.googlesource.com/platform/ndk/+/master/docs/HardFloatAbi.md
@andreas-eberle , thank you for the information.
@andreas-eberle Agreed. I think this is good news; this was very painful in practice. Also, all new phones are now AArch64, which I guess is why they stopped bothering. If you really need ARMv7a, you can still use NDK r11c. Otherwise building for AArch64 already works, and you can use plain armeabi for older devices (slow, though).
@buffer51 , I haven't tried our ARM Cortex-A57 code on an Android AArch64 phone. I think it should work. Have you already tried it?
@xianyi I have not tried the target CORTEXA57 specifically, just ARMV8. It worked without issues. I have a phone with big.LITTLE A53 / A57, I could try CORTEXA57 if you want.
It is A53 unless the #844 cpuid mess is addressed by a kernel patch and there is a chance for OpenBLAS to support asymmetric multiprocessing. In particular, a freeze in place of an invalid-instruction trap is impossible to work around.
@brada4 , I am not familiar with the A53. Are there any differences between the A53 and the A57?
Both implement the ARMv8 ISA, but the A53 is an in-order core and the A57 is out-of-order.
@erlv, could you please post a wrapper for cblas_sgemm? Thanks a lot. That will be extremely helpful.
softfp support was added through #1221 and related changes, so closing here |
Dear OpenBLAS community,
We recently ran into the correctness issue described in #777, which arises when libraries built with Android's default softfp ABI are linked with a hardfp OpenBLAS.
Within our team, we have managed to keep the compiler and linker happy on Android by linking the softfp libraries and the hardfp OpenBLAS together using an inline assembly wrapper and a compiler wrapper. We have found no correctness issue, either in theory or in the many existing tests.
I am wondering whether upstream likes this idea; if so, I'd like to find a way to contribute it back to the OpenBLAS community.
The inline assembly wrapper is a simulator: it uses embedded assembly to set up the register and stack parameter passing that a hardfp OpenBLAS function call expects when the compiler compiles the code with -mfloat-abi=softfp.
The reasons for using an inline assembly wrapper instead of a direct softfp implementation are:
The compiler wrapper is basically used to strip the float-abi attribute from each object file within the libblas.a/libblas.so file, so that the linker does not complain with the error "uses VFP register arguments, output does not" when linking object files compiled by the Android toolchain with OpenBLAS on ARM.
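To make the register preparation in the wrapper a little more concrete: under softfp a float argument arrives in a core register as its raw 32-bit IEEE-754 pattern, and an instruction such as fmsr copies that pattern unchanged into a VFP register like s0, where a hardfp callee expects it. The portable sketch below only models that bit-for-bit move in plain C. The function names are hypothetical and nothing here is taken from the actual wrapper code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of how a softfp caller holds a float argument:
 * the value sits in a 32-bit core register as its raw bit pattern. */
uint32_t float_to_core_reg(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* reinterpret, do not convert */
    return bits;
}

/* Hypothetical model of the core-register-to-VFP move: the same 32 bits
 * are reinterpreted as a float, so the numeric value is unchanged. */
float core_reg_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The point of the sketch is that no numeric conversion takes place; only the register file changes. That is why a thin wrapper may be enough to bridge the two calling conventions, provided every floating-point argument and return value is moved in this way.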