Why does OpenBLAS not support softfp with ARMv7 #777

Closed
jainanshul opened this issue Feb 3, 2016 · 16 comments

@jainanshul

As per #363 (comment), softfp is not supported with ARMv7. I tested it on the device and I do get a segmentation fault. Why is softfp only supported for ARMv5?

@brada4
Contributor

brada4 commented Feb 4, 2016

ARMv5 is C-only. You can compile it with the exact -march flag for your architecture to get optimal code. The ARMv7 kernels get all their enhancements from the use of VFP, which you don't have. You should also measure whether libm or the GCC builtins are faster (-fno-builtin vs. -fbuiltin). It is probably best to avoid double precision altogether, as one double-precision instruction will be emulated by 30-40 int32 instructions.

@xianyi
Collaborator

xianyi commented Feb 4, 2016

For ARMv7, the assembly kernels are written for the hard FP ABI. To support the softfp ABI, we would have to change this assembly by hand. It can be done, but it is a lot of work.

@jainanshul
Author

Thanks for your replies. I changed the target to ARMV5 and I don't see any crash with softfp. However, the performance is abysmal: 27 secs to load and predict using Caffe with softfp OpenBLAS vs. 7 secs with hardfp OpenBLAS.

Does OpenBLAS support the 64-bit ARM architecture?

@xianyi
Collaborator

xianyi commented Feb 4, 2016

The ARMv5 target uses naive C kernels, so it is very slow.

We support ARM 64bit including Cortex-A57.
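
To make "naive C kernels" concrete, here is a minimal sketch of an unoptimized single-precision matrix multiply. It is an illustration only and not OpenBLAS's actual generic kernel: the name `naive_sgemm`, the row-major layout, and the argument order are all assumptions made for this example. A loop like this has no blocking, no vectorization, and no hand-scheduled assembly, which is roughly what separates it from a tuned kernel.

```c
#include <stddef.h>

/* Hypothetical helper, not an OpenBLAS function.
 * Computes C = alpha * A * B + beta * C for row-major matrices:
 * A is m x k, B is k x n, C is m x n. */
void naive_sgemm(size_t m, size_t n, size_t k,
                 float alpha, const float *a, const float *b,
                 float beta, float *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; p++) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Calling it with two 2x2 matrices, `alpha = 1` and `beta = 0` gives the ordinary matrix product, which makes it a convenient reference when checking a faster kernel.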

@brada4
Contributor

brada4 commented Feb 4, 2016

Yes, performance is abysmal because your CPU lacks floating-point support and everything is done in the integer ALU. 4 instructions to emulate a floating-point instruction is very, very reasonable.

You can try -fbuiltin; maybe GCC can optimize something, but it will not get better than a 3x slowdown.

Example: http://www.jhauser.us/arithmetic/SoftFloat.html (find the 32-bit mul in integer C). That is 4 instructions according to my brain's optimistic C compiler (or 3 if you really, really don't do anything about NaNs).
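
For readers who want to see what integer-only emulation of a 32-bit float multiply involves, the following is a deliberately simplified sketch. It is not SoftFloat's code and not what a compiler's soft-float runtime does: the names `soft_mul_bits` and `soft_mul` are hypothetical, and the sketch assumes normal, finite inputs, a result in the normal range, and truncation instead of proper rounding. It applies only to builds with no hardware FPU (the soft case).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply two IEEE-754 binary32 bit patterns
 * using integer operations only.
 * Assumptions: both inputs are normal and finite, the result stays in
 * the normal range, and low bits are truncated rather than rounded.
 * Zeros, denormals, infinities and NaNs are NOT handled. */
uint32_t soft_mul_bits(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;
    /* Add the biased exponents and remove one bias. */
    int32_t exp = (int32_t)((a >> 23) & 0xFFu)
                + (int32_t)((b >> 23) & 0xFFu) - 127;
    /* Restore the implicit leading 1 of each 24-bit significand. */
    uint64_t ma = (uint64_t)((a & 0x007FFFFFu) | 0x00800000u);
    uint64_t mb = (uint64_t)((b & 0x007FFFFFu) | 0x00800000u);
    uint64_t prod = ma * mb; /* lies in [2^46, 2^48) */

    if (prod & (1ULL << 47)) {
        /* Significand product is in [2, 4): renormalize. */
        prod >>= 24;
        exp += 1;
    } else {
        prod >>= 23;
    }
    return sign | ((uint32_t)exp << 23) | ((uint32_t)prod & 0x007FFFFFu);
}

/* Convenience wrapper that reinterprets floats as bit patterns. */
float soft_mul(float x, float y)
{
    uint32_t a, b, r;
    float out;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);
    r = soft_mul_bits(a, b);
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Even this stripped-down version needs several shifts, masks, a widening multiply and a branch, and a complete implementation adds rounding and special-case handling on top.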

@xianyi
Collaborator

xianyi commented Feb 4, 2016

@brada4, there are three FP ABI modes: soft, softfp, and hard.

soft uses a software FPU (as you mentioned) and the soft calling convention.
softfp uses the hardware FPU but keeps a calling convention compatible with soft.
hard uses the hardware FPU and a different calling convention, which passes floating-point arguments in FP registers.

OpenBLAS uses hard FP ABI mode. It conflicts with soft and softfp.
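
One way to check which of the three modes a given build actually uses is to look at the compiler's predefined macros. The sketch below is an assumption about the toolchain rather than anything taken from OpenBLAS: it relies on `__arm__`, `__ARM_PCS_VFP` and `__SOFTFP__` as GCC and Clang are understood to define them for 32-bit ARM, and the function name `float_abi_name` is hypothetical. It is worth confirming the macros against the compiler's own predefined-macro dump before relying on them.

```c
/* Hypothetical helper: report the floating-point ABI this translation
 * unit was compiled for, based on predefined macros (assumed GCC/Clang
 * behaviour on 32-bit ARM; verify for your toolchain). */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM target";
#elif defined(__ARM_PCS_VFP)
    /* FP arguments are passed in VFP registers. */
    return "hard";
#elif defined(__SOFTFP__)
    /* No FPU instructions are generated; FP is emulated. */
    return "soft";
#else
    /* FPU instructions are used, but FP arguments travel in core registers. */
    return "softfp";
#endif
}
```

Printing this string from both the application and a small test program built with the library's flags is a quick way to spot an ABI mismatch before it shows up as a crash.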

@brada4
Contributor

brada4 commented Feb 4, 2016

...which is a call to libm which may or may not use FPU which may or may not exist.

@brada4
Contributor

brada4 commented Feb 7, 2016

Actually, the "does not support" claim is now void, as per @jainanshul.
It is hard to tell why it is so slow without pieces of code. softfp slows down all calls involving floating-point arguments. Depending on the GCC flags you chose, there might be many more or many fewer such calls. Without code or compiler arguments, there is no issue.

@jainanshul
Author

@brada4 well, OpenBLAS crashes with ARMv7 and softfp support, so I would still say OpenBLAS doesn't support softfp with ARMv7.

@brada4
Contributor

brada4 commented Feb 8, 2016

ARMv7 is the name of the Debian architecture, i.e. hardfp.
ARMv5 means any ARM, just like Prescott means any x86.

@lygstate

lygstate commented Jan 2, 2017

What's the point of ARMV6?

@martin-frbg
Collaborator

martin-frbg commented Jan 2, 2017

> What's the point of ARMV6?

Older models of Raspberry Pi and similar (support was added in early 2013)?

@lygstate

lygstate commented Jan 2, 2017

Does it also use the hard ABI?

@martin-frbg
Collaborator

I'm not that experienced with ARM coding, but the assembly used in the v6-specific routines under kernel/arm looks suspiciously similar to the v7 counterparts (and both were written by the same developer), so probably yes.

@epicstar

@xianyi what is the current timeline for this feature to be complete? Thanks for working on this, by the way.

@martin-frbg
Collaborator

Expected to be resolved through #1221, so closing.
