It would be nice to have feature detection for ARM (32-bit) in x/sys/cpu.
Concretely, a new cpu.ARM struct that closely resembles the existing cpu.ARM64 struct, tailored to the ARM-specific hardware capabilities. The following fields are proposed, which map directly to the {HWCAP, HWCAP2} auxiliary vector values on Linux and FreeBSD:
HasSWP bool // SWP instruction support
HasHALF bool // Half-word load and store support
HasTHUMB bool // ARM Thumb instruction set
Has26BIT bool // Address space limited to 26-bits
HasFASTMUL bool // 32-bit operand, 64-bit result multiplication support
HasFPA bool // Floating point arithmetic support
HasVFP bool // Vector floating point support
HasEDSP bool // DSP Extensions support
HasJAVA bool // Java instruction set
HasIWMMXT bool // Intel Wireless MMX technology support
HasCRUNCH bool // MaverickCrunch context switching and handling
HasTHUMBEE bool // Thumb EE instruction set
HasNEON bool // NEON instruction set
HasVFPv3 bool // Vector floating point version 3 support
HasVFPv3D16 bool // Vector floating point version 3 D8-D15
HasTLS bool // Thread local storage support
HasVFPv4 bool // Vector floating point version 4 support
HasIDIVA bool // Integer divide instruction support in ARM mode
HasIDIVT bool // Integer divide instruction support in Thumb mode
HasIDIV bool // Integer divide instruction support in ARM and Thumb mode
HasVFPD32 bool // Vector floating point version 3 D15-D31
HasLPAE bool // Large Physical Address Extensions
HasEVTSTRM bool // Event stream support
HasAES bool // AES hardware implementation
HasPMULL bool // Polynomial multiplication instruction set
HasSHA1 bool // SHA1 hardware implementation
HasSHA2 bool // SHA2 hardware implementation
HasCRC32 bool // CRC32 hardware implementation
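To illustrate how such fields could be filled from the auxiliary vector values, here is a minimal sketch. The bit positions follow the Linux arm HWCAP/HWCAP2 definitions; the armFeatures type and decodeHWCAP function are hypothetical names used only for illustration, cover only a subset of the proposed fields, and are not the x/sys/cpu implementation.

```go
package main

import "fmt"

// Bit positions as defined for HWCAP/HWCAP2 on Linux arm (subset only).
const (
	hwcapVFP   = 1 << 6
	hwcapNEON  = 1 << 12
	hwcapVFPv3 = 1 << 13
	hwcapVFPv4 = 1 << 16
	hwcapIDIVA = 1 << 17
	hwcapIDIVT = 1 << 18

	hwcap2AES   = 1 << 0
	hwcap2PMULL = 1 << 1
	hwcap2SHA1  = 1 << 2
	hwcap2SHA2  = 1 << 3
	hwcap2CRC32 = 1 << 4
)

// armFeatures is a hypothetical, trimmed-down stand-in for the proposed cpu.ARM struct.
type armFeatures struct {
	HasVFP, HasNEON, HasVFPv3, HasVFPv4          bool
	HasIDIVA, HasIDIVT, HasIDIV                  bool
	HasAES, HasPMULL, HasSHA1, HasSHA2, HasCRC32 bool
}

// isSet reports whether bit is set in caps.
func isSet(caps, bit uint) bool { return caps&bit != 0 }

// decodeHWCAP maps raw HWCAP/HWCAP2 words onto the feature booleans.
func decodeHWCAP(hwcap, hwcap2 uint) armFeatures {
	f := armFeatures{
		HasVFP:   isSet(hwcap, hwcapVFP),
		HasNEON:  isSet(hwcap, hwcapNEON),
		HasVFPv3: isSet(hwcap, hwcapVFPv3),
		HasVFPv4: isSet(hwcap, hwcapVFPv4),
		HasIDIVA: isSet(hwcap, hwcapIDIVA),
		HasIDIVT: isSet(hwcap, hwcapIDIVT),
		HasAES:   isSet(hwcap2, hwcap2AES),
		HasPMULL: isSet(hwcap2, hwcap2PMULL),
		HasSHA1:  isSet(hwcap2, hwcap2SHA1),
		HasSHA2:  isSet(hwcap2, hwcap2SHA2),
		HasCRC32: isSet(hwcap2, hwcap2CRC32),
	}
	// Derived field: divide support in both ARM and Thumb mode.
	f.HasIDIV = f.HasIDIVA && f.HasIDIVT
	return f
}

func main() {
	// Synthetic input; a real program would obtain these words from the OS.
	f := decodeHWCAP(hwcapVFP|hwcapNEON|hwcapIDIVA, hwcap2AES)
	fmt.Printf("NEON=%v IDIV=%v AES=%v\n", f.HasNEON, f.HasIDIV, f.HasAES)
}
```

A caller would then branch on a field such as HasNEON to pick a fast path at run time.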
As I look around, I see code detecting CPU features based on the runtime.goarm value (which is set by the GOARM environment variable at link time), rather than a runtime check. This means that:
- As runtime.goarm is not const, the fast-path (e.g. using NEON) and slow-path fallback are both compiled into the binary, but only one path can ever be used. It would be nice if both paths could be used via run-time detection instead.
- Using the above, one cannot have a "universal binary", which is especially problematic on Android.
In one of my projects, I have resorted to parsing /proc/cpuinfo for run-time detection of NEON, which only works on Linux. I'd love to instead use the standard library.
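For readers unfamiliar with that workaround, a minimal sketch of the /proc/cpuinfo approach follows. It assumes the usual Linux arm layout with a "Features" line listing space-separated tokens such as "neon"; the function name hasCPUInfoFeature is hypothetical, and the sample text is synthetic rather than read from a device.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasCPUInfoFeature reports whether any "Features" line in the given
// /proc/cpuinfo text lists the feature token (for example "neon").
func hasCPUInfoFeature(cpuinfo, feature string) bool {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 || strings.TrimSpace(parts[0]) != "Features" {
			continue
		}
		// Compare whole tokens so that "neon" does not match a longer name.
		for _, tok := range strings.Fields(parts[1]) {
			if tok == feature {
				return true
			}
		}
	}
	return false
}

func main() {
	// Synthetic sample; on Linux the text would come from os.ReadFile("/proc/cpuinfo").
	sample := "processor\t: 0\nFeatures\t: half thumb fastmult vfp edsp neon vfpv3 tls\n"
	fmt.Println("neon:", hasCPUInfoFeature(sample, "neon"))
}
```

This only works where /proc/cpuinfo exists and is readable, which is the limitation described above.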
smasher164 commented on Aug 7, 2019
x/sys/cpu has still not been totally updated to match internal/cpu, which has some of the feature flags you are looking for.
Note that so far, we have only exposed flags that are needed for the runtime and standard library. That is, CL 114826 exposes HasIDIVA for use in the runtime, and CL 126315 exposes HasVFPv4 for use in the math package. I could see the argument for exposing all of the HWCAP flags listed in linux, for use in external code.
I will defer to @martisch's judgement on that.
martisch commented on Aug 7, 2019
Note that changing internal/cpu seems a separate topic (it also already supports arm) and there are no plans to export those variables outside the runtime/standard library.
If x/sys/cpu support for linux/arm is enough (not *BSD or windows/arm) then it should be as easy as implementing the hwCap/hwCap2 bit checks in https://github.com/golang/sys/blob/master/cpu/cpu_arm.go since hwCap/hwCap2 should already be populated on Linux by: https://github.com/golang/sys/blob/master/cpu/cpu_linux.go. I think it's fine to expose all Linux-supported but generally ARM-applicable flags in x/sys/cpu.
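As a rough illustration of how hwCap/hwCap2 can be recovered from an auxiliary vector, the sketch below walks a raw buffer of (type, value) pairs. It assumes a 32-bit little-endian process, where each pair is two 32-bit words, and uses the Linux tag values AT_NULL (0), AT_HWCAP (16) and AT_HWCAP2 (26). The names parseAuxv32 and encodeAuxv32 are hypothetical, and the buffer is synthetic rather than read from /proc/self/auxv.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Auxiliary vector tags as defined on Linux.
const (
	atNull   = 0
	atHWCAP  = 16
	atHWCAP2 = 26
)

// parseAuxv32 scans (tag, value) pairs of 32-bit little-endian words and
// returns the HWCAP and HWCAP2 values, stopping at the AT_NULL terminator.
func parseAuxv32(buf []byte) (hwcap, hwcap2 uint32) {
	for i := 0; i+8 <= len(buf); i += 8 {
		tag := binary.LittleEndian.Uint32(buf[i:])
		val := binary.LittleEndian.Uint32(buf[i+4:])
		switch tag {
		case atNull:
			return hwcap, hwcap2
		case atHWCAP:
			hwcap = val
		case atHWCAP2:
			hwcap2 = val
		}
	}
	return hwcap, hwcap2
}

// encodeAuxv32 builds a synthetic auxv buffer for demonstration purposes.
func encodeAuxv32(pairs ...[2]uint32) []byte {
	buf := make([]byte, 8*len(pairs))
	for i, p := range pairs {
		binary.LittleEndian.PutUint32(buf[8*i:], p[0])
		binary.LittleEndian.PutUint32(buf[8*i+4:], p[1])
	}
	return buf
}

func main() {
	buf := encodeAuxv32(
		[2]uint32{atHWCAP, 1 << 12}, // NEON bit
		[2]uint32{atHWCAP2, 1 << 0}, // AES bit
		[2]uint32{atNull, 0},
	)
	hwcap, hwcap2 := parseAuxv32(buf)
	fmt.Printf("hwcap=%#x hwcap2=%#x\n", hwcap, hwcap2)
}
```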
Note that the above would not support Android, as /proc/self/auxv might not be accessible, and it won't change the runtime's and standard library's detection of arm features (since that uses internal/cpu and not x/sys/cpu). The effects of setting goarm are also not affected, as they override any cpu package, e.g.:
go/src/runtime/os_freebsd_arm.go
Line 14 in 3b6216e
go/src/math/sqrt_arm.s
Line 9 in 0df9fa2
jpap commented on Aug 8, 2019
Thanks for the detailed response, Martin.
That's right; I'm happy to open a new issue to discuss that separately if you wish. I would propose that the internal implementation be separate (a "copy") of the x/sys/cpu version.
Since you mention it, I was reading the CL for #28148 (see this source file), and noticed how the use of detection of NEON could be improved by checking the hardware-caps instead of using the static runtime.goarm value. I did notice other similar uses throughout the Go internal/standard library, as you have linked to. While it might be some work to review all of them for a possible switch to runtime hardware-cap checks (esp where we can use a NEON code-path), it might be beneficial to do so over time?
(As said above, I am doing run-time detection at present by parsing /proc/cpuinfo which could also be improved.)
I am mostly interested in Android, so support for Linux would be a win. (Apple has moved exclusively to aarch64 for some time now.)
It looks like FreeBSD has added support back in 2017 (see D12290 and D12291), so it might be possible to do something similar there also.
As far as I can tell, Go bring-up for windows/arm has stalled (c.f. #26148), so that would need to wait regardless?
BoringSSL provides a good reference implementation: they weak-link getauxval, which is available on Android from API level 20, and fall back to /proc/self/auxv and /proc/cpuinfo in that order. So it appears we might be able to improve on the implementation you have linked to.
martisch commented on Aug 8, 2019
The scope of the feature request is currently unclear to me when reading the title and first post content. From the comments I would infer it is to add arm support for Android. It also does not seem restricted to x/sys/cpu: part of the proposal is to replace the runtime's use of goarm with dynamic detection. As internal/cpu IIRC already has support for arm, the runtime could already use that in some places where goarm is used (no need to copy from x/sys/cpu). However, this would be a behaviour change and some platforms would still need to use goarm (to populate the cpu variables). I think that is better discussed decoupled from any x/sys/cpu support.
The Go compiler uses the goarm setting to make decisions about the emitted instructions and thereby does not produce "universal binaries". Changing this would be a larger change than x/sys/cpu support and likely have larger performance and binary size implications when switched to runtime detection:
go/src/cmd/compile/internal/arm/ssa.go
Line 644 in f125b32
go/src/cmd/compile/internal/ssa/regalloc.go
Line 599 in 8a317eb
Seems like a good candidate for using x/sys/cpu but it seems it would need a fallback in x/sys/cpu to goarm for platforms not supporting CPU feature detection via AUXV or syscalls/proc.
The Go runtime sets hwcap on arm in internal/cpu on FreeBSD by reading the auxv that's provided after argv. I don't think x/sys/cpu has direct access to that, so I think we cannot simply copy that approach to x/sys/cpu.
go/src/runtime/os_freebsd.go
Line 385 in a62b572
No, having only partial support in x/sys/cpu is fine. However, to not regress in performance on other platforms, we would seem to need a fallback that sets the cpu features based on goarm on those platforms not supporting runtime detection.
Thank you. That's a nice reference. If we can make the same work under Go then that looks like an option to gain feature detection support on Android in x/sys/cpu.
jpap commented on Aug 8, 2019
Let me be more clear...
This issue is about:
- … x/sys/cpu for third parties.
- … internal/cpu.
- … elf32_freebsd_sysvec symbol in libc; see posted links above to the FreeBSD site for further details.
This issue is not about:
- … internal/cpu and how CPU hardware-cap detection is done internally in the Go standard library.
However, a separate issue can be created that proposes:
- … internal/cpu; for example, to determine whether Advanced SIMD (NEON) is supported on the device on Linux. The BoringSSL implementation for Linux could be used as a reference here also.
- … runtime.goarm. After a quick look, it appears there is no support for NEON-accelerated algorithms on arm; only a reference to "will want to revision NEON, when support is better", which is actually on arm64 where NEON is always supported. (That reference is to Plan 9 assembly support, I am guessing.)
I understand this issue, but I would never propose it. My reference to a "universal binary" was with reference to one of my projects where I detect NEON, and take a fast-path accordingly. A true universal binary -- that is, a program supporting multiple architectures -- is often achieved through shipping separate texts for each architecture in a sandwiched "fat binary" as is possible on macOS, and only exists as a proof-of-concept on Linux. I am not proposing that here.
Yes -- and sure beats developers individually using go:linkname to punch a hole through to runtime.goarm.
I have not tried it, but it appears a symbol exists on libc that gets you access to the information without having to go via auxv.
gopherbot commented on Aug 16, 2019
Change https://golang.org/cl/190525 mentions this issue:
cpu: add support for ARM (32-bit) feature detection.
jpap commented on Aug 16, 2019
@martisch, I've submitted a CL for this issue as described by my past post, and have tested it on linux/arm and freebsd/arm. Are you able to review it?
martisch commented on Aug 17, 2019
Sure. Thanks for working on it. I added a first round of high level comments.
I've been meaning to reply more in depth about how we can approach this but unfortunately did not find the time earlier.
There is already support for Linux auxv reading in cpu_linux. We should first leverage that existing detection to add the hwcap bits to variable mapping and then I think basic linux/arm support should already work. Continuing from there we can add additional support for Android to work around /proc/self/auxv not necessarily being accessible to fill hwcap in x/sys/cpu. Afterwards we can extend the support for arm on other platforms and wheel in help from testers with hardware on those.
What makes arm special in x/sys/cpu vs other archs is that, absent hwcap detection, we should fall back to the minimal set of hwcap bits mandated by the goarm variable.
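A minimal sketch of such a fallback is shown below. It assumes, based on the startup checks the Go runtime performs on linux/arm, that GOARM=6 implies VFP and GOARM=7 additionally implies VFPv3; that mapping, the features type, and the withGoarmFallback name are illustrative assumptions rather than the behaviour of x/sys/cpu.

```go
package main

import "fmt"

// features is a hypothetical two-field stand-in for the proposed cpu.ARM struct.
type features struct {
	HasVFP   bool
	HasVFPv3 bool
}

// withGoarmFallback returns the detected features when hwcap information was
// available, and otherwise the minimal set assumed to be implied by goarm.
func withGoarmFallback(detected features, hwcapKnown bool, goarm int) features {
	if hwcapKnown {
		return detected
	}
	// Assumption: GOARM>=6 requires VFP, GOARM>=7 requires VFPv3.
	return features{
		HasVFP:   goarm >= 6,
		HasVFPv3: goarm >= 7,
	}
}

func main() {
	for _, goarm := range []int{5, 6, 7} {
		f := withGoarmFallback(features{}, false, goarm)
		fmt.Printf("GOARM=%d VFP=%v VFPv3=%v\n", goarm, f.HasVFP, f.HasVFPv3)
	}
}
```

Optional features such as NEON cannot be inferred this way, so they would stay false whenever detection is unavailable.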
rsc commented on Aug 20, 2019
As far as the API is concerned, which is what the proposal process would care about, it seems that the proposal is to add a cpu.ARM that is very similar to cpu.ARM64 with appropriate modifications for the actual CPU features that might be present.
/cc @tklauser @martisch @ianlancetaylor for feedback
martisch commented on Aug 26, 2019
For the API I think we should stay consistent with the other architectures and internal/cpu, which means exposing a cpu.ARM struct with fields named HasNAME, where NAME is the corresponding HWCAP name of the feature. If needed, the client of x/sys/cpu can create combinations such as HasIDIVA | HasIDIVT outside of x/sys/cpu, but we do not need to create these as fields inside the cpu.ARM struct.
In general (for adding other architectures) I think if the existing naming schema is followed we do not need to have a separate proposal for each API addition for additional CPU feature structs/variables in x/sys/cpu.
jpap commented on Aug 27, 2019
@martisch, I have updated the original post to include a concrete list of fields.
I would argue that having "derived" fields, such as the proposed HasIDIV, makes the API easier to consume by the user. Anyone else keen to chime in?
Title changed from "proposal: x/sys/cpu -- add support for ARM" to "proposal: x/sys/cpu: add support for ARM"
jpap commented on Sep 26, 2019
The CL has been updated and split into a chain of commits.
@tklauser I've invited you as a reviewer and hope that you've got a little time to look at this contribution. Thank you in advance.
gopherbot commented on Sep 27, 2019
Change https://golang.org/cl/197541 mentions this issue:
cpu: protect ARM feature detection from broken device
gopherbot commented on Sep 27, 2019
Change https://golang.org/cl/197540 mentions this issue:
cpu: fallback to using /proc/{self/auxv, cpuinfo} for ARM feature detection
gopherbot commented on Sep 27, 2019
Change https://golang.org/cl/197542 mentions this issue:
cpu: add support for FreeBSD ARM feature detection
cpu: support ARM feature detection on Linux
davidben commented on Sep 27, 2019
(I'm one of the BoringSSL maintainers and the author of our ARM CPU detect bits.)
32-bit ARM CPU feature detection on Linux is kind of a headache with older Androids, yeah. :-(
I probably wouldn't recommend trying to detect that one broken CPU (https://golang.org/cl/197541). The Android cpu-features library doesn't do this and we only ever had issues with one function. At this point the affected CPU is rare enough that I'm hoping to just remove it soon. (Chrome for Android already requires NEON support. The workaround results in us carrying extra crypto implementations for just that CPU.)
As for whether you want the tower of /proc fallbacks, I guess it depends on what versions of Android you care about and how much you care about getting NEON on those older devices. Android L and up have getauxval. I expect other Linux ARMs have getauxval by now. https://developer.android.com/about/dashboards has some Android usage numbers.
I'll also note that BoringSSL only pays attention to NEON and ARMv8 crypto-related bits, so some of our fallbacks may not be a good template for the other features. In particular, I don't think the ARMv8 logic in https://golang.org/cl/197540's /proc/cpuinfo parser is quite right.
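The fallback ordering discussed in this thread (getauxval, then /proc/self/auxv, then /proc/cpuinfo) can be sketched as a chain of sources tried in order. The capSource type and firstAvailable function are hypothetical names, and the sources in main are stubs standing in for the real mechanisms, not working detectors.

```go
package main

import "fmt"

// capSource is a hypothetical wrapper around one way of obtaining HWCAP.
type capSource struct {
	name string
	read func() (hwcap uint, ok bool)
}

// firstAvailable tries each source in order and returns the first one that
// reports a usable value.
func firstAvailable(sources []capSource) (name string, hwcap uint, ok bool) {
	for _, s := range sources {
		if v, found := s.read(); found {
			return s.name, v, true
		}
	}
	return "", 0, false
}

func main() {
	// Stubs simulating an older device: no getauxval and no readable auxv,
	// but a /proc/cpuinfo that lists NEON (bit 12).
	sources := []capSource{
		{"getauxval", func() (uint, bool) { return 0, false }},
		{"/proc/self/auxv", func() (uint, bool) { return 0, false }},
		{"/proc/cpuinfo", func() (uint, bool) { return 1 << 12, true }},
	}
	name, hwcap, ok := firstAvailable(sources)
	fmt.Println(name, hwcap, ok)
}
```

Keeping each mechanism behind the same small interface makes it easy to drop a fallback later if older devices stop mattering.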
jpap commented on Sep 28, 2019
@davidben, thanks for chiming in; some comments inline below.
That's fair enough. My interest is in NEON detection on Android. However it would be nice if we also had accurate crypto detection, so that we might later introduce accelerated TLS for arm32 in Go's stdlib. It looks like support is currently limited to arm64.
I would ideally like to target KitKat (API level 19) and up. How good is the /proc/self/auxv fallback there? If we can't land the fallback support into x/sys/cpu, then I might just use a vendored approach. What would you recommend?
On the ARMv8 check, I lifted it directly from this BoringSSL implementation. If you can outline what's not quite right here, and what could be improved, I'd appreciate it.
davidben commented on Sep 28, 2019
I don't remember off-hand when the /proc/self/auxv works vs. /proc/cpuinfo. I vaguely recall it was something weird though, where some Android version or device accidentally took away /proc/self/auxv without adding getauxval?
As I said, BoringSSL only cares about NEON and ARMv8 crypto-related bits. I expect there are other optional ARMv7 features that became mandatory in ARMv8 other than just NEON. Since BoringSSL doesn't care about any ARMv7 feature other than NEON, our code is not a good template for those features.
It probably makes sense to check how the kernel actually fills in /proc/cpuinfo and review against that.
jpap commented on Sep 28, 2019
In that case, would you recommend that if you had any fallback whatsoever, you include both /proc/self/auxv and /proc/cpuinfo as you have in the BoringSSL implementation, and as I have done in the CL under discussion?
That makes sense.
Looking at the Armv8 Architecture Reference Manual, in the AArch32 execution state:
As you've stated, there could be more features that are mandatory in ARMv8, but I would say that the above are the ones we would generally care about (NEON + crypto).
I'll update the CL to reflect this. If I've missed a feature that you care about, please let me know.
einthusan commented on Oct 29, 2021
martisch commented on Oct 29, 2021
einthusan commented on Oct 29, 2021
martisch commented on Oct 29, 2021