fix(rp2350): add software spinlocks #5034
base: release
Oh, whoops. My go fmt extension has been flaking out on me. Will have the missing rp2040 imports updated in a moment. Here's the lock/unlock disassembled output with inlining disabled:
I support the switch to atomic instructions, but if you need something that works right away, RP2350-E2 mentions that some spinlocks are not affected:
The following SIO spinlocks can be used normally because they don’t alias with writable registers: 5, 6, 7,
10, 11, and 18 through 31. Some of the other lock addresses may be used safely depending on which of
the high-addressed SIO registers are in use.
Locks 18 through 24 alias with some read-only TMDS encoder registers, which is safe as only writes are
mis-decoded.
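As a quick way to check a lock number against that list, here is a minimal sketch that encodes only the numbers quoted from RP2350-E2 above (5, 6, 7, 10, 11, and 18 through 31). The function name `safeHardwareSpinlock` is hypothetical and not part of TinyGo or pico-sdk.

```go
package main

import "fmt"

// safeHardwareSpinlock reports whether hardware spinlock n is in the set the
// RP2350-E2 text quoted above lists as usable normally: 5, 6, 7, 10, 11, and
// 18 through 31. The name is hypothetical; nothing like it exists in TinyGo.
func safeHardwareSpinlock(n uint8) bool {
	switch {
	case n >= 5 && n <= 7:
		return true
	case n == 10 || n == 11:
		return true
	case n >= 18 && n <= 31:
		return true
	}
	return false
}

func main() {
	// Print every lock number the quoted erratum text lists as safe.
	for n := uint8(0); n < 32; n++ {
		if safeHardwareSpinlock(n) {
			fmt.Printf("%d ", n)
		}
	}
	fmt.Println()
}
```

Note that the lock ids currently assigned in the runtime (for example `futexLock` with id 3) fall outside this set, so using the unaffected locks would mean renumbering them.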
src/runtime/runtime_rp2350.go
Outdated
// r0 is automatically filled with the pointer value "l" here.
// We create a variable to permit access to the state byte (l.state) and
// avoid a memory fault when accessing it in assembly.
state := &l.state
_ = state
Hoping `state` ends up in `r0` seems brittle to me, and I'm surprised the compiler doesn't optimize it away. Are you sure you can't bind `state` to an asm register a better way? https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/ mentions that Cgo assembly is more full-featured and also inlined.
Yeah, don't do this. This will break eventually. The compiler is free to put it in any register it likes, store it on the stack, whatever.
Also, all the assembly instructions below are independent from a compiler POV so the compiler is free to modify registers between them if it wants to (it probably won't, but it would be allowed to).
A much better way would be to use atomic operations directly, and with that I mean sync/atomic. See the section I posted before:
tinygo/src/runtime/runtime_tinygoriscv_qemu.go
Lines 360 to 375 in 3869f76
func (l *spinLock) Lock() {
	// Try to replace 0 with 1. Once we succeed, the lock has been acquired.
	for !l.Uint32.CompareAndSwap(0, 1) {
		spinLoopWait()
	}
}

func (l *spinLock) Unlock() {
	// Safety check: the spinlock should have been locked.
	if schedulerAsserts && l.Uint32.Load() != 1 {
		runtimePanic("unlock of unlocked spinlock")
	}
	// Unlock the lock. Simply write 0, because we already know it is locked.
	l.Uint32.Store(0)
}
This should result in similar assembly, and if it doesn't we'd have to investigate why.
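For anyone who wants to poke at this outside the runtime, here is a self-contained sketch of the same pattern that runs with standard Go. It is not the runtime code: `runtime.Gosched()` is only a stand-in for `spinLoopWait()`, the `schedulerAsserts` guard is dropped, and `countWithLock` is a hypothetical helper added to exercise the lock.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock follows the shape of the snippet above: 0 is unlocked, 1 is locked.
type spinLock struct {
	atomic.Uint32
}

func (l *spinLock) Lock() {
	// Try to replace 0 with 1. Once that succeeds, the lock is held.
	for !l.Uint32.CompareAndSwap(0, 1) {
		// Assumption: stand-in for the runtime's spinLoopWait(), which is
		// not available outside the TinyGo runtime package.
		runtime.Gosched()
	}
}

func (l *spinLock) Unlock() {
	// Safety check, unconditional here because schedulerAsserts is
	// runtime-internal.
	if l.Uint32.Load() != 1 {
		panic("unlock of unlocked spinlock")
	}
	l.Uint32.Store(0)
}

// countWithLock (hypothetical helper) increments a plain int from several
// goroutines, using the spinlock as the only synchronization.
func countWithLock(workers, perWorker int) int {
	var (
		lock  spinLock
		total int
		wg    sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				lock.Lock()
				total++
				lock.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithLock(4, 1000)) // prints 4000
}
```

Because the sync/atomic operations are sequentially consistent in the Go memory model, the plain `total++` inside the critical section needs no extra barrier in the Go source; whatever barrier instructions the target needs are the compiler's job.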
I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent. I won't die on that hill if atomics can produce a similar enough result though. I don't recall what method I was testing, but the atomic lock method I was initially looking at disassembled to about 4x as long as this, hence the hacky setup. Looking at that one though, it's only ~2x the size which seems reasonable to me.
Hacky version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x10001192 <(*runtime.spinLock).Lock+26>
0x1000117a <(*runtime.spinLock).Lock+2>: ldaexb r2, [r0]
0x1000117e <(*runtime.spinLock).Lock+6>: movs r1, #1
0x10001180 <(*runtime.spinLock).Lock+8>: cmp r2, #0
0x10001182 <(*runtime.spinLock).Lock+10>: bne.n 0x1000117a <(*runtime.spinLock).Lock+2>
0x10001184 <(*runtime.spinLock).Lock+12>: strexb r2, r1, [r0]
0x10001188 <(*runtime.spinLock).Lock+16>: cmp r2, #0
0x1000118a <(*runtime.spinLock).Lock+18>: bne.n 0x1000117a <(*runtime.spinLock).Lock+2>
0x1000118c <(*runtime.spinLock).Lock+20>: dmb sy
0x10001190 <(*runtime.spinLock).Lock+24>: bx lr
0x10001192 <(*runtime.spinLock).Lock+26>: bl 0x100013f4 <runtime.nilPanic>
Extensive version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x100011d8 <(*runtime.spinLock).Lock+96>
0x1000117a <(*runtime.spinLock).Lock+2>: adds r0, #4
0x1000117c <(*runtime.spinLock).Lock+4>: movs r1, #1
0x1000117e <(*runtime.spinLock).Lock+6>: nop
0x10001180 <(*runtime.spinLock).Lock+8>: ldaex r2, [r0]
0x10001184 <(*runtime.spinLock).Lock+12>: cbnz r2, 0x10001190 <(*runtime.spinLock).Lock+24>
0x10001186 <(*runtime.spinLock).Lock+14>: stlex r2, r1, [r0]
0x1000118a <(*runtime.spinLock).Lock+18>: cmp r2, #0
0x1000118c <(*runtime.spinLock).Lock+20>: bne.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x1000118e <(*runtime.spinLock).Lock+22>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x10001190 <(*runtime.spinLock).Lock+24>: clrex
0x10001194 <(*runtime.spinLock).Lock+28>: ldaex r2, [r0]
0x10001198 <(*runtime.spinLock).Lock+32>: cbnz r2, 0x100011a4 <(*runtime.spinLock).Lock+44>
0x1000119a <(*runtime.spinLock).Lock+34>: stlex r2, r1, [r0]
0x1000119e <(*runtime.spinLock).Lock+38>: cmp r2, #0
0x100011a0 <(*runtime.spinLock).Lock+40>: bne.n 0x10001194 <(*runtime.spinLock).Lock+28>
0x100011a2 <(*runtime.spinLock).Lock+42>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011a4 <(*runtime.spinLock).Lock+44>: clrex
0x100011a8 <(*runtime.spinLock).Lock+48>: ldaex r2, [r0]
0x100011ac <(*runtime.spinLock).Lock+52>: cbnz r2, 0x100011b8 <(*runtime.spinLock).Lock+64>
0x100011ae <(*runtime.spinLock).Lock+54>: stlex r2, r1, [r0]
0x100011b2 <(*runtime.spinLock).Lock+58>: cmp r2, #0
0x100011b4 <(*runtime.spinLock).Lock+60>: bne.n 0x100011a8 <(*runtime.spinLock).Lock+48>
0x100011b6 <(*runtime.spinLock).Lock+62>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011b8 <(*runtime.spinLock).Lock+64>: clrex
0x100011bc <(*runtime.spinLock).Lock+68>: ldaex r2, [r0]
0x100011c0 <(*runtime.spinLock).Lock+72>: cbnz r2, 0x100011cc <(*runtime.spinLock).Lock+84>
0x100011c2 <(*runtime.spinLock).Lock+74>: stlex r2, r1, [r0]
0x100011c6 <(*runtime.spinLock).Lock+78>: cmp r2, #0
0x100011c8 <(*runtime.spinLock).Lock+80>: bne.n 0x100011bc <(*runtime.spinLock).Lock+68>
0x100011ca <(*runtime.spinLock).Lock+82>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011cc <(*runtime.spinLock).Lock+84>: clrex
0x100011d0 <(*runtime.spinLock).Lock+88>: b.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x100011d2 <(*runtime.spinLock).Lock+90>: dmb sy
0x100011d6 <(*runtime.spinLock).Lock+94>: bx lr
0x100011d8 <(*runtime.spinLock).Lock+96>: bl 0x10001438 <runtime.nilPanic>
That version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x100011ae <(*runtime.spinLock).Lock+54>
0x1000117a <(*runtime.spinLock).Lock+2>: adds r0, #4
0x1000117c <(*runtime.spinLock).Lock+4>: movs r1, #1
0x1000117e <(*runtime.spinLock).Lock+6>: nop
0x10001180 <(*runtime.spinLock).Lock+8>: ldaex r2, [r0]
0x10001184 <(*runtime.spinLock).Lock+12>: cbnz r2, 0x10001192 <(*runtime.spinLock).Lock+26>
0x10001186 <(*runtime.spinLock).Lock+14>: stlex r2, r1, [r0]
0x1000118a <(*runtime.spinLock).Lock+18>: cmp r2, #0
0x1000118c <(*runtime.spinLock).Lock+20>: it eq
0x1000118e <(*runtime.spinLock).Lock+22>: bxeq lr
0x10001190 <(*runtime.spinLock).Lock+24>: b.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x10001192 <(*runtime.spinLock).Lock+26>: movs r1, #1
0x10001194 <(*runtime.spinLock).Lock+28>: clrex
0x10001198 <(*runtime.spinLock).Lock+32>: wfe
0x1000119a <(*runtime.spinLock).Lock+34>: nop
0x1000119c <(*runtime.spinLock).Lock+36>: ldaex r2, [r0]
0x100011a0 <(*runtime.spinLock).Lock+40>: cmp r2, #0
0x100011a2 <(*runtime.spinLock).Lock+42>: bne.n 0x10001194 <(*runtime.spinLock).Lock+28>
0x100011a4 <(*runtime.spinLock).Lock+44>: stlex r2, r1, [r0]
0x100011a8 <(*runtime.spinLock).Lock+48>: cmp r2, #0
0x100011aa <(*runtime.spinLock).Lock+50>: bne.n 0x1000119c <(*runtime.spinLock).Lock+36>
0x100011ac <(*runtime.spinLock).Lock+52>: bx lr
0x100011ae <(*runtime.spinLock).Lock+54>: bl 0x10001414 <runtime.nilPanic>
I just noticed this one isn't doing a memory barrier at the end. I assume we'll want to add that to the rp2350 atomics, but I'm not sure where this implementation is coming from exactly. Is this generated from the arm assembly in the mainstream sync/atomic package?
> I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent.
The AAPCS only applies to non-inlined, externally available functions. That doesn't apply here. The compiler is free to inline these anywhere and use any register.
There are a few cases where you can rely on the calling convention, but this is not one of them.
Ah yeah, I didn't consider inlining, that could have been problematic.
Even without inlining there is no reason the compiler would be required to keep `state` in any particular register. It might have picked any register; you're lucky it picked r0 in this case. In fact I had expected it would have optimized it out entirely.
src/runtime/runtime_rp2350.go
Outdated
arm.Asm("1:")
// Exclusively load (lock) the state byte and put its value in r2.
arm.Asm("ldaexb r2, [r0]")
// Set the r1 register to '1' for later use.
arm.Asm("movs r1, #1")
// Check if the lock was already taken (r2 != 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock is already held.
arm.Asm("bne 1b")

// Attempt to store '1' into the lock state byte.
// The return code (0 for success, 1 for failure) is placed in r2.
arm.Asm("strexb r2, r1, [r0]")
// Check if the result was successful (r2 == 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock was not acquired.
arm.Asm("bne 1b")
With register binding through Cgo assembly, it seems to me that the assembly can be cut down to just the special instructions (`ldaexb` and `strexb`) and the rest kept in Go.
Are you talking about the `arm.AsmFull()` functions? I originally tried that, but it doesn't allow passing through pointer types for some reason (it has a note about that having been removed in v0.23.0).
If you mean something else I'm curious though. Initial Google results aren't bringing up much.
I'm talking about "Inline assembly using CGo": https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/
Inline assembly through CGo is indeed an option, and can make the code slightly more efficient. I recommend reading this page to get an understanding of how it works: http://www.ethernut.de/en/documents/arm-inline-asm.html
Other than that, it's just standard CGo. You can make the function `static` and put it directly in the Go file like so:
// static void spinlock_lock(unsigned *lock) { ... }
import "C"
See my comment below.
Also,
> Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same: [...]
The runtime does this in various places if needed. The spinlock implementation doesn't need to disable interrupts too.
Thank you for tracking those down. I figured there would probably be some, but I couldn't find them.
Other than a nit, this looks good to me. @aykevl WDYT?
@@ -297,31 +297,6 @@ var (
	futexLock = spinLock{id: 3}
`id` is an implementation detail of rp2040 spinlocks, whereas on rp2350 `id`s have some meaning but are unused. I suggest moving the spinlock variables to the rpXXXX.go files and avoiding the `id` field on rp2350.

type spinLock struct {
	atomic.Uint32
	id uint8
Delete field.
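A minimal sketch of what the rp2350 side could look like with the field gone, assuming the lock variables move into the chip-specific file as suggested above. Only `futexLock` is taken from the diff; `tryLock` is a hypothetical helper added so the behaviour can be observed, and it is not proposed for the runtime.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spinLock sketches an rp2350-only definition with the id field removed.
type spinLock struct {
	atomic.Uint32
}

// With no id to assign, the zero value is already a valid unlocked lock, so
// the declaration needs no initializer. On rp2040 it would presumably stay
// as spinLock{id: 3}.
var futexLock spinLock

// tryLock is a hypothetical helper used only to make this sketch observable.
func (l *spinLock) tryLock() bool {
	return l.CompareAndSwap(0, 1)
}

func main() {
	fmt.Println(futexLock.tryLock()) // first attempt acquires the lock
	fmt.Println(futexLock.tryLock()) // second attempt fails while it is held
	futexLock.Store(0)               // release
}
```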
As it turns out, the RP2350 has hardware spinlocks that can be unlocked by writes to nearby addresses. The lower spinlocks currently in use in TinyGo happen to be unlocked by writes to the doorbell interrupt registers used to signal between cores, which could very possibly lead to unexpected unlocks. This was not corrected in the A3 or A4 steppings; instead, software spinlocks are used by default on RP2350 in pico-sdk:
https://www.raspberrypi.com/documentation/pico-sdk/hardware.html#group_hardware_sync
Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same:
These are the software spinlock macros ported over:
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L112
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L197