Contributor

@mikesmitty mikesmitty commented Sep 10, 2025

As it turns out, the RP2350 has hardware spinlocks that can be unlocked by writes to nearby addresses. The lower spinlocks currently in use in TinyGo happen to be unlocked by writes to the doorbell interrupt registers used to signal between cores, which could very possibly lead to unexpected unlocks. This was not corrected in the A3 or A4 steppings; instead, software spinlocks are used by default on RP2350 in pico-sdk:

RP2350 Warning. Due to erratum RP2350-E2, writes to new SIO registers above an offset of +0x180 alias the spinlocks, causing spurious lock releases. This SDK by default use atomic memory accesses to implement the hardware_sync_spin_lock API, as a workaround on RP2350 A2.

https://www.raspberrypi.com/documentation/pico-sdk/hardware.html#group_hardware_sync

Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same:

[...] the default spinlock related methods here (e.g. spin_lock_blocking) always disable interrupts while the lock is held as use by IRQ handlers and user code is common/desirable, and spin locks are only expected to be held for brief periods.

These are the software spinlock macros ported over:
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L112
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L197

@mikesmitty
Contributor Author

Oh, whoops. My go fmt extension has been flaking out on me. Will have the missing rp2040 imports updated in a moment. Here's the lock/unlock disassembled output with inlining disabled:

   0x10001178 <(*runtime.spinLock).Lock+0>:     cbz     r0, 0x10001192 <(*runtime.spinLock).Lock+26>
   0x1000117a <(*runtime.spinLock).Lock+2>:     ldaexb  r2, [r0]
   0x1000117e <(*runtime.spinLock).Lock+6>:     movs    r1, #1
   0x10001180 <(*runtime.spinLock).Lock+8>:     cmp     r2, #0
   0x10001182 <(*runtime.spinLock).Lock+10>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x10001184 <(*runtime.spinLock).Lock+12>:    strexb  r2, r1, [r0]
   0x10001188 <(*runtime.spinLock).Lock+16>:    cmp     r2, #0
   0x1000118a <(*runtime.spinLock).Lock+18>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x1000118c <(*runtime.spinLock).Lock+20>:    dmb     sy
   0x10001190 <(*runtime.spinLock).Lock+24>:    bx      lr
   0x10001192 <(*runtime.spinLock).Lock+26>:    bl      0x100013f4 <runtime.nilPanic>
   0x100013e4 <(*runtime.spinLock).Unlock+0>:   cbz     r0, 0x100013ee <(*runtime.spinLock).Unlock+10>
   0x100013e6 <(*runtime.spinLock).Unlock+2>:   movs    r1, #0
   0x100013e8 <(*runtime.spinLock).Unlock+4>:   stlb    r1, [r0]
   0x100013ec <(*runtime.spinLock).Unlock+8>:   bx      lr
   0x100013ee <(*runtime.spinLock).Unlock+10>:  bl      0x100013f4 <runtime.nilPanic>

Contributor

@eliasnaur eliasnaur left a comment

I support the switch to atomic instructions, but if you need something that works right away, RP2350-E2 mentions that some spinlocks are not affected:

The following SIO spinlocks can be used normally because they don’t alias with writable registers: 5, 6, 7,
10, 11, and 18 through 31. Some of the other lock addresses may be used safely depending on which of
the high-addressed SIO registers are in use.
Locks 18 through 24 alias with some read-only TMDS encoder registers, which is safe as only writes are
mis-decoded.

Comment on lines 31 to 35
// r0 is automatically filled with the pointer value "l" here.
// We create a variable to permit access to the state byte (l.state) and
// avoid a memory fault when accessing it in assembly.
state := &l.state
_ = state
Contributor

Hoping state ends up in r0 seems brittle to me, and I'm surprised the compiler doesn't optimize it away. Are you sure you can't bind state to an asm register a better way? https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/ mentions that Cgo assembly is more full-featured and also inlined.

Member

@aykevl aykevl Sep 10, 2025

Yeah, don't do this. This will break eventually. The compiler is free to put it in any register it likes, store it on the stack, whatever.
Also, all the assembly instructions below are independent from a compiler POV so the compiler is free to modify registers between them if it wants to (it probably won't, but it would be allowed to).

A much better way would be to use atomic operations directly, and with that I mean sync/atomic. See the section I posted before:

func (l *spinLock) Lock() {
	// Try to replace 0 with 1. Once we succeed, the lock has been acquired.
	for !l.Uint32.CompareAndSwap(0, 1) {
		spinLoopWait()
	}
}

func (l *spinLock) Unlock() {
	// Safety check: the spinlock should have been locked.
	if schedulerAsserts && l.Uint32.Load() != 1 {
		runtimePanic("unlock of unlocked spinlock")
	}
	// Unlock the lock. Simply write 0, because we already know it is locked.
	l.Uint32.Store(0)
}

This should result in similar assembly, and if it doesn't we'd have to investigate why.

Contributor Author

I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent. I won't die on that hill if atomics can produce a similar enough result though. I don't recall what method I was testing, but the atomic lock method I was initially looking at disassembled to about 4x as long as this, hence the hacky setup. Looking at that one though, it's only ~2x the size which seems reasonable to me.

Hacky version:

   0x10001178 <(*runtime.spinLock).Lock+0>:     cbz     r0, 0x10001192 <(*runtime.spinLock).Lock+26>
   0x1000117a <(*runtime.spinLock).Lock+2>:     ldaexb  r2, [r0]
   0x1000117e <(*runtime.spinLock).Lock+6>:     movs    r1, #1
   0x10001180 <(*runtime.spinLock).Lock+8>:     cmp     r2, #0
   0x10001182 <(*runtime.spinLock).Lock+10>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x10001184 <(*runtime.spinLock).Lock+12>:    strexb  r2, r1, [r0]
   0x10001188 <(*runtime.spinLock).Lock+16>:    cmp     r2, #0
   0x1000118a <(*runtime.spinLock).Lock+18>:    bne.n   0x1000117a <(*runtime.spinLock).Lock+2>
   0x1000118c <(*runtime.spinLock).Lock+20>:    dmb     sy
   0x10001190 <(*runtime.spinLock).Lock+24>:    bx      lr
   0x10001192 <(*runtime.spinLock).Lock+26>:    bl      0x100013f4 <runtime.nilPanic>

Extensive version:

   0x10001178 <(*runtime.spinLock).Lock+0>:     cbz     r0, 0x100011d8 <(*runtime.spinLock).Lock+96>
   0x1000117a <(*runtime.spinLock).Lock+2>:     adds    r0, #4
   0x1000117c <(*runtime.spinLock).Lock+4>:     movs    r1, #1
   0x1000117e <(*runtime.spinLock).Lock+6>:     nop
   0x10001180 <(*runtime.spinLock).Lock+8>:     ldaex   r2, [r0]
   0x10001184 <(*runtime.spinLock).Lock+12>:    cbnz    r2, 0x10001190 <(*runtime.spinLock).Lock+24>
   0x10001186 <(*runtime.spinLock).Lock+14>:    stlex   r2, r1, [r0]
   0x1000118a <(*runtime.spinLock).Lock+18>:    cmp     r2, #0
   0x1000118c <(*runtime.spinLock).Lock+20>:    bne.n   0x10001180 <(*runtime.spinLock).Lock+8>
   0x1000118e <(*runtime.spinLock).Lock+22>:    b.n     0x100011d2 <(*runtime.spinLock).Lock+90>
   0x10001190 <(*runtime.spinLock).Lock+24>:    clrex
   0x10001194 <(*runtime.spinLock).Lock+28>:    ldaex   r2, [r0]
   0x10001198 <(*runtime.spinLock).Lock+32>:    cbnz    r2, 0x100011a4 <(*runtime.spinLock).Lock+44>
   0x1000119a <(*runtime.spinLock).Lock+34>:    stlex   r2, r1, [r0]
   0x1000119e <(*runtime.spinLock).Lock+38>:    cmp     r2, #0
   0x100011a0 <(*runtime.spinLock).Lock+40>:    bne.n   0x10001194 <(*runtime.spinLock).Lock+28>
   0x100011a2 <(*runtime.spinLock).Lock+42>:    b.n     0x100011d2 <(*runtime.spinLock).Lock+90>
   0x100011a4 <(*runtime.spinLock).Lock+44>:    clrex
   0x100011a8 <(*runtime.spinLock).Lock+48>:    ldaex   r2, [r0]
   0x100011ac <(*runtime.spinLock).Lock+52>:    cbnz    r2, 0x100011b8 <(*runtime.spinLock).Lock+64>
   0x100011ae <(*runtime.spinLock).Lock+54>:    stlex   r2, r1, [r0]
   0x100011b2 <(*runtime.spinLock).Lock+58>:    cmp     r2, #0
   0x100011b4 <(*runtime.spinLock).Lock+60>:    bne.n   0x100011a8 <(*runtime.spinLock).Lock+48>
   0x100011b6 <(*runtime.spinLock).Lock+62>:    b.n     0x100011d2 <(*runtime.spinLock).Lock+90>
   0x100011b8 <(*runtime.spinLock).Lock+64>:    clrex
   0x100011bc <(*runtime.spinLock).Lock+68>:    ldaex   r2, [r0]
   0x100011c0 <(*runtime.spinLock).Lock+72>:    cbnz    r2, 0x100011cc <(*runtime.spinLock).Lock+84>
   0x100011c2 <(*runtime.spinLock).Lock+74>:    stlex   r2, r1, [r0]
   0x100011c6 <(*runtime.spinLock).Lock+78>:    cmp     r2, #0
   0x100011c8 <(*runtime.spinLock).Lock+80>:    bne.n   0x100011bc <(*runtime.spinLock).Lock+68>
   0x100011ca <(*runtime.spinLock).Lock+82>:    b.n     0x100011d2 <(*runtime.spinLock).Lock+90>
   0x100011cc <(*runtime.spinLock).Lock+84>:    clrex
   0x100011d0 <(*runtime.spinLock).Lock+88>:    b.n     0x10001180 <(*runtime.spinLock).Lock+8>
   0x100011d2 <(*runtime.spinLock).Lock+90>:    dmb     sy
   0x100011d6 <(*runtime.spinLock).Lock+94>:    bx      lr
   0x100011d8 <(*runtime.spinLock).Lock+96>:    bl      0x10001438 <runtime.nilPanic>

That version:

   0x10001178 <(*runtime.spinLock).Lock+0>:     cbz     r0, 0x100011ae <(*runtime.spinLock).Lock+54>
   0x1000117a <(*runtime.spinLock).Lock+2>:     adds    r0, #4
   0x1000117c <(*runtime.spinLock).Lock+4>:     movs    r1, #1
   0x1000117e <(*runtime.spinLock).Lock+6>:     nop
   0x10001180 <(*runtime.spinLock).Lock+8>:     ldaex   r2, [r0]
   0x10001184 <(*runtime.spinLock).Lock+12>:    cbnz    r2, 0x10001192 <(*runtime.spinLock).Lock+26>
   0x10001186 <(*runtime.spinLock).Lock+14>:    stlex   r2, r1, [r0]
   0x1000118a <(*runtime.spinLock).Lock+18>:    cmp     r2, #0
   0x1000118c <(*runtime.spinLock).Lock+20>:    it      eq
   0x1000118e <(*runtime.spinLock).Lock+22>:    bxeq    lr
   0x10001190 <(*runtime.spinLock).Lock+24>:    b.n     0x10001180 <(*runtime.spinLock).Lock+8>
   0x10001192 <(*runtime.spinLock).Lock+26>:    movs    r1, #1
   0x10001194 <(*runtime.spinLock).Lock+28>:    clrex
   0x10001198 <(*runtime.spinLock).Lock+32>:    wfe
   0x1000119a <(*runtime.spinLock).Lock+34>:    nop
   0x1000119c <(*runtime.spinLock).Lock+36>:    ldaex   r2, [r0]
   0x100011a0 <(*runtime.spinLock).Lock+40>:    cmp     r2, #0
   0x100011a2 <(*runtime.spinLock).Lock+42>:    bne.n   0x10001194 <(*runtime.spinLock).Lock+28>
   0x100011a4 <(*runtime.spinLock).Lock+44>:    stlex   r2, r1, [r0]
   0x100011a8 <(*runtime.spinLock).Lock+48>:    cmp     r2, #0
   0x100011aa <(*runtime.spinLock).Lock+50>:    bne.n   0x1000119c <(*runtime.spinLock).Lock+36>
   0x100011ac <(*runtime.spinLock).Lock+52>:    bx      lr
   0x100011ae <(*runtime.spinLock).Lock+54>:    bl      0x10001414 <runtime.nilPanic>

Contributor Author

I just noticed this one isn't doing a memory barrier at the end. I assume we'll want to add that to the rp2350 atomics, but I'm not sure where this implementation is coming from exactly. Is this generated from the arm assembly in the mainstream sync/atomic package?

Member

I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent.

The AAPCS only applies to non-inlined, externally available functions. That doesn't apply here. The compiler is free to inline these anywhere and use any register.

There are a few cases where you can rely on the calling convention, but this is not one of them.

Contributor Author

Ah yeah, I didn't consider inlining, that could have been problematic.

Member

@aykevl aykevl Sep 11, 2025

Even without inlining there is no reason the compiler would be required to keep state in any particular register. It might have picked any register, you're lucky it picked r0 in this case. In fact I had expected it would have optimized it out entirely.

Comment on lines 38 to 54
arm.Asm("1:")
// Exclusively load (lock) the state byte and put its value in r2.
arm.Asm("ldaexb r2, [r0]")
// Set the r1 register to '1' for later use.
arm.Asm("movs r1, #1")
// Check if the lock was already taken (r2 != 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock is already held.
arm.Asm("bne 1b")

// Attempt to store '1' into the lock state byte.
// The return code (0 for success, 1 for failure) is placed in r2.
arm.Asm("strexb r2, r1, [r0]")
// Check if the result was successful (r2 == 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock was not acquired.
arm.Asm("bne 1b")
Contributor

With register binding through Cgo assembly, it seems to me that the assembly can be cut down to just the special instructions (ldaexb and strexb) and the rest kept in Go.

Contributor Author

Are you talking about the arm.AsmFull() functions? I originally tried that, but it doesn't allow passing through pointer types for some reason (it has a note about having been removed in v0.23.0).
If you mean something else, I'm curious though. Initial Google results aren't bringing up much.

Contributor

I'm talking about "Inline assembly using CGo": https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/

Member

@aykevl aykevl Sep 11, 2025

Inline assembly through CGo is indeed an option, and can make the code slightly more efficient. I recommend reading this page to get an understanding of how it works: http://www.ethernut.de/en/documents/arm-inline-asm.html

Other than that, it's just standard CGo. You can make the function static and put it directly in the Go file like so:

// static void spinlock_lock(unsigned *lock) { ... }
import "C"

Member

@aykevl aykevl left a comment

See my review comment on lines 31 to 35.

Also,

Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same: [...]

The runtime does this in various places if needed. The spinlock implementation doesn't need to disable interrupts too.

Comment on lines 31 to 35
// r0 is automatically filled with the pointer value "l" here.
// We create a variable to permit access to the state byte (l.state) and
// avoid a memory fault when accessing it in assembly.
state := &l.state
_ = state
Copy link
Member

@aykevl aykevl Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, don't do this. This will break eventually. The compiler is free to put it in any register it likes, store it on the stack, whatever.
Also, all the assembly instructions below are independent from a compiler POV so the compiler is free to modify registers between them if it wants to (it probably won't, but it would be allowed to).

A much better way would be to use atomic operations directly, and with that I mean sync/atomic. See the section I posted before:

func (l *spinLock) Lock() {
// Try to replace 0 with 1. Once we succeed, the lock has been acquired.
for !l.Uint32.CompareAndSwap(0, 1) {
spinLoopWait()
}
}
func (l *spinLock) Unlock() {
// Safety check: the spinlock should have been locked.
if schedulerAsserts && l.Uint32.Load() != 1 {
runtimePanic("unlock of unlocked spinlock")
}
// Unlock the lock. Simply write 0, because we already know it is locked.
l.Uint32.Store(0)
}

This should result in similar assembly, and if it doesn't we'd have to investigate why.

@mikesmitty
Contributor Author

I support the switch to atomic instructions, but if you need something that works right away, RP2350-E2 mentions that some spinlocks are not affected:

The following SIO spinlocks can be used normally because they don’t alias with writable registers: 5, 6, 7,
10, 11, and 18 through 31. Some of the other lock addresses may be used safely depending on which of
the high-addressed SIO registers are in use.
Locks 18 through 24 alias with some read-only TMDS encoder registers, which is safe as only writes are
mis-decoded.

Thank you for tracking those down. I figured there would probably be some, but I couldn't find them.

Contributor

@eliasnaur eliasnaur left a comment

Other than a nit, this looks good to me. @aykevl WDYT?

@@ -297,31 +297,6 @@ var (
futexLock = spinLock{id: 3}
Contributor

id is an implementation detail of rp2040 spinlocks, whereas on rp2350 ids have some meaning but are unused. I suggest moving the spinlock variables to the rpXXXX.go files and avoiding the id field on rp2350.


type spinLock struct {
atomic.Uint32
id uint8
Contributor

Delete field.
