The compiler has to keep `uintptr` and `unsafe.Pointer` values separate so that it can produce accurate stack maps. In some cases, however, this generates unnecessary register-to-register moves.
Here's the runtime example I'm looking at. `mapaccess1_fast32` currently ends with:
```go
for {
	for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
		if *(*uint32)(k) == key && b.tophash[i] != empty {
			return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.valuesize))
		}
	}
	b = b.overflow(t)
	if b == nil {
		return unsafe.Pointer(&zeroVal[0])
	}
}
```
This has an unnecessary nil check of `b` in the inner loop when evaluating `b.tophash`, so I'd like to change the outer loop structure to remove it:
```go
for ; b != nil; b = b.overflow(t) {
	for i, k := uintptr(0), b.keys(); i < bucketCnt; i, k = i+1, add(k, 4) {
		if *(*uint32)(k) == key && b.tophash[i] != empty {
			return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*uintptr(t.valuesize))
		}
	}
}
return unsafe.Pointer(&zeroVal[0])
```
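To experiment with the restructured loop outside the runtime, here is a self-contained sketch. It is a simplified model, not the runtime's code: the `bucket` layout, the `lookup32` name, passing `valuesize` as a parameter, storing `overflow` as a plain field instead of computing it with `b.overflow(t)`, and declaring `zeroVal` as `[1]uint32` are all hypothetical simplifications. The value-address arithmetic is kept in the same shape as the original.

```go
package main

import "unsafe"

// Hypothetical stand-ins for runtime internals.
const (
	bucketCnt = 8
	empty     = 0 // tophash value marking an unused slot
)

type bucket struct {
	tophash  [bucketCnt]uint8
	keys     [bucketCnt]uint32
	values   [bucketCnt]uint32
	overflow *bucket // a plain field here; the runtime computes it instead
}

// dataOffset is the byte offset of the keys array inside a bucket.
const dataOffset = unsafe.Offsetof(bucket{}.keys)

// zeroVal is returned for missing keys; uint32 keeps it suitably aligned.
var zeroVal [1]uint32

func add(p unsafe.Pointer, x uintptr) unsafe.Pointer {
	return unsafe.Pointer(uintptr(p) + x)
}

// lookup32 scans the bucket chain starting at b. The outer loop condition
// guarantees b != nil inside the body, so b.tophash[i] needs no separate
// nil check.
func lookup32(b *bucket, key uint32, valuesize uintptr) unsafe.Pointer {
	for ; b != nil; b = b.overflow {
		for i, k := uintptr(0), unsafe.Pointer(&b.keys); i < bucketCnt; i, k = i+1, add(k, 4) {
			if *(*uint32)(k) == key && b.tophash[i] != empty {
				// values start right after the keys: dataOffset + bucketCnt*4.
				return add(unsafe.Pointer(b), dataOffset+bucketCnt*4+i*valuesize)
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])
}
```

The caller dereferences the result as `*(*uint32)(p)`; a missing key yields a pointer to zero, matching the `&zeroVal[0]` fallback in the runtime version.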
With this new structure the nil check is gone, but we now have an extra register-to-register move at instruction 0x009f:
```
0x0096 00150 (hashmap_fast.go:42) MOVQ "".t+40(SP), CX
0x009b 00155 (hashmap_fast.go:42) MOVWLZX 84(CX), DX
0x009f 00159 (hashmap_fast.go:42) MOVQ AX, BX
0x00a2 00162 (hashmap_fast.go:42) LEAQ -8(BX)(DX*1), DX
0x00a7 00167 (hashmap_fast.go:42) TESTB AL, (CX)
0x00a9 00169 (hashmap_fast.go:42) MOVQ (DX), AX
0x00ac 00172 (hashmap_fast.go:42) TESTQ AX, AX
0x00af 00175 (hashmap_fast.go:42) JEQ 185
```
The move is there because computing `b.overflow` involves a `uintptr`/`unsafe.Pointer` conversion, which is lowered to a `MOVQconvert`, and regalloc allocates a separate register for the converted value. The move is pointless, though: the destination register (`BX`) is used only by the following `LEAQ` and is dead thereafter.
In general, it seems we should be able to rewrite away some `MOVQconvert`s when their result is used exactly once, immediately, as part of pointer arithmetic, which is the typical usage. The hard part is making sure the rewrite rules are safe, that is, that dropping the conversion can never produce an inaccurate stack map.
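The condition above can be illustrated with a toy model. This is not the compiler's SSA package or its rule syntax: `Op`, `Value`, and `foldConverts` are hypothetical, and requiring the use to be scheduled immediately after the convert is only a conservative stand-in for the real safety argument, which is the open question here.

```go
package main

// Toy SSA model for illustration only; not cmd/compile's ssa package.
type Op int

const (
	OpConvert Op = iota // stands in for MOVQconvert
	OpLEAQ              // stands in for address arithmetic
	OpOther
)

type Value struct {
	Op   Op
	Args []*Value // for OpConvert, Args[0] is the value being converted
	Uses int      // number of values that take this one as an argument
}

// foldConverts scans a block in schedule order. Whenever a LEAQ directly
// follows a convert that has exactly one use, the LEAQ is rewired to read
// the convert's input, leaving the convert dead. It returns the number of
// arguments rewritten.
func foldConverts(block []*Value) int {
	n := 0
	for i := 1; i < len(block); i++ {
		v, prev := block[i], block[i-1]
		if v.Op != OpLEAQ || prev.Op != OpConvert || prev.Uses != 1 {
			continue
		}
		for j, a := range v.Args {
			if a == prev {
				v.Args[j] = prev.Args[0]
				prev.Uses--         // the convert is now dead
				prev.Args[0].Uses++ // the LEAQ now uses the input directly
				n++
			}
		}
	}
	return n
}
```

A later dead-code pass would then delete the unused convert, and with it the extra register.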
This should help codegen for the runtime, which does lots of pointer arithmetic.
cc @randall77