Closed
Description
Sometimes the compiler generates a runtime.newobject(t) call where the size of t is statically known to be 0.
That call would return &runtime.zerobase:
Lines 809 to 816 in c043fc4
While the new(zeroSizedType) case is not very interesting, empty slice literals also emit a call to newobject (see below).
Instead of generating the runtime.newobject call, the compiler could insert the returned expression itself.
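To make the two cases concrete, here is a minimal sketch (not taken from the CL) showing that both struct{} and [0]int have a static size of 0, and that an empty slice literal behaves like a slice of a freshly allocated [0]int. The names empty, sizes, emptyLit and viaArray are made up for illustration; the [0]int equivalence is an assumption based on the type.[0]int argument visible in the old assembly listing.

```go
package main

import (
	"fmt"
	"unsafe"
)

// empty is a hypothetical zero-sized type: it has no fields.
type empty struct{}

// sizes reports the static sizes of the two types discussed in this issue.
func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(empty{}), unsafe.Sizeof([0]int{})
}

// emptyLit returns an empty slice literal, the case from BenchmarkSliceLit.
func emptyLit() []int { return []int{} }

// viaArray builds a comparable slice by hand: allocate a [0]int and slice it.
// This mirrors what the literal appears to lower to; it is not compiler code.
func viaArray() []int { return new([0]int)[:] }

func main() {
	a, b := sizes()
	fmt.Println(a, b) // both sizes are 0

	for _, s := range [][]int{emptyLit(), viaArray()} {
		// Each slice is non-nil with zero length and zero capacity.
		fmt.Println(s != nil, len(s), cap(s))
	}
}
```

Since nothing can be stored in a zero-sized object, any valid non-nil address works as the result, which is why returning &runtime.zerobase directly is safe.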
Impact on performance can be measured by this simple benchmark:
package benchmark

import (
	"testing"
)

var sinkStruct *struct{}
var sinkSlice []int

func BenchmarkNew(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkStruct = new(struct{})
	}
}

func BenchmarkSliceLit(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sinkSlice = []int{}
	}
}
name        old time/op  new time/op  delta
New-8       8.39ns ± 0%  1.29ns ± 6%  -84.59%  (p=0.000 n=9+10)
SliceLit-8  8.80ns ± 0%  1.88ns ± 0%  -78.63%  (p=0.000 n=9+9)
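For readers on a toolchain that already includes the change, the difference can be approximated by hand. The sketch below compares new(struct{}) with taking the address of a package-level zero-sized variable, which is roughly what the optimized code does with runtime.zerobase. The names zero, viaNew and viaAddr are hypothetical, and the indirect function call adds the same overhead to both cases, so the absolute numbers will not match the table above.

```go
package main

import (
	"fmt"
	"testing"
)

// zero is a hypothetical stand-in for runtime.zerobase.
var zero struct{}

// sinkStruct keeps the result alive so the loop body is not eliminated.
var sinkStruct *struct{}

// viaNew allocates through new; compilers without the optimization
// emit a runtime.newobject call here.
func viaNew() *struct{} { return new(struct{}) }

// viaAddr returns the address of a package-level variable,
// which needs no runtime call at all.
func viaAddr() *struct{} { return &zero }

func main() {
	cases := []struct {
		name string
		fn   func() *struct{}
	}{
		{"viaNew", viaNew},
		{"viaAddr", viaAddr},
	}
	for _, c := range cases {
		fn := c.fn
		// testing.Benchmark runs the function outside of "go test".
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sinkStruct = fn()
			}
		})
		fmt.Printf("%-8s %s\n", c.name, res)
	}
}
```

If the two cases report similar timings, the compiler in use is most likely already avoiding the newobject call.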
The impact on code size is also positive: in the example below, newSlice shrinks from 80 to 21 bytes.
func newSlice() []int { return []int{} }
Old generated code for newSlice (amd64/linux):
"".newSlice STEXT size=80 args=0x18 locals=0x18
0x0000 00000 (foo.go:11) TEXT "".newSlice(SB), ABIInternal, $24-24
0x0000 00000 (foo.go:11) MOVQ (TLS), CX
0x0009 00009 (foo.go:11) CMPQ SP, 16(CX)
0x000d 00013 (foo.go:11) JLS 73
0x000f 00015 (foo.go:11) SUBQ $24, SP
0x0013 00019 (foo.go:11) MOVQ BP, 16(SP)
0x0018 00024 (foo.go:11) LEAQ 16(SP), BP
0x001d 00029 (foo.go:11) FUNCDATA $0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (foo.go:11) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x001d 00029 (foo.go:11) FUNCDATA $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x001d 00029 (foo.go:12) PCDATA $2, $1
0x001d 00029 (foo.go:12) PCDATA $0, $0
0x001d 00029 (foo.go:12) LEAQ type.[0]int(SB), AX
0x0024 00036 (foo.go:12) PCDATA $2, $0
0x0024 00036 (foo.go:12) MOVQ AX, (SP)
0x0028 00040 (foo.go:12) CALL runtime.newobject(SB)
0x002d 00045 (foo.go:12) PCDATA $2, $1
0x002d 00045 (foo.go:12) MOVQ 8(SP), AX
0x0032 00050 (foo.go:12) PCDATA $2, $0
0x0032 00050 (foo.go:12) PCDATA $0, $1
0x0032 00050 (foo.go:12) MOVQ AX, "".~r0+32(SP)
0x0037 00055 (foo.go:12) XORPS X0, X0
0x003a 00058 (foo.go:12) MOVUPS X0, "".~r0+40(SP)
0x003f 00063 (foo.go:12) MOVQ 16(SP), BP
0x0044 00068 (foo.go:12) ADDQ $24, SP
0x0048 00072 (foo.go:12) RET
0x0049 00073 (foo.go:12) NOP
0x0049 00073 (foo.go:11) PCDATA $0, $-1
0x0049 00073 (foo.go:11) PCDATA $2, $-1
0x0049 00073 (foo.go:11) CALL runtime.morestack_noctxt(SB)
0x004e 00078 (foo.go:11) JMP 0
New generated code for newSlice:
"".newSlice STEXT nosplit size=21 args=0x18 locals=0x0
0x0000 00000 (foo.go:10) TEXT "".newSlice(SB), NOSPLIT|ABIInternal, $0-24
0x0000 00000 (foo.go:10) FUNCDATA $0, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x0000 00000 (foo.go:10) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
0x0000 00000 (foo.go:10) FUNCDATA $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x0000 00000 (foo.go:11) PCDATA $2, $1
0x0000 00000 (foo.go:11) PCDATA $0, $1
0x0000 00000 (foo.go:11) LEAQ runtime.zerobase(SB), AX
0x0007 00007 (foo.go:11) PCDATA $2, $0
0x0007 00007 (foo.go:11) MOVQ AX, "".~r0+8(SP)
0x000c 00012 (foo.go:11) XORPS X0, X0
0x000f 00015 (foo.go:11) MOVUPS X0, "".~r0+16(SP)
0x0014 00020 (foo.go:11) RET
The important part is that there is no longer a call to runtime.newobject(SB).
I'll send a CL with that optimization applied.
Activity
gopherbot commented on Dec 28, 2018
Change https://golang.org/cl/155840 mentions this issue:
cmd/compile: don't generate newobject call for 0-sized types
mvdan commented on Dec 28, 2018
What's the size impact on a large Go binary like cmd/go? Please include that little stat in the commit message too.
quasilyte commented on Dec 28, 2018
@mvdan, for cmd/go there is 0 change in code size. The optimization does trigger several times during compilation, but it doesn't seem to have an effect on that particular binary.
mvdan commented on Dec 28, 2018
Huh, I'd expect this to happen often and to shave off at least a few kilobytes from most large binaries. Did you check if the compiled binary changes at all?
quasilyte commented on Dec 28, 2018
cmd/go does not have any diff. cmd/gofmt, however, does have some.
josharian commented on Dec 29, 2018
I have a handful of newobject changes, including this one, in my tree. I didn’t mail this one because it basically never triggers. Others include specializing newobject in various ways. I didn’t mail the rest because I think we should move newobject to SSA construction world first.
josharian commented on Dec 30, 2018
To be clear, I’m game to see the optimization go in. But I do think we should move that code before it gets too complex. I peeked at the other things I was playing with. One was a specialized newstring, which doesn’t need a typ arg. (Use SoleComponent for best effect.) Another was for newobject for SSA-able types containing no pointers. In that case you can allocate without zeroing and then zero on the caller side, in the hope that that zeroing will be optimized away in favor of later writes. Just in case you wanted to see either of those through. :) One minor complication is that newobject is treated as special throughout SSA world.
mvdan commented on Dec 30, 2018
I'm a bit confused. I would imagine that statements like sinkSlice = []int{} in the benchmark above would be very common.
Also, if this optimization basically never triggers, how come gofmt got a bit smaller?
quasilyte commented on Dec 30, 2018
It does trigger for empty slices as well as new(T) calls where the size of T is 0.
The latter can be the 0 frequency case, but maybe empty slices were not reduced to a newobject call previously?
offtopic
2019 is coming 🎄 :)
josharian commented on Mar 17, 2019
https://go-review.googlesource.com/c/go/+/167957 moves newobject handling to ssa conversion.