cmd/compile: avoiding zeroing new allocations when possible #24926
We already inline memmove for small constant size in generic.rules.

There are two problems with doing it with rules. (1) What we need to do is modify the OpStaticCall's Aux value and leave the overall set of values unchanged; that's hard to do with rules. The first half you can fake by having a condition that has a side-effect, but the second half is not currently possible in a clean way. Maybe we could have a magic "origv" RHS value or some such. (2) For the
Hmm. Another possible use for an Consider something like:

func f(b []byte) int {
	s := string(b[:4])
	return len(s) // or do something else non-escaping with s
}

This ends up compiling to This optimization is hard to do during walk, because
cc @randall77 for opinions in general about this

please also do this for

I'd like to see removal of zeroing if we know the object will be completely overwritten.

Re: ptr-containing objects, do you think #24928 would do the trick?

I'm not sure. I think it's probably faster to zero a whole object than to selectively zero fields. Unless there are very large sections which don't have pointers, but that seems rare, and detecting subsequent complete overwriting of such objects also seems hard.
related: golang#24926 Change-Id: I3015ccf2ad9b3fd7f5db7c31806b95aebeab61e1
Regarding the original topic, another approach is to generate different code in the front-end. See
Change https://golang.org/cl/367496 mentions this issue:
I mentioned some benchmark results in the CL comment. Posting the benchmarks themselves here.

package example

import (
	"testing"
)

// smallStruct is a pointer-free struct with every field set by its constructor.
type smallStruct struct {
	f0 int64
	f1 float64
	f2 int32
	f3 int32
}

// averageStruct is a larger pointer-free struct embedding several smallStructs.
type averageStruct struct {
	f0    uint64
	flags [8]bool
	small [4]smallStruct
}

//go:noinline
func newSmallStruct(f0 int64, f1 float64, f2, f3 int32) *smallStruct {
	return &smallStruct{
		f0: f0,
		f1: f1,
		f2: f2,
		f3: f3,
	}
}

//go:noinline
func newAverageStruct(v uint64, s0, s1, s2, s3 *smallStruct) *averageStruct {
	return &averageStruct{
		f0:    v,
		flags: [8]bool{true, true, true, true, true, true, true, true},
		small: [4]smallStruct{*s0, *s1, *s2, *s3},
	}
}

//go:noinline
func newInt(v int) *int {
	p := new(int)
	*p = v
	return p
}

//go:noinline
func newIntSlice1(v0 int) []int {
	return []int{v0}
}

//go:noinline
func newIntSlice4(v0, v1, v2, v3 int) []int {
	return []int{v0, v1, v2, v3}
}

//go:noinline
func newIntSlice8(v0, v1, v2, v3, v4, v5, v6, v7 int) []int {
	return []int{v0, v1, v2, v3, v4, v5, v6, v7}
}

//go:noinline
func newIntSlice32(v0, v1, v2, v3, v4, v5, v6, v7 int) []int {
	return []int{
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
	}
}

func BenchmarkNewInt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newInt(i)
	}
}

func BenchmarkNewIntSlice1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice1(i)
	}
}

func BenchmarkNewIntSlice4(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice4(i, 0, i, 0)
	}
}

func BenchmarkNewIntSlice8(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice8(i, 0, i, 0, i, 0, i, 0)
	}
}

func BenchmarkNewIntSlice32(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice32(i, 0, i, 0, i, 0, i, 0)
	}
}

func BenchmarkNewSmallStruct(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newSmallStruct(int64(i), 1.5, 1, 2)
	}
}

func BenchmarkNewAverageStruct(b *testing.B) {
	var s [4]smallStruct
	for i := 0; i < b.N; i++ {
		newAverageStruct(uint64(i), &s[0], &s[1], &s[2], &s[3])
	}
}
Consider:

The first line gets translated into x = newobject(type-of-int64), which calls mallocgc with a "needszero" argument of true. But it doesn't need zeroing: it has no pointers, and data gets written to the whole thing.

Same holds for:

and more interestingly:
We could detect such scenarios in the SSA backend and replace the call to newobject with a call to a (newly created) newobjectNoClr, which is identical to newobject except that it passes false to mallocgc for needszero.

Aside: The SSA backend already understands newobject a little. It removes the pointless zero assignment from:

although not from:
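A hedged sketch of the kind of contrast the aside appears to describe: a zero store issued directly after the allocation versus a zero store that follows another store. Both functions are assumptions for illustration, and whether a given compiler version removes either store is not claimed here; the generated code can be inspected with `go build -gcflags=-S`.

```go
package main

import "fmt"

// zeroRightAfterNew stores zero into memory that was just allocated.
//
//go:noinline
func zeroRightAfterNew() *[2]int {
	x := new([2]int)
	x[0] = 0 // redundant: the fresh allocation is already zeroed
	return x
}

// zeroAfterOtherStore stores zero only after an unrelated store.
//
//go:noinline
func zeroAfterOtherStore() *[2]int {
	x := new([2]int)
	x[0] = 1
	x[1] = 0 // also redundant, but no longer adjacent to the allocation
	return x
}

func main() {
	fmt.Println(*zeroRightAfterNew(), *zeroAfterOtherStore())
	// prints: [0 0] [1 0]
}
```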
Converting to newobjectNoClr would probably require a new SSA pass, in which we put values in store order, detect calls to newobject, and then check whether subsequent stores obviate the need for zeroing. And also at the same time eliminate unnecessary zeroing that the existing rewrite rules don't cover.

This new SSA pass might also someday grow to understand and rewrite e.g. calls to memmove and memequal with small constant sizes.

It is not obvious to me that this pass would pull its weight, compilation-time-wise. Needs experimentation. Filing an issue so that I don't forget about it. :)
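The core check such a pass would need can be sketched as a toy model: given the size of a fresh object and the stores known to execute before the object is read or escapes, decide whether the stores cover every byte. This is not compiler code. The `store` type and `needsZero` function are hypothetical, and a real pass would work on SSA values in store order and would also have to reason about control flow and aliasing.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical stand-in for a store into a freshly allocated
// object: it writes size bytes starting at byte offset off.
type store struct {
	off, size int64
}

// needsZero reports whether an object of objSize bytes must still be zeroed.
// Pointer-containing objects are conservatively always zeroed in this model.
func needsZero(objSize int64, hasPointers bool, stores []store) bool {
	if hasPointers {
		return true
	}
	// Sort a copy by offset so coverage can be checked in one sweep.
	sorted := append([]store(nil), stores...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].off < sorted[j].off })

	covered := int64(0) // bytes [0, covered) are known to be overwritten
	for _, s := range sorted {
		if s.off > covered {
			return true // a gap is left uninitialized by the stores
		}
		if end := s.off + s.size; end > covered {
			covered = end
		}
	}
	return covered < objSize
}

func main() {
	fmt.Println(needsZero(8, false, []store{{0, 8}}))          // false
	fmt.Println(needsZero(16, false, []store{{0, 8}}))         // true
	fmt.Println(needsZero(16, false, []store{{8, 8}, {0, 8}})) // false
	fmt.Println(needsZero(8, true, []store{{0, 8}}))           // true
}
```

When `needsZero` returns false, the proposal would swap the newobject call for newobjectNoClr; otherwise the call stays as it is.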