Description
Consider:
func f() *int {
	x := new(int)
	*x = 1
	return x
}
The first line of the function body gets translated into x = newobject(type-of-int64), which calls mallocgc with a "needzero" argument of true. But it doesn't need zeroing: it has no pointers, and data gets written to the whole thing.
Same holds for:
func f() *[2]int {
	x := new([2]int)
	x[0] = 1
	x[1] = 2
	return x
}
and more interestingly:
func f() *[1024]int {
	x := new([1024]int)
	for i := range x {
		x[i] = i
	}
	return x
}
We could detect such scenarios in the SSA backend and replace the call to newobject with a call to a (newly created) newobjectNoClr, which is identical to newobject except that it passes false to mallocgc for needzero.
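To make the proposal concrete, here is a toy model of the two entry points. It is only a sketch: mallocToy, newobjectToy and newobjectNoClrToy are hypothetical stand-ins, not the runtime's real functions, and the "heap" is a single reused slice that may hold stale data. The only thing it is meant to show is that skipping the clear is safe exactly when the caller overwrites every element.

```go
package main

import "fmt"

// arena is a toy stand-in for heap memory that may contain stale data
// left over from an earlier allocation.
var arena = make([]int, 4)

// mallocToy mimics only the needzero parameter of mallocgc (hypothetical
// model): it returns the same backing memory each time and clears it
// only when asked to.
func mallocToy(n int, needzero bool) []int {
	mem := arena[:n]
	if needzero {
		for i := range mem {
			mem[i] = 0
		}
	}
	return mem
}

// newobjectToy models today's behaviour: always request zeroed memory.
func newobjectToy(n int) []int { return mallocToy(n, true) }

// newobjectNoClrToy models the proposed variant: skip the zeroing.
func newobjectNoClrToy(n int) []int { return mallocToy(n, false) }

// fNoClr corresponds to the [2]int example: both elements are written,
// so the stale contents can never be observed by the caller.
func fNoClr() []int {
	x := newobjectNoClrToy(2)
	x[0] = 1
	x[1] = 2
	return x
}

func main() {
	for i := range arena {
		arena[i] = 99 // simulate dirty memory
	}
	fmt.Println(fNoClr()) // prints [1 2]
}
```

If fNoClr wrote only x[0], the caller would see the stale 99 in x[1], which is why the compiler would have to prove full coverage before making the switch.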
Aside: The SSA backend already understands newobject a little. It removes the pointless zero assignment from:
func f() *[2]int {
	x := new([2]int)
	x[0] = 0 // removed
	return x
}
although not from:
func f() *[2]int {
	x := new([2]int)
	x[0] = 1
	x[1] = 0 // not removed, but could be
	return x
}
Converting to newobjectNoClr would probably require a new SSA pass, in which we put values in store order, detect calls to newobject, and then check whether subsequent stores obviate the need for zeroing. The same pass could also eliminate unnecessary zero stores that the existing rewrite rules don't cover.
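The core check such a pass would need can be sketched outside the compiler. The code below is a simplified model, not SSA code: the store type and both functions are hypothetical, stores are assumed to be in bounds and already in program order, and control flow, loops, and intervening reads are ignored entirely.

```go
package main

import "fmt"

// store is a hypothetical record of one write to a freshly allocated object.
type store struct {
	off, size int  // byte offset and width of the write
	zero      bool // true if the stored value is a constant zero
}

// needsZero reports whether some byte of the object is never written,
// in which case the allocation must still be zeroed.
func needsZero(size int, stores []store) bool {
	written := make([]bool, size)
	for _, s := range stores {
		for b := s.off; b < s.off+s.size; b++ {
			written[b] = true
		}
	}
	for _, w := range written {
		if !w {
			return true
		}
	}
	return false
}

// redundantZeroStores assumes the allocation IS zeroed and returns the
// indices of zero stores that only touch bytes still known to be zero.
func redundantZeroStores(size int, stores []store) []int {
	dirty := make([]bool, size) // false means "still known to be zero"
	var redundant []int
	for i, s := range stores {
		anyDirty := false
		for b := s.off; b < s.off+s.size; b++ {
			if dirty[b] {
				anyDirty = true
			}
			dirty[b] = !s.zero
		}
		if s.zero && !anyDirty {
			redundant = append(redundant, i)
		}
	}
	return redundant
}

func main() {
	// x[0] = 1; x[1] = 0 on a [2]int with 8-byte elements.
	stores := []store{{0, 8, false}, {8, 8, true}}
	fmt.Println(needsZero(16, stores))           // prints false
	fmt.Println(redundantZeroStores(16, stores)) // prints [1]
}
```

For that last example the two answers are alternatives rather than things to apply together: either skip the zeroing and keep the x[1] = 0 store, or keep the zeroing and drop the store.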
This new SSA pass might also someday grow to understand and rewrite e.g. calls to memmove and memequal with small constant sizes.
It is not obvious to me that this pass would pull its weight, compilation-time-wise. Needs experimentation. Filing an issue so that I don't forget about it. :)