Description
A conversion that involves runtime.getitab
(convI2I
, assertI2I
, assertE2I
) is generally much slower than an ordinary
interface value construction from a statically known concrete type.
this is fast: *bytes.Buffer => io.Writer
this is not as fast: io.ReadWriter => io.Writer
The Go compiler has an IR-based devirtualization pass that performs a local method call devirtualization. It's not as silly as it might sound as it works well enough in some situations thanks to the inlining.
It does not, however, handle I2I-like operations. Even if we know that converted value has some non-interface type T
, we still do an expensive I2I operation.
This situation occurs a lot in a codebase that works with hierarchical data. AST is an example: we have ast.Node
and ast.Expr
. It's quite common to write a function that accepts ast.Node
while some other function can operate with ast.Expr
. In the *ast.Ident
-> ast.Expr
-> ast.Node
chain we can simplify the ast.Expr
-> ast.Node
conversion if we use the information that ast.Expr
is actually *ast.Ident
.
A real-world case can be found in the Go compiler code.
func (p *noder) structType(expr *syntax.StructType) ir.Node {
// ...
p.setlineno(expr) // setlineno is inlined
// ...
}
func (p *noder) setlineno(n syntax.Node) {
if n != nil {
base.Pos = p.pos(n) // convI2I: node->poser
}
}
// This functions can't be inlined.
func (m *posMap) pos(p poser) src.XPos { return m.makeXPos(p.Pos()) }
Another situation is when constructor returns an interface type, like hash.Hash
and then they're passed as io.Writer
. In many cases we can avoid convI2I
for md5.New()
-> io.Writer
case (it works for most hash/crypto related constructors).
Here is a simple benchmark that illustrates the performance problem with I2I:
package benchtest
import (
"bytes"
"io"
"testing"
)
//go:noinline
func sinkWriter(io.Writer) {}
func BenchmarkDevirt(b *testing.B) {
buf := &bytes.Buffer{}
var rw io.ReadWriter = buf
for i := 0; i < b.N; i++ {
sinkWriter(rw) // convI2I
}
}
If we rewrite convI2I
like this (pseudo-code):
convI2I(rw, "io.Writer") -> convI2I(rw.(*bytes.Buffer), "io.Writer")
Then we get these results:
name old time/op new time/op delta
Devirt-8 11.1ns ± 1% 2.4ns ± 1% -78.55% (p=0.000 n=10+10)
The same idea applies to the type assertions that involve interface-to-interface conversion.
The optimized code is also usually smaller from the machine code point of view.
.text
segment size differences:
go tool: -224 bytes
cmd/asm: -192 bytes
Total binary size differences:
go tool: -543 bytes
cmd/asm: -310 bytes
In general, this is not a binary size optimization as convI2I is quite rare on its own.
There are even fewer cases that we can optimize at the compile time.
I provided these numbers just to be sure that we cover the binary size impact as well.
Note: this does not solve all convI2I issues, but it can at least reduce the amount of convI2I we see in our CPU profiles.
I'll send a CL that provides my first attempt at this optimization. If CL is not good enough, we can at least have this issue that has some sweet numbers to think about.
Implementation notes
Changing devirtualize.go
from ir.VisitList
to ir.EditChildren
makes it measurably slower.
This is why a slightly less simple approach is used when we keep ir.VisitList
, but handle some nodes via their parents. It covers less code, but in practice the optimization coverage should be OK. Suggestions are welcome: it could be the case that we can introduce these optimizations to some other part of the compiler.
This approach runs with almost identical speed, compilebench shows no significant diff this time:
name old time/op new time/op delta
Template 411ms ± 2% 410ms ± 1% ~ (p=1.000 n=5+5)
Unicode 212ms ± 4% 210ms ± 3% ~ (p=0.310 n=5+5)
GoTypes 2.18s ± 1% 2.20s ± 7% ~ (p=0.841 n=5+5)
Compiler 166ms ± 2% 160ms ± 4% -3.90% (p=0.008 n=5+5)
SSA 16.9s ± 2% 17.1s ± 0% ~ (p=0.151 n=5+5)
Flate 263ms ± 2% 261ms ± 3% ~ (p=0.548 n=5+5)
GoParser 467ms ± 1% 461ms ± 2% ~ (p=0.095 n=5+5)
Tar 351ms ± 2% 352ms ± 2% ~ (p=1.000 n=5+5)
XML 459ms ± 1% 462ms ± 1% ~ (p=0.151 n=5+5)
LinkCompiler 783ms ± 2% 778ms ± 2% ~ (p=0.730 n=5+4)
ExternalLinkCompiler 2.56s ± 0% 2.58s ± 2% ~ (p=0.413 n=4+5)
LinkWithoutDebugCompiler 480ms ± 2% 477ms ± 3% ~ (p=0.690 n=5+5)
[Geo mean] 683ms 680ms -0.40%
name old user-time/op new user-time/op delta
Template 532ms ± 2% 514ms ± 8% ~ (p=0.151 n=5+5)
Unicode 275ms ±12% 280ms ± 9% ~ (p=0.841 n=5+5)
GoTypes 2.84s ± 1% 2.85s ± 5% ~ (p=0.841 n=5+5)
Compiler 211ms ± 6% 212ms ± 6% ~ (p=0.841 n=5+5)
SSA 23.5s ± 1% 23.9s ± 1% ~ (p=0.095 n=5+5)
Flate 328ms ± 5% 333ms ± 5% ~ (p=0.548 n=5+5)
GoParser 571ms ± 0% 567ms ± 1% ~ (p=0.222 n=5+5)
Tar 442ms ± 6% 433ms ± 4% ~ (p=0.548 n=5+5)
XML 582ms ± 3% 589ms ± 3% ~ (p=0.421 n=5+5)
LinkCompiler 1.34s ± 4% 1.36s ±10% ~ (p=0.690 n=5+5)
ExternalLinkCompiler 2.91s ± 3% 2.91s ± 1% ~ (p=1.000 n=5+5)
LinkWithoutDebugCompiler 614ms ± 3% 615ms ± 3% ~ (p=1.000 n=5+5)
[Geo mean] 887ms 889ms +0.19%
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
Activity
gopherbot commentedon Feb 10, 2022
Change https://go.dev/cl/384794 mentions this issue:
cmd/compile: optimize iface-to-iface conversions
mvdan commentedon Feb 10, 2022
Just want to drop by and thank you for writing this issue with such detail and in a way that most Go developers can understand :)
ianlancetaylor commentedon Feb 10, 2022
CC @randall77
nightlyone commentedon Mar 24, 2023
https://go.dev/cl/384794 just got abandoned after its first birthday.
So there is an opportunity for fresh minds to pick up that idea and re-evaluate its usefulness.