cmd/compile: optimize convI2I, assertI2I, assertE2I for statically known types

A conversion that involves `runtime.getitab` (`convI2I`, `assertI2I`, `assertE2I`) is generally much slower than an ordinary
interface value construction from a statically known concrete type.

```
this is fast:        *bytes.Buffer => io.Writer
this is not as fast: io.ReadWriter => io.Writer
```

The Go compiler has an IR-based devirtualization pass that performs a local method call devirtualization. It's not as silly as it might sound as it works well enough in some situations thanks to the inlining.

It does not, however, handle I2I-like operations. Even if we know that converted value has some non-interface type `T`, we still do an expensive I2I operation.

This situation occurs a lot in a codebase that works with hierarchical data. AST is an example: we have `ast.Node` and `ast.Expr`. It's quite common to write a function that accepts `ast.Node` while some other function can operate with `ast.Expr`. In the `*ast.Ident` -> `ast.Expr` -> `ast.Node` chain we can simplify the `ast.Expr` -> `ast.Node` conversion if we use the information that `ast.Expr` is actually `*ast.Ident`.

A real-world case can be found in the Go compiler code.

```go
func (p *noder) structType(expr *syntax.StructType) ir.Node {
	// ...
	p.setlineno(expr) // setlineno is inlined
	// ...
}

func (p *noder) setlineno(n syntax.Node) {
	if n != nil {
		base.Pos = p.pos(n) // convI2I: node->poser
	}
}

// This functions can't be inlined.
func (m *posMap) pos(p poser) src.XPos { return m.makeXPos(p.Pos()) }
```

Another situation is when constructor returns an interface type, like `hash.Hash` and then they're passed as `io.Writer`. In many cases we can avoid `convI2I` for `md5.New()` -> `io.Writer` case (it works for most hash/crypto related constructors).

Here is a simple benchmark that illustrates the performance problem with I2I:

```go
package benchtest

import (
	"bytes"
	"io"
	"testing"
)

//go:noinline
func sinkWriter(io.Writer) {}

func BenchmarkDevirt(b *testing.B) {
	buf := &bytes.Buffer{}
	var rw io.ReadWriter = buf
	for i := 0; i < b.N; i++ {
		sinkWriter(rw) // convI2I
	}
}
```

If we rewrite `convI2I` like this (pseudo-code):

```
convI2I(rw, "io.Writer") -> convI2I(rw.(*bytes.Buffer), "io.Writer")
```

Then we get these results:

```
name      old time/op  new time/op  delta
Devirt-8  11.1ns ± 1%   2.4ns ± 1%  -78.55%  (p=0.000 n=10+10)
```

The same idea applies to the type assertions that involve interface-to-interface conversion.

The optimized code is also usually smaller from the machine code point of view.

`.text` segment size differences:

```
go tool: -224 bytes
cmd/asm: -192 bytes
```

Total binary size differences:

```
go tool: -543 bytes
cmd/asm: -310 bytes
```

> In general, this is not a binary size optimization as convI2I is quite rare on its own.
> There are even fewer cases that we can optimize at the compile time.
> I provided these numbers just to be sure that we cover the binary size impact as well.

Note: this does not solve all convI2I issues, but it can at least reduce the amount of convI2I we see in our CPU profiles.

I'll send a CL that provides my first attempt at this optimization. If CL is not good enough, we can at least have this issue that has some sweet numbers to think about.

## Implementation notes

Changing `devirtualize.go` from `ir.VisitList` to `ir.EditChildren` makes it measurably slower.

This is why a slightly less simple approach is used when we keep `ir.VisitList`, but handle some nodes via their parents. It covers less code, but in practice the optimization coverage should be OK. Suggestions are welcome: it could be the case that we can introduce these optimizations to some other part of the compiler.

This approach runs with almost identical speed, compilebench shows no significant diff this time:

```
name                      old time/op       new time/op       delta
Template                        411ms ± 2%        410ms ± 1%    ~     (p=1.000 n=5+5)
Unicode                         212ms ± 4%        210ms ± 3%    ~     (p=0.310 n=5+5)
GoTypes                         2.18s ± 1%        2.20s ± 7%    ~     (p=0.841 n=5+5)
Compiler                        166ms ± 2%        160ms ± 4%  -3.90%  (p=0.008 n=5+5)
SSA                             16.9s ± 2%        17.1s ± 0%    ~     (p=0.151 n=5+5)
Flate                           263ms ± 2%        261ms ± 3%    ~     (p=0.548 n=5+5)
GoParser                        467ms ± 1%        461ms ± 2%    ~     (p=0.095 n=5+5)
Tar                             351ms ± 2%        352ms ± 2%    ~     (p=1.000 n=5+5)
XML                             459ms ± 1%        462ms ± 1%    ~     (p=0.151 n=5+5)
LinkCompiler                    783ms ± 2%        778ms ± 2%    ~     (p=0.730 n=5+4)
ExternalLinkCompiler            2.56s ± 0%        2.58s ± 2%    ~     (p=0.413 n=4+5)
LinkWithoutDebugCompiler        480ms ± 2%        477ms ± 3%    ~     (p=0.690 n=5+5)
[Geo mean]                      683ms             680ms       -0.40%

name                      old user-time/op  new user-time/op  delta
Template                        532ms ± 2%        514ms ± 8%    ~     (p=0.151 n=5+5)
Unicode                         275ms ±12%        280ms ± 9%    ~     (p=0.841 n=5+5)
GoTypes                         2.84s ± 1%        2.85s ± 5%    ~     (p=0.841 n=5+5)
Compiler                        211ms ± 6%        212ms ± 6%    ~     (p=0.841 n=5+5)
SSA                             23.5s ± 1%        23.9s ± 1%    ~     (p=0.095 n=5+5)
Flate                           328ms ± 5%        333ms ± 5%    ~     (p=0.548 n=5+5)
GoParser                        571ms ± 0%        567ms ± 1%    ~     (p=0.222 n=5+5)
Tar                             442ms ± 6%        433ms ± 4%    ~     (p=0.548 n=5+5)
XML                             582ms ± 3%        589ms ± 3%    ~     (p=0.421 n=5+5)
LinkCompiler                    1.34s ± 4%        1.36s ±10%    ~     (p=0.690 n=5+5)
ExternalLinkCompiler            2.91s ± 3%        2.91s ± 1%    ~     (p=1.000 n=5+5)
LinkWithoutDebugCompiler        614ms ± 3%        615ms ± 3%    ~     (p=1.000 n=5+5)
[Geo mean]                      887ms             889ms       +0.19%
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cmd/compile: optimize convI2I, assertI2I, assertE2I for statically known types #51133

Implementation notes

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

cmd/compile: optimize convI2I, assertI2I, assertE2I for statically known types #51133

Description

Implementation notes

Activity

gopherbot commented on Feb 10, 2022

mvdan commented on Feb 10, 2022

ianlancetaylor commented on Feb 10, 2022

nightlyone commented on Mar 24, 2023

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions