Commit 79cd168

[dev.typeparams] cmd/compile: unified IR construction
This CL adds a new unified IR construction mode to the frontend. It's purely additive, and all files include "UNREVIEWED" at the top, like how types2 was initially imported. The next CL adds a -d=unified flag to actually enable unified IR mode.

See below for more details, but some highlights:

1. It adds ~6kloc (excluding enum listings and stringer output), but I estimate it will allow removing ~14kloc (see CL 324670, including its commit message);

2. When enabled by default, it passes more tests than -G=3 does (see CL 325213 and CL 324673);

3. Without requiring any new code, it supports inlining of more code than the current inliner (see CL 324574; contrast CL 283112 and CL 266203, which added support for inlining function literals and type switches, respectively);

4. Aside from dictionaries (which I intend to add still), its support for generics is more complete (e.g., it fully supports local types, including local generic types within generic functions and instantiating generic types with local types; see test/typeparam/nested.go);

5. It supports lazy loading of types and objects for types2 type checking;

6. It supports re-exporting of types, objects, and inline bodies without needing to parse them into IR;

7. The new export data format has extensive support for debugging with "sync" markers, so mistakes during development are easier to catch;

8. When compiling with -d=inlfuncswithclosures=0, it enables "quirks mode" where it generates output that passes toolstash -cmp.

--

The new unified IR pipeline combines noding, stenciling, inlining, and import/export into a single, shared code path. Previously, IR trees went through multiple phases of copying during compilation:

1. "Noding": the syntax AST is copied into the initial IR form. To support generics, there's now also "irgen", which implements the same idea, but takes advantage of types2 type-checking results to more directly construct IR.

2. "Stenciling": generic IR forms are copied into instantiated IR forms, substituting type parameters as appropriate.

3. "Inlining": the inliner made backup copies of inlinable functions, and then copied them again when inlining into a call site, with some modifications (e.g., updating position information, rewriting variable references, changing "return" statements into "goto").

4. "Importing/exporting": the exporter wrote out the IR as saved by the inliner, and then the importer read it back to be used by the inliner again. Normal functions are imported/exported "desugared", while generic functions are imported/exported in source form.

These passes are all conceptually the same thing: make a copy of a function body, maybe with some minor changes/substitutions. However, they're all completely separate implementations that frequently run into the same issues because IR has many nuanced corner cases.

For example, inlining currently doesn't support local defined types, "range" loops, or labeled "for"/"switch" statements, because these require special handling around Sym references. We've recently extended the inliner to support new features like inlining type switches and function literals, and they've had issues.

The exporter only knows how to export from IR form, so when re-exporting inlinable functions (e.g., methods on imported types that are exposed via exported APIs), these functions may need to be imported as IR for the sole purpose of being immediately exported back out again.

By unifying all of these modes of copying into a single code path that cleanly separates concerns, we eliminate many of these possible issues. Some recent examples:

1. Issues #45743 and #46472 were issues where type switches were mishandled by inlining and stenciling, respectively; but neither of these affected unified IR, because it constructs type switches using the exact same code as for normal functions.

2. CL 325409 fixes an issue in stenciling with implicit conversion of values of type-parameter type to variables of interface type, but this issue did not affect unified IR.

Change-Id: I5a05991fe16d68bb0f712503e034cb9f2d19e296
Reviewed-on: https://go-review.googlesource.com/c/go/+/324573
Trust: Matthew Dempsky <[email protected]>
Trust: Robert Griesemer <[email protected]>
Run-TryBot: Matthew Dempsky <[email protected]>
TryBot-Result: Go Bot <[email protected]>
Reviewed-by: Robert Griesemer <[email protected]>
1 parent ea438bd commit 79cd168

12 files changed, +6164 -0 lines changed

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
// UNREVIEWED

// Copyright 2021 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package noder

type code interface {
	marker() syncMarker
	value() int
}

type codeVal int

func (c codeVal) marker() syncMarker { return syncVal }
func (c codeVal) value() int         { return int(c) }

const (
	valBool codeVal = iota
	valString
	valInt64
	valBigInt
	valBigRat
	valBigFloat
)

type codeType int

func (c codeType) marker() syncMarker { return syncType }
func (c codeType) value() int         { return int(c) }

const (
	typeBasic codeType = iota
	typeNamed
	typePointer
	typeSlice
	typeArray
	typeChan
	typeMap
	typeSignature
	typeStruct
	typeInterface
	typeUnion
	typeTypeParam
)

type codeObj int

func (c codeObj) marker() syncMarker { return syncCodeObj }
func (c codeObj) value() int         { return int(c) }

const (
	objAlias codeObj = iota
	objConst
	objType
	objFunc
	objVar
	objStub
)

type codeStmt int

func (c codeStmt) marker() syncMarker { return syncStmt1 }
func (c codeStmt) value() int         { return int(c) }

const (
	stmtEnd codeStmt = iota
	stmtLabel
	stmtBlock
	stmtExpr
	stmtSend
	stmtAssign
	stmtAssignOp
	stmtIncDec
	stmtBranch
	stmtCall
	stmtReturn
	stmtIf
	stmtFor
	stmtSwitch
	stmtSelect

	// TODO(mdempsky): Remove after we don't care about toolstash -cmp.
	stmtTypeDeclHack
)

type codeExpr int

func (c codeExpr) marker() syncMarker { return syncExpr }
func (c codeExpr) value() int         { return int(c) }

// TODO(mdempsky): Split expr into addr, for lvalues.
const (
	exprNone codeExpr = iota
	exprConst
	exprType  // type expression
	exprLocal // local variable
	exprName  // global variable or function
	exprBlank
	exprCompLit
	exprFuncLit
	exprSelector
	exprIndex
	exprSlice
	exprAssert
	exprUnaryOp
	exprBinaryOp
	exprCall

	// TODO(mdempsky): Handle in switchStmt directly instead.
	exprTypeSwitchGuard
)

type codeDecl int

func (c codeDecl) marker() syncMarker { return syncDecl }
func (c codeDecl) value() int         { return int(c) }

const (
	declEnd codeDecl = iota
	declFunc
	declMethod
	declVar
	declOther
)
Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
// UNREVIEWED

// Copyright 2021 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package noder

import (
	"encoding/binary"
	"fmt"
	"go/constant"
	"go/token"
	"math/big"
	"os"
	"strings"

	"cmd/compile/internal/base"
)

type pkgDecoder struct {
	pkgPath string

	elemEndsEnds [numRelocs]uint32
	elemEnds     []uint32
	elemData     string
}

func newPkgDecoder(pkgPath, input string) pkgDecoder {
	pr := pkgDecoder{
		pkgPath: pkgPath,
	}

	// TODO(mdempsky): Implement direct indexing of input string to
	// avoid copying the position information.

	r := strings.NewReader(input)

	assert(binary.Read(r, binary.LittleEndian, pr.elemEndsEnds[:]) == nil)

	pr.elemEnds = make([]uint32, pr.elemEndsEnds[len(pr.elemEndsEnds)-1])
	assert(binary.Read(r, binary.LittleEndian, pr.elemEnds[:]) == nil)

	pos, err := r.Seek(0, os.SEEK_CUR)
	assert(err == nil)

	pr.elemData = input[pos:]
	assert(len(pr.elemData) == int(pr.elemEnds[len(pr.elemEnds)-1]))

	return pr
}

func (pr *pkgDecoder) numElems(k reloc) int {
	count := int(pr.elemEndsEnds[k])
	if k > 0 {
		count -= int(pr.elemEndsEnds[k-1])
	}
	return count
}

func (pr *pkgDecoder) totalElems() int {
	return len(pr.elemEnds)
}

func (pr *pkgDecoder) absIdx(k reloc, idx int) int {
	absIdx := idx
	if k > 0 {
		absIdx += int(pr.elemEndsEnds[k-1])
	}
	if absIdx >= int(pr.elemEndsEnds[k]) {
		base.Fatalf("%v:%v is out of bounds; %v", k, idx, pr.elemEndsEnds)
	}
	return absIdx
}

func (pr *pkgDecoder) dataIdx(k reloc, idx int) string {
	absIdx := pr.absIdx(k, idx)

	var start uint32
	if absIdx > 0 {
		start = pr.elemEnds[absIdx-1]
	}
	end := pr.elemEnds[absIdx]

	return pr.elemData[start:end]
}

func (pr *pkgDecoder) stringIdx(idx int) string {
	return pr.dataIdx(relocString, idx)
}

func (pr *pkgDecoder) newDecoder(k reloc, idx int, marker syncMarker) decoder {
	r := pr.newDecoderRaw(k, idx)
	r.sync(marker)
	return r
}

func (pr *pkgDecoder) newDecoderRaw(k reloc, idx int) decoder {
	r := decoder{
		common: pr,
		k:      k,
		idx:    idx,
	}

	// TODO(mdempsky) r.data.Reset(...) after #44505 is resolved.
	r.data = *strings.NewReader(pr.dataIdx(k, idx))

	r.sync(syncRelocs)
	r.relocs = make([]relocEnt, r.len())
	for i := range r.relocs {
		r.sync(syncReloc)
		r.relocs[i] = relocEnt{reloc(r.len()), r.len()}
	}

	return r
}

type decoder struct {
	common *pkgDecoder

	relocs []relocEnt
	data   strings.Reader

	k   reloc
	idx int
}

func (r *decoder) checkErr(err error) {
	if err != nil {
		base.Fatalf("unexpected error: %v", err)
	}
}

func (r *decoder) sync(m syncMarker) {
	if debug {
		pos, err0 := r.data.Seek(0, os.SEEK_CUR)
		x, err := r.data.ReadByte()
		r.checkErr(err)
		if x != byte(m) {
			// TODO(mdempsky): Revisit this error message, and make it more
			// useful (e.g., include r.p.pkgPath).
			base.Fatalf("data sync error: found %v at %v (%v) in (%v:%v), but expected %v", syncMarker(x), pos, err0, r.k, r.idx, m)
		}
	}
}

func (r *decoder) bool() bool {
	r.sync(syncBool)
	x, err := r.data.ReadByte()
	r.checkErr(err)
	assert(x < 2)
	return x != 0
}

func (r *decoder) int64() int64 {
	r.sync(syncInt64)
	x, err := binary.ReadVarint(&r.data)
	r.checkErr(err)
	return x
}

func (r *decoder) uint64() uint64 {
	r.sync(syncUint64)
	x, err := binary.ReadUvarint(&r.data)
	r.checkErr(err)
	return x
}

func (r *decoder) len() int   { x := r.uint64(); v := int(x); assert(uint64(v) == x); return v }
func (r *decoder) int() int   { x := r.int64(); v := int(x); assert(int64(v) == x); return v }
func (r *decoder) uint() uint { x := r.uint64(); v := uint(x); assert(uint64(v) == x); return v }

func (r *decoder) code(mark syncMarker) int {
	r.sync(mark)
	return r.len()
}

func (r *decoder) reloc(k reloc) int {
	r.sync(syncUseReloc)
	idx := r.len()

	e := r.relocs[idx]
	assert(e.kind == k)
	return e.idx
}

func (r *decoder) string() string {
	r.sync(syncString)
	return r.common.stringIdx(r.reloc(relocString))
}

func (r *decoder) strings() []string {
	res := make([]string, r.len())
	for i := range res {
		res[i] = r.string()
	}
	return res
}

func (r *decoder) rawValue() constant.Value {
	isComplex := r.bool()
	val := r.scalar()
	if isComplex {
		val = constant.BinaryOp(val, token.ADD, constant.MakeImag(r.scalar()))
	}
	return val
}

func (r *decoder) scalar() constant.Value {
	switch tag := codeVal(r.code(syncVal)); tag {
	default:
		panic(fmt.Sprintf("unexpected scalar tag: %v", tag))

	case valBool:
		return constant.MakeBool(r.bool())
	case valString:
		return constant.MakeString(r.string())
	case valInt64:
		return constant.MakeInt64(r.int64())
	case valBigInt:
		return constant.Make(r.bigInt())
	case valBigRat:
		num := r.bigInt()
		denom := r.bigInt()
		return constant.Make(new(big.Rat).SetFrac(num, denom))
	case valBigFloat:
		return constant.Make(r.bigFloat())
	}
}

func (r *decoder) bigInt() *big.Int {
	v := new(big.Int).SetBytes([]byte(r.string()))
	if r.bool() {
		v.Neg(v)
	}
	return v
}

func (r *decoder) bigFloat() *big.Float {
	v := new(big.Float).SetPrec(512)
	assert(v.UnmarshalText([]byte(r.string())) == nil)
	return v
}
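
The two-level index used by `pkgDecoder` above is easier to follow on a tiny input. In the sketch below, `elemData` is the concatenation of all elements, `elemEnds[i]` is the end offset of element `i` within `elemData`, and `elemEndsEnds[k]` is the number of elements in sections `0..k` combined, so a (section, index) pair resolves to a substring with two lookups. The `index` type, the `build` helper, the `relocBody` section, and the error returns are assumptions for illustration; the real decoder reads the tables from the binary header and calls `base.Fatalf` on a bad index.

```go
package main

import (
	"fmt"
	"strings"
)

type reloc int

const (
	relocString reloc = iota
	relocBody         // hypothetical second section, for illustration only
	numRelocs
)

// index mirrors the three fields pkgDecoder uses for lookups.
type index struct {
	elemEndsEnds [numRelocs]uint32
	elemEnds     []uint32
	elemData     string
}

// build is a hypothetical helper that lays out sections back to back.
func build(sections [numRelocs][]string) index {
	var ix index
	var sb strings.Builder
	for k, elems := range sections {
		for _, e := range elems {
			sb.WriteString(e)
			ix.elemEnds = append(ix.elemEnds, uint32(sb.Len()))
		}
		ix.elemEndsEnds[k] = uint32(len(ix.elemEnds))
	}
	ix.elemData = sb.String()
	return ix
}

// numElems reports how many elements section k holds.
func (ix *index) numElems(k reloc) int {
	count := int(ix.elemEndsEnds[k])
	if k > 0 {
		count -= int(ix.elemEndsEnds[k-1])
	}
	return count
}

// absIdx converts a section-relative index to a global element index.
func (ix *index) absIdx(k reloc, idx int) (int, error) {
	abs := idx
	if k > 0 {
		abs += int(ix.elemEndsEnds[k-1])
	}
	if idx < 0 || abs >= int(ix.elemEndsEnds[k]) {
		return 0, fmt.Errorf("%d:%d is out of bounds; %v", k, idx, ix.elemEndsEnds)
	}
	return abs, nil
}

// dataIdx returns the raw bytes of element idx in section k.
func (ix *index) dataIdx(k reloc, idx int) (string, error) {
	abs, err := ix.absIdx(k, idx)
	if err != nil {
		return "", err
	}
	var start uint32
	if abs > 0 {
		start = ix.elemEnds[abs-1]
	}
	return ix.elemData[start:ix.elemEnds[abs]], nil
}

func main() {
	ix := build([numRelocs][]string{
		relocString: {"go", "ir"},
		relocBody:   {"abc"},
	})
	fmt.Println(ix.elemEndsEnds, ix.elemEnds, ix.elemData)

	s, err := ix.dataIdx(relocString, 1)
	fmt.Println(s, err)
	b, err := ix.dataIdx(relocBody, 0)
	fmt.Println(b, err)
	_, err = ix.dataIdx(relocString, 2)
	fmt.Println(err)
}
```

Storing only end offsets keeps each table to one integer per entry, since the start of an element is the end of its predecessor (or zero for the first).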
