@@ -4,188 +4,101 @@ Welcome to the sources of the dart2js compiler!
4
4
5
5
## Architecture
6
6
7
- The compiler is currently undergoing a long refactoring process. As you navigate
8
- this code you may find it helpful to understand how the compiler used to be,
9
- where it is going, and where it is today.
10
-
11
- ### The near future architecture
12
-
13
- The compiler will operate in these general phases:
14
-
15
- 1. **load kernel**: Load all the code as kernel
16
- * Collect dart sources transtively
17
- * Convert to kernel AST
18
-
19
- (this will be handled by invoking the front-end package)
20
-
21
- Alternatively, the compiler can start compilation directly from kernel files.
22
-
23
- 2. **model**: Create a Dart model of the program
24
- * The kernel ASTs could be used as a model, so this might be a no-op or just
25
- creating a thin wrapper on top of kernel.
26
-
27
- 3. **tree-shake and create world**: Build world of reachable code
28
- * For each reachable piece of code:
29
- * Compute impact (i1) from kernel AST
30
- * Build a closed world (w1)
31
-
32
- 4. **analyze**: Run a global analysis
33
- * Assume closed world semantics (from w1)
34
- * Produce a global result (g)
35
- * Like today (g) will contain type and nullability information
36
- * After we adopt strong-mode types, we want to explore simplifying this
37
- to only contain native + nullability information.
38
-
39
- 5. **codegen model**: Create a JS model of the program
40
- * Model JavaScript specific concepts (like the split of constructor bodies
41
- as separate elements) and provide a mapping to the Dart model
42
-
43
- 6. **codegen and tree-shake**: Generate code, as needed
44
- * For each reachable piece of code:
45
- * build ssa graph from kernel ASTs and global results (g)
46
- * optimize ssa
7
+ The compiler is structured to operate in several phases. By default these phases
8
+ are executed in sequence in a single process, but on some build systems, some of
9
+ these phases are split into separate processes. As a result, the code has plenty of
10
+ indirection and data representations used mostly for the purpose of serializing
11
+ intermediate results during compilation.
12
+
13
+ The current compiler phases are:
14
+
15
+ 1. **common front-end**: Execute traditional front-end compilation phases.
16
+ Dart2js delegates to the common front-end (also used by DDC and the VM) to
17
+ do all front-end work. This includes:
18
+ * parsing Dart source code,
19
+ * type checking,
20
+ * inferring implicit user types, like locals with a `var` declaration,
21
+ * lowering or simplifying Dart features. For example, this is how many
22
+ syntactic features, like extension methods and list comprehensions, are
23
+ implemented.
24
+ * additional web-specific lowering or simplifications. For example,
25
+ expansion of JS-interop features and web-specific implementation of
26
+ language features like late variables.
27
+
28
+ The result of this phase is a kernel AST which is serialized as a `.dill`
29
+ file.
30
+
31
+ 2. **modular analysis**: Using kernel as input, compute data recording
32
+ properties about each method in the program, especially around dependencies
33
+ and features they may need. We call this "impact data" (i1).
34
+
35
+ When the compiler runs as a single process, this is done lazily/on-demand
36
+ during the tree-shaking phase (below). However, this data can also be
37
+ computed independently for individual methods, files, or packages in the
38
+ application. That makes it possible to run this modularly and in parallel.
39
+
40
+ The result of this phase can be emitted as files containing impact data in
41
+ a serialized format.
42
+
43
+ 3. **tree-shake and create world**: Create a model to understand what parts of
44
+ the code are used by an application. This consists of:
45
+ * creating an intermediate representation called the "K model" that
46
+ wraps our kernel representation
47
+ * calculating which classes and methods are considered live in the
48
+ program. This is done by incrementally combining impact data (i1)
49
+ starting from ` main ` , then visiting reachable methods in the program
50
+ with a Rapid Type Analysis (RTA) algorithm to aggregate impacts
51
+ together.
52
+
53
+ The result of this phase is what we call a "closed world" (w1). The closed
54
+ world is also a data structure that can answer interesting queries, such as:
55
+ Is this interface implemented by a single class? Is this method available
56
+ in any subtype of some interface? The answers to these questions can help
57
+ the compiler generate higher quality JavaScript.
58
+
59
+ 4. **global analysis**: Run a global analysis that assumes closed world
60
+ semantics (from w1) and propagates information across method boundaries to
61
+ further understand what values flow through the program. This phase is
62
+ very valuable in narrowing down possibilities that are ambiguous based
63
+ solely on type information written by developers. It often finds
64
+ opportunities that enable the compiler to devirtualize or inline method
65
+ calls, generate code specializations, or trigger performance optimizations.
66
+
67
+ The result of this phase is a "global result" (g).
68
+
69
+ 5. **codegen model**: Create a JS or backend model of the program. This is an
70
+ intermediate representation of the entities in the program that we refer to
71
+ as the "J model". It is very similar to the "K model", but it is tailored
72
+ to model JavaScript-specific concepts (like the split of constructor bodies
73
+ as separate elements) and provide a mapping to the Dart model.
74
+
75
+ 6. **codegen**: Generate code for each method that is deemed necessary. This
76
+ includes:
77
+ * build an SSA graph from kernel ASTs and global results (g)
78
+ * optimize the SSA representation
47
79
* compute impact (i2) from optimized code
48
80
* emit JS ASTs for the code
49
- * Build a codegen closed world (w2) from new impacts (i2)
50
-
51
- 7. **emit**: Assemble and minify the program
52
- * Build program structure from the compiled pieces (w2)
53
- * Use frequency namer to minify names.
54
- * Emit js and source map files.
55
-
56
- ### The old architecture
57
81
58
- The compiler used to operate as follows:
59
82
60
- 1. **load dart**: Load all source files
61
- * Collect dart sources transtively
62
- * Scan enough tokens to build import dependencies.
83
+ 7. **link tree-shake**: Using the results of codegen, we perform a second
84
+ round of tree-shaking. This is important because code that was deemed
85
+ reachable in (w1) may be found unreachable after optimizations. The process
86
+ is very similar to the earlier phase: we incrementally combine the codegen
87
+ impact data (i2) and compute a codegen closed world (w2).
63
88
64
- 2. **model**: Create a Dart model (aka. Element Model) of the program
65
- * Do a diet-parse of the program to create the high-level element model
66
89
67
- 3. **resolve and tree-shake**: Resolve and build world of reachable code (the
68
- resolution enqueuer)
69
- * For each reachable piece of code:
70
- * Parse the full body of the function
71
- * Resolve it and enqueue other pieces that are reachable
72
- * Type check the body of the function
90
+ When dart2js runs as a single process, the codegen phase is done lazily and
91
+ on-demand, together with the tree-shaking phase.
73
92
74
- 4. **analyze**: Run a global analysis
75
- * Assume closed world semantics (from everything enqueued by the resolver)
76
- * Produce a global result about type and nullability information of method
77
- arguments, return values, and receivers of dynamic sends.
78
-
79
- 5. **codegen and tree-shake**: Generate code, as needed (via the codegen
80
- enqueuer)
81
- * For each reachable piece of code:
82
- * build ssa graph from resolved source ASTs global results (g)
83
- * optimize ssa
84
- * enqueue visible dependencies
85
- * emit js asts for the code
86
-
87
- 6. **emit**: Assemble and minify the program
88
- * Build program structure from the compiled pieces
93
+ 8. **emit JavaScript files**: The final step is to assemble and minify the
94
+ final program. This includes:
95
+ * Build a JavaScript program structure from the compiled pieces (w2)
89
96
* Use frequency namer to minify names.
90
97
* Emit js and source map files.
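
The tree-shaking step (phase 3) can be illustrated with a small worklist. This is only a sketch of an RTA-style traversal under simplifying assumptions, not dart2js code: the `Impact` record, its three fields, and the `tree_shake` function are hypothetical names invented for the example, and real impact data carries far more detail.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Hypothetical per-method impact: what a method body needs."""
    instantiates: frozenset = frozenset()   # classes constructed in the body
    static_calls: frozenset = frozenset()   # methods called directly
    dynamic_calls: frozenset = frozenset()  # selectors invoked virtually


def tree_shake(impacts, class_methods, entry="main"):
    """Return (live_methods, live_classes) reachable from `entry`.

    impacts:       method id -> Impact
    class_methods: class name -> {selector: method id}
    """
    live_methods, live_classes, seen_selectors = set(), set(), set()
    queue = deque()

    def enqueue(method):
        if method not in live_methods:
            live_methods.add(method)
            queue.append(method)

    enqueue(entry)
    while queue:
        impact = impacts.get(queue.popleft(), Impact())
        for target in impact.static_calls:
            enqueue(target)
        for cls in impact.instantiates:
            if cls not in live_classes:
                live_classes.add(cls)
                # A newly instantiated class may implement selectors seen earlier.
                for sel in seen_selectors:
                    if sel in class_methods.get(cls, {}):
                        enqueue(class_methods[cls][sel])
        for sel in impact.dynamic_calls:
            if sel not in seen_selectors:
                seen_selectors.add(sel)
                # RTA: only instantiated classes can receive a virtual call.
                for cls in live_classes:
                    if sel in class_methods.get(cls, {}):
                        enqueue(class_methods[cls][sel])
    return live_methods, live_classes


impacts = {
    "main": Impact(instantiates=frozenset({"Dog"}),
                   static_calls=frozenset({"greet"})),
    "greet": Impact(dynamic_calls=frozenset({"speak"})),
    "Dog.speak": Impact(),
    "Cat.speak": Impact(),
}
class_methods = {"Dog": {"speak": "Dog.speak"}, "Cat": {"speak": "Cat.speak"}}
live_methods, live_classes = tree_shake(impacts, class_methods)
```

In this toy program `Cat` is never instantiated, so `Cat.speak` stays out of the live set even though the `speak` selector is used. The second tree-shaking round (phase 7) could reuse the same traversal with the codegen impacts (i2) in place of (i1).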
91
98
92
- ### The architecture today (which might be changing while you read this!)
93
-
94
- When using the `--use-kernel` flag, you can test the latest state of the
95
- compiler as we are migrating to the new architecture. Currently it works as
96
- follows:
97
-
98
- 1. **load dart**: (same as old compiler)
99
-
100
- 2. **model**: (same element model as old compiler)
101
-
102
- 3. **resolve, tree-shake and build world**: Build world of reachable code
103
- * For each reachable piece of code:
104
- * Parse full body of the function
105
- * Resolve it from the parsed source ASTs
106
- * Type check it (same as old compiler)
107
- * Compute impact (i1) from resolved source ASTs (no kernel)
108
- * Build a closed world (w1)
109
-
110
- 4. **kernelize**: Create kernel ASTs
111
- * For all resolved elements in w1, compute their kernel representation using
112
- the `rasta` visitor.
113
-
114
- 5. **analyze**: (almost same as old compiler)
99
+ ## Code organization
115
100
116
- 6. **codegen and tree-shake**: Generate code, as needed
117
- * For each reachable piece of code:
118
- * build ssa graph from kernel ASTs (uses global results g)
119
- * optimize ssa
120
- * compute impact (i2) from optimized code
121
- * emit js asts for the code
122
- * Build a codegen closed world (w2) from new impacts (i2)
123
-
124
- 7. **emit**: (same as old compiler)
125
-
126
- Some additional details worth highlighting:
127
-
128
- * tree-shaking is close to working as we want: the notion of a world and world
129
- impacts are computed explicitly:
130
-
131
- * In the old compiler, the resolver and code generator directly
132
- enqueued items to be processed, there was no knowledge of what had
133
- to be done other than in the algorithm itself.
134
-
135
- * Now the information is computed explicitly in two ways:
136
-
137
- * The dependencies of a single element are computed as an "impact"
138
- object, these are derived from the structure of the
139
- code (either the resolved code or the generated code).
140
-
141
- * The closed world is now an explicit concept that can be replaced in the
142
- compiler.
143
-
144
- * This allows us to delete the resolver in the future and replace it
145
- with a kernel loader, an impact builder from kernel, and a kernel world.
146
-
147
- * There is an implementation of a kernel impact builder, but it is not yet
148
- in use in the compiler pipeline (gated on replacing the Dart model)
149
-
150
- * We still depend on the Dart model computed by resolution, but progress has
151
- been made introducing an abstraction common to the new and old models. The
152
- old model is the "Element model", the generic abstraction is called the
153
- "Entity model". Some portions of the compiler now refer to the entity model.
154
-
155
- * The ssa graph is built from the kernel ASTs, but it still depends on the old
156
- element model computed from resolution (accessed via a kernel2Ast adapter).
157
- The graph builder implementation covers a large chunk of the language
158
- features, but is not complete (89% of langage & corelib tests are passing).
159
-
160
- * Global analysis is still working on top of the dart2js ASTs.
161
-
162
- ## Code organization and history
163
-
164
- The compiler package was initially intended to be compiler for multiple targets:
165
- Javascript, Dart (dart2dart), and dartino bytecodes. It has now evolved to be a
166
- Javascript only compiler, but some of the abstractions to support multiple
167
- targets still remain.
168
-
169
- ### Possibly confusing terminology
170
-
171
- Some of the terminology in the compiler is confusing without knowing its
172
- history. We are cleaning this up as we are rearchitecting the system, but here
173
- are some of the legacy terminology we have:
174
-
175
- * **target**: the output the compiler is producing. Nowdays it just
176
- JavaScript, but in the past there was also Dart and dartino bytecodes.
177
-
178
- * **backend**: pieces of the compiler that were target-specific.
179
- Note: in the past we've used the term *backend* also for code that is used
180
- in the frontend of the compiler that happens to be target-specific, as well
181
- as and code that is used in the emitter or what traditionally is known
182
- as the backend of the compiler.
183
-
184
- * **frontend**: the parser, resolver, and other early stages of the compiler.
185
- The front-end however makes target-specific choices. For example, to compile
186
- a program with async-await, the dart2js backend needs to include some helper
187
- functions that are used by the expanded async-await code, these helpers need
188
- to be parsed by the frontend and added to the compilation pipeline.
101
+ ### Some terminology used in the compiler
189
102
190
103
* **world**: the compiler exploits closed-world assumptions to do
191
104
optimizations. The *world* encapsulates some of our knowledge of the
@@ -201,29 +114,22 @@ are some of the legacy terminology we have:
201
114
202
115
* **model**: there are many models in the compiler:
203
116
204
- * **element model**: this is an abstraction describing the elements seen in
205
- Dart programs, like "libraries", "classes", "methods", etc.
206
-
207
- * **entity model**: also describes elements seen in Dart programs, but it is
208
- meant to be minimalistic and a super-hierarchy above the *element models*.
209
- This is a newer addition, is an added abstraction to make it possible to
210
- refactor our code from our old frontend to the kernel frontend.
211
-
212
- * **Dart vs JS models**: the compiler in the past had a single model to
213
- describe elements in the source and elements that were being compiled. In
214
- the future we plan to have two. Both input model and output models will be
215
- implementations of the *entity model*. The JS model is intended to have
216
- concepts specific about generating code in JS (like constructor-bodies as
217
- a separate entity than the constructor, closure classes, etc).
117
+ * **entity model**: this is an abstraction describing the elements seen in
118
+ Dart programs, like "libraries", "classes", "methods", etc. We currently
119
+ have two entity models, the "K model" (which is frontend centric and
120
+ usually maps 1:1 with kernel entities) and the "J model" (which is backend
121
+ centric).
218
122
219
123
* **emitter model**: this is a model just used for dumping out the structure
220
124
of the program in a .js text file. It doesn't have enough semantic meaning
221
- to be a JS model for compilation at this moment.
125
+ to be a JS model for compilation, which is why there is a separate "J
126
+ model".
222
127
223
128
* **enqueuer**: a work-queue used to achieve tree-shaking (or more precisely
224
129
tree-growing): elements are added to the enqueuer as we recognize that they
225
- are needed in a given application. Note that we even track how elements are
226
- used, since some ways of using an element require more code than others.
130
+ are needed in a given application (as described by the impact data). Note
131
+ that we even track how elements are used, since some ways of using an
132
+ element require more code than others.
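
To make the enqueuer idea concrete, here is a minimal sketch of a work queue that records how each element is used, not only that it is used. The `Enqueuer` class, the use kinds (`"invoke"`, `"tear-off"`), and the element names are all hypothetical and chosen for illustration; they are not taken from the dart2js sources.

```python
from collections import deque


class Enqueuer:
    """Hypothetical work queue: tracks which elements are needed and how."""

    def __init__(self):
        self.uses = {}           # element -> set of use kinds seen so far
        self._pending = deque()  # (element, use) pairs not yet processed

    def register(self, element, use):
        """Record a use; queue it only the first time this use kind is seen."""
        kinds = self.uses.setdefault(element, set())
        if use not in kinds:
            kinds.add(use)
            self._pending.append((element, use))

    def run(self, impact_of):
        """Drain the queue; impact_of(element, use) yields (element, use) pairs."""
        while self._pending:
            element, use = self._pending.popleft()
            for dep_element, dep_use in impact_of(element, use):
                self.register(dep_element, dep_use)
        return self.uses


# Toy impact table: tearing off `log` needs extra closure support code,
# whereas simply invoking `helper` needs nothing more.
IMPACTS = {
    ("main", "invoke"): [("helper", "invoke"), ("log", "tear-off")],
    ("log", "tear-off"): [("closureSupport", "invoke")],
}

enqueuer = Enqueuer()
enqueuer.register("main", "invoke")
uses = enqueuer.run(lambda element, use: IMPACTS.get((element, use), []))
```

Because the queue is keyed on (element, use) pairs, the same element is revisited when a new kind of use appears, which is how "some ways of using an element require more code than others" can be modeled.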
227
133
228
134
### Code layout
229
135