Commit 72507c4

Initial draft of MIR dataflow framework docs
1 parent 2c733c9 commit 72507c4

2 files changed

+166
-0
lines changed

src/SUMMARY.md

+1
@@ -110,6 +110,7 @@
110110
- [Variance](./variance.md)
111111
- [Opaque Types](./opaque-types-type-alias-impl-trait.md)
112112
- [Pattern and Exhaustiveness Checking](./pat-exhaustive-checking.md)
113+
- [MIR dataflow](./mir/dataflow.md)
113114
- [The borrow checker](./borrow_check.md)
114115
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
115116
- [Move paths](./borrow_check/moves_and_initialization/move_paths.md)

src/mir/dataflow.md

+165
@@ -0,0 +1,165 @@
1+
# Dataflow Analysis
2+
3+
If you work on the MIR, you will frequently come across various flavors of
4+
[dataflow analysis][wiki]. For example, `rustc` uses dataflow to find
5+
uninitialized variables, determine what variables are live across a generator
6+
`yield` statement, and compute which `Place`s are borrowed at a given point in
7+
the control-flow graph. Dataflow analysis is a fundamental concept in modern
8+
compilers, and knowledge of the subject will be helpful to prospective
9+
contributors.
10+
11+
However, this documentation is not a general introduction to dataflow analysis.
12+
It is merely a description of the framework used to define these analyses in
13+
`rustc`. It assumes that the reader is familiar with some basic terminology,
14+
such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar
15+
with these terms, or if you want a quick refresher, [*Static Program Analysis*]
16+
by Anders Møller and Michael I. Schwartzbach is an excellent, freely available
17+
textbook. For those who prefer audiovisual learning, the Goethe University
18+
Frankfurt has published a series of short [YouTube lectures][goethe] in English
19+
that are very approachable.
20+
21+
## Defining a Dataflow Analysis
22+
23+
The interface for dataflow analyses is split into three traits. The first is
24+
[`AnalysisDomain`], which must be implemented by *all* analyses. In addition to
25+
the type of the dataflow state, this trait defines the initial value of that
26+
state at entry to each block, as well as the direction of the analysis, either
27+
forward or backward. The domain of your dataflow analysis must be a [lattice][]
28+
(strictly speaking a join-semilattice) with a well-behaved `join` operator. See
29+
documentation for the [`lattice`] module, as well as the [`JoinSemiLattice`]
30+
trait, for more information.
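
To make the lattice requirement concrete, here is a minimal, self-contained sketch of a join-semilattice. It does not use the `rustc` types: the trait below is a simplified stand-in written for this example, and `SmallSet` is a hypothetical domain of at most 64 indices ordered by set inclusion. The `join` operator is set union and reports whether the receiver changed, which is the kind of information a fixpoint iteration needs to know when to stop.

```rust
/// Simplified stand-in for a join-semilattice trait.
/// (Hypothetical; written for this example, not copied from `rustc`.)
trait JoinSemiLattice {
    /// Joins `other` into `self` and returns `true` if `self` changed.
    fn join(&mut self, other: &Self) -> bool;
}

/// Hypothetical domain: a set of indices below 64, ordered by inclusion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SmallSet(u64);

impl SmallSet {
    fn empty() -> Self {
        SmallSet(0)
    }

    fn insert(&mut self, idx: u32) {
        assert!(idx < 64, "index out of range for SmallSet");
        self.0 |= 1u64 << idx;
    }

    fn contains(&self, idx: u32) -> bool {
        idx < 64 && (self.0 >> idx) & 1 == 1
    }
}

impl JoinSemiLattice for SmallSet {
    fn join(&mut self, other: &Self) -> bool {
        let old = self.0;
        // Union is idempotent, commutative and associative, so it is a
        // well-behaved join for the inclusion order.
        self.0 |= other.0;
        self.0 != old
    }
}
```

Joining a set with itself, or with a subset of itself, returns `false`; that is what lets an iteration over this domain terminate.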
31+
32+
You must then provide *either* a direct implementation of the [`Analysis`] trait
33+
*or* an implementation of the proxy trait [`GenKillAnalysis`]. The latter is for
34+
so-called ["gen-kill" problems], which have a simple class of transfer function
35+
that can be applied very efficiently. Analyses whose domain is not a `BitSet`
36+
of some index type, or whose transfer functions cannot be expressed through
37+
"gen" and "kill" operations, must implement `Analysis` directly, and will run
38+
slower as a result. All implementers of `GenKillAnalysis` also implement
39+
`Analysis` automatically via a default `impl`.
40+
41+
42+
```text
 AnalysisDomain
       ^
       |          | = has as a supertrait
       |          . = provides a default impl for
       |
   Analysis
     ^   ^
     |   .
     |   .
     |   .
 GenKillAnalysis
```
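
As an illustration of why gen-kill transfer functions are cheap to apply, the sketch below models one over a 64-bit set. Everything in it is hypothetical and independent of `rustc`: `GenKillSet`, `add_gen`, `add_kill` and `apply` are names invented for this example. The point is that any sequence of "gen" and "kill" operations collapses into a single pair of bit masks, so the combined effect can be applied with two bitwise operations.

```rust
/// Hypothetical accumulated transfer function of a gen-kill problem.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct GenKillSet {
    gens: u64,
    kills: u64,
}

impl GenKillSet {
    /// Records that `idx` is generated; this overrides an earlier kill.
    fn add_gen(&mut self, idx: u32) {
        assert!(idx < 64);
        let bit = 1u64 << idx;
        self.gens |= bit;
        self.kills &= !bit;
    }

    /// Records that `idx` is killed; this overrides an earlier gen.
    fn add_kill(&mut self, idx: u32) {
        assert!(idx < 64);
        let bit = 1u64 << idx;
        self.kills |= bit;
        self.gens &= !bit;
    }

    /// Applies the accumulated effect: out = (in - kills) | gens.
    fn apply(&self, state: u64) -> u64 {
        (state & !self.kills) | self.gens
    }
}
```

A transfer function that needs to inspect the incoming state before deciding what to change cannot be written this way, which corresponds to the case where `Analysis` must be implemented directly.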
56+
57+
### Transfer Functions and Effects
58+
59+
The dataflow framework in `rustc` allows each statement inside a basic block as
60+
well as the terminator to define its own transfer function. For brevity, these
61+
individual transfer functions are known as "effects". Each effect is applied
62+
successively in dataflow order, and together they define the transfer function
63+
for the entire basic block. It's also possible to define an effect for
64+
particular outgoing edges of some terminators (e.g.
65+
[`apply_call_return_effect`] for the `success` edge of a `Call`
66+
terminator). Collectively, these are known as per-edge effects.
67+
68+
The only meaningful difference (besides the "apply" prefix) between the methods
69+
of the `GenKillAnalysis` trait and the `Analysis` trait is that an `Analysis`
70+
has direct, mutable access to the dataflow state, whereas a `GenKillAnalysis`
71+
only sees an implementer of the `GenKill` trait, which only allows the `gen`
72+
and `kill` operations for mutation.
73+
74+
Observant readers of the documentation for these traits may notice that there
75+
are actually *two* possible effects for each statement and terminator, the
76+
"before" effect and the unprefixed (or "primary") effect. The "before" effects
77+
are applied immediately before the unprefixed effect **regardless of whether
78+
the analysis is backward or forward**. The vast majority of analyses should use
79+
only the unprefixed effects: having multiple effects for each statement makes
80+
it difficult for consumers to know where they should be looking. However, the
81+
"before" variants can be useful in some scenarios, such as when the effect of
82+
the right-hand side of an assignment statement must be considered separately
83+
from the left-hand side.
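
The ordering rule for "before" and primary effects can be sketched with a toy model. The function below is hypothetical and does not call into `rustc`; it only lists the order in which effects would be applied within a single block, assuming that a backward analysis visits the terminator first and then the statements in reverse.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Direction {
    Forward,
    Backward,
}

/// Returns the order in which a toy engine applies effects in one block
/// that has `num_statements` statements followed by a terminator.
fn effect_order(num_statements: usize, dir: Direction) -> Vec<String> {
    // Locations in program order: every statement, then the terminator.
    let mut locations: Vec<String> = (0..num_statements)
        .map(|i| format!("stmt{}", i))
        .collect();
    locations.push("term".to_string());

    // A backward analysis walks the same locations in reverse.
    if dir == Direction::Backward {
        locations.reverse();
    }

    let mut order = Vec::new();
    for loc in locations {
        // In both directions the "before" effect precedes the primary one.
        order.push(format!("before {}", loc));
        order.push(format!("primary {}", loc));
    }
    order
}
```

Only the order of the locations depends on the direction; the position of the "before" effect relative to the primary effect at each location does not.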
84+
85+
### Convergence
86+
87+
TODO
88+
89+
## Inspecting the Results of a Dataflow Analysis
90+
91+
Once you have constructed an analysis, you must pass it to an [`Engine`], which
92+
is responsible for finding the steady-state solution to your dataflow problem.
93+
You should use the [`into_engine`] method defined on the `Analysis` trait for
94+
this, since it will use the more efficient `Engine::new_gen_kill` constructor
95+
when possible.
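
For intuition about what "finding the steady-state solution" involves, here is a minimal worklist sketch for a forward gen-kill problem on a tiny control-flow graph. It is not the `Engine` implementation: `Block` and `solve_forward` are hypothetical names, the state is a plain `u64` bit set joined by union, and block 0 is assumed to be the entry block.

```rust
use std::collections::VecDeque;

/// Hypothetical basic block with a precomputed gen-kill transfer function.
struct Block {
    gens: u64,
    kills: u64,
    successors: Vec<usize>,
}

/// Returns the state on entry to each block once nothing changes anymore.
/// Successor indices must be valid indices into `blocks`.
fn solve_forward(blocks: &[Block], init: u64) -> Vec<u64> {
    let mut entry = vec![0u64; blocks.len()];
    if blocks.is_empty() {
        return entry;
    }
    entry[0] = init;

    // Start with every block queued; `queued` avoids duplicate entries.
    let mut worklist: VecDeque<usize> = (0..blocks.len()).collect();
    let mut queued = vec![true; blocks.len()];

    while let Some(bb) = worklist.pop_front() {
        queued[bb] = false;
        // Block transfer function: out = (in - kills) | gens.
        let exit = (entry[bb] & !blocks[bb].kills) | blocks[bb].gens;
        for &succ in &blocks[bb].successors {
            // Join (union) the exit state into the successor's entry state.
            let joined = entry[succ] | exit;
            if joined != entry[succ] {
                entry[succ] = joined;
                if !queued[succ] {
                    queued[succ] = true;
                    worklist.push_back(succ);
                }
            }
        }
    }
    entry
}
```

The loop terminates because entry states only ever grow and there are finitely many bits. Like the `Results` described below, the returned vector holds the state on entry to each block.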
96+
97+
Calling `iterate_to_fixpoint` on your `Engine` will return a `Results`, which
98+
contains the dataflow state at fixpoint upon entry of each block. Once you have
99+
a `Results`, you can inspect the dataflow state at fixpoint at any point in
100+
the CFG. If you only need the state at a few locations (e.g., each `Drop`
101+
terminator) use a [`ResultsCursor`]. If you need the state at *every* location,
102+
a [`ResultsVisitor`] will be more efficient.
103+
104+
```text
                         Analysis
                            |
                            | into_engine(…)
                            |
                          Engine
                            |
                            | iterate_to_fixpoint()
                            |
                         Results
                         /     \
 into_results_cursor(…) /       \  visit_with(…)
                       /         \
               ResultsCursor  ResultsVisitor
```
119+
120+
For example, the following code uses a [`ResultsVisitor`]...
121+
122+
123+
```rust,ignore
// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = MyAnalysis::Domain>`...
// The binding must be `mut` because `visit_with` takes `&mut my_visitor`.
let mut my_visitor = MyVisitor::new();

// Inspect the fixpoint state for every location within every block in RPO.
let results = MyAnalysis()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .visit_with(body, traversal::reverse_postorder(body), &mut my_visitor);
```
133+
134+
whereas this code uses [`ResultsCursor`]:
135+
136+
```rust,ignore
let mut results = MyAnalysis()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .into_results_cursor(body);

// Inspect the fixpoint state immediately before each `Drop` terminator.
for (bb, block) in body.basic_blocks().iter_enumerated() {
    if let TerminatorKind::Drop { .. } = block.terminator().kind {
        results.seek_before_primary_effect(body.terminator_loc(bb));
        let state = results.get();
        println!("state before drop: {:#?}", state);
    }
}
```
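
Because only the state on entry to each block is stored, a cursor has to recompute the state at any other location by replaying effects from a known starting point. The sketch below is a toy model of that idea for a single block of a forward gen-kill analysis. `ToyCursor` and its methods are hypothetical and are not the `ResultsCursor` API; in particular, the assumption that seeking to an earlier location restarts from the block entry is made for this example.

```rust
/// Toy cursor over one block of a forward gen-kill analysis.
struct ToyCursor<'a> {
    entry_state: u64,
    /// One `(gens, kills)` pair per statement, in program order.
    effects: &'a [(u64, u64)],
    state: u64,
    /// Index of the first statement whose effect has not been applied yet.
    next: usize,
    /// Number of effects applied so far, to make the replay cost visible.
    applied: usize,
}

impl<'a> ToyCursor<'a> {
    fn new(entry_state: u64, effects: &'a [(u64, u64)]) -> Self {
        ToyCursor { entry_state, effects, state: entry_state, next: 0, applied: 0 }
    }

    /// Returns the state immediately before statement `idx`
    /// (`idx == effects.len()` means "after the last statement").
    fn seek_before(&mut self, idx: usize) -> u64 {
        assert!(idx <= self.effects.len());
        if idx < self.next {
            // Effects cannot be undone, so restart from the block entry.
            self.state = self.entry_state;
            self.next = 0;
        }
        for &(gens, kills) in &self.effects[self.next..idx] {
            self.state = (self.state & !kills) | gens;
            self.applied += 1;
        }
        self.next = idx;
        self.state
    }
}
```

Seeking forward reuses the work already done, while seeking to an earlier statement replays the block from its entry. In this model a single in-order pass over every location does less total work than a series of arbitrary seeks.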
151+
152+
["gen-kill" problems]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems
153+
[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/
154+
[`AnalysisDomain`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.AnalysisDomain.html
155+
[`Analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html
156+
[`GenKillAnalysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.GenKillAnalysis.html
157+
[`JoinSemiLattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/trait.JoinSemiLattice.html
158+
[`ResultsCursor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/struct.ResultsCursor.html
159+
[`ResultsVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.ResultsVisitor.html
160+
[`apply_call_return_effect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#tymethod.apply_call_return_effect
161+
[`into_engine`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/trait.Analysis.html#method.into_engine
162+
[`lattice`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/lattice/index.html
163+
[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2
164+
[lattice]: https://en.wikipedia.org/wiki/Lattice_(order)
165+
[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles
