|
1 |
| -/// Note: most of the tests relevant to this file can be found (at the time of writing) in |
2 |
| -/// src/tests/ui/pattern/usefulness. |
3 |
| -/// |
4 |
| -/// This file includes the logic for exhaustiveness and usefulness checking for |
5 |
| -/// pattern-matching. Specifically, given a list of patterns for a type, we can |
6 |
| -/// tell whether: |
7 |
| -/// (a) the patterns cover every possible constructor for the type [exhaustiveness] |
8 |
| -/// (b) each pattern is necessary [usefulness] |
9 |
| -/// |
10 |
| -/// The algorithm implemented here is a modified version of the one described in: |
11 |
| -/// http://moscova.inria.fr/~maranget/papers/warn/index.html |
12 |
| -/// However, to save future implementors from reading the original paper, we |
13 |
| -/// summarise the algorithm here to hopefully save time and be a little clearer |
14 |
| -/// (without being so rigorous). |
15 |
| -/// |
16 |
| -/// # Premise |
17 |
| -/// |
18 |
| -/// The core of the algorithm revolves about a "usefulness" check. In particular, we |
19 |
| -/// are trying to compute a predicate `U(P, p)` where `P` is a list of patterns (we refer to this as |
20 |
| -/// a matrix). `U(P, p)` represents whether, given an existing list of patterns |
21 |
| -/// `P_1 ..= P_m`, adding a new pattern `p` will be "useful" (that is, cover previously- |
22 |
| -/// uncovered values of the type). |
23 |
| -/// |
24 |
| -/// If we have this predicate, then we can easily compute both exhaustiveness of an |
25 |
| -/// entire set of patterns and the individual usefulness of each one. |
26 |
| -/// (a) the set of patterns is exhaustive iff `U(P, _)` is false (i.e., adding a wildcard |
27 |
| -/// match doesn't increase the number of values we're matching) |
28 |
| -/// (b) a pattern `P_i` is not useful if `U(P[0..=(i-1), P_i)` is false (i.e., adding a |
29 |
| -/// pattern to those that have come before it doesn't increase the number of values |
30 |
| -/// we're matching). |
31 |
| -/// |
32 |
| -/// # Core concept |
33 |
| -/// |
34 |
| -/// The idea that powers everything that is done in this file is the following: a value is made |
35 |
| -/// from a constructor applied to some fields. Examples of constructors are `Some`, `None`, `(,)` |
36 |
| -/// (the 2-tuple constructor), `Foo {..}` (the constructor for a struct `Foo`), and `2` (the |
37 |
| -/// constructor for the number `2`). Fields are just a (possibly empty) list of values. |
38 |
| -/// |
39 |
| -/// Some of the constructors listed above might feel weird: `None` and `2` don't take any |
40 |
| -/// arguments. This is part of what makes constructors so general: we will consider plain values |
41 |
| -/// like numbers and string literals to be constructors that take no arguments, also called "0-ary |
42 |
| -/// constructors"; they are the simplest case of constructors. This allows us to see any value as |
43 |
| -/// made up from a tree of constructors, each having a given number of children. For example: |
44 |
| -/// `(None, Ok(0))` is made from 4 different constructors. |
45 |
| -/// |
46 |
| -/// This idea can be extended to patterns: a pattern captures a set of possible values, and we can |
47 |
| -/// describe this set using constructors. For example, `Err(_)` captures all values of the type |
48 |
| -/// `Result<T, E>` that start with the `Err` constructor (for some choice of `T` and `E`). The |
49 |
| -/// wildcard `_` captures all values of the given type starting with any of the constructors for |
50 |
| -/// that type. |
51 |
| -/// |
52 |
| -/// We use this to compute whether different patterns might capture a same value. Do the patterns |
53 |
| -/// `Ok("foo")` and `Err(_)` capture a common value? The answer is no, because the first pattern |
54 |
| -/// captures only values starting with the `Ok` constructor and the second only values starting |
55 |
| -/// with the `Err` constructor. Do the patterns `Some(42)` and `Some(1..10)` intersect? They might, |
56 |
| -/// since they both capture values starting with `Some`. To be certain, we need to dig under the |
57 |
| -/// `Some` constructor and continue asking the question. This is the main idea behind the |
58 |
| -/// exhaustiveness algorithm: by looking at patterns constructor-by-constructor, we can efficiently |
59 |
| -/// figure out if some new pattern might capture a value that hadn't been captured by previous |
60 |
| -/// patterns. |
61 |
| -/// |
62 |
| -/// Constructors are represented by the `Constructor` enum, and its fields by the `Fields` enum. |
63 |
| -/// Most of the complexity of this file resides in transforming between patterns and |
64 |
| -/// (`Constructor`, `Fields`) pairs, handling all the special cases correctly. |
65 |
| -/// |
66 |
| -/// Caveat: this constructors/fields distinction doesn't quite cover every Rust value. For example |
67 |
| -/// a value of type `Rc<u64>` doesn't fit this idea very well, nor do various other things. |
68 |
| -/// However, this idea covers most of the cases that are relevant to exhaustiveness checking. |
69 |
| -/// |
70 |
| -/// |
71 |
| -/// # Algorithm |
72 |
| -/// |
73 |
| -/// Recall that `U(P, p)` represents whether, given an existing list of patterns (aka matrix) `P`, |
74 |
| -/// adding a new pattern `p` will cover previously-uncovered values of the type. |
75 |
| -/// During the course of the algorithm, the rows of the matrix won't just be individual patterns, |
76 |
| -/// but rather partially-deconstructed patterns in the form of a list of fields. The paper |
77 |
| -/// calls those pattern-vectors, and we will call them pattern-stacks. The same holds for the |
78 |
| -/// new pattern `p`. |
79 |
| -/// |
80 |
| -/// For example, say we have the following: |
81 |
| -/// ``` |
82 |
| -/// // x: (Option<bool>, Result<()>) |
83 |
| -/// match x { |
84 |
| -/// (Some(true), _) => {} |
85 |
| -/// (None, Err(())) => {} |
86 |
| -/// (None, Err(_)) => {} |
87 |
| -/// } |
88 |
| -/// ``` |
89 |
| -/// Here, the matrix `P` starts as: |
90 |
| -/// [ |
91 |
| -/// [(Some(true), _)], |
92 |
| -/// [(None, Err(()))], |
93 |
| -/// [(None, Err(_))], |
94 |
| -/// ] |
95 |
| -/// We can tell it's not exhaustive, because `U(P, _)` is true (we're not covering |
96 |
| -/// `[(Some(false), _)]`, for instance). In addition, row 3 is not useful, because |
97 |
| -/// all the values it covers are already covered by row 2. |
98 |
| -/// |
99 |
| -/// A list of patterns can be thought of as a stack, because we are mainly interested in the top of |
100 |
| -/// the stack at any given point, and we can pop or apply constructors to get new pattern-stacks. |
101 |
| -/// To match the paper, the top of the stack is at the beginning / on the left. |
102 |
| -/// |
103 |
| -/// There are two important operations on pattern-stacks necessary to understand the algorithm: |
104 |
| -/// 1. We can pop a given constructor off the top of a stack. This operation is called |
105 |
| -/// `specialize`, and is denoted `S(c, p)` where `c` is a constructor (like `Some` or |
106 |
| -/// `None`) and `p` a pattern-stack. |
107 |
| -/// If the pattern on top of the stack can cover `c`, this removes the constructor and |
108 |
| -/// pushes its arguments onto the stack. It also expands OR-patterns into distinct patterns. |
109 |
| -/// Otherwise the pattern-stack is discarded. |
110 |
| -/// This essentially filters those pattern-stacks whose top covers the constructor `c` and |
111 |
| -/// discards the others. |
112 |
| -/// |
113 |
| -/// For example, the first pattern above initially gives a stack `[(Some(true), _)]`. If we |
114 |
| -/// pop the tuple constructor, we are left with `[Some(true), _]`, and if we then pop the |
115 |
| -/// `Some` constructor we get `[true, _]`. If we had popped `None` instead, we would get |
116 |
| -/// nothing back. |
117 |
| -/// |
118 |
| -/// This returns zero or more new pattern-stacks, as follows. We look at the pattern `p_1` |
119 |
| -/// on top of the stack, and we have four cases: |
120 |
| -/// 1.1. `p_1 = c(r_1, .., r_a)`, i.e. the top of the stack has constructor `c`. We |
121 |
| -/// push onto the stack the arguments of this constructor, and return the result: |
122 |
| -/// r_1, .., r_a, p_2, .., p_n |
123 |
| -/// 1.2. `p_1 = c'(r_1, .., r_a')` where `c ≠ c'`. We discard the current stack and |
124 |
| -/// return nothing. |
125 |
| -/// 1.3. `p_1 = _`. We push onto the stack as many wildcards as the constructor `c` has |
126 |
| -/// arguments (its arity), and return the resulting stack: |
127 |
| -/// _, .., _, p_2, .., p_n |
128 |
| -/// 1.4. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
129 |
| -/// stack: |
130 |
| -/// S(c, (r_1, p_2, .., p_n)) |
131 |
| -/// S(c, (r_2, p_2, .., p_n)) |
132 |
| -/// |
133 |
| -/// 2. We can pop a wildcard off the top of the stack. This is called `D(p)`, where `p` is |
134 |
| -/// a pattern-stack. |
135 |
| -/// This is used when we know there are missing constructor cases, but there might be |
136 |
| -/// existing wildcard patterns, so to check the usefulness of the matrix, we have to check |
137 |
| -/// all its *other* components. |
138 |
| -/// |
139 |
| -/// It is computed as follows. We look at the pattern `p_1` on top of the stack, |
140 |
| -/// and we have three cases: |
141 |
| -/// 1.1. `p_1 = c(r_1, .., r_a)`. We discard the current stack and return nothing. |
142 |
| -/// 1.2. `p_1 = _`. We return the rest of the stack: |
143 |
| -/// p_2, .., p_n |
144 |
| -/// 1.3. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
145 |
| -/// stack. |
146 |
| -/// D((r_1, p_2, .., p_n)) |
147 |
| -/// D((r_2, p_2, .., p_n)) |
148 |
| -/// |
149 |
| -/// Note that the OR-patterns are not always used directly in Rust, but are used to derive the |
150 |
| -/// exhaustive integer matching rules, so they're written here for posterity. |
151 |
| -/// |
152 |
| -/// Both those operations extend straightforwardly to a list or pattern-stacks, i.e. a matrix, by |
153 |
| -/// working row-by-row. Popping a constructor ends up keeping only the matrix rows that start with |
154 |
| -/// the given constructor, and popping a wildcard keeps those rows that start with a wildcard. |
155 |
| -/// |
156 |
| -/// |
157 |
| -/// The algorithm for computing `U` |
158 |
| -/// ------------------------------- |
159 |
| -/// The algorithm is inductive (on the number of columns: i.e., components of tuple patterns). |
160 |
| -/// That means we're going to check the components from left-to-right, so the algorithm |
161 |
| -/// operates principally on the first component of the matrix and new pattern-stack `p`. |
162 |
| -/// This algorithm is realised in the `is_useful` function. |
163 |
| -/// |
164 |
| -/// Base case. (`n = 0`, i.e., an empty tuple pattern) |
165 |
| -/// - If `P` already contains an empty pattern (i.e., if the number of patterns `m > 0`), |
166 |
| -/// then `U(P, p)` is false. |
167 |
| -/// - Otherwise, `P` must be empty, so `U(P, p)` is true. |
168 |
| -/// |
169 |
| -/// Inductive step. (`n > 0`, i.e., whether there's at least one column |
170 |
| -/// [which may then be expanded into further columns later]) |
171 |
| -/// We're going to match on the top of the new pattern-stack, `p_1`. |
172 |
| -/// - If `p_1 == c(r_1, .., r_a)`, i.e. we have a constructor pattern. |
173 |
| -/// Then, the usefulness of `p_1` can be reduced to whether it is useful when |
174 |
| -/// we ignore all the patterns in the first column of `P` that involve other constructors. |
175 |
| -/// This is where `S(c, P)` comes in: |
176 |
| -/// `U(P, p) := U(S(c, P), S(c, p))` |
177 |
| -/// This special case is handled in `is_useful_specialized`. |
178 |
| -/// |
179 |
| -/// For example, if `P` is: |
180 |
| -/// [ |
181 |
| -/// [Some(true), _], |
182 |
| -/// [None, 0], |
183 |
| -/// ] |
184 |
| -/// and `p` is [Some(false), 0], then we don't care about row 2 since we know `p` only |
185 |
| -/// matches values that row 2 doesn't. For row 1 however, we need to dig into the |
186 |
| -/// arguments of `Some` to know whether some new value is covered. So we compute |
187 |
| -/// `U([[true, _]], [false, 0])`. |
188 |
| -/// |
189 |
| -/// - If `p_1 == _`, then we look at the list of constructors that appear in the first |
190 |
| -/// component of the rows of `P`: |
191 |
| -/// + If there are some constructors that aren't present, then we might think that the |
192 |
| -/// wildcard `_` is useful, since it covers those constructors that weren't covered |
193 |
| -/// before. |
194 |
| -/// That's almost correct, but only works if there were no wildcards in those first |
195 |
| -/// components. So we need to check that `p` is useful with respect to the rows that |
196 |
| -/// start with a wildcard, if there are any. This is where `D` comes in: |
197 |
| -/// `U(P, p) := U(D(P), D(p))` |
198 |
| -/// |
199 |
| -/// For example, if `P` is: |
200 |
| -/// [ |
201 |
| -/// [_, true, _], |
202 |
| -/// [None, false, 1], |
203 |
| -/// ] |
204 |
| -/// and `p` is [_, false, _], the `Some` constructor doesn't appear in `P`. So if we |
205 |
| -/// only had row 2, we'd know that `p` is useful. However row 1 starts with a |
206 |
| -/// wildcard, so we need to check whether `U([[true, _]], [false, 1])`. |
207 |
| -/// |
208 |
| -/// + Otherwise, all possible constructors (for the relevant type) are present. In this |
209 |
| -/// case we must check whether the wildcard pattern covers any unmatched value. For |
210 |
| -/// that, we can think of the `_` pattern as a big OR-pattern that covers all |
211 |
| -/// possible constructors. For `Option`, that would mean `_ = None | Some(_)` for |
212 |
| -/// example. The wildcard pattern is useful in this case if it is useful when |
213 |
| -/// specialized to one of the possible constructors. So we compute: |
214 |
| -/// `U(P, p) := ∃(k ϵ constructors) U(S(k, P), S(k, p))` |
215 |
| -/// |
216 |
| -/// For example, if `P` is: |
217 |
| -/// [ |
218 |
| -/// [Some(true), _], |
219 |
| -/// [None, false], |
220 |
| -/// ] |
221 |
| -/// and `p` is [_, false], both `None` and `Some` constructors appear in the first |
222 |
| -/// components of `P`. We will therefore try popping both constructors in turn: we |
223 |
| -/// compute U([[true, _]], [_, false]) for the `Some` constructor, and U([[false]], |
224 |
| -/// [false]) for the `None` constructor. The first case returns true, so we know that |
225 |
| -/// `p` is useful for `P`. Indeed, it matches `[Some(false), _]` that wasn't matched |
226 |
| -/// before. |
227 |
| -/// |
228 |
| -/// - If `p_1 == r_1 | r_2`, then the usefulness depends on each `r_i` separately: |
229 |
| -/// `U(P, p) := U(P, (r_1, p_2, .., p_n)) |
230 |
| -/// || U(P, (r_2, p_2, .., p_n))` |
231 |
| -/// |
232 |
| -/// Modifications to the algorithm |
233 |
| -/// ------------------------------ |
234 |
| -/// The algorithm in the paper doesn't cover some of the special cases that arise in Rust, for |
235 |
| -/// example uninhabited types and variable-length slice patterns. These are drawn attention to |
236 |
| -/// throughout the code below. I'll make a quick note here about how exhaustive integer matching is |
237 |
| -/// accounted for, though. |
238 |
| -/// |
239 |
| -/// Exhaustive integer matching |
240 |
| -/// --------------------------- |
241 |
| -/// An integer type can be thought of as a (huge) sum type: 1 | 2 | 3 | ... |
242 |
| -/// So to support exhaustive integer matching, we can make use of the logic in the paper for |
243 |
| -/// OR-patterns. However, we obviously can't just treat ranges x..=y as individual sums, because |
244 |
| -/// they are likely gigantic. So we instead treat ranges as constructors of the integers. This means |
245 |
| -/// that we have a constructor *of* constructors (the integers themselves). We then need to work |
246 |
| -/// through all the inductive step rules above, deriving how the ranges would be treated as |
247 |
| -/// OR-patterns, and making sure that they're treated in the same way even when they're ranges. |
248 |
| -/// There are really only four special cases here: |
249 |
| -/// - When we match on a constructor that's actually a range, we have to treat it as if we would |
250 |
| -/// an OR-pattern. |
251 |
| -/// + It turns out that we can simply extend the case for single-value patterns in |
252 |
| -/// `specialize` to either be *equal* to a value constructor, or *contained within* a range |
253 |
| -/// constructor. |
254 |
| -/// + When the pattern itself is a range, you just want to tell whether any of the values in |
255 |
| -/// the pattern range coincide with values in the constructor range, which is precisely |
256 |
| -/// intersection. |
257 |
| -/// Since when encountering a range pattern for a value constructor, we also use inclusion, it |
258 |
| -/// means that whenever the constructor is a value/range and the pattern is also a value/range, |
259 |
| -/// we can simply use intersection to test usefulness. |
260 |
| -/// - When we're testing for usefulness of a pattern and the pattern's first component is a |
261 |
| -/// wildcard. |
262 |
| -/// + If all the constructors appear in the matrix, we have a slight complication. By default, |
263 |
| -/// the behaviour (i.e., a disjunction over specialised matrices for each constructor) is |
264 |
| -/// invalid, because we want a disjunction over every *integer* in each range, not just a |
265 |
| -/// disjunction over every range. This is a bit more tricky to deal with: essentially we need |
266 |
| -/// to form equivalence classes of subranges of the constructor range for which the behaviour |
267 |
| -/// of the matrix `P` and new pattern `p` are the same. This is described in more |
268 |
| -/// detail in `split_grouped_constructors`. |
269 |
| -/// + If some constructors are missing from the matrix, it turns out we don't need to do |
270 |
| -/// anything special (because we know none of the integers are actually wildcards: i.e., we |
271 |
| -/// can't span wildcards using ranges). |
| 1 | +//! Note: most of the tests relevant to this file can be found (at the time of writing) in |
| 2 | +//! src/tests/ui/pattern/usefulness. |
| 3 | +//! |
| 4 | +//! This file includes the logic for exhaustiveness and usefulness checking for |
| 5 | +//! pattern-matching. Specifically, given a list of patterns for a type, we can |
| 6 | +//! tell whether: |
| 7 | +//! (a) the patterns cover every possible constructor for the type [exhaustiveness] |
| 8 | +//! (b) each pattern is necessary [usefulness] |
| 9 | +//! |
| 10 | +//! The algorithm implemented here is a modified version of the one described in: |
| 11 | +//! http://moscova.inria.fr/~maranget/papers/warn/index.html |
| 12 | +//! However, to save future implementors from reading the original paper, we |
| 13 | +//! summarise the algorithm here to hopefully save time and be a little clearer |
| 14 | +//! (without being so rigorous). |
| 15 | +//! |
| 16 | +//! # Premise |
| 17 | +//! |
| 18 | +//! The core of the algorithm revolves about a "usefulness" check. In particular, we |
| 19 | +//! are trying to compute a predicate `U(P, p)` where `P` is a list of patterns (we refer to this as |
| 20 | +//! a matrix). `U(P, p)` represents whether, given an existing list of patterns |
| 21 | +//! `P_1 ..= P_m`, adding a new pattern `p` will be "useful" (that is, cover previously- |
| 22 | +//! uncovered values of the type). |
| 23 | +//! |
| 24 | +//! If we have this predicate, then we can easily compute both exhaustiveness of an |
| 25 | +//! entire set of patterns and the individual usefulness of each one. |
| 26 | +//! (a) the set of patterns is exhaustive iff `U(P, _)` is false (i.e., adding a wildcard |
| 27 | +//! match doesn't increase the number of values we're matching) |
| 28 | +//! (b) a pattern `P_i` is not useful if `U(P[0..=(i-1), P_i)` is false (i.e., adding a |
| 29 | +//! pattern to those that have come before it doesn't increase the number of values |
| 30 | +//! we're matching). |
| 31 | +//! |
| 32 | +//! # Core concept |
| 33 | +//! |
| 34 | +//! The idea that powers everything that is done in this file is the following: a value is made |
| 35 | +//! from a constructor applied to some fields. Examples of constructors are `Some`, `None`, `(,)` |
| 36 | +//! (the 2-tuple constructor), `Foo {..}` (the constructor for a struct `Foo`), and `2` (the |
| 37 | +//! constructor for the number `2`). Fields are just a (possibly empty) list of values. |
| 38 | +//! |
| 39 | +//! Some of the constructors listed above might feel weird: `None` and `2` don't take any |
| 40 | +//! arguments. This is part of what makes constructors so general: we will consider plain values |
| 41 | +//! like numbers and string literals to be constructors that take no arguments, also called "0-ary |
| 42 | +//! constructors"; they are the simplest case of constructors. This allows us to see any value as |
| 43 | +//! made up from a tree of constructors, each having a given number of children. For example: |
| 44 | +//! `(None, Ok(0))` is made from 4 different constructors. |
| 45 | +//! |
| 46 | +//! This idea can be extended to patterns: a pattern captures a set of possible values, and we can |
| 47 | +//! describe this set using constructors. For example, `Err(_)` captures all values of the type |
| 48 | +//! `Result<T, E>` that start with the `Err` constructor (for some choice of `T` and `E`). The |
| 49 | +//! wildcard `_` captures all values of the given type starting with any of the constructors for |
| 50 | +//! that type. |
| 51 | +//! |
| 52 | +//! We use this to compute whether different patterns might capture a same value. Do the patterns |
| 53 | +//! `Ok("foo")` and `Err(_)` capture a common value? The answer is no, because the first pattern |
| 54 | +//! captures only values starting with the `Ok` constructor and the second only values starting |
| 55 | +//! with the `Err` constructor. Do the patterns `Some(42)` and `Some(1..10)` intersect? They might, |
| 56 | +//! since they both capture values starting with `Some`. To be certain, we need to dig under the |
| 57 | +//! `Some` constructor and continue asking the question. This is the main idea behind the |
| 58 | +//! exhaustiveness algorithm: by looking at patterns constructor-by-constructor, we can efficiently |
| 59 | +//! figure out if some new pattern might capture a value that hadn't been captured by previous |
| 60 | +//! patterns. |
| 61 | +//! |
| 62 | +//! Constructors are represented by the `Constructor` enum, and its fields by the `Fields` enum. |
| 63 | +//! Most of the complexity of this file resides in transforming between patterns and |
| 64 | +//! (`Constructor`, `Fields`) pairs, handling all the special cases correctly. |
| 65 | +//! |
| 66 | +//! Caveat: this constructors/fields distinction doesn't quite cover every Rust value. For example |
| 67 | +//! a value of type `Rc<u64>` doesn't fit this idea very well, nor do various other things. |
| 68 | +//! However, this idea covers most of the cases that are relevant to exhaustiveness checking. |
| 69 | +//! |
| 70 | +//! |
| 71 | +//! # Algorithm |
| 72 | +//! |
| 73 | +//! Recall that `U(P, p)` represents whether, given an existing list of patterns (aka matrix) `P`, |
| 74 | +//! adding a new pattern `p` will cover previously-uncovered values of the type. |
| 75 | +//! During the course of the algorithm, the rows of the matrix won't just be individual patterns, |
| 76 | +//! but rather partially-deconstructed patterns in the form of a list of fields. The paper |
| 77 | +//! calls those pattern-vectors, and we will call them pattern-stacks. The same holds for the |
| 78 | +//! new pattern `p`. |
| 79 | +//! |
| 80 | +//! For example, say we have the following: |
| 81 | +//! ``` |
| 82 | +//! // x: (Option<bool>, Result<()>) |
| 83 | +//! match x { |
| 84 | +//! (Some(true), _) => {} |
| 85 | +//! (None, Err(())) => {} |
| 86 | +//! (None, Err(_)) => {} |
| 87 | +//! } |
| 88 | +//! ``` |
| 89 | +//! Here, the matrix `P` starts as: |
| 90 | +//! [ |
| 91 | +//! [(Some(true), _)], |
| 92 | +//! [(None, Err(()))], |
| 93 | +//! [(None, Err(_))], |
| 94 | +//! ] |
| 95 | +//! We can tell it's not exhaustive, because `U(P, _)` is true (we're not covering |
| 96 | +//! `[(Some(false), _)]`, for instance). In addition, row 3 is not useful, because |
| 97 | +//! all the values it covers are already covered by row 2. |
| 98 | +//! |
| 99 | +//! A list of patterns can be thought of as a stack, because we are mainly interested in the top of |
| 100 | +//! the stack at any given point, and we can pop or apply constructors to get new pattern-stacks. |
| 101 | +//! To match the paper, the top of the stack is at the beginning / on the left. |
| 102 | +//! |
| 103 | +//! There are two important operations on pattern-stacks necessary to understand the algorithm: |
| 104 | +//! 1. We can pop a given constructor off the top of a stack. This operation is called |
| 105 | +//! `specialize`, and is denoted `S(c, p)` where `c` is a constructor (like `Some` or |
| 106 | +//! `None`) and `p` a pattern-stack. |
| 107 | +//! If the pattern on top of the stack can cover `c`, this removes the constructor and |
| 108 | +//! pushes its arguments onto the stack. It also expands OR-patterns into distinct patterns. |
| 109 | +//! Otherwise the pattern-stack is discarded. |
| 110 | +//! This essentially filters those pattern-stacks whose top covers the constructor `c` and |
| 111 | +//! discards the others. |
| 112 | +//! |
| 113 | +//! For example, the first pattern above initially gives a stack `[(Some(true), _)]`. If we |
| 114 | +//! pop the tuple constructor, we are left with `[Some(true), _]`, and if we then pop the |
| 115 | +//! `Some` constructor we get `[true, _]`. If we had popped `None` instead, we would get |
| 116 | +//! nothing back. |
| 117 | +//! |
| 118 | +//! This returns zero or more new pattern-stacks, as follows. We look at the pattern `p_1` |
| 119 | +//! on top of the stack, and we have four cases: |
| 120 | +//! 1.1. `p_1 = c(r_1, .., r_a)`, i.e. the top of the stack has constructor `c`. We |
| 121 | +//! push onto the stack the arguments of this constructor, and return the result: |
| 122 | +//! r_1, .., r_a, p_2, .., p_n |
| 123 | +//! 1.2. `p_1 = c'(r_1, .., r_a')` where `c ≠ c'`. We discard the current stack and |
| 124 | +//! return nothing. |
| 125 | +//! 1.3. `p_1 = _`. We push onto the stack as many wildcards as the constructor `c` has |
| 126 | +//! arguments (its arity), and return the resulting stack: |
| 127 | +//! _, .., _, p_2, .., p_n |
| 128 | +//! 1.4. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
| 129 | +//! stack: |
| 130 | +//! S(c, (r_1, p_2, .., p_n)) |
| 131 | +//! S(c, (r_2, p_2, .., p_n)) |
| 132 | +//! |
| 133 | +//! 2. We can pop a wildcard off the top of the stack. This is called `D(p)`, where `p` is |
| 134 | +//! a pattern-stack. |
| 135 | +//! This is used when we know there are missing constructor cases, but there might be |
| 136 | +//! existing wildcard patterns, so to check the usefulness of the matrix, we have to check |
| 137 | +//! all its *other* components. |
| 138 | +//! |
| 139 | +//! It is computed as follows. We look at the pattern `p_1` on top of the stack, |
| 140 | +//! and we have three cases: |
| 141 | +//! 1.1. `p_1 = c(r_1, .., r_a)`. We discard the current stack and return nothing. |
| 142 | +//! 1.2. `p_1 = _`. We return the rest of the stack: |
| 143 | +//! p_2, .., p_n |
| 144 | +//! 1.3. `p_1 = r_1 | r_2`. We expand the OR-pattern and then recurse on each resulting |
| 145 | +//! stack. |
| 146 | +//! D((r_1, p_2, .., p_n)) |
| 147 | +//! D((r_2, p_2, .., p_n)) |
| 148 | +//! |
| 149 | +//! Note that the OR-patterns are not always used directly in Rust, but are used to derive the |
| 150 | +//! exhaustive integer matching rules, so they're written here for posterity. |
| 151 | +//! |
| 152 | +//! Both those operations extend straightforwardly to a list or pattern-stacks, i.e. a matrix, by |
| 153 | +//! working row-by-row. Popping a constructor ends up keeping only the matrix rows that start with |
| 154 | +//! the given constructor, and popping a wildcard keeps those rows that start with a wildcard. |
| 155 | +//! |
| 156 | +//! |
| 157 | +//! The algorithm for computing `U` |
| 158 | +//! ------------------------------- |
| 159 | +//! The algorithm is inductive (on the number of columns: i.e., components of tuple patterns). |
| 160 | +//! That means we're going to check the components from left-to-right, so the algorithm |
| 161 | +//! operates principally on the first component of the matrix and new pattern-stack `p`. |
| 162 | +//! This algorithm is realised in the `is_useful` function. |
| 163 | +//! |
| 164 | +//! Base case. (`n = 0`, i.e., an empty tuple pattern) |
| 165 | +//! - If `P` already contains an empty pattern (i.e., if the number of patterns `m > 0`), |
| 166 | +//! then `U(P, p)` is false. |
| 167 | +//! - Otherwise, `P` must be empty, so `U(P, p)` is true. |
| 168 | +//! |
| 169 | +//! Inductive step. (`n > 0`, i.e., whether there's at least one column |
| 170 | +//! [which may then be expanded into further columns later]) |
| 171 | +//! We're going to match on the top of the new pattern-stack, `p_1`. |
| 172 | +//! - If `p_1 == c(r_1, .., r_a)`, i.e. we have a constructor pattern. |
| 173 | +//! Then, the usefulness of `p_1` can be reduced to whether it is useful when |
| 174 | +//! we ignore all the patterns in the first column of `P` that involve other constructors. |
| 175 | +//! This is where `S(c, P)` comes in: |
| 176 | +//! `U(P, p) := U(S(c, P), S(c, p))` |
| 177 | +//! This special case is handled in `is_useful_specialized`. |
| 178 | +//! |
| 179 | +//! For example, if `P` is: |
| 180 | +//! [ |
| 181 | +//! [Some(true), _], |
| 182 | +//! [None, 0], |
| 183 | +//! ] |
| 184 | +//! and `p` is [Some(false), 0], then we don't care about row 2 since we know `p` only |
| 185 | +//! matches values that row 2 doesn't. For row 1 however, we need to dig into the |
| 186 | +//! arguments of `Some` to know whether some new value is covered. So we compute |
| 187 | +//! `U([[true, _]], [false, 0])`. |
| 188 | +//! |
| 189 | +//! - If `p_1 == _`, then we look at the list of constructors that appear in the first |
| 190 | +//! component of the rows of `P`: |
| 191 | +//! + If there are some constructors that aren't present, then we might think that the |
| 192 | +//! wildcard `_` is useful, since it covers those constructors that weren't covered |
| 193 | +//! before. |
| 194 | +//! That's almost correct, but only works if there were no wildcards in those first |
| 195 | +//! components. So we need to check that `p` is useful with respect to the rows that |
| 196 | +//! start with a wildcard, if there are any. This is where `D` comes in: |
| 197 | +//! `U(P, p) := U(D(P), D(p))` |
| 198 | +//! |
| 199 | +//! For example, if `P` is: |
| 200 | +//! [ |
| 201 | +//! [_, true, _], |
| 202 | +//! [None, false, 1], |
| 203 | +//! ] |
| 204 | +//! and `p` is [_, false, _], the `Some` constructor doesn't appear in `P`. So if we |
| 205 | +//! only had row 2, we'd know that `p` is useful. However row 1 starts with a |
| 206 | +//! wildcard, so we need to check whether `U([[true, _]], [false, 1])`. |
| 207 | +//! |
| 208 | +//! + Otherwise, all possible constructors (for the relevant type) are present. In this |
| 209 | +//! case we must check whether the wildcard pattern covers any unmatched value. For |
| 210 | +//! that, we can think of the `_` pattern as a big OR-pattern that covers all |
| 211 | +//! possible constructors. For `Option`, that would mean `_ = None | Some(_)` for |
| 212 | +//! example. The wildcard pattern is useful in this case if it is useful when |
| 213 | +//! specialized to one of the possible constructors. So we compute: |
| 214 | +//! `U(P, p) := ∃(k ϵ constructors) U(S(k, P), S(k, p))` |
| 215 | +//! |
| 216 | +//! For example, if `P` is: |
| 217 | +//! [ |
| 218 | +//! [Some(true), _], |
| 219 | +//! [None, false], |
| 220 | +//! ] |
| 221 | +//! and `p` is [_, false], both `None` and `Some` constructors appear in the first |
| 222 | +//! components of `P`. We will therefore try popping both constructors in turn: we |
| 223 | +//! compute U([[true, _]], [_, false]) for the `Some` constructor, and U([[false]], |
| 224 | +//! [false]) for the `None` constructor. The first case returns true, so we know that |
| 225 | +//! `p` is useful for `P`. Indeed, it matches `[Some(false), _]` that wasn't matched |
| 226 | +//! before. |
| 227 | +//! |
| 228 | +//! - If `p_1 == r_1 | r_2`, then the usefulness depends on each `r_i` separately: |
| 229 | +//! `U(P, p) := U(P, (r_1, p_2, .., p_n)) |
| 230 | +//! || U(P, (r_2, p_2, .., p_n))` |
| 231 | +//! |
| 232 | +//! Modifications to the algorithm |
| 233 | +//! ------------------------------ |
| 234 | +//! The algorithm in the paper doesn't cover some of the special cases that arise in Rust, for |
| 235 | +//! example uninhabited types and variable-length slice patterns. These are drawn attention to |
| 236 | +//! throughout the code below. I'll make a quick note here about how exhaustive integer matching is |
| 237 | +//! accounted for, though. |
| 238 | +//! |
| 239 | +//! Exhaustive integer matching |
| 240 | +//! --------------------------- |
| 241 | +//! An integer type can be thought of as a (huge) sum type: 1 | 2 | 3 | ... |
| 242 | +//! So to support exhaustive integer matching, we can make use of the logic in the paper for |
| 243 | +//! OR-patterns. However, we obviously can't just treat ranges x..=y as individual sums, because |
| 244 | +//! they are likely gigantic. So we instead treat ranges as constructors of the integers. This means |
| 245 | +//! that we have a constructor *of* constructors (the integers themselves). We then need to work |
| 246 | +//! through all the inductive step rules above, deriving how the ranges would be treated as |
| 247 | +//! OR-patterns, and making sure that they're treated in the same way even when they're ranges. |
| 248 | +//! There are really only four special cases here: |
| 249 | +//! - When we match on a constructor that's actually a range, we have to treat it as if we would |
| 250 | +//! an OR-pattern. |
| 251 | +//! + It turns out that we can simply extend the case for single-value patterns in |
| 252 | +//! `specialize` to either be *equal* to a value constructor, or *contained within* a range |
| 253 | +//! constructor. |
| 254 | +//! + When the pattern itself is a range, you just want to tell whether any of the values in |
| 255 | +//! the pattern range coincide with values in the constructor range, which is precisely |
| 256 | +//! intersection. |
| 257 | +//! Since when encountering a range pattern for a value constructor, we also use inclusion, it |
| 258 | +//! means that whenever the constructor is a value/range and the pattern is also a value/range, |
| 259 | +//! we can simply use intersection to test usefulness. |
| 260 | +//! - When we're testing for usefulness of a pattern and the pattern's first component is a |
| 261 | +//! wildcard. |
| 262 | +//! + If all the constructors appear in the matrix, we have a slight complication. By default, |
| 263 | +//! the behaviour (i.e., a disjunction over specialised matrices for each constructor) is |
| 264 | +//! invalid, because we want a disjunction over every *integer* in each range, not just a |
| 265 | +//! disjunction over every range. This is a bit more tricky to deal with: essentially we need |
| 266 | +//! to form equivalence classes of subranges of the constructor range for which the behaviour |
| 267 | +//! of the matrix `P` and new pattern `p` are the same. This is described in more |
| 268 | +//! detail in `split_grouped_constructors`. |
| 269 | +//! + If some constructors are missing from the matrix, it turns out we don't need to do |
| 270 | +//! anything special (because we know none of the integers are actually wildcards: i.e., we |
| 271 | +//! can't span wildcards using ranges). |
272 | 272 | use self::Constructor::*;
|
273 | 273 | use self::SliceKind::*;
|
274 | 274 | use self::Usefulness::*;
|
|
0 commit comments