Commit f2b28d6

authored May 25, 2022
Update the now stale warning about PhantomData<T> and dropck
1 parent 10d40c5 commit f2b28d6

1 file changed: +156 -18 lines changed


src/phantom-data.md

Lines changed: 156 additions & 18 deletions
@@ -42,42 +42,180 @@ struct Iter<'a, T: 'a> {
 and that's it. The lifetime will be bounded, and your iterator will be covariant
 over `'a` and `T`. Everything Just Works.
 
-Another important example is Vec, which is (approximately) defined as follows:
+## Generic parameters and drop-checking
+
+In the past, there used to be another thing to take into consideration.
+
+This very documentation used to say:
+
+> Another important example is Vec, which is (approximately) defined as follows:
+>
+> ```rust
+> struct Vec<T> {
+>     data: *const T, // *const for variance!
+>     len: usize,
+>     cap: usize,
+> }
+> ```
+>
+> Unlike the previous example, it *appears* that everything is exactly as we
+> want. Every generic argument to Vec shows up in at least one field.
+> Good to go!
+
+> Nope.
+
+> The drop checker will generously determine that `Vec<T>` does not own any values
+> of type T. This will in turn make it conclude that it doesn't need to worry
+> about Vec dropping any T's in its destructor for determining drop check
+> soundness. This will in turn allow people to create unsoundness using
+> Vec's destructor.
+
+> In order to tell the drop checker that we *do* own values of type T, and
+> therefore may drop some T's when *we* drop, we must add an extra `PhantomData`
+> saying exactly that:
+>
+> ```rust
+> use std::marker;
+>
+> struct Vec<T> {
+>     data: *const T, // *const for variance!
+>     len: usize,
+>     cap: usize,
+>     _marker: marker::PhantomData<T>,
+> }
+> ```
+
+But ever since [RFC 1238](https://rust-lang.github.io/rfcs/1238-nonparametric-dropck.html),
+**this is no longer true nor necessary**.
+
+If you were to write:
 
 ```rust
 struct Vec<T> {
-    data: *const T, // *const for variance!
+    data: *const T, // `*const` for variance!
     len: usize,
     cap: usize,
 }
+
+impl<T> Drop for Vec<T> { // etc.
 ```
 
-Unlike the previous example, it *appears* that everything is exactly as we
-want. Every generic argument to Vec shows up in at least one field.
-Good to go!
+then the existence of that `impl<T> Drop for Vec<T>` makes it so Rust will consider
+that `Vec<T>` _owns_ values of type `T` (more precisely: may use values of type `T`
+in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a
+`Vec<T>` be dropped.
+
+**Adding an extra `_marker: PhantomData<T>` field is thus _superfluous_ and accomplishes nothing**.
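A minimal stable-Rust sketch of the claim above. `MyVec<T>` is a hypothetical toy type (not the real `Vec`, and not part of this commit): it holds only a raw pointer, has no `PhantomData` field, and takes over the buffer of a `Vec<T>` via `ManuallyDrop` so that its `Drop` impl can hand it back with `Vec::from_raw_parts`. The `impl<T> Drop` alone is what makes the drop checker conservative about `T`.

```rust
use std::mem::ManuallyDrop;

/// Hypothetical toy stand-in for `Vec<T>`: note the absence of any `PhantomData` field.
struct MyVec<T> {
    ptr: *const T, // `*const` for variance, as in the text
    len: usize,
    cap: usize,
}

impl<T> MyVec<T> {
    /// Takes over the buffer of an existing `Vec<T>` without freeing it.
    fn from_vec(v: Vec<T>) -> Self {
        let mut v = ManuallyDrop::new(v);
        MyVec { ptr: v.as_mut_ptr(), len: v.len(), cap: v.capacity() }
    }

    fn len(&self) -> usize {
        self.len
    }
}

// This impl alone makes dropck assume that dropping a `MyVec<T>` may use its `T`s.
impl<T> Drop for MyVec<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr`, `len` and `cap` come from a `Vec<T>` whose buffer was never
        // freed, so rebuilding it drops the elements and deallocates exactly once.
        unsafe { drop(Vec::from_raw_parts(self.ptr as *mut T, self.len, self.cap)) }
    }
}

/// Accepted: `s` is declared before `v`, so it is still alive when `v` is dropped.
fn outlives_ok() -> usize {
    let s: String = "Long-lived".into();
    let v = MyVec::from_vec(vec![s.as_str()]);
    let len = v.len();
    // `v` is not used after this point, yet adding `drop(s);` here would be rejected
    // (E0505): `impl<T> Drop for MyVec<T>` makes dropck keep the borrow of `s` alive
    // until `v` is dropped at the end of the scope.
    len
}
```

Per the text above, adding `_marker: PhantomData<T>` to `MyVec` would not change which of these programs compile; removing the `Drop` impl (and leaking the buffer) is what would let the extra `drop(s);` through.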
+
+___
+
+But this situation can sometimes lead to overly restrictive code. That's why the
+standard library uses an unstable and `unsafe` attribute to opt back into the old
+"unchecked" drop-checking behavior, which this very documentation warned about: the
+`#[may_dangle]` attribute.
+
+### An exception: the special case of the standard library and its unstable `#[may_dangle]`
+
+This section can be skipped if you are only writing your own library code; but if you are
+curious about what the standard library does with the actual `Vec` definition, you'll notice
+that it still needs to use a `_marker: PhantomData<T>` field for soundness.
+
+<details><summary>Click here to see why</summary>
+
+Consider the following example:
 
-Nope.
+```rust
+{
+    let mut v: Vec<&str> = Vec::new();
+    let s: String = "Short-lived".into();
+    v.push(&s);
+    drop(s);
+} // <- `v` is dropped here
+```
+
+With a classical `impl<T> Drop for Vec<T> {` definition, the above [is denied].
+
+[is denied]: https://rust.godbolt.org/z/ans15Kqz3
+
+Indeed, in this case we have a `Vec</* T = */ &'s str>` vector of `'s`-lived references
+to `str`ings, but in the case of `let s: String`, it is dropped before the `Vec` is, and
+thus `'s` **is expired** by the time the `Vec` is dropped, and the
+`impl<'s> Drop for Vec<&'s str> {` is used.
 
-The drop checker will generously determine that `Vec<T>` does not own any values
-of type T. This will in turn make it conclude that it doesn't need to worry
-about Vec dropping any T's in its destructor for determining drop check
-soundness. This will in turn allow people to create unsoundness using
-Vec's destructor.
+This means that if such `Drop` were to be used, it would be dealing with an _expired_, or
+_dangling_ lifetime `'s`. But this is contrary to Rust principles, where by default all
+Rust references involved in a function signature are non-dangling and valid to dereference.
 
-In order to tell the drop checker that we *do* own values of type T, and
-therefore may drop some T's when *we* drop, we must add an extra `PhantomData`
-saying exactly that:
+That is why Rust has to conservatively deny this snippet.
+
+And yet, in the case of the real `Vec`, the `Drop` impl does not care about `&'s str`,
+_since it has no drop glue of its own_: it only wants to deallocate the backing buffer.
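For contrast, the snippet above does compile on stable Rust against the real `std::vec::Vec`, whose `Drop` impl relies on `#[may_dangle]` as described next. A minimal check, wrapped in a function whose name is ours:

```rust
/// The "Short-lived" snippet, run against the real `std::vec::Vec`.
fn std_vec_accepts_dangling_str() -> usize {
    let mut v: Vec<&str> = Vec::new();
    let s: String = "Short-lived".into();
    v.push(&s);
    let len = v.len();
    // Accepted: `&str` has no drop glue, and `Vec`'s `Drop` impl is
    // `#[may_dangle]` over `T`, so `'s` may already be dead when `v` is dropped.
    drop(s);
    len
} // <- `v` is dropped here, after `s`
```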
 
+In other words, it would be nice if the above snippet were somehow accepted, by special
+casing `Vec`, or by relying on some special property of `Vec`: `Vec` could try to
+_promise not to use the `&'s str`s it holds when being dropped_.
+
+This is the kind of `unsafe` promise that can be expressed with `#[may_dangle]`:
+
 ```rust
-use std::marker;
+unsafe impl<#[may_dangle] 's> Drop for Vec<&'s str> {
+```
+
+or, more generally:
+
+```rust
+unsafe impl<#[may_dangle] T> Drop for Vec<T> {
+```
+
+is the `unsafe` way to opt out of this conservative assumption that Rust's drop
+checker makes about type parameters of a dropped instance not being allowed to dangle.
+
+And when this is done, such as in the standard library, we need to be careful in the
+case where `T` has drop glue of its own. In this instance, imagine replacing the
+`&'s str`s with a `struct PrintOnDrop<'s> /* = */ (&'s str);` which would have a
+`Drop` impl wherein the inner `&'s str` would be dereferenced and printed to the screen.
+
+Indeed, `Drop for Vec<T> {`, before deallocating the backing buffer, does have to transitively
+drop each `T` item when it has drop glue; in the case of `PrintOnDrop<'s>`, it means that
+`Drop for Vec<PrintOnDrop<'s>>` has to transitively drop the `PrintOnDrop<'s>` elements before
+deallocating the backing buffer.
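Whether a type has drop glue can be probed with `std::mem::needs_drop`, the function the standard library sketch further down also calls. A small sketch, where `PrintOnDrop` is a hypothetical type written to match the description above. Note that `needs_drop` is documented as a conservative hint: `true` is guaranteed for a type with a `Drop` impl, while the `false` for `&str` is what current compilers return in practice.

```rust
use std::mem;

/// Hypothetical type modeled on the `PrintOnDrop<'s>` described in the text.
struct PrintOnDrop<'s>(&'s str);

impl Drop for PrintOnDrop<'_> {
    fn drop(&mut self) {
        // Dereferences the inner `&'s str`, so `'s` must still be alive here.
        println!("{}", self.0);
    }
}

/// Returns `(needs_drop::<&str>(), needs_drop::<PrintOnDrop>())`.
fn drop_glue_report() -> (bool, bool) {
    (
        mem::needs_drop::<&str>(),                 // no drop glue
        mem::needs_drop::<PrintOnDrop<'static>>(), // has drop glue via its `Drop` impl
    )
}
```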
+
+So when we said that `'s` is `#[may_dangle]`, it was an excessively loose statement. We'd rather
+say: "`'s` may dangle provided it is not involved in some transitive drop glue". Or, more generally,
+"`T` may dangle provided it is not involved in some transitive drop glue". This "exception to the
+exception" is a pervasive situation whenever **we own a `T`**. That's why Rust's `#[may_dangle]` is
+smart enough to know of this opt-out, and will thus be disabled _when the generic parameter is held
+in an owning fashion_ by the fields of the struct.
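Conversely, once the elements' drop glue does use `'s`, the real `Vec` is held to the strict rule again, because it owns its `T`s. A minimal sketch; `LogOnDrop` is a hypothetical variant of `PrintOnDrop<'s>` that records its string in a `RefCell` log instead of printing, so the drop order can be checked:

```rust
use std::cell::RefCell;

/// Hypothetical element type whose drop glue *uses* its borrow.
struct LogOnDrop<'s> {
    msg: &'s str,
    log: &'s RefCell<Vec<String>>,
}

impl Drop for LogOnDrop<'_> {
    fn drop(&mut self) {
        // Dereferences `self.msg`, so `'s` must still be alive here.
        self.log.borrow_mut().push(self.msg.to_string());
    }
}

fn drop_order_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let s: String = "Short-lived".into();
        let v = vec![LogOnDrop { msg: &s, log: &log }];
        assert_eq!(v.len(), 1);
        // `drop(s);` here would be rejected (E0505): `Vec<T>` owns its `T`s, so its
        // `#[may_dangle]` does not cover `LogOnDrop<'s>`'s own drop glue.
    } // `v` is dropped first (reverse declaration order), while `s` is still alive.
    log.into_inner()
}
```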
+
+That is why the standard library ends up with:
+
+```rust
+// we pinky-swear not to use `T` here…
+unsafe impl<#[may_dangle] T> Drop for Vec<T> {
+    fn drop(&mut self) {
+        unsafe {
+            if mem::needs_drop::<T>() {
+                /* uhh */
+                ptr::drop_in_place::<[T]>(…);
+            }
+            dealloc(…) …
+        }
+    }
+}
 
 struct Vec<T> {
-    data: *const T, // *const for variance!
+    // …except for the fact that we may be dropping `T` items! => the "uhh" is okay!
+    _marker: PhantomData<T>,
+
+    ptr: *const T, // `*const` for variance (but this does not express ownership of a `T` *per se*)
     len: usize,
     cap: usize,
-    _marker: marker::PhantomData<T>,
-}
+}
 ```
+
+</details>
+
+___
 
 Raw pointers that own an allocation is such a pervasive pattern that the
 standard library made a utility for itself called `Unique<T>` which:
