-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Type conversions #401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type conversions #401
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,437 @@ | ||
- Start Date: (fill me in with today's date, YYYY-MM-DD) | ||
- RFC PR #: (leave this empty) | ||
- Rust Issue #: (leave this empty) | ||
|
||
# Summary | ||
|
||
Describe the various kinds of type conversions available in Rust and suggest | ||
some tweaks. | ||
|
||
Provide a mechanism for smart pointers to be part of the DST coercion system. | ||
|
||
Reform coercions from functions to closures. | ||
|
||
The `transmute` intrinsic and other unsafe methods of type conversion are not | ||
covered by this RFC. | ||
|
||
|
||
# Motivation | ||
|
||
It is often useful to convert a value from one type to another. This conversion | ||
might be implicit or explicit and may or may not involve some runtime action. | ||
Such conversions are useful for improving reuse of code, and avoiding unsafe | ||
transmutes. | ||
|
||
Our current rules around type conversions are not well-described. The different | ||
conversion mechanisms interact poorly and the implementation is somewhat ad-hoc. | ||
|
||
# Detailed design | ||
|
||
Rust has several kinds of type conversion: subtyping, coercion, and casting. | ||
Subtyping and coercion are implicit, there is no syntax. Casting is explicit, | ||
using the `as` keyword. The syntax for a cast expression is: | ||
|
||
``` | ||
e_cast ::= e as U | ||
``` | ||
|
||
Where `e` is any valid expression and `U` is any valid type (note that we | ||
restrict in type checking the valid types for `U`). | ||
|
||
These conversions (and type equality) form a total order in terms of their | ||
strength. For any types `T` and `U`, if `T == U` then `T` is also a subtype of | ||
`U`. If `T` is a subtype of `U`, then `T` coerces to `U`, and if `T` coerces to | ||
`U`, then `T` can be cast to `U`. | ||
|
||
There is an additional kind of coercion which does not fit into that total order | ||
- implicit coercions of receiver expressions. (I will use 'expression coercion' | ||
when I need to distinguish coercions in non-receiver position from coercions of | ||
receivers). All expression coercions are valid receiver coercions, but not all | ||
receiver coercions are valid casts. | ||
|
||
Finally, I will discuss function polymorphism, which is something of a coercion | ||
edge case. | ||
|
||
## Subtyping | ||
|
||
Subtyping is implicit and can occur at any stage in type checking or inference. | ||
Subtyping in Rust is very restricted and occurs only due to variance with | ||
respect to lifetimes and between types with higher ranked lifetimes. If we were | ||
to erase lifetimes from types, then the only subtyping would be due to type | ||
equality. | ||
|
||
|
||
## Coercions | ||
|
||
A coercion is implicit and has no syntax. A coercion can only occur at certain | ||
coercion sites in a program, these are typically places where the desired type | ||
is explicit or can be dervied by propagation from explicit types (without type | ||
inference). The base cases are: | ||
|
||
* In `let` statements where an explicit type is given: in `let _: U = e;`, `e` | ||
is coerced to to have type `U`; | ||
|
||
* In statics and consts, similarly to `let` statements; | ||
|
||
* In argument position for function calls. The value being coerced is the actual | ||
parameter and it is coerced to the type of the formal parameter. For example, | ||
where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`, | ||
`e` is coerced to have type `U`; | ||
|
||
* Where a field of a struct or variant is instantiated. E.g., where `struct Foo | ||
{ x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to to have | ||
type `U`; | ||
|
||
* The result of a function, either the final line of a block if it is not semi- | ||
colon terminated or any expression in a `return` statement. For example, for | ||
`fn foo() -> U { e }`, `e` is coerced to to have type `U`; | ||
|
||
If the expression in one of these coercion sites is a coercion-propagating | ||
expression, then the relevant sub-expressions in that expression are also | ||
coercion sites. Propagation recurses from these new coercion sites. Propagating | ||
expressions and their relevant sub-expressions are: | ||
|
||
* array literals, where the array has type `[U, ..n]`, each sub-expression in | ||
the array literal is a coercion site for coercion to type `U`; | ||
|
||
* array literals with repeating syntax, where the array has type `[U, ..n]`, the | ||
repeated sub-expression is a coercion site for coercion to type `U`; | ||
|
||
* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each | ||
sub-expression is a coercion site for the respective type, e.g., the zero-th | ||
sub-expression is a coercion site to `U_0`; | ||
|
||
* the box expression, if the expression has type `Box<U>`, the sub-expression is | ||
a coercion site to `U` (I expect this to be generalised when `box` expressions | ||
are); | ||
|
||
* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then | ||
the sub-expression is a coercion site to `U`; | ||
|
||
* blocks, if a block has type `U`, then the last expression in the block (if it | ||
is not semicolon-terminated) is a coercion site to `U`. This includes blocks | ||
which are part of control flow statements, such as `if`/`else`, if the block | ||
has a known type. | ||
|
||
|
||
Note that we do not perform coercions when matching traits (except for | ||
receivers, see below). If there is an impl for some type `U` and `T` coerces to | ||
`U`, that does not constitute an implementation for `T`. For example, the | ||
following will not type check, even though it is OK to coerce `t` to `&T` and | ||
there is an impl for `&T`: | ||
|
||
``` | ||
struct T; | ||
trait Trait {} | ||
fn foo<X: Trait>(t: X) {} | ||
impl<'a> Trait for &'a T {} | ||
fn main() { | ||
let t: &mut T = &mut T; | ||
foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T | ||
} | ||
``` | ||
|
||
In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to | ||
`U`, only if that fails will the conversion rules for casts (see below) be | ||
applied. | ||
|
||
Coercion is allowed between the following types: | ||
|
||
* `T` to `U` if `T` is a subtype of `U` (the 'identity' case); | ||
|
||
* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3` | ||
(transitivity case); | ||
|
||
* `&mut T` to `&T`; | ||
|
||
* `*mut T` to `*const T`; | ||
|
||
* `&T` to `*const T`; | ||
|
||
* `&mut T` to `*mut T`; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does this interact with raw pointers that denote arrays, like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This only allows casts TO There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I pass a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It won't be a buffer overrun if the length is less than or equal to 1; it's trivial to get a buffer overrun with Also, note that we currently support this |
||
|
||
* `T` to `U` if `T` implements `CoerceUnsized<U>` (see below) and `T = Foo<...>` | ||
and `U = Foo<...>` (for any `Foo`, when we get HKT I expect this could be a | ||
constraint on the `CoerceUnsized` trait, rather than being checked here); | ||
|
||
* From TyCtor(`T`) to TyCtor(coerce_inner(`T`)) (these coercions could be | ||
provided by implementing `CoerceUnsized` for all instances of TyCtor); | ||
|
||
where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box<T>`. | ||
And where coerce_inner is defined as | ||
|
||
* coerce_inner(`[T, ..n]`) = `[T]`; | ||
|
||
* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the | ||
trait `U`; | ||
|
||
* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`; | ||
|
||
* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where | ||
`Foo` is a struct and only the last field has type `T` and `T` is not part of | ||
the type of any other fields; | ||
|
||
* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`. | ||
|
||
Note that coercing from sub-trait to a super-trait is a new coercion and is non- | ||
trivial. One implementation strategy which avoids re-computation of vtables is | ||
given in RFC PR #250. | ||
|
||
A note for the future: although there hasn't been an RFC nor much discussion, it | ||
is likely that post-1.0 we will add type ascription to the language (see #354). | ||
That will (probably) allow any expression to be annotated with a type (e.g, | ||
`foo(a, b: T, c)` a function call where the second argument has a type | ||
annotation). | ||
|
||
Type ascription is purely descriptive and does not cast the sub-expression to | ||
the required type. However, it seems sensible that type ascription would be a | ||
coercion site, and thus type ascription would be a way to make implicit | ||
coercions explicit. There is a danger that such coercions would be confused with | ||
casts. I hope the rule that casting should change the type and type ascription | ||
should not is enough of a discriminant. Perhaps we will need a style guideline | ||
to encourage either casts or type ascription to force an implicit coercion. | ||
Perhaps type ascription should not be a coercion site. Or perhaps we don't need | ||
type ascription at all if we allow trivial casts. | ||
|
||
|
||
### Custom unsizing coercions | ||
|
||
It should be possible to coerce smart pointers (e.g., `Rc`) in the same way as | ||
the built-in pointers. In order to do so, we provide two traits and an intrinsic | ||
to allow users to make their smart pointers work with the compiler's coercions. | ||
It might be possible to implement some of the coercions described for built-in | ||
pointers using this machinery, whether that is a good idea or not is an | ||
implementation detail. | ||
|
||
``` | ||
// Cannot be impl'ed - it really is quite a magical trait, see the cases below. | ||
trait Unsize<Sized? U> for Sized? {} | ||
``` | ||
|
||
The `Unsize` trait is a marker trait and a lang item. It should not be | ||
implemented by users and user implementations will be ignored. The compiler will | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be better to ban such implementations than ignore them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is a reasonable alternative. It would require an ad-hoc check in the type checker though |
||
assume the following implementations, these correspond to the definition of | ||
coerce_inner, above; note that these cannot be expressed in real Rust: | ||
|
||
``` | ||
impl<T, n: int> Unsize<[T]> for [T, ..n] {} | ||
// Where T is a trait | ||
impl<Sized? T, U: T> Unsize<T> for U {} | ||
// Where T and U are traits | ||
impl<Sized? T, Sized? U: T> Unsize<T> for U {} | ||
// Where T and U are structs ... following the rules for coerce_inner | ||
impl Unsize<T> for U {} | ||
impl Unsize<(..., T)> for (..., U) | ||
where U: Unsize(T) {} | ||
``` | ||
|
||
The `CoerceUnsized` trait should be implemented by smart pointers and containers | ||
which want to be part of the coercions system. | ||
|
||
``` | ||
trait CoerceUnsized<U> { | ||
fn coerce(self) -> U; | ||
} | ||
``` | ||
|
||
To help implement `CoerceUnsized`, we provide an intrinsic - | ||
`fat_pointer_convert`. This takes and returns raw pointers. The common case will | ||
be to take a thin pointer, unsize the contents, and return a fat pointer. But | ||
the exact behaviour depends on the types involved. This will perform any | ||
computation associated with a coercion (for example, adjusting or creating | ||
vtables). The implementation of fat_pointer_convert will match what the | ||
compiler must do in coerce_inner as described above. | ||
|
||
``` | ||
intrinsic fn fat_pointer_convert<Sized? T, Sized? U>(t: *const T) -> *const U | ||
where T : Unsize<U>; | ||
``` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the difference between this and Alternately, can this be part of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
||
Here is an example implementation of `CoerceUnsized` for `Rc`: | ||
|
||
``` | ||
impl<Sized? T, Sized? U> CoerceUnsized<Rc<T>> for Rc<U> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this need some kind of where clause for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I don't think so. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does that work? As written, it seems like you could [manually] call coerce() to convert any There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, actually I guess you do need a where clause on the impl - |
||
where U: Unsize<T> | ||
fn coerce(self) -> Rc<T> { | ||
let new_ptr: *const RcBox<T> = fat_pointer_convert(self._ptr); | ||
Rc { _ptr: new_ptr } | ||
} | ||
} | ||
``` | ||
|
||
## Coercions of receiver expressions | ||
|
||
These coercions occur when matching the type of the receiver of a method call | ||
with the self type (i.e., the type of `e` in `e.m(...)`) or in field access. | ||
These coercions can be thought of as a feature of the `.` operator, they do not | ||
apply when using the UFCS form with the self argument in argument position. Only | ||
an expression before the dot is coerced as a receiver. When using the UFCS form | ||
of method call, arguments are only coerced according to the expression coercion | ||
rules. This matches the rules for dispatch - dynamic dispatch only happens using | ||
the `.` operator, not the UFCS form. | ||
|
||
In method calls the target type of the coercion is the concrete type of the impl | ||
in which the method is defined, modified by the type of `self`. Assuming the | ||
impl is for `T`, the target type is given by: | ||
|
||
self | target type | ||
------------------|------------ | ||
`self` | `T` | ||
`&self` | `&T` | ||
`&mut self` | `&mut T` | ||
`self: Box<Self>`| `Box<T>` | ||
|
||
and likewise with any variations of the self type we might add in the future. | ||
|
||
For field access, the target type is `&T`, `&mut T` for field assignment, | ||
where `T` is a struct with the named field. | ||
|
||
A receiver coercion consists of some number of dereferences (either compiler | ||
built-in (of a borrowed reference or `Box` pointer, not raw pointers) or custom, | ||
given by the `Deref` trait), one or zero applications of `coerce_inner` or use | ||
of the `CoerceUnsized` trait (as defined above, note that this requires we are | ||
at a type which has neither references nor dereferences at the top level), and | ||
up to two address-of operations (i.e., `T` to `&T`, `&mut T`, `*const T`, or | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When is a second address-of operation necessary? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the impl is for |
||
`*mut T`, with a fresh lifetime.). The usual mutability rules for taking a | ||
reference apply. (Note that the implementation of the coercion isn't so simple, | ||
it is embedded in the search for candidate methods, but from the point of view | ||
of type conversions, that is not relevant). | ||
|
||
Alternatively, a receiver coercion may be thought of as a two stage process. | ||
First, we dereference and then take the address until the source type has the | ||
same shape (i.e., has the same kind and number of indirection) as the target | ||
type. Then we try to coerce the adjusted source type to the target type using | ||
the usual coercion machinery. I believe, but have not proved, that these two | ||
descriptions are equivalent. | ||
|
||
|
||
## Casts | ||
|
||
Casting is indicated by the `as` keyword. A cast `e as U` is valid if one of the | ||
following holds: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see it spelled out here (but I guess neither are numeric casts), There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No casting from u8 to u32 or u32 to u8? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Numeric casts are an omission, I'll add them in. @ben0x539 what are There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, |
||
|
||
* `e` has type `T` and `T` coerces to `U`; | ||
|
||
* `e` has type `*T` and `U` is `*U_0` (i.e., between any raw pointers); | ||
|
||
* `e` has type `*T` and `U` is `uint` , or vice versa; | ||
|
||
* `e` has type `T` and `T` and `U` are any numeric types; | ||
|
||
* `e` is a C-like enum and `U` is any integer type, `bool`; | ||
|
||
* `e` has type `T` and `T == u8` and `U == char`; | ||
|
||
* `e` has type `T` and `T == &[V, ..n]` or `T == &V` and `U == *const V`, and | ||
similarly for the mutable variants to either `*const V` or `*mut V`. | ||
|
||
Casting is not transitive, that is, even if `e as U1 as U2` is a valid | ||
expression, `e as U2` is not necessarily so (in fact it will only be valid if | ||
`U1` coerces to `U2`). | ||
|
||
A cast may require a runtime conversion. | ||
|
||
There will be a lint for trivial casts. A trivial cast is a cast `e as T` where | ||
`e` has type `U` and `U` is a subtype of `T`. The lint will be warn by default. | ||
|
||
|
||
## Function type polymorphism | ||
|
||
Currently, functions may be used where a closure is expected by coercing a | ||
function to a closure. We will remove this coercion and instead use the | ||
following scheme: | ||
|
||
* Every function item has its own fresh type. This type cannot be written by the | ||
programmer (i.e., it is expressible but not denotable). | ||
* Conceptually, for each fresh function type, there is an automatically generated | ||
implementation of the `Fn`, `FnMut`, and `FnOnce` traits. | ||
* All function types are implicitly coercible to a `fn()` type with the | ||
corresponding parameter types. | ||
* Conceptually, there is an implementation of `Fn`, `FnMut`, and `FnOnce` for | ||
every `fn()` type. | ||
* `Fn`, `FnMut`, or `FnOnce` trait objects and references to type parameters | ||
bounded by these traits may be considered to have the corresponding unboxed | ||
closure type. This is a desugaring (alias), rather than a coercion. This is | ||
an existing part of the unboxed closures work. | ||
|
||
These steps should allow for functions to be stored in variables with both | ||
closure and function type. It also allows variables with function type to be | ||
stored as a variable with closure type. Note that these have different | ||
dynamic semantics, as described below. For example, | ||
|
||
``` | ||
fn foo() { ... } // `foo` has a fresh and non-denotable type. | ||
fn main() { | ||
let x: fn() = foo; // `foo` is coerced to `fn()`. | ||
let y: || = x; // `x` is coerced to `&Fn` (a closure object), | ||
// legal due to the `fn()` auto-impls. | ||
let z: || = foo; // `foo` is coerced to `&T` where `T` is fresh and | ||
// bounded by `Fn`. Legal due to the fresh function | ||
// type auto-impls. | ||
} | ||
``` | ||
|
||
The two kinds of auto-generated impls are rather different: the first case (for | ||
the fresh and non-denotable function types) is a static call to `Fn::Call`, | ||
which in turn calls the function with the given arguments. The first call would | ||
be inlined (in fact, the impls and calls to them may be special-cased by the | ||
compiler). In the second case (for `fn()` types), we must execute a virtual call | ||
to find the implementing method and then call the function itself because the | ||
function is 'wrapped' in a closure object. | ||
|
||
|
||
## Changes required | ||
|
||
* Add cast from unsized slices to raw pointers (`&[V] to *V`); | ||
|
||
* allow coercions as casts and add lint for trivial casts; | ||
|
||
* ensure we support all coercion sites; | ||
|
||
* remove [T, ..n] to &[T]/*[T] coercions; | ||
|
||
* add raw pointer coercions; | ||
|
||
* add sub-trait coercions; | ||
|
||
* add unsized tuple coercions; | ||
|
||
* add all transitive coercions; | ||
|
||
* receiver coercions - add referencing to raw pointers, remove triple | ||
referencing for slices; | ||
|
||
* remove function coercions, add function type polymorphism; | ||
|
||
* add DST/custom coercions. | ||
|
||
|
||
# Drawbacks | ||
|
||
We are adding and removing some coercions. There is always a trade-off with | ||
implicit coercions on making Rust ergonomic vs making it hard to comprehend due | ||
to magical conversions. By changing this balance we might be making some things | ||
worse. | ||
|
||
|
||
# Alternatives | ||
|
||
These rules could be tweaked in any number of ways. | ||
|
||
Specifically for the DST custom coercions, the compiler could throw an error if | ||
it finds a user-supplied implementation of the `Unsize` trait, rather than | ||
silently ignoring them. | ||
|
||
# Unresolved questions | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally! ❤️