From 4b4cfe8565012b2abe0b368999c1c45cb7db6f2b Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Mon, 29 Sep 2014 03:48:25 +0400
Subject: [PATCH 1/8] initial commit

---
 active/0000-statically-sized-literals.md | 246 +++++++++++++++++++++++
 1 file changed, 246 insertions(+)
 create mode 100644 active/0000-statically-sized-literals.md

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
new file mode 100644
index 00000000000..6baf0829ab1
--- /dev/null
+++ b/active/0000-statically-sized-literals.md
@@ -0,0 +1,246 @@
+- Start Date: 2014-09-29
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+
+Change the types of array, byte string and string literals to be (references to) statically sized types.
+Introduce strings of fixed size.
+
+# Motivation
+
+Currently byte string and string literals have types `&'static [u8]` and `&'static str`.
+Therefore. although the sizes of the literals are known at compile time, they are erased from their types and inaccessible until runtime.
+This RFC suggests to change the types to `&'static [u8, ..N]` and `&'static str[..N]` respectively.
+Additionally this RFC suggests to change the types of array literals from `[T, ..N]` to &'a [T, ..N] for consistency and ergonomics.
+
+Today, given the lack of non-type generic parameters and compile time (function) evaluation (CTE), strings of fixed size are not very useful.
+But after introducing CTE the need in compile time string operations will raise quickly.
+Even without CTE but with non-type generic parameters alone strings of fixed size can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization.
+So the main motivation for changes today is forward compatibility and before 1.0 `str[..N]` can be implemented as minimally as possible to allow the change of the types of string literals.
+
+Examples of use for new literals, that are not possible with old literals:
+
+```
+// Today: initialize mutable array with literal
+let mut arr: [u8, ..3] = *b"abc";
+arr[0] = b'd';
+
+// Future with CTE: compile time string concatenation
+static LANG_DIR: str[..5 /*The size should, probably, be inferred*/ ] = *"lang/";
+static EN_FILE: str[.._] = LANG_DIR + *"en"; // str[..N] implements Add
+static FR_FILE: str[.._] = LANG_DIR + *"fr";
+
+// Future without CTE: runtime "heapless" string concatenation
+let DE_FILE = LANG_DIR + *"de"; // Performed at runtime if not optimized
+```
+
+# Detailed design
+
+### Proposed changes:
+
+1)
+Change the type of array literals from [T, ..N] to &'a [T, ..N].
+Change the type of byte string literals from &'static [u8] to &'static [u8, ..N].
+Change the type of string literals form &'static str to &'static str[..N].
+
+2)
+Introduce the missing family of types - strings of fixed size - `str[..N]`.
+`str[..N]` is essentially a `[u8, ..N]` with UTF-8 invariants and, eventually, additional string methods/traits.
+It fills the gap in the vector/string table:
+Vec | String
+----------------
+[T, N] | ???
+----------------
+&[T] | &str
+
+### Static lifetime
+
+Although all the literals under consideration are very similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes.
+While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case. The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like
+```
+fn f() -> &'static [int] {
+    [1, 2, 3]
+}
+```
+, but this RFC doesn't propose such an enhancement.
+
+### Statistics for array literals
+
+Array literals can be used both as slices when a view to array is enough to perform the task, and as values when arrays themselves should be copied or modified.
+The exact estimation of the frequency of both uses is problematic, but some regex search in the rust codebase gives the next statistics:
+
+In approximately 70% array literals are used as slices (explicit `&` on array literals, immutable bindings).
+In approximately 20% array literals are used as values (initialization of struct fields, mutable bindings, boxes).
+In approximately 10% the use is unclear.
+
+So, in most cases changing the type of array literals will lead to shorted notation.
+
+### Backward compatibility
+
+No code using the literas as slices is broken, DST coercions `&[T, ..N] -> &[T], &str[..N] -> &str` do all the job for compatibility.
+```
+fn f(arg: &str) {}
+f("Hello"); // DST coercion
+
+static GOODBYE: &'static str = "Goodbye"; // DST coercion
+
+fn main() {
+    let s = "Hello";
+    fn f(arg: &str) {}
+    f(s); // No breakage, DST coercion
+}
+
+fn g(arg: &[int]) {}
+g([1i, 2, 3]); // DST coercion &[int, ..3] -> &[int]
+```
+
+Unfortunately, autocoercions from arrays of fixed size to slices was prohibited too soon and a lot of array literals like `[1, 2, 3]` were changed to `&[1, 2, 3]`. These changes have to be reverted (but the prohibition of autocoercions stays in place).
+
+Code using array literals as values is broken, but can be fixed easily.
+```
+// Array as a struct field
+struct S {
+    arr: [int, ..3],
+}
+
+let s = S { arr: [1, 2, 3] }; // Have to be changed to let s = S { arr: *[1, 2, 3] };
+
+// Mutable array
+let mut a = [1i, 2, 3]; // Have to be changed to let mut a = *[1i, 2, 3];
+```
+
+This change has some benefits - you have to opt-in to use arrays as values and potentially costly array copies become a bit more visible.
+Anyway, array literals are less frequently used as values (see the statistics), but more often as slices.
+
+### Precedents
+
+C and C++ string literals are `char` arrays of fixed size.
+C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation.
+
+# Drawbacks
+
+Some breakage for array literals. See "Backward compatibility" section.
+
+# Alternatives
+
+The alternative design is to make the literals values and not references.
+
+### Necessary changes
+
+1)
+Keep the types of array literals as `[T, ..N]`.
+Change the types of byte literals from `&'static [u8]` to `[u8, ..N]`
+Change the types of string literals form `&'static str` to to `str[..N]`.
+
+2)
+Introduce the missing family of types - strings of fixed size - `str[..N]`.
+...
+
+3)
+Add the autocoercion of array *literals* (not arrays of fixed size in general) to slices.
+Add the autocoercion of new byte literals to slices.
+Add the autocoercion of new string literals to slices.
+Non-literal arrays and strings do not autocoerce to slices, in accordance with the general agreements on explicitness.
+
+4)
+(Optional) Make string and byte literals lvalues with static lifetime
+
+Examples of use:
+```
+// Today: initialize mutable array with literal
+let mut arr: [u8, ..3] = b"abc";
+arr[0] = b'd';
+
+// Future with CTE: compile time string concatenation
+static LANG_DIR: str[.._] = "lang/";
+static EN_FILE: str[.._] = LANG_DIR + "en"; // str[..N] implements Add
+static FR_FILE: str[.._] = LANG_DIR + "fr";
+
+// Future without CTE: runtime "heapless" string concatenation
+let DE_FILE = LANG_DIR + "de"; // Performed at runtime if not optimized
+```
+
+### Drawbacks of the alternative design
+
+Special rules about (byte) string literals being lvalues add a bit of unnecessary complexity to the specification.
+
+In theory `let s = "abcd";` copies the string from static memory to stack, but the copy is unobservable an can, probably, be elided in most cases.
+
+The set of additional autocoercions has to exist for ergonomic purpose (and for backward compatibility).
+Writing something like:
+```
+fn f(arg: &str) {}
+f("Hello"[]);
+f(&"Hello");
+```
+for all literals would be just unacceptable.
+
+Minor breakage:
+```
+fn main() {
+    let s = "Hello";
+    fn f(arg: &str) {}
+    f(s); // Will require explicit slicing f(s[]) or implicit DST coersion from reference f(&s)
+}
+```
+
+### Status quo
+
+Status quo (or applying the changes partially) is always an alternative.
+
+### Drawbacks of status quo
+
+Examples:
+```
+// Today: can't use byte string literals in some cases
+let mut arr: [u8, ..3] = [b'a', b'b', b'c']; // Have to use array literals
+arr[0] = b'd';
+
+// Future: str[..N] is added, CTE is added, but the literal types remains old
+let mut arr: [u8, ..3] = b"abc".to_fixed(); // Have to use a conversion method
+arr[0] = b'd';
+
+static LANG_DIR: str[.._] = "lang/".to_fixed(); // Have to use a conversion method
+static EN_FILE: str[.._] = LANG_DIR + "en".to_fixed();
+static FR_FILE: str[.._] = LANG_DIR + "fr".to_fixed();
+
+// Bad future: str[..N] is not added
+// "Heapless"/compile-time string operations aren't possible, or performed with "magic" like extended concat! or recursive macros.
+```
+
+Note, that in the "Future" scenario the return *type* of `to_fixed` depends on the *value* of `self`, so it requires sufficiently advanced CTE, for example C++14 with its powerful `constexpr` machinery still doesn't allow to write such a function.
+
+# Unresolved questions
+
+If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require additional care.
+
+Assume we implemented `str` like this:
+```
+struct StrImpl { underlying_array: T }
+
+type str = StrImpl<[u8]>;
+type str_of_fixed_size_bikeshed = StrImpl<[u8, ..N]>; // Non-type generic parameters are required
+```
+
+Then `&str_of_fixed_size_bikeshed` (the type of string literals) should somehow autocoerce to `&str` and this coercion is not covered by current rules.
+
+One possible solution is to make `str` a "not-so-smart" pointer to unsized type and not an unsized type itself.
+
+```
+struct StrImplVal { underlying_array: T }
+struct StrImplRef<'a, T> { ref_: &'a StrImplVal }
+
+type<'a> str<'a> = StrImplRef<'a, [u8]>;
+type<'a, N: uint> ref_to_str_of_fixed_size_bikeshed<'a, N> = StrImplRef<'a, [u8, ..N]>; // Non-type generic parameters are required
+type str_of_fixed_size_bikeshed = StrImplVal<[u8, ..N]>; // Non-type generic parameters are required
+```
+
+Then string literals (and strings of fixed size in general) have type `ref_to_str_of_fixed_size_bikeshed<'static, N>`.
+And DST coercion from `ref_to_str_of_fixed_size_bikeshed<'a, N>` to `str<'a>` (`StrImplRef<'a, [u8, ..N]> -> StrImplRef<'a, [u8]>`) is possible.
+And dereference on `ref_to_str_of_fixed_size_bikeshed<'a, N>` should return `&'a str_of_fixed_size_bikeshed`.
+And every `&'a str` has to be rewritten as `str<'a>` (and `&str` as `str`), which is a terribly backward incompatible change (but automatically fixable).
+
+I suppose this change to `str` may be useful on itself and can be proposed by a separate RFC.
+
+ [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf

From b66a64a09d2168c48c6b3c57db383a2c9f8a925a Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Mon, 29 Sep 2014 14:53:25 +0400
Subject: [PATCH 2/8] editorial changes

---
 active/0000-statically-sized-literals.md | 125 +++++++++++------------
 1 file changed, 57 insertions(+), 68 deletions(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index 6baf0829ab1..29c6dec8d63 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -9,20 +9,20 @@ Introduce strings of fixed size.
 
 # Motivation
 
-Currently byte string and string literals have types `&'static [u8]` and `&'static str`.
-Therefore. although the sizes of the literals are known at compile time, they are erased from their types and inaccessible until runtime.
+Currently byte string and string literals have types `&'static [u8]` and `&'static str`.
+Therefore, although the sizes of the literals are known at compile time, they are erased from their types and inaccessible until runtime.
 This RFC suggests to change the types to `&'static [u8, ..N]` and `&'static str[..N]` respectively.
-Additionally this RFC suggests to change the types of array literals from `[T, ..N]` to &'a [T, ..N] for consistency and ergonomics.
+Additionally this RFC suggests to change the types of array literals from `[T, ..N]` to `&'a [T, ..N]` for consistency and ergonomics.
 
 Today, given the lack of non-type generic parameters and compile time (function) evaluation (CTE), strings of fixed size are not very useful.
 But after introducing CTE the need in compile time string operations will raise quickly.
-Even without CTE but with non-type generic parameters alone strings of fixed size can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization.
+Even without CTE but with non-type generic parameters alone strings of fixed size can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization.
 So the main motivation for changes today is forward compatibility and before 1.0 `str[..N]` can be implemented as minimally as possible to allow the change of the types of string literals.
 
 Examples of use for new literals, that are not possible with old literals:
 
 ```
-// Today: initialize mutable array with literal
+// Today: initialize mutable array with byte string literal
 let mut arr: [u8, ..3] = *b"abc";
 arr[0] = b'd';
 
@@ -39,25 +39,25 @@ let DE_FILE = LANG_DIR + *"de"; // Performed at runtime if not optimized
 
 ### Proposed changes:
 
-1)
-Change the type of array literals from [T, ..N] to &'a [T, ..N].
-Change the type of byte string literals from &'static [u8] to &'static [u8, ..N].
-Change the type of string literals form &'static str to &'static str[..N].
-
-2)
-Introduce the missing family of types - strings of fixed size - `str[..N]`.
-`str[..N]` is essentially a `[u8, ..N]` with UTF-8 invariants and, eventually, additional string methods/traits.
-It fills the gap in the vector/string table:
-Vec | String
-----------------
-[T, N] | ???
-----------------
-&[T] | &str
+1)
+Change the types of array literals from `[T, ..N]` to `&'a [T, ..N]`.
+Change the types of byte string literals from `&'static [u8]` to `&'static [u8, ..N]`.
+Change the types of string literals form `&'static str` to `&'static str[..N]`.
+2)
+Introduce the missing family of types - strings of fixed size - `str[..N]`.
+`str[..N]` is essentially a `[u8, ..N]` with UTF-8 invariants and, eventually, additional string methods/traits.
+It fills the gap in the vector/string chart:
+
+`Vec` | `String`
+---------|--------
+`[T, ..N]` | ???
+`&[T]` | `&str`
 
 ### Static lifetime
 
-Although all the literals under consideration are very similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes.
-While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case. The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like
+Although all the literals under consideration are similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes.
+While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case.
+The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like
 ```
 fn f() -> &'static [int] {
     [1, 2, 3]
@@ -65,20 +65,19 @@ fn f() -> &'static [int] {
 ```
 , but this RFC doesn't propose such an enhancement.
 
-### Statistics for array literals
+### Usage statistics for array literals
 
-Array literals can be used both as slices when a view to array is enough to perform the task, and as values when arrays themselves should be copied or modified.
-The exact estimation of the frequency of both uses is problematic, but some regex search in the rust codebase gives the next statistics:
+Array literals can be used both as slices, when a view to array is sufficient to perform the task, and as values when arrays themselves should be copied or modified.
+The exact estimation of the frequencies of both uses is problematic, but some regex search in the Rust codebase gives the next statistics:
+In approximately *70%* of cases array literals are used as slices (explicit `&` on array literals, immutable bindings).
+In approximately *20%* of cases array literals are used as values (initialization of struct fields, mutable bindings, boxes).
+In approximately *10%* of cases the use is unclear.
 
-In approximately 70% array literals are used as slices (explicit `&` on array literals, immutable bindings).
-In approximately 20% array literals are used as values (initialization of struct fields, mutable bindings, boxes).
-In approximately 10% the use is unclear.
-
-So, in most cases changing the type of array literals will lead to shorted notation.
+So, in most cases the change to the types of array literals will lead to shorter notation.
 
 ### Backward compatibility
 
-No code using the literas as slices is broken, DST coercions `&[T, ..N] -> &[T], &str[..N] -> &str` do all the job for compatibility.
+No code using the literals as slices is broken, DST coercions `&[T, ..N] -> &[T], &str[..N] -> &str` do all the job for compatibility.
 ```
 fn f(arg: &str) {}
 f("Hello"); // DST coercion
@@ -94,7 +93,6 @@ fn main() {
 fn g(arg: &[int]) {}
 g([1i, 2, 3]); // DST coercion &[int, ..3] -> &[int]
 ```
-
 Unfortunately, autocoercions from arrays of fixed size to slices was prohibited too soon and a lot of array literals like `[1, 2, 3]` were changed to `&[1, 2, 3]`. These changes have to be reverted (but the prohibition of autocoercions stays in place).
 
 Code using array literals as values is broken, but can be fixed easily.
@@ -109,13 +107,12 @@ let s = S { arr: [1, 2, 3] }; // Have to be changed to let s = S { arr: *[1, 2,
 // Mutable array
 let mut a = [1i, 2, 3]; // Have to be changed to let mut a = *[1i, 2, 3];
 ```
-
-This change has some benefits - you have to opt-in to use arrays as values and potentially costly array copies become a bit more visible.
+This explicit dereference has some benefits - you have to opt-in to use arrays as values and potentially costly array copies become a bit more visible and searchable.
 Anyway, array literals are less frequently used as values (see the statistics), but more often as slices.
 
 ### Precedents
 
-C and C++ string literals are `char` arrays of fixed size.
+C and C++ string literals are lvalue `char` arrays of fixed size with static duration.
 C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation.
 
 # Drawbacks
@@ -126,25 +123,22 @@ Some breakage for array literals. See "Backward compatibility" section.
 
 The alternative design is to make the literals values and not references.
 
-### Necessary changes
-
-1)
-Keep the types of array literals as `[T, ..N]`.
-Change the types of byte literals from `&'static [u8]` to `[u8, ..N]`
-Change the types of string literals form `&'static str` to to `str[..N]`.
-
-2)
-Introduce the missing family of types - strings of fixed size - `str[..N]`.
-...
-
-3)
-Add the autocoercion of array *literals* (not arrays of fixed size in general) to slices.
-Add the autocoercion of new byte literals to slices.
-Add the autocoercion of new string literals to slices.
-Non-literal arrays and strings do not autocoerce to slices, in accordance with the general agreements on explicitness.
-
-4)
-(Optional) Make string and byte literals lvalues with static lifetime
+### The changes
+
+1)
+Keep the types of array literals as `[T, ..N]`.
+Change the types of byte literals from `&'static [u8]` to `[u8, ..N]`.
+Change the types of string literals form `&'static str` to to `str[..N]`.
+2)
+Introduce the missing family of types - strings of fixed size - `str[..N]`.
+...
+3)
+Add the autocoercion of array *literals* (not arrays of fixed size in general) to slices.
+Add the autocoercion of new byte literals to slices.
+Add the autocoercion of new string literals to slices.
+Non-literal arrays and strings do not autocoerce to slices, in accordance with the general agreements on explicitness.
+4)
+Make string and byte literals lvalues with static lifetime.
 
 Examples of use:
 ```
@@ -163,7 +157,7 @@ let DE_FILE = LANG_DIR + "de"; // Performed at runtime if not optimized
 
 ### Drawbacks of the alternative design
 
-Special rules about (byte) string literals being lvalues add a bit of unnecessary complexity to the specification.
+Special rules about (byte) string literals being static lvalues add a bit of unnecessary complexity to the specification.
 
 In theory `let s = "abcd";` copies the string from static memory to stack, but the copy is unobservable an can, probably, be elided in most cases.
 
@@ -187,7 +181,7 @@ fn main() {
 
 ### Status quo
 
-Status quo (or applying the changes partially) is always an alternative.
+Status quo (or partial application of the changes) is always an alternative.
 
 ### Drawbacks of status quo
 
@@ -197,7 +191,7 @@ Examples:
 let mut arr: [u8, ..3] = [b'a', b'b', b'c']; // Have to use array literals
 arr[0] = b'd';
 
-// Future: str[..N] is added, CTE is added, but the literal types remains old
+// Future: str[..N] is added, CTE is added, but the literal types remain old
 let mut arr: [u8, ..3] = b"abc".to_fixed(); // Have to use a conversion method
 arr[0] = b'd';
 
@@ -208,12 +202,11 @@ static FR_FILE: str[.._] = LANG_DIR + "fr".to_fixed();
 // Bad future: str[..N] is not added
 // "Heapless"/compile-time string operations aren't possible, or performed with "magic" like extended concat! or recursive macros.
 ```
-
 Note, that in the "Future" scenario the return *type* of `to_fixed` depends on the *value* of `self`, so it requires sufficiently advanced CTE, for example C++14 with its powerful `constexpr` machinery still doesn't allow to write such a function.
 
 # Unresolved questions
 
-If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require additional care.
+If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require additional attention. Moreover, the changes to string literals should, probably, be applied after this move.
 
 Assume we implemented `str` like this:
 ```
@@ -222,11 +215,9 @@ struct StrImpl { underlying_array: T }
 type str = StrImpl<[u8]>;
 type str_of_fixed_size_bikeshed = StrImpl<[u8, ..N]>; // Non-type generic parameters are required
 ```
-
 Then `&str_of_fixed_size_bikeshed` (the type of string literals) should somehow autocoerce to `&str` and this coercion is not covered by current rules.
 
-One possible solution is to make `str` a "not-so-smart" pointer to unsized type and not an unsized type itself.
-
+One possible solution is to make `str` a "not-so-smart" pointer to unsized type and not the unsized type itself.
 ```
 struct StrImplVal { underlying_array: T }
 struct StrImplRef<'a, T> { ref_: &'a StrImplVal }
@@ -235,12 +226,10 @@ type<'a> str<'a> = StrImplRef<'a, [u8]>;
 type<'a, N: uint> ref_to_str_of_fixed_size_bikeshed<'a, N> = StrImplRef<'a, [u8, ..N]>; // Non-type generic parameters are required
 type str_of_fixed_size_bikeshed = StrImplVal<[u8, ..N]>; // Non-type generic parameters are required
 ```
-
-Then string literals (and strings of fixed size in general) have type `ref_to_str_of_fixed_size_bikeshed<'static, N>`.
-And DST coercion from `ref_to_str_of_fixed_size_bikeshed<'a, N>` to `str<'a>` (`StrImplRef<'a, [u8, ..N]> -> StrImplRef<'a, [u8]>`) is possible.
-And dereference on `ref_to_str_of_fixed_size_bikeshed<'a, N>` should return `&'a str_of_fixed_size_bikeshed`.
-And every `&'a str` has to be rewritten as `str<'a>` (and `&str` as `str`), which is a terribly backward incompatible change (but automatically fixable).
-
-I suppose this change to `str` may be useful on itself and can be proposed by a separate RFC.
+In this case string literals have types `ref_to_str_of_fixed_size_bikeshed<'static, N>` and strings of fixed size have types `str_of_fixed_size_bikeshed`.
+And the coercion from `ref_to_str_of_fixed_size_bikeshed<'a, N>` to `str<'a>` (`StrImplRef<'a, [u8, ..N]> -> StrImplRef<'a, [u8]>`) is an usual DST coercion.
+And dereference on `ref_to_str_of_fixed_size_bikeshed<'a, N>` should return `&'a str_of_fixed_size_bikeshed`.
+And every `&'a str` has to be rewritten as `str<'a>` (and `&str` as `str`), which is a terribly backward incompatible change (but automatically fixable).
+I suppose this change to `str` may be useful by itself and can be proposed by a separate RFC.
 
  [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf

From 566f9fd4bcca2afeacfe44896cb22cb7f10d0318 Mon Sep 17 00:00:00 2001
From: madrugado
Date: Mon, 29 Sep 2014 20:17:17 +0400
Subject: [PATCH 3/8] FIX: some small fixes of the grammar

---
 active/0000-statically-sized-literals.md | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index 29c6dec8d63..0b80e0b0742 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -4,7 +4,7 @@
 
 # Summary
 
-Change the types of array, byte string and string literals to be (references to) statically sized types.
+Change the types of array, byte string and string literals to be statically sized types (references to ones).
 Introduce strings of fixed size.
 
 # Motivation
@@ -15,9 +15,9 @@ This RFC suggests to change the types to `&'static [u8, ..N]` and `&'static str[
 Additionally this RFC suggests to change the types of array literals from `[T, ..N]` to `&'a [T, ..N]` for consistency and ergonomics.
 
 Today, given the lack of non-type generic parameters and compile time (function) evaluation (CTE), strings of fixed size are not very useful.
-But after introducing CTE the need in compile time string operations will raise quickly.
-Even without CTE but with non-type generic parameters alone strings of fixed size can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization.
-So the main motivation for changes today is forward compatibility and before 1.0 `str[..N]` can be implemented as minimally as possible to allow the change of the types of string literals.
+But after introduction of CTE the need in compile time string operations will raise rapidly.
+Even without CTE but with non-type generic parameters alone fixed size strings can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization.
+So the main motivation for changes today is forward compatibility and before 1.0 `str[..N]` can be implemented as marginally as possible to allow the change of the types of string literals.
 
 Examples of use for new literals, that are not possible with old literals:
 
@@ -71,7 +71,7 @@ Array literals can be used both as slices, when a view to array is sufficient to
 The exact estimation of the frequencies of both uses is problematic, but some regex search in the Rust codebase gives the next statistics:
 In approximately *70%* of cases array literals are used as slices (explicit `&` on array literals, immutable bindings).
 In approximately *20%* of cases array literals are used as values (initialization of struct fields, mutable bindings, boxes).
-In approximately *10%* of cases the use is unclear.
+In the rest *10%* of cases the usage is unclear.
 
 So, in most cases the change to the types of array literals will lead to shorter notation.
 
@@ -93,7 +93,7 @@ fn main() {
 fn g(arg: &[int]) {}
 g([1i, 2, 3]); // DST coercion &[int, ..3] -> &[int]
 ```
-Unfortunately, autocoercions from arrays of fixed size to slices was prohibited too soon and a lot of array literals like `[1, 2, 3]` were changed to `&[1, 2, 3]`. These changes have to be reverted (but the prohibition of autocoercions stays in place).
+Unfortunately, autocoercions from arrays of fixed size to slices was prohibited too soon and a lot of array literals like `[1, 2, 3]` were changed to `&[1, 2, 3]`. These changes have to be reverted (but the prohibition of autocoercions should stay in place).
 
 Code using array literals as values is broken, but can be fixed easily.
 ```
@@ -121,7 +121,7 @@ Some breakage for array literals. See "Backward compatibility" section.
 
 # Alternatives
 
-The alternative design is to make the literals values and not references.
+The alternative design is to make the literals the values and not the references.
 
 ### The changes
 
@@ -215,7 +215,7 @@ struct StrImpl { underlying_array: T }
 type str = StrImpl<[u8]>;
 type str_of_fixed_size_bikeshed = StrImpl<[u8, ..N]>; // Non-type generic parameters are required
 ```
-Then `&str_of_fixed_size_bikeshed` (the type of string literals) should somehow autocoerce to `&str` and this coercion is not covered by current rules.
+Then `&str_of_fixed_size_bikeshed` (the type of string literals) should somehow autocoerce to `&str` and this coercion is not covered by the current rules.
 
 One possible solution is to make `str` a "not-so-smart" pointer to unsized type and not the unsized type itself.
 ```
@@ -230,6 +230,6 @@ In this case string literals have types `ref_to_str_of_fixed_size_bikeshed<'stat
 And the coercion from `ref_to_str_of_fixed_size_bikeshed<'a, N>` to `str<'a>` (`StrImplRef<'a, [u8, ..N]> -> StrImplRef<'a, [u8]>`) is an usual DST coercion.
 And dereference on `ref_to_str_of_fixed_size_bikeshed<'a, N>` should return `&'a str_of_fixed_size_bikeshed`.
 And every `&'a str` has to be rewritten as `str<'a>` (and `&str` as `str`), which is a terribly backward incompatible change (but automatically fixable).
-I suppose this change to `str` may be useful by itself and can be proposed by a separate RFC.
+I suppose this change to `str` may be useful by itself and can be proposed as a separate RFC.
 
  [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf

From 5740fce1e2bb2c0fd3c9687a7be7d21cac8bc2e6 Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Mon, 29 Sep 2014 21:28:30 +0400
Subject: [PATCH 4/8] fix

---
 active/0000-statically-sized-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index 0b80e0b0742..fcb91bbbad7 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -4,7 +4,7 @@
 
 # Summary
 
-Change the types of array, byte string and string literals to be statically sized types (references to ones).
+Change the types of array, byte string and string literals to be references to statically sized types.
 Introduce strings of fixed size.
 
 # Motivation

From 9d042c9db96cee5a2d0fcf3b61b87fd3a11a6458 Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Tue, 30 Sep 2014 13:18:20 +0400
Subject: [PATCH 5/8] updated "unresolved questions"

---
 active/0000-statically-sized-literals.md | 25 ++++--------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index fcb91bbbad7..43c3e81f809 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -206,30 +206,13 @@ Note, that in the "Future" scenario the return *type* of `to_fixed` depends on t
 
 # Unresolved questions
 
-If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require additional attention. Moreover, the changes to string literals should, probably, be applied after this move.
+If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require some additional attention. Moreover, the changes to string literals should, probably, be applied after this move.
 
 Assume we implemented `str` like this:
 ```
-struct StrImpl { underlying_array: T }
-
-type str = StrImpl<[u8]>;
-type str_of_fixed_size_bikeshed = StrImpl<[u8, ..N]>; // Non-type generic parameters are required
-```
-Then `&str_of_fixed_size_bikeshed` (the type of string literals) should somehow autocoerce to `&str` and this coercion is not covered by the current rules.
-
-One possible solution is to make `str` a "not-so-smart" pointer to unsized type and not the unsized type itself.
-```
-struct StrImplVal { underlying_array: T }
-struct StrImplRef<'a, T> { ref_: &'a StrImplVal }
-
-type<'a> str<'a> = StrImplRef<'a, [u8]>;
-type<'a, N: uint> ref_to_str_of_fixed_size_bikeshed<'a, N> = StrImplRef<'a, [u8, ..N]>; // Non-type generic parameters are required
-type str_of_fixed_size_bikeshed = StrImplVal<[u8, ..N]>; // Non-type generic parameters are required
+struct str { underlying_array: T }
 ```
-In this case string literals have types `ref_to_str_of_fixed_size_bikeshed<'static, N>` and strings of fixed size have types `str_of_fixed_size_bikeshed`.
-And the coercion from `ref_to_str_of_fixed_size_bikeshed<'a, N>` to `str<'a>` (`StrImplRef<'a, [u8, ..N]> -> StrImplRef<'a, [u8]>`) is an usual DST coercion.
-And dereference on `ref_to_str_of_fixed_size_bikeshed<'a, N>` should return `&'a str_of_fixed_size_bikeshed`.
-And every `&'a str` has to be rewritten as `str<'a>` (and `&str` as `str`), which is a terribly backward incompatible change (but automatically fixable).
-I suppose this change to `str` may be useful by itself and can be proposed as a separate RFC.
+Then strings of fixed size will have library types `str<[u8, ..N]>` instead of built-in `str[..N]` and that's a clear improvement.
+In that case string literals have types `&str<[u8, ..N]>` which can be autocoerced to `&str` by DST coercion.
 
  [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf

From 3bad80c9974743075352107e9151e8c1312013cd Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Tue, 30 Sep 2014 13:41:41 +0400
Subject: [PATCH 6/8] 'static

---
 active/0000-statically-sized-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index 43c3e81f809..2435f853f21 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -213,6 +213,6 @@ Assume we implemented `str` like this:
 struct str { underlying_array: T }
 ```
 Then strings of fixed size will have library types `str<[u8, ..N]>` instead of built-in `str[..N]` and that's a clear improvement.
-In that case string literals have types `&str<[u8, ..N]>` which can be autocoerced to `&str` by DST coercion.
+In that case string literals have types `&'static str<[u8, ..N]>` which can be autocoerced to `&'static str` by DST coercion.
 
  [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf

From 8cf7bbe1488c3a6a56e5421913380c0a6b80497a Mon Sep 17 00:00:00 2001
From: petrochenkov
Date: Sun, 12 Oct 2014 18:31:11 +0400
Subject: [PATCH 7/8] Updated based on the feedback

---
 active/0000-statically-sized-literals.md | 160 ++++++++++-------------
 1 file changed, 72 insertions(+), 88 deletions(-)

diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md
index 2435f853f21..832c2048255 100644
--- a/active/0000-statically-sized-literals.md
+++ b/active/0000-statically-sized-literals.md
@@ -4,20 +4,27 @@
 
 # Summary
 
-Change the types of array, byte string and string literals to be references to statically sized types.
-Introduce strings of fixed size.
+Change the types of byte string literals to be references to statically sized types.
+Ensure the same change can be performed backward compatibly for string literals in the future.
# Motivation Currently byte string and string literals have types `&'static [u8]` and `&'static str`. Therefore, although the sizes of the literals are known at compile time, they are erased from their types and inaccessible until runtime. -This RFC suggests to change the types to `&'static [u8, ..N]` and `&'static str[..N]` respectively. -Additionally this RFC suggests to change the types of array literals from `[T, ..N]` to `&'a [T, ..N]` for consistency and ergonomics. +This RFC suggests to change the type of byte string literals to `&'static [u8, ..N]`. +In addition this RFC suggest not to introduce any changes to `str` or string literals, that would prevent a backward compatible addition of strings of fixed size `FixedString` (the name FixedString in this RFC is a placeholder and is open for bikeshedding) and the change of the type of string literals to `&'static FixedString` in the future. + +`FixedString` is essentially a `[u8, ..N]` with UTF-8 invariants and additional string methods/traits. +It fills the gap in the vector/string chart: + +`Vec` | `String` +---------|-------- +`[T, ..N]` | ??? +`&[T]` | `&str` Today, given the lack of non-type generic parameters and compile time (function) evaluation (CTE), strings of fixed size are not very useful. But after introduction of CTE the need in compile time string operations will raise rapidly. -Even without CTE but with non-type generic parameters alone fixed size strings can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization. -So the main motivation for changes today is forward compatibility and before 1.0 `str[..N]` can be implemented as marginally as possible to allow the change of the types of string literals. +Even without CTE but with non-type generic parameters alone fixed size strings can be used in runtime for "heapless" string operations, which are useful in constrained environments or for optimization. 
So the main motivation for changes today is forward compatibility. Examples of use for new literals, that are not possible with old literals: @@ -27,9 +34,9 @@ let mut arr: [u8, ..3] = *b"abc"; arr[0] = b'd'; // Future with CTE: compile time string concatenation -static LANG_DIR: str[..5 /*The size should, probably, be inferred*/ ] = *"lang/"; -static EN_FILE: str[.._] = LANG_DIR + *"en"; // str[..N] implements Add -static FR_FILE: str[.._] = LANG_DIR + *"fr"; +static LANG_DIR: FixedString<5 /*The size should, probably, be inferred*/> = *"lang/"; +static EN_FILE: FixedString<_> = LANG_DIR + *"en"; // FixedString implements Add +static FR_FILE: FixedString<_> = LANG_DIR + *"fr"; // Future without CTE: runtime "heapless" string concatenation let DE_FILE = LANG_DIR + *"de"; // Performed at runtime if not optimized @@ -37,33 +44,46 @@ let DE_FILE = LANG_DIR + *"de"; // Performed at runtime if not optimized # Detailed design -### Proposed changes: +Change the type of byte string literals from `&'static [u8]` to `&'static [u8, ..N]`. +Leave the door open for a backward compatible change of the type of string literals from `&'static str` to `&'static FixedString`. -1) -Change the types of array literals from `[T, ..N]` to `&'a [T, ..N]`. -Change the types of byte string literals from `&'static [u8]` to `&'static [u8, ..N]`. -Change the types of string literals form `&'static str` to `&'static str[..N]`. -2) -Introduce the missing family of types - strings of fixed size - `str[..N]`. -`str[..N]` is essentially a `[u8, ..N]` with UTF-8 invariants and, eventually, additional string methods/traits. -It fills the gap in the vector/string chart: +### Strings of fixed size -`Vec` | `String` ----------|-------- -`[T, ..N]` | ??? -`&[T]` | `&str` +If `str` is moved to the library today, then strings of fixed size can be implemented like this: +``` +struct str(T); +``` +Then string literals will have types `&'static str<[u8, ..N]>`. 
-### Static lifetime +Drawbacks of this approach include unnecessary exposition of the implementation - underlying sized or unsized arrays `[u8]`/`[u8, ..N]` and generic parameter `T`. +The key requirement here is the autocoercion from reference to fixed string to string slice an we are unable to meet it now without exposing the implementation. -Although all the literals under consideration are similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes. -While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case. -The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like +In the future, after gaining the ability to parameterize on integers, strings of fixed size could be implemented in a better way: ``` -fn f() -> &'static [int] { - [1, 2, 3] -} +struct __StrImpl(T); // private + +pub type str = StrImpl<[u8]>; // unsized referent of string slice `&str`, public +pub type FixedString = __StrImpl<[u8, ..N]>; // string of fixed size, public + +// &FixedString -> &str : OK, including &'static FixedString -> &'static str for string literals +``` +So, we don't propose to make these changes today and sugget to wait until parameterizing on integers is added to the language. + +### Precedents + +C and C++ string literals are lvalue `char` arrays of fixed size with static duration. +C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation. + +# Rejected alternatives and discussion + +## Array literals + +The types of array literals potentially can be changed from `[T, ..N]` to `&'a [T, ..N]` for consistency with the other literals and ergonomics. 
+The major blocker for this change is the inability to move out from a dereferenced array literal if `T` is not `Copy`. ``` -, but this RFC doesn't propose such an enhancement. +let mut a = *[box 1i, box 2, box 3]; // Wouldn't work without special-casing of string literal with regard to moving out from dereferenced borrowed pointer +``` +Despite that array literals as references have better usability, possible `static`ness and consistency with other literals. ### Usage statistics for array literals @@ -75,51 +95,18 @@ In the rest *10%* of cases the usage is unclear. So, in most cases the change to the types of array literals will lead to shorter notation. -### Backward compatibility - -No code using the literals as slices is broken, DST coercions `&[T, ..N] -> &[T], &str[..N] -> &str` do all the job for compatibility. -``` -fn f(arg: &str) {} -f("Hello"); // DST coercion - -static GOODBYE: &'static str = "Goodbye"; // DST coercion - -fn main() { - let s = "Hello"; - fn f(arg: &str) {} - f(s); // No breakage, DST coercion -} - -fn g(arg: &[int]) {} -g([1i, 2, 3]); // DST coercion &[int, ..3] -> &[int] -``` -Unfortunately, autocoercions from arrays of fixed size to slices was prohibited too soon and a lot of array literals like `[1, 2, 3]` were changed to `&[1, 2, 3]`. These changes have to be reverted (but the prohibition of autocoercions should stay in place). +### Static lifetime -Code using array literals as values is broken, but can be fixed easily. +Although all the literals under consideration are similar and are essentially arrays of fixed size, array literals are different from byte string and string literals with regard to lifetimes. +While byte string and string literals can always be placed into static memory and have static lifetime, array literals can depend on local variables and can't have static lifetime in general case. 
+The chosen design potentially allows to trivially enhance *some* array literals with static lifetime in the future to allow use like ``` -// Array as a struct field -struct S { - arr: [int, ..3], +fn f() -> &'static [int] { + [1, 2, 3] } - -let s = S { arr: [1, 2, 3] }; // Have to be changed to let s = S { arr: *[1, 2, 3] }; - -// Mutable array -let mut a = [1i, 2, 3]; // Have to be changed to let mut a = *[1i, 2, 3]; ``` -This explicit dereference has some benefits - you have to opt-in to use arrays as values and potentially costly array copies become a bit more visible and searchable. -Anyway, array literals are less frequently used as values (see the statistics), but more often as slices. - -### Precedents - -C and C++ string literals are lvalue `char` arrays of fixed size with static duration. -C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation. - -# Drawbacks - -Some breakage for array literals. See "Backward compatibility" section. -# Alternatives +## Alternatives The alternative design is to make the literals the values and not the references. @@ -128,9 +115,9 @@ The alternative design is to make the literals the values and not the references 1) Keep the types of array literals as `[T, ..N]`. Change the types of byte literals from `&'static [u8]` to `[u8, ..N]`. -Change the types of string literals form `&'static str` to to `str[..N]`. +Change the types of string literals form `&'static str` to to `FixedString`. 2) -Introduce the missing family of types - strings of fixed size - `str[..N]`. +Introduce the missing family of types - strings of fixed size - `FixedString`. ... 3) Add the autocoercion of array *literals* (not arrays of fixed size in general) to slices. 
@@ -147,9 +134,9 @@ let mut arr: [u8, ..3] = b"abc"; arr[0] = b'd'; // Future with CTE: compile time string concatenation -static LANG_DIR: str[.._] = "lang/"; -static EN_FILE: str[.._] = LANG_DIR + "en"; // str[..N] implements Add -static FR_FILE: str[.._] = LANG_DIR + "fr"; +static LANG_DIR: FixedString<_> = "lang/"; +static EN_FILE: FixedString<_> = LANG_DIR + "en"; // FixedString implements Add +static FR_FILE: FixedString<_> = LANG_DIR + "fr"; // Future without CTE: runtime "heapless" string concatenation let DE_FILE = LANG_DIR + "de"; // Performed at runtime if not optimized @@ -191,28 +178,25 @@ Examples: let mut arr: [u8, ..3] = [b'a', b'b', b'c']; // Have to use array literals arr[0] = b'd'; -// Future: str[..N] is added, CTE is added, but the literal types remain old +// Future: FixedString is added, CTE is added, but the literal types remain old let mut arr: [u8, ..3] = b"abc".to_fixed(); // Have to use a conversion method arr[0] = b'd'; -static LANG_DIR: str[.._] = "lang/".to_fixed(); // Have to use a conversion method -static EN_FILE: str[.._] = LANG_DIR + "en".to_fixed(); -static FR_FILE: str[.._] = LANG_DIR + "fr".to_fixed(); +static LANG_DIR: FixedString<_> = "lang/".to_fixed(); // Have to use a conversion method +static EN_FILE: FixedString<_> = LANG_DIR + "en".to_fixed(); +static FR_FILE: FixedString<_> = LANG_DIR + "fr".to_fixed(); -// Bad future: str[..N] is not added +// Bad future: FixedString is not added // "Heapless"/compile-time string operations aren't possible, or performed with "magic" like extended concat! or recursive macros. ``` Note, that in the "Future" scenario the return *type* of `to_fixed` depends on the *value* of `self`, so it requires sufficiently advanced CTE, for example C++14 with its powerful `constexpr` machinery still doesn't allow to write such a function. 
-# Unresolved questions +# Drawbacks -If `str` is moved from core language to the library and implemented as a wrapper around u8 array, then strings of fixed size will require some additional attention. Moreover, the changes to string literals should, probably, be applied after this move. +None. -Assume we implemented `str` like this: -``` -struct str { underlying_array: T } -``` -Then strings of fixed size will have library types `str<[u8, ..N]>` instead of built-in `str[..N]` and that's a clear improvement. -In that case string literals have types `&'static str<[u8, ..N]>` which can be autocoerced to `&'static str` by DST coercion. +# Unresolved questions + +None. [1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4121.pdf From b3e2076d8d02afd490131dce468571c72d0cd5e3 Mon Sep 17 00:00:00 2001 From: Vadim Petrochenkov Date: Mon, 13 Oct 2014 02:21:18 +0400 Subject: [PATCH 8/8] editorial --- active/0000-statically-sized-literals.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/active/0000-statically-sized-literals.md b/active/0000-statically-sized-literals.md index 832c2048255..7a7dd8a95a0 100644 --- a/active/0000-statically-sized-literals.md +++ b/active/0000-statically-sized-literals.md @@ -62,26 +62,26 @@ In the future, after gaining the ability to parameterize on integers, strings of ``` struct __StrImpl(T); // private -pub type str = StrImpl<[u8]>; // unsized referent of string slice `&str`, public +pub type str = __StrImpl<[u8]>; // unsized referent of string slice `&str`, public pub type FixedString = __StrImpl<[u8, ..N]>; // string of fixed size, public // &FixedString -> &str : OK, including &'static FixedString -> &'static str for string literals ``` -So, we don't propose to make these changes today and sugget to wait until parameterizing on integers is added to the language. +So, we don't propose to make these changes today and suggest to wait until generic parameterization on integers is added to the language. 
### Precedents -C and C++ string literals are lvalue `char` arrays of fixed size with static duration. +C and C++ string literals are lvalue `char` arrays of fixed size with static duration. C++ library proposal for strings of fixed size ([link][1]), the paper also contains some discussion and motivation. # Rejected alternatives and discussion ## Array literals -The types of array literals potentially can be changed from `[T, ..N]` to `&'a [T, ..N]` for consistency with the other literals and ergonomics. +The types of array literals potentially can be changed from `[T, ..N]` to `&'a [T, ..N]` for consistency with the other literals and ergonomics. The major blocker for this change is the inability to move out from a dereferenced array literal if `T` is not `Copy`. ``` -let mut a = *[box 1i, box 2, box 3]; // Wouldn't work without special-casing of string literal with regard to moving out from dereferenced borrowed pointer +let mut a = *[box 1i, box 2, box 3]; // Wouldn't work without special-casing of array literals with regard to moving out from dereferenced borrowed pointer ``` Despite that array literals as references have better usability, possible `static`ness and consistency with other literals.
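The byte string part of the proposal can be sketched in a few lines. This is an illustration only, under two loud assumptions: it uses the array syntax `[u8; N]` and integer generic parameters rather than the `[u8, ..N]` notation of the RFC text, and it assumes a compiler in which byte string literals already have the proposed type `&'static [u8; N]`. The function names (`patched_copy`, `literal_len`, `as_slice`) are hypothetical and do not come from the RFC.

```rust
/// Initializes a mutable array directly from a byte string literal,
/// mirroring the "initialize mutable array with literal" example of the RFC.
/// Assumption: `b"abc"` has type `&'static [u8; 3]`, so `*b"abc"` is `[u8; 3]`.
fn patched_copy() -> [u8; 3] {
    let mut arr: [u8; 3] = *b"abc";
    arr[0] = b'd';
    arr
}

/// Recovers the length of a literal from its type alone.
/// Hypothetical helper: `N` is inferred from the argument type.
fn literal_len<const N: usize>(_lit: &[u8; N]) -> usize {
    N
}

/// Existing code that treats a literal as a slice keeps working:
/// the reference to a fixed-size array coerces to a slice.
fn as_slice(lit: &'static [u8; 3]) -> &'static [u8] {
    lit
}
```

Under these assumptions, `literal_len(b"lang/")` takes its result from the type of the literal, and `as_slice(b"abc")` relies on the same array-to-slice coercion that the RFC counts on for backward compatibility.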