Link time sets and maps

In response to #369, this issue proposes _link time sets and maps_ as a feature that enables a small amount of pre-main computation beyond what's already possible with `const`. The point is that this allows a "well-known" declaration _D_ (in some library that is widely used) to offer a registration service, in the sense that any library where _D_ is imported can contribute to the value of _D_.

In #369, @davidmorgan mentioned some important situations where this is useful:

> Some frameworks need a way to run initialization code for all
> leaf code using the framework. For example, dependency injection
> frameworks want to build a map of injectable types, and serialization
> frameworks want to know about all serializable types.

The feature proposed here is built directly on top of the support for constants in Dart, and it will not enable execution of user-written code. However it is sufficiently powerful to satisfy the need for modular expression (for instance, the framework _can_ build a map of all injectable types, even though they are declared in libraries not imported by the framework), and it allows the main program to use the data whenever it wishes to do so (there is no inherent startup cost, because any work associated with the registered entities is performed by explicit code called from `main`, which can use any algorithm we'd want, e.g., to support laziness).

In order to avoid problems with excessively large programs (in terms of code size or heap size), it is [crucial](https://github.com/dart-lang/language/issues/369#issuecomment-495257279) that this mechanism allows for conditional contributions: For instance, the transitive closure of some useful imports may include a very large number of serializable types, but any given program may only actually use a few of them.

This feature relies on _tree-shaking_ to enable inclusion of approximately the smallest possible set of entities. It should be noted that tree-shaking is an implementation dependent concept, but it is required to have a soundness property: No tree-shaking algorithm is allowed to remove any part of the program which is actually used at run-time.


## Link Time Sets and Maps, by Example

This feature adds a new kind of top-level declaration that introduces a link time set or map:

```dart
Set<T> linkTimeSet of const;

Map<K, V> linkTimeMap of const;
```

`linkTimeSet` and `linkTimeMap` are not constant (so they cannot be used in constant expressions), but they can contain elements (`linkTimeSet`) and key/value bindings (`linkTimeMap`) which are specified elsewhere, in so-called contributing declarations (described below). A link time set or map is immutable.

A _contributing declaration_ for a link time set or map is a declaration that contributes a single element to the set, respectively a single binding for the map:

```dart
const linkTimeSet.add(c);
const linkTimeMap[c1] = c2;
```

It is an error unless `c`, `c1` and `c2` are constant expressions. These declarations can only have the form shown above, so any attempt to write general code to manipulate the set/map is an error&mdash;this feature only supports building the set/map one element/binding at a time.

The usual static errors apply, as if the contributing declaration had been a statement in the same binding environment. For instance, it is a compile-time error unless `c` is assignable to `T`.

A _conditional contributing declaration_ includes a condition which lists some entities (such as classes or functions):

```dart
const linkTimeSet.add(c) if E1, ..., Ek;
const linkTimeMap[c1] = c2 if E1, ..., Ek;
```

Again, conditional contributing declarations can only have the forms shown above (we can only use `add` and `[]=`), the expressions `c`, `c1`, and `c2` must be _potentially_ constant, and the usual static errors apply just like they do in the unconditional case. `E1 .. Ek` are constant expressions denoting types or functions.

The value of `linkTimeSet` is an immutable set of type `Set<T>` containing the elements which are added using contributing declarations. Some contributing declarations may add the same element (which is not an error).

The value of `linkTimeMap` is an immutable map of type `Map<K, V>` containing the bindings which are added by contributing declarations. It is a link time error if two contributed bindings have the same key but different values. It is not an error if two contributed bindings have the same keys and the same values. Again, no guarantees are given for the iteration order.

(*A link time error is a new concept in the specification of Dart. It just needs to occur before the execution of `main` starts, but specific tool chains may offer a specific link operation where the error can occur. In any case, the error will be detected if the resulting program is executed at all, not just if the program gets some specific input.*)

The process whereby link time sets and maps receive elements respectively bindings runs in phases. In the first phase, all conditional contributing declarations are ignored. At that point a tree-shaking procedure may run, which marks a subset of the entities in the program as unreachable. Now each conditional contributing declaration is visited, and if all of `E1 .. Ek` are marked as unreachable then it is still ignored, otherwise the requested contribution takes place, which may cause some entities that were previously marked as unreachable to be included in the program. This step is repeated until it has no effect.

Note that there are no constraints on the relationship between the condition and the object/binding which is contributed. This means that it is possible, for instance, to specify that a deserialization helper function `deserializeC` can be found under the key `"C"` in the map `deserializers` if the program uses the class `C` as follows:

```dart
// In some core library of the serialization package.
Map<String, Object Function(String)> deserializers of const;

// In some third party library.
class C {...}
C deserializeC(String s) {...}
const deserializers["C"] = deserializeC if C;
```

Different applications may need to use different techniques in order to enable the inclusion of `deserializeC` exactly if that is needed. For instance, it could be the case that the program simply uses `C` in that case, by creating new instances of `C` explicitly, or by using `C` as a type annotation. It could also be necessary to force usage of certain classes like `C`, say, because some supertypes of `C` are used explicitly but `C` itself is one of a number of implementation classes, and we only know which implementation classes are needed because that's a business level "contract" with some cloud services that this application is using. In that case we may need to mention the required classes in `main`:

```dart
main() {
  used(C);
}
```

A little bit of careful coding is needed whenever there is a need to interact with implementation dependent mechanisms like tree-shaking, but the point is that `used` should be written in such a way that no optimizations can eliminate the dependency on `C`.

We would need to standardize how to do such things, but surely it will be possible to achieve the desired effect: That `C` _is_ included after all optimizations are complete.


## Initial Feature Specification Proposal


### Syntax

The grammar is modified as follows:

```
<topLevelDefinition> ::= // Add two new alternatives.
    ... |
    'const' <linkTimeCollectionContribution> ';' |
    <linkTimeCollectionDeclaration> ';'

<linkTimeCollectionContribution> ::= // New rule.
    (<typeIdentifier> '.')? <identifier> <linkTimeCollectionAction>
    ('if' <expressionList>)?

<linkTimeCollectionAction> ::= // New rule.
    '.' <identifier> '(' <expression> ')' |
    '[' <expression> ']' '=' <expression>

<linkTimeCollectionDeclaration> ::= // New rule.
    <identifier> <typeArguments> <identifier> 'of' 'const'
```

*In order to experiment with the new syntax and see the complete grammar, please consult [this CL](https://dart-review.googlesource.com/c/sdk/+/103526).*

*Note that the type arguments are not optional. This is an opinionated choice, based on the assumption that a semi-dynamically typed link time collections should be avoided, or at least explicitly typed as `Set<dynamic>` or `Map<dynamic, dynamic>`. Similarly there is no way to omit the type entirely.*


### Static Analysis

We introduce the implementation dependent notion of _program inclusion_.

*This is usually associated with an algorithm known as tree-shaking. In general, program inclusion proceeds by marking a program element `E` as included if some potential execution starting from `main` may depend on `E`, e.g., by executing it if `E` is an expression or statement, or by referring to it if `E` is a declaration. At the end of a fixpoint iteration where no more entities are included, all the entities which were not included may be removed from the program before deployment. Any size of program element may be eligible for inclusion, e.g., a single subexpression or an entire class, but this feature only relies on the inclusion of a type or a function as a whole, and it does not matter whether parts of said type or function have been eliminated.*

An implementation is required to use a sound algorithm to compute which entities are included in a program, but it is allowed to eliminate any program element which is guaranteed to be unable to observably influence the execution.

*Events like 'out of memory' may be influenced by tree-shaking. But they are also implementation dependent, and hence they are not considered observable.*

*It is a bug if any program element is actually used, but it was eliminated by tree-shaking. On the other hand, it is allowed for an implementation to use the trivial algorithm which does not eliminate any program elements at all. This means that different implementations may offer a different quality of service in this respect, but given that each deployed program was actually produced by a specific tool chain, and tree-shaking is completed before deployment, it is always possible to assess the actual quality of tree-shaking.*

Consider a `<linkTimeCollectionDeclaration>` _D_ of the form `id<T1, .. Tk> c of const`. It is a compile-time error unless each of `T1` .. `Tk` is a constant type expression. It is a compile-time error unless one of the following holds:

- `id` is `Set` and `k` is 1.
- `id` is `Map` and `k` is 2.

The effect of _D_ is to introduce the identifier `c` into the library scope of the enclosing library, with the declared type `id<T1, .. Tk>`.

Consider a top level definition derived from `'const' <linkTimeCollectionContribution> ';'` _D_ of the form `const q.id(e);`. It is a compile-time error unless `id` is `add`. It is a compile-time error unless `e` is a constant expression. It is a compile-time error unless `q` denotes a link-time collection declaration whose type is of the form `Set<T>`, where `Set` denotes the built-in set type. It is a compile-time error unless the type of `e` is a subtype of `T`.

Consider the case where _D_ is of the form `const q.id(e) if E1, ..., Ek;`. In this case exactly the same compile-time errors exist for `id`, `q`, and for the type of `e` as for the form with no `if`. Moreover, it is a compile-time error unless `e` is a potentially constant expression, and it is a compile-time error unless each `Ej` for `j` in 1..k is a constant expression denoting a type or a function.

Consider a top level definition derived from `'const' <linkTimeCollectionContribution> ';'` _D_ of the form `const q[e1] = e2;`. It is a compile-time error unless `e1` and `e2` are constant expressions. It is a compile-time error unless `q` denotes a link-time collection declaration whose type is of the form `Map<K, V>`, where `Map` denotes the built-in map type. It is a compile-time error unless the type of the value of `e1` is a subtype of `K` and the type of the value of `e2` is a subtype of `V`.

Consider the case where _D_ is of the form `const q[e1] = e1 if E1, ..., Ek;`. In this case exactly the same compile-time errors exist for `q`, and for the types of `e1` and `e2` as for the form with no `if`. Moreover, it is a compile-time error unless `e1` and `e2` are potentially constant expressions, and it is a compile-time error unless each `Ej` for `j` in 1..k is a constant expression denoting a type or a function.

It is an error that must be raised before the execution of `main` (possibly at compile time or link time, if that concept is relevant for a given a tool) if two link time map contributions bind the value `k` to two different values `v1` and `v2`.

We say that a contributing declaration is _conditional_ when it includes the part starting with `if`. Other contributing declarations are said to be _unconditional_.

Consider the case where a script _S_ is given (*that is, a library which declares a `main` function and defines which libraries are included in a complete program based on the transitive closure of its imports*). Assume that program inclusion has been computed for a version of the program where every conditional contribution declaration is ignored (*as if it had been commented out*).

The effect of conditional contribution declarations is then specified by a repeated application of the following step, until it has no effect:

_Conditional contribution inclusion step_: Consider a conditional contribution declaration _D_ whose condition is of the form `if E1, ..., Ek`. If program inclusion has marked any of `E1, ..., Ek` as included then the unconditional contributing declaration corresponding to _D_ is added to the program at the same location as _D_, and _D_ is removed; otherwise _D_ is still ignored. Next, the program inclusion algorithm is repeated (*such that all entities which are potentially reachable starting from the newly added unconditional contribution declarations are also included*).

When the iteration is complete, any remaining conditional contributing declarations are removed from the program.

*Note that the addition of an unconditional contributing declaration to the program may cause an error, because certain expressions must now be constant rather than just potentially constant. Also note that every object contained by a link-time collection is constant, but the name of a link-time collection is not a constant expression. It follows that they are not canonicalized.*


### Dynamic Semantics

*Note that there is no dynamic semantics for conditional contributing declarations, because they have all been transformed into unconditional ones or removed from the program.*

Let `s` be the name of a link-time collection declaration of type `Set<T>`. Evaluation of `s` at run time yields an instance of a subtype of `Set<T>` which is not a subtype of `Set<S>` for any `S` unless `T <: S`. That instance is an immutable set. (*Hence, any attempt to modify it at run-time causes a dynamic error.*)

The elements contained in `s` are exactly the objects that are mentioned in contributing declarations for `s` of the form `const s.add(c)` or `const p.s.add(c)`, in any library which is transitively imported by the entry point (*which is the "main" library of the program*). No guarantees are given with respect to the iteration order of `s`.

Let `m` be the name of a link-time collection declaration of type `Map<K, V>`. Evaluation of `m` at run time yields an instance of a subtype of `Map<K, V>` which is not a subtype of `Map<K1, V1>` for any `K1`, `V1`, unless `K <: K1` and `V <: V1`. That instance is an immutable map. (*Hence, any attempt to modify it at run-time causes a dynamic error.*)

The map elements contained in `m` are exactly the ones that hold a key `k` and a value `v` which occur in contributing declarations for `m` of the form `const s[k] = v` or `const p.s[k] = v`, in any library which is transitively imported by the entry point. No guarantees are given with respect to the iteration order of `m`.

## Updates

Jun 7th 2019: Added support for conditions, as a way to provide language level access to tree-shaking.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Link time sets and maps #371

Link Time Sets and Maps, by Example

Initial Feature Specification Proposal

Syntax

Static Analysis

Dynamic Semantics

Updates

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Link time sets and maps #371

Description

Link Time Sets and Maps, by Example

Initial Feature Specification Proposal

Syntax

Static Analysis

Dynamic Semantics

Updates

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions