Skip to content

Link time sets and maps #371

Open
Open
@eernstg

Description

@eernstg

In response to #369, this issue proposes link time sets and maps as a feature that enables a small amount of pre-main computation beyond what's already possible with const. The point is that this allows a "well-known" declaration D (in some library that is widely used) to offer a registration service, in the sense that any library where D is imported can contribute to the value of D.

In #369, @davidmorgan mentioned some important situations where this is useful:

Some frameworks need a way to run initialization code for all
leaf code using the framework. For example, dependency injection
frameworks want to build a map of injectable types, and serialization
frameworks want to know about all serializable types.

The feature proposed here is built directly on top of the support for constants in Dart, and it will not enable execution of user-written code. However it is sufficiently powerful to satisfy the need for modular expression (for instance, the framework can build a map of all injectable types, even though they are declared in libraries not imported by the framework), and it allows the main program to use the data whenever it wishes to do so (there is no inherent startup cost, because any work associated with the registered entities is performed by explicit code called from main, which can use any algorithm we'd want, e.g., to support laziness).

In order to avoid problems with excessively large programs (in terms of code size or heap size), it is crucial that this mechanism allows for conditional contributions: For instance, the transitive closure of some useful imports may include a very large number of serializable types, but any given program may only actually use a few of them.

This feature relies on tree-shaking to enable inclusion of approximately the smallest possible set of entities. It should be noted that tree-shaking is an implementation dependent concept, but it is required to have a soundness property: No tree-shaking algorithm is allowed to remove any part of the program which is actually used at run-time.

Link Time Sets and Maps, by Example

This feature adds a new kind of top-level declaration that introduces a link time set or map:

Set<T> linkTimeSet of const;

Map<K, V> linkTimeMap of const;

linkTimeSet and linkTimeMap are not constant (so they cannot be used in constant expressions), but they can contain elements (linkTimeSet) and key/value bindings (linkTimeMap) which are specified elsewhere, in so-called contributing declarations (described below). A link time set or map is immutable.

A contributing declaration for a link time set or map is a declaration that contributes a single element to the set, respectively a single binding for the map:

const linkTimeSet.add(c);
const linkTimeMap[c1] = c2;

It is an error unless c, c1 and c2 are constant expressions. These declarations can only have the form shown above, so any attempt to write general code to manipulate the set/map is an error—this feature only supports building the set/map one element/binding at a time.

The usual static errors apply, as if the contributing declaration had been a statement in the same binding environment. For instance, it is a compile-time error unless c is assignable to T.

A conditional contributing declaration includes a condition which lists some entities (such as classes or functions):

const linkTimeSet.add(c) if E1, ..., Ek;
const linkTimeMap[c1] = c2 if E1, ..., Ek;

Again, conditional contributing declarations can only have the forms shown above (we can only use add and []=), the expressions c, c1, and c2 must be potentially constant, and the usual static errors apply just like they do in the unconditional case. E1 .. Ek are constant expressions denoting types or functions.

The value of linkTimeSet is an immutable set of type Set<T> containing the elements which are added using contributing declarations. Some contributing declarations may add the same element (which is not an error).

The value of linkTimeMap is an immutable map of type Map<K, V> containing the bindings which are added by contributing declarations. It is a link time error if two contributed bindings have the same key but different values. It is not an error if two contributed bindings have the same keys and the same values. Again, no guarantees are given for the iteration order.

(A link time error is a new concept in the specification of Dart. It just needs to occur before the execution of main starts, but specific tool chains may offer a specific link operation where the error can occur. In any case, the error will be detected if the resulting program is executed at all, not just if the program gets some specific input.)

The process whereby link time sets and maps receive elements respectively bindings runs in phases. In the first phase, all conditional contributing declarations are ignored. At that point a tree-shaking procedure may run, which marks a subset of the entities in the program as unreachable. Now each conditional contributing declaration is visited, and if all of E1 .. Ek are marked as unreachable then it is still ignored, otherwise the requested contribution takes place, which may cause some entities that were previously marked as unreachable to be included in the program. This step is repeated until it has no effect.

Note that there are no constraints on the relationship between the condition and the object/binding which is contributed. This means that it is possible, for instance, to specify that a deserialization helper function deserializeC can be found under the key "C" in the map deserializers if the program uses the class C as follows:

// In some core library of the serialization package.
Map<String, Object Function(String)> deserializers of const;

// In some third party library.
class C {...}
C deserializeC(String s) {...}
const deserializers["C"] = deserializeC if C;

Different applications may need to use different techniques in order to enable the inclusion of deserializeC exactly if that is needed. For instance, it could be the case that the program simply uses C in that case, by creating new instances of C explicitly, or by using C as a type annotation. It could also be necessary to force usage of certain classes like C, say, because some supertypes of C are used explicitly but C itself is one of a number of implementation classes, and we only know which implementation classes are needed because that's a business level "contract" with some cloud services that this application is using. In that case we may need to mention the required classes in main:

main() {
  used(C);
}

A little bit of careful coding is needed whenever there is a need to interact with implementation dependent mechanisms like tree-shaking, but the point is that used should be written in such a way that no optimizations can eliminate the dependency on C.

We would need to standardize how to do such things, but surely it will be possible to achieve the desired effect: That C is included after all optimizations are complete.

Initial Feature Specification Proposal

Syntax

The grammar is modified as follows:

<topLevelDefinition> ::= // Add two new alternatives.
    ... |
    'const' <linkTimeCollectionContribution> ';' |
    <linkTimeCollectionDeclaration> ';'

<linkTimeCollectionContribution> ::= // New rule.
    (<typeIdentifier> '.')? <identifier> <linkTimeCollectionAction>
    ('if' <expressionList>)?

<linkTimeCollectionAction> ::= // New rule.
    '.' <identifier> '(' <expression> ')' |
    '[' <expression> ']' '=' <expression>

<linkTimeCollectionDeclaration> ::= // New rule.
    <identifier> <typeArguments> <identifier> 'of' 'const'

In order to experiment with the new syntax and see the complete grammar, please consult this CL.

Note that the type arguments are not optional. This is an opinionated choice, based on the assumption that a semi-dynamically typed link time collections should be avoided, or at least explicitly typed as Set<dynamic> or Map<dynamic, dynamic>. Similarly there is no way to omit the type entirely.

Static Analysis

We introduce the implementation dependent notion of program inclusion.

This is usually associated with an algorithm known as tree-shaking. In general, program inclusion proceeds by marking a program element E as included if some potential execution starting from main may depend on E, e.g., by executing it if E is an expression or statement, or by referring to it if E is a declaration. At the end of a fixpoint iteration where no more entities are included, all the entities which were not included may be removed from the program before deployment. Any size of program element may be eligible for inclusion, e.g., a single subexpression or an entire class, but this feature only relies on the inclusion of a type or a function as a whole, and it does not matter whether parts of said type or function have been eliminated.

An implementation is required to use a sound algorithm to compute which entities are included in a program, but it is allowed to eliminate any program element which is guaranteed to be unable to observably influence the execution.

Events like 'out of memory' may be influenced by tree-shaking. But they are also implementation dependent, and hence they are not considered observable.

It is a bug if any program element is actually used, but it was eliminated by tree-shaking. On the other hand, it is allowed for an implementation to use the trivial algorithm which does not eliminate any program elements at all. This means that different implementations may offer a different quality of service in this respect, but given that each deployed program was actually produced by a specific tool chain, and tree-shaking is completed before deployment, it is always possible to assess the actual quality of tree-shaking.

Consider a <linkTimeCollectionDeclaration> D of the form id<T1, .. Tk> c of const. It is a compile-time error unless each of T1 .. Tk is a constant type expression. It is a compile-time error unless one of the following holds:

  • id is Set and k is 1.
  • id is Map and k is 2.

The effect of D is to introduce the identifier c into the library scope of the enclosing library, with the declared type id<T1, .. Tk>.

Consider a top level definition derived from 'const' <linkTimeCollectionContribution> ';' D of the form const q.id(e);. It is a compile-time error unless id is add. It is a compile-time error unless e is a constant expression. It is a compile-time error unless q denotes a link-time collection declaration whose type is of the form Set<T>, where Set denotes the built-in set type. It is a compile-time error unless the type of e is a subtype of T.

Consider the case where D is of the form const q.id(e) if E1, ..., Ek;. In this case exactly the same compile-time errors exist for id, q, and for the type of e as for the form with no if. Moreover, it is a compile-time error unless e is a potentially constant expression, and it is a compile-time error unless each Ej for j in 1..k is a constant expression denoting a type or a function.

Consider a top level definition derived from 'const' <linkTimeCollectionContribution> ';' D of the form const q[e1] = e2;. It is a compile-time error unless e1 and e2 are constant expressions. It is a compile-time error unless q denotes a link-time collection declaration whose type is of the form Map<K, V>, where Map denotes the built-in map type. It is a compile-time error unless the type of the value of e1 is a subtype of K and the type of the value of e2 is a subtype of V.

Consider the case where D is of the form const q[e1] = e1 if E1, ..., Ek;. In this case exactly the same compile-time errors exist for q, and for the types of e1 and e2 as for the form with no if. Moreover, it is a compile-time error unless e1 and e2 are potentially constant expressions, and it is a compile-time error unless each Ej for j in 1..k is a constant expression denoting a type or a function.

It is an error that must be raised before the execution of main (possibly at compile time or link time, if that concept is relevant for a given a tool) if two link time map contributions bind the value k to two different values v1 and v2.

We say that a contributing declaration is conditional when it includes the part starting with if. Other contributing declarations are said to be unconditional.

Consider the case where a script S is given (that is, a library which declares a main function and defines which libraries are included in a complete program based on the transitive closure of its imports). Assume that program inclusion has been computed for a version of the program where every conditional contribution declaration is ignored (as if it had been commented out).

The effect of conditional contribution declarations is then specified by a repeated application of the following step, until it has no effect:

Conditional contribution inclusion step: Consider a conditional contribution declaration D whose condition is of the form if E1, ..., Ek. If program inclusion has marked any of E1, ..., Ek as included then the unconditional contributing declaration corresponding to D is added to the program at the same location as D, and D is removed; otherwise D is still ignored. Next, the program inclusion algorithm is repeated (such that all entities which are potentially reachable starting from the newly added unconditional contribution declarations are also included).

When the iteration is complete, any remaining conditional contributing declarations are removed from the program.

Note that the addition of an unconditional contributing declaration to the program may cause an error, because certain expressions must now be constant rather than just potentially constant. Also note that every object contained by a link-time collection is constant, but the name of a link-time collection is not a constant expression. It follows that they are not canonicalized.

Dynamic Semantics

Note that there is no dynamic semantics for conditional contributing declarations, because they have all been transformed into unconditional ones or removed from the program.

Let s be the name of a link-time collection declaration of type Set<T>. Evaluation of s at run time yields an instance of a subtype of Set<T> which is not a subtype of Set<S> for any S unless T <: S. That instance is an immutable set. (Hence, any attempt to modify it at run-time causes a dynamic error.)

The elements contained in s are exactly the objects that are mentioned in contributing declarations for s of the form const s.add(c) or const p.s.add(c), in any library which is transitively imported by the entry point (which is the "main" library of the program). No guarantees are given with respect to the iteration order of s.

Let m be the name of a link-time collection declaration of type Map<K, V>. Evaluation of m at run time yields an instance of a subtype of Map<K, V> which is not a subtype of Map<K1, V1> for any K1, V1, unless K <: K1 and V <: V1. That instance is an immutable map. (Hence, any attempt to modify it at run-time causes a dynamic error.)

The map elements contained in m are exactly the ones that hold a key k and a value v which occur in contributing declarations for m of the form const s[k] = v or const p.s[k] = v, in any library which is transitively imported by the entry point. No guarantees are given with respect to the iteration order of m.

Updates

Jun 7th 2019: Added support for conditions, as a way to provide language level access to tree-shaking.

Metadata

Metadata

Assignees

No one assigned

    Labels

    featureProposed language feature that solves one or more problems

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions