@@ -4,7 +4,7 @@ Author: Bob Nystrom
4
4
5
5
Status: In-progress
6
6
7
- Version 0.3 (see [ CHANGELOG] ( #CHANGELOG ) at end)
7
+ Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)
8
8
9
9
Experiment flag: unquoted-imports
10
10
@@ -107,18 +107,18 @@ import widget.tla.proto/client/component;
107
107
```
108
108
109
109
You can probably infer what's going on from the before and after, but the basic
110
- idea is that the library is a slash-separated series of dotted identifier
111
- segments. The first segment is the name of the package . The rest is the path to
112
- the library within that package. A ` .dart ` extension is implicitly added to the
113
- end. If there is only a single segment, it is treated as the package name and
114
- its last dotted component is the path. If the package name is ` dart ` , it's a
115
- "dart:" library import.
110
+ idea is that the library is a slash-separated series of path segments, each of
111
+ which is a dot-separated series of identifier components. The first segment is the name
112
+ of the package. The rest is the path to the library within that package. A
113
+ ` .dart ` extension is implicitly added to the end. If there is only a single
114
+ segment, it is treated as the package name and its last dotted component is the
115
+ path. If the package name is ` dart ` , it's a "dart:" library import.
116
116
117
117
The way I think about the proposed syntax is that relative imports are
118
118
* physical* in that they specify the actual relative path on the file system from
119
119
the current library to another library * file* . Because those are physical file
120
120
paths, they use string literals and file extensions as they do today. SDK and
121
- package imports are * logical* in that you don't know where the library your
121
+ package imports are * logical* in that you don't know where the library you're
122
122
importing lives on your disk. What you know is its *logical name* and the
123
123
relative location of the library you want inside that package. Since these are
124
124
abstract references to a * library* , they are unquoted and omit the file
@@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:
160
160
161
161
### Path separator
162
162
163
- An import shorthand syntax that only supported a single identifier would work
163
+ A package shorthand syntax that only supported a single identifier would work
164
164
for packages like ` test ` and ` args ` that only expose a single library, but
165
165
would fail for even very common libraries like ` package:flutter/material.dart ` .
166
166
So we need some notion of a package name and a path within that package.
@@ -220,27 +220,29 @@ import flutter/material;
220
220
Is the ` flutter/material ` part a single token or three (` flutter ` , ` / ` , and
221
221
` material ` )? The main advantage of tokenizing it as a single monolithic token is
222
222
that we could potentially allow characters or identifiers in there that aren't
223
- otherwise valid Dart. For example, we could let you use reserved words as path
224
- segments :
223
+ otherwise valid Dart. For example, we could let you use hyphens as word
224
+ separators, as in:
225
225
226
226
``` dart
227
- import weird_package/for/if/ ok;
227
+ import weird-package/but-ok;
228
228
```
229
229
230
230
The disadvantage is that the tokenizer doesn't generally have enough context to
231
- know when it should tokenize ` foo/bar ` as a single import path token versus
231
+ know when it should tokenize ` foo/bar ` as a single package path token versus
232
232
three tokens that are presumably dividing two variables named ` foo ` and ` bar ` .
233
233
234
- Unlike Lasse's [ earlier proposal] [ lasse ] , this proposal does * not* tokenize an
235
- import path as a single token. Instead, it's tokenized using Dart's current
234
+ Unlike Lasse's [ earlier proposal] [ lasse ] , this proposal does * not* tokenize a
235
+ package path as a single token. Instead, it's tokenized using Dart's current
236
236
lexical grammar.
237
237
238
- This means you can't have a path segment that's a reserved word or is otherwise
239
- not a valid Dart identifier. Fortunately, our published guidance has * always*
240
- told users that [ package names] [ name guideline ] and [ directories] [ directory
241
- guideline] should be valid Dart identifiers. Pub will complain if you try to
242
- publish a package whose name isn't a valid identifier. Likewise, the linter will
243
- flag directory or library names that aren't identifiers.
238
+ This means you can't have a path component that uses some combination of
239
+ characters that isn't currently a single token in Dart, like ` hyphen-separated `
240
+ or ` 123LeadingDigits ` . A path component must be an identifier (including
241
+ built-in identifiers) or a reserved word. Fortunately, our published guidance
242
+ has * always* told users that [ package names] [ name guideline ] and
243
+ [ directories] [ directory guideline ] should be valid Dart identifiers. Pub will
244
+ complain if you try to publish a package whose name isn't a valid identifier.
245
+ Likewise, the linter will flag file names that aren't identifiers.
244
246
245
247
[ name guideline ] : https://dart.dev/tools/pub/pubspec#name
246
248
[ directory guideline ] : https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
@@ -258,18 +260,52 @@ in a large corpus of pub packages and open source widgets:
258
260
69 ( 0.010%): dotted with non-identifiers =
259
261
```
260
262
261
- This splits every "package:" import's path into segments separated by ` / ` . Then
262
- for each segment, it reports whether the segment is a valid identifier, a
263
- built-in identifier like ` dynamic ` or ` covariant ` , etc. Almost all segments are
264
- either valid identifiers, or dotted identifiers where each subcomponent is a
265
- valid identifier.
263
+ This splits every "package:" path into segments separated by ` / ` . Then it splits
264
+ segments into components separated by `.`. For each component, the analysis
265
+ reports whether the component is a valid identifier, a built-in identifier like
266
+ ` dynamic ` or ` covariant ` , or a reserved word like ` for ` or ` if ` .
266
267
267
- (For the very small number that aren't, they can continue to use the old quoted
268
- "package:" import syntax to import the library.)
268
+ Components that are not some kind of identifier (regular, reserved, or built-in)
269
+ are vanishingly rare. In those few cases, if a user can't simply rename the
270
+ file, they can continue to use the old quoted "package:" syntax to refer to the
271
+ file.
269
272
270
- I think this approach is much simpler than trying to add special lexing rules.
271
- It's consistent with how Java, C# and other languages parse their imports. It
272
- does mean users can do silly things like:
273
+ ### Reserved words and semi-reserved words
274
+
275
+ One confusing area of Dart that the previous table hints at is that Dart has
276
+ several categories of identifiers that vary in how user-accessible they are:
277
+
278
+ * Reserved words like ` for ` and ` class ` can never be used by a user as a
279
+ regular identifier in any context.
280
+
281
+ * Built-in identifiers like ` abstract ` and ` interface ` can't be used as * type*
282
+ names but can be used as other kinds of identifiers.
283
+
284
+ * Contextual keywords like ` await ` and ` show ` behave like keywords in some
285
+ specific contexts but are usable as regular identifiers everywhere else.
286
+
287
+ This leads to confusion about which of these flavors of identifiers can be used
288
+ as package paths. Which of these, if any, are valid:
289
+
290
+ ``` dart
291
+ import if/else;
292
+ import abstract/interface;
293
+ import show/hide;
294
+ ```
295
+
296
+ Many Dart users (including experts, some of whom may be members of the Dart
297
+ language team) don't know the full list of reserved or semi-reserved words. We
298
+ don't want users to run into problems determining which identifiers work in
299
+ package paths. To that end, we allow * all* reserved words and identifiers,
300
+ including built-in identifiers and contextual keywords, as path components.
301
+
302
+ ### Whitespace and comments
303
+
304
+ Even though the unquoted path is tokenized as separate tokens, we don't allow
305
+ whitespace or comments to appear between them as we do in most other places in
306
+ the language.
307
+
308
+ We could allow users to write code like:
273
309
274
310
``` dart
275
311
import strange /* comment */ . but
@@ -281,7 +317,37 @@ import strange /* comment */ . but
281
317
fine;
282
318
```
283
319
284
- But they can also choose to * not* do that.
320
+ This wouldn't cause any problems for a Dart implementation. It would simply
321
+ discard the whitespace and comments as it does elsewhere and the resulting path
322
+ is ` strange.but/another/fine ` .
323
+
324
+ However, it likely causes problems for Dart * users* and other simpler tools and
325
+ scripts that work with Dart code. In particular, we often see homegrown tools
326
+ that want to "parse" a Dart file to find its package references and traverse the
327
+ dependency graph. While these tools ideally should use a full Dart parser (like
328
+ the one in the [ analyzer package] [ ] , which is freely available), the reality is
329
+ that users often cobble together simple scripts using regex to do this kind of
330
+ parsing, or they need to write these tools in a language other than Dart. In
331
+ those cases, if the package path happens to contain whitespace or comments, the
332
+ tool will likely silently fail to recognize the package path.
333
+
334
+ [ analyzer package ] : https://pub.dev/packages/analyzer
335
+
336
+ Also, we find no compelling * use* for whitespace and comments inside package
337
+ paths. To that end, this proposal makes it an error. All of the tokens in the
338
+ path must be directly adjacent with no whitespace, newlines, or comments between
339
+ them. The previous import is an error. However, we still allow comments in or
340
+ after the directives outside of the path. These are all valid:
341
+
342
+ ``` dart
343
+ import /* Weird but OK. */ some/path;
344
+ export some/path; // Hi there.
345
+ part some/path // Before the semicolon? Really?
346
+ ;
347
+ ```
348
+
349
+ The syntax that results from the above few sections is simple to tokenize and
350
+ parse while looking like a single opaque "unquoted string" to users and tools.
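
Because a path can never contain whitespace or comments, a simple tool can plausibly recognize it with one regular expression. The following is a minimal sketch, not part of the proposal. The names `component`, `segment`, `unquotedDirective`, and `findUnquotedPaths` are hypothetical. It only looks at `import` and `export` directives that start a line, it misses directives with a comment between the keyword and the path, and it can be fooled by directive-like text inside strings or comments.

```dart
// Assumption: a path component looks like an ordinary Dart identifier or
// keyword: ASCII letters, digits, `_`, and `$`, not starting with a digit.
const component = r'[A-Za-z_$][A-Za-z0-9_$]*';

// A segment is one or more components joined by `.` with nothing in between.
const segment = '$component(?:\\.$component)*';

// Matches `import` or `export` at the start of a line, followed by an
// unquoted path. Quoted URIs do not match because `'` and `"` cannot start
// a component.
final unquotedDirective = RegExp(
  '^\\s*(?:import|export)\\s+($segment(?:/$segment)*)',
  multiLine: true,
);

/// Returns the unquoted package paths found in [source], in order.
Iterable<String> findUnquotedPaths(String source) =>
    unquotedDirective.allMatches(source).map((m) => m.group(1)!);
```

A real tool should still prefer a full parser. The sketch only illustrates why the "no whitespace or comments" rule keeps this kind of script workable.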
285
351
286
352
## Syntax
287
353
@@ -291,54 +357,57 @@ We add a new rule and hang it off the existing `uri` rule already used by import
291
357
and export directives:
292
358
293
359
```
294
- uri ::= stringLiteral | packagePath
295
- packagePath ::= packagePathSegment ( '/' packagePathSegment )*
296
- packagePathSegment ::= dottedIdentifierList
297
- dottedIdentifierList ::= identifier ('.' identifier)*
360
+ uri ::= stringLiteral | packagePath
361
+ packagePath ::= pathSegment ( '/' pathSegment )*
362
+ pathSegment ::= segmentComponent ( '.' segmentComponent )*
363
+ segmentComponent ::= IDENTIFIER
364
+ | RESERVED_WORD
365
+ | BUILT_IN_IDENTIFIER
366
+ | OTHER_IDENTIFIER
298
367
```
299
368
300
- An import or export can continue to use a ` stringLiteral ` for the quoted form
301
- (which is what they will do for relative imports). But they can also use a
302
- ` packagePath ` , which is a slash-separated series of segments, each of which is a
303
- series of dot-separated identifiers. * (The ` dottedIdentifierList ` rule is
304
- already in the grammar and is shown here for clarity.)*
369
+ It is a compile-time error if any whitespace, newlines, or comments occur
370
+ between any of the ` segmentComponent ` , ` / ` , or ` . ` tokens in a ` packagePath ` .
371
+ * In other words, there can be nothing except the terminals themselves from the
372
+ first ` segmentComponent ` in the ` packagePath ` to the last.*
373
+
374
+ * An import, export, or part directive can continue to use a ` stringLiteral ` for
375
+ the quoted form (which is what they will do for relative references). But they
376
+ can also use a ` packagePath ` , which is a slash-separated series of segments,
377
+ each of which is a series of dot-separated components.*
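
As a rough illustration of the grammar and the adjacency rule above, the sketch below checks a candidate path given as a plain string. It is not the analyzer's implementation. `parsePackagePath` and `_componentPattern` are hypothetical names, and the sketch assumes every `segmentComponent` (identifier, built-in identifier, or reserved word) consists of ASCII letters, digits, `_`, and `$` and does not start with a digit.

```dart
// Assumption: every kind of segmentComponent has the lexical shape of an
// identifier, so reserved words like `if` are accepted as well.
final _componentPattern = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// Splits [text] into segments, each a list of components, or returns null
/// if [text] is not a well-formed package path.
///
/// Whitespace, comments, hyphens, and empty segments all produce a component
/// that fails the pattern, so they are rejected.
List<List<String>>? parsePackagePath(String text) {
  final segments = <List<String>>[];
  for (final segment in text.split('/')) {
    final components = segment.split('.');
    if (!components.every(_componentPattern.hasMatch)) return null;
    segments.add(components);
  }
  return segments;
}
```

Under these assumptions, `parsePackagePath('if/else')` returns a two-segment result, while `parsePackagePath('weird-package/but-ok')` and `parsePackagePath('strange /* comment */ . but')` both return null.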
305
378
306
379
### Part directive lookahead
307
380
308
- * There are two directives for working with part files, ` part ` and ` part of ` . The
309
- ` of ` identifier is not a reserved word in Dart. This means that when the parser
310
- sees ` part of ` , it doesn't immediately know if it is looking at a ` part `
311
- directive followed by an unquoted identifier like ` part of; ` or `part
312
- of.some/other.thing;` versus a ` part of` directive like ` part of thing;` or
313
- ` part of 'uri.dart'; ` It must lookahead past the ` of ` identifier to see if the
314
- next token is ` ; ` , ` . ` , ` / ` , or another identifier.*
381
+ * There are two directives for working with part files, ` part ` and ` part of ` .
382
+ This means that when the parser sees ` part of ` , it doesn't immediately know if
383
+ it is looking at a ` part ` directive followed by an unquoted identifier like
384
+ ` part of; ` or ` part of.some/other.thing; ` versus a ` part of ` directive like
385
+ `part of thing;` or `part of 'uri.dart';`. It must look ahead past the `of`
386
+ identifier to see if the next token is ` ; ` , ` . ` , ` / ` , or another identifier.*
315
387
316
388
* This may add some complexity to parsing, but should be minor. Dart's grammar
317
389
has other places that require much more (sometimes unbounded) lookahead.*
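
The lookahead described above can be sketched as follows. This is a simplification rather than how any real Dart parser is written: tokens are modeled as plain strings, and `PartKind` and `classifyPartDirective` are hypothetical names.

```dart
enum PartKind { part, partOf }

/// Classifies a directive whose first token is assumed to be `part`.
///
/// If the second token is `of`, one more token decides the case: `;`, `.`,
/// or `/` means `of` is the start of an unquoted path, so this is a `part`
/// directive. Anything else (an identifier or a string literal) means this
/// is a `part of` directive.
PartKind classifyPartDirective(List<String> tokens) {
  if (tokens.length > 2 && tokens[1] == 'of') {
    final next = tokens[2];
    if (next != ';' && next != '.' && next != '/') return PartKind.partOf;
  }
  return PartKind.part;
}
```

For example, the tokens of `part of;` and `part of.some/other.thing;` classify as `PartKind.part`, while those of `part of thing;` and `part of 'uri.dart';` classify as `PartKind.partOf`.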
318
390
319
391
## Static semantics
320
392
321
393
The semantics of the new syntax are defined by taking the ` packagePath ` and
322
- converting it to a string. The directive then behaves as if the user had written
323
- a string literal containing that string . The process is:
394
+ converting it to a URI string. The directive then behaves as if the user had
395
+ written a string literal containing that URI . The process is:
324
396
325
- 1 . Let the * segment* for a ` packagePathSegment ` be a string defined by the
326
- ordered concatenation of the ` identifier ` and ` . ` terminals in the
327
- ` packagePathSegment ` , with all whitespace and comments removed. * So if
328
- ` packagePathSegment ` is ` a . b /* comment */ . c ` , then its * segment* is
397
+ 1 . Let the * segment* for a ` pathSegment ` be a string defined by the ordered
398
+ concatenation of the ` segmentComponent ` and ` . ` terminals in the
399
+ ` pathSegment ` . * So if ` pathSegment ` is ` a.b.c ` , then its * segment* is
329
400
"a.b.c".*
330
401
331
- 2 . Let * segments* be an ordered list of the segments of each
332
- ` packagePathSegment ` in ` packagePath ` . * In other words, this and the
333
- preceding step take the ` packagePath ` and convert it to a list of segment
334
- strings while discarding whitespace and comments. So if ` packagePathSegment `
335
- is ` a . b /* comment */ / c / d . e ` , then * segments* is [ "a.b", "c",
336
- "d.e"] .*
402
+ 2 . Let * segments* be an ordered list of the segments of each ` pathSegment ` in
403
+ ` packagePath ` . * In other words, this and the preceding step take the
404
+ ` packagePath ` and convert it to a list of segment strings. So if
405
+ `packagePath` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*
337
406
338
407
3 . If the first segment in * segments* is "dart":
339
408
340
- 1 . It is a compile error if there are no subsequent segments. * There's no
341
- "dart: dart " or "package: dart /dart.dart" library. We reserve the right
409
+ 1 . It is a compile-time error if there are no subsequent segments. * There's
410
+ no "dart: dart " or "package: dart /dart.dart" library. We reserve the right
342
411
to use ` import dart; ` in the future to mean something useful.*
343
412
344
413
2 . Let * path* be the concatenation of the remaining segments, separated
@@ -347,38 +416,38 @@ a string literal containing that string. The process is:
347
416
imports. But a custom Dart embedder or future version of Dart could in
348
417
theory introduce directories for SDK libraries.*
349
418
350
- 3 . The URI is "dart:* path* ". * So ` import dart/async; ` desugars to
351
- ` import "dart:async"; ` .*
419
+ 3 . The URI is "dart:* path* ". * So ` import dart/async; ` imports the library
420
+ ` "dart:async" ` .*
352
421
353
422
4 . Else if there is only a single segment:
354
423
355
424
1 . Let * name* be the segment.
356
425
357
- 2 . Let * path* be the last identifier in the segment. * If the segment is
358
- only a single identifier , this is the entire segment. Otherwise, it's
359
- the last identifier after the last ` . ` . So in ` foo ` , * path * is ` foo ` .
360
- In ` foo.bar.baz ` , it's ` baz ` .*
426
+ 2 . Let * path* be the last ` segmentComponent ` in the segment. * If the
427
+ segment is only a single ` segmentComponent ` , this is the entire segment.
428
+ Otherwise, it's the last component after the last `.`. So in `foo`,
429
+ * path * is ` foo ` . In ` foo.bar.baz ` , it's ` baz ` .*
361
430
362
- 3 . The URI is "package:* name* /* path* .dart". * So ` import test; ` desugars to
363
- ` import "package:test/test.dart"; ` , and ` import server.api; ` desugars
364
- to ` import "package:server.api/api.dart"; ` .*
431
+ 3 . The URI is "package:* name* /* path* .dart". * So ` import test; ` imports the
432
+ library ` "package:test/test.dart" ` , and ` import server.api; ` imports
433
+ ` "package:server.api/api.dart" ` .*
365
434
366
435
5 . Else:
367
436
368
437
1 . Let * path* be the concatenation of the segments, separated by ` / ` .
369
438
370
- 3 . The URI is "package:* path* .dart". * So ` import a/b/c/d; ` desugars to
371
- ` import "package:a/b/c/d.dart"; ` .
439
+ 2 . The URI is "package:* path* .dart". * So ` import a/b/c/d; ` imports
440
+ `"package:a/b/c/d.dart"`.*
372
441
373
442
Once the ` packagePath ` has been converted to a string, the directive behaves
374
443
exactly as if the user had written a ` stringLiteral ` containing that same
375
444
string.
376
445
377
- Given the list of segments, here is a complete implementation of the desugaring
378
- logic in Dart :
446
+ Given the list of segments, here is a complete Dart implementation of the logic
447
+ to convert an unquoted path to the effective URI it refers to :
379
448
380
449
``` dart
381
- String desugar (List<String> segments) => switch (segments) {
450
+ String toUri (List<String> segments) => switch (segments) {
382
451
['dart'] => 'ERROR. Not allowed to import just "dart"',
383
452
['dart', ...var rest] => 'dart:${rest.join('/')}',
384
453
[var name] => 'package:$name/${name.split('.').last}.dart',
@@ -409,15 +478,15 @@ may make a breaking change and remove support for the old syntax.
409
478
410
479
The ` part of ` directive allows a library name after ` of ` instead of a string
411
480
literal. With this proposal, that syntax is now ambiguous. Is it interpreted
412
- as a library name, or as an unquoted URI that should be desugared to a URI?
481
+ as a library name, or as an unquoted URI that should be converted to a URI?
413
482
In other words, given:
414
483
415
484
``` dart
416
485
part of foo.bar;
417
486
```
418
487
419
488
Is the file saying it's a part of the library containing ` library foo.bar; ` or
420
- that it's part of the library found at URI ` package:foo/bar.dart ` ?
489
+ that it's part of the library found at URI `package:foo.bar/bar.dart`?
421
490
422
491
Library names in ` part of ` directives have been deprecated for many years
423
492
because the syntax doesn't work well with many tools. How is a given tool
@@ -463,7 +532,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
463
532
library at `package:foo.bar/bar.dart`, not part of the library with name `foo.bar`.
464
533
465
534
Users affected by the breakage can and should update their ` part of ` directive
466
- to point to the URI of the library that the file is a part, using either the
535
+ to point to the URI of the library that the file is a part of , using either the
467
536
quoted or unquoted syntax.
468
537
469
538
### Language versioning
@@ -487,7 +556,7 @@ Since the static semantics are so simple, it is trivial to write a `dart fix`
487
556
that automatically converts existing "dart:" and "package:" string-based
488
557
directives to the new syntax. A handful of regexes are sufficient to break an
489
558
existing import into a series of slash-separated segments which are
490
- dot-separated identifiers . Then the above snippet of Dart code will convert that
559
+ dot-separated components . Then the above snippet of Dart code will convert that
491
560
to the new syntax.
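
One way such a fix might go from an existing URI string to the unquoted form is sketched below. This is an assumption-laden illustration, not the actual `dart fix` implementation. `toUnquoted`, `_isSegment`, and `_component` are hypothetical names, and a component is assumed to be ASCII letters, digits, `_`, and `$`, not starting with a digit. The function returns null when the URI has no unquoted equivalent under the rules in the static semantics section.

```dart
final _component = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// Whether [s] is a dot-separated series of valid components.
bool _isSegment(String s) => s.split('.').every(_component.hasMatch);

/// Converts a "dart:" or "package:" URI to an unquoted path, or returns
/// null if it cannot be expressed in the unquoted syntax.
String? toUnquoted(String uri) {
  if (uri.startsWith('dart:')) {
    final segments = uri.substring('dart:'.length).split('/');
    return segments.every(_isSegment) ? 'dart/${segments.join('/')}' : null;
  }

  // Only "package:" URIs ending in ".dart" have an unquoted form.
  if (!uri.startsWith('package:') || !uri.endsWith('.dart')) return null;
  final path =
      uri.substring('package:'.length, uri.length - '.dart'.length);
  final segments = path.split('/');
  if (segments.length < 2 || !segments.every(_isSegment)) return null;

  // An unquoted path starting with `dart` always means a "dart:" library.
  if (segments.first == 'dart') return null;

  // "package:name/last.dart" collapses to just `name` when `last` is the
  // final dotted component of the package name.
  if (segments.length == 2 && segments[1] == segments[0].split('.').last) {
    return segments[0];
  }
  return segments.join('/');
}
```

Under these assumptions, `toUnquoted('package:test/test.dart')` yields `test` and `toUnquoted('package:flutter/material.dart')` yields `flutter/material`, while a URI with a hyphenated file name yields null and stays quoted.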
492
561
493
562
### Lint
@@ -501,6 +570,12 @@ new unquoted style whenever an existing directive could use it.
501
570
502
571
## Changelog
503
572
573
+ ### 0.4
574
+
575
+ - Allow reserved words and built-in identifiers as path components (#3984).
576
+
577
+ - Disallow whitespace and comments inside package paths (#3983).
578
+
504
579
### 0.3
505
580
506
581
- Address breaking change in ` part of ` directives with library names.