Commit 94194ce

Allow reserved words in unquoted imports but disallow whitespace. (#4035)
Allow reserved words in unquoted imports but disallow whitespace. Fix #3983. Fix #3984.
1 parent b94de0e commit 94194ce

1 file changed

+155
-80
lines changed

working/unquoted-imports/feature-specification.md

Lines changed: 155 additions & 80 deletions
@@ -4,7 +4,7 @@ Author: Bob Nystrom
44

55
Status: In-progress
66

7-
Version 0.3 (see [CHANGELOG](#CHANGELOG) at end)
7+
Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)
88

99
Experiment flag: unquoted-imports
1010

@@ -107,18 +107,18 @@ import widget.tla.proto/client/component;
107107
```
108108

109109
You can probably infer what's going on from the before and after, but the basic
110-
idea is that the library is a slash-separated series of dotted identifier
111-
segments. The first segment is the name of the package. The rest is the path to
112-
the library within that package. A `.dart` extension is implicitly added to the
113-
end. If there is only a single segment, it is treated as the package name and
114-
its last dotted component is the path. If the package name is `dart`, it's a
115-
"dart:" library import.
110+
idea is that the library is a slash-separated series of path segments, each of
111+
which is a dot-separated series of identifier components. The first segment is the name
112+
of the package. The rest is the path to the library within that package. A
113+
`.dart` extension is implicitly added to the end. If there is only a single
114+
segment, it is treated as the package name and its last dotted component is the
115+
path. If the package name is `dart`, it's a "dart:" library import.
116116

117117
The way I think about the proposed syntax is that relative imports are
118118
*physical* in that they specify the actual relative path on the file system from
119119
the current library to another library *file*. Because those are physical file
120120
paths, they use string literals and file extensions as they do today. SDK and
121-
package imports are *logical* in that you don't know where the library your
121+
package imports are *logical* in that you don't know where the library you're
122122
importing lives on your disk. What you know is its *logical name* and the
123123
relative location of the library you want inside that package. Since these are
124124
abstract references to a *library*, they are unquoted and omit the file
@@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:
160160

161161
### Path separator
162162

163-
An import shorthand syntax that only supported a single identifier would work
163+
A package shorthand syntax that only supported a single identifier would work
164164
for packages like `test` and `args` that only expose a single library, but
165165
would fail for even very common libraries like `package:flutter/material.dart`.
166166
So we need some notion of a package name and a path within that package.
@@ -220,27 +220,29 @@ import flutter/material;
220220
Is the `flutter/material` part a single token or three (`flutter`, `/`, and
221221
`material`)? The main advantage of tokenizing it as a single monolithic token is
222222
that we could potentially allow characters or identifiers in there that aren't
223-
otherwise valid Dart. For example, we could let you use reserved words as path
224-
segments:
223+
otherwise valid Dart. For example, we could let you use hyphens as word
224+
separators as in:
225225

226226
```dart
227-
import weird_package/for/if/ok;
227+
import weird-package/but-ok;
228228
```
229229

230230
The disadvantage is that the tokenizer doesn't generally have enough context to
231-
know when it should tokenize `foo/bar` as a single import path token versus
231+
know when it should tokenize `foo/bar` as a single package path token versus
232232
three tokens that are presumably dividing two variables named `foo` and `bar`.
233233

234-
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize an
235-
import path as a single token. Instead, it's tokenized using Dart's current
234+
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize a
235+
package path as a single token. Instead, it's tokenized using Dart's current
236236
lexical grammar.
237237

238-
This means you can't have a path segment that's a reserved word or is otherwise
239-
not a valid Dart identifier. Fortunately, our published guidance has *always*
240-
told users that [package names][name guideline] and [directories][directory
241-
guideline] should be valid Dart identifiers. Pub will complain if you try to
242-
publish a package whose name isn't a valid identifier. Likewise, the linter will
243-
flag directory or library names that aren't identifiers.
238+
This means you can't have a path component that uses some combination of
239+
characters that isn't currently a single token in Dart, like `hyphen-separated`
240+
or `123LeadingDigits`. A path component must be an identifier (including
241+
built-in identifiers) or a reserved word. Fortunately, our published guidance
242+
has *always* told users that [package names][name guideline] and
243+
[directories][directory guideline] should be valid Dart identifiers. Pub will
244+
complain if you try to publish a package whose name isn't a valid identifier.
245+
Likewise, the linter will flag file names that aren't identifiers.
244246

245247
[name guideline]: https://dart.dev/tools/pub/pubspec#name
246248
[directory guideline]: https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
@@ -258,18 +260,52 @@ in a large corpus of pub packages and open source widgets:
258260
69 ( 0.010%): dotted with non-identifiers =
259261
```
260262

261-
This splits every "package:" import's path into segments separated by `/`. Then
262-
for each segment, it reports whether the segment is a valid identifier, a
263-
built-in identifier like `dynamic` or `covariant`, etc. Almost all segments are
264-
either valid identifiers, or dotted identifiers where each subcomponent is a
265-
valid identifier.
263+
This splits every "package:" path into segments separated by `/`. Then it splits
264+
segments into components separated by `.`. For each component, the analysis
265+
reports whether the component is a valid identifier, a built-in identifier like
266+
`dynamic` or `covariant`, or a reserved word like `for` or `if`.
266267

267-
(For the very small number that aren't, they can continue to use the old quoted
268-
"package:" import syntax to import the library.)
268+
Components that are not some kind of identifier (regular, reserved, or built-in)
269+
are vanishingly rare. In those few cases, if a user can't simply rename the
270+
file, they can continue to use the old quoted "package:" syntax to refer to the
271+
file.
269272

270-
I think this approach is much simpler than trying to add special lexing rules.
271-
It's consistent with how Java, C# and other languages parse their imports. It
272-
does mean users can do silly things like:
273+
### Reserved words and semi-reserved words
274+
275+
One confusing area of Dart that the previous table hints at is that Dart has
276+
several categories of identifiers that vary in how user-accessible they are:
277+
278+
* Reserved words like `for` and `class` can never be used by a user as a
279+
regular identifier in any context.
280+
281+
* Built-in identifiers like `abstract` and `interface` can't be used as *type*
282+
names but can be used as other kinds of identifiers.
283+
284+
* Contextual keywords like `await` and `show` behave like keywords in some
285+
specific contexts but are usable as regular identifiers everywhere else.
286+
287+
This leads to confusion about which of these flavors of identifiers can be used
288+
as package paths. Which of these, if any, are valid:
289+
290+
```dart
291+
import if/else;
292+
import abstract/interface;
293+
import show/hide;
294+
```
295+
296+
Many Dart users (including experts, some of whom may be members of the Dart
297+
language team) don't know the full list of reserved or semi-reserved words. We
298+
don't want users to run into problems determining which identifiers work in
299+
package paths. To that end, we allow *all* reserved words and identifiers,
300+
including built-in identifiers and contextual keywords as path components.
301+
302+
### Whitespace and comments
303+
304+
Even though the unquoted path is tokenized as separate tokens, we don't allow
305+
whitespace or comments to appear between them as we do in most other places in
306+
the language.
307+
308+
We could allow users to write code like:
273309

274310
```dart
275311
import strange /* comment */ . but
@@ -281,7 +317,37 @@ import strange /* comment */ . but
281317
fine;
282318
```
283319

284-
But they can also choose to *not* do that.
320+
This wouldn't cause any problems for a Dart implementation. It would simply
321+
discard the whitespace and comments as it does elsewhere and the resulting path
322+
is `strange.but/another/fine`.
323+
324+
However, it likely causes problems for Dart *users* and other simpler tools and
325+
scripts that work with Dart code. In particular, we often see homegrown tools
326+
that want to "parse" a Dart file to find its package references and traverse the
327+
dependency graph. While these tools ideally should use a full Dart parser (like
328+
the one in the [analyzer package][], which is freely available), the reality is
329+
that users often cobble together simple scripts using regex to do this kind of
330+
parsing, or they need to write these tools in a language other than Dart. In
331+
those cases, if the package path happens to contain whitespace or comments, the
332+
tool will likely silently fail to recognize the package path.
333+
334+
[analyzer package]: https://pub.dev/packages/analyzer
335+
336+
Also, we find no compelling *use* for whitespace and comments inside package
337+
paths. To that end, this proposal makes it an error. All of the tokens in the
338+
path must be directly adjacent with no whitespace, newlines, or comments between
339+
them. The previous import is an error. However, we still allow comments in or
340+
after the directives outside of the path. These are all valid:
341+
342+
```dart
343+
import /* Weird but OK. */ some/path;
344+
export some/path; // Hi there.
345+
part some/path // Before the semicolon? Really?
346+
;
347+
```
348+
349+
The syntax that results from the above few sections is simple to tokenize and
350+
parse while looking like a single opaque "unquoted string" to users and tools.
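
To illustrate the "single opaque unquoted string" point, here is a minimal sketch of the kind of homegrown scanner described above. It is an illustration only and not part of the proposal: the function name `unquotedPackagePaths` is hypothetical, it only looks at `import` and `export` directives, and it does not handle a comment placed between the keyword and the path.

```dart
// Hypothetical helper, not part of the proposal. Because a package path may
// not contain whitespace or comments, one regular expression can find it.

// A path component: an identifier-shaped word (reserved words included).
final _component = r'[A-Za-z_$][A-Za-z0-9_$]*';

// A segment is dot-separated components; a path is slash-separated segments.
final _segment = '${_component}(?:\\.${_component})*';
final _path = '${_segment}(?:/${_segment})*';

// Matches `import <path>` or `export <path>` at the start of a line, where
// the path is immediately followed by whitespace or `;`.
final _directive = RegExp(
  '^\\s*(?:import|export)\\s+(${_path})(?=[\\s;])',
  multiLine: true,
);

/// Returns the unquoted package paths found in [source], in order.
Iterable<String> unquotedPackagePaths(String source) =>
    _directive.allMatches(source).map((match) => match.group(1)!);
```

Quoted directives such as `import 'src/local.dart';` are skipped because a quote cannot start a path component.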
285351

286352
## Syntax
287353

@@ -291,54 +357,57 @@ We add a new rule and hang it off the existing `uri` rule already used by import
291357
and export directives:
292358

293359
```
294-
uri ::= stringLiteral | packagePath
295-
packagePath ::= packagePathSegment ( '/' packagePathSegment )*
296-
packagePathSegment ::= dottedIdentifierList
297-
dottedIdentifierList ::= identifier ('.' identifier)*
360+
uri ::= stringLiteral | packagePath
361+
packagePath ::= pathSegment ( '/' pathSegment )*
362+
pathSegment ::= segmentComponent ( '.' segmentComponent )*
363+
segmentComponent ::= IDENTIFIER
364+
| RESERVED_WORD
365+
| BUILT_IN_IDENTIFIER
366+
| OTHER_IDENTIFIER
298367
```
299368

300-
An import or export can continue to use a `stringLiteral` for the quoted form
301-
(which is what they will do for relative imports). But they can also use a
302-
`packagePath`, which is a slash-separated series of segments, each of which is a
303-
series of dot-separated identifiers. *(The `dottedIdentifierList` rule is
304-
already in the grammar and is shown here for clarity.)*
369+
It is a compile-time error if any whitespace, newlines, or comments occur
370+
between any of the `segmentComponent`, `/`, or `.` tokens in a `packagePath`.
371+
*In other words, there can be nothing except the terminals themselves from the
372+
first `segmentComponent` in the `packagePath` to the last.*
373+
374+
*An import, export, or part directive can continue to use a `stringLiteral` for
375+
the quoted form (which is what they will do for relative references). But they
376+
can also use a `packagePath`, which is a slash-separated series of segments,
377+
each of which is a series of dot-separated components.*
305378

306379
### Part directive lookahead
307380

308-
*There are two directives for working with part files, `part` and `part of`. The
309-
`of` identifier is not a reserved word in Dart. This means that when the parser
310-
sees `part of`, it doesn't immediately know if it is looking at a `part`
311-
directive followed by an unquoted identifier like `part of;` or `part
312-
of.some/other.thing;` versus a `part of` directive like `part of thing;` or
313-
`part of 'uri.dart';` It must lookahead past the `of` identifier to see if the
314-
next token is `;`, `.`, `/`, or another identifier.*
381+
*There are two directives for working with part files, `part` and `part of`.
382+
This means that when the parser sees `part of`, it doesn't immediately know if
383+
it is looking at a `part` directive followed by an unquoted identifier like
384+
`part of;` or `part of.some/other.thing;` versus a `part of` directive like
385+
`part of thing;` or `part of 'uri.dart';`. It must look ahead past the `of`
386+
identifier to see if the next token is `;`, `.`, `/`, or another identifier.*
315387

316388
*This may add some complexity to parsing, but should be minor. Dart's grammar
317389
has other places that require much more (sometimes unbounded) lookahead.*
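
As a rough sketch of that lookahead, assuming the directive has already been split into a list of token strings, the decision could look like the following. The `PartKind` enum and `classifyPartDirective` function are hypothetical names used only for this illustration; a real parser works on its own token type.

```dart
/// Hypothetical result type for this sketch.
enum PartKind { part, partOf }

/// Decides whether [tokens] (starting with `part`) form a `part` directive
/// or a `part of` directive by looking one token past `of`.
PartKind classifyPartDirective(List<String> tokens) {
  if (tokens.length < 2 || tokens[0] != 'part') {
    throw ArgumentError('Expected a directive starting with "part".');
  }

  // Anything other than `of` after `part` is the start of a URI or path.
  if (tokens[1] != 'of') return PartKind.part;

  // `of` followed by `;`, `.`, or `/` means `of` is itself the first
  // component of an unquoted path, as in `part of;` or `part of.some/thing;`.
  final next = tokens.length > 2 ? tokens[2] : '';
  if (const {';', '.', '/'}.contains(next)) return PartKind.part;

  // Otherwise `of` is the keyword, as in `part of thing;` or
  // `part of 'uri.dart';`.
  return PartKind.partOf;
}
```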
318390

319391
## Static semantics
320392

321393
The semantics of the new syntax are defined by taking the `packagePath` and
322-
converting it to a string. The directive then behaves as if the user had written
323-
a string literal containing that string. The process is:
394+
converting it to a URI string. The directive then behaves as if the user had
395+
written a string literal containing that URI. The process is:
324396

325-
1. Let the *segment* for a `packagePathSegment` be a string defined by the
326-
ordered concatenation of the `identifier` and `.` terminals in the
327-
`packagePathSegment`, with all whitespace and comments removed. *So if
328-
`packagePathSegment` is `a . b /* comment */ . c`, then its *segment* is
397+
1. Let the *segment* for a `pathSegment` be a string defined by the ordered
398+
concatenation of the `segmentComponent` and `.` terminals in the
399+
`pathSegment`. *So if `pathSegment` is `a.b.c`, then its *segment* is
329400
"a.b.c".*
330401

331-
2. Let *segments* be an ordered list of the segments of each
332-
`packagePathSegment` in `packagePath`. *In other words, this and the
333-
preceding step take the `packagePath` and convert it to a list of segment
334-
strings while discarding whitespace and comments. So if `packagePathSegment`
335-
is `a . b /* comment */ / c / d . e`, then *segments* is ["a.b", "c",
336-
"d.e"].*
402+
2. Let *segments* be an ordered list of the segments of each `pathSegment` in
403+
`packagePath`. *In other words, this and the preceding step take the
404+
`packagePath` and convert it to a list of segment strings. So if
405+
`packagePath` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*
337406

338407
3. If the first segment in *segments* is "dart":
339408

340-
1. It is a compile error if there are no subsequent segments. *There's no
341-
"dart:dart" or "package:dart/dart.dart" library. We reserve the right
409+
1. It is a compile-time error if there are no subsequent segments. *There's
410+
no "dart:dart" or "package:dart/dart.dart" library. We reserve the right
342411
to use `import dart;` in the future to mean something useful.*
343412

344413
2. Let *path* be the concatenation of the remaining segments, separated
@@ -347,38 +416,38 @@ a string literal containing that string. The process is:
347416
imports. But a custom Dart embedder or future version of Dart could in
348417
theory introduce directories for SDK libraries.*
349418

350-
3. The URI is "dart:*path*". *So `import dart/async;` desugars to
351-
`import "dart:async";`.*
419+
3. The URI is "dart:*path*". *So `import dart/async;` imports the library
420+
`"dart:async"`.*
352421

353422
4. Else if there is only a single segment:
354423

355424
1. Let *name* be the segment.
356425

357-
2. Let *path* be the last identifier in the segment. *If the segment is
358-
only a single identifier, this is the entire segment. Otherwise, it's
359-
the last identifier after the last `.`. So in `foo`, *path* is `foo`.
360-
In `foo.bar.baz`, it's `baz`.*
426+
2. Let *path* be the last `segmentComponent` in the segment. *If the
427+
segment is only a single `segmentComponent`, this is the entire segment.
428+
Otherwise, it's the component after the last `.`. So in `foo`,
429+
*path* is `foo`. In `foo.bar.baz`, it's `baz`.*
361430

362-
3. The URI is "package:*name*/*path*.dart". *So `import test;` desugars to
363-
`import "package:test/test.dart";`, and `import server.api;` desugars
364-
to `import "package:server.api/api.dart";`.*
431+
3. The URI is "package:*name*/*path*.dart". *So `import test;` imports the
432+
library `"package:test/test.dart"`, and `import server.api;` imports
433+
`"package:server.api/api.dart"`.*
365434

366435
5. Else:
367436

368437
1. Let *path* be the concatenation of the segments, separated by `/`.
369438

370-
3. The URI is "package:*path*.dart". *So `import a/b/c/d;` desugars to
371-
`import "package:a/b/c/d.dart";`.
439+
2. The URI is "package:*path*.dart". *So `import a/b/c/d;` imports
440+
`"package:a/b/c/d.dart"`.*
372441

373442
Once the `packagePath` has been converted to a string, the directive behaves
374443
exactly as if the user had written a `stringLiteral` containing that same
375444
string.
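
Steps 1 and 2 can be sketched as a small function that turns the text of a `packagePath` into the list of segment strings the remaining steps work on. This is an illustration under assumptions, not the proposal's implementation: `toSegments` is a hypothetical name, and the component pattern is a simplified ASCII approximation of Dart's identifier and reserved-word tokens.

```dart
/// Hypothetical helper: splits the source text of a package path into its
/// segments, rejecting anything that is not a plain run of components.
List<String> toSegments(String packagePath) {
  // Identifier-shaped component; reserved words like `if` also match.
  final component = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

  final segments = packagePath.split('/');
  for (final segment in segments) {
    for (final part in segment.split('.')) {
      // Whitespace, comments, hyphens, leading digits, and empty
      // components all fail here, mirroring the compile-time error.
      if (!component.hasMatch(part)) {
        throw FormatException('Invalid path component "$part"', packagePath);
      }
    }
  }
  return segments;
}
```

For example, `toSegments('a.b/c/d.e')` yields the three segments `a.b`, `c`, and `d.e`, while `toSegments('a . b')` throws.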
376445

377-
Given the list of segments, here is a complete implementation of the desugaring
378-
logic in Dart:
446+
Given the list of segments, here is a complete Dart implementation of the logic
447+
to convert an unquoted path to the effective URI it refers to:
379448

380449
```dart
381-
String desugar(List<String> segments) => switch (segments) {
450+
String toUri(List<String> segments) => switch (segments) {
382451
['dart'] => 'ERROR. Not allowed to import just "dart"',
383452
['dart', ...var rest] => 'dart:${rest.join('/')}',
384453
[var name] => 'package:$name/${name.split('.').last}.dart',
@@ -409,15 +478,15 @@ may make a breaking change and remove support for the old syntax.
409478

410479
The `part of` directive allows a library name after `of` instead of a string
411480
literal. With this proposal, that syntax is now ambiguous. Is it interpreted
412-
as a library name, or as an unquoted URI that should be desugared to a URI?
481+
as a library name, or as an unquoted path that should be converted to a URI?
413482
In other words, given:
414483

415484
```dart
416485
part of foo.bar;
417486
```
418487

419488
Is the file saying it's a part of the library containing `library foo.bar;` or
420-
that it's part of the library found at URI `package:foo/bar.dart`?
489+
that it's part of the library found at URI `package:foo.bar/bar.dart`?
421490

422491
Library names in `part of` directives have been deprecated for many years
423492
because the syntax doesn't work well with many tools. How is a given tool
@@ -463,7 +532,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
463532
library at `package:foo.bar/bar.dart`, not part of the library with name `foo.bar`.
464533

465534
Users affected by the breakage can and should update their `part of` directive
466-
to point to the URI of the library that the file is a part, using either the
535+
to point to the URI of the library that the file is a part of, using either the
467536
quoted or unquoted syntax.
468537

469538
### Language versioning
@@ -487,7 +556,7 @@ Since the static semantics are so simple, it is trivial to write a `dart fix`
487556
that automatically converts existing "dart:" and "package:" string-based
488557
directives to the new syntax. A handful of regexes are sufficient to break an
489558
existing import into a series of slash-separated segments which are
490-
dot-separated identifiers. Then the above snippet of Dart code will convert that
559+
dot-separated components. Then the above snippet of Dart code will convert that
491560
to the new syntax.
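
One possible shape for the string-to-path direction of such a fix is sketched below. It is only an illustration: `toUnquoted` is a hypothetical name, the component pattern is a simplified ASCII approximation, and the function returns `null` whenever the URI should stay quoted.

```dart
/// Hypothetical helper: returns the unquoted form of a "dart:" or
/// "package:" [uri], or null if the URI must remain a string literal.
String? toUnquoted(String uri) {
  final component = r'[A-Za-z_$][A-Za-z0-9_$]*';
  final segment = '${component}(?:\\.${component})*';
  final path = RegExp('^${segment}(?:/${segment})*\$');

  if (uri.startsWith('dart:')) {
    final rest = uri.substring('dart:'.length);
    return path.hasMatch(rest) ? 'dart/$rest' : null;
  }

  if (uri.startsWith('package:') && uri.endsWith('.dart')) {
    final rest =
        uri.substring('package:'.length, uri.length - '.dart'.length);
    if (!path.hasMatch(rest)) return null;

    final segments = rest.split('/');
    // A lone segment would be read as the `name/name.dart` shorthand, and a
    // leading `dart` segment would be read as an SDK import, so neither can
    // be expressed faithfully.
    if (segments.length < 2 || segments[0] == 'dart') return null;

    // "package:test/test.dart" can use the single-segment shorthand `test`.
    if (segments.length == 2 && segments[0].split('.').last == segments[1]) {
      return segments[0];
    }
    return rest;
  }

  // Relative and other URIs keep the quoted form.
  return null;
}
```

With this sketch, `'package:flutter/material.dart'` becomes `flutter/material`, while a URI containing a hyphenated name is left alone.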
492561

493562
### Lint
@@ -501,6 +570,12 @@ new unquoted style whenever an existing directive could use it.
501570

502571
## Changelog
503572

573+
### 0.4
574+
575+
- Allow reserved words and built-in identifiers as path components (#3984).
576+
577+
- Disallow whitespace and comments inside package paths (#3983).
578+
504579
### 0.3
505580

506581
- Address breaking change in `part of` directives with library names.
