Commit 98652a6

Allow reserved words in unquoted imports but disallow whitespace.
Fix #3983. Fix #3984.
1 parent 5527a8f commit 98652a6

1 file changed

+134
-61
lines changed

working/unquoted-imports/feature-specification.md

@@ -4,7 +4,7 @@ Author: Bob Nystrom
44

55
Status: In-progress
66

7-
Version 0.3 (see [CHANGELOG](#CHANGELOG) at end)
7+
Version 0.4 (see [CHANGELOG](#CHANGELOG) at end)
88

99
Experiment flag: unquoted-imports
1010

@@ -118,7 +118,7 @@ The way I think about the proposed syntax is that relative imports are
118118
*physical* in that they specify the actual relative path on the file system from
119119
the current library to another library *file*. Because those are physical file
120120
paths, they use string literals and file extensions as they do today. SDK and
121-
package imports are *logical* in that you don't know where the library your
121+
package imports are *logical* in that you don't know where the library you're
122122
importing lives on your disk. What you know is its *logical name* and the
123123
relative location of the library you want inside that package. Since these are
124124
abstract references to a *library*, they are unquoted and omit the file
@@ -160,7 +160,7 @@ the reasons for the choices this proposal makes:
160160

161161
### Path separator
162162

163-
An import shorthand syntax that only supported a single identifier would work
163+
A package shorthand syntax that only supported a single identifier would work
164164
for packages like `test` and `args` that only expose a single library, but
165165
would fail for even very common libraries like `package:flutter/material.dart`.
166166
So we need some notion of a package name and a path within that package.
@@ -220,27 +220,30 @@ import flutter/material;
220220
Is the `flutter/material` part a single token or three (`flutter`, `/`, and
221221
`material`)? The main advantage of tokenizing it as a single monolithic token is
222222
that we could potentially allow characters or identifiers in there that aren't
223-
otherwise valid Dart. For example, we could let you use reserved words as path
224-
segments:
223+
otherwise valid Dart. For example, we could let you use hyphens as word
224+
separators as in:
225225

226226
```dart
227-
import weird_package/for/if/ok;
227+
import weird-package/but-ok;
228228
```
229229

230230
The disadvantage is that the tokenizer doesn't generally have enough context to
231-
know when it should tokenize `foo/bar` as a single import path token versus
231+
know when it should tokenize `foo/bar` as a single package path token versus
232232
three tokens that are presumably dividing two variables named `foo` and `bar`.
233233

234-
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize an
235-
import path as a single token. Instead, it's tokenized using Dart's current
234+
Unlike Lasse's [earlier proposal][lasse], this proposal does *not* tokenize a
235+
package path as a single token. Instead, it's tokenized using Dart's current
236236
lexical grammar.
237237

238-
This means you can't have a path segment that's a reserved word or is otherwise
239-
not a valid Dart identifier. Fortunately, our published guidance has *always*
240-
told users that [package names][name guideline] and [directories][directory
241-
guideline] should be valid Dart identifiers. Pub will complain if you try to
242-
publish a package whose name isn't a valid identifier. Likewise, the linter will
243-
flag directory or library names that aren't identifiers.
238+
This means you can't have a path segment that uses some combination of
239+
characters that isn't currently a single token in Dart, like `hyphen-separated`
240+
or `123LeadingDigits`. A path component must be an identifier (which may be a
241+
reserved word or built-in identifier, discussed below). Fortunately, our
242+
published guidance has *always* told users that [package names][name guideline]
243+
and [directories][directory guideline] should be valid Dart identifiers. Pub
244+
will complain if you try to publish a package whose name isn't a valid
245+
identifier. Likewise, the linter will flag directory or file names that aren't
246+
identifiers.
244247

245248
[name guideline]: https://dart.dev/tools/pub/pubspec#name
246249
[directory guideline]: https://dart.dev/effective-dart/style#do-name-packages-and-file-system-entities-using-lowercase-with-underscores
@@ -258,18 +261,49 @@ in a large corpus of pub packages and open source widgets:
258261
69 ( 0.010%): dotted with non-identifiers =
259262
```
260263

261-
This splits every "package:" import's path into segments separated by `/`. Then
262-
for each segment, it reports whether the segment is a valid identifier, a
263-
built-in identifier like `dynamic` or `covariant`, etc. Almost all segments are
264-
either valid identifiers, or dotted identifiers where each subcomponent is a
265-
valid identifier.
264+
This splits every "package:" path into segments separated by `/`. Then it splits
265+
segments into components separated by `.`. For each component, the analysis
266+
reports whether the component is a valid identifier, a built-in identifier like
267+
`dynamic` or `covariant`, or a reserved word like `for` or `if`.
266268

267-
(For the very small number that aren't, they can continue to use the old quoted
268-
"package:" import syntax to import the library.)
269+
Components that are not some kind of identifier (regular, reserved, or built-in)
270+
are vanishingly rare. In those few cases, if a user can't simply rename the
271+
file, they can continue to use the old quoted "package:" syntax to refer to the
272+
file.
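
As a rough illustration of the analysis described above, the following sketch splits the path of a "package:" URI on `/` and then on `.`, and classifies each component. The word sets are small illustrative subsets taken from the examples in this document, not Dart's complete lists, and the function names are hypothetical.

```dart
// Illustrative subsets only; Dart's real lists are much longer.
const reservedWords = {'for', 'if', 'class'};
const builtInIdentifiers = {'dynamic', 'covariant', 'abstract', 'interface'};

// A plain identifier: a letter, `_`, or `$`, followed by letters, digits,
// `_`, or `$`.
final identifierPattern = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// Classifies a single dot-separated component of a path segment.
String classifyComponent(String component) {
  if (reservedWords.contains(component)) return 'reserved word';
  if (builtInIdentifiers.contains(component)) return 'built-in identifier';
  if (identifierPattern.hasMatch(component)) return 'identifier';
  return 'non-identifier';
}

/// Splits [path] into segments on `/`, then into components on `.`, and
/// returns the classification of every component in order.
List<String> classifyPath(String path) => [
      for (final segment in path.split('/'))
        for (final component in segment.split('.'))
          classifyComponent(component),
    ];

void main() {
  // `my-lib` contains a hyphen, so it is not a single identifier token.
  print(classifyPath('weird_package/for/my-lib.dart'));
}
```

Only components in the last category would need to fall back to the quoted "package:" syntax.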
269273

270-
I think this approach is much simpler than trying to add special lexing rules.
271-
It's consistent with how Java, C# and other languages parse their imports. It
272-
does mean users can do silly things like:
274+
### Reserved words and semi-reserved words
275+
276+
One confusing area of Dart that the previous table hints at is that Dart has
277+
several categories of identifiers that vary in how user-accessible they are:
278+
279+
* Reserved words like `for` and `class` can never be used by a user as a
280+
regular identifier in any context.
281+
282+
* Built-in identifiers like `abstract` and `interface` can't be used as *type*
283+
names but can be used as other kinds of identifiers.
284+
285+
* Contextual keywords like `await` and `show` behave like keywords in some
286+
specific contexts but are usable as regular identifiers everywhere else.
287+
288+
This leads to confusion about which of these flavors of identifiers can be used
289+
as package paths. Which of these, if any, are valid:
290+
291+
```dart
292+
import if/else;
293+
import abstract/interface;
294+
import show/hide;
295+
```
296+
297+
Many Dart users (including experts, some of whom may be members of the Dart
298+
language team) don't know the full list of reserved or semi-reserved words. We
299+
don't want them to run into problems determining which identifiers work in
300+
package paths. To that end, we allow *all* identifiers, including reserved
301+
words, built-in identifiers, and contextual keywords as path segments.
302+
303+
### Whitespace and comments
304+
305+
If we don't use any special tokenizing rules for the path, that suggests that
306+
whitespace and comments are allowed between the tokens as in:
273307

274308
```dart
275309
import strange /* comment */ . but
@@ -281,7 +315,37 @@ import strange /* comment */ . but
281315
fine;
282316
```
283317

284-
But they can also choose to *not* do that.
318+
This wouldn't cause any problems for a Dart implementation. It would simply
319+
discard the whitespace and comments as it does elsewhere and the resulting path
320+
is `strange.but/another/fine`.
321+
322+
However, it likely causes problems for Dart *users* and other simpler tools and
323+
scripts that work with Dart code. In particular, we often see homegrown tools
324+
that want to "parse" a Dart file to find its package references and traverse the
325+
dependency graph. While these tools ideally should use a full Dart parser (like
326+
the one in the [analyzer package][], which is freely available), the reality is
327+
that users often cobble together simple scripts using regex to do this kind of
328+
parsing, or they need to write these tools in a language other than Dart. In
329+
those cases, if the package path happens to contain whitespace or comments, the
330+
tool will likely silently fail to recognize the package path.
331+
332+
[analyzer package]: https://pub.dev/packages/analyzer
333+
334+
Also, we find no compelling *use* for whitespace and comments inside package
335+
paths. To that end, this proposal makes it an error. All of the tokens in the
336+
path must be directly adjacent with no whitespace, newlines, or comments between
337+
them. The previous import is an error. However, we still allow comments in or
338+
after the directives outside of the path. These are all valid:
339+
340+
```dart
341+
import /* Weird but OK. */ some/path;
342+
export some/path; // Hi there.
343+
part some/path // Before the semicolon? Really?
344+
;
345+
```
346+
347+
The syntax that results from the above few sections is simple to tokenize and
348+
parse while looking like a single opaque "unquoted string" to users and tools.
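
To make the tooling argument concrete, here is a minimal sketch of the kind of regex-based script described above. The pattern and the `unquotedPaths` name are hypothetical. It only handles `import` and `export` directives that start a line, and it does not handle a comment between the keyword and the path, so it is not a substitute for a real parser.

```dart
// Matches `import` or `export` followed by a path whose components are
// joined by `.` or `/` with nothing in between.
final unquotedDirective = RegExp(
  r'^\s*(?:import|export)\s+'
  r'([A-Za-z_$][A-Za-z0-9_$]*(?:[./][A-Za-z_$][A-Za-z0-9_$]*)*)'
  r'(?=\s|;)',
  multiLine: true,
);

/// Returns the unquoted package paths found in [source].
List<String> unquotedPaths(String source) => [
      for (final match in unquotedDirective.allMatches(source))
        match.group(1)!,
    ];

void main() {
  const source = '''
import flutter/material;
export some/path; // Hi there.
import 'relative.dart';
''';
  print(unquotedPaths(source)); // [flutter/material, some/path]
}
```

If whitespace and comments were allowed inside the path, a script like this would read `import strange /* comment */ . but` as the path `strange` and report nothing wrong.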
285349

286350
## Syntax
287351

@@ -291,27 +355,33 @@ We add a new rule and hang it off the existing `uri` rule already used by import
291355
and export directives:
292356

293357
```
294-
uri ::= stringLiteral | packagePath
295-
packagePath ::= packagePathSegment ( '/' packagePathSegment )*
296-
packagePathSegment ::= dottedIdentifierList
297-
dottedIdentifierList ::= identifier ('.' identifier)*
358+
uri ::= stringLiteral | packagePath
359+
packagePath ::= pathSegment ( '/' pathSegment )*
360+
pathSegment ::= segmentComponent ( '.' segmentComponent )*
361+
segmentComponent ::= identifier
362+
| ⟨RESERVED_WORD⟩
363+
| ⟨BUILT_IN_IDENTIFIER⟩
364+
| ⟨OTHER_IDENTIFIER⟩
298365
```
299366

300-
An import or export can continue to use a `stringLiteral` for the quoted form
301-
(which is what they will do for relative imports). But they can also use a
302-
`packagePath`, which is a slash-separated series of segments, each of which is a
303-
series of dot-separated identifiers. *(The `dottedIdentifierList` rule is
304-
already in the grammar and is shown here for clarity.)*
367+
It is a compile-time error if any whitespace, newlines, or comments occur
368+
between any of the `segmentComponent`, `/`, or `.` tokens in a `packagePath`.
369+
*In other words, there can be nothing except the terminals themselves from the
370+
first `segmentComponent` in the `packagePath` to the last.*
371+
372+
*An import, export, or part directive can continue to use a `stringLiteral` for
373+
the quoted form (which is what they will do for relative references). But they
374+
can also use a `packagePath`, which is a slash-separated series of segments,
375+
each of which is a series of dot-separated components.*
305376

306377
### Part directive lookahead
307378

308-
*There are two directives for working with part files, `part` and `part of`. The
309-
`of` identifier is not a reserved word in Dart. This means that when the parser
310-
sees `part of`, it doesn't immediately know if it is looking at a `part`
311-
directive followed by an unquoted identifier like `part of;` or `part
312-
of.some/other.thing;` versus a `part of` directive like `part of thing;` or
313-
`part of 'uri.dart';` It must lookahead past the `of` identifier to see if the
314-
next token is `;`, `.`, `/`, or another identifier.*
379+
*There are two directives for working with part files, `part` and `part of`.
380+
This means that when the parser sees `part of`, it doesn't immediately know if
381+
it is looking at a `part` directive followed by an unquoted identifier like
382+
`part of;` or `part of.some/other.thing;` versus a `part of` directive like
383+
`part of thing;` or `part of 'uri.dart';`. It must look ahead past the `of`
384+
identifier to see if the next token is `;`, `.`, `/`, or another identifier.*
315385

316386
*This may add some complexity to parsing, but should be minor. Dart's grammar
317387
has other places that require much more (sometimes unbounded) lookahead.*
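
The lookahead described above can be sketched as follows. This is an illustration only: tokens are represented as plain strings, and `PartKind` and `classifyPart` are hypothetical names, not part of any Dart implementation.

```dart
enum PartKind { part, partOf }

/// Decides which directive begins with the tokens `part`, [second], [third].
PartKind classifyPart(String second, String third) {
  // Anything other than `of` after `part` is the start of a URI or path.
  if (second != 'of') return PartKind.part;

  // `part of;`, `part of.some/thing;`, and `part of/thing;` all use `of`
  // as the first component of an unquoted path.
  if (third == ';' || third == '.' || third == '/') return PartKind.part;

  // An identifier or a string literal after `of` means `part of`.
  return PartKind.partOf;
}

void main() {
  print(classifyPart('of', ';')); // PartKind.part
  print(classifyPart('of', 'thing')); // PartKind.partOf
}
```

Only one token past `of` is inspected, which matches the claim that the added parsing complexity is minor.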
@@ -322,23 +392,20 @@ The semantics of the new syntax are defined by taking the `packagePath` and
322392
converting it to a string. The directive then behaves as if the user had written
323393
a string literal containing that string. The process is:
324394

325-
1. Let the *segment* for a `packagePathSegment` be a string defined by the
326-
ordered concatenation of the `identifier` and `.` terminals in the
327-
`packagePathSegment`, with all whitespace and comments removed. *So if
328-
`packagePathSegment` is `a . b /* comment */ . c`, then its *segment* is
395+
1. Let the *segment* for a `pathSegment` be a string defined by the ordered
396+
concatenation of the `segmentComponent` and `.` terminals in the
397+
`pathSegment`. *So if `pathSegment` is `a.b.c`, then its *segment* is
329398
"a.b.c".*
330399

331-
2. Let *segments* be an ordered list of the segments of each
332-
`packagePathSegment` in `packagePath`. *In other words, this and the
333-
preceding step take the `packagePath` and convert it to a list of segment
334-
strings while discarding whitespace and comments. So if `packagePathSegment`
335-
is `a . b /* comment */ / c / d . e`, then *segments* is ["a.b", "c",
336-
"d.e"].*
400+
2. Let *segments* be an ordered list of the segments of each `pathSegment` in
401+
`packagePath`. *In other words, this and the preceding step take the
402+
`packagePath` and convert it to a list of segment strings. So if
403+
`packagePath` is `a.b/c/d.e`, then *segments* is ["a.b", "c", "d.e"].*
337404

338405
3. If the first segment in *segments* is "dart":
339406

340-
1. It is a compile error if there are no subsequent segments. *There's no
341-
"dart:dart" or "package:dart/dart.dart" library. We reserve the right
407+
1. It is a compile-time error if there are no subsequent segments. *There's
408+
no "dart:dart" or "package:dart/dart.dart" library. We reserve the right
342409
to use `import dart;` in the future to mean something useful.*
343410

344411
2. Let *path* be the concatenation of the remaining segments, separated
@@ -354,14 +421,14 @@ a string literal containing that string. The process is:
354421

355422
1. Let *name* be the segment.
356423

357-
2. Let *path* be the last identifier in the segment. *If the segment is
358-
only a single identifier, this is the entire segment. Otherwise, it's
359-
the last identifier after the last `.`. So in `foo`, *path* is `foo`.
360-
In `foo.bar.baz`, it's `baz`.*
424+
2. Let *path* be the last `segmentComponent` in the segment. *If the
425+
segment is only a single `segmentComponent`, this is the entire segment.
426+
Otherwise, it's the last identifier after the last `.`. So in `foo`,
427+
*path* is `foo`. In `foo.bar.baz`, it's `baz`.*
361428

362429
3. The URI is "package:*name*/*path*.dart". *So `import test;` desugars to
363-
`import "package:test/test.dart";`, and `import server.api;` desugars
364-
to `import "package:server.api/api.dart";`.*
430+
`import "package:test/test.dart";`, and `import server.api;` desugars to
431+
`import "package:server.api/api.dart";`.*
365432

366433
5. Else:
367434

@@ -463,7 +530,7 @@ this proposal's semantics. In other words, `part of foo.bar;` is part of the
463530
library at `package:foo/bar.dart`, not part of the library with name `foo.bar`.
464531

465532
Users affected by the breakage can and should update their `part of` directive
466-
to point to the URI of the library that the file is a part, using either the
533+
to point to the URI of the library that the file is a part of, using either the
467534
quoted or unquoted syntax.
468535

469536
### Language versioning
@@ -501,6 +568,12 @@ new unquoted style whenever an existing directive could use it.
501568

502569
## Changelog
503570

571+
### 0.4
572+
573+
- Allow reserved words and built-in identifiers as path components (#3984).
574+
575+
- Disallow whitespace and comments inside package paths (#3983).
576+
504577
### 0.3
505578

506579
- Address breaking change in `part of` directives with library names.
