URI shorthands, allow reserved words. #3984

Closed
lrhn opened this issue Jul 15, 2024 · 8 comments
Labels
feature Proposed language feature that solves one or more problems unquoted-uris The unquoted URI feature

Comments

@lrhn
Member

lrhn commented Jul 15, 2024

I propose to allow reserved words to occur in unquoted URIs like any other identifier, because URI/file-system paths are not Dart code, and have no reason to be restricted by what happens to be Dart reserved words.

The proposal for URI shorthands allows a sequence of tokens of the form: <dottedIdentifierList> ('/' <dottedIdentifierList>)* as unquoted imports.

It's a syntactic grammar, not a lexical grammar, which means that it does not affect tokenization, which would otherwise need to be context sensitive, and so it can reuse existing grammar productions.

An example import could be import somepackage/somelibrary;.

By using <dottedIdentifierList>, it only allows identifiers as parts of the path segments of the URI, which means that reserved words are not allowed.
Directory names and URI path segments have no notion of Dart identifiers, so restricting the shorthand to Dart identifiers seems unnecessarily restrictive, and may disallow using the syntax for URIs like:

import package.for.dart/banana; // contains `for`.
import mypkg/src/new/library; // contains `new`

Dart reserved words are small common words that can occur as directory names, or as parts of dotted names (if anyone used dotted names).

It's easy to allow reserved words in unquoted URIs.
Rather than using <dottedIdentifierList>, use:

<dottedUriWords> ::= (<uriWord> '.')* <uriWord> 
<uriWord> ::= <identifier> | <RESERVED_WORD>

It does nothing except allow reserved words too. Grammatically it should be no harder to work with than <dottedIdentifierList>, and it allows users to not worry about whether their file paths contain Dart reserved words, which have no reason to be special in that context.
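
To make the difference between the two productions concrete, here is a minimal sketch of a recognizer for unquoted paths, written against plain strings rather than a real token stream. The names `isUriWord` and `isUnquotedPath` and the abbreviated `reservedWords` set are hypothetical and exist only for this illustration; they are not part of the proposal or of any Dart tool.

```dart
// Abbreviated, illustrative subset of Dart's reserved words (an assumption
// for this sketch; the full list lives in the language specification).
const reservedWords = {'class', 'do', 'for', 'if', 'is', 'new'};

// Reserved words have the same lexical shape as identifiers.
final _wordShape = RegExp(r'^[A-Za-z_$][A-Za-z0-9_$]*$');

/// <uriWord> ::= <identifier> | <RESERVED_WORD>
/// With [allowReserved] false this degrades to plain <identifier>.
bool isUriWord(String word, {required bool allowReserved}) {
  if (!_wordShape.hasMatch(word)) return false;
  return allowReserved || !reservedWords.contains(word);
}

/// <dottedUriWords> ('/' <dottedUriWords>)*
bool isUnquotedPath(String path, {required bool allowReserved}) {
  return path.split('/').every((segment) => segment
      .split('.')
      .every((word) => isUriWord(word, allowReserved: allowReserved)));
}

void main() {
  const paths = [
    'somepackage/somelibrary',
    'package.for.dart/banana',
    'mypkg/src/new/library',
  ];
  for (final path in paths) {
    final strict = isUnquotedPath(path, allowReserved: false);
    final relaxed = isUnquotedPath(path, allowReserved: true);
    print('$path: identifiers only=$strict, with reserved words=$relaxed');
  }
}
```

Under these assumptions, only the first path passes the identifier-only check, while all three pass once reserved words are admitted.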

Further, if we add more reserved words in the future, we will not introduce compile-time errors if someone used that word as a directory name.

(Why only disallow reserved words, and not built-in identifiers? If it's because there is no reason to disallow built-in identifiers, then there is equally no reason to disallow reserved words.)

We should consider that URIs can be followed by if in conditional imports:

import foo/bar.
  if (condition) foo/qux;

A mistaken trailing ., like after bar above, would include the if in the URI, then complain about the (.
That won't happen if we don't allow reserved words. It also won't happen if we disallow internal whitespace (#3983), and it's not significantly different from import foo/bar. hide banana;, which we do allow.
I don't think it's a reason to not allow reserved words.
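
The trailing-dot case can be sketched with a toy token-level parser that discards whitespace, as a conventional lexer would. Everything here (`tokenize`, `parseImportPath`, the abbreviated `reservedWords` set) is a hypothetical illustration rather than the real Dart parser, and it requires Dart 3 for records.

```dart
// Abbreviated, illustrative subset of reserved words (an assumption).
const reservedWords = {'class', 'for', 'if'};

final _wordStart = RegExp(r'^[A-Za-z_$]');

/// Splits [source] into words and single punctuation characters,
/// dropping all whitespace between tokens.
List<String> tokenize(String source) =>
    RegExp(r'[A-Za-z_$][A-Za-z0-9_$]*|\S')
        .allMatches(source)
        .map((m) => m[0]!)
        .toList();

/// Greedily consumes `word (('.' | '/') word)*` after `import`.
/// Returns the consumed path and the first token that did not fit.
(String path, String next) parseImportPath(List<String> tokens,
    {required bool allowReserved}) {
  bool isWord(String t) =>
      _wordStart.hasMatch(t) && (allowReserved || !reservedWords.contains(t));

  var i = 1; // Skip the leading `import` token.
  final path = StringBuffer();
  while (i < tokens.length && isWord(tokens[i])) {
    path.write(tokens[i++]);
    if (i < tokens.length && (tokens[i] == '.' || tokens[i] == '/')) {
      path.write(tokens[i++]);
    } else {
      break;
    }
  }
  return (path.toString(), i < tokens.length ? tokens[i] : '<eof>');
}

void main() {
  final tokens = tokenize('import foo/bar.\n  if (condition) foo/qux;');
  for (final allow in [true, false]) {
    final (path, next) = parseImportPath(tokens, allowReserved: allow);
    print('allowReserved=$allow: path "$path", parser stops at "$next"');
  }
}
```

With reserved words allowed, the sketch absorbs `if` into the path and stops at `(`; with them disallowed, it stops at `if` itself. Either way the import is rejected, and only the location of the complaint changes.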

@lrhn lrhn added feature Proposed language feature that solves one or more problems unquoted-uris The unquoted URI feature labels Jul 15, 2024
@munificent
Member

> Further, if we add more reserved words in the future, we will not introduce compile-time errors if someone used that word as a directory name.

I don't think that buys us much because adding reserved words will still break plenty of other things even if they don't break imports.

For what it's worth:

  • I scraped a large corpus of open source Dart code, and fewer than 0.1% of path components in existing imports are reserved words.

  • Java does not allow reserved words in import directives.

  • C# does not allow reserved words in using directives.

  • Swift does not allow reserved words in import declarations.

  • Kotlin does not allow reserved words in import directives.

  • Python does not allow reserved words in import statements (and gives you the very unhelpful error "SyntaxError: invalid syntax" if you use one).

  • Rust does not allow reserved words in use declarations. Props to Rust for having very nice error messages:

      |
    1 | use for;
      |     ^^^ expected identifier, found keyword
      |
    help: escape `for` to use it as an identifier
      |
    1 | use r#for;
      |     ++
    

    I especially like how it gives you a workaround if you need it, which we could do by showing the quoted form.

So far, I haven't found a widely used language with unquoted imports that does allow reserved words.
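
As a rough illustration of the Rust-style diagnostic idea, the sketch below builds a hint that points at the quoted form. The function names, the message wording, and the abbreviated `reservedWords` set are hypothetical. The mapping from an unquoted path to a `package:` URI is also an assumption, modeled on the equivalences mentioned elsewhere in this thread (`a/b/c` to `'package:a/b/c.dart'`, and a single name `int` to `'package:int/int.dart'`).

```dart
// Abbreviated, illustrative subset of reserved words (an assumption).
const reservedWords = {'class', 'do', 'for', 'if', 'new'};

/// Expands an unquoted path to a quoted `package:` URI.
/// Assumed mapping: `a/b/c` -> 'package:a/b/c.dart', `a` -> 'package:a/a.dart'.
String toQuotedUri(String path) {
  final expanded = path.contains('/') ? path : '$path/$path';
  return "'package:$expanded.dart'";
}

/// Returns a hint suggesting the quoted form when [path] contains a
/// reserved word, or null when every path component is acceptable.
String? reservedWordHint(String path) {
  for (final word in path.split(RegExp(r'[./]'))) {
    if (reservedWords.contains(word)) {
      return "Error: expected identifier, found reserved word '$word'.\n"
          'Help: use the quoted form instead: import ${toQuotedUri(path)};';
    }
  }
  return null;
}

void main() {
  print(reservedWordHint('mypkg/src/new/library'));
  print(reservedWordHint('somepackage/somelibrary'));
}
```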

@ghost

ghost commented Jul 25, 2024

My main argument is that there's no language (AFAIK) that disallows quoting the path. All examples of (unquoted) dot-separated names in other languages refer to hierarchical module names, not pathnames. But as soon as the language provides a way of mapping one hierarchy to another, the pathname gets quoted, e.g. #[path = "foo.rs"] in Rust.

The idea of angle brackets was to support quoted paths; it's just that the quote symbols are different (<...> rather than '...').
There's a precedent for this (C). The fact that <a/b/c> means something different from 'a/b/c' won't surprise anybody.
But that import a/b/c is different from import 'a/b/c' is quite surprising IMO.

@munificent
Member

> There's a precedent for this (C). The fact that <a/b/c> means something different from 'a/b/c' won't surprise anybody.
> But that import a/b/c is different from import 'a/b/c' is quite surprising IMO.

Technically, the angle brackets and quotes aren't part of C at all but are part of the preprocessor.

That matters because the includes and angle brackets are gone before the C lexer ever sees them, so it doesn't have to worry about having different lexical grammar rules inside the angle brackets versus when angle brackets are used for comparison operators.

@ghost

ghost commented Jul 26, 2024

I see. In Dart, import is not a "first-class" reserved word. There can be an identifier named import, and if it happens to be the name of a parametrized function, import<int>(5) will clash with the package import import <int> (the equivalent of import 'package:int/int.dart').
This conflict can be resolved by taking into account the whitespace after import. The lexer can handle the sequence of import, whitespace, < as a trigger for parsing everything between angle brackets <a/b/c> as a string, and creating the same output as while parsing import 'package:a/b/c.dart'.

@munificent
Member

We discussed this in the language meeting this morning and in another meeting following that. Based on those discussions, we've decided that, yes, we will allow reserved words as path components in unquoted imports. So this will be allowed and valid:

import if.for/do.class;

Obviously, users should rarely rely on this support. It's certainly better style to avoid path component names that collide with reserved words.

But allowing them means that tools that generate unquoted imports aren't required to know the full set of reserved words and carefully route around them. Also, for better or worse, Dart has a long and somewhat confusing set of reserved words amended by an even longer and more confusing set of "built-in identifiers" and "contextual keywords". Given that, it's actually fairly difficult for a user to know whether a given identifier is fully reserved or not. Is class? (Yes.) What about mixin? (No.) How about is? (Yes.) And as? (No.)
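
The point about code generators can be sketched as follows. The table covers only the four words classified in the paragraph above, `generateImport` is a hypothetical helper, and the quoted fallback assumes the `package:<path>.dart` mapping; none of this is taken from an existing tool.

```dart
// Classification of the four words discussed above. A real generator that
// had to avoid reserved words would need the complete list from the spec.
const isFullyReserved = {
  'class': true,
  'mixin': false,
  'is': true,
  'as': false,
};

/// Emits an import for [segments], e.g. ['mypkg', 'src', 'class'].
/// When reserved words are not allowed in unquoted imports, the generator
/// must detect them and fall back to the quoted form (assumed mapping).
String generateImport(List<String> segments, {required bool reservedAllowed}) {
  final path = segments.join('/');
  final hitsReserved = segments.any((s) => isFullyReserved[s] ?? false);
  if (hitsReserved && !reservedAllowed) {
    return "import 'package:$path.dart';";
  }
  return 'import $path;';
}

void main() {
  for (final name in ['class', 'mixin', 'is', 'as']) {
    final strict = generateImport(['mypkg', name], reservedAllowed: false);
    final relaxed = generateImport(['mypkg', name], reservedAllowed: true);
    print('$name: strict -> $strict | relaxed -> $relaxed');
  }
}
```

Once reserved words are allowed, the lookup table and the fallback branch become unnecessary, which is the simplification being argued for here.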

The grammar of unquoted imports/exports is restricted enough that we can allow reserved words there without ambiguity and it avoids users having to worry about accidentally stepping on a reserved word.

I'll write up a PR to update the spec and close this issue when that lands.

@bwilkerson
Member

There is an implication here for the UX. If a user is in the middle of typing in an import there could now be an ambiguity that wasn't there before. In particular, consider the following (where ^ indicates the cursor position):

import some.^

class C {}

Because the parser is greedy, it will, by default, decide that this should be read as

import some.class;

C {}

That will result in a poor UX in which diagnostics are generated that are not helpful to solving the real issue.

It's likely that this will be rare enough that we'll choose to ignore it, but I wanted to make sure it has been considered.

@munificent
Member

munificent commented Aug 14, 2024

That's a good point. Even without allowing reserved words, that UX problem exists:

import some.^

SomeType x;

Again, the parser will try to make SomeType part of the import and then report a confusing error on x.

I definitely don't like making it the parser implementer's job to conjure up a good UX here, but I suspect that's going to be necessary no matter what. Do you think allowing reserved words makes this problem noticeably worse?

@bwilkerson
Member

In a different issue you wrote:

> We've decided that as this issue proposes, we will be restrictive and not allow internal whitespace or comments inside the path part of an unquoted import.

If I'm understanding correctly, that ought to mostly prevent this kind of problem from occurring.
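
A minimal sketch of why the no-whitespace rule helps: if each path token must start exactly where the previous one ended, a declaration on a later line cannot be absorbed into an unfinished import. `parseAdjacentPath` is a hypothetical name, the tokenizer is a toy, and the code requires Dart 3 for records.

```dart
final _token = RegExp(r'[A-Za-z_$][A-Za-z0-9_$]*|\S');
final _wordStart = RegExp(r'^[A-Za-z_$]');

/// Consumes path tokens after `import`, alternating words and separators,
/// and requires every token after the first to be adjacent to its
/// predecessor (no whitespace or comments in between).
/// Returns the consumed path and the first token left unconsumed.
(String path, String next) parseAdjacentPath(String source) {
  final tokens = _token.allMatches(source).toList();
  final path = StringBuffer();
  var expectWord = true;
  var i = 1; // Skip the leading `import` token.
  while (i < tokens.length) {
    final text = tokens[i][0]!;
    final adjacent = path.isEmpty || tokens[i].start == tokens[i - 1].end;
    final fits =
        expectWord ? _wordStart.hasMatch(text) : text == '.' || text == '/';
    if (!adjacent || !fits) break;
    path.write(text);
    expectWord = !expectWord;
    i++;
  }
  return (path.toString(), i < tokens.length ? tokens[i][0]! : '<eof>');
}

void main() {
  final (path, next) = parseAdjacentPath('import some.\n\nclass C {}');
  print('path "$path", next token "$next"');

  final (path2, next2) = parseAdjacentPath('import some.class;');
  print('path "$path2", next token "$next2"');
}
```

In the first case the path ends at the dangling `.` and `class` is left for the following declaration, so an error can be reported at the incomplete import. In the second case, `class` is adjacent to the dot and is accepted as a path component.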


3 participants