
URI shorthands, more permissive #3985


Closed
lrhn opened this issue Jul 15, 2024 · 4 comments
Labels
feature Proposed language feature that solves one or more problems unquoted-uris The unquoted URI feature

Comments

@lrhn
Member

lrhn commented Jul 15, 2024

We should consider allowing unquoted URI shorthands to contain more characters. Specifically, I suggest allowing leading digits, -, and possibly +.

This comes back to our goal with this feature: is it to allow common usages to be short, or to enforce a specific style on common usages?

  • In the former case, we should make the feature flexible enough to allow actual, existing usages, and reasonable hypothetical usages.

  • In the latter case, we should define which style we want to enforce, and then allow unquoted URIs to express precisely that style; everything else has to use the "old" quoted URIs.

The current proposal for URI shorthands allows only a very restricted format as unquoted imports: a sequence of tokens of the form <dottedIdentifierList> ('/' <dottedIdentifierList>)*.

That means that the only package names, directory names and library names that are allowed are .-separated Dart identifiers.
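To make the current restriction concrete, here is a rough Python sketch of a validator for <dottedIdentifierList> ('/' <dottedIdentifierList>)*, applied to the raw text of the path. It is an approximation, not the Dart grammar: it assumes ASCII-only identifiers, ignores Unicode and other lexer details, and rejects Dart's reserved words. The function names are hypothetical.

```python
import re

# Simplified Dart identifier (assumption: ASCII letters, digits, '_' and '$' only).
_IDENT = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*\Z")

# Dart's reserved words, which the current proposal does not accept as path parts.
RESERVED = {
    "assert", "break", "case", "catch", "class", "const", "continue",
    "default", "do", "else", "enum", "extends", "false", "final",
    "finally", "for", "if", "in", "is", "new", "null", "rethrow",
    "return", "super", "switch", "this", "throw", "true", "try",
    "var", "void", "while", "with",
}


def is_identifier(word: str) -> bool:
    """True for a (simplified) Dart identifier that is not a reserved word."""
    return _IDENT.match(word) is not None and word not in RESERVED


def is_valid_shorthand(path: str) -> bool:
    """Check <dottedIdentifierList> ('/' <dottedIdentifierList>)* on raw text."""
    return all(
        all(is_identifier(part) for part in segment.split("."))
        for segment in path.split("/")
    )
```

Under this sketch, versions/2.0 (leading digit) and my/class/file (reserved word) are both rejected, while foo.bar/baz_qux is accepted.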

I propose to also allow reserved words. Without them, we are enforcing a style where directory names cannot be Dart reserved words, for no obvious reason.
Is this a style we want to enforce? And if so, why? Who benefits?

I lean towards allowing users to use the feature for their needs, not trying to enforce a specific style. I see no benefit from enforcing such a style.

Even with reserved words allowed, the current feature would still disallow what could be considered reasonable directory names, like /versions/2.0/ or /versions/1.0.0-beta+1235/.
The problem is mostly numbers: by allowing only identifiers, we disallow any directory name starting with a digit.

As with reserved words, this restriction seems somewhat arbitrary and unnecessary. The file system has no issue with directories starting with digits. While users haven't used it much (five uses in a quick internal check, four of them in examples in the same package), it's not an unreasonable choice.

Being a Dart identifier is only relevant inside Dart code. When referencing external namespaces, there is no reason to restrict oneself to that.

The current unquoted URIs only work for dart: and package: URIs, which means they're unlikely to look into internal paths, and most public libraries will have a "normal" single-identifier-with-underscores name. For the current use of unquoted URIs, there is probably not much need for leading digits today.
(I propose to include all adjacent tokens, up to the next whitespace or semicolon, in the unquoted URI, and then validate that sequence separately. That allows extending the permitted unquoted content in the future without changing the "meta-tokenization" of what is considered part of the unquoted URI.)
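A rough sketch of that two-step approach, in Python: the scanning step only looks for the next whitespace or semicolon, and validation is a separate, swappable function, so loosening the rules later would change only the validator. Everything here is hypothetical, including both validator patterns (the permissive one is a deliberately loose stand-in); it works on raw characters rather than Dart tokens and ignores comments.

```python
import re
from typing import Callable, Optional


def scan_unquoted_uri(source: str, start: int) -> str:
    """Take every character from `start` up to the next whitespace or ';'."""
    end = start
    while end < len(source) and not source[end].isspace() and source[end] != ";":
        end += 1
    return source[start:end]


# Two interchangeable validators (hypothetical, ASCII only).
_WORD = r"[A-Za-z_$][A-Za-z0-9_$]*"
_DOTTED = rf"{_WORD}(?:\.{_WORD})*"
_STRICT = re.compile(rf"{_DOTTED}(?:/{_DOTTED})*\Z")
_PERMISSIVE = re.compile(r"[A-Za-z0-9_$.+\-]+(?:/[A-Za-z0-9_$.+\-]+)*\Z")


def strict_ok(text: str) -> bool:
    return _STRICT.match(text) is not None


def permissive_ok(text: str) -> bool:
    return _PERMISSIVE.match(text) is not None


def parse_unquoted_import(source: str, validate: Callable[[str], bool]) -> Optional[str]:
    """Return the unquoted URI text of `import <uri>;`, or None if rejected."""
    keyword = "import "
    if not source.startswith(keyword):
        return None
    # Step 1: scanning never changes, whatever the validator accepts.
    text = scan_unquoted_uri(source, len(keyword))
    # Step 2: validation is the only part that would evolve.
    return text if validate(text) else None
```

With this split, import versions/1.0.0-beta+1235/api; scans to the same text under both validators; only the verdict differs.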

I personally want to expand the unquoted URIs in the future, for example to local imports (a leading /, ./ or ../). In that case, being more permissive may become useful.

If we allow leading digits, + and - (maybe only with + and - separating letters or digits), then paths can contain reasonable numbers and semantic versions. The change to the "valid URI" grammar is to also allow leading numerals:

<uriWords> ::= (<uriWord> ('-' | '+' | '.'))* <uriWord> 
<uriWord> ::= <identifier> | <RESERVED_WORD> | <numericLiteral> (<identifier> | <RESERVED_WORD>)?
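A character-level approximation of this grammar as a Python regular expression, assuming path segments are still separated by / and identifiers are ASCII. It does not reproduce how the Dart lexer would split the text into tokens: a literal such as 2.1e+100 is simply accepted as the words 2, 1e and 100 joined by separators, which is exactly the tokenization question this thread turns on. The names are hypothetical.

```python
import re

# Identifier or reserved word (both allowed by the proposed <uriWord>), ASCII only.
IDENT = r"[A-Za-z_$][A-Za-z0-9_$]*"
# <uriWord>: an identifier/reserved word, or digits optionally followed by one.
URI_WORD = rf"(?:{IDENT}|[0-9]+(?:{IDENT})?)"
# <uriWords> ::= (<uriWord> ('-' | '+' | '.'))* <uriWord>
URI_WORDS = rf"{URI_WORD}(?:[-+.]{URI_WORD})*"
# Assumption: a full path is <uriWords> ('/' <uriWords>)*.
PATH = re.compile(rf"{URI_WORDS}(?:/{URI_WORDS})*\Z")


def is_valid_permissive(path: str) -> bool:
    """Character-level check of the proposed, more permissive grammar."""
    return PATH.match(path) is not None
```

This accepts versions/2.0, versions/1.0.0-beta+1235 and my/class/file, but still rejects doubled separators such as foo--bar and a leading -.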

As mentioned, this may not be very important for the initial unquoted URIs, where the most common use is importing from other packages, and you rarely need more than one or two identifiers for that.
It may become more relevant if we want to expand the scope of unquoted URIs (which I do).

Grammatically, it's a little more complicated than just allowing reserved words, because we place significance on the part after the last . of a package name, and that . can be inside a double literal: in foo.version.2.1e+100, the last . is inside the double literal token 2.1e+100, so the part after it is 1e+100. That's just something we should write an "unquoted URI helper library" to handle, once and for all, so we ensure consistent behavior.
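A minimal sketch of what such a helper might do, assuming it works on the raw text of the unquoted URI rather than on tokens, and assuming the mapping rules listed in the docstring. shorthand_to_uri is a hypothetical name, and dart: URIs are not handled.

```python
def shorthand_to_uri(text: str) -> str:
    """Map an unquoted import path to the package: URI it would stand for.

    Assumed rules:
      foo          -> package:foo/foo.dart
      foo.bar      -> package:foo.bar/bar.dart   (part after the last '.')
      foo/bar/baz  -> package:foo/bar/baz.dart
    """
    package, sep, rest = text.partition("/")
    if sep:
        # Explicit path: everything after the first '/' is the library path.
        return f"package:{package}/{rest}.dart"
    # No '/': the library name is the part after the last '.' in the text,
    # even if that '.' was lexed as part of a double literal.
    library = package.rsplit(".", 1)[-1]
    return f"package:{package}/{library}.dart"
```

Working on raw text makes foo.version.2.1e+100 map to package:foo.version.2.1e+100/1e+100.dart without special cases, but it means the parser must hand the helper the exact source text of the URI, not a token list.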

@lrhn lrhn added feature Proposed language feature that solves one or more problems unquoted-uris The unquoted URI feature labels Jul 15, 2024
@ghost

ghost commented Jul 15, 2024

Have you considered a (retro)-syntax where the path is enclosed in angle brackets?
The advantages:

  • easy to parse
  • every character currently allowed by import path syntax remains allowed
  • the difference between the proposed syntax and the current package: syntax is substantial enough to warrant a visibly different syntax (angle brackets emphasize the distinction)
  • in every language I'm familiar with that allows unquoted path literals (various shell languages), the path can optionally be quoted with no effect on semantics (with some minor exceptions). In Dart, courtesy of the shorthand proposal, the quoted path literal will have a different meaning

Angle brackets would address all of those misfortunes.

@munificent
Member

Angle brackets wouldn't be any easier to lex/parse. The language already allows them for comparison operators and type argument lists, so allowing arbitrary characters after < in an import but not elsewhere would be quite tricky.

Honestly, I just don't think this is a corner of the language where it's worth adding a lot of complexity. Most other languages (Java, C#, Rust, Python, etc.) seem to get by fine with just a dotted series of non-reserved identifiers. Our story is a little more complex because we (for better or worse) use dotted names as package names, so we need some other path separator in addition to dots. But it doesn't need to be more complex than that.

@munificent
Member

We discussed this in the language meeting this morning and in another meeting following that. Based on those discussions, we've decided that no, leading digits, -, and + will not be allowed in unquoted imports.

The current design has a nice property that each path component is a single token and doesn't require any special collusion between lexing and parsing. If we allow leading digits or operator characters, that property is lost. Some examples:

import 123abc;

Here, 123 and abc are lexed as separate tokens but would then be treated as a single path component in the import. Conversely:

import 12.34;

Here, 12.34 is a single double literal token but would be treated as two dot-separated path components (12 and 34) in the import.
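To illustrate the mismatch, here is a toy Python lexer for a tiny, assumed subset of Dart (ASCII identifiers, decimal int and double literals, single-character punctuation). It is not the real Dart scanner, and lex and dotted_components are hypothetical names. A parser that splits the path on . tokens has to glue 123 and abc back into one component, and it sees 12.34 as a single component where the import needs two.

```python
import re

# Toy token pattern (assumed subset of Dart, not the real scanner):
# decimal int/double literals, ASCII identifiers, or any other single character.
TOKEN = re.compile(
    r"(?P<number>[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)"
    r"|(?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<punct>\S)"
)


def lex(source: str) -> list[tuple[str, str]]:
    """Split source into (kind, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while True:
        while pos < len(source) and source[pos].isspace():
            pos += 1
        if pos == len(source):
            return tokens
        m = TOKEN.match(source, pos)  # always matches: \S catches anything else
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()


def dotted_components(tokens: list[tuple[str, str]]) -> list[str]:
    """Split on '.' punctuation tokens, gluing adjacent tokens together."""
    parts, current = [], ""
    for kind, text in tokens:
        if kind == "punct" and text == ".":
            parts.append(current)
            current = ""
        else:
            current += text
    parts.append(current)
    return parts


print(lex("import 123abc;"))  # 123 and abc are separate tokens
print(lex("import 12.34;"))   # 12.34 is one number token
print(dotted_components(lex("12.34")))  # ['12.34'], not ['12', '34']
```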

Things get weirder when you include scientific notation. We suspect that this would be a bug farm and just doesn't carry its weight in terms of complexity.

If you have some file path that starts with leading digits or contains - or +, you can always continue to use a quoted import to refer to it, or simply rename the file.

@lrhn
Member Author

lrhn commented Aug 16, 2024

> Here, 12.34 is a single double literal token but would be treated as two dot-separated path components (12 and 34) in the import.

(And those path components really do need to be separated, since a path with only one segment isn't just included verbatim in the resulting URI: the actual import would be package:12.34/34.dart. It really is the pessimal example of allowing double literals, but if we wanted to allow foo.v.2.x, we'd also want to allow foo.v.2.0, and then we can't rule out sequences that are tokenized as double literals containing a '.'. The only way to win is not to play.)
