Skip to content

[records] parse errors when trying to annotate a record type class field #2469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pq opened this issue Sep 7, 2022 · 45 comments · Fixed by #2529
Closed

[records] parse errors when trying to annotate a record type class field #2469

pq opened this issue Sep 7, 2022 · 45 comments · Fixed by #2529
Labels
records Issues related to records.

Comments

@pq
Copy link
Member

pq commented Sep 7, 2022

The following code generates what look like parser errors.

image

error: Annotation must be either a const variable reference or const constructor invocation. (invalid_annotation at [sandbox] lib/public.dart:27)
error: Variables must be declared using the keywords 'const', 'final', 'var' or a type name. (missing_const_final_var_or_type at [sandbox] lib/public.dart:28)

/fyi @jensjoha

@bwilkerson
Copy link
Member

My first guess is that we're parsing the record type as an argument list for the annotation. Not sure whether the proposal says how we should disambiguate this case.

@munificent

@lrhn
Copy link
Member

lrhn commented Sep 7, 2022

Ouch, that is an ambiguity.
Not sure there is any good solution.

@foo 
(int, int) Function(int) bar;

is completely ambiguous.

There is no simple disambiguation. We can't add something to @constantName to show that it doesn't want the argument.
We can't add something to (int, int) Function(int) to show that it does want the (otherwise optional) return type.

The best hacks would be:

@foo 
@DUMMY()
(int, int) Function(int) x;

or

typedef typeof<T> = T;

@foo 
typeof<(int, int) Function(int)> x;

Neither is something we can reasonably ask people to do.

@bwilkerson
Copy link
Member

@stereotype441

@pq
Copy link
Member Author

pq commented Sep 8, 2022

@leafpetersen

@leafpetersen
Copy link
Member

I guess one (unfortunate) option is to disambiguate based on space. @foo(...) is an annotation constructor call, @foo (...) is an annotated record type. We'd want to be sure that dartfmt didn't change the spacing in that case though.

@munificent munificent transferred this issue from dart-lang/sdk Sep 8, 2022
@munificent
Copy link
Member

Moving this over to the language repo because this is an ambiguity in the actual language feature and not a parser bug. We'll have to change the proposal to handle this. :(

@munificent munificent added the records Issues related to records. label Sep 8, 2022
@munificent
Copy link
Member

One idea is to put Record at the beginning of record types, like:

class C {
  @deprecated
  Record(int, int) pair;
}

@foo
Record(int, int) Function(int) bar;

I don't like the verbosity, but it's consistent with function type syntax and resolves this ambiguity. It also makes the zero-element record type Record() instead of () record, which I kind of like because the latter feels a little terse and opaque for a type annotation.

I'll think about this some more and see if I can come up with any other ideas.

So how about them types on the right?

@munificent
Copy link
Member

Another (probably bad) idea: We could extend the language to support named type parameters. That could arguably be useful for other user-defined generic types because (as with optional named parameters) it would let use sites omit the type parameters can be inferred/defaulted and write only the ones the user cares about.

If we had that, then record types (and in principle function types) could look like Record<int, String s>.

But given that we already have function types which use parentheses and users seem to be OK with, this feels like the wrong approach.

@lrhn
Copy link
Member

lrhn commented Sep 8, 2022

So we can either change the record type grammar, or the annotation grammar, or add something that users can use to disambiguate in cases where there is ambiguity.

The things we can do without changing the grammar are annoyances to users, and pretty much undiscoverable (like the typeof<T> wrapper).

We could introduce <type> as grouping parentheses for types, without the leading typeof, so you can do @foo <(a, b)> bar(){} and have (a, b) be a type instead of a parameter list. Will probably cause other ambiguities instead, though.

We can change the annotation grammar. Lots of options, all of them breaking and requiring migration, but probably something that can be automated.

We could decide to disallow annotations without parentheses. It would be slightly annoying for a few annotations. Mainly @override, which should be a language feature anyway.
And probably a bunch of others that I don't know because I never use them anyway.

We could change the annotation grammar entirely to use, say, the C#-style: [expression] instead of @expression.
If we are changing something, we might as well change everything 😁 .

Or we can introduce some sort of delimiter you can put after annotations to separate them from the following thing. Trailing comma? Optional semicolon after annotations? (Giving people optional semicolons, just not the ones they want!)

Changing the record type grammar is less breaking because it doesn't exist yet, so we won't break any code.
It's just so. darn. simple. right now, and any change will make it worse.

Options like Record(current record type) or #(current record type) disambiguate a lot. They're also verbose and somewhat unintuitive. The # is better than Record, mainly from being shorter, and it won't conflict with symbols. Both are better than using Record<...>, that looks too much like a generic thing. I'd use the same rules as for Function: if followed by ( it's parsed specially, otherwise it's just the type Record itself.
Also, then the type won't look like the destructuring syntax, or the literal syntax.

@bwilkerson
Copy link
Member

FWIW, I actually find Record(int, {String b}) to be intuitive, and I especially like the symmetry with function types because the two kinds of types have a lot of similarity.

@stereotype441
Copy link
Member

I think my favorite of the suggestions above is using Record(int, {String b}) to denote a record type. I really like the similarity to what we do for function types.

One other random idea: add an optional : to the end of the annotation syntax (which maybe we make required in Dart 3.0). Then the user disambiguates via the placement of the colon, e.g.:

@deprecated:
(int, int) pair;

vs.

@Deprecated('message'):
foo;

If the user leaves off the : we should assume the thing in parentheses is an annotation argument, for backwards compatibility.

@pq
Copy link
Member Author

pq commented Sep 12, 2022

FWIW, I lean towards Record(int, {String b}) for record types too and agree that the symmetry with functions is nice.

@lrhn
Copy link
Member

lrhn commented Sep 12, 2022

I like the Record(actualRecordType) syntax on principle, it's consistent, readable and easy to parse, but I worry it is too verbose.

When you want to use a record type, it's in a place where you don't want the ceremony of introducing a class. Adding extra verbosity to those places makes it less convenient, and might just remove some of the lure of having records. (Not all, but some.)
Or maybe I'm over-thinking it.

@scheglov
Copy link
Contributor

Looking on the motivation in the specification, I see three reasons: (1) no behavior, (2) verbosity, and (3) coupled to a class definition.

Surprisingly, I don't see performance, i.e. I'd expect that because we don't need allocating a new object instance, and then GC it, and instead just return a few values on the stack, it is slightly faster.

I don't understand (3) as well. What is wrong with being coupled to a class name?

For (2) I think the biggest issue is not verbosity itself, but non-locality. I have to scroll down to the bottom of the file to declare a new class that I will return from this method. Not a huge work, I admit, but still. Typing Record is still more lightweight than scrolling. Although of course adds some verbosity.

And I also like the symmetry with Function.

@lrhn
Copy link
Member

lrhn commented Sep 13, 2022

I'll admit, I also find Function too verbose.
And I can't even typedef my way around it, because even with typedef Fn = Function, the int Function(int) syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.

Compare

Verbose Concise Overly Brief?
Record(int, int) Rd(int, int) #(int, int)
int Function(int) int Fn(int) (int)->int

I think a point could be made for shorter keywords. :)

@Jetz72
Copy link

Jetz72 commented Sep 13, 2022

Could you make the word Record optional perhaps? Allowing both Record(int, int) and the shorter (int, int)? The latter would probably be used in most cases, but if you needed to precede it with an annotation (or some other ambiguous case comes up down the line) or you just wanted to be more explicit, you could write out the full Record.

@mateusfccp
Copy link
Contributor

And I can't even typedef my way around it, because even with typedef Fn = Function, the int Function(int) syntax is a special syntactic form unrelated to the similarly named class, so the alias doesn't apply.

Could we fix this?

@TimWhiting
Copy link

TimWhiting commented Sep 13, 2022

I prefer the more concise syntax for records types. I agree with @lrhn that already Function is a bit verbose.

Compare:

Record(String, String) doSomething(Record(String, String) Function(Record(int, int), String Function(int)) transform) {
 return transform(((1, 2), (i) => '$i'));
}

with the current (with allowing Fn)

(String, String) doSomething((String, String) Fn((int, int), String Fn(int)) transform) {
...
}

Or:

doSomething(transform: ((int, int), int -> String) -> (String, String)) -> (String, String) {
...
}

With disallowing : for default named parameters I think @munificent hinted that would make it easier for types on the right, but that would be a hugely breaking change. Would that make it easier for optional semicolons and other requests though? :) One can hope.

In any case, I know this is a contrived example, but Record seems unnecessarily wordy. Especially for a language that has void main() rather than public static void main(String[] args).

I like @stereotype441's suggestion of adding an optional : after metadata (could even do a ;) for disambiguation, but if you really want a prefix I liked @lrhn's suggestion of #(int, int)

@stereotype441
Copy link
Member

Another idea that came up in this morning's language team meeting is to disambiguate the two types of annotations based on whether there's whitespace between the identifier and the (. So e.g. @foo(...) would parse as an annotation that constructs an instance of foo, whereas @foo (...) would parse as the annotation @foo followed by a record type.

Although technically breaking, this disambiguation rule would probably do the right thing in 99.9% of cases.

@leafpetersen
Copy link
Member

Summarizing discussion from this morning.

There are two broad approaches we could take: do something with the annotation syntax to disambiguate, or change the record type syntax.

On the first approach:

  • We could require (or allow) a colon after the annotation to terminate it. That is, an annotation becomes either @name: or @name(...):.
    • This would be very breaking to existing code
    • This would be noisy, people generally like the existing syntax.
  • We could always require parentheses on annotations
    • Very breaking
    • Noisy for common cases
  • We could disambiguate based on the presence/absence of return type. That is, we could say that for methods, the method declaration "greedily" eats the type, so if there is a parenthesized list before a method declaration name, it is always interpreted as a record type.
    - Probably very breaking
    - Ugly to parse
    - Not clear it covers all of the cases
  • We could disambiguate based on whitespace: @name( always starts an annotation constructor call, and @name ( is always an annotated record type.
    - This may be ugly to implement
    - Probably makes the grammar more rigid, harder to change in the future
    - Technically breaking, but in practice probably not
    - Code generators would have to be aware of this, but probably not an issue (they need to add whitespace to avoid accidental concatenation anyway).

On the second approach:

  • Using generic method syntax (e.g. Record<type1, {type2 name}>) was not appealing
    • Strange to have named parameters in there
  • Using Record(type1, {type2 name}) could be done.
    • Pretty broad agreement that this is verbose, nobody excited about it
    • Some subset of the team felt it's also pretty ugly, but not universal
    • Consistent with the Function case
    • Makes multiple return type method declarations much less "symmetric"
  • Using #(...) just for the type level would work, somewhat consistent with Javascript proposals which use #{ and #[
    • Harder to search for
    • A bit punctuation heavy
    • Inconsistent with the term level
  • Using the #(...) syntax uniformly for types, expressions, and patterns could work
    • More consistent, avoids some issues with parenthesized expressions

There isn't broad agreement on the team as to a direction yet, we'll keep discussing this.

@leafpetersen
Copy link
Member

leafpetersen commented Sep 14, 2022

One possibility that we didn't discuss yet is using a different infix operator for the type syntax. An idea which probably doesn't work is to use , without parentheses:

  • int,int foo(int, int) => (3, 3) is a function returning a pair of integers.
  • This is probably ambiguous: is foo(int, int x) a function expecting a single parameter named x of type int, int, or a function expecting a parameter named int of type dynamic and parameter named x of type int?

The ML family of languages uses infix *.

  • int * int foo(int x) is a function returning a pair of integers
  • This seems unambiguous for the annotation case: @deprecated int * int is fine?
  • We would need to rely on precedence for things like int * int Function(int): is it a function returning a pair, or a pair with a function typed component?
    • The ML family of languages makes * bind more tightly
  • There may be other ambiguities lurking here
  • This gives no syntax for the empty or unary tuple types.
    • we could use () for empty, but we still have the same ambiguity to work around
    • We could define these as primitive named types
  • This will look very weird for the named parameters, especially if there are no positional parameters
    • int * {int x, int y}
    • {int x, int y}
    • Though arguably, the latter is a "feature": you don't need to nest the delimiters for records with no positional fields, so it's almost as if you have a separate record syntax.
    • Is this really not ambiguous somewhere?
      • [Edited to add]: Clearly we would need to be able to parenthesize optionally for nested tuples.

A variant of the above would be to use parentheses with an infix * as a delimiter:

  • (int * int) foo(int x) is a function returning a pair
  • We still have the empty tuple problem ((*)?)
  • We could use the trailing * trick for the singleton tuple: (int*)

Other delimiters we could use (with or without parentheses):

  • int & int or (int & int)
  • int ^ int or (int ^ int)

@jensjoha
Copy link

I had hoped to stay out of this discussion, but feel I have to voice my dislike of using spaces (and lack of spaces) as disambiguation.
On an objective level, we've never done anything like that before and e.g. actively chose a potential exponential (as I recall) parsing for disambiguating between null-aware bracket or a conditional. As also mentioned elsewhere here it will also invalidate assumptions for everything having to do with dart code (say for instance code generation and code formatting).
On a completely subjective level I also just don't like it.

(And now that I'm already writing a comment:)

As for, e.g., Rd I'd point out that I'd probably not get through a code review with code including such a weird abbreviation.

As for having two different syntaxes so that one can use one in normal cases and the other if the normal case is ambiguous seems complete bonkers to me. To me, at least, having a single consistent syntax would be much preferable.

Also I personally don't mind the verbosity of Record. My mind would have an easier time remembering that, than some (combination of) punctuation(s) or abbreviation(s). That it becomes too long when nesting enough of them doesn't sound like a good argument to me --- with enough nesting it's going to be completely unreadable no matter what we do.

@lrhn
Copy link
Member

lrhn commented Sep 15, 2022

The Record(...) syntax is definitely the safe and rational choice.

It's consistent with Function. It's safe in all the right ways, least likely to set us up for problems in the future, because the grammar is well known. Accepts all record types without problems.

Using significant whitespace may fix this particular problem, but it leaves us wide open to the next conflict with an untagged parenthesis occurring after something else. It locks us down with regard to what other syntax we can add.
I'd be happy using the significant whitespace solution, but I don't think it's the most forward-looking choice.
(Edit: And we already had another one: on (e, s) {} after a try. Raw parentheses are darn dangerous.)

The problem with Record(...) is that it doesn't bring me joy. And I know how much writing Function annoys me today. It's a constant annoyance, something abrasive that adds to my daily amount of wear and tear every single time I write it. I doesn't go away. It doesn't get better. It's just "here we go again". It's a cost of using Dart that I have to write Function when I want a function type.
And I think that matters.

In that comparison, I'd prefer #(...) for the type, and not for the values, even if it takes up precious grammatical space, because it will keep records a low syntactic overhead. I won't mind writing it. It's still searchable (#( is unique). And if we can't use our syntax for something as central as the product type, what can we use it for?

@ds84182
Copy link

ds84182 commented Sep 15, 2022

I really like the colon suggestion from @stereotype441. I think allowing it (not making it required) wouldn't be a breaking change right? The point is there needs to be some delimiter between annotations and what they're annotating. Of all the solutions presented, it feels like the cleanest.

Also this applies to annotated parameters as well. int foo(@annotation (int, int) latLong) has the same ambiguity.

@Wdestroier
Copy link

Sometimes I feel like the parenthesis are extra unnecessary code, let's say, compared to Golang.

Example: int, int getOrigin() { return 0, 0; } vs (int, int) getOrigin() { return (0, 0); }.

However, it's not too much code and may make the code more readable.

In my opinion, having to write Record defeats the main advantage of records by requiring so much boilerplate and making the code not look good.

@TimWhiting
Copy link

TimWhiting commented Sep 16, 2022

As another consideration. Right now the Records feature does not break user written classes named Record because the user written declaration shadows the dart:core Record, and the record type does not need the name Record in it. If you make it mandatory to write Record as part of the type, it might make the disambiguation a little weird.

// User written
class Record {}

Record(int, int) f0(Record(int, int) value);

Record f1(Record value);

I'm assuming you would do the right thing here and resolve to the dart:core Record type for f0, but the user written class for f1, (they are disambiguated because of the parentheses, but it definitely makes it confusing to read).

@lrhn
Copy link
Member

lrhn commented Sep 16, 2022

We'd have to be absolutely certain that we can distinguish whether Record(int) should be a type or a constructor invocation.
We generally can do that. A lot of Dart would break down if we couldn't distinguish whether a position should be a type or an expression, since identifiers can be both. So it would probably work out. Like for Function, the following ( force-disambiguates it into a type in a type-position, but not in an expression position.

@TimWhiting
Copy link

I can't think of any true ambiguity (like you say the following ( disambiguates like Function) but there will be a lot during editing. Here are a few I can think of.

Record(int, int) f0(Record(int, int), value);
//                         ^ record parameter missing a name, or is this a function parameter named Record 
// (function parameter as written)

void main() {
  Record(int, String)
 { //  ^ is this a local function (yes), but also could be 
// an object instantiation / function call with a missing semicolon
// a variable declaration with a missing name & semicolon
// (Makes it harder to be rid of semicolons), though I guess this is an existing problem with blocks
  }
}

@natebosch
Copy link
Member

Would using #( for types, but not for expressions, open up the possibility for record type literals? If we use either #( or Record do we plan to allow for record type literals?

@munificent
Copy link
Member

Would using #( for types, but not for expressions, open up the possibility for record type literals?

Yes, it would.

If we use either #( or Record do we plan to allow for record type literals?

I don't think we support Function(...) for function type type literals. I believe it's because relying on a built-in identifier (and not a true reserved word) makes that weird. So we probably wouldn't support record type literals if we use Record(...). Honestly, I think type literals are often more trouble than they're worth. I'm not inclined to double down on them because making the type annotation syntax overlap the expression syntax just paints us into a corner in a lot of ways. (Like the fact that we couldn't use class names as tear-offs for the unnamed constructor.) So I'm not inclined to make record type type literals a priority. But if we picked a syntax like #(...) which made it clearly unambiguous, we definitely could allow them.

@lrhn
Copy link
Member

lrhn commented Sep 20, 2022

I too would avoid allowing type literals for anything new. If it doesn't follow the existing grammar (identifier, identifier + type parameters), then I won't introduce new syntax to allow the type literal. It takes syntax away from much more worthwhile features (aka. any other feature).

We can define typedef typeof<T> = T; today and allow typeof<any type> as a type literal for any type. Use $ if you want it even shorter. Creating a type literal for a type is not a problem which needs a syntactic solution.

Also #2393, for the extreme position.

@munificent
Copy link
Member

We discussed this in the language meeting today. While the alternatives on the table have their merits, we prefer the original syntax that's currently in the proposal.

In order to resolve the ambiguity, we will take @leafpetersen's original suggestion to make whitespace significant. That means:

// Metadata with argument list on function with no return type:
@annotation(a, b) function() {}

// Metadata with no arguments on function returning a record:
@annotation (a, b) function() {}

Note that we'll make any whitespace significant, not just newlines. Making newlines significant would work for function and field declarations where idiomatic formatted code puts annotations on the previous line. But it doesn't work for annotations on function parameters which are kept inline, as in:

function({
  @deprecated (int, int) pair
}) { ... }

The idea is that we'll try this approach out while the feature is still behind an experiment flag. During the implementation, we'll get a better sense of whether making the parser whitespace-sensitive in this way feels brittle or regretful. And we can try out this behavior to see how it feels in practice. If it feels like a bad choice, we can revisit it. But if it works (and we think it will), then it lets us keep the syntax we like the most.

We have not figured out exactly how whitespace is sensitive here yet. Is it only for cases that are actually ambiguous? Or do we never treat a parenthesized thing after a metadata annotation as its argument list if there is space before it?

Here's an initial suggestion:

  • When parsing a metadata annotation, ignore any whitespace and try to parse it as an argument list as usual.

  • If that leads to a parse where the metadata annotation precedes a location where a type annotation is optional and has been omitted, then look for whitespace before the (. If found, instead treat the argument list as a record type. Do this unconditionally, even if it turns out that the parenthsized construct would be a valid metadata annotation argument list but not a valid record type. That means this would become an error:

    @annotation (1) function() {}

Thoughts?

@bwilkerson
Copy link
Member

Here's an initial suggestion: ...

@jensjoha

I have to wonder how that will impact the performance of the parser. Today the parser is very eager when it comes to invoking methods on the listener, so this will effectively require it to parse every piece of metadata twice (once to decide how to parse it and once to actually parse it). It might not be a huge hit, but my understanding that the performance of the parser is fairly important.

There was some discussion in the past about building a general mechanism to support recovery (basically capture the events that would have been sent to the listener and then either actually send them or completely discard them based on whether the attempted parse succeeded. It might be time to revisit some of those discussions.

@leafpetersen
Copy link
Member

We have not figured out exactly how whitespace is sensitive here yet. Is it only for cases that are actually ambiguous? Or do we never treat a parenthesized thing after a metadata annotation as its argument list if there is space before it?

The latter is my preference, and I think it shouldn't have any performance implications at all. If you see @<identifier><whitespace> you're done with the metadata, otherwise whatever is there instead of the whitespace is an argument list. Does that fall down somewhere?

@bwilkerson
Copy link
Member

That would certainly resolve the parsing issues. There's no need for backtracking under that definition.

@munificent
Copy link
Member

That sounds good to me too.

@lrhn
Copy link
Member

lrhn commented Sep 22, 2022

We need to decide where whitespace can go when there is a type argument.

We can choose any of:

@foo<int>(bar) @foo <int>(bar) @foo<int> (bar) @foo <int> (bar)
No internal space ✔️
No space before arguments ✔️ ✔️
No space after identifier ✔️ ✔️
No ambiguity if type arguments ✔️ ✔️ ✔️ ✔️

The remaining combinations would be inconsistent.

We can definitely say that any whitespace after identifier ends the metadatum, that reduces it to "no space after identifier" and "no internal space", both valid options.

The grammar production for those could be:

No internal whitespace:

'@' <identifier> ( ~<WHITESPACE> <typeArguments>? ~<WHITESPACE> <argumentList>)?

No whitespace after identifier:

'@' <identifier> ( ~<WHITESPACE> <typeArguments>? <argumentList>)?

No whitespace before arguments:

'@' <identifier> ( <typeArguments>? ~<WHITESPACE> <argumentList>)?

No ambiguity if type arguments:

'@' <identifier> ( <typeArguments> <argumentList> | ~<WHITESPACE> <argumentList>)?

(I'm fairly sure I'm stretching the grammar by using negative lexical tokens in grammatical rules as negative lookahead, where in lexical rules they match an actual character other than the negated set.)

@leafpetersen
Copy link
Member

We can definitely say that any whitespace after identifier ends the metadatum, that reduces it to "no space after identifier" and "no internal space", both valid options.

I don't have extremely strong opinions here, but I feel like "no space after the identifier" is easy to explain, and should be easy to implement (you could in principle do it in the tokenizer, I think).

@lrhn
Copy link
Member

lrhn commented Sep 22, 2022

The one risk of doing "no space after identifier" vs "no internal space" is that if we ever want to allow explicit generic instantiation as metadata, @foo<typeargs> and nothing more, we would introduce another ambiguity.
If we go with "no internal space" now, we don't have to change it later.

On the other hand, if we never allow @foo<typeargs> by itself (which would be weird, since we're then annotating with either a function or a Type object), making our current life easier is also valid.

We can change it later. The formatter will have ensured that no existing formatted code at that time will be affected by such a change.

So, "no space after identifier" SGTM.

@munificent
Copy link
Member

I hate myself for even asking this, but what about:

@foo/*comment*/(a, b) thing;

@lrhn
Copy link
Member

lrhn commented Sep 23, 2022

Don't hate yourself. Hate society for bringing us to this point!

Having a comment isn't even completely unreasonable at that point. Consider a generic annotation, and

@Marker/*<int|String>*/(const bool.hasEnvironment("flooNum") 
    ? const int.fromEnvironment("flooNum") 
    : const String.fromEnvironment("floo"));

Ok, a little speculative maybe, it's in a position where no reader benefits from the comment.

I'm very tempted to say that if the next source token (ignoring comments and whitespace) after the identifier does not start right where the identifier ends, it's not part of the annotation. So no comments either.

If we allow comments, we get into cases like:

@foo// Nyah, nyah
(int, int) get foo => 1;

and we'll have to decide whether the newline is part of the comment or counts as a separate whitespace. (Or forget to consider some similar case, and have implementations do different things.)

The KISS is (/< must be right after the identifier. That's unambiguous. Allowing anything there opens us up to unforeseen complications and special cases.

There are cases of:

@deprecated // deprecated

in the world today, but I don't see any cases of the / being flushed against the identifier (thank you Dart formatter!).

That brings up another way to consider it: What would the formatter do?
It does:

@C /*x*/ ()
@C /*x*/ <int>()
@C<int> /*x*/ ()
@C<int>() /*x*/
@C /*x
  y*/
()
@C /*x
  y */
<int>()
@C<int> /*x
  y*/
    ()
@C<int>() /*x
  y*/
@C // x
<int>()
@C<int> // x
    ()
@C<int>() // x
void main() {}

The formatter wants to put spaces between comments and other things. All existing formatted code will have the space, which means that existing code with comments between identifier and </( will stop working anyway. Making the comment itself breaking won't change that.
(If there was any such existing code,)

@stereotype441
Copy link
Member

That brings up another way to consider it: What would the formatter do?

I want a bumper sticker that says "What would the formatter do?" 😃

The KISS is (/< must be right after the identifier. That's unambiguous. Allowing anything there opens us up to unforeseen complications and special cases.

Agreed. And it's consistent with the way the scanner treats comments elsewhere in the language. E.g. int/* */x means int x, not intx. And i-/* */- means i- - (a syntax error), not i--.

@lrhn
Copy link
Member

lrhn commented Sep 23, 2022

I wouldn't say that int/* */x means int x, just that comments are token separators the same way as whitespace. It means int and x are separate tokens, and i-/**/- is three tokens instead of two.

For @id(), the id and ( are going to be separate tokens anyway, with or without intermediate token separators. Adding or removing whitespace or comments between id and ( doesn't currently affect parsing at all.

The difference for the "whitespace sensitive parsing" is that we make the parsing depend not only on the sequence of tokens, but also on their physical location, the presence or absence of non-token source characters between them. That's something we don't do anywhere else in the current grammar (as far as I know).

The formatter is guaranteed to work, because all it does is to change the position of tokens, but it makes sure to not change the order or the tokenization (no making two identifiers touch).
With whitespace sensitive parsing, the formatter must also make sure to not put space between the id and (, or remove such space, which is a safe operation everywhere else.

It's nothing insurmountable, but it is a new thing, and a special case that must be handled specifically by new code that doesn't already exist.

I guess you could say that we introduce new tokes IDLT : IDENTIFIER '<' and IDPAREN : IDENTIFIER '(' and make the grammar of metadatum be:

 '@' (<identifier> | IDLT <typeList> '>' <arguments> | IDPAREN (<argumentList> ','?)? ')'

(and then everywhere we otherwise accept identifier '<'/identifier '(' we should also accept IDLT/IDPAREN too.)

Well, except that it's not IDENTIFIER, it's qualified, so we need to tweak that too.

@munificent
Copy link
Member

munificent commented Sep 23, 2022

That brings up another way to consider it: What would the formatter do?

I want a bumper sticker that says "What would the formatter do?" 😃

"But, doctor, I am Pagliacci!"

The formatter wants to put spaces between comments and other things.

It does have some logic to not put spaces before /* and after */ in certain contexts, mainly to try to make the old generic method comments look right. That logic is heuristic-based and pretty brittle.

It's nothing insurmountable, but it is a new thing, and a special case that must be handled specifically by new code that doesn't already exist.

It's not entirely new. In addition to not adhering identifiers together (which is obvious), the formatter also has to take care to insert a space between nested unary minus so that it doesn't turn - -2 into --2.

@lrhn
Copy link
Member

lrhn commented Sep 23, 2022

The formatter is probably more aware of whitespace than any other tool. It makes sense that it is already doing things with it.
The new code I was talking about was also in the parser, which should not have anything today which checks whether two tokens are adjacent.
Maybe it's easy, but it will be new.

munificent added a commit that referenced this issue Oct 26, 2022
#2529)

* [records] Disambiguate metadata annotation arguments and record types.

Fix #2469.

* Don't allow any characters at all before metadata arguments.

* Clarify that disambiguation is not gated on language versioning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
records Issues related to records.
Projects
None yet
Development

Successfully merging a pull request may close this issue.