-
Notifications
You must be signed in to change notification settings - Fork 213
[records] parse errors when trying to annotate a record type class field #2469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
My first guess is that we're parsing the record type as an argument list for the annotation. Not sure whether the proposal says how we should disambiguate this case. |
Ouch, that is an ambiguity. @foo
(int, int) Function(int) bar; is completely ambiguous. There is no simple disambiguation. We can't add something to The best hacks would be: @foo
@DUMMY()
(int, int) Function(int) x; or
Neither is something we can reasonably ask people to do. |
I guess one (unfortunate) option is to disambiguate based on space. |
Moving this over to the language repo because this is an ambiguity in the actual language feature and not a parser bug. We'll have to change the proposal to handle this. :( |
One idea is to put class C {
@deprecated
Record(int, int) pair;
}
@foo
Record(int, int) Function(int) bar; I don't like the verbosity, but it's consistent with function type syntax and resolves this ambiguity. It also makes the zero-element record type I'll think about this some more and see if I can come up with any other ideas. So how about them types on the right? |
Another (probably bad) idea: We could extend the language to support named type parameters. That could arguably be useful for other user-defined generic types because (as with optional named parameters) it would let use sites omit the type parameters can be inferred/defaulted and write only the ones the user cares about. If we had that, then record types (and in principle function types) could look like But given that we already have function types which use parentheses and users seem to be OK with, this feels like the wrong approach. |
So we can either change the record type grammar, or the annotation grammar, or add something that users can use to disambiguate in cases where there is ambiguity. The things we can do without changing the grammar are annoyances to users, and pretty much undiscoverable (like the We could introduce We can change the annotation grammar. Lots of options, all of them breaking and requiring migration, but probably something that can be automated. We could decide to disallow annotations without parentheses. It would be slightly annoying for a few annotations. Mainly We could change the annotation grammar entirely to use, say, the C#-style: Or we can introduce some sort of delimiter you can put after annotations to separate them from the following thing. Trailing comma? Optional semicolon after annotations? (Giving people optional semicolons, just not the ones they want!) Changing the record type grammar is less breaking because it doesn't exist yet, so we won't break any code. Options like |
FWIW, I actually find |
I think my favorite of the suggestions above is using One other random idea: add an optional @deprecated:
(int, int) pair; vs. @Deprecated('message'):
foo; If the user leaves off the |
FWIW, I lean towards |
I like the When you want to use a record type, it's in a place where you don't want the ceremony of introducing a class. Adding extra verbosity to those places makes it less convenient, and might just remove some of the lure of having records. (Not all, but some.) |
Looking on the motivation in the specification, I see three reasons: (1) no behavior, (2) verbosity, and (3) coupled to a class definition. Surprisingly, I don't see performance, i.e. I'd expect that because we don't need allocating a new object instance, and then GC it, and instead just return a few values on the stack, it is slightly faster. I don't understand (3) as well. What is wrong with being coupled to a class name? For (2) I think the biggest issue is not verbosity itself, but non-locality. I have to scroll down to the bottom of the file to declare a new class that I will return from this method. Not a huge work, I admit, but still. Typing And I also like the symmetry with |
I'll admit, I also find Compare
I think a point could be made for shorter keywords. :) |
Could you make the word |
Could we fix this? |
I prefer the more concise syntax for records types. I agree with @lrhn that already Compare: Record(String, String) doSomething(Record(String, String) Function(Record(int, int), String Function(int)) transform) {
return transform(((1, 2), (i) => '$i'));
} with the current (with allowing Fn) (String, String) doSomething((String, String) Fn((int, int), String Fn(int)) transform) {
...
} Or: doSomething(transform: ((int, int), int -> String) -> (String, String)) -> (String, String) {
...
} With disallowing In any case, I know this is a contrived example, but Record seems unnecessarily wordy. Especially for a language that has I like @stereotype441's suggestion of adding an optional |
Another idea that came up in this morning's language team meeting is to disambiguate the two types of annotations based on whether there's whitespace between the identifier and the Although technically breaking, this disambiguation rule would probably do the right thing in 99.9% of cases. |
Summarizing discussion from this morning. There are two broad approaches we could take: do something with the annotation syntax to disambiguate, or change the record type syntax. On the first approach:
On the second approach:
There isn't broad agreement on the team as to a direction yet, we'll keep discussing this. |
One possibility that we didn't discuss yet is using a different infix operator for the type syntax. An idea which probably doesn't work is to use
The ML family of languages uses infix
A variant of the above would be to use parentheses with an infix * as a delimiter:
Other delimiters we could use (with or without parentheses):
|
I had hoped to stay out of this discussion, but feel I have to voice my dislike of using spaces (and lack of spaces) as disambiguation. (And now that I'm already writing a comment:) As for, e.g., As for having two different syntaxes so that one can use one in normal cases and the other if the normal case is ambiguous seems complete bonkers to me. To me, at least, having a single consistent syntax would be much preferable. Also I personally don't mind the verbosity of |
The It's consistent with Using significant whitespace may fix this particular problem, but it leaves us wide open to the next conflict with an untagged parenthesis occurring after something else. It locks us down with regard to what other syntax we can add. The problem with In that comparison, I'd prefer |
I really like the colon suggestion from @stereotype441. I think allowing it (not making it required) wouldn't be a breaking change right? The point is there needs to be some delimiter between annotations and what they're annotating. Of all the solutions presented, it feels like the cleanest. Also this applies to annotated parameters as well. |
Sometimes I feel like the parenthesis are extra unnecessary code, let's say, compared to Golang. Example: However, it's not too much code and may make the code more readable. In my opinion, having to write Record defeats the main advantage of records by requiring so much boilerplate and making the code not look good. |
As another consideration. Right now the Records feature does not break user written classes named // User written
class Record {}
Record(int, int) f0(Record(int, int) value);
Record f1(Record value); I'm assuming you would do the right thing here and resolve to the dart:core Record type for f0, but the user written class for f1, (they are disambiguated because of the parentheses, but it definitely makes it confusing to read). |
We'd have to be absolutely certain that we can distinguish whether |
I can't think of any true ambiguity (like you say the following Record(int, int) f0(Record(int, int), value);
// ^ record parameter missing a name, or is this a function parameter named Record
// (function parameter as written)
void main() {
Record(int, String)
{ // ^ is this a local function (yes), but also could be
// an object instantiation / function call with a missing semicolon
// a variable declaration with a missing name & semicolon
// (Makes it harder to be rid of semicolons), though I guess this is an existing problem with blocks
}
} |
Would using |
Yes, it would.
I don't think we support |
I too would avoid allowing type literals for anything new. If it doesn't follow the existing grammar (identifier, identifier + type parameters), then I won't introduce new syntax to allow the type literal. It takes syntax away from much more worthwhile features (aka. any other feature). We can define Also #2393, for the extreme position. |
We discussed this in the language meeting today. While the alternatives on the table have their merits, we prefer the original syntax that's currently in the proposal. In order to resolve the ambiguity, we will take @leafpetersen's original suggestion to make whitespace significant. That means: // Metadata with argument list on function with no return type:
@annotation(a, b) function() {}
// Metadata with no arguments on function returning a record:
@annotation (a, b) function() {} Note that we'll make any whitespace significant, not just newlines. Making newlines significant would work for function and field declarations where idiomatic formatted code puts annotations on the previous line. But it doesn't work for annotations on function parameters which are kept inline, as in: function({
@deprecated (int, int) pair
}) { ... } The idea is that we'll try this approach out while the feature is still behind an experiment flag. During the implementation, we'll get a better sense of whether making the parser whitespace-sensitive in this way feels brittle or regretful. And we can try out this behavior to see how it feels in practice. If it feels like a bad choice, we can revisit it. But if it works (and we think it will), then it lets us keep the syntax we like the most. We have not figured out exactly how whitespace is sensitive here yet. Is it only for cases that are actually ambiguous? Or do we never treat a parenthesized thing after a metadata annotation as its argument list if there is space before it? Here's an initial suggestion:
Thoughts? |
I have to wonder how that will impact the performance of the parser. Today the parser is very eager when it comes to invoking methods on the listener, so this will effectively require it to parse every piece of metadata twice (once to decide how to parse it and once to actually parse it). It might not be a huge hit, but my understanding that the performance of the parser is fairly important. There was some discussion in the past about building a general mechanism to support recovery (basically capture the events that would have been sent to the listener and then either actually send them or completely discard them based on whether the attempted parse succeeded. It might be time to revisit some of those discussions. |
The latter is my preference, and I think it shouldn't have any performance implications at all. If you see |
That would certainly resolve the parsing issues. There's no need for backtracking under that definition. |
That sounds good to me too. |
We need to decide where whitespace can go when there is a type argument. We can choose any of:
The remaining combinations would be inconsistent. We can definitely say that any whitespace after The grammar production for those could be: No internal whitespace: '@' <identifier> ( ~<WHITESPACE> <typeArguments>? ~<WHITESPACE> <argumentList>)? No whitespace after identifier: '@' <identifier> ( ~<WHITESPACE> <typeArguments>? <argumentList>)? No whitespace before arguments: '@' <identifier> ( <typeArguments>? ~<WHITESPACE> <argumentList>)? No ambiguity if type arguments: '@' <identifier> ( <typeArguments> <argumentList> | ~<WHITESPACE> <argumentList>)? (I'm fairly sure I'm stretching the grammar by using negative lexical tokens in grammatical rules as negative lookahead, where in lexical rules they match an actual character other than the negated set.) |
I don't have extremely strong opinions here, but I feel like "no space after the identifier" is easy to explain, and should be easy to implement (you could in principle do it in the tokenizer, I think). |
The one risk of doing "no space after identifier" vs "no internal space" is that if we ever want to allow explicit generic instantiation as metadata, On the other hand, if we never allow We can change it later. The formatter will have ensured that no existing formatted code at that time will be affected by such a change. So, "no space after identifier" SGTM. |
I hate myself for even asking this, but what about: @foo/*comment*/(a, b) thing; |
Don't hate yourself. Hate society for bringing us to this point! Having a comment isn't even completely unreasonable at that point. Consider a generic annotation, and
Ok, a little speculative maybe, it's in a position where no reader benefits from the comment. I'm very tempted to say that if the next source token (ignoring comments and whitespace) after the identifier does not start right where the identifier ends, it's not part of the annotation. So no comments either. If we allow comments, we get into cases like: @foo// Nyah, nyah
(int, int) get foo => 1; and we'll have to decide whether the newline is part of the comment or counts as a separate whitespace. (Or forget to consider some similar case, and have implementations do different things.) The KISS is There are cases of: @deprecated // deprecated in the world today, but I don't see any cases of the That brings up another way to consider it: What would the formatter do? @C /*x*/ ()
@C /*x*/ <int>()
@C<int> /*x*/ ()
@C<int>() /*x*/
@C /*x
y*/
()
@C /*x
y */
<int>()
@C<int> /*x
y*/
()
@C<int>() /*x
y*/
@C // x
<int>()
@C<int> // x
()
@C<int>() // x
void main() {} The formatter wants to put spaces between comments and other things. All existing formatted code will have the space, which means that existing code with comments between identifier and |
I want a bumper sticker that says "What would the formatter do?" 😃
Agreed. And it's consistent with the way the scanner treats comments elsewhere in the language. E.g. |
I wouldn't say that For The difference for the "whitespace sensitive parsing" is that we make the parsing depend not only on the sequence of tokens, but also on their physical location, the presence or absence of non-token source characters between them. That's something we don't do anywhere else in the current grammar (as far as I know). The formatter is guaranteed to work, because all it does is to change the position of tokens, but it makes sure to not change the order or the tokenization (no making two identifiers touch). It's nothing insurmountable, but it is a new thing, and a special case that must be handled specifically by new code that doesn't already exist. I guess you could say that we introduce new tokes '@' (<identifier> | IDLT <typeList> '>' <arguments> | IDPAREN (<argumentList> ','?)? ')' (and then everywhere we otherwise accept Well, except that it's not |
"But, doctor, I am Pagliacci!"
It does have some logic to not put spaces before
It's not entirely new. In addition to not adhering identifiers together (which is obvious), the formatter also has to take care to insert a space between nested unary minus so that it doesn't turn |
The formatter is probably more aware of whitespace than any other tool. It makes sense that it is already doing things with it. |
The following code generates what look like parser errors.
/fyi @jensjoha
The text was updated successfully, but these errors were encountered: