Alternative syntax for record positional field getters - viability query #2726
Comments
We should also consider the generality of this mechanism. It would have been nice if we could support this notion of integer numerals as selectors as more than a very narrow special case, but that doesn't seem to work out so smoothly:
extension on Object {
void operator 0() => print('Hello, $this!');
}
void main() {
2.0; // Print 'Hello, 2!', or evaluate a `double` value and discard it?
} |
It's true that if we later introduce integer getters (or general members, wohoo!) on arbitrary types, then Doing maximal tokenization, as usual, and then splitting the tokens if necessary, will give that effect. I don't think there are other cases where both tokenizations of |
If I can get some sample test cases, I can try to spend a little time hacking on it and see how far I get. That should probably give us an idea. |
Here is a CL with some hypothetical tests. Since there is no syntax support, I probably made plenty of mistakes, so take it with a grain of salt. |
My opinions:
No. It's a label whose entire lexeme is meaningful, not an integer whose value is the identifier. Rust doesn't allow leading zeroes.
Again, no.
Zero. Rust and Swift, which both use this syntax, start at zero. All of the languages I found that use integer literals in some form (as opposed to some prefixed identifier-like thing like
I'm fine either way. It works since you can construct a symbol whose name is |
ACK on no leading zeros. Updated hypothetical test files. Going for "cannot be intercepted by (I have an even more hypothetical document theorizing how we could possibly make a grammar for this: https://gist.github.com/lrhn/f06ba8300def9cc4bfe84869c3d78229. May have no bearing on the reality of parsing.) |
It seems to work @ https://dart-review.googlesource.com/c/sdk/+/278506 |
That's darn impressive! |
I wanted to get some actual data about whether users prefer numbered lists of things in their code to be zero-based or one-based. I did some scraping. My script looks at type parameter lists and parameters. For each one, it collects all of the identifiers that have the same name with numeric suffixes. For each of those sequences, it sorts the numbers and looks at the starting one. After looking at 14,826,488 lines in 90,919 files across a large collection of Pub packages, Flutter widgets, and Flutter apps, I see:
So there's a slight preference for 1-based, but not huge. Looking at parameter lists and type parameter lists separately:
The stark difference here suggests that there may be some outlier code defining a ton of type parameter lists with a certain style. Indeed, if we look at the number of sequences in each package:
So ffigen (whose name suggests it contains a ton of generated code) heavily skews the data. Really, what we want to know is not what each sequence prefers, but what each user prefers. If only one user prefers starting at zero and everyone else prefers starting at one, but that user authors thousands of parameter lists, that doesn't mean they get their way. To approximate per-user preference, I treated each top level directory as a separate "author". For each one, I looked at all of the sequences in it to see if they start at one, zero, (or both):
While there are many sequences that start with zero, they are heavily concentrated in a few packages like ffigen and realm. When you consider each package as a single vote for a given style, then there is a much larger number of packages that contain one-based sequences. If you look at them, each one-based package only has a fairly small number of sequences. But there are many of these packages. That suggests that most users hand-authoring type parameter and parameter sequences prefer starting them at one. Based on that, I think we should start positional record field getters at one too. |
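For anyone wanting to reproduce the counting step, here is a minimal sketch in Dart. It is not the script used for the numbers above: `sequenceStarts` and the sample names are hypothetical, and it only covers the "group by name, sort the suffixes, look at the first one" part, not the parsing of real packages or the per-package vote.

```dart
/// Hypothetical helper: groups names by their non-numeric prefix and
/// returns, for each prefix seen with two or more numeric suffixes,
/// the smallest suffix (the number the sequence starts at).
Map<String, int> sequenceStarts(Iterable<String> names) {
  final suffixed = RegExp(r'^(.*?)(\d+)$');
  final groups = <String, List<int>>{};
  for (final name in names) {
    final match = suffixed.firstMatch(name);
    if (match == null) continue; // No numeric suffix, not part of a sequence.
    groups
        .putIfAbsent(match.group(1)!, () => <int>[])
        .add(int.parse(match.group(2)!));
  }
  return {
    for (final entry in groups.entries)
      if (entry.value.length > 1) entry.key: (entry.value..sort()).first,
  };
}

void main() {
  // Made-up parameter and type parameter names.
  final starts = sequenceStarts(['T1', 'T2', 'T3', 'arg0', 'arg1', 'value']);
  print(starts);
}
```

Treating each top level directory as one vote would then be a matter of running this per directory and recording whether it contains zero-based sequences, one-based sequences, or both.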
So what you're saying is that you're looking for "foo1" and "foo2" for instance (vs "foo0" and "foo1"?) (My 2 cents would be that with lists starting at 0 it would add confusion that these don't.) |
Yes, exactly. It looks for parameters whose identifier is
That was my intuition too, which is why the proposal initially had them start at zero. But from looking at the data, it seems pretty clear that when users number sequences of identifiers, they mostly start them at 1. See, for example, |
Incredible! We spent a bunch of time discussing this in the language meeting. While it appears to be technically feasible, from looking through the tests we concluded that it's just too weird and brittle. Given that Dart already has floating point literals that don't require a leading digit, null-aware operators, and cascades, it will be very hard (but apparently not impossible!) for tools to parse it correctly. While we might be able to get our compiler to handle it, all of the various syntax highlighters, static analyzers, IDE integrations, tree-sitters, etc. might not be so lucky. It's just a bridge too far. I don't think anyone loves the @jensjoha, thank you for working through an experimental implementation. I really appreciate it. In particular, the thorough tests are good for giving us a sense of what we'd be getting ourselves (and our users) into if we did this syntax. We've decided to just stick with the current proposal and use |
The current records proposal uses the name `$0` to access the first positional field of a record.
It's a simple approach which requires no new syntax, because it's just a named member access like any other. It works with `dynamic` invocations. There is a risk of name conflict with positional fields, but the `$` prefix should make it unlikely (more unlikely than, say, `.item1`, but probably not by much).
We are now considering a Swift/Rust-like syntax of `record.0` instead of `record.$0`. That has some benefits, but also possibly some drawbacks, mainly around parsing.
Basically, we'd add `'.' <DECIMAL_NUMERAL>` (where `<DECIMAL_NUMERAL> ::= <DIGIT>+`) as a selector, similar to an identifier-named selector. It should be usable as `.2`, `?.2`, `..2` and `?..2` (so `2` is a cascade selector). It's not (currently) an assignable selector, since it will only apply to record fields, and those are final.
It will act just like a member access for an integer-named member, and so far only records will be able to have such, and they're all getters.
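For comparison, here is a minimal sketch of the `$`-prefixed getters as ordinary member accesses. It assumes Dart 3, where records shipped with positional getters numbered from `$1` (in line with the one-based preference argued for in the comments), whereas the text above still says `$0`. The dotted-number form appears only in a comment, since it is hypothetical.

```dart
void main() {
  // A record with two positional fields (Dart 3 syntax).
  final record = (42, 'answer');

  // `$`-prefixed getters are plain named member accesses, no new syntax.
  print(record.$1);
  print(record.$2);

  // The same getter is reachable through a dynamic invocation.
  final dynamic untyped = record;
  print(untyped.$2);

  // The syntax discussed in this issue would instead read something like
  // `record.0`. That form is hypothetical and does not compile.
}
```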
The advantages are that it's shorter, some think prettier (because the `$` is noise), and it removes the risk of name collision. It's (arguably) more reasonable to start counting from zero than it is for more name-like getters.
**We are interested in understanding the viability of using this syntax, before going any further.**
Choosing such a syntax is mainly expected to affect the front-ends, and mostly the parsers. After that, it's expected that we can treat the integer selector as a named selector, with a unique unspeakable name for each number, for most purposes. We may want to retain the integer value if the back-ends can use it.
Possible parsing issue:
Tokenization becomes ambiguous. A `.2` can be either a double literal or a `.` followed by a decimal selector `2`, as in `r.2`.
That's definitely an issue that needs to be resolved. It may be somewhat similar to how we handle `>>>`, which is tokenized into a single "triple shift" operator, but may be split into individual `>`s again if parsing needs to end a type argument list.
It may be possible to similarly split a double literal like `.2` or `2.2` into individual decimal-numerals and dots if it occurs in a selector or selector-name position. We do believe that those positions can never validly contain a double literal, so there will not be ambiguity between valid programs, other than a leading double numeral like `2.2;` itself, which should still be tokenized as a number. It's only when a double literal occurs after another expression, or another expression followed by `.`/`?.`/`..`/`?..`, that it may need reinterpreting as a selector. (And we treat `?.` as a single token, different from `? .`, and we don't try to split that, so `{e?.2:0}` is a map literal.)
Even if we can parse such valid programs, it may still negatively affect parser recovery for almost-correct programs.
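The splitting idea can be sketched as a small helper. This is only an illustration under stated assumptions, not how the Dart front end does or would do it: `splitInSelectorPosition` is a hypothetical name, it works on a bare lexeme string rather than real tokens, and it rejects leading zeros following the preference voiced in the comments.

```dart
// A decimal numeral without leading zeros (an assumption taken from the
// discussion, not a settled rule).
final _numeral = RegExp(r'^(0|[1-9][0-9]*)$');

/// Hypothetical helper: given a maximally tokenized numeric lexeme such as
/// `.2` or `2.2` that the parser found in selector position, returns the
/// `.` and decimal-numeral tokens it splits into, or null if it cannot be
/// read as selectors (exponent, trailing dot, leading zero, and so on).
List<String>? splitInSelectorPosition(String lexeme) {
  final parts = lexeme.split('.');
  final tokens = <String>[];
  for (var i = 0; i < parts.length; i++) {
    if (i > 0) tokens.add('.');
    final part = parts[i];
    if (part.isEmpty) {
      // Only the part before a leading dot (as in `.2`) may be empty.
      if (i != 0) return null;
      continue;
    }
    if (!_numeral.hasMatch(part)) return null;
    tokens.add(part);
  }
  return tokens.isEmpty ? null : tokens;
}

void main() {
  print(splitInSelectorPosition('.2')); // From `r.2`, tokenized as r + .2
  print(splitInSelectorPosition('2.2')); // From `r.2.2`, tokenized as r + . + 2.2
  print(splitInSelectorPosition('2.5e3')); // Has an exponent, stays a double.
}
```

A parser would only call something like this when the lexeme follows an expression or one of `.`, `?.`, `..`, `?..`; a statement-leading `2.2;` would never reach it and stays a number.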
Open design choices
There are a few ways we can vary the syntax, which could make it easier or harder to parse, but won't necessarily make any difference.
- Allow `record.01` to mean the same as `record.1`? (No strong preference. People might want to align things, but it's otherwise unnecessary.)
- Allow `record.0xA` to mean the same as `record.10`? (Probably not. We don't expect so many fields that it'll make much of a difference.)
- Start at `.0` or `.1`? Should not affect parsing.
- `dynamicValue.2` should work on records with at least three positional fields, and fail on non-records or shorter records, which won't have a getter named `2`. When it fails, should it call `noSuchMethod` of the object? If so, what should the `memberName` symbol be? (Likely no to calling `noSuchMethod`, but if yes, `const Symbol("2")` is valid. Don't want to add a `#2` symbol literal.)
Developers, front-end ones first, WDYT - viable or near impossible?
@johnniwinther @jensjoha
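To make the `noSuchMethod` question concrete, here is a small sketch of what a failed dynamic access hands to `noSuchMethod` today, and of building a symbol named `2` through the constructor. `Probe` and `missing` are hypothetical names; the claim that a symbol named `2` can be constructed comes from the text above, and the sketch uses the non-const constructor.

```dart
class Probe {
  // Reports which member a failed dynamic access asked for.
  @override
  dynamic noSuchMethod(Invocation invocation) =>
      'no member ${invocation.memberName}, isGetter: ${invocation.isGetter}';
}

void main() {
  final dynamic probe = Probe();
  // `missing` is not declared on Probe, so this lands in noSuchMethod.
  print(probe.missing);

  // There is no `#2` symbol literal, but the constructor takes the name.
  final two = Symbol('2');
  print(two == Symbol('2'));
}
```

If `dynamicValue.2` were ever routed to `noSuchMethod`, a symbol built like `two` is presumably what `memberName` would have to be.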