Alternative syntax for record positional field getters - viability query #2726

lrhn · 2022-12-15T14:35:16Z

The current records proposal uses the name $0 to access the first positional field of a record.

It's a simple approach which requires no new syntax, because it's just a named member access like any other. It works with dynamic invocations. There is a risk of name conflict with positional fields, but the $ prefix should make it unlikely (more unlikely than, say, .item1, but probably not by much).

We are now considering a Swift/Rust-like syntax of record.0 instead of record.$0. That has some benefits, but also possibly some drawbacks, mainly around parsing.

Basically, we'd add '.' <DECIMAL_NUMERAL> (where <DECIMAL_NUMERAL> ::= <DIGIT>+) as a selector, similar to an identifier-named selector. It should be usable as .2, ?.2, ..2 and ?..2 (so 2 is a cascade selector). It's not (currently) an assignable selector, since it will only apply to record fields, and those are final.
It will act just like a member access for an integer-named member. So far only records will be able to have such members, and they're all getters.

The advantages are that it's shorter, some think it's prettier (because the $ is noise), and it removes the risk of name collision. It's (arguably) more reasonable to start counting from zero with this syntax than it is for more name-like getters.

**We are interested in understanding the viability of using this syntax, before going any further.**

Choosing such a syntax is mainly expected to affect the front-ends, and mostly the parsers. After that, it's expected that we can treat the integer selector as a named selector, with a unique unspeakable name for each number, for most purposes. We may want to retain the integer value if the back-ends can use it.

Possible parsing issue:

Tokenization becomes ambiguous. A .2 can be either a double literal or a . followed by a decimal selector 2, as in r.2.

That's definitely an issue that needs to be resolved. It may be somewhat similar to how we handle >>>, which is tokenized into a single "triple shift" operator, but may be split into individual > s again if parsing needs to end a type argument list.

It may be possible to similarly split a double literal like .2 or 2.2 into individual decimal numerals and dots if it occurs in a selector or selector-name position. We do believe that those positions can never validly contain a double literal, so there will not be ambiguity between valid programs, other than a leading double numeral like 2.2; itself, which should still be tokenized as a number. It's only when a double literal occurs after another expression, or another expression followed by ./?./../?.., that it may need reinterpreting as a selector. (And we treat ?. as a single token, different from ? ., and we don't try to split that, so {e?.2:0} is a map literal.)
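The splitting step described above could be sketched roughly as follows. This is only an illustration: `splitDoubleLexeme` is a hypothetical helper, not part of the Dart SDK or of any actual parser, and it assumes the caller has already decided that the token sits in a selector position.

```dart
// Hypothetical sketch; none of these names come from the Dart SDK.

/// Splits a lexeme that maximal tokenization produced as a double literal
/// (for example `.2` or `2.2`) into `.` tokens and decimal numerals.
/// Returns null when the lexeme contains anything other than digits and
/// dots (such as an exponent), since that cannot be a chain of selectors.
List<String>? splitDoubleLexeme(String lexeme) {
  if (!RegExp(r'^[0-9.]+$').hasMatch(lexeme)) return null;
  return RegExp(r'[0-9]+|\.')
      .allMatches(lexeme)
      .map((m) => m.group(0)!)
      .toList();
}

void main() {
  print(splitDoubleLexeme('.2')); // [., 2]
  print(splitDoubleLexeme('2.2')); // [2, ., 2]
  print(splitDoubleLexeme('1e3')); // null
}
```

With this shape, `r.2` would first tokenize as `r` followed by the double literal `.2`, and the parser would ask for the split only after seeing that a selector is expected at that point.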

Even if we can parse such valid programs, it may still negatively affect parser recovery for almost-correct programs.

Open design choices

There are a few ways we can vary the syntax, which could make it easier or harder to parse, but won't necessarily make any difference.

Allow leading zeros. Should we allow record.01 to mean the same as record.1? (No strong preference. People might want to align things, but it's otherwise unnecessary.)
Hex literals. Should we allow record.0xA to mean the same as record.10? (Probably not. We don't expect so many fields that it'll make much of a difference.)
Start at .0 or .1. Should not affect parsing.
Dynamic invocations. Should not affect parsing. Doing dynamicValue.2 should work on records with at least three positional fields, and fail on non-records or shorter records, which won't have a getter named 2. When it fails, should it call noSuchMethod of the object? If so, what should the memberName symbol be? (Likely no to calling noSuchMethod, but if yes, const Symbol("2") is valid. Don't want to add a #2 symbol literal.)

Developers, front-end ones first, WDYT - viable or near impossible?
@johnniwinther @jensjoha


eernstg · 2022-12-15T15:25:47Z

We should also consider the generality of this mechanism. It would have been nice if we could support this notion of integer numerals as selectors as more than a very narrow special case, but that doesn't seem to work out so smoothly:

extension on Object {
  void operator 0() => print('Hello, $this!');
}

void main() {
  2.0; // Print 'Hello, 2!', or evaluate a `double` value and discard it?
}

lrhn · 2022-12-15T15:50:25Z

It's true that if we later introduce integer getters (or general members, wohoo!) on arbitrary types, then 2.0 becomes both grammatically and syntactically valid for both possible tokenizations. So we have to choose, and choosing the double literal is the only rational choice. So it evaluates to the double, discards it, and gets an analyzer warning for useless code.

Doing maximal tokenization, as usual, and then splitting the tokens if necessary, will give that effect.

I don't think there are other cases where both tokenizations of '.' DIGIT can lead to grammatically valid programs.

jensjoha · 2022-12-19T10:55:46Z

If I can get some sample test cases, I can try to spend a little time hacking on it and see how far I get. That should probably give us an idea.

lrhn · 2022-12-19T15:41:27Z

Here is a CL with some hypothetical tests. Since there is no syntax support, I probably made plenty of mistakes, so take it with a grain of salt.

https://dart-review.googlesource.com/c/sdk/+/276522

munificent · 2022-12-19T23:16:26Z

My opinions:

> Allow leading zeros. Should we allow record.01 to mean the same as record.1? (No strong preference. People might want to align things, but it's otherwise unnecessary.)

No. It's a label whose entire lexeme is meaningful, not an integer whose value is the identifier.

Rust doesn't allow leading zeroes.

> Hex literals. Should we allow record.0xA to mean the same as record.10? (Probably not. We don't expect so many fields that it'll make much of a difference.)

Again, no.

> Start at .0 or .1. Should not affect parsing.

Zero. Rust and Swift, which both use this syntax, start at zero.

All of the languages I found that use integer literals in some form (as opposed to some prefixed identifier-like thing like #1, Item1, or _1) start at zero:

Crystal, D: tuple[0], tuple[1], etc.
Scala 3: tuple(0), tuple(1), etc.

> Dynamic invocations. Should not affect parsing. Doing dynamicValue.2 should work on records with at least three positional fields, and fail on non-records or shorter records, which won't have a getter named 2. When it fails, should it call noSuchMethod of the object? If so, what should the memberName symbol be? (Likely no to calling noSuchMethod, but if yes, const Symbol("2") is valid. Don't want to add a #2 symbol literal.)

I'm fine either way. It works since you can construct a symbol whose name is 2. Agreed that it's definitely not worth adding symbol literal support for this.

lrhn · 2022-12-20T10:11:53Z

ACK on no leading zeros. Updated hypothetical test files.

Going for "cannot be intercepted by noSuchMethod" for now, which means all we have to worry about is how to create a NoSuchMethodError, not necessarily an Invocation with a memberName.

(I have an even more hypothetical document theorizing how we could possibly make a grammar for this: https://gist.github.com/lrhn/f06ba8300def9cc4bfe84869c3d78229. May have no bearing on the reality of parsing.)

jensjoha · 2023-01-09T12:07:17Z

It seems to work @ https://dart-review.googlesource.com/c/sdk/+/278506

lrhn · 2023-01-09T12:14:16Z

That's darn impressive!

munificent · 2023-01-12T02:42:41Z

I wanted to get some actual data about whether users prefer numbered lists of things in their code to be zero-based or one-based. I did some scraping. My script looks at type parameter lists and parameters. For each one, it collects all of the identifiers that have the same name with numeric suffixes. For each of those sequences, it sorts the numbers and looks at the starting one.

After looking at 14,826,488 lines in 90,919 files across a large collection of Pub packages, Flutter widgets, and Flutter apps, I see:

-- Start (2740 total) --
   1544 ( 56.350%): 1     ===============================
   1114 ( 40.657%): 0     ======================
     59 (  2.153%): 2     ==
      6 (  0.219%): 30    =
      4 (  0.146%): 8     =
      3 (  0.109%): 11    =
      2 (  0.073%): 32    =
      2 (  0.073%): 5     =
      2 (  0.073%): 6391  =
      1 (  0.036%): 3     =
      1 (  0.036%): 91    =
      1 (  0.036%): 37    =
      1 (  0.036%): 24    =

So there's a slight preference for 1-based, but not huge. Looking at parameter lists and type parameter lists separately:

-- Parameters start (2618 total) --
   1435 ( 54.813%): 1     ==============================
   1105 ( 42.208%): 0     =======================
     55 (  2.101%): 2     ==
      6 (  0.229%): 30    =
      4 (  0.153%): 8     =
      3 (  0.115%): 11    =
      2 (  0.076%): 32    =
      2 (  0.076%): 5     =
      2 (  0.076%): 6391  =
      1 (  0.038%): 3     =
      1 (  0.038%): 91    =
      1 (  0.038%): 37    =
      1 (  0.038%): 24    =

-- Type parameters start (122 total) --
    109 ( 89.344%): 1  ===================================================
      9 (  7.377%): 0  =====
      4 (  3.279%): 2  ==

The stark difference here suggests that there may be some outlier code defining a ton of type parameter lists with a certain style. Indeed, if we look at the number of sequences in each package:

-- Package (6089 total) --
   1344 ( 22.073%): ffigen-6.1.2
    500 (  8.212%): realm-0.4.0+beta
    440 (  7.226%): artemis_cupps-0.0.76
    308 (  5.058%): _fe_analyzer_shared-46.0.0
    277 (  4.549%): tencent_im_base-0.0.33
    250 (  4.106%): realm_dart-0.4.0+beta
    172 (  2.825%): flutter-flutter
    167 (  2.743%): invoiceninja-admin-portal
    167 (  2.743%): invoiceninja-flutter-mobile
    111 (  1.823%): statistics-1.0.23
     71 (  1.166%): dart_native-0.7.4
     59 (  0.969%): sass-1.54.5
     56 (  0.920%): fpdt-0.0.63
     53 (  0.870%): objectbox-1.6.2
     49 (  0.805%): medea_flutter_webrtc-0.8.0-dev+rev.fe4d3b9cd21a390870d5390393300371fe5f1bb2
     46 (  0.755%): linter-1.27.0

So ffigen (whose name suggests it contains a ton of generated code) heavily skews the data.

Really, what we want to know is not what each sequence prefers, but what each user prefers. If only one user prefers starting at zero and everyone else prefers starting at one, but that user authors thousands of parameter lists, that doesn't mean they get their way.

To approximate per-user preference, I treated each top-level directory as a separate "author". For each one, I looked at all of the sequences in it to see if they start at one, zero, or both:

-- By package/author (338 total) --
    305 ( 90.237%): Only one-based                 ===========================
     22 (  6.509%): Only zero-based                ==
     11 (  3.254%): Both zero-based and one-based  =

While there are many sequences that start with zero, they are heavily concentrated in a few packages like ffigen and realm. When you consider each package as a single vote for a given style, then there is a much larger number of packages that contain one-based sequences. If you look at them, each one-based package only has a fairly small number of sequences. But there are many of these packages. That suggests that most users hand-authoring type parameter and parameter sequences prefer starting them at one.
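The per-package "vote" described above might be computed along these lines. This is a hedged sketch rather than the actual script: `classify` and `tallyVotes` are hypothetical names, and the input in `main` is made-up toy data, not the scraped corpus.

```dart
// Hypothetical sketch; the toy input below is illustrative only.

/// Classifies one package by the starting numbers of its sequences.
String classify(Iterable<int> starts) {
  final hasZero = starts.contains(0);
  final hasOne = starts.contains(1);
  if (hasZero && hasOne) return 'Both zero-based and one-based';
  if (hasZero) return 'Only zero-based';
  if (hasOne) return 'Only one-based';
  return 'Neither';
}

/// Gives each package a single vote, however many sequences it contains.
Map<String, int> tallyVotes(Map<String, List<int>> startsByPackage) {
  final votes = <String, int>{};
  for (final starts in startsByPackage.values) {
    final label = classify(starts);
    votes[label] = (votes[label] ?? 0) + 1;
  }
  return votes;
}

void main() {
  // Made-up packages: one generated-code package with many zero-based
  // sequences, and three small hand-written ones.
  final toy = {
    'generated_pkg': List.filled(1000, 0),
    'app_a': [1, 1],
    'app_b': [1],
    'app_c': [0, 1],
  };
  print(tallyVotes(toy));
  // {Only zero-based: 1, Only one-based: 2, Both zero-based and one-based: 1}
}
```

Counting this way keeps a single package with thousands of generated sequences from outweighing many small hand-written ones.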

Based on that, I think we should start positional record field getters at one too.

jensjoha · 2023-01-12T07:26:30Z

> I wanted to get some actual data about whether users prefer numbered lists of things in their code to be zero-based or one-based. I did some scraping. My script looks at type parameter lists and parameters. For each one, it collects all of the identifiers that have the same name with numeric suffixes.

So what you're saying is that you're looking for "foo1" and "foo2" for instance (vs "foo0" and "foo1"?)

(My 2 cents would be that with lists starting at 0 it would add confusion that these don't.)

munificent · 2023-01-12T19:32:44Z

> So what you're saying is that you're looking for "foo1" and "foo2" for instance (vs "foo0" and "foo1"?)

Yes, exactly. It looks for parameters whose identifier is [alpha][number] and groups them by their shared prefix. Then for each of those groups with more than one entry, it looks at the lowest number in the range. The code for the script is here.
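As a rough illustration of that grouping step (not the actual script, which is linked above), something like the following would find the starting number of each sequence. The name `sequenceStarts` and the exact regular expression are assumptions made for this sketch.

```dart
// Hypothetical sketch of the grouping described above.

/// Groups names of the form [alpha][number] by their shared prefix and
/// returns the lowest number for each prefix that has more than one entry.
Map<String, int> sequenceStarts(List<String> names) {
  final pattern = RegExp(r'^([A-Za-z_]+)([0-9]+)$');
  final groups = <String, List<int>>{};
  for (final name in names) {
    final match = pattern.firstMatch(name);
    if (match == null) continue; // No numeric suffix, so not a candidate.
    groups
        .putIfAbsent(match.group(1)!, () => <int>[])
        .add(int.parse(match.group(2)!));
  }
  final starts = <String, int>{};
  groups.forEach((prefix, numbers) {
    if (numbers.length > 1) {
      starts[prefix] = numbers.reduce((a, b) => a < b ? a : b);
    }
  });
  return starts;
}

void main() {
  // Illustrative parameter names; `seed7` is alone in its group and is skipped.
  final names = ['object1', 'object2', 'object3', 'x', 'arg0', 'arg1', 'seed7'];
  print(sequenceStarts(names)); // {object: 1, arg: 0}
}
```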

> (My 2 cents would be that with lists starting at 0 it would add confusion that these don't.)

That was my intuition too, which is why the proposal initially had them start at zero. But from looking at the data, it seems pretty clear that when users number sequences of identifiers, they mostly start them at 1. See, for example, Object.hash().

munificent · 2023-01-12T22:18:24Z

> It seems to work @ https://dart-review.googlesource.com/c/sdk/+/278506

Incredible!

We spent a bunch of time discussing this in the language meeting. While it appears to be technically feasible, from looking through the tests we concluded that it's just too weird and brittle. Given that Dart already has floating point literals that don't require a leading digit, null-aware operators, and cascades, it will be very hard (but apparently not impossible!) for tools to parse it correctly. While we might be able to get our compiler to handle it, all of the various syntax highlighters, static analyzers, IDE integrations, tree-sitters, etc. might not be so lucky.

It's just a bridge too far. I don't think anyone loves the $1 syntax, but it's simple and safe. For a feature that we don't anticipate being used heavily—users should prefer destructuring—that's the right trade-off.

@jensjoha, thank you for working through an experimental implementation. I really appreciate it. In particular, the thorough tests are good for giving us a sense of what we'd be getting ourselves (and our users) into if we did this syntax.

We've decided to just stick with the current proposal and use $.
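For comparison, here is a minimal sketch of the `$`-prefixed getters next to destructuring. It assumes Dart 3.0 or later, where records shipped with positional getters that start at `$1`; the record contents are made up for illustration.

```dart
void main() {
  // A record with two positional fields and one named field.
  final record = (1, 'two', flag: true);

  // Positional getters use the $ prefix and are ordinary member accesses.
  print(record.$1); // 1
  print(record.$2); // two
  print(record.flag); // true

  // Destructuring binds every field at once, without positional getters.
  final (number, text, :flag) = record;
  print('$number $text $flag'); // 1 two true
}
```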

lrhn added feature Proposed language feature that solves one or more problems records Issues related to records. labels Dec 15, 2022

dart-lang deleted a comment from johnniwinther Dec 16, 2022

munificent mentioned this issue Jan 12, 2023

Should record fields start at $0 or $1. #2638

Closed

munificent closed this as completed Jan 12, 2023
