Add syntax for grapheme clusters literals. #1432

Open
Cat-sushi opened this issue Feb 3, 2021 · 11 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@Cat-sushi

Cat-sushi commented Feb 3, 2021

Currently, grapheme clusters (Characters) are the only way to manipulate natural-language text correctly.
So, I propose a syntax for grapheme cluster literals, such as g"𠮷野".
It might also include a proposal that the characters extension become part of dart:core.

@Cat-sushi Cat-sushi added the feature Proposed language feature that solves one or more problems label Feb 3, 2021
@Cat-sushi
Author

This proposal is derived from the closed proposal #1428.
g"𠮷野".length would return 2 (grapheme clusters), not 3 (code units).
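
To make the two counts above concrete, here is a minimal sketch using only dart:core. The helper names `codeUnitCount` and `codePointCount` are made up for illustration. The grapheme cluster count itself would come from `'𠮷野'.characters.length` in package:characters, which is not imported here so that the snippet runs without a pubspec dependency.

```dart
// '𠮷' is U+20BB7, outside the Basic Multilingual Plane, so it needs
// two UTF-16 code units (a surrogate pair). '野' is U+91CE and needs one.

/// Number of UTF-16 code units, which is what String.length reports.
int codeUnitCount(String s) => s.length;

/// Number of Unicode code points, via the Runes view.
int codePointCount(String s) => s.runes.length;

void main() {
  const text = '𠮷野';
  print(codeUnitCount(text)); // 3
  print(codePointCount(text)); // 2
}
```

For this particular string the code point count happens to equal the grapheme cluster count. The two diverge once combining marks or emoji sequences are involved, which is the case the Characters class is meant to handle.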

@Cat-sushi Cat-sushi changed the title from "Add syntax for grapheme clusters constants." to "Add syntax for grapheme clusters literals." Feb 3, 2021
@Cat-sushi
Author

I think it should be a constant, but I'm not sure that is a good idea.
So, I changed the title.

@Cat-sushi
Author

The naming scheme for the prefix must be coordinated with #886 and any other related proposals.

@AKushWarrior

I don't know that the g"str" syntax is necessarily in line with Dart style conventions to this point, though there is precedent in Rust's byte string literal syntax b"str". I might prefer to simply be able to access "words".characters or "words".clusters; that's pretty much how it's handled now with codeunits and runes.

I agree that the characters package should be included as a core package; it provides a fundamental functionality, and it's a lot easier to import "dart:characters" than go to pubspec.yaml, include characters, come back to my file, import the package, and remember why I needed it in the first place.

@Cat-sushi
Author

Cat-sushi commented Feb 4, 2021

I might prefer to simply be able to access "words".characters or "words".clusters; that's pretty much how it's handled now with codeunits and runes

A core team member has proposed a similar syntax for a single code point constant (but not for a sequence of code points).
See #886, which mentions the need for a literal.
"words".characters already exists; it returns an Iterable view of the String.
A sequence of code units is the default representation of a String, and String natively provides a code-unit-based API.
On the other hand, String.codeUnits produces a List<int> in which every 16-bit code unit is represented as a 64-bit int, which serves quite a different purpose from that of Characters.

As you said, grapheme clusters are fundamental, so I think they deserve a literal.
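
As a rough illustration of the code-unit views discussed in this comment, the sketch below prints the UTF-16 code units and the code points of the same string using only dart:core. The helper names `hexCodeUnits` and `hexCodePoints` are hypothetical and exist only for this example.

```dart
/// Hex form of each UTF-16 code unit, read from the List<int> view.
List<String> hexCodeUnits(String s) =>
    s.codeUnits.map((unit) => unit.toRadixString(16)).toList();

/// Hex form of each Unicode code point, read from the Runes view.
List<String> hexCodePoints(String s) =>
    s.runes.map((rune) => rune.toRadixString(16)).toList();

void main() {
  const text = '𠮷野';
  // String itself is indexed by code unit.
  print(text.codeUnitAt(0).toRadixString(16)); // d842
  print(hexCodeUnits(text)); // [d842, dfb7, 91ce]
  print(hexCodePoints(text)); // [20bb7, 91ce]
}
```

Both views hand back plain integers, so neither carries any notion of user-perceived characters.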

@Cat-sushi
Author

Characters cs = '𠮷野'; // lint : omit_local_variable_types

can be rewritten as

var cs = g'𠮷野';

@lrhn
Member

lrhn commented Feb 12, 2021

If we move Characters into the platform libraries, then adding a literal for creating (effectively) const Characters(stringLiteral) seems reasonable.

I'm also sure that some will argue that Characters should be the default string literal, and you'd have to write u16"...." to get the current string. (Then u8"...." could be UTF-8 encoded).
That's a tough sell, though.

@Cat-sushi
Author

Cat-sushi commented Feb 12, 2021

@lrhn

I'm also sure that some will argue that Characters should be the default string literal, and you'd have to write u16"...." to get the current string. (Then u8"...." could be UTF-8 encoded).
That's a tough sell, though.

I know.
I'm not asking for that much.

@dnfield

dnfield commented Feb 18, 2021

It might be nice to have a lint discouraging people from using String.length too. It's almost never what they really want.

@lrhn
Member

lrhn commented Feb 19, 2021

I can assure you, as someone who's written quite a lot of small parsers, that String.length is exactly what I want when I traverse the code units of a string. Parsing JSON, or integer literals, or URLs, or XML, or any other structured textual input which is commonly stored as a String, is quite different from handling user-written text. The Dart String class contains both. The API just happens to be better suited for the former.

A Dart String is a sequence of code units. Any abstraction on top of that is a separate class (Runes, Characters). You can, and should, choose the abstraction you need, but sometimes "sequence of code units" is the abstraction level you need.

A String is not only for text - words and phrases intended to be displayed as such. It supports that as well.
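
A small sketch of the kind of code-unit traversal described above, parsing an unsigned decimal integer literal. The function name `parseUnsignedDecimal` is hypothetical, and the sketch deliberately ignores overflow, signs, and whitespace.

```dart
/// Parses an unsigned decimal integer by walking UTF-16 code units.
/// Returns null if the input is empty or contains a non-digit.
/// Overflow is not checked in this sketch.
int? parseUnsignedDecimal(String source) {
  const zero = 0x30; // code unit for '0'
  const nine = 0x39; // code unit for '9'
  if (source.isEmpty) return null;
  var value = 0;
  // String.length is the number of code units, which is exactly the
  // loop bound needed for codeUnitAt.
  for (var i = 0; i < source.length; i++) {
    final unit = source.codeUnitAt(i);
    if (unit < zero || unit > nine) return null;
    value = value * 10 + (unit - zero);
  }
  return value;
}

void main() {
  print(parseUnsignedDecimal('1432')); // 1432
  print(parseUnsignedDecimal('14a2')); // null
}
```

Here the input is structured data rather than user-written text, so indexing by code unit is the natural fit and grapheme clusters never come into play.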

@Cat-sushi
Author

@dnfield @lrhn

String.length is exactly what I want when I traverse the code units of a string

Yes.
The problem is that String.length is too exposed to average programmers.
So, deprecating String.length and introducing String.size might be a solution.
But that was the discussion in #1428.

This proposal is only about the literal and inclusion in dart:core.
