
Friction points when adding the first functions to the registry #422



Closed
mihnita opened this issue Jul 13, 2023 · 14 comments
Labels
functions (Issue pertains to the default function set), Future (Deferred for future standardization), resolve-candidate (This issue appears to have been answered or resolved, and may be closed soon.)

Comments

@mihnita
Collaborator

mihnita commented Jul 13, 2023

PR #420 tries to add the first few functions to the registry.

Since this is the first time we try to put something in the registry, it is a learning experience.

I've noticed some areas that would benefit from discussion: some could be used to derive more general guidelines,
some call for updates to the registry description, and some need more explanation.

So here they are, in no particular order:

Registry improvements

I found that the regexp / list-of-values mechanism is not powerful enough to describe everything we might want, especially for input.

ISO 8601 is one example: https://en.wikipedia.org/wiki/ISO_8601
And, as Addison noted, ISO 8601 and RFC 3339 are in the process of being extended to support Temporal.

One can also imagine phone numbers, emails, URLs, measurements, intervals, time durations, etc.
Even describing numbers is difficult. The current regex does not allow certain things that one would
reasonably expect, and allows certain things that one might find unusual.

So the current regexes in patterns should be seen as placeholders until we have something better.
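To make the "regex is not powerful enough" point concrete for ISO 8601, here is a minimal sketch (the pattern below is hypothetical, not taken from the registry). A regex can check the shape YYYY-MM-DD, but rejecting impossible dates such as February 30 needs either a very unwieldy regex or a real parser; the sketch uses Python's `datetime.date.fromisoformat` for the latter.

```python
import re
from datetime import date

# Hypothetical registry-style pattern for ISO 8601 calendar dates (YYYY-MM-DD).
ISO_DATE_REGEX = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def regex_accepts(value: str) -> bool:
    """Shape check only: 4 digits, '-', 2 digits, '-', 2 digits."""
    return ISO_DATE_REGEX.fullmatch(value) is not None


def parser_accepts(value: str) -> bool:
    """Shape check plus a real parse, which also rejects impossible dates."""
    if not regex_accepts(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ["2023-07-13", "2023-02-30", "2023-13-01"]:
    print(candidate, regex_accepts(candidate), parser_accepts(candidate))
```

The regex accepts all three candidates; only the first one is an actual calendar date.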

Lists of values are also not enough: several comments were about the 20+ number formatting options
(e.g. maximumFractionDigits, minimumSignificantDigits, maximumSignificantDigits), which take numeric values rather than a fixed set of keywords.

One idea: perhaps a minimal option would be a URL?

<pattern id="interval" url="https://url-to-detailed-spec"/>
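A sketch of how a tool might consume such a registry, assuming the proposed `url` attribute sits alongside the existing `regex` attribute (the `integer` pattern and the `check` helper are hypothetical). Patterns with a regex can be validated mechanically; patterns that only point to an external spec are reported as "cannot decide here" rather than silently accepted.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical registry fragment: "integer" uses the existing regex
# attribute, "interval" uses the proposed url attribute instead.
REGISTRY_XML = """
<registry>
  <pattern id="integer" regex="-?[0-9]+"/>
  <pattern id="interval" url="https://url-to-detailed-spec"/>
</registry>
"""


def load_patterns(xml_text):
    """Map each pattern id to ("regex", compiled_pattern) or ("url", spec_url)."""
    patterns = {}
    for elem in ET.fromstring(xml_text.strip()).iter("pattern"):
        if "regex" in elem.attrib:
            patterns[elem.get("id")] = ("regex", re.compile(elem.get("regex")))
        elif "url" in elem.attrib:
            patterns[elem.get("id")] = ("url", elem.get("url"))
    return patterns


def check(patterns, pattern_id, value):
    """True/False when a regex can decide; None when only an external spec exists."""
    kind, payload = patterns[pattern_id]
    if kind == "regex":
        return payload.fullmatch(value) is not None
    # Validation is delegated to whatever the document at `payload` specifies.
    return None


patterns = load_patterns(REGISTRY_XML)
print(check(patterns, "integer", "-42"))
print(check(patterns, "interval", "2023-07-13/2023-07-20"))
```

Returning a three-valued result keeps the "we don't know" case visible, which is the main cost of moving a pattern definition out of the registry.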

Another idea: use XML Schema instead of DTD.
It is a bit richer in datatypes, and can also define new types.
But it is probably still not powerful enough.

Options left out

The first batch is strongly influenced by ECMAScript Intl, intersected with what ICU can do today,
while also keeping in mind what current OS APIs can do.

Some areas of ECMAScript were left out, as they don't seem to be very portable:

  • The locale matching algorithm to use: localeMatcher
  • The format matching algorithm to use: formatMatcher
  • Whether to use 12-hour time (as opposed to 24-hour time): hour12. It looks like it overlaps with hourCycle?

Experimental options: left out

We left out all the Experimental options in Intl.NumberFormat:
signDisplay@negative, useGrouping, roundingMode, roundingPriority, roundingIncrement, trailingZeroDisplay

Functions left out

For now we ignored the functions that are ICU only or ECMAScript only.

list formatting : ECMAScript only
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/ListFormat

relative time formatting : ECMAScript only
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/RelativeTimeFormat

spellout : ICU only
duration : ICU only

ICU has relative time and list formatting, but not in MessageFormat.

Adding function wrappers to have them in MessageFormat 2 is simple.
In fact list formatting is already in the ICU4J implementation, in a unit test.
It took about 40 lines of code.
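To show the general shape of such a wrapper, here is a minimal sketch with a hypothetical in-memory function registry and a `list` function. It is not the ICU4J unit-test code: a real wrapper would delegate to a locale-aware formatter (ICU's ListFormatter or ECMAScript's Intl.ListFormat), whereas this stand-in hardcodes English. The `type` option name mirrors Intl.ListFormat's conjunction/disjunction values.

```python
from typing import Callable, Dict, List

# Hypothetical function registry: function name -> callable(operand, options) -> str.
REGISTRY: Dict[str, Callable[..., str]] = {}


def register(name: str):
    """Decorator that adds a formatting function to the registry under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("list")
def format_list(items: List[str], options: Dict[str, str]) -> str:
    """English-only stand-in for a real, locale-aware list formatter."""
    word = "or" if options.get("type") == "disjunction" else "and"
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {word} {items[1]}"
    return ", ".join(items[:-1]) + f", {word} {items[-1]}"


def call(name: str, operand, **options) -> str:
    """Look up a registered function and apply it, as a message formatter might."""
    return REGISTRY[name](operand, options)


print(call("list", ["apples", "pears", "plums"]))
print(call("list", ["tea", "coffee"], type="disjunction"))
```

Most of the effort in a real wrapper is mapping MessageFormat 2 option names and values onto the underlying library's API, not the registration itself.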

@mihnita
Collaborator Author

mihnita commented Jul 13, 2023

Do we want a "gender" function now, or leave it for later?

@mihnita
Collaborator Author

mihnita commented Jul 13, 2023

From @eemeli

Exponents, anyone? Also, is it intentional to allow for leading zeros?

In this context:

<pattern id="anyNumber" regex="-?[0-9]+(\.[0-9]+)"/>
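A quick check of the quoted pattern against a few inputs. As quoted, the group `(\.[0-9]+)` has no `?`, so the fraction is mandatory and a plain integer such as 42 does not match; this may be a transcription slip, but the sketch tests the pattern exactly as written. For comparison it includes one possible stricter alternative (an assumption, not a registry proposal): the JSON number grammar from RFC 8259, which forbids leading zeros and allows an optional fraction and exponent.

```python
import re

# The anyNumber pattern exactly as quoted above.
QUOTED = re.compile(r"-?[0-9]+(\.[0-9]+)")

# One possible stricter alternative: the JSON number grammar (RFC 8259).
# No leading zeros, optional fraction, optional exponent.
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?")

SAMPLES = ["1.5", "42", "007.5", "1e3", "-0.25", "1.5E-7"]

for s in SAMPLES:
    quoted_ok = QUOTED.fullmatch(s) is not None
    json_ok = JSON_NUMBER.fullmatch(s) is not None
    print(f"{s:>8}  quoted={quoted_ok}  json={json_ok}")
```

With these samples, the quoted pattern accepts 007.5 and rejects 42 and 1e3; the JSON grammar does the reverse for those three.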

@mihnita
Collaborator Author

mihnita commented Jul 13, 2023

From @aphillips

Don't forget offset time zones.

Context:

<!-- The time zone to use. The only value implementations must recognize
   is "UTC"; the default is the runtime's default time zone.
   Implementations may also recognize the time zone names of the IANA
   time zone database, such as "Asia/Shanghai", "Asia/Kolkata",
   "America/New_York".
-->
<option name="timeZone" pattern="timeZoneId"/>
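A sketch of what a `timeZoneId` check might look like if offset time zones were added next to "UTC" and IANA names. The `±HH:MM` offset shape is an assumption modeled on ISO 8601 / RFC 3339 offsets, and IANA names are only shape-checked here; a real implementation would consult the tz database (e.g. Python's `zoneinfo`).

```python
import re
from datetime import timedelta, timezone

# Assumed shapes: IANA-like "Area/Location" names and "+HH:MM" / "-HH:MM" offsets.
IANA_SHAPE = re.compile(r"[A-Za-z_]+(/[A-Za-z0-9_+\-]+)+")
OFFSET = re.compile(r"([+-])([01][0-9]|2[0-3]):([0-5][0-9])")


def parse_time_zone(value: str):
    """Return ("utc" | "offset" | "iana", detail), or raise ValueError."""
    if value == "UTC":
        return ("utc", timezone.utc)
    m = OFFSET.fullmatch(value)
    if m:
        sign = 1 if m.group(1) == "+" else -1
        delta = timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
        return ("offset", timezone(sign * delta))
    if IANA_SHAPE.fullmatch(value):
        # Shape only: "Asia/Kolkata" passes, but so would a made-up "Foo/Bar".
        return ("iana", value)
    raise ValueError(f"unrecognized time zone: {value!r}")


for tz in ["UTC", "Asia/Kolkata", "America/New_York", "+05:30", "-08:00"]:
    print(tz, parse_time_zone(tz))
```

Offsets are a fixed, closed syntax that a regex handles well; the IANA names are the part that really needs an external list.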

@mihnita
Collaborator Author

mihnita commented Jul 13, 2023

Bigger theme: what do we do if certain things are incomplete / controversial in ECMAScript?

@mihnita
Collaborator Author

mihnita commented Jul 13, 2023

From @macchiati

That is, I'm looking for some way we can make it harder to make the following mistake (or at least alert people to it):

match {$count}
when one {You have a book}
when * {You have {$count :number maximumFractionDigits=0} books}

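That is, when $count = 1.1, this fails with *You have 1 books: the selector sees 1.1, which is not "one" in English, but the placeholder displays it rounded to 1.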
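The failure can be reproduced with a simplified model of the English cardinal plural rule. The sketch assumes the CLDR English rule "one: i = 1 and v = 0" (i = integer digits, v = number of visible fraction digits) and half-up rounding (ECMAScript's default halfExpand); `buggy_message` and `consistent_message` are hypothetical stand-ins for the two ways of writing the message.

```python
from decimal import ROUND_HALF_UP, Decimal


def english_plural_category(numeral: str) -> str:
    """Simplified CLDR English cardinal rule: "one" iff i = 1 and v = 0.

    Operands are read from the numeral string (plain notation assumed),
    which is why the displayed formatting matters for selection.
    """
    integer_part, _, fraction_part = numeral.partition(".")
    i = abs(int(integer_part))
    v = len(fraction_part)
    return "one" if i == 1 and v == 0 else "other"


def format_max_fraction_digits(value: Decimal, digits: int) -> str:
    """Round to at most `digits` fraction digits, dropping trailing zeros."""
    quantum = Decimal(1).scaleb(-digits)
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP).normalize()
    return format(rounded, "f")


def buggy_message(count: Decimal) -> str:
    # Selects on the raw value but displays a rounded value.
    if english_plural_category(str(count)) == "one":
        return "You have a book"
    return f"You have {format_max_fraction_digits(count, 0)} books"


def consistent_message(count: Decimal) -> str:
    # Formats once, then selects on the same formatted numeral.
    shown = format_max_fraction_digits(count, 0)
    if english_plural_category(shown) == "one":
        return "You have a book"
    return f"You have {shown} books"


print(buggy_message(Decimal("1.1")))
print(consistent_message(Decimal("1.1")))
```

The buggy variant prints "You have 1 books"; selecting on the same formatted value avoids the mismatch.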

To discuss: where can we put this kind of information?

I don't think the registry is flexible enough to handle this.

Ideas:

  1. allow an "external spec" for functions, with registry.xml pointing to it (this needs a small registry spec change)
  2. in the registry.md
  3. make the registry more flexible (I think it is unrealistic, but TBD)
  4. ???

@macchiati
Member

Basically, for plurals, whenever $count is formatted differently in one place than in another, it is likely a mistake. And that includes the implicit "formatting" involved in a match. The following, for example, will likely fail in some language:

FAIL
match {$count}
when one {You have a book}
when many {You have {$count :number maximumFractionDigits=1} books}
when * {You have {$count :number maximumFractionDigits=0} books}

And this:

FAIL
let $count = {$count :number maximumFractionDigits=1}
match {$count}
when one {You have a book}
when * {You have {$count :number maximumFractionDigits=0} books}

And so on. This, on the other hand, is OK, because whatever the default formatting of $count is, it is used for both.

OK
match {$count}
when one {You have a book}
when * {You have {$count} books}

Also OK:

OK
match {$count :number maximumFractionDigits=1}
when one {You have a book}
when * {You have {$count :number maximumFractionDigits=1} books}

OK
let $count = {$count :number maximumFractionDigits=1}
match {$count}
when one {You have a book}
when * {You have {$count} books}

So we should recommend that a "message formatter linter" identify and flag the bad cases, since they can be identified fairly easily. It can't stop all bad cases, but the perfect is the enemy of the good.
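A minimal sketch of the linter check described above. Parsing MF2 syntax is out of scope, so each message is hand-encoded as data; `lint_plural_message` and its parameters are hypothetical. The rule: compute the options $count is selected with and the options each `{$count ...}` placeholder is displayed with, and flag any difference. Merging a placeholder's own options on top of the `let` binding is an assumption about how annotating a `let`-bound variable behaves.

```python
from typing import Dict, List, Tuple

Options = Dict[str, str]


def lint_plural_message(
    let_options: Options,
    match_options: Options,
    variants: List[Tuple[str, List[Options]]],
) -> List[str]:
    """Flag {$count} placeholders formatted differently than the selector.

    let_options:   options from `let $count = {$count :number ...}` ({} if none)
    match_options: options written inside `match {$count :number ...}` ({} if none)
    variants:      (key, [options of each {$count ...} placeholder]) pairs
    """
    # Assumption: annotations are merged on top of the let-bound options.
    selected_as = {**let_options, **match_options}
    problems = []
    for key, placeholders in variants:
        for own in placeholders:
            shown_as = {**let_options, **own}
            if shown_as != selected_as:
                problems.append(
                    f"variant {key!r}: $count is selected with {selected_as} "
                    f"but formatted with {shown_as}"
                )
    return problems


# First FAIL case: match {$count}, then {$count :number maximumFractionDigits=0}.
fail = lint_plural_message(
    let_options={},
    match_options={},
    variants=[("one", []), ("*", [{"maximumFractionDigits": "0"}])],
)
# Last OK case: let-bound with maximumFractionDigits=1, then plain {$count}.
ok = lint_plural_message(
    let_options={"maximumFractionDigits": "1"},
    match_options={},
    variants=[("one", []), ("*", [{}])],
)
print(fail)
print(ok)
```

The sketch only compares option sets; a fuller linter would also compare the function name (`:number` vs. none) and resolve default option values before comparing.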

@macchiati
Member

On gender: ICU just uses the string select for personal gender. That has some disadvantages.

Because it is not a closed set, software can't recognize when there are faulty values (e.g. simple typos like "fernale"). Translation software also can't "flesh out" a message by adding alternatives that don't exist in the original source. Suppose the original message is in French, for example, with two genders. The translation software can't recognize that it should be expanded for languages that have more than two commonly-used personal genders (e.g. he/she/they in English, han/hon/hen in Swedish, ...), or contracted for languages that only have a single personal gender (e.g. Turkish).

So I recommend having a :gender selector.
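A sketch of what a closed value set buys, compared with a free-form string select. The value set (feminine, masculine, neuter) is an assumption borrowed from the values @mihnita suggests later in this thread, and both helper functions are hypothetical tooling rather than a proposed API. With a closed set, a typo like "fernale" is detectable, and a tool can list which values a translation may need to add for a target language.

```python
from enum import Enum
from typing import Iterable, List


class PersonalGender(Enum):
    # Assumed closed value set; more values could be added later.
    FEMININE = "feminine"
    MASCULINE = "masculine"
    NEUTER = "neuter"


def check_gender_keys(variant_keys: Iterable[str]) -> List[str]:
    """Return variant keys that are neither known gender values nor '*'."""
    valid = {g.value for g in PersonalGender} | {"*"}
    return [k for k in variant_keys if k not in valid]


def missing_gender_keys(variant_keys: Iterable[str], target_genders: Iterable[str]) -> List[str]:
    """Values a translation tool might need to add for a target language."""
    return sorted(set(target_genders) - set(variant_keys))


print(check_gender_keys(["feminine", "fernale", "*"]))
# Hypothetical target language that distinguishes all three values.
print(missing_gender_keys(["feminine", "masculine", "*"], ["feminine", "masculine", "neuter"]))
```

With a plain string select, neither check is possible, because any string is a syntactically valid key.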

@mihnita
Collaborator Author

mihnita commented Jul 24, 2023

From @eemeli

<option name="style" values="decimal currency percent unit" default="decimal"/>

To consider: If we were to leave out currency and unit formatting from the default, then we wouldn't need to say how compound units need to work.
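A sketch of how a tool could read an `<option>` element like the one above and validate a supplied value, including what happens if currency and unit were dropped from the default set. The `resolve_option` helper and the trimmed element are hypothetical; only the untrimmed `<option>` element comes from the PR.

```python
import xml.etree.ElementTree as ET

OPTION_XML = '<option name="style" values="decimal currency percent unit" default="decimal"/>'
# Hypothetical variant with currency and unit left out of the default set.
TRIMMED_OPTION_XML = '<option name="style" values="decimal percent" default="decimal"/>'


def load_option(xml_text: str):
    """Return (name, allowed values, default) from an <option> element."""
    elem = ET.fromstring(xml_text)
    return elem.get("name"), elem.get("values").split(), elem.get("default")


def resolve_option(xml_text: str, supplied=None) -> str:
    """Return the supplied value if allowed, or the default if none is given."""
    name, allowed, default = load_option(xml_text)
    if supplied is None:
        return default
    if supplied not in allowed:
        raise ValueError(f"{name}={supplied!r} is not one of {allowed}")
    return supplied


print(resolve_option(OPTION_XML))
print(resolve_option(OPTION_XML, "currency"))
try:
    resolve_option(TRIMMED_OPTION_XML, "currency")
except ValueError as err:
    print("rejected:", err)
```

With the trimmed list, a message using `style=currency` would be rejected by a strict validator instead of needing rules for currencies and compound units.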

@aphillips aphillips added the functions label (Issue pertains to the default function set) Jul 29, 2023
@aphillips
Member

@mihnita I think this issue was super-useful in documenting what you ran into in creating #420.

What are the next steps for this issue? Can we close this in favor of specific tasks?

Regarding :gender, my reading is that such a selector is not yet ready for required implementation by every MF2 implementation. Does CLDR have data for us to point to that is stable and mature enough to bring into 2.0?

@mihnita
Copy link
Collaborator Author

mihnita commented Dec 15, 2023

I think that describing / validating values with regexp is not powerful enough.

Should I open an issue (so that we can close this one)?


For gender there is this section in TR 35:
https://unicode.org/reports/tr35/tr35-general.html#Gender

These values are already used by some formatters, and I think by number spellout.

Android also added some minimal gender support, for the gender of the user only (so no worries about inanimate nouns)
(https://developer.android.com/about/versions/14/features/grammatical-inflection)

iOS has something too (https://developer.apple.com/documentation/foundation/nsgrammaticalgender)

So I think it might be good to have some minimal level of support: the name of the selector, and a list of values.
TBD what we list as values, but I think that feminine, masculine, and neuter are pretty non-controversial.

And we can add more later.

But at least we would know that people don't use feminine, female, fem, F, and who knows what else.

Should I open an issue for gender and we can close this one?

@aphillips
Member

aphillips commented Dec 16, 2023

Thanks @mihnita !

I don't think the question is whether there exists some grammatical gender support, but whether we should require every implementation of MF2 to provide such a feature in 2.0. I don't think that makes sense, since the APIs and CLDR data available are still somewhat primitive. ICU4J could provide such an API, if you think it's ready and we could incubate future support. Does that make sense?

PS> Yes, if you want to discuss further, please do open an issue!

@mihnita
Collaborator Author

mihnita commented Dec 18, 2023

I don't think it is high priority for the release.
And since this is about registry, which is designed to add new things without changing the spec, it is not a problem to add later.
Will still be 2.0 :-)
So if the goal is to close issues, or tag what we have as "release-blocker" or similar, then this is definitely not a release blocker.

Even if I open an issue, I would not consider that critical for the release (because it is not spec, it is registry).
Do we have a way to open issues "so that we don't forget this", but explicitly not release blocker?
Should we invent such a tag? Or tag the opposites, with the existing blocker tag.

Way back when, I created blocker-candidate and blocker.
The idea was that people should tag issues that they think must be in the first release as blocker-candidate, and if the group looks at them and they agree, they become blocker
But they didn't see much usage.

@aphillips
Member

> So if the goal is to close issues, or tag what we have as "release-blocker" or similar, then this is definitely not a release blocker.

I slightly disagree. Yes, the registry is separate from the spec on some level. But it is designed to be normative. There could be non-normative extensions that are recommended.

> Do we have a way to open issues "so that we don't forget this", but explicitly not release blocker?
> Should we invent such a tag? Or tag the opposites, with the existing blocker tag.

Yes: I created the Future tag for items that do not get blocker-candidate/blocker/LDML45 tags.

@aphillips aphillips added the resolve-candidate (This issue appears to have been answered or resolved, and may be closed soon.) and Future (Deferred for future standardization) labels Jan 14, 2024
@aphillips
Member

Per discussion on 2024-02-05, closing.

3 participants