
Selectors docs regarding use of NUMBER are incorrect (and a bad idea) #155


Open
spookylukey opened this issue Jul 19, 2018 · 6 comments

@spookylukey
Contributor

Here:

https://projectfluent.org/fluent/guide/selectors.html

If the translation requires a number to be formatted in a particular non-default manner, the selector should use the same formatting options. The formatted number will then be used to choose the correct CLDR plural category which, for some languages, might be different than the category of the unformatted number:

your-score =
   { NUMBER($score, minimumFractionDigits: 1) ->
       [0.0]   You scored zero points. What happened?
      *[other] You scored { NUMBER($score, minimumFractionDigits: 1) } points.
   }

The text claims that the formatted version of $score will be used to do the matching. In reality this doesn't happen in the reference implementation - see:

The 0.0 is parsed as a number literal, converted to a FluentNumber, and then compared by value to the FluentNumber instance that results from the first NUMBER call. I added a test to the suite to confirm this, and it makes no difference if you change 0.0 to 0 or 0.00, or if you increase/decrease the value of minimumFractionDigits etc - it is doing a numerical comparison, not taking into account the formatting options.
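The value-based comparison described above can be modelled in a few lines. This is only a sketch: `makeNumber` and `matchesKey` are hypothetical names invented for illustration and are not taken from the fluent.js source.

```javascript
// Hypothetical stand-in for a FluentNumber: a numeric value plus formatting options.
function makeNumber(value, opts = {}) {
  return { value, opts };
}

// Models value-based matching: the key literal is parsed as a number and
// compared numerically. The formatting options are never consulted.
function matchesKey(keyLiteral, selector) {
  return Number(keyLiteral) === selector.value;
}

const score = makeNumber(0, { minimumFractionDigits: 1 });
const keyResults = ["0", "0.0", "0.00"].map((key) => matchesKey(key, score));
console.log(keyResults); // every key matches, whatever the options say
```

Under this model the three spellings of the key are indistinguishable, which matches the behaviour observed in the test.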

Further, the behaviour described in the docs would be a bad idea, AFAICS:

  1. Number formatting can be quite expensive, so we want to avoid it for selector matching.

  2. If the formatted number should appear in the variant key expression, translators would presumably need to change it. In the example, 0.0 considered as a string matches the formatted NUMBER expression for English, but it would have to be changed to 0,0 for German, etc. I suspect translators would be confused about whether they are expected to change these numbers.

  3. If the formatted number should appear in the variant key expression, parsing and interpreting the FTL becomes confusing in several ways. For English locales, the formatted number looks like the number literal for small numbers (1, 2.3 etc.), but not for large numbers (e.g. 1,000, which is not a valid number literal). For other locales, the formatted number will sometimes not be a valid number literal either, and sometimes it will look like a number literal for a different number (e.g. 1.000 is German for one thousand, but would be parsed as a valid FTL number literal for one).
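Points 2 and 3 can be reproduced with the standard `Intl.NumberFormat` API. The sketch below assumes a JavaScript runtime with full locale data (for example a recent Node.js build); the variable names are illustrative only.

```javascript
// Formatted output differs per locale, so a formatted key would need translating.
const germanOneDigit = new Intl.NumberFormat("de", { minimumFractionDigits: 1 });
const germanZero = germanOneDigit.format(0);
console.log(germanZero); // "0,0"

// Large numbers pick up grouping separators that are not valid in a number literal.
const englishThousand = new Intl.NumberFormat("en-US").format(1000);
console.log(englishThousand); // "1,000"
console.log(Number(englishThousand)); // NaN

// German output for one thousand reads as the literal for one.
const germanThousand = new Intl.NumberFormat("de").format(1000);
console.log(germanThousand); // "1.000"
console.log(Number(germanThousand)); // 1
```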

@stasm
Contributor

stasm commented Jul 19, 2018

Thanks. This looks like two issues:

  1. The wording in the Guide is misleading. You give valid reasons for why it is so.
  2. The fluent.js implementation doesn't currently look at the decimal part of numbers.

Let's use this issue to refine the wording in the Guide. I'll then file a bug to add the desired behavior to fluent.js.

The motivation here is to be able to tell 1 and 1.0 apart. Some languages use a different plural form for these. I think English is one of them, actually: 1 apple but 1.0 apples.

The spec should mention that the decimal part of the number is significant, even if it's all zeros. We shouldn't format the number however, as that might produce non-digits, too, and use different separators. The implementation might actually need to build a custom representation of the number just for this comparison. Does this sound like a reasonable thing to do?
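One possible shape for such a custom representation is a locale-independent digit string that keeps trailing zeros. This is a sketch under stated assumptions, not a proposal for the actual implementation: `comparisonDigits` is a hypothetical helper, it only handles finite values that `String()` prints without an exponent, and it ignores `maximumFractionDigits`.

```javascript
// Builds an ASCII digit string with "." as the separator, padding the fraction
// with zeros up to minimumFractionDigits. No locale data is involved.
function comparisonDigits(value, minimumFractionDigits = 0) {
  const [intPart, fracPart = ""] = String(value).split(".");
  const padded = fracPart.padEnd(minimumFractionDigits, "0");
  return padded === "" ? intPart : `${intPart}.${padded}`;
}

console.log(comparisonDigits(1));    // "1"
console.log(comparisonDigits(1, 1)); // "1.0"
console.log(comparisonDigits(2.5, 1)); // "2.5"
```

With a representation like this, a key written as 1.0 could be compared against the selector as text, so 1 and 1.0 would no longer be conflated.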

@spookylukey
Contributor Author

@stasm - In terms of fixing fluent.js, it seems workable, although I'm wondering where the data for this would come from. Does CLDR even distinguish this case? AFAICS it doesn't. Also, the algorithm for determining a match already seems quite complex (and is not clearly specified anywhere I can find, other than in the JS implementation), and this could make it even more complex.

@spookylukey
Contributor Author

spookylukey commented Jul 21, 2018

Actually, thinking some more, I'm confused about the need for this feature or how it should work.

In English, your use case would be covered by:

apples-kilos = There are { NUMBER($qty, minimumFractionDigits: 1) } kilos of apples

i.e. you use the plural form 'kilos' always, because 1 becomes 1.0 which takes the plural. I don't see the need for some special matching functionality for English - but it does require the translator to be aware of this issue.

For other languages I don't know how it would work.

@zbraniecki
Collaborator

> Does CLDR even distinguish this case? AFAICS it doesn't.

It does. You can read more about it in https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules

In particular, a lot about operands and CLDR plural rules syntax will help you recognize how much effort CLDR puts into that.

And ECMA402 Intl.PluralRules correctly handles formatted numbers, so minimumFractionDigits will impact your result:

(new Intl.PluralRules("en", {minimumFractionDigits: 1}).select(1))
"other"
(new Intl.PluralRules("en", {minimumFractionDigits: 0}).select(1))
"one" 
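The plural operands mentioned above explain why 1 and 1.0 land in different categories. The sketch below derives a subset of the operands (n, i, v, w, f, t) from a decimal string and applies the English cardinal rule for "one" (i = 1 and v = 0). `pluralOperands` and `englishCategory` are hypothetical names, and the input is assumed to be a plain decimal string with ASCII digits and no exponent.

```javascript
// Derives CLDR-style plural operands from the visible digits of a decimal string.
function pluralOperands(source) {
  const [intPart, fracPart = ""] = source.replace(/^-/, "").split(".");
  const trimmed = fracPart.replace(/0+$/, "");
  return {
    n: Math.abs(Number(source)),               // absolute value
    i: Number(intPart),                        // integer digits
    v: fracPart.length,                        // visible fraction digits, with trailing zeros
    w: trimmed.length,                         // visible fraction digits, without trailing zeros
    f: fracPart === "" ? 0 : Number(fracPart), // fraction digits as an integer, with trailing zeros
    t: trimmed === "" ? 0 : Number(trimmed),   // fraction digits as an integer, without trailing zeros
  };
}

// English cardinal rule: "one" when i = 1 and v = 0, otherwise "other".
function englishCategory(source) {
  const { i, v } = pluralOperands(source);
  return i === 1 && v === 0 ? "one" : "other";
}

console.log(englishCategory("1"));   // "one"
console.log(englishCategory("1.0")); // "other"
```

The number of visible fraction digits (v) is the only thing that differs between the two inputs, which is why the formatting options have to reach the plural rules.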

@spookylukey
Contributor Author

@zbraniecki - thanks so much for that, it looks like I was misunderstanding based on what I was reading from Python Babel's implementation. In fact on further investigation Babel does also seem to support this correctly if you use Decimal rather than float.

@alabamenhu
Contributor

alabamenhu commented Apr 17, 2019

Don't forget that applying NUMBER() to even a small integer value may produce digits that are not valid in a number literal. Consider an Arabic user for whom NUMBER(123) may produce ١٢٣ (and even an English speaker can override their preferred number forms).
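This is easy to reproduce with `Intl.NumberFormat`. In the sketch below the Arabic-Indic numbering system is requested explicitly through the `-u-nu-arab` locale extension, since the default digits for a given Arabic locale depend on the runtime's locale data.

```javascript
// Request Arabic-Indic digits explicitly rather than relying on locale defaults.
const arabicFormat = new Intl.NumberFormat("ar-u-nu-arab", { useGrouping: false });
const arabicDigits = arabicFormat.format(123);

console.log(arabicDigits);         // ١٢٣
console.log(Number(arabicDigits)); // NaN: these are not ASCII digits
```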

That said, using the plural system already handles the 1 vs 1.0 case in English:

     [0] You have no messages.
     [1] You have one message
*[other] You have { NUMBER($number) } messages.

Because other is the plural category of 1.0. I can't think of a use case, though, for something like

     [0] You have no water.
     [1] You have a cup of water
     [1.0] You have exactly one cup of water
*[other] You have { NUMBER($number) } cups of water.

Presumably, if the number is formatted with a single fractional digit, the [1] variant is pointless: it can never be reached unless the developer passes a string that can be coerced into a number (but in that case, should 1. be considered separately?).
