Skip to content

Define non-negative integer option #741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 26, 2024

Conversation

aphillips
Copy link
Member

@catamorphism identified this as a gap in e.g. #739.

This PR defines what a non-negative integer option value is for :number and :integer. It does not attempt to address the larger question raised in 738/739.

@catamorphism identified this as a gap in e.g. #739. 

This PR defines what a non-negative integer option value is for `:number` and `:integer`. It does not attempt to address the larger question raised in 738/739.
@aphillips aphillips added functions Issue pertains to the default function set LDML45 LDML45 Release (Tech Preview) labels Mar 19, 2024
spec/registry.md Outdated
A "non-negative integer" is one of:
- an implementation-defined integer numeric type where the value of the
encoded number is not less than zero
- a literal consisting only of ASCII digits U+0030 through U+0039;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saying "literal" here would mean that a variable reference resolving to a numeric string value would not be valid. This is problematic because it would introduce a new requirement for a function implementation to be able to tell apart a literal string value and a variable string value.

Suggested change
- a literal consisting only of ASCII digits U+0030 through U+0039;
- a string consisting only of ASCII digits U+0030 through U+0039;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said "literal" to allow the value to be quoted, e.g. maximumFractionDigits=|11|

I agree that my phrasing wouldn't allow a variable reference, but the item to fix to accomplish that is the first one. Change to follow...

... via a rewrite.

Adds an example. Permits upper length limit. Removes any mention of types. Permits an upper numeric amount (e.g. if you're implementation of number formatting is limited to, say, 64 digits, you can make the maximum allowed value `64`)
@aphillips aphillips requested a review from eemeli March 19, 2024 23:45
spec/registry.md Outdated
Comment on lines 630 to 631
If the value is passed as a literal
it MUST contain only the ASCII digits U+0030 through U+0039.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still problematic. Consider these two example messages:

literal: {42 :number minimumFractionDigits=2}
variable: {42 :number minimumFractionDigits=$foo}

With the current proposed language, formatting the first is valid, because there the 2 is a literal, but formatting the second with { foo: '2' } as input is invalid, because it's a variable, and we're here establishing stricter rules for variables than literals.

The problem with that is that in order to make that differentiation, one of two things must happen:

  1. The messageformat runtime implementation must parse the numeric value out of the literal before passing it to the :number implementation as an integer, or emitting an error if it doesn't match the expectations. This requires for :number to somehow communicate to the runtime its expectations on the types of its options, and for types to be introduced to the runtime. I really doubt that we want this.

  2. The :number implementation must be able to tell where an option value comes from, so that it can apply one set of rules to literal values and another to variable values. In practice, with the examples as above, it would mean that a :number implementation would need to be called with

    number(context, Literal('42'), { minimumFractionDigits: Literal('2') })
    

    for the first message, and with

    number(context, Literal('42'), { minimumFractionDigits: Variable('2') })
    

    for the second, so that Literal('2') and Variable('2') could be differentiated.

    This would be a new, additional requirement on implementations, as without this addition it is possible for a valid runtime to call a :number implementation for either message as

    number(context, '42', { minimumFractionDigits: '2' })
    

So let's not introduce language here or elsewhere that differentiates the treatment of literal vs variable values, because that would add complexity to all custom functions in all implementations. Instead, let's just allow for a variable with a value of '2' to get treated the exact same as a literal 2 in options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to impose implementation details on implementations of :number (and :integer), not on MF. There has to be a clear delineation for message authors about what literal values are accepted. I will try to clarify that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also, I didn't read "passed as a literal" to only mean in the syntax, but also as the value of a variable)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there's some confusion because this text isn't defining a syntactic property, it's defining the runtime interface with a function implemented in another language, so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11| in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected. Function implementations take parsed values (though of course this area is muddy because we avoid saying what a parsed value is.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11| in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected

Nah. That thing to the right of the equals sign can't be anything but a literal (or a variable with its $ sigil). That means it can be quoted. We very explicitly say that there is no difference between |11| and 11. As far as MF is concerned, all there is to the right of the equals is a literal. It is up to the number function to decide what to do with it.

A literal of length 4 that begins and ends with a | looks like maxiumumFractionDigits=|\|11\|| in our syntax.

Functions implementations can get typed values via variable references, e.g. maximumFractionDigits=$maxDigits where $maxDigits is some kind of int-like type. Otherwise it must parse a literal. This is "number functions" defining what literals it accepts and what it does with them.

Copy link
Collaborator

@catamorphism catamorphism Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what's confusing is that there are two different parsers:

  • The MessageFormat parser takes a MessageFormat string and produces a data model, which includes nodes representing literals in the syntax.
  • Each function implementation has its own parser that takes strings (or other implementation-defined types) and produces some implementation-defined type, in this case a number.

The first parser is responsible for stripping the '|' characters, so that the second parser doesn't need to have its own partial implementation of MessageFormat syntax. That's what I meant.

Another way to say this is that we're defining a foreign function interface and that means defining common types across the interface. "String" makes sense as a common type, since most programming languages have a string type. The name "literal" introduces confusion because it has meaning on one side of the interface that doesn't map onto anything on the other side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. The registry spec is written from the outside in: it talks about what you can have in the message syntax and uses MF's terminology for that. So the first parser is what is in play. You're correct that the number function implementation never sees the | in your example. It gets the literal string 11 and parses it. That implementation, by the way, is probably glue code that calls the ultimate formatter.

that means defining common types across the interface

I don't agree. I don't think the default registry should touch types nor does it need to. The default registry should be written in a type-neutral way, allowing local implementations to define variable types (for operands or option values) as needed/expected by users. An implementation of the number functions in the registry (as an example) could be written entirely in a language that has no number type or integer type and could work entirely in terms of strings, with only strings/literals as input. (That would be weird, but it could happen).

What registry.md defines is what the function is supposed to do; what operands are accepted (which mostly means: is there an accepted literal form?); and what options/option values are accepted (with the values defining what literal forms can be passed--if any--for implementation-defined variable values).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the default registry can be completely type-neutral; it already refers to string types. That's what I mean by "a common type": I can't see how I would rewrite this spec without assuming the implementation had a string type. So why not talk about strings here, given that the formatting spec already says "The resolution of a text or literal MUST resolve to a string"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I turned the text hear upside down and I think it works better: now we presume a decimal integer string and permit other types. See what you think.

@aphillips aphillips requested a review from eemeli March 20, 2024 17:43
@aphillips aphillips requested a review from eemeli March 20, 2024 23:18

A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer
greater than or equal to zero.
Implementations MAY define an upper limit on the resolved value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to say anything about what implementations should do about the lower bound?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lower bound is always zero.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes :) But it seems like we don't say anything about what should happen when the passed value is, in fact, negative? Should functions emit an error? Perhaps we need a sister error to the Operand Mismatch Error, but for invalid options?

Should I file a separate issue about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid Option suggests itself as a sub-species of Invalid Expression.

A negative number won't parse correctly about the same way that the string moo or the string $57% does not parse as a non-negative integer. FWIW, 1,234 does not parse according to the rules laid out here nor does 11.0

The reason for upper bounds, of course, is overflow prevention.

Co-authored-by: Stanisław Małolepszy <[email protected]>
@aphillips aphillips requested a review from stasm March 22, 2024 13:46
spec/registry.md Outdated
of a non-negative integer option consistent with that implementation's practical limits.

The implementation MAY accept any implementation-defined type as the value.
Implementations MUST accept a _literal_ as the value.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @eemeli said, this is confusing. Functions aren't passed a data model node, and a literal here could only mean that.

My suggestion is to define this similarly to how the operand of :string is defined:

The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported

and say that a (thing that can be the operand of :string) MUST be accepted if it matches non-negative in the ABNF.

Perhaps it would be helpful to have a section at the beginning of the registry spec defining the set of implementation-defined types -- that is, it has to at least include a string type, and then, we could have a name for the set of things that can be converted to strings. From there, it would be possible to give a name to the set of things that can be converted to numbers.

spec/registry.md Outdated
Comment on lines 628 to 629
A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer
greater than or equal to zero.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer
greater than or equal to zero.
A "non-negative integer" is an _option_ value that can be coerced to an integer
that is greater than equal to zero,
according to implementation-specific coercion rules.

Maybe this helps address my other comment too: the subsequent text could say that even though the coercion rules are implementation-specific, they MUST at least be defined so that a stringified integer can be coerced to an integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).

We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between 0 and maybe around 32 inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).

Right, I think I understand what you were trying to do, but "evaluates" is a bit too vague for my tastes. I understand your objection to "coerce", though.

We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between 0 and maybe around 32 inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.

Well, if we really want these to be finite enumerations, that's a lot easier :) Is there anything wrong with saying that the input MUST be either a string in {"1", "2", ..., "32"} or an integer in the interval [1, 32]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different implementations will have different max values for these options. I didn't want to put a hard definition in. We could have a note that says "expect a small number". I would be surprised, for this particular set of options, if any implementation supported a value with more than two digits, but I don't think we need to prevent it here per-se.

There is a buffer-overflow attack from our syntax. This is valid:

{$n :number maximumFractionDigits=123456789123456789123456789123456789} <- plus more digits if you want

@aphillips
Copy link
Member Author

In the 2024-03-25 call we agreed that we will commit a version of this, after I have updated it with specific edits, especially changing the name to "digit size options"

@aphillips aphillips requested a review from catamorphism March 25, 2024 21:01
@aphillips aphillips requested review from mihnita and eemeli March 25, 2024 21:01
Copy link
Collaborator

@catamorphism catamorphism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment should be taken as advisory.

spec/registry.md Outdated
In most cases, the value of a digit size option will be a string literal that
encodes the value as a decimal integer.
Implementations MAY also accept implementation-defined types as the value.
The _literal_ representation of a digit size option matches the following ABNF:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The _literal_ representation of a digit size option matches the following ABNF:
The textual representation of a digit size option matches the following ABNF:

This looks good except that using literal here still puts one foot on each side of the abstraction barrier between function implementations and MessageFormat. I think what we really want to say is that a valid option is anything that resolves to a string matching digit-size-option. But I think "textual" gets it across.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well... what we really mean is "... anything that resolves to an integer between zero and some number that doesn't exceed 99..."

I used literal here because I mean the literal production in our syntax. Where literal doesn't belong is on line 633. One more fix...

@aphillips aphillips merged commit 2fe2111 into main Mar 26, 2024
@aphillips aphillips deleted the aphillips-number-non-negative-integer-option branch March 26, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Issue pertains to the default function set LDML45 LDML45 Release (Tech Preview)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants