Define non-negative integer option #741

aphillips · 2024-03-19T19:57:17Z

@catamorphism identified this as a gap in e.g. #739.

This PR defines what a non-negative integer option value is for :number and :integer. It does not attempt to address the larger question raised in 738/739.

@catamorphism

@catamorphism identified this as a gap in e.g. #739. This PR defines what a non-negative integer option value is for `:number` and `:integer`. It does not attempt to address the larger question raised in 738/739.

eemeli · 2024-03-19T21:28:49Z

spec/registry.md

+A "non-negative integer" is one of:
+- an implementation-defined integer numeric type where the value of the 
+  encoded number is not less than zero
+- a literal consisting only of ASCII digits U+0030 through U+0039;


Saying "literal" here would mean that a variable reference resolving to a numeric string value would not be valid. This is problematic because it would introduce a new requirement for a function implementation to be able to tell apart a literal string value and a variable string value.

Suggested change

- a literal consisting only of ASCII digits U+0030 through U+0039;

- a string consisting only of ASCII digits U+0030 through U+0039;

I said "literal" to allow the value to be quoted, e.g. maximumFractionDigits=|11|

I agree that my phrasing wouldn't allow a variable reference, but the item to fix to accomplish that is the first one. Change to follow...

... via a rewrite. Adds an example. Permits upper length limit. Removes any mention of types. Permits an upper numeric amount (e.g. if you're implementation of number formatting is limited to, say, 64 digits, you can make the maximum allowed value `64`)

spec/registry.md

eemeli · 2024-03-20T08:03:23Z

spec/registry.md

+If the value is passed as a literal
+it MUST contain only the ASCII digits U+0030 through U+0039.


This is still problematic. Consider these two example messages:

literal: {42 :number minimumFractionDigits=2}

variable: {42 :number minimumFractionDigits=$foo}

With the current proposed language, formatting the first is valid, because there the 2 is a literal, but formatting the second with { foo: '2' } as input is invalid, because it's a variable, and we're here establishing stricter rules for variables than literals.

The problem with that is that in order to make that differentiation, one of two things must happen:

The messageformat runtime implementation must parse the numeric value out of the literal before passing it to the :number implementation as an integer, or emitting an error if it doesn't match the expectations. This requires for :number to somehow communicate to the runtime its expectations on the types of its options, and for types to be introduced to the runtime. I really doubt that we want this.

The :number implementation must be able to tell where an option value comes from, so that it can apply one set of rules to literal values and another to variable values. In practice, with the examples as above, it would mean that a :number implementation would need to be called with

number(context, Literal('42'), { minimumFractionDigits: Literal('2') })

for the first message, and with

number(context, Literal('42'), { minimumFractionDigits: Variable('2') })

for the second, so that Literal('2') and Variable('2') could be differentiated.

This would be a new, additional requirement on implementations, as without this addition it is possible for a valid runtime to call a :number implementation for either message as

number(context, '42', { minimumFractionDigits: '2' })

So let's not introduce language here or elsewhere that differentiates the treatment of literal vs variable values, because that would add complexity to all custom functions in all implementations. Instead, let's just allow for a variable with a value of '2' to get treated the exact same as a literal 2 in options.

The idea here is to impose implementation details on implementations of :number (and :integer), not on MF. There has to be a clear delineation for message authors about what literal values are accepted. I will try to clarify that.

(also, I didn't read "passed as a literal" to only mean in the syntax, but also as the value of a variable)

I still think there's some confusion because this text isn't defining a syntactic property, it's defining the runtime interface with a function implemented in another language, so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11| in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected. Function implementations take parsed values (though of course this area is muddy because we avoid saying what a parsed value is.)

so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11| in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected

Nah. That thing to the right of the equals sign can't be anything but a literal (or a variable with its $ sigil). That means it can be quoted. We very explicitly say that there is no difference between |11| and 11. As far as MF is concerned, all there is to the right of the equals is a literal. It is up to the number function to decide what to do with it.

A literal of length 4 that begins and ends with a | looks like maxiumumFractionDigits=|\|11\|| in our syntax.

Functions implementations can get typed values via variable references, e.g. maximumFractionDigits=$maxDigits where $maxDigits is some kind of int-like type. Otherwise it must parse a literal. This is "number functions" defining what literals it accepts and what it does with them.

Maybe what's confusing is that there are two different parsers:

The MessageFormat parser takes a MessageFormat string and produces a data model, which includes nodes representing literals in the syntax.

Each function implementation has its own parser that takes strings (or other implementation-defined types) and produces some implementation-defined type, in this case a number.

The first parser is responsible for stripping the '|' characters, so that the second parser doesn't need to have its own partial implementation of MessageFormat syntax. That's what I meant.

Another way to say this is that we're defining a foreign function interface and that means defining common types across the interface. "String" makes sense as a common type, since most programming languages have a string type. The name "literal" introduces confusion because it has meaning on one side of the interface that doesn't map onto anything on the other side.

I see what you mean. The registry spec is written from the outside in: it talks about what you can have in the message syntax and uses MF's terminology for that. So the first parser is what is in play. You're correct that the number function implementation never sees the | in your example. It gets the literal string 11 and parses it. That implementation, by the way, is probably glue code that calls the ultimate formatter.

that means defining common types across the interface

I don't agree. I don't think the default registry should touch types nor does it need to. The default registry should be written in a type-neutral way, allowing local implementations to define variable types (for operands or option values) as needed/expected by users. An implementation of the number functions in the registry (as an example) could be written entirely in a language that has no number type or integer type and could work entirely in terms of strings, with only strings/literals as input. (That would be weird, but it could happen).

What registry.md defines is what the function is supposed to do; what operands are accepted (which mostly means: is there an accepted literal form?); and what options/option values are accepted (with the values defining what literal forms can be passed--if any--for implementation-defined variable values).

I don't think the default registry can be completely type-neutral; it already refers to string types. That's what I mean by "a common type": I can't see how I would rewrite this spec without assuming the implementation had a string type. So why not talk about strings here, given that the formatting spec already says "The resolution of a text or literal MUST resolve to a string"?

I turned the text hear upside down and I think it works better: now we presume a decimal integer string and permit other types. See what you think.

spec/registry.md

Co-authored-by: Eemeli Aro <[email protected]>

spec/registry.md

stasm · 2024-03-22T12:29:17Z

spec/registry.md

+
+A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer 
+greater than or equal to zero.
+Implementations MAY define an upper limit on the resolved value 


Do we want to say anything about what implementations should do about the lower bound?

The lower bound is always zero.

Well, yes :) But it seems like we don't say anything about what should happen when the passed value is, in fact, negative? Should functions emit an error? Perhaps we need a sister error to the Operand Mismatch Error, but for invalid options?

Should I file a separate issue about this?

Invalid Option suggests itself as a sub-species of Invalid Expression.

A negative number won't parse correctly about the same way that the string moo or the string $57% does not parse as a non-negative integer. FWIW, 1,234 does not parse according to the rules laid out here nor does 11.0

The reason for upper bounds, of course, is overflow prevention.

Co-authored-by: Stanisław Małolepszy <[email protected]>

catamorphism · 2024-03-22T20:11:01Z

spec/registry.md

+of a non-negative integer option consistent with that implementation's practical limits.
+
+The implementation MAY accept any implementation-defined type as the value.
+Implementations MUST accept a _literal_ as the value.


As @eemeli said, this is confusing. Functions aren't passed a data model node, and a literal here could only mean that.

My suggestion is to define this similarly to how the operand of :string is defined:

The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported

and say that a (thing that can be the operand of :string) MUST be accepted if it matches non-negative in the ABNF.

Perhaps it would be helpful to have a section at the beginning of the registry spec defining the set of implementation-defined types -- that is, it has to at least include a string type, and then, we could have a name for the set of things that can be converted to strings. From there, it would be possible to give a name to the set of things that can be converted to numbers.

catamorphism · 2024-03-22T20:13:55Z

spec/registry.md

+A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer 
+greater than or equal to zero.


Suggested change

A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer

greater than or equal to zero.

A "non-negative integer" is an _option_ value that can be coerced to an integer

that is greater than equal to zero,

according to implementation-specific coercion rules.

Maybe this helps address my other comment too: the subsequent text could say that even though the coercion rules are implementation-specific, they MUST at least be defined so that a stringified integer can be coerced to an integer.

I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).

We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between 0 and maybe around 32 inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.

I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).

Right, I think I understand what you were trying to do, but "evaluates" is a bit too vague for my tastes. I understand your objection to "coerce", though.

We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between 0 and maybe around 32 inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.

Well, if we really want these to be finite enumerations, that's a lot easier :) Is there anything wrong with saying that the input MUST be either a string in {"1", "2", ..., "32"} or an integer in the interval [1, 32]?

Different implementations will have different max values for these options. I didn't want to put a hard definition in. We could have a note that says "expect a small number". I would be surprised, for this particular set of options, if any implementation supported a value with more than two digits, but I don't think we need to prevent it here per-se.

There is a buffer-overflow attack from our syntax. This is valid:

{$n :number maximumFractionDigits=123456789123456789123456789123456789} <- plus more digits if you want

aphillips · 2024-03-25T18:09:11Z

In the 2024-03-25 call we agreed that we will commit a version of this, after I have updated it with specific edits, especially changing the name to "digit size options"

catamorphism

My comment should be taken as advisory.

catamorphism · 2024-03-25T21:27:25Z

spec/registry.md

+In most cases, the value of a digit size option will be a string literal that
+encodes the value as a decimal integer.
+Implementations MAY also accept implementation-defined types as the value.
+The _literal_ representation of a digit size option matches the following ABNF:


Suggested change

The _literal_ representation of a digit size option matches the following ABNF:

The textual representation of a digit size option matches the following ABNF:

This looks good except that using literal here still puts one foot on each side of the abstraction barrier between function implementations and MessageFormat. I think what we really want to say is that a valid option is anything that resolves to a string matching digit-size-option. But I think "textual" gets it across.

Well... what we really mean is "... anything that resolves to an integer between zero and some number that doesn't exceed 99..."

I used literal here because I mean the literal production in our syntax. Where literal doesn't belong is on line 633. One more fix...

Define non-negative integer option

7e43bb5

@catamorphism identified this as a gap in e.g. #739. This PR defines what a non-negative integer option value is for `:number` and `:integer`. It does not attempt to address the larger question raised in 738/739.

aphillips added functions Issue pertains to the default function set LDML45 LDML45 Release (Tech Preview) labels Mar 19, 2024

aphillips requested review from stasm, catamorphism, eemeli, mihnita and macchiati March 19, 2024 19:57

aphillips mentioned this pull request Mar 19, 2024

Remove tests that rely on nonexistent bad-option error #740

Closed

eemeli requested changes Mar 19, 2024

View reviewed changes

Address @eemelie's comment

044ed78

... via a rewrite. Adds an example. Permits upper length limit. Removes any mention of types. Permits an upper numeric amount (e.g. if you're implementation of number formatting is limited to, say, 64 digits, you can make the maximum allowed value `64`)

aphillips requested a review from eemeli March 19, 2024 23:45

aphillips commented Mar 19, 2024

View reviewed changes

spec/registry.md Outdated Show resolved Hide resolved

Update spec/registry.md

f9bd831

eemeli requested changes Mar 20, 2024

View reviewed changes

Address comments

af10060

aphillips requested a review from eemeli March 20, 2024 17:43

eemeli requested changes Mar 20, 2024

View reviewed changes

spec/registry.md Outdated Show resolved Hide resolved

spec/registry.md Outdated Show resolved Hide resolved

aphillips and others added 2 commits March 20, 2024 16:18

Update spec/registry.md

f002e44

Update spec/registry.md

b23aaec

Co-authored-by: Eemeli Aro <[email protected]>

aphillips requested a review from eemeli March 20, 2024 23:18

eemeli approved these changes Mar 20, 2024

View reviewed changes

stasm reviewed Mar 22, 2024

View reviewed changes

Update spec/registry.md

94db1cb

Co-authored-by: Stanisław Małolepszy <[email protected]>

aphillips requested a review from stasm March 22, 2024 13:46

catamorphism requested changes Mar 22, 2024

View reviewed changes

mihnita approved these changes Mar 25, 2024

View reviewed changes

Implement changes discussed in 2024-03-25 call

632f6a6

aphillips requested a review from catamorphism March 25, 2024 21:01

aphillips requested review from mihnita and eemeli March 25, 2024 21:01

catamorphism approved these changes Mar 25, 2024

View reviewed changes

Address comments

dd6cada

eemeli approved these changes Mar 26, 2024

View reviewed changes

aphillips merged commit 2fe2111 into main Mar 26, 2024

aphillips deleted the aphillips-number-non-negative-integer-option branch March 26, 2024 16:59

	- a literal consisting only of ASCII digits U+0030 through U+0039;
	- a string consisting only of ASCII digits U+0030 through U+0039;

		If the value is passed as a literal
		it MUST contain only the ASCII digits U+0030 through U+0039.

		A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer
		greater than or equal to zero.

-A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer
-greater than or equal to zero.
+A "non-negative integer" is an _option_ value that can be coerced to an integer
+that is greater than equal to zero,
+according to implementation-specific coercion rules.

	The _literal_ representation of a digit size option matches the following ABNF:
	The textual representation of a digit size option matches the following ABNF:

Uh oh!

Define non-negative integer option #741

Define non-negative integer option #741

Uh oh!

Conversation

aphillips commented Mar 19, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

catamorphism Mar 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aphillips commented Mar 25, 2024

Uh oh!

catamorphism left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

catamorphism Mar 22, 2024 •

edited

Loading