-
-
Notifications
You must be signed in to change notification settings - Fork 36
Define non-negative integer option #741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@catamorphism identified this as a gap in e.g. #739. This PR defines what a non-negative integer option value is for `:number` and `:integer`. It does not attempt to address the larger question raised in 738/739.
spec/registry.md
Outdated
A "non-negative integer" is one of: | ||
- an implementation-defined integer numeric type where the value of the | ||
encoded number is not less than zero | ||
- a literal consisting only of ASCII digits U+0030 through U+0039; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Saying "literal" here would mean that a variable reference resolving to a numeric string value would not be valid. This is problematic because it would introduce a new requirement for a function implementation to be able to tell apart a literal string value and a variable string value.
- a literal consisting only of ASCII digits U+0030 through U+0039; | |
- a string consisting only of ASCII digits U+0030 through U+0039; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said "literal" to allow the value to be quoted, e.g. maximumFractionDigits=|11|
I agree that my phrasing wouldn't allow a variable reference, but the item to fix to accomplish that is the first one. Change to follow...
... via a rewrite. Adds an example. Permits upper length limit. Removes any mention of types. Permits an upper numeric amount (e.g. if you're implementation of number formatting is limited to, say, 64 digits, you can make the maximum allowed value `64`)
spec/registry.md
Outdated
If the value is passed as a literal | ||
it MUST contain only the ASCII digits U+0030 through U+0039. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still problematic. Consider these two example messages:
literal: {42 :number minimumFractionDigits=2}
variable: {42 :number minimumFractionDigits=$foo}
With the current proposed language, formatting the first is valid, because there the 2
is a literal, but formatting the second with { foo: '2' }
as input is invalid, because it's a variable, and we're here establishing stricter rules for variables than literals.
The problem with that is that in order to make that differentiation, one of two things must happen:
-
The messageformat runtime implementation must parse the numeric value out of the literal before passing it to the
:number
implementation as an integer, or emitting an error if it doesn't match the expectations. This requires for:number
to somehow communicate to the runtime its expectations on the types of its options, and for types to be introduced to the runtime. I really doubt that we want this. -
The
:number
implementation must be able to tell where an option value comes from, so that it can apply one set of rules to literal values and another to variable values. In practice, with the examples as above, it would mean that a:number
implementation would need to be called withnumber(context, Literal('42'), { minimumFractionDigits: Literal('2') })
for the first message, and with
number(context, Literal('42'), { minimumFractionDigits: Variable('2') })
for the second, so that
Literal('2')
andVariable('2')
could be differentiated.This would be a new, additional requirement on implementations, as without this addition it is possible for a valid runtime to call a
:number
implementation for either message asnumber(context, '42', { minimumFractionDigits: '2' })
So let's not introduce language here or elsewhere that differentiates the treatment of literal vs variable values, because that would add complexity to all custom functions in all implementations. Instead, let's just allow for a variable with a value of '2'
to get treated the exact same as a literal 2
in options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is to impose implementation details on implementations of :number
(and :integer
), not on MF. There has to be a clear delineation for message authors about what literal values are accepted. I will try to clarify that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also, I didn't read "passed as a literal" to only mean in the syntax, but also as the value of a variable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think there's some confusion because this text isn't defining a syntactic property, it's defining the runtime interface with a function implemented in another language, so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11|
in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected. Function implementations take parsed values (though of course this area is muddy because we avoid saying what a parsed value is.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so saying "must accept a string" doesn't mean you can't write maximumFractionDigits=|11| in the syntax. Passing in a string of length 4 that begins and ends with a '|' should be an error if an integer option is expected
Nah. That thing to the right of the equals sign can't be anything but a literal (or a variable with its $
sigil). That means it can be quoted. We very explicitly say that there is no difference between |11|
and 11
. As far as MF is concerned, all there is to the right of the equals is a literal. It is up to the number function to decide what to do with it.
A literal of length 4 that begins and ends with a |
looks like maxiumumFractionDigits=|\|11\||
in our syntax.
Functions implementations can get typed values via variable references, e.g. maximumFractionDigits=$maxDigits
where $maxDigits
is some kind of int
-like type. Otherwise it must parse a literal. This is "number functions" defining what literals it accepts and what it does with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe what's confusing is that there are two different parsers:
- The MessageFormat parser takes a MessageFormat string and produces a data model, which includes nodes representing literals in the syntax.
- Each function implementation has its own parser that takes strings (or other implementation-defined types) and produces some implementation-defined type, in this case a number.
The first parser is responsible for stripping the '|' characters, so that the second parser doesn't need to have its own partial implementation of MessageFormat syntax. That's what I meant.
Another way to say this is that we're defining a foreign function interface and that means defining common types across the interface. "String" makes sense as a common type, since most programming languages have a string type. The name "literal" introduces confusion because it has meaning on one side of the interface that doesn't map onto anything on the other side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. The registry spec is written from the outside in: it talks about what you can have in the message syntax and uses MF's terminology for that. So the first parser is what is in play. You're correct that the number function implementation never sees the |
in your example. It gets the literal string 11
and parses it. That implementation, by the way, is probably glue code that calls the ultimate formatter.
that means defining common types across the interface
I don't agree. I don't think the default registry should touch types nor does it need to. The default registry should be written in a type-neutral way, allowing local implementations to define variable types (for operands or option values) as needed/expected by users. An implementation of the number functions in the registry (as an example) could be written entirely in a language that has no number type or integer type and could work entirely in terms of strings, with only strings/literals as input. (That would be weird, but it could happen).
What registry.md
defines is what the function is supposed to do; what operands are accepted (which mostly means: is there an accepted literal form?); and what options/option values are accepted (with the values defining what literal forms can be passed--if any--for implementation-defined variable values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the default registry can be completely type-neutral; it already refers to string types. That's what I mean by "a common type": I can't see how I would rewrite this spec without assuming the implementation had a string type. So why not talk about strings here, given that the formatting spec already says "The resolution of a text or literal MUST resolve to a string"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I turned the text hear upside down and I think it works better: now we presume a decimal integer string and permit other types. See what you think.
Co-authored-by: Eemeli Aro <[email protected]>
|
||
A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer | ||
greater than or equal to zero. | ||
Implementations MAY define an upper limit on the resolved value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to say anything about what implementations should do about the lower bound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lower bound is always zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes :) But it seems like we don't say anything about what should happen when the passed value is, in fact, negative? Should functions emit an error? Perhaps we need a sister error to the Operand Mismatch Error
, but for invalid options?
Should I file a separate issue about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid Option
suggests itself as a sub-species of Invalid Expression
.
A negative number won't parse correctly about the same way that the string moo
or the string $57%
does not parse as a non-negative integer. FWIW, 1,234
does not parse according to the rules laid out here nor does 11.0
The reason for upper bounds, of course, is overflow prevention.
Co-authored-by: Stanisław Małolepszy <[email protected]>
spec/registry.md
Outdated
of a non-negative integer option consistent with that implementation's practical limits. | ||
|
||
The implementation MAY accept any implementation-defined type as the value. | ||
Implementations MUST accept a _literal_ as the value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @eemeli said, this is confusing. Functions aren't passed a data model node, and a literal here could only mean that.
My suggestion is to define this similarly to how the operand of :string
is defined:
The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported
and say that a (thing that can be the operand of :string
) MUST be accepted if it matches non-negative
in the ABNF.
Perhaps it would be helpful to have a section at the beginning of the registry spec defining the set of implementation-defined types -- that is, it has to at least include a string type, and then, we could have a name for the set of things that can be converted to strings. From there, it would be possible to give a name to the set of things that can be converted to numbers.
spec/registry.md
Outdated
A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer | ||
greater than or equal to zero. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "non-negative integer" is an _option_ value that the _function_ evaluates as an integer | |
greater than or equal to zero. | |
A "non-negative integer" is an _option_ value that can be coerced to an integer | |
that is greater than equal to zero, | |
according to implementation-specific coercion rules. |
Maybe this helps address my other comment too: the subsequent text could say that even though the coercion rules are implementation-specific, they MUST at least be defined so that a stringified integer can be coerced to an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).
We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between 0
and maybe around 32
inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this text. I originally thought about using the word "coerce" here. But that gets us back into the type game. I also added the bits about "... that the function evaluates..." to make clear that it is not message format that is doing the evaluation/coercion (or at least not-necessarily the MF code).
Right, I think I understand what you were trying to do, but "evaluates" is a bit too vague for my tastes. I understand your objection to "coerce", though.
We could say "implementation-specific coercion rules", although I think we want implementations to accept a sequence of ASCII digits (so that message are portable). It seems dubious that anyone will get this wrong, given that one can have a a number between
0
and maybe around32
inclusive that produces some kind of meaningful result. (I mean, yeah, technically you can have "max int" digits, but that is a stupendously sized number and won't fit on most screens....... or print on less than a ream of paper 😁). With that in mind, I basically want us to say: "some implementation defined number type with a non-negative value or one of these ASCII digit strings". All else is rubbish.
Well, if we really want these to be finite enumerations, that's a lot easier :) Is there anything wrong with saying that the input MUST be either a string in {"1", "2", ..., "32"}
or an integer in the interval [1, 32]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different implementations will have different max values for these options. I didn't want to put a hard definition in. We could have a note that says "expect a small number". I would be surprised, for this particular set of options, if any implementation supported a value with more than two digits, but I don't think we need to prevent it here per-se.
There is a buffer-overflow attack from our syntax. This is valid:
{$n :number maximumFractionDigits=123456789123456789123456789123456789} <- plus more digits if you want
In the 2024-03-25 call we agreed that we will commit a version of this, after I have updated it with specific edits, especially changing the name to "digit size options" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment should be taken as advisory.
spec/registry.md
Outdated
In most cases, the value of a digit size option will be a string literal that | ||
encodes the value as a decimal integer. | ||
Implementations MAY also accept implementation-defined types as the value. | ||
The _literal_ representation of a digit size option matches the following ABNF: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _literal_ representation of a digit size option matches the following ABNF: | |
The textual representation of a digit size option matches the following ABNF: |
This looks good except that using literal here still puts one foot on each side of the abstraction barrier between function implementations and MessageFormat. I think what we really want to say is that a valid option is anything that resolves to a string matching digit-size-option
. But I think "textual" gets it across.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... what we really mean is "... anything that resolves to an integer between zero and some number that doesn't exceed 99..."
I used literal here because I mean the literal production in our syntax. Where literal doesn't belong is on line 633. One more fix...
@catamorphism identified this as a gap in e.g. #739.
This PR defines what a non-negative integer option value is for
:number
and:integer
. It does not attempt to address the larger question raised in 738/739.