
Add column-first pattern selection #372


Merged: 3 commits, merged May 22, 2023


eemeli (Collaborator) commented Mar 29, 2023

This is an attempt to explicitly document how case/variant selection happens for messages that have a `when` selector statement. The method presented here implements the "column-first" selection method that was agreed upon at the last meeting.

A JS implementation of the algorithm is available; see #369 for more information on that.

The overall intent is to minimally but sufficiently define selection, such that two implementations that use similar custom selector functions will make the same selection when given the same input message and formatting context. In a number of places, details are left for each implementation to fill in, as each may have different internal representations of resolved and unresolved values and may perform value matching in different ways.

By necessity, the method definition needs to use more formal language than what we have so far in the spec. For that, I've borrowed some of the conventions of the TC39 spec and hope that it's sufficiently readable as is, without us needing separate definitions of e.g. what a "list" is.

aphillips (Member) left a comment

I'm struggling to understand this algorithm, partially for wording reasons and partially because I think that you and I think about matching differently. This seems to do what I think it should do, but may (as noted in my comments) restrict how selectors can be implemented or might have some quirks with unbounded value spaces. Have a look and see what you think.

Comment on lines +27 to +33
When a message has a single _selector_,
an implementation-defined method compares each key to the _selector_
and determines which of the keys match, and in what order of preference.
A catch-all key will always match, but is always the least preferred choice.
During selection, the _variant_ with the best-matching key is selected.

In a message with more than one _selector_,

aphillips (Member), Mar 31, 2023

I think it is clearer to think of a selector as a function. Then one can describe single-select and the later multiple-select more generically. I think the single-vs-multiple is a red herring as well: they should work identically. Perhaps:

When a message has selectors, the implementation needs to select a single variant
to serve as the pattern. To do this, each selector in turn is passed the complete list of
keys in the remaining list of variants and returns a filtered and sorted list of variants.
When no variant matches, the implementation emits an error and nomatch is returned.

The catch-all key `*` MUST always match, but is always the least preferred choice.
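For the single-selector case, where the quoted spec text and this suggestion agree, the process can be sketched as follows. This is a minimal illustration under assumptions: `selectSingle`, `MatchFn` and `Variant` are hypothetical names, and the selector is modelled as a function that receives the message's keys and returns the matching ones, most preferred first.

```ts
// Hypothetical shapes for this sketch; not taken from the spec or any library.
type Variant = { key: string; pattern: string };
// A selector modelled as a function: given the message's keys, it returns
// the ones that match, most preferred first.
type MatchFn = (keys: string[]) => string[];

const CATCH_ALL = "*";

function selectSingle(match: MatchFn, variants: Variant[]): Variant | undefined {
  // Offer the selector only the non-catch-all keys present in the message.
  const keys = Array.from(
    new Set(variants.map((v) => v.key).filter((k) => k !== CATCH_ALL)),
  );
  // Take the variant whose key the selector prefers most.
  for (const key of match(keys)) {
    const found = variants.find((v) => v.key === key);
    if (found) return found;
  }
  // The catch-all always matches, but only as the least preferred choice.
  // `undefined` here corresponds to the "nomatch" case.
  return variants.find((v) => v.key === CATCH_ALL);
}

// Example: a stand-in selector for the value 1 that prefers an exact
// numeric match over the plural category "one".
const matchOne: MatchFn = (keys) => ["1", "one"].filter((k) => keys.includes(k));
const variants: Variant[] = [
  { key: "one", pattern: "One item" },
  { key: "1", pattern: "Exactly one item" },
  { key: "*", pattern: "Many items" },
];
console.log(selectSingle(matchOne, variants)?.pattern); // "Exactly one item"
```

With a single selector there is only one preference order, so no question arises of how to combine orders from several selectors.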

eemeli (Collaborator Author):

The intent with these paragraphs is to provide an easier-to-understand but accurate representation of the selection method than what's detailed in the subsequent sections. The split into two cases (N=1, then N>1) is intended to first explain the more common and more straightforward situation, before extending to the more complex.

> I think it is clearer to think of a selector as a function. Then one can describe single-select and the later multiple-select more generically.

To me, this doesn't quite match our current definition of the word:

> A selector is a statement containing one or more expressions which will be used to choose one of the variants during formatting.

If accepted, PR #371 modifies this to read:

> A match statement contains one or more selectors which will be used to choose one of the variants during formatting.

I read this as describing a selector primarily as a data structure rather than a function, which means that until we actually define such a function or other construct, we can't use it for further explanations.

> I think the single-vs-multiple is a red herring as well: they should work identically.

They do work identically; it's just that with only one we don't need to combine multiple preference orders, which is the main source of our complexity here.
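To make the "combine multiple preference orders" step concrete, here is a minimal sketch of our reading of column-first selection: resolve one preference list per selector, drop variants with any non-matching key, then sort so that the first selector's order decides and later selectors only break ties. `selectColumnFirst`, `MatchFn` and `Variant` are hypothetical names; this is an illustration, not the JS implementation from #369.

```ts
// Hypothetical shapes for this sketch.
type Variant = { keys: string[]; pattern: string };
type MatchFn = (keys: string[]) => string[]; // matching keys, most preferred first

const CATCH_ALL = "*";

function selectColumnFirst(selectors: MatchFn[], variants: Variant[]): Variant | undefined {
  // 1. Resolve preferences: one ordered list of matching keys per selector (column).
  const pref = selectors.map((match, i) =>
    match(Array.from(new Set(variants.map((v) => v.keys[i]).filter((k) => k !== CATCH_ALL)))),
  );

  // 2. Filter: keep variants whose every key is the catch-all or a match.
  let candidates = variants.filter((v) =>
    v.keys.every((k, i) => k === CATCH_ALL || pref[i].includes(k)),
  );

  // 3. Sort: stable sorts from the last column to the first, so the first
  //    selector's order decides and later selectors only break ties.
  //    A catch-all ranks after every real match in its column.
  for (let i = selectors.length - 1; i >= 0; i--) {
    const rank = (v: Variant) =>
      v.keys[i] === CATCH_ALL ? pref[i].length : pref[i].indexOf(v.keys[i]);
    candidates = [...candidates].sort((a, b) => rank(a) - rank(b));
  }

  // 4. The best remaining variant, or undefined if none matched.
  return candidates[0];
}

// Example: the first selector matches "one", the second matches "female".
const byCount: MatchFn = (keys) => keys.filter((k) => k === "one");
const byGender: MatchFn = (keys) => keys.filter((k) => k === "female");
const variants: Variant[] = [
  { keys: ["*", "female"], pattern: "C" },
  { keys: ["one", "*"], pattern: "B" },
  { keys: ["*", "*"], pattern: "D" },
];
console.log(selectColumnFirst([byCount, byGender], variants)?.pattern); // "B"
```

The sketch relies on `Array.prototype.sort` being stable, which ECMAScript requires from ES2019 onward. In the example, `one *` wins over `* female` because the first column takes precedence.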


### Resolve Selectors

First, resolve the values of each _selector_:
aphillips (Member):

I don't understand this. What are the "values" of a selector?

I think a selector has two inputs: an expression (either a variable or a literal) and a list of keys. Resolving the "value" of the expression seems to be early-binding in a way that isn't required to implement the spec. Or is "value" something else?

eemeli (Collaborator Author):

Happy to play with the semantics here, but what I intend the "resolved value" here to mean is probably pretty close to your understanding of a selector as a function. For example, let's say that we start from the following syntax representation of a selector:

```
{$count :plural minimumFractionDigits=2}
```

This may be parsed e.g. into the following data model representation, which we can still claim to represent the selector:

```js
{
  type: 'expression',
  name: 'plural',
  operand: { type: 'variable', name: 'count' },
  options: [
    { name: 'minimumFractionDigits', value: { type: 'nmtoken', value: '2' } }
  ]
}
```

For the sake of simplicity, let's say that $count may resolve to an externally provided integer value 42. With that, the resolution could in practice be done by calling the handler for :plural that's defined in our registry:

```js
let rv = pluralHandler(42, { minimumFractionDigits: '2' })
```
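How such a registry call might be wired up can be sketched as below. The TypeScript types mirroring the data model object, the `Map`-based registry, and the names `resolveExpression` and `Handler` are all assumptions for illustration; the handler's return value is deliberately left opaque (`unknown`), since its shape is up to the implementation.

```ts
// Hypothetical data model types mirroring the object above.
interface VariableRef { type: "variable"; name: string }
interface Literal { type: "nmtoken"; value: string }
interface Expression {
  type: "expression";
  name: string;
  operand: VariableRef | Literal;
  options: { name: string; value: VariableRef | Literal }[];
}

// A handler receives the resolved operand and options; what it returns
// (the resolved value) is left entirely to the implementation.
type Handler = (operand: unknown, options: Record<string, unknown>) => unknown;

// Variables are looked up in the formatting context; literals are used as-is.
function resolveValue(v: VariableRef | Literal, vars: Record<string, unknown>): unknown {
  return v.type === "variable" ? vars[v.name] : v.value;
}

function resolveExpression(
  expr: Expression,
  vars: Record<string, unknown>,
  registry: Map<string, Handler>,
): unknown {
  const handler = registry.get(expr.name);
  if (!handler) throw new Error(`Unknown function :${expr.name}`);
  const options: Record<string, unknown> = {};
  for (const opt of expr.options) options[opt.name] = resolveValue(opt.value, vars);
  // With $count = 42 this calls handler(42, { minimumFractionDigits: "2" }).
  return handler(resolveValue(expr.operand, vars), options);
}
```

Note that option values stay as unparsed strings here (`"2"`), matching the call shown above; converting them is the handler's job.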

The important bit here is that rv doesn't need to be a concrete formatted value like '42.00', but it could be something like:

```ts
function matchPlurals(
  pr: Intl.PluralRules,
  value: number,
  keys: Set<string>
): string[] {
  const matches: string[] = [];
  // An exact match on the value is preferred...
  const str = String(value);
  if (keys.has(str)) matches.push(str);
  // ...over a match on its plural category.
  const cat = pr.select(value);
  if (keys.has(cat)) matches.push(cat);

  return matches;
}

// `bind()` returns a partially applied function with a single remaining argument `keys`
let rv = matchPlurals.bind(null,
  new Intl.PluralRules('en', { minimumFractionDigits: 2 }),
  42
);
```

So while this does do a little bit of early binding for resolving $count and :plural, as well as constructing a PluralRules instance, it still leaves the actual selection to be performed later. This is ultimately necessary to ensure that we emit the right errors at the right time.
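A short usage sketch shows the deferral in action: the bound `rv` does nothing until it is handed the keys of a particular message. The function is repeated so the block runs on its own; the key sets are made-up examples.

```ts
// Same matchPlurals as in the comment above.
function matchPlurals(pr: Intl.PluralRules, value: number, keys: Set<string>): string[] {
  const matches: string[] = [];
  const str = String(value);
  if (keys.has(str)) matches.push(str);
  const cat = pr.select(value);
  if (keys.has(cat)) matches.push(cat);
  return matches;
}

const rv = matchPlurals.bind(null, new Intl.PluralRules("en", { minimumFractionDigits: 2 }), 42);

// Selection happens only now, against the keys a given message actually uses.
console.log(rv(new Set(["42", "one", "other"]))); // → ["42", "other"]
console.log(rv(new Set(["one", "other"])));       // → ["other"]
console.log(rv(new Set(["1", "one"])));           // → []
```

An empty result is a valid answer: in that case only a catch-all variant can still be chosen.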

For example, if the value given as the `minimumFractionDigits` option were instead -1, the `new Intl.PluralRules()` call would fail with a `RangeError`. Our selection method should ensure that an error for this is emitted exactly once, and at an appropriate time relative to errors in any other selector.

It's important to note that this does not require an implementation to be this eager in its resolution, as long as its observable behaviour (including error emission) matches the method presented here.
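One way to get "exactly once" is to report the error during resolution and return a resolved value that can still be asked about keys. The sketch below assumes a fallback that matches no keys, so that only a catch-all variant remains selectable; that fallback is our assumption, not something this PR specifies, and `resolvePlural` and `onError` are hypothetical names.

```ts
type Matcher = (keys: Set<string>) => string[];

// Hypothetical resolver for a :plural-like selector.
function resolvePlural(
  value: number,
  minimumFractionDigits: number,
  onError: (e: unknown) => void,
): Matcher {
  try {
    // Invalid options (e.g. -1) throw a RangeError here, at resolution time.
    const pr = new Intl.PluralRules("en", { minimumFractionDigits });
    return (keys) => {
      const matches: string[] = [];
      const exact = String(value);
      if (keys.has(exact)) matches.push(exact);
      const cat = pr.select(value);
      if (keys.has(cat)) matches.push(cat);
      return matches;
    };
  } catch (e) {
    // Report once; later key matching cannot raise the same error again.
    onError(e);
    // Assumed fallback: match nothing, leaving only a catch-all selectable.
    return () => [];
  }
}
```

However many times the fallback matcher is later called, the error has already been emitted and is not repeated.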


### Resolve Preferences

Next, using `res`, resolve the preferential order for all message keys:
aphillips (Member):

What does "preferential order" mean here? Can you clarify what you mean?

I think you're asking the selector to resolve all possible values in their preferred order. This is fine for a selector with an enumerated set of values (like plural rules), but limiting for implementations that might be somewhat dynamic (like explicit plural values, where any number from an infinite set might be a valid key).

eemeli (Collaborator Author):

Suggested change:

Before: Next, using `res`, resolve the preferential order for all message keys:
After: Next, using `res`, resolve the preferential order for all variant keys:

The "all" here is meant to refer to all the keys in the current message, rather than all possible keys. Would "all variant keys" communicate that better? If so, please feel free to commit this change.
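On that reading, a selector never has to enumerate its value space: it is only asked about the keys that actually appear in the message's variants. A minimal sketch with hypothetical names (`columnKeys`, `resolvePreferences`, `exactNumber`), where an exact-number selector with an unbounded value space only ever sees the message's own keys:

```ts
type Variant = { keys: string[] };
type MatchFn = (keys: string[]) => string[]; // matching keys, most preferred first

const CATCH_ALL = "*";

// The distinct non-catch-all keys used in column i of this message.
function columnKeys(variants: Variant[], i: number): string[] {
  return Array.from(new Set(variants.map((v) => v.keys[i]).filter((k) => k !== CATCH_ALL)));
}

// One preference list per selector, computed only from the message's keys.
function resolvePreferences(selectors: MatchFn[], variants: Variant[]): string[][] {
  return selectors.map((match, i) => match(columnKeys(variants, i)));
}

// Hypothetical exact-number selector: any numeric string could be a key,
// but it only tests the keys it is given.
function exactNumber(value: number): MatchFn {
  return (keys) => keys.filter((k) => k.trim() !== "" && Number(k) === value);
}

const variants: Variant[] = [{ keys: ["0"] }, { keys: ["1"] }, { keys: ["1.0"] }, { keys: ["*"] }];
console.log(resolvePreferences([exactNumber(1)], variants)); // → [["1", "1.0"]]
```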

Comment on lines +83 to +88
The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved _selector_ value `rv` and a list of string keys `keys`,
and returns a list of string keys in preferential order.
The returned list must only contain unique elements of the input list `keys`,
and it may be empty.
The most preferred key is first, with the rest sorted by decreasing preference.
aphillips (Member):

I think this may be phrased oddly.

As I noted above, I'd rather think of the selector as having the burden of matching, not "the implementation".

eemeli (Collaborator Author):

The odd phrasing is meant to allow for an implementation to not be limited in the shape that it ends up using for the resolved values of its selectors. If it's a callable function, then you could have:

```js
function MatchSelectorKeys(rv, keys) {
  return rv(keys);
}
```

or if it's an instance of a class with a known shape, you could have:

```js
function MatchSelectorKeys(rv, keys) {
  return rv.selectKeys(keys);
}
```

or if it's a pure data structure of some sort, you could have:

```js
function MatchSelectorKeys(rv, keys) {
  const sel = getSelectorFunction(rv);
  return sel(keys);
}
```

In different situations, each of those could be the best choice for an implementation, and we should not require that an implementation matches any one of them exactly.
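As an illustration only, a single implementation could accept all three shapes and still enforce the contract from the spec text (the result lists unique elements of `keys`, possibly none). The type names, the `selectorFunctions` table and this `getSelectorFunction` are hypothetical stand-ins, not part of the proposal.

```ts
// Hypothetical resolved-value shapes, one per option above.
type Fn = (keys: string[]) => string[];
interface Selectable { selectKeys(keys: string[]): string[] }
interface DataValue { kind: string; value: unknown }
type ResolvedValue = Fn | Selectable | DataValue;

// Hypothetical lookup table for the pure-data case.
const selectorFunctions: Record<string, (value: unknown) => Fn> = {
  string: (value) => (keys) => keys.filter((k) => k === String(value)),
};

function getSelectorFunction(rv: DataValue): Fn {
  const make = selectorFunctions[rv.kind];
  if (!make) throw new Error(`No selector for kind "${rv.kind}"`);
  return make(rv.value);
}

function MatchSelectorKeys(rv: ResolvedValue, keys: string[]): string[] {
  let result: string[];
  if (typeof rv === "function") result = rv(keys);
  else if ("selectKeys" in rv) result = rv.selectKeys(keys);
  else result = getSelectorFunction(rv)(keys);

  // Contract check: unique elements drawn from `keys`; may be empty.
  const allowed = new Set(keys);
  if (new Set(result).size !== result.length || !result.every((k) => allowed.has(k))) {
    throw new Error("MatchSelectorKeys: result must be unique elements of keys");
  }
  return result;
}
```

Whether a violation should throw, be reported as an error, or be silently filtered is a design choice the spec leaves open; throwing just makes the contract visible in this sketch.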

eemeli and others added 2 commits April 3, 2023 11:34
Co-authored-by: Addison Phillips
@aphillips aphillips merged commit cc93bf6 into unicode-org:main May 22, 2023
@eemeli eemeli deleted the first-col-selection branch May 22, 2023 17:31
2 participants