How to refer to CLDR data on plural rules? #538

@eemeli

As discussed on #534 (comment) and previously, it would be Very Good to be able to make use of the existing CLDR data at least in plurals.xml and ordinals.xml, which effectively describe which locales use which plural categories, and how those are selected. The structure of these files is described in ldmlSupplemental.dtd.

As far as I'm aware, only information for plural and ordinal categories is available with this specific format. For example, the CLDR grammaticalFeatures.xml file has a rather different structure for its presentation of other grammatical features, which may become useful for other formatters and selectors.

As @aphillips notes, "Perhaps we should have a referencing mechanism to CLDR instead of replicating data [for plural matching]."

However, this isn't eminently straightforward, as evidenced by the fact that this hasn't been done yet. In terms of what's theoretically achievable here, we have the following capability levels:

  0. We can just go with the full set of categories with a <match values="zero one two few many other">, which does not require any additional data. We'll need to provide this baseline in any case; everything else filters this to some subset.
  1. If we can make the <plurals type="..."><pluralRules locales="..."><pluralRule count="..."> attribute information available to registry users, they can determine that given a type (cardinal or ordinal) and a locale code, the count attributes of the set of <pluralRule> elements define the available categories.
  2. If we can parse and process the contents of the <pluralRule> elements, we can further restrict the categories in many cases. For example, we could determine that in English, a numeric selector with minimumFractionDigits=2 will only ever resolve to the other category, or that in Polish an :integer plural selector would only match one, few, or many, and never other.
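
To make the attribute-only level concrete, here is a minimal Python sketch of looking up the available categories for a type and locale from the count attributes alone. The embedded XML is an abridged sample modeled on the structure of plurals.xml, not authoritative CLDR data, and the names available_categories and CATEGORY_ORDER are hypothetical helpers introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative sample in the shape of CLDR's plurals.xml.
# The rule text is shortened; do not treat it as authoritative data.
SAMPLE_PLURALS_XML = """\
<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16 @decimal 0.0~1.5</pluralRule>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4</pluralRule>
      <pluralRule count="many">v = 0 and i % 10 = 5..9 @integer 5~19</pluralRule>
      <pluralRule count="other"> @decimal 0.0~1.5</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>
"""

# The full baseline set: used when no data is found for the request.
CATEGORY_ORDER = ["zero", "one", "two", "few", "many", "other"]


def available_categories(xml_text, plural_type, locale):
    """Return the categories named by count attributes for a type and locale.

    Falls back to the full baseline set when the locale or type is missing.
    """
    root = ET.fromstring(xml_text)
    for plurals in root.iter("plurals"):
        if plurals.get("type") != plural_type:
            continue
        for rules in plurals.iter("pluralRules"):
            # The locales attribute is a space-separated list of locale codes.
            if locale in rules.get("locales", "").split():
                found = {rule.get("count") for rule in rules.iter("pluralRule")}
                return [c for c in CATEGORY_ORDER if c in found]
    return list(CATEGORY_ORDER)


print(available_categories(SAMPLE_PLURALS_XML, "cardinal", "pl"))
```

Note that this only reads attributes; it never looks inside the rule text, so it cannot tell that some categories are unreachable for a given set of formatting options.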

In order for us to go beyond Level 0 in the core registry definition without actually duplicating data, I think we would need something like an XSL Transform specifically for plural and ordinal data, and a referencing style where we could say something like (syntax only indicative):

<matchRef href="path/to/plurals.xml" transform="path/to/plural-match-mapper.xsl"/>

I'm not sure that there's a reasonable way to extract the @integer / @decimal -ness of the rules with XSLT, to allow for reaching Level 2.
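
Outside of XSLT, the @integer / @decimal markers are easy to pick out with a general-purpose language. The Python sketch below is one possible approach, not a proposal for the registry: it groups categories by which sample sections appear in each rule's text. It assumes that a missing sample section means the rule matches no values of that kind, which should be checked against the LDML specification before relying on it. The XML is again an abridged illustrative sample, and sample_kinds and categories_by_kind are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Abridged, illustrative sample in the shape of CLDR's plurals.xml.
SAMPLE_PLURALS_XML = """\
<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16 @decimal 0.0~1.5</pluralRule>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4</pluralRule>
      <pluralRule count="many">v = 0 and i % 10 = 5..9 @integer 5~19</pluralRule>
      <pluralRule count="other"> @decimal 0.0~1.5</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>
"""


def sample_kinds(rule_text):
    """Return the sample kinds ('integer', 'decimal') marked in a rule's text."""
    return set(re.findall(r"@(integer|decimal)\b", rule_text or ""))


def categories_by_kind(xml_text, plural_type, locale):
    """Group a locale's categories by the kinds of samples their rules list.

    Assumption: no @integer (or @decimal) section means the category is
    never selected for integer (or decimal) values.
    """
    result = {"integer": [], "decimal": []}
    root = ET.fromstring(xml_text)
    for plurals in root.iter("plurals"):
        if plurals.get("type") != plural_type:
            continue
        for rules in plurals.iter("pluralRules"):
            if locale not in rules.get("locales", "").split():
                continue
            for rule in rules.iter("pluralRule"):
                for kind in sample_kinds(rule.text):
                    result[kind].append(rule.get("count"))
    return result


print(categories_by_kind(SAMPLE_PLURALS_XML, "cardinal", "pl"))
```

Under that assumption, the Polish sample yields no other category for integers, and the English sample yields only other for decimals, matching the two examples given earlier.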

Without the transform itself, we could also leave a SHOULD-ish statement in the description of numerical selectors for tool builders to narrow the full set using CLDR data where appropriate.


We should decide how much of this we consider to be within scope of the core registry, and how much we intend to get done for next Spring's release.

Labels

Future (Deferred for future standardization), functions (Issue pertains to the default function set), question (Further information is requested)