diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index d1399137fb..a82e1ac4f1 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,10 +1,9 @@ --- name: Feature request about: Suggest an idea or feature for Message Format -title: '' -labels: '' -assignees: '' - +title: "" +labels: "" +assignees: "" --- **Is your feature request related to a problem? Please describe.** diff --git a/.github/workflows/prettier.yaml b/.github/workflows/prettier.yaml new file mode 100644 index 0000000000..3d28c66b60 --- /dev/null +++ b/.github/workflows/prettier.yaml @@ -0,0 +1,30 @@ +name: Apply Prettier style + +on: + push: + paths: + - "**.md" + +permissions: + contents: write + pull-requests: write + +jobs: + prettier: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v3 + - uses: actions/setup-node@v3 + with: + node-version: "20.x" + - run: npm install --no-save prettier@3 + - run: npx prettier --write . + - name: git config + run: | + git config user.name "github-actions[bot]" + git config user.email "41898282+github-actions[bot]@users.noreply.github.com" + - run: git add . + - name: git commit & push any changes + run: | + git diff-index --quiet HEAD || (git commit -m "style: Apply Prettier" && git push) diff --git a/.gitignore b/.gitignore index 496ee2ca6a..646ac519ef 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ -.DS_Store \ No newline at end of file +.DS_Store +node_modules/ diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index bdb9b39eff..7d9260c81e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -10,6 +10,7 @@ the information on the Contributor License Agreement below. In addition, you sho 2. 
Watch this repository (use the "Watch" button in the upper right corner) + ## Contributor License Agreement In order to contribute to this project, the Unicode Consortium must have on file a Contributor License Agreement (CLA) covering your contributions, either an individual or a corporate CLA. Pull Requests will not be merged until the correct CLA is signed. Which version needs to be signed depends on who owns the contribution being made: you as the individual making the contribution or your employer. _It is your responsibility to determine whether your contribution is owned by your employer._ Please review [The Unicode Consortium Intellectual Property, Licensing, and Technical Contribution Policies][policies] for further guidance on which CLA to sign, as well as other information and guidelines regarding the Consortium’s licensing and technical contribution policies and procedures. @@ -20,7 +21,6 @@ In order to contribute to this project, the Unicode Consortium must have on file Unless otherwise noted in the LICENSE file, this project is released under the free and open-source [Unicode License][unicode-license], also known as Unicode, Inc. License Agreement - Data Files and Software. - [policies]: https://www.unicode.org/policies/licensing_policy.html [unicode-corporate-clas]: https://www.unicode.org/policies/corporate-cla-list/ [signing]: https://www.unicode.org/policies/licensing_policy.html#signing diff --git a/README.md b/README.md index 9393c1abc7..976bae3510 100644 --- a/README.md +++ b/README.md @@ -56,15 +56,15 @@ See more examples and the formal definition of the grammar in [spec/syntax.md](. ### Implementations -* Java: [`com.ibm.icu.message2`](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/index.html?com/ibm/icu/message2/package-summary.html), part of ICU 72 released in October 2022, is a _tech preview_ implementation of the MessageFormat 2 syntax, together with a formatting API. 
See the [ICU User Guide](https://unicode-org.github.io/icu/userguide/format_parse/messages/mf2.html) for examples and a quickstart guide. -* JavaScript: [`messageformat`](https://github.com/messageformat/messageformat/tree/master/packages/mf2-messageformat) 4.0 implements the MessageFormat 2 syntax, together with a polyfill of the runtime API proposed for ECMA-402. +- Java: [`com.ibm.icu.message2`](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/index.html?com/ibm/icu/message2/package-summary.html), part of ICU 72 released in October 2022, is a _tech preview_ implementation of the MessageFormat 2 syntax, together with a formatting API. See the [ICU User Guide](https://unicode-org.github.io/icu/userguide/format_parse/messages/mf2.html) for examples and a quickstart guide. +- JavaScript: [`messageformat`](https://github.com/messageformat/messageformat/tree/master/packages/mf2-messageformat) 4.0 implements the MessageFormat 2 syntax, together with a polyfill of the runtime API proposed for ECMA-402. ## Sharing Feedback We invite feedback about the current syntax draft, as well as the real-life use-cases, requirements, tooling, runtime APIs, localization workflows, and other topics. -* General questions and thoughts → [post a discussion thread](https://github.com/unicode-org/message-format-wg/discussions). -* Actionable feedback (bugs, feature requests) → [file a new issue](https://github.com/unicode-org/message-format-wg/issues). +- General questions and thoughts → [post a discussion thread](https://github.com/unicode-org/message-format-wg/discussions). +- Actionable feedback (bugs, feature requests) → [file a new issue](https://github.com/unicode-org/message-format-wg/issues). 
## Participation diff --git a/delegates.md b/delegates.md index b4129cff3c..6d4c202d43 100644 --- a/delegates.md +++ b/delegates.md @@ -1,11 +1,10 @@ -MessageFormat Working Group Delegates -===================================== +# MessageFormat Working Group Delegates MessageFormat Working Group uses the standard TC39 note-taking procedures, including unique abbreviations for all delegates. ## Acronym Conventions -With the exception of certain grandfathered delegates, all delegates should have three-letter abbreviations. The following scheme is recommended: +With the exception of certain grandfathered delegates, all delegates should have three-letter abbreviations. The following scheme is recommended: - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname. - Example: Gordon Moore → GME @@ -27,7 +26,7 @@ Please include your primary affiliation (e.g., the company you represent or wher - Eemeli Aro - OpenJSF & Vincit (EAO) - Elango Cheran - Google (ECH) - George Rhoten - Apple (GWR) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Janne Tynkkynen - PayPal (JMT) - Jeff Genovy - Microsoft (JMG) - John Watson - Facebook (JRW) diff --git a/docs/chair-group.md b/docs/chair-group.md index 6bbe556040..84a958bf0c 100644 --- a/docs/chair-group.md +++ b/docs/chair-group.md @@ -13,10 +13,9 @@ The Chair Group is a representative body of the Message Format WG that collectiv - Manage and accept WG applications. - Manage and organize all WG communication channels, email, mail group, boards and Github repositories. - - Prioritize, label and organize the tasks of the WG. -- Use the possibility of e-mail or synchronization meetings to discuss issues or organize WG work. +- Use the possibility of e-mail or synchronization meetings to discuss issues or organize WG work. - Create and update events and agenda. 
@@ -25,7 +24,6 @@ The Chair Group is a representative body of the Message Format WG that collectiv - Take care of the minutes of the meetings, including recording of all decisions. The minutes and presentations given in WG should be available in the WG Drive. - Prepare technical drafts and documentation. - - Formulate concrete questions that the WG can answer in their monthly meetings. #### Participate diff --git a/docs/consensus_decisions.md b/docs/consensus_decisions.md index ba972c733b..4e6551bf9d 100644 --- a/docs/consensus_decisions.md +++ b/docs/consensus_decisions.md @@ -19,7 +19,7 @@ For more details on the process that lead to these decisions, please refer to th - **Consensus 5 & 6:** The solution for [issue #127](https://github.com/unicode-org/message-format-wg/issues/127). Codified in [issue #137](https://github.com/unicode-org/message-format-wg/issues/137) during the [January 2021 meeting](https://github.com/unicode-org/message-format-wg/issues/146) of the working group. - Discussed and accepted at the [February 2021 meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2021/notes-2021-02-15.md) of the working group. + Discussed and accepted at the [February 2021 meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2021/notes-2021-02-15.md) of the working group. - **Consensus 7:** Discussed at the [22 September 2021 meeting](https://github.com/unicode-org/message-format-wg/issues/196) of the working group. diff --git a/docs/contributing-to-agenda.md b/docs/contributing-to-agenda.md index 46c1b5be09..7c9d359ff8 100644 --- a/docs/contributing-to-agenda.md +++ b/docs/contributing-to-agenda.md @@ -1,17 +1,17 @@ ## Contributing to Meetings ### Prepare Content - - Well ahead of the MFWG plenary meeting, produce and publish documentation about your proposal/idea, slides, design document, Github issue or any other relevant material, to be shared beforehand and presented during the meeting. 
+ +- Well ahead of the MFWG plenary meeting, produce and publish documentation about your proposal/idea, slides, design document, Github issue or any other relevant material, to be shared beforehand and presented during the meeting. ### Getting on the agenda + To propose a presentation/ time slot in MFWG plenary meetings: - If there is not already an issue describing your topic, create one. - > Include all relevant information about the topic including necessary documentation. - + > Include all relevant information about the topic including necessary documentation. - Add the label `Agenda+` to the issue. - Write to the [group email](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) asking the chair to add your item to the agenda _at least_ 48 hours in advance of the next call. More time is better. Note that the chair may choose to defer your item until a later meeting or to "future". Be sure to indicate any time restrictions you have. - -- Watch for the chair to update the [official agenda](https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md). - - Notes for each meeting can be found by date under [this link](https://github.com/unicode-org/message-format-wg/tree/main/meetings) +- Watch for the chair to update the [official agenda](https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md). + - Notes for each meeting can be found by date under [this link](https://github.com/unicode-org/message-format-wg/tree/main/meetings) diff --git a/docs/decision-process.md b/docs/decision-process.md index 7c3f0fa0f9..4e0e8b5b5c 100644 --- a/docs/decision-process.md +++ b/docs/decision-process.md @@ -1,22 +1,23 @@ # Decision making process - ## Definitions -1. *Consensus* is defined as lack of sustained opposition. -2. 
*Good standing* is a characteristic of group members who fullfill their implicit and explicit obligations and hence are allowed to excercise all of their rights within the group without restriction. -3. *Proscribe* (*proscribe*, *proscribed*, *proscription*) is a taxative enumeration of group members who are temporarily excluded from exercising their rights within the group. -4. *Normative keywords* as defined in [BCP 14](https://tools.ietf.org/html/bcp14) + +1. _Consensus_ is defined as lack of sustained opposition. +2. _Good standing_ is a characteristic of group members who fullfill their implicit and explicit obligations and hence are allowed to excercise all of their rights within the group without restriction. +3. _Proscribe_ (_proscribe_, _proscribed_, _proscription_) is a taxative enumeration of group members who are temporarily excluded from exercising their rights within the group. +4. _Normative keywords_ as defined in [BCP 14](https://tools.ietf.org/html/bcp14) ## Rules + 1. Any current group member is deemed to be in good standing unless proscribed. 2. Any group member in good standing MAY make a proposal to the group via - - raising an issue, - - PR (against the repo or wiki), or - - orally in a monthly group meeting. + - raising an issue, + - PR (against the repo or wiki), or + - orally in a monthly group meeting. 3. Approval or rejection of proposed solutions and decisions SHOULD be driven by consensus. 4. Consensus MAY be reached as part of the PR or issue resolution process. 5. The monthly group meeting has the ultimate decision making authority. - - Chair Group or Chair don't have a decision making authority, see [chair-group.md](chair-group.md) and [chair-group-members.md](chair-group-members.md). + - Chair Group or Chair don't have a decision making authority, see [chair-group.md](chair-group.md) and [chair-group-members.md](chair-group-members.md). 6. 
In case consensus cannot be found over multiple iterations of arguments and counter arguments, a monthly group meeting MAY reach consensus to mandate the Chair Group to organize a ballot among all group members in good standing. The ballot wording, options, and success criteria SHOULD be explicitely defined in a monthly meeting. The Chair Group only administers and implements the ballot and its results. 7. Proscription procedure is TBD by monthly meeting consensus if and when needed. diff --git a/docs/glossary-and-resources.md b/docs/glossary-and-resources.md index 7b933f12dc..5244758303 100644 --- a/docs/glossary-and-resources.md +++ b/docs/glossary-and-resources.md @@ -2,83 +2,82 @@ ### Basic terms for localization -* **language** - a system of communication used by a particular country or community. [ISO 639](https://en.wikipedia.org/wiki/ISO_639) is the main standard used to define language codes. -* **locale** - the implementation of a language in a given market, including formatting (numbers, date, etc), common expressions and cultural differences. For example, French France (fr-FR) is different than French Canada (fr-CA). [BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) is the main standard used to define locale codes. -* **Internationalization (i18n)**: a set of best practices and design process that ensures that an application can be adapted to various locales without requiring code changes. -* **localization / l10n** - converting a program to run in a different locale. Most of the effort revolves around translating UI string text, so "localization" often gets used synonymously with "translation". But technically also includes designing different layouts (ex: scripts that prefer top-to-bottom right-to-left) and UI widgets (some icons flipping for right-to-left languages) -* **CAT (Computer Assisted Translation) tool** - an editor that is designed for translators to be efficient to use and integrated with other l10n services. 
The CAT tool UI usually has a 2-column interface in which each message's source (original) and target (translation) text are kept vertically aligned with each other. -* **TMS (Translation Management System)** - a workflow system that manages the end-to-end work of translation. Includes user upload and download, cost estimation & billing, distributing work to translators, integration of reviewers and secondary reviewers, QA / issue management, and post-editing. Some TMSes provide their own integrated CAT tools. In other cases translators choose to use their own CAT tool, in which case they may use an industry standard format like XLIFF to download the translation source and upload their finished translation. -* **post-editing** - after the user receives their translated document as the result of the main translation workflow in the TMS, the user may want to make their own final touches to the translated doc. Those final touches are called post-editing. -* **Translation Memory (TM)** - a database of previous translations (translation entry = source string, source language, target language, target string). TMs typically store individual messages as source strings in separate entries. TMs can be shared globally, shared within a company, and/or private to a single user. -* **Machine Translation (MT)** - letting an automatic translation program perform translation of the source text. This is usually performed only when no entries in the Translation Memory exit that match the source text. The reason is that it is usually easier & cheaper to start translation by correcting Machine Translation output than to write out the translated string from scratch. -* **[XLIFF](https://en.wikipedia.org/wiki/XLIFF) (XML Localization Interchange File Format)** - a localization industry standard file format that defines the structure for translation task data. 
-* **[Okapi](https://okapiframework.org/) Framework** - a software framework that enables people to develop their own l10n software (CAT tools / TMSes). Hierarchy of classes is similar to XLIFF data hierarchy spec. Supports XLIFF and many common textual document file formats via a plugin-style architecture. Most CAT tools / TMSes are built on Okapi. -* **translatable unit / text unit** - the first level of granularity in which a translation document is broken down to in XLIFF/Okapi. Represents translatable text -- text to be translated by a translator. Usually corresponds to paragraphs, but depends on the file format handler implementation -* **segment** - a sub-unit of a text unit. Usually corresponds to a sentence. Also represents translatable text. -* **source locale** - language of the source text (input) for a translation. -* **target locale** - language of the translated text (output) of a translation. -* **placeholder** - a piece of information inline within a segment that should not be translated. Usually represented in the UI as an indivisible widget (or substring). -* **placeholder type** - which linguistic category that a placeholder represents, as it pertains to the work of translation. Common ones are 'gender' (of a word or person) and 'plural' (a number). -* **Natural Language Generation (NLG)** - a process that transforms structured data into natural language. +- **language** - a system of communication used by a particular country or community. [ISO 639](https://en.wikipedia.org/wiki/ISO_639) is the main standard used to define language codes. +- **locale** - the implementation of a language in a given market, including formatting (numbers, date, etc), common expressions and cultural differences. For example, French France (fr-FR) is different than French Canada (fr-CA). [BCP 47](https://en.wikipedia.org/wiki/IETF_language_tag) is the main standard used to define locale codes. 
+- **Internationalization (i18n)**: a set of best practices and design process that ensures that an application can be adapted to various locales without requiring code changes. +- **localization / l10n** - converting a program to run in a different locale. Most of the effort revolves around translating UI string text, so "localization" often gets used synonymously with "translation". But technically also includes designing different layouts (ex: scripts that prefer top-to-bottom right-to-left) and UI widgets (some icons flipping for right-to-left languages) +- **CAT (Computer Assisted Translation) tool** - an editor that is designed for translators to be efficient to use and integrated with other l10n services. The CAT tool UI usually has a 2-column interface in which each message's source (original) and target (translation) text are kept vertically aligned with each other. +- **TMS (Translation Management System)** - a workflow system that manages the end-to-end work of translation. Includes user upload and download, cost estimation & billing, distributing work to translators, integration of reviewers and secondary reviewers, QA / issue management, and post-editing. Some TMSes provide their own integrated CAT tools. In other cases translators choose to use their own CAT tool, in which case they may use an industry standard format like XLIFF to download the translation source and upload their finished translation. +- **post-editing** - after the user receives their translated document as the result of the main translation workflow in the TMS, the user may want to make their own final touches to the translated doc. Those final touches are called post-editing. +- **Translation Memory (TM)** - a database of previous translations (translation entry = source string, source language, target language, target string). TMs typically store individual messages as source strings in separate entries. 
TMs can be shared globally, shared within a company, and/or private to a single user. +- **Machine Translation (MT)** - letting an automatic translation program perform translation of the source text. This is usually performed only when no entries in the Translation Memory exit that match the source text. The reason is that it is usually easier & cheaper to start translation by correcting Machine Translation output than to write out the translated string from scratch. +- **[XLIFF](https://en.wikipedia.org/wiki/XLIFF) (XML Localization Interchange File Format)** - a localization industry standard file format that defines the structure for translation task data. +- **[Okapi](https://okapiframework.org/) Framework** - a software framework that enables people to develop their own l10n software (CAT tools / TMSes). Hierarchy of classes is similar to XLIFF data hierarchy spec. Supports XLIFF and many common textual document file formats via a plugin-style architecture. Most CAT tools / TMSes are built on Okapi. +- **translatable unit / text unit** - the first level of granularity in which a translation document is broken down to in XLIFF/Okapi. Represents translatable text -- text to be translated by a translator. Usually corresponds to paragraphs, but depends on the file format handler implementation +- **segment** - a sub-unit of a text unit. Usually corresponds to a sentence. Also represents translatable text. +- **source locale** - language of the source text (input) for a translation. +- **target locale** - language of the translated text (output) of a translation. +- **placeholder** - a piece of information inline within a segment that should not be translated. Usually represented in the UI as an indivisible widget (or substring). +- **placeholder type** - which linguistic category that a placeholder represents, as it pertains to the work of translation. Common ones are 'gender' (of a word or person) and 'plural' (a number). 
+- **Natural Language Generation (NLG)** - a process that transforms structured data into natural language. ### Terms for message formatting -* **API** - any function call(s) invoked by the user to perform message formatting. -* **API argument format** - syntax of values passed to input of message formatting API. May also refer to structure of values represented by the syntax. May be similar or same as *authoring format*. See *message syntax*. -* **application locale** - the locale for formatting (or formatting resources) requested by an application. -* **authoring** - writing a message by hand that adheres to some syntax. -* **authoring format** - syntax of message formatting inputs when constructed for a program, either manually (by a developer) or programtically (ex: a WYSIWYG tool used by a translator). May also refer to structure of values represented by the syntax. May be similar or same as *API argument format*. See *message syntax*. -* **data model** - a syntax-independent description of the structure of values passed to the message formatting API. -* **implementation** - code written to make the API achieve the intended behavior of output for a given input. -* **interchange format** - syntax/file format used to convert the inputs of message formatting into inputs for other systems (ex: l10n systems). -* **interpolate** - inserting the contents of one string in the middle of another, at places indicated by a pattern / placeholder. See *translation merging*. -* **locale fallback** - offering a reasonable substitute locale when the requested locale's resources are not available. Results may vary depending on context (ex: audio vs. text vs. video). -* **locale matching** - computing the locale fallback. -* **resource** - files bundled with an application that are loaded in by the executable code. UI strings, etc. and their locale-specific translations are typically stored as resources. 
-* **roundtrip** - the process of transforming a message into another format or representation, then transforming it back into the original format. -* **translation merging** - in l10n TMSes, the document-level interpolation of translated content. In other words, the replacing of translatable units in the source document with their equivalent translated units. See *interpolate*. -* **selector** - see *placeholder type*. -* **specification** - the rules we decide that describe what is passed to the API for message formatting (structure of data, syntax, etc.). -* **serialization** - how to convert in-memory representations of data to/from a file/stream. -* **syntax** - a general term to describe a set of rules that describe the set of allowed symbols and their ordering. Can apply to data, source code, communication protocols, etc. regardless of interface (stream, file). Sometimes used synonymously for *file format*. -* **variant** - one of the pre-defined values (cases) in a switch/case manner that a message can take depending on the value of some variable (switch), such as a polaceholder. - +- **API** - any function call(s) invoked by the user to perform message formatting. +- **API argument format** - syntax of values passed to input of message formatting API. May also refer to structure of values represented by the syntax. May be similar or same as _authoring format_. See _message syntax_. +- **application locale** - the locale for formatting (or formatting resources) requested by an application. +- **authoring** - writing a message by hand that adheres to some syntax. +- **authoring format** - syntax of message formatting inputs when constructed for a program, either manually (by a developer) or programtically (ex: a WYSIWYG tool used by a translator). May also refer to structure of values represented by the syntax. May be similar or same as _API argument format_. See _message syntax_. 
+- **data model** - a syntax-independent description of the structure of values passed to the message formatting API. +- **implementation** - code written to make the API achieve the intended behavior of output for a given input. +- **interchange format** - syntax/file format used to convert the inputs of message formatting into inputs for other systems (ex: l10n systems). +- **interpolate** - inserting the contents of one string in the middle of another, at places indicated by a pattern / placeholder. See _translation merging_. +- **locale fallback** - offering a reasonable substitute locale when the requested locale's resources are not available. Results may vary depending on context (ex: audio vs. text vs. video). +- **locale matching** - computing the locale fallback. +- **resource** - files bundled with an application that are loaded in by the executable code. UI strings, etc. and their locale-specific translations are typically stored as resources. +- **roundtrip** - the process of transforming a message into another format or representation, then transforming it back into the original format. +- **translation merging** - in l10n TMSes, the document-level interpolation of translated content. In other words, the replacing of translatable units in the source document with their equivalent translated units. See _interpolate_. +- **selector** - see _placeholder type_. +- **specification** - the rules we decide that describe what is passed to the API for message formatting (structure of data, syntax, etc.). +- **serialization** - how to convert in-memory representations of data to/from a file/stream. +- **syntax** - a general term to describe a set of rules that describe the set of allowed symbols and their ordering. Can apply to data, source code, communication protocols, etc. regardless of interface (stream, file). Sometimes used synonymously for _file format_. 
+- **variant** - one of the pre-defined values (cases) in a switch/case manner that a message can take depending on the value of some variable (switch), such as a polaceholder. ## Synonyms -* AST (Abstract Syntax Tree) - the tree structure created by a parser of an input stream/file according to a particular syntax/format, typically in refernece to source code (as opposed to data files). -* binding syntax - see *authoring format*. -* build/parse-time format - see *authoring format*. -* compound message - a message that can take on different pre-defined values (cases) in a switch/case manner depending on the value of some variable (switch), such as a placeholder. See *variant*. -* consumed format - see *API arugment format*. -* developer format - see *authoring format*. -* DOM overlay - a way to enable merging translated HTML attributes for an HTML tag that is inline with the translated text. See *interpolate* and *translation merging*. -* file format - a standard syntax, coupled with a semantics for interpretation, to describe the contents of a file. Most commonly used for files representing data (including documents) and executables. See *syntax*. -* filter - Okapi terminology for the serialization code for a file format between input documents and Okapi in-memory data structures. -* formatting locale - the locale provided by the locale fallback mechanism for formatting a message. -* fragment message - when a message is used to represent a portion of a larger message, often nested within the larger message. Fragment messages can occur in messages with multiple variants as a means to refactor and narrow the region covered by variant text. See *variant*. -* full message - when a message is not nested with other messages (ex: the message is just an interposing of strings and placeholders). See *fragment message*. -* language negotation - see *locale fallback*. -* locale chain - see *locale matching*. -* intermediate format - see *authoring format*. 
-* markup - a category of file formats for plain text data in which portions of text are annotated with metadata / attributes. Boundaries of annotated portions are marked inline using text 'tags' that that are distinguishable from the main text. See *syntax* and *file format*. -* message syntax - the syntax of the inputs to message formatting. If the structure significantly changes between authoring and the API calls at runtime, and the representation used for notation also must differ, we can split this into *API argument format* and *authoring format*. The term *syntax* here may also refer to structure of values represented by the syntax. -* multi-level filter - Okapi terminology for a filter that supports the proper extraction of text units when contents adhering to one file format are embedding within contents adhering to another file format. -* placeable - see *placeholder*. -* placeholder locale - if placeholders contain content that is computed based on the locale, and if that locale is allowed to differ from the rest of the text in the message, then this is the locale for just that placeholder. -* positional variable - when a function call takes a series of values without some way to name those values (ex: comma-separated list). Changing the order of the inputs to the function call would lead to different semantics. In contrast, maps and named paramters are ways to provide arguments to function calls that are order-independent. -* selector - see *placeholder type*. -* source code representation - see *authoring format* -* standard message format - see *message syntax*. -* translation/localization format - the interchange format used specifically for l10n use cases. See *interchange format*. -* resource locale - see *target locale*. -* runtime format - see *API argument format*. -* UI language - see *target locale*. -* variable - see *placeholder*. -* variable locale - see *placeholder locale*. 
- +- AST (Abstract Syntax Tree) - the tree structure created by a parser of an input stream/file according to a particular syntax/format, typically in reference to source code (as opposed to data files). +- binding syntax - see _authoring format_. +- build/parse-time format - see _authoring format_. +- compound message - a message that can take on different pre-defined values (cases) in a switch/case manner depending on the value of some variable (switch), such as a placeholder. See _variant_. +- consumed format - see _API argument format_. +- developer format - see _authoring format_. +- DOM overlay - a way to enable merging translated HTML attributes for an HTML tag that is inline with the translated text. See _interpolate_ and _translation merging_. +- file format - a standard syntax, coupled with a semantics for interpretation, to describe the contents of a file. Most commonly used for files representing data (including documents) and executables. See _syntax_. +- filter - Okapi terminology for the serialization code for a file format between input documents and Okapi in-memory data structures. +- formatting locale - the locale provided by the locale fallback mechanism for formatting a message. +- fragment message - when a message is used to represent a portion of a larger message, often nested within the larger message. Fragment messages can occur in messages with multiple variants as a means to refactor and narrow the region covered by variant text. See _variant_. +- full message - when a message is not nested with other messages (ex: the message is just an interposing of strings and placeholders). See _fragment message_. +- language negotiation - see _locale fallback_. +- locale chain - see _locale matching_. +- intermediate format - see _authoring format_. +- markup - a category of file formats for plain text data in which portions of text are annotated with metadata / attributes.
Boundaries of annotated portions are marked inline using text 'tags' that are distinguishable from the main text. See _syntax_ and _file format_. +- message syntax - the syntax of the inputs to message formatting. If the structure significantly changes between authoring and the API calls at runtime, and the representation used for notation also must differ, we can split this into _API argument format_ and _authoring format_. The term _syntax_ here may also refer to structure of values represented by the syntax. +- multi-level filter - Okapi terminology for a filter that supports the proper extraction of text units when contents adhering to one file format are embedded within contents adhering to another file format. +- placeable - see _placeholder_. +- placeholder locale - if placeholders contain content that is computed based on the locale, and if that locale is allowed to differ from the rest of the text in the message, then this is the locale for just that placeholder. +- positional variable - when a function call takes a series of values without some way to name those values (ex: comma-separated list). Changing the order of the inputs to the function call would lead to different semantics. In contrast, maps and named parameters are ways to provide arguments to function calls that are order-independent. +- selector - see _placeholder type_. +- source code representation - see _authoring format_. +- standard message format - see _message syntax_. +- translation/localization format - the interchange format used specifically for l10n use cases. See _interchange format_. +- resource locale - see _target locale_. +- runtime format - see _API argument format_. +- UI language - see _target locale_. +- variable - see _placeholder_. +- variable locale - see _placeholder locale_.
## Resources -* [Localization Essentials (Udacity)](https://www.udacity.com/course/localization-essentials--ud610) - free -* [Localization standards reader 4.0](https://magazine.multilingual.com/issue/jan-feb-2019dm/localization-standards-reader-4-0/) -* [Localization standards reader 4.0 - teaching copy](http://www.tara.tcd.ie/bitstream/handle/2262/90713/L10n%20Standards%20Reader%20v4.0.1.pdf?sequence=1&isAllowed=y) + +- [Localization Essentials (Udacity)](https://www.udacity.com/course/localization-essentials--ud610) - free +- [Localization standards reader 4.0](https://magazine.multilingual.com/issue/jan-feb-2019dm/localization-standards-reader-4-0/) +- [Localization standards reader 4.0 - teaching copy](http://www.tara.tcd.ie/bitstream/handle/2262/90713/L10n%20Standards%20Reader%20v4.0.1.pdf?sequence=1&isAllowed=y) diff --git a/docs/goals.md b/docs/goals.md index 7f76999f65..9ad6ab30b9 100644 --- a/docs/goals.md +++ b/docs/goals.md @@ -6,71 +6,69 @@ and informs the decisions about the scope and the priorities of its efforts. ## Goals The primary task of the MFWG is to develop an industry standard for the -representation of localizable dynamic message strings. A ***dynamic message -string*** is a string whose content changes due to the value of or insertion +representation of localizable dynamic message strings. A **_dynamic message +string_** is a string whose content changes due to the value of or insertion of some data value or values. The design goals are listed below. - 1. Express grammatical features, such as plurals, genders, and inflections. +1. Express grammatical features, such as plurals, genders, and inflections. - 2. Express other variance in translation, due to linguistic and regional +2. Express other variance in translation, due to linguistic and regional features, the presentation media, context, circumstance, and other factors. - 3. Express formattable data, such as numbers, dates, currencies, or units, +3. 
Express formattable data, such as numbers, dates, currencies, or units, in a locale-appropriate way. - 4. Represent structured data alongside translations, such as markup, comments, +4. Represent structured data alongside translations, such as markup, comments, and metadata. - 5. Be capable of localization roundtrip. +5. Be capable of localization roundtrip. - 6. Enable the creation of implementations, frameworks and tools building on +6. Enable the creation of implementations, frameworks and tools building on top of the standard, manifesting different ideas and programming paradigms, and optimized for different uses and audiences. - ## Deliverables - 1. A formal definition of the canonical data model for representing +1. A formal definition of the canonical data model for representing localizable _dynamic message strings_. - 2. A formal definition of the canonical syntax for representing the data +2. A formal definition of the canonical syntax for representing the data model, with well defined rules for handling text, special characters, escape sequences, whitespace, markup, as well as parsing errors. - 3. A specification for a one-to-one mapping between the data model and XLIFF. +3. A specification for a one-to-one mapping between the data model and XLIFF. _Note that this deliverable is "at risk" and not expected to be part of the 2023 fall release._ - 4. A specification for resolving messages at runtime, including +4. A specification for resolving messages at runtime, including interpolated data types and runtime errors. - 5. A conformance test suite for parsing and formatting messages sufficient to +5. A conformance test suite for parsing and formatting messages sufficient to ensure implementations can validate conformance to the specification(s) provided. - 6. A determination that there are at least two interoperable independent implementations - compliant with the conformance test suite in order to demonstrate that the +6. 
A determination that there are at least two interoperable independent implementations + compliant with the conformance test suite in order to demonstrate that the specification(s) are practical and meet requirements. - ## Non-Goals The following is a list of potential goals which are explicitly excluded from the scope of the MFWG. - 1. Design a _general interchange format_ for storing and transferring +1. Design a _general interchange format_ for storing and transferring translations. Instead, ensure compatibility with the existing interchange formats. - 2. Support _all grammatical features of all languages_. Instead, focus on +2. Support _all grammatical features of all languages_. Instead, focus on features most commonly encountered in user interfaces, textual, graphical and spoken ones alike. - 3. Create an _automated engine_ capable of transforming parts of speech in +3. Create an _automated engine_ capable of transforming parts of speech in a grammatically-correct fashion. Instead, allow interfacing with such automatic and non-automatic engines from within the data model. - 4. Build a _framework for localizing software_. Instead, design the standard +4. Build a _framework for localizing software_. Instead, design the standard as a building block to be used by third parties to create localization frameworks. diff --git a/docs/why_mf_next.md b/docs/why_mf_next.md index 544821be0f..9445cbb129 100644 --- a/docs/why_mf_next.md +++ b/docs/why_mf_next.md @@ -5,10 +5,11 @@ The `MessageFormat` API and syntax have been around for a long time. Intro -* `MessageFormat` is the Unicode API for software localization -* It is 20 years old, well designed, proven solution -* Its design is optimized for the software development model of 20y ago and its -shortcomings result in mixed reception and adoption by the industry. 
+ +- `MessageFormat` is the Unicode API for software localization +- It is 20 years old, well designed, proven solution +- Its design is optimized for the software development model of 20y ago and its + shortcomings result in mixed reception and adoption by the industry. The current wave of software development uses dynamic languages, modern UI frameworks and new forms of user interactions (voice, VR etc.). @@ -23,8 +24,8 @@ Other efforts: [Fluent](https://projectfluent.org/), ## Core problems with the current `MessageFormat` 1. The design is not modular enough - * Does not have any “extension points” - * Can't deprecate anything, even if now we know better + - Does not have any “extension points” + - Can't deprecate anything, even if now we know better 2. Some existing problems 3. Hard to map to the existing localization core structures 4. Designed to be API only, plain text, UI, “imperative style” @@ -65,21 +66,22 @@ date/time parameters were added). But the stability requirements prevent any major cleanup. ### 2. Some existing problems -* ICU added new formatters, but MessageFormat does not support them -* Combined selectors (select + plural) results in unreadable and error -prone nesting -* Select and plurals inside the message are difficult to translate because of -grammatical agreement requires words outside select / plural to change. -See https://en.wikipedia.org/wiki/Agreement_(linguistics) -* Patterns in the date / time / number placeholders are bad i18n, should use skeletons -* No official support for gender. It can be done with `select`, but it -is not the same thing (same as the difference between an `enum` and integer/strings). Developers can use masculine/feminine, masc/fem, male/female, etc. -* Formatting for “parameters” known at compile time -* Escaping with apostrophe is error prone. There is no reliable way to tell if -it has to be doubled or not. 
-* The # is used in plural format instead of {...}, but does not work for nesting unless the plural is the innermost selector. But named placeholders don't work -properly for plurals with offset. So there are 2 ways to do the same thing that work in 98% of cases, but in special situations only one of the ways works. -* Does not support inflections, and it would be hard to add without breaking existing tools. + +- ICU added new formatters, but MessageFormat does not support them +- Combined selectors (select + plural) results in unreadable and error + prone nesting +- Select and plurals inside the message are difficult to translate because of + grammatical agreement requires words outside select / plural to change. + See https://en.wikipedia.org/wiki/Agreement_(linguistics) +- Patterns in the date / time / number placeholders are bad i18n, should use skeletons +- No official support for gender. It can be done with `select`, but it + is not the same thing (same as the difference between an `enum` and integer/strings). Developers can use masculine/feminine, masc/fem, male/female, etc. +- Formatting for “parameters” known at compile time +- Escaping with apostrophe is error prone. There is no reliable way to tell if + it has to be doubled or not. +- The # is used in plural format instead of {...}, but does not work for nesting unless the plural is the innermost selector. But named placeholders don't work + properly for plurals with offset. So there are 2 ways to do the same thing that work in 98% of cases, but in special situations only one of the ways works. +- Does not support inflections, and it would be hard to add without breaking existing tools. ### 3. Hard to map to the existing localization core structures @@ -97,11 +99,12 @@ and return 4 message variants for Russian, for example. This is not a superficial problem. 
It affects most steps in the normal localization flow: -* leveraging (the same string “X files” must be translated -in 2/3 different ways) -* validation (placeholders, length, terminology, etc.) -* word count and payment -* alignment (the process of creating a TM from source + translated documents) + +- leveraging (the same string “X files” must be translated + in 2/3 different ways) +- validation (placeholders, length, terminology, etc.) +- word count and payment +- alignment (the process of creating a TM from source + translated documents) ### 4. Designed to be API only, plain text, UI, “imperative style” @@ -110,7 +113,7 @@ replace placeholders, and return the string result with placeholders replaced. \ An i18n-aware `printf`, basically. It does not play well with binding, formatting tags (think `html`), -or “document-like” content (for example templating systems like +or “document-like” content (for example templating systems like [freemarker](https://freemarker.apache.org/), [mustache](https://mustache.github.io/), even JSP, PHP, etc.) diff --git a/exploration/0000-design-proposal-template.md b/exploration/0000-design-proposal-template.md index 65b0ac1a38..ba6299e0c4 100644 --- a/exploration/0000-design-proposal-template.md +++ b/exploration/0000-design-proposal-template.md @@ -14,13 +14,13 @@ ## Objective -*What is this proposal trying to achieve?* +_What is this proposal trying to achieve?_ Decide how to interpolate data in patterns, in order to be able to display dynamic information like numbers, dates, and names inside translatable messages. ## Background -*What context is helpful to understand this proposal?* +_What context is helpful to understand this proposal?_ Translatable messages need to interpolate data... Such data must be formatted and positioned inside translation patterns... @@ -28,56 +28,56 @@ There's prior art in other translation and templating solutions... ## Use-Cases -*What use-cases do we see? 
Ideally, quote concrete examples.* +_What use-cases do we see? Ideally, quote concrete examples._ -* Numbers representing counts, e.g. `{You have {$count} new messages}`... -* Strings representing usernames, e.g. `{Hello, {$userName}!}`... -* Selection based on numbers, e.g. `match {$count} when 1 {One thing} when * {Many things}`... +- Numbers representing counts, e.g. `{You have {$count} new messages}`... +- Strings representing usernames, e.g. `{Hello, {$userName}!}`... +- Selection based on numbers, e.g. `match {$count} when 1 {One thing} when * {Many things}`... ## Requirements -*What properties does the solution have to manifest to enable the use-cases above?* +_What properties does the solution have to manifest to enable the use-cases above?_ -* Be able to use a variable more than once in a pattern... -* Be able to use a variable in a selector... -* Be able to reorder a variable... -* Be able to tell what a variable refers to... -* Make migration from ICU MF1 possible... +- Be able to use a variable more than once in a pattern... +- Be able to use a variable in a selector... +- Be able to reorder a variable... +- Be able to tell what a variable refers to... +- Make migration from ICU MF1 possible... ## Constraints -*What prior decisions and existing conditions limit the possible design?* +_What prior decisions and existing conditions limit the possible design?_ -* A syntactical prefix must not collide with `nmtoken`, to avoid parsing ambiguities with unquoted literals... +- A syntactical prefix must not collide with `nmtoken`, to avoid parsing ambiguities with unquoted literals... ## Proposed Design -*Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange.* +_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ ... 
## Alternatives Considered -*What other solutions are available?* -*How do they compare against the requirements?* -*What other properties they have?* +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ ### Use unnamed placeholders... For example: `{Hello, {$}!}`... -* **Use more than once?** No. -* **Use in selectors?** No. -* **Reorder?** No. -* **Clear what the variable refers to?** No. -* **Migration from MF1 possible?** Yes. +- **Use more than once?** No. +- **Use in selectors?** No. +- **Reorder?** No. +- **Clear what the variable refers to?** No. +- **Migration from MF1 possible?** Yes. ### Use indexed placeholders... For example: `{Hello, {$1}!}`... -* **Use more than once?** Yes. -* **Use in selectors?** Yes. -* **Reorder?** Yes. -* **Clear what the variable refers to?** No. -* **Migration from MF1 possible?** Yes. +- **Use more than once?** Yes. +- **Use in selectors?** Yes. +- **Reorder?** Yes. +- **Clear what the variable refers to?** No. +- **Migration from MF1 possible?** Yes. diff --git a/exploration/selection-matching-options.md b/exploration/selection-matching-options.md index 7b3b9269ea..b632e44acd 100644 --- a/exploration/selection-matching-options.md +++ b/exploration/selection-matching-options.md @@ -4,15 +4,15 @@ We are discussing whether to change First-Match to another value. 
Currently voting looks like: -| Person | Supports | -|---|---| -| APP | C-F, then B-M, no F-M | -| STA | F-M, then B-M, the C-F with optional \* | -| SCL | F-M, then C-F, then optional \* | -| RGN | C-F optional \*, F-M, CF required \* | -| MIH | B-M, C-F lexical sort, C-F required \* | -| EAO | C-F optional \*, F-M, C-F required \* | -| ECH | C-F, B-M (distant 2nd), F-M \* behind that | +| Person | Supports | +| ------ | ------------------------------------------ | +| APP | C-F, then B-M, no F-M | +| STA | F-M, then B-M, the C-F with optional \* | +| SCL | F-M, then C-F, then optional \* | +| RGN | C-F optional \*, F-M, CF required \* | +| MIH | B-M, C-F lexical sort, C-F required \* | +| EAO | C-F optional \*, F-M, C-F required \* | +| ECH | C-F, B-M (distant 2nd), F-M \* behind that | ## Background @@ -22,18 +22,17 @@ I would like to reopen this discussion because I believe that using first-match ## Comparison -| Criterion | First-Match | Scored Best-Match | Column-First Best-Match | Column-First req `*` | Notes | -|---|---|---|---|---|---| -| MF1 Compat | ? | ? | - | + | some say F-M not compat. | -| Devlopers/Translators can control | +++ | - | - | - | D/Ts have the ability to influence or override selection | -| Developers/Translator must control | - | +++ | ++ | +++ | D/Ts are required to manage selection order | -| Visual Inspection | +++ | + | + | + | It is possible to order any matrix canonically, enabling visual inspection | -| Complex matching (varies by locale) | - | +++ | +++ | +++ | Matrix explosion may conflict with manual ordering in FM | -| Complex matching (multi-value) | - | +++ | ++ | ++ | F-M stops on first match; B-M gives developer full control of matching | -| Translation tool variant order | - | +++ | +++ | +++ | Translations tools are required to maintain the order and/or provide for reordering that itself is remembers (eg. 
in the TM) | -| Partial leverage on added keys | - | +++ | +++ | +++ | Changes or additions to matrix only affect some entries | -| Programmable selection order | + | + | + | + | Selector authors can provide options for tailoring matches | - +| Criterion | First-Match | Scored Best-Match | Column-First Best-Match | Column-First req `*` | Notes | +| ----------------------------------- | ----------- | ----------------- | ----------------------- | -------------------- | ---------------------------------------------------------------------------------------------------------------------------- | +| MF1 Compat | ? | ? | - | + | some say F-M not compat. | +| Devlopers/Translators can control | +++ | - | - | - | D/Ts have the ability to influence or override selection | +| Developers/Translator must control | - | +++ | ++ | +++ | D/Ts are required to manage selection order | +| Visual Inspection | +++ | + | + | + | It is possible to order any matrix canonically, enabling visual inspection | +| Complex matching (varies by locale) | - | +++ | +++ | +++ | Matrix explosion may conflict with manual ordering in FM | +| Complex matching (multi-value) | - | +++ | ++ | ++ | F-M stops on first match; B-M gives developer full control of matching | +| Translation tool variant order | - | +++ | +++ | +++ | Translations tools are required to maintain the order and/or provide for reordering that itself is remembers (eg. in the TM) | +| Partial leverage on added keys | - | +++ | +++ | +++ | Changes or additions to matrix only affect some entries | +| Programmable selection order | + | + | + | + | Selector authors can provide options for tailoring matches | ### Example @@ -63,12 +62,14 @@ First-Match selection evaluates the list of _keys_ row-by-row and selects the fi In the example message, the _variants_ are in a canonical order, so first-match produces the same order as best-match does. **Pros** -+ Allows developers to control the order of selection. 
-+ Allows translators to tailor the order of selection. -+ Can visually inspect match order. -+ May be more efficient when performing match (??) + +- Allows developers to control the order of selection. +- Allows translators to tailor the order of selection. +- Can visually inspect match order. +- May be more efficient when performing match (??) **Cons** + - Requires developers to specify _variants_ in the correct order. - Requires translators to tailor the order of _variants_ if this is different from the source. - Requires all translation tooling and runtime processing to preserve the order of the _variants_ @@ -83,12 +84,14 @@ Best-Match selection evaluates the full list of _keys_ and selects the _variant_ 1. Column-First without required default `*` **Pros** -+ Variants can be written in any order and produce a consistent result. -+ Selector developers can write complex matches that produce different quality matches for the same value. For example, `{|1| :plural}` matches both the variant `1` and the variant `one`, but prefers the value `1`.
The plural _selector_ does not need to communicate with the other _selectors_ in order to arrive at the best matching pattern. +- Translators do not need to worry about the order of variants or need to reorder variants (which can be difficult to do when only the translation segment for the pattern is shown or when only a changed or generated _variant_ is exposed to translation. +- Translation tools do not have to preserve the order of _variants_ and are free to send only the translatable segment (the pattern) for translation. **Cons** + - Developers cannot override the order that the _selector_ provides unless this is exposed as a feature of the given _selector_. - More complex matching implementation; may be slower? @@ -98,9 +101,11 @@ Sorted Matching evaluates the full list of _keys_ by sorting the matrix. Each _s **Pros** (All of the pros of best match plus:) -+ Allows for better selection in some corner cases. + +- Allows for better selection in some corner cases. **Cons** + - Complex to evaluate visually - More complex to implement @@ -170,7 +175,6 @@ The second _selector_ matches the explicit value `1`, which is prefers to the ke The final _selector_ matches the default value `*` but not the keyword `one`, thus producing this matrix: - ``` * 1 * <-- winner * one * @@ -181,19 +185,19 @@ The final _selector_ matches the default value `*` but not the keyword `one`, th This is a "sorted" best match algorithm that works as follows: each _selector_ provides a "comparator" for values in its column (such as computing a weight for the value in its column). Rows that contain a non-matching value for any selector are eliminated as potential matches. The default value `*` always matches. Ordering is maintained for preceding columns, that is, _selector_ number 2 can only reorder items whose _selector_ number 1 key match. The highest ranking _key_ is returned as the _pattern_. Ties are broken by column. If no matching row is found, returns an error. 
- **Pros** -+ Variants can be written in any order and produce a consistent result. -+ Selector developers can write complex matches that produce different quality matches for the same value. For example, `{|1| :plural}` matches both the variant `1` and the variant `one`, but prefers the value `1`. The plural _selector_ does not need to communicate with the other _selectors_ in order to arrive at the best matching pattern. -+ Translators do not need to worry about the order of variants or need to reorder variants (which can be difficult to do when only the translation segment for the pattern is shown or when only a changed or generated _variant_ is exposed to translation. -+ Translation tools do not have to preserve the order of _variants_ and are free to send only the translatable segment (the pattern) for translation. -+ Easier to evaluate visually than sorting strategies. + +- Variants can be written in any order and produce a consistent result. +- Selector developers can write complex matches that produce different quality matches for the same value. For example, `{|1| :plural}` matches both the variant `1` and the variant `one`, but prefers the value `1`. The plural _selector_ does not need to communicate with the other _selectors_ in order to arrive at the best matching pattern. +- Translators do not need to worry about the order of variants or need to reorder variants (which can be difficult to do when only the translation segment for the pattern is shown or when only a changed or generated _variant_ is exposed to translation. +- Translation tools do not have to preserve the order of _variants_ and are free to send only the translatable segment (the pattern) for translation. +- Easier to evaluate visually than sorting strategies. **Cons** + - Developers cannot override the order that the _selector_ provides unless this is exposed as a feature of the given _selector_. 
- Can require more processing than First-Match - ### Column-First with Optional `*` This is a "sorted" best match algorithm. Unlike other options, this algorithm does not require that the key set contain a default (\*\) message value for a given column combination, including, presumably, the default message with all `*` values. Matching proceeds exactly like column-first-with-required-\*\. If no matching row is found, returns an error. @@ -208,11 +212,13 @@ when false * {... treating "false" as "other"?...} Here a value like `true`/`other` falls through and produces an error because `false` does not match like plural's `other`. **Pros** -+ (all of the "pros" for "with `*`") -+ Avoids having extra rows in cases where the default value and `*` might be distinct, for example `other` vs. `*` in plural. -+ Allows for _selectors_ whose default value varies by locale but for which the set of matching keys remains a closed set, for example, if the default gender were different by locale or if there were something like a default grammatical case. + +- (all of the "pros" for "with `*`") +- Avoids having extra rows in cases where the default value and `*` might be distinct, for example `other` vs. `*` in plural. +- Allows for _selectors_ whose default value varies by locale but for which the set of matching keys remains a closed set, for example, if the default gender were different by locale or if there were something like a default grammatical case. **Cons** + - Easier to produce a non-functional message that returns only an error for some set of values. - Loses the ability to validate message completeness. @@ -222,15 +228,15 @@ Computes row order in the `match` statement rather than in each _selector_. Each Using the example at top, let's consider some values: -| selector | count | size | cost | score | winner? 
| notes | -|---|---|---|---|---|---|---| -| `when 0 * *` | 0 | any | any | 1.0 + 0.1 + 0.1 = 1.2 | Y | 0 is perfect match | -| `when one 0 *` | 0 | 0 | 0 | 0.0 + 1.0 + 0.1 = 0.0 (!!) | N | no-match on first item ends processing | -| `when one 0 *` | 1 | 0 | 0 | 0.5 + 1.0 + 0.1 = 1.6 | Y | keyword match on one | -| `when one one *` | 1 | 0 | 0 | 0.5 + **0** = 0 | N | no-match on second item ends processing | -| `when one one *` | 1 | 1 | 0 | 0.5 + 0.5 + 0.1 = 1.1 | Y | keyword match on one | -| `when * * *` | 1 | 1 | 0 | 0.1 + 0.1 + 0.1 = 0.3 | N | `*` here is default | -| `when * * *` | 11 | 11 | 42.0 | 0.5 + 0.5 + 0.5 = 1.5 | Y | `*` here is like `other` | +| selector | count | size | cost | score | winner? | notes | +| ---------------- | ----- | ---- | ---- | -------------------------- | ------- | --------------------------------------- | +| `when 0 * *` | 0 | any | any | 1.0 + 0.1 + 0.1 = 1.2 | Y | 0 is perfect match | +| `when one 0 *` | 0 | 0 | 0 | 0.0 + 1.0 + 0.1 = 0.0 (!!) | N | no-match on first item ends processing | +| `when one 0 *` | 1 | 0 | 0 | 0.5 + 1.0 + 0.1 = 1.6 | Y | keyword match on one | +| `when one one *` | 1 | 0 | 0 | 0.5 + **0** = 0 | N | no-match on second item ends processing | +| `when one one *` | 1 | 1 | 0 | 0.5 + 0.5 + 0.1 = 1.1 | Y | keyword match on one | +| `when * * *` | 1 | 1 | 0 | 0.1 + 0.1 + 0.1 = 0.3 | N | `*` here is default | +| `when * * *` | 11 | 11 | 42.0 | 0.5 + 0.5 + 0.5 = 1.5 | Y | `*` here is like `other` | ## FAQ @@ -325,7 +331,7 @@ MF2 is fundamentally different in that **all** _selectors_ must be evaluated to #### What is "complex matching"? How does plural exemplify it? -Complex matching is when a _selector_ can match multiple different _variants_ to a single value. +Complex matching is when a _selector_ can match multiple different _variants_ to a single value. Many types of _selector_ do equality matching. For example, `SelectFormat` is generally matching a variable's value against a static literal. 
@@ -350,16 +356,16 @@ match {$count :plural numDigits=2} // produces localized equiv of 2.00 By letting the _selector_ decide how to process the input and range of _variants_, we can allow for complex matching without burdening our specification with a lot of details. - As an aside, how does the above express the `when` clause for the value `2.00`? It can't use a literal (what if the decimal separator were `,`!!) and the _nmtoken_ `2.00` could be complicated to handle? Also, does `2` match? #### How does this compare to programming language constructs (such as switch)? It's difficult to say if the MF2 `match` statement should work like familiar selection methods in programming languages. Internationalization APIs, such as resource managers, MF1, and date/number skeletons have tended towards "do what I want", hiding the need for both developers and translators to know about cultural and linguistic variation and account for it in code. Modern I18N APIs hide most of this complexity. Some of the analogous cases in I18N APIs are: -* **Resource fallback**, particularly with sparsely populated localized resource files -* **Skeletons** such as for dates (for example, `yyyyMMddjm`), which do not require translators to touch "picture strings" (such as `MM/dd/yyyy HH:mm a`) to handle the time or date separators `/` and `:`, the use of 24-hour time vs. 12-hour time, the order of the fields. -* **Built-in formats** such as `short`/`medium`/`long`/`full` do not guarantee any particular separators, field order, or format and vary widely between locales. -* **Locale negotiation** matches the best particular locale to a requested language range. + +- **Resource fallback**, particularly with sparsely populated localized resource files +- **Skeletons** such as for dates (for example, `yyyyMMddjm`), which do not require translators to touch "picture strings" (such as `MM/dd/yyyy HH:mm a`) to handle the time or date separators `/` and `:`, the use of 24-hour time vs.
12-hour time, the order of the fields. +- **Built-in formats** such as `short`/`medium`/`long`/`full` do not guarantee any particular separators, field order, or format and vary widely between locales. +- **Locale negotiation** matches the best particular locale to a requested language range. When coding a plural using ICU4J's `PluralFormat`, the developer only needs to worry about _specific value_ messages (`when 1 {This is your last chance.}`) vs. value based messages (`when one {You have {$count} chance remaining.}`). @@ -384,7 +390,9 @@ when 1 {This is your last chance} when one {You have {$count} chance remaining} when * {...} ``` + vs. + ``` match {$count} when one {You have {$count} chance remaining} @@ -394,7 +402,6 @@ when * {...} This exposes developers and translators to managing the complexity versus having the API take care of it. - #### Are there other complex matching cases? Or is `plural` everything? Currently there are no other complex rule-based selectors in ICU. However, there are a number of cases where complex matching might come into play. The criteria for it being a complex match are: @@ -408,24 +415,25 @@ The key thing here is that the static text produced by the translator needs to r Some potential examples of (1) (and this is "thinking out loud"): 1. **Date/time based selection.** Date/time types, including the newer Temporal types, can present complex matching needs. While _incremental time_ values (such as `java.time.Instant`, `java.util.Date`, or JavaScript's `Date`) can resolve every field and be cast to any time zone, Other types, such as `java.time.ZonedDate`, are incomplete. There are different calendars that can affect presentation and selection as well. Some cases for complex time selection include: - * **Relative time formats.** The values available (such as `yesterday`, `tomorrow`, `day after tomorrow`) vary by locale. 
Here's one example in the [German CLDR charts](https://unicode-org.github.io/cldr-staging/charts/latest/summary/de.html#1d45310cbcf1b2e5) - * **Periodic time formats.** Recurring values might require message selection. -1. **Gender or part-of-speech selection.** Grammatical gender is strongly linked to language and varies by language--very much like plurals. These types of selection might not have the multiple selection quirks of plurals, but will have varying shape by locale. + - **Relative time formats.** The values available (such as `yesterday`, `tomorrow`, `day after tomorrow`) vary by locale. Here's one example in the [German CLDR charts](https://unicode-org.github.io/cldr-staging/charts/latest/summary/de.html#1d45310cbcf1b2e5) + - **Periodic time formats.** Recurring values might require message selection. + +1. **Gender or part-of-speech selection.** Grammatical gender is strongly linked to language and varies by language--very much like plurals. These types of selection might not have the multiple selection quirks of plurals, but will have varying shape by locale. + + For example, I built a "product name format" function into Amazon's devices. Each product knew (in each supported locale) its generic, short, medium, long, and full name, and each product's name could vary in gender/count/etc. per language. That is, the generic might be a "tablet" or "TV" (or whatever) and then the e.g. tablet might be called a "Fire", a "Fire 8 HDX", etc. - For example, I built a "product name format" function into Amazon's devices. Each product knew (in each supported locale) its generic, short, medium, long, and full name, and each product's name could vary in gender/count/etc. per language. That is, the generic might be a "tablet" or "TV" (or whatever) and then the e.g. tablet might be called a "Fire", a "Fire 8 HDX", etc. 
- The software doesn't know which device it'll be built into (actually, it's built into all of them), so the formatter needs to select the correct pattern string according to device it is in at runtime. Rather than build separate strings for every device, we generated variations based on the (smaller) set of grammar variations per-locale. A simple message like `The {whatever} is ready` in English might look sort of like the following (in our syntax) in a French locale (and I'm omitting for clarity such things as enclitic handling, e.g. when it's `l'ordinateur` not `le ordinateur`): ``` match {$product :gender} when masculine {Le {:product format=generic} est prêt.} ; le téléphone est prêt when feminine {La {:product format=generic} est prête.} ; la télévision est prête - when * {L'appareil est prêt.} + when * {L'appareil est prêt.} ``` Notice that it isn't just the article that changes. And notice that the list of enumerated values changes by language (so German has three noun genders while English mostly has one) - + Some potential examples of (2): 1. **Application specific selection.** Developers may need to write selectors with varying degrees of selection. 
For example, one might have a message that varies by category and then, for specific items, by sub-category: @@ -480,12 +488,16 @@ Using the values `11`, `11`, `1` goes like this: * * one * * * ``` + => + ``` * * one * * * ``` + => + ``` * * one <-- winner ``` @@ -517,7 +529,9 @@ with values `2`/`1`/`11` goes like this: ``` 2 0 0 ``` + => + ``` (empty set) <-- 1 != 0 ``` @@ -532,16 +546,19 @@ If the `*` value is considered a match (not filtered) this isn't a problem: * 1 * * * * ``` + => + ``` * 1 * * one one * one * * * one ``` + => + ``` * 1 * <-- winner * one * ``` - diff --git a/exploration/variants.md b/exploration/variants.md index 06de0d98d1..c8d9922675 100644 --- a/exploration/variants.md +++ b/exploration/variants.md @@ -67,30 +67,28 @@ _item_ is skipped for brevity but can be assumed to be the subject. Anne i John opublikowali post w grupie Birthday Party Anne i Mary opublikowały post w grupie Birthday Party John i Mark opublikowali post w grupie Birthday Party - - ## Vocative form - - ### English + +## Vocative form + +### English Hello [user], ---> wrong Czech: Pavel Hello [first_name], ---> wrong Czech: Petra Hello [full_name], ---> wrong Czech: David Filip - ### wrong Czech Ahoj [user], ---> wrong Czech: Pavel Ahoj [first_name], ---> wrong Czech: Petra Ahoj [full_name], ---> wrong Czech: David Filip - ### Czech (with canDelete="yes" on the placeholders) Dobrý den, Dobrý den, Dobrý den, - -### Czech with a vocative aware formatter + +### Czech with a vocative aware formatter Ahoj [user-vocative], ---> correct Czech: Pavle Ahoj [first_name-vocative], ---> correct Czech: Petro diff --git a/meetings/2019/notes-2019-11-25.md b/meetings/2019/notes-2019-11-25.md index a154c2d1db..2a9f2f88b3 100644 --- a/meetings/2019/notes-2019-11-25.md +++ b/meetings/2019/notes-2019-11-25.md @@ -1,4 +1,5 @@ ##### November 25 Attendees: + - Romulo Cintra - CaixaBank (RCA) - Addison Phillips - Amazon.com (APS) - Zibi Braniecki - Mozilla (ZB) @@ -12,7 +13,7 @@ - Long Ho - 
Dropbox (LHO) - Richard Gibson - OpenJSF and Oracle (RGN) -## MessageFormat Working Group Contacts : +## MessageFormat Working Group Contacts : - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -33,55 +34,55 @@ APS: A problem I see with the select is that it is hard on the translators. RCA: The issue here is the syntax to make it more flexible. -AP: "Flexible" isn't the word I would choose, but easier to author and translate would be good. We maintain a system where we split the message apart and hide the curly syntax mess for translators. +AP: "Flexible" isn't the word I would choose, but easier to author and translate would be good. We maintain a system where we split the message apart and hide the curly syntax mess for translators. MIH: Readability for translators and programmers is very different. -ZB: The interesting question about "a new file format" is a little more complicated. We may need a different format for a different audience. Every row below the first row requires a new file format. But not every row needs a new file format to replace MessageFormat. +ZB: The interesting question about "a new file format" is a little more complicated. We may need a different format for a different audience. Every row below the first row requires a new file format. But not every row needs a new file format to replace MessageFormat. -APS: What does "file format" mean to you? To a certain extent it means I can put messages in a JSON file, or a new syntax. +APS: What does "file format" mean to you? To a certain extent it means I can put messages in a JSON file, or a new syntax. -RCA: What I meant with "file format" was a new file extension. Fluent has one, for example. I was asking if people want to write messages using a different extension. +RCA: What I meant with "file format" was a new file extension. Fluent has one, for example. I was asking if people want to write messages using a different extension. 
APS: So we're talking about a resource file format, which may… I mean that's potentially a superset of just message formatting. -ZB: It's hard to think about them completely separately. It was important that we think about the container for error handling. +ZB: It's hard to think about them completely separately. It was important that we think about the container for error handling. -APS: If we're doing this for ECMAScript, what is the authoring experience of that? The question is how do people write resources, not just message formatting ones, but they don't like to write message formats differently than other resources. +APS: If we're doing this for ECMAScript, what is the authoring experience of that? The question is how do people write resources, not just message formatting ones, but they don't like to write message formats differently than other resources. ### Requirements slide: Pluggable formatters -MIH: I have an example implementation. It was easy to write. In this area, MessageFormat and Fluent can do largely the same thing. Mark Davis in the Unicode Conference proposed an interesting idea. Come up with a data model and figure out what Fluent can represent and what MessageFormat can represent. +MIH: I have an example implementation. It was easy to write. In this area, MessageFormat and Fluent can do largely the same thing. Mark Davis in the Unicode Conference proposed an interesting idea. Come up with a data model and figure out what Fluent can represent and what MessageFormat can represent. ZB: I don't expect any major differences. -APS: I agree. It's straightforward; there's nothing mystical about doing this. It's just a question about formalizing this. If you did this at the ICU level, you'd have to deal with conflict resolution. +APS: I agree. It's straightforward; there's nothing mystical about doing this. It's just a question about formalizing this. If you did this at the ICU level, you'd have to deal with conflict resolution. 
STA: When you say pluggable, do you mean by platform or by programmers? MIH: By programmers. -STA: I would like to make sure this plugs into the "metadata" point below. We've found in Mozilla that it might require a way to communicate to translators what these messages mean and what they do. +STA: I would like to make sure this plugs into the "metadata" point below. We've found in Mozilla that it might require a way to communicate to translators what these messages mean and what they do. -MIH: Yeah. The more expressive we are to programmers, the messier it is to translators. If you control your full stack, like Mozilla, that's fine, but for the public audience, they don't have full stack control of the translators. +MIH: Yeah. The more expressive we are to programmers, the messier it is to translators. If you control your full stack, like Mozilla, that's fine, but for the public audience, they don't have full stack control of the translators. ### Requirements slide: HTML markup -ZB: A lot of mistakes programmers make in DIY localization is thinking about the order of arguments in the string. You might have images, etc. In Fluent we developed a service system of overlaying and re-translation not breaking things. W3C has standardized arguments, etc. +ZB: A lot of mistakes programmers make in DIY localization is thinking about the order of arguments in the string. You might have images, etc. In Fluent we developed a service system of overlaying and re-translation not breaking things. W3C has standardized arguments, etc. -MIH: I would recommend looking at the XLIF format. They have open, close, and standalone placeholders; placeholders have flags like it's OK to overlap, clone, etc. Those are the kinds of concepts that I think would help for tag. +MIH: I would recommend looking at the XLIF format. They have open, close, and standalone placeholders; placeholders have flags like it's OK to overlap, clone, etc. 
Those are the kinds of concepts that I think would help for tag. -STA: There are two key use cases we've identified. First, when we use HTML, we want to interpolate that into a full sentence. Second, when markup is part of the localization, like italics. These are different because they require different runtime semantics. I think it needs different allowances for syntax. For example, it would be nice to use angle brackets in translations. +STA: There are two key use cases we've identified. First, when we use HTML, we want to interpolate that into a full sentence. Second, when markup is part of the localization, like italics. These are different because they require different runtime semantics. I think it needs different allowances for syntax. For example, it would be nice to use angle brackets in translations. -JWN: I wanted to specifically mention that FBT looks like you're creating HTML markup, but there's a transpiler happening behind the scenes before it makes it to the user. The way we've been making with bolds and spans is we've been using auto-interpolation that the transpiler abstracts away from you so that the programmer doesn't have to think about it. For simple cases it works just fine. In general I would like to see the API facing the engineer, abstracting it away from them. It would be nice if they could write markup and have something in-between what they see and the translator sees. We should specify what we mean when we say "markup". +JWN: I wanted to specifically mention that FBT looks like you're creating HTML markup, but there's a transpiler happening behind the scenes before it makes it to the user. The way we've been making with bolds and spans is we've been using auto-interpolation that the transpiler abstracts away from you so that the programmer doesn't have to think about it. For simple cases it works just fine. In general I would like to see the API facing the engineer, abstracting it away from them. 
It would be nice if they could write markup and have something in-between what they see and the translator sees. We should specify what we mean when we say "markup". ### Requirements slide: Cross-platform -APS: I agree. Something portable that works in Java, C for native platforms, etc., is really important. +APS: I agree. Something portable that works in Java, C for native platforms, etc., is really important. MMK: +1 -APS: I'd like to add that if we have a common format, translators and so forth will get used to it, which is beneficial, because then the industry can target something. We have a lot of experience with translators not being familiar with Java/ICU MessageFormat, even at this late date. +APS: I'd like to add that if we have a common format, translators and so forth will get used to it, which is beneficial, because then the industry can target something. We have a lot of experience with translators not being familiar with Java/ICU MessageFormat, even at this late date. (RCA continues presentation) @@ -89,23 +90,23 @@ APS: I'd like to add that if we have a common format, translators and so forth w APS: Managing locale fallback hierarchy… -ZB: There are 2 levels. There could be a regional difference (partial microtranslations), so es-CL translates a few strings differently than es, and then generate a selector that generates the formality level. So a different product could use a formal language, or informal language, for the same message. So those two levels of variant selectors, by locale and by formality… +ZB: There are 2 levels. There could be a regional difference (partial microtranslations), so es-CL translates a few strings differently than es, and then generate a selector that generates the formality level. So a different product could use a formal language, or informal language, for the same message. So those two levels of variant selectors, by locale and by formality… APS: So, within a locale, having different modalities. 
-MIH: The biggest challenge in localization tools and translation memories, etc., is handling end-to-end mapping. So in English, if you have a message with one level of formality, and you want that translated to 3 different messages of formality in Japanese, you break a lot of different tools. Plural is a similar concept. +MIH: The biggest challenge in localization tools and translation memories, etc., is handling end-to-end mapping. So in English, if you have a message with one level of formality, and you want that translated to 3 different messages of formality in Japanese, you break a lot of different tools. Plural is a similar concept. ### End or requirements slide RCA: Any other thoughts before the next slide? -MIH: I think we should decouple the message syntax from the file format. I can store it in JSON, database, etc., and I can do a lot of things with it if I'm not tied to a file format: serialization versus runtime. +MIH: I think we should decouple the message syntax from the file format. I can store it in JSON, database, etc., and I can do a lot of things with it if I'm not tied to a file format: serialization versus runtime. APS: So we're trying to figure out scope. RX: Fallback is something I would like to include as a key requirement. -APS: I think that goes to what MIH said. If what we're defining is the MessageFormat syntax, we are talking about how you write your messages in your program. But if we're designing a file format, we have to figure out the things that go with that syntax. I won't put words in people's mouth, but Fluent has found one set of restrictions you can do if you combine the two. I think there is room for more design if you can bring different file formats to the floor. +APS: I think that goes to what MIH said. If what we're defining is the MessageFormat syntax, we are talking about how you write your messages in your program. But if we're designing a file format, we have to figure out the things that go with that syntax. 
I won't put words in people's mouth, but Fluent has found one set of restrictions you can do if you combine the two. I think there is room for more design if you can bring different file formats to the floor. ## Working Group Organization @@ -127,13 +128,13 @@ APS: I find minutes are more useful. SFC: I'm happy to be the scribe; having a second scribe would be good, too. -RCA: About the backlog. How should we handle this collaboration? +RCA: About the backlog. How should we handle this collaboration? SFC: We should make a repo for discussions. RCA: How about subtasks? -SFC: We're already a subgroup. I don't think we should extend the hierarchy unless we find that we need it. +SFC: We're already a subgroup. I don't think we should extend the hierarchy unless we find that we need it. ## Roadmap @@ -149,33 +150,33 @@ STA: The key decision to debate first would be single file format versus decoupl RCA: I will create issues on the repository for this backlog. -MIH: I wanted to propose that if it's possible to share documents that you accumulated during the 1:1 phase of the project? You mentioned already prototypes created by different people. Sharing what you have already. +MIH: I wanted to propose that if it's possible to share documents that you accumulated during the 1:1 phase of the project? You mentioned already prototypes created by different people. Sharing what you have already. RCA: Yeah, I'll try to get permission from everyone before sharing the docs. -SFC: We should set up a GSuite folder to share docs. It could be in either chromium.org or unicode.org. +SFC: We should set up a GSuite folder to share docs. It could be in either chromium.org or unicode.org. APS: I think the key decision, like STA said, is whether we're building a file format or an API. SFC: It might be good to have a proposal that we can mutate. -MIH: The elephant in the room is, is this Fluent 2.0 or MessageFormat 2.0? 
Until we decide if we go this or that way, it would be difficult to come up with a decent strawman. I think we all agree, structurally, they are very similar, but the syntax is pretty different. +MIH: The elephant in the room is, is this Fluent 2.0 or MessageFormat 2.0? Until we decide if we go this or that way, it would be difficult to come up with a decent strawman. I think we all agree, structurally, they are very similar, but the syntax is pretty different. -STA: I think it's important that we talk about this openly. I'm thankful for everyone who learned more about fluent. I think this working group has the opportunity to select scope with precision and start with something smaller. I think we're excited about everyone's participating, and like we shouldn't just do Fluent, because it does many other things as well. Syntax has a tendency to be an inflammatory topic sometimes. I took part in coming up with Fluent’s syntax, and it's been a draining process. It's amazing that this is taking place and it would be a pity to start by talking about syntax. We should think about whether to do file format, as well as use cases for the API. I think this is an opportunity to not repeat the mistakes that we might have made. I come here as someone who wants to build MessageFormat 2.0. +STA: I think it's important that we talk about this openly. I'm thankful for everyone who learned more about fluent. I think this working group has the opportunity to select scope with precision and start with something smaller. I think we're excited about everyone's participating, and like we shouldn't just do Fluent, because it does many other things as well. Syntax has a tendency to be an inflammatory topic sometimes. I took part in coming up with Fluent’s syntax, and it's been a draining process. It's amazing that this is taking place and it would be a pity to start by talking about syntax. We should think about whether to do file format, as well as use cases for the API. 
I think this is an opportunity to not repeat the mistakes that we might have made. I come here as someone who wants to build MessageFormat 2.0. SFC: We should start with an MVP, and you're right, we should know what we're trying to build first. -ZB: It's hard to think about JavaScript localization without thinking about HTML. It's about the web stack. How do you localize a paragraph or a menu item? In JavaScript, you have a function that you call, and inject it into HTML. In HTML, you have an identifier that's associated with localization resources. It's useful to say if we're trying to design a localization system for the web, or exclusively for JavaScript, and expect that the W3C will do something different for HTML. +ZB: It's hard to think about JavaScript localization without thinking about HTML. It's about the web stack. How do you localize a paragraph or a menu item? In JavaScript, you have a function that you call, and inject it into HTML. In HTML, you have an identifier that's associated with localization resources. It's useful to say if we're trying to design a localization system for the web, or exclusively for JavaScript, and expect that the W3C will do something different for HTML. ECH: It would be useful for us to know the technical issues that Fluent had to solve. SFC: Would be useful for ZB to share slides about Fluent, and also fo rhte champions of the other existing frameworks to share slides, so that everyone can get up to speed on these various different options. -SFC: We should think about decouping the data representation from the web-specific use cases. We want to eventually build something that can also be used in Android, etc. +SFC: We should think about decouping the data representation from the web-specific use cases. We want to eventually build something that can also be used in Android, etc. -APS: If JavaScript could just render ICU MessageFormat, that would solve a lot of problems. 
The ability to build messages and format them, and get things out of CLDR data that we can, that's important to me, and then adding on Amazon-special sauce is good. +APS: If JavaScript could just render ICU MessageFormat, that would solve a lot of problems. The ability to build messages and format them, and get things out of CLDR data that we can, that's important to me, and then adding on Amazon-special sauce is good. -LHO: Having more mechanisms to extract CLDR data out of the browser is more important. The parser for MessageFormat in react-intl is just a few kilobytes, tiny compared to the data that it needs. +LHO: Having more mechanisms to extract CLDR data out of the browser is more important. The parser for MessageFormat in react-intl is just a few kilobytes, tiny compared to the data that it needs. RCA: Next meeting, we should have a timebox to just share about the existing frameworks, pros and cons. @@ -187,28 +188,22 @@ APS: I intend to include someone from our enterprise localization team. MIH: I can bring some of that also; I worked for 7 years on the vendor side as a l10n engineer, and on the client side I worked on the l10n tooling for Netflix and a couple of years at Google. -SFC: We should build an MVP and then we can circulate it more widely. We don't necessarily want to get too many cooks in the kitchen. - +SFC: We should build an MVP and then we can circulate it more widely. We don't necessarily want to get too many cooks in the kitchen. 
- -## Documentation / Links +## Documentation / Links https://docs.google.com/document/d/1oiKRfkuCuatT9k459nYwYw3neQ2Vm3rJ4toOu9wNwr4/edit?usp=sharing -https://github.com/echeran/clj-icu4j -> walkthrough; examples; impl code +https://github.com/echeran/clj-icu4j -> walkthrough; examples; impl code https://docs.google.com/presentation/d/1Rs29O3h56bS8SZx331AH8rDroH81GWHZko_JEAcTeR4/edit?usp=sharing Fluent 1.0 slides from Unicode Conference: https://www.unicodeconference.org/presentations-42/S12T3-Braniecki.pdf - ## Backlog - - Resource Format vs Message Format - - Understand Consequences of deficions and analyze how API’s should be used - - Collect Use Cases - - Open Requirements List - - MVP Roadmap - - - +- Resource Format vs Message Format +- Understand Consequences of deficions and analyze how API’s should be used +- Collect Use Cases +- Open Requirements List +- MVP Roadmap diff --git a/meetings/2019/notes-2019-12-16.md b/meetings/2019/notes-2019-12-16.md index 736d1dae7d..8b5f0fb5ea 100644 --- a/meetings/2019/notes-2019-12-16.md +++ b/meetings/2019/notes-2019-12-16.md @@ -1,6 +1,7 @@ #### December 16 Attendees: + - Romulo Cintra - CaixaBank (RCA) -- Steven R. Loomis - IBM (SRL) - will need to stay on mute this time +- Steven R. Loomis - IBM (SRL) - will need to stay on mute this time - Eemeli Aro - OpenJSF & Vincit (EAO) - Robert Chu - Amazon.com (RCU) - Mike McKenna - PayPal (MGM) @@ -14,7 +15,7 @@ - Elango Cheran - Google (ECH) - Long Ho - Dropbox (LHO) -## MessageFormat Working Group Contacts : +## MessageFormat Working Group Contacts : - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -23,30 +24,30 @@ January 6, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/7) ### Introductions ### Process -SRL: Update from Unicode EMG today : MessageFormat WG to be sub-WG of CLDR, with the goal of producing a UTS (Unicode Technical Specification), i.e. 
parallel to UTS#35 (CLDR’s LDML). There may be intermediate steps, for example it may start as a [Unicode Technical Note](https://www.unicode.org/notes/about-notes.html) which has a lower bar for publication. But it can keep its number, so UTN#0000 can become UTS#0000. +SRL: Update from Unicode EMG today : MessageFormat WG to be sub-WG of CLDR, with the goal of producing a UTS (Unicode Technical Specification), i.e. parallel to UTS#35 (CLDR’s LDML). There may be intermediate steps, for example it may start as a [Unicode Technical Note](https://www.unicode.org/notes/about-notes.html) which has a lower bar for publication. But it can keep its number, so UTN#0000 can become UTS#0000. SFC: Unicode DFS based on 2-clause BSD. GitHub repo under unicode, etc. -SRL: It’s an OSI approved license, SPDX Unicode-DFS-2016 https://spdx.org/licenses/Unicode-DFS-2016.html - +SRL: It’s an OSI approved license, SPDX Unicode-DFS-2016 https://spdx.org/licenses/Unicode-DFS-2016.html -### Presentations +### Presentations #### Staś Małolepszy (STA) - Fluent STA: [presents slides](https://docs.google.com/presentation/d/1zgOzEDBIVMBoQPouK3BFQRI8f3kSmTXm4jtg3m889J8/edit?usp=sharing) -LHO: Can you talk about the experience on the translation side? That's often a bottleneck for us, training new translators on the format, etc. +LHO: Can you talk about the experience on the translation side? That's often a bottleneck for us, training new translators on the format, etc. -STA: We focused on discoverability of syntax every new format comes with some friction. We're using our in-house CAT tool for translating FIrefox, which has some features specifically tailored to Fluent. We also have a whole infrastructure of checkers and linters that verify the integrity of the bundles. It's about allowing different syntaxes and then using linters and dashboards to nudge them to the best practice. +STA: We focused on discoverability of syntax every new format comes with some friction. 
We're using our in-house CAT tool for translating FIrefox, which has some features specifically tailored to Fluent. We also have a whole infrastructure of checkers and linters that verify the integrity of the bundles. It's about allowing different syntaxes and then using linters and dashboards to nudge them to the best practice. -ZB: We also feel that we have a good number of opportunities because we are using our own format. There's a number of l10n-specific features we may add later. One characteristic we found is that if you try to re-use some other format, like it's harder to encode. Multiline is hard with JSON, for example. +ZB: We also feel that we have a good number of opportunities because we are using our own format. There's a number of l10n-specific features we may add later. One characteristic we found is that if you try to re-use some other format, like it's harder to encode. Multiline is hard with JSON, for example. #### John Watson (JRW) - FBT @@ -54,35 +55,34 @@ JRW: [presents slides](https://docs.google.com/presentation/d/1PvnBdVTOlU1-A2MdZ (The below wasn’t presented, just background for questions:) [plural docs](https://facebookincubator.github.io/fbt/docs/plurals/) -[IUC 43 presentation](https://drive.google.com/open?id=1IsTa4tTn9WFfh62lz5aJyBwohIiK9R7i ) +[IUC 43 presentation](https://drive.google.com/open?id=1IsTa4tTn9WFfh62lz5aJyBwohIiK9R7i) MGM: FBT question - can I use FBT constructs to choose gender of objects? (E.g. "table" vs "chair" have different gender articles in Spanish) JRW: yeah so ideally, fbt.enum would be used here. 
-Something like `fbt('give me a', fbt.enum(['chair','table']))` would solve that particular case 'a' would cause 'la' or 'el' as you'd get 2 full phrases extracted there for transliteration +Something like `fbt('give me a', fbt.enum(['chair','table']))` would solve that particular case 'a' would cause 'la' or 'el' as you'd get 2 full phrases extracted there for transliteration We have been actively thinking about applying metadata/attributes to stand-alone fbt's, but it'd be a bit of work (so they could ... in theory... be passed as raw interpolations). +LHO: same question I have with Fluent, which is experience w/ the translators side. Also is there a bundling concept w/ fbt? E.g how do you do sharding of massive string packages (or if you don't) ? -LHO: same question I have with Fluent, which is experience w/ the translators side. Also is there a bundling concept w/ fbt? E.g how do you do sharding of massive string packages (or if you don't) ? - -JRW: yeah so for the open-sourced FBT framework, we don't provide any baked-in solutions. We have an internal tool that our vendors access to provide variations (on gender/plural), etc. +JRW: yeah so for the open-sourced FBT framework, we don't provide any baked-in solutions. We have an internal tool that our vendors access to provide variations (on gender/plural), etc. Developers typically request strings directly off individual diffs (pull requests) they author with a widget we have on our internal "code review" tool. Translations are often grouped by "project" internally both for "sharding" translation requests and simply providing context for a set of strings to be. -translated. Like "oculus store" for instance +translated. Like "oculus store" for instance -MIH: Dealing with the plurals, there's an attribute `showCount ifMany`. How locale-aware is that? I imagine the count could be before or after, alternate digits, etc. +MIH: Dealing with the plurals, there's an attribute `showCount ifMany`. 
How locale-aware is that? I imagine the count could be before or after, alternate digits, etc. -JRW: I should be more clear on what translators see. That string is generating only English. The translator can do whatever they want at that point, move it around, etc. No FBT enum, etc., shows up on the translator's side. I'll add some more links to presentations and API docs. +JRW: I should be more clear on what translators see. That string is generating only English. The translator can do whatever they want at that point, move it around, etc. No FBT enum, etc., shows up on the translator's side. I'll add some more links to presentations and API docs. -NIB: Is it possible for the translator to add custom rules? For example, if you want a `=0` in a certain message. Can translators do that? +NIB: Is it possible for the translator to add custom rules? For example, if you want a `=0` in a certain message. Can translators do that? -JRW: Translators can use the six CLDR buckets. If you need `=0`, you need an if-else at the moment. +JRW: Translators can use the six CLDR buckets. If you need `=0`, you need an if-else at the moment. -RX : for plurals how translators know what they are translating, do you have any special mark to contextualize what they are translating. +RX : for plurals how translators know what they are translating, do you have any special mark to contextualize what they are translating. -JRW: Yeah, we have a tool… translators can specify metadata associated with a token in the tool. So if a translator wants to explode/variate a token based on its number, they can. Translators can then translate the entire outer string for the few case, many case, etc. Variations are where the token decides the explosion for the outer string. +JRW: Yeah, we have a tool… translators can specify metadata associated with a token in the tool. So if a translator wants to explode/variate a token based on its number, they can. 
Translators can then translate the entire outer string for the few case, many case, etc. Variations are where the token decides the explosion for the outer string. RX: So there is a binding between the metadata and the tool the translator uses? @@ -90,7 +90,7 @@ JRW: Yeah. STA: It's interesting to see the hashing approach , is good to see how infra matches updated translations with updated source strings. -JRW: It's not a perfect solution. Every time someone forgets a period, or a space, oh, that's now a new translation. Maybe an AI should detect how close it is. Regarding out-of-date translations, the hashes make that really clear. We have fuzzy matching of translations, to load from previously translated strings, and people can approve those matches. The fact that it's a totally different hash makes it really easy to detect an out-of-date translation, but in many ways it makes our lives harder. +JRW: It's not a perfect solution. Every time someone forgets a period, or a space, oh, that's now a new translation. Maybe an AI should detect how close it is. Regarding out-of-date translations, the hashes make that really clear. We have fuzzy matching of translations, to load from previously translated strings, and people can approve those matches. The fact that it's a totally different hash makes it really easy to detect an out-of-date translation, but in many ways it makes our lives harder. 
#### Elango Cheran - MessageFormat @@ -100,40 +100,34 @@ MIH: To clarify, ECH's code is a proof of concept and not what we're currently u ### Open-Ended Discussion -SFC: To start the debate I see several differences between the presentations and they expose different opinions , runtime format, parse, how do you feel about have unified format to coverall all of these situations +SFC: To start the debate I see several differences between the presentations and they expose different opinions , runtime format, parse, how do you feel about have unified format to coverall all of these situations -MIH: The benefit of having one format is, as a developer, you can write something and refresh your browser and see what's happening. On the other side, if you have different formats, you can give the best format for each target audience. For example, at runtime, I want something fast and small. So I see benefits for both. So in a way, I think separating the API that does the formatting, a "locale-aware printf", the string that you pass to that API could be completely separate from how I got it: JSON, whatever. Having that flexibility seems useful, to separate things. +MIH: The benefit of having one format is, as a developer, you can write something and refresh your browser and see what's happening. On the other side, if you have different formats, you can give the best format for each target audience. For example, at runtime, I want something fast and small. So I see benefits for both. So in a way, I think separating the API that does the formatting, a "locale-aware printf", the string that you pass to that API could be completely separate from how I got it: JSON, whatever. Having that flexibility seems useful, to separate things. -LHO: One inspiration coming from other Intl APIs is formatToParts. Customizability that product teams look for is stylizing certain parts of the message. 
I'm wondering whether we need a stylized AST format; for example, this is a number, this is a time, etc., so people can style them separately. +LHO: One inspiration coming from other Intl APIs is formatToParts. Customizability that product teams look for is stylizing certain parts of the message. I'm wondering whether we need a stylized AST format; for example, this is a number, this is a time, etc., so people can style them separately. -NIB: I made a syntax highlighter for ICU syntax. Having a single syntax, including for linguists, seems useful because maybe translators want to customize more parts of the string besides plural. But then the syntax also has to be something translators can understand. +NIB: I made a syntax highlighter for ICU syntax. Having a single syntax, including for linguists, seems useful because maybe translators want to customize more parts of the string besides plural. But then the syntax also has to be something translators can understand. -ZB: When I've talked with MIH about fluent before, the idea is at the end of the day, you want a simple API call. MIH was talking about this in the context of the fluent resource format. The way to localize a button, etc., is a call in JavaScript. So we should think about ways of localizing user interfaces that are not based on a call in the programming language. +ZB: When I've talked with MIH about fluent before, the idea is at the end of the day, you want a simple API call. MIH was talking about this in the context of the fluent resource format. The way to localize a button, etc., is a call in JavaScript. So we should think about ways of localizing user interfaces that are not based on a call in the programming language. -ECH: We said before that we want to think about the DOM/HTML. That's an important domain. I don't know if this makes sense but I wonder if there's a line about what works for MessageFormat / ICU and a CAT tool, where you have a whole structure of text units, etc. 
You can go off as much as you want into that. ICU MessageFormat is not the easiest format for these tools, because translators like to see things in a list of source-destination strings. So what's the line of where we go? +ECH: We said before that we want to think about the DOM/HTML. That's an important domain. I don't know if this makes sense but I wonder if there's a line about what works for MessageFormat / ICU and a CAT tool, where you have a whole structure of text units, etc. You can go off as much as you want into that. ICU MessageFormat is not the easiest format for these tools, because translators like to see things in a list of source-destination strings. So what's the line of where we go? -RCA: We're talking about two big aspects: the translators and the developers. In my opinion we should have a tool built on top of the API to make it easier for translators. I think we need to think a lot about scaling, which is important, but we should also think about the developer with a small page. +RCA: We're talking about two big aspects: the translators and the developers. In my opinion we should have a tool built on top of the API to make it easier for translators. I think we need to think a lot about scaling, which is important, but we should also think about the developer with a small page. -MIH: I would like that we have a standard mapping from whatever we come up with to XLIFF. Because not everyone can build a whole l10n infra the way the big companies do. +MIH: I would like that we have a standard mapping from whatever we come up with to XLIFF. Because not everyone can build a whole l10n infra the way the big companies do. -SRL: +1. We should consider mapping to XLIFF. I'm on the XLIFF TC somewhat. If we needed to make extensions to XLIFF, we could reach out to them. But I think this should be a core consideration. +SRL: +1. We should consider mapping to XLIFF. I'm on the XLIFF TC somewhat. 
If we needed to make extensions to XLIFF, we could reach out to them. But I think this should be a core consideration. MIH: I have some open proposals to XLIFF. JRW: XLIFF is the better GetText? -MIHL GetText is a format for developers, whereas XLIFF is the format for translators. You make an XLIFF file to send to translators, and then you transform the XLIFF back to something usable at runtime. We shouldn't make developers write XLIFF by hand. Some tools support XLIFF well, some not. So I'm seeing translator tools as their own entity, and they can have their own import/export filters. You just have to make sure that the core stuff is powerful enough. +MIHL GetText is a format for developers, whereas XLIFF is the format for translators. You make an XLIFF file to send to translators, and then you transform the XLIFF back to something usable at runtime. We shouldn't make developers write XLIFF by hand. Some tools support XLIFF well, some not. So I'm seeing translator tools as their own entity, and they can have their own import/export filters. You just have to make sure that the core stuff is powerful enough. -NIB: Support for XLIFF is not widely supported, probably due to its complexity. We'd like to consider XLIFF again though. +NIB: Support for XLIFF is not widely supported, probably due to its complexity. We'd like to consider XLIFF again though. -SRL: Are you looking at XLIFF 2? XLIFF can accommodate without shoehorning these things. So there's opportunity. We're making XLIFF central to all our processing. +SRL: Are you looking at XLIFF 2? XLIFF can accommodate without shoehorning these things. So there's opportunity. We're making XLIFF central to all our processing. -RCU: When we get to gender handling, there's also the gender of nouns. Translators should be able to provide information on the grammatical properties of those nouns. +RCU: When we get to gender handling, there's also the gender of nouns. 
Translators should be able to provide information on the grammatical properties of those nouns. MIH: If the noun is open-ended, it's hard to determine the gender at runtime. - - - - - - diff --git a/meetings/2020/notes-2020-01-06.md b/meetings/2020/notes-2020-01-06.md index 7f25a1ec3a..c93ac263c2 100644 --- a/meetings/2020/notes-2020-01-06.md +++ b/meetings/2020/notes-2020-01-06.md @@ -1,4 +1,5 @@ #### January 6 Attendees: + - Addison Phillips - Amazon (APS) - Romulo Cintra - CaixaBank (RCA) - Rafael Xavier - PayPal (RX) @@ -10,13 +11,12 @@ - Long Ho - Dropbox (LHO) - Mike McKenna - PayPal (MGM) - Nicolas Bouvrette - Expedia (NIC) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Mihai Nita - Google (MIH) - Eemeli Aro - OpenJSF (EAO) - Shane Carr - Google (SFC) - -## MessageFormat Working Group Contacts : +## MessageFormat Working Group Contacts : - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -25,61 +25,62 @@ January 27, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/9) ### Introductions ### Process -### Presentations +### Presentations -RCA - We have planned two presentations for today, but probably we should postpone the NIC presentation for next WG Meeting. +RCA - We have planned two presentations for today, but probably we should postpone the NIC presentation for next WG Meeting. 
JAM: [presents slides](https://drive.google.com/file/d/1oJdW-vObyhgnOBVvotNqsXXfgmBl6hR5/view?usp=sharing) ### Open-Ended Discussion -RCA : Where did you find the numbers about the % preferences of users regarding the structure of translations +RCA : Where did you find the numbers about the % preferences of users regarding the structure of translations -JAM: They’re are based in my own product and personal experience building applications +JAM: They’re are based in my own product and personal experience building applications -ZB: There is a big difference between whether a feature is used 0.5% or 7% of the time when considering how to design the syntax of the format. We found that when the syntax was made easier for more advanced features, more advanced features were used more often. +ZB: There is a big difference between whether a feature is used 0.5% or 7% of the time when considering how to design the syntax of the format. We found that when the syntax was made easier for more advanced features, more advanced features were used more often. -MGM: are using CLDR rules for plurals ? +MGM: are using CLDR rules for plurals ? -JAM : Yeah they’re are CLDR based +JAM : Yeah they’re are CLDR based -MGM: how do you handle nesting and gender count ? +MGM: how do you handle nesting and gender count ? JAM : That work till certain degree , becomes a problem when you have a 2 gender in same sentence , I won't go further extending the library rather i will move to fluent or a future standard Message Format. APS: Whatever formats we use should promote the correct behavior to support all the intricacies of various languages, and use CAT tools to support the translation. -NIC : I will cover this in the next session , not all TMS are equal. Having a syntax that doesn't depend on the tool would be good as a gold standard. +NIC : I will cover this in the next session , not all TMS are equal. Having a syntax that doesn't depend on the tool would be good as a gold standard. 
-JRW: Really loud linters are helpful here. Telling them that they should be using a gender, etc. That catches a lot of bad use cases. +JRW: Really loud linters are helpful here. Telling them that they should be using a gender, etc. That catches a lot of bad use cases. ZB: At MZ we use 3 levels of error reporting if you are running testing or other kind of build they will loudly complain about the errors , and in productions we don’t report any errors, helping developers find their errors -### Review Github Issue : Create and Collect Use Cases / Roadmap and Requirements +### Review Github Issue : Create and Collect Use Cases / Roadmap and Requirements SFC : We must define the scope, we need to consider for each part of the pipeline we will start our design, if we are designing for developers, translators, or runtime efficiency. -APS: We have different user groups that we have to consider. For example, wanting to have complete thought strings as part of the structure. That's important for translators. I want to think about those things in defining a syntax. On the other hand, I don't want to dictate how companies' translation pipelines should work. That's a different thing. +APS: We have different user groups that we have to consider. For example, wanting to have complete thought strings as part of the structure. That's important for translators. I want to think about those things in defining a syntax. On the other hand, I don't want to dictate how companies' translation pipelines should work. That's a different thing. -LHO: I agree. Relevant though is the distribution pipeline. How resources get sent down to the client so that libraries can format it. Is it a string format that the API takes in? An AST? +LHO: I agree. Relevant though is the distribution pipeline. How resources get sent down to the client so that libraries can format it. Is it a string format that the API takes in? An AST? 
NIC : We need to find a way where … they are 2 keys have syntax that helps in authoring and translation parts , not having any dependence on any internal tooling. -ECH: Syntax might imply a file format, so we should instead think more generally about a data model. Whatever language you're targeting can implement that accordingly. +ECH: Syntax might imply a file format, so we should instead think more generally about a data model. Whatever language you're targeting can implement that accordingly. -MIH: In our design, we have to consider the syntax for the API different from the syntax we store in the files. That works very well in general. At Google we have support for plurals lite. It's implemented on top of MessageFormat, but the APIs don't look anything like MessageFormat. They don't have curly braces, etc. Separating the APIs from the string syntax itself would really help. +MIH: In our design, we have to consider the syntax for the API different from the syntax we store in the files. That works very well in general. At Google we have support for plurals lite. It's implemented on top of MessageFormat, but the APIs don't look anything like MessageFormat. They don't have curly braces, etc. Separating the APIs from the string syntax itself would really help. -LHO: I agree. It sounds like we need a separate syntax for declaration versus runtime. +LHO: I agree. It sounds like we need a separate syntax for declaration versus runtime. -APS: I like that argument a lot. One thing I would argue is that for a JavaScript standard, at the end of the day, we need to specify what browsers consume. So we need to concern ourselves with that form, so we will eventually need to consider serialization, but it should be something that also works outside HTML. +APS: I like that argument a lot. One thing I would argue is that for a JavaScript standard, at the end of the day, we need to specify what browsers consume. 
So we need to concern ourselves with that form, so we will eventually need to consider serialization, but it should be something that also works outside HTML. -RCA : we can probably move further with Syntax Part ? +RCA : we can probably move further with Syntax Part ? MIH: look at the features that each framework do, and just list the features @@ -87,14 +88,12 @@ LHO: spectrum ranges from simple tokenization like gettext vs embedded skeleton MIH [Slides](https://docs.google.com/document/d/1oiKRfkuCuatT9k459nYwYw3neQ2Vm3rJ4toOu9wNwr4/edit?ts=5d83b891#heading=h.e93b6xgdq3qa) that i can present or share with you about likes and dislikes of message format -STA: I’ve been thinking that one of the features we have in fluent i would like to discuss and share is about nesting messages, In fluent we annotate the important words from the app and not reproduce all dictionary. +STA: I’ve been thinking that one of the features we have in fluent i would like to discuss and share is about nesting messages, In fluent we annotate the important words from the app and not reproduce all dictionary. LHO: Can do a MVP w/o, then add later in v2? MIH: We should have a wish list that groups all features for MF [Requirements](https://github.com/unicode-org/message-format-wg/issues/3) -RX : I would like to have an overview about whatever api we come up with, in presentations we spoke about capabilities but there's a lot of things to discuss about this topic , we should elaborate about the features of api. +RX : I would like to have an overview about whatever api we come up with, in presentations we spoke about capabilities but there's a lot of things to discuss about this topic , we should elaborate about the features of api. RCA : As next steps we will start filling the wishlist of requirements to discuss and filter on the next meeting. 
- - diff --git a/meetings/2020/notes-2020-01-27.md b/meetings/2020/notes-2020-01-27.md index 90457d4482..fce9616eed 100644 --- a/meetings/2020/notes-2020-01-27.md +++ b/meetings/2020/notes-2020-01-27.md @@ -1,4 +1,5 @@ #### January 27 Attendees: + - Romulo Cintra - CaixaBank (RCA) - Pu Chen - Netflix (PUC) - Eemeli Aro - OpenJSF & Vincit (EAO) @@ -16,21 +17,20 @@ - Richard Gibson - OpenJSF & Oracle (RGN) - John Watson - Facebook (JRW) - Zibi Braniecki - Mozilla (ZBI) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Janne Tynkkynen - PayPal (JMT) - Nick Felker - Google (NFR) - -## MessageFormat Working Group Contacts : +## MessageFormat Working Group Contacts : - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting February 24, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/12) ## Presentations @@ -39,67 +39,67 @@ February 24, 10am PDT (6pm GMT) NIC: [presents slides](https://drive.google.com/file/d/1raElLNcFO3n_sRRKnjPzk9r4HVwpiOtN/view?usp=sharing) -MIH: This is kind of a feature that we like to support (inflections) , Every language would be different…”” The idea of a stand alone editor is nice but don’t think this should belong to be part of groups agenda. +MIH: This is kind of a feature that we like to support (inflections) , Every language would be different…”” The idea of a stand alone editor is nice but don’t think this should belong to be part of groups agenda. NIC: (presents slides) GWR: Comment about how to handle the indefinite article in English. Can’t simply use vowels, need a dictionary for the edge cases. For example: a LED vs an LED, a unicorn vs an umbrella. French, Italian and Korean needs vowel pronunciation properties of words. -MIH: The way to approach things like this is to say, here's the functionality we want to support, and then look at a lot of languages. 
We can ask linguists what you do with articles, for example. In some languages you glue the article to the end of the word. CLDR for example says, we want to represent yesterday and tomorrow, and send the request to linguists, and then figure out the constructs they need to support it. So you learn from linguists before you model functionality. +MIH: The way to approach things like this is to say, here's the functionality we want to support, and then look at a lot of languages. We can ask linguists what you do with articles, for example. In some languages you glue the article to the end of the word. CLDR for example says, we want to represent yesterday and tomorrow, and send the request to linguists, and then figure out the constructs they need to support it. So you learn from linguists before you model functionality. -NIC: +1. Doing a global deep-dive to figure out how it works is good because otherwise your implementation will miss something. +NIC: +1. Doing a global deep-dive to figure out how it works is good because otherwise your implementation will miss something. -MIH: The idea of a standalone editor looks nice, but I'm not sure if that should be part of this group's deliverables. Once you have a standard syntax out there, someone can build this in a weekend. It's a bit like telling developers, I'm designing a feature in this language, and then you have to use this special tool to use it. In the end we want this to be adopted by CAT tools, and if you want something separate, that's fine. +MIH: The idea of a standalone editor looks nice, but I'm not sure if that should be part of this group's deliverables. Once you have a standard syntax out there, someone can build this in a weekend. It's a bit like telling developers, I'm designing a feature in this language, and then you have to use this special tool to use it. In the end we want this to be adopted by CAT tools, and if you want something separate, that's fine. 
-NIC: The problem is that MessageFormat has been there for a while. I see this as a marketing tool, whereas if you don't have an easy way to test it out, it might be a longer path to integration. +NIC: The problem is that MessageFormat has been there for a while. I see this as a marketing tool, whereas if you don't have an easy way to test it out, it might be a longer path to integration. -STA: This group should design the AST so that the ecosystem can adopt it. We should think about things like syntax highlighting, for example, so that the end result can cater to these use cases well. I also agree about this being a marketing tool, where I can send people a link. It has an educational purpose as well. +STA: This group should design the AST so that the ecosystem can adopt it. We should think about things like syntax highlighting, for example, so that the end result can cater to these use cases well. I also agree about this being a marketing tool, where I can send people a link. It has an educational purpose as well. -RCA : The goal of the WG was to bring new ideas to MF, and all presentations and contributions for our group are welcome. I think that the presentation was great and tools and demo presented can be an example of Plugins/Tools that can be built on top of future MF API. +RCA : The goal of the WG was to bring new ideas to MF, and all presentations and contributions for our group are welcome. I think that the presentation was great and tools and demo presented can be an example of Plugins/Tools that can be built on top of future MF API. ### Presentation 2 EAO: [presents slides](https://docs.google.com/presentation/d/1nJrKnr2Unja12YsByrvAS2cK3UKC__1mO-PSXK102ro/edit#slide=id.p) -NIC: How would this work with a large set of dynamic variables (e.g. list of cities)? It could be a large dataset. Have you tried to build… is there a limitation in terms of dataset? +NIC: How would this work with a large set of dynamic variables (e.g. list of cities)? 
It could be a large dataset. Have you tried to build… is there a limitation in terms of dataset? -EAO: Probably not, if I understand the question right. With my experience, with input of MessageFormat messages, whether simple or complex, the output of a pre-compiled JavaScript source… in particular, when you zip the output, it's very minimally more than the output of the messages you've got. +EAO: Probably not, if I understand the question right. With my experience, with input of MessageFormat messages, whether simple or complex, the output of a pre-compiled JavaScript source… in particular, when you zip the output, it's very minimally more than the output of the messages you've got. PUC: Is there any special plural rule feature being supported in this tool compared to intl.pluralrule? PUC: Is there any special feature for this tool compared to the existing JavaScript message formatting tools? -EAO: This is an existing JavaScript tool. JavaScript does not itself have MessageFormat. +EAO: This is an existing JavaScript tool. JavaScript does not itself have MessageFormat. -ZBI: You said you're using your own plural rules. What are the differences between that and Intl.PluralRules? +ZBI: You said you're using your own plural rules. What are the differences between that and Intl.PluralRules? -EAO: Backwards compatibility. This may change relatively soon when people stop caring about supporting IE 11. But the rules are the same as the Intl.PluralRules polyfill. +EAO: Backwards compatibility. This may change relatively soon when people stop caring about supporting IE 11. But the rules are the same as the Intl.PluralRules polyfill. JMG: It looked like MessageFormat took a list of "supported locales", how does it handle fallback for the locales? -EAO: It only does very simply fallback by (paraphrasing) truncating from the right. Only the first / primary subtag matters. The only exception is pt-BR versus pt-PT. 
+EAO: It only does very simply fallback by (paraphrasing) truncating from the right. Only the first / primary subtag matters. The only exception is pt-BR versus pt-PT. JMG: I think ideally it would be great if this could pluggable ## Process ### Requirements List -Review : -https://github.com/unicode-org/message-format-wg/issues/3 +Review : +https://github.com/unicode-org/message-format-wg/issues/3 -RCA: There is a lot of great activity in this thread. I've started putting together a project board to organize this. Do you agree with this approach? Does anyone else have ideas? +RCA: There is a lot of great activity in this thread. I've started putting together a project board to organize this. Do you agree with this approach? Does anyone else have ideas? -NIC: There are some requirements that are bigger topics. Are we expecting to discuss or find closure on these? +NIC: There are some requirements that are bigger topics. Are we expecting to discuss or find closure on these? -MIH: I think we need to split this long thread somehow. I'm totally fine with the way you proposed it with the project board. Is there a way to comment there? I think it would be nice to be able to isolate these kinds of threads with a comment. Can we have a thread of comments? It would be useful to have, before going to voting, we're going to end up with something that is not very consistent in the end. +MIH: I think we need to split this long thread somehow. I'm totally fine with the way you proposed it with the project board. Is there a way to comment there? I think it would be nice to be able to isolate these kinds of threads with a comment. Can we have a thread of comments? It would be useful to have, before going to voting, we're going to end up with something that is not very consistent in the end. -ZBI: You can file an issue out of every comment. And we shouldn't vote; that's not how you design great software. +ZBI: You can file an issue out of every comment. 
And we shouldn't vote; that's not how you design great software. RCA: You can associate the project board items with comments. (demonstrates on screen) -NIC: Maybe before we create the cards, agreeing on what the list is. I think each item in that list should spin out a new issue. Then each card should correspond to an issue. +NIC: Maybe before we create the cards, agreeing on what the list is. I think each item in that list should spin out a new issue. Then each card should correspond to an issue. -MIH: I think some of these issues need cleaner description. Some of them are 5-10 words. The person who suggested that knows what that means, but not others. +MIH: I think some of these issues need cleaner description. Some of them are 5-10 words. The person who suggested that knows what that means, but not others. To-Do: Owners of the requirements should create separate issues for them. @@ -109,45 +109,40 @@ RCA : Soon as we split this in separated issues i will work on labeling and crea ### Support messages in HTML -ECH: I think the question is, whatever we decide, should HTML be considered from the get-go, or can we design something that is flexible enough that it can support HTML as yet another file format as any other file format or syntax? I've been working on the backend of a CAT tool supporting HTML, so I have some thoughts on this. +ECH: I think the question is, whatever we decide, should HTML be considered from the get-go, or can we design something that is flexible enough that it can support HTML as yet another file format as any other file format or syntax? I've been working on the backend of a CAT tool supporting HTML, so I have some thoughts on this. -MIH: After seeing the presentation and back-and-forth, I think we need to support some kind of HTML tagging. You can have a string hard-coded in your JavaScript, which could have HTML, like bold tags and such. I really don't think we can say, localization tools handle HTML properly and we are just good. 
Because that's not the case. They do have HTML, they do support placeholders, but no tool I know supports a mixture: HTML placeholders that will be replaced at runtime. The only framework I know that does something like that is okapi. +MIH: After seeing the presentation and back-and-forth, I think we need to support some kind of HTML tagging. You can have a string hard-coded in your JavaScript, which could have HTML, like bold tags and such. I really don't think we can say, localization tools handle HTML properly and we are just good. Because that's not the case. They do have HTML, they do support placeholders, but no tool I know supports a mixture: HTML placeholders that will be replaced at runtime. The only framework I know that does something like that is okapi. -NIC: To add to that, most commercial tools use okapi, that support multiple layers of filters. This might be a place where it's good to include linguists. +NIC: To add to that, most commercial tools use okapi, that support multiple layers of filters. This might be a place where it's good to include linguists. -MIH: I'm afraid the trouble is that you've seen some tools that are online. There are big tools that have big shares of the market that don't support subfiltering. +MIH: I'm afraid the trouble is that you've seen some tools that are online. There are big tools that have big shares of the market that don't support subfiltering. NIC: I'm 99% sure that SDL supports this case. -ECH: Back to the question of HTML. I was using okapi. MIH introduced it to me. I was tasked with supporting a file format that was protobuf with HTML embedded inside of it. The thing for me was that the okapi data model was able to support HTML just as much as it was able to support JSON or anything else. So I see no reason why we can't do something similar for MessageFormat. The scope is that we're talking about a specific message. A message in MessageFormat isn't a full document worth of stuff. 
+ECH: Back to the question of HTML. I was using okapi. MIH introduced it to me. I was tasked with supporting a file format that was protobuf with HTML embedded inside of it. The thing for me was that the okapi data model was able to support HTML just as much as it was able to support JSON or anything else. So I see no reason why we can't do something similar for MessageFormat. The scope is that we're talking about a specific message. A message in MessageFormat isn't a full document worth of stuff. -ECH: I think it's also interesting as a point of reference for the problems we had with our old CAT tool, Google Translator Toolkit (GTT). GTT represented all input documents as HTML. You can visualize what you're translating, and that worked to a point until it didn't work. This thing is our localization medium, it's our preview, and it's a one-stop shop. It seems convenient, but conflating those concerns creates complexity. The okapi model simplifies things. +ECH: I think it's also interesting as a point of reference for the problems we had with our old CAT tool, Google Translator Toolkit (GTT). GTT represented all input documents as HTML. You can visualize what you're translating, and that worked to a point until it didn't work. This thing is our localization medium, it's our preview, and it's a one-stop shop. It seems convenient, but conflating those concerns creates complexity. The okapi model simplifies things. -ECH: Taking a step back, I think people call it AST, to me it seems the data model. If we understand the input we give to the function, we can handle the HTML case. +ECH: Taking a step back, I think people call it AST, to me it seems the data model. If we understand the input we give to the function, we can handle the HTML case. ECH: We might even want to have a separate thread for terminology and concepts. -ZBI: Even within the thread of HTML support, there are multiple levels we can talk about. 
One the other hand, we can recognize HTML as an important description language. At the very least we can think about avoiding special symbols that have special meaning in HTML. But we can also dig much deeper. If you look at Fluent, I would question some assumptions. You said we should think on the level of a message. Maybe we should think at the level of a UI widget, where you have a binding to the elements of the widget. I think HTML, QML, would challenge thinking on the level of a function call. Maybe we don't have to go for a single function. +ZBI: Even within the thread of HTML support, there are multiple levels we can talk about. One the other hand, we can recognize HTML as an important description language. At the very least we can think about avoiding special symbols that have special meaning in HTML. But we can also dig much deeper. If you look at Fluent, I would question some assumptions. You said we should think on the level of a message. Maybe we should think at the level of a UI widget, where you have a binding to the elements of the widget. I think HTML, QML, would challenge thinking on the level of a function call. Maybe we don't have to go for a single function. -ECH: I understand that use case. What's important is the essential data that we're passing in. Being able to go beyond the level of a single message and seeing many messages is something that a TMS or higher-level framework can do as long as you uniquely tag each message. For something like Fluent, that's extra functionality that doesn't exist in ICU MessageFormat, which is fine, but to couple that into the discussion doesn't seem necessary. +ECH: I understand that use case. What's important is the essential data that we're passing in. Being able to go beyond the level of a single message and seeing many messages is something that a TMS or higher-level framework can do as long as you uniquely tag each message. 
For something like Fluent, that's extra functionality that doesn't exist in ICU MessageFormat, which is fine, but to couple that into the discussion doesn't seem necessary. -MIH: I agree about not overly coupling things. You have text units, segments, etc. In okapi, you can represent HTML or markup or anything else. +MIH: I agree about not overly coupling things. You have text units, segments, etc. In okapi, you can represent HTML or markup or anything else. -ZBI: I share the sentiment that if we can reduce the scope, it's better, because the scope is already massive. But the majority of what I hear you saying is embedding HTML into a localization message. A separate challenge is binding HTML to a localization message. If you think of a specific string and linking it to a specific element, that's an interesting problem space. If we think about making out solution work for HTML, it introduces different challenges for designing an API that binds to a complicated UI widget. (Think Web Components) +ZBI: I share the sentiment that if we can reduce the scope, it's better, because the scope is already massive. But the majority of what I hear you saying is embedding HTML into a localization message. A separate challenge is binding HTML to a localization message. If you think of a specific string and linking it to a specific element, that's an interesting problem space. If we think about making out solution work for HTML, it introduces different challenges for designing an API that binds to a complicated UI widget. (Think Web Components) -MIH: I agree. The way I see this is that we need to support both variations out of the box: html with MessageFormat syntax inside (full document or fragment), and MessageFormat “strings” (or messages) with html tags. +MIH: I agree. The way I see this is that we need to support both variations out of the box: html with MessageFormat syntax inside (full document or fragment), and MessageFormat “strings” (or messages) with html tags. 
Example 1 (MF syntax in html fragment): closure-templates (and more precise here) Example 2 (html in MF “message”): msgId = Hello {user}, click here... -GWR: We should support not only HTML but also SSML. Not only written, but also spoken. +GWR: We should support not only HTML but also SSML. Not only written, but also spoken. ZBI: In our experience, a lot of HTML bindings are needed for accessibility technologies. MIH: We should separate the concept of the data model from the raw syntax. Mihai: I’ll try to put something together before the next meeting. - RCA: For the next meeting we agreed that we are gonna split issues from the initial requirements lists(Bag of ideas). This will be done by "owners" of feature request. - - - - diff --git a/meetings/2020/notes-2020-02-24.md b/meetings/2020/notes-2020-02-24.md index d24fa9365b..f081002a74 100644 --- a/meetings/2020/notes-2020-02-24.md +++ b/meetings/2020/notes-2020-02-24.md @@ -1,4 +1,5 @@ #### February 24 Attendees: + - Romulo Cintra - CaixaBank (RCA) - Mike McKenna - PayPal (MGM) - George Rhoten - Apple (GWR) @@ -8,7 +9,7 @@ - Nicolas Bouvrette - Expedia (NIC) - Johan Jongman - Staś Małolepszy (STA) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Hugo van der Merwe - Google (HUG) - fly on the wall / observer. 
- Mick Monaghan - Guidewire - (MMN) - Mihai Nita - Google (MIH) @@ -20,19 +21,17 @@ - George Rhoten - Apple (GWR) - Dan Chiba - Oracle (DCA) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting March 23, 10am PDT (6pm GMT) ## Agenda -- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/46) +- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/46) ## Presentations @@ -40,46 +39,45 @@ March 23, 10am PDT (6pm GMT) MIH: presents [slides](https://docs.google.com/presentation/d/1dyW29SlqjPRZVScobqEXjnP29fhbqMkCfgxPOWj3Tnw) - EAO: Clarifying Question: "data model" =?= "AST" -ECH: I kind-of disagree. I hope we can talk about terminology. AST and Data Model get used to mean the same thing, but they're not exactly the same thing. +ECH: I kind-of disagree. I hope we can talk about terminology. AST and Data Model get used to mean the same thing, but they're not exactly the same thing. MIH: An AST would contain a lot of fluff, like this is the beginning of an array marker, and a lot of tokens that are irrelevant. EAO: So as an output of what we're doing, we consider not only an updated source language itself, but a data model that could be the target of messages coming from other formats as well? -MIH: I can receive stuff from a database, etc. It doesn't matter how they get to me. +MIH: I can receive stuff from a database, etc. It doesn't matter how they get to me. RCA: Clarifying Question: The data model is something should be something "fixed" ? -MIH: One of the big things is allowing developers to plug in their own data types. The first is the placeholder, the second is the type (flag). We can say the type is flexible. You register a formatter for it. It's expandable, but it's fixed. The structure itself doesn't change. It allows you to add in pieces here and there. 
If something is a Set, it is a Set. If it's a Map, it's a Map. +MIH: One of the big things is allowing developers to plug in their own data types. The first is the placeholder, the second is the type (flag). We can say the type is flexible. You register a formatter for it. It's expandable, but it's fixed. The structure itself doesn't change. It allows you to add in pieces here and there. If something is a Set, it is a Set. If it's a Map, it's a Map. RCA: Cool, so this is a model that can be extended. -MIH: You can also glue on top of it. Placeholder, style, key name foo, cannot be changed by translators. You can represent … +MIH: You can also glue on top of it. Placeholder, style, key name foo, cannot be changed by translators. You can represent … -NIC: Clarifying Question: Isn't the data model largely influenced by the problems (e.g. linguistic) we want to solve? Do we know all the problems we want to solve? There is a relation between both. +NIC: Clarifying Question: Isn't the data model largely influenced by the problems (e.g. linguistic) we want to solve? Do we know all the problems we want to solve? There is a relation between both. -MIH: I think we should take the initial one I presented in the beginning and modify it to represent the features we want. The data model I presented in the beginning might not be able to represent everything we want. I think it presents everything we have today, but we may need to change it to represent everything we want. +MIH: I think we should take the initial one I presented in the beginning and modify it to represent the features we want. The data model I presented in the beginning might not be able to represent everything we want. I think it presents everything we have today, but we may need to change it to represent everything we want. NIC: As we explore the problem space, we could discover something that could modify the model? MIH: Hopefully not drastically, but yes, it's possible. 
-GWR: One thing that Siri has are the notions semantic concepts and sentence fragments? We need a library that can fill in information (ex: word inflections) so that you don't have to repeat info over & over. I think semantic concepts are very important. I think sentence fragments are iffy -- they can split, have case, and one fragment can interact with another fragment, so we need to be careful about that. +GWR: One thing that Siri has are the notions semantic concepts and sentence fragments? We need a library that can fill in information (ex: word inflections) so that you don't have to repeat info over & over. I think semantic concepts are very important. I think sentence fragments are iffy -- they can split, have case, and one fragment can interact with another fragment, so we need to be careful about that. -MIH: Yeah. Personally I have big doubts that we can represent everything that is needed to represent perfect formation of human speech. Google Assistant tries to solve the same problems as Siri I think. Generating speech from concepts is really hard. There are a lot of things behind it: you have servers that have data models, … +MIH: Yeah. Personally I have big doubts that we can represent everything that is needed to represent perfect formation of human speech. Google Assistant tries to solve the same problems as Siri I think. Generating speech from concepts is really hard. There are a lot of things behind it: you have servers that have data models, … GWR: We used to have it that you fill in every single answer, and we found that doesn't scale, and now we have to generate stuff using a lexical dictionary. -MIH: I think I know what you mean, but if we're talking about cases, we could have a dative case, or accusative case, etc. The parameter of which noun case could come from the backend somewhere, instead of just coming from the user. +MIH: I think I know what you mean, but if we're talking about cases, we could have a dative case, or accusative case, etc. 
The parameter of which noun case could come from the backend somewhere, instead of just coming from the user. -GWR: If we have a personal relationship: your mom called, your dad called. Those could have cases potentially. Those are relationships: a bounded set of things to consider. That's separate from the current MessageFormat. It's program-defined vocabulary, but it can be very generic across the language. +GWR: If we have a personal relationship: your mom called, your dad called. Those could have cases potentially. Those are relationships: a bounded set of things to consider. That's separate from the current MessageFormat. It's program-defined vocabulary, but it can be very generic across the language. MIH: I think, if we can design our data structures to accommodate that we should try it. -GWR: Do you think we also need to define the grammatical information for each language? Sometimes linguists have trouble defining these things. +GWR: Do you think we also need to define the grammatical information for each language? Sometimes linguists have trouble defining these things. MIH: I know I've seen this several times, sometimes translators do not know about the grammar rules of their own language, they just know how to translate and what sounds correct. @@ -87,60 +85,62 @@ MED: One technique we've found that works pretty well is if you pick a set of ph MIH: I think the question is : Should we collect all possible cases across languages? -MED: I think we could define a set of grammatical terms which could be internal to the model. We've been doing some of that internally in CLDR, which focuses on noun grammatical features. Most things are noun phrases. We should come up with the internal terminology for describing this, but that's not what you pass to translators. +MED: I think we could define a set of grammatical terms which could be internal to the model. We've been doing some of that internally in CLDR, which focuses on noun grammatical features. 
Most things are noun phrases. We should come up with the internal terminology for describing this, but that's not what you pass to translators. MIH: What we can do is add an registration mechanism for grammatical cases, etc. that can be similar to how BCP 47 represents locales (language/region/writing system/etc.) RCA: calls for some conclusions on this agenda item. Do we discuss it now, or work on it for the next meeting? -MIH: If there are questions, I will try to answer them now, but I think people should spend time thinking about this. I don't expect people to wrap their head around this on the spot. +MIH: If there are questions, I will try to answer them now, but I think people should spend time thinking about this. I don't expect people to wrap their head around this on the spot. + +SFC: TCQ - if in doubt, use "New Topic". -SFC: TCQ - if in doubt, use "New Topic". - Clarifying question: something that can be answered within 10-15 seconds. - Point of order: "I need help with notes / can't hear the call / audio isn't working", etc. - -### Open Issues - #50 #26 #47 +### Open Issues - #50 #26 #47 #### Issue [#50](https://github.com/unicode-org/message-format-wg/issues/50) - Design Principles by STA. + Computational vs Manual; Developer Control vs Localizer Control; DRY vs. WET; Resilient vs. Brittle; People-Friendly vs. Machine-Friendly -STA: This was a way for me to organize my thoughts. I feel some of these design principles for what we want and how we want to build. The examples I gave here are relevant when we were designing Fluent. I think they're relevant for syntax as well as APIs. I think the first one (HUG: Computational vs Manual?) is one of the most important ones. At the end of the day, we're building something for non-technical users. Maybe we can discuss each of these dimensions right now. Do you think that's a good idea? I'm happy to file separate issues for each of these dimensions. 
One of the goals of having these guidelines is the decision-making process. Pretty soon we need to have decisions +STA: This was a way for me to organize my thoughts. I feel some of these design principles for what we want and how we want to build. The examples I gave here are relevant when we were designing Fluent. I think they're relevant for syntax as well as APIs. I think the first one (HUG: Computational vs Manual?) is one of the most important ones. At the end of the day, we're building something for non-technical users. Maybe we can discuss each of these dimensions right now. Do you think that's a good idea? I'm happy to file separate issues for each of these dimensions. One of the goals of having these guidelines is the decision-making process. Pretty soon we need to have decisions -ECH: I think these are good things to discuss. If we don't tackle them head-on, we will have these discussions over and over. +ECH: I think these are good things to discuss. If we don't tackle them head-on, we will have these discussions over and over. MIH: agrees. Something like that is needed, to provide some consistency. STA: Should I file separate issues? -RCA: Let's talk about issue 30. I think STA should organize these offline. +RCA: Let's talk about issue 30. I think STA should organize these offline. -STA: Sure. It's a little hard to extract a question from several different issues. For example, there's a good discussion going on about file format simplicity. It covers a lot of the decisions we need to make. Maybe we can start with the dimensions I put out there; they are pretty universal. One trouble that I may have myself is that some of these dimensions are required by some of the other requirements we're discussing. Some of the requirements may change the landscape of the design principles. For example, the manual versus computational dimension. Comments from GWR were very interesting to me. 
Seeing how Siri does things and how Siri needs to compute phrases out of smaller pieces changed the way I think about these. +STA: Sure. It's a little hard to extract a question from several different issues. For example, there's a good discussion going on about file format simplicity. It covers a lot of the decisions we need to make. Maybe we can start with the dimensions I put out there; they are pretty universal. One trouble that I may have myself is that some of these dimensions are required by some of the other requirements we're discussing. Some of the requirements may change the landscape of the design principles. For example, the manual versus computational dimension. Comments from GWR were very interesting to me. Seeing how Siri does things and how Siri needs to compute phrases out of smaller pieces changed the way I think about these. -EAO: I think we should start making decisions. We've been talking a lot and mapping the space, but I think we should start making decisions on specific things. We can iterate later, but if we don't decide on things, we can't make progress. I think we should, once we have specific issues on GitHub, we can go forward from there. +EAO: I think we should start making decisions. We've been talking a lot and mapping the space, but I think we should start making decisions on specific things. We can iterate later, but if we don't decide on things, we can't make progress. I think we should, once we have specific issues on GitHub, we can go forward from there. ECH: Do you consider data model vs. syntax as an example of a design principle dimension? -STA: Yeah, I've been wondering that myself. I think it's a consequence of my design principles. I really like how MIH approached this. Focusing on the data model will get us faster to some decisions. It's a hard question. +STA: Yeah, I've been wondering that myself. I think it's a consequence of my design principles. I really like how MIH approached this. 
Focusing on the data model will get us faster to some decisions. It's a hard question. -ECH: I think it is an important one. To me, I think that's one of the most fundamental kinds of decisions we can make. I think it influences the process of designing what we're doing. I think we can compartmentalize, these are implementation details, etc., by addressing this particular dimension. +ECH: I think it is an important one. To me, I think that's one of the most fundamental kinds of decisions we can make. I think it influences the process of designing what we're doing. I think we can compartmentalize, these are implementation details, etc., by addressing this particular dimension. RCA: I see doing that as an easy way to segment issues and create focus groups to work on these specific issues. -ZBI: Some design principles may not be addressable via data model alone. I think there are questions that can be resolved at the data model level. One thing we did with Fluent is we did a quick iteration of data model change and map it onto a data model syntax. One of the things I have on my mind is that if we go with MIH's approach, that we still have some dedicated time/group to make a syntax that proves that the data model can be implemented. +ZBI: Some design principles may not be addressable via data model alone. I think there are questions that can be resolved at the data model level. One thing we did with Fluent is we did a quick iteration of data model change and map it onto a data model syntax. One of the things I have on my mind is that if we go with MIH's approach, that we still have some dedicated time/group to make a syntax that proves that the data model can be implemented. MIH: I agree that the data model and design principles don't fully overlap. We need both. -ZBI: I don't fully agree with that. 
Depending on how far we go with error recovery, salvaging a message that has a broken fallback, etc., those kind of things are only verifiable if we try to implement the data model in some kind of syntax. +ZBI: I don't fully agree with that. Depending on how far we go with error recovery, salvaging a message that has a broken fallback, etc., those kind of things are only verifiable if we try to implement the data model in some kind of syntax. + +MIH: I think it does make sense. I think if we have doubts about the data model, we should try stuff, we should throw stuff at it. For multi-line, I don't see how it influences it. I would say that yeah, we should independently throw ideas at it and try stuff. -MIH: I think it does make sense. I think if we have doubts about the data model, we should try stuff, we should throw stuff at it. For multi-line, I don't see how it influences it. I would say that yeah, we should independently throw ideas at it and try stuff. +ECH: I agree with both MIH and ZBI. Some of the concerns ZBI was bringing up seem to relate to a dimension of a design principle, which is where does MessageFormat end and a Translation Management System begin? I would like to address that more explicitly, so that we know what is in scope or out of scope. -ECH: I agree with both MIH and ZBI. Some of the concerns ZBI was bringing up seem to relate to a dimension of a design principle, which is where does MessageFormat end and a Translation Management System begin? I would like to address that more explicitly, so that we know what is in scope or out of scope. +STA: For the data model, should we use JSON? Do you have any suggestions? -STA: For the data model, should we use JSON? Do you have any suggestions? - Could we have more concrete/experimental github-based examples to help make WG discussions flow better? -MIH: I did something on top of protobufs because I'm more familiar with it. JavaScript, Python, Dart, C, etc., support protobufs. 
I like that it's more opinionated than just JSON. I think Thrift would be an equally good format. I don't have a strong opinion. I think it would be useful to prototype on top of that. +MIH: I did something on top of protobufs because I'm more familiar with it. JavaScript, Python, Dart, C, etc., support protobufs. I like that it's more opinionated than just JSON. I think Thrift would be an equally good format. I don't have a strong opinion. I think it would be useful to prototype on top of that. SFC: JSON is probably the most general format that's not company-specific. Together with use of JSON Schema @@ -150,7 +150,7 @@ SRL notes that even if an implementation doesn't use JSON Schema specifically in MIH campaigns again for some prototyping with different tools. We'd be able to have more concrete discussions. -ZBI: Encourages use of a strongly-typed language. When implementing Fluent, we started with JavaScript, then Python, and I was the first person to attempt to implement the data model in strongly typed language (Rust). That exposed a number of areas that were underspecified. +ZBI: Encourages use of a strongly-typed language. When implementing Fluent, we started with JavaScript, then Python, and I was the first person to attempt to implement the data model in strongly typed language (Rust). That exposed a number of areas that were underspecified. ##### Conclusion @@ -158,25 +158,25 @@ STA: I will file issues for each of these axes, and we should also file a separa #### Issue [#30](https://github.com/unicode-org/message-format-wg/issues/30) -Define technical terms - by ECH. -ECH: I want to make sure we can agree on certain terms. Reduced vocabulary reduces the chance that we talk past each other (reduces ambiguity). I grouped these: some might be related, used in the same way, different, etc. If you've had the same experiences I've had, if you can chime in, that would be great. For example, "placeholder" and "placeable" might be referring to the same thing. 
"AST" also came up earlier. To me, AST means that you are parsing tokens from some syntax, usually in the context of a compiler. I don't see that quite the same as a representation of data. The way it gets used is synonymous with the other. +ECH: I want to make sure we can agree on certain terms. Reduced vocabulary reduces the chance that we talk past each other (reduces ambiguity). I grouped these: some might be related, used in the same way, different, etc. If you've had the same experiences I've had, if you can chime in, that would be great. For example, "placeholder" and "placeable" might be referring to the same thing. "AST" also came up earlier. To me, AST means that you are parsing tokens from some syntax, usually in the context of a compiler. I don't see that quite the same as a representation of data. The way it gets used is synonymous with the other. -See list of terms / collective brainstorming at: https://github.com/unicode-org/message-format-wg/issues/30 +See list of terms / collective brainstorming at: https://github.com/unicode-org/message-format-wg/issues/30 RCA: Please also update the wiki. Should we place and add new terms at : https://github.com/unicode-org/message-format-wg/wiki/Glossary-&-Resources. -ECH: The wiki of terms right now is very basic. Hopefully we can start to boil it down. +ECH: The wiki of terms right now is very basic. Hopefully we can start to boil it down. SFC: Are there any particular terms you wanted to clarify right now? -ECH: The whole cluster of API argument syntax: I want to know how they relate to each other. I want us to define them so that we have our own definitions. I don't want us to make any assumptions about what they mean. +ECH: The whole cluster of API argument syntax: I want to know how they relate to each other. I want us to define them so that we have our own definitions. I don't want us to make any assumptions about what they mean. 
#### Issue [#47](https://github.com/unicode-org/message-format-wg/issues/30) - File Format - by NIC. -NIC: My preference is to push for file-format-agnostic. I think if we have a file syntax, it would hurt adoption. I think it should be as flexible as possible. Adopting a new file format is something that wouldn't be done quickly. If our goal is to have better adoption, do we have a way to solve all the problems we want to solve in a way that is file format agnostic? I think it would help guide discussion to decide which are the key features we want to solve and do a stack ranking. That can help us understand whether or not we need a special file format. +NIC: My preference is to push for file-format-agnostic. I think if we have a file syntax, it would hurt adoption. I think it should be as flexible as possible. Adopting a new file format is something that wouldn't be done quickly. If our goal is to have better adoption, do we have a way to solve all the problems we want to solve in a way that is file format agnostic? I think it would help guide discussion to decide which are the key features we want to solve and do a stack ranking. That can help us understand whether or not we need a special file format. SFC: It would be useful to sort issues/features and make an explainer doc, check-in to the wiki. A shared list that takes the features and identifies what design principles are required to support those features. If there's a feature that can't be supported without a file format, we can determine whether we need to prioritise such a feature? -RCA: This is related with Data Model definition shouldn’t we wait to have it defined or more advanced to start working on that ? +RCA: This is related with Data Model definition shouldn’t we wait to have it defined or more advanced to start working on that ? 
EAO: Big decision we ought to be making: do we really want to be driving all of i18n/l10n to be using one message format, or do we want to build an environment which can support many or all message formatting languages? All of what we're considering and covering could work with either of those, but they're very different worlds. @@ -190,19 +190,18 @@ I think this is another way of looking at the same question of: Data model vs sy E.g. let's standardize on an existing format: MessageFormat / Fluent / some other / etc? Or yes: we're trying to define one new single syntax to use them all? -STA: This discussion made me realize that maybe besides the design principles discussion, we should have a specific discussion about the goals and non-goals of the topic. For example, compatibility with XLIFF is a bounding criterion. I have a couple of other ideas. I'll throw them into an issue. - -* End of meeting (should ideally have been a point of order?) +STA: This discussion made me realize that maybe besides the design principles discussion, we should have a specific discussion about the goals and non-goals of the topic. For example, compatibility with XLIFF is a bounding criterion. I have a couple of other ideas. I'll throw them into an issue. +- End of meeting (should ideally have been a point of order?) 
## Not discussed Issues + - [#48](https://github.com/unicode-org/message-format-wg/issues/48) - [#26](https://github.com/unicode-org/message-format-wg/issues/26) - -## Next meeting actions/issues +## Next meeting actions/issues - Create issues to split topics related with Design Principles #50 -- Data Model Discussion, Open issue to design and work on that with examples , discussions etc … +- Data Model Discussion, Open issue to design and work on that with examples , discussions etc … - Decide/vote : “Big decision we ought to be making: do we really want to be driving all of i18n/l10n to be using one message - format, or do we want to build an environment which can support many or all message formatting languages? All of what we're considering and covering could work with either of those, but they're very different worlds.” - Define goals and non-goals of MFWG related with #49 diff --git a/meetings/2020/notes-2020-03-23.md b/meetings/2020/notes-2020-03-23.md index 8742ae54a5..9372ebf9c7 100644 --- a/meetings/2020/notes-2020-03-23.md +++ b/meetings/2020/notes-2020-03-23.md @@ -1,4 +1,5 @@ #### March 23 Attendees: + - Romulo Cintra - CaixaBank (RCA) - David Filip - ADAPT Centre, Trinity College Dublin (DAF) - Pu Chen - Netflix (PCN) @@ -6,39 +7,39 @@ - John Watson - Facebook (JRW) - Elango Cheran - Google (ECH) - Zibi Braniecki - Mozilla (ZBI) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Mike McKenna - PayPal (MGM) - Shane F. 
Carr - Google (SFC) - Eemeli Aro - OpenJSF (EAO) - Elango Cheran - Google (ECH) -- Jan Mühlemann - Locize (JMU) +- Jan Mühlemann - Locize (JMU) - Rafael Xavier - Paypal (RXR) - Nicolas Bouvrette - Expedia (NIC) - Mihai Nita - Google (MIH) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting April 20, 10am PDT (6pm GMT) ## Agenda -- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/57) +- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/57) ## Presentations -### Chair Group Announcements + +### Chair Group Announcements Chair Group Guidelines [Link](https://github.com/unicode-org/message-format-wg/blob/master/guidelines/chair-group.md) Working Guidelines Draft about Chair Group processes [Link](https://docs.google.com/document/d/1U6PiFopoOqPyAgJ_KSzEfZOyKIcCGA-qqb_PyRrv_Mc/edit?usp=sharing) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) -RCA: Working group should be more than one person, it should be led by more people, people who are regular attendees. Have created a Chair Group document to describe responsibilities, now added to MF WG repo, link is above. +RCA: Working group should be more than one person, it should be led by more people, people who are regular attendees. Have created a Chair Group document to describe responsibilities, now added to MF WG repo, link is above. Proposal will help handle coordination, help make proposals in offline mode between the monthly meetings to help drive momentum so that we accomplish more outside of our 1 hr / month meetings. @@ -50,15 +51,15 @@ RCA: It's for monthly meetings and sync up. DAF: How many people do you foresee? 
-RCA: Not sure, did not know if there would be volunteers. If everyone volunteers, that might be too many. Maximum maybe ⅓ of total members. But maybe 20 people too many. +RCA: Not sure, did not know if there would be volunteers. If everyone volunteers, that might be too many. Maximum maybe ⅓ of total members. But maybe 20 people too many. DAF: I'm intrigued, but also new to the group, so I'm hesitant to volunteer. EAO: Steering group sounds like a more appropriate name. -SFC: We could name it something. Reason why I suggested "Chair Group" to make it clear that all of us as a full body during the 90 min monthly meeting is when decisions are actually made. Whereas Steering Group / Committee sounds like they make all the decisions and only receive feedback. Chair Group is modeled after a city council where the staffers do all the legwork and the council meetings are where decisions are made. +SFC: We could name it something. Reason why I suggested "Chair Group" to make it clear that all of us as a full body during the 90 min monthly meeting is when decisions are actually made. Whereas Steering Group / Committee sounds like they make all the decisions and only receive feedback. Chair Group is modeled after a city council where the staffers do all the legwork and the council meetings are where decisions are made. -EAO: "Chair Group" sounds like a group that makes decisions, but perhaps this is all semantics. I don't think the name matters all that much. +EAO: "Chair Group" sounds like a group that makes decisions, but perhaps this is all semantics. I don't think the name matters all that much. RCA: Do we have consensus? @@ -66,7 +67,7 @@ ZBI, STA: +1 RCA: Procedures and guidelines of how this will work are described in the document in the repository. -## Open Issues +## Open Issues ### Establish the decision making process (#58) @@ -78,9 +79,9 @@ RCA: Okay, I will create the decision making process doc. 
### Goals and Non-Goals (#59) -STA: This is related to the discussion from last months, where there are many takes on what we should be doing. It's been a long month. I remembered how we said how important it is to talk about the data model, interop with XLIFF, etc. I think the charter summarizes pretty well the general sentiment, but I think it would be interesting to expand more on what it means to supersede MessageFormat, why it warrants a new solution to be created. Because it will help us understand the tasks in front of us. So I'm hoping we can agree on a core set of goals to help us navigate. +STA: This is related to the discussion from last months, where there are many takes on what we should be doing. It's been a long month. I remembered how we said how important it is to talk about the data model, interop with XLIFF, etc. I think the charter summarizes pretty well the general sentiment, but I think it would be interesting to expand more on what it means to supersede MessageFormat, why it warrants a new solution to be created. Because it will help us understand the tasks in front of us. So I'm hoping we can agree on a core set of goals to help us navigate. -DAF: I think this is really important, the non-goals are also very important. Can we discuss now? +DAF: I think this is really important, the non-goals are also very important. Can we discuss now? STA: I'm not sure if we discuss now or during it in a chair group. @@ -90,45 +91,45 @@ STA: Maybe that is a reason to have this discussion asynchronously on the repo i RCA: We don't have a packed agenda for this meeting, so maybe we have time to discuss now and move towards consensus. -NIC: We have been discussing many issues for weeks. How are we planning to triage them and get consensus? +NIC: We have been discussing many issues for weeks. How are we planning to triage them and get consensus? -STA: Goals and non-goals tell us scope. Design principles tell us how we should go about them. 
Requirements gathering has been helping us gathering the finer details of what we are building. Goals are higher-level. I hope that helps, the distinction can be tricky. +STA: Goals and non-goals tell us scope. Design principles tell us how we should go about them. Requirements gathering has been helping us gathering the finer details of what we are building. Goals are higher-level. I hope that helps, the distinction can be tricky. -NIC: I think it's a very good approach. Practically, are we having a side group who;s going to work on this? +NIC: I think it's a very good approach. Practically, are we having a side group who;s going to work on this? -ECH: MessageFormat for ECMAScript has been a long-standing issue. There's a glut of libraries for this in JavaScript, but not as much in other languages (?). Being able to provide data literals in the language of JavaScript: that seems to be something users want to use, which MessageFormat doesn't allow. And making a solution for the larger +ECH: MessageFormat for ECMAScript has been a long-standing issue. There's a glut of libraries for this in JavaScript, but not as much in other languages (?). Being able to provide data literals in the language of JavaScript: that seems to be something users want to use, which MessageFormat doesn't allow. And making a solution for the larger -ZBI: When I did presentations on Fluent in 2017 and 2018, I got a lot of questions about MessageFormat 2.0. The temperature of the room was that MessageFormat mostly works, but has certain deficiencies. I've noticed that presentations early on were focused on Java and C and server-side technologies. However, the industry has largely been moving toward client-side and hand-crafted solutions. MessageFormat is built on 2002 technologies. +ZBI: When I did presentations on Fluent in 2017 and 2018, I got a lot of questions about MessageFormat 2.0. The temperature of the room was that MessageFormat mostly works, but has certain deficiencies. 
I've noticed that presentations early on were focused on Java and C and server-side technologies. However, the industry has largely been moving toward client-side and hand-crafted solutions. MessageFormat is built on 2002 technologies. -SFC: I think going through the specific bullet points in issue #59 is a very good use of time. The list Stas put together is an example of the kind of thing that the chair group would put together. +SFC: I think going through the specific bullet points in issue #59 is a very good use of time. The list Stas put together is an example of the kind of thing that the chair group would put together. -STA: To give some background, these five points largely came from the charter. I rephrased them a bit to make them sound more like goals. I specifically put "messages" in the bullet point because there's been a lot of discussion on whether we are designing for messages or a file format. So this list of goals is not exhaustive. +STA: To give some background, these five points largely came from the charter. I rephrased them a bit to make them sound more like goals. I specifically put "messages" in the bullet point because there's been a lot of discussion on whether we are designing for messages or a file format. So this list of goals is not exhaustive. -DAF: I would like to start with the non-goals. I agree that we shouldn't design a general interchange format, and not support every grammatical feature. ??? About many-to-many, the industry says that you should be working on a pair. Situations where you're dealing with many-to-many are not always solvable. About the canonical syntax, I think we should focus on the data model, rather than syntaxes. Something different will work for Java versus JavaScript. For number 5, I don't think we should mix up MessageFormat with formatting. +DAF: I would like to start with the non-goals. I agree that we shouldn't design a general interchange format, and not support every grammatical feature. ??? 
About many-to-many, the industry says that you should be working on a pair. Situations where you're dealing with many-to-many are not always solvable. About the canonical syntax, I think we should focus on the data model, rather than syntaxes. Something different will work for Java versus JavaScript. For number 5, I don't think we should mix up MessageFormat with formatting. -ECH: I agree with everything DAF just said. Now that I'm thinking about the things he said for the goals and non-goals, I want to give some context on where the goals may have come from. We've been talking about the difference between data model and syntax. Many-to-many translations, at least in my work recently on a CAT tool, there are use cases where, especially when you are translating for a voice assistant, you may have different variants, like 3 variants in a source language and 5 variants in a target language. +ECH: I agree with everything DAF just said. Now that I'm thinking about the things he said for the goals and non-goals, I want to give some context on where the goals may have come from. We've been talking about the difference between data model and syntax. Many-to-many translations, at least in my work recently on a CAT tool, there are use cases where, especially when you are translating for a voice assistant, you may have different variants, like 3 variants in a source language and 5 variants in a target language. -ZBI: I think that this is a fairly sensitive point. I think that point 5 is the most important one for me. The criticism I heard so far from ECH and DAF is about dismissing the need to build a full-stack localization system. Only looking at the source model, and not thinking about how the UX side, is going to give us another MessageFormat in 5 years. There are a lot of requirements across the boundaries. Something like language switching at runtime on the mobile phone may influence our data model. 
If we think of ourselves as living in the silo of designing a data model, and anyone can run with it, we will make us design something that is much more limited, such that people need to extend it. If we end up with a turing-complete data model, sure, but I think it's better to focus on use cases and making sure that our API supports them. I think we should think across different layers and not isolate ourselves only to a single layer. +ZBI: I think that this is a fairly sensitive point. I think that point 5 is the most important one for me. The criticism I heard so far from ECH and DAF is about dismissing the need to build a full-stack localization system. Only looking at the source model, and not thinking about how the UX side, is going to give us another MessageFormat in 5 years. There are a lot of requirements across the boundaries. Something like language switching at runtime on the mobile phone may influence our data model. If we think of ourselves as living in the silo of designing a data model, and anyone can run with it, we will make us design something that is much more limited, such that people need to extend it. If we end up with a turing-complete data model, sure, but I think it's better to focus on use cases and making sure that our API supports them. I think we should think across different layers and not isolate ourselves only to a single layer. JMU: I agree with ZBI EAO: I agree with ZBI -STA: I have a question for David. On goal 5, you weren't sure if we should conflate combining the syntax with storage? Can you clarify? +STA: I have a question for David. On goal 5, you weren't sure if we should conflate combining the syntax with storage? Can you clarify? -DAF: There should be an API… there are 3 layers. Data model, syntax, and API. If this is all in scope, that's fine. Interpreting the message… my problem is with formatting. The formatting should be implementation-dependent. It conflates goals with non-goals. 
+DAF: There should be an API… there are 3 layers. Data model, syntax, and API. If this is all in scope, that's fine. Interpreting the message… my problem is with formatting. The formatting should be implementation-dependent. It conflates goals with non-goals. -STA: When you say "formatting", you mean the runtime process of interpolating strings? Do you mean for dates? +STA: When you say "formatting", you mean the runtime process of interpolating strings? Do you mean for dates? -DAF: "formatting" is a fuzzy word to use. We cannot ignore UX. We need to be aware of use cases. +DAF: "formatting" is a fuzzy word to use. We cannot ignore UX. We need to be aware of use cases. NIC: Can you clarify your terminology? -STA: When I say "many to many", I mean that the source string may have multiple variants that correspond to the variants of the source language, which may not be applicable to the translation. Say the source language is English, which it often is in our industry, and it requires plurals in a certain string, but a translation does not. So that's two variants to one variant. +STA: When I say "many to many", I mean that the source string may have multiple variants that correspond to the variants of the source language, which may not be applicable to the translation. Say the source language is English, which it often is in our industry, and it requires plurals in a certain string, but a translation does not. So that's two variants to one variant. -NIC: So it's like plural rules today? You can have different rules for different languages? +NIC: So it's like plural rules today? You can have different rules for different languages? -STA: Yeah. Plurals are one example, but other interesting grammatical features can be expressed in the same model. +STA: Yeah. Plurals are one example, but other interesting grammatical features can be expressed in the same model. ZBI: +1 to STA @@ -136,43 +137,43 @@ ECH: My interpretation of "many to many" is a bit more fuzzy than that. 
RCA: All, please add your own goals and non goals to the topic. -ECH: As I understand it, the data model is what goes as the input to the API that gets implemented in each implementation. It's more about that you describe the data to represent all the relationships of the input correctly, and the implementations choose what the function call arguments and types look like. But data is equal to itself, it is not a language, nor one that could become Turing complete. +ECH: As I understand it, the data model is what goes as the input to the API that gets implemented in each implementation. It's more about that you describe the data to represent all the relationships of the input correctly, and the implementations choose what the function call arguments and types look like. But data is equal to itself, it is not a language, nor one that could become Turing complete. -SFC: Regarding the second and third bullets in the non-goals: I do actually think that it should be in scope to support arbitrary grammatical features. For example, we should at least support gender, inflection, and plural. The mechanism for determining the correct gender, inflection, or plural should be implementation-dependent, but the syntax and data model should support a plug-and-play system so that users can plug in an inflection model and then be able to use messages that were translated in an inflection-aware way. +SFC: Regarding the second and third bullets in the non-goals: I do actually think that it should be in scope to support arbitrary grammatical features. For example, we should at least support gender, inflection, and plural. The mechanism for determining the correct gender, inflection, or plural should be implementation-dependent, but the syntax and data model should support a plug-and-play system so that users can plug in an inflection model and then be able to use messages that were translated in an inflection-aware way. 
-STA: What I meant by "transforming parts of speech" is that transforming strings with AI is not in scope; so I had something in mind where the translator has to put the cases in manually. So I would like to discuss this further. +STA: What I meant by "transforming parts of speech" is that transforming strings with AI is not in scope; so I had something in mind where the translator has to put the cases in manually. So I would like to discuss this further. SFC: I think syntax is in scope, after we decide on the data model. DAF: I agree; I was just saying that the data model and syntax should not be conflated. Data model first. -DAF: About many-to-many, there are not necessarily the same number of source and target language messages. It is in scope, but it should be clarified and grounded in CLDR data. The combinatorics should be deterministic.. +DAF: About many-to-many, there are not necessarily the same number of source and target language messages. It is in scope, but it should be clarified and grounded in CLDR data. The combinatorics should be deterministic.. -SFC: I think a goal should be to separate authoring format to runtime format. Authoring format is what the programmer writes. They check it into source control. Runtime format has a collection of strings and gets interpolated at runtime. +SFC: I think a goal should be to separate authoring format to runtime format. Authoring format is what the programmer writes. They check it into source control. Runtime format has a collection of strings and gets interpolated at runtime. MIH: We should discuss this separately, since I disagree with certain points. -STA: Yeah, I think runtime versus authoring format is worth discussing further. I didn't have it as a goal here because I pulled the goals only from the charter. +STA: Yeah, I think runtime versus authoring format is worth discussing further. I didn't have it as a goal here because I pulled the goals only from the charter. 
-EAO: My understanding is runtime format is a re-representation of the data model within the runtime. Or is it something else? +EAO: My understanding is runtime format is a re-representation of the data model within the runtime. Or is it something else? STA: That is my understanding, too. -MIH: If you look at the formats now, there's the one the programmer writes: it might be in English, it might be in French if you're a French developer, etc. That gets translated and might end up in the same format or a different format. You don't need everything, like comments. Those files can be compiled into a binary form, etc. Even JSON might be compressed. At runtime, it's pulled from the binary form and passed to the API. So in many cases, the authoring and runtime formats could be the same. +MIH: If you look at the formats now, there's the one the programmer writes: it might be in English, it might be in French if you're a French developer, etc. That gets translated and might end up in the same format or a different format. You don't need everything, like comments. Those files can be compiled into a binary form, etc. Even JSON might be compressed. At runtime, it's pulled from the binary form and passed to the API. So in many cases, the authoring and runtime formats could be the same. Maybe we should have an agenda item in the future to go through and discuss the terms in the glossary thread that Elango started. -NIC: Every little word can have a different meaning for people in this list. If we continue like this, if it's the right approach. +NIC: Every little word can have a different meaning for people in this list. If we continue like this, if it's the right approach. -RCA: I agree; we should continue moving on this topic. The chair group should make a definitive list. +RCA: I agree; we should continue moving on this topic. The chair group should make a definitive list. -NIC: On top of that, all these things are still very high-level. 
I don't have a solution, but I don't know if what we're currently doing is going to work. +NIC: On top of that, all these things are still very high-level. I don't have a solution, but I don't know if what we're currently doing is going to work. -SFC: What we're doing today is giving specific critiques on STA's bullet points. STA or the chair group should apply the feedback and come back next month with a revised list. +SFC: What we're doing today is giving specific critiques on STA's bullet points. STA or the chair group should apply the feedback and come back next month with a revised list. -STA: I think the last 2 months of discussions have been good. Requirements gathering broadened +STA: I think the last 2 months of discussions have been good. Requirements gathering broadened -STA: A while ago, we talked about API, and whether it would or would not be in scope. I think this approach doesn't give us a concept of error scenarios. What happens when something goes wrong, like an input argument is missing? Leaving that out as an undefined behavior wouldn't be a good choice for our working group. So I think we should have a definition of a step-by-step process of how formatting works. +STA: A while ago, we talked about API, and whether it would or would not be in scope. I think this approach doesn't give us a concept of error scenarios. What happens when something goes wrong, like an input argument is missing? Leaving that out as an undefined behavior wouldn't be a good choice for our working group. So I think we should have a definition of a step-by-step process of how formatting works. ZBI: +1 @@ -180,11 +181,10 @@ SFC: +1 ECH: +1 -MIH: The comment about "many to many" translations was about well-specified in CLDR. But I think it's useful to be more free-form. We have use cases in Assistant where there may be several replies, like "it's raining cats and dogs", "bring your umbrella", etc. So there was a need to represent variations of the same message. 
(please fill in) +MIH: The comment about "many to many" translations was about well-specified in CLDR. But I think it's useful to be more free-form. We have use cases in Assistant where there may be several replies, like "it's raining cats and dogs", "bring your umbrella", etc. So there was a need to represent variations of the same message. (please fill in) -RCA: Time's up! +RCA: Time's up! EAO: Data model should be resource-centric, even if canonical syntax isn't. [Full discussion and chat notes](https://docs.google.com/document/d/1icPqRiGXkbIGE46Y9H1hh4fZt9BL-PbJqoJKg7e9oA4/edit?usp=sharing) - diff --git a/meetings/2020/notes-2020-04-20.md b/meetings/2020/notes-2020-04-20.md index b673be5618..daf21a770b 100644 --- a/meetings/2020/notes-2020-04-20.md +++ b/meetings/2020/notes-2020-04-20.md @@ -3,11 +3,13 @@ Github: Meeting Agenda : 2020-04-24: #74 Attendees: Please fill in a 3-letter acronym if this is your first meeting: + - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname - Suggestion 2: First initial, middle initial, last initial - Suggestion 3: Custom #### April 20 Attendees: + - Romulo Cintra - CaixaBank (RCA) - Elango Cheran - Google (ECH) - Mihai Nita (MIH) @@ -23,52 +25,51 @@ Please fill in a 3-letter acronym if this is your first meeting: - Mike McKenna (MGM) - Shane F. 
Carr - Google (SFC) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ## Next Meeting May 18, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/74) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) ## Updates from Chair Group Meetings RCA : We changed the way we run the meetings and now we try to rotate among members of the chair group to “run” the meetings. This week DAF will be in charge of TCQ and overall meeting discussions. - + ### Goals Non-Goals - (#77 #59) - + STA: Created PR of draft of goals & non-goals (https://github.com/unicode-org/message-format-wg/pull/77). - + We don't have to be precise since we will iterate on this, but want to have something more concrete and not nebulous. -Once we agree on the goals, I think the next step is to have an agreement on how we can get there, dedicate more time and evaluate the goals and agree … +Once we agree on the goals, I think the next step is to have an agreement on how we can get there, dedicate more time and evaluate the goals and agree … -My goal for today is to try to have a list of goals that represents our common understanding. +My goal for today is to try to have a list of goals that represents our common understanding. -Goals are split into 4 major points. +Goals are split into 4 major points. Data model for defining translations APP: When you talk about a data model this includes a file format ? or is API type definition that we wrap around and work from that. 
-MIH : In my point of view the Data model is separated from the syntax and api, so if people want to convert their old syntax to a new one, they can because the data model guarantees compatibility. Data model is syntax independent. Data model is a higher level than AST, AST is just about parsing tokens like open parentheses, curly braces, etc. +MIH : In my point of view the Data model is separated from the syntax and api, so if people want to convert their old syntax to a new one, they can because the data model guarantees compatibility. Data model is syntax independent. Data model is a higher level than AST, AST is just about parsing tokens like open parentheses, curly braces, etc. More than that, you should be able to write a conversion from old-format => data-model => new-format. Or if we design some APIs that take the “data-model” as argument (maps, and sets, and maps of sets, etc.) one can consume directly old and new syntax (new-syntax => data-model => API and old-syntax => data-model => API). So an old-format => data-model parser is the only thing needed. The trap: the data-model should be able to represent the old-format. In some cases the conversion might not be trivial (for example nested switches converted to a single switch on tuples). -STA: For me Data Model is similar from AST interpretation where (Please Fill) +STA: For me Data Model is similar from AST interpretation where (Please Fill) EAO: Could we refer to the canonical data model for resources, much as we do for syntax? Where are the boundaries between a translation management system and MF ? -APP: I think is clear the difference between message format and a translation management system (TMS). TMS is for , (Please fill ) +APP: I think is clear the difference between message format and a translation management system (TMS). TMS is for , (Please fill ) -ECH: Okay, sounds good. +ECH: Okay, sounds good. NIC: I think backward compatibility is an important point. 
@@ -76,9 +77,9 @@ APP: Backwards not necessary, but we should at least address. STA: You should have a good story for backwards compatibility, otherwise people won't use it. -DAF: Support for the idea of self-contained messages / translation units. It makes it more complex to support the linking of one message to another. +DAF: Support for the idea of self-contained messages / translation units. It makes it more complex to support the linking of one message to another. -STA: Let's leave this question open. From Fluent experience, it helped to have variants of one message's inflections used in other messages. George (GWR)'s presentation about Siri's handling is similar to what we use in Fluent. +STA: Let's leave this question open. From Fluent experience, it helped to have variants of one message's inflections used in other messages. George (GWR)'s presentation about Siri's handling is similar to what we use in Fluent. APP: Sounds good to me. @@ -86,7 +87,7 @@ ZB (on chat): Intra-message references should be considered. @Mihai pointed out MIH: Do you want the ability to support some kind of tagging , example how can i “bold/italic” this how data model can handle this,in XLIFF you have open/close placeholder... -a tag is another kind of placeholder ? +a tag is another kind of placeholder ? STA: Is a good question andsh @@ -94,13 +95,13 @@ MIH: Yes, let's add this to the goals / non-goals doc, make it possible to expre APP: We can add this as a goal, but we shouldn't go too much into detail in this document since it is about high-level descriptions of goals. -EAO: Messages have a simple structure. We do not and cannot have a fully backwards compatibility because it will at least break some aspects of the syntax (ex: in the MessageFormat.js library). +EAO: Messages have a simple structure. We do not and cannot have a fully backwards compatibility because it will at least break some aspects of the syntax (ex: in the MessageFormat.js library). 
-STA: To summarize briefly, there are still open questions on data model for collections of messages, relationship of data model to API, and protecting structural features. And maybe backwards compatibility is a design principle. +STA: To summarize briefly, there are still open questions on data model for collections of messages, relationship of data model to API, and protecting structural features. And maybe backwards compatibility is a design principle. -Next section of the goals document is having a canonical syntax for the data model. Parsing, validation. +Next section of the goals document is having a canonical syntax for the data model. Parsing, validation. -APP: The first part of what you said sounds like what we would use ICU MessageFormat. But the rest of what you said, sounds like more than that. +APP: The first part of what you said sounds like what we would use ICU MessageFormat. But the rest of what you said, sounds like more than that. DAF: I agree with canonical syntax. @@ -116,108 +117,105 @@ MIH: We should have a standard mapping to XLIFF. APP: +1 to XLIFF -RCA: This is an open PR, so please add any further comments there. https://github.com/unicode-org/message-format-wg/pull/77 +RCA: This is an open PR, so please add any further comments there. https://github.com/unicode-org/message-format-wg/pull/77 ### Establish the decision making process (#76 #58) https://github.com/unicode-org/message-format-wg/pull/76 - -DAF: It is time to decide some lightweight decision making process. There are already some comments on the PR. - -Preamble is nothing normative, just describes where we are. Describes consensus is lack of opposition, and voting is method of last resort. Chair group doesn't have decision making authority, the decision making authority is the monthly meeting. Process is intentionally lightweight, which is common in standards groups that are full of professionals. - -There were suggestions to define disciplinary actions against bad behavior. 
But so far, no such negative behavior, so no need to define just yet. Intended to be short and sweet. Anything can be resolved by consensus. - -APP: I like the intention and intentionality behind this. It captures the previous discussion. Only things I would like to mention are that Unicode already has descriptions of processes for recourse in difficult situations. - -You don’t need any of the formal stuff until you do, and then you really do need it. But if you don't already them pre-defined, then you open yourself up for being perceived as discrimination. - + +DAF: It is time to decide some lightweight decision making process. There are already some comments on the PR. + +Preamble is nothing normative, just describes where we are. Describes consensus is lack of opposition, and voting is method of last resort. Chair group doesn't have decision making authority, the decision making authority is the monthly meeting. Process is intentionally lightweight, which is common in standards groups that are full of professionals. + +There were suggestions to define disciplinary actions against bad behavior. But so far, no such negative behavior, so no need to define just yet. Intended to be short and sweet. Anything can be resolved by consensus. + +APP: I like the intention and intentionality behind this. It captures the previous discussion. Only things I would like to mention are that Unicode already has descriptions of processes for recourse in difficult situations. + +You don’t need any of the formal stuff until you do, and then you really do need it. But if you don't already them pre-defined, then you open yourself up for being perceived as discrimination. + DAF: I followed the link but it’s really defined for companies. It doesn’t have information about how to deal with members (?) - -APP: It’s uncommon in CLDR, ICU to vote. You always have that in your back pocket, just in case. I would never want to use it, but it is good to have it defined. 
- + +APP: It’s uncommon in CLDR, ICU to vote. You always have that in your back pocket, just in case. I would never want to use it, but it is good to have it defined. + GWR (on chat): I agree with Addison about being clear up front and not in the middle of a conflict - -DAF: Romulo says maybe we should separate decision making from disciplinary parts. That's point 7 of the Rules section of decision-process.md - + +DAF: Romulo says maybe we should separate decision making from disciplinary parts. That's point 7 of the Rules section of decision-process.md + RCA: I think we can establish it. - -DAF: I really tried to make it modular. The blacklist is defined, but we can have a separate document to describe how it is created and added to. - + +DAF: I really tried to make it modular. The blacklist is defined, but we can have a separate document to describe how it is created and added to. + Apart from the case of trolling or negative behavior, does the rest of it seem good to you? - + NIB: Just a minor comment; otherwise everything looks okay from my standpoint. - -RCA: What are the next steps? Preamble, etc. look good. As a group, what should we do to move this forward? - + +RCA: What are the next steps? Preamble, etc. look good. As a group, what should we do to move this forward? + DAF: I haven't heard anything that sounds like a "no-go" for this. - + DAF: Perhaps we need to create a document for the blacklist procedures. - + NIC: I agree. - + RCA: We should add that document for blacklist procedures. - + EAO (on chat): I like having an explicit definition of "consensus". Also helps that it's a good definition. - -STA: I like this doc. I see "group member" being used a lot. I don't know what the process is of becoming a member. What if someone joins and blocks our progress in a sustained way? - -DAF: We need membership guidelines. It should discuss the process of getting temporarily banned, etc. - -APP: We don't want to create a barrier to valid objections. 
We should look at ICU, CLDR, etc., where you can appeal to Unicode if anything goes wrong. - + +STA: I like this doc. I see "group member" being used a lot. I don't know what the process is of becoming a member. What if someone joins and blocks our progress in a sustained way? + +DAF: We need membership guidelines. It should discuss the process of getting temporarily banned, etc. + +APP: We don't want to create a barrier to valid objections. We should look at ICU, CLDR, etc., where you can appeal to Unicode if anything goes wrong. + SFC: Being part of an established organization gives us access to their practices: https://www.unicode.org/policies/policies.html - -As far as membership, when talking with Markus (Scherer) and Mark Davis, Unicode has companies and members. Non-company members are considered to be subject matter experts. If it gets to the point of conflict, we can bubble up concerns to the higher Unicode bodies. - + +As far as membership, when talking with Markus (Scherer) and Mark Davis, Unicode has companies and members. Non-company members are considered to be subject matter experts. If it gets to the point of conflict, we can bubble up concerns to the higher Unicode bodies. + In summary, I'm in favor of the decision making process, and we can link to Unicode policies for resolving issues, but we shouldn't link those two. - -MIH: I think it's important to contribute without becoming a member, in the spirit of contributing on Github. - + +MIH: I think it's important to contribute without becoming a member, in the spirit of contributing on Github. + APP (on chat): agree that we want community participation, but we need to address IP - + MIH: Being a part of ECMA-402 (JS Intl), when I moved from Adobe to Netflix, the change in company meant I was no longer a part of the committee. - -APP: We want to be clear on IP and stipulate that contributions are one-way. I distinguish between capital-M Member and lowercase-m subject expert member. 
- + +APP: We want to be clear on IP and stipulate that contributions are one-way. I distinguish between capital-M Member and lowercase-m subject expert member. + ECH: APP was pointing out that if we need, we can point to Unicode procedures. MFWG is under Unicode. SFC's previous comments covered my question well. - -DAF: My experience from OASIS is that any member can do wiki, contribute to the spec, whatever, but non-members can only do issues. Their PRs wouldn't be accepted. And issues are covered by the feedback license. - + +DAF: My experience from OASIS is that any member can do wiki, contribute to the spec, whatever, but non-members can only do issues. Their PRs wouldn't be accepted. And issues are covered by the feedback license. + SFC: Reiterate previous comments about definition of non-members as invited experts, and there is a procedure that helps cover concerns IP from them, which we can use if we need it. https://unicode.org/consortium/tc-procedures.html - + RCA: My question was about who can be a member? - -STA: I agree, this is an important topic, maybe we should have a separate document. Is this the same topic, or a different topic? - -DAF: I think membership topic is a different one from decision making topic. I can delete preamble from decision making doc and we can adopt it as the decision making doc. Does that sound viable? - -DAF: I will delete the preamble paragraph and merge the doc as our decision-making process. Is that okay? - + +STA: I agree, this is an important topic, maybe we should have a separate document. Is this the same topic, or a different topic? + +DAF: I think membership topic is a different one from decision making topic. I can delete preamble from decision making doc and we can adopt it as the decision making doc. Does that sound viable? + +DAF: I will delete the preamble paragraph and merge the doc as our decision-making process. Is that okay? 
+ (silence) - - - + ### Review Terminology (#78 #19 #30) - + https://github.com/unicode-org/message-format-wg/issues/78 - - + APP: How do you want to get feedback? - + ECH: Comments in the issue/PR. - + STA: Commit the file as .md and have people file PRs against it? - -SFC: What are people’s opinions about wiki vs. .md files, more generally, not just for this specific instance? I think it's easier and better supported to have markdown files in the repo. There are no reviews required and no logs when editing a wiki page. - + +SFC: What are people’s opinions about wiki vs. .md files, more generally, not just for this specific instance? I think it's easier and better supported to have markdown files in the repo. There are no reviews required and no logs when editing a wiki page. + RCA: I agree. - + APP: Using markdown files instead of the wiki avoids having wiki page editing wars. - + MIH: Also, we should document terms as we go. Sometimes it’s easier for someone not directly involved in a discussion to notice that the participants mean different things. Clarify and document then and there. - - + ### Why MessageFormat needs a successor (#49) -This topic was moved for the next meeting + +This topic was moved for the next meeting diff --git a/meetings/2020/notes-2020-05-18.md b/meetings/2020/notes-2020-05-18.md index 00d6979fea..901cd6a206 100644 --- a/meetings/2020/notes-2020-05-18.md +++ b/meetings/2020/notes-2020-05-18.md @@ -1,4 +1,5 @@ #### May 18 Attendees: + - Elango Cheran - Google (ECH) - Pablo Velez - Expedia (PAV) - George Rhoten - Apple (GWR) @@ -7,7 +8,7 @@ - Staś Małolepszy - Mozilla (STA) - Nicolas Bouvrette - Expedia (NIC) - Addison Phillips - Amazon (APP) -- David Filip - ADAPT Centre at Trinity College Dublin (DAF) +- David Filip - ADAPT Centre at Trinity College Dublin (DAF) - Eemeli Aro - OpenJSF (EAO) - Mihai Niță - Google (MIH) - Zibi Braniecki - Mozilla (ZBI) @@ -15,268 +16,269 @@ - Richard Gibson - OpenJSF (RGN) - Shane F. 
Carr - Google (SFC) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ## Next Meeting June 22, 10am PDT (6pm GMT) ## Agenda -- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/82) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/82) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) ## Updates from Chair Group Meetings + ECH: provided a refresher on TCQ which we discussed adding to our documentation. RCA mentioned that the chair group meeting would be moved the week after the non-chair meeting to keep better context around previous discussions. - + ### Goals and Non-Goals(Cont.)#59 + https://github.com/unicode-org/message-format-wg/issues/59 RCA: Speaker is STA - -STA: Thanks for Git comments. Vision conflicting with deliverables. Proposing 6 goals, high-level and generic, for the vision proposal to the working group. Next section 4 deliverables. For non-goals added statements of what we want to do instead. This document should be the summary of our thinking, but can be revised. But as a group agree on this direction. - + +STA: Thanks for Git comments. Vision conflicting with deliverables. Proposing 6 goals, high-level and generic, for the vision proposal to the working group. Next section 4 deliverables. For non-goals added statements of what we want to do instead. This document should be the summary of our thinking, but can be revised. But as a group agree on this direction. + ECH displaying PR. 
- -STA: The six goals are organized in 3 main groups: 1-2-3 around linguistics and structure, more grammatical expression and metadata in translation; 4-5 around industry goals and interoperability, deciding on what formats beyond XLIFF and enable integration with existing TMS; and 6 is a high level one, making the standard a building block. Something that can be used by others. - + +STA: The six goals are organized in 3 main groups: 1-2-3 around linguistics and structure, more grammatical expression and metadata in translation; 4-5 around industry goals and interoperability, deciding on what formats beyond XLIFF and enable integration with existing TMS; and 6 is a high level one, making the standard a building block. Something that can be used by others. + This was a summary of the goals. Next move to deliverables. - + ECH: Checking questions - + EAO: if we mention XLIFF we should also include existing formats. - -STA: Good point. We discussed backwards compatibility last time. - -EAO: Another option is to drop XLIFF from goals. - + +STA: Good point. We discussed backwards compatibility last time. + +EAO: Another option is to drop XLIFF from goals. + STA: This could be an option to avoid controversies. - -DAF: About relationship with XLIFF. Mapping required for roundtrip of localization. We dont want to be too specific, but mapping is critical. - + +DAF: About relationship with XLIFF. Mapping required for roundtrip of localization. We dont want to be too specific, but mapping is critical. + ECH: Asking DAF to make it a topic to discuss. - + ECH: Suggested bullet point changes. - + MIH: In the current proposal, merge 4-5 or just keep 5. XLIFF is about integrating with CAT tools. With 4 we solve 5. - -Another proposal, if we are missing today, to control formats (date, times). - + +Another proposal, if we are missing today, to control formats (date, times). + ECH: People have opinions. There is a blue button to discuss topics. 
- + APP: Think there’s a linkage between XLIFF and interchangeability with other formats. Agree with insertive formatting observation. - + RCA: We should leave XLIFF open. Even if we have a strong relationship with this format and even it becomes the primary format first, we should leave it open so as to get stuck with one format. - + ECH: Asked David to talk about XLIFF as a topic. - + NIC: Keep number 5 is enough. If we have to use XLIFF, fine, but more important to be compatible. - + ECH: Asking David to talk about this topic. - + DAF: Best way to discuss this topic is having the mapping. - + ECH: It seems there’s an agreement. As proposed by NIC to remove 4 and start with XLIFF. - + APP: XLIFF is good for a specific translation unit. But we may need to have a mapping with larger units, like plurals. XLIFF may not provide all resources needed. as we introduce these structures, how do we pass these new units into a TMS? - + GWR: Converting grammatical number handling not explicit as XLIFF, and APP is correct about being explicit. - -APP: The challenge is the mapping for everyone to use it the same way. - + +APP: The challenge is the mapping for everyone to use it the same way. + RCA: Is it worth mentioning another format (SSML) in the goals? Probably another topic. - + GMR: Maybe another output format. - + ECH: Let's keep that one for later. EAO is next. - + EAO: If we are talking about compatibility with XLIFF, we need to be explicit of this compatibility. We should aim/hope for backwards compatibility, not to get committed to this format. - + ECH: Next is MIH. - + MIH: We shouldn't debate too much now about XLIFF. We should keep goals high-level. Valid topic but not for now. - + STA: Conversion can be approached differently. Torned about 4 and 5. Lets keep mapping as a deliverable, this mapping to XLIFF is what will help people follow the use correctly. - + ECH: People to keep their comments tracked in the chat. We will try to add them to the doc. 
- + DAF: Representing OASIS XLIFF https://www.oasis-open.org/apps/org/workgroup/xliff/ and XLIFF OMOS https://www.oasis-open.org/apps/org/workgroup/xliff-omos/ TCs in Unicode. I believe the high-level goal is to complete localization roundtrip. TMS dont have interoperability, you dont want to deal on this one-by-one. Only way to achieve it is with XLIFF. The goal is to have a data model behind this roundtrip and I believe having an XLIFF mapping is critical. posted in chat a link https://galaglobal.github.io/TAPICC/T1/WG3/rs01/XLIFF-EM-BP-V1.0-rs01.xhtml This is extraction and merging guidance for content owners, basically how to make sure that you roundtrip your content including inline markup through an XLIFF 2 roundtrip - + ECH: next is a point of order from RCA. - + RCA: It seems points 4 and 5 are talking too much time. We should decide if to keep it like this or what to change to these points, so we can move with the other points. - + ECH: I agree. Skip clarifying resolutions. Moving to point 2 from STA. - + STA: Agree. Merge goals 4 and 5. We meet every month to find something usable for the localization roundtrip. Lets leave the mapping as a deliverable. The value of this deliverable is something that can be integrated with a TMS. - + ECH: This is your PR. Feel free to move forward and do the changes proposed. - -STA: Ok going to go ahead and make changes in PR. - -ECH: Clarify this is about making changes to 4 and keeping 5. - + +STA: Ok going to go ahead and make changes in PR. + +ECH: Clarify this is about making changes to 4 and keeping 5. + STA: 4 really comes 5 as MIH mentioned. So lets keep 5. - + DAF: Also replace CAT tools/TMS with L10n roundtrip. - + ECH: K we are moving to another topic as we have clarified the XLIFF topic. APP is next. - + APP: Too much talk about formats, not about functionality. We may need a goal about functionality and interoperability with APIs for example. - -STA: This was planned to be captured in deliverable 4. 
But I meant it vague enough as previously we were cautious about talking about specific APIs or what methods to use. - + +STA: This was planned to be captured in deliverable 4. But I meant it vague enough as previously we were cautious about talking about specific APIs or what methods to use. + ECH: What deliverables do you refer to with the previous? - + STA: Deliverable number 4 is about run-time specifications. Not sure if we want to go that far and should be left open. APP do you agree deliverable 4 is enough? - + APP: It does say specification but we need some number for acceptance criteria. We may have multiple implementations for ICU. - + GWR: From my experience this may not work with ICU. - + NIC: Agree with GWR. Not sure if this is the goal for this group. We should have a well documented standard, sandbox and library. The rest should be defined by other folks. - + ECH: Next is MIH. - + MIH: Not sure if its comments or not. We should do prototyping in different languages and see if it is working. But do we want to make this part of deliverables? - + ECH: Next is STA. - -STA: On the topic of API, we may want to implement a reference implementation. What is the expected outcome? Consider error scenarios. We need to be clear and definitive about how this works. - + +STA: On the topic of API, we may want to implement a reference implementation. What is the expected outcome? Consider error scenarios. We need to be clear and definitive about how this works. + ZBI: While I understand we’re not interested in reference implementation as a goal, but hope to have aspirational goals beyond API calls. Consider voice over systems. We should aspire to support multiple user-interface models. We need a quick loop of prototyping and verifying- beyond APIs. - + ECH: As STA goals are not set in stone. - + STA: Lets keep this modern ‘uses’ in mind. The first 3 goals should capture these models. Allow for a greater expression in natural language. 
- + ECH: We havent put a timebox for goals and non-goals. How much more time should we extend on this? RCA proposes 10 more minutes. - + Keep going. We need to close this today. - + RCA: Agree with ZBI. Keep final implementations as POCs driven by the community. We are responsible for a base implementation, but we are not responsible for maintaining these multiple implementations now- maybe later in our roadmap. - + ECH: APP commented lack of implementation is a problem. - + GWR: APP are on similar boats. We should have a single reference implementation. Just one programming language to prove it works. But it shouldn't be forced into ICU. It has a lot of limitations. About 3MB per language to handle grammatical properties and inflection mappings for example. This is a consideration not to be constrained by ICU, which may not want to handle all this data by default. - + APP: I’m interested in ICU as a vehicle, part of my Amazon framework. But although not restrained to it, I also dont see why we shouldnt consider it here. - -ECH: Not taking a position about ICU. - -ZBI: One way out is to have a POC (without saying it explicitly in the goals in one dynamic language). I prefer open source languages, but we could use others like from Apple. - + +ECH: Not taking a position about ICU. + +ZBI: One way out is to have a POC (without saying it explicitly in the goals in one dynamic language). I prefer open source languages, but we could use others like from Apple. + STA: On the topic of backwards compatibility. We could put it as an explicit goal. What does the team think? - + APP: Its easy forward compatibility, but we shouldnt look backwards. Dont want to be restricted that way. Better to have a way to convert all messages forward in a certain way. - + DAF: Agree with APP. We already had this discussion. The goal is to design it so its future proof. It doesnt belong in goals nor deliverables. - -ECH: We have 25m left. - + +ECH: We have 25m left. 
+ STA: Clarifying question- maybe used wrong terms. What I was aiming to put in the goals/deliverables, my idea was to add that messages could be migrated from format 1 to format 2. Is that forward or backward compatibility? - + ECH: What do you think about DAF that is more strategy than goal? - + STA: To consider it. - + ECH: MIH is next with POC as a deliverable for some specific languages. - + MIH: We may capture the POC idea in the deliverables. Lets get the bullets points right. - + STA: Agree. - + DAF: What is in scope is conformance implementation. Not full a POC- should be a 3rd party. - + APP: We should be looking for multiple implementations. - + DAF: We should encourage implementations by members and other stakeholders, even consider a certain number of implementations a success criteria, but implementations are not deliverables of the WG. - -APP: A success criteria is for us to demonstrate them. About backwards compatibility, I want to be able to do everything I used to do. - + +APP: A success criteria is for us to demonstrate them. About backwards compatibility, I want to be able to do everything I used to do. + MIH: We should document what were bad ideas for backwards compatibility. - + ECH: DAF asking about adding XLIFF as a future conversation. Anything to add now? - + DAF: XLIFF is a good topic for roundtrip feature discussions. - + ECH: Component test suite. Should we continue this one? - + STA: Clarification about POC and implementation. Should we aim to include one for us? Should this be a deliverable? - + APP: Are we going to talk or code something? - + RCA: We are here from different areas APP. We should have a final implementation to be our test field. The roadmap for this implementation can become a topic for the next meeting. We should move forward with the other topics. - + DAF: To summarize, a number of POC is a success criteria, but not developed by the working group. Should be developed by others. 
But the conformance test suite is in our scope and we should have this as a deliverable. - + STA: Agrees. Thanks for the clarification. - + ECH: Next is RCA. - + RCA: STA asked for notes from STA. - + STA: merge 4 and 5, change language to more relatable to loc roundtrip. Keep XLIFF out of goals for now. Next is to add a conformance test suite to the deliverables. One action item for the group is to capture our thoughts about backward compatibility- but to exclude from the document for now, only add later if this makes sense. - + MIH: Reminder about date and number formats to be added. - + STA: Is it a goal or a feature? - + MIH: Should be a separate bullet point. STA agrees. - + STA: It should be part of goals, not deliverables. - + RCA: We have to make changes to this document. Maybe we dont need to wait for the next meeting to merge changes to this document. We should actively review before the next meeting and have everything ready in Git. - + ECH: I agree. Think since STA has the doc and has the list of changes, should we make a vote now? Then wait for Git changes -instead of discussing the changes next meeting. - + STA: We could do something in between. Not too many changes planned for the doc, but we should try and do approval in Git- if no one complains before the end of May. - -DAF: Agree with STA. We should use this monthly meeting to make this decision. - + +DAF: Agree with STA. We should use this monthly meeting to make this decision. + ECH: Agree with STA and DAF. - + DAF: Use factoids instead of things as suggested by MIH. Terminology. - + MIH: Agree if this is the terminology. ‘Things’ is not an industry standard. - -DAF: Think they’re called factoids. - + +DAF: Think they’re called factoids. + MIH: ‘Formattables’ maybe? - + DAF: Formattbales is fine with me if people don’t recognize factoids as an industry term.. - + ECH: Consensus agreed with STA PR with agreed changes. And we agree with the possibility to disagree by the end of May. 
- -DAF: The goals PR will be in “call for dissent” until the end of month.. - + +DAF: The goals PR will be in “call for dissent” until the end of month.. + STA: To update PR by Wednesday so people have time to discuss before the end of month. - + RCA: This PR will become input for prioritization of scope and next parts of our roadmap. We should try by email to prepare the next meeting in advance. - + ECH: We are good with goals and non-goals! We have 5m left and 2 agenda topics. How do we want to spend them? - -MIH: about mesageformat, people should check PR and discuss next time. - -ECH: Ok multiple reasons to review PR. So the next item we can talk about is compatibility for design principles. - -RCA: Next week is chair meeting. Today was a good step for the next chair meeting and prioritizing the next steps. - + +MIH: about mesageformat, people should check PR and discuss next time. + +ECH: Ok multiple reasons to review PR. So the next item we can talk about is compatibility for design principles. + +RCA: Next week is chair meeting. Today was a good step for the next chair meeting and prioritizing the next steps. 
+ ### Why MessageFormat needs a successor#49 + https://github.com/unicode-org/message-format-wg/issues/49 -This topic was moved for the next meeting - +This topic was moved for the next meeting + ### Review Terminology #80 + https://github.com/unicode-org/message-format-wg/pull/80 This topic was moved for the next meeting diff --git a/meetings/2020/notes-2020-06-15.md b/meetings/2020/notes-2020-06-15.md index 396cf78517..c337770ce4 100644 --- a/meetings/2020/notes-2020-06-15.md +++ b/meetings/2020/notes-2020-06-15.md @@ -1,4 +1,5 @@ #### June 15 Attendees: + - Pablo Velez - Expedia (PAV) - Rafael Xavier de Souza - PayPal / OpenJSF (RXS) - Eemeli Aro - OpenJSF (EAO) @@ -6,7 +7,7 @@ - Zibi Braniecki - Mozilla (ZBI) - Staś Małolepszy - Mozilla (STA) - Nicolas Bouvrette - Expedia (NIC) -- Mihai Niță - Google (MIH) +- Mihai Niță - Google (MIH) - Elango Cheran - Google (ECH) - George Rhoten - Apple (GWR) - Ben Michel - OpenJS Foundation (BPM) @@ -14,107 +15,107 @@ - Maria Esteban - Expedia (MNE) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting -July 20, 10am PDT (6pm GMT) +July 20, 10am PDT (6pm GMT) ## Agenda + Why MessageFormat needs a successor [#49](https://github.com/unicode-org/message-format-wg/issues/49) Design Principles [#88](https://github.com/unicode-org/message-format-wg/issues/88), [#68](https://github.com/unicode-org/message-format-wg/issues/68), [#64](https://github.com/unicode-org/message-format-wg/issues/64), [#63](https://github.com/unicode-org/message-format-wg/issues/63), [#62](https://github.com/unicode-org/message-format-wg/issues/62), [#60](https://github.com/unicode-org/message-format-wg/issues/60), [#50](https://github.com/unicode-org/message-format-wg/issues/50) - -### Why MessageFormat needs a successor #49 + +### Why MessageFormat needs a successor #49 
https://github.com/unicode-org/message-format-wg/issues/49 RCA: Presenting agenda for the day. Asking ECH how to use the tool to ask questions/topics. - + RCA: MIH please drive the presentation. No timebox for this one, hope it's short. - + MIH: Trying to show PR as doc - ZIBI showed how it's done. - + MIH: Good comments received from STA and NIC. As a short intro about MessageFormat is detailed in the PR doc. - + What is important to discuss is the main four bullets regarding current problems with MessageFormat: - 1. No extension points: right now is too rigid. You need to get alignment from different ICU users to get changes in. This limitation makes it difficult to add and deprecate unrequired stuff. - 2. Can't remove anything, even we know is wrong: related to the last point above. - 3. Hard to map to existing loc core structures: not only TMS, but specifically on how parsing is done. Most systems pass the string as is and then translators can mess things up. TM leverage may work, but pluralization may be wrong. - 4. MessageFormat was meant to be used via an API: Not meant to be localized as it’s. Its limited in how many messages it can reference. The advantage is that it can be stored in any file format type. Along with some additional benefits listed in the doc, this advantage should be preserved. - + +1. No extension points: right now is too rigid. You need to get alignment from different ICU users to get changes in. This limitation makes it difficult to add and deprecate unrequired stuff. +2. Can't remove anything, even we know is wrong: related to the last point above. +3. Hard to map to existing loc core structures: not only TMS, but specifically on how parsing is done. Most systems pass the string as is and then translators can mess things up. TM leverage may work, but pluralization may be wrong. +4. MessageFormat was meant to be used via an API: Not meant to be localized as it’s. Its limited in how many messages it can reference. 
The advantage is that it can be stored in any file format type. Along with some additional benefits listed in the doc, this advantage should be preserved. + No more of a presentation, I want to have a discussion. RCA: so far no questions in TCQ. - + STA: About the second bullet, it's actually a benefit if you see this as a standard. Its not meant to be modified easily. - -MIH: We could list these benefits, but not sure it belongs in this document. - + +MIH: We could list these benefits, but not sure it belongs in this document. + STA: From my perspective it would make sense to include it. MIH: Tried to go one level up and focus on core problems. - + RCA: Sorry skipped DAF. - + DAF: its an advantage from a standard, but dont think is MIH’s point. The problem is the old form is 20 years old. - -Things were not designed modularly, nowadays, things are designed modularly. The core needs to be stable, but you want to be able to deprecate or add features. - + +Things were not designed modularly, nowadays, things are designed modularly. The core needs to be stable, but you want to be able to deprecate or add features. + MIH: I had a chat with ECH before the meeting. We thought it would be possible for the standard to have placeholders with flags that can be changeable (for dates, times, numbers). - -The core itself doesn't change, but the catalog over time evolves. Unicode does it -- BCP47 (for locales) hasn't changed in a long time, but the IANA registry (of valid locale subtags) does - + +The core itself doesn't change, but the catalog over time evolves. Unicode does it -- BCP47 (for locales) hasn't changed in a long time, but the IANA registry (of valid locale subtags) does + MIH: Maybe a registration mechanism for these placeholders? - -ECH: I am just agreeing with STA's suggestion that a list of specific problems in MessageFormat would be useful in this same document. The document, as it is, does well to list out the high-level problems and explanations. 
But specific problems could be useful for future reference. - + +ECH: I am just agreeing with STA's suggestion that a list of specific problems in MessageFormat would be useful in this same document. The document, as it is, does well to list out the high-level problems and explanations. But specific problems could be useful for future reference. + STA: we should define a core and extend it when possible. Maybe going ahead, but how this is phrased it looks like we can MessageFormat 2 because we cant change MessageFormat 1. - -MIH: If we have a registry next to the standard, we can flag/tag if something is deprecated vs what is recommended. Some things relate to the structure, and those are interesting. Otherwise, we can use the registry. - + +MIH: If we have a registry next to the standard, we can flag/tag if something is deprecated vs what is recommended. Some things relate to the structure, and those are interesting. Otherwise, we can use the registry. + STA: This might be a way to improve the wording. But the localization industry moves very slowly. Something to consider. - -MIH: Notification notice more for developers. Not for loc industry. - + +MIH: Notification notice more for developers. Not for loc industry. + STA: Understood. - -DAF: You’re saying you grandfather instead of deprecate, right? But talking about design goals, the primary consensus should be forward compatibility. Backwards compatibility should be in the backburner. This is all connected -- modular design, etc. Modular design allows you to create something to make something forwards-compatible. This can be achieved with a modular design - system of principles. - + +DAF: You’re saying you grandfather instead of deprecate, right? But talking about design goals, the primary consensus should be forward compatibility. Backwards compatibility should be in the backburner. This is all connected -- modular design, etc. Modular design allows you to create something to make something forwards-compatible. 
This can be achieved with a modular design - system of principles. + This is why XLIFF 2 cant be compatible with the previous versions. - + MIH: In the doc I was approaching why we need a new format, not the design principles. But we could merge point 1 and 2, would it make more sense? - -NIC: Difficult to predict if there’s a breaking change. Do we want to implement this way or another more hacky way? It's a judgment call, some may be breaking changes. We could also consider versioning as a way to distinguish between feature sets that may require breaking changes. - + +NIC: Difficult to predict if there’s a breaking change. Do we want to implement this way or another more hacky way? It's a judgment call, some may be breaking changes. We could also consider versioning as a way to distinguish between feature sets that may require breaking changes. + RCA: Are we going to timebox this topic? We are missing some parts of the design principle. We need consensus on the document from MIH. I propose 10 more min and if there’s a conclusion, merge the PR. - + STA: I think these points tie into the design principles well. - + MIH: What does the group think it's actionable items now? Merge 1 and 2? And do you think points from STA to real examples where MessageFormat is wrong, should it be mentioned in this document? - + RCA: We don't have a voting mechanism. A +1 in the chat would do. - + STA: Jumping to my reply on TCQ. To wrap up my comments, I saw point 2 as mistakes made, but MIH just used them as a high-level reference, right? - + MIH: Yes. - + STA: How about merge point 1 and 2 in one about modular design. And then add point 2 to list mistakes in the old MessageFormat. - + MIH: Mistakes were captured in the other bullets. But I will try to organize this better and list the mistakes in one bullet point. - + STA: Thank you. - + MIH: Let's move the discussion to GitHub. Team said +1 to the proposal. - + DAF: Agree. - + RCA: Deadline for this as we did for goals. 
What should be the timeline for this topic? - + DAF: Propose postponing merging until next meeting, but discuss in Github. - + RCA: Agree with DAF. Next is STA. ## Design Principles #88, #68, #64, #63, #62, #60, #50 @@ -128,129 +129,128 @@ RCA: Agree with DAF. Next is STA. [Design Principles #50](https://github.com/unicode-org/message-format-wg/issues/50) STA: About Design principles, first thanks for comments on goals vs non-goals. I listed 6 goals of the group, 5 design principles and expanded on 9 non goals. - + The idea to move to design principles is to help us understand how we are going to do it and decisions to make. Especially when we have to make trade-offs. - + I was looking at them before through the lens of Fluent and syntax. Initially I was looking at design principles from this perspective. But the discussion with goals, it made me realize we thought in terms of syntax but we want to be more focused on the data model (ex. UML diagrams). A description of the data model. - + -STA presenting doc (issue #50) with principles- - + Does anyone have any comments? - + MIH: Modularity with backwards compatibility should be merged. But your principles should still apply. - + GWR: In regards to backwards/forward compatibility, how you name things is hard. It took a decade for example for Hebrew on how to manage gender for numbers, when the noun is there or not. - + This is an example on how it's going to be difficult to name these concepts. Not sure how to work with this. - + Finnish is another language with complex casing. - + MIH: This part on names, we could abstract it or remove it to the part that changes in the standard. We could manage this with the placeholder for changes, including plurals, cases, like in the example from GWR. I wouldn't try to code all grammatical features in the code standard. - + RCA: Reply from ECH. - -ECH: We have a set of features we want to support. How do we pass this as data and how design our data model to represent it. 
These are separate things. - + +ECH: We have a set of features we want to support. How do we pass this as data and how design our data model to represent it. These are separate things. + STA: The discussion about design principle is difficult because it is abstract. Lets take computational vs manual as an example. - + One extreme example: one dictionary with all possible grammatical cases. Then you need a data model for it, but it manages all possibilities. - -But on the other extreme (manual): You don't have this dictionary, it's up to the translator to identify the cases he/she needs. - -MIH: The standard would provide a placeholder with an id to the string it refers to. Also a place for the developer to specify the genitive form. Then a machine, human can be plugging in these variations. For example, maybe I know how to determine genitive forms for Slavic languages, but when it comes to Finnish, I have no idea, so I have to give it to translators. - + +But on the other extreme (manual): You don't have this dictionary, it's up to the translator to identify the cases he/she needs. + +MIH: The standard would provide a placeholder with an id to the string it refers to. Also a place for the developer to specify the genitive form. Then a machine, human can be plugging in these variations. For example, maybe I know how to determine genitive forms for Slavic languages, but when it comes to Finnish, I have no idea, so I have to give it to translators. + STA: In this view the data model only describes the declaration for the specific case. - -DAF: Rule-based machine translation seems what STA was referring to with the computational example. And this is not really feasible to perform for all rules for all languages. - + +DAF: Rule-based machine translation seems what STA was referring to with the computational example. And this is not really feasible to perform for all rules for all languages. + STA: Agree with DAF and that's how Fluent was developed. 
But I was impressed with how Siri replied to me with correct nouns in my language. - + GWR: The way Siri does it is with a library to manage all extreme cases. For some languages it would use machine-learning models, like with Russian. But sometimes it does the same as Fluent - hardcoding multiple solutions to come up with one. - -Example with ‘Cometa’ in Spanish as it can be masculine or feminine depending on the context. - -Translators are not linguists. Software engineers will not know how to reference either. - + +Example with ‘Cometa’ in Spanish as it can be masculine or feminine depending on the context. + +Translators are not linguists. Software engineers will not know how to reference either. + Not an easy solution. It's a hard topic, but I recommend we should have a default set and be fluid to modify as needed. - + RCA: Next is EAO. - -EAO: This looks like a process we can go on forever. Let's choose one specific thing, try to support it with a PR, and then see how people respond and which possibilities close off as a result. How many selectors should we have in MessageFormat2? Fluent has only 1. - + +EAO: This looks like a process we can go on forever. Let's choose one specific thing, try to support it with a PR, and then see how people respond and which possibilities close off as a result. How many selectors should we have in MessageFormat2? Fluent has only 1. + RCA: Clarifying question. Should we do a list of items to start working on? And then from this build the syntax model to solve these problems, discuss during the meetings. - + EAO: Let’s pick up at least one dimension, discuss and agree on the principle as we go. - + STA: We have spent 3 months discussing abstract topics like goals vs non-goals. We need to move to tangible actions, for example with the principles. - + MIH: GWR covered what I wanted to say. Both extremes have challenges - trying to plan for all grammatical variations or give a lot of flexibility. 
Agree with GWR to define a set of what we know we can describe and use it to go forward. -I have an idea on how to move forward: I have a data model proposal that captures Fluent, MessageFormat and Facebook standard. We take some of the features we want and we try to plug it in to try and see if the data model works, and if not, we can iterate on the data model. We can leave aside the features that relate to syntax (escaping, double new line). - +I have an idea on how to move forward: I have a data model proposal that captures Fluent, MessageFormat and Facebook standard. We take some of the features we want and we try to plug it in to try and see if the data model works, and if not, we can iterate on the data model. We can leave aside the features that relate to syntax (escaping, double new line). + STA: I like that, I think it would work. - + NIC: Agree, we can get lost with all grammar and language-specific issues. The key is not to corner ourselves, but try to start using something- echo STA. And yes, we need design principles. - + RCA: Next is DAF. - + DAF: Computational vs manual not the right axis to start with. We can look at a different axis / dimension to start off with. Forward compatibility should be the first principle. Backward compatibility is a different, more complicated issue. We should make progress in the design principles, before deep-diving in grammatical challenges. - + RCA: Any reply to DAF? - -RCA: We are aware of the principles we want to go after first - they are tied to our goals. We should start moving forward like proposed by MIH, explore what he proposed and do as EAO proposed, deep-dive in one issue. - + +RCA: We are aware of the principles we want to go after first - they are tied to our goals. We should start moving forward like proposed by MIH, explore what he proposed and do as EAO proposed, deep-dive in one issue. + Start working on the model and capture issues in Github for further discussions and jump into the work. 
- + EAO: We need someone to call out the first question we want to focus on. - + RCA: Lets vote in the last 15m on the topic we want to tackle first. - + MIH: Should we start with the data model proposed a while ago in the slides? for example message level selectors. As a translator, some of the flags would be read-only, others you can change the value or you could select the value from a list of options. - + STA: About forward/backward compatibility. Valuable to have different approaches to design principles. It's important to have some compatibility strategy as part of this modular strategy for the standard. DAF can you expand on the roundtrip principle? - + DAF: When you think about the language matrix, everyone most likely starts from English as the starting language. But if you take Chinese, Finnish you’re introducing way more parameters from the beginning. - + For the roundtrip principle, I can give a short presentation on the capabilities of the localization formats to protect content grouping/ segmentation. - + STA: I would like to hear this presentation. - + RCA: ECH is back - -ECH: It looks like we were discussing a design principle first, but then we switched to implementing a feature to test a data model. Both are different things. - + +ECH: It looks like we were discussing a design principle first, but then we switched to implementing a feature to test a data model. Both are different things. + If we do features, it will be more tangible and help inform the design principles discussion. - -MIH: DAF proposed capabilities from existing systems. I'm going to share an introduction for engineers and am going to coordinate with DAF if we could present together. - + +MIH: DAF proposed capabilities from existing systems. I'm going to share an introduction for engineers and am going to coordinate with DAF if we could present together. + DAF: Yes, we can try and present at the next meeting. 
- -MIH: 190 grammatical features listed in a site I shared with the team in the chat- this is the link: - + +MIH: 190 grammatical features listed in a site I shared with the team in the chat- this is the link: + https://wals.info/languoid Examples of axes: https://wals.info/feature - + RCA: What should be next steps? First pick a feature and use the proposed data model to see if it works. - -STA: Yes, a concrete example. - + +STA: Yes, a concrete example. + RCA: Vague but it's the next step. Then is design principles that can help guideline the concrete example we need to identify. - + In the next chair meeting we should identify this topic. - + EAO: As long as we pick something, it's fine if we can delegate the choosing of the example with someone in the team. - + RCA: I will share some options, we can discuss offline and then we decide for our next meeting. - + And GWR can you please share more of your concerns? - + GWR: I have something to present for the Unicode conference. I will get back to the team if I can share something here. - -RCA: MIH and DAF lets coordinate if you guys want to present at the next meeting. + +RCA: MIH and DAF lets coordinate if you guys want to present at the next meeting. [Full discussion and chat notes](https://docs.google.com/document/d/1-zfWS829ciB96F6qioxaIOuqrNfw7IO6UlNckaSGS_0/edit?usp=sharing) -## Next Action Item +## Next Action Item Find a concrete message format use case and start prototyping (to be discussed further int the next char meeting). 
- diff --git a/meetings/2020/notes-2020-07-20.md b/meetings/2020/notes-2020-07-20.md index 69640d0d45..8c6517141f 100644 --- a/meetings/2020/notes-2020-07-20.md +++ b/meetings/2020/notes-2020-07-20.md @@ -1,4 +1,5 @@ #### July 20 Attendees: + - David Filip - ADAPT Centre @ Trinity College Dublin (DAF) - Romulo Cintra - CaixaBank (RCA) - George Rhoten - Apple (GWR) @@ -8,237 +9,232 @@ - Staś Małolepszy - Mozilla (STA) - Maria Esteban - Expedia (MNE) - Pablo Velez - Expedia (PAV) -- Mihai Niță - Google (MIH) +- Mihai Niță - Google (MIH) - Eemeli Aro - OpenJSF (EAO) - Elango Cheran - Google (ECH) - Rafael Xavier de Souza - PayPal / OpenJSF (RXS) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting -August 17, 10am PDT (6pm GMT) - +August 17, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/97) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started -### Moderator: David Filip +### Moderator: David Filip ## Pull Requests Status (10min) -RCA: first thing I want to share is the Chair Group meeting notes. We should start closing and merging PRs. First PR is the one by ECH is #80 on glossary and terms. - +RCA: first thing I want to share is the Chair Group meeting notes. We should start closing and merging PRs. First PR is the one by ECH is #80 on glossary and terms. + ECH: Yes, I agree it is useful since we will start discussing details in the proofs-of-concept soon. - + RCA: Next one is Why Message Format needs a successor. - + ZBI: I am okay with landing this, I have already reviewed it. - -RCA: Let's close at end of meeting. 
Next is #94 add examples of copy which requires variants (placeholders). - + +RCA: Let's close at end of meeting. Next is #94 add examples of copy which requires variants (placeholders). + ZBI: One thing is that I want to hear from STA about whether we want to support two independent selectors. - + STA: (chat) Yes please! My understanding was that we start with something simple now To have an excuse to discuss design principles - -ZBI: I see STA's comment. I will try to address this during my presentation. - -George: I had a question about independent selectors. Ex: "The light is one" vs "The lights are on". The whole sentence changes, there are - -ZBI: I will try to address this in my presentation. The data model should encompass representing all that info. - -RCA: Please let's start using the queue for questions. +ZBI: I see STA's comment. I will try to address this during my presentation. + +George: I had a question about independent selectors. Ex: "The light is one" vs "The lights are on". The whole sentence changes, there are + +ZBI: I will try to address this in my presentation. The data model should encompass representing all that info. + +RCA: Please let's start using the queue for questions. ## Concrete message format use case - Selectors / Placeholders POC (15min) RCA: Lets timebox for 15 minutes before starting the proof of concept. But there’s a discussion in PR 97. Someone wants to share more details? - + MIH: I have POC code that is public, but I have not yet sent the links to everyone. (chat) https://github.com/mihnita/msgfmt_experiments - + RCA: If we have extra time after the two presentations, we can discuss it. - -EAO: One thing. The fundamental questions: wrt. selectors, should we have a model for specific well-known selectros or flexibility to apply some sort of function to the input that chooses the selector. MF1 goes with the option of going with an explicit list, Fluent goes the other way of having a function. - + +EAO: One thing. 
The fundamental questions: wrt. selectors, should we have a model for specific well-known selectros or flexibility to apply some sort of function to the input that chooses the selector. MF1 goes with the option of going with an explicit list, Fluent goes the other way of having a function. + It allows the discussion to be far more flexible since we don't have to decide up-front all of the possibilities of available options. - + DAF: We should move to Zibi’s presentation. - + ECH: Not sure how much my presentation can answer EAO but maybe after it could be answered. - -EAO: Let's move forward. +EAO: Let's move forward. ## Zibi's Experimental AST Overview (30min) -- *Repo*: https://github.com/zbraniecki/message-format-2.0-rs/tree/master -- *Slides*: https://docs.google.com/presentation/d/1nBnWv3nQQnS0zMkM5qsIE6f5zki3YDHXR-hdxJo1Pc0/edit?usp=sharing - -ZBI: it's a fairly small presentation. It's intended to show you what we used when we designed Fluent and applied to the scope we are discussing now. Will not address rationale in this presentation. How this came about is a separate question. - +- _Repo_: https://github.com/zbraniecki/message-format-2.0-rs/tree/master +- _Slides_: https://docs.google.com/presentation/d/1nBnWv3nQQnS0zMkM5qsIE6f5zki3YDHXR-hdxJo1Pc0/edit?usp=sharing + +ZBI: it's a fairly small presentation. It's intended to show you what we used when we designed Fluent and applied to the scope we are discussing now. Will not address rationale in this presentation. How this came about is a separate question. 
+ Problem scope: Took them from the goals of the project Added one point: all data should be formatted explicitly or implicitly - + Simplest AST: Message with single value It can later have metadata/other information Pattern: vector of elements Elements can be textual or placeholders If textual is an expression - + ZBI: Questions at the end - + Handling variants: Either single or multi variants Variants composed of Variant and Key Variants can be strings or numbers Fluent does not try to limit the structure of the AST/data model, but we can discuss this later. - + Two problems when designing; first: If you have variants and a selector for these variants, you should be always find a value even if a matching variant cannot be found Localization is a best effort, we should always return a value - no blanks We should do everything possible to save message that could indicate to the user what could be done We need a concept of default variant (fall-back) -In Fluent we added a default field in the variant, it seems natural place +In Fluent we added a default field in the variant, it seems natural place You could also use the last variant as the default But how you encode this default is open question - + Next question, is about how to handle multi selectors In Fluent we handle this with multi- expressions Currently, the expressions and variants can be nested It's challenging to read a nested selected of multi-expressions -We don't want to allow every placeholder to be a part of the selection process. That can lead to nested expressions +We don't want to allow every placeholder to be a part of the selection process. That can lead to nested expressions In option 2 multi selectors moved to the beginning of the message All options are in Git, issue 6 - + So I think the main questions to answer are: default branches, multiple selectors, and uneven branches in multi-variants (?). RCA: 10m left - -ZBI: The question about uneven branching is something we should think about. 
Maybe we decide that we don't want to support the granularity of preference for default selection when values are missing. In fact, Fluent only supports one selector right now, and we are still successfully localizing all of Firefox, a large project. So the question is how ambitious do we want to be with the model vs how simple? Questions? - + +ZBI: The question about uneven branching is something we should think about. Maybe we decide that we don't want to support the granularity of preference for default selection when values are missing. In fact, Fluent only supports one selector right now, and we are still successfully localizing all of Firefox, a large project. So the question is how ambitious do we want to be with the model vs how simple? Questions? + STA: (chat) Zibi, thank you for incorporating my feedback to the experiment you presented. I'd like to clarify that the Single/Multi split isn't how the actual Fluent AST is designed. I feel it's important to point it out as the question of whether the selection logic should happen outside or inside expressions has often come up in the recent discussions. - + MIH: Clarifying question, one thing message format does is predefining plurals. If selector has value other, that's the default, no need for star. if you have multiple selector, it becomes messier. Internally we implemented option 3 and that's a pain in the… I'm advocating for option 2. - + About the default, one benefit of a non-star approach is that in our localization tools translators can't add new variants. the tool would add these variants. For example, gender. In English you can use one ‘other’, same for plural. This is done by copying the value from ‘other’, then translators would work with ‘other’ to make it work for his/her language. - -Having an explicit one helps tools processing, not humans. - + +Having an explicit one helps tools processing, not humans. + Where can we add these options? - + ZBI: You can add those after ECH question. 
- + RCA: Next is NIC. And we should extend this Q&A for 5m. - + ECH: ok with 5 more minutes. - -NIC: Related to MIH and how this would fit with a TMS. How do you know if a selector is missing in the tool? - -ZBI: We solved it with terms in one level. But the proposal here is that we are going to have all functions available to translators to have metadata with what you can pass and what it would return. This would allow any CAT/TMS to return the right value to translators. - + +NIC: Related to MIH and how this would fit with a TMS. How do you know if a selector is missing in the tool? + +ZBI: We solved it with terms in one level. But the proposal here is that we are going to have all functions available to translators to have metadata with what you can pass and what it would return. This would allow any CAT/TMS to return the right value to translators. + The variant model from Fluent has been tested in reality. But not yet the proposal about metadata. - + RCA: Next is GWR. - + GWR: My recommendation it can't be string literal, it should be formattable. So we would want to pass the actual numerical value, not just the string representation, because you may want to format differently (ex: compact decimal format). - -ZBI: We would like to do this in the real world. Example with numeral fixer to correct decimals. And agree it shouldn't be a string but it should be formattable. SFC and I are looking to pass FixedDecimal instead of just a primitive numerical type to get much better precision. I'm trying to tap into what ECMA-402 is doing. - -GWR: Second part. If we are formatting a value, are we ok for it to have a string in spoken form in addition to the print form? - + +ZBI: We would like to do this in the real world. Example with numeral fixer to correct decimals. And agree it shouldn't be a string but it should be formattable. SFC and I are looking to pass FixedDecimal instead of just a primitive numerical type to get much better precision. 
I'm trying to tap into what ECMA-402 is doing. + +GWR: Second part. If we are formatting a value, are we ok for it to have a string in spoken form in addition to the print form? + ZBI: This is something not in scope in this experiment. We do it in Fluent,to have explicit forms for spoken and written. And this is a good follow up to this experiment. - + RCA: We need to progress. Next topic is from STA. - + ZBI: STA is on mute today. What I presented today is not how Fluent does it, but how I envision it. - + ECH: I wonder if spoken vs. print can be handled via selectors/placeholders. - + EAO: We are going to need this. When a human looks at a message like the one shown by ZBI, we need all the values for the nested config to make sense. But this is not good from a dev perspective. - -ZBI: We do believe if we have a separate model for humans vs machines, we should optimize in a different way. - + +ZBI: We do believe if we have a separate model for humans vs machines, we should optimize in a different way. + STA: doesn't see a strong case for numbers and literal decimals. (chat) Re. numbers, I think the case for having decimal literals is not very strong. Perhaps I haven't seen a use-case for them yet. All literals that I've seen used in Fluent were integers used as variant keys for exact matches. (Exact matches on decimals or floats can be problematic in some impls) - -MIH: same as before about decimals. It doesn't work well if you allow the number of fractional digits to be decided by the developer. This should be decided by the formatter, not the developer. - -ZBI: Thank you for your time. +MIH: same as before about decimals. It doesn't work well if you allow the number of fractional digits to be decided by the developer. This should be decided by the formatter, not the developer. + +ZBI: Thank you for your time. 
## Elango POC in Rust (30min) -- *Repo*: https://github.com/zbraniecki/message-format-2.0-rs/pull/8/files -- *Slides*: https://docs.google.com/presentation/d/1SYUNBoBtIxRnfvdAy8IXBXVQvUxdxIO4I6rquuO-zO0/edit#slide=id.g8c6a179f79_0_9 +- _Repo_: https://github.com/zbraniecki/message-format-2.0-rs/pull/8/files +- _Slides_: https://docs.google.com/presentation/d/1SYUNBoBtIxRnfvdAy8IXBXVQvUxdxIO4I6rquuO-zO0/edit#slide=id.g8c6a179f79_0_9 ECH: It's in PR 8 and diagram included in Google Docs. - + I also have some diagrams with changes based on what ZBI has done. My message pattern resembles what ZBI presented. - + Terminology is called a placeholder, in some cases it can be called an expression like in the Fluent presentation. For me a placeholder is something that holds on into some sort of content. Instead of having functions, I have types; example ‘PlaceholderTypes= Gender, plural’. - + I put default values in text. The message base is what relates a single message with a message group. This relates to variants as we Fluent. - + Something different, instead of a single selector or variant keys, I was trying to solve for a case when you have two independent selectors. I came with a Placeholder Values Map, which is composed of key values. - + I used this Map as the actual key for all messages involved. This was my thing. In the next iteration of the diagram (current + todo), the values map is not part of the message base but outside of it as a separate object. - + Further changes (slide 3), messagebase is only holding to message pattern, we could get rid of it. The value map should contain any values at run time. - + Code is included in the PR. Questions? - + DAF: Reading question from STA. How is this mapped to message base patterns? - + ECH: Message base are templates. See example in line 489 in message groups, when count is equal to other, select this. You can see here multiple placeholder options. From 488 to 492. 
- + MIH: Ids and locales, I don't think they belong at this level. Id is metadata level, like comments. And if you take the locale out, you would avoid great pain. If you take them out, the message pattern can be simplified. Its just a thought. - + Is it really a message group like in Okapi? Or is just a selector group? - + ECH: Agree, I need a better name for it. - + NIC: Do you have examples of multiselectors? - + ECH: I didn't have time to get to that. But I can update the PR with this later. But it should be straightforward, following option two from ZBI. - + NIC: So multiple defaults approach? - + ECH: Correct. - + MIH: About the multiselectors, I posted in the chat something that can help and is missing in your presentation. We don't know in your example if this a plural or what. - + ECH: Should you make a copy of the placeholders? Each of the patterns in the message map could have multiple placeholders. Maybe having a copy could be useful, but now I think its redundant. It make more sense if its part in the pattern and I wrote this as part of the code - see line 295 of my code. - + MIH: No, you don't know this. From the placeholder you can't get all the deciding factors. - + ECH: Agree, there’s room to extend the code to support this. - + DAF: lets move to general questions. - + ZBI: Thank you ECH and MIH comments are very useful. We have 3 possibilities: In Data model we specify gender, plural, etc which closes the scope for localization features, makes the system more predictable ECH model allows for more flexibility Fluent treats all selector the same way, treating gender/plural as any other function - + DAF: We need to move to next steps. - + ZBI: Open versus closed is the most important question to decide in the next couple weeks. - -DAF: This question seems the one to be decided. - + +DAF: This question seems the one to be decided. + RCA: Please create issues in the repo with decisions pending and link them to your repo. - + RCA: We are done. 
Everyone who raised a decision point, create an issue so it can be reviewed during the chair meeting. diff --git a/meetings/2020/notes-2020-08-17.md b/meetings/2020/notes-2020-08-17.md index d8c74e9688..c93f0b0360 100644 --- a/meetings/2020/notes-2020-08-17.md +++ b/meetings/2020/notes-2020-08-17.md @@ -1,9 +1,10 @@ #### August 17 Attendees: + - George Rhoten - Apple (GWR) - Romulo Cintra - CaixaBank (RCA) - David Filip - ADAPT Centre @ Trinity College Dublin (DAF) - Nicolas Bouvrette - Expedia (NIC) -- Mihai Niță - Google (MIH) +- Mihai Niță - Google (MIH) - Eemeli Aro - OpenJSF (EAO) - Elango Cheran - Google (ECH) - Zibi Braniecki - Mozilla (ZBI) @@ -11,252 +12,254 @@ - Shane F. Carr - Google (SFC) - Rafael Xavier de Souza - PayPal / OpenJSF (RXS) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting September 21, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/108) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started - -#Moderator : Mihai Nita - + +#Moderator : Mihai Nita + # Chair Group Announcements - + ## Moving calendar events to the Unicode calendar - + ## Task group within Chair Group - -RCA: We are creating small groups to discuss specific issues, responsible for documenting discussion and results and report to monthly meeting. Ex: MIH ZBI ECH on their MFWG proofs-of-concept. Haven't decided yet how to represent tracking progress on a dashboard. How do people think? [signs of agreement] Most people in attendance were in the chair group, so no surprise. 
- + +RCA: We are creating small groups to discuss specific issues, responsible for documenting discussion and results and report to monthly meeting. Ex: MIH ZBI ECH on their MFWG proofs-of-concept. Haven't decided yet how to represent tracking progress on a dashboard. How do people think? [signs of agreement] Most people in attendance were in the chair group, so no surprise. + # Presentation on MIH's proofs-of-concept for example problem (#93/#94) - -MIH: Wanted to do something differently from ZBI and ECH who wrote implementations/proofs-of-concept in Rust. I have an implementation in Java using Protobuf and Java using JSON. I think in terms of the data model, it is similar is to ZBI's. - -Data model diagram - there are placeholders that occur in the middle of messages. A placeholder consists of a name, type, and map of flags. I don't want to focus on the syntax of how I've represented placeholder examples, it's really just a reflection of the data model that I'm proposing. Flags are open-ended -- they might indicate: where to find the replacement text, information about grammatical parts of speech, sub-formatting patterns ("skeletons"). - -SelectorMessage - a message with multiple branches (ex: plural, gender). Consists of switches and cases. Switches declare the arguments used for selection. Cases indicate the possible branches (message patterns) from which a selection occurs. There is another field called "extras" at the SelectMessage level that - -Notes on SelectorMessage: Doesn't address the open issues: 1) establishing the default case/branch/pattern 2) how is selection done? (ex: sub-question: what happens when 2+ case conditions are satisfied by the selection arg values?). Placeholder types are like interfaces/functions because they define how formatting should be done. Therefore, they are not "free for all" functions. Types and number of args should be defined ahead of time. But they should be extensible by the user. 
- -Comparison with ZBI's presentation / proof-of-concept data model. Similar. ZBI's Expression corresponds to my placeholder. VariantCase corresponds to Case. InlineExpression corresponds to Switch. - + +MIH: Wanted to do something differently from ZBI and ECH who wrote implementations/proofs-of-concept in Rust. I have an implementation in Java using Protobuf and Java using JSON. I think in terms of the data model, it is similar is to ZBI's. + +Data model diagram - there are placeholders that occur in the middle of messages. A placeholder consists of a name, type, and map of flags. I don't want to focus on the syntax of how I've represented placeholder examples, it's really just a reflection of the data model that I'm proposing. Flags are open-ended -- they might indicate: where to find the replacement text, information about grammatical parts of speech, sub-formatting patterns ("skeletons"). + +SelectorMessage - a message with multiple branches (ex: plural, gender). Consists of switches and cases. Switches declare the arguments used for selection. Cases indicate the possible branches (message patterns) from which a selection occurs. There is another field called "extras" at the SelectMessage level that + +Notes on SelectorMessage: Doesn't address the open issues: 1) establishing the default case/branch/pattern 2) how is selection done? (ex: sub-question: what happens when 2+ case conditions are satisfied by the selection arg values?). Placeholder types are like interfaces/functions because they define how formatting should be done. Therefore, they are not "free for all" functions. Types and number of args should be defined ahead of time. But they should be extensible by the user. + +Comparison with ZBI's presentation / proof-of-concept data model. Similar. ZBI's Expression corresponds to my placeholder. VariantCase corresponds to Case. InlineExpression corresponds to Switch. + I think the expressions are problematic. - + ZBI: Can you explain why it's a problem? 
- + MIH: I think recursivity is hard for people to wrap their heads around. - -ZBI: InlineExpression cannot contain another expression. So there is no recursivity allowed. Fluent currently allows that recursive / nested use of expressions, but my presented proposal removed that aspect to address the simplicity concern. - -MIH: Let me show you an example of a SelectorMessage in the Java+Protobuf code. I am using the example from the ICU MessageFormat documentation. My implementation works the same as ICU4J MessageFormat, as you can see the side-by-side test assertions in the unit tests. - + +ZBI: InlineExpression cannot contain another expression. So there is no recursivity allowed. Fluent currently allows that recursive / nested use of expressions, but my presented proposal removed that aspect to address the simplicity concern. + +MIH: Let me show you an example of a SelectorMessage in the Java+Protobuf code. I am using the example from the ICU MessageFormat documentation. My implementation works the same as ICU4J MessageFormat, as you can see the side-by-side test assertions in the unit tests. + I think we can make some faster progress by hammering out discussions in small group discussions, as opposed to short text blurbs here and there. - -RCA: Before discussing, I agree with the idea of small task forces. I would like the people who take the initiative to take the effort and present. But I would like to emphasize that it is open and anyone is welcome to join and participate. But I think having people being more focused, through these task forces, would be helpful. - -Action item here: who wants to join - + +RCA: Before discussing, I agree with the idea of small task forces. I would like the people who take the initiative to take the effort and present. But I would like to emphasize that it is open and anyone is welcome to join and participate. But I think having people being more focused, through these task forces, would be helpful. 
+ +Action item here: who wants to join + EAO: I'm interested - + GWR: I'm also interested. - + RCA: EAO will you help organize? - + EAO: Sure - + RCA: Who will be a backup? - + MIH: I can be a backup - + EAO: If you are interested in joining, include your name in the minutes doc: - + EAO MIH GWR ECH (for note-taking) - + RCA: Let's go to discussion using TCQ. - -GWR: Can a selector have multiple grammemes (grammatical category values), like definite, plural and genitive? It seems that it could be difficult to represent multiple - + +GWR: Can a selector have multiple grammemes (grammatical category values), like definite, plural and genitive? It seems that it could be difficult to represent multiple + MIH: I don't think the data model prevents us from doing that. - -GWR: There will be a lot of grammatical info (ex: prepositions, articles). A lot of gramemes that we will want to combine, potentially. Not 1 or 2, but 3, 4, or 5. - + +GWR: There will be a lot of grammatical info (ex: prepositions, articles). A lot of gramemes that we will want to combine, potentially. Not 1 or 2, but 3, 4, or 5. + RCA: (chat) could we have a test suite for these combinations ? - + MIH: The slide on placeholders (slide #3) shows that you can have multiple arbitrary annotations. - -GWR: What about multiple cases. "The light is on", "The lights are on", etc. Whether it's singular/plural, masculine/feminine, whether the word begins with a vowel, etc. Have you thought about how to handle it? - -MIH: Yes, somewhat. You have plural, gender, cases, etc. But the issue is that some placeholders in one part of the sentence affect the decision in another part of the sentence, and they need to match/agree. That's why I think message patterns should be translated at the full sentence level. - -GWR: What about issues like Hebrew where the message depends on more than one noun, it can take two nouns. - -MIH: There is nothing stopping you from putting 5 gender selectors of however many you need. 
It might mean you have a lot of counting - -RCA: MIH you are also discussing and moderator. We have a point of order to timebox this discussion. - + +GWR: What about multiple cases. "The light is on", "The lights are on", etc. Whether it's singular/plural, masculine/feminine, whether the word begins with a vowel, etc. Have you thought about how to handle it? + +MIH: Yes, somewhat. You have plural, gender, cases, etc. But the issue is that some placeholders in one part of the sentence affect the decision in another part of the sentence, and they need to match/agree. That's why I think message patterns should be translated at the full sentence level. + +GWR: What about issues like Hebrew where the message depends on more than one noun, it can take two nouns. + +MIH: There is nothing stopping you from putting 5 gender selectors of however many you need. It might mean you have a lot of counting + +RCA: MIH you are also discussing and moderator. We have a point of order to timebox this discussion. + ECH: What is the timebox? - + EAO: 5 minutes. - -GWR: About the gender labels male/female, I request that we use "masculine" and "feminine" to match CLDR. I recommend that we use RBNF from CLDR - -ZBI: Do you want to support variables in placeholder arguments, and function calls on multiple variables? If you look at the slide #3 in your presentation, I want to look at the flags in a placeholder. In Fluent, an Expression allows you to have the type string, number, or another variable. That allows you to pass multiple arguments to a formatting function. - + +GWR: About the gender labels male/female, I request that we use "masculine" and "feminine" to match CLDR. I recommend that we use RBNF from CLDR + +ZBI: Do you want to support variables in placeholder arguments, and function calls on multiple variables? If you look at the slide #3 in your presentation, I want to look at the flags in a placeholder. 
In Fluent, an Expression allows you to have the type string, number, or another variable. That allows you to pass multiple arguments to a formatting function. + Do you think this is something we should not support, or have you just not considered it yet? - -MIH: I haven't considered it yet. Not sure yet if we should or shouldn't support that. - -RCA: Create a test suite. We should create a set of tests to support all of the functionality that we want. We should of course - -GWR: I would love to do that. The question is, what is the syntax? - + +MIH: I haven't considered it yet. Not sure yet if we should or shouldn't support that. + +RCA: Create a test suite. We should create a set of tests to support all of the functionality that we want. We should of course + +GWR: I would love to do that. The question is, what is the syntax? + ECH: Could we encode examples similar to what SMY did in issue #94 to create examples that drove our proofs-of-concept work? - + # Other topics - + ## Updates on proofs-of-concept - + RCA: What is the status of proofs-of-concept work? - + ECH: No progress since last time. - -EAO: On this topic, there is issue #101, which is the draft sketch of another POC. If people would comment there, I would be interested in taking that forward. - + +EAO: On this topic, there is issue #101, which is the draft sketch of another POC. If people would comment there, I would be interested in taking that forward. + MIH: I need more examples to help me parse this better. - -ECH: I thought this proposal was compatible with previous proofs of concept. I think it goes into detail about - + +ECH: I thought this proposal was compatible with previous proofs of concept. I think it goes into detail about + ## Discussion on issue #103 Do we allow multiple multi-select messages to nest inside one another? - + RCA: Can we discuss this further, there was some activity. - -MIH: I think this is an example of something that is better discussed in smaller groups. 
But if we have extra time and people are interested, then we can discuss right now. - + +MIH: I think this is an example of something that is better discussed in smaller groups. But if we have extra time and people are interested, then we can discuss right now. + EAO: Let's talk about this for a bit. - + RCA: Let's give this a 10 min timebox. - -EAO: I don't think we can leave out concatenation from being a possibility. The language itself needs to make it possible. You can get so much more compression happening if you - -MIH: Is the concern here data size? or the developer having to type too much? - -EAO: All of it. There are some cases where it would help from having concatenation being possible, instead of being impossible. - -MIH: I tried to give an example of selection in the middle of a placeholder vs. selection for the whole message. What I am arguing is that it is possible to convert between the two forms algorithmically. So we are not losing information either way. Yes there may be wasted space and clunkier for the developer, but it is equivalent. - + +EAO: I don't think we can leave out concatenation from being a possibility. The language itself needs to make it possible. You can get so much more compression happening if you + +MIH: Is the concern here data size? or the developer having to type too much? + +EAO: All of it. There are some cases where it would help from having concatenation being possible, instead of being impossible. + +MIH: I tried to give an example of selection in the middle of a placeholder vs. selection for the whole message. What I am arguing is that it is possible to convert between the two forms algorithmically. So we are not losing information either way. Yes there may be wasted space and clunkier for the developer, but it is equivalent. + I pointed out that lots of TMS / localization concerns only happen at the level of a full sentence, let alone being much less complex for professional translators. 
- -EAO: The conversion is quadratic. And some use cases are not about professional translators and TMSes, only developers are involved. - + +EAO: The conversion is quadratic. And some use cases are not about professional translators and TMSes, only developers are involved. + MIH: Do we agree that we can convert between these 2 forms algorithmically? - + EAO: Yes - -MIH: Okay, that is core. I laid out 4 ways of representing this in syntax. We can choose how we balance between these two viewpoints of representation, but the nice point is that we can represent them the same way in memory according to the data model. - + +MIH: Okay, that is core. I laid out 4 ways of representing this in syntax. We can choose how we balance between these two viewpoints of representation, but the nice point is that we can represent them the same way in memory according to the data model. + RCA: Any comments or recommendations? - + MIH: Do we at least agree that all of these 4 options are equivalent? - + EAO: I don't think they reduce the problem space in any way, because they cover the entire problem space. - + MIH: I think they do, in a way, if we agree that they are equivalent, because it means that the data model is not wrong. - + EAO: I think I addressed this in a comment somewhere else by saying that if we have a function that is free of side effects, no matter how many times it is called. - -GWR: I think I understand, if you are converting a message with an article (it can be definite, indefinite), and then if you have a plural, the choice of article can affect the plural. But these choices need to be compatible (grammatically correct). - + +GWR: I think I understand, if you are converting a message with an article (it can be definite, indefinite), and then if you have a plural, the choice of article can affect the plural. But these choices need to be compatible (grammatically correct). 
+ EAO: Yes, we need to make sure that there is no implicit state that we depend on when doing the conversion. - -RXS: For what it is worth, it is not clear to me how EAO's point relates to the question at hand. For me, MIH's 4 options are clear, but it is not clear what EAO is describing. - + +RXS: For what it is worth, it is not clear to me how EAO's point relates to the question at hand. For me, MIH's 4 options are clear, but it is not clear what EAO is describing. + EAO: Perhaps this discussion is besides the point. - + RCA: I propose that we leave comments on the issue, since the outcome doesn't seem to be boolean in the way that I was thinking. - + MIH: Well, maybe the outcome is boolean, but it requires more discussion. - + ECH: I think this is an instance of the more generalized action of creating task forces based on topics. - + ## Discussion on issue #105 How do we support default case selection values in a multi-select message? - + RCA: This issue seems resolved, no? - -MIH: No, I think this is like one of those, where we need to discuss further. And this goes back to the design principles that SMY was rightfully wanting to think about. I think these principles will emerge after we discuss. - + +MIH: No, I think this is like one of those, where we need to discuss further. And this goes back to the design principles that SMY was rightfully wanting to think about. I think these principles will emerge after we discuss. + RCA: I think we agreed that we will discuss these things in parallel. - + MIH: Yes, they are parallel and connected. - + ## Unicode Conference presentation - -ECH: We are presenting. Uni Conf is actually happening. Oct 14-16. Currently MIH, ZBI, and ECH as speakers. But any suggestions or requests or feedback is welcome. - + +ECH: We are presenting. Uni Conf is actually happening. Oct 14-16. Currently MIH, ZBI, and ECH as speakers. But any suggestions or requests or feedback is welcome. 
+ ZBI: The more we have to present, the more feedback we have from the audience. - + GWR: I am presenting on Siri's message formatting system also at the upcoming Unicode Conference. - + RCA: We can also do a checkpoint for people who are presenting to review their presentation progress. - + # Sign ups for task forces: - -ECH: How do we organize? Do we use email? - + +ECH: How do we organize? Do we use email? + RCA: Since you're asking, can you? - -ECH: Sure, I can take responsibility for organizing the small group. How do I send the message out to everyone who is interested? - -RCA: We have the Slack channel, and I think most people are on it. - -ECH: Not everyone is on the slack channel. - + +ECH: Sure, I can take responsibility for organizing the small group. How do I send the message out to everyone who is interested? + +RCA: We have the Slack channel, and I think most people are on it. + +ECH: Not everyone is on the slack channel. + RCA: we can invite people to the Slack channel. - -ECH: We still need email addresses to invite people to Slack. Please leave your email addresses if you are not on the Slack channel. We can add you to the Slack channel and/or organize over email -- however we decide is best. - + +ECH: We still need email addresses to invite people to Slack. Please leave your email addresses if you are not on the Slack channel. We can add you to the Slack channel and/or organize over email -- however we decide is best. + MIH: I think we need some kind of prioritization. These are too many :-) - - + # Sign up for email addresses of anyone who isn't on the MFWG Slack: - - + ## #98: Support variable info not in message patterns + MIH NIC RCA SMY - + ## #99: Design Principle: Allow (or not) functions as data in data model? + MIH NIC RCA - + ## #101 Proposal: Use input mapping functions for case selection + EAO MIH NIC RCA - + ## #103: Do we allow multiple multi-select messages to nest inside one another? 
+ MIH NIC RCA SMY - + ## #104: How do we handle and represent selection for a multi-select message? + EAO MIH GWR @@ -264,28 +267,31 @@ NIC ECH (for note-taking) RCA SMY - + ## #105: How do we support default case selection values in a multi-select message? + EAO MIH NIC RCA SMY - + ## #106: How do we support multiple selection args (selectors) in a multi-select message? + MIH NIC RCA SMY - + ## #107: Can we treat selectors and placeholders similar somehow, or must they be distinct? + MIH NIC RCA SMY - + ## #28 Bidi support in placeholders + MIH NIC RCA - diff --git a/meetings/2020/notes-2020-09-21.md b/meetings/2020/notes-2020-09-21.md index 3ae53f2372..62509a5725 100644 --- a/meetings/2020/notes-2020-09-21.md +++ b/meetings/2020/notes-2020-09-21.md @@ -1,4 +1,5 @@ #### September 21 Attendees: + - George Rhoten - Apple (GWR) - Romulo Cintra - CaixaBank (RCA) - David Filip - ADAPT Centre @ Trinity College Dublin (DAF) @@ -10,157 +11,149 @@ - Richard Gibson - OpenJSF (RGN) - Zibi Braniecki - Mozilla (ZBI) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting October 19, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/115) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started - + ###Moderator : Romulo Cintra - + ### Chair Group Announcements - - - + ### Test Cases - -GRH: These test cases support our use cases for Siri, and encodes features that we support. For example, in Spanish, the definite article applies to the “and” of the last item in a list. This is true in Italian and Korean. 
- + +GRH: These test cases support our use cases for Siri, and encodes features that we support. For example, in Spanish, the definite article applies to the “and” of the last item in a list. This is true in Italian and Korean. + NIC: In CLDR, there is data to support this particular case of the last item of item list. - -GRH: I’m not sure that the data exists for CLDR, for example for Korean, and that there’s an issue. In addition, there is information about how to pronounce a number. Russian has the ultimate complexity -- you have grammatical gender, grammatical number, and grammatical case, in addition to the plural rules that CLDr defines. All those factors combine that make grammatical agreement challenging. - + +GRH: I’m not sure that the data exists for CLDR, for example for Korean, and that there’s an issue. In addition, there is information about how to pronounce a number. Russian has the ultimate complexity -- you have grammatical gender, grammatical number, and grammatical case, in addition to the plural rules that CLDr defines. All those factors combine that make grammatical agreement challenging. + This test syntax allows intermixing of different namespaces, ex: using the SSML namespace to define certain thing, like this section only applies in a “spoken” or “written” context. - -STA: Thanks for elaborating test cases with the Russian example, Polish is in the same category. Would that definition be a part of our data model or would be an external thing? - -GRH: Within Siri, we have a semantic feature model, as opposed to a “data model”. We have semantic feature concepts. In Spanish, “cometa”, it matters whether it is masculine or feminine, depending on the gender, it can either mean “kite” or “comet”. We have labels and categories of that label. The label - + +STA: Thanks for elaborating test cases with the Russian example, Polish is in the same category. Would that definition be a part of our data model or would be an external thing? 
+ +GRH: Within Siri, we have a semantic feature model, as opposed to a “data model”. We have semantic feature concepts. In Spanish, “cometa”, it matters whether it is masculine or feminine, depending on the gender, it can either mean “kite” or “comet”. We have labels and categories of that label. The label + STA: In teh context of MF as a standard, do you think as a group, we should limit ourselves to how to call into that data, or should we be encoding how to store that data? - -GRH: That’s a longer discussion. I think Fluent does this. - -STA: In Fluent, we wanted one solution to solve both cases, where for example, we store that this is a genetive case. Which is why Siri is interesting to me. - -GRH: Within Siri, resolving ambiguous inputs is done in an external - + +GRH: That’s a longer discussion. I think Fluent does this. + +STA: In Fluent, we wanted one solution to solve both cases, where for example, we store that this is a genetive case. Which is why Siri is interesting to me. + +GRH: Within Siri, resolving ambiguous inputs is done in an external + EAO: On this topic, we should do both. - + DAF: On the topic of namespaces, I wanted to sh - -ECH: Are we trying to store all of this information? GRH mentioned dictionaries, TM, MT. If we try to make it possible to reference all of these different semantic variants, then we end up with a sentence-generation engine. Rather than message formatting. (ECH - PLEASE REVIEW) - + +ECH: Are we trying to store all of this information? GRH mentioned dictionaries, TM, MT. If we try to make it possible to reference all of these different semantic variants, then we end up with a sentence-generation engine. Rather than message formatting. (ECH - PLEASE REVIEW) + GRH: In message formatting, there will always be issues with noun and adjective inflection. We need something that’s functional and can cover it. A lot of translators are not able to get sentences into grammatical agreement in the current systems. 
- + NIC: I agree, it does seem more similar to NLG than simple formatting. No systems support this today. My concern is that solving it for all cases and all languages is going to be huge. I don’t see how it could ship as part of the platform. But maybe as something that apps could inject? - + GRH: I’m not saying this is easy. - -RCA: I have a similar question. Should we support similar semantic behaviour on MF? - -GRH: Siri and Fluent (to some extent) do it. So at least two solutions. We should discuss it. - -STA: We should put this as a topic on the agenda for whether and how data is stored and ships along with the model. We’re talking about a huge repository of data. - -ECH: Also, there are licensing issues - what should it be? And where should it be stored? Wikidata (public domain license)? - -DAF: I think we all agree that it is an orthogonal problem, and that the repository is outside of the data model. I would like to see the ability of the data model to tap into the repository or especially tap into CLDR. But I think this is orthogonally related. There is a demand for semantic relation in the word. - -GRH: CLDR has already stated that they don't want this data because of this fine grain level of detail and this quantity of data. Even if we don't have all of the data, at least we can make things functional. But I’m okay with the default approach we have currently of just specifying - - -EAO: This sounds like it is a strong argument that the core of MF should be flexible for anything it does. I believe that the core should not contain any data. If the data comes, it won’t come immediately, and when it comes, it will be a huge payload. But no matter what, the structure will not be fixed, it will have breaking changes. So if we include data, then it means we will be talking about MessageFormat for a long time, as MF v3, v4, v5, etc. But what we can do is define functions for transformations that take a few or many arguments that perform what we want. 
- -STA: I like the idea that we can tap into CLDR and have the data defined somewhere else. But what we’ve seen with Fluent is that we only need this semantic information and inflect for brand names. If you have text that appears in the UI, and you are translating it, you just write out the string. We only needed inflections for brand names because they change. Siri makes me see a different use case because nouns come from the user. Maybe we can have a common set of nouns and inflections. Maybe we can define how the data is stored, so that developers ship a data model of messages and a data structure of data. - -GRH: You’re right, we can ship inflections of brand names. But for inflections, we would have to ship code to do it, and sometimes it has to be done algorithmically, - + +RCA: I have a similar question. Should we support similar semantic behaviour on MF? + +GRH: Siri and Fluent (to some extent) do it. So at least two solutions. We should discuss it. + +STA: We should put this as a topic on the agenda for whether and how data is stored and ships along with the model. We’re talking about a huge repository of data. + +ECH: Also, there are licensing issues - what should it be? And where should it be stored? Wikidata (public domain license)? + +DAF: I think we all agree that it is an orthogonal problem, and that the repository is outside of the data model. I would like to see the ability of the data model to tap into the repository or especially tap into CLDR. But I think this is orthogonally related. There is a demand for semantic relation in the word. + +GRH: CLDR has already stated that they don't want this data because of this fine grain level of detail and this quantity of data. Even if we don't have all of the data, at least we can make things functional. But I’m okay with the default approach we have currently of just specifying + +EAO: This sounds like it is a strong argument that the core of MF should be flexible for anything it does. 
I believe that the core should not contain any data. If the data comes, it won’t come immediately, and when it comes, it will be a huge payload. But no matter what, the structure will not be fixed, it will have breaking changes. So if we include data, then it means we will be talking about MessageFormat for a long time, as MF v3, v4, v5, etc. But what we can do is define functions for transformations that take a few or many arguments that perform what we want. + +STA: I like the idea that we can tap into CLDR and have the data defined somewhere else. But what we’ve seen with Fluent is that we only need this semantic information and inflect for brand names. If you have text that appears in the UI, and you are translating it, you just write out the string. We only needed inflections for brand names because they change. Siri makes me see a different use case because nouns come from the user. Maybe we can have a common set of nouns and inflections. Maybe we can define how the data is stored, so that developers ship a data model of messages and a data structure of data. + +GRH: You’re right, we can ship inflections of brand names. But for inflections, we would have to ship code to do it, and sometimes it has to be done algorithmically, + EAO: Sometimes, it’s not even possible, in Finnish, people can define how it is defined, like if a word is used a person’s name, they can define how the inflection is pronounced. - + STA: In Fluent, we solve this by defining how to degrade gracefully. If the data is available, we use it. If not, we default to the “other” variant which uses an auxiliary noun. If the gender of $userName is not known, the “other” variant reads “The user $userName did something”, and now the gender of the subject of that sentence (“the user”) is known. - -DAF: I think everyone agrees that it is a repository type of data. I understand that if CLDR does want this data, then it can be stored somewhere else. 
It’s not a part of the message nor the payload, but if it is stored elsewhere, it can be tapped into. - -DAF: I think there is no end to this. I think it is wrong to not expand CLDR and think that it is has solved all - + +DAF: I think everyone agrees that it is a repository type of data. I understand that if CLDR does want this data, then it can be stored somewhere else. It’s not a part of the message nor the payload, but if it is stored elsewhere, it can be tapped into. + +DAF: I think there is no end to this. I think it is wrong to not expand CLDR and think that it is has solved all + GRH: I don’t think CLDR is a good fit, and that there should be a separate repository of lexicon data. - -RCA: Point of order - we should define how “external” (data) could be injected or used within MF. Also, I agree with STA’s proposal to create a task force around talking about such a repository. - -ECH: On the topic of lexical data, my talk in the upcoming Unicode Conference is related to that, at least adjacent to it. My talk also describes the challenges for languages like Tamil. When I think about that, what does “common” mean for “common names”? “Common” is relative to whom? Also, I think we want to be flexible because approaches right now for Tamil appear to be data-based rather than algorithm-based, and it’s not easy to come up with the algorithms, but it is doable. But ICU BreakIterator and normalization don’t handle splitting up abugida script data, and are we even thinking about it? We already agree that the problem is big, and I think it is even bigger than what we even realize. - - -DAF: If we need to support multiple namespaces, there are options for JSON namespace support. We started using underscores in the JSON keys as a way of simulating namespaces, in the same way that XML uses hyphens. - + +RCA: Point of order - we should define how “external” (data) could be injected or used within MF. 
Also, I agree with STA’s proposal to create a task force around talking about such a repository. + +ECH: On the topic of lexical data, my talk in the upcoming Unicode Conference is related to that, at least adjacent to it. My talk also describes the challenges for languages like Tamil. When I think about that, what does “common” mean for “common names”? “Common” is relative to whom? Also, I think we want to be flexible because approaches right now for Tamil appear to be data-based rather than algorithm-based, and it’s not easy to come up with the algorithms, but it is doable. But ICU BreakIterator and normalization don’t handle splitting up abugida script data, and are we even thinking about it? We already agree that the problem is big, and I think it is even bigger than what we even realize. + +DAF: If we need to support multiple namespaces, there are options for JSON namespace support. We started using underscores in the JSON keys as a way of simulating namespaces, in the same way that XML uses hyphens. + RCA: Any suggestions to be more specific or reword this issue on Github? - - -### Taskforce Efforts for #103 - - + +### Taskforce Efforts for #103 + MIH: Presenting [Slides](https://docs.google.com/document/d/1-6t6Yl5RHZI9QZwBDrFrl1fqSKSA4IMs1ef60IxD3lU/edit#) - + The main discussion was about whether we want to support whole message selection, or “in-message” selection - we allow the selection of pieces of the message. - + We didn’t end up with a clear decision because we are not authorized to make a decision and the point is to present the information to the main group to have everyone review and decide. - + The pros and cons are somewhat related, since the pros for one can be the cons for the other. - -The pros for message-level selectors are that it is friendly for translators b/c context exist outside of individual words and phrases. This is compatible with localization (l10n) tools. It makes it easy for implementers of the MFWG data model. 
Cons are that it is unfriendly of developers because it is verbose. (creators of messages) Verbosity also affects bandwidth of data over the wire. Also, developers (message authors) must decide whether to make a message part of a select even if the source language “doesn’t need it” (example, the source language has a single plural category, like Chinese, and a message has a number placeholder). - + +The pros for message-level selectors are that it is friendly for translators b/c context exist outside of individual words and phrases. This is compatible with localization (l10n) tools. It makes it easy for implementers of the MFWG data model. Cons are that it is unfriendly of developers because it is verbose. (creators of messages) Verbosity also affects bandwidth of data over the wire. Also, developers (message authors) must decide whether to make a message part of a select even if the source language “doesn’t need it” (example, the source language has a single plural category, like Chinese, and a message has a number placeholder). + NIC: Can we also add “scalability” to the cons of message-level selectors? - -MIH: Is that the same thing as verbosity over the wire? Let’s get through the rest and discuss. - -The sub-selector approach is friendly for developers, but unfriendly for translators. It is harder to grep for text or integrate with TMSes (l10n tools). - + +MIH: Is that the same thing as verbosity over the wire? Let’s get through the rest and discuss. + +The sub-selector approach is friendly for developers, but unfriendly for translators. It is harder to grep for text or integrate with TMSes (l10n tools). + One other important point is that either form (full-message selectors and sub-message selectors) is equivalent to the other, so we can convert back and forth between them. - - - -RCA: Let’s open 10 mins for discussion, questions, and doubts. In the last 5 minutes, if you have a decision, then we can come to a decision together. 
- + +RCA: Let’s open 10 mins for discussion, questions, and doubts. In the last 5 minutes, if you have a decision, then we can come to a decision together. + EAO: I do not think that we will come to a decision on which form to have (full-message vs sub-message selector), but I think we should agree that we must ensure that we have processing so that every message can be converted into full message form internally. - -MIH: I don’t know what it means that we cannot come to a decision. Either form can be converted to the other, so they’re the same. - + +MIH: I don’t know what it means that we cannot come to a decision. Either form can be converted to the other, so they’re the same. + EAO: But it means that if the message is written in in-message selectors, then round tripping will not be possible if it gets converted to full-message selectors, it gets translated, and it’s not clear how to convert it back to the in-message selectors. - -ZBI: Round-tripping is necessary for version control systems. If a translated message of a sub-message selector message comes back with full-message selectors, then it will change in version control. - + +ZBI: Round-tripping is necessary for version control systems. If a translated message of a sub-message selector message comes back with full-message selectors, then it will change in version control. + EAO: This is a real-world scenario, and I don’t know how to resolve this. - -DAF: I do think we discussed this in the task force, and I think STA was a proponent for what EAO was saying. I was saying both things, allow developers to use syntax of in-message selectors, but it will always be represented externally using full-message selectors. I personally think that this is the only way to make it translator-friendly and TMS-friendly and to work with l10n tools. I think this addresses both points. - + +DAF: I do think we discussed this in the task force, and I think STA was a proponent for what EAO was saying. 
I was saying both things, allow developers to use syntax of in-message selectors, but it will always be represented externally using full-message selectors. I personally think that this is the only way to make it translator-friendly and TMS-friendly and to work with l10n tools. I think this addresses both points. + RCA: Did we capture both points that EAO brought up? - + DAF: We did address this in the discussion because STA brought this up. - + STA: I have trouble remembering, but I felt that after realizing the shortcomings of the sub-message selectors, I viewed the full-message selectors as preferable to solve those problems. - -RCA: I’m not sure if we fully agree, but perhaps we can continue the discussion offline somehow. - + +RCA: I’m not sure if we fully agree, but perhaps we can continue the discussion offline somehow. + ECH: Let’s have a followup task force session to have people who disagree to join and discuss. - -MIH: I have a feeling that, as a group, we may not fully agree. But I don’t know how to break the deadlock. - -STA: This is a good point to separate personal opinions from objective merits. It would be good to dig into the use cases of round-tripping (ex: version control) and see if there are objective reasons to support or reject an option, and we can investigate in the meantime. - -DAF: (from chat) In reply to STA: I would support a layered approach. At the syntax level, we can support the nested selector approach in syntax and then have full-message messages when sending off to translation. + +MIH: I have a feeling that, as a group, we may not fully agree. But I don’t know how to break the deadlock. + +STA: This is a good point to separate personal opinions from objective merits. It would be good to dig into the use cases of round-tripping (ex: version control) and see if there are objective reasons to support or reject an option, and we can investigate in the meantime. + +DAF: (from chat) In reply to STA: I would support a layered approach. 
At the syntax level, we can support the nested selector approach in syntax and then have full-message messages when sending off to translation. In reaction to STA, the conversion would not be necessarily lossy. In L10n we work with this paradigm of extraction and merging. L10n happens between these brackets. And it is always assumed that merger has the full knowledge of the extraction process. - -RCA: I will send an invite for the next task force for next week. And then push the regular chair group meeting to the following week. Anyone who wants an invite to the task force meeting, let me know. - + +RCA: I will send an invite for the next task force for next week. And then push the regular chair group meeting to the following week. Anyone who wants an invite to the task force meeting, let me know. + EAO: I would like to be invited to the meeting. - -RCA: The task force meeting started an hour earlier in order to allow for an extra hour of discussion. I will schedule the next task force meeting in the same way. + +RCA: The task force meeting started an hour earlier in order to allow for an extra hour of discussion. I will schedule the next task force meeting in the same way. 
diff --git a/meetings/2020/notes-2020-10-19.md b/meetings/2020/notes-2020-10-19.md index edc5e5864e..a674fb95af 100644 --- a/meetings/2020/notes-2020-10-19.md +++ b/meetings/2020/notes-2020-10-19.md @@ -1,4 +1,5 @@ #### October 19 Attendees: + - Romulo Cintra - CaixaBank (RCA) - Nicolas Bouvrette - Expedia (NIC) - Staś Małolepszy - Google (STA) @@ -14,53 +15,47 @@ - Zibi Braniecki - Mozilla (ZBI) - Eemeli Aro - OpenJSF (EAO) - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting November 16, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/115) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started - -###Moderator : - + +###Moderator : + ### Chair Group Announcements - + ## Review Taskforce progress for #103 (Taskforce notes) - + Taskforce notes: https://docs.google.com/document/d/1lAyBZR2VQR8ILqvcg5Gad_wf7QWUbsoJ13wGZFSmtbE/edit#heading=h.tulel52cgapk - -ECH : Presenting taskforce notes, on the taskforce meeting we decided to evaluate the pros & cons based, we decided to have a more diverse group of stakeholders to help us find - - - + +ECH : Presenting taskforce notes, on the taskforce meeting we decided to evaluate the pros & cons based, we decided to have a more diverse group of stakeholders to help us find + ## MFWG - Stakeholder definition and debate - + RCA: This was to follow up from the task force meeting to discuss how we bring in stakeholders that help us decide on the message selector question. - + NIC: Do we have an issue for that? 
- -RCA: We didn't discuss doing this last meeting, but maybe it makes more sense to discuss a little bit in this meeting -- who to bring in and - + +RCA: We didn't discuss doing this last meeting, but maybe it makes more sense to discuss a little bit in this meeting -- who to bring in and + NIC: Some of the action items from the task force #2 (regarding issue #103) meeting notes: - + Collecting stakeholders Listing all categories of stakeholders Inviting more representatives from stakeholder categories Goal is collect information that will help us decide on priorities among the categories of stakeholders - - Stakeholders (by type) Developers (single language & i18n) @@ -79,211 +74,206 @@ Users (want something grammatically correct) Copywriting UX -* Check what other communities are doing in terms of stakeholder categories as an action item +- Check what other communities are doing in terms of stakeholder categories as an action item MIH: There are at least 2 types of translators: translators who are professional translators, you pay the bills by doing this all day long, you get paid by the word, and "community translators" which includes unpaid/amateur translators and multilingual developers. -ZIB: To the distinction between professional and amateur translators, we're not the first community to encounter this distinction. The list of categories of stakeholders and translators must be a superset of the categories that individual companies would need. +ZIB: To the distinction between professional and amateur translators, we're not the first community to encounter this distinction. The list of categories of stakeholders and translators must be a superset of the categories that individual companies would need. NIC: Should we add developer communities as a stakeholder? ZIB: Maybe we should look at how other organizations categorize their translators, organizations like W3C or TC39, instead of trying to invent categories ourselves. 
-DAF: I want to chime in and agree that "translator" is not a single category. We should further subdivide professional translators. Some professional translators are employed full time by companies or government agencies, and they are able to go deep on their subject area. There are freelance translators who still pay their bills through translation and work for different companies or through vendors, and they still might be able to volunteer their services. Then there are people who are terminology experts (terminologists), language specialists, etc. For all these people, the GUI and filtering capabilities, the variability of syntax, etc. are a huge barrier. As MIH mentioned, they don't want to be slowed down by new syntax. +DAF: I want to chime in and agree that "translator" is not a single category. We should further subdivide professional translators. Some professional translators are employed full time by companies or government agencies, and they are able to go deep on their subject area. There are freelance translators who still pay their bills through translation and work for different companies or through vendors, and they still might be able to volunteer their services. Then there are people who are terminology experts (terminologists), language specialists, etc. For all these people, the GUI and filtering capabilities, the variability of syntax, etc. are a huge barrier. As MIH mentioned, they don't want to be slowed down by new syntax. -In companies like Google or Oracle, these companies have employees who might go through the codebase, resolve issues, do quality checks, etc. And they may not even do translation themselves as much as managing vendors (freelancers) who do translation. +In companies like Google or Oracle, these companies have employees who might go through the codebase, resolve issues, do quality checks, etc. And they may not even do translation themselves as much as managing vendors (freelancers) who do translation. 
GRH: be sure to consider users (end users) who want to hear “natural sounding” language—they are also important stakeholders -RCA: Create persona for each category of stakeholders. Flesh out the persona's story -- who the person is, how they engage / what their use case is, what their challenges are, etc. +RCA: Create persona for each category of stakeholders. Flesh out the persona's story -- who the person is, how they engage / what their use case is, what their challenges are, etc. ECH: Personas are like when people apply Design Thinking, or people who apply design thinking to UX development design. -STR: I would consider language specialists and terminologists both in the same category called Localization Quality Manager (in Expedia terminology). My point being, if we have L.S. and Trm, we don't need a separate quality-focused category. +STR: I would consider language specialists and terminologists both in the same category called Localization Quality Manager (in Expedia terminology). My point being, if we have L.S. and Trm, we don't need a separate quality-focused category. -DAF: There is the EC translation service that splits 60% of the translation work in-house, and 40% get outsourced. The more work that they outsource, the more holistic the role becomes for the in-house employees to manage the translation supply chain. +DAF: There is the EC translation service that splits 60% of the translation work in-house, and 40% get outsourced. The more work that they outsource, the more holistic the role becomes for the in-house employees to manage the translation supply chain. -STA: What about UX and copywriters? Especially copywriters. +STA: What about UX and copywriters? Especially copywriters. RCA: We can take this as our list of stakeholders, but we can also look at other organizations as ZIB said. -DAF: WRT, Zibi’s point to look at how other consortia structure their audiences. 
W3C has the internationalization core group that mainly looks at developers and concentrates on making HTML output “print ready”. In OASIS, only the XLIFF committees are aware of I18n/L10n. We use mainly three categories, buyers (content owners), CAT tool vendors, professional translators. We believe that ideally a translator should be able to keep working in their one editor of choice that would shield them from all technical complexity and variety, they should not be exposed to code/markup and related technical variances. +DAF: WRT, Zibi’s point to look at how other consortia structure their audiences. W3C has the internationalization core group that mainly looks at developers and concentrates on making HTML output “print ready”. In OASIS, only the XLIFF committees are aware of I18n/L10n. We use mainly three categories, buyers (content owners), CAT tool vendors, professional translators. We believe that ideally a translator should be able to keep working in their one editor of choice that would shield them from all technical complexity and variety, they should not be exposed to code/markup and related technical variances. -ZIB: My suggestion was not to see how other organizations classify engineers, translators, etc., but how they classify their target audiences. W3C creates HTML, which targets both professional and amateur developers. +ZIB: My suggestion was not to see how other organizations classify engineers, translators, etc., but how they classify their target audiences. W3C creates HTML, which targets both professional and amateur developers. DAF: HTML is no longer under W3C control, it went rogue. But the technologies that are still in W3C such as linked data or ODRL target I’d say an “extreme” developer. ZIB: Since we're targeting ECMA TC39 as part of our output, maybe that is a good place to start. -RCA: Let's advance onto the next topic.
I am creating the issue to capture this, and ZIB and [#124](https://github.com/unicode-org/message-format-wg/issues/124) +RCA: Let's advance onto the next topic. I am creating the issue to capture this, and ZIB and [#124](https://github.com/unicode-org/message-format-wg/issues/124) DAF and others, please review that issue. ## Review collected "extreme" MessageFormat use cases #119 - + Issue #119 to collect corner cases of message formatting. - + NIC: I called it "extreme" use cases, but maybe we call this "corner cases". - -STA: The general theme in these examples is that it is a long sentence with many selectors. We wanted to find real-world examples of messages with multiple selectors. For me, this is only one kind of corner case. Are we missing something? - + +STA: The general theme in these examples is that it is a long sentence with many selectors. We wanted to find real-world examples of messages with multiple selectors. For me, this is only one kind of corner case. Are we missing something? + NIC: Mihai are you creating your example (of rewriting a previous example) just using concatenation? - -MIH: Yes. If you don’t do some kind of concatenation or list formatting, then you’re forced to say odd things like “This resort has no pools, no golf courses…” But the example that I rewrote is just a list, and it is just asking for a list formatter. And it is not really in the same class as the other examples, where the items of the list don't have to match the other items grammatically (in terms of sentence agreement) as is necessary for the other examples. - + +MIH: Yes. If you don’t do some kind of concatenation or list formatting, then you’re forced to say odd things like “This resort has no pools, no golf courses…” But the example that I rewrote is just a list, and it is just asking for a list formatter. 
And it is not really in the same class as the other examples, where the items of the list don't have to match the other items grammatically (in terms of sentence agreement) as is necessary for the other examples. + NIC: Is there consensus that there is no real blocker to message-level selector? - + ZIB: I don't agree that there is a consensus. - -RCA: In the task force meeting, we agreed to come up with corner cases to help us make a decision. How many examples do we need? - + +RCA: In the task force meeting, we agreed to come up with corner cases to help us make a decision. How many examples do we need? + STA: I don't think there is one clear answer. - + RCA: We can spend multiple more months discussing this. - -MIH: Do we have an agreement that we can convert between internal selectors and message-level selectors? If we agree that we can do that, then why would it matter? - + +MIH: Do we have an agreement that we can convert between internal selectors and message-level selectors? If we agree that we can do that, then why would it matter? + ZIB: We don't have an agreement that we can do this losslessly. - + MIH: Define losslessly. - + ZIB: We send for translation and then translate it back to the original language - + MIH: similar to Unicode normalization, you can normalize things but not round-trip, that we do not have a diff. For example, we could say “everything must be precomposed”, or we can say that all case comparison/string comparison/etc has to handle both normalized & non-normalized forms → my position is that you should normalize at the edge, and then simplify processing on the inside - -MIH: Do we care? I'd compare it to Unicode normalization.
For example, if we have a Unicode string that has both composed and decomposed normalized characters, then we can normalize to composed or decomposed forms, but thereafter, we cannot recover the original string. + ZIB: It worries me that we cannot losslessly convert without - + EAO: We can losslessly convert to message-level selectors, but we cannot losslessly convert to internal selectors. - + ZIB: I think this formulation is interesting. - + LHS: I think it's not an issue of converting, b/c it's like saying we're converting ASCII to Unicode, but ASCII is already a subset of Unicode. - -MIH: The question of - -ZIB: What about a scenario of having 2 data models, where an organization can choose a simpler - + +MIH: The question of + +ZIB: What about a scenario of having 2 data models, where an organization can choose a simpler + MIH: Why support 2 data models in the standard, where we already know that one style is messy? - + ZIB: That's just your opinion that it's messy. - + EAO: What is messy - + ZIB: What’s different from your analogy is that we don’t have agreement that one model is universally better, so we can’t just use it. If we use “lossy” versions, then anyone who is using another model will be “canonicalized” into another version, and it worries me to say that “since we can convert, it’s not a problem”, since it is. - - -STA: Where does the complexity live? We're discussing where to put the complexity, but if we keep the standard simple, then we can design syntaxes that have more complex, so long as they convert to the standard, that is simpler. - + +STA: Where does the complexity live? We're discussing where to put the complexity, but if we keep the standard simple, then we can design syntaxes that have more complex, so long as they convert to the standard, that is simpler. + MIH: Yes, you summarized my point well. - -ECH: Complexity and Simplicity are opposite, Easy and Difficult are opposite, and different. 
I'm going to reference info from Simple Made Easy which does a good job defining precisely what simplicity and complexity mean. Complexity/simplicity is an objective measure, while easy/difficult are relative measures. When we talk about whether the combinatorial expansion that occurs when converting internal selector messages into message-level selector messages, the increase in messages is harder for humans, which is a relative measure. The level of difficulty can be mitigated, ex: with existing CAT tools. But the increase in message count is not a sign of complexity. Cardinality is not the same as complexity. Simplicity is not about having just 1 thing. In fact, simplicity means increasing the number of things because it is about taking apart intertwined distinct concerns and keeping them separate, so it increases the number of things. - + +ECH: Complexity and Simplicity are opposite, Easy and Difficult are opposite, and different. I'm going to reference info from Simple Made Easy which does a good job defining precisely what simplicity and complexity mean. Complexity/simplicity is an objective measure, while easy/difficult are relative measures. When we talk about whether the combinatorial expansion that occurs when converting internal selector messages into message-level selector messages, the increase in messages is harder for humans, which is a relative measure. The level of difficulty can be mitigated, ex: with existing CAT tools. But the increase in message count is not a sign of complexity. Cardinality is not the same as complexity. Simplicity is not about having just 1 thing. In fact, simplicity means increasing the number of things because it is about taking apart intertwined distinct concerns and keeping them separate, so it increases the number of things. + ECH: There's a certain amount of inherent complexity in the problem, which means that at least that amount of complexity will live one way or another. 
The question of how much complexity should live in the standard is still a good one to ask. - + LHS: ...but a form that allowed internal selectors would also allow external selectors, it’s just a superset - + ZIB: Let’s say your translation tools only offer external selectors, so a data model that works with internal, you could compile to external selectors on the outside - + MIH: So it goes back to where we normalize. We could normalize when we go into the data model, or when we convert to XLIFF. - + ZIB: Let’s say an organization doesn’t accept anything but external. We could just convert to this. - + MIH: But why complicate the standard data model when we know that internal is horribly messy? - + EAO: It is only messy from a human point of view, from a machine point of view it can be very straightforward - + STA: This is about hiding or showing complexity, and who needs to deal with that complexity? If the inline approach is there, then everyone dealing with the standard has to deal with that complexity. If we do external selectors, then the complexity is dealt with by parties that want to deal with that, and the standard is more “primitive” (in a good way). Maybe you care about size, or expressiveness, you can still do it...and then you deal with that complexity, but the underlying standard is simple. - + MIH: That’s a good summary of my take. The producers (mostly developers) write a string with internal/inline selectors, and many consumers are translators, which we know are difficult to handle...especially when we move to XLIFF. - + ECH: One point worth putting in: in case our discussions of simplicity aren’t making sense (see the talk). Complexity & cardinality are not the same thing. Complexity & simplicity are opposites, it’s more about “easy and difficult”...it doesn’t mean “the number of things”, it’s the opposite (it’s about separate things being separate), so teasing apart things that are different means there are more things. 
Maybe you have more subparts of the message that need to be consistent with each other, there might be some more. - + EAO: The problem we’re solving is that we have complex & complicated messages by any definition, and how to express those, one solution is internal selectors. Another possible solution is to allow 1 message to be built up from parts of other messages (e.g. Mihai’s example with ListFormatter). We need to resolve both of these issues...maybe not together, but they correlate with each other. I believe that if we don’t have either, we have a bad standard, and if we have 1 we might not need the other. - + EAO: We need to talk about message references, because this will help decide how to handle one model or the other. - -ZIB: We keep going back and forth on the topic. It shows that the problem is non-trivial, because if we were, we wouldn't keep going back and forth. I would like us to be very humble, because the task is enormous, and what we decide will be used by many people for several years, because if this goes into JS, then many people will be using it. So maybe we can design the data model via layers, where we have a base layer in the standard that is simple, but that we enable other features in subsequent layers. - -MIH: I would like to see how that looks in the data model. Natural languages don't change as much as programming languages. French - + +ZIB: We keep going back and forth on the topic. It shows that the problem is non-trivial, because if we were, we wouldn't keep going back and forth. I would like us to be very humble, because the task is enormous, and what we decide will be used by many people for several years, because if this goes into JS, then many people will be using it. So maybe we can design the data model via layers, where we have a base layer in the standard that is simple, but that we enable other features in subsequent layers. + +MIH: I would like to see how that looks in the data model. 
Natural languages don't change as much as programming languages. French + STA: The problem with layering is that if we design things in a top-down way to allow for both, then it allows for both in a way that could be messy if we have 2 different ways to interpret things. - -Back to the question of where the complexity lives, I am trying to dig down to the question that addresses the fundamental issue. If we have top-level selectors in the standard, then we say that messages authored using internal selectors can convert back and forth, and use message references to do so. But what I want to know is can we define the data model as the data of - -LHS: As Zibi messaged in the Unicode Conf, the stakes are high because if we get this wrong, we will live the consequences for a long time in the future. I agree with MIH that it is unlikely that natural languages will come up with new grammar constructs any time soon. Tech moves quickly, so we can have humility about the pace of change with technology. As STA was talking about, maybe there is - -ECH: Tying message references <-> internal selectors (as EAO said), so if we define data model to cover the message in transit to translators (as STA said), then - -ZIB: I agree that languages don't change often, but I do expect the way humans interact with technology will change in the next 5 years. We should define the message processing to be as lossless as possible. My concern is that if we use message references instead of internal selectors, maybe it can work, even though it is not a part of the standard, but then maybe it goes unsupported when it should have been supported, which will cause the need to create v3 of MessageFormat to fix. - -EAO: Can we support function calls that take more than one variable? It is true that natural languages don't change often, but how we expose language functionality changes more frequently. 
We're still understanding how to represent things, ex: plural range selectors -- "0 - 1 items" (not "0 - 1 item") needs 2 arguments to determine. - -RCA: We need to have a followup discussion on this topic. When can we do this? Hopefully can we come to a resolution so that we can avoid - -STA: ZIB and LHS's comments make me realize that lossiness / losslessness is also a spectrum, and sometimes lossiness is okay. When we speak about user-visible functionality, the losslessness is okay because once it is consumed by the user, it is done, the effect is achieved. But if we care about the effects of tooling like version control, then losslessness matters, and I worry if we make it difficult for internal selectors in this regard, then we encourage them to go off and solve the problem in their own way. - -EAO: I think this question of lossiness is the wrong question to be focusing on. I think the question comes down to whether we allow message references or internal selectors. - -MIH: My position is that we should support message references. At times, it can be useful. We can't stop people from using message references in ways that are bad, like concatenation of the content of those message references, but we should still allow it. - + +Back to the question of where the complexity lives, I am trying to dig down to the question that addresses the fundamental issue. If we have top-level selectors in the standard, then we say that messages authored using internal selectors can convert back and forth, and use message references to do so. But what I want to know is can we define the data model as the data of + +LHS: As Zibi messaged in the Unicode Conf, the stakes are high because if we get this wrong, we will live the consequences for a long time in the future. I agree with MIH that it is unlikely that natural languages will come up with new grammar constructs any time soon. Tech moves quickly, so we can have humility about the pace of change with technology. 
As STA was talking about, maybe there is + +ECH: Tying message references <-> internal selectors (as EAO said), so if we define data model to cover the message in transit to translators (as STA said), then + +ZIB: I agree that languages don't change often, but I do expect the way humans interact with technology will change in the next 5 years. We should define the message processing to be as lossless as possible. My concern is that if we use message references instead of internal selectors, maybe it can work, even though it is not a part of the standard, but then maybe it goes unsupported when it should have been supported, which will cause the need to create v3 of MessageFormat to fix. + +EAO: Can we support function calls that take more than one variable? It is true that natural languages don't change often, but how we expose language functionality changes more frequently. We're still understanding how to represent things, ex: plural range selectors -- "0 - 1 items" (not "0 - 1 item") needs 2 arguments to determine. + +RCA: We need to have a followup discussion on this topic. When can we do this? Hopefully can we come to a resolution so that we can avoid + +STA: ZIB and LHS's comments make me realize that lossiness / losslessness is also a spectrum, and sometimes lossiness is okay. When we speak about user-visible functionality, the losslessness is okay because once it is consumed by the user, it is done, the effect is achieved. But if we care about the effects of tooling like version control, then losslessness matters, and I worry if we make it difficult for internal selectors in this regard, then we encourage them to go off and solve the problem in their own way. + +EAO: I think this question of lossiness is the wrong question to be focusing on. I think the question comes down to whether we allow message references or internal selectors. + +MIH: My position is that we should support message references. At times, it can be useful. 
We can't stop people from using message references in ways that are bad, like concatenation of the content of those message references, but we should still allow it. + RCA: We will have a followup meeting as another round of the task force for issue #103 next Monday, and defer the Chair Group meeting to the following week, like we did last month. - + NIC: Agreement for task force meeting on this topic? Anyone disagree? - + RCA: Will have as an agenda topic - + ZIB: Going back-and-forth between different models, etc. It doesn’t seem like it’s trivial, I think we’d all agree that if it were trivial we would already reach it. This is probably not the first group that has dealt with this kind of issue. I’d like us to be humble about our ability to say what is certain for the future. Everything we’re standardizing → maybe “internal selectors” is some kind of “extension” that includes a script that converts back & forth, so if we come to a conclusion that something isn’t sustainable, they can easily plug the next step. Maybe think of it as “layers”?? - + MIH: I would like to see how that looks in the data model. If we say we can add internal selectors later, that’s fine, but if we say “let’s put some fancy hooks in now” then I don’t understand the reason. I understand being humble, but we don’t want a Turing Machine in there. Languages don’t change that fast (it’s not like French will develop some new grammar in the next few years)...so it’s a finite problem. - + STA: The problem with layering is that we can start with a top-level, and then we’d end up with both of them in the standard, and I’d like to avoid that situation. If we have both, it’s probably not good (too many choices, etc.). 
The reason the in-message approach might be preferable...I’m trying to dig down into the question that will help solve this...I think I have a question: if we push complexity to tool authors, and say “the standard only supports external but you can support however you want” → Zibi’s concern is that the tool author can’t actually go back to the in-message selector way, so a question is should the standard be the translation, or the party that implements only uses the standard for “transport”, or canonicalization? - + LHS: Zibi said something useful at the Unicode Conference re: being stuck with this standard so we need to get this right! Also languages might not change, but technology might change. One way of thinking about it is “how hard would it be to add this later?”...it seems easier to add internal selectors later than take them out later? Another question is how important “lossless” conversion is, as long as it’s functionally equivalent…? - + ECH: If you had a simpler data model for “in transit”, then it’s up to the system that supports internal selectors, that system would have to handle conversion - + ZIB: I agree with Mihai, and Luke somewhat responded: the way we respond to software might change very much even though languages might not change much. It’s hard to predict what aspects will be necessary. Languages will always be more complex than we can encode them, and it will always be “lossy” for human languages, what are we OK not including from human language? This is a very challenging question when we are trying to handle the future. I agree that we should be more humble about technology than languages. How important is lossless? It’s important if we want internal selectors to be a “first class citizen” not a “second class citizen”. What does the layer look like? It looks like it accepts internal selectors, but it doesn’t support it right now, with a flag that turns it off and on? 
If the data model doesn’t allow for it at all, then adding it later would require MF 3.0 - + MIH example in chat: message = You deleted {fancyFiles, reference} from this folder. fancyFiles = {fileCount, plural, one {# file} other {# files}} - + EAO: A couple of things, a bit related, 1) we do need to get things right in 2.0, we shouldn’t have a “Messageformat Basic” with too many variants, we should have 1 that we are publishing. 2) If we decide we don’t allow internal selectors, are we allowing internal function calls with things inside them? We might have things that look like an internal selector but technically aren’t (like MIH’s example). 3) Languages don’t change quickly but the way we represent them can. For example, the CLDR data for the rules on the pluralization of ranges (e.g. “0-1 items”) isn’t well-identified for many languages. We might later identify that there’s a part of a language that ought to be represented in the standard but isn’t, and that part of the language may require changing some of our presumptions. - + NIC: similar to list selectors! - + RCA: to close → when is the best time for the next Task Force meeting? Next Monday? → general agreement - + STA: What I was trying to get at about lossy vs. lossless conversion: it’s also a spectrum, in some cases it’s OK...when you shift things to users, it’s OK to convert lossily since you’re not coming back (just being displayed to users). So in this case, it’s OK to be lossy, but if you care about round-trip to tooling then lossiness is more of a problem. If we go with the top-level approach, then other parties might come up with “ASCII Latin 1 Exetended” to handle what we don’t. - + RCA: ??? versioning - -ZIB: Imagine a company that decides to use internal selectors, and on every step of their tools, it gets converted to external selectors and then back, and then it changes, and on roundtrip it gets lost. 
So this company would have a papercut and realize that - + +ZIB: Imagine a company that decides to use internal selectors, and on every step of their tools, it gets converted to external selectors and then back, and then it changes, and on roundtrip it gets lost. So this company would have a papercut and realize that + MIH: A typical use case is that a developer writes a message, and then it gets translated into X languages, and they all go into version control, so the translations are “normalized” but not the English one. - + EAO: ...but if the translator fixes the English version, then they’d need to change it - + RCA: but a conversion needs a new version - + MIH: like using spaces vs. tabs → if I have a tool that does the proper style for my company, if I don't respect the style, and it changes behind my back, then it is what it is. If someone changes a translation directly, it’s OK (although they shouldn’t edit things directly) - + EAO: Lossiness is the wrong question to be focusing on, only valid if we do transformations, which are only valid if we have either internal selectors or references. We could say that it’s possible to move back & forth but maybe not losslessly. - + MIH: I believe that we should allow external references. It’s true that you could do “horrible” things like concatenation, but developers will do this anyway. By having a standard way to represent this, I’ve allowed explicitly in the data model/syntax so I can put tools around that, maybe even Lint / detect that you’re doing something problematic. 
- - - - + ## Quora React Localization Framework - + IUC44 Presentation - + ## MFWG - Stakeholder definition and debate - + ## Summary/Review : Unicode Conf slide deck for MFWG - -## Quora React L10n Framework data model alignment +## Quora React L10n Framework data model alignment diff --git a/meetings/2020/notes-2020-11-16.md b/meetings/2020/notes-2020-11-16.md index ee196422d0..3822daed87 100644 --- a/meetings/2020/notes-2020-11-16.md +++ b/meetings/2020/notes-2020-11-16.md @@ -1,4 +1,5 @@ #### November 16th Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Nicolas Bouvrette - Expedia (NIC) - Staś Małolepszy - Google (STA) @@ -14,27 +15,27 @@ - Standa Rygal - Expedia (STR) - Zibi Braniecki - Mozilla (ZBI) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ## Next Meeting December 14, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/129) -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started - + ### Moderator : Romulo Cintra - + ### Chair Group Announcements - -STA: Current proposal is to use Github "labels" to describe the broad type / topic of the issue. Ex: organization, documentation, requirements. Github "milestones" feature to be used to track work that is due, so we will use for the work to be done by the next monthly meeting. If work gets pushed from one milestone to the next, then it is an indication that something needs to be adjusted (blockers, task scope). Lastly, Github offers "project boards". Let's experiment and use them in any way we see fit. 
Task forces seem like a good fit, but even other uses and short-term (1-2 weeks) uses of project boards would be good, too. + +STA: Current proposal is to use Github "labels" to describe the broad type / topic of the issue. Ex: organization, documentation, requirements. Github "milestones" feature to be used to track work that is due, so we will use for the work to be done by the next monthly meeting. If work gets pushed from one milestone to the next, then it is an indication that something needs to be adjusted (blockers, task scope). Lastly, Github offers "project boards". Let's experiment and use them in any way we see fit. Task forces seem like a good fit, but even other uses and short-term (1-2 weeks) uses of project boards would be good, too. RCA: This is about organization, so let's see how it goes. @@ -46,21 +47,21 @@ RCA: New feature in TCQ called "check temperature". [Slides](https://docs.google.com/presentation/d/19fxGJuFcGRwQiWYlppsOmNhyK8wxG6ZeMtrHeAmf8vI) -STA: I tried to extract the takeaways from the 3 meetings of the task force (for issue #103) that we discussed. I took the liberty of coming up with examples that we didn't discuss. If the examples are not representative, let me know as it comes up. The choice is between "in-message" (aka "internal") selectors as we currently have in ICU MessageFormat, and "top-level" (aka "message-level") selectors. As we focused, there were 3 main themes: compatibility, expressive power, and friendliness towards different actors. Compatibility - there are a lot of tools that may not be willing to adapt to a new way of doing things. Expressive power - questions of verbosity and ability to round-trip. Friendliness - the most subjective, and how amenable it is to translators and whether they can understand what happens at the boundaries of nested messages with internal selectors. +STA: I tried to extract the takeaways from the 3 meetings of the task force (for issue #103) that we discussed. 
I took the liberty of coming up with examples that we didn't discuss. If the examples are not representative, let me know as it comes up. The choice is between "in-message" (aka "internal") selectors as we currently have in ICU MessageFormat, and "top-level" (aka "message-level") selectors. As we focused, there were 3 main themes: compatibility, expressive power, and friendliness towards different actors. Compatibility - there are a lot of tools that may not be willing to adapt to a new way of doing things. Expressive power - questions of verbosity and ability to round-trip. Friendliness - the most subjective, and how amenable it is to translators and whether they can understand what happens at the boundaries of nested messages with internal selectors. -Compatibility - verbosity - shows Slide 4 (nested messages / internal selectors) and Slide 5 (expanded message group using message-level selectors). Chair Group found the message-level selector form to be more expressive, for lack of a better word, and it would work well for existing tools, and from my personal experience, well for translators, too. The examples in Slide 2 are too small to reveal as much difference as the examples in Slide 4 vs Slide 5 in this regard. +Compatibility - verbosity - shows Slide 4 (nested messages / internal selectors) and Slide 5 (expanded message group using message-level selectors). Chair Group found the message-level selector form to be more expressive, for lack of a better word, and it would work well for existing tools, and from my personal experience, well for translators, too. The examples in Slide 2 are too small to reveal as much difference as the examples in Slide 4 vs Slide 5 in this regard. -The concern about verbosity comes up for a message with multiple internal selectors with multiple options. Slide 6 has an example. +The concern about verbosity comes up for a message with multiple internal selectors with multiple options. Slide 6 has an example. 
-Mini-consensus - see Slide 7. We can allow message message references. We can allow each of the internal messages to be a separate top-level message whose translations are included by id in the original message. The major argument for having this feature is because that developers work around the lack of this feature anyways. They do have shortcomings - it creates dependencies and the possibility of cyclic dependencies, it requires including context of other message translations, etc. It also requires some coordination between the "parent message" and "child message". The child message needs to be able to receive the context of the parent message. In Slide 10, we see an example. We have to consider that the child message can use a runtime variable that depends on any parent messages. The parent message has to pass values for all variables that the child message needs. So there is still complexity here, we have just moved it around. +Mini-consensus - see Slide 7. We can allow message message references. We can allow each of the internal messages to be a separate top-level message whose translations are included by id in the original message. The major argument for having this feature is because that developers work around the lack of this feature anyways. They do have shortcomings - it creates dependencies and the possibility of cyclic dependencies, it requires including context of other message translations, etc. It also requires some coordination between the "parent message" and "child message". The child message needs to be able to receive the context of the parent message. In Slide 10, we see an example. We have to consider that the child message can use a runtime variable that depends on any parent messages. The parent message has to pass values for all variables that the child message needs. So there is still complexity here, we have just moved it around. 
-This leads to the other mini-consensus (slide 12) which is to pass parameters to the messages that are referenced. Slide 13 has example. The parameter/variable name must match, and we need a way to validate the values passed. This effectively creates a public API. In the example, we have an enumerated type, and we might want to ensure that the supplied value exists in the supported value set in the child message. Maybe the child message can report back on any exceptions. +This leads to the other mini-consensus (slide 12) which is to pass parameters to the messages that are referenced. Slide 13 has example. The parameter/variable name must match, and we need a way to validate the values passed. This effectively creates a public API. In the example, we have an enumerated type, and we might want to ensure that the supplied value exists in the supported value set in the child message. Maybe the child message can report back on any exceptions. -When it comes to validation, there could be different places where the valid values can be defined, perhaps in a layer-like way. There could be a MFWG described standard, there could be a public open-source maintained standard, and there could be a developer/org-specific set of defined values. +When it comes to validation, there could be different places where the valid values can be defined, perhaps in a layer-like way. There could be a MFWG described standard, there could be a public open-source maintained standard, and there could be a developer/org-specific set of defined values. -MIH: To address whether we have explicitly defined parameters or not, in the case of the example on Slide 10, the count=$restaurantCount can be any value. But when it comes to grammatical case, we should be explicit because there is a known finite set of grammatical cases. +MIH: To address whether we have explicitly defined parameters or not, in the case of the example on Slide 10, the count=$restaurantCount can be any value. 
But when it comes to grammatical case, we should be explicit because there is a known finite set of grammatical cases. -NIC: Looking at the example in Slide 10, maybe the syntax is more verbose than internal selectors for nested messages. Maybe we can reduce this? The example in Slide 11 has a lot more parameters. +NIC: Looking at the example in Slide 10, maybe the syntax is more verbose than internal selectors for nested messages. Maybe we can reduce this? The example in Slide 11 has a lot more parameters. EAO: Before we get to the discussion that we are currently in, we should first discuss whether we want to proceed with the consensii (the 2 consensuses) from the chair group. @@ -74,19 +75,19 @@ DAF: I agree with MIH about using registries as a solution for enumerated values LHS: Are we discussing a syntax or a data model? I agree with NIC that the example uses slightly more verbose syntax, but syntax is a separate problem from the data model problem. -STR: I want to be clear on what the working group agrees on. Does this mean that internal selectors would be removed from the standard? +STR: I want to be clear on what the working group agrees on. Does this mean that internal selectors would be removed from the standard? EAO: We did not reach a consensus on that topic, but it would be an obvious next step to decide. STA: This segues well with the topic that ZIB wanted to talk about as the next top-level meeting agenda item. -ECH: I just want to point out again that there is a difference in verbosity vs. easy vs. simplicity. The question of verbose or concise is not the same easy / difficult, and that is not an indication of simplicity. In fact, making things simple is all about taking apart separate things that do not belong together, and so when you take things apart, you create more things. Having more things to deal with might look more verbose or not convenient, but it may still be simpler. 
In our case, we have nested messages with internal selectors where their translations depend on each other and the context of the top-level message, and rewriting using message-level selectors through an Cartesian product expansion, creates more patterns, but that verbosity is actually simplicity in action. Just a reminder so that we don't confuse verbosity and difficulty with complexity. +ECH: I just want to point out again that there is a difference in verbosity vs. easy vs. simplicity. The question of verbose or concise is not the same easy / difficult, and that is not an indication of simplicity. In fact, making things simple is all about taking apart separate things that do not belong together, and so when you take things apart, you create more things. Having more things to deal with might look more verbose or not convenient, but it may still be simpler. In our case, we have nested messages with internal selectors where their translations depend on each other and the context of the top-level message, and rewriting using message-level selectors through an Cartesian product expansion, creates more patterns, but that verbosity is actually simplicity in action. Just a reminder so that we don't confuse verbosity and difficulty with complexity. -ZIB: I'm a little concerned about moving forward with voting on consensii without discussing the issue of dynamic references. In slide 8, imagine instead of `term-pool`, you would pass the name of a referenced message as an argument. For the message `description`, you call it by specifying `$feature1` is set to `"term-pool"`. Dynamic references are similar to concatenation of translated nested message, in that if you don't allow it, people will still find a way to do it. +ZIB: I'm a little concerned about moving forward with voting on consensii without discussing the issue of dynamic references. In slide 8, imagine instead of `term-pool`, you would pass the name of a referenced message as an argument. 
For the message `description`, you call it by specifying `$feature1` is set to `"term-pool"`. Dynamic references are similar to concatenation of translated nested message, in that if you don't allow it, people will still find a way to do it. STA: Is any of this blocking the current proposal to vote on consensii. -ZIB: Is there any reason to reject this idea upfront? If not, is there an argument, are there any arguments against this idea, or is it compatible with what we are proposing? I was just concerned at the pace of proposals. +ZIB: Is there any reason to reject this idea upfront? If not, is there an argument, are there any arguments against this idea, or is it compatible with what we are proposing? I was just concerned at the pace of proposals. LHS: Would you pass arguments along with these message id name values for dynamic message references? @@ -106,28 +107,28 @@ GRH: Separate problem, but highly relevant. MIH: I don't see this as a blocker, but rather, a specialization of passing parameters for message references. -EAO: Some of the examples could be resolved by using top-level selectors, which means that the problems can be resolved and still maintain the benefits of static checking, etc. As for a next step for dynamic message references, take a list of examples that we've looked at and come up with examples that illustrate when the dynamic references are strictly necessary. +EAO: Some of the examples could be resolved by using top-level selectors, which means that the problems can be resolved and still maintain the benefits of static checking, etc. As for a next step for dynamic message references, take a list of examples that we've looked at and come up with examples that illustrate when the dynamic references are strictly necessary. -GRH: We've tried to ban dynamic references from our codebase more than once, but developers still bring it back. We should keep a space open for discussing it. 
Translators complain that it becomes a black box that causes problems. So it's a tough problem. +GRH: We've tried to ban dynamic references from our codebase more than once, but developers still bring it back. We should keep a space open for discussing it. Translators complain that it becomes a black box that causes problems. So it's a tough problem. MIH: I agree with GRH. And that it's orthogonal to the current topic. -DAF: I agree with GRH and MIH that blackboxes are going against empowering the translator, +DAF: I agree with GRH and MIH that blackboxes are going against empowering the translator, Dynamic references seem orthogonal to the topic and not blocked if we approve the two task force consensi. Going forward there should probably be an option, either translator passes parameters to the API or the black box becomes gray and tells the translator what it is (for instance what is it’s intended case etc.) -RCA: Should we start voting on the 2 consensi proposed from the Chair Group? And that the points brought up can be discussed separately later? +RCA: Should we start voting on the 2 consensi proposed from the Chair Group? And that the points brought up can be discussed separately later? -The first item: should we include message references in the data model? Does anyone disagree? +The first item: should we include message references in the data model? Does anyone disagree? After waiting, it looks like there is no opposition, so there is consensus on that part. -The second item: should we pass parameters to the message being referenced and validate them? Does anyone disagree? +The second item: should we pass parameters to the message being referenced and validate them? Does anyone disagree? After waiting, it looks like there is no opposition, so there is consensus there too. RCA: Let's move to the next topic, although we only have 20 minutes left. We can see how much progress we can make. STA, I see that you have comments on message glossaries. 
-STA: How do I share the doc? Should I share using the working group email list? +STA: How do I share the doc? Should I share using the working group email list? RCA: Just drop the link here for now. You will need to request access: @@ -147,13 +148,13 @@ EAO: I think we are ready to resolve this issue and say that message-level selec DAF: I agree with EAO, and I don't think we need to create extra structure. People should feel free to take it upon themselves to raise concerns as they come up. -STA: If we end up with a situation where, in a year from now, we decide that we need +STA: If we end up with a situation where, in a year from now, we decide that we need -ZIB: I don't agree with EAO that we're ready. Let's not assume that we've properly explored it, investigated it, and explained it. I spent a lot of time with MIH discussing it, but using arguments that were explicitly rejected in the proposal with careful consideration and explanation. So I would ask for more diligence from others in responding. To STA, the situation where having 2 competing proposals would be bad, but we haven't even decided whether in such a scenario, they would even be solving the same problem. I would ultimately listen to the group, but I think these claims seem confident. +ZIB: I don't agree with EAO that we're ready. Let's not assume that we've properly explored it, investigated it, and explained it. I spent a lot of time with MIH discussing it, but using arguments that were explicitly rejected in the proposal with careful consideration and explanation. So I would ask for more diligence from others in responding. To STA, the situation where having 2 competing proposals would be bad, but we haven't even decided whether in such a scenario, they would even be solving the same problem. I would ultimately listen to the group, but I think these claims seem confident. -DAF: In the issue #127, I didn't see any use cases that would be addressed with internal selectors. 
Responses under #127 show that the listed use cases would be solved with message-level selectors with message referencing. I think top-level only selectors guarantee translatability, but allowing internal selectors doesn't. If the group doesn't have any special rights, then I am okay with an informal group, but I also think we're ready to decide on whether to allow internal selectors. I am opposed to forming a formal group. +DAF: In the issue #127, I didn't see any use cases that would be addressed with internal selectors. Responses under #127 show that the listed use cases would be solved with message-level selectors with message referencing. I think top-level only selectors guarantee translatability, but allowing internal selectors doesn't. If the group doesn't have any special rights, then I am okay with an informal group, but I also think we're ready to decide on whether to allow internal selectors. I am opposed to forming a formal group. -RCA: It's been a year since the start of MFWG, so happy birthday to the group. In our group, we want to continue ensuring that all voices are counted. ZIB, it seems that not everyone has had a chance to evaluate the proposal, based on the few comments left on the GH issue 127, so maybe this means we can still bring this up for consideration in a future meeting. +RCA: It's been a year since the start of MFWG, so happy birthday to the group. In our group, we want to continue ensuring that all voices are counted. ZIB, it seems that not everyone has had a chance to evaluate the proposal, based on the few comments left on the GH issue 127, so maybe this means we can still bring this up for consideration in a future meeting. EAO: I can't claim to have 20 years of i18n experience, but I have 7 or 8 years of experience, but I feel fairly certain that anything that can be done with internal selectors can be represented using message level selectors. 
@@ -163,7 +164,7 @@ RCA: If there are use cases not yet represented, we should add it to issue 119. MIH: My 2 cents, yes some of us do have 20 years of experience, or at least I do, but I can still be proven wrong. So I think there is value in having watchdogs to point out where we can be wrong. And specifically, I talked with ZIB to understand and address the concerns better. -DAF: My point is not that internal selectors cannot be translated in general, but that a solution with internal selection cannot *guarantee* translatability. It routinely creates situations where translators cannot form a grammatical sentence for all possible cases. +DAF: My point is not that internal selectors cannot be translated in general, but that a solution with internal selection cannot _guarantee_ translatability. It routinely creates situations where translators cannot form a grammatical sentence for all possible cases. MIH: Regarding the idea of unwittingly setting ourselves up for teh need for a MF v3 by unintended consequences of our decisions, the point of this watchdog group is to prevent such a situation, so that any changes might only need a (backwards-compatibility, non-breaking) change in a version "2.1". @@ -189,7 +190,7 @@ STA: I think your idea of gathering use cases is good. EAO: I think it would help a lot to specify the cases in order to better explain what is or isn't possible. -STA: I was thinking about the example at the top of issue #103, written both with nested messages + internal selectors, and with message-level selectors. I think that as a principle, we should adopt the message-level selector approach first. I am okay with the level of verbosity that it creates. But then I notice if you translate this in my native Polish, the verb "liked" must decline/conjugate based on the noun subject "friends". 
So you cannot just have a message reference to a separate message for "friends" that is completely independent of the rest of the parent message -- the parent message needs to know about return value of the child message. +STA: I was thinking about the example at the top of issue #103, written both with nested messages + internal selectors, and with message-level selectors. I think that as a principle, we should adopt the message-level selector approach first. I am okay with the level of verbosity that it creates. But then I notice if you translate this in my native Polish, the verb "liked" must decline/conjugate based on the noun subject "friends". So you cannot just have a message reference to a separate message for "friends" that is completely independent of the rest of the parent message -- the parent message needs to know about return value of the child message. EAO: I don't know if you realize it, but our consensus is about the data model, and that therefore does not prevent writing it in a syntax that uses internal selectors. @@ -197,6 +198,6 @@ DAF: I don't think that deciding the syntax should be a part of the group, and f MIH: This is why I have been a proponent of a data model. Syntaxes vary, JS has one type that uses its native data literals, there is ICU MessageFormat, and there is Fluent, but you can always read one syntax in and write out another syntax if you need to, so long as they all adhere to the same data model. -EAO: Would there be any need to create tooling to convert from other syntaxes (like getText, etc) to the MF v2. Is there any prior art of such a converter having been created? +EAO: Would there be any need to create tooling to convert from other syntaxes (like getText, etc) to the MF v2. Is there any prior art of such a converter having been created? MIH: I think XLIFF could serve as the common representation interchange format. That is what we do in the l10n world. 
diff --git a/meetings/2020/notes-2020-12-14.md b/meetings/2020/notes-2020-12-14.md index 8742a8f78e..93969e50c6 100644 --- a/meetings/2020/notes-2020-12-14.md +++ b/meetings/2020/notes-2020-12-14.md @@ -1,4 +1,5 @@ #### December 14th Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Luke Swartz - Google (LHS) - Mihai Nita (MIH) @@ -14,8 +15,7 @@ - Robert Heinz - Nike (RHZ) - Shane Carr (SFC) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -24,16 +24,17 @@ January 18, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/134) -Propose Consensus on - external selectors - -How we should represent the Data Model ? - Syntax, Language, Format... -Plan the work on data model - merge/normalize existing proposals -Dynamic References #130 + Propose Consensus on - external selectors - + How we should represent the Data Model ? - Syntax, Language, Format... + Plan the work on data model - merge/normalize existing proposals + Dynamic References #130 -Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) We should go ahead and get started - + ### Moderator : Romulo Cintra ## Propose Consensus on - external selectors - @@ -62,19 +63,19 @@ EAO: It got me to thinking about what "keys" mean. What does it mean to have top EAO: I want us to raise our awareness that if we allow top-level selectors with multiple variables, then we effectively allow multi-level nested message bundles which in turn support dynamic references. 
-ZBI: This gets at a major limitation of previous +ZBI: This gets at a major limitation of previous MIH: I don't see that this implication is actually happening, or that these things are connected. My understanding of what he's saying is that by allowing selectors at one level, it's like enabling keys of a message. Is that the idea? EAO: Yes, to some extent, go on. -MIH: One, I don't think they are the same thing. No system that I know of creates messages dynamically at some point. If you create 5 messages with 5 ids, you get 5 messages with 5 ids. So I don't see selectors as keys. There is glue logic to handle fallbacks for when messages don't exist. +MIH: One, I don't think they are the same thing. No system that I know of creates messages dynamically at some point. If you create 5 messages with 5 ids, you get 5 messages with 5 ids. So I don't see selectors as keys. There is glue logic to handle fallbacks for when messages don't exist. STA: Can you explain how different keys are different messages? MIH: Well, that was the part I wanted to verify my understanding with. -EAO: If we don't have selectors internally to a message, every message is a string or array of strings. Or else it's a selector of some description. That's the example in my comment, where you have a name of a monster and you select the name of a monster from an object of monster names using a key. +EAO: If we don't have selectors internally to a message, every message is a string or array of strings. Or else it's a selector of some description. That's the example in my comment, where you have a name of a monster and you select the name of a monster from an object of monster names using a key. STA: If I understand correctly, a month ago, we were talking about how we can define different levels with different keys, but this is like having a registry of monster names. 
@@ -84,7 +85,7 @@ ZBI: MIH, what I think a bridge between what you're saying and EAO is saying, is EAO: Yes, and this falls out when we allow more than one selector. And since it's possible, it will be achieved through hacking, but I would rather it be standardized and not done through hacking. -MIH: We can always represent internal selectors as top-level selectors. What I don't understand is why do we need dynamic references? Why not just allow arguments to to the monster-name selector? Why do we need a registry, and not go further and just have a function call? +MIH: We can always represent internal selectors as top-level selectors. What I don't understand is why do we need dynamic references? Why not just allow arguments to to the monster-name selector? Why do we need a registry, and not go further and just have a function call? ZBI: From the perspective of the call-site, you can't recognize whether the string was written as-is or came from a Spanish or French or other translation, it just looks like a string. @@ -95,23 +96,22 @@ ZBI: Can you clarify "multiple inputs for a selector"? EAO: Look at the last example in teh Github issue comment. message: | - { $key, $monster -> - [monster-name, dinosaur] Dinosaur - [monster-name, elephant] Elephant - [monster-name, ogre] Ogre - [monster-name, other] Monster - [killed-notice, other] You've been killed by a { $message(key: 'monster-name', monster: $monster) } - [other, other] Error: Message not found - } - +{ $key, $monster -> +[monster-name, dinosaur] Dinosaur +[monster-name, elephant] Elephant +[monster-name, ogre] Ogre +[monster-name, other] Monster +[killed-notice, other] You've been killed by a { $message(key: 'monster-name', monster: $monster) } +[other, other] Error: Message not found +} STA: As soon as you have any sort of branching mechanism, you can suddenly create messages as a hashmap. But you then want to have some form of control over the keys in the form of a registry. 
Then you want to have consistency, ex: you don't want monster name keys also being supplied in Finnish. EAO: I think you want that to be a linter thing but not a forced thing. -MIH: The jump to the last example seems hard to understand, +MIH: The jump to the last example seems hard to understand, -EAO: +EAO: STA: Are you saying that we need dynamic references because people will hack to get it anyways, in the same way that we need to support the internal selector case using message references? @@ -121,9 +121,9 @@ What I realized that EAO's position implies is that the hash map is another regi MIH: Do you see the last example as a hack, and the one above as not a hack? -ZIB: Yes +ZIB: Yes -MIH: So if we agree that the last example is a hack, and the one before it isn't a hack, then we have to see whether we can prevent the hack somehow. So I think we can support the previous examples and either prevent the last example which is a hack in the data model, or at least have it flagged by a linter. +MIH: So if we agree that the last example is a hack, and the one before it isn't a hack, then we have to see whether we can prevent the hack somehow. So I think we can support the previous examples and either prevent the last example which is a hack in the data model, or at least have it flagged by a linter. STA: This sounds like a workaround for the lack of dynamic of references? So I wonder, why didn't we use this in Firefox? @@ -133,7 +133,7 @@ STA: One possible reason we didn’t use it is because we set up continuous inte STA: Top example looks nice in English, but many languages need more than one form for each noun. In other languages we will need more hacks to handle more forms of the same variable. -MIH: I agree with your argument that you need different lists, like a list for monster plural forms and a list for accusative case, etc. +MIH: I agree with your argument that you need different lists, like a list for monster plural forms and a list for accusative case, etc. 
EAO: To clarify where I think we are currently in this discussion, where we are is that if we allow multiple top-level selectors, then we allow this sort of hacking in the messages, and we do not want to support that in MF 2.0. So what I think this means is that we have to clarify how to do dynamic references, not if it should be done. @@ -147,7 +147,7 @@ STR: I see where you come from, MIH. From my perspective as a linguist, the solu MIH: You cannot prevent them from doing that. Imagine you are Amazon “We just shipped your order of 5 books and 5 DVDs and 5 ….” There could be thousands of things and we cannot prevent them from adding something else. -ZBI: I completely agree, STR, that this would be great for translators. But if there are 100 monsters, then there are 100 messages. This is another case of explosion of parameters. +ZBI: I completely agree, STR, that this would be great for translators. But if there are 100 monsters, then there are 100 messages. This is another case of explosion of parameters. What I think MIH was saying, and want to restate, is that there are edge cases. It has to happen, and the question is how. If we don’t allow for anything, people will hack around us. We can block it with linting and other methods, but people will still hack around it. MIH: Looking at the last example, nobody likes it, I don’t know if we can block it in the syntax. We should really try to forbid direct recursion. It is one thing that I hate about MF. I don’t know how to prevent it, but I am happy that everyone wants to prevent it. @@ -170,7 +170,7 @@ STA: It is surprising to me that this would not be desired. A list of selectors ZBI: I recommend people to look at the top, because I recommended how people can work around this problem, and I also talk about the impact to GUIs. I would like people to decide how we can work around that. -RCA: What should we do to move this forward? 
Ok, let's create a task force for this, look for an email or a message about setting that up in the future. +RCA: What should we do to move this forward? Ok, let's create a task force for this, look for an email or a message about setting that up in the future. ## How should we represent the Data Model ? - Syntax, Language, Format… @@ -180,7 +180,7 @@ STA: In the Chair Group meeting, we were trying to pick a candidate for a unifie Generic typescript could be used to describe a list of elements. Should there be a canonical example syntax that we would like to continue using. Personal thought: I kind of like how some languages represent objects (C#, Rust) name of class, open brace, fields. That could be one of the ideas that I could throw into discussion. -MIH: Is there a way to represent maps where keys can be arrays? Do you know any syntax that I can use arrays for keys? +MIH: Is there a way to represent maps where keys can be arrays? Do you know any syntax that I can use arrays for keys? ZBI: Something that I remember striking from TS enums are very underdeveloped and a poor attempt that was inconsistently linted. Coming from Rust where they are core and the data model is good at enforcing it it became counterproductive using enums in Typescript. @@ -194,7 +194,7 @@ Schema syntax: Map MIH: It is totally fine if it doesn’t compile. -ECH: edn is to Closure what JSON is to Javascript. It is fundamental to the language… If you want a one sentence synopsis it is like cleaned up JSON. It has int and float and other numbers. You have heterogeneous collections. Data is data. … you can also tag things, making it extensible. +ECH: edn is to Closure what JSON is to Javascript. It is fundamental to the language… If you want a one sentence synopsis it is like cleaned up JSON. It has int and float and other numbers. You have heterogeneous collections. Data is data. … you can also tag things, making it extensible. MIH: Are there parsing libraries for edn in major languages? 
@@ -217,7 +217,7 @@ STA: Example syntax Array {1, 2}: Pattern {value = “....”}, } -Seems like a fairly standard way of showing objects. +Seems like a fairly standard way of showing objects. ZBI: Rust has it… @@ -237,7 +237,6 @@ RCA: Can we try out Typescript for the schema? Also, MIH, ZBI, ECH, can you work ## Plan the work on data model - merge/normalize existing proposals - MIH: I think we can do this in two steps. One, to represent the existing proposals, and then second to discuss and iterate. EAO: Let’s use the facilities that github provides for us to collaborate on this. @@ -264,13 +263,6 @@ SFC: With 402 we were running out of time so we went to 2.5 hours with a 15 minu ZBI: This meeting (compared to all others I participate in) seems to have the highest amount of back and forth in the meeting and the lowest amount outside the meeting. I would prefer to not extend and keep stricter bounds to move us along and use the asynchronous discussion outside the meeting. -RCA: Extending the meeting is difficult for earlier time zones. It’s hard to keep going on 402 for that reason. +RCA: Extending the meeting is difficult for earlier time zones. It’s hard to keep going on 402 for that reason. MIH: I think the task force meetings are also cramped by trying to allow all time zones to attend. Maybe we should be open to 1-3 person meetings as long as they are very well documented. I also don’t want to be exclusive… too closed. 
- - - - - - - diff --git a/meetings/2021/notes-2021-02-15.md b/meetings/2021/notes-2021-02-15.md index 5c25d8d01e..795c7faff8 100644 --- a/meetings/2021/notes-2021-02-15.md +++ b/meetings/2021/notes-2021-02-15.md @@ -1,4 +1,5 @@ #### February 15, Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Nicolas Bouvrette - Expedia (NIC) - Zibi Braniecki - Mozilla (ZBI) @@ -11,8 +12,7 @@ - David Filip - Huawei, OASIS XLIFF TC (DAF) - Ujjwal Sharma - Igalia (USA) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -27,37 +27,36 @@ March 15, 10am PDT (6pm GMT) Improving MFWG meeting notes Progress on Data Model unification [#141](https://github.com/unicode-org/message-format-wg/discussions/141) Consensus on external selectors #137 - ### Moderator : Romulo - + ## Improving MFWG meeting notes - + RCA: We did not have many notes from the last meeting. In the chair group meeting, we discussed ways to improve note taking. What do you all think of the options? - + NIC: Does anyone have an objection to recording ourselves? If not, we can try doing that since we had a drop in note taking? - + RCA: My first preference is for a library (software) for transcription. - + MIH: If we keep a recording for transcription, it should be for a relatively limited time. That point is also in the chair group meeting notes. - -STA: I think the recording serve different purposes. The transcription from a recording helps for notes, but the recording itself can help people catch up to - + +STA: I think the recording serve different purposes. The transcription from a recording helps for notes, but the recording itself can help people catch up to + USA: Notes work the same way, they help most when you want to catch up or search for something. - + GWR: Transcription is really helpful to remind yourself of what you agreed on. 
Audio could be helpful, but I don't know about having video. - + RCA: First question: should we record? Second question: do we use the transcription of the recording or do we take notes taken during the meeting. - + MIH: Or option 3 is to record, and use the transcription of the recording to supplement the notes. - -RCA: -Option 1: Video / audio recording + manual note taking. -Option 2: automated transcription + manual note taking. + +RCA: +Option 1: Video / audio recording + manual note taking. +Option 2: automated transcription + manual note taking. Option 3: manual note taking, recording video / audio + automated transcription - + NIC: After you mentioned it, RCA, I turned on the automated transcription, and I too am impressed. - + MIH: 2 or 3 EAO: fine with anything NIC: 2 or 3 @@ -65,180 +64,171 @@ ZBI: no opinion STA: 3 GWR: not opposed, slight pref for 2 USA: 2. - + RCA: Option 2 seems like a winner. - + USA: Allso, we can use the bot that we use during TC 39 meetings and use that, I can check with Kevin. - + RCA: Let's check offline and try option 2 next meeting (manual note taking + automated transcription) - + MIH: Can everyone see “Raise hand” in Meet? Maybe use that instead of TCQ? - + ECH: TCQ gives us a queue plus inserting questions into it. Should we present the notes or TCQ during the meeting? (Agreed on notes.) - - + ## Progress on Data Model unification [#141](https://github.com/unicode-org/message-format-wg/issues/141) - + RCA: Let's talk about data model unification that MIH, ZBI, EAO, and ECH have been working on. - + EAO: Still in progress. - -ZBI: I can give a summary of where we are. We started with 4 different models, and we looked at similarities and differences. We have different models and naming schemes. Based on the similarities, we wanted to reduce the number of models to consider from 4 to 2. We are starting to coalesce on naming schemes, too. We will meet again on Thursday and reduce from 2 to 1 and present at the next meeting. 
- + +ZBI: I can give a summary of where we are. We started with 4 different models, and we looked at similarities and differences. We have different models and naming schemes. Based on the similarities, we wanted to reduce the number of models to consider from 4 to 2. We are starting to coalesce on naming schemes, too. We will meet again on Thursday and reduce from 2 to 1 and present at the next meeting. + Interesting for the group is that we discussed what types are possible. When I say "type", the obvious ones are string and number. The non-obvious ones are boolean, some rich structures (which we sometimes call Formattables), we have functions and are the types of inputs and outputs, and what kinds of types can developers pass? - + Data model has consequences on fallback-ing (locale matching) and error scenarios (runtime checking) on invalid inputs. - + EAO: If anyone wants to join us, please do. The meeting time is Thursday 7am Pacific Time. - + ZBI: We are treating our work as an input to the conversations in this group, and not as any sort of decision. - + RCA: There is a question on TCQ that I will raise. The first one, on the topic of locale matching (fallbacks) and error handling. Are you taking that into account? - + ZBI: Yes, there might be implications of the data model on these points. - + RCA: Can we send an email to the whole MFWG? Share to interested people who might to join. - + ZBI: I think there will be a long conversation in this group. What this current huddle for data model among MIH, EAO, ZBI, and ECH is just reducing the points of contention. - + MIH: Yes, the output of this unified data model is just a starting point for the discussion here in this group. - + DAF: Please send me the invite. - -STA: What are the cutoff criteria for calling this preliminary work done? The question of what data types there should be is something we could all discuss together? What are you merging versus discussing new things? 
- + +STA: What are the cutoff criteria for calling this preliminary work done? The question of what data types there should be is something we could all discuss together? What are you merging versus discussing new things? + ZBI: My expectation is that by the next meeting, we will present or share a doc or Github issue. We'll describe alternatives and pros and cons, sticking points, agreements, etc. - - - - - + ## Consensus on external selectors [#137](https://github.com/unicode-org/message-format-wg/issues/137) - + RCA: In the last meeting, we agreed to decide on a summary of our consensus on our discussions about external selectors - + STA: What I have in [issue 149](https://github.com/unicode-org/message-format-wg/issues/149) is an attempt to describe restricting message references to reduce complexity. Once you have message references, you can no longer prevent nested messages. Also, messages are now dependent on each other, so when the storage of messages migrates, all messages must migrate in lockstep. - + I made up a word "Referenceable" to generically describe any entity that can reference another entity. If we can break up the dependency graph to not be too many levels deep, we can reduce complexity. - + What I am proposing is that messages either only reference things in the registry or things directly in the referenced ("child") message. - + Message references often get used to refactor out common / shared content in messages, and that leads to bad times for localization, so we want to plan around that. - + In the example in the issue (# 149), "referenceables" becomes a key that creates content that are a part of the message (but are themselves not actually messages) that the actual parts of the message can reference in a common way. - + We would need to think about what the registry contains and looks like. Maybe it's a data structure or code. We have glossaries with words and declensions for every language, and can be managed be developers. 
- + This proposal limits some functionality and flexibility, but gives us the possibility to handle each message as a separate entity. Are these concerns that you all recognize and share, so that we should spend time on? - + RCA: Let's go to TCQ for questions. - + MIH: I agree with the direction in trying to limiting things. I would need to think about whether this is the direction I would go, but to me, as the data model stands with allowing message references in the patterns takes us one step back from top-level selectors that allows us to have standalone messages. Without this restriction, we can construct unwieldy message constructions. - + Question, as a translator, how am I supposed to deal with something like that complex one where one part of the sentence has to agree with another part of the sentence that become disjoint parts. - + STA: I didn't address this, but referenceables become territory for the translators. - + MIH: This totally breaks most localization tools. Most of them work on the model that you translate one string, you get one back. - + STA: I realize that that's the challenge. - + NIC: I can see the concern with messages with a lot of different variables. - + EAO: STA, have you had a chance to read the reply I posted. Specifically, how do you form the registry? This becomes analogous to creating message references. I am concerned that this would add complexity without solving the real challenges that it is attempt to. - + STA: You're right, you could probably hack this by making your entire message set into the registry. How would the registry be defined? You can make it a special set of messages. I haven't thought about whether we can allow references between things in the registry. - + EAO: You did that that "Referenceables" can reference other Referenceables. - + STA: Yes - + EAO: We have a push and pull on how much the data model should allow or disallow to be represented. 
We have the possibility to create softer systems that allow people to work with the data model instead of completely working around it. - + STA: My general attitude toward linter rules is that it's easy to delegate work off onto linters. Then you have capabilities that only are available to people who use the linters. I want to be the counter weight to the idea of relying on the linters to handle improvement to the standard, and instead put it directly into the standard. - -MIH: How do we map this to XLIFF? If we don't support a mapping to XLIFF, then we revert back to the status quo of not supporting localization. - + +MIH: How do we map this to XLIFF? If we don't support a mapping to XLIFF, then we revert back to the status quo of not supporting localization. + One of the use cases that I've seen in the Firefox scene is to reuse company name or brand name and adding grammatical case information is not supported by this proposal. - + STA: My idea of supporting this important use case (brand name reuse) is through the registry. There are special rules for updating the brand name in Mozilla, and it is handled tightly. - + MIH: What is a registry? - + STA: A global set of Referenceables that you can use in any message. - + GWR: There is a notion of a registry in Siri. If I want to talk about the Mozilla tab or an iPhone, is it masculine or feminine, if it's Chinese it could be 1 form, in Finnish there could be 15 declensions. There could be metadata as well. It helps ensure uniformity. - -There is information that might be useful for plural formatting, that might affect word order within a phrase. In Siri, there might be a whole message in the registry, although we've discussed here in MFWG that referring to whole messages is not necessarily a good idea. - + +There is information that might be useful for plural formatting, that might affect word order within a phrase. 
In Siri, there might be a whole message in the registry, although we've discussed here in MFWG that referring to whole messages is not necessarily a good idea. + There are namespaces within the registry. There is information about whether a concept is bounded or unbounded, although that can be confusing to the translator. - + There are times when I look up information from Knowledge Graph and need to display information like "The Eiffel Tower is x meters tall", and I need to know how to refer to the Eiffel Tower. - - -MIH: My question is are you translating all of the information that you are describing (plural formatting, currency, units formatting)? It would double the amount of information compared to what CLDR provides? - + +MIH: My question is are you translating all of the information that you are describing (plural formatting, currency, units formatting)? It would double the amount of information compared to what CLDR provides? + GWR: Yes, we started this before CLDR came out with the data, but CLDR still doesn't provide information about pronunciation. - + DAF: I want to point out the connection to [issue 131](https://github.com/unicode-org/message-format-wg/discussions/131). What is in the standard and what is in the repositories. What is in CLDR and not in CLDR. It's related to issue 149 about Referenceables. - + MIH: I propose that we find a different name than registry, then. - + DAF: It's just a pointer to issue 131 but we haven't had any discussion there, but these are related. - + STA: Yes, of course, it's very related. I did refer to this when I created issue 149. I am still wary of offloading too much functionality and responsibility to the registry. If we do, then we have to go and spend time specifying how that registry should work. - + DAF: This proposal guarantees that different entities maintain their own data within the registry. And maybe this can help bring standardization over time.
- + RCA: Talking about the things that we have, maybe it would help to bring in other stakeholders, like people who do translation and work with translation tools. Maybe not now, maybe in a couple of months. It could help bring more perspective. - + STA: I think we can discuss this when we know a little more about the data model. While this is still in progress, we can put this on hold. Then I could create a PR that could change the data model to allow this, etc. - + RCA: We are talking about the implications of these decisions that would break interop with localization, but it could be good to have them verify that and/or tell us what we are missing. - + EAO: We are conflating two things in talk of the registries: functions that live outside of the data model, and messages that reference other messages. The registries do not specify what the functions should look like. - + MIH: That's why a name other than registry makes sense. - + RCA: Can we decide on a consensus on the external selectors? - + ECH: I just want to see a clear delineation of the consensuses (consensii?) that we are discussing. And if we are all in agreement, which I think we are since we waited and didn't have objections, we can go ahead and agree on them. - + EAO: My understanding from the last meeting was to allow time for objections since we had otherwise reached consensus on these topics. - -DAF: Thanks, EAO, you're right, I remember now. We were ready but there were fewer of us present, so we wanted to wait and give other people in the group chance to dissent.. - + +DAF: Thanks, EAO, you're right, I remember now. We were ready but there were fewer of us present, so we wanted to wait and give other people in the group chance to dissent.. + EAO: I can write a PR that clearly lays out the consensuses and we can all agree upon it and merge it into a common place. - + MIH: I am reluctant to just approve what we have currently since it is not clear with all the noise and changes.
So EAO's proposal would be nice to have a clear communication for these important decisions. - + STA: I struggle with the consensus verbiage "We will not block…" Isn't that implicit in the way we are working, and thus does not need to be specially called out? - + EAO: That was from ZBI, and was about ensuring that other working groups don't block us. - -ZBI: I see the decision on ____ can affect the design of the data model. - - + +ZBI: I see the decision on \_\_\_\_ can affect the design of the data model. + ZBI: I see that we are choosing top-level selectors because there is not enough evidence to show that anything more is needed. - -STA: I'm still not convinced. It still seems that we're - + +STA: I'm still not convinced. It still seems that we're + ZBI: I think this is the only decision in our group that has this implication. Nested selectors is the one thing of large consequence and potentially may bring value but is easy to be blocked. But I don't see anything else in the scope of what we're talking about like this. - -MIH: If I can try to explain, it's like the difference between saying "this is not allowed" versus we don't say that. This is kind of in the middle, by saying, "we will not block this by mistake". The decision here is to be explicit in saying that we won't block this by mistake. - + +MIH: If I can try to explain, it's like the difference between saying "this is not allowed" versus we don't say that. This is kind of in the middle, by saying, "we will not block this by mistake". The decision here is to be explicit in saying that we won't block this by mistake. + ZBI: Although I hear you saying so, STA, I don't think this type of impact is true for all decisions, it's only for this one decision. - + STA: I don't think we should have a standard that specifically calls out a specific decision. - + ZBI: An example is mathematical operators. We are not blocking them in MF 2, and leave it an open possibility.
- + STA: I still don't see how this is necessary to include in the standard or necessary in order to have the discussions and decision making that we would normally make. - + ZBI: I agree that our discussions can happen just the same without this consensus, but it would be an acknowledgement of the months of discussions that we had on this topic, and this was a difficult consensus to come to. And in the future, it may not just be me who thinks the way I do, so that there will be an easy place to have the written out text of the conclusion of all that. - - + RCA: We will get these notes into a PR during the chair meeting as we decided. Also, we can get the notes for the task force for issue 130 in January as well https://docs.google.com/document/d/1P7qhnxUDUpD5AKpcQp_nfIYj2ZBDoXS8YspmN3eV3f8/edit# . - diff --git a/meetings/2021/notes-2021-03-15.md b/meetings/2021/notes-2021-03-15.md index 695915672f..0748305007 100644 --- a/meetings/2021/notes-2021-03-15.md +++ b/meetings/2021/notes-2021-03-15.md @@ -1,6 +1,7 @@ #### [meeting transcription](https://docs.google.com/document/d/1XXHSkxpJcZOuQk1ViwqrnAFG5i5Cx0YtzcdyVuzvfVg/edit?usp=sharing) #### March 15, Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Daniel Minor - Mozilla (DLM) - Nicolas Bouvrette - Expedia (NIC) @@ -16,8 +17,7 @@ - David Filip - Huawei, OASIS XLIFF TC (DAF) - Zibi Braniecki - Mozilla (ZBI) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -26,169 +26,166 @@ April 19, 10am PDT (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/148) ### Moderator : Nicolas Bouvrette - - + ## Data Model Taskforce review [#140](https://github.com/unicode-org/message-format-wg/issues/140) [#141](https://github.com/unicode-org/message-format-wg/issues/141) - + RCA: DLM, MIH, EAO, ECH, and ZBI have been working in these models and 
participating in several rounds of task force meetings recently. DLM will present. - + DLM: Sharing this [presentation slide deck](https://docs.google.com/presentation/d/1tpKKw37-lice6zDuxb-gsTvDQ-7tURHIViwa_3q7dUs/edit#slide=id.gc5eb9fd624_0_169). Broadly, we started with 4 data models. We're down to 2. Between them, only a handful of divergences. Presentation is a summary of the work done in these task force meetings. - + One approach is to start with a smaller model and add things only when we need to. Another approach is to create a more expressive model that is future-proof but can allow things that are undesirable. Don't want to spend much time on this. - + Next question is whether the data model should place limits on translators. No limits in the data model: provides greater flexibility, and allows building tools for different use cases. Having limits simplifies the mental model for translators and hides or prevents complexity. - + Next question is regarding XLIFF mapping. The question is about allowing multiple possible mappings. Allowing multiple possible mappings can potentially support multiple use cases. Having a single canonical mapping makes for consistent round-tripping to XLIFF and back. - + Next question is constraints on function arguments. Can a function take a mix of literal value arguments and function arguments? Not allowing this mix of types simplifies translation and XLIFF transformation. Allowing this mix of types enables function composition, which enables a wider range of expressions in conjunction with a nested structure. - + EAO: A clarifying question (comment) of why these questions are the open questions presented. We have 2 remaining data models, and we need to figure out how to merge these and come up with one single consensus. We've met 8 times and reached an impasse. We have other questions, but 3 questions are considered the primary / representative ones -- resolving these will address a large part of the open points.
We haven't prepared exactly how this will go, but now is the time to share opinions. - + LHS: What, if any, limitations do we agree on wrt data model? I will say what I think the current state of thinking is. I thought that earlier on, we agreed that as a committee, we are not trying to represent the full expressiveness of human language, and we can limit ourselves. It seems like this is the difference between the 2 different models. What do we think? - + EAO: Both models have ways of expressing anything that is reasonable to express. One model can express things less clumsily than another. One allows more of these expressions to be represented in the data model, and the other defers representation to supporting functions in a registry. - + MIH: I have seen messages in Google where the message is like "how is the weather" and the select message is based on "weather_type" and "weather_type = sunny => It is sunny; weather_type = rainy => Take an umbrella;, …" Some of this logic doesn't belong in MessageFormat, it belongs in the code you write in your programming language. - + EAO: One tries to estimate what is appropriate for a message, and tries to limit the data model accordingly. The other model doesn't limit what is allowed, and provides the minimal tools to make things possible. - + MIH: Just to note, none of the data models are final. These are the proposals. We can go to features requested in Github issues, we will try to implement, and iterate. - + LHS: Is there an example that will help illustrate the difference? - -MIH: If you go back in the slides to the French example, the difference is captured here. In the more flexible model, the translator has the ability to modify the 2nd example. In the less permissive model, the translator is only able to modify a message with the 2nd example unless the developer creates a function that can support modification of values. - + +MIH: If you go back in the slides to the French example, the difference is captured here.
In the more flexible model, the translator has the ability to modify the 2nd example. In the less permissive model, the translator is only able to modify a message with the 2nd example unless the developer creates a function that can support modification of values. + STA: Who can add the AddHonorific fn in this example? - + EAO: Developer - + MIH: In this one, it's the developer because you have a "user" object. But you can define registries for things like plurals and grammatical cases. - + STA: The "full control" has me thinking that the translator can modify arguments but not create arbitrary functions. - + RCA: Yes, no inversion of control here. - + SFC: In this example, whether there should be an honorific applied is a question for the translation style guide, not for the translator to decide. Is that correct? - + MIH: The style guide is something that changes per language. - + EAO: The question is whether the data model should allow this or not. - + GRH: For us, whether you are trying to be informal or formal depends on the client application, and it is language-specific. - + LHS: To respond to SFC's comment, I agree with your general thrust, this is a specific thing that has shown up in Google products a lot. It differs from German to French to etc. It seems like the developers have to do some work to make the family name available. Either way, the developer has to do some work. - + ZBI: SFC, I think your approach is correct, but it is project-specific. We are not talking about what should be done, but rather, what data model we should choose to allow for these options. - + MIH: In a way, that's exactly the problem. In the option of allowing the translator to specify whether to apply an honorific name function, it is like current MessageFormat v1 where users can specify options to date formatting, but users need to know that function exists. Developers can specify what to do in a fallback scenario. It's like providing skeletons versus patterns. 
- + EAO: It is not a matter of what should be done, but whether that level of restriction should be encoded in the data model. - -LHS: To respond to MIH's point, I think some of what MIH is talking about could be mitigated by better translation tools. - + +LHS: To respond to MIH's point, I think some of what MIH is talking about could be mitigated by better translation tools. + I want to get to the question about letting translators have a mechanism for providing feedback. It frequently happens that translators need pieces of information in the message that doesn't exist. - + EAO: Yes, enabling more expressive power to solve some cases also means exposing extra arguments that must be configured in common cases that don't need it. - + RCA: We should have clearer actions on what to do here. We should discuss this offline and have a better conclusion on this. We are trying to consider all points of view, but I don't have a clear vision of the pros and cons for how these help and impact developers and translators. Maybe we should have a separate meeting just to discuss the pros and cons? Should we spend the rest of the meeting discussing these use cases? - + SFC: As a point of order, let's let everyone express their points without trying to directly respond, so that the task force members can hear everyone's input and know what the constraints are. - + SFC: The question that I would like the task group to consider offline. Are there security concerns? If you rely on functions that are offline, does applying them require something like "eval" like in JavaScript `eval`? Because that would have security concern implications. - + ZBI: It is not like `eval`. Fluent has done this, and it is nothing like eval. - + SFC: I just want this to be documented. - -STA: This discussion is reminiscent of the discussion last year about developer vs. translator control. The dichotomy is flexibility vs. limits, i.e. expressiveness at the risk of added complexity vs. 
having a less expressive solution that follows the principle of least power. Have we decided as a group where on the spectrum we see ourselves? - + +STA: This discussion is reminiscent of the discussion last year about developer vs. translator control. The dichotomy is flexibility vs. limits, i.e. expressiveness at the risk of added complexity vs. having a less expressive solution that follows the principle of least power. Have we decided as a group where on the spectrum we see ourselves? + I think that our job should be to design this as limiting as possible, not more. I think it is worth it for us to have this discussion. - + MIH: Main difference in the “philosophy” + - as limiting as possible, but not more - as flexible as possible, because we don't know what the future brings - + ZBI: As one of the task force members, it is easy to get caught in between looking at very specific immediate problems and looking at long-term high-level problems. We wanted to provide to the rest of the group a scaffolding of a solution, but it is not intended to be complete and detailed. Instead, there are tradeoffs that have to be made, and you will have to consider them on your own time, in order to evaluate which tradeoffs are preferable to the others. - -SFC: One of the biggest reasons for the first question -- whether you have functions or options -- one of the reasons to have the translators to have configurable options but not configurable functions is that ______, - + +SFC: One of the biggest reasons for the first question -- whether you have functions or options -- one of the reasons to have the translators to have configurable options but not configurable functions is that **\_\_**, + I have been increasingly convinced that to build the data model that gives power to the translators, that is advantageous. What are the unique advantages of the options-based model but still gives power to translators. I believe that linting tools can play a role here. 
I see how there is a desire to have translators to work in a sandbox but also give them control. Maybe there is a model that can let them do both. - -EAO: A link I posted _____ shows how XLIFF messages can be created that allow translators to translate the same things using the functions as using options for predefined functions. - + +EAO: A link I posted **\_** shows how XLIFF messages can be created that allow translators to translate the same things using the functions as using options for predefined functions. + LHS: I can understand ZBI that those not in the task force can't immediately have the full context of those discussions, but I am happy to meet with you all and document something as concise as possible to give the committee the proper context. We need a way for people to get that context. - -RCA: Point of order, since we are running out of time, maybe we can take up the agenda item of the 2021 Roadmap next meeting, and we discuss that offline in the meantime. Then we spend the remaining time on the Unicode Conf presentation. Okay? Okay, no objections, thanks. - + +RCA: Point of order, since we are running out of time, maybe we can take up the agenda item of the 2021 Roadmap next meeting, and we discuss that offline in the meantime. Then we spend the remaining time on the Unicode Conf presentation. Okay? Okay, no objections, thanks. + STA: In the honorific example, in the options-only approach, why does supporting this need to go back to the developer? - + MIH: You have to communicate somehow to the developer that the option is available. - + STA: That is true for the functions-also approach, translators need to know it exists, too. - + MIH: Either way, you need a mechanism indicating that the option is available, and that mechanism must exist outside of the message. - + STA: In both of the examples regarding honorifics in the presentation slide, that problem exists. 
- + MIH: The crux of the question is not just how do you communicate to the translator, but how do you communicate to the translation tool, too. If you have a message in the Translation Memory that uses honorific, you cannot leverage that message if you don't support it. - + STA: Thanks for the slide deck. The discussion that followed was great as well. I would like to get a 3 minute summary, yet again. After an hour of discussing, it would be good to review the questions under consideration. - + SFC: +1 to what MIH was saying about the translation tool should be able to know what options and functions are available. The translation tool is so far away in the stack that it might be possible for it to get overlooked, but I think that is non-negotiable. - + I think if we go with the functions approach, we have to come up with a way to express all the things that come with functions. Functions are a more complicated concept to represent declaratively, and I'm perfectly fine with the functions approach, but coming up with a way to make it possible to fully convey in the translation tool is something that the task force should consider. - + EAO: The task force considered more than just the 3 or 4 questions here, we discussed 10 or so questions. But these 3-4 questions were chosen because they drive much of the conversation. - + RCA: We could have a Github issue to represent. Should we have a single issue for all questions, or a separate issue per question? Vote / speak up. - + RCA: Based on the responses, let's discuss the questions together in a single issue. - + RCA: We haven't brought into the conversation of MessageFormat 2.0 other stakeholder groups like translators and industry participants (ex: tooling). - + ZBI: Those groups are not good at predicting what they want or need. If you ask a question, you get no answer or an uninformed answer. 
Maybe we can have another round where we put in front of them what we come up with and see whether it works for them and how, and incorporate that feedback into the next iteration. - + MIH: I agree with ZBI that for translators, putting prototypes in front of them will work well. But we also need participation of people from the tooling side, DAF is one such example, but we need more representation. - + EAO: One approach is that we have a limited data model that can be easily mapped to XLIFF, and to use XLIFF extensions where needed. Another approach is to make the data model more of an enabler than a limiter, and we will have other parts (ex: translation to XLIFF) based on the data model, but those layers of transformations will not impose limitations on what the data model can represent. - + RCA: We can close this topic for now, and we already agreed that we will continue discussing offline. - - + ## Roadmap MF 2021 [#157](https://github.com/unicode-org/message-format-wg/issues/157) - - - + ## Unicode Conf - MFWG Presentation - + ECH: What should we propose to talk about? Who wants to speak? - + RCA: I think we will be further along this year than last year. - + ZBI: I believe we should be around v 1.0 of MessageFormat 2.0 around the time of the conference. I don't like making project timelines based around conference dates. We have made a lot of progress on the data model, and by that time, we might even be working on the syntax / implementation. - -So I believe that we will have made a lot of progress and have something concrete to present by then. Unless you think we will be working in a long multi year process on MF 2.0, then I think it is reasonable to - + +So I believe that we will have made a lot of progress and have something concrete to present by then. Unless you think we will be working in a long multi year process on MF 2.0, then I think it is reasonable to + RCA: Any volunteers on the abstract? We can discuss this through the Slack channel. 
- - + RCA: Other topics? EAO proposed changing the plenary cadence to once per two weeks. I think we can keep it at once a month. - + MIH: I think we can keep it once a month. We already haven't seen recent attendance from Amazon, eBay, etc. - -RCA: Also, the plenary isn't necessarily where the most productive work happens. Any ideas? - + +RCA: Also, the plenary isn't necessarily where the most productive work happens. Any ideas? + STA: I would also love to see the pace increase, and I think this is a very important period. Everyone would agree that we want to see more discussions happen asynchronously on Github, and the nature of the discussions don't lend themselves to Github. I think they work better synchronously. - + I think, maybe for the next 3 months, we have 2-week meetings, that would help keep the pace. Also, for the benefit of people in other time zones, having one of those meetings at a different time would help. - + DAF: What STA said makes sense in the short-term, like he said, because these complex topics warrant extra discussion. And some of those task force meetings could benefit from the context of a decision making (or larger) meeting to unblock some of the discussions. - + ZBI: I agree that it is a critical time. I am not as concerned about dropoff because this isn't a time to join in the middle of these detailed discussions. Maybe have the plenary meeting once every 2 weeks and a rapid huddle task force meeting on the data model every 2 weeks. But the reason why we couldn't resolve the questions in the task force is because the questions are all intertwined. - + RCA: I will schedule extra meetings to occur in between the existing monthly meetings. The invite will go out to the mailing list. - + [Here is the link](https://docs.google.com/document/d/1XXHSkxpJcZOuQk1ViwqrnAFG5i5Cx0YtzcdyVuzvfVg/edit?usp=sharing) to the auto-recorded transcriptions. 
diff --git a/meetings/2021/notes-2021-03-29-extended.md b/meetings/2021/notes-2021-03-29-extended.md index 7e07bf6476..12b3a20ee5 100644 --- a/meetings/2021/notes-2021-03-29-extended.md +++ b/meetings/2021/notes-2021-03-29-extended.md @@ -1,11 +1,12 @@ Attendees: Please fill in a 3-letter acronym if this is your first meeting: + - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname - Suggestion 2: First initial, middle initial, last initial - Suggestion 3: Custom - #### March 29, Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Daniel Minor - Mozilla (DLM) - Nicolas Bouvrette - Expedia (NIC) @@ -18,92 +19,90 @@ Please fill in a 3-letter acronym if this is your first meeting: - Zibi Braniecki - Mozilla (ZBI) - David Filip - Huawei, XLIFF TC liaison (DAF) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ## Extended Meeting - - + ## Summary documents - + ECH: I created documents to summarize the existing proposals. How do people feel about these docs to summarize and collect the arguments? I believe that we will need such summaries, and that these can be done in parallel to any other discussions. - + https://docs.google.com/document/d/1L5W1PE7V_UyO1XgYdPouOwRZfpI2vGozNb47PM7puz8/edit#heading=h.2tho5bx87ubb - + https://docs.google.com/document/d/13jc78fnrIBq-qSHMDXK_bILzlKbgJVmAuek5Xj1lAvI/edit#heading=h.2tho5bx87ubb - + Everyone with the link has edit permissions, so feel free to contribute and share with others who may also want to. I also want to suggest that we only contribute to the proposal(s) that we agree with, and that we don’t leave critical comments on other proposals. Instead, view other proposal documents as read-only and use those arguments to strengthen your own argument. 
- + EAO: I agree that these proposal documents are worth expanding on, and that they can be done in parallel to other discussions. - + ## Discussion on [Issue 159](https://github.com/unicode-org/message-format-wg/issues/159) - + EAO: I would like to discuss Issue 159. This is something that arose for me very strongly after the plenary discussions. It is not about the data model, but it is about the assumptions made about how the data model relates to what we are doing, and whether there are other parts. It is important to build this as a system that is relevant and powerful for translators. The view into the structure of the message that the data model provides -- should it be a 1-to-1 mapping between that and what the translators see? Or could the data model be more complex, and then we have a subset that converts to and from XLIFF, and have tooling to support more featureful views to translators? - + MIH: My position is that the two things should not be tightly coupled, but relatively coupled. The two things will be used by translators, so they should match, otherwise it is a misfit. We should not confuse what translators do with XLIFF. XLIFF has standard ways to be extended. - + LHS: I haven't read through ECH's summaries yet, so I may oversimplify, but I think this view is helpful to think about. When I think about these 2 proposals, I think the one that EAO is proposing is one that can sit on top of the data model. I think the same is true for the other proposal, there are some that are useful for translators and some are useful for developers. Or maybe it's between professional translators and developer-translators. But I see that there are tradeoffs, and that it is messy, which is why it is not easy. Does that not make sense? - + MIH: Hearing what LHS said helps clarify what bothers me about the question posed in issue 159. The real question at hand is what do human languages require? When we discover requirements based on that, we change the data model. 
But that is what should drive our decisions more than what translators see. - + DAF: To MIH's point about accessibility to translators doesn't mean expressible in XLIFF, it doesn't mean they are disconnected / laissez-faire. There is a core that is required in XLIFF. I am fairly sure that a module or two in XLIFF that extends the core may be required of MF 2.0, but we cannot create a data model that is incompatible with the core. - + MIH: What DAF is mentioning about modules are the extensions that I proposed for XLIFF 2. - + DAF: There was an issue in the past in XLIFF about text units and grouping, and I prefer supporting that issue through grouping. - + EAO: Regarding the XLIFF 2 discussion, can we design the data model independently of XLIFF 2? We could have a data model for MF, and have a separate data model that is more compatible with XLIFF 2? I would prefer these 2 things to be separate. Otherwise, the design of the data model will be coupled with XLIFF 2. - + MIH: My position is clear -- the design of the MF data model and the transformation of XLIFF 2 should be done in parallel. But we cannot design the data model without XLIFF informing our decision. But I would like to go back to the point about requirements coming from linguistic features. We should consider our requirements coming from linguistic features. Can we decide on that? - + RCA: Let us have everyone give their opinions and then we can try to come to an agreement on this. - + ZBI: Maybe we are conflating 2 concepts that do not need to be conflated? I am not being opinionated. On the topic of whether the data model should be coupled to XLIFF, I think MIH is describing what I'll call "semantic representation", and EAO is describing "container representation". What MIH says about linguistic features (plurals, gender, case) are about semantic features. For container features, we think about the features that the formats represent, and we think about how the transformations could happen. 
I wonder if there is a way to escape the responsibility of supporting all linguistic features and being a custodian to having to represent all linguistic features in MF? - + EAO: The shape of the XLIFF representation doesn't have to match the shape of the data model. When I put together the proposed model, I put together 3 different possible examples of what can be done. My point here is to ensure that our model can represent the examples. But the question is whether there can be multiple representations or should there be only one? - + DAF: To support what MIH said, these designs can be done in parallel. But we should be sure that they are done in connection with each other. Regarding "semantic representation" vs. "container representation", not all features in XLIFF are translation-specific. For example, the metadata module contains information that has information that the translators may or may not use. It might be possible to put data in the metadata module that is developer-specific that are not shown to the translators, but that is not how it is designed. - + If you round-trip something as a container, that is by design. If you represent something in the core, it will likely round-trip. If you represent something using existing modules, then it will probably round-trip. If you represent something in new modules, then it might translate. But if you don't need information from a module, then you don't have to touch it for translation purposes, and we just need a good fallback mechanism. - + MIH: What ZBI said is not what I mean. I don't argue for semantic representations at all. We should look at what languages require as features and make sure they are representable in our data model. We don’t need to reflect those features 1:1. A map has keys and values, I can design a collection with generic keys and values and it is usable for all kinds of stuff. But I’m not going to design a tree when all I need is a map. 
It needs to be rich enough to represent any linguistic features, but no more than that. Otherwise it is too complex. - + STA: Going back to the presentation from last time, there were two approaches that are related to this question. Is this really about the data model vs. the standard library, where we can express some of the linguistic features directly in the data model, or we can provide an agnostic way in the data model to call functions? And in the XLIFF translation layer, can we provide a mapping between the standard library and XLIFF? - + EAO: This issue is that some aspects of the data model defining the structure of the data model and this structure can’t be extended by standard library components really. If we need a strong link between the data model and the XLIFF2 representation that forces certain simplifications on the data model. If we don’t have the forced link, we can consider structures in the data model that are not represented in XLIFF2. - + STA: Is there a specific example of something we worry about not being able to express, or is this more of an abstract worry? - + EAO: The intent was to structure the question so that we don't go into the details of the data model, but instead to describe the edges of the discussion where the decision affects how we proceed on those details. One example is the range formatter, where the start and the end variables are coming in. - + RCA: After our first rounds of discussion, do we have enough information to set up votes and take clear decisions? Or do we need more information? - + EAO: How about the +1 to -1 range we used earlier for voting? - + MIH: This decides something we’ve been discussing for months. Having a vote in the smaller group bypasses the larger group and the work we’ve been putting together for the past month. - + RCA: Although at some point, we do have to do it. - + EAO: My point of voting is not to take a decision, but it is to get a gauge on how people in the room are thinking. 
- + ZBI: The phrase I like to use is to "check the temperature of the room". - + RCA: Okay, let's do that, voting to check the temperature of the room. - -https://www.apache.org/foundation/voting.html + +https://www.apache.org/foundation/voting.html +0: 'I don't feel strongly about it, but I'm okay with this.' -0: 'I won't get in the way, but I'd rather we didn't do this.' -0.5: 'I don't like this idea, but I can't find any rational justification for my feelings.' -0.9: 'I really don't like this, but I'm not going to stand in the way if everyone else wants to go ahead with it.' +0.9: 'This is a cool idea and i like it, but I don't have time/the skills necessary to help out.' ++1: 'Wow! I like this! Let's do it!' - + Q1: Should the interface of the data model be directly connected (+1.0) or indirectly connected (-1.0) to translators? Zibi: -0.5 DAF: 0 - + Q2: Should the data model be designed independently of the design of the transformation of the data model to/from the XLIFF, while still knowing that XLIFF transformation is required (+1.0), or should the design of the data model be based on linguistic features and with transformation to/from XLIFF being taken into account in the structure of the data model (-1.0)? ZBI: +0.4 LHS: -0 @@ -114,44 +113,44 @@ DLM: +0.7 STA: ? ECH: -0.8 DAF: +0.7 - + DAF: I agree with STA that maybe we are agreeing with each other violently. The development should happen iteratively. We can always adjust as we go along.-1 - + MIH: I think we should design iteratively based on linguistic features. We should go in parallel and iterating, so that we don't make a difficult decision. - + DAF: I don't agree with MIH on one point -- I think we should have a good idea of the design we are intending and what kind of features we are trying to support. Where the uncertainty comes is how do we support linguistic features that we haven't considered yet. 
- + MIH: I'm not saying that the data model should be designed based on linguistic features, but that linguistic features should be encode-able in the data model. - -ZBI: What I am understanding from MIH, is that data model nodes should be added for the sake of supporting linguistic features, but not for the sake of being more flexible. What DAF said that reminded me of an example from HTML/CSS, which is
<div>. You can attach semantic information to <div>
if you want, and over time, if you're using <div>
to semantically represent a navigation bar, and over time, as several people start using it, the HTML spec evolves to represent that semantic / conceptual construct with the tag itself. And if that is the model that DAF - + +ZBI: What I am understanding from MIH, is that data model nodes should be added for the sake of supporting linguistic features, but not for the sake of being more flexible. What DAF said that reminded me of an example from HTML/CSS, which is
<div>. You can attach semantic information to <div>
if you want, and over time, if you're using <div>
to semantically represent a navigation bar, and over time, as several people start using it, the HTML spec evolves to represent that semantic / conceptual construct with the tag itself. And if that is the model that DAF + EAO: Can everyone read the shape of Q2 in the document, and do the Apache voting on that before the end of the hour? - + STA: It sounded like there is a "tighter" model that MIH has been describing, but it is not clear to me what the other extreme is, and it doesn't seem like there is a clearly defined opposite, and maybe that's why it feels like there is no disagreement? - + DAF: I agree that the other pole doesn't exist. - + LHS: I wasn't hearing a clear difference between the options, so it's not clear where one ends and the other begins. - + RCA: Let's vote in the doc on Q2. We don't have an agreement or that we're not opinionated. What are the next steps? - + LHS: I think the two proposals should have their proponents to flesh out the contours of their proposal to make it more precise. - + EAO: The shortest example for the proposal I support is, when you have a list range formatter, can you have the values for the 2 ends of the range come from 2 different variables, or must they come from one single variable? I think that question helps distinguish the 2 proposals, currently. - + MIH: I went through the exercise of asking myself, "What would it take, what extra would need to change, for me to be able to support it?" After thinking, the answer I came up with is that linguistic features should be represented -But - +But + STA: I have a question for MIH, why did you choose -1? - + LHS: Yes, it doesn't seem to me like there is a clear distinction between the options. RCA: We are 10 minutes over time. Do we have any last comments? We have STA, and we should prepare for the next plenary meeting. - + STA: As a meta-point, I think we're not really clear about what the disagreement is. And I'm glad we have more frequent meetings now. 
I put created an example here: - + interface Element { name: string, value: string | Element }; - +
@@ -188,4 +187,3 @@ RCA: I may also change my vote because the differences are less clear to me. Let EAO: Did you send an invite to the next extended meeting invite? RCA: Yes, I have already sent out all of the instances of this event in the calendar invite. - diff --git a/meetings/2021/notes-2021-04-19.md b/meetings/2021/notes-2021-04-19.md index 6d992106dd..4c84f618d1 100644 --- a/meetings/2021/notes-2021-04-19.md +++ b/meetings/2021/notes-2021-04-19.md @@ -1,8 +1,8 @@ [Automatic Transcription Part I](https://docs.google.com/document/d/1hX_1by6tx9UwNaOu1-xbQtSDhTotluod4UhXgH5mxnc/edit?usp=sharing) [Automatic Transcription Part II](https://docs.google.com/document/d/1o3SDgGZLohFlcFgcIgcgzDYcr5TioV5haayUdBsWsc0/edit?usp=sharing) - #### April 19, Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Daniel Minor - Mozilla (DLM) - Jean Aurambault - Pinterest (JAU) @@ -17,190 +17,185 @@ - David Filip - Huawei, OASIS XLIFF TC (DAF) - Zibi Braniecki - Mozilla (ZBI) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ## Next Meeting -May 3, 11am PST (6pm GMT) - Extended +May 3, 11am PST (6pm GMT) - Extended May 17, 11am PST (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/148) -### Moderator : Romulo Cintra - - +### Moderator : Romulo Cintra + ### Review summary documentation of the existing proposals - -ECH: - + +ECH: + ZBI: I think there are 2 planes that we are talking about. Simplicity vs. Completeness is one plane (dimension) that we are thinking about. Another is Realistic / Pragmatic (fitting into current ecosystem) vs. Aspirational. That is another dimension that we can talk about, and I think it will make it much easier for us to decide. - + MIH: I think the language and how we name things is important. 
Aspirational is one way of looking at it, but I think the other proposal is so complex that I would call it one that leans towards complexity. - + RCA: I would like both documents to have very similar structure. It would help people who read both sides to understand them better. There might be differing points of view, but it would help in understanding the paradigms. - + I would like to have Pull Requests on the repo once these documents are ready. - + EAO: I'm not sure how we're meant to proceed here with these documents. When should we talk about them? Do you want to present them now? - + RCA: Who has already read these documents? Should we postpone discussion of these documents or talk about them now? I see more votes to postpone discussion on these documents than to present them now. - - - - + ### #159 - Q1, Q2 from extended meeting - + RCA: [Here are the notes](https://github.com/unicode-org/message-format-wg/blob/master/meetings/2021/notes-2021-03-29-extended.md) from the extended meeting regarding issue 159. We took an informal vote to take the temperature of the room on a couple of questions that seemed central to the discussion. Who wants to summarize the 2 questions? - + EAO: My understanding is that Q1 was never asked, and Q2 is a reformulation of Q1. - + RCA: Okay, we talked about summarizing these questions for today's meeting, no? - + EAO: I'm happy to summarize the questions for the proposal that I am in favor of. - + We decided to proceed in designing the data model in layers. How do we design the data model so that the layers don't have to be an exact match to each other, in such a way that the design doesn't have to be a huge monolith? - + MIH: I don't think anyone argues against a modular development or in favor a monolithic approach. As a proponent of the other data model, I think the main difference is not about how to map to XLIFF or to localization. 
The major difference, I think, is that one of the models is trying to be very flexible so that it can accommodate anything in the unforeseeable future. And the other model is about "let's start with something simple" and add to it as necessary. Neither model claims to be a final draft, and both will take features and requirements, like from Github Issues, and iterate based on that. - + Thinking about mapping to XLIFF is pushing us in a direction that is not so relevant. We already have mapping to XLIFF as a goal. XLIFF already has a way to extend things. I would argue that some coupling is required, but it is not as tight as it sounds. Our data model might have extension models that need XLIFF extensions, but XLIFF is designed for translators, so it won't be the same, but it will be similar because of the overlap in goals. - -RCA: I think we are at a point where we can start to make the data models fit the requirements. We have spent a year or two, and we have a lot of points of discussion, but also lots of points of convergence between the 2 models. So how - + +RCA: I think we are at a point where we can start to make the data models fit the requirements. We have spent a year or two, and we have a lot of points of discussion, but also lots of points of convergence between the 2 models. So how + EAO: Clarifying question: you previously mentioned that the data model and the XLIFF interface should be designed together, now are you saying that they shouldn't be designed together? - -MIH: Yes, I think they should be designed together because if you don't design with XLIFF in mind upfront, then it will be harder to make - + +MIH: Yes, I think they should be designed together because if you don't design with XLIFF in mind upfront, then it will be harder to make + EAO: Are you proposing that we design all of the layers of the data model now, or do we design the XLIFF mapping separate from the data model? 
- + RCA: My question is whether we design the 2 different data model proposals in parallel, or do we test the data models against features and end up with one data model? - + MIH: I think those are 2 different questions. - -EAO: - + +EAO: + ZBI: - + MIH: I think the way forward is that we start implementing things in the 2 data models. It might mean a little bit of parallel work, even though I wanted to get to a single data model at this point. But I think doing implementation work will help us get to positions that are closer. - -EAO: Another way to proceed is to go back to the consensuses that we previously achieved, and see how well the current data models fit those stated consensuses. - -Another related possibility is ______ - + +EAO: Another way to proceed is to go back to the consensuses that we previously achieved, and see how well the current data models fit those stated consensuses. + +Another related possibility is **\_\_** + RCA: Do we have one data model, or both? Let us check whether both data models fulfill the consensuses that we have achieved. - + The second point is try to prioritize the features each data model data model should respect. - + EAO: And have a deadline of the next extended meeting on the data model, which is in 2 weeks? - + RCA: Yes. - + MIH: The point is to identify the features, not to implement them, per se. - + RCA: I believe so, since we don't have a consensus on a single data model. - + MIH: Maybe it's too early, but let's say we agree on 7 features that we think the data models should implement. Then what? The problem is that people who want a model that is designed to handle features that we haven't seen yet, then will those 7 features be enough? - + RCA: We have to include opinions from everyone, including people who haven't been coming to the meetings, even if it takes longer, so that we have a clear vision. This is just my personal opinion. 
- -EAO: I think we should list some features right now so that we include in the reporting and work. Like dynamic references and ___. - -MIH: I'll implement dynamic references because I told EAO that I would, but I don't see any specific linguistic feature requests on Github that require dynamic references. But I think what we should be doing is taking actual needs and - + +EAO: I think we should list some features right now so that we include in the reporting and work. Like dynamic references and \_\_\_. + +MIH: I'll implement dynamic references because I told EAO that I would, but I don't see any specific linguistic feature requests on Github that require dynamic references. But I think what we should be doing is taking actual needs and + NIC: I think some of the currency formatting examples would be a good starting point. - + EAO: Isn't the point to identify the requirements now? - + MIH: The important aspect is that we should be implementing linguistic features. - + GRH: From my experience, there are quite a few dynamic features. If I want to say "The object is on" in French, I can change the entire sentence based on what the object is. The definite article and other stuff depends on masculine / feminine, etc. In Spanish, the article depends on gender and plural, and even the word "and" varies. There is a lot going on that you can't hardcode. - + MIH: I totally agree that there are tons of dynamic things, but these are not related to dynamic references. This is why I think we should take linguistic examples and start from there. - + GRH: If I want to say "to the building", "to the car", "from the building", then depending on the preposition, I have to change the message morphologically. - + DLM: I support prototyping abstract features, and identifying features is a good way to find limitations on what is possible in the models. Then we can go back and decide if those limitations are actually linguistically relevant. 
If we start with what's considered linguistically relevant all the time, it will not give us a distinction between the 2 models, and that will make it different to choose between the two models later on. - + Standa: The way we got to dynamic references is when we have different selection patterns built into a messages. With the assumption that going forward that we would only use top-level selectors, then dynamic references would be the only way to support them, is that right? - + EAO: No, that is a different issue. Both models support dynamic references. - + MIH: What we agreed we need for multiple selectors is message references. Dynamic ones would be that the message has a variable reference that contains a message reference. Another level of indirection. - -EAO: Using the cat example from earlier, ____. - + +EAO: Using the cat example from earlier, \_\_\_\_. + STR: In that case, it is difficult for me to imagine a scenario where that is required in a practical application. So that leads me to Mihai's position, where we should have a linguistic scenario where that feature is actually needed. - + EAO: Let us say that you need to say something about the browser, in a language that modifies the words in different cases. You could have "Safari" and all the ways you pronounce it in different cases, and same for "Firefox". - + STR: That sounds like a pretty legitimate use case, and a feature worth supporting in my point of view. - + MIH: I totally agree with framing things as linguistic features. If that forces both models to implement dynamic message references, that's great. It's not clear to me that this is the case. But I will implement it because I owe it to EAO since I promised him that I would. - + RCA: In order to standardize the selection of these features, should we describe what we want? Or do we just add to a list, and then decide together? 
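
As an illustration of the dynamic message references discussed above (a variable whose value names the message to reference, i.e. one more level of indirection), here is a minimal TypeScript sketch. Every name in it (`messages`, `lookup`, `formatStatic`, `formatDynamic`) and the placeholder case forms are hypothetical; none of them come from either data model proposal, and real case forms would be supplied by translators.

```typescript
// Hypothetical sketch: these names are not taken from either data model proposal.
type CaseForm = "nominative" | "genitive";

// Each "message" is a product name with per-case forms.
// The "-GEN" suffix is an invented placeholder, not a real inflection.
const messages: Record<string, Record<CaseForm, string>> = {
  "browser-safari": { nominative: "Safari", genitive: "Safari-GEN" },
  "browser-firefox": { nominative: "Firefox", genitive: "Firefox-GEN" },
};

// Resolve a message id to the requested case form, falling back to a visible marker.
function lookup(id: string, form: CaseForm): string {
  const entry = messages[id];
  return entry ? entry[form] : `{${id}}`;
}

// Static reference: the referenced message id is fixed inside the message.
function formatStatic(form: CaseForm): string {
  return `Settings of ${lookup("browser-firefox", form)}`;
}

// Dynamic reference: a runtime variable holds the id of the message to reference.
function formatDynamic(vars: { browser: string }, form: CaseForm): string {
  return `Settings of ${lookup(vars.browser, form)}`;
}
```

In this sketch the only difference between the two functions is where the message id comes from: it is written into the message in the static case and passed in as a variable in the dynamic case.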
- + MIH: I would like it if people provide as ugly of a feature as possible, in other words, things that are tricky to implement but based on linguistic use cases. - + RCA: Can we take this feature as one of the list? - + MIH: Sure, but I would like to have a list that is clear. - + RCA: Yes, I just created issue [#165](https://github.com/unicode-org/message-format-wg/issues/165) in order to do exactly like that. I know that we have collected features previously in other places, but let us use this new issue for our purposes now. - - + STA: Not all features are going to be linguistic. Some of the features that we will likely want for the standard will not be linguistic per se, but will be making expressing certain ideas easier in the format. So I'm not sure if we're just listing issues that are related - + Also, I feel like dynamic references are used as a proxy for a larger abstract issue, but there hasn't been anything else that is nearly as contentious as dynamic references, so maybe we should just talk about dynamic references. - -MIH: Right, we shouldn't _____ - + +MIH: Right, we shouldn't **\_** + STA: Right, and better linting is a feature that isn't a linguistic feature, but is something that we might want. - + MIH: So yes, saying "linguistic" features was too narrow. - -EAO: _____. So really, I guess we're asking for user stories. - + +EAO: **\_**. So really, I guess we're asking for user stories. + RCA: What do you think about collecting features before the next extended meeting in 2 weeks? - + EAO: Let's have features submitted in 1 week. I will guarantee that for any issues are sent within 1 week, then I will be able to review them and be able to respond to them before the meeting. 
- + RCA: - + MIH: - -EAO: We are treating Github issues as user stories, but _____ - -RCA: - - - + +EAO: We are treating Github issues as user stories, but **\_** + +RCA: + ### Roadmap of Message Format 2.0 for 2021 + RCA: I don't think we are in a good position to speak about the roadmap. But we have made bullet points in the chair group meeting that you can view. [These are the Feb 2021 chair group meeting notes](https://docs.google.com/document/d/1zkWoBAWaMaidHqEk75Psrq8mqWBwp-fyOHAXW_nMOYM/edit?usp=sharing) that contain a sketch of a roadmap. Let me know if you have any additions on that. - + The first thing we spoke about is to unify the data model, which we are discussing right now. The next point is testing XLIFF uses cases against the data model - can DAF or MIH clarify what this bullet point is about? - + MIH: I'm not sure what the question is, since we have it as a goal to have XLIFF mapping. - + RCA: Also, we spoke about having a testing platform - we talked about having test cases, and GRH has provided some of them. Also, having implementations running tests would be very useful so that we can have use cases validated as being supported. - + Also, reference implementations, JS is one example. - + MIH: I also want to have a statically typed language so that the rigidity that you don't have in a dynamic language can be tested for ability to support implementations. - + RCA: Also, advocacy is another area of work. We want to get more people involved as stakeholders, present at conferences and webinars. - + EAO: Where are the tests? - + RCA: You can find GRH's test data in [this pull request](https://github.com/unicode-org/message-format-wg/pull/113/files). - + EAO: It would be nicer as JSON, but XML is okay. - + MIH: Yes, let's merge, and it's not hard to fix or remove later if we change our mind later. - + RCA: Great, it is merged, so https://github.com/unicode-org/message-format-wg/tree/master/test can now be our source of truth for tests. 
I also know that EAO and MIH have tests in their own implementations. - + MIH: In theory they can, the trouble is that you end up having a lot of extra work to convert the test data from some format into something more suitable for the implementation. - + EAO: I also have a lot of JSON data used for testing MF v1. diff --git a/meetings/2021/notes-2021-05-03-extended.md b/meetings/2021/notes-2021-05-03-extended.md index 9b3d13c766..058bdbe142 100644 --- a/meetings/2021/notes-2021-05-03-extended.md +++ b/meetings/2021/notes-2021-05-03-extended.md @@ -1,6 +1,7 @@ [Transcription](https://docs.google.com/document/d/1nDqbUaGwUVq_m8vBe4Jpjhdu50tu3D4vCXweQbOhivs/edit?usp=sharing) #### May 3, Meeting Attendees: + - George Rhoten - Apple (GWR) - David Filip - Huawei, OASIS XLIFF TC (DAF) - Zibi Braniecki - Mozilla (ZBI) @@ -15,10 +16,7 @@ - Romulo Cintra - CaixaBank (RCA) - Robert Heinz - Nike (RHZ) - - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) @@ -27,160 +25,159 @@ May 17, 11am PST (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/165) -### Moderator : Rômulo Cintra - - - +### Moderator : Rômulo Cintra + ## [Issue #165](https://github.com/unicode-org/message-format-wg/issues/165) Candidate features to be implemented/tested in both data models - + RCA: Let me open the issue and go through them. Before going through them, how should we prioritize the features? Do we vote here in the meeting, or count the points on Github? - + EAO: I think this should be a pass / fail sort of thing. - + RCA: Let's start with dynamic references. - + EAO: The description should explain it. - + MIH: I will implement because I said I would. As a feature, I think it is useless, because you can just pass the variable values as options.
- -ZBI: If I hear what MIH said right, dynamic references are not needed because fetch a translation and pass it formatted to another message. I specifically want this feature because they cannot work like that. In the case of UI bindings, you lose the lifecycle in dynamic overlays. If you do this in 2 steps, meaning you make changes at two points in time, if you change languages on the fly, then you lose the ability to make a proper message. You lose consistency and fallbacking. If you don't consider UI messages as a part of our work, then it is tempting to exclude points that - + +ZBI: If I hear what MIH said right, dynamic references are not needed because fetch a translation and pass it formatted to another message. I specifically want this feature because they cannot work like that. In the case of UI bindings, you lose the lifecycle in dynamic overlays. If you do this in 2 steps, meaning you make changes at two points in time, if you change languages on the fly, then you lose the ability to make a proper message. You lose consistency and fallbacking. If you don't consider UI messages as a part of our work, then it is tempting to exclude points that + MIH: Ok, fair enough. - + EAO: I encourage anyone on the call to asynchronously review the issues. I have a comment on a couple of issues on the thread that ECH raised, because I don't think they have an impact on the data model. So I think we should start with people who have opinions on whether issues should _not_ be included so that we can have a negative filter and focus our time better. - + RCA: +1, I agree. Let's focus on teh list. Can we jump to the next one? Okay, Plural Range Selectors. - + MIH: This is one of the very things where I would give it a thumbs down. It feels artificial. A range has a start and end, so to pass only one endpoint as a parameter seems strange. It would be like formatting a date where the month and day are fixed, but the year is passed through -- why would you do that? 
- -NIC: The only thing I can think of is the "a"/"an" issue where it depends on the number (?), and - + +NIC: The only thing I can think of is the "a"/"an" issue where it depends on the number (?), and + MIH: Is that a different feature, or is that a modification of this one? - + NIC: I thought it was an application of plural range. - + EAO: I opened #125, and there is another issue that is very similar. If I understand correctly,, what MIH doesn't like is that one range end comes in as a variable, but the other one doesn't. - + MIH: Yes, pretty much. - -EAO: I want to include this as it shows a clear difference between the data models, but I understand MIH's point that there are no real world use cases that - -GRH: As far as CLDR plural rules go, there should be explicit support for ___. Even if ranges are required, they can be separated and handled elsewhere. As far as definiteness, they can be defined in CLDR plural rules, anyways. - + +EAO: I want to include this as it shows a clear difference between the data models, but I understand MIH's point that there are no real world use cases that + +GRH: As far as CLDR plural rules go, there should be explicit support for \_\_\_. Even if ranges are required, they can be separated and handled elsewhere. As far as definiteness, they can be defined in CLDR plural rules, anyways. + ZBI: I think GHR answered this. It sounds like translators, in GRH's experience, don't actually need this feature. - -MIH: This feature feels intentionally construed, so - -RCA: Before we continue, I just want to reiterate that the main point here is to go through the features and understand and learn. We don't want to compete between data models. We just want to - + +MIH: This feature feels intentionally construed, so + +RCA: Before we continue, I just want to reiterate that the main point here is to go through the features and understand and learn. We don't want to compete between data models. 
We just want to + EAO: I want to point out that I have experience with a situation where the start and end of a range in a translation needed to be supplied from an external source, and that source (file) only supported numbers, not ranges, so I would have benefitted from range formatting. - + DAF: I have a use case, and I want to see if MIH, EAO, ZBI can support this. In Czech, you have plural keywords many, one, few, etc. But for large numbers, you have this again, and I wonder if that would be supported. - + GRH: I think I understand the problem. It sounds like there is a confusion between plural rules, where you have the ONE case, and the need to have ranges. I really want CLDR plural rules to be handle ranges. I don't these ranges handle if the last digit is a 1, which is a CLDR plural rules thing. - + MIH: So there is no discussion here about supporting / not supporting ranges. I think the feature DAF was mentioning and the one GRH wants supported is something like "1 - 5 files" which depends on the start and/or end values of the range. The issue we're discussing here in #165 is whether you can pass the start and end as 2 individual parameters or as a tuple of 2 values. I am okay to implement it, but I don't think it will show any difference in the models. - + DAF: You want to split month from day name because the months can have different genders. So that is a case where you want to split things up. - -ZBI: - + +ZBI: + `foo { PLURAL_RANGE($dateStart, $dateEnd) }` vs. `foo { PLURAL_RANGE($dateRange) }` where `dateRange = ($dateStart, $dateEnd)` tuple written by an engineer in practice, the difference is that in the latter model, the localizer interacts only with one variable and cannot fiddle with separately start/end variables. - + Does that answer your question? - + EAO: Just an observation, since we have agreed to include this one for now, it might be good to move to the next one. - + RCA: Let's move to the next one, the multi-selector message. Issue #119. 
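
The two call shapes ZBI contrasts above, `PLURAL_RANGE($dateStart, $dateEnd)` versus `PLURAL_RANGE($dateRange)`, can be sketched roughly as follows in TypeScript. This is only an illustration under stated assumptions: `categoryOf` is a toy rule invented here and is not the CLDR plural range logic, and the function and type names are hypothetical.

```typescript
// Hypothetical sketch; categoryOf is a toy stand-in, not a CLDR implementation.
type PluralCategory = "one" | "other";
type NumberRange = readonly [number, number];

// Toy rule (an assumption for illustration only):
// a range is "one" only when both ends are exactly 1.
function categoryOf(start: number, end: number): PluralCategory {
  return start === 1 && end === 1 ? "one" : "other";
}

// Shape 1: the message sees two separate variables,
// so the localizer can reference the start and end individually.
function pluralRangeFromParts(vars: { dateStart: number; dateEnd: number }): PluralCategory {
  return categoryOf(vars.dateStart, vars.dateEnd);
}

// Shape 2: the message sees one tuple variable assembled by the engineer,
// so the localizer interacts with a single value.
function pluralRangeFromTuple(vars: { dateRange: NumberRange }): PluralCategory {
  const [start, end] = vars.dateRange;
  return categoryOf(start, end);
}
```

Both functions select the same category for the same numbers. What differs is how many variables are exposed to the message, which is the distinction raised in the discussion.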
- + EAO: That's a collection of deeply nested real-world selector messages. The point of this issue is to show that it can be handled in the data model. - + RCA: Any comments? We have STA's example, NIC's examples, MIH's examples, EAO's examples. - + EAO: I think we're fine, we're all in agreement that this one should be included. - + RCA: Next is the custom list formatter. - + EAO: This issue has a lot of overlap with the next issue, the first one from ECH. The issue is whether you can concatenate lists in the data model. It requires transformation/formatting on each item in the list. - + MIH: Can you explain what you mean about concatenating lists? - + EAO: Let's look at the link to the test suite, [line 354](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L354). That takes in a list at run time and appends a word, and appends a word at the end. You have the list items on line 358, and you append another word at the end. - + MIH: No, you can't concatenate to the list because you can't guarantee that it works in all languages. - + EAO: So you're saying what's on line 370 doesn't work? - + MIH: Not in all languages. It works in English, but it doesn't work in all languages. - + EAO: Can you give an example of a language where it doesn't work? - + MIH: You gave an example where you said some language could put that "another vehicle" at the beginning. The "another vehicle" is not in teh same bucket of the list items. I have a hard time explaining it. - + EAO: Can someone else chime in? Am I off base / not understanding something? - + DAF: I think I understand what MIH is saying. The "another vehicle" is an operator on the whole list, it is not an item of the list. Does that sound like what you were saying, MIH? - + MIH: Yes, I can't explain it better. - + DAF: Can you give an example of a language? - + RCA: Let's move on to the next. ECH, do you have any comments? 
- + ECH: The only thing I would point out is that the in the Tamil example, you have a word at the end of the list for conjunction / disjunction, and then when you conjugate that word for the dative case, you might have to double the consonant phoneme that begins the next word. But it also requires looking at the phoneme level of the words to know the beginning phoneme of the next word, which is a space not represented by code points or grapheme clusters segmentation. But these are issues that might apply to both data models in a way that might not affect the design of the data model itself. - + GRH: ECH brings up good points on lists. I haven't seen use cases of adding an item to the end of the list. But we do have use cases of making it definite or indefinite, changing the case ("to"/"from"), etc. - + EAO: It's something that I don't think affects how the data model should be. - + MIH: I agree, I think it's about how smart the list formatter is. - + DAF: I say that they are totally different features. I say "neighborhood-dependent formatting" is not constrained to lists at all. And then there are lists. I think the neighborhood-dependent formatting is completely orthogonal. Back to lists. You have operators on the list, like conjunction, disjunction, and so on. But "etc." is also an operator on the list, it is not an item. - + ECH: I just wanted to reiterate what DAF described as "neighborhood-dependent formatting" as a formatting concern that's higher-level than just the list, because the list formatter will never know what the next word after the list is, no matter how smart it is. I wanted to make sure that the point is not lost. - + MIH: Let's just take the list formatting operator point, but leave aside the higher-level formatting concern from the Tamil example, which also introduces an issue that doesn't affect the data models. - + EAO: +1 - + RCA: Let's look at the next issue. 
- + EAO: I think with the "inflections in interpolating placeholders" issue and the "Inflections" issue, these issues are similar. I think these won't make a difference in the models. But the "Full message fallback" is a different category of issue. - + GRH: I want to point out that CLDR gives you rules about inflections, but it doesn't tell you whether the rules apply. - + EAO: Since we are running out of time, can we leave out the "neighborhood" formatting feature? - + DAF: Yes, this is orthogonal. - + MIH: I +1 EAO's suggestion to leave it out, for considering later. - + RCA: Next one is "Full message fallback". Any comments? - + EAO: It impacts syntax, and it requires that the data model has metadata, but that is a thing we've already agreed on doing. And it requires runtime formatting to change. - + MIH: What does this have to do with metadata? - + EAO: You have to provide a value for the fallback. - + RCA: Next is inflections. - + EAO: We agree that this was related to the neighborhood- / sentence-level formatting. - + MIH: I wouldn't call it "Inflections" though. This isn't an inflection. For example, in English, to make something definite using an article - + GRH: For stuff like that, that is called a new surface form. You have a lemma, and in some languages, when it changes, it's a different surface form. - + RCA: What should we do for next steps? - + EAO: The next step is to compare how these data models look after we support each of these features and analyze. - + MIH: To implement, basically. - + EAO: I would also like to present implementations of the tests for the proposal I'm supporting and a preliminary XLIFF conversion module.
diff --git a/meetings/2021/notes-2021-05-17.md b/meetings/2021/notes-2021-05-17.md index 03cbdd8765..1e101f6f1f 100644 --- a/meetings/2021/notes-2021-05-17.md +++ b/meetings/2021/notes-2021-05-17.md @@ -1,7 +1,7 @@ [Automatic Transcription](https://docs.google.com/document/d/1qK6tKs5HwysOad0ch6eqAfNo-x2Zvp5ffQMTcY-4OsY/edit?usp=sharing) - #### May 17, Meeting Attendees: + - Romulo Cintra - CaixaBank (RCA) - Elango Cheran - Google (ECH) - George Rhoten - Apple (GWR) @@ -16,23 +16,23 @@ - Zibi Braniecki - Mozilla (ZBI) - Jean Aurambault - Pinterest (JAU) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ## Next Meeting -May 31, 11am PST (6pm GMT) - Extended +May 31, 11am PST (6pm GMT) - Extended June 21, 11am PST (6pm GMT) ## Agenda + - [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/171) -### Moderator : Romulo Cintra - +### Moderator : Romulo Cintra + ## Q&A - Summary Documents for each proposal - + Proposal that Leans Towards Making Things Possible Question (LHS): in https://github.com/unicode-org/message-format-wg/blob/master/guidelines/goals.md we say a “non-goal” is “Support all grammatical features of all languages. Instead, focus on features most commonly encountered in user interfaces, textual, graphical and spoken ones alike.” but this proposal states that “We believe that the role of the data model is to express the breadth of human communication. To achieve that end, it’s more important to make their expression possible than to establish strict or artificial limits on the complexity of those messages.” ...these two things seem to me opposed. Do the proposal authors want us to change the MFWG’s goals/non-goals, or am I misinterpreting how the two things can align? 
Comment/request for details (LHS): This proposal says a lot of things that I can get behind like (I’m paraphrasing), “Let’s design something that should work for 10-20 years” and “Let’s not let existing patterns/technologies unnecessarily constrain us”...BUT the devil’s in the details, and I think the tradeoffs matter. For example, I think we should seriously consider breaking with existing patterns/technologies (e.g. XLIFF) but only if it’s worth the cost. This proposal doesn’t give examples of things that it thinks are/aren’t worth the cost, so it’s hard to evaluate. Can the proposal authors be more explicit or give examples? @@ -41,186 +41,176 @@ Proposal that Leans Towards Simplicity Comment/request for details (LHS): similar to the other proposal, it’s hard for me to get a handle on the tradeoffs without examples (I thought these summaries are supposed to mostly stand alone so that the wider MFWG can read them & respond...is the expectation that everyone dig into the detailed code to understand how the 2 proposals differ? if so, can you give examples for why you think 1 proposal is better than the other?) Put another way: the current documents seem like “philosophical statements” and thus it’s hard to evaluate them, especially compared to each other (and especially because my instinctive response is “well of course we’d want to find a good compromise between simplicity and power/flexibility, understanding the tradeoffs”...but the specific tradeoffs/compromises really matter!) Questions (LHS): I’ve added a few other comments/questions in the doc - - - - - - -### [Strawman Proposla for an XLIFF 2 MessageFormat Module](https://docs.google.com/document/d/1D702OBAzT-Crb9XXUiZYJnFO9Yq5duRy4Zc3Br6JwRU/edit ) - + +### [Strawman Proposla for an XLIFF 2 MessageFormat Module](https://docs.google.com/document/d/1D702OBAzT-Crb9XXUiZYJnFO9Yq5duRy4Zc3Br6JwRU/edit) + EAO: This is based on the data model prototype that EAO and ZBI have been working on. 
The proposal is complete based on the XLIFF 2 documentation. I'll present how this connects to MFWG model. This is based on some understanding on XLIFF 2. - -What we are adding here are having these placeholders to representing a value, a variable reference, or a data reference. You can have `hello {foo}`, where `foo` is looked up. The second place where these placeholders are used is a variable, where this refers to a group of messages, and the selector is must be used to select a message out of the group. See excerpt starting with ``. In another example, within a `` element, messages exist within the ``, and they are referenced through a message id that is stored in the `` element, which is in turned referenced within a `` unit of an XLIFF ``. - + +What we are adding here are having these placeholders to representing a value, a variable reference, or a data reference. You can have `hello {foo}`, where `foo` is looked up. The second place where these placeholders are used is a variable, where this refers to a group of messages, and the selector is must be used to select a message out of the group. See excerpt starting with ``. In another example, within a `` element, messages exist within the ``, and they are referenced through a message id that is stored in the `` element, which is in turned referenced within a `` unit of an XLIFF ``. + There is code to convert Fluent and MFv1 syntax to and from XLIFF according to the MFv2 data model proposal. - + The code is available in open source, with e.g. a test suite [here](https://github.com/messageformat/messageformat/blob/mf2/packages/xliff/src/mf2xliff.test.ts). - + GRH: It looks like a good start. I had a question. Let's use Arabic as an example because I like to go for the hard examples. There's grammatical gender for the noun, and there's grammatical gender of the pronoun attached to the noun. How would you handle such a scenario? 
- + EAO: In the example between Finnish and English, you have to work with both case and gender as inputs. - + DAF: I'm seeing this for the first time, I think it is well designed from the XLIFF 2.0 point of view. I think it will be interoperable even if the agent doesn't share the namespace. You assume that you will be able to use the namespaced attributes inline, which is a core assumption. There is a caveat which is that until you achieve the status of a module, you won't be able to insert your namespaced attributes into core inlines. - + EAO: Clarify what you mean by core inline? - + DAF: You assume that you can use this namespace in placeholders, etc., and `` is a core inline. - + EAO: `` accepts open inlines. - + DAF: Until you register -- - + EAO: I have checked with the spec when writing this, but please tell me offline where in the spec it says so. - + DAF: Sure. I want to hear what MIH says. I think it respects the core data model. - + (via chat) http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph Constraints - + The following XLIFF Module attributes are explicitly allowed by the wildcard other: - + - attributes from the namespace urn:oasis:names:tc:xliff:fs:2.0, OPTIONAL, provided that the Constraints specified in the Format Style Module are met. - attributes from the namespace urn:oasis:names:tc:xliff:sizerestriction:2.0, OPTIONAL, provided that the Constraints specified in the Size and Length Restriction Module are met. -No other attributes MUST be used. - + No other attributes MUST be used. + MIH: Overall, it looks fine to me. Maybe some of the areas that I would discuss would be just typical XML discussion, like why do you decide to make something an attribute versus an element. But it's no biggie. The one other comment I have is that I already have a proposal for handling gender, plural, and select for XLIFF, and I think I've shared it already. Maybe we should unify? Did you look at it already? Did you deem it not good enough?
- -EAO: I already looked at the GPS proposal, and it links up with the discussion on ____. - + +EAO: I already looked at the GPS proposal, and it links up with the discussion on \_\_\_\_. + MIH: At first look, there's not anything that I can't reconcile. I'm cautiously optimistic because at the first look of the data models, I thought we could reconcile those easily, and here we are several months level. - + STA: A question about XLIFF -- what if the source language doesn't have a select, but the target language needs a select? - + DAF: XLIFF would normally be created under a corporate structure as source-only. Target would have to be created later. Target is not required, which allows content creators to create read-only content. The translators are not allowed to modify source, but they are totally free to create structures for target languages. - + STA: It seems that you want to insert a group, then, to capture all of the messages in the target language. - + DAF: You have to know what grammatical features that you are implementing. - + STA: A really quick comment -- I like the fact that the default value for the selector is specified on the selector rather than on the branch. - + DAF: This is what makes it core interoperability friendly. You don't have to navigate too far into the data structure itself. - + GRH: I know that there are frequently languages that are more linguistically variable. For instance, if you translate from Mandarin into English, you might find a word that could be singular or plural. If there is a translator to produce functions for the language that they are creating, independent of the content creators, that would be useful. - + MIH: I wanted to explain what DAF already explained with different wording, and I think it will answer GRH's comment. I do think that there is no way for anything down the line, after a XLIFF doc was created, to create new units. 
For example, if you create a Chinese unit, there is no way to insert a new unit in the translation. You have to know what your target language is at time of XLIFF creation. So I will have no idea where the XLIFF doc goes, whether it needs gender selects, etc., or whether it will be passed to MessageFormat. - + EAO: I think what we are identifying here is a possible weakness with how we work with XLIFF. At the very least, I have code that shows that this is not a limitation with how we translate to and from XLIFF. Also, I thought I would mention something that STA mentioned about selectors. I also posted a [PR for a clear spec for how case selection could work](https://github.com/unicode-org/message-format-wg/pull/170), which describes how it will work with the model that ZBI and I have worked on. - + STA: Coming back to what DAF and MIH said, does this imply that the XLIFF producer needs to be aware of any functions, including custom registry functions that we were once talking about, that might introduce variability? - + MIH: Yes. - + STA: Is that new? - + MIH: It's been true forever. The whole l10n world works on the idea that you give me a simple message, and I'll give you a simple message back, not something programmatic. - + DAF: It is modular now, so if you can ask for the right thing, you will get it back. - + STA: So it would be the responsibility of the user to use the producer such that you get the right XLIFF. - + DAF: Yes, to use the extractor or agent. We can define a new type of agent to do this. We can tell translators to not delete a placeholder or some other item. If we don't produce that, then the translator cannot work with that. - + STA: If we have a registry of functions, then they will have short declarations that we can use. - + DAF: Yes, and we can define a new type of agent to do that. But it would always have to be target-language specific. You cannot work without a specified target.
- + GRH: At least in Siri's use cases, we already use XLIFF. It works fine, and we can specify language-specific grammatical features. It works pretty well, it saves the translator to write out every single possibility, and allows the source creator to not predict every grammatical detail of every possible target language. So my preference is to have something similar. - + MIH: I'm really interested in how that works. Maybe not right now. - + GRH: Sure. I'm trying to say that I don't want to put the burden on the source author, which is good for long-term viability. - + MIH: Sure. Let's say that you have source in Arabic and you need the gender of the pronouns, are you saying that the translator doesn't need to include that in the translation? I'm not quite understanding, but maybe we can discuss over email. - + DAF: There is nothing programmatic that you do in XLIFF with a node, right now. - + RCA: I have a question, about the match of this proposal with what Mihai said earlier. Is there an opportunity to achieve that? - + EAO: Probably. It depends on the exact specifics of the data model that we end up with. But given that in all our specs, this is based on a data model that is more complex than that one, there is no reason why this can't work with the other data model. - + RCA: The other question here is what are the next steps? Do work on the data model and review later on? - + EAO: One reason that I went ahead and did this all thing is to have a proof of concept that all of this is possible, and have it map to the data model structure that we have. Specifically, if you all can review it and find things that can be improved, then we can make this a better proposal, but I am not aware of needful things to work on this. - + MIH: I think it's really beneficial to look into this kind of mapping XLIFF, whether we choose this data model or the one I have been working on. The sooner we look, the sooner we bump into issues of what works and what doesn't. 
- + DAF: A small pointer, there is a related ideas markup. The related features was added in XLIFF 2.1. Through ITS, you can specify the intended target languages. ITS is the markup for internationalization. That's just a pointer if you're looking into it. - + RCA: Can you share progress on implementing the features in the data model, how is that going? - + MIH: We discussed implementing features, I did that. I also added an implementation in Java, which is also in the `experiments` branch, [link here](https://github.com/unicode-org/message-format-wg/tree/experiments/experiments/data_model/java_mihai). It implements everything we talked about. It implements some ideas that I came across as I was implementing it. The tests directory in the source code folder shows how everything works. In FancyListTest, the testListWithItemMultiProcess test shows how 3 functions are chained. The first names of 3 person objects are retrieved, the dative form in Romanian is computed with another function, and then it is combined into a list with another function. - + The GetPersonName class shows what it takes to implement a custom function. They are all functions, so they are all equal, one is not more special than another. - + RCA: Is this `personName|grammarBB` a function too? - + MIH: Yes, I'm not fully happy with calling this a function, but it is something that is callable at runtime. - + RCA: Any comments? - + EAO: I noticed in your dynamic message reference example, the message of the browser messages is flat. Did you want to support deeper links with e.g. dot notation, or something else? - + MIH: Our structure doesn't need to have that structure. - + EAO: I thought you had message groups that can contain message groups. - + MIH: Yes. I don't want to tie the message files to the message ids. I don't want to encode where the messages actually live. They may live in source files, or a database, or your Android or iOS resources.
So the exact format of the message id doesn't matter, if you want to segment them with dots or slashes or colons, you can figure out how to interpret the id and retrieve the message. The RescManager is the abstraction on top of fetching the message, I don't want to expose that in the data model. - + GRH: I think this is a good first step. There's frequently a lot of cases where you need to reduce the burden where you require the translators to provide every single possibility. For some words, you provide a definite article for a specific word. Or for prepositions like "on Hawaii" vs. "in California" because Hawaii is an island. Maybe you had that in "BB"? - + MIH: Yes, I named it GrammarCasesBlackBox because I don't want translators to get hung up with how things are implemented. - + GRH: I have a function in house to do such work, so maybe I don't want to name it "black box". - + MIH: No, of course you don't, you want to name it something meaningful. When you implement it, all you do is point it to your own logic function. - + GRH: How does this work? Do the translators do this? - -MIH: You just implement the function, register - + +MIH: You just implement the function, register + GRH: Can you do this for multiple inflection engines? - + MIH: Yes, because in theory, the data model doesn't know and doesn't care about the implementation of the functions. - + GRH: It's kind of like namespaces. - + MIH: Yes, it's kind of like namespaces. You have a black box in the background that does the linguistic work, you just expose which switches with labels and how to use them. XLIFF will be written to indicate whether an argument (?) can or cannot be deleted. But you cannot add any new switches. - + GRH: The translator cannot add new switches? - + MIH: Yes, this goes back to the discussion about having a schema for the functions in the registry. You c - + GRH: I like this idea, there are a few things that I would like to expand on it.
Maybe there needs to be advertisement of supported functionality, because maybe one function has inflection of nouns and maybe another function has inflection of verbs. - + I recognize that I will have several comments on this, what is the best way to add comments? - + RCA: Maybe we can create a pull request that we don't merge, so that we can have comments for specific lines. - + EAO: You mentioned using dotfiles and other formats for representing the messages. And that brings up the topic of the syntax that we use for MFv2. So maybe we can bring that up in our next extended meeting. Would that be okay? - + RCA: Sure I will add an item to the next meeting agenda. - + EAO: In our data model, I wanted to point out a nested message for dynamic message references where the messages references are more of a path (sequential data structure) to index into the nested message structure. I also wanted to show a rough implementation of formatToParts and a hacky attempt at a function that can correct the article in English "a" / "an" depending on the main noun. - + MIH: We should be able to do these kind of corrections across formatting tags, ex: `an hour`. Also, in the final formatToParts, we should be able to represent overlapping fields. - + RCA: I actually found more similarities than differences between the data models, to be honest, so I think we are on a good path. I know that we want to start discussing the syntax during the next meeting, but I don't want to lose the focus on choosing or unifying a data model. - + EAO: One reason I proposed working on the syntax is because I think the syntax informs the data model. - + MIH: My take on the syntax is that I don't think it affects much of our decisions, which is why we are working on a data model, not a syntax model. I hope the JavaScript world makes its implementation idiomatic. - + EAO: Just a short observation, this would require us to go back and redefine our deliverables if we don't think we don't need a syntax. 
- + MIH: No, I am not saying we should go back on that, I'm just saying that I'm not very opinionated. - + ### Proposal that Leans Towards Making Things Possible - - - + ### Proposal that Leans Towards Simplicity - - - diff --git a/meetings/2021/notes-2021-05-31-extended.md b/meetings/2021/notes-2021-05-31-extended.md index 4157d33280..4bc253192d 100644 --- a/meetings/2021/notes-2021-05-31-extended.md +++ b/meetings/2021/notes-2021-05-31-extended.md @@ -1,6 +1,7 @@ [Automatic Transcription](https://docs.google.com/document/d/1DN9BDkJqtnY3UoI28k3PYUcLhsjlhk3fJK5J2LC_Atk/edit) ### May 31, extended meeting Attendees + - Romulo Cintra - CaixaBank (RCA) - David Filip - Huawei, OASIS XLIFF TC (DAF) - Daniel Minor - Mozilla (DLM) @@ -10,139 +11,134 @@ - Zibi Braniecki - Mozilla (ZBI) - Staś Małolepszy - Google (STA) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting June 21, 11am PST (6pm GMT) -July 5, 11am PST (6pm GMT) - Extended +July 5, 11am PST (6pm GMT) - Extended +### Moderator : Rômulo Cintra - -### Moderator : Rômulo Cintra - -Related issues : +Related issues : [68#](https://github.com/unicode-org/message-format-wg/issues/68) [48#](https://github.com/unicode-org/message-format-wg/issues/48) RCA: How should we start this discussion? - -EAO: My understanding is that we want to define a particular syntax for MFv2 just like we have for current MessageFormat. That would be parseable and handled by tooling. - + +EAO: My understanding is that we want to define a particular syntax for MFv2 just like we have for current MessageFormat. That would be parseable and handled by tooling. + ECH: Do we say "the canonical syntax" or "a canonical syntax"? - + EAO: It says "the syntax". 
- + ECH: That's fine, so that means to me that we maintain one syntax, and there will be other syntaxes, but there will be only one data model, which is the important part to me. - + RCA: Do you have any examples in which you have started looking at them? - -EAO: Those examples are the JSON files and the Fluent examples which can be read by my prototyping code. But we need our own syntax that supports features that cannot be supported by current MessageFormat or Fluent, like having selections on messages using more than 1 selector arg to define a selection case. - + +EAO: Those examples are the JSON files and the Fluent examples which can be read by my prototyping code. But we need our own syntax that supports features that cannot be supported by current MessageFormat or Fluent, like having selections on messages using more than 1 selector arg to define a selection case. + RCA: What is our starting point? - + EAO: I think we can start at select messages. Whether the syntax for the selection should be embedded in a message, or should it be part of the structure of the larger message that approaches a file format. - + ECH: What does "approaches a file format" mean? - + EAO: MessageFormat defines a simple message format and selection message. But Fluent designed its own format for representing collections of messages. - + RCA: STA, could you share the principles or drivers that made Fluent come up with the new format? - + STA: Well, EAO was there a question of a single version or a collection of messages. But then there was the question of how Fluent compares to MFv1? - -EAO: MFv1 defines a simple message, and a selection message. But if we define a collection of messages, and in a way that is clear for how selects happen. That could work for a simple message. But - + +EAO: MFv1 defines a simple message, and a selection message. But if we define a collection of messages, and in a way that is clear for how selects happen. That could work for a simple message. 
But + STA: Are you suggesting that we have a hierarchy of messages? - + EAO: I'm saying that this is a decision that we should address and find an answer to. - + ECH: I don’t think that supporting a collection of messages implies that there is a file format that needs to be designed. I don’t think it needs to be a file either. - + STA: Talking about collections is useful. I’m not sure we need to solve syntax right now, that is ahead of us. - + ZBI: I think that ECH is conflating two concepts. I don’t think we should be narrowing ourselves and we need to make sure we can express our data in a non-file format. But at the same time, we do need to define a file format to target the web. Once we move beyond pure JavaScript, we need to think about how what we’re doing will be used by further projects without trying to scope creep our current project. I hope whatever we design will be a good candidate for a localization system for HTML, and that will require a file format. But we need to recognize that it isn’t the only way to store data. - + ECH: Having a grouping of messages is something that in the data model huddle meetings is something [we agreed upon early](https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/ts_data_models_name_mapping.md). I think if you can create something that can be serialized to a stream of bytes, either a message or group of messages, where it is persisted is a detail exterior to MF2.0. There has to be some syntax, at least one syntax that we maintain. That syntax is not important, whatever is commonplace and requires the least amount of adoption effort, associative data (maps), sequential data (lists), then that is sufficient. Pretty much any syntax does that. JSON is common and does that, we’d have to define a schema. - + EAO: I think we’re conflating two different things. One is the canonical source for how you write a message, e.g. in Fluent, MF1.0. 
The second is what is the expression of some set of messages that has been parsed, how is that to be represented. These are not the same and are optimized for different purposes. How well do we support the expression / embedding of MF2.0 messages into data structures that are used by traditional systems that are technically capable of having a single message being expressed. Is this concern high enough for us should we discard the possibility of using the structures and keys to drive how we do their select. - + ECH: I don’t understand the distinction between the serialized format and the representation in memory. - + EAO: If I’m writing a message, I want to write it in something humane and easy to write, something terse and easy to read and usable that way. Once this is parsed into the data model, the expression of this can be transferred in a different format that is useful for computers talking to computers. E.g. using YAML vs. using JSON. - + ZBI: I think I have a better way to frame this: why is CSS not expressed in JSON? There were debates on how to encode CSS for the web in the early days, and they chose something other than JavaScript. So it’s worth considering we’d have a syntax that is not JavaScript based, we should not assume that JSON is the best way of handling this. ECH: I consider this not metadata but data, I think the priority is that it is unambiguous to the computer and that making it human friendly is good. I think there’s an assumption in the analogy to CSS that doesn’t apply in the way I see things. Using CSS is talking about something that is a language and is mostly written by programmers and you’re editing it by hand. Translators are not usually programmers so to what extent are things being written by translators. Is CSS really crafted by hand these days. - + ZBI: Is CSS really written by programmers these days, that is a good question. - + ECH: Does optimization for presentation really matter? 
I don’t see what is special about a MF syntax that requires something that is specialized. - + ZBI: Do you see it for CSS. Would you still create a separate syntax for CSS if you were designing it today? - + ECH: Yes, because you’re editing it directly, so the text format is important. I don’t see that as being in MF. If we require that you need to be a programmer to do translation work, then how is the industry working. - + EAO: Example from <…> When we’re talking about syntax for humans it needs to be easy to read, but when serialized for computers, it needs a structure that is easy to process. The other thing is that is representable in the data model should be representable in the syntax. We can imagine something that the data model can support that we would not necessarily want to be directly in the syntax, something that results from processing the syntax. - + STA: I want to respond to ZBI who is talking about CSS. I do think it is a good analogy, but what is implicit is that we consider CSS a well-designed language. Is it? It represents complex ideas, but I’d hope that the result of our work is simpler than CSS. The syntax is simple, but the semantics are complex. If we want to design something for non-programmers so that it is easy, but we realize we’re working with a complex matter of grammar and languages. - -RCA: I just want to share some thoughts on this analogy to CSS. I'm seeing more and more, nowadays, different ways of expressing CSS, ex: CSS-in-GS, etc. It's quite easy to represent CSS as it is to others who know CSS. A good starting point to represent them differently in structure -- a very non-standard way to represent CSS -- the existing ways how flexible we can be in representing CSS. But it brings up the question of how effective were they in designing the original CSS file format? In this year - + +RCA: I just want to share some thoughts on this analogy to CSS. 
I'm seeing more and more, nowadays, different ways of expressing CSS, ex: CSS-in-GS, etc. It's quite easy to represent CSS as it is to others who know CSS. A good starting point to represent them differently in structure -- a very non-standard way to represent CSS -- the existing ways how flexible we can be in representing CSS. But it brings up the question of how effective were they in designing the original CSS file format? In this year + ECH: I’m going to not touch that question for now, because it is more like a programming language question. I wanted to respond to EAO’s example. EAO, you were talking about two different use cases, the more important use case is the one in which computers talk. And that goes with the idea that we’re including functionality. I’m not precluding the idea that we should have a compact, concise representation for humans, but we shouldn’t maintain as a group a syntax which limits what you can do in the data model. If we choose JSON, and we have a YAML representation that doesn’t have all of the functionality, but we shouldn’t limit our canonical syntax. I prioritize computers over humans when push comes to shove. - + EAO: I think computer exchange of data model is relatively easy and non-controversial if we ensure that the data model is representable in json. If it supported by json, it is easy to guarantee that machine exchange will work. I don’t think that that expression is the best expression for humans to use. It is verbose and not suitable for humans to write by hand. - + RCA: Adoption of MF2.0 could be affected based upon this decision. - + ECH: I was going to say we don’t need to consider the human representation, our job should just be the version that works with computers and if other people want to make something human readable, that is fine. How different is that going to be from JSON for simple messages? If things get more complicated then we need something fully functional. 
If that does affect adoption, then human friendly representation might be important. How often are people editing things by hand that are complicated? - + EAO: At least with MF1, the only way to do it is to write the source by hand. We don’t know where the future will take us, but at the moment, anything complicated needs to be written by hand. - + STA: From Fluent, we designed it such that it could be edited by hand by pretty much anyone. Once we started using it, I realized that the only people who were interested in editing Fluent by hand were mostly programmers, and a few translators with programming experience. My recollection is that I personally felt some disappointment that we weren’t able to convince “regular” translators to use Fluent syntax. Instead we jumped through hoops to hide syntax from them and design rich UIs so they don’t have to see syntax. This could be part of a larger conversation about who we expect to edit these files. It is easy to think about translators, but they will likely favour graphical UIs over syntax. But programmers create those localizable strings when they write code, and they will favour a text syntax. - + ZBI: We shared this experience, but I think I see it slightly differently. This is more in alignment with how ECH sees it. Editing by hand is a fallback. Predominantly localizers will work with UI. I see three scenarios where this doesn’t happen: -1) Programmers adding new localizable sources. -2) Some organizations will lack resources for UI design / development, and will want to just edit files for simplicity. -3) Fallback, in a big project we create a chain through localization UI, and then there is a last second mess that needs to be fixed quickly. E.g. mistranslated string days before release, can be fixed directly, skipping all of the UI steps. - + +1. Programmers adding new localizable sources. +2. Some organizations will lack resources for UI design / development, and will want to just edit files for simplicity. +3. 
Fallback, in a big project we create a chain through localization UI, and then there is a last second mess that needs to be fixed quickly. E.g. mistranslated string days before release, can be fixed directly, skipping all of the UI steps. + ZBI: I think that it is a fallacy to say that because UI is the primary target, that we can discount the fallback to text, even if it is the minority of use cases. We could let other people design the human representation, but I think it is a fallacy to say that it is not necessary. - + ECH: We can build tooling to handle complicated messages and tooling in text editors can help programmers with this. It is important to make things work for translators, let programmers deal with complexity and textual representations if they need to. Tooling can solve some complications from verbosity. I trust programmers to handle this. - + RCA: Do we have an idea of next steps? - + EAO: We need a decision on whether we consider human friendly syntax to be a deliverable. We agree on computer friendly representation, but we need to decide on what will fulfill our deliverable with regard to syntax. - + RCA: I’m not sure if we can decide this now, or after we have one data model. - + EAO: I think it is orthogonal between how the data model looks and whether it has a single syntax. - + RCA: We might not want to go deep on syntax while we still have to merge data models. The syntax is the representation of the data model, for the end user the syntax is what matters, not the data model. - + STA: I think we should proceed in parallel, since we’re somewhat blocked on the data model front. The syntax discussions might inform the data model discussions. The select logic would be a good action item for this group. We should limit ourselves to a single message for now. I think syntax is tricky whenever people use ‘human friendly’ or ‘readable’ because no one understands these terms the same way. 
- + RCA: If we parallelize this, everything we do requires pushing something further in the future. This might require more effort if we split, we might not finish our first goal. - + EAO: I second what STA said. Since we’ve postponed the decision on the data model discussion, one thing that the syntax discussion might give is feedback on how we can represent parts of the data model in the syntax which might inform the data model design. If something is very difficult to represent in the syntax, it might not be worthwhile including in the data model. - + ECH: I think the syntax discussion won’t be super in depth and abstract like the data model discussion. So I think it makes sense to do things in parallel. I do want to clarify that we haven’t postponed the data model discussion, it is just taking a long time. - + EAO: I meant that we postponed making a decision on the data model, not postponed the data model itself. - + RCA: I think we’ve made a good start. - + ECH: We should check with the larger group on whether to proceed with the syntax in parallel. EAO: We should open an issue for this. - + ECH: Clarifying in an issue, with some of the points about machine readable vs. human friendly. - + RCA: Creating an issue would help people who are not here. We can make things more concrete. - + ECH: We should also discuss adoption, but that is subjective in the absence of more data. At some point we’ll have to make a decision and we probably won’t have data to base it on. - + EAO: We’re 15 minutes over, the meeting is officially over. 
- - - diff --git a/meetings/2021/notes-2021-06-21.md b/meetings/2021/notes-2021-06-21.md index 6b6bd1a3cc..823dabcda8 100644 --- a/meetings/2021/notes-2021-06-21.md +++ b/meetings/2021/notes-2021-06-21.md @@ -1,9 +1,10 @@ [Automatic Transcription](https://docs.google.com/document/d/1WXgOd8FA_kcXzz4OABk1Y3asPHH8KoEaya0ebr6FaIM/edit?usp=sharing) ### June 21, meeting Attendees + - David Filip - Huawei, OASIS XLIFF TC (DAF) - Eemeli Aro - OpenJSF (EAO) -- Mihai Nita (MIH) +- Mihai Nita (MIH) - Romulo Cintra - CaixaBank (RCA) - George Rhoten - Apple (GWR) - Daniel Minor - Mozilla (DLM) @@ -16,201 +17,192 @@ - Zibi Braniecki - Mozilla (ZBI) - Nicolas Bouvrette - Expedia (NIC) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting - -July 5, 11am PST (6pm GMT) - Extended +## Next Meeting +July 5, 11am PST (6pm GMT) - Extended +### Moderator : Rômulo Cintra -### Moderator : Rômulo Cintra - RCA: Meeting is to help unblock progress on the data model. - + ## Unblocking Data model - Presentation @mihnita - -[Document link](https://docs.google.com/document/d/1kVXGMfwNKwU8QiUvUKReGapUAOhwZYaWJUAI3NW06UA/edit ) - + +[Document link](https://docs.google.com/document/d/1kVXGMfwNKwU8QiUvUKReGapUAOhwZYaWJUAI3NW06UA/edit) + MIH: We have been discussing this data model for a long time, and we are stuck because there are 2 different philosophical positions. It is like differing positions on a design principle as STA brought up a while ago. When discussing with EAO, we decided that we can reduce the differences to a couple of questions whose answers can more or less determine which model you prefer. - + The 2 questions we decided are: 1) Using tree structures in the data model, and 2) Should the data model be more restrictive and provide extension points, or be more flexible and rely more on validation and linting? 
- + For the first question about using tree structures, in the EZ model, there is a cycle in the "has a" field dependencies, thus creating recursion, which creates a tree structure. - + In the EM model, there is no nesting, the most complex structures are arrays and maps. - + The EM model is extended by registering functions that adhere to advertised interfaces. All functions are first-class citizens, whether provided or not, and users can bring their own functions. - -Using trees to represent messages is not how people think when they write sentences (even if linguists use trees to diagram concepts). Trees make it difficult to map these structures to XLIFF and lother l10n tools like Translation Memory, CAT tool UI, validation. - + +Using trees to represent messages is not how people think when they write sentences (even if linguists use trees to diagram concepts). Trees make it difficult to map these structures to XLIFF and lother l10n tools like Translation Memory, CAT tool UI, validation. + The second question is should the data model be more restrictive and provide extension points, or be more flexible and rely more on validation and linting. It is reminiscent of programming languages that are statically typed vs. dynamically typed. Both options for the data model are valid, it is not about right or wrong. - -To clarify, validation is about enforcing things that are defined as clearly right and wrong. Linting is not against the standard, but what is preferred. It is more complex. - + +To clarify, validation is about enforcing things that are defined as clearly right and wrong. Linting is not against the standard, but what is preferred. It is more complex. + Making the model more restrictive and asking developers to write custom functions is actually more beneficial, even for developers. - + Linting does not easily constrain error-prone flexibility. 
JS is an example where the problems of extreme flexibility were solved by introducing more constraints in the standard in ES5 and ES6. Relying more on the logic of a linting engine must happen on the backend (message authoring) and the client side (translators in a UI). The bottom line is that all this adds complexity. - + ## Unblocking Data model - Presentation @eemeli - + [Slides link](https://docs.google.com/presentation/d/153q1UcCgfTQBJEZpxQiRbqYLrU8clkxmRvCJVC2BQTU/edit?usp=sharing) - -EAO: A lot of the work that we have done has shown that we come from deeply different points of view. One question is whether trees can be used in the data model. This question doesn't attempt to solve everything. This acknowledges that both models are exceedingly similar. If we can get input from the wider group, we can unblock our forward progress. - + +EAO: A lot of the work that we have done has shown that we come from deeply different points of view. One question is whether trees can be used in the data model. This question doesn't attempt to solve everything. This acknowledges that both models are exceedingly similar. If we can get input from the wider group, we can unblock our forward progress. + Where the tree structure shows up in the data model is same as presented in the other doc: + ``` Message -> Pattern -> Part[] Part = string | MessageReference | … Part -> MesssageReference -> Path - + EM: path = Part[] EZ: path = string ``` - + Example of how to represent HTML spanning codes. It is admittedly harder to convert a flat structure into a tree structure, but it is easier to convert a tree structure into a flat structure. We're talking specifically about the data model, not everything about MF2 is and should be and should do. Each individual message is represented in multiple different ways (ex: Fluent representation, MessageFormat representation). 
- + Validation will be needed at all stages of the message pipeline, no matter what you do, to ensure that changes of a message at one end won't cause issues at the oehter end. - + Our expectations are for actual humans to not work directly with the dat amodle, but with other message representations. - + If tree structures are used, there is greater ability to be flexible. More complex transformations between the data model and other message representations. Possible to represent messages that are not exactly or easily representable in other forms or systems. Future use cases are likely to be already supported by the data model. - + The other question of should the data model be able to represent all possible messages. Restrictions on the EM data model make some messages impossible to represent in MessageFormat 2. Validation/linting will be required in all cases anyways. Are some ideas so bad that we need to make them impossible, or can we rely on recommended practice and linting? - + What sort of workflows do we want to support. A flexible data model may make it easier for a message to change how it uses runtime variables without changing the code. See the example of changing the date range formatting function to be more precise. - + How do we prepare for unanticipated uses? A flexible model allows us to handle future changes to the standard more gracefully. - - - + ## Q & A - + RCA: Let's have Q&A right now, and reserve the last 10 minutes to revisit the question and come up with group conclusions. - -STA: My first question is about the tree path. We are not trying to address variants under variants, or selectors under selectors. We are just trying to address the path - + +STA: My first question is about the tree path. We are not trying to address variants under variants, or selectors under selectors. We are just trying to address the path + MIH: Parts of the path can be functions themselves. - + STA: The evaluation of a value can be recursive, right? 
- + MIH: Right. - + STA: I'm not clear on the difference between hard limits and validation, in both of these presentations. - + EAO: A message reference can take a message reference as an input and a string literal as another input. This can be represented in the EZ model but only supportable in a hacky way in the EM model. This creates a hard limit on the messages that can be represented in the models. - + MIH: I wanted to yell “objection, your honor!” when it was suggested that there was a message that is impossible using our proposal...I haven’t found such an example yet. You can represent certain things in the tree as-is, but other things will need a function as a developer. - + There is nothing that I saw in the models that couldn't be represented in the other model. Even with functions, you can create a new function in the EM model and register it in the registry. This can support any new requirement in the future -- just write a new function. - + STA: This is helpful. One model has different types to represent functions, but the other model just takes strings and you just define a new function that are keyed by those strings. - -ZBI: I have a question for the EM model. IIUC, there are 3 levels of rejections, so to say. The first level is what can be expressed in the data model. If the data model doesn't represent a concept, then you can pass thing - + +ZBI: I have a question for the EM model. IIUC, there are 3 levels of rejections, so to say. The first level is what can be expressed in the data model. If the data model doesn't represent a concept, then you can pass thing + The second thing is what you call validation. It's analogous to saying that something doesn't conform to EBNF, and so it is proper. MIH, you talk about linting. Are what you saying is that with MF 2.0, you hope to make linting optional because validation occurs beforehand? - + MIH: I will rephrase it my own way. Yes, I am saying that linting should be optional. 
- + ZBI: Are you saying that the EZ model will make linting mandatory? - + MIH: Yes. - + ZBI: I don't agree, but thank you for clarifying. - + GRH: I see benefits on both sides. In one way, you have flat rules and you have to expand for the entire sentence. The other side is described as trees, but I don't agree with that description, it's more about word relationships. If you have the preposition "on", the verb and gender can change based on aspects of the rest of the sentence. When it comes to word relationships, when you add more variables to a sentence, can you represent them all easily? The flat one might be duplicating more. I think if you go with a more segmented model, which may be more like the tree model, might be beneficial. When you define a segment, you can specify the grammatical case, or implicitly inject a grammatical case of the target language. Implicit vs explicit wasn't talked about. If something is not valid, it should be validated. The final thing is that it's not tree-based. If you get a quantity, there's a number and a noun. There is a grammatical case of the noun, but the case is affected by the number. But there are the word relationships, and it's more like a graph, and it's not clear in the proposals, but I support extension in some way. - + MIH: I think that what GRH touched on here is core. When linguists deal with a sentence a tree doesn't map to a tree in MF. The EZ uses a tree model for the HTML snippet with bold and italic markup tags don't match to a tree. What GRH talked about with word relationships across a sentence is more a graph, which is also not a tree. - + GRH: That's correct. It's a graph, hopefully a DAG. - -EAO: This is a [link to the data model](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L434) that does transformation of __ to ___. You need to format-to-parts. You end up with structured data, and then you can apply a transformation. 
I didn't experience any problem with dealing with a graph. Flattening a graph should be really easy. - + +EAO: This is a [link to the data model](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L434) that does transformation of ** to \_**. You need to format-to-parts. You end up with structured data, and then you can apply a transformation. I didn't experience any problem with dealing with a graph. Flattening a graph should be really easy. + LHS: I want to thank the presenters and other members of their teams. This is the most clarity on what the dispute is. But I do want to propose some homework for the 4 of you. There are some fundamental disagreements on basic facts. When we try to answer questions of trees or not, linting or not, I can't have an opinion if we don't agree on basic facts. Can the 4 of you agree on some real examples that cannot be shown in the EM model. Maybe we can't get clarity in this specific meeting, but maybe the 4 of you can. - + EAO: This [link](https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/mf2-features.test.ts#L50) gives a dynamic variable reference that I have posted on previous occasions that cannot be represented in the EM model. - + LHS: Ideally, the 4 of you need to agree on the same examples. There seems to be a fundamental difference on basic facts. Without that agreement, it's hard for the rest of us to weigh in. - -RCA: I think we stressed the models in the last few weeks to make this effort. I think I understand it. Do we agree to continue in this manner, or do we try to - + +RCA: I think we stressed the models in the last few weeks to make this effort. I think I understand it. Do we agree to continue in this manner, or do we try to + MIH: A few weeks ago, we came up with a list of features to implement in both models and compare. For what is called "dynamic message references", I implemented it, EAO says that I didn't. 
- + LHS: MIH your model leans on functions for flexibility rather than the model itself. How custom are these custom functions? Will they be shared across the industry? Will there be a core of functions? Is there the risk that the custom functions allow the users to do the bad things you hope to prevent? - + MIH: The way I see it is that the custom functions being created by companies internally, and then contributing upstream to an open registry to be shared across the industry. - + LHS: ANd how about allowing people to do bad things? - + MIH: When you let users write their own functions, yes, they can. Once you write some bad functions and use it for a few target languages, you go back and fix teh function. But if you make mistakes in the tree model, then you have to go back and fix the structure of the messages themselves. - + ECH: When I look at the EZ model, when you look at [link](https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/rust_eemeli/src/main.rs#L51) you resolve things, you go through the message structure, try to interpret a function (“this is a function”), line ~51: dispatch of the function is manual, if not a string literal, probably a function, what’s interesting: way to clean this up without hard-coded dispatch is an interface (or “trait”), if you do clean it up that way, what would the interface look like in the EZ model, since you don’t know how many arguments/types are in a function → that’s the tradeoff, it might come in the implementation. 
- + On [slide 10 of Eemeli’s presentation](https://docs.google.com/presentation/d/153q1UcCgfTQBJEZpxQiRbqYLrU8clkxmRvCJVC2BQTU/edit?pli=1#slide=id.ge0d6495764_0_6), we have an example of flexibility...that’s the plus side, but the tradeoff is in the coding/implementation - + EAO: I just posted the JS runtime interface: https://github.com/messageformat/messageformat/blob/mf2/packages/messageformat/src/data-model.ts#L211 (don’t look at the Rust, that was the first time I coded in Rust!) ...here you can see a map, with each of values being fixed strings, with sequence of arguments - + ECH: That doesn’t satisfy me...it’s not really an interface to me - + MIH: a comment about the dispatch idea, I don’t expect the “final code” to hardcode function names, you’d look them up somewhere...that’s easy in our model since the function names are strings, but in EZ it’s a path-array, so I’d have to look up that - + EAO: Function references are strings, see line 141 (func is a string) - + MIH: Not in the data model though? - + EAO: This is exactly the data model. 
- + ECH: 2 things in 1 concept: there’s this idea that we need to design flexibility for “things we don’t know”, that’s not how we do things in programming, and it puts the burden of proof on predicting all future requirements...libraries, apps, programming languages start from a small core, you jealously guard against feature creep; only for web frameworks do you want to have “everything but the kitchen sink” since you don’t want to force everyone to migrate to the next version - -ZIB: we are not designing something that can *ever* be changed, so this is a big burden on us...if we’re designing a library that we can explore in production & then throw it away later, replace Angular and React, this is different: this is an incredible burden on us - + +ZIB: we are not designing something that can _ever_ be changed, so this is a big burden on us...if we’re designing a library that we can explore in production & then throw it away later, replace Angular and React, this is different: this is an incredible burden on us + [in chat: “Because we're putting it all into a network of standards that will define l10n industry for the next decade, I would say that a scenario in which we deploy MF2.0, standardize it and for any reason the World needs MF3.0 within a decade means we produced more negative effect on the industry than if we failed to create MF2.0 at all.”] - + DAF: I think we've gone quite far in agreeing on the facts. There is the question of whether the EM model can support the previous use case. It can, it just needs a selector to make that possible. The last part where the 4 folks don't agree, on the topic of whether linting is required in the EZ model, I think it is, but we don't have time in the meeting, so we can disagree. - -EAO: To remind, we are focusing on the data model. ___ - + +EAO: To remind, we are focusing on the data model. 
\_\_\_ + DAF: On the last point on trees or graphs in the data model, when you allow for hierarchical structures in the data model, that is one thing. If you always want to be linear in the canonical syntax, the lesson learned from the XLIFF group to have a XLIFF model that is not tied to XML syntax, we found the linear model was good enough and it was easier. - + NEB: One thing about what DAF and EAO said about the structure and the linting, yes you can design the syntax to be limited for linting, but will all users follow that? But the bigger point is that building something that is "10-20 years proof" is a hard thing to do. You don't have to design support for all of the placeholders that Siri, Google Assistant, and Alexa support. It was not my expectation at the beginning of this project that we cover all corner cases of messages, it is not reasonable for translators to cover such cases. I haven't heard from MIH and EAO about examples which the other model cannot support. The localization industry is slow to change and has a lot of money invested, so they will not want to change their processes to handle changes to the data model. - + RCA: To wrap up and finish, as an action, the Chair Group can create a plan for the group of 4 to get together. 
- - - - - + fun1($var_ref) fun2(msg_ref) fun2(other($var_ref)) - + other_and_fun2($var_ref) - + [“f1”, “f2”, “f3”], map_of_arguments f1(arg) f2(arg) f3(arg) - + map_of_arguments global; f1() f2() f3() - + args: {“theVar”: $var_ref} - + [other, func2] fun2(other($var_ref)) really(not_possible_in_EM($var1, $var2), “other input”) - + EM: any temp = not_possible_in_EM($var1, $var2) really(temp, “other input”) - + EZ: really([&not_possible_in_EM, $var, $var2], “other input”) - diff --git a/meetings/2021/notes-2021-07-12-extended.md b/meetings/2021/notes-2021-07-12-extended.md index 4922fe9a80..b2169a78a0 100644 --- a/meetings/2021/notes-2021-07-12-extended.md +++ b/meetings/2021/notes-2021-07-12-extended.md @@ -1,6 +1,7 @@ [Automatic Full Transcription](https://docs.google.com/document/d/11pKp5NAaBlgH3Hi0wZDctPv-IOv35PIGOTJgpjJdC0c/edit?usp=sharing) ### July 12, meeting Attendees + - Romulo Cintra - CaixaBank (RCA) - Richard Gibson - OpenJSF (RGN) - Staś Małolepszy - Google (STA) @@ -8,14 +9,12 @@ - Daniel Minor - Mozilla (DLM) - Nicolas Bouvrette - Expedia (NIC) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting - -July 19, 11am PST (6pm GMT) - Extended +## Next Meeting -### Moderator : Romulo Cintra - +July 19, 11am PST (6pm GMT) - Extended +### Moderator : Romulo Cintra diff --git a/meetings/2021/notes-2021-07-19.md b/meetings/2021/notes-2021-07-19.md index 13f41d5dc0..9880f853a6 100644 --- a/meetings/2021/notes-2021-07-19.md +++ b/meetings/2021/notes-2021-07-19.md @@ -1,7 +1,7 @@ [Automatic Full Transcription](https://docs.google.com/document/d/1Wn0QNcUCpka3sEgJ3Ak1pTShU7rqRspWz7NTDuy-PG0/edit?usp=sharing) - ### July 19, meeting Attendees + - George Rhoten - Apple (GWR) - Daniel Minor - Mozilla (DLM) - Nicolas Bouvrette - Expedia (NIC) @@ -12,27 +12,23 @@ - David Filip - XLIFF TC, Huawei (DAF) - Eemeli Aro - OpenJSF (EAO) - -## 
MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting -August 16, 11am PST (6pm GMT) +August 16, 11am PST (6pm GMT) + +### Moderator : Romulo Cintra -### Moderator : Romulo Cintra - - Reference WRT rendering in translation environments Filip, D., Husarčík, J., 2018. Modification and Rendering in Context of a Comprehensive Standards Based L10n Architecture. Proceedings ASLING Translating and the Computer, Translating and the Computer 40, 95–112. https://www.asling.org/tc40/wp-content/uploads/TC40-Proceedings.pdf#page=103 - - + Room temperature check, Apache style vote (interval from -1 to +1) Composition (EZ) -1 Encapsulation (EM) +1 - RCA: 0 EAO: -1 @@ -42,49 +38,50 @@ STA: +0.9 GWR: -1 DLM: -0.6 ZBI: -0.8 - + =============== + ``` - + EZ: f1(f2(arg)) - + EM: f2(arg); // can change things in “arg” f1(arg) - + arg1 = f2(arg); f1(arg1) - + --- - + The {$item} is {$color, {gender:’item.gender’}. 
The {$item} is {ADJECTIVE(id: $color, accord_with: $item)} - + format(msg, { item: FluentMessageReference(“tshirt”), color: FluentMessageReference(“red”) }); - + German tshirt=T-Shirt grammatical_gender=fem - + Romanian tshirt=tricou gender=mas - + red={color,gender, fem {rosie} mas {rosu}} - + --- - + number_fmt(value: number, precision: number) -> string number_fmt($speed, 3) number_fmt($speed, $custom_precision) - - + + Elango / Mihai data model: { parts: [ @@ -103,5 +100,5 @@ Elango / Mihai data model: } ] } - - ``` + +``` diff --git a/meetings/2021/notes-2021-08-16.md b/meetings/2021/notes-2021-08-16.md index 1238a8e700..48e289a353 100644 --- a/meetings/2021/notes-2021-08-16.md +++ b/meetings/2021/notes-2021-08-16.md @@ -1,7 +1,7 @@ [Automatic Full Transcription](https://docs.google.com/document/d/1JHCA3QVkf3UATAph0PJpX2Rp6kaW8JYZonzaPK0gE88/edit?usp=sharing) - ### August 16, meeting Attendees + - Romulo Cintra (RCA) - George Rhoten - Apple (GWR) - Shane Carr - Google (SFC) @@ -13,19 +13,18 @@ - David Filip - Huawei, OASIS XLIFF TC (DAF) - Daniel Minor - Mozilla (DLM) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting + +September 20, 11am PST (6pm GMT) -September 20, 11am PST (6pm GMT) +### Moderator : Romulo Cintra + +### Agenda -### Moderator : Romulo Cintra - -### Agenda - More on function composition , Data Model — presentation by @stasm #186 - @zbraniecki #181 Cont. & Data & Execution Model Differences Overview @@ -54,7 +53,7 @@ GWR: I'm okay with the idea of a generic number type, but not a string. STA: Thanks for raising this; we can discuss this further. -GWR: When it comes to this syntax, is this something developers see? Or are we using XML, … ? +GWR: When it comes to this syntax, is this something developers see? Or are we using XML, … ? STA: This is only example syntax. 
I think the syntax should be orthogonal to what we talk about here. We just need a way to give a function name, array of args, and record of options. @@ -79,25 +78,3 @@ Mihai: I won't vote, because what is shown on the screen (the slide "FuncCall in ZB: I didn't vote because I feel that we should have more time to look at it. We should vote next month. ### Zibi's presentation from Issue #186 - - - - - - - - - - - - - - - - - - - - - - diff --git a/meetings/2021/notes-2021-12-20.md b/meetings/2021/notes-2021-12-20.md index 068fad0c78..3ef749faf3 100644 --- a/meetings/2021/notes-2021-12-20.md +++ b/meetings/2021/notes-2021-12-20.md @@ -1,7 +1,9 @@ -### Transcription +### Transcription + https://docs.google.com/document/d/199IKfdZY1ixgH12OZQ0Wl7N7NGwG_tFGd8LsAWDvMGU/edit ### December 20, meeting Attendees + - Romulo Cintra (RCA) Igalia - David Filip - XLIFF TC, Huawei (DAF) - Daniel Minor - Mozilla (DLM) @@ -14,31 +16,26 @@ https://docs.google.com/document/d/199IKfdZY1ixgH12OZQ0Wl7N7NGwG_tFGd8LsAWDvMGU/ - Richard Gibson - OpenJSF (RGN) - Staś Małolepszy - Google (STA) - - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting -January 17 +January 17 +### Moderator : Rômulo Cintra +### Agenda -### Moderator : Rômulo Cintra - -### Agenda Message pattern elements #212 - Design Doc. presentation -ECMA-402 Proposal Stage 1 #213 - Choose a "Champion" +ECMA-402 Proposal Stage 1 #213 - Choose a "Champion" Case Selection - Design Document #208 - considered alternatives - - + # Message pattern elements #212 - Design Doc. presentation - + EAO : Presenting [Slides]() - + RCA: isn’t this similar to discussion we had re: functions? 
EAO: it could be similar to Display Elements/Attributes…as seen on slide, Function References could have named & positional arguments @@ -71,7 +68,7 @@ EAO: the utility is that you can represent, in the data model, a literal part pl MIH: how do I get from that to a full string? how does it work? how would I do dates? -EAO: this is beyond the scope of this design document, but if we get an input value like a date or number, we should be able to wrap it in some wrapper, like “formattable”, which allows one to “stringify” the date in a locale-appropriate manner…once we get to a formatting part, then +EAO: this is beyond the scope of this design document, but if we get an input value like a date or number, we should be able to wrap it in some wrapper, like “formattable”, which allows one to “stringify” the date in a locale-appropriate manner…once we get to a formatting part, then MIH: isn’t a “formattable thing” a function in the end? @@ -101,7 +98,7 @@ ZBI: something I was trying to capture in my design doc comments, maybe I should ZBI: the rethinking of my mental model has been growing over the year: I believe we are not trying to build a system that produces a string, we are building a system that produces a semantic list of parts of a message, and it’s only an artifact that it can be stringified -ZBI: I remember some time ago that “no, we don’t have to produce a string”, I think we’re producing a list of parts, so there’s a value in producing a list of 3 parts, literal/date/literal, even if we don’t know how to stringify it, since unless we are producing a string (which is only 1 side effect) we might want to give a set of these to give to React, DOM, Android, etc and then the system might show it in its own way (e.g. 
in a rich UI clock of a date) +ZBI: I remember some time ago that “no, we don’t have to produce a string”, I think we’re producing a list of parts, so there’s a value in producing a list of 3 parts, literal/date/literal, even if we don’t know how to stringify it, since unless we are producing a string (which is only 1 side effect) we might want to give a set of these to give to React, DOM, Android, etc and then the system might show it in its own way (e.g. in a rich UI clock of a date) ZBI: what we currently call “formattable”, I might call “stringifyable”, this might be useful only when you want to product a string, which is a side effect, so to answer MIH’s question about “how are you going ot stringify a date without formatting information?” I might not be able to stringify it, similar to if you get a rich user object, you should be able to return parts like literal/object/literal…so stringification is outside the scope of this conversation, MIH says “eventually we have to stringify” but there’s a value to MF to format into parts…then you can ask a question “how do you stringify” and that would be a separate design doc…we might even mandate that people add a stringification method, but I don’t think it’s the only/main/default use case of this system @@ -123,13 +120,13 @@ MIH: what about “use shorter strings” or “user is a male or female” or MIH: I agree with ZBI that we should be able to return something that’s not a string, but I disagree that we don’t need to worry about it until later, since eventually we’ll end up with a data model in memory and then you still have to tell that thing how to format the date (e.g. 
“don’t show the time part”), even if you pass it to the DOM and say “somebody else is going to render it”, you still need a way to give instructions about how to render it -EAO: if I’ve understood correctly, you concern is that the recommendation says “without formatting options”, if it’s at the top level of a resolved message, it may not be defined…but on this slide it doesn’t say that these pattern elements shoudl be usable direclty in the message body, this is defining the whole set of pattern elements, but separately we might decide what goes into the message *body*. For example, a variable reference could be placed directly in a message body, or might always need a formatting function reference +EAO: if I’ve understood correctly, you concern is that the recommendation says “without formatting options”, if it’s at the top level of a resolved message, it may not be defined…but on this slide it doesn’t say that these pattern elements shoudl be usable direclty in the message body, this is defining the whole set of pattern elements, but separately we might decide what goes into the message _body_. For example, a variable reference could be placed directly in a message body, or might always need a formatting function reference MIH: what bothers me is that we take a chunk of something from the middle and we don’t give what’s before and after, for example what the data model is before/after processing…? this isn’t in the original message, it’s after some processing, then we don’t show the final endpoint, and then say “I think these pieces are required” RCA: I think we should go through the doc again, address some of the concerns -LHS: Mihai it sounds like you're asking for an end-to-end example or two? (from "translation"/"localized message" to final user display) +LHS: Mihai it sounds like you're asking for an end-to-end example or two? 
(from "translation"/"localized message" to final user display) MIH: I’m looking for a big-picture design doc, it’s hard to give feedback on this design doc without the context, like seeing the middle of a movie but not the beginning or end @@ -151,8 +148,6 @@ RCA: volunteers? EAO: I’ll do it (will open an issue) - - [Separate discussion over chat:] LHS: Am I correct in assuming that the "Variable Reference without formatting options" would be useful for things like placeholders that ~never need formatting/inflection, versus dates (which arguably need formatting)? @@ -171,16 +166,14 @@ if user is an object of type Person, with first/last name, date of birth, etc ZBI: if it requiers inflection, it shouldn't be brought via variable directly, but rather some form of `FORMATTING_FUNCTION("declense", VARIABLE("userName"))` MIH: how do you go from there to "Steve" without a function, or parameters saying "use polite form" - - + ### Conclusion - + Before we move further into a decision about this topic we will create a document that shows all the e2e process of runtime and how this proposal fits in. That way we can have a more global picture about the environment. - + [ ] AP : Create issue and document about the runtime e2e workflow - - -# ECMA-402 Proposal Stage 1 #213 - Choose a "Champion" + +# ECMA-402 Proposal Stage 1 #213 - Choose a "Champion" RCA: Anyone willing to represent the group of ECMA-402? @@ -188,13 +181,13 @@ EAO: I volunteer Conclusion -Champions are EAO and DLM , in the next plenary meeting we will ask if someone wants to be co-champion of this proposal. +Champions are EAO and DLM , in the next plenary meeting we will ask if someone wants to be co-champion of this proposal. RGN its a possible reviewer on the ECMA-402 [ ] AP : Create issue to track the Proposal process - -# Case Selection - Design Document #208 + +# Case Selection - Design Document #208 MIH: I’m not sure what the goal is, is this group sufficient to make decisions? 
Design doc was shared a while ago…we don’t have representation from folks like George, Nicolas, etc. Right now we have a narrow representation (mostly from 2 companies), I don’t think we have enough diversity of representation. That being said, I’m open to talk about it (just not vote on it) @@ -238,7 +231,7 @@ STA: take plurals for example, we don’t need default fallback on plurals EAO: I agree with STA, there should be examples where a message fails and the calling function handles the rest -LHS: STA are you arguing that we *never* need a default selection or that it shouldn't be *required*? +LHS: STA are you arguing that we _never_ need a default selection or that it shouldn't be _required_? STA: I’m trying to convince myself that we never need it…but I don’t know if I’m convinced! @@ -263,8 +256,6 @@ EAO: more like 2 years for the WG RCA: was useful to get some action points for the next meeting - - #### Chat 18:32:13 - You: @@ -313,4 +304,3 @@ Mihai it sounds like you're asking for an end-to-end example or two? 
(from "tran 19:34:04 - Mihai Nita: not quite - diff --git a/meetings/2022/notes-2022-01-31.md b/meetings/2022/notes-2022-01-31.md index 3579efb8f5..b984516d0a 100644 --- a/meetings/2022/notes-2022-01-31.md +++ b/meetings/2022/notes-2022-01-31.md @@ -1,10 +1,10 @@ +### Transcription -### Transcription - https://docs.google.com/document/d/1zKmJie3RL4tK2Za2QT5kYOqnwjSSY-XSLYNB_xdDO1Y/edit - https://docs.google.com/document/d/1fTqan70R_0TA1BOwMdRsMOElRgLV0hR22t7AnByQfCk/edit - ### January 31, meeting Attendees + - Romulo Cintra (RCA) Igalia - Daniel Minor - Mozilla (DLM) - Matjaž Horvat - Mozilla (MAT) @@ -16,170 +16,160 @@ - Mihai Niță - Google (MIH) - Daniel Minor - Mozilla (DLM) - Staś Małolepszy - Google (STA) -- Batjargal Batbold (BAT) (*** check if acronym not already claimed ****) +- Batjargal Batbold (BAT) (**\* check if acronym not already claimed \*\***) - Richard Gibson - OpenJSF (RGN) - Zibi Braniecki - Amazon (ZBI) - Luke Swartz - Google (LHS) - Shane Carr - Google (SFC) - George Rhoten - Apple (GWR) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting Feb 21, 2022 -### Moderator : Rômulo Cintra - -### Agenda - -- Working Group Progress Status -(CLDR -TC proposal(s) evaluation) -- #214 Runtime behaviour (Document it’s a follow up of Message pattern elements #212 - Design Doc. presentation ) +### Moderator : Rômulo Cintra + +### Agenda + +- Working Group Progress Status -(CLDR -TC proposal(s) evaluation) +- #214 Runtime behaviour (Document it’s a follow up of Message pattern elements #212 - Design Doc. presentation ) - Case Selection - Design Document #208 - Review/Feedback and vote to decide among considered alternative - + ## Working Group Progress Status - (CLDR-TC proposal(s) evaluation) - + RCA: Does anyone have anything to say about the email above that was sent to the group? 
- + MIH: My reading of the email is that if there are two, three, or however many proposals, that we present all of them to the CLDR-TC. - + RCA: They mentioned about start aligning to an ICU preview, so there should be more than a spec, no? - + GWR: I still work in ICU. Personally, I don’t think that any external dependencies are available in ICU. ICU only supports Java and C++. Or maybe ICU4X, whatever that programming language it uses. So this technology preview current implementations are incompatible with whatever is in ICU because they need JSON parsers or ECMAScript, etc. So something separate from ICU might be possible, and I think that should be made aware to the people receiving the proposals. - -EAO: I don’t think I’m not sure what the benefit of talking about this now. - + +EAO: I don’t think I’m not sure what the benefit of talking about this now. + STA: Do you know who will be making the decision after this meeting? - -ZBI: I can answer. We are. CLDR will provide feedback. They may include strongly worded warnings about what won’t be compatible with existing work. It may be strongly worded recommendations of what should be included. I think that’s a fairly optimistic - + +ZBI: I can answer. We are. CLDR will provide feedback. They may include strongly worded warnings about what won’t be compatible with existing work. It may be strongly worded recommendations of what should be included. I think that’s a fairly optimistic + ECH: +1 to what EAO said. - + MIH: They don’t expect us to unify the documents / proposals before sending them to the committees. - + ZBI: Right, we send what we have, and then we synthesize a design based on their feedback. - + EAO: Let’s have 10 mins each for each proposal. - + RCA: Please use [this form](https://forms.gle/fmB1BAL8edgBaDnQA) to submit feedback to the Unicode Consortium about your participation in this group. 
- -### Stas proposal - + +### Stas proposal + [link to slide deck](https://docs.google.com/presentation/d/1VHz4rjoX8OGz8dHTchuEzVVt1rrDMov3FMyLgO5PsTY/edit?usp=sharing) - -STA: If you’re heard me before, you know I talk about simplicity. I’ve homed in on 3 principles – compatibility, embeddability, and predictability. Compatibility – I would like vendors and workflows to adapt to the new standard. Embeddability – it should be low-level and agnostic to how and where they’re defined and stored. Predictability – the syntax should be simple for non-technical translators, although I don’t know if all of the syntax should be simple since we expect them to be using GUI CAT tools. - -The goal is to define the lower bound of complexity. I feel like whatever we add on top of this should be scrutinized for whether we need it, and if we can achieve it in other ways, we should lean towards excluding it. - + +STA: If you’re heard me before, you know I talk about simplicity. I’ve homed in on 3 principles – compatibility, embeddability, and predictability. Compatibility – I would like vendors and workflows to adapt to the new standard. Embeddability – it should be low-level and agnostic to how and where they’re defined and stored. Predictability – the syntax should be simple for non-technical translators, although I don’t know if all of the syntax should be simple since we expect them to be using GUI CAT tools. + +The goal is to define the lower bound of complexity. I feel like whatever we add on top of this should be scrutinized for whether we need it, and if we can achieve it in other ways, we should lean towards excluding it. + Focus on the message, not the container format. A consequence of that is that there are no built-in message references. I know we agreed to it, but I think it’s possible to allow all of the previous use cases without them. - -I think there should be a dedicated type for a variable. - + +I think there should be a dedicated type for a variable. 
+ If you don’t have selectors in the message data models for a message is still the same, have always a variants map being same that message don’t have selectors, this allows to go from a single pattern message in search language like English to a multivariant message in target languages. - + No nesting of patterns or expressions, so you can’t call a function after another. But there are let-bindings (“alises”) that allow indirections. Also, aliases to sub-messages (“phrases”). - + Syntax – whitespace insignificant, designed to avoid double and single quotes as much as possible. It is reminiscent of MessageFormat v1, which is intentional. No point in reinventing the wheel. - -I use square brackets to delimit translatable content. The functional call syntax puts the argument first and the function second. An underscore is used for the default (“catch-all”) variant key. - - -Functions can have multiple signatures. So this one accepts either article, plural and case options, or article and accord. This is still WIP. signatures can be local specific. So these two are English specific. But for example, Polish expects many more. Grammatical cases and doesn't have articles. So we will have a completely different set of params defined in the registry. - + +I use square brackets to delimit translatable content. The functional call syntax puts the argument first and the function second. An underscore is used for the default (“catch-all”) variant key. + +Functions can have multiple signatures. So this one accepts either article, plural and case options, or article and accord. This is still WIP. signatures can be local specific. So these two are English specific. But for example, Polish expects many more. Grammatical cases and doesn't have articles. So we will have a completely different set of params defined in the registry. + With custom functions, we can do a lot of things. Query the environment of the user. We can implement message references. 
And there are drawbacks to doing that through functions. It’s okay to leave this all to user-land. For UI elements, ZBI and EAO have been toying with markup, but I have left them to custom functions. I haven’t thought about what kind of metadata we can store. But maybe we can leave that to custom functions. - - -Blind spots: interoperability with XLIFF, doc comments, metadata. Need feedback: escaping (message or container?), doc comments, aliases to sub-messages. - - - - + +Blind spots: interoperability with XLIFF, doc comments, metadata. Need feedback: escaping (message or container?), doc comments, aliases to sub-messages. + ### EZ proposal - + [link to slides](https://docs.google.com/presentation/d/1Tzz5gBH8t-xXTH8UXaC6f7-4v_--2mJhTlcYOoVaSHA/edit#slide=id.p) - -EAO: - + +EAO: + Contents: syntax, data model, formatting behavior, message selection, EBNF, DOM localization proposal. - -Missing pieces: function registry, XLIFF 2 Module, test suite. Also, doesn’t include a converter from MF1 messages to new MF2 messages, and any “unknown-unknown”s. - -Syntax is a little flipped from Stas model in regards to use of square brackets. The syntax is meant to allow single line messages to incorporate all parts of the message and be embedded. An asterisk is used in select messages for selector variables. Markup tags are parsed into separate elements in the model, but when they are merged into the target document, they will be formatted according to that format of the target document. - - - - + +Missing pieces: function registry, XLIFF 2 Module, test suite. Also, doesn’t include a converter from MF1 messages to new MF2 messages, and any “unknown-unknown”s. + +Syntax is a little flipped from Stas model in regards to use of square brackets. The syntax is meant to allow single line messages to incorporate all parts of the message and be embedded. An asterisk is used in select messages for selector variables. 
Markup tags are parsed into separate elements in the model, but when they are merged into the target document, they will be formatted according to that format of the target document. + ### EM proposal - + [link to design doc](https://docs.google.com/document/d/1kqD0gy5x1mfiF2PAegjcNCAc98snTAqtbxccxfLcpNo/edit#heading=h.1bnew4gwuonh) - -MIH: I can provide a link to the design document, which is a long document. But I don’t have any presentation made beforehand. The meeting agenda ahead of time did not specify that we would be discussing the 3 proposals. I will try my best. - + +MIH: I can provide a link to the design document, which is a long document. But I don’t have any presentation made beforehand. The meeting agenda ahead of time did not specify that we would be discussing the 3 proposals. I will try my best. + LHS: MIH, can you touch on the points of your proposal that differ from the previous 2 proposals? - -MIH: Well, I will skip the syntax discussion because I think that is the least important. There is some background information for the benefit of the CLDR-TC because they haven’t been involved in discussions like we have. - + +MIH: Well, I will skip the syntax discussion because I think that is the least important. There is some background information for the benefit of the CLDR-TC because they haven’t been involved in discussions like we have. + One difference between this and STA’s model is about simple messages. Although that could be a minor difference, and I will need to think about it more. - + I don’t know what concepts are new to this group. - + RCA: Maybe you could jump to the points that differ? - + MIH: There are notions of open/close/standalone placeholders. But this is not just for HTML, it could also work for text-to-speech uses. This notion of open/close/standalone has been used in XLIFF for a long time, and has been used successfully for all other kinds other prevalent file formats like MS Word, Powerpoint, Photoshop, InDesign, etc. 
- -Another difference that it’s not highlighted it’s how to deal with Case Selection, This is one of the biggest questions , should we use 1st match or best match ? - + +Another difference that it’s not highlighted it’s how to deal with Case Selection, This is one of the biggest questions , should we use 1st match or best match ? + I have a position regarding this, and it’s to do as ICU does, using the best match. Also i have “macros” that I think in the other proposals they introduce something similar called “alias”. Both are similar concepts that are defined and can be used in the message itself. I can compare it as Local vs Global variables. They can be used for the situation of combinatorial explosion of variants for selection messages that have multiple selector arguments. - - -In the very beginning we decided to do message full level and for specific combinatorial “explosions” we use this references to pieces of message being part of same message , they are locally defined. - -Translation metadata also can include comments for translators. Any relevant data that can be included in the message. I don’t already have a way to add it to the message itself just because it can be too much content and might become difficult to read. But it can be placed alongside the message. - + +In the very beginning we decided to do message full level and for specific combinatorial “explosions” we use this references to pieces of message being part of same message , they are locally defined. + +Translation metadata also can include comments for translators. Any relevant data that can be included in the message. I don’t already have a way to add it to the message itself just because it can be too much content and might become difficult to read. But it can be placed alongside the message. + But yeah, the core differences are the manner of case selection in a selection message. - + RCA: Thanks for presenting that without prior notice. 
- + ### Discussion / feedback - + RCA: Perhaps we can allocate 10 minutes to discuss the 3 proposals? - + EAO: There is so much to discuss that I don’t think 10 minutes does the topic justice. Also, can we increase the frequency of meeting so that we meet every week, given the amount of information to discuss? - + RCA: Let’s continue with the 10 minutes for discussion, we can adjust the time to 20 minutes if we need it, but we should also make sure we have time to discuss the other items of the agenda – runtime behavior and case selection. - -EAO: Maybe we can identify topics to take up - -STA : Function registry it’s one of the topics we should focus , understand the expectation , who has access to date , where date lives + +EAO: Maybe we can identify topics to take up + +STA : Function registry it’s one of the topics we should focus , understand the expectation , who has access to date , where date lives [ ] AP : Function registry i - + MIH: I imagine the registry is something hosted by Unicode and would follow the Unicode processes to —-. - -GWR: Function registry I feel also a good discussion for later time , STA proposal it’s aligned with our vision . + STA proposal - + +GWR: Function registry I feel also a good discussion for later time , STA proposal it’s aligned with our vision . + STA proposal + EAO: In addition to the general discussion on aliases and macros, the one difference between the proposals, my has aliases disallow translatable content. [ ] AP : Macros vs “Alias” - -ECH: It’s been a while since I’ve taken a look at proposals until recently. I’m sure there’s been work on them as well. Syntax aside, what are the main differences between the STA and EM proposal? We don’t have a whole lot of time, but I’d like to hear your answer, because it seems to me that they’ve converged quite a bit. - -STA: I think they’re actually quite close. There are a few differences, for example variables being a built-in type. 
Messages always being a map rather than a single pattern. I had a good conversation with MIH about case selection. But I went with the “first wins” selection strategy, which allows for predictability / determinism. I generally think that they are very close. I even hesitated whether or not I should present mine or not. - -EAO: One aspect you can cover at some point is how to handle display or mock-up elements that separate from _ that have a start and end tag. In STA they more fuzzily have a start and an end. From what I understand in the EM proposal, they need a placeholder for a start and an end. We ought to start considering the need for such a thing. - + +ECH: It’s been a while since I’ve taken a look at proposals until recently. I’m sure there’s been work on them as well. Syntax aside, what are the main differences between the STA and EM proposal? We don’t have a whole lot of time, but I’d like to hear your answer, because it seems to me that they’ve converged quite a bit. + +STA: I think they’re actually quite close. There are a few differences, for example variables being a built-in type. Messages always being a map rather than a single pattern. I had a good conversation with MIH about case selection. But I went with the “first wins” selection strategy, which allows for predictability / determinism. I generally think that they are very close. I even hesitated whether or not I should present mine or not. + +EAO: One aspect you can cover at some point is how to handle display or mock-up elements that separate from \_ that have a start and end tag. In STA they more fuzzily have a start and an end. From what I understand in the EM proposal, they need a placeholder for a start and an end. We ought to start considering the need for such a thing. + MAT: This might have been covered at previous meetings. STA, I liked that you called compatibility with existing l10n workflows as a design principle. I think this is an important point. 
To what extent do we want the new proposal to be usable out of the box in popular platforms. [ ] AP: Retro compatibility related issue https://github.com/unicode-org/message-format-wg/issues/88#issuecomment-1024344193 - + ZBI: My understanding is that everyone agrees in principle. The moment when things become interesting and our experience comes to bear is with tradeoffs and balances. Whatever the extreme for compatibility that we have to pay to obtain compatibility, we’re still negotiating it. - -It seems to me that what the STA proposal is emphasizing is that he is nudging us in the direction of stronger compatibility. The balance I heard from his presentation is that as long as there is a parser that can convert from MF2.0 to legacy formats, then we should be expecting the legacy formats to adopt our data model, but not introduce changes to workflow patterns. - + +It seems to me that what the STA proposal is emphasizing is that he is nudging us in the direction of stronger compatibility. The balance I heard from his presentation is that as long as there is a parser that can convert from MF2.0 to legacy formats, then we should be expecting the legacy formats to adopt our data model, but not introduce changes to workflow patterns. + STA: Related to what MAT asked, we should add a separate meeting about XLIFF compatibility. Because this is how we get to compatibility with localization tooling. The one topic that I wished we had discussed more before my proposal is comments, because XLIFF has a notion of notes. I wonder how that should map to our data model. - + Topics that we have noted for further discussion: - + - Message references vs. aliases to message fragments - Aliases / macros to expressions - XLIFF compat @@ -188,118 +178,111 @@ Topics that we have noted for further discussion: - UI Elements - Metadata and comments - Container format - + RCA: Right, this is a good summary, that also includes the action points (AP) noted above. 
- -STA: My secret hope is that we can have two specs. One for the message body and the other for the container of messages. - + +STA: My secret hope is that we can have two specs. One for the message body and the other for the container of messages. + ZBI: Can you clarify if that also means you’re comfortable with double parsing — the problem I remember we were discussing earlier is writing by hand a system with two syntaxes which means you have to keep track of invariants in your head that you’re not breaking either way it’s encoded. And you’re comfortable with that cost?= - + STA: I guess so, yes. I would expect that the containers formats would not be edited by hand. - + ZBI: What’s your position on W3C for defining a format for MessageFormat 2. - + STA: I don’t know what that is. - + ZBI: If we want to propose MF2.0 for HTML and DOM L10n, we need to have a resource format. - + STA: Ideally I think the container format would then be compatible with the message syntax. - + ZBI: So W3C syntax you would propose to be separate from ICU syntax. - + STA: Yes, but compatible, if possible. It’s hard to know if it’s going to be compatible, but that’s my hope. - + EAO: How many topics have we identified to discuss right now? - + RCA: It looks like eight. - + EAO: So are we going to spend the next 8 months discussing these topics? Or do we have a timeline for talking about them? - + RCA: As discussed before, you can continue talking about them offline and come prepared before meetings. But you also have to make sure that you keep the group informed and not leave them out, as well. Is that okay with you? - + EAO: So are you saying that we shouldn’t talk about these more frequently? - + RCA: I didn’t say that. - + EAO: So can we discuss weekly? - + MIH: I am fine to discuss weekly or more frequently. Maybe we can have separate meetings that are each focused on an individual topic. - + RCA: Any objections? Should we have the meetings separate from the plenary group? 
- + EAO: We get stuck discussing things all the time. I would like to propose that we have meetings where the participants get to decide preliminarily on what we discuss on a topic that is decided upon ahead of time, and if there is a disagreement during the discussion, only then do we go back on it. But if we just have discussions without planning, that’s going to lead to 8 months of us discussing and deciding nothing. I would like to make decisions before the meeting. - + MIH: I would be happy for that to happen, but I don’t see how that will happen. Imagine that we spend 2 hours and we don’t agree, then what happens? - -EAO: Let’s say if we spend two hours in a meeting and we do agree, then we can say we have a consensus on a decision. - + +EAO: Let’s say if we spend two hours in a meeting and we do agree, then we can say we have a consensus on a decision. + MIH: That’s the easy case. - -RCA: Historically, this has happened several times before, and we are in the same position as we were 2 years ago. During that time, I proposed having meetings between EAO and MIH and STA to unblock the sticking points. The points of all of all the task force meetings and other separate meetings was to bring decisions pre-cooked, and give people at the plenary the ability to decide on the final take. If it helps, let’s do it. MIH and others, are you okay to have these extra meetings? - + +RCA: Historically, this has happened several times before, and we are in the same position as we were 2 years ago. During that time, I proposed having meetings between EAO and MIH and STA to unblock the sticking points. The points of all of all the task force meetings and other separate meetings was to bring decisions pre-cooked, and give people at the plenary the ability to decide on the final take. If it helps, let’s do it. MIH and others, are you okay to have these extra meetings? + ECH: We started this meeting talking about proposals and getting feedback with strongly worded X, Y or Z. 
How would that affect decision making? - + ZBI: I cannot imagine us not taking that advice. We don’t need to formally claim that we will follow everything they say, but I imagine that we will follow all of it, frankly. I imagine we may come back and say that we really tried to conform and on this one subject we diverge, but ultimately, the president of unicode is telling us something. - + ECH: I always thought that we needed to get conversation back to technical discussions, and today is a good example of that, and it’s great. This is a different way of making forward progress. I felt a bit disappointed that some of the things we were trying these out-of-band efforts, that there’s some way we will be able to make progress. I don’t want to think of it in terms of, “oh they’re telling us what to do” but if we could at least make progress together, that would be nice. - -ZBI: I don’t think they’re telling us what to do. They’re telling us what they see. To me, that’s much more valuable. - -ECH: Okay, I think that’s good. - + +ZBI: I don’t think they’re telling us what to do. They’re telling us what they see. To me, that’s much more valuable. + +ECH: Okay, I think that’s good. + EAO: Can I ask someone from the Chair Group for a meeting next Monday. - - - - + ## [#214](https://github.com/unicode-org/message-format-wg/issues/214) Runtime behaviour - + (Document is a follow up of [Message pattern elements #212](https://github.com/unicode-org/message-format-wg/issues/212) - [Design Doc. presentation](https://docs.google.com/document/d/1f9He3gTjKp0vrg7XMfTfm1t68lfIruWcboGs2H4Szo4/edit) ) - -EAO: To start off, we have various goals for matching data types with output formats. Goal 3 is about expressing input data in one of N data types, each with one or more formatters. Goal 6 is enable creating different implementations of the standard, which results in M different output types. So we get M x N combinations of message types. 
Previously, we only had outputs as a string type, so M = 1. - + +EAO: To start off, we have various goals for matching data types with output formats. Goal 3 is about expressing input data in one of N data types, each with one or more formatters. Goal 6 is enable creating different implementations of the standard, which results in M different output types. So we get M x N combinations of message types. Previously, we only had outputs as a string type, so M = 1. + So we wrap each data type with a class that includes the value, formatting options, locale, etc. For example, for a date input, the formatter resolves to a MessageDateTime instance. And to format that instance, call its toString() or toParts() method. The new MF spec should not define all possible output formats. Concated string output is not special, though likely common. Errors should be implementation-defined, but fallback values ought to be well specified. Need a goldilocks solution for missing / bad data – enough to produce output, but not enough to compromise privacy or security. Specifying how the fallback works will help ensure compatibility between implementations. -MIH: We’ve touched on this in previous discussions before. I didn’t see the benefit of wrapping values, and I still don’t see the need now after this presentation. The placeholders have all this information already. If it is not a part of the spec and it’s not required, why should we even talk about it? +MIH: We’ve touched on this in previous discussions before. I didn’t see the benefit of wrapping values, and I still don’t see the need now after this presentation. The placeholders have all this information already. If it is not a part of the spec and it’s not required, why should we even talk about it? EAO : I supposed that runtime behavior should be part of spec the idea was to propose a large of test suite and examples to represent it. The intent here is to find a way of presenting how the formatting ought to happen. 
Are you saying that you have a simpler way for formatting that covers all of the use cases that we presented? -MIH : I would like a description of the runtime as a Blackbox , where it’s visible from outside if I do I have to call .format() with arguments, or do I have to call a .resolve() that gives me a thing on which I then call .format() on? +MIH : I would like a description of the runtime as a Blackbox , where it’s visible from outside if I do I have to call .format() with arguments, or do I have to call a .resolve() that gives me a thing on which I then call .format() on? EAO: Spec answers this concerns, I would also be happy to review the same on your spec and comment about it STA: My understanding is that EAO said that this shouldn’t be specified, but I see a lot of value in discussing it. If we want this behavior, we should at least make sure that it’s possible that we can achieve this behavior. But I agree in general that this should be left for implementations to handle. I have a question and a thought. The question: is this specifically about formatting, but runtime behavior also includes matching/selection. EAO: The design doc also includes matching, but I elided that due to time constraints. - + STA: Maybe we should do a thought experiment. If we had a message and some inputs, what should the output be? And this ties into error handling. For example, if you have multiple selectors, and one of them fails, how do you handle that? Maybe we don’t need to describe the exact manner of how implementations deal with what is in and what is out, but we could specify what is lazy and what is eager. When it comes to aspects like grammatical case like vocative or accusative, we might need this information available. So we should discuss what is evaluated lazily and what is evaluated eagerly. -ZBI: I agree with STA about what is lazy versus what is eager. All previous l10n system and mental models were about eager. 
What is interesting is to think about resolutions at runtime versus ahead of time. Also, error fallbacking and error recovery will vastly differ based on how much due diligence we do. There will be a cascade of fallbacking and error recovery that determines what the user can do. We cannot do UI messages if we can’t describe how errors are handled. I am not worried about over-specifying. +ZBI: I agree with STA about what is lazy versus what is eager. All previous l10n system and mental models were about eager. What is interesting is to think about resolutions at runtime versus ahead of time. Also, error fallbacking and error recovery will vastly differ based on how much due diligence we do. There will be a cascade of fallbacking and error recovery that determines what the user can do. We cannot do UI messages if we can’t describe how errors are handled. I am not worried about over-specifying. I want to respond to MIH who asked whether it was necessary to introduce an element, or if it is an optimization. I think it is an optimization, so I agree with your mental model. We can have a discussion about whether the pros outweigh the cons. -MIH: ZBI already addressed partially what I wanted to say. One thing is that you mention lazy / eager, and your programming backgrounds are different from mine. So I can’t readily see the benefit, and therefore these discussions are what we have to have. It doesn’t help to only read a design doc because it may still not be clear, so I would appreciate answers when I ask things because I genuinely want to understand. +MIH: ZBI already addressed partially what I wanted to say. One thing is that you mention lazy / eager, and your programming backgrounds are different from mine. So I can’t readily see the benefit, and therefore these discussions are what we have to have. It doesn’t help to only read a design doc because it may still not be clear, so I would appreciate answers when I ask things because I genuinely want to understand. 
-EAO: I have a hard time understanding how to answer because I have written a design doc and presented it and has an implementation. Some things are specified in the runtime behavior that will not appear in the spec. I will read your document and I appreciate you reading mine. +EAO: I have a hard time understanding how to answer because I have written a design doc and presented it and has an implementation. Some things are specified in the runtime behavior that will not appear in the spec. I will read your document and I appreciate you reading mine. -STA: I don’t know about MIH and EAO, but I felt that this exercise of writing down a full attempt of a spec was enlightening. It showed me what parts I am confident about and what I’m not. It’s a bit of an exploration. From my side, I’m intrigued by the different runtime behaviors. One thing I would like to discuss next time is being able to say that a `{color}` accords with an `{item}`. How do we know whether `{item}` has a grammatical accusative case or not? +STA: I don’t know about MIH and EAO, but I felt that this exercise of writing down a full attempt of a spec was enlightening. It showed me what parts I am confident about and what I’m not. It’s a bit of an exploration. From my side, I’m intrigued by the different runtime behaviors. One thing I would like to discuss next time is being able to say that a `{color}` accords with an `{item}`. How do we know whether `{item}` has a grammatical accusative case or not? RCA: Okay, we can start wrapping up. I will send an invitation for the meeting next week. Last comments? - + EAO: If someone else can think of the meetings for 2 weeks from now and 3 weeks from now, that would be nice. RCA: Maybe people attending the next meeting can propose the topics for the following meeting(s). We also have the topic about case selection, for which we have a document, which we didn’t discuss. Also, does anyone know when CLDR-TC will discuss the proposals? 
ZBI: They are already reviewing the documents we submitted, and they will add comments asynchronously, and will be ready with feedback after a few weeks. But they won’t be presenting their comments at our meeting, for example. - - -## Conclusions/Next Steps - -- Increase cadence of meetings having one MF meeting weekly to devote time to discuss #219 - starting next week - +## Conclusions/Next Steps +- Increase cadence of meetings having one MF meeting weekly to devote time to discuss #219 - starting next week diff --git a/meetings/2022/notes-2022-02-07.md b/meetings/2022/notes-2022-02-07.md index 528ffe1063..3256c946e3 100644 --- a/meetings/2022/notes-2022-02-07.md +++ b/meetings/2022/notes-2022-02-07.md @@ -1,4 +1,5 @@ ### February 7th, meeting Attendees + - George Rhoten - Apple (GWR) - Batjaa Batbold - Amazon (BAT) - Zibi Braniecki - Amazon (ZBI) @@ -6,18 +7,18 @@ - David Filip - XLIFF TC, Huawei (DAF) - Eemeli Aro - Mozilla/OpenJS (EAO) - ### Agenda - +### Agenda + This is the first of the weekly meetings of the WG, and will focus on aliases/macros; a feature mentioned previously in #209, and included in each of our current three spec proposals: Eemeli: syntax / data model / error handling Staś: syntax / data model / runtime Mihai: description Questions that may be answered by this discussion: Should aliases/macros be included in the spec? - Yes - > 70% - +Yes - > 70% + Should we call them aliases, macros, or something else? - + Should they use the same namespace as variable or function references, or have their own? In the syntax, should its assignment and use be marked with the same or different sigils? Can they take literal values? @@ -29,40 +30,37 @@ Should they be considered translatable or not translatable by default? If the value is not translatable, should it be represented in XLIFF as a part the message text, originalData (not translatable), resourceData (translatable), a separate message, or something else? 
If the value is translatable, should it be represented in XLIFF as a part the message text, originalData (not translatable), resourceData (translatable), a separate message, or something else? If you have additional questions in mind, please post them in the comments below. I'll send a poll about these to the mailing list in the next few days, to get a bit of a baseline on where we're starting from. - + EAO: We have clear yes on having Alias/Macros but the results speak for themselves. - -STA : - -MF2: Questions on Aliases/Macros (Responses) - -GWR: What exactly is the scope of aliases/macros? Macros sound like they’re executing code. Aliases might be confused with grammatical category aliases, e.g. “possessive” versus case=genitive. There are also “phrases” which are fragments of sentences (STA: reusable by multiple messages?) +STA : +MF2: Questions on Aliases/Macros (Responses) -EAO: Alias a way of storing a value of variable with options to be reused across multiple places, Example we can reuse it the formatter in different place in a single message, I understand that aliases and macros in MIH and STA proposals allow for this sort of usage and also allow for value of an alias or macro to be contained +GWR: What exactly is the scope of aliases/macros? Macros sound like they’re executing code. Aliases might be confused with grammatical category aliases, e.g. “possessive” versus case=genitive. There are also “phrases” which are fragments of sentences (STA: reusable by multiple messages?) 
+EAO: Alias a way of storing a value of variable with options to be reused across multiple places, Example we can reuse it the formatter in different place in a single message, I understand that aliases and macros in MIH and STA proposals allow for this sort of usage and also allow for value of an alias or macro to be contained GWR : I think we should use the correct terminology that would improve the amount of reeducation and be aligned with “standards of linguistics” -STA : I disagree with use of any code name if alias/macros are problematic I suggest that +STA : I disagree with use of any code name if alias/macros are problematic I suggest that EAO: The prior art for choosing the naming conventions was based on YAML MIH: +1 to use “local variables” -EAO: Any considerations about Local variables ? +EAO: Any considerations about Local variables ? -GWR : I have questions about “how to reference a variable ?” +GWR : I have questions about “how to reference a variable ?” -## Presenting syntax about Aliases to Expression using local variables +## Presenting syntax about Aliases to Expression using local variables $item = {$item asNoun count=$count case=accusative} $count = {$count asNumber maximumFractionDigits=0} {$count plural}? one [You bought a {$color asAdj accord=$item} {$item}.] - _ [You bought {$count} {$color asAdj accord=$item} {$item}.] +\_ [You bought {$count} {$color asAdj accord=$item} {$item}.] — @@ -75,6 +73,7 @@ $color = {$color asAdj accord=$item} _ [You bought {$count} {$color} {$item}.] # Use-cases: + Provide the functionality of MF1’s #. Avoid repetition for calls with many flags. Consistency. @@ -83,8 +82,8 @@ Allow complex values to be passed as option values: foo(1, option=bar(2)) → fo Improve translation experience by reducing the amount of code inside the pattern. # Consensus -Don’t allow shadowing/reassignment +Don’t allow shadowing/reassignment ZBI: Not comfortable with shadowing / reassignment. 
@@ -94,28 +93,26 @@ MIH: Use convention to disambiguate: $itemInAccusative. $roomsFrag = {{$roomCount}? 1 [1 room] _ [{$roomCount} rooms]} $suitesFrag = {{$suiteCount}? 1 [1 suite] _ [{$suitesCount} suites]} -$guestsFrag = {{$guestCount}? 1 [1 guest] _ [{$guestsCount} guests]} +$guestsFrag = {{$guestCount}? 1 [1 guest] \_ [{$guestsCount} guests]} {$roomCount}? {$suiteCount}? {$guestCount}? 1 1 1 [This isn't a hotel, okay?] _ _ _ [This hotel has {$roomsFrag} and {$suitesFrag}, accommodating up to {$guestsFrag}.] # Use-cases -Avoid combinatorial explosion in case of complex multi-selector messages -open(link, title=”Hello, {$username}”) → open(link, title=$linkTitle) +Avoid combinatorial explosion in case of complex multi-selector messages +open(link, title=”Hello, {$username}”) → open(link, title=$linkTitle) # Consensus - - -ADP : This case it’s the the typical bug of doing message fragments to assemble a string ? +ADP : This case it’s the the typical bug of doing message fragments to assemble a string ? MIH : This was introduced where we discussed selection inside the message versus message level. Programmers have always a way to go around it, -EAO: My suggestions are : +EAO: My suggestions are : 1 - Include message in the spec then we don’t need to allow for these local variables to support translatable text content. ? -2- If we do not support message references in the spec, then we do need to allow thi tu support translatable content ? +2- If we do not support message references in the spec, then we do need to allow thi tu support translatable content ? MIH: I was hoping this would help avoid it by using message references, if they solve the same thing, or they are global vs local variables ? I see a risk of having different messages @@ -126,12 +123,12 @@ I can't find {$relationship definitenesss=indefinite} in your addressbook. What I can't find an uncle in your addressbook. What is your uncle's phone number? 
I language like Hebrew and Arabic will have to change "your" in some way that is different than English. -We have a concept of local variable , and by experience if we don’t provide it Engineers will do it anyway. +We have a concept of local variable , and by experience if we don’t provide it Engineers will do it anyway. MIH: Macros are really for thing that are used in this message not shared across messages EAO: I suggest a separate discussion for use cases not covered by macros but adjacent to them. - + I would prefer a model where a message would be simpler in structure and would contain at most multiple different cases/variants of the same message. IMHO having a message with local variables that are themselves text in my spec proposal its a message group, that it’s a flat grouping of messages. Limit message references to only be able to refer to message in the immediate (adjacents) of the source message mean that you have all message contents that applies there. This would avoid complexity of a single message containing fragments. 
-… +… diff --git a/meetings/2022/notes-2022-02-14.md b/meetings/2022/notes-2022-02-14.md index e08d5b16a6..3c09d0b109 100644 --- a/meetings/2022/notes-2022-02-14.md +++ b/meetings/2022/notes-2022-02-14.md @@ -18,29 +18,23 @@ - Zibi Braniecki - Amazon (ZBI) - - -### **Agenda** +### **Agenda** Message references are rather variably supported in our current spec proposals: - - -* Eemeli: [syntax](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-syntax.md#message-references) / [data model](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-data-model.md#messageref) / [value resolution](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#messageref) / [message selection](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-message-selection.md) -* Staś: not included -* Mihai: example of message references via [custom user-defined formatting function](https://docs.google.com/document/d/1kqD0gy5x1mfiF2PAegjcNCAc98snTAqtbxccxfLcpNo/edit#heading=h.abzda7aveso5) +- Eemeli: [syntax](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-syntax.md#message-references) / [data model](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-data-model.md#messageref) / [value resolution](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#messageref) / [message selection](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-message-selection.md) +- Staś: not included +- Mihai: example of message references via [custom user-defined formatting function](https://docs.google.com/document/d/1kqD0gy5x1mfiF2PAegjcNCAc98snTAqtbxccxfLcpNo/edit#heading=h.abzda7aveso5) Questions that may be answered by this discussion: - - -* Should message references be included in the specification? -* Should message references be included as their own entity in the data model? -* How far can a message reference reach (i.e. 
only to messages in same group/same resource/anywhere)? -* Should a message reference be able to set/define variables for the referred message? -* What is the resolved value of a message reference (a formatted string, a stringifiable object with metadata, or something else)? -* Can message references be used as selector values? -* Can the resolved metadata value of a message reference be used as a selector value? +- Should message references be included in the specification? +- Should message references be included as their own entity in the data model? +- How far can a message reference reach (i.e. only to messages in same group/same resource/anywhere)? +- Should a message reference be able to set/define variables for the referred message? +- What is the resolved value of a message reference (a formatted string, a stringifiable object with metadata, or something else)? +- Can message references be used as selector values? +- Can the resolved metadata value of a message reference be used as a selector value? Survey results: [https://docs.google.com/forms/d/1zH0fIleGVrwXatDZDUh7VhL6DYMXvRTNnrqYthkjrIo/viewanalytics](https://docs.google.com/forms/d/1zH0fIleGVrwXatDZDUh7VhL6DYMXvRTNnrqYthkjrIo/viewanalytics) @@ -68,15 +62,15 @@ EAO: … (sorry, I missed it) STA: Two different problems: the “bone dragon” can be solved with the same thing we design for glossaries (human names, product names, etc.) The “hotels” problem can be solved with local variables to message fragments. Looking for the least bad solution. -MIH: if a string depends on many variables, these variables likely come from other places than message resources. And some features in the main message might depend on ALL items taken together (for example the gender of a list of items). +MIH: if a string depends on many variables, these variables likely come from other places than message resources. 
And some features in the main message might depend on ALL items taken together (for example the gender of a list of items). DAF: I answered “yes” in a few cases, but “only if translatability is guaranteed”. A lot of these use-cases can be solved by a sophisticated list formatted (request a grammatical case, a plural form etc). ITS: [https://www.w3.org/TR/its20/](https://www.w3.org/TR/its20/) ZBI: I think NLG will eventually blend with MF2 to generate those sophisticated messages "on fly" and MF2 data will be used eventually in three ways: -* Produce a "string" to display \ -* Produce a "UI fragment" to embed in UI \ -* Produce a data to feed into NLG model that will generate final representation +- Produce a "string" to display \ +- Produce a "UI fragment" to embed in UI \ +- Produce a data to feed into NLG model that will generate final representation The lowest common denominator use case is the first one, the Rich UI Web will rely on the second, and Ambient UI systems will increasingly rely on the last consumption model. @@ -100,7 +94,7 @@ EAO: Should we use messages for consistency in branding? MIH: Sometimes used for changing names of companies, products etc. But such changes might require changes to the public API of the brandname message. Still need to go through hundreds of messages to accommodate a new meta-property (e.g. gender). -AP: +AP: DAF: Branding is somewhat more rigid than terminology management. I don’t think you can manage the transition from one brandname to another. Some companies enforce no changes even in morphological languages. @@ -116,4 +110,4 @@ DAF: It’s a sub-problem. 
If you have a solution to morphological glossaries, y **#### Use-Case 4** -EOA: “Click __continue__ to proceed” +EOA: “Click **continue** to proceed” diff --git a/meetings/2022/notes-2022-02-28.md b/meetings/2022/notes-2022-02-28.md index 8241c41eab..764431bea8 100644 --- a/meetings/2022/notes-2022-02-28.md +++ b/meetings/2022/notes-2022-02-28.md @@ -20,24 +20,21 @@ - George Rhoten - Apple (GWR) - ## MessageFormat Working Group Contacts: -- [Mailing list]([https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)) +- [Mailing list](<[https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)>) ## Next Meeting - - ## Moderator : Rómulo Cintra ### Agenda -- Intl.MessageFormat - Stage I at TC39-TG2 ECMA-402 - * [Slides](https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing) - * [Proposal]([https://github.com/dminor/proposal-intl-messageformat](https://github.com/dminor/proposal-intl-messageformat)) - * Deadline for advancement eligibility: [March 18th, 2022 10:00 EDT](https://www.timeanddate.com/countdown/generic?p0=1440&iso=20220318T14&msg=TC39%20Submission%20deadline) -- Summary from past 2 extended meetings[#220](https://github.com/unicode-org/message-format-wg/issues/220) & [#221](https://github.com/unicode-org/message-format-wg/issues/221) - Open debate and feedback +- Intl.MessageFormat - Stage I at TC39-TG2 ECMA-402 + - [Slides](https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing) + - [Proposal](<[https://github.com/dminor/proposal-intl-messageformat](https://github.com/dminor/proposal-intl-messageformat)>) + - Deadline for advancement eligibility: [March 18th, 2022 10:00 EDT](https://www.timeanddate.com/countdown/generic?p0=1440&iso=20220318T14&msg=TC39%20Submission%20deadline) +- Summary 
from past 2 extended meetings[#220](https://github.com/unicode-org/message-format-wg/issues/220) & [#221](https://github.com/unicode-org/message-format-wg/issues/221) - Open debate and feedback - Compatibility Strategy - Resource-level data model - Junk in Data Model @@ -52,17 +49,16 @@ RCA: Yes, we should find out more about the logistics. ## Intl.MessageFormat - Stage I at TC39-TG2 ECMA-402 - -* [Slides]([https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing](https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing)) -* [Proposal]([https://github.com/dminor/proposal-intl-messageformat](https://github.com/dminor/proposal-intl-messageformat)) +- [Slides](<[https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing](https://docs.google.com/presentation/d/1oThTeL_n5-HAfmJTri-i8yU2YtHUvj9AakmWiyRGPlw/edit?usp=sharing)>) +- [Proposal](<[https://github.com/dminor/proposal-intl-messageformat](https://github.com/dminor/proposal-intl-messageformat)>) RCA: I am adding to the notes that the deadline for the proposal stage advancement is March 18th, 2022 10:00 EDT EAO: I created this as an evolution of the API for some of the JS Intl formatters, and then extend it specifically for MessageFormat. It would look different from the other Intl formatters, for reasons that will hopefully become clear. -The existing message format API has a data model and a runtime, and we need to consider those things separately. `MessageData` don’t contain runtime values, so they belong in the constructor for `MessageFormat`, not in the interface for the `format()` function. +The existing message format API has a data model and a runtime, and we need to consider those things separately. `MessageData` don’t contain runtime values, so they belong in the constructor for `MessageFormat`, not in the interface for the `format()` function. 
-We should not be throwing an error but instead have an error handler for falling back, being different from Intl objects already doing, but it's a requirement due that data it’s mostly provided by user data. +We should not be throwing an error but instead have an error handler for falling back, being different from Intl objects already doing, but it's a requirement due that data it’s mostly provided by user data. Next, we consider what is the data that we are building with. We want to build a message formatter not around a single message, but around a resource that contains multiple messages. @@ -78,7 +74,7 @@ Another question that arises is if we should also have toString method. The MessageValue part represents resolve message value the, -`ResolvedMessage` extends the `MessageValue` interface to be able to return an Iterable of `MessageValue`s. We can also have interfaces that extend `MessageValue` for various more specific types of a `MessageValue`, like `MessageLiteral`, `MessageNumber`, `MessageDateTime`, etc. +`ResolvedMessage` extends the `MessageValue` interface to be able to return an Iterable of `MessageValue`s. We can also have interfaces that extend `MessageValue` for various more specific types of a `MessageValue`, like `MessageLiteral`, `MessageNumber`, `MessageDateTime`, etc. In JS we have Numbers and BigInt’s representing numbers that might affect plural rules, regarding rendering error cases we have a representation of message fallback that can be found in resolved messages. For error handling the toString message or top level ErrorHandler in case an error it’s thrown we can captura error + message “source/value”. This would allow fallback message to work. @@ -88,33 +84,33 @@ A message value may also have some metadata, which should be just a map of strin There is a cost for adding a new object in JavaScript, every type identifier will help avoid addressing those concerns. 
-When we talk about `MessageFormatterFunction`, we can think about this as the “registry” as it has been referred to. Functions have locals, options, and arguments. An implementation should come included with at least implementations for datetime and number formatting functions. +When we talk about `MessageFormatterFunction`, we can think about this as the “registry” as it has been referred to. Functions have locals, options, and arguments. An implementation should come included with at least implementations for datetime and number formatting functions. -That is the proposal. We have sources of certainty. The first is that the single-message syntax may be defined before message-resource syntax. We have not settled on how we are going to support what are considered in the EZ proposal as markup elements, whereas MIH has represented them as start (open) and end (close) placeholders. Also, we want to consider user-defined custom pattern elements. +That is the proposal. We have sources of certainty. The first is that the single-message syntax may be defined before message-resource syntax. We have not settled on how we are going to support what are considered in the EZ proposal as markup elements, whereas MIH has represented them as start (open) and end (close) placeholders. Also, we want to consider user-defined custom pattern elements. -RCA: How is this aligned with the actual status with both EZ and EM proposals, and does that affect the ECMA proposal? This proposal already has a well-defined structure, and I’m not sure how that matches the status +RCA: How is this aligned with the actual status with both EZ and EM proposals, and does that affect the ECMA proposal? This proposal already has a well-defined structure, and I’m not sure how that matches the status EAO: Everything presented here is compatible with all 3 proposals. -MIH: I think it would have been useful to share the presentation in advanced as we’ve established already. 
Next, I don’t think the proposal is compatible with all 3 proposals. One example is making the message value publicly visible outside, which is not implemented in all 3 proposals. +MIH: I think it would have been useful to share the presentation in advanced as we’ve established already. Next, I don’t think the proposal is compatible with all 3 proposals. One example is making the message value publicly visible outside, which is not implemented in all 3 proposals. EAO: Okay, I understand that. -STA: I realize that the question of specifying the MessageResources is ahead of us and under discussion, so I’m not commenting about that. I think there is value, even if we allow working with resources, making the API centered around a single message rather than formatting by message id. I think I have a preference, which would be to allow users to create/manipulate a message interface, because then you can imagine scenarios where it’s easy to work with a message formatter and a collection of messages rather than to have to insert them into resources. An example of this is how CSS rules look like and CSS styles look like. +STA: I realize that the question of specifying the MessageResources is ahead of us and under discussion, so I’m not commenting about that. I think there is value, even if we allow working with resources, making the API centered around a single message rather than formatting by message id. I think I have a preference, which would be to allow users to create/manipulate a message interface, because then you can imagine scenarios where it’s easy to work with a message formatter and a collection of messages rather than to have to insert them into resources. An example of this is how CSS rules look like and CSS styles look like. EAO: The Intl.MessageFormat API does, at some point, need to support parsing a file into a representation of messages. So having an API around a single message would be possible but would add complexity. 
-STA: I’m not sure about complexity, but both of these approaches have some. Creating an instance of a ____ +STA: I’m not sure about complexity, but both of these approaches have some. Creating an instance of a \_\_\_\_ -RCA: Regarding this part about parsing a file, this would involve a lot of different concerns. Perhaps this is not the venue to discuss that, but it would certainly touch a lot of other APIs. Should we discuss it here, since it would involve bringing in a loader, and dealing with a loader then requires ____. +RCA: Regarding this part about parsing a file, this would involve a lot of different concerns. Perhaps this is not the venue to discuss that, but it would certainly touch a lot of other APIs. Should we discuss it here, since it would involve bringing in a loader, and dealing with a loader then requires \_\_\_\_. EAO: Are you asking whether we should have a resource represented as a string at all? RCA: I’m talking about how you said that we need to be able to read in a file. -EAO: When I said that a +EAO: When I said that a -ECH: Simplicity it’s about taking things apart, deal with a collection with message includes to be able to deal with a message at time, starting with a single message will be simpler same analogy with deal with files instead of string, simplify will help us address things in simple and easy manner. [https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/SimpleMadeEasy.md](https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/SimpleMadeEasy.md) +ECH: Simplicity it’s about taking things apart, deal with a collection with message includes to be able to deal with a message at time, starting with a single message will be simpler same analogy with deal with files instead of string, simplify will help us address things in simple and easy manner. 
[https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/SimpleMadeEasy.md](https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/SimpleMadeEasy.md) EAO: A lot of this comes down to whether we adhere to the first consensus that we agreed to about supporting message references. If we support that, then we need to consider these issues. As long as we have this functionality supported, then supporting both is the simpler API. But I’m happy to discuss what it looks like to take this forward. @@ -132,7 +128,7 @@ RCA: Yes, Stage 1 is just the entry stage. And I believe that we have consensus ## Summary from past 2 extended meetings -[#220](https://github.com/unicode-org/message-format-wg/issues/220) & [#221](https://github.com/unicode-org/message-format-wg/issues/221) - Open debate and feedback +[#220](https://github.com/unicode-org/message-format-wg/issues/220) & [#221](https://github.com/unicode-org/message-format-wg/issues/221) - Open debate and feedback RCA: STA EAO MIH is there anything you would like to share with the group from the last 2 meetings? @@ -150,9 +146,9 @@ STA: We’ve been discussing in previous meetings that some things will be added EAO: We really ought to be explicit about saying what things we provide in the spec, and what things may change later. Hypothetically, if we come up with definitions in 2.0 that don’t include markup elements in the pattern elements, then maybe later in a 2.1 version we might need to include that, and we need to think about how that affects compatibility. -Figure out what we end up defining in the spec that generate any dependency being an external observable interface that affects compatibility and break those. +Figure out what we end up defining in the spec that generate any dependency being an external observable interface that affects compatibility and break those. 
-Example : If we include in I/O the Data Model interface we have to provide retro-compatibility between 2.0 -> 2.1 -> 3.0 in future. +Example : If we include in I/O the Data Model interface we have to provide retro-compatibility between 2.0 -> 2.1 -> 3.0 in future. STA: Generally, there are 2 sorts of compatibility that I’m interested in, and they’re sometimes confusingly called backwards and forwards compatibility. Backwards is when you have new tools working with old files. Forwards compatibility is old tools working with new files. @@ -164,7 +160,7 @@ EAO: One key aspect is, if you come across a pattern element in the data model t We should define non-breaking compatibility. In the 2.0 spec, if something causes an error or fallback, we should say that it is possible for future changes to support some of those input cases in the future. -RCA: Although we don’t specify compatibility in the group, I think it useful to think about backwards compatibility for the purposes of feature adoption. I think for 2.0 versions and similar, we should think about this. It is useful to have 3rd parties be able to read and use the spec, and it is important so that we “don’t break the web”, and ensure adoption, etc. +RCA: Although we don’t specify compatibility in the group, I think it useful to think about backwards compatibility for the purposes of feature adoption. I think for 2.0 versions and similar, we should think about this. It is useful to have 3rd parties be able to read and use the spec, and it is important so that we “don’t break the web”, and ensure adoption, etc. STA: Backwards compatibility is limiting and that means that an old file can continue to work, which is why I am more worried about forward compatibility or not giving an error when some changes are made. Maybe the safest solution is to never do 2.1. And maybe there are benefits to sticking with it for 15 years and maybe there will be a MF 3.0 WG. What are your thoughts on this possibility? 
@@ -172,7 +168,7 @@ MIH: From the localization tooling perspective, from what I’ve seen, some of t EAO: I think that if we write in the text about how compatibility should work with changes in the future, it shouldn’t be controversial. But who knows? I don’t think we need to dedicate time on this call to that. -[ ] AP Compatibility Strategy Plan to at least guarantee or not compatibility in future and th +[ ] AP Compatibility Strategy Plan to at least guarantee or not compatibility in future and th EAO: It would be nice like what STA mentioned that we get a 2.0 version that never needs to be upgraded, but we also need to build in facilities to support how we deal with upgrades. @@ -180,26 +176,25 @@ STA: My point was not that 2.0 would be so good that it never needs to changed, RCA: Stas already started here some discussion around compatibility https://github.com/unicode-org/message-format-wg/discussions/191 in Roadmap and created https://github.com/unicode-org/message-format-wg/issues/222 to follow up on it. - ## Resource-level data model -EAO: I don’t have anything strictly prepared for this, but we have use cases identified two weeks for things like “Click next to continue” when “next” is the contents of a nearby message. It would be good for MF to support message references, and as I said, also be able to support collections of messages, either in a group or in a resource. It plays into other work that we’re doing to provide a resource level syntax from the start and not just a message level syntax. I think this a topic that we have a number of different opinions in this group. What is the reason for bringing this up today to discuss? +EAO: I don’t have anything strictly prepared for this, but we have use cases identified two weeks for things like “Click next to continue” when “next” is the contents of a nearby message. 
It would be good for MF to support message references, and as I said, also be able to support collections of messages, either in a group or in a resource. It plays into other work that we’re doing to provide a resource level syntax from the start and not just a message level syntax. I think this a topic that we have a number of different opinions in this group. What is the reason for bringing this up today to discuss? RCA: I brought this up in the chat that we had in which it was said it would be good to discuss in this meeting. STA: My high-level thoughts about resources and single messages: We agreed that there are a lot of use cases that do not benefit from the abstraction of the Resource, and there are some that do. There are systems that do not have a way to support the Resource collection of messages. -We could standardize MF Spec to be a single message and on top of it build another spec for container messages , wondering if this container should be more environment specific than generalized in spec. +We could standardize MF Spec to be a single message and on top of it build another spec for container messages , wondering if this container should be more environment specific than generalized in spec. For example, for ECMA-402, we could imagine the ECMA-402 container specific for web distribution, perhaps aligned with how CSS or XML works. And other environments could use their own containers custom to their environments. So I wonder if it should be up to the MFWG to decide on what the container format should look like, rather than allowing it to be defined in the environment. -EAO: What we can do it’s define what the message model must look like and provide information about how the Resource Data Model should look like. +EAO: What we can do it’s define what the message model must look like and provide information about how the Resource Data Model should look like. 
We should make it clear that we should have other (ex: syntax) representations of messages from which you can build a data model without having resources. -**_**“We have some reserved characters in the proposal - How do we feel others extending the actual syntax to provide additional syntax we won’t provide ? ”**_** +**\_**“We have some reserved characters in the proposal - How do we feel others extending the actual syntax to provide additional syntax we won’t provide ? ”**\_** MIH: To clarify a little bit, in the proposal doc that I shared with CLDR-TC and MFWG, I am not too opinionated about the idiomaticity and syntax of the APIs created for different frameworks or languages. @@ -213,11 +208,11 @@ MIH: Syntax is important. We see already from current MessageFormat in ICU, for Having a syntax that works across languages seems nice but I don’t think that seems practical. Not many C++ developers use JS, so forcing a C++ syntax on JS users would not be nice, and vice versa (a JSON-y syntax would not be nice for C++ developers). -EAO: If we do consider that syntax is important our audience are developer and optimize the DX. +EAO: If we do consider that syntax is important our audience are developer and optimize the DX. -Then I think we have 2 or 3 use cases that come to mind. How do you write a new message? +Then I think we have 2 or 3 use cases that come to mind. How do you write a new message? -1. Where maybe you write a message embedded by itself in another programming language. +1. Where maybe you write a message embedded by itself in another programming language. 2. Work with message when they gets extracted by a separate process into an external resource file, and continue through a localization process. How do you work with that? 
@@ -235,11 +230,10 @@ MIH: One is about developers vs translators, but I would argue it the other way MIH: Updates are usually handle by location tools , normally this -EAO: Let’s not get into the details of discussing these, and we can have those discussions later. Also, for ZBI’s position on syntax, see [GitHub issue #53](https://github.com/unicode-org/message-format-wg/issues/53). +EAO: Let’s not get into the details of discussing these, and we can have those discussions later. Also, for ZBI’s position on syntax, see [GitHub issue #53](https://github.com/unicode-org/message-format-wg/issues/53). RCA: Should we add this as a topic for upcoming sessions of the extended meetings? - ## Junk in Data Model RCA: ZBI is not here to discuss this. @@ -258,9 +252,9 @@ EAO: Yes, let’s say at the end of the message you never have the close curly b MIH: Even though this is an obviously recently formed opinion, I don’t see how this could be useful. -Example if I have : Hello +Example if I have : Hello -I see some benefit to have some kind of junk information for refactoring work, so that I can fix it or change instead of throwing away messages that I fail. But for real runtime use, I don’t see how it is useful. In general, I don’t think you can do good error recovery in a meaningful way. +I see some benefit to have some kind of junk information for refactoring work, so that I can fix it or change instead of throwing away messages that I fail. But for real runtime use, I don’t see how it is useful. In general, I don’t think you can do good error recovery in a meaningful way. EAO: To describe, this is also useful for all of the operations to perform on messages beyond parsing. Like passing them to translators, or convert the message source format to a different source format. Junk can also help inform the context of other parts of the message. 
diff --git a/meetings/2022/notes-2022-03-07.md b/meetings/2022/notes-2022-03-07.md index 997520f203..9e3914c274 100644 --- a/meetings/2022/notes-2022-03-07.md +++ b/meetings/2022/notes-2022-03-07.md @@ -16,32 +16,24 @@ - Zibi Braniecki - Mozilla (ZBI) +**## MessageFormat Working Group Contacts:** - -**## MessageFormat Working Group Contacts:** - -- [Mailing list]([https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)) +- [Mailing list](<[https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)>) **## Next Meeting ** +**### Moderator : Rômulo Cintra** - -**### Moderator : Rômulo Cintra** - -### **Agenda** +### **Agenda** his weekly meeting will focus on display/markup elements, which have been discussed previously in [#26](https://github.com/unicode-org/message-format-wg/issues/26) and [#186](https://github.com/unicode-org/message-format-wg/discussions/186). They are rather variably supported in our current spec proposals: - - -* Eemeli: [syntax](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-syntax.md#display-and-markup-elements) / [data model](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-data-model.md#element) / [resolution](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#element) / [error handling](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#element-1) -* Staś: Not directly supported. 
-* Mihai: [open/close/standalone placeholders](https://docs.google.com/document/d/1kqD0gy5x1mfiF2PAegjcNCAc98snTAqtbxccxfLcpNo/edit#heading=h.93qjwqomt7pu) +- Eemeli: [syntax](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-syntax.md#display-and-markup-elements) / [data model](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-data-model.md#element) / [resolution](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#element) / [error handling](https://github.com/unicode-org/message-format-wg/blob/ez-spec/spec-formatting.md#element-1) +- Staś: Not directly supported. +- Mihai: [open/close/standalone placeholders](https://docs.google.com/document/d/1kqD0gy5x1mfiF2PAegjcNCAc98snTAqtbxccxfLcpNo/edit#heading=h.93qjwqomt7pu) Questions that may be answered by this discussion: - - 1. Should display/markup elements be supported by the spec? 2. Should standalone elements be supported? 3. Should start/end pairs of elements be supported? @@ -51,31 +43,23 @@ Questions that may be answered by this discussion: 7. Should elements be used to mark up messages for localisers? For example, to indicate that a span should not be translated. 8. Should some elements be supported by default? -Comments: +Comments: I believe that lot of formatting can be covered implicitly by formatters' behavior, but I guess there will always be important use cases where inline markup cannot be omitted, so the implicit and explicit formatting should be aware of each other in the standard. Also it's important that the inline markup data model is mappable onto XLIFF inline model. It's okay if inline markup doesn't use well formed equivalents, linear only data model is okay and mappable onto a model where also well formed markup exists.. ### How should start/end pairs of elements be matched with each other? -We have to find a counter-argument for the use cases that can support different ways of matching different from HTML or XML. 
Reordering , Opening and Closing tags or neste of don’t need to close how they should work. +We have to find a counter-argument for the use cases that can support different ways of matching different from HTML or XML. Reordering , Opening and Closing tags or neste of don’t need to close how they should work. - - -* No conclusion, we should revisit this later on and try to get more use cases +- No conclusion, we should revisit this later on and try to get more use cases ### Should elements be separate from placeholders/formatting functions? - - - - -* - -### Should elements be used to mark up messages for localisers? For example, to indicate that a span should not be translated. - +- +### Should elements be used to mark up messages for localisers? For example, to indicate that a span should not be translated. -* No conclusion, we need way to communicate to localisers, we should identify how to do it +- No conclusion, we need way to communicate to localisers, we should identify how to do it ### Should some elements be supported by default, i.e. included in something like a common registry? @@ -455,7 +439,7 @@ Do we have other things? Anything else that we may need to communicate to a loca MIH: It's a bit of a trick question. The whole semantic stuff is, it's an unbounded number of semantic tags. So what can it be? Whatever I can come up in 5 years you put them in semantics and that's it. -ZBI: But we could establish ... and know ... +ZBI: But we could establish ... and know ... Any quality tool shold know wheter a part was touched. @@ -577,9 +561,9 @@ Zibi & MIH probably have more to say on this. ZBI: I'm gonna repeat my 2 talking points. -1) I belive that fundamentally what we're considering elements & placeholders have different needs & limitations. And collapsing them into the same structure requries as to create a very open ended entity in the data model. That has a set of attr and shapes that cators to placeholders, formatting placeholders. 
It's union of a shape needed by placeholder and formatting elements. Ofc, we can make it generic. But in any scenario in which we can reason about what is needed for elements & placeholders, I believe the union is not overlapping. Which indicates to me it should be 2 different things. +1. I belive that fundamentally what we're considering elements & placeholders have different needs & limitations. And collapsing them into the same structure requries as to create a very open ended entity in the data model. That has a set of attr and shapes that cators to placeholders, formatting placeholders. It's union of a shape needed by placeholder and formatting elements. Ofc, we can make it generic. But in any scenario in which we can reason about what is needed for elements & placeholders, I believe the union is not overlapping. Which indicates to me it should be 2 different things. -2) If we're to be a little bit strict about what arguments and shape the elements take, then we make it fundamentally easier for tools to reason about. Even if we do what STA is saying, that we let another level completely resolve the structure or tree. I believe that our success of this delicate binding of message format into some UI bindings and the MF is what the CAT tools will be dealing with and QA tools depends on what API of the declerative parts we provide to those tools to reason about. +2. If we're to be a little bit strict about what arguments and shape the elements take, then we make it fundamentally easier for tools to reason about. Even if we do what STA is saying, that we let another level completely resolve the structure or tree. I believe that our success of this delicate binding of message format into some UI bindings and the MF is what the CAT tools will be dealing with and QA tools depends on what API of the declerative parts we provide to those tools to reason about. So an element with open and close, is very easy for a CAT tools to sat there's no close to this open. 
Black Box placeholder called element with attributes that are maybe somewhere designed, described in some registry require a CAT tool to look into registry, retrieve the API and reason about what attributes should be there and what their their meaning. diff --git a/meetings/2022/notes-2022-04-04.md b/meetings/2022/notes-2022-04-04.md index 05a3a3c794..cbb1cf99e0 100644 --- a/meetings/2022/notes-2022-04-04.md +++ b/meetings/2022/notes-2022-04-04.md @@ -1,4 +1,5 @@ ### April 4th, meeting Attendees + - George Rhoten - Apple (GWR) - Romulo Cintra (RCA) Igalia - Eemeli Aro - Mozilla (EAO) @@ -10,34 +11,30 @@ - Staś Małolepszy - Google (STA) - Prithvi Shah - Amazon (PSH) - - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting Apr 18, 2022 -### Moderator : Rômulo Cintra - -### Agenda - -- MessageFormat Stage 1 at TC39 Plenary -- CLDR-TC Resolution -- Continue discussion about Syntax - create requirements document for the syntax. - * Single vs Multi line messages - * Should we minimize the volume and the syntax for simple cases - minimal needed syntax ? - * Error Fallback options - * Should the syntax be agnostic and be used across different platforms ? - * Should we take this as an example/starting point https://github.com/unicode-org/message-format-wg/blob/main/test/formattest.xsd +### Moderator : Rômulo Cintra +### Agenda + +- MessageFormat Stage 1 at TC39 Plenary +- CLDR-TC Resolution +- Continue discussion about Syntax - create requirements document for the syntax. + - Single vs Multi line messages + - Should we minimize the volume and the syntax for simple cases - minimal needed syntax ? + - Error Fallback options + - Should the syntax be agnostic and be used across different platforms ? 
+ - Should we take this as an example/starting point https://github.com/unicode-org/message-format-wg/blob/main/test/formattest.xsd ## CLDR-TC Resolution -RCA: +RCA: The following is the document that records the guidance from the CLDR-TC and ICU-TC committee meeting. @@ -184,7 +181,7 @@ The following is the document that records the guidance from the CLDR-TC and ICU MIH: I am okay calling macros as aliases. The exact name is not important to me. -EAO: When we discussed, we called macros / aliases as "local variables". Another point was that assigning values to local variables should be easy so that actual users do this. +EAO: When we discussed, we called macros / aliases as "local variables". Another point was that assigning values to local variables should be easy so that actual users do this. EAO: Nothing to add about a function registry except that we need one. @@ -192,7 +189,7 @@ RCA: For UI Elements, we need an introducer syntax. ECH: For the benefit of the group here, I wanted to point out that "UI Elements" discussions also was interpreted in the committee discussions to apply to any type of region that has a start and end, so it doesn't apply only to UI widgets, but is more generally applicable than that. -APP: I want to make sure that I understand -- this is referring to any type of markup or markup placeholders and inclusions, right? Has anyone given thought to paired and unpaired placeholders? +APP: I want to make sure that I understand -- this is referring to any type of markup or markup placeholders and inclusions, right? Has anyone given thought to paired and unpaired placeholders? MIH: Yes. @@ -209,17 +206,17 @@ EAO: The point on formatting output is that we are not limited to just an output RCA: On the topic of VariableRef mechanism -- -EAO: There are 2 parts here that the CLDR-TC + ICU-TC didn't comment much on because the discussion went on too long already. 
One is whether we want to have variables specified where their specific formatting type/function is not specified. The committee said that formatting functions should only use what values are passed to them, and we can try to investigate in parallel, and to be considered only if the investigation concludes before MF 2.0, otherwise can be considered for a later MF 2.X iteration. +EAO: There are 2 parts here that the CLDR-TC + ICU-TC didn't comment much on because the discussion went on too long already. One is whether we want to have variables specified where their specific formatting type/function is not specified. The committee said that formatting functions should only use what values are passed to them, and we can try to investigate in parallel, and to be considered only if the investigation concludes before MF 2.0, otherwise can be considered for a later MF 2.X iteration. EAO: Variable names should be, effectively what I get get out of this is that variable names should be kind of same and look a little bit like variable names but can contain characters like a period that a separate parse effectively can interpret as splitting up the the variable name into path parts. If an implementation chooses to consider or allow input values to have some sort of hierarchy, then that can be supported. -MIH: I think the description is accurate, but we have to understand that this has big implementation implications. In the existing MF, the library is required to take the name of the argument and pass that to the formatting function. In a formulation that wants to use segmented path style identifiers, the engine (formatting function) +MIH: I think the description is accurate, but we have to understand that this has big implementation implications. In the existing MF, the library is required to take the name of the argument and pass that to the formatting function. 
In a formulation that wants to use segmented path style identifiers, the engine (formatting function) ECH: What I recall from the meetings is that it's not that we're reserving a dot as a special character, it's that if you choose a dot or semicolon or whatever, separator for any type of path style identifier that's implementation specific. And the handling of that is also implementation specific. -APP: I understand from their standpoint that only implementations that make use of reserved characters for segmented paths, and if you don't use it, +APP: I understand from their standpoint that only implementations that make use of reserved characters for segmented paths, and if you don't use it, -ZBI: I want to make the point that the use of a dot as a path separator as something that will impact the topic in +ZBI: I want to make the point that the use of a dot as a path separator as something that will impact the topic in EAO: (Item 11) Let's do consider later whether we do them, make sure that in the syntax, there's a slot available for them. If need be But not yet. Message bundles were decided to leave it out of MF 2.0, but to take it up latter in MF 2.X. @@ -227,23 +224,23 @@ ZBI: Also, we discussed that it is TBD where the appropriate place for such a me ZBI: I really wish we didn't use "bundle" - it's resource syntax :) bundle can also be used at runtime for collection of interrelated messages and then I'd prefer "context". So, I'd advocate for "resource" (store)/"context" (runtime) and no "bundle" at all -MIH: I believe that item 13 is only referring to behavior, and in particular, the selection only operates on values after they are formatted. That is because the formatting output affects how selection must be performed. +MIH: I believe that item 13 is only referring to behavior, and in particular, the selection only operates on values after they are formatted. That is because the formatting output affects how selection must be performed. 
ZBI: I believe George’s proposal addresses it very well. EAO: Similar to message references, the feedback was along the lines of "let's not do it now, but let's not prevent the possibility of doing it later" -ECH: For item 15, the general consensus was that the proposed feature was not given with a clear problem statement defined, +ECH: For item 15, the general consensus was that the proposed feature was not given with a clear problem statement defined, EAO: My take is that the committee sees the feature as out of scope of the spec. -ZBI: I believe that this is an unfortunate situation because I wanted to discuss it with the committee before I left early, but +ZBI: I believe that this is an unfortunate situation because I wanted to discuss it with the committee before I left early, but APP: I think I am in agreement with EAO's last statement that we have a lenient parser that can fail on things that are errors, but it would be bad if we had an entirely lenient parser that would allow completely junk data to go into production. EAO: This got covered during the last 15 minutes of the whole TC considering things. And I did feel like it was missed an opportunity to shine. I would recommend that we schedule for next Mondays extended call. I presume we're having an extended call next week to talk about this specifically, Rather than going into really the depths of what we mean by recoverability and what would be good, what would be bad during this meeting. -MIH: For people who advocate for recoverability for bad syntax to look at the existing specs, for example in programming languages. Different compilers of a language can do different things when they encounter syntax errors because the behavior is not defined. I would like to see examples. +MIH: For people who advocate for recoverability for bad syntax to look at the existing specs, for example in programming languages. 
Different compilers of a language can do different things when they encounter syntax errors because the behavior is not defined. I would like to see examples. ECH: If we decide to discuss this further, I would like to see a clear problem statement, with alternatives, and each with their pros and cons. @@ -255,17 +252,13 @@ STA: I think robustness is a great value, but I'm sure there is a spectrum to it EAO: Next week Monday ZBI will lead a discussion on robustness. - - ## Requirements list for syntax - Single line vs Multine messages - - Should we minimize the volume and the syntax for simple cases - minimal needed syntax ? - Error Fallback options - Should the syntax be agnostic, in order to be used in different platforms ? - Should we take this as an example/starting point https://github.com/unicode-org/message-format-wg/blob/main/test/formattest.xsd - - +Single line vs Multine messages - +Should we minimize the volume and the syntax for simple cases - minimal needed syntax ? +Error Fallback options +Should the syntax be agnostic, in order to be used in different platforms ? +Should we take this as an example/starting point https://github.com/unicode-org/message-format-wg/blob/main/test/formattest.xsd MIH: This is too short notice. I only received this meeting's agenda before the meeting, so I haven't had time to think about it. @@ -293,66 +286,55 @@ EAO: How about somebody puts up a document shares it with everyone for collectin GWR: I wanted to cover my proposal for an implementation since there is a desire to move quickly with the current ones. I want to bring up three main concerns with the current proposals. 1) I don't think they typically address the typical linguistic problems that hands the translators. 2) There's been slow progress and we won't pass the proposed test cases that I provided like about last year about 11 months ago. 3) I just wanted to kind of briefly go over the existing proposals.. 
- There's new custom syntax but it's something to consider and it's markdown just for comparison. +There's new custom syntax but it's something to consider and it's markdown just for comparison. - - -EAO: One particular thing that jumped out at me is that what we ended up in MF2 is design for how we represent messages and message resources. As we are including a implementation agnostic data model, we should include something like a canonical XML representation of that Data Model that represents that same DM ? This would make easy to import and use different messages across different implementations. +EAO: One particular thing that jumped out at me is that what we ended up in MF2 is design for how we represent messages and message resources. As we are including a implementation agnostic data model, we should include something like a canonical XML representation of that Data Model that represents that same DM ? This would make easy to import and use different messages across different implementations. GWR My concern about transforming syntax to another different syntax might be overkill EAO: I think we already have an agreement that we are going to define the data model for MF2. So it becomes relatively easy to say that if we want to represent messages in XML, this is way to do it. -STA: I’m happy to see a different proposal from what have been seen so far +STA: I’m happy to see a different proposal from what have been seen so far On a high level, I think what you're proposing is a possible incarnation of the other models. How I think about the other proposals is that they can allow the encoding of the grammatical model into morphological glossaries plus a set of custom functions. IIUC, what you're proposing here is that some of that grammatical resolution is encoding directly in the standard. It's interesting and perhaps more advanced than what the other proposals are about. GWR: You're right, I'm encoding more of teh grammatical model into the standard. 
-STA : And related to that, my 3rd question it’s about slide 27 , example how this constraints works ? How a constrain value makes to the runtime. How that term know about desired gender of the user ? - +STA : And related to that, my 3rd question it’s about slide 27 , example how this constraints works ? How a constrain value makes to the runtime. How that term know about desired gender of the user ? -GWR: So, that's an excellent point. What you can do is you can provide a term like him her or it let's say. And what would have is like the first value, it actually have all the constraints with it potentially, that's that's one such example. The other possible example, is that in the back end, we've actually defined all the grammys that are valid forgiven language and the populate that that's possibility. There's also some people that want be able to provide custom pronouns and as far as custom pronouns, you know, maybe they want to be able to define whether that has a masculine or feminine or a neutral term or an undefined term, or maybe it's just not known and they need to be able to +GWR: So, that's an excellent point. What you can do is you can provide a term like him her or it let's say. And what would have is like the first value, it actually have all the constraints with it potentially, that's that's one such example. The other possible example, is that in the back end, we've actually defined all the grammys that are valid forgiven language and the populate that that's possibility. There's also some people that want be able to provide custom pronouns and as far as custom pronouns, you know, maybe they want to be able to define whether that has a masculine or feminine or a neutral term or an undefined term, or maybe it's just not known and they need to be able to -STA : know that seal D For example. Now provides a list of grammatical features that are That exists in different languages. 
So something that could be used as well, but I think here's specifically, I was more wondering about like the expected API on the from the point of view of a, you know, like c++ developer, like how do they actually inject the, the current users Gender into a format call with this constrained space. +STA : know that seal D For example. Now provides a list of grammatical features that are That exists in different languages. So something that could be used as well, but I think here's specifically, I was more wondering about like the expected API on the from the point of view of a, you know, like c++ developer, like how do they actually inject the, the current users Gender into a format call with this constrained space. -ECH: I’m curious to know the things you described can be represented through the 3 proposal , due that they have open this part , the question is - this can be supported by all 3 proposals ? +ECH: I’m curious to know the things you described can be represented through the 3 proposal , due that they have open this part , the question is - this can be supported by all 3 proposals ? -2nd thing was actually met a comment and it's actually praise. I mean, I just saw that you have pros and cons in some of the in a lot of the slides that you had for the different features. And, you know, I appreciate that. I mean, it allows me to focus on specifics and say, like, okay, well, I might have a disagreement here or maybe there's another con that should be added to the XML, at least, at least one. So just that's just praise. +2nd thing was actually met a comment and it's actually praise. I mean, I just saw that you have pros and cons in some of the in a lot of the slides that you had for the different features. And, you know, I appreciate that. I mean, it allows me to focus on specifics and say, like, okay, well, I might have a disagreement here or maybe there's another con that should be added to the XML, at least, at least one. So just that's just praise. 
-The third thing is a higher level question, it's almost like a point of order. Which is that this is like a full-blown proposal. And as a I think STA mentioned it actually goes beyond even the scope of the proposals that were presented to the ICU - -What do we want to do with this, is this going to be a proposal that the committee hasn't seen that, we need to consider. Is this going to be something that? +The third thing is a higher level question, it's almost like a point of order. Which is that this is like a full-blown proposal. And as a I think STA mentioned it actually goes beyond even the scope of the proposals that were presented to the ICU +What do we want to do with this, is this going to be a proposal that the committee hasn't seen that, we need to consider. Is this going to be something that? GWR:I'll just quickly say that as far as the Alternate proposals is, you know, is it possible? Yes. It is possible to handle that some operations that way. My concern is that it is too open-ended, it is too much of of Lego bricks and I I recommend providing a little bit more structure to constrain what is possible and improve interoperability kind of like house. The C language says the ants can be of very sizes and Um, some people thought well and can be 16 bits and other people. So 32. And I would like to reduce that kind of problem in the future and provide a little bit, more formal structure that supported. But yes, it can be for that way. And your second point. Thank you. And the third part as far as consensus when the goal here was just to provide something that was a little bit more concrete, less abstract and a starting point and I think as far as previous proposals, I want them to be Quick anyway. - APP: I think it's an audacious thing. A number of us in the NLG space are struggling with these ideas ourselves. But I think this is something that is difficult to take to developers and make them do something with it. 
So I agree with ECH that I would like to take the existing proposals and use them to solve the majority of the problems of existing customers and users. - GWR: You’re right maybe it’s overkill , I wanted to focus on the senses on the most actua MF doesn’t do this. -MIH: I have the same comment as ECH that it would have been nice to have it in front of the CLDR-TC + ICU-TC because it would have been nice to have their comments about this. This proposal came after that, and it only arrived for this meeting with short notice. I would have liked to see these several ideas in this proposal and taken separately +MIH: I have the same comment as ECH that it would have been nice to have it in front of the CLDR-TC + ICU-TC because it would have been nice to have their comments about this. This proposal came after that, and it only arrived for this meeting with short notice. I would have liked to see these several ideas in this proposal and taken separately +EAO: What do we do with this now? I think what we can think out of this is a lot of interesting and real use cases that we really ought to be solved with MF2 while keeping in mind that effectively what we do it doing at least with message from a 2.0 is figuring out very much. The MVP minimal viable product that we can have out of this spec that allows for other things to be done, we might fit this into the actual proposals. -EAO: What do we do with this now? I think what we can think out of this is a lot of interesting and real use cases that we really ought to be solved with MF2 while keeping in mind that effectively what we do it doing at least with message from a 2.0 is figuring out very much. The MVP minimal viable product that we can have out of this spec that allows for other things to be done, we might fit this into the actual proposals. - - -ECH: I agree with what you said, EAO . I think the the high level point which I'm glad we're discussing, because I think that's really important. 
We should decide as a group, what we want to do here. I think we're beyond the point where we want to be entertaining, new proposals. And I think what we have with our proposals can support this idea. So that's my, my appeal to the group is that we continue with the the path that we're on right now. - +ECH: I agree with what you said, EAO . I think the the high level point which I'm glad we're discussing, because I think that's really important. We should decide as a group, what we want to do here. I think we're beyond the point where we want to be entertaining, new proposals. And I think what we have with our proposals can support this idea. So that's my, my appeal to the group is that we continue with the the path that we're on right now. EAO: The syntax that we are proposing. Is something that is More directly human editable than an XML. What I will my first comment on this is that I do think that it should be. In fact, relatively simple for us to include an XML message, specification as I'm Appendix or an annex to the spec that would enable exactly this sort of use. While also allowing for those environments for instance, where JSON is more popular or more easier to use to have a canonical JSON, specification for what a message looks like. And then also to have the syntax that Star Shania are working on, which is a canonical human friendly. representation, RCA: Final reminder to please if you have requirements or wishes or other things about the syntax to add them to the notes by Thursday - - -Action Items / Questions : +Action Items / Questions : - Decide and figure out the next steps for this new “proposal” - - Should be used ad kind of canonical XML representation of DM ? - - Annex of the spec ? - - We are already beyond the timeline to include those ideias into the actual proposals ? How it can be done ? / They are compatible ? + - Should be used ad kind of canonical XML representation of DM ? + - Annex of the spec ? 
+ - We are already beyond the timeline to include those ideias into the actual proposals ? How it can be done ? / They are compatible ? - Should we consider sharing this new “proposal” with CLDR-TC - Fill the requirements lists diff --git a/meetings/2022/notes-2022-05-16.md b/meetings/2022/notes-2022-05-16.md index b29e6c8bac..4e29729750 100644 --- a/meetings/2022/notes-2022-05-16.md +++ b/meetings/2022/notes-2022-05-16.md @@ -1,4 +1,5 @@ ### May 16th, meeting Attendees + - Romulo Cintra (RCA) Igalia - Daniel Minor - Mozilla (DLM) - Eemeli Aro - Mozilla (EAO) @@ -10,427 +11,418 @@ - Zibi Braniecki - Mozilla (ZBI) - Richard Gibson - OpenJSF (RGN) -## Auto Transcription - +## Auto Transcription - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting May 23, 2022 ### Agenda - + - Admin - 5 min - - 20th June - Plenary - move it to the 27th - -## Retrospective on Syntax PR’s - -https://github.com/unicode-org/message-format-wg/pull/266/files - -ECH: What is the process to have consensus - PR’s / Design Docs / Slides or is there another way ? - + - 20th June - Plenary - move it to the 27th + +## Retrospective on Syntax PR’s + +https://github.com/unicode-org/message-format-wg/pull/266/files + +ECH: What is the process to have consensus - PR’s / Design Docs / Slides or is there another way ? + I originally had concerns about the PR initiated by EAO for trying to merge to `main` based on our previous processes for doing things. Merging to `main` has had a certain meaning since the beginning that everyone’s concerns have been addressed, but this PR didn’t really attempt to do that. - -I had concerns about the continuity of discussions started in the slide deck that were not really captured in the PR. I want to point out MIH’s effort to transfer those comments, that was really valuable. 
The discussions on the follow repo issues shows the importance of the comments and the importance of the continuity of discussions. - -A PR for a syntax proposal was submitted on Friday by Markus. There are people concerned about it. What do we want to do about that? - + +I had concerns about the continuity of discussions started in the slide deck that were not really captured in the PR. I want to point out MIH’s effort to transfer those comments, that was really valuable. The discussions on the follow repo issues shows the importance of the comments and the importance of the continuity of discussions. + +A PR for a syntax proposal was submitted on Friday by Markus. There are people concerned about it. What do we want to do about that? + STA: Thanks ECH for raising those concerns. There was a meeting for deciding on the `develop` branch. There is not a lot of our work captured in Github that occurs in other meetings and meeting notes. - + Markus had a concern about an engineering mindset of several interrelated issues, and perhaps that is true, but I wish he hadn’t filed his PR. [clarification later on: … but instead filed separate issues with his very good feedback] - + EAO: Agree with STA. My previous understanding was that STA and I would create a starting point for syntax discussions, and then we can take issues up one by one. - -I hope the changes on syntax are proposed on the current document instead of having a new document with a new syntax - + +I hope the changes on syntax are proposed on the current document instead of having a new document with a new syntax + I agree that many of these issues are interlinked, but I think we should be able to have individual issues. - + ECH : It’s important to make sure that all feedback independently of meeting participations are included and listened to. 
I was concerned about how people who don’t frequently attend meetings left feedback on the slides, and that would have been dropped on the floor until MIH had transferred the comments to those issues. - + I believe that development and experimental branches technologically are pretty similar, so I still don’t understand the difference. - -I don’t feel that the syntax proposals(merged) doesn’t reflect the EM proposal at all, and it doesn’t incorporate the comments from the slides. Markus has a PR for his own compromise syntax proposal, so what do we want to do about that? - + +I don’t feel that the syntax proposals(merged) doesn’t reflect the EM proposal at all, and it doesn’t incorporate the comments from the slides. Markus has a PR for his own compromise syntax proposal, so what do we want to do about that? + STA: I want to clarify that I am happy that Markus gave us feedback, obviously. I want to address what ECH said about the EAO and STA PR not including the EM proposal syntax. When EAO, STA, and MIH discuss things amongst ourselves, it was a logistical problem to not be able to include MIH in those discussions. So having a PR the way it was was just a way to create a PR without having too many opinions that would complicate the PR. - -EAO: What we presented in the PR was a work-in-progress. What we were looking for wasn’t comments and criticism, but instead input the final resolution. We got the feedback that we needed to finalize the proposal that we want to present to the group. - -ECH: Based on the way that PR was created, there were existing viewpoints not represented, meaning that there were was room for those ideas to exist. Markus had his own PR, and based on the initial feedback there are people who agree with what he is saying, so I don’t think that discussion should be totally sidelined. - - + +EAO: What we presented in the PR was a work-in-progress. What we were looking for wasn’t comments and criticism, but instead input the final resolution. 
We got the feedback that we needed to finalize the proposal that we want to present to the group. + +ECH: Based on the way that PR was created, there were existing viewpoints not represented, meaning that there were was room for those ideas to exist. Markus had his own PR, and based on the initial feedback there are people who agree with what he is saying, so I don’t think that discussion should be totally sidelined. + EAO: Absolutely. I don’t mean that at all. The second proposal that’s is in PR now have a shared baseline but differs in several ways and we have the issues to address this as separate conversation instead of having an all in one new PR. - + STA: My understanding from back in January when discussing the proposals to the CLDR-TC when talking with MIH was that he was not entirely interested in the syntax so long as the data model is correct. And perhaps I took this too far in my interpretation, and took that in my discussions with EAO. - + But it seems that MIH does have opinions on syntax, especially as it is a reflection of the features in the data model. So perhaps we should have run these ideas by him before creating our syntax proposal. - + EAO: We should be having these conversations with anyone and everyone who is interested, and not just in a small clique. - -RCA: Good, and now is the time to address that. My concern is whether we are going in the opposite direction because now we have another syntax proposal PR in addition to the first syntax proposal. - -EAO: Let’s proritize the issues that we do have. Let’s identify what to work on next in order to proceed. - -MIH: I didn’t talk to Markus about his PR or anything, but I’m afraid that the fact that he took the effort to submit it is a reflection that he feels very strongly about what he feels. So I’m concerned that a passive open request for feedback will postpone acting on the feedback, which feels important. We can ask directly if that is useful. 
- + +RCA: Good, and now is the time to address that. My concern is whether we are going in the opposite direction because now we have another syntax proposal PR in addition to the first syntax proposal. + +EAO: Let’s proritize the issues that we do have. Let’s identify what to work on next in order to proceed. + +MIH: I didn’t talk to Markus about his PR or anything, but I’m afraid that the fact that he took the effort to submit it is a reflection that he feels very strongly about what he feels. So I’m concerned that a passive open request for feedback will postpone acting on the feedback, which feels important. We can ask directly if that is useful. + RCA: I think that would be useful. - + STA: Maybe we can invite Markus here next week. Otherwise we are speculating. - + RCA: Let’s add an action item to invite him. - + STA: ECH, are you happy with how we are going about things? - - - + Action Point: -- [ ] Invite Markus Scherer to the next Extended meeting to share thoughts about the PR - -ECH : Communication is important, so I’m glad we’re having these conversations. I’m not sure if the process of filing several issues would work, based on how things have gone in the past. I have never been in favor of doing the same things over and over and expecting different results. So when we have been stuck, I have been interested in doing it in different ways to test if we can make more progress using different strategies. A year ago, around February/March, we were deadlocked in disagreement but couldn’t even agree that we disagreed, the response from people was to talk more with even fewer people than we had in our frequent 4 person huddles. That continued until the CLDR-TC stepped in at the end of the year. My question is, as a group what have we done that has worked in order to unblock and make progress ? 
Some of the recent progress was because the CLDR-TC + ICU-TC committee intervened and we spent 3 months creating concrete proposals and discussing them in committee meetings. But alone as a group, what have we done that has worked? My answer to STA’s question is with a question. - +- [ ] Invite Markus Scherer to the next Extended meeting to share thoughts about the PR + +ECH : Communication is important, so I’m glad we’re having these conversations. I’m not sure if the process of filing several issues would work, based on how things have gone in the past. I have never been in favor of doing the same things over and over and expecting different results. So when we have been stuck, I have been interested in doing it in different ways to test if we can make more progress using different strategies. A year ago, around February/March, we were deadlocked in disagreement but couldn’t even agree that we disagreed, the response from people was to talk more with even fewer people than we had in our frequent 4 person huddles. That continued until the CLDR-TC stepped in at the end of the year. My question is, as a group what have we done that has worked in order to unblock and make progress ? Some of the recent progress was because the CLDR-TC + ICU-TC committee intervened and we spent 3 months creating concrete proposals and discussing them in committee meetings. But alone as a group, what have we done that has worked? My answer to STA’s question is with a question. + EAO: Can we just try to discuss the issues that we have and try to form a consensus? - -RCA: It’s hard to say what worked and what didn’t, but some things worked, some things didn’t. We have made progress, but it’s always the consensus part that has been difficult. - -STA: I think it’s important to analyze what has worked in a “blameless postmortem” type of way, but it’s also not useful to bash ourselves. Looking with some hindsight, we haven’t agreed what we want MFv2 to be. 
Until a few months ago when CLDR-TC asked for proposals, we hadn’t really delved into issues that are important. Why I think this time is different, is that if we use a public forum, then everything is recorded and out in the open. - + +RCA: It’s hard to say what worked and what didn’t, but some things worked, some things didn’t. We have made progress, but it’s always the consensus part that has been difficult. + +STA: I think it’s important to analyze what has worked in a “blameless postmortem” type of way, but it’s also not useful to bash ourselves. Looking with some hindsight, we haven’t agreed what we want MFv2 to be. Until a few months ago when CLDR-TC asked for proposals, we hadn’t really delved into issues that are important. Why I think this time is different, is that if we use a public forum, then everything is recorded and out in the open. + I think we have the extended meetings to continue discussion, and even though I was a proponent of that, it has led to the communication gap that has caused problems. - + RCA: Should we go on to the issues? - + STA: What do we do about Markus’s PR? I think that’s the issue. - + RCA: I think we have an action item to invite Markus and see what he thinks. We don’t know what the intention is, so we should wait to ask. - - + ## Prioritise and resolve issues raised with the proposed syntax. - + EAO: We have a request from the CLDR-TC that we have a technical preview of MFv2 ready for the ICU 72 release. That means that we would need a syntax ready in the next few weeks. Without having that outside pressure, it might be possible to discuss the issues that are blockers. Let’s go through the issues we have and determine which issues are blockers for the code for ICU 72 release. - -MIH: I agree with this point, in order to have something delivered ultimately, we have to start getting into the details. 
I don’t expect us to address all issues, but if we can go through them and tag them as blockers for implementing the technical preview. I think we should also send out an invitation to the list - + +MIH: I agree with this point, in order to have something delivered ultimately, we have to start getting into the details. I don’t expect us to address all issues, but if we can go through them and tag them as blockers for implementing the technical preview. I think we should also send out an invitation to the list + RCA: I have created another Project kanban board in the repo that we can use to pull issues into. - + MIH: I prefer using labels instead of this process. It is hard to have multiple ways of tracking info. - + STA: Github also has milestones. But I share MIH’s sentiment. Maybe one label could represent agreement, another label represents nomination. A new issue created for a blocker would get the nomination label, and if we all agree it is a blocker, then it gets the agreement label. - + MIH: Sounds good. - + RCA: Let’s create those labels now. - + EAO: Can we use labels: blocker, backlog, wishlist. - + STA: Do those labels map onto P0, P1, and P2? - + Labels: - + - backlog - blocker - wishlist - + ## Discuss how we may proceed in parallel with the other remaining parts of the spec, and figure out what is the true MVP scope for an initial implementation. - + ### [#269](https://github.com/unicode-org/message-format-wg/pull/269) - + RGN: I filed this issue to make sure that it was captured, but it existed only in the email thread. It is something that I think should be resolved, but I agree that it doesn’t need to be a blocker for a technical preview. - + ### [#268](https://github.com/unicode-org/message-format-wg/pull/268) - -RGN: I observed when going over the syntax proposal merged in the `develop` branch. Multiline strings are uncommon but not unheard of, so that merited special attention. Not a blocker for a technical preview. 
- + +RGN: I observed when going over the syntax proposal merged in the `develop` branch. Multiline strings are uncommon but not unheard of, so that merited special attention. Not a blocker for a technical preview. + ### [#265](https://github.com/unicode-org/message-format-wg/pull/265) - + EAO: Not a blocker - + ### [#263](https://github.com/unicode-org/message-format-wg/pull/263) - + EAO: That is a blocker - + STA: I agree that is a blocker, but it is very simple to change. It can change adoption very much, so I agree that it is a blocker. - + ### [#262](https://github.com/unicode-org/message-format-wg/pull/262) - -STA: Markus was concerned about markup being too freeform, i think. - -MIH: I think it is related to other issues that we have, and those are blockers. Ex: Is HTML a first-class markup citizen and all other markups are not, or - + +STA: Markus was concerned about markup being too freeform, i think. + +MIH: I think it is related to other issues that we have, and those are blockers. Ex: Is HTML a first-class markup citizen and all other markups are not, or + ECH: I think this is a blocker. This is really similar to the committee feedback on what it called “UI Elements”, here it is called “markup element”, but it’s really the same thing. - + ### [#261](https://github.com/unicode-org/message-format-wg/pull/261) - + EAO: We can find out from Markus next week. - -STA: Maybe there is a critical user journey that is missing from this and a few other related issues. I imagine the topic would be “what is the select and format mechanism?” We should find out from the TC. I don’t think this is a blocker itself. - + +STA: Maybe there is a critical user journey that is missing from this and a few other related issues. I imagine the topic would be “what is the select and format mechanism?” We should find out from the TC. I don’t think this is a blocker itself. 
+ ### [#260](https://github.com/unicode-org/message-format-wg/pull/260) - -STA: I think we talked about this with MIH lately, about how formatting and selection functions are preferred and valid. It is about the naming of default functions. - + +STA: I think we talked about this with MIH lately, about how formatting and selection functions are preferred and valid. It is about the naming of default functions. + MIH: Markus has some similar thing detailed in his syntax PR. - + STA: I think this is a registry issue. - -MIH: I think for the implementation, it’s okay to design the APIs that can support either way. If we can’t agree that we need 3 categories of functions, then it’s a blocker. - + +MIH: I think for the implementation, it’s okay to design the APIs that can support either way. If we can’t agree that we need 3 categories of functions, then it’s a blocker. + EAO: Unless we have a registry, then we have to say it’s not a blocker. But it highlights that we should have a registry proposal ASAP. - -MIH: I agree it’s not a syntax, but it is required for the implementation. So it’s not something that can be delayed. - + +MIH: I agree it’s not a syntax, but it is required for the implementation. So it’s not something that can be delayed. + STA: I agree with MIH. - + RCA: So it is a blocker and it is registry. - + MIH: Yes, but remember that it could still be a blocker even without a registry implementation. - + ### [#259](https://github.com/unicode-org/message-format-wg/pull/259) - + STA: I don’t think this is a blocker. - -EAO: Yes, this is a blocker. Going back on that decision would be difficult. - + +EAO: Yes, this is a blocker. Going back on that decision would be difficult. + ### [#257](https://github.com/unicode-org/message-format-wg/pull/257) - + EAO: Blocker. - + STA: Yes, it’s related to the preamble declaration. - + ### [#256](https://github.com/unicode-org/message-format-wg/pull/256) - + STA: That is a blocker. - + EAO: Agreed. 
- + ### [#255](https://github.com/unicode-org/message-format-wg/pull/255) - -ECH: This is an issue that multiple have commented on. I didn’t have an opinion until I read their comments, and it makes sense. It has a linguistic related aspect that affects people’s - + +ECH: This is an issue that multiple have commented on. I didn’t have an opinion until I read their comments, and it makes sense. It has a linguistic related aspect that affects people’s + STA: I think it’s contentious enough that is should be a blocker. - + ### [#254](https://github.com/unicode-org/message-format-wg/pull/254) - + EAO: This is not a syntax thing. - + RCA: But is it a blocker? - + STA: After MIH explained how important this is to leveraging, I think this is a blocker. - -MIH: I think this is a blocker. yes. It’s not data model, it is actually mostly syntax, but still it is a blocker. It affects how you handle the keys. - + +MIH: I think this is a blocker. yes. It’s not data model, it is actually mostly syntax, but still it is a blocker. It affects how you handle the keys. + STA: Okay, even more reason for it to be a blocker. - + ### [#253](https://github.com/unicode-org/message-format-wg/pull/253) - + STA: This is something that Markus’s PR has, and Addison was a proponent of. - + EAO: Isn’t this related to a previously discussed issue? - + STA: No, that was about patterns, this is about keys. - + EAO: Okay, then it’s a blocker. - + ### [#252](https://github.com/unicode-org/message-format-wg/pull/252) - + STA: That is a blocker - + MIH: Blocker - + EAO: Blocker - + ### [#251](https://github.com/unicode-org/message-format-wg/pull/251) - -STA: Similar to the issue that Markus had about making selectors stand out. This one is about the entire preamble. It is not clear whether it includes local variables or not. Probably a blocker. - + +STA: Similar to the issue that Markus had about making selectors stand out. This one is about the entire preamble. 
It is not clear whether it includes local variables or not. Probably a blocker. + ### [#249](https://github.com/unicode-org/message-format-wg/pull/249) - + EAO: Based on discussions with ZBI, not a blocker. - + ### [#248](https://github.com/unicode-org/message-format-wg/pull/248) - + STA: Not a blocker - + MIH: Not a blocker, but the more we kick this can down the road, the more difficult it becomes. - + ECH: Maybe it’s not a blocker, but it gets really annoying when people assume “Placeable” is an interface because it ends in “-able”, and we’ve already had this discussion before and we’re having it again. As I said earlier, we also had misunderstanding earlier today looking at an issue on “markup syntax” when the CLDR-TC committee had referred to it as “UI Elements” because that’s what a proposal used. - -STA: - + +STA: + ### [#247](https://github.com/unicode-org/message-format-wg/pull/247) - + STA: I think this issue is a blocker - + EAO: This could be something we can defer as a blocker until we hear from Markus. - + STA: But I think MIH is right, I understood Markus’s PR as him being strongly against one of those use cases. - + MIH: I’m also against the argument being a function option. - + STA: We can leave it as a non-blocker and work on specifying it better. - + ### [#246](https://github.com/unicode-org/message-format-wg/pull/246) - + STA: Probably not a blocker. - + EAO: Can we just say that they just need to be delimited somehow? - + MIH: Yes, I agree, it doesn’t have to be quotes. - -STA: There is a separate issue for what the delimiter should be, but this issue is about _whether_ there should be a delimiter. - + +STA: There is a separate issue for what the delimiter should be, but this issue is about _whether_ there should be a delimiter. + STA: There is no rush on this. - + ### [#245](https://github.com/unicode-org/message-format-wg/pull/245) - + MIH: I don’t see it is a blocker. 
- + ### [#244](https://github.com/unicode-org/message-format-wg/pull/244) - + STA: I don’t know how to resolve this. - + MIH: For me, it’s clear that the title question’s answer is “yes”. So it’s fine to move on, but it’s not solely my decision to make. - + STA: Not a blocker. - + ### [#243](https://github.com/unicode-org/message-format-wg/pull/243) - + STA: I can close this and reference the related issue. - -STA: Actually, I think this issue was about - + +STA: Actually, I think this issue was about + ### [#242](https://github.com/unicode-org/message-format-wg/pull/242) - - - + EAO: I think this is connected to allowing whitespace. - + STA: I think this is completely a blocker, and we should resolve it with the non-controversial thing, which is a sigil. - + EAO: Sure. - + ### [#241](https://github.com/unicode-org/message-format-wg/pull/241) - -MIH: I think it is a blocker if you look at the bigger picture. - -ECH: Do you remember the CLDR-TC - + +MIH: I think it is a blocker if you look at the bigger picture. + +ECH: Do you remember the CLDR-TC + EAO: My sense was content in curly braces is special, but not all content in curly braces is variable references or placeholders. - + STA: The committee resolution doc on UI elements says the “introducer begins with a { “ - -ECH: Addison had a comment saying that this is something that should be handled outside of MF, say by your “Translation Management System (TMS)” or some other process. I agree. So there is a related question about whether we should even be doing this, so this is a blocker. - + +ECH: Addison had a comment saying that this is something that should be handled outside of MF, say by your “Translation Management System (TMS)” or some other process. I agree. So there is a related question about whether we should even be doing this, so this is a blocker. + MIH: I propose that we move ahead if we all agree that it is a blocker.
- + ### [#240](https://github.com/unicode-org/message-format-wg/pull/240) - + EAO: Does everyone agree that we should add a standalone markup element to the spec? - + MIH: I find it strange that we have things like in HTML bold and links are markup elements, but images are placeholders. - + STA: Agreed. - + RCA: So this is a blocker, until we say otherwise. - + ### [#239](https://github.com/unicode-org/message-format-wg/pull/239) - -STA: This is a duplicate of the previous #240. Please reference the previous issue when closing this one. - + +STA: This is a duplicate of the previous #240. Please reference the previous issue when closing this one. + ### [#238](https://github.com/unicode-org/message-format-wg/pull/238) - -STA: I think these concerns should be explained. MIH can you explain? - -MIH: I can explain, but not better than the already existing comments. It’s about namespacing. The namespace is a potential solution to this. - -STA: I thought this was more about being able to use - + +STA: I think these concerns should be explained. MIH can you explain? + +MIH: I can explain, but not better than the already existing comments. It’s about namespacing. The namespace is a potential solution to this. + +STA: I thought this was more about being able to use + MIH: Potentially, sure, why not? - + STA: Well, LaTeX doesn’t use markup, right? - + EAO: If we add namespacing, then we don’t have to have this as a blocker. - -STA: I think namespacing is a blocker, even though it is jumping - + +STA: I think namespacing is a blocker, even though it is jumping + RCA: Is everyone okay with this not being a blocker? - + ECH: Isn’t this basically related to other things that are blockers? - -STA: We can just prefix everything with “html_” - + +STA: We can just prefix everything with “html\_” + MIH: No, that won’t work.
That will force ICU or any implementation to know ahead of time what all types of markup will exist and have special code to know how to handle them when they are encountered. - + ECH: It sounds to me that it is a blocker, and if it is a duplicate of something else, then it is easy to close. - + STA: If we follow that logic, then all of our syntax proposal followup issues - - + ### [#237](https://github.com/unicode-org/message-format-wg/pull/237) - + MIH: I think this is related to the other discussion on placeholders. It is about how to wrap literals or not. - + EAO: Can we close this or just let it be? - + MIH: I think this will be self-resolved once we decide how to wrap string literals and option values in placeholders. - + ### [#236](https://github.com/unicode-org/message-format-wg/pull/236) - + EAO: Not a blocker because resource syntax is not a blocker - + ### [#235](https://github.com/unicode-org/message-format-wg/pull/235) - + EAO: I think we are all clear that this is an error, as it currently is. - + MIH: Yeah, it is an error. - + ### [#234](https://github.com/unicode-org/message-format-wg/pull/234) - -EAO: We currently don’t have Unicode escape sequences, and I am not proposing that we add them explicitly. We can close unless someone thinks differently. - + +EAO: We currently don’t have Unicode escape sequences, and I am not proposing that we add them explicitly. We can close unless someone thinks differently. + ### [#233](https://github.com/unicode-org/message-format-wg/pull/233) - + EAO: Not a blocker - + ### [#209](https://github.com/unicode-org/message-format-wg/pull/209) - + EAO: This is an issue that is related to the syntax that I raised in November last year. So we are done with the issues related specifically to our recent syntax proposals. Let’s not do anything with this issue. - + ### [#160](https://github.com/unicode-org/message-format-wg/pull/160) - + EAO: I don’t think this is a syntax issue. - + MIH: No, it’s not. 
It’s ancient because it is more than 1 year old. - + RCA: Okay, then let’s remove the “syntax” label from it. - - -RCA: How many total blocker issues do we have? Is that all? - + +RCA: How many total blocker issues do we have? Is that all? + MIH: Well, we will probably have more after we extend a call for others to file other blocker issues, whether they are syntax or not. - + EAO: Let’s also encourage people to comment on issues so that we can work asynchronously so that we can make progress. - -STA: Do we want to add these issues to the milestone? Have you talked about that? - + +STA: Do we want to add these issues to the milestone? Have you talked about that? + EAO: We don’t have another thing besides the technical preview to deliver, so a milestone wouldn’t help. - + RCA: So are we all ready? - + EAO: We should address all of the blockers for the technical preview, that are beyond just the syntax. - + RCA: If we get Markus here next week, would it be possible to ask him? - + EAO: Adding a meta blocker issue of noting that we need to resolve this. I can do this. - -STA: Back to the question of the milestones, from the point of view of a newcomer to the repo who adds an issue that doesn’t get the label “blocker”, they may wonder what does “blocker” mean. So having a milestone allows us to clarify - - + +STA: Back to the question of the milestones, from the point of view of a newcomer to the repo who adds an issue that doesn’t get the label “blocker”, they may wonder what does “blocker” mean. So having a milestone allows us to clarify + CUJs: + 1. How do I select-and-format? 1. How do I use non-HTML UI elements? 
- + Action Point: -- [ ] Invite Markus Scherer to the next Extended meeting to share thoughts about the PR + +- [ ] Invite Markus Scherer to the next Extended meeting to share thoughts about the PR diff --git a/meetings/2022/notes-2022-06-06.md b/meetings/2022/notes-2022-06-06.md index 453e840463..580194cf6e 100644 --- a/meetings/2022/notes-2022-06-06.md +++ b/meetings/2022/notes-2022-06-06.md @@ -1,4 +1,5 @@ ### June 6th, meeting Attendees + - Romulo Cintra (RCA) Igalia - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) @@ -8,268 +9,267 @@ - Richard Gibson - OpenJSF (RGN) - Staś Małolepszy - Google (STA) -## Auto Transcription - +## Auto Transcription -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) -## Next Meeting +## Next Meeting May 23, 2022 ### Agenda - -* review open issues -* discuss blockers to beginning prototyping - + +- review open issues +- discuss blockers to beginning prototyping + ## Review open issues - + ### [#254](https://github.com/unicode-org/message-format-wg/pull/254) - + EAO: It looks like we have consensus that we should have the same number of keys and selectors. - + MWS: I feel strongly that we should have matching numbers of keys and selectors. - + ### [#252](https://github.com/unicode-org/message-format-wg/pull/252) - + EAO: I agree with STA that we should close this with a conclusion that local variable definitions and selectors should be different from each other. - + ### [#251](https://github.com/unicode-org/message-format-wg/pull/251) - + EAO: I don't think we're all agreed either way on the resolution. - -MWS: I think it is predicated on starting in code mode. If we do start in code mode, then we don't need to insert anything addition to start collecting patterns. - + +MWS: I think it is predicated on starting in code mode. 
If we do start in code mode, then we don't need to insert anything addition to start collecting patterns. + EAO: If we don't unify local variable definitions and selectors, then a single preamble doesn't make sense. We have two parts. - -MWS: Yes, you have to have these parts separate and clear somehow. You have local variables, then followed by selectors, then followed by selector key and pattern pairs. But you don't need something extra special before you start having patterns. - -STA: I think the preamble is an outdated concept, and we have 2 related issues, so I don't think we need a third. I think we can close this. - + +MWS: Yes, you have to have these parts separate and clear somehow. You have local variables, then followed by selectors, then followed by selector key and pattern pairs. But you don't need something extra special before you start having patterns. + +STA: I think the preamble is an outdated concept, and we have 2 related issues, so I don't think we need a third. I think we can close this. + MWS: Yes, close this as moot. - + STA: This issue is covered by #252 and #257. - + ### [#247](https://github.com/unicode-org/message-format-wg/pull/247) - + EAO: I think there is some value to have function options that refer to arguments / variables. - -MWS: I added my comment previous. I see that MIH has added a plausible use case. I'm not entirely happy with that, but I cannot think of a better way of supporting that use case, so I'm okay with that. - + +MWS: I added my comment previous. I see that MIH has added a plausible use case. I'm not entirely happy with that, but I cannot think of a better way of supporting that use case, so I'm okay with that. + EAO: Maybe we can say "yes", but qualify it by saying that there should be restrictions on the functions in the registry. - -MWS: Can we rephrase that as only allowing variables in function options. 
I would like to restrict it to the type of usage that MIH recorded in the issue so that we do not allow more free form usages. - + +MWS: Can we rephrase that as only allowing variables in function options. I would like to restrict it to the type of usage that MIH recorded in the issue so that we do not allow more free form usages. + EAO: That would restrict functions in the way that they can operate. - -MWS: - + +MWS: + MIH: Yes, we shouldn't allow variables / arguments - + MWS: I think we are trying to define something that is not what this issue is about, but an addendum. - -EAO: My proposal is to record consensus that the answer "yes, but allow restrictions in the function registry". There would be further discussion. But at the syntax level, I think we have agreement. - + +EAO: My proposal is to record consensus that the answer "yes, but allow restrictions in the function registry". There would be further discussion. But at the syntax level, I think we have agreement. + MWS: I agree at the syntax we can allow `{... :function option=$val ...}`, but we should not allow `$val` be an argument to the message. - + MIH: Does that mean that we can use local variables? - + MWS: I would like it to be limited to bullet 3 in the description of the issue: "Can `$variable` above be supplied by the application code as an argument to MessageFormat?" - -STA: I think bullet 5 "Can `$variable` refer to a local variable defined by the message?" to be considered as well. This would allow alignment / agreement across parts of a translation. - + +STA: I think bullet 5 "Can `$variable` refer to a local variable defined by the message?" to be considered as well. This would allow alignment / agreement across parts of a translation. + MIH: I think bullet 4 would allow sentence grammatical agreement. - + STA: MIH, I thought in our previous conversations, you said that a placeholder would be sufficient to allow this. - -MWS: In my mind, I think this is pretty speculative. 
I am doubtful that we can have a single pass that would allow sentence agreement. I think we would need at least a second pass handling to support sentence agreement. I think the scenario MIH gave to support agreement of units of measurement as compelling, but I have not found other arguments convincing. - -ZBI: I generally agree with MWS. My mental model over the last few months have been updated, in which we are not going to have a message format that handles grammatical agreement itself, but it can be the input to a separate system that is like a grammatical correctness engine. Instead, we can have a simpler model where simpler grammatical agreement can be supported but defer to a separate engine for more complex work. So I support bullet 3 and bullet 5. - -STA: To clarify, I think bullets 4 and 5 are equivalent. But I think we need to first decide whether we can pass info of one part of a message to another part. - -MIH: I agree with STA that bullets 4 and 5 are part of the same solution. I also agree with ZBI that we should have a grammatical correctness engine as a separate step outside of MessageFormat. If you have a sentence _"I couldn’t fit the statue in the bag because it was too big"_ vs _“I couldn’t fit the statue in the bag because it was too small”_, you need to have metadata to know to which noun the adjectives "big" / “small” refers to. Although the sentences are almost identical, “big” refers to the statue, “small” refers to the bag. A machine can’t decide that (yet :-) - + +MWS: In my mind, I think this is pretty speculative. I am doubtful that we can have a single pass that would allow sentence agreement. I think we would need at least a second pass handling to support sentence agreement. I think the scenario MIH gave to support agreement of units of measurement as compelling, but I have not found other arguments convincing. + +ZBI: I generally agree with MWS. 
My mental model over the last few months have been updated, in which we are not going to have a message format that handles grammatical agreement itself, but it can be the input to a separate system that is like a grammatical correctness engine. Instead, we can have a simpler model where simpler grammatical agreement can be supported but defer to a separate engine for more complex work. So I support bullet 3 and bullet 5. + +STA: To clarify, I think bullets 4 and 5 are equivalent. But I think we need to first decide whether we can pass info of one part of a message to another part. + +MIH: I agree with STA that bullets 4 and 5 are part of the same solution. I also agree with ZBI that we should have a grammatical correctness engine as a separate step outside of MessageFormat. If you have a sentence _"I couldn’t fit the statue in the bag because it was too big"_ vs _“I couldn’t fit the statue in the bag because it was too small”_, you need to have metadata to know to which noun the adjectives "big" / “small” refers to. Although the sentences are almost identical, “big” refers to the statue, “small” refers to the bag. A machine can’t decide that (yet :-) + MWS: I am willing to stand down on this, and we can stick with "Yes, with potential restrictions defined in the function registry". - -STA: I agree with this consensus. I would still be happy to document the usages that I have in mind. - + +STA: I agree with this consensus. I would still be happy to document the usages that I have in mind. + ### [#281](https://github.com/unicode-org/message-format-wg/pull/281) - + RCA: We have 2 approvals. Are we okay to merge? - + MIH: Yes. - + ### [#280](https://github.com/unicode-org/message-format-wg/pull/280) - + EAO: This effectively removes the ability to have a space after a colon but before the name of a function. - + ### [#278](https://github.com/unicode-org/message-format-wg/pull/278) - + RCA: I think BAT needed to change the examples. 
- + ### [#276](https://github.com/unicode-org/message-format-wg/pull/276) - + EAO: My understanding here is that MIH thinks that double quotes are better than parentheses. - -MIH: I am fine with parentheses for literal values. I feel stronger about angle brackets, which would cause problems with HTML-style markup languages. - + +MIH: I am fine with parentheses for literal values. I feel stronger about angle brackets, which would cause problems with HTML-style markup languages. + ### [#273](https://github.com/unicode-org/message-format-wg/pull/273) - + EAO: I feel differently about this PR after discussions about how to address open / close elements, and would rather spend time on that discussion instead of this PR. - + MIH: Yes, I think open / close / standalone for placeholders together as one topic and not different topics. - + ## Discuss blockers to beginning prototyping - + RCA: Shall we talk about ECH's proposed topic about how to go about prototyping? - + MIH: Yes, with the understanding that we are not trying to force anything, but it is already summer, and different people will have vacation, and we have to have a prototype implemented by middle of August to have something ready for a technical preview for ICU 72. - -EAO: Yes, let's talk about this. A relevant question is to MWS - do you have an idea of what needs to be resolved before we begin on work? - -MIH: My intuition is that we have to resolve all the issues labeled "blocker", otherwise we are stuck. Either they are blockers, or they are labeled incorrect. - + +EAO: Yes, let's talk about this. A relevant question is to MWS - do you have an idea of what needs to be resolved before we begin on work? + +MIH: My intuition is that we have to resolve all the issues labeled "blocker", otherwise we are stuck. Either they are blockers, or they are labeled incorrect. + EAO: But are there some things that might be minor, like the character for delimiting literal values, and they can be changed later easily. 
- + MIH: I agree that such things can be changed later, but the point is whether it is okay that we go ahead and implement things in a certain way that aren't agreed upon yet, with the clear understanding that things will change. - -EAO: The point of the original consensus syntax would be that this would be the starting point from which any implementations - -ECH: The point of this question is to make sure that we're thinking about the timeline, and that we have a backup plan. _____ - -STA: I think we have general agreement on the data model, thanks to this exercise with the syntax. I think these discussions, even if they result in not immediately pursuing certain options, are important milestones for helping us understand the issues. Things like: separating local variable definitions from selectors; fairly good idea of what a placeholder will look like (barring markup, which is TBD); number of variant keys. Outstanding: local variables. - + +EAO: The point of the original consensus syntax would be that this would be the starting point from which any implementations + +ECH: The point of this question is to make sure that we're thinking about the timeline, and that we have a backup plan. **\_** + +STA: I think we have general agreement on the data model, thanks to this exercise with the syntax. I think these discussions, even if they result in not immediately pursuing certain options, are important milestones for helping us understand the issues. Things like: separating local variable definitions from selectors; fairly good idea of what a placeholder will look like (barring markup, which is TBD); number of variant keys. Outstanding: local variables. + And I want to say it because no one else is, that there is no way that we are going to meet the mid-August feature freeze deadline. - + MWS: I think the biggest thing that would cause us to change is "do we start in code mode?", followed by whether we delimit things and how. - + RCA: Who will be implementing this? 
- + MIH: I will, for ICU4J. - + EAO: And I will, for ECMA-402. - -My question is also what do we - -RCA: MIH, with these blocker issues, do we have the necessary hints and directions to guide implementation? With the understanding that our continued discussions can update the implementation? - -MIH: I appreciate EAO's offer to work on the JavaScript implementation, because that is a totally different type of language. The real deadline is for ICU. I think we should identify 1 or 2 big issues, like what MWS said. To what STA said about just supporting the data model first, and my response is that I already went through that exercise 1 year ago with EAO when going back and forth to show that discussed features were supportable. It was awkward to show and write unit tests for, so I think also having a syntax will make that much clearer. - + +My question is also what do we + +RCA: MIH, with these blocker issues, do we have the necessary hints and directions to guide implementation? With the understanding that our continued discussions can update the implementation? + +MIH: I appreciate EAO's offer to work on the JavaScript implementation, because that is a totally different type of language. The real deadline is for ICU. I think we should identify 1 or 2 big issues, like what MWS said. To what STA said about just supporting the data model first, and my response is that I already went through that exercise 1 year ago with EAO when going back and forth to show that discussed features were supportable. It was awkward to show and write unit tests for, so I think also having a syntax will make that much clearer. + MWS (from chat): for "normal people" to give feedback, they have to see & play with a message-string-with-syntax and see what happens - -EAO: I think the difficult stuff for the internals is how to deal with values and variables, as defined. How do we handle markup elements, are they fundamentally their own thing, or can they work within placeholders?
The syntax level questions are things that for a technical preview could be handled more easily because we have EBNF parser tool that can help. - + +EAO: I think the difficult stuff for the internals is how to deal with values and variables, as defined. How do we handle markup elements, are they fundamentally their own thing, or can they work within placeholders? The syntax level questions are things that for a technical preview could be handled more easily because we have EBNF parser tool that can help. + MIH: My gut feeling about the "big blockers": - + - start in code or text - markup elements: same as placeholder, or not? - the "." in variable names (`{$foo.bar :func}` or `{$foo :func opt=$bar.baz}` or even in selectors / local variables - + The second and third feel more like data model than syntax, really, so they should be addressed first. - -STA: So I like the list by MIH. I would add use cases for local variables naming. Let's use the guidance from the CLDR-TC on this topic, which said to allow "." in identifiers that are treated in implementation-specific ways. - -MIH: I'm okay with that, even though it contradicts other things that CLDR-TC said. If I reference `$foo.bar`, and the `.bar` implies some type of structure must exist in `$foo` for the engine to work properly, then I think - + +STA: So I like the list by MIH. I would add use cases for local variables naming. Let's use the guidance from the CLDR-TC on this topic, which said to allow "." in identifiers that are treated in implementation-specific ways. + +MIH: I'm okay with that, even though it contradicts other things that CLDR-TC said. If I reference `$foo.bar`, and the `.bar` implies some type of structure must exist in `$foo` for the engine to work properly, then I think + MWS (from chat): Variable name with dot: I certainly don't want the spec to require that something special happens. But if in some implementation or language something natural could happen, that's probably ok. 
- -STA: MIH, what you said is not how I understood the CLDR-TC guidance. I understood it to mean that if an implementation uses a "." dot, then it is not a part of the data model, and it is entirely between the specific implementation and users of it. - -MIH: My hope is that, whatever we implement, if we swap implementations, we get the same results. The custom functions should look more or less the same. If we don't have that property, then we can't have a test suite to verify conformance. - -MWS: I don't want the spec to require that `person.name` should have some fancy look. It should be given a map where one of the keys is literally `person.name` as a string key and be able to get something out. It should also be possible for an implementation to later say that, if `person.name` as a string key is not present, but `person` does exist, then it does a lookup into the `person` struct if appropriate. I just don't want to _require_ in the spec that a 2-step lookup should happen. - + +STA: MIH, what you said is not how I understood the CLDR-TC guidance. I understood it to mean that if an implementation uses a "." dot, then it is not a part of the data model, and it is entirely between the specific implementation and users of it. + +MIH: My hope is that, whatever we implement, if we swap implementations, we get the same results. The custom functions should look more or less the same. If we don't have that property, then we can't have a test suite to verify conformance. + +MWS: I don't want the spec to require that `person.name` should have some fancy look. It should be given a map where one of the keys is literally `person.name` as a string key and be able to get something out. It should also be possible for an implementation to later say that, if `person.name` as a string key is not present, but `person` does exist, then it does a lookup into the `person` struct if appropriate. I just don't want to _require_ in the spec that a 2-step lookup should happen. 
+ MIH (via chat): - + Example: + ``` mf = mf2.parse( "{$pers.name} was born on {$pers.dob}") ``` - + should I be able to do this? `mf.format( "pers" : currentPerson)` ? - + Or I am forced to do + ``` mf.format( { "pers.name" : currentPerson.name, "pers.dob" : currentPerson.dob } ) ``` - + MWS (via chat): Mihai's example could be implemented with a simple Map given to the MF2 function with `{"pers.name":"Markus", "pers.dob":(1905-12-31)}` - + MIH (via chat): yes, but the dev is forced to convert a POJO to a map in order to use it - + MWS (via chat): it should also be possible to provide a "pers" argument with some structure (e.g., another Map, or some special type with a get(name) etc.) - -MIH: I get that. But I do see value in the specification saying what the expected behavior should be. - + +MIH: I get that. But I do see value in the specification saying what the expected behavior should be. + STA: I don't think it is even possible to do what MIH is asking for. - -MIH: I disagree. I think you can look at the type of `pers` as a Person object and then do something accordingly. Or you can do something like we do for DateFormatter. - + +MIH: I disagree. I think you can look at the type of `pers` as a Person object and then do something accordingly. Or you can do something like we do for DateFormatter. + STA: I don't understand what we did for DateFormatter. - + MIH: Okay, if we can't decide, then I'm fine just saying that it is 100% an implementation detail. - -EAO: I would like to stick to the CLDR-TC + ICU-TC committee's consensus decisions, otherwise we will set ourselves back several months. - -The other part I would like to discuss is how to handle variables, which is related. The question is what can a function see? The key part here is what do we understand a "value" to be? I understand that the CLDR+ICU-TC believes that when a value is given to the formatting function, only the value is given, but not the name or identifier that the value is associated with. + +EAO: I would like to stick to the CLDR-TC + ICU-TC committee's consensus decisions, otherwise we will set ourselves back several months. + +The other part I would like to discuss is how to handle variables, which is related. The question is what can a function see? The key part here is what do we understand a "value" to be? I understand that the CLDR+ICU-TC believes that when a value is given to the formatting function, only the value is given, but not the name or identifier that the value is associated with.
- -MWS: I agree with EAO with what the value is. If I think of this is as a Java implementation, the name should be opaque and all you get is a Java object. We would want the formatting functions to be able to inspect the runtime type and dispatch as necessary. - -EAO: One extension point of this discussion is do we take the same approach for function names? - -MWS: I would like to clarify that the message formatting function should know what a `Person` type is. It shouldn't know what types that the formatting function expects to handle. The message library could generically handle that a Person object maps to a Map. We don't need to require it, but we don't need to forbid it. - -Regarding dots in a function name versus a variable name. In a variable name, it is natural to allow the dots to signify a multi-level lookup. For functions, one thing that Addison liked was that dots could signify a custom function name, where the dots signify a namespace in the style of Java package names / Maven artifact group ids. Back to variables, we should still allow a flat level lookup. - + +EAO: I would like to stick to the CLDR-TC + ICU-TC committee's consensus decisions, otherwise we will set ourselves back several months. + +The other part I would like to discuss is how to handle variables, which is related. The question is what can a function see? The key part here is what do we understand a "value" to be? I understand that the CLDR+ICU-TC believes that when a value is given to the formatting function, only the value is given, but not the name or identifier that the value is associated with. + +MWS: I agree with EAO with what the value is. If I think of this is as a Java implementation, the name should be opaque and all you get is a Java object. We would want the formatting functions to be able to inspect the runtime type and dispatch as necessary. + +EAO: One extension point of this discussion is do we take the same approach for function names? 
+ +MWS: I would like to clarify that the message formatting function should know what a `Person` type is. It shouldn't know what types that the formatting function expects to handle. The message library could generically handle that a Person object maps to a Map. We don't need to require it, but we don't need to forbid it. + +Regarding dots in a function name versus a variable name. In a variable name, it is natural to allow the dots to signify a multi-level lookup. For functions, one thing that Addison liked was that dots could signify a custom function name, where the dots signify a namespace in the style of Java package names / Maven artifact group ids. Back to variables, we should still allow a flat level lookup. + MIH: I am fine to say that this topic is just an implementation detail, and I will go ahead and implement in the ICU4J prototype whatever I think makes the most sense for Java. - -EAO: It does sound like the function name is a separate discussion. One thing I do like for custom functions is some signifier that indicates that a function is a custom function. In Fluent, a leading dash indicates that a message is a term, and in a similar way, maybe a "." could be used to indicate a custom function. - -ECH (via chat): +1. MWS's PR mentions this point exactly -- that a dot in a function could signify a custom function, without specifying how. This is where Addison +1'ed this alternative over the other alternatives for signifying that a function is a custom function - + +EAO: It does sound like the function name is a separate discussion. One thing I do like for custom functions is some signifier that indicates that a function is a custom function. In Fluent, a leading dash indicates that a message is a term, and in a similar way, maybe a "." could be used to indicate a custom function. + +ECH (via chat): +1. MWS's PR mentions this point exactly -- that a dot in a function could signify a custom function, without specifying how. 
This is where Addison +1'ed this alternative over the other alternatives for signifying that a function is a custom function + EAO: So should we update the EBNF to indicate that dots in function names imply custom functions? - -MWS: I don't think we need to update the syntax. I think we can let it just be a convention that we handle in a specific way accordingly. - + +MWS: I don't think we need to update the syntax. I think we can let it just be a convention that we handle in a specific way accordingly. + RCA: Let's record consensus on these topics. - + STA: We haven't answered ECH's questions about backup plans. - + EAO: I might be available for June 27, but would be out for most of July. - + STA: I would be out for 2 weeks in July... so what is our backup plan? - -ECH: We have the 3 items above listed by MIH. We've already discussed the 3rd item and reached a consensus, as the notes above indicate. If we focus on the remaining 2 items of that list, then we can have the resolutions we need to unblock MIH's ICU4J tech preview implementation. - -EAO: Given this time crunch, we should identify who will submit PRs to improving the existing syntax spec according to the decisions that we are making? Specifically, who other than STA and myself will submit PRs to change the EBNF structure? That needs to be happening in parallel for us to be reaching these goals. - -STA: I can definitely can work on the EBNF. For the decision mentioned of separating local variables from selectors, this is a topic that is tightly coupled with other issues, such as whether and how we delimit selectors. They are all regarding syntax visuals, and maybe we should handle them altogether this time around. As long as we reach some sort of agreement, I can go ahead the EBNF. - + +ECH: We have the 3 items above listed by MIH. We've already discussed the 3rd item and reached a consensus, as the notes above indicate. 
If we focus on the remaining 2 items of that list, then we can have the resolutions we need to unblock MIH's ICU4J tech preview implementation. + +EAO: Given this time crunch, we should identify who will submit PRs to improving the existing syntax spec according to the decisions that we are making? Specifically, who other than STA and myself will submit PRs to change the EBNF structure? That needs to be happening in parallel for us to be reaching these goals. + +STA: I can definitely can work on the EBNF. For the decision mentioned of separating local variables from selectors, this is a topic that is tightly coupled with other issues, such as whether and how we delimit selectors. They are all regarding syntax visuals, and maybe we should handle them altogether this time around. As long as we reach some sort of agreement, I can go ahead the EBNF. + EAO: I think STA and I can coordinate on that. - -EAO: As a last sort of question, as a starting point for next week, I still think we should start in text mode. I know that MIH and STA and MWS think we should start in code mode. Is there anyone else besides me who thinks we should start in text mode? - + +EAO: As a last sort of question, as a starting point for next week, I still think we should start in text mode. I know that MIH and STA and MWS think we should start in code mode. Is there anyone else besides me who thinks we should start in text mode? + ECH: I agree to start in code mode. - -RGN: I have a weak preference to start in code mode. - - + +RGN: I have a weak preference to start in code mode. + ### the "." 
in variable names (`{$foo.bar :func}` or `{$foo :func opt=$bar.baz}` or even in selectors / local variables - -#### Consensus : - - -Next Meeting Agenda : -- start in code or text - https://github.com/unicode-org/message-format-wg/issues/256 - Look at meeting notes + +#### Consensus : + +Next Meeting Agenda : + +- start in code or text - https://github.com/unicode-org/message-format-wg/issues/256 - Look at meeting notes - markup elements: same as placeholder, or not? -- the "." in variable names (`{$foo.bar :func}` or `{$foo :func opt=$bar.baz}` or even in selectors / local variables - Already discussed we need to record consensus - +- the "." in variable names (`{$foo.bar :func}` or `{$foo :func opt=$bar.baz}` or even in selectors / local variables - Already discussed we need to record consensus diff --git a/meetings/2022/notes-2022-06-13.md b/meetings/2022/notes-2022-06-13.md index 74ba56b158..513a9076de 100644 --- a/meetings/2022/notes-2022-06-13.md +++ b/meetings/2022/notes-2022-06-13.md @@ -1,4 +1,5 @@ ### June 13th, meeting Attendees + - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) - Elango Cheran - Google (ECH) @@ -6,153 +7,147 @@ - Richard Gibson - OpenJSF (RGN) - David Filip - XLIFF TC, Huawei (DAF) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ### Agenda - -* review open issues -* discuss blockers to beginning prototyping - + +- review open issues +- discuss blockers to beginning prototyping + ## Review open issues - + ### [#257](https://github.com/unicode-org/message-format-wg/pull/257) - + STA: A bit of cleanup is needed because we have a bunch of issues that are related. I propose this discussion in the issue that has the most discussion, which is #253. - + EAO: I am closing #257 as redundant. 
- + ### [#282](https://github.com/unicode-org/message-format-wg/pull/282) - -EAO: As I understand it, the intent of this PR is to make a purely editorial edit without changing anything in the proposal. This PR just makes the change that any character is a character in the Unicode code point space, with descriptions of ranges that are included or excluded. Once this PR lands, then we can have - + +EAO: As I understand it, the intent of this PR is to make a purely editorial edit without changing anything in the proposal. This PR just makes the change that any character is a character in the Unicode code point space, with descriptions of ranges that are included or excluded. Once this PR lands, then we can have + MIH: I am not able to follow where he is coming from. I don't understand why JavaScript influences the decision, but Java properties doesn't. - + EAO: I think the issue you are referring to is #268. - -MIH: Yes, the discussion is split. RGN was saying that what I was referring to is in-memory, not applicable, but I don't understand the argument that we care about the valid characters in a serialization... I'm fine with the change. I just want to make sure that we understand the implications ... Depending on the programming language and/or operating system, you get strings serialized as UTF-8, or UTF-16, or potentially something else. The question is if you have invalid UTF-8 characters, do we say it's invalid, or not? That is the implication that we need to discuss. - -RGN: Logically, when you are ingesting a representation of a message format into an in-memory MF structure, you need to define what is being ingested. In Java, you can say that it is treated as a sequence of Unicode code points. THe ingest process takes the code points and returns a data structure / object. The sequence of code points is going to be checked syntatically to understand it. There is a special meaning ascribed to the double quote code point, left bracket, etc. 
It is checked against the EBNF grammar defined. The followup question to the PR I started is what should be considered valid in the grammar. If you say it is literally any code point, other than an unescaped double quote or backslash, you run into the problem that some values cannot be represented UTF-8 string. The whole ecosystem prefers and defaults to UTF-8. You are introducing friction if you say that the message cannot be supported in UTF-8. - + +MIH: Yes, the discussion is split. RGN was saying that what I was referring to is in-memory, not applicable, but I don't understand the argument that we care about the valid characters in a serialization... I'm fine with the change. I just want to make sure that we understand the implications ... Depending on the programming language and/or operating system, you get strings serialized as UTF-8, or UTF-16, or potentially something else. The question is if you have invalid UTF-8 characters, do we say it's invalid, or not? That is the implication that we need to discuss. + +RGN: Logically, when you are ingesting a representation of a message format into an in-memory MF structure, you need to define what is being ingested. In Java, you can say that it is treated as a sequence of Unicode code points. THe ingest process takes the code points and returns a data structure / object. The sequence of code points is going to be checked syntatically to understand it. There is a special meaning ascribed to the double quote code point, left bracket, etc. It is checked against the EBNF grammar defined. The followup question to the PR I started is what should be considered valid in the grammar. If you say it is literally any code point, other than an unescaped double quote or backslash, you run into the problem that some values cannot be represented UTF-8 string. The whole ecosystem prefers and defaults to UTF-8. You are introducing friction if you say that the message cannot be supported in UTF-8. 
+ STA: I'm not as familiar with the details of encodings. What sort of code points are not representable as UTF-8? - -RGN: Surrogate code points. Those are code points reserved for representing code points in UTF-16 that are beyond the first plane (BMP) of 2^16 code points. - -MIH: I understand what you mean. But we also implement this in C and Java, and so on. So what should we do if we receive a message with invalid UTF-8 code points? Do we expect to replace them with the replacement character, or do we just pass them through? - -RGN: I think what you're asking about, using JavaScript as a concrete example, is that a JS string is allowed to have unpaired surrogates. So the question is a question for the JS adapter / implementation, but that's not a question for the standard itself. - + +RGN: Surrogate code points. Those are code points reserved for representing code points in UTF-16 that are beyond the first plane (BMP) of 2^16 code points. + +MIH: I understand what you mean. But we also implement this in C and Java, and so on. So what should we do if we receive a message with invalid UTF-8 code points? Do we expect to replace them with the replacement character, or do we just pass them through? + +RGN: I think what you're asking about, using JavaScript as a concrete example, is that a JS string is allowed to have unpaired surrogates. So the question is a question for the JS adapter / implementation, but that's not a question for the standard itself. + MIH: So we leave it to the implementation? - + RGN: Yes. - + MIH: Okay, that is fine with me. - -EAO: Given that this discussion is not the main portion of the PR, can we merge this PR? It sounds like we're okay with merging. - + +EAO: Given that this discussion is not the main portion of the PR, can we merge this PR? It sounds like we're okay with merging. + ### [#283](https://github.com/unicode-org/message-format-wg/pull/283) - -EAO: The other PR is the one I filed.
It's to make markup placeholders use plus and minus signs to indicate open and close to indicate the type of placeholder/markup tag. - -STA: I think we're on the verge of this PR becoming a point of discussion. I think this is related to the issue about needing different sigil names. We can go ahead with EAO's PR and then continue the discussion in #241. - -MIH: To be fair, I like the HTML-like style with slashes to indicate open/close/standalone. The question I have is if we need a concept of a sigil in the first place, it feels like bikeshedding. - -EAO: Through the discussion, I feel like we have less distance between formatting functions and markup placeholders. My understanding of what we will have in the spec, data model / structure-wise, is whether we need to have arguments for open markup elements. - + +EAO: The other PR is the one I filed. It's to make markup placeholders use plus and minus signs to indicate open and close to indicate the type of placeholder/markup tag. + +STA: I think we're on the verge of this PR becoming a point of discussion. I think this is related to the issue about needing different sigil names. We can go ahead with EAO's PR and then continue the discussion in #241. + +MIH: To be fair, I like the HTML-like style with slashes to indicate open/close/standalone. The question I have is if we need a concept of a sigil in the first place, it feels like bikeshedding. + +EAO: Through the discussion, I feel like we have less distance between formatting functions and markup placeholders. My understanding of what we will have in the spec, data model / structure-wise, is whether we need to have arguments for open markup elements. + MIH: I think we do need to have arguments. - + EAO: I don't think we do. - + MIH: Okay, then this is not the place to discuss it, it's the issue that we've touched in the past few weeks.
- -EAO: Once we get to implementation, my sense is that we get things that represent markup elements and things that represent placeholders. I think we're getting away from the actual PR. - -ECH: Why do we - -STA: Maybe we can spend some time today talking about whether we need markup elements at all. There has been a good discussion in Github about this already. I think EAO's comments already covered what I wanted to say. - + +EAO: Once we get to implementation, my sense is that we get things that represent markup elements and things that represent placeholders. I think we're getting away from the actual PR. + +ECH: Why do we + +STA: Maybe we can spend some time today talking about whether we need markup elements at all. There has been a good discussion in Github about this already. I think EAO's comments already covered what I wanted to say. + MIH: Are you referring to the issue #262. - + STA: Yes, and there may have been another issue. - -EAO: I would like us to get more to that discussion. Would people be okay with merging #283 so that we have something for the open and close markup tags. - + +EAO: I would like us to get more to that discussion. Would people be okay with merging #283 so that we have something for the open and close markup tags. + STA: Do we need this? - + EAO: I was convinced by RGN that because many templating systems use 2 character syntax to demarcate placeholders, we could use the plus and minus to distinguish open and close. - -STA: I understand the idea, but I don't understand where the idea originated from. Maybe this is from MWS's idea that we shouldn't allow whitespace if possible, which I don't agree with, but I agree with the idea that the placeholder starts with the opening curly brace, and everything in between is part of the contents of the placeholder. - + +STA: I understand the idea, but I don't understand where the idea originated from. 
Maybe this is from MWS's idea that we shouldn't allow whitespace if possible, which I don't agree with, but I agree with the idea that the placeholder starts with the opening curly brace, and everything in between is part of the contents of the placeholder. + MIH: I agree that if you have `{:...}`, the colon is a part of the function invocation, not a part of the curly brace. - + EAO: I agree with you structurally, but I think it is useful to communicate the open and close, and this is better than the HTML style syntax of nothing for open and a slash for close. - -STA: Actually, it's issue #240 that I wanted to spend 10-15 minutes to discuss. Let's switch to that. - + +STA: Actually, it's issue #240 that I wanted to spend 10-15 minutes to discuss. Let's switch to that. + I think by special-casing this class of placeholders, we enable tooling to know that they need to invoke special functions when formatting them. - -EAO: What I want to get out of MF2 with respect to markup elements is that they're possible. I don't think they need to be special-cased. What occurred to me when writing my comment is that there are things in the registry that ___. I think what STA was saying is that maybe a function doesn't need to be invoked, and can allow a formatToParts. - -MIH: I was trying to remember one of the comments from MWS, I think it's that a function can exist that can be pass-through for formatting, say `:string`. I do agree that we should have formatToParts, which should be true parts (not the current MF notion of parts), and we can also have the pass-through function -- the identify function. - -STA: I'm torn, myself. I thought I knew what my position was. I started out thinking that we can do everything as functions. A few months ago, I was convinced by discussion in the group to consider display elements as special because they are larger than just displaying things.
Crucially, I think it's more of a stylistic choice as far as API design, they are functionally equivalent. If it is a part, then it is up to the runtime to decide what to do it. Which is appropriate, because the runtime is responsible for knowing how to display things, rather than inside a function. In fact, some functions can be called too early to know how to decide how to display things. So the function may have to pass through and let the runtime decide anyways. So think we need to decide on the special-ness of display elements. - -ZBI: I'm in a similar position as STA as going back and forth on the special-ness of display elements. My anchor is display elements / markup elements do not share the same API as formatting functions. I think the CLDR-TC+ICU-TC decision came into the discussion pseudo-coding the solution using functions. It feels like pushing a square peg through a round hole. It might require tools to look into the registry to figure out what to do. - -In the option where we treat markup elements, we have 3 types of tools: the tool that formats a placeholder without any intelligence, a tool that formats a placeholder by looking into the registry for a function to do the formatting, and a tool that can recognize the difference between markup elements and placeholders with functions and can do - + +EAO: What I want to get out of MF2 with respect to markup elements is that they're possible. I don't think they need to be special-cased. What occurred to me when writing my comment is that there are things in the registry that \_\_\_. I think what STA was saying is that maybe a function doesn't need to be invoked, and can allow a formatToParts. + +MIH: I was trying to remember one of the comments from MWS, I think it's that a function can exist that can be pass-through for formatting, say `:string`. 
I do agree that we should have formatToParts, which should be true parts (not the current MF notion of parts), and we can also have the pass-through function -- the identify function. + +STA: I'm torn, myself. I thought I knew what my position was. I started out thinking that we can do everything as functions. A few months ago, I was convinced by discussion in the group to consider display elements as special because they are larger than just displaying things. Crucially, I think it's more of a stylistic choice as far as API design, they are functionally equivalent. If it is a part, then it is up to the runtime to decide what to do it. Which is appropriate, because the runtime is responsible for knowing how to display things, rather than inside a function. In fact, some functions can be called too early to know how to decide how to display things. So the function may have to pass through and let the runtime decide anyways. So think we need to decide on the special-ness of display elements. + +ZBI: I'm in a similar position as STA as going back and forth on the special-ness of display elements. My anchor is display elements / markup elements do not share the same API as formatting functions. I think the CLDR-TC+ICU-TC decision came into the discussion pseudo-coding the solution using functions. It feels like pushing a square peg through a round hole. It might require tools to look into the registry to figure out what to do. + +In the option where we treat markup elements, we have 3 types of tools: the tool that formats a placeholder without any intelligence, a tool that formats a placeholder by looking into the registry for a function to do the formatting, and a tool that can recognize the difference between markup elements and placeholders with functions and can do + EAO: I agree, but I think we can come out with an emergent property from the base rules, but without putting constraints on the extent of what a function can do. 
- -MIH: Tools today already exist that take in the placeholder type open/close/standalone. Tools today also take in the type of markup tag and reason about / dispatch on the tag. There is a tendency to think that UI elements are just UI, but a lot of them have functionality. If you only think of JS, they have an identifier, and when they do, translators cannot delete them because they are used to trigger functionality, so it's a concept that is represented in XLIFF. I am not bringing up XLIFF to say we have to do everything it does, but designs in XLIFF support the needs of the localization world. - -ECH: I think I can understand the plus and minus signs because we would need to represent open/close/standalone info either way. That would have to be some key-value option in the options list for a placeholder, and the EM proposal doesn't specify what the name of that key should be. But that is always something that we can specify later. - -STA: I wanted to verify that I understand, - -ZBI: One more thing that came to mind is that if we treat elements as regular placeholders, then we actually have a deeper problem. In our registry thinking, I don't think we have thought through how we communicate in the registry how a particular function interprets the types of the arguments. But if you say that we don't need to specialize functions for display elements, then we don't need to spend time discussing this. - -EAO: I think we're mostly agreed, and we agree on the need for indicating open/close/standalone for placeholders. We still disagree on and need to discuss whether open and close placeholders can take arguments, or not. I propose - + +MIH: Tools today already exist that take in the placeholder type open/close/standalone. Tools today also take in the type of markup tag and reason about / dispatch on the tag. There is a tendency to think that UI elements are just UI, but a lot of them have functionality. 
If you only think of JS, they have an identifier, and when they do, translators cannot delete them because they are used to trigger functionality, so it's a concept that is represented in XLIFF. I am not bringing up XLIFF to say we have to do everything it does, but designs in XLIFF support the needs of the localization world. + +ECH: I think I can understand the plus and minus signs because we would need to represent open/close/standalone info either way. That would have to be some key-value option in the options list for a placeholder, and the EM proposal doesn't specify what the name of that key should be. But that is always something that we can specify later. + +STA: I wanted to verify that I understand, + +ZBI: One more thing that came to mind is that if we treat elements as regular placeholders, then we actually have a deeper problem. In our registry thinking, I don't think we have thought through how we communicate in the registry how a particular function interprets the types of the arguments. But if you say that we don't need to specialize functions for display elements, then we don't need to spend time discussing this. + +EAO: I think we're mostly agreed, and we agree on the need for indicating open/close/standalone for placeholders. We still disagree on and need to discuss whether open and close placeholders can take arguments, or not. I propose + MIH: I think the discussion on that last topic in which we disagree can be better understood through the discussion in #262. - -STA: Back in #240, or is #283, we are saying that there are 3, not 1 sigils for calling a function. But 2 of them will indicate that in addition to invoking a function, they also indicate the open/close/standalone value of the placeholder. - + +STA: Back in #240, or is #283, we are saying that there are 3, not 1 sigils for calling a function. But 2 of them will indicate that in addition to invoking a function, they also indicate the open/close/standalone value of the placeholder. 
+ MIH: I still think we need a function for placeholders representing markup tags, because the tag names might be reused. Ex: `` is in HTML and exists in SSML: https://cloud.google.com/text-to-speech/docs/ssml#sub - + ECH: On the point in #240 in which it is proposed to not allow arguments or options for close placeholders, we need to be able to allow ids for close placeholders to match them up to their corresponding open placeholder, especially when there are multiple instances of the same tag name in a message, and they need to be paired. - -EAO: This is exactly why I want to preclude them because adding that possibility to a standard is easy, but once you add something, it is hard to take away. I think our implementation work will give us real-world evidence of where it is needed and how. - -ECH: I agree with the principle of it being easier to add things and hard to take them away. Even though I have a strong opinion on how this will end up, I am okay with starting with it not being allowed and allowing implementation work informing us soon enough. - -STA: The topic that we just discussed seems harmless enough that allowing them or not allowing them won't affect people, but I am not strongly opinionated on that. On the topic from MIH that we need to have positional arguments to indicate the markup type as a function, would you consider having namespaced functions as a sufficient substitute? - -To the point that MIH said about namespaces being the same as functions, I want to talk - - - - - - + +EAO: This is exactly why I want to preclude them because adding that possibility to a standard is easy, but once you add something, it is hard to take away. I think our implementation work will give us real-world evidence of where it is needed and how. + +ECH: I agree with the principle of it being easier to add things and hard to take them away. 
Even though I have a strong opinion on how this will end up, I am okay with starting with it not being allowed and allowing implementation work informing us soon enough. + +STA: The topic that we just discussed seems harmless enough that allowing them or not allowing them won't affect people, but I am not strongly opinionated on that. On the topic from MIH that we need to have positional arguments to indicate the markup type as a function, would you consider having namespaced functions as a sufficient substitute? + +To the point that MIH said about namespaces being the same as functions, I want to talk + ## Discuss blockers to beginning prototyping - + EAO: Would it be appropriate for me to merge the PR to allow the syntax with the plus and minus signs, and due to lack of time, we can continue the discussion on whether we have arguments and/or options in open and/or close placeholders? - + ECH: I am okay with going ahead in this way based on my interpretation that the plus minus signs are syntactic sugar to represent the open/close status of a placeholder, but that the data model wouldn't need to change, because this is just a special syntactical representation. - -EAO: Okay. I think if you or MIH can create a PR to represent your changes to the syntax spec in the `develop` branch, we can continue the discussion on that PR because we don't have enough time in the current meeting to do so. - -STA: On #255, if there's a consensus that excludes me on using curly braces instead of square brackets for delimiting patterns, then I would be okay with curly braces for patterns. And that would make the issue of delimiting selector values really quick to close. I won't block on this forever, but I raised concerns, and no one has addressed those concerns, which is not okay. - + +EAO: Okay. 
I think if you or MIH can create a PR to represent your changes to the syntax spec in the `develop` branch, we can continue the discussion on that PR because we don't have enough time in the current meeting to do so. + +STA: On #255, if there's a consensus that excludes me on using curly braces instead of square brackets for delimiting patterns, then I would be okay with curly braces for patterns. And that would make the issue of delimiting selector values really quick to close. I won't block on this forever, but I raised concerns, and no one has addressed those concerns, which is not okay. + EAO: Is it okay to record a consensus on curly braces and have a followup issue to continue looking into square brackets? - -STA: That is not what I am saying. MWS is on vacation for 2 more weeks, and others on vacation, too. However, are there other decisions that are affected by a decision here, like how to delimit selector values. - + +STA: That is not what I am saying. MWS is on vacation for 2 more weeks, and others on vacation, too. However, are there other decisions that are affected by a decision here, like how to delimit selector values. + MIH: I still have to do the implementation for ICU4J, so I can work on the data model part if that's okay. - -STA: I still am not comfortable moving forward. It will be hard to reverse a decision here if we start implementing based on this and subsequent follow on decisions. - -ZBI: I completely agree with STA that we should not move forward without addressing his concerns. + +STA: I still am not comfortable moving forward. It will be hard to reverse a decision here if we start implementing based on this and subsequent follow on decisions. + +ZBI: I completely agree with STA that we should not move forward without addressing his concerns. 
diff --git a/meetings/2022/notes-2022-06-27.md b/meetings/2022/notes-2022-06-27.md index 4fabd221b2..4d4c47a98b 100644 --- a/meetings/2022/notes-2022-06-27.md +++ b/meetings/2022/notes-2022-06-27.md @@ -1,4 +1,5 @@ ### June 27th, meeting Attendees + - Romulo Cintra (RCA) Igalia - David Filip - XLIFF TC, Huawei (DAF) - Eemeli Aro - Mozilla (EAO) @@ -8,125 +9,121 @@ - Elango Cheran - Google (ECH) - Richard Gibson - OpenJSF (RGN) - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ### Agenda - + Review PR’s & Open Issues Delimiters ([] vs. {} vs. none) Issue #255 Review tech preview blocker issues Try to find a temporary solution for markup elements Plan next steps before tech preview(Jul & Aug) - - + ## Review open issues - -ECH : This meeting would be the “final” meeting where all decisions made would be included in technical preview - you all agree with this? We of course all acknowledge that the technical preview is not necessarily what the final spec will look like, as we have reiterated in almost every meeting recently. - + +ECH : This meeting would be the “final” meeting where all decisions made would be included in technical preview - you all agree with this? We of course all acknowledge that the technical preview is not necessarily what the final spec will look like, as we have reiterated in almost every meeting recently. + STA: I’m ok with that and July being quiet give us opportunity to work on implementation and tests. We should leave today’s meeting with a plan. - + ## Review PR's & Open Issues - + #240 - + EAO: We originally labeled this as a blocker because we thought we could include this for the tech preview, but I think we were wrong on that estimation. - + STA: I think this was a point of disagreement between EAO and MIH during the last meeting. 
- + MIH: I think the title is not really descriptive of what we're trying to solve at this point. - -EAO: I would describe the situation as that we have different views about how the idea of markup elements ought to work, so that needs further resolution. But we will not reach an agreement here. One possibility is that start and end elements will end up getting passed through in the tech preview implementation. Is that the suggestion? - + +EAO: I would describe the situation as that we have different views about how the idea of markup elements ought to work, so that needs further resolution. But we will not reach an agreement here. One possibility is that start and end elements will end up getting passed through in the tech preview implementation. Is that the suggestion? + MIH: Yes, that is the idea. - + RCA: Does this mean that at this moment, we have an intermediate solution that we can put in the tech preview? - -MIH: No, we don't have an agreement. So I wouldn't want to put it into the tech preview and then have a big change. - -STA: I think we can edit the title of this issue. It's not only about standalone element, but more generally about all markup elements (open and close, too). What MIH is saying that markup elements should be represented as a literal value argument to a function that represents the type of markup. I don't know if we will have an agreement today. - + +MIH: No, we don't have an agreement. So I wouldn't want to put it into the tech preview and then have a big change. + +STA: I think we can edit the title of this issue. It's not only about standalone element, but more generally about all markup elements (open and close, too). What MIH is saying that markup elements should be represented as a literal value argument to a function that represents the type of markup. I don't know if we will have an agreement today. 
+ MIH: My suggestion is for the literal syntax like this + ``` For me: {+b :html} b = positional argument :html = function ``` -But Mark's - + +But Mark's + EAO: I think the black box is the better way to go right now because it doesn't make a decision on how to handle markup elements and leaves for room for them to be decided later. - + STA: And to clarify, does the black box implementation mean that when the parser sees the + or - sign, then it returns the entire text of that placeholder in the formatted output as-is without any interpretation. - -MIH: Yes. My contention is that we will find that markup elements are basically the same as placeholders. - + +MIH: Yes. My contention is that we will find that markup elements are basically the same as placeholders. + RCA: So we are all in agreement that we will have the black box pass-through behavior for now for markup elements. - + #241 - + EAO: I think we can defer the issue by allowing this to be passed through in the black box manner, too. - + #248 - -ECH: This really needs to get figured out. It's an issue hampering the discussions in this group. It doesn't have to be a "blocker" in the sense that it blocks the tech preview implementation, but we really need to come to an agreement - + +ECH: This really needs to get figured out. It's an issue hampering the discussions in this group. It doesn't have to be a "blocker" in the sense that it blocks the tech preview implementation, but we really need to come to an agreement + #256 - + EAO: We have a consensus on this by now. - + MIH: Don't close this until we merge the PR that addresses this. - + #260 - -STA: I don't think that any syntax addresses this right now. I think we will eventually address this. - + +STA: I don't think that any syntax addresses this right now. I think we will eventually address this. + EAO: Are we making the registry public for initial implementations? - + MIH: No, not now. 
- + EAO: So since this is an implementation detail for MIH's and my tech preview prototype code, we don't need to decide on this right now. - + #275 - + EAO: This is covered by the PR, too. - + RCA: Okay, let's move to the PRs now. - + #285 - -MIH: Let's start with #285 because that should be uncontroversial. It's just updating the syntax docs markdown based on what we agreed to last week. It should be a formality. - + +MIH: Let's start with #285 because that should be uncontroversial. It's just updating the syntax docs markdown based on what we agreed to last week. It should be a formality. + #287 - -EAO: Originally, this PR came from a discussion between me and STA to implement a few different things that we had agreed upon. We came up with something in line with what we have done before that could work. I think it was me who said that we can't expect any translator to look at this syntax and make sense of the message enough to know where the translatable parts are. - + +EAO: Originally, this PR came from a discussion between me and STA to implement a few different things that we had agreed upon. We came up with something in line with what we have done before that could work. I think it was me who said that we can't expect any translator to look at this syntax and make sense of the message enough to know where the translatable parts are. + Then MIH, STA, and I discussed the ideas, and we discussed the idea of using keywords like `case`, `when`, etc., and I wrote the issue and PR to document the ending state of those discussions. - + STA: To add, we agreed the primary audience for the syntax is developers, and that parsers would need to start in "code mode" when beginning the parsing of a message string. - + EAO: One approach is to see if this PR will make things significantly worse, and if not, we can merge, and then collect opinions and data from users. 
- + STA: I would like to get visibility on whether we agree or not about whether keywords are okay in the syntax or not. - -EAO: There is a close parallel in current MessageFormat for `match`. The keyword `match` here takes on a similar role to `plural`, `ordinal`, etc. in current MessageFormat. - -ZBI: My concern is that these keywords are in English, and then you have message patterns in the source language in which the source language is also English, it becomes confusing to authors. I wonder what the experience is for users when they see the syntax for the first time -- how easy it is for them? The other argument is that a person who is reading a MFv2 message is someone who is reading 1 million such MFv2 messages, and they are already familiar with the syntax. In other words, the benefits for a first-time user and the benefits for a heavy user may be opposed to each other. I'm concerned when keywords become a part of the message that look like English words, but I am not against it. - -MIH: I'm torn, like ZBI. I approved the PR for the reason that it solves issues like the indecision of whether the parser should start in "code mode" or not, and proposals on that topic were very irregular. What I envision is that, as a developer, you are used to writing design docs and code, and only once in a while you author a message. You don't spend a lot of time writing messages, and often times they are simple messages (not ones with selections including plurals). I'm curious to put it out there and see what the reaction is. - + +EAO: There is a close parallel in current MessageFormat for `match`. The keyword `match` here takes on a similar role to `plural`, `ordinal`, etc. in current MessageFormat. + +ZBI: My concern is that these keywords are in English, and then you have message patterns in the source language in which the source language is also English, it becomes confusing to authors. 
I wonder what the experience is for users when they see the syntax for the first time -- how easy it is for them? The other argument is that a person who is reading a MFv2 message is someone who is reading 1 million such MFv2 messages, and they are already familiar with the syntax. In other words, the benefits for a first-time user and the benefits for a heavy user may be opposed to each other. I'm concerned when keywords become a part of the message that look like English words, but I am not against it. + +MIH: I'm torn, like ZBI. I approved the PR for the reason that it solves issues like the indecision of whether the parser should start in "code mode" or not, and proposals on that topic were very irregular. What I envision is that, as a developer, you are used to writing design docs and code, and only once in a while you author a message. You don't spend a lot of time writing messages, and often times they are simple messages (not ones with selections including plurals). I'm curious to put it out there and see what the reaction is. + RCA: I like the idea of trying it out and seeing what happens. - -STA: I like what MIH said. I want to address the concern of ZBI about what experienced users will say. When I think about my own experience with regex's, I've probably written thousands of them, but I have never remembered the syntax, and always had to consult documentation about it. It won't be a syntax that people study and then be passionate about. Maybe this concern is a little bit smaller now that we're considered always delimiting translatable text. Whereas in Fluent, translatable text was not delimited, and so ____. - -ZBI: STA, I think that counter-argument is a weak one. You're still asking a user to look ahead to the word `when`, then look ahead to the opening bracket, in order to figure out what the selector values are. 
- -ECH: Why I continue to point out that: 1) we need to have delimiters for the selector value tuples and selector definitions, and that 2) the keywords should be optional because we should not replace the representation of data with these keywords. Looking forward to when languages with data literal syntax can create libraries that can use native data structures idiomatically to represent the message instead of a large "stringly-typed" string that is required of languages that don't have data literals in C++, Java, Rust, and Scala. - - + +STA: I like what MIH said. I want to address the concern of ZBI about what experienced users will say. When I think about my own experience with regex's, I've probably written thousands of them, but I have never remembered the syntax, and always had to consult documentation about it. It won't be a syntax that people study and then be passionate about. Maybe this concern is a little bit smaller now that we're considered always delimiting translatable text. Whereas in Fluent, translatable text was not delimited, and so \_\_\_\_. + +ZBI: STA, I think that counter-argument is a weak one. You're still asking a user to look ahead to the word `when`, then look ahead to the opening bracket, in order to figure out what the selector values are. + +ECH: Why I continue to point out that: 1) we need to have delimiters for the selector value tuples and selector definitions, and that 2) the keywords should be optional because we should not replace the representation of data with these keywords. Looking forward to when languages with data literal syntax can create libraries that can use native data structures idiomatically to represent the message instead of a large "stringly-typed" string that is required of languages that don't have data literals in C++, Java, Rust, and Scala. 
+ Consensus for https://github.com/unicode-org/message-format-wg/issues/255 - we are ok to use {} instead of [] - diff --git a/meetings/2022/notes-2022-07-18.md b/meetings/2022/notes-2022-07-18.md index 7c3d800737..2313d05242 100644 --- a/meetings/2022/notes-2022-07-18.md +++ b/meetings/2022/notes-2022-07-18.md @@ -1,12 +1,12 @@ - Attendees: Please fill in a 3-letter acronym if this is your first meeting: + - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname - Suggestion 2: First initial, middle initial, last initial - Suggestion 3: Custom - ### July 18th, meeting Attendees + - Romulo Cintra (RCA) Igalia - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) @@ -15,231 +15,228 @@ Please fill in a 3-letter acronym if this is your first meeting: - Shane Carr - Google (SFC) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ## Next Meeting -Extended - 25 July; 1 Aug + +Extended - 25 July; 1 Aug Plenary - 15 Aug ### Agenda - + Review PR’s & Open Issues - - + ## Review open issues - - + ## Review PR's & Open Issues - -RCA: We don't currently have any new open PRs. +RCA: We don't currently have any new open PRs. STA: I would like to fix bugs in our current EBNF and also clarify the usage of whitespace in the syntax. - + ### [#289](https://github.com/unicode-org/message-format-wg/pull/289) - -STA: I think there is a consensus among participants that `let` is an acceptable keyword. I will close the issue with a comment - + +STA: I think there is a consensus among participants that `let` is an acceptable keyword. I will close the issue with a comment + ### [#268](https://github.com/unicode-org/message-format-wg/pull/268) - -MIH: I was the one asking clarifications. I didn't understand the points from RGN. - + +MIH: I was the one asking clarifications. 
I didn't understand the points from RGN. + EAO: Let's return to this conversation when RGN is back. - + ### [#265](https://github.com/unicode-org/message-format-wg/pull/265) - + EAO: This is waiting on the CLDR-TC to see if they would be okay with starting up a new subcommittee to work on the resource syntax. - + RCA: Should we keep it open? - + EAO: Yes. - + ### [#262](https://github.com/unicode-org/message-format-wg/pull/262) - + EAO: We agreed last time to sidestep this for now for the purposes of the Technial Preview implementation by allowing these to passed through as black boxes. - + RCA: Can we remove the `blocker` label? - -STA: Yes, for the purposes of the technical preview, it's not a blocker. It's also in the technical preview milestone, and we can remove this issue from there. - + +STA: Yes, for the purposes of the technical preview, it's not a blocker. It's also in the technical preview milestone, and we can remove this issue from there. + ### [#261](https://github.com/unicode-org/message-format-wg/pull/261) - + STA: I think this was about using a different sigil like the `#` hash sign for named expressions or not. - + MIH: I did use a different syntax for this. The only drawback is that you cannot tell what is local or not. - + RCA: Can we close this or not? - + MIH: From my side, we can close. - + EAO: Should we keep it open? - -STA: Currently, the syntax on the `develop` branch has a different sigil. This is a good question to resolve once we get feedback from actual users. I don't think that anyone currently has strong feelings. - + +STA: Currently, the syntax on the `develop` branch has a different sigil. This is a good question to resolve once we get feedback from actual users. I don't think that anyone currently has strong feelings. + RCA: Should we comment that we are waiting for feedback? - + EAO: Let's close and comment that we may choose to reopen based on feedback. 
- + ### [#259](https://github.com/unicode-org/message-format-wg/pull/259) - + EAO: I think we should leave this open because MWS had strong feelings about this. And STA is working on an editorial PR to describe changes. - + #255 - + STA: This can be fixed. - + ### [#244](https://github.com/unicode-org/message-format-wg/pull/244) - + EAO: This is the black box. - + MIH: No, this is not related to the black box solution for markup elements denoted with `+` and `-`. - -EAO: Oh, this is related to function calls with no arguments. I would fine with this being closed because we are okay with the syntax in question. - + +EAO: Oh, this is related to function calls with no arguments. I would fine with this being closed because we are okay with the syntax in question. + ### [#243](https://github.com/unicode-org/message-format-wg/pull/243) - + STA: This is probably implementation dependent. - + EAO: For the question of syntax, I think we should allow such syntax. - + RCA: Should we record consensus on this, or leave it open? - + STA: We can record a consensus on the syntax, and we leave it up to the implementations to figure out how to properly implement the behavior per runtime type of the formattable object. - -EAO: And we also leave the implementation to determine how to interpret the semantics when no formatting function is provided. But there too, we don't specify in our spec. - + +EAO: And we also leave the implementation to determine how to interpret the semantics when no formatting function is provided. But there too, we don't specify in our spec. + ### [#241](https://github.com/unicode-org/message-format-wg/pull/241) - + RCA: Are we agreed on this? - + MIH: I don't think we are. - + RCA: Can I remove the `blocker` tag from the issue and remove from the tech preview milestone? - + MIH: Yes. - + ### [#240](https://github.com/unicode-org/message-format-wg/pull/240) - -MIH: I think this whole space of display elements needs to be clarified. 
You cannot close this one and leave the other issues open. You can remove the `blocker` tag and remove it from the tech preview milestone, but it is still conjoined with the other related issues. - + +MIH: I think this whole space of display elements needs to be clarified. You cannot close this one and leave the other issues open. You can remove the `blocker` tag and remove it from the tech preview milestone, but it is still conjoined with the other related issues. + EAO: Agreed. - + ### [#238](https://github.com/unicode-org/message-format-wg/pull/238) - + MIH: I see this in the same bucket as the markup issues. - + RCA: Do we all agree. - -STA: Yes. We need the technical preview more urgently than to solve these issues right now. - + +STA: Yes. We need the technical preview more urgently than to solve these issues right now. + ### [#236](https://github.com/unicode-org/message-format-wg/pull/236) - + STA: They got simpler now that we don't use brackets and round parentheses for delimiting patterns. - -EAO: I think we specify that we use a minimal set of - + +EAO: I think we specify that we use a minimal set of + STA: One thing is that we don't use double quotes for value literals, instead we use parentheses. And we don't use square brackets for the pattern delimiters. - + MIH: There is still some funniness about having to escape round parentheses. - + STA: Yes, but it is still a big improvement and simplification. - + #268 revisited - + RGN: Given that some measure of escaping is required for the delimiters, what else is valid in the MFv2 string that we embed, ex: embedded in a JSON string. - + EAO: Oh - + RGN: For example, I think we should at least specify how the surrogate code points should be specified. - -MIH: But should that be in MF specification? For example, C allows unpaired surrogates or invalid UTF-8. These are things that should be invalidated at the edge, not at every level of the stack. 
- -EAO: But - -RGN: Yes, you couldn't even save a file as UTF-8 if you had certain code points in between the delimiters, for example the surrogate code points. That would be detrimental to the functionality, and I'm not aware of any purpose this would serve. - + +MIH: But should that be in MF specification? For example, C allows unpaired surrogates or invalid UTF-8. These are things that should be invalidated at the edge, not at every level of the stack. + +EAO: But + +RGN: Yes, you couldn't even save a file as UTF-8 if you had certain code points in between the delimiters, for example the surrogate code points. That would be detrimental to the functionality, and I'm not aware of any purpose this would serve. + EAO: Should we make an editorial PR to specify this? - + MIH: I'm in agreement with you that these types of sequences would be invalid, but does specifying this in the standard require every implementation to check for this in their implementation to be valid? - + RGN: We would first assume that a MF string that is parsed - + MIH: I think that would be expensive to implement. - + RGN: That is how everything is implemented -- CSS, HTML, JavaScript. - -MIH: If I create a UTF-8 file and put an invalid sequence in there, then what happens? Does it get dropped, does it get converted to the invalid Unicode character? - + +MIH: If I create a UTF-8 file and put an invalid sequence in there, then what happens? Does it get dropped, does it get converted to the invalid Unicode character? + RGN: HTML is difficult, and it may be dropped. - -MIH: Right, I'm not ready to assume that what is provided is a sequence of code points. This makes things expensive. We can drop to only assuming that we get a sequence of code points. - + +MIH: Right, I'm not ready to assume that what is provided is a sequence of code points. This makes things expensive. We can drop to only assuming that we get a sequence of code points. 
+ EAO: I think we want MF, just like all layers of the stack, to validate the string. - + MIH: This will also complicate the parser. - + RGN: But we already have the parser rejecting `\l` then the same rejection logic should be rejecting other invalid sequences of code units. - -If your parser is implementing a parser for UTF-16, then you - -RGN: The grammar for MF is defined in terms of code points. If you're folding in the lower-level concern to get UTF-8 representing code units vs UTF-16 vs. CECU UTF-8, but the grammar is the same regarding terminal code points. But it - + +If your parser is implementing a parser for UTF-16, then you + +RGN: The grammar for MF is defined in terms of code points. If you're folding in the lower-level concern to get UTF-8 representing code units vs UTF-16 vs. CECU UTF-8, but the grammar is the same regarding terminal code points. But it + MIH: Yes, but the rejection should be happening earlier. - + RGN: I think it is should be in scope to define what happens when you receive invalid UTF-8 strings. - + MIH: Do we make it in scope to validate UTF-8? - + RGN: Yes. - -MIH: I'm trying to the bridge between the syntax and the implementation. I'm just trying to understand the implication of this idea on the tech preview implementation that I'm working on. - + +MIH: I'm trying to the bridge between the syntax and the implementation. I'm just trying to understand the implication of this idea on the tech preview implementation that I'm working on. + STA: I think we would like to somehow express that we only accept valid code points. - + MIH: What I'm trying to put my finger on, is do we say that we reject such invalid sequences of code units, or do we say that it is just an error but is undefined behavior? - + RGN: Also, some implementations will replace inavlid code units with the replacement code point. And that is one way to solve this problem that I am proposing. 
- + EAO: We could just copy XML: https://www.w3.org/TR/xml/#charsets - -STA: We can go further and also exclude control characters. There was a previous discussion in Fluent with API https://github.com/projectfluent/fluent/issues/182. Fluent decided to be very lenient about it, but maybe RGN is right. - + +STA: We can go further and also exclude control characters. There was a previous discussion in Fluent with API https://github.com/projectfluent/fluent/issues/182. Fluent decided to be very lenient about it, but maybe RGN is right. + RCA: What is the resolution here? - + STA: I'm okay forbidding incomplete surrogate pairs? - -RGN: Could we tackle it piecemeal? First disallow unpaired surrogate code points, and then later consider other code points that are awkward to represent? - + +RGN: Could we tackle it piecemeal? First disallow unpaired surrogate code points, and then later consider other code points that are awkward to represent? + STA: Is the consequence of it that, if we ban surrogate pairs, then UTF-16 encoded strings-- - -RGN: It's not banning surrogate pairs. The point of surrogate code points is to have a range of code points in the BMP that can express code points outside of the BMP when taken together in pairs. So they are a UTF-16 consideration. But that then means that these are the only code points cannot be encoded in UTF-8. I believe that this is the only part of the grammar that is unrestricted. - + +RGN: It's not banning surrogate pairs. The point of surrogate code points is to have a range of code points in the BMP that can express code points outside of the BMP when taken together in pairs. So they are a UTF-16 consideration. But that then means that these are the only code points cannot be encoded in UTF-8. I believe that this is the only part of the grammar that is unrestricted. + RCA: What should we do? - + RGN: I think this PR is needed, so I will do it. 
- + ### [#233](https://github.com/unicode-org/message-format-wg/pull/233) - + EAO: I would be happy to leave this open because ZBI has actively participated in the discussion and isn't here now. - + ### [#209](https://github.com/unicode-org/message-format-wg/pull/209) - + EAO: Let's remove the syntax tag from this. - -## Implementation Status - -EAO: Reviewing the Data Model originated on the JS implementation at https://github.com/tc39/proposal-intl-messageformat - + +## Implementation Status + +EAO: Reviewing the Data Model originated on the JS implementation at https://github.com/tc39/proposal-intl-messageformat + RCA: Can we have a playground to show some live examples of this? - -STA: It is nice to see a working implementation of this. I had a comment about the data model of the parsed output, that is related to how you separated PatternMessage from SelectMessage. I think it would be nice to have that division, and just have to have the SelectMessage. The reason is that it makes it trivial for tooling to make it possible to start adding variants without having to change the kind of message data structure. - + +STA: It is nice to see a working implementation of this. I had a comment about the data model of the parsed output, that is related to how you separated PatternMessage from SelectMessage. I think it would be nice to have that division, and just have to have the SelectMessage. The reason is that it makes it trivial for tooling to make it possible to start adding variants without having to change the kind of message data structure. + The other thing is that I would call them definitions, not declarations. - -EAO: I also have tooling that is public on NPM under the `messageformat` package. It can accept MF1 syntax strings and turn it into a MF2 data model / formatter. There is also a part of the package that can accept Fluent syntax pattern strings and do something similar. 
- -I have not yet published the XLIFF support to NPM yet because it needs revisiting. That is why I tagged these components separately in NPM. Currently, the MF1 and Fluent support is just one-way at the moment. + +EAO: I also have tooling that is public on NPM under the `messageformat` package. It can accept MF1 syntax strings and turn it into a MF2 data model / formatter. There is also a part of the package that can accept Fluent syntax pattern strings and do something similar. + +I have not yet published the XLIFF support to NPM yet because it needs revisiting. That is why I tagged these components separately in NPM. Currently, the MF1 and Fluent support is just one-way at the moment. diff --git a/meetings/2022/notes-2022-08-15.md b/meetings/2022/notes-2022-08-15.md index 0109fab1ff..3a5226b6d6 100644 --- a/meetings/2022/notes-2022-08-15.md +++ b/meetings/2022/notes-2022-08-15.md @@ -1,4 +1,5 @@ ### August 15th, meeting Attendees + - Romulo Cintra (RCA) Igalia - David Filip - XLIFF TC, Huawei (DAF) - Eemeli Aro - Mozilla (EAO) @@ -7,64 +8,59 @@ - Zibi Braniecki - Amazon (ZBI) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ### Agenda - -- Past meeting Actions + +- Past meeting Actions - Implementation Status - Review PR’s & Open Issues -- Message resource WG - - - -## Past meeting Actions -- [ ] Consensus about what we should do next regarding meeting cadence, should we cancel extended meetings ? - +- Message resource WG + +## Past meeting Actions + +- [ ] Consensus about what we should do next regarding meeting cadence, should we cancel extended meetings ? + RCA: Can we revisit whether we need our weekly extended meetings? - + EAO: Let's revisit this question later in the meeting. 
- - + ## Implementation Status - + MIH : I have a design doc and it’s posted for review and it’s how public API should look for ICU, about implementation. I have something working, and I’m using a JSON file provided by EAO to analyze how the JS implementation is working on tests. - + I found some parts that I don’t think we should support due to the fact that they aren't specified. - + As soon as I have a more “stable” implementation I will share the PR. - -EAO: The [JSON test data](https://github.com/messageformat/messageformat/blob/master/packages/mf2-messageformat/src/__fixtures/test-messages.json) is entirely authored by me. It currently has an Apache2 license. If there is a place that I can publish it that is more usable, then I am happy to do it. - + +EAO: The [JSON test data](https://github.com/messageformat/messageformat/blob/master/packages/mf2-messageformat/src/__fixtures/test-messages.json) is entirely authored by me. It currently has an Apache2 license. If there is a place that I can publish it that is more usable, then I am happy to do it. + ECH : Can you put it in the repo? - -EAO : I can add it and modify license accordingly with needs, I can PR against `main` or `develop`, what do you think? - + +EAO : I can add it and modify license accordingly with needs, I can PR against `main` or `develop`, what do you think? + MIH: I would say put it in the `develop` branch along with the working spec. Once the spec is ready, we merge it to `main`. - + EAO: Some parts are specific to my implementation, like testing for error codes, which I don't think is relevant to the common spec. - + MIH: I did some minor modifications. If in your test is an error array and gets an error it just passes, I consider no error. - + EAO: That is correct. - + RCA: What is the status of the implementation work? - + EAO: The JS work is effectively done, and it's available on npm as [messageformat@next](https://www.npmjs.com/package/messageformat/v/next). 
- -There are some details about inconsistencies found but aren’t published yet (how markup it’s handled and how its stringification is black-boxed as we agreed). Tooling is also available to parse ICU MessageFormat 1 and Fluent messages for use by the MessageFormat 2 runtime. There are some updates that I haven't yet released, like the 2-way sync between MF2 and Fluent. Please give it a try. It implements the API in the ECMA-402 `Intl.MessageFormat` proposal for JS that I am championing. - + +There are some details about inconsistencies found but aren’t published yet (how markup it’s handled and how its stringification is black-boxed as we agreed). Tooling is also available to parse ICU MessageFormat 1 and Fluent messages for use by the MessageFormat 2 runtime. There are some updates that I haven't yet released, like the 2-way sync between MF2 and Fluent. Please give it a try. It implements the API in the ECMA-402 `Intl.MessageFormat` proposal for JS that I am championing. + RCA: I think this deserves a blog post to announce the work that you have done. - + ECH: I think it’s really interesting to get more feedback and see how people in general feel. - -MIH: - + +MIH: + ``` let $foo = {$count :number currency=$cur precision=2 rounding=up} let $pers = {$foo :person level=formal} @@ -76,20 +72,21 @@ match {$bar} when foggy {...} when * {...} ``` - -Here is an example of our current working syntax. Something like this is legal, and I saw an instance of this in our JSON test data. However, I think this is confusing for a user, and it is not clear what the expected behavior should be. - + +Here is an example of our current working syntax. Something like this is legal, and I saw an instance of this in our JSON test data. However, I think this is confusing for a user, and it is not clear what the expected behavior should be. 
+ EAO: It might be more helpful to paste the data from JSON to be more direct trying to solve the problem - + MIH: This is not a strawman argument, I have to implement. - + EAO: I don't think anyone is asking you to implement a cast from a person to a date. - + MIH: Is that a cast? - + EAO: There should be an example where there are 2 local variables, and one depends on the other, and one is being used as a selector. - -MIH: + +MIH: + ``` let $foo = {$bar} match {$foo} @@ -97,44 +94,44 @@ when one {one} when * {other} ... ``` - -This is what the syntax allows. I think someone mentioned back in March that local variables should not refer to other local variables. 1) It is unclear what the use case is 2) It makes the implementation messy and opens to recursion that is complex. - + +This is what the syntax allows. I think someone mentioned back in March that local variables should not refer to other local variables. 1) It is unclear what the use case is 2) It makes the implementation messy and opens to recursion that is complex. + I would like to disallow this unless there is a clear use case. - -EAO: Right now we included in the spec that it's not legal to have a dependency loop on local variables. - + +EAO: Right now we included in the spec that it's not legal to have a dependency loop on local variables. + The spec as we currently have it certainly allows things like `$bar` depend on `$foo`, and `$baz` depend on `$bar`. It requires a parser to detect whether we have a cycle in the declaration dependencies. - -For potential use cases… We have a way of saying certain shapes come from a custom fn and we want to use two different fields as local variables, the benefit comes from not having to duplicate code in local variables that depend on the same thing. 
- + +For potential use cases… We have a way of saying certain shapes come from a custom fn and we want to use two different fields as local variables, the benefit comes from not having to duplicate code in local variables that depend on the same thing. + You are not going to find a grammatical use for this that cannot be worked around, but it can greatly simplify the creation of a message where you have a person and you need the person's name and the person's age. - -RCA: Should we be more strict ont he recommendation we make in the spec? We already have a recommendation regarding cycles of dependencies. - + +RCA: Should we be more strict ont he recommendation we make in the spec? We already have a recommendation regarding cycles of dependencies. + MIH: No, what it doesn't cover is messing with types and changing types. So it makes it difficult for the custom function to know what it should return. - -It requires that the local variables that depend on each other have the same functions. Declaring that constraint might help, but if the function options specified for each of the local variables differ even when the functions are the same, is that okay or not? - + +It requires that the local variables that depend on each other have the same functions. Declaring that constraint might help, but if the function options specified for each of the local variables differ even when the functions are the same, is that okay or not? + ``` let $foo = {$amount :currency precision ….} let $bar = {$foo :currency minfractional=2} ``` - + I think that seeing EAO's code would be more helpful to understand. - + EAO: This question is different from the question about local variables depending on each other. If we had a local variable used inline in a formatter statement... I think the answer here is that we define this through the function registry. The registry will define what the inputs and outputs are for a function. 
- -In one of the examples you provide, you had a person object, and you were feeding it into a date. When we have the registry, we can throw an error when types don't match. - -MIH: This pushes the registry beyond what I imagined. This is a registry of formatter functions. The functions in the registry are not general transformational functions. The registry functions should be for formatting only, not free-form functions. If that is not the case, then we have a potentially bigger problem. - + +In one of the examples you provide, you had a person object, and you were feeding it into a date. When we have the registry, we can throw an error when types don't match. + +MIH: This pushes the registry beyond what I imagined. This is a registry of formatter functions. The functions in the registry are not general transformational functions. The registry functions should be for formatting only, not free-form functions. If that is not the case, then we have a potentially bigger problem. + EAO: I would say that, any of things we mention are part of the test suite, are what we should discuss. We shouldn't talk about this question right now because it's not in the test suite. - + MIH: The problem still remains that I can choose to not support this in the current ICU tech preview, but these are valid in the spec. It is not clear what the expected behavior should be. - -The other question I have is whether a selection pattern should allow a selector to be specified without an associated formatting function. For example: - + +The other question I have is whether a selection pattern should allow a selector to be specified without an associated formatting function. 
For example: + ``` match {$bar} when 1 {...} @@ -142,187 +139,183 @@ match {$bar} when male {...} when foggy {...} when * {...} - + match {$bar} when 1 {...} when * {...} ``` - -We have talked previously about plural selection messages needing the plural formatter in order to truly make a correct selection on the formatted number. The formatter matters in order to give the formatted number, and selection cannot correctly occur prior to formatting. - + +We have talked previously about plural selection messages needing the plural formatter in order to truly make a correct selection on the formatted number. The formatter matters in order to give the formatted number, and selection cannot correctly occur prior to formatting. + So I would suggest that we alway require a formatter specified in the selector to avoid such problems. - -ZBI: So, I generally agree with MIH with problem definition. In my mind it’s possible to define a function that can accept different types. My intention was to use metadata to annotate semantic comments to help define the types of message and allow tooling to reason about this. - -A type definition for the $bar would be more important than the definition - - + +ZBI: So, I generally agree with MIH with problem definition. In my mind it’s possible to define a function that can accept different types. My intention was to use metadata to annotate semantic comments to help define the types of message and allow tooling to reason about this. + +A type definition for the $bar would be more important than the definition + EAO: My main point is that what you're effectively presenting there is how the MessageFormat v1 currently works. - -MIH: In MF1 you have to define if `plural` or `select` for a selection message. But that allows you to know what that selector does. `select` means you have a literal match, whereas `plural` means you need to treat the input as a formatted number. I would argue that function it’s more important than the type. 
- -A function is useful because if I say `{foo :gender}` because I go to the registry, take the `:gender` function, and look at the matches of ___. So the type is less important than the function. - + +MIH: In MF1 you have to define if `plural` or `select` for a selection message. But that allows you to know what that selector does. `select` means you have a literal match, whereas `plural` means you need to treat the input as a formatted number. I would argue that function it’s more important than the type. + +A function is useful because if I say `{foo :gender}` because I go to the registry, take the `:gender` function, and look at the matches of \_\_\_. So the type is less important than the function. + ZBI: What you want to do is pass the same the same variable to 2 different functions. - + MIH: That's totally fine, it's fine today. - - + ZBI: Do you want to create a function that takes a union type (say a number or a string), and require the function to negotiate (dispatch) on behavior on the type. - + RCA: I did not see a resolution or action point on the previous question. What do we want to do with that. - + EAO: I don't think there is anything to do, unless MIH wants to open an issue. - + MIH: - + Here is a pattern `{Hello {$foo :fn1} and {$foo :fn2}}` where `fn1(input: String | Number)` and `fn2(input: Number | Boolean)`. - -ZBI: I am trying to reason about what you said before, and I think we should think about this a lot. This is what I was envisioning for Fluent and want us to think about for MessageFormat v2. - + +ZBI: I am trying to reason about what you said before, and I think we should think about this a lot. This is what I was envisioning for Fluent and want us to think about for MessageFormat v2. 
+ ``` // $foo (String) - Name of the person (example: “John”) {Hello {$foo :fn1} and {$foo :fn2}} ``` - -I want the CAT tool to be able to take metadata for `$foo` to show a snippet so that the translator can see information about `$foo` when translating with live feedback. I don't think you can implicitly deduce that from the function signature. - + +I want the CAT tool to be able to take metadata for `$foo` to show a snippet so that the translator can see information about `$foo` when translating with live feedback. I don't think you can implicitly deduce that from the function signature. + MIH: I agree that example is useful, but as a translator, I don't think it matters if a date formatter potentially takes a Date object, a Calendar object, a milliseconds from epoch value, etc. - + ZBI: What was discussion of the previous question? - -MIH: For my implementation, I won't allow a local variable to reference another local variable. I will also require all selectors to be invoked with a function. - -ZBI: How do you test for recursion in the local variable dependencies? - + +MIH: For my implementation, I won't allow a local variable to reference another local variable. I will also require all selectors to be invoked with a function. + +ZBI: How do you test for recursion in the local variable dependencies? + MIH : I don’t, because disallowing the dependencies prevents the problem to begin with. - -EAO : Can you you list things on the `develop` branch (spec) that you might not implement or is underspecified or it’s not useful to implement ? - + +EAO : Can you you list things on the `develop` branch (spec) that you might not implement or is underspecified or it’s not useful to implement ? + MIH: I'm arguing that these features are not useful and error prone, and should therefore be disallowed in the spec. - + EAO: Are you saying that doing so isn't possible? - + ZBI: No, it's possible, it's all code, anything is possible. 
- + MIH: Yes, it's possible, but it's error prone, so I don't think we should entertain them in the spec. - + EAO: MIH, can you file issues for these 2 topics and describe the issues in them? - + MIH: Yes. - + RCA: To clarify, these issues won't block your implementation, right? - -MIH: Yes and no. I should know what the resolution to these issues are to know whether I need to do further work. But I have already said what I am currently doing about them in the meantime in my tech preview implementation. - + +MIH: Yes and no. I should know what the resolution to these issues are to know whether I need to do further work. But I have already said what I am currently doing about them in the meantime in my tech preview implementation. + EAO: So long as we have documentation about these issues, we can proceed. And anyways, we wouldn't have time to wait for a resolution on these issues and finish a tech preview before the deadline. - -ZBI: MIH, I hope you can file an issue to describe what you're doing. Also, you could write tests. In my implementation, I have a parser that has a dirty state in order to address security concerns. We need to test against "billion laughs attack" (probably Billion laughs attack). - + +ZBI: MIH, I hope you can file an issue to describe what you're doing. Also, you could write tests. In my implementation, I have a parser that has a dirty state in order to address security concerns. We need to test against "billion laughs attack" (probably Billion laughs attack). + I'm concerned about the recursion possibility if we allow message references, and maybe we should detect cycles there, too. - -Conclusion: + +Conclusion: - [ ] MIH will file 2 issues for the 2 questions brought up about cycles in local variable dependencies, and requiring functions for selector invocations. -- [ ] MIH will file an issue about potential cycles among message reference dependencies. 
+- [ ] MIH will file an issue about potential cycles among message reference dependencies. - [ ] EAO will file an issue about protecting against the Billion laughs attack, which includes mention of whether limits on the return string, etc. are needed by implementations or not, etc. - - + ## Issue [#268](https://github.com/unicode-org/message-format-wg/issues/268) - + MIH: I am okay with the related PR #290. - + EAO: Does anyone have concerns about the pull request, or can we merge now? - + MIH: I am fine. - + EAO: When we get the +1 from MIH, then I will merge the PR. - -RGN: Everything falls into different categories. We have ASCII control characters, we have the Unicode non-characters. That is the only set I would push for to disallow. It would be unusual to see them in strings. When you do a sampling of them across languages, it seems difficult to include them in strings. - + +RGN: Everything falls into different categories. We have ASCII control characters, we have the Unicode non-characters. That is the only set I would push for to disallow. It would be unusual to see them in strings. When you do a sampling of them across languages, it seems difficult to include them in strings. + EAO: I think 2 things are happening here. By the time that we see a string, it has already come from a resource format like JSON that would already have had an escaping method to represent such characters. Let's say you have a null character representation in JSON, then it comes to MessageFormat as a null character code point. - -RGN: I think that would be misleading because it will look like a nonprinting character, and so - -MIH: I disagree that such strings necessarily need to represent visually non-printing characters in order to represent them. I think it is not the job of MessageFormat to validate the strings. This is something that should be handled at the edges of the system, but not in every step in between. 
-Example of API that takes a string with the 0 character: the lpstrFilter in + +RGN: I think that would be misleading because it will look like a nonprinting character, and so + +MIH: I disagree that such strings necessarily need to represent visually non-printing characters in order to represent them. I think it is not the job of MessageFormat to validate the strings. This is something that should be handled at the edges of the system, but not in every step in between. +Example of API that takes a string with the 0 character: the lpstrFilter in https://docs.microsoft.com/en-us/windows/win32/api/commdlg/ns-commdlg-openfilenamea - + ECH: Please look at information from the Unicode book (core spec) https://github.com/unicode-org/message-format-wg/issues/268#issuecomment-1212540949 - -I think it touches upon a lot of aspects of our previous discussions on the topic. Initially, I wasn't sure how to understand the topic, and my instinct was to prevent MessageFormat from having to enforce a position on the principle of keeping separate concerns from getting intertwined, in order to avoid complexity. But I feel like the points and practical points about the frequency of such uncommon characters and the efficiency concerns if multiple layers of the application stack had to repeat string validity checks, plus ICU's approach which is a garbage in, garbage out approach for efficiency reasons and for not taking a position on user intention, all makes me more confident in my original inclination - -RGN: It still is in the control of this group to decide what is allowed. After PR #290, that set of allowed characters will be slightly smaller. But we still are allowing characters that are control characters and defined to never be a character. That is a valid stance, however, other formats disallow it. 
- -EAO: Maybe we have a different perspective here because messages will exist in resource formats and representations, and they will - + +I think it touches upon a lot of aspects of our previous discussions on the topic. Initially, I wasn't sure how to understand the topic, and my instinct was to prevent MessageFormat from having to enforce a position on the principle of keeping separate concerns from getting intertwined, in order to avoid complexity. But I feel like the points and practical points about the frequency of such uncommon characters and the efficiency concerns if multiple layers of the application stack had to repeat string validity checks, plus ICU's approach which is a garbage in, garbage out approach for efficiency reasons and for not taking a position on user intention, all makes me more confident in my original inclination + +RGN: It still is in the control of this group to decide what is allowed. After PR #290, that set of allowed characters will be slightly smaller. But we still are allowing characters that are control characters and defined to never be a character. That is a valid stance, however, other formats disallow it. + +EAO: Maybe we have a different perspective here because messages will exist in resource formats and representations, and they will + MIH: The MessageFormat API doesn't care about how messages are stored and serialized. If someone wants to store messages in JSON, I can do so, including escaping characters. But it is a concern in between the storage and MessageFormat, but it is not the job of MessageFormat to enforce it. - + RGN: I think JSON is a good analogy. JSON is any sequence of UTF-16 code units, but you can't represent them raw. Because the textual format of those characters for interchange. MessageFormat, as described, is similar. - -EAO: I think the difference is that MessageFormat ___. - -RGN: If that is the case, then there is no reason to define that MessageFormat is not a textual representation at all. 
Just say that a string is just a sequence of code points, and that absolves you of the concern at all. That description is what I'm hearing from MIH. - -MIH: What I am saying is that different levels have different concerns. For example, - + +EAO: I think the difference is that MessageFormat \_\_\_. + +RGN: If that is the case, then there is no reason to define that MessageFormat is not a textual representation at all. Just say that a string is just a sequence of code points, and that absolves you of the concern at all. That description is what I'm hearing from MIH. + +MIH: What I am saying is that different levels have different concerns. For example, + http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#cp “Represents a Unicode character that is invalid in XML” - -DAF: BTW, this method is taken from Unicode LDML - + +DAF: BTW, this method is taken from Unicode LDML + RGN: What I'm referring to is the textual representation of the message, not what can or cannot be handled in the in-memory structure by the MessageFormat runtime. - -ECH: Let's understand what RGN is saying. So the problem that you're talking about is the textual form of how a person may author a message, say in some source code. And because this textual form will get copy-and-pasted around, it should avoid the common pitfalls of interchange between systems by being explicit about that serialized text form for. - + +ECH: Let's understand what RGN is saying. So the problem that you're talking about is the textual form of how a person may author a message, say in some source code. And because this textual form will get copy-and-pasted around, it should avoid the common pitfalls of interchange between systems by being explicit about that serialized text form for. + RGN: Yes. - -ECH: Okay, so I do see the concern. 
It still does seem like a concern of how a message gets serialized, which we have no control over -- how control character may be escaped is as much of an implementation detail as whether the serialization is UTF-8 or UTF-16 encoded, or whether the container format is JSON, XML, Fluent, or something else. So it's a separate concern, and so I wouldn't go so far as to say that MessageFormat should enforce anything at the serialization level. But I think it would be okay for the spec to point out the problem and give a strong recommendation to implementers and users to be mindful of this problem. - -EAO: In the example of someone emailing a message to someone else, the key here is that every one of these container formats, there is their own version of escape characters in that container format of special characters. So it may be that you are not ever really able to copy paste messages without being aware of the container format. - -RGN: I don't share that confidence. People will copy paste messages, from console or source control. I think that if you say that MessageFormat is an object, then you will see people copy-paste messages and that will lead to the propagation of control characters. - -MIH: I think these are separate concerns. If I, as the MessageFormat runtime, see a control character, then I won't necessarily know how it was originally serialized on disk. Nor would I need to know. - + +ECH: Okay, so I do see the concern. It still does seem like a concern of how a message gets serialized, which we have no control over -- how control character may be escaped is as much of an implementation detail as whether the serialization is UTF-8 or UTF-16 encoded, or whether the container format is JSON, XML, Fluent, or something else. So it's a separate concern, and so I wouldn't go so far as to say that MessageFormat should enforce anything at the serialization level. 
But I think it would be okay for the spec to point out the problem and give a strong recommendation to implementers and users to be mindful of this problem. + +EAO: In the example of someone emailing a message to someone else, the key here is that every one of these container formats, there is their own version of escape characters in that container format of special characters. So it may be that you are not ever really able to copy paste messages without being aware of the container format. + +RGN: I don't share that confidence. People will copy paste messages, from console or source control. I think that if you say that MessageFormat is an object, then you will see people copy-paste messages and that will lead to the propagation of control characters. + +MIH: I think these are separate concerns. If I, as the MessageFormat runtime, see a control character, then I won't necessarily know how it was originally serialized on disk. Nor would I need to know. + Java code: + ``` String msg = "{Hello \u0000 world}”; mf2 = MessageFormat2.parse(msg); ``` - -ZBI: I'm slightly concerned about the assumptions that you are making here, MIH. For version control systems, it is important to have a round trip that you can serialize and parse. With Fluent, we had a problem with lossiness where we used the replacement character. Part of the solution was to escape whitespace and to remember whether a character was escaped or unescaped. - -EAO: I think a slightly different level of approach. This is a good introduction to having a working group for having a resource for messages. So that we can have a canonical resource specification, and then we can describe there how to serialize messages, including how to escape characters. - + +ZBI: I'm slightly concerned about the assumptions that you are making here, MIH. For version control systems, it is important to have a round trip that you can serialize and parse. 
With Fluent, we had a problem with lossiness where we used the replacement character. Part of the solution was to escape whitespace and to remember whether a character was escaped or unescaped. + +EAO: I think a slightly different level of approach. This is a good introduction to having a working group for having a resource for messages. So that we can have a canonical resource specification, and then we can describe there how to serialize messages, including how to escape characters. + ZBI: I'm not sure I agree with the idea of moving the discussion to a resource format, which is not addressing the issues. Then we allow any sort of system to give us a stream of bytes, and we open the possibility of non-round tripping of messages. - + EAO: I think that is an implementation detail. - + ECH: I agree that it is an implementation detail. And the argument of having non-resource format based systems likes DBs, etc. supplying messages is actually all the more reason for not enforcing this restriction in MessageFormat itself, and keep it in the serialization format, in order to not intertwine separate concerns. - + RGN: But factually, there is a serialization format, because that is the syntax spec that we have in the `develop` branch. And if we want to be permissive of problematic characters, then more work needs to be done. - + MIH: I would say that the syntax is not a serialization formation. - -RGN: It literally is. We have an EBNF description. Sure, we can describe it as a data structure, but we have a syntax for the textual form. - -MIH: It is a separation of concerns. In my implementation, I parse into the data structure, and the syntax represents what is in memory logically, but that has no bearing on the serialized form. - -EAO: For MessageFormat, we have to solve it for the general case. 
I recognize, RGN, that this is not exhaustive of all of the use cases, such as directly getting messages from an external system, or console logging messages, and if we copy and paste, the result will be misleading. If we can describe a canonical message resource message syntax, then we can solve the problem of interchange through escaping in a consistent way. -ECH: I want to observe that I think we are talking about 2 different things, and that we all seem to agree on one of them. We are talking about the syntax of what allowed in the in-memory representation, and then there is the syntax the serialized form of messages. From what I've just heard, I think we are all in agreement that the in-memory form does not need to place any restrictions on the code points of the Unicode string coming in as input to the API. +RGN: It literally is. We have an EBNF description. Sure, we can describe it as a data structure, but we have a syntax for the textual form. + +MIH: It is a separation of concerns. In my implementation, I parse into the data structure, and the syntax represents what is in memory logically, but that has no bearing on the serialized form. + +EAO: For MessageFormat, we have to solve it for the general case. I recognize, RGN, that this is not exhaustive of all of the use cases, such as directly getting messages from an external system, or console logging messages, and if we copy and paste, the result will be misleading. If we can describe a canonical message resource message syntax, then we can solve the problem of interchange through escaping in a consistent way. + +ECH: I want to observe that I think we are talking about 2 different things, and that we all seem to agree on one of them. We are talking about the syntax of what allowed in the in-memory representation, and then there is the syntax the serialized form of messages. 
From what I've just heard, I think we are all in agreement that the in-memory form does not need to place any restrictions on the code points of the Unicode string coming in as input to the API. However, we need a separate syntax to cover the concept of what is allowed for a serialized string, which we are also referring to, and we need to tease these 2 separate concepts apart. RGN: The syntax we have worked on is in fact the syntax of those serialized forms. -ECH: I want to make another observation, which is that I think we are looking at the syntax, as we have in the `develop` branch, and we are seeing the same thing and interpreting it differently. So I think we're talking past each other. +ECH: I want to make another observation, which is that I think we are looking at the syntax, as we have in the `develop` branch, and we are seeing the same thing and interpreting it differently. So I think we're talking past each other. So the question I think we all don't share the same answer to is: What does the syntax represent -- the string to be parsed into the API that directly determines the in-memory representation, or is it the serialized form of the message that a user might author? If we continue this discussion, I think this would be a good starting point so that we can avoid talking past each other and instead discuss from a common understanding. 
- - + ## Review PR's & Open Issues - -## Message resource WG - + +## Message resource WG diff --git a/meetings/2022/notes-2022-08-22.md b/meetings/2022/notes-2022-08-22.md index a198bd48a8..95d66a4dda 100644 --- a/meetings/2022/notes-2022-08-22.md +++ b/meetings/2022/notes-2022-08-22.md @@ -1,4 +1,5 @@ ### August 22nd, meeting Attendees + - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) - Zibi Braniecki - Mozilla (ZBI) @@ -6,150 +7,144 @@ - Staś Małolepszy - Google (STA) - Elango Cheran - Google (ECH) - - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ### Agenda - + - PR [#294](https://github.com/unicode-org/message-format-wg/pull/294) Explain the liberal Text - Issue [#292](https://github.com/unicode-org/message-format-wg/issues/292) Spec clarification: resolving the type when chaining local variables -- Issue [#293](https://github.com/unicode-org/message-format-wg/issues/293 ) Spec clarification: selection without a function - - +- Issue [#293](https://github.com/unicode-org/message-format-wg/issues/293) Spec clarification: selection without a function + ## PR [#294](https://github.com/unicode-org/message-format-wg/pull/294): Explain the liberal Text - -RGN: We had discussion on this topic last week, and there was followup discussion. I was convinced to come around with what seems like the majority opinion, which is to allow anything that is valid in a Unicode format as valid. This is basically in line with what the majority opinion is. However, I included verbiage to advise users about intended usage because unusual characters could appear unescaped. - + +RGN: We had discussion on this topic last week, and there was followup discussion. I was convinced to come around with what seems like the majority opinion, which is to allow anything that is valid in a Unicode format as valid. 
This is basically in line with what the majority opinion is. However, I included verbiage to advise users about intended usage because unusual characters could appear unescaped. + EAO: Is anyone opposed? - + STA: I took a look, it looked good to me. - + MIH: It sounds good to me. My opinion is that the unusual characters are not that unusual, which was my only reason for objection. - + STA: Have we ever considered picking an unusual escape character? - + MIH: We had a discussion at some point, I don’t remember who was present, and we discussed whether it would be okay to use a character outside of ASCII. And I think MWS strongly advised against this from some prior experience. - + STA: Okay, just curious. I didn’t want to derail this discussion. - + EAO: As for me, I don’t think there is another candidate for escaping other than backslash. - + STA: Yes, and there is no reason to change things now. - + ## Issue [#292](https://github.com/unicode-org/message-format-wg/issues/292) Spec clarification: resolving the type when chaining local variables - - - -## Issue [#293](https://github.com/unicode-org/message-format-wg/issues/293 ) Spec clarification: selection without a function - + +## Issue [#293](https://github.com/unicode-org/message-format-wg/issues/293) Spec clarification: selection without a function + EAO: Summary: the issue proposes to disallow having selectors declared in a selection message that doesn’t specify its formatting function. - + MIH: Actually, ZBI is agreement with the proposal in the issue. - + EAO: One aspect that wasn’t clear to me is whether we’re talking about a limitation in the resulting data structure of the message or a limitation imposed on the syntax. - -MIH: I would say that we shouldn’t allow it. When I parse the message, I should expect to get an error there. If we enforce this in the syntax / data structures, it would make parsing messier. I would be fine just throwing an error in the parser. 
One reason is because of ZBI’s point – we can always add it later, which would be a backwards compatible change. - + +MIH: I would say that we shouldn’t allow it. When I parse the message, I should expect to get an error there. If we enforce this in the syntax / data structures, it would make parsing messier. I would be fine just throwing an error in the parser. One reason is because of ZBI’s point – we can always add it later, which would be a backwards compatible change. + STA: My initial reaction is in line with ZBIs, but I am still thinking about it. - + EAO: Making this change would make it difficult if not impossible to convert Fluent into MessageFormat v2. Fluent has a structure in which a selector is implicitly typed, and it would make existing Fluent impossible to convert to MFv2 syntax in an automated manner. - -MIH: I would argue that the same problems of going from Fluent to MFv2 … because of that migration, we would be willing to pay the price of living with errors in lint. - + +MIH: I would argue that the same problems of going from Fluent to MFv2 … because of that migration, we would be willing to pay the price of living with errors in lint. + ECH: 1. Being consistent means requiring the formatting function for a selector, as we know we have to do for plurals 2. How are Fluent messages not automatically upgradable to MFv2? - -STA: What are the reasons for forbidding this bare style of selectors? How is that different from linting inside placeholders? - -MIH: In the issue itself, I explain the disadvantages. One is that if I see `match {$foo} * {...}`, I don’t know as a translator what type of case match values that correspond to the selector. Second is I don’t know how to lint such messages. Third, is in Fluent, if I have a plural (number) match, and the input numerical value is 1, I don’t know to distinguish between `“1”` and `”ONE”`. It creates a risky runtime feature. - -STA: If we have a risky runtime, then the runtime is broken. 
For the problem in placeholders, we expect that everything should be stringifiable. For the first issue, I can see the problem there. I thought we were working on the assumption that most tools don’t let translators touch messages. - -EAO: ECH, in Fluent, you can have different variants in a selection message. The syntax of Fluent doesn’t require an annotation for what type of value selection should be performed on. Only at runtime is that type is known. At runtime, we wrap values as a NumberValue for plural selects. The actual behavior ends up depending on the runtime value. In Fluent, it’s not required to expect the input value be of a certain type, so we don’t know if the input is the number 1 or if it’s the string `”ONE”`. - -ECH: I see the problem, but it sounds like a Fluent problem. We want to have a clear contract between the inputs of a formatter function and the output, and it relates to what I said earlier, which is that we select on formatted value. - -EAO: This is related to whether a user has the ability to pass options to a formatter. What is the value of a value in a match statement. I think we should support partially formatted values in a select statement, and allow partially formatted values equal to other values. It is hard for me to see how that would work if we always require functions for selectors. - -MIH: I want to refer back to a point from STA. He asked how is this (selectors) different from placeholders? Most of the times, placeholders can be black boxes and they “just work” for formatting. Selectors are not black boxes. The tools won’t know what type of value the selector represents. The idea that EAO was describing, that a formatted value would have methods on how to format itself, is - + +STA: What are the reasons for forbidding this bare style of selectors? How is that different from linting inside placeholders? + +MIH: In the issue itself, I explain the disadvantages. 
One is that if I see `match {$foo} * {...}`, I don’t know as a translator what type of case match values that correspond to the selector. Second is I don’t know how to lint such messages. Third, is in Fluent, if I have a plural (number) match, and the input numerical value is 1, I don’t know to distinguish between `“1”` and `”ONE”`. It creates a risky runtime feature. + +STA: If we have a risky runtime, then the runtime is broken. For the problem in placeholders, we expect that everything should be stringifiable. For the first issue, I can see the problem there. I thought we were working on the assumption that most tools don’t let translators touch messages. + +EAO: ECH, in Fluent, you can have different variants in a selection message. The syntax of Fluent doesn’t require an annotation for what type of value selection should be performed on. Only at runtime is that type is known. At runtime, we wrap values as a NumberValue for plural selects. The actual behavior ends up depending on the runtime value. In Fluent, it’s not required to expect the input value be of a certain type, so we don’t know if the input is the number 1 or if it’s the string `”ONE”`. + +ECH: I see the problem, but it sounds like a Fluent problem. We want to have a clear contract between the inputs of a formatter function and the output, and it relates to what I said earlier, which is that we select on formatted value. + +EAO: This is related to whether a user has the ability to pass options to a formatter. What is the value of a value in a match statement. I think we should support partially formatted values in a select statement, and allow partially formatted values equal to other values. It is hard for me to see how that would work if we always require functions for selectors. + +MIH: I want to refer back to a point from STA. He asked how is this (selectors) different from placeholders? Most of the times, placeholders can be black boxes and they “just work” for formatting. 
Selectors are not black boxes. The tools won’t know what type of value the selector represents. The idea that EAO was describing, that a formatted value would have methods on how to format itself, is + The argument of not supporting a one-way upgrade from Fluent messages to MFv2 is not a strong reason, and even by that measure, MessageFormat v1 is more established and widely used of a format so that should be more of a concern than Fluent for upgradeability, and we already acknowledged that it is a non-goal. - -STA: I acknowledge that we don’t want to put formatting functions in placeholders, that would be wasteful.. I think there is a point about consistency to be made. We all agree that we want to skip formatting functions in placeholders. So if that is the case, then how is this case of selectors consistent with that of placeholders? - -EAO: I think there are 3 layers of type detection for linters, formatters, etc. If we always have a function attached to a selector, then we don’t need to lookup what the type is. The second lookup is, say if we’re matching `match {$foo} * {...}`, then we could have `$foo` set by a local variable with a well-defined type. The third level is to look into the semantic comments in the message that declares that `$foo` is a number, which allows a linter / tooling to have access to that info. My sense is that the place this becomes questionable, where the arguments are valid, are the questions about tooling and validation. We are likely to end up with more than one solution. So maybe we end up with a selector without a function being a linting problem rather than a syntax problem. - + +STA: I acknowledge that we don’t want to put formatting functions in placeholders, that would be wasteful.. I think there is a point about consistency to be made. We all agree that we want to skip formatting functions in placeholders. So if that is the case, then how is this case of selectors consistent with that of placeholders? 
+ +EAO: I think there are 3 layers of type detection for linters, formatters, etc. If we always have a function attached to a selector, then we don’t need to lookup what the type is. The second lookup is, say if we’re matching `match {$foo} * {...}`, then we could have `$foo` set by a local variable with a well-defined type. The third level is to look into the semantic comments in the message that declares that `$foo` is a number, which allows a linter / tooling to have access to that info. My sense is that the place this becomes questionable, where the arguments are valid, are the questions about tooling and validation. We are likely to end up with more than one solution. So maybe we end up with a selector without a function being a linting problem rather than a syntax problem. + ECH: To STA’s point, I think we do indeed want to have formatting functions specified for placeholders, and correct me if I’m wrong, but we allow placeholders without functions with the assumption that the implicit function is a “.toString()”, and we leave it to the implementations to decide. - -MIH: Some of the points that EAO raised are good. For example, in my implementation, whenever I select for plurals, I always make it a separate type. Selecting on a number is not enough – plurals are a different animal. If all I have is a numerical value, is the plural supposed to be ordinal or cardinal?, etc. - + +MIH: Some of the points that EAO raised are good. For example, in my implementation, whenever I select for plurals, I always make it a separate type. Selecting on a number is not enough – plurals are a different animal. If all I have is a numerical value, is the plural supposed to be ordinal or cardinal?, etc. + What exactly are saving by not requiring to have a formatter function for the selector? What is the benefit, exactly? - -STA: The argument about the local variable is a good one. Say you define it at the top of the message, and then match on it. 
Maybe it’s a matter of convenience over semantics. I don’t think we don’t want to enforce formatting functions in placeholders everywhere, which is why we have local variables. That’s a bigger quality of life improvement. So maybe it would be a little bit inconsistent, but I think it would be good to say that if we use a local variable as a selector, then the definition of the local variable must include a formatting function. - -EAO: One possible expansion direction if we consider partially formatted values as first class citizens – can you use a partially formatted value as a selector, provided that selection functionality is provided for it? It would be good to be able to provide a value to a formatting function that specifies the range of output values. Could it be possible to use the metadata to specify that information to help tooling and humans understand how it works? Is it possible to allow implementations to decide the formatting for selectors without functions? - -MIH: Regarding metadata, you can theoretically specify all the information for output values, but why? We would have all that information already in the registry. We shouldn’t have to specify that again in metadata of the messages. - -Regarding STA’s comments on local variables, yes, local variables are convenient to avoid repeating formatting options, etc, in placeholders. But the benefit of reuse when it comes to selectors is minimal – there is only one instance in a message of a selector declaration, so the reuse argument goes away. - -When it comes to supporting partially formatted values, we should be supporting new value types by adding a new function that supports that type. So to say that they are not a first class ci - -ECH: Yes, I didn’t understand how partially formatted values are different from any other type, so what I wanted to say is the same as how MIH expects things to work. So I am not sure what “first-class citizen” means for a type, but I am wary. 
The point of having these interfaces for formatting and selection functions and a registry to put your implementations is the whole point of allowing you to bring your own types and implementations. - + +STA: The argument about the local variable is a good one. Say you define it at the top of the message, and then match on it. Maybe it’s a matter of convenience over semantics. I don’t think we don’t want to enforce formatting functions in placeholders everywhere, which is why we have local variables. That’s a bigger quality of life improvement. So maybe it would be a little bit inconsistent, but I think it would be good to say that if we use a local variable as a selector, then the definition of the local variable must include a formatting function. + +EAO: One possible expansion direction if we consider partially formatted values as first class citizens – can you use a partially formatted value as a selector, provided that selection functionality is provided for it? It would be good to be able to provide a value to a formatting function that specifies the range of output values. Could it be possible to use the metadata to specify that information to help tooling and humans understand how it works? Is it possible to allow implementations to decide the formatting for selectors without functions? + +MIH: Regarding metadata, you can theoretically specify all the information for output values, but why? We would have all that information already in the registry. We shouldn’t have to specify that again in metadata of the messages. + +Regarding STA’s comments on local variables, yes, local variables are convenient to avoid repeating formatting options, etc, in placeholders. But the benefit of reuse when it comes to selectors is minimal – there is only one instance in a message of a selector declaration, so the reuse argument goes away. 
+ +When it comes to supporting partially formatted values, we should be supporting new value types by adding a new function that supports that type. So to say that they are not a first class ci + +ECH: Yes, I didn’t understand how partially formatted values are different from any other type, so what I wanted to say is the same as how MIH expects things to work. So I am not sure what “first-class citizen” means for a type, but I am wary. The point of having these interfaces for formatting and selection functions and a registry to put your implementations is the whole point of allowing you to bring your own types and implementations. + EAO: The question of whether selectors should be allowed without a function is an implementation decision, as we decided before. - + MIH: I don’t think we decided on this before. - + STA: I think it needs to be implementation specific. - + MIH: Whether or not you require a function for a selector is orthogonal to whatever your set of formatting functions in your registry happens to be. - + I haven’t heard a use case of where it is beneficial to use a selector without a formatting function. - -EAO: Same reason as the `#` symbol in ICU MessageFormat. It automatically formats the number and selects the plural category. - -MIH: But ICU MessageFormat doesn’t do selection without a function. Secondly, there is only place in a message where a selector is declared, so no reuse occurs. - + +EAO: Same reason as the `#` symbol in ICU MessageFormat. It automatically formats the number and selects the plural category. + +MIH: But ICU MessageFormat doesn’t do selection without a function. Secondly, there is only place in a message where a selector is declared, so no reuse occurs. + STA: The key here is whether selectors and placeholders are the same – that is the crux of the argument. - + EAO: Or whether selectors and placeholders can be treated the same. 
- + MIH: I still haven’t heard a use case where it is beneficial to treat selectors and placeholders the same, beyond the general idea of flexibility, but nothing concrete. - + EAO: We discussed in the CLDR-TC+ICU-TC meetings that local variables are useful for matching. - + MIH: No, local variables are useful for reusing in formatting, but not for selecting. - -STA: Right, I think EAO proposed a few months ago to make it possible for selection to be done using local variables. So I think that’s the crux of the argument. - + +STA: Right, I think EAO proposed a few months ago to make it possible for selection to be done using local variables. So I think that’s the crux of the argument. + MIH: But selection functions and formatting functions have different interfaces, so they are different. - + STA: That is one way of implementing it, but I am trying to offer an alternative for another way of implementing it. - -MIH: Selection and formatting are 2 different operations. Each of the 2 distinct operations warrants its own interface. If you want a user-defined type to implement both interfaces, that’s fine. But those 2 concepts need to be kept distinct. - -ZBI: I feel the mental model that MIH is coming in with is significantly different from the mental model with which Fluent was built. I wonder if this will let us implement our implementations, and let it decide which one is canonical. I am not sure that we cannot derive that decision just now. - -STA: I wanted to suggest a resolution. Perhaps this is different from what ZBI is saying. ZBI wants to defer a decision until we had an implementation. But rather, because EAO wants a strict superset of what MIH wants, we can go with the common intersection subset, which is to always require the narrower constraints – always require a formatting function for selectors. 
Even though it may be inconsistent with the rules with placeholders, I think there are good reasons for the rules for placeholders and selectors, so it makes sense. - -EAO: ECH, to your question about the question of what “first-class citizen” means about types in relation to partially formatted values, it is the wrong time to make a decision on this. We don’t want to deviate from our spec too much at this point. - -STA: ___ would serve as an archive of constraints - -MIH: I agree that this decision would change any of the current implementations – it is too late given our time deadlines. I opened these issues in order for the group to decide on this for the future. - -If the group decides that this is an implementation decision, then I would be fine. But I would still have to argue very strongly within Google that the linter would have to do a lot of work to be very strict to prevent this clear pitfall. Imagine that you put it into ECMA-402 for JS Intl, if you provide values of a type unknown to the JS Intl MesageFormatv2 implementation, and the type is unknown to the default implementation, you will get unexpected behavior unless you do both of: specifying the formatter type, and add to the registry an appropriate function for that type. - + +MIH: Selection and formatting are 2 different operations. Each of the 2 distinct operations warrants its own interface. If you want a user-defined type to implement both interfaces, that’s fine. But those 2 concepts need to be kept distinct. + +ZBI: I feel the mental model that MIH is coming in with is significantly different from the mental model with which Fluent was built. I wonder if this will let us implement our implementations, and let it decide which one is canonical. I am not sure that we cannot derive that decision just now. + +STA: I wanted to suggest a resolution. Perhaps this is different from what ZBI is saying. ZBI wants to defer a decision until we had an implementation. 
But rather, because EAO wants a strict superset of what MIH wants, we can go with the common intersection subset, which is to always require the narrower constraints – always require a formatting function for selectors. Even though it may be inconsistent with the rules with placeholders, I think there are good reasons for the rules for placeholders and selectors, so it makes sense. + +EAO: ECH, to your question about the question of what “first-class citizen” means about types in relation to partially formatted values, it is the wrong time to make a decision on this. We don’t want to deviate from our spec too much at this point. + +STA: \_\_\_ would serve as an archive of constraints + +MIH: I agree that this decision would change any of the current implementations – it is too late given our time deadlines. I opened these issues in order for the group to decide on this for the future. + +If the group decides that this is an implementation decision, then I would be fine. But I would still have to argue very strongly within Google that the linter would have to do a lot of work to be very strict to prevent this clear pitfall. Imagine that you put it into ECMA-402 for JS Intl, if you provide values of a type unknown to the JS Intl MesageFormatv2 implementation, and the type is unknown to the default implementation, you will get unexpected behavior unless you do both of: specifying the formatter type, and add to the registry an appropriate function for that type. + MIH: https://projectfluent.org/play/ + ``` message = {$foo -> @@ -158,34 +153,37 @@ message = *[other] {$foo} messages } ``` + In the Config tab: + ``` { "foo": "1" } ``` + Result: `"1 messages"` - -ECH: Given that we're discussing this issue for the future, I don't understand the distinction that ZBI was drawing between the mental models of what MIH is saying and what EAO is saying. 
What MIH says makes sense -- we know that selection depends on the formatted value, so we should make formatting function required on the selector to ensure we get a formated value. I still haven't heard a reason for why we should allow omitting the formatter function on a selector, but I like STA's way of framing things -- MIH's constraints allow a smaller strict subset of EAO's constraints, so we can go with MIH's constraints and require functions on selectors, and have a backwards-compatible evolution to the API in the future if we want to relax constraints. - -STA: I realize that we’ve have different threads of discussion. But maybe MIH and I can meet to discuss how we can implement the extensibility in both cases. - + +ECH: Given that we're discussing this issue for the future, I don't understand the distinction that ZBI was drawing between the mental models of what MIH is saying and what EAO is saying. What MIH says makes sense -- we know that selection depends on the formatted value, so we should make formatting function required on the selector to ensure we get a formated value. I still haven't heard a reason for why we should allow omitting the formatter function on a selector, but I like STA's way of framing things -- MIH's constraints allow a smaller strict subset of EAO's constraints, so we can go with MIH's constraints and require functions on selectors, and have a backwards-compatible evolution to the API in the future if we want to relax constraints. + +STA: I realize that we’ve have different threads of discussion. But maybe MIH and I can meet to discuss how we can implement the extensibility in both cases. + EAO: MIH, can we have what you're describing in the form a user story filed as an issue? - -MIH: I’m talking about restricting selectors to always have the functions specified. So the use case should be to describe why the restriction should be relaxed. 
- -STA: Not about the selectors requiring functions, but rather, it’s the difference between 2 implementations – one that has the objects doing formatting versus one implementation having functions to do the formatting. - -MIH: The functional style allows you to support additional user-defined types that are not already known to the implementation. I was just trying to describe how a system could be implemented to achieve the flexibility to support user-defined value types that we expect from MFv2. - + +MIH: I’m talking about restricting selectors to always have the functions specified. So the use case should be to describe why the restriction should be relaxed. + +STA: Not about the selectors requiring functions, but rather, it’s the difference between 2 implementations – one that has the objects doing formatting versus one implementation having functions to do the formatting. + +MIH: The functional style allows you to support additional user-defined types that are not already known to the implementation. I was just trying to describe how a system could be implemented to achieve the flexibility to support user-defined value types that we expect from MFv2. + STA: But the functional style restricts the types you can support. - -ECH: I was following what MIH was saying, but I am confused by the last statement from STA. Please add me to the meetings that you two have, STA and MIH. - + +ECH: I was following what MIH was saying, but I am confused by the last statement from STA. Please add me to the meetings that you two have, STA and MIH. 
+ MIH: Email to the ICU / CLDR Design list - + > Dear ICU team & users, -> +> > I would like to propose the following API for: ICU 72 > > Please provide feedback by: next Tuesday, 2022-08-26 @@ -193,22 +191,20 @@ MIH: Email to the ICU / CLDR Design list > Designated API reviewer: Markus > > Ticket: [ICU-22124](https://unicode-org.atlassian.net/browse/ICU-22124) -> +> > This is for the Tech Preview of the MessageFormat v2 functionality that the Message Format Working Group (MFWG) has been working on for close to three years. -> +> > Right now, this is Java-only, and all of the APIs are marked @internal: -> +> > ``` > * @internal ICU 72 technology preview > * @deprecated This API is for technology preview only. > ``` -> +> > For background on the proposal, APIs, and design, please see the original proposal document: -["ICU4J: MessageFormat2 APIs and Design"](https://docs.google.com/document/d/1NkHwFRWV9MiQROq4VVBh21tRMrP2DxpODg3fjHh0-Jw/) -> +> ["ICU4J: MessageFormat2 APIs and Design"](https://docs.google.com/document/d/1NkHwFRWV9MiQROq4VVBh21tRMrP2DxpODg3fjHh0-Jw/) +> > This is entirely new functionality in entirely new source files and package; no existing source files are affected. -> +> > Best regards, > Mihai - - \ No newline at end of file diff --git a/meetings/2022/notes-2022-09-19.md b/meetings/2022/notes-2022-09-19.md index 947258396c..fcc21ac6a9 100644 --- a/meetings/2022/notes-2022-09-19.md +++ b/meetings/2022/notes-2022-09-19.md @@ -1,4 +1,5 @@ ### September 19th, meeting Attendees + - Romulo Cintra (RCA) Igalia - Eemeli Aro - Mozilla (EAO) - Elango Cheran - Google (ECH) @@ -6,96 +7,93 @@ - Richard Gibson - OpenJSF (RGN) - Staś Małolepszy - Google (STA) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ### Agenda - + Intro to Message resource WG. 
-Registry Definition Language +Registry Definition Language Tech Preview Status and feedback Test Suite - -## Past meeting Actions - + +## Past meeting Actions + ## Implementation Status - + ## Review PR's & Open Issues - + ## Registry Definition Language - + EAO: STA, is this something that you could come up with based on your previous work? - + STA: What I worked on in XML might be interesting for MIH and ECH to take a look at. - + MIH: We have something in our EM proposal, and I think what we had looks pretty similar. - + STA: Yes. The version you created was JSON-based, and I only chose XML because it allowed me to think in terms of an XML schema. - -I care more about the data that is defined. One thing that I explored was having different signatures for functions, and one of the signatures could be called based on the input provided. - + +I care more about the data that is defined. One thing that I explored was having different signatures for functions, and one of the signatures could be called based on the input provided. + How do we talk about this in a structured manner? - + EAO: Can you review what you have and come back in a couple of weeks with a presentation. - + STA: I can do that, although I need to consider the work from the other proposals. - + MIH: We still need to resolve quite a few issues regarding the MessageFormat itself, like markdown, naming, etc. - -STA: What I am hearing is that the registry work is not top priority right now because there is ongoing implementation work. But I can still do a comparison of the proposals for a registry and have that ready in a month. We can either take up the issue in a month, or later if we need to. - + +STA: What I am hearing is that the registry work is not top priority right now because there is ongoing implementation work. But I can still do a comparison of the proposals for a registry and have that ready in a month. We can either take up the issue in a month, or later if we need to. 
+ ECH: Back to STA’s idea, can we continue a weekly cadence of meetings, but have them shorter, and not as extended meetings but as a touch point for people working on implementations to resolve the existing blocking issues. - + MIH: Yes, and can we use the Github issue labels “blocking” and “blocker-candidate” to label and triage the issues in the meetings? - + EAO: I agree with a shorter weekly series of meetings for this. - -## Message resource WG - -EAO: I am trying to get an official CLDR working group to discuss message resources. I am proposing that we have a separate repo for this so that it can be separate from the MFWG repo. - + +## Message resource WG + +EAO: I am trying to get an official CLDR working group to discuss message resources. I am proposing that we have a separate repo for this so that it can be separate from the MFWG repo. + MIH: Yes, I think this is better kept separate because it would distract from the MFWG communications if the resource format issues were in the same MFWG repo. - + ECH: My concern is that a separate group on an issue might relitigate an issue that we already have a consensus on in a different way from our consensus. That is just based on what I’ve observed in the past, so it would be safer to have such a message resource group within MFWG. - + RCA: I am concerned that having the work on this will detract from the work done here in MFWG since there is a large overlap of people. - -MIH: I agree with EAO that having this group outside of MFWG would be better for progress. A lot of users of MessageFormant have their own formats and standards, but I think ECMA-402 doesn’t have a standard, so that is fair to design for. However, I think this effort is more natural under ECMA-402. I don’t know how connected they should be, so the more disconnected these groups are, the better it would be. - + +MIH: I agree with EAO that having this group outside of MFWG would be better for progress. 
A lot of users of MessageFormant have their own formats and standards, but I think ECMA-402 doesn’t have a standard, so that is fair to design for. However, I think this effort is more natural under ECMA-402. I don’t know how connected they should be, so the more disconnected these groups are, the better it would be. + EAO: I am hearing that only RCA and ECH are opposed to having this Message Resource Group outside of the current MFWG. - -ECH: Listening to what MIH said, I do think it is possible to have the group outside of MFWG. My concern was just the possibility that we as a group would go back on previous decisions. - -MIH: I think this would be a hard thing to create a universal standard for. There are so many different platform with their own constraints and preferences. If it were done under Unicode, then I would feel compelled to get involved, in order to ensure that any issues I see are brought up and problematic things opposed. - -RCA: My thought is also about the impact to the MFWG group. it may be that such a group might belong in a different venue, echoing what MIH and ECH said would be outside of Unicode, or maybe even outside of EMCA-402 too. - + +ECH: Listening to what MIH said, I do think it is possible to have the group outside of MFWG. My concern was just the possibility that we as a group would go back on previous decisions. + +MIH: I think this would be a hard thing to create a universal standard for. There are so many different platform with their own constraints and preferences. If it were done under Unicode, then I would feel compelled to get involved, in order to ensure that any issues I see are brought up and problematic things opposed. + +RCA: My thought is also about the impact to the MFWG group. it may be that such a group might belong in a different venue, echoing what MIH and ECH said would be outside of Unicode, or maybe even outside of EMCA-402 too. 
+ EAO: In that case, we seem in general agreement that this message resource group would be outside of the scope of the MFWG, but we are undecided on where such a group would be under. Unless anyone is opposed, I will go ahead and create a repository to collect work for this effort, and we can decide later where this group can live. - + ## Getting feedback on MessageFormat - + MIH: Can we merge the spec that we have in the `develop` branch into `main`? It seems appropriate for the consensus that we already have on it. - + EAO: +1. - + STA: I agree. - -EAO: Also, we should start thinking about advertising the work that we have been making. Now that we have a Tech Preview implementation in ICU4J, we can start announcing publicly that we have it so that we can start soliciting feedback. Maybe we can create a Github issue template. - + +EAO: Also, we should start thinking about advertising the work that we have been making. Now that we have a Tech Preview implementation in ICU4J, we can start announcing publicly that we have it so that we can start soliciting feedback. Maybe we can create a Github issue template. + STA: We have Github issues, but we also have Github discussions. I think if we go out with a preset list of questions, we might skew the answers that we get. In my experience getting feedback, asking a set of questions always interfered with getting the full set of feedback. - + ## Action items - + - Update the README to guide feedback-givers - Create an issue template for feedback - Create a guideline about issues vs. 
discussions - Merge develop into main - Update urls in the ICU implementation ([1](https://github.com/unicode-org/icu/pull/2170/files#diff-94843b33f399d329dd530d74a97bdbbcf57d3fa115d17986c92cca03d694e25eR33), [2](https://github.com/unicode-org/icu/pull/2170/files#diff-94843b33f399d329dd530d74a97bdbbcf57d3fa115d17986c92cca03d694e25eR33)) -- Write a press release: Press Relase Draft - MF 2.0 +- Write a press release: Press Relase Draft - MF 2.0 - Reach out to the Editors WG to coordinate the comms about the MF TP - Change the Extended meeting to 1-hour-long weekly touchpoints. - Next meeting (Sep 26) - checkpoint before the release diff --git a/meetings/2022/notes-2022-10-24.md b/meetings/2022/notes-2022-10-24.md index c41ad6720c..f7f9b3bb67 100644 --- a/meetings/2022/notes-2022-10-24.md +++ b/meetings/2022/notes-2022-10-24.md @@ -1,11 +1,12 @@ - Attendees: Please fill in a 3-letter acronym if this is your first meeting: + - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname - Suggestion 2: First initial, middle initial, last initial - Suggestion 3: Custom ### October 24th, meeting Attendees + - Romulo Cintra (RCA) Igalia - Elango Cheran - Google (ECH) - Eemeli Aro - Mozilla (EAO) @@ -13,138 +14,134 @@ Please fill in a 3-letter acronym if this is your first meeting: - Mihai Nita - Google (MIH) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ### Agenda - + Prepare for the tech preview release Open PRs Open issues Do deliverables include function registry? Next steps / Roadmap - -## Prepare for the tech preview release - + +## Prepare for the tech preview release + ECH: 4 Pull Requests and a [blog post](https://docs.google.com/document/d/1AjCDEMqfc7kvR1OH_7zaFd7YLPDgJidBvY3Whx9nrD0/edit). 
- -EAO: + +EAO: Add Feedback discussion category link to issue template selector Move earlier spec.md to experiments branch Update README with examples and links to feedback collection - + STA: Let’s also triage and address open PRs. - + MIH: +1 - + STA: Let’s also review the idea of changing “2.0” to “2” in the name “MessageFormat 2.0” in our messaging, repo, etc. - + MIH: "2" for the project / effort, and "2.0" for the version. - -EAO: Other specs use different naming schemes (semantic versioning, dates / years, auto-increment integers, etc.). The draft spec is okay with “DRAFT … 2.0” for now, we can revisit. - + +EAO: Other specs use different naming schemes (semantic versioning, dates / years, auto-increment integers, etc.). The draft spec is okay with “DRAFT … 2.0” for now, we can revisit. + RCA: We should revisit this as people look at the repo. - -## Open PRs - + +## Open PRs + ### Add variable resolution [#305](https://github.com/unicode-org/message-format-wg/pull/305) - + STA: From the point of view of announcing the Tech Preview, leaving this PR open for now is fine. - -MIH: I did approve this earlier today, but based on earlier discussion, it might be weird to them if we change the spec when they visit. So how do we move forward without confusing people? - -EAO: This is why I want to have a `main` for the version of the spec, and using branches to clarify the versions of the spec. We can use the features of git like git tags in our repo to refer to - + +MIH: I did approve this earlier today, but based on earlier discussion, it might be weird to them if we change the spec when they visit. So how do we move forward without confusing people? + +EAO: This is why I want to have a `main` for the version of the spec, and using branches to clarify the versions of the spec. We can use the features of git like git tags in our repo to refer to + MIH: But the links to ICU are gone. - + EAO: We can use git tags. 
- + STA: We can use Github Releases to create a version. - -ECH: Github Releases create git tags. Will updating the spec on `main` create confusion to users? - -RCA: Yes, it will. We should have some type of tags / snapshots / releases, and all other changes are labeled somehow as draft. - -STA: I’m not too worried about ICU contains the link to our repo’s `main` branch, since the ICU 72 implementation is labeled as Tech Preview. And `main` will always be the most current. How do we avoid bike shedding on the name of the version for the release? - + +ECH: Github Releases create git tags. Will updating the spec on `main` create confusion to users? + +RCA: Yes, it will. We should have some type of tags / snapshots / releases, and all other changes are labeled somehow as draft. + +STA: I’m not too worried about ICU contains the link to our repo’s `main` branch, since the ICU 72 implementation is labeled as Tech Preview. And `main` will always be the most current. How do we avoid bike shedding on the name of the version for the release? + EAO: What about 2.0.0-draft.1? - -STA: Does semver even apply here? If we just say “-vICU72”, then it is very clear what the release corresponds to. - + +STA: Does semver even apply here? If we just say “-vICU72”, then it is very clear what the release corresponds to. + EAO: Let’s investigate and get back to this later. - + ECH: Then do we hold off on merging PRs until then, because that is where this discussion started? - + EAO: I’m against blocking progress our work because we don’t have a name of the version. - + MIH: Nobody is suggesting that we stop working because of the name of the version. - + EAO: https://www.unicode.org/reports/about-reports.html#Versioning - + ECH: We all agree to revisit the idea of releases and version naming so that we make things clearer for users in the future. 
- -STA: I would like to have a single version for all of the constituent parts of the spec (syntax, formatting) rather than separate versions per constituent part. I agree with ECH’s suggestion to use auto-increment integers. - + +STA: I would like to have a single version for all of the constituent parts of the spec (syntax, formatting) rather than separate versions per constituent part. I agree with ECH’s suggestion to use auto-increment integers. + EAO: This relates to the question of “Do deliverables include the function registry?” - -STA: Some kinds of PRs that don’t touch the spec can be handled with a quicker turnaround without waiting for the plenary meetings. Maybe the Chair Group can handle that, or maybe 1 or 2 asynchronous proposals without needing to meet. - + +STA: Some kinds of PRs that don’t touch the spec can be handled with a quicker turnaround without waiting for the plenary meetings. Maybe the Chair Group can handle that, or maybe 1 or 2 asynchronous proposals without needing to meet. + EAO: The OpenJS Cross-project Council handles [this sort of need specifically](https://github.com/openjs-foundation/cross-project-council/blob/455efe54f19a93d785a70d9cc9e88a9600c4ffd2/governance/GOVERNANCE.md#fast-tracking-prs). - + ECH: +1 - + MIH: This does not apply to changes to the spec, right? - + ECH: That’s right. - + EAO: Let’s follow, and even incorporate into our rules, the Fast Tracking PRs guidelines from the link above. - + ECH: This sounds good. - -## Open issues - When do we evaluate the local variables? #299 - -## Do deliverables include the function registry? - + +## Open issues + +When do we evaluate the local variables? #299 + +## Do deliverables include the function registry? + ## Next steps / Roadmap - + RCA: I know someone who wants to sponsor the implementation in ICU4C because it will help the ecosystem that they depend on. - + STA: How much are they willing to handle changes? 
- + RCA: I wanted to see the level of interest of working on this. - + MIH: The problem with ICU is that things in ICU4J or ICU4C are implemented together, so they certain non-idiomatic quirks. - -ECH: - -RCA: The things that they already asked about the comments (?), the list of custom formatters, how custom formats will work, how resources will work. Comments refer to the metadata for the message at the various levels of the message (message-level, placeholder-level). Resources refer to files or bundles of messages. - -EAO: The repo for resources is at https://github.com/eemeli/message-resource-wg/. My answer to RCA’s original question is that there is no need for an implementation to not start working. It is great to have comments and questions from the outside. - + +ECH: + +RCA: The things that they already asked about the comments (?), the list of custom formatters, how custom formats will work, how resources will work. Comments refer to the metadata for the message at the various levels of the message (message-level, placeholder-level). Resources refer to files or bundles of messages. + +EAO: The repo for resources is at https://github.com/eemeli/message-resource-wg/. My answer to RCA’s original question is that there is no need for an implementation to not start working. It is great to have comments and questions from the outside. + RCA: Taking into account that ICU4C would be a port from the ICU4J version. - -EAO: Would it be a straight port? ICU4J is using - + +EAO: Would it be a straight port? ICU4J is using + MIH: I intend to rewrite the generated parser part and rewrite it by hand. - + RCA: During the next meeting, it would be nice to walk through the roadmap list that STA had created for the next iteration of MFWG. - -EAO: I think that too is predicated on what is part of the release. We have an open discussion of whether the exact contents / implementation of the function registry, and not just the shape / interface(s), are part of the release. 
- + +EAO: I think that too is predicated on what is part of the release. We have an open discussion of whether the exact contents / implementation of the function registry, and not just the shape / interface(s), are part of the release. + MIH: We have to decide what exactly the official release is, to discuss. - + ECH: It sounds like we are all in agreement, in different words. - + MIH: I will send out the agenda email for next week? - + EAO: What is the rotation schedule after next Mihai? - + Meeting notes doc for next week (2022-10-31 Intl.MessageFormat WG): https://docs.google.com/document/d/1oW4dIi6JZMxavLB19gMhWW-yTEV3gd6fJutJHo78NPk/edit - - diff --git a/meetings/2022/notes-2022-10-31.md b/meetings/2022/notes-2022-10-31.md index c29f5ceb95..bbabcd5948 100644 --- a/meetings/2022/notes-2022-10-31.md +++ b/meetings/2022/notes-2022-10-31.md @@ -1,4 +1,5 @@ ### October 31st, meeting Attendees + - Romulo Cintra (RCA) Igalia - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) @@ -6,84 +7,82 @@ - Richard Gibson - OpenJSF (RGN) - Staś Małolepszy - Google (STA) -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - ### Agenda - -Scope of MF 2.0 Deliverables + +Scope of MF 2.0 Deliverables Do deliverables include the function registry? Do we need extended meetings for the time being? Perhaps go back to monthly plenaries and re-invite some people? - -## Scope of MF 2.0 Deliverables - -EAO: MessageFormat 2 might be a stack/set of related specifications. Others may want to create other specs on top of this in other places like ECMA-402, Unicode, etc. My understanding is that we’re creating a minimal sufficient set of specs to support other derivative specs to be built. - + +## Scope of MF 2.0 Deliverables + +EAO: MessageFormat 2 might be a stack/set of related specifications. 
Others may want to create other specs on top of this in other places like ECMA-402, Unicode, etc. My understanding is that we’re creating a minimal sufficient set of specs to support other derivative specs to be built. + MIH: That is my understanding. I would propose that we individually (not in a meeting) go through the issues, mark them as proposed-blocker. Then, as a group and based on what is in scope, we can decide on what are deliverables, label them as blockers, and must finish them before a final completion of our spec. - -STA: It is good to go back to the [list of deliverables](https://github.com/unicode-org/message-format-wg/blob/main/guidelines/goals.md#deliverables) We have a proposal for a canonical syntax, we have a data model. One trouble that I think we’ve been having is that some of the things are difficult to define because they depend on the environment in which we run. We could define the canonical data model, but an implementation might differ in how they design it. - -ECH: Data model shouldn’t change. The point is that it is something that - -STA: The idea of the data model is good, but even in these discussions, we chose to deal in terms of syntax. And so long as different implementations can serialized and deserialize the canonical syntax, then it ensures interoperability. - + +STA: It is good to go back to the [list of deliverables](https://github.com/unicode-org/message-format-wg/blob/main/guidelines/goals.md#deliverables) We have a proposal for a canonical syntax, we have a data model. One trouble that I think we’ve been having is that some of the things are difficult to define because they depend on the environment in which we run. We could define the canonical data model, but an implementation might differ in how they design it. + +ECH: Data model shouldn’t change. The point is that it is something that + +STA: The idea of the data model is good, but even in these discussions, we chose to deal in terms of syntax. 
And so long as different implementations can serialized and deserialize the canonical syntax, then it ensures interoperability. + MIH: We can describe the data model in terms of abstract data types without depending on any specific programming language. Portability of the implementations in a specific language to another language is not necessarily realistic, nor is it necessary. - + EAO: What does a consumer of the data model look like? Who benefits from it? Being able to take a C function and use it in a Rust impl – does that happen? Do we want to enable it? One client benefiting from the data model would be us. By having a spec based on a data model, we can have non-canonical syntaxes that we support so long as they can map to the data model. - + MIH: Yes, I agree, we should be the first client, and make sure that we can define functionality based on the data model, mapping to xliff, etc.. Users with custom formatter functions would be another client. And writers of parsers can be another client. - -STA: There are different use cases. If we say that we ourselves are a client that can help define the other parts of the spec, that sounds good. Yes, we should have a canonical data model. But what we are perhaps conflating with others parts is interoperability. Interoperability is guaranteed by a canonical syntax. The syntax allows us to describe things like maps which C has no data literal syntax for them. - + +STA: There are different use cases. If we say that we ourselves are a client that can help define the other parts of the spec, that sounds good. Yes, we should have a canonical data model. But what we are perhaps conflating with others parts is interoperability. Interoperability is guaranteed by a canonical syntax. The syntax allows us to describe things like maps which C has no data literal syntax for them. + We can provide what we make as canonical, and not mandate all of that entirely, so as to allow implementations to define their own data model. 
- -EAO: The parallel I would look at is what other programming languages do. They have a particular syntax to represent constructs. LLVM is an intermediate representation that allows different programming languages (front ends) to compile to the same IR. - -STA: An alternative is to allow each implementation to have its own serialization that converts back and forth to the canonical data model. But there - -ECH: That should be okay. Implementations could have different syntaxes, and so long as they support the data model, that’s fine. They can extend the data model to support - -MIH: I agree that in the end, that to say that we have a portable data model is useless. C and Java has maps and arrays. If someone wants an implementation in Java, it will be likely be after ICU4J’s implementation is available, and they can decide what parts of the ICU4J they want or not. We shouldn’t struggle to make the data model to be compatible across implementations. But for the first users, we should be mindful to make them consistent. - -EAO: I have 3 examples in mind that this discussion is relevant for. There is an argument that we might want to represent data in the data model that isn’t JSON-ifiable. Is the data model representing pure data, or can we have elements that can’t somehow? We need not discuss how that’s possible now. Second, we have cases in which we need to refer to elements that ______. We need a way of selecting among the various variants/cases in a selection message. We can represent such selection messages as a switch/case statement or an ordered map, but we don’t need to be specific about that. Third, having line numbers and other things represented could be useful instead of having placeholders. So placeholders are not always necessary, so if an implementation chooses not to use placeholders, then that should not be a problem if it is not a superset of a canonical implementation. 
- -STA: I’m okay with different data models, but I’m not okay with different syntaxes. Once someone tries to extend the syntax, then we have non-interoperable functionality. - -EAO: To clarify, you are only talking about the MessageFormat 2 syntax, right? We would allow an ICU MessageFormat v1 parser to convert that into a MessageFormat 2 syntax. - + +EAO: The parallel I would look at is what other programming languages do. They have a particular syntax to represent constructs. LLVM is an intermediate representation that allows different programming languages (front ends) to compile to the same IR. + +STA: An alternative is to allow each implementation to have its own serialization that converts back and forth to the canonical data model. But there + +ECH: That should be okay. Implementations could have different syntaxes, and so long as they support the data model, that’s fine. They can extend the data model to support + +MIH: I agree that in the end, that to say that we have a portable data model is useless. C and Java has maps and arrays. If someone wants an implementation in Java, it will be likely be after ICU4J’s implementation is available, and they can decide what parts of the ICU4J they want or not. We shouldn’t struggle to make the data model to be compatible across implementations. But for the first users, we should be mindful to make them consistent. + +EAO: I have 3 examples in mind that this discussion is relevant for. There is an argument that we might want to represent data in the data model that isn’t JSON-ifiable. Is the data model representing pure data, or can we have elements that can’t somehow? We need not discuss how that’s possible now. Second, we have cases in which we need to refer to elements that **\_\_**. We need a way of selecting among the various variants/cases in a selection message. We can represent such selection messages as a switch/case statement or an ordered map, but we don’t need to be specific about that. 
Third, having line numbers and other things represented could be useful instead of having placeholders. So placeholders are not always necessary, so if an implementation chooses not to use placeholders, then that should not be a problem if it is not a superset of a canonical implementation. + +STA: I’m okay with different data models, but I’m not okay with different syntaxes. Once someone tries to extend the syntax, then we have non-interoperable functionality. + +EAO: To clarify, you are only talking about the MessageFormat 2 syntax, right? We would allow an ICU MessageFormat v1 parser to convert that into a MessageFormat 2 syntax. + STA: I don’t think we need to be precise about other syntaxes. - + EAO: Specifically, the MessageFormat 2 data model, not syntax. - + STA: Back to interoperability, I think the syntax must be the same. - + MIH: My thinking about this is that the data model and the syntax that we propose will map to each other seamlessly. Parsing another syntax into the data model allows it to be used by the same machinery for MessageFormat 2. We shouldn’t want to standardize or make pronouncements on other syntaxes, and people are free to make other syntaxes, but they are on their own. - + EAO: Overall, we are agreeing. Let’s shift back to talking about what parts are in scope for MessageFormat 2, like the function registry. - -STA: There will be different implementations of MessageFormat 2. They could be in the same or different programming languages. I’m not arguing for the compatibility of custom function implementations. When I think of the function registry, I think we want to provide a specification of how we specify the functions available. - -MIH: Like a schema. - -STA: Yes, we should come out with a schema that is cross-platform that can be consumed - -ECH: - + +STA: There will be different implementations of MessageFormat 2. They could be in the same or different programming languages. 
I’m not arguing for the compatibility of custom function implementations. When I think of the function registry, I think we want to provide a specification of how we specify the functions available. + +MIH: Like a schema. + +STA: Yes, we should come out with a schema that is cross-platform that can be consumed + +ECH: + EAO: The question is, say for a plural formatting function, do we define how a plural formatting function’s inputs/outputs and how it should behave? Or do we define how someone else would define a plural .formatting function? - + STA: For me, it is the latter. - + MIH: To describe the schema, it is more than just a schema. We would want some description of behaviors that controlled by the input values, which goes beyond just the types of things. For example XLIFF. It goes beyond that schema (the element foo has these attributes & and these children), to specify what each element represents and how to use it for localization. -About whether we should predefine or pre-populate the registry with items. It is a useful thing to consider what the current functions look like in the registry, even if it is not a part of the standard. It is good to put things in there to see how it works. Otherwise we design in an ivory tower. We propose a registry schema, but we are not even sure that schema can represent the existing functionality we have in MF1 / ECMAScript / Fluent. +About whether we should predefine or pre-populate the registry with items. It is a useful thing to consider what the current functions look like in the registry, even if it is not a part of the standard. It is good to put things in there to see how it works. Otherwise we design in an ivory tower. We propose a registry schema, but we are not even sure that schema can represent the existing functionality we have in MF1 / ECMAScript / Fluent. We can also propose those things as the initial submission for the registry content, with no guarantees that they will be approved “as is” or not. 
We should work as if populating the registry is managed by a different entity than the MFWG (which will probably be the case). - -ECH: I’m hearing two different threads of conversation, and we can pick up with these next week. 1) Defining a schema of sorts in which functions can be defined for the registry, and 2) whether it is okay to prepopulate a registry implementation with functions not declared in the standard for beta testing purposes, without intending or expecting to impose their inclusion in the standard. Let’s pick up from there next time. - + +ECH: I’m hearing two different threads of conversation, and we can pick up with these next week. 1) Defining a schema of sorts in which functions can be defined for the registry, and 2) whether it is okay to prepopulate a registry implementation with functions not declared in the standard for beta testing purposes, without intending or expecting to impose their inclusion in the standard. Let’s pick up from there next time. + MIH: adding for next time: do we have one registry, managed by one single entity (probably ICU/CLDR)? Or separate registries (ICU/CLDR, ECMAScript, Microsoft, Apple, etc.)? My wish / vote is for one single registry. 
- diff --git a/meetings/2022/notes-2022-11-21.md b/meetings/2022/notes-2022-11-21.md index d30c365b0b..1567e00a97 100644 --- a/meetings/2022/notes-2022-11-21.md +++ b/meetings/2022/notes-2022-11-21.md @@ -1,4 +1,5 @@ ### November 21st, meeting Attendees + - Romulo Cintra - Igalia (RCA) - Simon Clark - Oracle (SCU) - Eemeli Aro - Mozilla (EAO) @@ -10,180 +11,176 @@ - Staś Małolepszy - Google (STA) - Zibi Braniecki - Amazon (ZBI) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ### Agenda + - Presentation - "Intl.MessageFormat.parseResource()" - [Slides](https://docs.google.com/presentation/d/1OUlaN_kx3t6puqWoWHSwPSrUNRFrhTo-QqAedceonCo/edit#slide=id.p) - Chair Group - Discuss current status and future - Review Open Issues - * Decide on formatting to something other than text [#272](https://github.com/unicode-org/message-format-wg/issues/272) - * Support for BiDi in placeables [#28](https://github.com/unicode-org/message-format-wg/issues/28) - * Add Bidirectional Isolation section to formatting [#315](https://github.com/unicode-org/message-format-wg/issues/315) + - Decide on formatting to something other than text [#272](https://github.com/unicode-org/message-format-wg/issues/272) + - Support for BiDi in placeables [#28](https://github.com/unicode-org/message-format-wg/issues/28) + - Add Bidirectional Isolation section to formatting [#315](https://github.com/unicode-org/message-format-wg/issues/315) - Marked some issues as **blocker-candidate**. These are issues that I think need to be resolved before declaring the spec “done”. Would be good to review, and the ones we agree on change to **blocker**. (MIH) - ## Presentation - "Intl.MessageFormat.parseResource()" - -EAO: I have been talking with STA and ZBI about the idea of a format for resources. We have a [repository](https://github.com/eemeli/message-resource-wg). 
Someone has also asked about Google’s ARB format. - + +EAO: I have been talking with STA and ZBI about the idea of a format for resources. We have a [repository](https://github.com/eemeli/message-resource-wg). Someone has also asked about Google’s ARB format. + I am presenting these [slides](https://docs.google.com/presentation/d/1OUlaN_kx3t6puqWoWHSwPSrUNRFrhTo-QqAedceonCo/edit#slide=id.p) for the first time before next week’s TC39 meeting. - -MIH: I’m less opinionated about JavaScript. My only comment is about slide 7 regarding the Intl.MessageFormat.parseResource. That seems backwards, it looks like MessageFormat parses resource bundles. Every other system has a resource manager that loads resources. And it loads more than strings. - -EAO: Since we’re talking about JavaScript, and we have a large range of resources that might be used, you would have a resource manager that would parse resources. It is similar to how JSON.parse() which is a static method. - -SCU : My main concerns are related with performance , the parsing performance for JSON , do you have any work around that ? - + +MIH: I’m less opinionated about JavaScript. My only comment is about slide 7 regarding the Intl.MessageFormat.parseResource. That seems backwards, it looks like MessageFormat parses resource bundles. Every other system has a resource manager that loads resources. And it loads more than strings. + +EAO: Since we’re talking about JavaScript, and we have a large range of resources that might be used, you would have a resource manager that would parse resources. It is similar to how JSON.parse() which is a static method. + +SCU : My main concerns are related with performance , the parsing performance for JSON , do you have any work around that ? + EAO: No, I haven’t done work on that. It’s something we should take into consideration but isn’t the unique that we should look at to get better performance. 
- + ZBI: I authored similar parser in the past(Fluent), I don’t have the numbers but we were able to get parity on perf - -ECH: How you make the the splitting of the resource manager code ? More importantly, we have the same problem of the implementation being the specification. Why can’t we have a specification like we have a data model - -EAO : For the resource manager is a question I’m not trying to answer here, API is taking in a resource as string representation and figuring it out. This not aim to be a complete solution, - + +ECH: How you make the the splitting of the resource manager code ? More importantly, we have the same problem of the implementation being the specification. Why can’t we have a specification like we have a data model + +EAO : For the resource manager is a question I’m not trying to answer here, API is taking in a resource as string representation and figuring it out. This not aim to be a complete solution, + APP : The resource is already an item ? is not a resource bundle but instead the resolved thing - -ECH : I believe this can be decoupled - -EAO : You mean this is complex or simple ? +ECH : I believe this can be decoupled + +EAO : You mean this is complex or simple ? -ECH : I said complex +ECH : I said complex APP : MIH’s reaction is like mine. When I was at Amazon, the resource bundle can host a MessageFormat, but here you have it the other way around. - + EAO: Maybe there is a problem with naming that is causing confusion? - -APP: Possibly. I took MessageResource to mean a resource bundle. Apart from the names, what is the objective of this proposal? I think others will have a similar confusion. - + +APP: Possibly. I took MessageResource to mean a resource bundle. Apart from the names, what is the objective of this proposal? I think others will have a similar confusion. + ECH: Resource manager could just provide a message then you provide that in turn to create message format object, so this can be decoupled as way to simplify it. 
- + ZBI: I’m raising the flag for potential performance penalties depending on the way we go. If we have to parse multiple times we lose some benefits of this model - -ECH: We shouldn’t sacrifice having the simple APIs in the name of optimization. We should at least have the simple design, and we can have both. Can we have a specification from what we are proposing here ? - + +ECH: We shouldn’t sacrifice having the simple APIs in the name of optimization. We should at least have the simple design, and we can have both. Can we have a specification from what we are proposing here ? + ECH: The problem of having a specification should be addressed at some point. That seems like an obviously important thing, just as we’ve done before with the data model for the message, to avoid the “the specification is the implementation” problem that ICU MessageFormat currently has. - -EAO: All I’m doing here is proposing Stage 1, which is just an indication that this is a useful thing to explore. A Stage 2 proposal would require - -STA: My comment will be similar to what people have previously said about distinguishing between a Message, MessageResource, etc. Previously, we have been focusing on the building block of the set of functionality. But now we can design from both sides, including starting from what we want and working backwards to decide the API. Starting the API design perspective can inform what we’re trying to do. - -EAO: One aspect of the API design here is determining the bit that we ought to bake into JavaScript forever. The more complex the API is, the more fragile it becomes. - + +EAO: All I’m doing here is proposing Stage 1, which is just an indication that this is a useful thing to explore. A Stage 2 proposal would require + +STA: My comment will be similar to what people have previously said about distinguishing between a Message, MessageResource, etc. Previously, we have been focusing on the building block of the set of functionality.
But now we can design from both sides, including starting from what we want and working backwards to decide the API. Starting the API design perspective can inform what we’re trying to do. + +EAO: One aspect of the API design here is determining the bit that we ought to bake into JavaScript forever. The more complex the API is, the more fragile it becomes. + APP: One of my concerns is about adoption and usability by end user and developers, if start by specifying a format would also raise concerns on tools that needs to adapt to new formats etc… - -There are a ton of libraries that exist for managing strings, and it would be useful for people to be able to plug in to those to get messages. Pulling that into JavaScript might be cool, but a lot of library people would be sad, and it would have the problem of warring formats. - -EAO: Stage 2 proposal needs to have representation either of a string of a single message working on MF runtime or an representation of entire resource messages in specific format like json. So an ICU representation should be possible to be passed for a data representation and passed to a construct from Intl API, this is not imposing any message syntax or message resource, instead provide way to developer support theirs solutions with language built ins - + +There are a ton of libraries that exist for managing strings, and it would be useful for people to be able to plug in to those to get messages. Pulling that into JavaScript might be cool, but a lot of library people would be sad, and it would have the problem of warring formats. + +EAO: Stage 2 proposal needs to have representation either of a string of a single message working on MF runtime or an representation of entire resource messages in specific format like json. 
So an ICU representation should be possible to be passed for a data representation and passed to a construct from Intl API, this is not imposing any message syntax or message resource, instead provide way to developer support theirs solutions with language built ins + APP: Is there a reason where you didn’t go for a map-like interface where you have a key on the left side, and message on the right side. - + EAO: Does this example in the slides cover that? - -APP: What I don’t understand is why you wouldn’t just provide a map interface? Instead, you’re making the map internally and not letting users provide a map themselves. - -EAO: I am providing map. The benefit of this is that you can use it like it JSON, where `JSON.parse()` is static method. - + +APP: What I don’t understand is why you wouldn’t just provide a map interface? Instead, you’re making the map internally and not letting users provide a map themselves. + +EAO: I am providing map. The benefit of this is that you can use it like it JSON, where `JSON.parse()` is static method. + MIH: It was already touched upon by a couple of comments, but I want to make sure that it is addressed properly. The idea of loading everything at start time would be a performance hit. - -Our experiences differ quite a lot, my experience a tiny percentage of string are at screen at same time, so some google apps have 9000 strings but in our applications, we only have 10 or 20 messages at same time. - -That is true in general for software. The difference is I would like to parse only 10 or 20 messages, rather than have to parse all 9000 strings in the resource. - -The result of that mapping is string to MF so every key/val is parsed according to MF2 syntax, so if I get a string and have to parse it for all the messages independently if using it or not we have performance footguns. Performance is parsing + memory, so parsing bits of that information would be always better than batch all results at once. 
- -It forces everything to be in the MessageFormat syntax. Even if it is CSS, it needs to be in the syntax, or else it needs to be in yet another file format. - -ZBI: I’ll answer MIH first, I envision making this parser work is going to require contextual understanding from the beginning and end of the message depending on what you want to produce , parsed or stringified message. So this go against having multiline message and human readable messages, so what I mean by double parsings is the understanding of multiline messages as first time, message parse. - -MIH, you should step out of your mental model of Android, and my example is CSS. We don’t have to parse all of the CSS rules/files for a given UI, we only need to parse the CSS file relevant to a specific screen. - -So this model is similar to css where messages will be spread across files need for each “screen” so the cost of parsing will reside on architecture. +Our experiences differ quite a lot, my experience a tiny percentage of string are at screen at same time, so some google apps have 9000 strings but in our applications, we only have 10 or 20 messages at same time. + +That is true in general for software. The difference is I would like to parse only 10 or 20 messages, rather than have to parse all 9000 strings in the resource. + +The result of that mapping is string to MF so every key/val is parsed according to MF2 syntax, so if I get a string and have to parse it for all the messages independently if using it or not we have performance footguns. Performance is parsing + memory, so parsing bits of that information would be always better than batch all results at once. + +It forces everything to be in the MessageFormat syntax. Even if it is CSS, it needs to be in the syntax, or else it needs to be in yet another file format. 
+ +ZBI: I’ll answer MIH first, I envision making this parser work is going to require contextual understanding from the beginning and end of the message depending on what you want to produce , parsed or stringified message. So this go against having multiline message and human readable messages, so what I mean by double parsings is the understanding of multiline messages as first time, message parse. + +MIH, you should step out of your mental model of Android, and my example is CSS. We don’t have to parse all of the CSS rules/files for a given UI, we only need to parse the CSS file relevant to a specific screen. + +So this model is similar to css where messages will be spread across files need for each “screen” so the cost of parsing will reside on architecture. MIH: My mental model is not necessarily the Android one, it’s based on every platform I’ve seen in the last 20 years. - -ZBI: In java props the recognition of the end of message is not related with content of it , the current proposal has to do it to - + +ZBI: In java props the recognition of the end of message is not related with content of it , the current proposal has to do it to + MOC (via chat): When does the 2nd parse come into place after the initial load? - -ZBI: The 2 approaches we are approaching are, the first is that we parse the resource file to identify the locations of messages without parsing the message. The other approach is to parse the resource file and then parse the message strings identified within it. - -EAO: This proposal tries to possibility of going either way at level of implementation , you can have a single parser that parse the content or parse the whole file. The reason why syntax doesn’t need to use additional wrappers is because we discover that doing reference counting on braces it’s more work but results and doesn’t increase complexity and results should be quite performant as well. - -STA: I understand the unwillingness to double parsing proposed by ZBI. 
The stigma probably comes from original proposals didn’t have clear demarcation of messages. But now we have open and close curly braces to delimit messages, so we can probably make a double-pass parser pretty fast. - -From APP’s comments, I see the benefit the value of working with a map of stringified messages. I want to make the point that it is eager versus lazy. What EAO is proposing can be implemented by APIs based on splitting up the work. - -ZBI: I am losing confidence in my claim that we can’t have double performance, or that we would have to sacrifice significant performance. The second point of storing stringified messages in a map would reduce the value of this proposal versus storing this just as JSON. - -One of the values is that you are operating as an editor of a file containing messages of a consistent message syntax. If you treat it as a map whose values are message strings. What happens if there is an error in the message. I don’t think it is a dealbreaker; I think we can work around it. It is not just simplification, I think it will complicate things in a number of places. - -EAO, on slide 7, the ergonomics of loading resources is reminiscent of loading CSS, and that brings up the utility of having message references. From the memory management point of view, it is tricky to have a function to operate on this line. - + +ZBI: The 2 approaches we are approaching are, the first is that we parse the resource file to identify the locations of messages without parsing the message. The other approach is to parse the resource file and then parse the message strings identified within it. + +EAO: This proposal tries to possibility of going either way at level of implementation , you can have a single parser that parse the content or parse the whole file. 
The reason why syntax doesn’t need to use additional wrappers is because we discover that doing reference counting on braces it’s more work but results and doesn’t increase complexity and results should be quite performant as well. + +STA: I understand the unwillingness to double parsing proposed by ZBI. The stigma probably comes from original proposals didn’t have clear demarcation of messages. But now we have open and close curly braces to delimit messages, so we can probably make a double-pass parser pretty fast. + +From APP’s comments, I see the benefit the value of working with a map of stringified messages. I want to make the point that it is eager versus lazy. What EAO is proposing can be implemented by APIs based on splitting up the work. + +ZBI: I am losing confidence in my claim that we can’t have double performance, or that we would have to sacrifice significant performance. The second point of storing stringified messages in a map would reduce the value of this proposal versus storing this just as JSON. + +One of the values is that you are operating as an editor of a file containing messages of a consistent message syntax. If you treat it as a map whose values are message strings. What happens if there is an error in the message. I don’t think it is a dealbreaker; I think we can work around it. It is not just simplification, I think it will complicate things in a number of places. + +EAO, on slide 7, the ergonomics of loading resources is reminiscent of loading CSS, and that brings up the utility of having message references. From the memory management point of view, it is tricky to have a function to operate on this line. + EAO: Doing a sketch on that would be a nice idea to as a proof of concept - -MOC: It was my understanding that MFv2 has a hierarchical structure and reuse is a design goal. If we don’t have parsing at the message level or the bundle file, then that’s something that we have to figure out. 
- -EAO : We have both API’s providing the same sort of message reference and reuse capabilities, The message resource is defining how API looks from outside and can be used when parse resource is run can work in a lazy manner as we call async and build MF instances Lazy. Proposal is not stating how this should work on. - -STA: I want to go back to the eagerness vs. laziness regarding the design of returning a map of strings or not. What is the atom here? Is it a bundle of messages or is it a message? We are sending it over the wire and across boundaries. The precedent here, in the form of ICU MessageFormat, we stick to the string representation because there is no data model. We shouldn’t jump to dismissing this as a thing that MFv2 should be doing. Maybe both can coexist. As another example, the DOM has an interface for a CSS to be applied or added to the set of rules. But there is also a method to allow instantiating from a string. We don’t have to worry about a single representation being transported because the representation of a string. The design here will impact how people will use this API in a fundamental way. - -MIH: It’s again back to whether to parse to string. We have `res.get(...)` shows that we are mixing concerns between a resource manager and a message. In reality, you address messages across file boundaries, for example a file may reference a DTD. - -ZBI: I think STA conflated 2 concepts that shouldn’t be conflated. In CSS, a file isn’t a stylesheet. In fluent, a file isn’t a message bundle. The storage of messages is separate from a bundle. An example is that a stylesheet can be made from multiple CSS files. I see the proposal here as one potential source of messages, because we could get messages over the air or from a database. The main benefit of a resource file is that you operate over the messages in a certain context. - + +MOC: It was my understanding that MFv2 has a hierarchical structure and reuse is a design goal. 
If we don’t have parsing at the message level or the bundle file, then that’s something that we have to figure out. + +EAO : We have both API’s providing the same sort of message reference and reuse capabilities, The message resource is defining how API looks from outside and can be used when parse resource is run can work in a lazy manner as we call async and build MF instances Lazy. Proposal is not stating how this should work on. + +STA: I want to go back to the eagerness vs. laziness regarding the design of returning a map of strings or not. What is the atom here? Is it a bundle of messages or is it a message? We are sending it over the wire and across boundaries. The precedent here, in the form of ICU MessageFormat, we stick to the string representation because there is no data model. We shouldn’t jump to dismissing this as a thing that MFv2 should be doing. Maybe both can coexist. As another example, the DOM has an interface for a CSS to be applied or added to the set of rules. But there is also a method to allow instantiating from a string. We don’t have to worry about a single representation being transported because the representation of a string. The design here will impact how people will use this API in a fundamental way. + +MIH: It’s again back to whether to parse to string. We have `res.get(...)` shows that we are mixing concerns between a resource manager and a message. In reality, you address messages across file boundaries, for example a file may reference a DTD. + +ZBI: I think STA conflated 2 concepts that shouldn’t be conflated. In CSS, a file isn’t a stylesheet. In fluent, a file isn’t a message bundle. The storage of messages is separate from a bundle. An example is that a stylesheet can be made from multiple CSS files. I see the proposal here as one potential source of messages, because we could get messages over the air or from a database. The main benefit of a resource file is that you operate over the messages in a certain context. 
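The two-pass idea discussed above (locate messages by counting braces first, parse a message only when it is requested) can be sketched roughly as follows. This is a minimal illustration, not the proposal's API: the `key = {…}` resource syntax, `indexMessages`, `LazyResource`, and `ParsedMessage` are all hypothetical names assumed for the example, and escaped braces are not handled.

```typescript
// Hypothetical resource syntax, assumed for illustration only:
//   key = {message body, possibly with {nested} braces}
// Nothing here is the syntax or API of the actual proposal.

// Stand-in for a real parsed message; a real parser would build a data model.
type ParsedMessage = { source: string };

// First pass: find each message's boundaries by counting braces,
// without parsing the message body itself.
function indexMessages(resource: string): Map<string, string> {
  const index = new Map<string, string>();
  let i = 0;
  while (i < resource.length) {
    const eq = resource.indexOf("=", i);
    if (eq === -1) break;
    const key = resource.slice(i, eq).trim();
    const open = resource.indexOf("{", eq);
    if (open === -1) break;
    let depth = 0;
    let end = open;
    for (; end < resource.length; end++) {
      const ch = resource[end];
      if (ch === "{") depth++;
      else if (ch === "}" && --depth === 0) break;
    }
    if (depth !== 0) {
      throw new SyntaxError(`Unbalanced braces in message "${key}"`);
    }
    index.set(key, resource.slice(open, end + 1));
    i = end + 1;
  }
  return index;
}

// Second pass happens lazily: a message is "parsed" only on first access.
class LazyResource {
  private readonly raw: Map<string, string>;
  private readonly parsed = new Map<string, ParsedMessage>();
  parseCount = 0; // exposed only to make the laziness observable

  constructor(resource: string) {
    this.raw = indexMessages(resource);
  }

  get(key: string): ParsedMessage | undefined {
    const cached = this.parsed.get(key);
    if (cached) return cached;
    const source = this.raw.get(key);
    if (source === undefined) return undefined;
    this.parseCount++;
    const message: ParsedMessage = { source }; // a real implementation would parse here
    this.parsed.set(key, message);
    return message;
  }
}
```

With a resource of 9000 messages of which only 10 or 20 are shown, this shape would pay the brace-counting cost for all of them but the full parsing cost only for the ones actually requested.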
+ ## Chair Group - -RCA: I want to collaborate on how we can be more productive in the group. We have been in this group for 3 years, starting on 2019-11-25. We still have 6 of the original 12 people in this group. I want to go through the things we’ve done over the last 3 years. We have the technical preview after all of the discussions and different approaches to have those discussions. I’m happy with the progress we have made so far. Since I’ve been trying to give more time than I really have, I’m stepping down from being a member of the Chair Group. The thing that I want to talk about is the future because we need to continue on our strong position and encourage more people to join or rejoin. I’ll still be participating in the group, but from a different perspective. I want to know how we can help move the group forward in a collective effort, and not just a 1-2 person effort. - + +RCA: I want to collaborate on how we can be more productive in the group. We have been in this group for 3 years, starting on 2019-11-25. We still have 6 of the original 12 people in this group. I want to go through the things we’ve done over the last 3 years. We have the technical preview after all of the discussions and different approaches to have those discussions. I’m happy with the progress we have made so far. Since I’ve been trying to give more time than I really have, I’m stepping down from being a member of the Chair Group. The thing that I want to talk about is the future because we need to continue on our strong position and encourage more people to join or rejoin. I’ll still be participating in the group, but from a different perspective. I want to know how we can help move the group forward in a collective effort, and not just a 1-2 person effort. + EAO: Does this take effect immediately, or when does this occur? 
- + RCA: I am not stepping down from participating in the group, but I am stepping down from my most involved way as the Chair of the group and as a member of the Chair Group. - + EAO: Who is in the Chair Group currently? - -RCA: I need to find the list. It is also out of date and needs to be updated. - -ECH: Do you have any ideas for how to - + +RCA: I need to find the list. It is also out of date and needs to be updated. + +ECH: Do you have any ideas for how to + RCA: It was hard to keep the Chair Group convening in a constant cadence. So improving the cadence of the Chair Group would help a lot. - -APP: If you’re going to have a chair of chairs, who is responsible for coordinating things, then we need to have someone to step up. You either have a name or names (plural) who take the responsibility over. Otherwise things don’t happen. - + +APP: If you’re going to have a chair of chairs, who is responsible for coordinating things, then we need to have someone to step up. You either have a name or names (plural) who take the responsibility over. Otherwise things don’t happen. + I’d consider being a chair. - + EAO: Can we have an asynchronous way to consider this idea? - + RCA: I’ll follow offline / post an issue to continue this conversation. - + EAO: It would also be good to mention the list of Chair Group members. - + RCA: - - + ## Review Open Issues - + ### Decide on formatting to something other than text #272 - -MIH: I am not sure that we will define exactly how the parts will look like when we format to parts (or format to whatever non-string representation). I imagine the resolution will say something like we will format to parts without saying how the parts will look like, exactly. - -APP: A lot of my experience is thinking about what people want to do with the parts. Let’s say that part of the output is a currency, and you need to decorate that when formatting to HTML. You don’t need to know how the number formatter works in order to do that. 
Mentally, it makes sense to me that MessageFormat produces a sequence of things, and that it makes sense that they have structure/metadata attached to them and available. - -ZBI: We’re going through a design reviewing of something we’re calling `icu_pattern` in ICU4X as a generic way to represent patterns for formatters and the output representation from formatters. One of the APIs could provide an Iterator over the parts, perhaps filtered by provided criteria. - -The question relevant here is whether the parts are producing BiDi control codes, or are they just holding metadata - + +MIH: I am not sure that we will define exactly how the parts will look like when we format to parts (or format to whatever non-string representation). I imagine the resolution will say something like we will format to parts without saying how the parts will look like, exactly. + +APP: A lot of my experience is thinking about what people want to do with the parts. Let’s say that part of the output is a currency, and you need to decorate that when formatting to HTML. You don’t need to know how the number formatter works in order to do that. Mentally, it makes sense to me that MessageFormat produces a sequence of things, and that it makes sense that they have structure/metadata attached to them and available. + +ZBI: We’re going through a design reviewing of something we’re calling `icu_pattern` in ICU4X as a generic way to represent patterns for formatters and the output representation from formatters. One of the APIs could provide an Iterator over the parts, perhaps filtered by provided criteria. + +The question relevant here is whether the parts are producing BiDi control codes, or are they just holding metadata + STA: In this proposal, who interweaves / zips the iterators? - + ZBI: The user has to zip the iterators. 
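As a rough illustration of APP's point that a caller can decorate a currency part without knowing how the number formatter works, here is a minimal sketch. The group has explicitly not defined what parts look like, so the `MessagePart` shape (`type`, `value`, optional `dir`) and the `partsToHtml` helper are assumptions made up for this example.

```typescript
// Hypothetical part shape; the working group has not specified one.
interface MessagePart {
  type: "literal" | "currency" | "placeholder";
  value: string;
  dir?: "ltr" | "rtl"; // directionality carried as metadata, not as control codes
}

// Escape text so it is safe to embed in HTML content and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Consume a sequence of parts and wrap every non-literal part in a <span>,
// using only the metadata attached to the part.
function partsToHtml(parts: Iterable<MessagePart>): string {
  let html = "";
  for (const part of parts) {
    const text = escapeHtml(part.value);
    if (part.type === "literal") {
      html += text;
      continue;
    }
    const dir = part.dir ? ` dir="${part.dir}"` : "";
    html += `<span class="${part.type}"${dir}>${text}</span>`;
  }
  return html;
}
```

Keeping directionality as a field on the part, rather than emitting BiDi control characters into the text, leaves the consumer free to choose between an HTML `dir` attribute and isolate characters in plain-text output.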
- -APP: One of the things I want to communicate about the BiDi topic is that we are not trying to re-run the BiDi algorithm and not attempting to provide ranges of directional text. Instead, just providing the limits of parts, like placeholders. And directionality can be data that is a part of a placeholder. - + +APP: One of the things I want to communicate about the BiDi topic is that we are not trying to re-run the BiDi algorithm and not attempting to provide ranges of directional text. Instead, just providing the limits of parts, like placeholders. And directionality can be data that is a part of a placeholder. + EAO: For #272, I haven’t heard someone say that we should propose explicitly what the parts look like. - -MIH: I can write the PR to describe this in the specification. \ No newline at end of file + +MIH: I can write the PR to describe this in the specification. diff --git a/meetings/2022/notes-2022-12-19.md b/meetings/2022/notes-2022-12-19.md index c4e4edec75..a7d7b0db91 100644 --- a/meetings/2022/notes-2022-12-19.md +++ b/meetings/2022/notes-2022-12-19.md @@ -1,198 +1,195 @@ ### December 19th, meeting Attendees + - Romulo Cintra (RCA) Igalia -- Addison Phillips (APP) +- Addison Phillips (APP) - Elango Cheran - Google (ECH) - Eemeli Aro - Mozilla (EAO) - Staś Małolepszy - Google (STA) - Zibi Braniecki - Amazon (ZBI) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ### Agenda - Group Chair - Next Steps - * December Meetings - * Next milestones + - December Meetings + - Next milestones - Review [Open Issues](https://github.com/unicode-org/message-format-wg/issues?q=is%3Aissue+is%3Aopen+label%3Ablocker-candidate) and [PRs](https://github.com/unicode-org/message-format-wg/pulls) - ## Admin +## Admin -- Cancel all meetings until 9th January - Done -- Move Jan 16 plenary to the week after - Done +- 
Cancel all meetings until 9th January - Done +- Move Jan 16 plenary to the week after - Done ## Group Chair - Next Steps - + RCA: Waiting for volunteers to be a chair of the group. Otherwise, we have to put it to a vote and coordinate with Unicode. - -APP: I believe we are only waiting for CLDR to formalize this, but I have volunteered to - + +APP: I believe we are only waiting for CLDR to formalize this, but I have volunteered to + ECH: +1 - + ECH: Talking about this , I also want to bring up the Chair Group, I do believe that is important to re-activate again. - + EAO : I propose that APP send a proposal about how group should be organized, if chair group should be included or ideas about the organization. - + APP: I will send out emails and announcements around this accordingly. - -STA: It would be good to review the reasons for why the Chair Group was formed. - -APP: I’m open to suggestions, we suspect that we’ll have a discussion of the cadence of our meetings, and what our deliverables for this upcoming year. The first release of CLDR and Unicode is early in the year. The next release is later in the year. I think we all share a desire to getting close to done, and in a maintainable shape, this year, so I will help drive towards that. - -ZBI: In my mental model, there are 4 big blockers. -One is Bidi, and that relates to how format to parts to work, which is the biggest blocker. + +STA: It would be good to review the reasons for why the Chair Group was formed. + +APP: I’m open to suggestions, we suspect that we’ll have a discussion of the cadence of our meetings, and what our deliverables for this upcoming year. The first release of CLDR and Unicode is early in the year. The next release is later in the year. I think we all share a desire to getting close to done, and in a maintainable shape, this year, so I will help drive towards that. + +ZBI: In my mental model, there are 4 big blockers. 
+One is Bidi, and that relates to how format to parts to work, which is the biggest blocker. Second is error handling. -Third is lessons learned from the design of a message resource format. I wouldn’t block on the message resource, but it would be nice to know that a resource format could be designed on top of the MessageFormat API. +Third is lessons learned from the design of a message resource format. I wouldn’t block on the message resource, but it would be nice to know that a resource format could be designed on top of the MessageFormat API. Also, I would like to see operational bindings, like Flutter/DOM/React. I would like to see how those work. - + EAO: Also, function registry - how is it defined? How much of it do we implement? - -STA: I was also going to say the function registry, but also error handling, how much is defined, and how much - + +STA: I was also going to say the function registry, but also error handling, how much is defined, and how much + APP: For me, I want to know what are our gating criteria, and when those deadlines are. In W3C, we like to see 2 implementations. I know we have implementations in the works. Those might be a lower bar than all the things that ZBI mentioned. We could have formatters working while we check off other desired goals. - + EAO: We also don’t have the exact algorithm of which variants/selection case messages, and we should try to specify that in case we realize after trying to implement it that we need to rethink things. - -ZBI: In terms of implementations, there is an ICU4J, a pure JS, and a Rust one; between at least two of them we can achieve the goal of 2 implementations. For Rust I got stuck on the BiDi. I couldn’t use the pattern on the implementation, so I agree with the timeline APP proposed, I will try to continue working on the implementation. Not understanding how formatToParts works with BiDi is my main blocker. Registry is also important but not a blocker at this point. - + +ZBI: In terms of implementations, there is an ICU4J, a pure JS, and a Rust one; between at least two of them we can achieve the goal of 2 implementations. For Rust I got stuck on the BiDi. I couldn’t use the pattern on the implementation, so I agree with the timeline APP proposed, I will try to continue working on the implementation. Not understanding how formatToParts works with BiDi is my main blocker. Registry is also important but not a blocker at this point.
- + +ZBI: In terms of implementations, there is an ICU4J, a pure JS and Rust between at least two of them we can achieve the goal of 2 implementations. For Rust I got stuck on the BiDi. I couldn’t use the pattern on the implementation, so I agree with the timeline APP proposed, I will try to continue working on the implementation. Not understanding how formatToParts works with BiDi is my main blocker. Registry is also important but not blocker at this point. + STA: One more thing to add to things we are missing is markup inside translations, so we should figure it out, we agreed that we will parse matching “tags” so far. To what ZBI was saying, these are all good points, but am I not clear how much of that belongs in this group, and how much about the runtime that we want to enforce in this group. - + ZBI: I’m also concerned about this and how users will end up using it, implementations interpret the current specification and the definition of Done could be unclear, The work I’m doing will rely on the current MF2. - -EAO: Markup needs to be defined, the current syntax is … , but we have to write a second implementation, the ICU4C starts to appear. I would be happy about this. - + +EAO: Markup needs to be defined, the current syntax is … , but we have to write a second implementation, the ICU4C starts to appear. I would be happy about this. + APP: I think it’s a good idea for us to catalog what our normative requirements for specification. What is inside and what is outside. For example, the BiDi would be nice to have but we’ve excluded it from 1.0’s conformance criteria. Here is the checklist of things to be considered done. If the results of that are not useful, but I suspect whatever we put on that list will be useful, and current MessageFormat is already useful, and I think what we have is already better than that. The more we define what is in and what is - + ECH: if you’re going to have a C/C++, just do it in ICU4C. 
Haven’t talked to their TC to see if it is on their agenda. Already have a Java, want to be consistent. Otherwise it’ll be more confusing. - + EAO: The shape of the API I need is smaller than ICU4C. Be happy to work on one, but what I’d be doing would be smaller. - -ECH: I think it would be confusing if we had two competing C/C++ implementations, one of them in ICU4C, in addition to the implementations in Java (ICU4J), JS, and maybe Rust (ICU4X). - + +ECH: I think it would be confusing if we had two competing C/C++ implementations, one of them in ICU4C, in addition to the implementations in Java (ICU4J), JS, and maybe Rust (ICU4X). + RCA: Igalia is also interested in an ICU4C so I’ll follow up offline - -## PRs Review +## PRs Review #### Add error handling #320 + https://github.com/unicode-org/message-format-wg/pull/320/files - + EAO: Presenting the issue - + ECH: Did the group previously discuss and agree to the part of the PR that says “In all cases, when encountering an error,a message formatter must provide some representation of the message.”? - + APP: In my experience at Amazon, once you have constructed a message, you only want to entertain errors, not throws (things that can be caught and handled). Once there is an error, it’s hard to have a meaningful message, so you only want to treat the message as broken and an error. So I could entertain the position that the formulation of that statement is wrong. Have you thought about considering the possibility of doing something different in the error case. - + EAO: That would be interesting to consider returning the type of error, and let the runtime choose how to handle it. There is some work needed on my side to mark the PR from draft to ready to merge - + ECH: Just want to check when MIH would be around to have enough time to review PR. 
- + ZBI: After reading the notes, I disagree with APP, my mental model evaluates a localizations system as core , and we should set as baseline when everything is broken, so everything after is an improvement. IMHO everything should be static analyzed before runtime, and no errors should happen on runtime … (Zibi ?) - -APP: My [comment on the PR](https://github.com/unicode-org/message-format-wg/pull/320/files#r1052458671) had 2 parts. The phrase reads, “The selector may only match the catch-all VariantKey `*`.” The word “may” is suggestive, but the word “only” suggests “must”. - -ZBI: I think we should give the best possible result to the user instead of only making/showing an error or show empty thing , can we do better than that ? - + +APP: My [comment on the PR](https://github.com/unicode-org/message-format-wg/pull/320/files#r1052458671) had 2 parts. The phrase reads, “The selector may only match the catch-all VariantKey `*`.” The word “may” is suggestive, but the word “only” suggests “must”. + +ZBI: I think we should give the best possible result to the user instead of only making/showing an error or show empty thing , can we do better than that ? + APP: A point I was trying to make is to enumerate all of the possible cases of results. One example is to return errors, including up to a fatal error, because if you can’t return something meaningful, there is nothing to act on. Another position is EAO’s idea of returning “[???]” even though it is not useful to or actionable by the user, because it is like saying “shame on you” to the localization team responsible for the message. - -ZBI: I’m separating parsing error from resolution errors, I do believe that parser errors should at least have something usable , resolution errors we should avoid or throw. - + +ZBI: I’m separating parsing error from resolution errors, I do believe that parser errors should at least have something usable , resolution errors we should avoid or throw. 
+ APP: The case I had in mind, There is an error within an selector and you match `*` , and you have several cases and you have an error on selector but I don’t want to match the `*` just because of that error, so might be an error but if selector still matching one of the values I can report and get the best resolution by matching on selector at least if possible. - -EAO: There is a category that I forgot, It’s a catch all part that I should add to the PR, the second thing is how we handle BiDi , we have a fallback representation and a `$` representation …how do we start/end isolation for this ? They probably would match all each other so the representation of this things needs to be thought carefully. - + +EAO: There is a category that I forgot, It’s a catch all part that I should add to the PR, the second thing is how we handle BiDi , we have a fallback representation and a `$` representation …how do we start/end isolation for this ? They probably would match all each other so the representation of this things needs to be thought carefully. + STA: We also allow this PR to land in an incomplete state , so we can iterate after, I feel this conversation would continue in loops with new use cases, so we if we have this PR landed we can set as baseline for approaching the problem. - + APP: Yeah, I think this valid and implementation experience would also help us. - + ### Use inclusive language #319 - -EAO: We should merge it during this meeting - -#### Conclusion -Merge it - + +EAO: We should merge it during this meeting + +#### Conclusion + +Merge it + ### Specify 'format to parts' (issue #272) #318 - + APP : Should we wait for MIH on this ? - -EAO: Should we have this sort of discussion as a separated document ? - -APP: OK, are you concerned about this ? - -EAO : I’m afraid we are not clear on the scope of this, so we should determinate more clearly the deliverables of implementation - + +EAO: Should we have this sort of discussion as a separated document ? 
+ +APP: OK, are you concerned about this ? + +EAO : I’m afraid we are not clear on the scope of this, so we should determinate more clearly the deliverables of implementation + APP: We should clarify this - + ECH: there may be concerns about the language, the wording of it - - + #### Add Bidirectional Isolation section to formatting #315 - + EAO: Not sure what’s blocking PR to be merged - -APP: IMHO what’s blocking is about the base direction, so I think wording needs some updates. - -APP: Mark made a suggestion on comment that I think is close to what you mention, and describes when isolation is required, APP and EAO want to allow when returning isolation, they aren’t incompatible - + +APP: IMHO what’s blocking is about the base direction, so I think wording needs some updates. + +APP: Mark made a suggestion on comment that I think is close to what you mention, and describes when isolation is required, APP and EAO want to allow when returning isolation, they aren’t incompatible + EAO: I want to BiDI Behaviour to be testable , without reference to implementation details , like handlers or resolutions works, but end up with something in spec that lead us that we support BiDi while allow a different algorithm to be used by implementations. Because in cases we don’t include tests for isolation might originate some miss interpretation. - + EAO : I will take another look at PR and try to have it ready for next meeting - + #### Add examples for multiline/complex resources for other languages #278 - -EAO : This is stale -APP: What we should do here ? Close , comment ? +EAO : This is stale + +APP: What we should do here ? Close , comment ? + +APP: I’ll add comment , and if now updates we can close it -APP: I’ll add comment , and if now updates we can close it - - #### MF2.0 compromise syntax #266 - + STA: This was an experiment about the syntax ,I pinged Markus, I wanted to archive this work. 
- -EAO: We can do a merge to experiments branch - + +EAO: We can do a merge to experiments branch + STA : I can do it - + ##### Conclusion -Merge this on experiments branch - - +Merge this on experiments branch + #### Add consensus decision on formatting function context #197 - -EAO : We can address this when talking about function registry - + +EAO : We can address this when talking about function registry + ##### Conclusion -Review as soon as we talk about function registry - - -### FormatToParts - +Review as soon as we talk about function registry + +### FormatToParts + EAO: Should we define how formatToParts should be specified , if part of spec or part of implementation - -ZBI: IMHO , we have to understand the implications of not specifying this would make all the bindings specific to an implementation, so we cannot swap over implementations. So we are limited when trying to be cross-env, I’m fine with us having to adapt to bindings when changing from implementations , so I would like to see at least a recommendation in how certain things are handled. I feel that ECMA402 would be the follow example for other implementations. I wonder on each the right mental model we should choose, should we define what formatToParts should return - -APP: I like what’s MIH added, my mental thinking that formatters inside formatters can have their own formatToParts so we only can worry about one layer not worrying about parts/sequences. - + +ZBI: IMHO , we have to understand the implications of not specifying this would make all the bindings specific to an implementation, so we cannot swap over implementations. So we are limited when trying to be cross-env, I’m fine with us having to adapt to bindings when changing from implementations , so I would like to see at least a recommendation in how certain things are handled. I feel that ECMA402 would be the follow example for other implementations. 
I wonder on each the right mental model we should choose, should we define what formatToParts should return + +APP: I like what’s MIH added, my mental thinking that formatters inside formatters can have their own formatToParts so we only can worry about one layer not worrying about parts/sequences. + EAO: I agree that formaToParts in ecma402 would have an impact on implementations, I would argue overall this work much be easier if we don’t duplicate this spec work in both sides, so we should get standardization of this using by agreeing with what’s other implementation are doing agreeing that example ECMA spec would could be used as reference for other implementations. - -ZBI: I understand APP you’re saying , ECMA specifies so this goes back to other backends. I’m ok staying on this level but if we have to have to specify for ECMA-402 - + +ZBI: I understand APP you’re saying , ECMA specifies so this goes back to other backends. I’m ok staying on this level but if we have to have to specify for ECMA-402 + RC: interoperability is one of the key points. I think it should be done at the level of the MF spec. ECMA-402 will influence a lot. Should take some examples of formatToParts from ECMA-402 as examples. What I mean is that I would like to see 402 as the main reference but see it in mf2 - + EAO: I’m having trouble with finding why we can specify this that works with implementations - -ZBI: We are trying to make MF compatible with different use cases, DOM localization and … so formatToParts is necessary to understand what registry returns, so if there is a ref chain it should all be part of the binds on fToParts. - + +ZBI: We are trying to make MF compatible with different use cases, DOM localization and … so formatToParts is necessary to understand what registry returns, so if there is a ref chain it should all be part of the binds on fToParts. 
+ APP: I think an infinite deep bucket resolution is not a localizable solution, I do agree that a certain level of nesting can happen. We have to be able to define a part , a structure and common use cases for it. - + EAO: If we follow this we might end defining constrictions that might end up in an sub optimal results for implementation - -APP: Maybe we can define something like a data structure or something like that to implementations can \ No newline at end of file + +APP: Maybe we can define something like a data structure or something like that to implementations can diff --git a/meetings/2023/notes-2023-01-09.md b/meetings/2023/notes-2023-01-09.md index 908df9f433..ae7f513726 100644 --- a/meetings/2023/notes-2023-01-09.md +++ b/meetings/2023/notes-2023-01-09.md @@ -1,4 +1,5 @@ ### 2023-01-09 Attendees + Addison Phillips - Unicode (APP) Eemeli Aro - Mozilla (EAO) Mihai Nita - Google (MN) @@ -6,69 +7,62 @@ Staś Małolepszy - Google (STA) Richard Gibson - OpenJSF (RGN) ### Last Meeting Attendees -- Romulo Cintra - Igalia (RCA) -- Addison Phillips - Unicode (APP) + +- Romulo Cintra - Igalia (RCA) +- Addison Phillips - Unicode (APP) - Elango Cheran - Google (ECH) - Eemeli Aro - Mozilla (EAO) - Staś Małolepszy - Google (STA) - Zibi Braniecki - Amazon (ZBI) - Richard Gibson - OpenJSF (RGN) - -## MessageFormat Working Group Contacts: +## MessageFormat Working Group Contacts: - [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) - - ### Agenda - + Introducing new chair Resolving meeting conflicts, schedule, and scribe Review [Open Issues](https://github.com/unicode-org/message-format-wg/issues?q=is%3Aissue+is%3Aopen+label%3Ablocker-candidate) and [PRs](https://github.com/unicode-org/message-format-wg/pulls) - - + ## Admin - - - + meeting every 2 weeks at 9:30 pacific, 1.5 hours - + publish notes to gh - + publish agenda - + – - + https://github.com/unicode-org/message-format-wg/issues/310 -question if “variables defined 
externally” are parameters or something else. +question if “variables defined externally” are parameters or something else. EAO: can have variables that are available but are not locally defined; tried not to define the shape of that API; make it possible for impl to choose to make e.g. a ctor and then a second stage to have a formatting function. allowing an implementation. if a local variable has this name then that one, else use the other. - + MN: call parameters that and not variables - + STA: what you call them depends on what way you look - -APP: call parameters that; - + +APP: call parameters that; + EAO: okay with parameters, but don’t want to needlessly limit implementations; they could have globals - -MN: keep the difference between parameters and “those extra things”; wouldn’t call them variables. could be context or - + +MN: keep the difference between parameters and “those extra things”; wouldn’t call them variables. could be context or + MN to make PR - + (further discussion) - + STA: three things being discussed; what the values area; what if you have two defined separately; what if you have two defined locally–the precedence; think we should solve the two - + MN: the local variable; you cannot redefine it; do take precedence over parameters; global variables doesn’t mean they have names; why do we need that - + STA: my position was locally defined lets take precedence; if there is a $time and if as a translator I want to decorate with with options I can do “let $time = …” with some options; and there is some change management; if a string changes in the future there is potential - + EAO: this is ends up referring to (perhaps in scope of resources) how we track change management - + – Error discussion PR on agenda for next time STA: want to talk about function registry on call of 23rd; proposals from last january, the three models each had sections on this - - diff --git a/meetings/2023/notes-2023-01-23.md b/meetings/2023/notes-2023-01-23.md index 
122e294864..e0239fab56 100644 --- a/meetings/2023/notes-2023-01-23.md +++ b/meetings/2023/notes-2023-01-23.md @@ -1,81 +1,80 @@ ### 2023-01-23 Attendees -* Addison Phillips - Unicode (APP) - chair -* Tim Chevalier - Igalia (TJC) -* Eemeli Aro - Mozilla (EAO) -* Romulo Cintra - Igalia (RCA) -* Richard Gibson - OpenJSF (RGN) -* Staś Małolepszy - Google (STA) -* Simon Clark - Oracle (SCL) -* Elango Cheran - Google (ECH) -* Mihai Nita - Google (MIH) + +- Addison Phillips - Unicode (APP) - chair +- Tim Chevalier - Igalia (TJC) +- Eemeli Aro - Mozilla (EAO) +- Romulo Cintra - Igalia (RCA) +- Richard Gibson - OpenJSF (RGN) +- Staś Małolepszy - Google (STA) +- Simon Clark - Oracle (SCL) +- Elango Cheran - Google (ECH) +- Mihai Nita - Google (MIH) ### Last Meeting Attendees -* Addison Phillips - Unicode (APP) -* Eemeli Aro - Mozilla (EAO) -* Mihai Nita - Google (MIH) -* Richard Gibson - OpenJSF (RGN) -## MessageFormat Working Group Contacts: +- Addison Phillips - Unicode (APP) +- Eemeli Aro - Mozilla (EAO) +- Mihai Nita - Google (MIH) +- Richard Gibson - OpenJSF (RGN) -- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) +## MessageFormat Working Group Contacts: +- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) ### Action Items -* [APP] Update goals.md page per discussion, particularly non-goal 4 about implementations. -* [APP] Sync with @macchiati et al about specification format -* [All] Review PR #320 ***before*** Monday, 30 January. silence == merge +- [APP] Update goals.md page per discussion, particularly non-goal 4 about implementations. +- [APP] Sync with @macchiati et al about specification format +- [All] Review PR #320 **_before_** Monday, 30 January. 
silence == merge ### Agenda -https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md +https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md -* **Topic:** Agenda Review -* **Topic:** Info Share -* **Topic:** Action Item Review -* **Topic:** Admin +- **Topic:** Agenda Review +- **Topic:** Info Share +- **Topic:** Action Item Review +- **Topic:** Admin Requested by: chair Changes to labels, agenda structure, note taking. Scribe: will someone volunteer to be the official scribe or do we use a rotation? -* **Topic:** Schedule and Goals for 2023 +- **Topic:** Schedule and Goals for 2023 Requested by: chair #328 -* **Topic:** Error Handling PR +- **Topic:** Error Handling PR Requested by: EAO #320 -* **Topic:** (Discussion) Consider adding an FAQ section to the README +- **Topic:** (Discussion) Consider adding an FAQ section to the README Requested by: glen-84 #321 -* **Topic:** (Discussion) Guidance needed for dealing with selector explosions +- **Topic:** (Discussion) Guidance needed for dealing with selector explosions Requested by: STA #323 - - - ## Admin - (discussion of agenda logistics) STA and ECH are the scribes. + ## Info Share -SCL: Introducing himself. +SCL: Introducing himself. ## Action item review -APP: Waiting for MIH to open a PR with his feedback on …. +APP: Waiting for MIH to open a PR with his feedback on …. ## Topic: Schedule and Goals for 2023 + APP: Filed [Proposal for setting goals for 2023 · Discussion #328 · unicode-org/message-format-wg](https://github.com/unicode-org/message-format-wg/discussions/328) to discuss the work-back schedule for the WG's deliverables. Any thoughts today? SCL: No idea if this is doable, but the industry is eager for this. @@ -84,161 +83,110 @@ EAO: The spec, icu4j, icu4c are rather separate things. Are we discussing about APP: First, produce a normative document (UTS or … ). Second, have implementations to demonstrate the functionality. 
- RCA: Are we having other non-ICU implementations as well? - APP: Yes, more is better. At least 2. They should be aligned with the spec. - STA: there is the question of doing the work and the time, but also the form it takes. Is this a separate UTS or something. Action from this meeting? Does choosing one of the formats affect the dates? - APP: Fair question. We can incorporate things by reference into UAX 35. That shouldn’t affect the schedule that much. There is still the question is about getting onto the Unicode 16 train. - -EAO: - +EAO: APP: I have the release candidate of the spec scheduled for mid-August. Just a proposal, we can modify when we commit to doing them. If we decide to have a milestone for the version of the spec, then we can have a lot of discussions about the scope. - STA: We could review our originally documented deliverables for the group. [Link](https://github.com/unicode-org/message-format-wg/blob/main/guidelines/goals.md), in the section “Deliverables”. - EAO: I think all of them are fine, except we drop the XLIFF deliverable. We decided that in consultation with CLDR-TC. - STA: I don’t recall that. - ECH: I don’t recall that either. - MIH: I don’t recall that either. - APP: Will this working group do the implementation? I look at standards as not existing until you can point to one. - STA: With implementations, they are not authored by the working group, they are authored by people who are also in the working group, and those implementations don’t go through working group processes. - EAO: Impls are dependencies, not deliverables. - STA: Maybe the function registry should be a separate item in deliverables, since we added that as a requirement. - EAO: The same thing that applies to the impls applies to the XLIFF work because XLIFF is not a part of this group. - ECH: I don't recall saying that XLIFF is off the books. 
Also, the recommendations that we got from the CLDR TC should be documented in our [2022-04-04 meeting notes](https://github.com/unicode-org/message-format-wg/blob/main/meetings/2022/notes-2022-04-04.md). It doesn’t say no to XLIFF, it only says that the work should be done in parallel to the initial technical preview implementation in ICU4J. - MIH: If we were to leave XLIFF conversion work to someone else, it is unlikely to be done because it is such big work. - APP: Is it a lower priority to us than other deliverables? - MIH: Probably. We don’t have to always think about it, but It should be in the back of our minds as “can we map this feature to XLIFF?”. - ## Topic: (Discussion) Consider adding an FAQ section to the README + Requested by: glen-84 #321 STA: The original point of the proposal is also valid. Useful to add because some decisions are controversial. It will be useful to hear the questions of “why?” from participants. - EAO: Some of these questions were discussed before and there were strong opinions and valid reasons for delimiting the text and code parts of the syntax. Now that we have a bit more experience with the syntax, I'm curious whether these opinions are still so strong. The sticky point is about why we delimit text and not code. - ECH: We had a lot of disagreements on nitty gritty details. Some of them were a matter of taste or opinions. But we also thought we needed more opinions from other people. Having a FAQ might give people the misleading impression that these decisions are more authoritative than they are. We need this feedback to help us get us more input on what works for people. Maybe we can use the Discussions feature of the repo to make it easier for people to weigh in on what they think. - SCL: The public preview can call out these discussions explicitly. - MIH: We discussed this a lot. It's not only about trimming spaces. It's mainly about which mode to start in. About consistency. 
About the minimal message of `{Hello, World!}`. I'd be careful about feedback with the sample size of 1. Other reasons: recognizability of MF2 as syntax; syntax error recovery. - APP: To the question of “should we write a FAQ?”, the answer is yes. To the question of syntax, we should consider how usable it is. I’m less concerned about mixing 5 types of formats together, but the point about starting in text mode is important. We did go down this path for certain reasons, we shouldn’t have to litigate those choices. The problem isn’t questions of taste, but the performance, and whether we can explain to audiences of developers. - STA: I don’t mean the FAQ to be an authoritative source of truth, but rather a history of discussions. As we involve more people, their new feedback will repeat some of the same arguments. It will be good to document these because some later comment might give us new insight. - ECH: Suggestion: use the QA feature of github to document the questions and the answers. - MIH: I still think trimming leading and trailing whitespace is a bad idea, but it doesn't require the curly braces. Instead, it can be treated as a concern of the container format / serialization. - APP: We probably want to have description in our spec text about why things are the way they are. - MIH: What about pointing to the FAQs from the Readme, since FAQs tend to grow? - STA: ECH suggested Q&A on Github but that ties us to Github. Sometimes, it’s easier to deal with text. Or maybe we can use Discussions to draft responses to SCL’s questions, and we can convert that into text in the repo later. - APP: STA you are correct that at some point, we want to control our responses to such questions. - - - ## Topic: Error Handling PR + Requested by: EAO #320 EAO: Went through the discussion in the PR and marked a number of threads as resolved. Please go over the current text and identify blockers to merging it. 
- APP: The text seems okay, but it is missing classes of errors and their different states. My tendency is to think in terms of normative text as being very clear. Unicode specs like UTS 35 can be verbose, and collation can give you a high level explanation with a lot of details without telling you how to implement them. I see W3C and IETF specs as generally easier to read through and understand, so I personally favor that. - -MIH: I don't consider the PR as ready for submission. There are issues raised that don't have answers. (EAO: Example?) In one of the types of errors there's mention of including the id of the message. - +MIH: I don't consider the PR as ready for submission. There are issues raised that don't have answers. (EAO: Example?) In one of the types of errors there's mention of including the id of the message. EAO: It was addressed. MIH: It wasn't. APP: Are you looking at the thread of the current text in the file? MIH: Thread. - STA: I would like us to resolve this discussion once and for all, regardless of the decision, so that we can make progress. - APP: I think it’s valuable to have hefty, gnarly discussions in threads so that we can properly discuss things before coming to a decision. EAO, to your point about the output, if we have descriptions of what the return values are per error type, then it will help us write tests accordingly. - APP: STA’s comment is valid: do we collect all of the errors, do we nest errors, or do we just throw the first one? - EAO: We want to go even further: we want to allow implementations to decide. - MIH: Agreed, it's implementation-specific. Whether to collect all errors should be up to the implementation. 
- EAO: The current PR text is rather strict about … - EAO: "an informative error or errors must be provided" - APP: It seems important that we have some descriptions of what output to expect for certain inputs, like if you have a placeholder named `$foo` but you have that represented in runtime data, then we should describe the way in which output will look. - STA: If you have a scenario where you have multiple errors, like selecting on a selector that isn’t present, and trying to include it in the message, you have different types of errors to use, but at least just specify the type of error and how it should the error response should report things as. - APP: If we want to make errors part of the compliance tests, then yes. But errors can also be internal to implementations. - EAO: The conformance suite could specify the expected formatting result and the list of expected errors and require at least one of these errors to be produced by implementations. - MIH: If we allow merging PRs that are not fully agreed on / finished, do we want to add some verbiage to the spec to mark it as draft? - APP: At a meta-level, if we commit text to the spec, but we still think that further work is needed, then we should file issues against them. 
- diff --git a/meetings/2023/notes-2023-02-06.md b/meetings/2023/notes-2023-02-06.md index 626d47749d..a1845799a9 100644 --- a/meetings/2023/notes-2023-02-06.md +++ b/meetings/2023/notes-2023-02-06.md @@ -1,11 +1,12 @@ - Attendees: Please fill in your name and affiliation plus a 3-letter acronym: + - Suggestion 1: First letter of given name, First letter of surname, Last letter of surname - Suggestion 2: First initial, middle initial, last initial - Suggestion 3: Custom ### 2023-02-06 Attendees + - Addison Phillips - Unicode (APP) - chair - Elango Cheran - Google (ECH) - Eemeli Aro - Mozilla (EAO) @@ -17,6 +18,7 @@ Please fill in your name and affiliation plus a 3-letter acronym: - Zibi Braniecki - Amazon (ZIB) ### Last Meeting Attendees + - Addison Phillips - Unicode (APP) - chair - Tim Chevalier - Igalia (TJC) - Eemeli Aro - Mozilla (EAO) @@ -37,20 +39,20 @@ EAO presented on MF2 & Intl.MessageFormat at FOSDEM this past weekend: https://f **Topic:** Action Item Review -Action-Item issues +Action-Item issues **Topic:** Schedule and Goals for 2023 Requested by: chair -https://github.com/unicode-org/message-format-wg/discussions/328 -https://github.com/unicode-org/message-format-wg/issues/337 +https://github.com/unicode-org/message-format-wg/discussions/328 +https://github.com/unicode-org/message-format-wg/issues/337 **Topic:** Handling of 2119 keywords and internal terms Requested by: APP, EAO -Discussion of document formatting standards for 2119 keywords (**must**, *should*, etc.). Discussion of terminology handling. -https://github.com/unicode-org/message-format-wg/issues/331 -https://github.com/unicode-org/message-format-wg/discussions/332 +Discussion of document formatting standards for 2119 keywords (**must**, _should_, etc.). Discussion of terminology handling. 
+https://github.com/unicode-org/message-format-wg/issues/331 +https://github.com/unicode-org/message-format-wg/discussions/332 **Topic:** Function Registry @@ -67,31 +69,31 @@ Discussion of bidirectional text handling. See: **Topic:** (Discussion) Guidance needed for dealing with selector explosions [next time] Requested by: STA -https://github.com/unicode-org/message-format-wg/discussions/323 +https://github.com/unicode-org/message-format-wg/discussions/323 May be out-of-date, since we had "extra time" at the end of 2023-01-23's call. **Topic:** (Pull Request) Add Pattern Selection Requested by: EAO -https://github.com/unicode-org/message-format-wg/pull/333 +https://github.com/unicode-org/message-format-wg/pull/333 Addition of pattern selection (matching) rules to the spec. Thanks @eemeli for the PR! --- -## NOTES: +## NOTES: ## Schedule and Goals for 2023 + Widespread agreement to strive for a separate UTR/UTS. Aiming for some level of completeness in June for the 16.0 release is an aggressive, but viable target. - ## Keywords and normative vocabulary APP: I’m fine with RFC 2119 terms for normativity, but Mark Davis doesn’t like them. Unicode documents normally embed explicit numbered requirements. APP: Additionally, our text is probably not sufficiently stable to count on ordering. -RGN: Examples of Unicode style at https://www.unicode.org/reports/tr31/#Conformance and https://www.unicode.org/reports/tr35/#11-conformance +RGN: Examples of Unicode style at https://www.unicode.org/reports/tr31/#Conformance and https://www.unicode.org/reports/tr35/#11-conformance EAO: Preference for 2119; Unicode style would be cumbersome at this point but we can always add numbers later. @@ -102,10 +104,12 @@ EAO: Do we want to style the keywords? [relatively even split between caps vs. formatting, with many “don’t care” opinions] ### Conclusion + No objection to 2119 keywords. APP to add text adopting them, capitalizing the terms. 
For internal terms, bold+italicize on first use and italicize references. ## Function Registry + STA: [MessageFormat 2: The Function Registry Slides](https://docs.google.com/presentation/d/1z6uEBwMSbW0OmpFGv73usRrk4vo-lgGu6ZNOXjT38a0/edit?usp=sharing) diff --git a/meetings/2023/notes-2023-02-13.md b/meetings/2023/notes-2023-02-13.md index 55c95cbb79..ef23dfee80 100644 --- a/meetings/2023/notes-2023-02-13.md +++ b/meetings/2023/notes-2023-02-13.md @@ -1,6 +1,7 @@ 2023-02-13 | MessageFormat WG (replaces 2023-02-20 call) ### 2023-02-13 Attendees + - Addison Phillips - Unicode (APP) - chair - Eemeli Aro - Mozilla (EAO) - Simon Clark - Oracle (SCL) @@ -15,6 +16,7 @@ #### Topic: Agenda Review #### Topic: Info Share + ECH: Updated ICU user guide SCL and STA continue working on FAQ @@ -22,27 +24,29 @@ SCL and STA continue working on FAQ APP shared a doc on CLDR edits. https://docs.google.com/document/d/1vFPGgppCvLl_7KgYUER1qatjHz1Zwm5oL2feZRzeXjA/edit - Looking for feedback on edits by Friday #### Topic: Action Item Review + APP: working on guidance for questioning tech decisions. Will respond to question in github #### Topic: Whitespace handling in EBNF + Discussion required for Unblocking STA STA: Are we happy with the flavor of EBNF STA: Whitespace definition in spec is not clearly defined -STA: looking for EBNF with better tooling support for validation, syntax error reporting. IETF BNF - possible candidate. APP says there are good tools available. Other parts of Unicode are using W3C EBNF. +STA: looking for EBNF with better tooling support for validation, syntax error reporting. IETF BNF - possible candidate. APP says there are good tools available. Other parts of Unicode are using W3C EBNF. No strong preference in the call. MIH will work with STA to identify validation tool availability. STA will select an implementation. MIH link to REX Parser generator : https://www.bottlecaps.de/rex/ -STA: LL1 parsers not very expressive. Hard to express whitespace rules. 
“LL1 With backtracking” a possibility. Or not be explicit about whitespace in grammar. Can reject whitespace errors on tokenization step. +STA: LL1 parsers not very expressive. Hard to express whitespace rules. “LL1 With backtracking” a possibility. Or not be explicit about whitespace in grammar. Can reject whitespace errors on tokenization step. EAO: We are LL1 right now with MF1 -APP: we currently do not permit the whitespaces that we all include in our messages. +APP: we currently do not permit the whitespaces that we all include in our messages. -STA: Proposal drop the strict LL1 Requirement. Go with LL1 with backtracking. +STA: Proposal drop the strict LL1 Requirement. Go with LL1 with backtracking. MIH : there is a LL1 with backtracking implementation in ECMA script @@ -52,7 +56,7 @@ APP: possible to differentiate between single ling whitespace and multiline whit EAO: happy to let NL NL token allowed, hard to prevent, not worth the effort -STA: Do we need WS between \*\*? Yes. MIH: We want spaces between keys. +STA: Do we need WS between \*\*? Yes. MIH: We want spaces between keys. EAO: Can fix in editor, or on export step. @@ -60,22 +64,23 @@ ECH: consistency is important for reducing cognitive overhead. Strictness is ok EAO: Using parenthesis is a corner case, why add rules to make it special. -STA: Will use LL1 With backtracking.. Will pick a BNF +STA: Will use LL1 With backtracking.. Will pick a BNF #### Topic: 2119 keyword PR + Requested by: APP -APP: Added 2119 to syntax.md. Do we make syntax.md the spec or create a new container? +APP: Added 2119 to syntax.md. Do we make syntax.md the spec or create a new container? EAO: Syntax is ‘chapter 1’. More to come about other parts of the spec. Everything in the spec folder should be part of the official spec, but breaking into files is valuable. 
#338 - #### Topic: (Pull Request) Add Pattern Selection + Requested by: EAO #333 -EAO: As a result of CLDR RC how do we have conversations about revisiting technical choices. Ie: text-first or code-first. +EAO: As a result of CLDR RC how do we have conversations about revisiting technical choices. Ie: text-first or code-first. APP: Formal way to oppose decisions. Working group can choose to take it up, or reject it as ‘settled’. Default position in ‘no change’ @@ -87,7 +92,7 @@ EAO: Would like to agree on how we have conversations before we have the convers APP: I owe the group a document why relying on the ordering of variants is a bad idea. -STA: There is a doc from CLDR-TC, any incompatibility issues can be solved with tooling. +STA: There is a doc from CLDR-TC, any incompatibility issues can be solved with tooling. APP: I’m sensitive to the audience of people who will use this because relying on order of variants will be really confusing and difficult to use. @@ -95,13 +100,13 @@ STA: Best match requires function or documentation inspection, first match is mo MIH: Whatever is implemented in ICU4J doesn’t mean it is the group’s agreed final consensus. We needed to make compromises to make the deadline for the technical preview implementation. -MIH: I am not saying that we are required to be fully compatible with MessageFormat 1.0, but once we try to +MIH: I am not saying that we are required to be fully compatible with MessageFormat 1.0, but once we try to ECH: We should not lose the thread concerning the algorithm on how we match tuple cases for each variant within the selection. That point might inform best-match vs. first match. The notion of matching is not equality, the notion of equality depends on identity and can differ depending on what we're dealing with. If we take a MF1 message and try to convert it, what is doing the conversion between the semantics of MF1 and MF2? Would require additional tooling in order scale. 
EAO: There seems to be 3 related discussions here: 1) APP to describe why best match is strongly preferable to first match; 2) ECH says the spec should be more specific on having an abstraction/interface for the matching logic to be specific to the selector type; 3) updating the spec text. -APP: The way EAO's PR is setup is to have the selector to determine its output. If we invert this, the inputs and various when statement. If we write in terms of the match statement rather than in terms of then when statement, it will decouple us from some of the weirdness around plurals. Even for first match, plural will not produce "1" or "ONE" +APP: The way EAO's PR is setup is to have the selector to determine its output. If we invert this, the inputs and various when statement. If we write in terms of the match statement rather than in terms of then when statement, it will decouple us from some of the weirdness around plurals. Even for first match, plural will not produce "1" or "ONE" EAO: It ends up impacting the structure of how selection works. @@ -113,25 +118,23 @@ APP: If you were doing best match, then you would be processing all of the varia STA: Unclear on where score is recorded and which score is best. -ECH: I don't think my points have been addressed. I can clarify if needed. Plurals may be quirky but are also very common. Formatting has to happen with numbers, can also be called pre-processing depending on the terminology or context. This isn't enough to match. A formatted to parts number won't match "ONE" +ECH: I don't think my points have been addressed. I can clarify if needed. Plurals may be quirky but are also very common. Formatting has to happen with numbers, can also be called pre-processing depending on the terminology or context. This isn't enough to match. A formatted to parts number won't match "ONE" -STA: This is the point of the EAO's spec. We wanted to ensure plurals can be implemented as custom functions. 
We're able to express this nuance in Javascript. +STA: This is the point of the EAO's spec. We wanted to ensure plurals can be implemented as custom functions. We're able to express this nuance in Javascript. -APP (chat): Here is what the current PR says: - 1. Let _sel_ be the entry in _res_ at position _i_. - 2. Let _pass_ be the boolean result of testing _key_ against _sel_. +APP (chat): Here is what the current PR says: 1. Let _sel_ be the entry in _res_ at position _i_. 2. Let _pass_ be the boolean result of testing _key_ against _sel_. -ECH: An interface will allow us to decouple the implementation and this hasn't been specified in the PR yet. +ECH: An interface will allow us to decouple the implementation and this hasn't been specified in the PR yet. STA: This goes against our runtime specification. -ECH: I agree with the intention but it is non-obvious, unseen. +ECH: I agree with the intention but it is non-obvious, unseen. STA (chat): Right, and an implementation for that can be: if (sel.match(key)) ..., where both sel and key are some runtime types that can match and be formatted, respectively ECH (chat): and `match()` is the interface that we need to define. and it is intertwined with the selector value type (and in the case of plurals, depends on the selector value type's formatter's output) -MIH: A source of confusion is how we're thinking about result values. You call a function and get a result value. The result could also be an object. We cannot be backwards compatible with MF1. We can do MF1 sort of thing by sorting. This is outside of the purview of message format, the function knows how to sort. We can look at individual values and return a score. MF1 runtime is doing the sorting. Can we write a custom function that implements plural matching the MF1 semantics in the MF2 engine? +MIH: A source of confusion is how we're thinking about result values. You call a function and get a result value. The result could also be an object. 
We cannot be backwards compatible with MF1. We can do MF1 sort of thing by sorting. This is outside of the purview of message format, the function knows how to sort. We can look at individual values and return a score. MF1 runtime is doing the sorting. Can we write a custom function that implements plural matching the MF1 semantics in the MF2 engine? EAO: I would like to include in the spec the things that ECH is asking for but have also tried not to include it. This has caused controversy in the past. It would facilitate understanding and clear communication. Explicitly define the interface for what is the value of a local variable. How do we communicate to readers what is happening? In ICU4J there is a formatted number structure, what is the plural category of this number? @@ -141,11 +144,11 @@ EAO: We should find acceptable language to build a spec so implementations can b EAO: For MF1 compatibility, we see a two phase operation. There is a separate module from MF2 runtime which is able to take MF1 syntax and produces a MF2 data model representation. This includes the step of sorting and full visibility for all keys. Not able to do what MIH describes. Reordering doesn't break compatibility of MF1 with MF2. -STA: Thought we didn't want to specify what a selector looks like because of implementation differences. Confused how far we should take the normalization of the spec. Also confused about how best-match could work. Understood that sorting would take place at migration time, not runtime. +STA: Thought we didn't want to specify what a selector looks like because of implementation differences. Confused how far we should take the normalization of the spec. Also confused about how best-match could work. Understood that sorting would take place at migration time, not runtime. -ECH: I would hope the interfaces would allow you to not specify the types. You don't know what you will get from pre-processed value. Sometimes preprocessing includes locale information. 
+ECH: I would hope the interfaces would allow you to not specify the types. You don't know what you will get from pre-processed value. Sometimes preprocessing includes locale information. -APP: we can write a spec that is agnostic about many of these things. Would like to understand better when EAO is getting at. Given a selector and a set of whens and a set of inputs, we get a result, either a string or error. We should try to accomplish this goal. +APP: we can write a spec that is agnostic about many of these things. Would like to understand better when EAO is getting at. Given a selector and a set of whens and a set of inputs, we get a result, either a string or error. We should try to accomplish this goal. MIH: Should we meet specifically on this topic? @@ -198,7 +201,7 @@ WhiteSpace ::= #x9 | #xD | #xA | #x20 /\* ws: definition \*/ Mihai ⦅U⦆ Niță 10:06 AM -Literal ::= '(' (LiteralChar | LiteralEscape)* ')' /\* ws: explicit \*/ +Literal ::= '(' (LiteralChar | LiteralEscape)\* ')' /\* ws: explicit \*/ Addison 10:13 AM @@ -211,6 +214,7 @@ Shoud this be invalid? {$foo :func opt=(value)opt=value} Mihai ⦅U⦆ Niță 10:23 AM + > Shoud this be invalid? {$foo :func opt=(value)opt=value} yes (my opinion) Addison @@ -230,13 +234,13 @@ it helps to leapfrog with 2 scribes. and it results in better coverage Mihai ⦅U⦆ Niță 10:46 AM -Two levels for "best match" vs "first match": A. "per column" ``` match {$count, :plural} when * {....} when one {....} when 1 {....} ``` B. Between rows ``` when 1 * when one masculine ``` +Two levels for "best match" vs "first match": A. "per column" `match {$count, :plural} when * {....} when one {....} when 1 {....}` B. Between rows `when 1 * when one masculine` If we do first match on A, that is horribly bad, and not MF1 compatible Doing first match for B is OKish Mihai ⦅U⦆ Niță 10:54 AM -In other words what I think Addison is saying: ``` switch ( function(val) ) { case A: .... case B: .... default: } ``` Vs: if ( function ( val, A ) ) .... 
elsif ( function ( val, B ) ) .... else .... ```` +In other words what I think Addison is saying: `switch ( function(val) ) { case A: .... case B: .... default: }` Vs: if ( function ( val, A ) ) .... elsif ( function ( val, B ) ) .... else .... ```` Addison 11:00 AM @@ -275,5 +279,3 @@ Mihai ⦅U⦆ Niță 11:25 AM Also: "Make it clear which functions are formatting functions and which are selection functions blocker-candidate " https://github.com/unicode-org/message-format-wg/issues/260 Which is Elango's concern, i think - - diff --git a/meetings/2023/notes-2023-02-27.md b/meetings/2023/notes-2023-02-27.md index a95f19ac55..e7b1fb7ce4 100644 --- a/meetings/2023/notes-2023-02-27.md +++ b/meetings/2023/notes-2023-02-27.md @@ -1,18 +1,19 @@ ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Eemeli Aro - Mozilla (EAO) -* Simon Clark - Oracle (SCL) -* Mihai Nita - Google (MIH) -* Tim Chevalier - Igalia (TJC) -* Richard Gibson - OpenJSF (RGN) -* Elango Cheran - Google (ECH) -* Staś Małolepszy - Google (STA) -* Romulo Cintra - Igalia (RCA) -* Zibi Braniecki - Amazon (ZBI) + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro - Mozilla (EAO) +- Simon Clark - Oracle (SCL) +- Mihai Nita - Google (MIH) +- Tim Chevalier - Igalia (TJC) +- Richard Gibson - OpenJSF (RGN) +- Elango Cheran - Google (ECH) +- Staś Małolepszy - Google (STA) +- Romulo Cintra - Igalia (RCA) +- Zibi Braniecki - Amazon (ZBI) ### Scribe -* MIH +- MIH ### Notes doc location @@ -28,7 +29,6 @@ Topic: Action Item Review Topic: Function Registry (continued) - Über-topic: ABNF Topic: First-match vs. 
best-match @@ -45,14 +45,14 @@ Topic: Markup https://github.com/unicode-org/message-format-wg/pull/347 -“Raw” view: https://github.com/unicode-org/message-format-wg/blob/84bbfa1dd751eb1915514cf7eb37e3834748bbf8/spec/message.abnf +“Raw” view: https://github.com/unicode-org/message-format-wg/blob/84bbfa1dd751eb1915514cf7eb37e3834748bbf8/spec/message.abnf STA: also discuss issues tagged blocker-candidate & resolve-candidate STA: ABNF is based on the PR #344 dealing with whitespaces, which means it is blocked Also uneasy about using a tool (REx) that is not available as source, only as a web service (https://www.bottlecaps.de/rex/) -APP: Can we resolve the ABNF issue first? A lot of interesting points arose in that discussion. I see agreement from the group. +APP: Can we resolve the ABNF issue first? A lot of interesting points arose in that discussion. I see agreement from the group. APP: Let’s talk about whitespace. @@ -74,7 +74,7 @@ APP: If we want future extensions we should reserve them in the v1 released gram STA: The question about `****` is whether we can consider `*` as a token. -MIH: I think we need to keep the * and (*) as different. One is keyword for default, the other is a literal. +MIH: I think we need to keep the _ and (_) as different. One is keyword for default, the other is a literal. STA: ok, agree STA: Regarding relaxing the grammar, we can allow 1 whitespace like XML does, which I would be okay. But what would the group think about allowing `****`, and I realized that having grammar that is lenient is helpful, and tooling (ex: linters) can use our grammar to fix messages. @@ -83,11 +83,12 @@ EAO: ok to require spaces around {} only. Can we require whitespace in the Varia APP: There is already whitespace required between the `when` values in the Variant production. 
-STA: And there is an implicit whitespace requirement between `NMTOKEN`s because if you don’t put in a whitespace, then +STA: And there is an implicit whitespace requirement between `NMTOKEN`s because if you don’t put in a whitespace, then ECH: I agree with EAO in wanting to require whitespace, but I am feeling less strong now about requiring whitespaces except for the `when` values in the Variant production. It would look inconsistent and confusing to have whitespace between some values but not have them between a series of `*` like `****`. Longer discussion around examples + ``` match {$x} when*{{a=(b)c=d}} @@ -119,7 +120,7 @@ APP: Let’s have discussions on specific changes to the ABNF via PRs and discus STA: Okay, I will merge the PR. And I will not file issues but instead only have PRs. Is that what you are suggesting? -APP: +APP: #### Topic: Delimiter for literals @@ -134,7 +135,7 @@ EAO: One other point on this as it relates to whitespace is if we do use vertica ECH: +1 to EAO’s comment -STA: This +STA: This EAO: Let’s agree to switch to vertical pipes, and then an action item for STA to file the PR. @@ -148,21 +149,16 @@ EAO: That is a good point. ECH: I am +1 to APP’s proposal. - #### Topic: Issues tagged resolve-candidate Try to close what is non-controversial #### Topic: First-match vs. Best-match -APP: Just to introduce the topic for next time, look at issue #351 and the new pull request related to it. The current status is to take the first variant whose when condition matches. The proposal is to +APP: Just to introduce the topic for next time, look at issue #351 and the new pull request related to it. The current status is to take the first variant whose when condition matches. The proposal is to #### Topic: Markup APP: Just to introduce the topic for the next time, we have called out that the ABNF has reserved syntax for markup. 
There are a bunch of questions that flow from that, like whether the syntax can be different, and whether they work as we intend them to. Another issue that I raised, and others chimed in on, is whether we can protect markup. Say, using XLIFF to protect markup. Issue #356 is the place to start. -MIH: My position is that markup is not required, and you can achieve all that you want with markup using placeholders. The other issue is a meta-level concern about whether we properly agreed - - - - +MIH: My position is that markup is not required, and you can achieve all that you want with markup using placeholders. The other issue is a meta-level concern about whether we properly agreed diff --git a/meetings/2023/notes-2023-03-06.md b/meetings/2023/notes-2023-03-06.md index 2c5217cc8f..d91de66d0d 100644 --- a/meetings/2023/notes-2023-03-06.md +++ b/meetings/2023/notes-2023-03-06.md @@ -1,18 +1,20 @@ ## Attendees -* Addison Phillips - Unicode (APP) - chair -* Elango Cheran - Google (ECH) -* Mihai Nita - Google (MIH) -* Eemeli Aro - Mozilla (EAO) -* Staś Małolepszy - Google (STA) -* Richard Gibson - OpenJSF (RGN) -* Simon Clark - Oracle (SCL) + +- Addison Phillips - Unicode (APP) - chair +- Elango Cheran - Google (ECH) +- Mihai Nita - Google (MIH) +- Eemeli Aro - Mozilla (EAO) +- Staś Małolepszy - Google (STA) +- Richard Gibson - OpenJSF (RGN) +- Simon Clark - Oracle (SCL) Scribe: SCL (thank you!) ## Agenda ### Topic: Agenda Review -* https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md + +- https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md EOA: Backward compatibility with MF1? @@ -21,52 +23,55 @@ EOA: Backward compatibility with MF1? EAO: PR in messageResource WG regarding resource format - https://github.com/eemeli/message-resource-wg/pull/11 ### Topic: Action Item Review -* none + +- none ### Topic: Function Registry (continued) + Requested by: STA STA considers not a dependency of other discussions. 
Discussion of the function registry. Two of the three models had sections on this. ### Topic: First-match vs. best-match -* Requested by: APP -* Document: https://github.com/unicode-org/message-format-wg/blob/aphillips-issue-351/exploration/selection-matching-options.md -#351 + +- Requested by: APP +- Document: https://github.com/unicode-org/message-format-wg/blob/aphillips-issue-351/exploration/selection-matching-options.md + #351 EAO: Add example around non-order specific benefit of adding new translation at end saving translation resources. What about when there are multiple lines hitting the same score? What result? ECH: CLDR and ICU added new rules for romance languages (ex: `fr`, `pt`) recently in 2020, whose plural formatting of compact notations was not represented. The plural rules for those locales needed to add a plural category to the plural rule set for those locales (category name `many`). That required messages with plural selectors in those locales to be updated to add variants for the newly added `many` plural category. -MIH: Translator tool may not have control over order of translations? But it’s just one string. Do any tools understand these strings as separate entities? New requirement on translation tooling to understand order. +MIH: Translator tool may not have control over order of translations? But it’s just one string. Do any tools understand these strings as separate entities? New requirement on translation tooling to understand order. MIH: First match case has the potential to break backward compatibility with MF1. -EAO: Understanding of which selectors get picked, reliance on function registry. Votes for column-first match with star as optional. +EAO: Understanding of which selectors get picked, reliance on function registry. Votes for column-first match with star as optional. APP: Languages like `ja` don’t have a rule for the `one` plural category, so the `other`/`*` category is necessary -‘One’ fires in some locales for ‘21’. 
+‘One’ fires in some locales for ‘21’. SCL: Question: if the parameters passed in are `0 1 1`, what it match under first match, and under best batch? Should the `*`/`other` contribute less to the match score because they are default catch-all cases? APP: Look at the table in the FAQ that shows an example of scoring an input. Note that in the case of plurals, the `*` is equally meaningful because it is the same as the `other` plural category, which has rules defined for it. -STA - leans towards not requiring the \*. +STA - leans towards not requiring the \*. -STA: worried about cases in column first where we hit dead ends when first column is too good of a match. Opinion: first match is simplest choice. Not a lot of accidental complexity. Algorithms for calculating match scores have high cognitive load. Somewhat opaque. Requires certain level of trust. +STA: worried about cases in column first where we hit dead ends when first column is too good of a match. Opinion: first match is simplest choice. Not a lot of accidental complexity. Algorithms for calculating match scores have high cognitive load. Somewhat opaque. Requires certain level of trust. -EAO : Column first allows for leaving out the \*. Allows for having the default first. +EAO : Column first allows for leaving out the \*. Allows for having the default first. ECH: complexity argument not well stated. Fist-match may be easier to understand and implement, but then creates a dependency on the ordering of variants. The dependency on ordering then creates problems for the example of plurals where the target locale’s plural categories are different from the source locale, or when a new plural category is added to a locale’s plural rules set. There is inherent complexity in the system, and first-match tries to shift it around, but doesn’t reduce it. Importantly, when dealing with MF1 legacy messages, they are best-match because they are written without a dependency on ordering. 
Want to be able to upgrade those messages without changing behaviour. (Main concern) MIH: The first-match approach expects translators to know and write what is best order based on rules they may not have access to. -EAO: Keep in mind for MF1 - we do have a best match algo, however it is very specific case in plural select. Anything other than first match introduces dependency on the function registry. +EAO: Keep in mind for MF1 - we do have a best match algo, however it is very specific case in plural select. Anything other than first match introduces dependency on the function registry. APP: MF 1 can be converted deterministically by knowing the rules on both sides. -APP: Best match separates keyword selection from ranking algorithm . Need to +APP: Best match separates keyword selection from ranking algorithm . Need to APP: When comparing to MF1, we’ve gone from nesting selection to matrix based selection for MF 2.0. I think this matrix based selection wasn’t necessary in MF1 because you had nested messages that encapsulated the return value for any selector. @@ -76,32 +81,32 @@ MIH: What does function registry dependance mean in the case of best match code? STA: Migration path is possible from MF1 to MF2. Not easy but possible. Complexity is there in all options. Combat complexity with predictability. Like CSS specificity. Still confuses a lot of people. Who are we optimizing for? Localizers, if working with fragments of string, have no visibility into complexity of selectors. Best matchers use scoring variant, First match is a boolean system. Less cognitive complexity for both developers and translators. -APP: Possible to write best match in a canonical order that is also first match. +APP: Possible to write best match in a canonical order that is also first match. -EAH: dependency on function registry based on “1” vs “one” . Defining spec order means there is a validity dependency on order/function registry. Column first allows for boolean decision tree. 
+EAH: dependency on function registry based on “1” vs “one” . Defining spec order means there is a validity dependency on order/function registry. Column first allows for boolean decision tree. -MIH: Who needs to figure out the ordering mechanism? Scoring may be non-intuitive, but complexity is understanding row tuple ranking. What about column matching. Apply mer column MF1 pattern. Sort by first column, then second, etc. Get lexographical sort. Gives a mixture of best match vs first match. Does complexity come from understanding how columns rank against each other. +MIH: Who needs to figure out the ordering mechanism? Scoring may be non-intuitive, but complexity is understanding row tuple ranking. What about column matching. Apply mer column MF1 pattern. Sort by first column, then second, etc. Get lexographical sort. Gives a mixture of best match vs first match. Does complexity come from understanding how columns rank against each other. -STA: Scoring is just an implementation detail. Exponential complexity potential, but unlikely. Easier to understand with just a few cases and variants. With hundreds of variant (arabic languages) will be cognitively hard to understand regardless. If this is +STA: Scoring is just an implementation detail. Exponential complexity potential, but unlikely. Easier to understand with just a few cases and variants. With hundreds of variant (arabic languages) will be cognitively hard to understand regardless. If this is largely about plurals (it is) then we already know the rules. ECH: Not removing complexity with first match. Still have tuples to avoid nested selects. Has preference for column first match - stable sort. -APP: Filter then sort column-based for best match. +APP: Filter then sort column-based for best match. MIH: would what Elango proposed be acceptable? Impossible to prove that no one can come up with a good use case. If MIH describe that algorithm, consider it? -EAO Best match explicitly prioritizes the columns. 
Selection is still partly a black box. Preference. 1) Column match with optional star. +EAO Best match explicitly prioritizes the columns. Selection is still partly a black box. Preference. 1) Column match with optional star. SCL: First match can be described in a single unambiguous sentence. Makes it a compelling choice. -STA: Why ok to order on selectors/cols, not ok to order on variants? +STA: Why ok to order on selectors/cols, not ok to order on variants? EAO: Col first with optional star, first match, col first with req star ECH: column first, best match distant sec, first match distant third -MIH: Best Match first. Col first lexo, star mandatory second, +MIH: Best Match first. Col first lexo, star mandatory second, RGN: Col first with optional star, first match, col first with req star @@ -109,9 +114,8 @@ SCL: first match, col first, star opt second STA: first match, best match, col match with opt star -APP: sorted col first, allergic to first match +APP: sorted col first, allergic to first match EAO: primary audience for writing is developers, reading them is translators. STA: What is the reason for a “1” override, or a ‘2’ override? Who does that? Polish may want to have explicitly tooling for plural “2” to make it sound natural. 
- diff --git a/meetings/2023/notes-2023-03-13.md b/meetings/2023/notes-2023-03-13.md index 3d59d8e44e..f287471bc0 100644 --- a/meetings/2023/notes-2023-03-13.md +++ b/meetings/2023/notes-2023-03-13.md @@ -1,15 +1,16 @@ Mar 13, 2023 | MessageFormat WG Teleconference ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Eemeli Aro - Mozilla (EAO) -* Simon Clark - Oracle (SCL) -* Elango Cheran - Google (ECH) -* Mihai Nita - Google (MIH) -* Staś Małolepszy - Google (STA) -* Richard Gibson - OpenJSF (RGN) -* Zibi Braniecki - Amazon (ZBI) -* Tim Chevalier (TIM) + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro - Mozilla (EAO) +- Simon Clark - Oracle (SCL) +- Elango Cheran - Google (ECH) +- Mihai Nita - Google (MIH) +- Staś Małolepszy - Google (STA) +- Richard Gibson - OpenJSF (RGN) +- Zibi Braniecki - Amazon (ZBI) +- Tim Chevalier (TIM) Scribe was: ECH @@ -21,13 +22,14 @@ Notes Document for The Upcoming Call https://docs.google.com/document/d/17X6BPvHdjI_Twuy2vcbtVjxKAYTi5eUTZpJl9qUGzjM/edit ### Agenda for 2023-03-13 -* Topic: Agenda Review -* Topic: Info Share -* Topic: Action Item Review -* Topic: Closing the book on matching -* Topic: Function Registry -* Topic: Markup -* Topic: Determining MF1 compatbility + +- Topic: Agenda Review +- Topic: Info Share +- Topic: Action Item Review +- Topic: Closing the book on matching +- Topic: Function Registry +- Topic: Markup +- Topic: Determining MF1 compatbility #### Proposed for Future (or if time permits) @@ -60,7 +62,6 @@ Requested by: APP ## Notes - ### Info Share SCL: I did an audit of all of our message handling needs, and comparing that to MF 2.0. It turns out that MF 2.0 handles all of our needs. @@ -77,18 +78,17 @@ STA: The signature of a function has one or more signatures, and the signature c APP: One of the things we warn people about is having to change their message strings around when the locale changes. -STA: A few other things that I want to point out. 
For an option to a function, the `values` attribute indicates which enumerated set of values are allowed. Another way to validate is using the `pattern` attribute, which references a regular expression that describes the string provided +STA: A few other things that I want to point out. For an option to a function, the `values` attribute indicates which enumerated set of values are allowed. Another way to validate is using the `pattern` attribute, which references a regular expression that describes the string provided APP: We’ll take feedback directly on the PR, which also gives time for others to have a chance to take a look first. - ### Closing the book on Selection Matching APP: I have updated [the document](https://github.com/unicode-org/message-format-wg/blob/aphillips-issue-351/exploration/selection-matching-options.md). Where we left off is a slight preference towards column-first matching. SCL: I am reasonably okay with any option. One statement that I made last time that I want to revise after talking with people in our company is who is responsible for crafting messages. For our workflows, the developer will draft a message, but it goes to the content management team to be edited to support our workflows. Our -EAO: What does your company’s team +EAO: What does your company’s team SCL: Because we don’t have a lot of need for selection messages, I prefer the first-match strategy because it’s easier. @@ -120,7 +120,7 @@ MIH: About tooling, we should expect a minimal level tooling, such as validating I still have a problem with the argument for requiring the developer to sort the messages. If you have the ability to validate or lint messages at build time using ordering rules, then you have the ability to order things at runtime. So why not just use the tooling at runtime in a best-match strategy? -STA: About tooling, the safest assumption is to assume the same level of tooling for current MF2.0. 
I also didn’t understand why there is a distinction between required `*` and optional `*`. I get concerned about the idea in the best-match approach that messages are selected by a transformation that is doing filtering +STA: About tooling, the safest assumption is to assume the same level of tooling for current MF2.0. I also didn’t understand why there is a distinction between required `*` and optional `*`. I get concerned about the idea in the best-match approach that messages are selected by a transformation that is doing filtering EAO: I introduced the distinction between required and optional `*`. My thought was that fist-match requires sorting of variants, and that requires developers/translators to refer to the registry to know how to sort, and that can be difficult, so I thought that this could only work if we relax the constraint that the default case `*` is required. @@ -132,18 +132,17 @@ APP: Remember that you have ChoiceFormat. That is tricky, where a numerical inpu STA: I’m starting to think differently about this. ECH is right that we’re shifting complexity around. I was optimizing for debuggability, but that is not necessarily optimizing for developers/translators. Maybe it is better to just focus on debugging of a single message. Maybe it is better to surrender the decision of which message is chosen to the selector that knows how to sort keys. Among best-match, I prefer the option that uses scoring the most. I have a concern about fuzzy matches versus exact matches. - MIH: I feel what APP said did a good job of explaining of what I tried but haven’t been able to, which is that you have to separate the per-selector matching & sorting algorithms from the overall matrix / key tuples selection algorithm. Regarding the point about debuggability and wanting to understand things, developers are already used to using libraries and algorithms without knowing the exact implementation details. 
-STA: I am concerned about the +STA: I am concerned about the EAO: I am still very resistant to using an approach that uses scoring. MIH: For me, scoring is an implementation detail. -ECH: I want to point out that in the ideal case, if the full space of inputs is covered by all of the selector value key tuples in the matrix, then the ideal best tuple will always win in any of the best-match approach. In the document example, if the input values are `1`, `1`, `1`, then the selector tuple `=1 =1 =1` will be matched over anything else like `one one one`, etc. The ideal variant will be selected in either of the best-match algorithms (column-first or best-score). So what we’re talking about here is not what happens in the ideal case, but what happens when the matrix / tuple space is underspecified. So what we’re looking at for cases that are underspecified is providing a best effort result. The best-match algorithms differ by whether it is a greedy algorithm or a global scoring algorithm, but they are still just providing a best effort answer. +ECH: I want to point out that in the ideal case, if the full space of inputs is covered by all of the selector value key tuples in the matrix, then the ideal best tuple will always win in any of the best-match approach. In the document example, if the input values are `1`, `1`, `1`, then the selector tuple `=1 =1 =1` will be matched over anything else like `one one one`, etc. The ideal variant will be selected in either of the best-match algorithms (column-first or best-score). So what we’re talking about here is not what happens in the ideal case, but what happens when the matrix / tuple space is underspecified. So what we’re looking at for cases that are underspecified is providing a best effort result. The best-match algorithms differ by whether it is a greedy algorithm or a global scoring algorithm, but they are still just providing a best effort answer. 
EAO: My reference point for the scoring approach is CSS, which uses a scoring algorithm, but I bet that no one knows what it is. What happens as a result is you introduce the syntax `!` to indicate that something is important. I am afraid that something like that would evolve. @@ -154,5 +153,3 @@ STA: I used the same argument of CSS last week to argue against best-match scori APP: What are everyone’s preferences now? … Based on what everyone said, it seems like there is a rough consensus on some type of best-match algorithm, but we are still undecided between column-first and best-score. It would help to have running code for column-first, and for best-score. - - diff --git a/meetings/2023/notes-2023-03-27.md b/meetings/2023/notes-2023-03-27.md index a9c92d72fe..4d6f858e3f 100644 --- a/meetings/2023/notes-2023-03-27.md +++ b/meetings/2023/notes-2023-03-27.md @@ -1,6 +1,7 @@ Mar 27, 2023 | MessageFormat WG Teleconference ### Attendees + - Addison Phillips - Unicode (APP) - chair - Eemeli Aro - Mozilla (EAO) - Mihai Nita - Google (MIH) @@ -19,10 +20,10 @@ Scribe: STA https://github.com/unicode-org/message-format-wg/blob/main/meetings/agenda.md -* Topic: Agenda Review -* Topic: Info Share -* Topic: Action Item Review -* Topic: Closing the book on matching +- Topic: Agenda Review +- Topic: Info Share +- Topic: Action Item Review +- Topic: Closing the book on matching Requested by: APP @@ -31,36 +32,38 @@ Timebox: 20 minutes https://github.com/unicode-org/message-format-wg/blob/aphillips-issue-351/exploration/selection-matching-options.md #351 -* Topic: Function Registry (continued) +- Topic: Function Registry (continued) Requested by: STA Discussion of the function registry. Two of the three models had sections on this. 
Homework for next call: -https://github.com/unicode-org/message-format-wg/pull/368 +https://github.com/unicode-org/message-format-wg/pull/368 -* Topic: Markup +- Topic: Markup Requested by: APP, MIH #356 Markup open issues: -* #241 -* #262 -* #238 + +- #241 +- #262 +- #238 — # Topic: Closing the book on matching + EAO: thank you MIH for providing a scoring implementation. It made me realize that it requires to be specific about how much one choice is better than another. e.g. A is 4x as good than B. It feels like a setup for weird situations when unrelated selectors can outweigh the decision. APP: I've considered it. The question is: is there anything in the farther columns that makes the variant jump to the top. -MIH. Are we talking about something like when 1 * * … when one one one? Each selectors choose their values. +MIH. Are we talking about something like when 1 \* \* … when one one one? Each selectors choose their values. -STA: But are the scores normalized? Also, does * mean "any" or "any other"? +STA: But are the scores normalized? Also, does \* mean "any" or "any other"? ECH: I keep alternating between preferring best-match and column-first. Either one will give you the "probably best" match. @@ -70,7 +73,7 @@ STA: For complete messages, all algorithms seem to produce good enough results. EAO: We should include incomplete messages as potential use-cases. -SCL: I need to drop for 30 minutes. Will be back. For the record, I'm still in the First Match camp, for cognitive complexity/simplicity reasons, but I fully admit that I've not done my homework on this one. +SCL: I need to drop for 30 minutes. Will be back. For the record, I'm still in the First Match camp, for cognitive complexity/simplicity reasons, but I fully admit that I've not done my homework on this one. MIH: Can accept column-first. Against first-match. @@ -91,7 +94,9 @@ EAO: Can start with the last column. Will document. 
STA: I'm OK continuing with column-first provided we figure out the backtracking problem (if it's a problem; still not sure). APP: Let's pursue column-first. + # Topic: Markup + APP: Let's start by discussing the reserved prefixes. Proposal: allow exotic prefixes in operand position, but require well-formed "annotation" and "options" after it. APP: Maybe we don't solve markup in 1.0, but we can reserve prefixes to give us tools to solve in the future. @@ -102,7 +107,7 @@ EAO: I can do a PR to handle keywords in ABNF. APP: Will handle reserved prefixes in placeholders. -MIH: I'm fine with well-formed exotic placeholders. Against blob placeholers. Against allowing new keywords. +MIH: I'm fine with well-formed exotic placeholders. Against blob placeholers. Against allowing new keywords. APP: Are we OK with postponing markup? @@ -128,14 +133,13 @@ STA: EAO, you mentioned the need to address this through interfaces, and that th EAO: Regarding the colon markup, what should the markup for `{:foo}` or `{:span}`? I think it should render as like an ``. When you have an inline subportion of a message that needs to be marked up, the open placeholder / tag needs to be paired correctly with the corresponding close placeholder / tag? -STA: EAO and APP, you have said before that we need to be able to determine how markup placeholders match and how to pair them. Maybe we need to specify this, but also look at how we make this fit in XLIFF, and how this works at runtime. We also need to look at how it’s stored. Once we decide that certain placeholders are open or close, then we +STA: EAO and APP, you have said before that we need to be able to determine how markup placeholders match and how to pair them. Maybe we need to specify this, but also look at how we make this fit in XLIFF, and how this works at runtime. We also need to look at how it’s stored. 
Once we decide that certain placeholders are open or close, then we ECH: There are other things that will become complicated if we go this route. There are use-cases where we have segmentation happening and a sentence can span multiple segments/messages. An open tag can end up with message1, and the close tag can end up in message3. All of the problems that we are bringing up here all point to needing tooling, in between the source format and the MF messages, both before generating MF messages and after the translation of such messages, in order to clean up the messes caused by these problems. If external tooling is necessary, then it is even less clear why these concerns need to be in MF itself. EAO: The current language related to markup does not require paired markup. -APP: A few things that I think I'm the outlier about. I'm in the protection business, not evaluation business. E.g. `Hello {+ph}{-ph}hello`. - +APP: A few things that I think I'm the outlier about. I'm in the protection business, not evaluation business. E.g. `Hello {+ph}{-ph}hello`. STA: What is the next step with these tiny syntaxes? Is there an extra parsing step? @@ -149,7 +153,7 @@ ZBI: Could the placeholders also specify the id, and what its paired tag? APP: Yes, it would have this. -MIH: The syntax that we have today, with placeholders that have a bag of options, can already represent the open/close/standalone information. If we then go and add open/close information to the placeholder, then we have to handle this somehow in the function registry. But with what we already have, we can represent standalone markup tags, ex: `{img :html alt=some text}` is an HTML `` +MIH: The syntax that we have today, with placeholders that have a bag of options, can already represent the open/close/standalone information. If we then go and add open/close information to the placeholder, then we have to handle this somehow in the function registry. 
But with what we already have, we can represent standalone markup tags, ex: `{img :html alt=some text}` is an HTML `` APP: 2 points: is HTML special enough that we want to introduce syntax just for it? There is a lot of markup in the world. Are we trying to build a fully generic system, a somewhat generic system, or a targeted system? @@ -174,6 +178,7 @@ Hello World! {-ph} ``` + MIH: We need ids on the close `{-ph}` so that they are able to be paired. ZBI: So is this equivalent? @@ -225,17 +230,12 @@ The advantage of this representation is that it matches what we already have in We allow MessageFormat to be independent of knowing whether it will be processed by an HTML or SSML or other type of processor later on for runtime formatting purposes. - - - - STA: Two questions to focus us on the problems: (1) Do we need open/close concepts in the syntax? (2) Do we protect, or protect and evaluate markup? APP: Need to go back to requirements. - - # Chat + Elango Cheran 6:34 PM (btw, might need to leave 30 mins early today, not sure yet) @@ -250,131 +250,132 @@ Tim Chevalier thx Mihai ⦅U⦆ Niță 6:42 PM + ``` 1 * * * one one one ``` + Mihai ⦅U⦆ Niță 6:49 PM score -* 1 1 => wins in best match -1 * * => column wins -Mihai ⦅U⦆ Niță -6:56 PM -Polish rues on an English message (fallback) -Eemeli Aro -6:59 PM -Could we use the queue, please? -Simon Clark -6:59 PM -I need to drop for 30 minutes. Will be back. For the record, I'm still in the First Match camp, for cognitive complexity/simplicity reasons, but I fully admit that I've not done my homework on this one. -Zibi Braniecki -7:00 PM -I have to switch to another meeting. I'll be reading notes alongside and jump back as soon as I can -bbl -Mihai ⦅U⦆ Niță -7:17 PM -i'm ok with column first -Mihai ⦅U⦆ Niță -7:40 PM -it does buy us open / close -Mihai ⦅U⦆ Niță -7:52 PM -{+span :HTML options} .... {-span :html} ... -{+html:span ...} .. {-html:span} -Mihai ⦅U⦆ Niță -8:00 PM -so: -{span +html}.... {span -html} -? 
-Simon Clark -8:00 PM -sorry, off to another meeting. Have a good day -Tim Chevalier -8:01 PM -Have to drop as well, have a good day! -You -8:04 PM -Hello, {# #}user!{# #} -Addison Phillips -8:05 PM -Hello {+ph}{-ph}hello -Mihai ⦅U⦆ Niță -8:08 PM -{img :html alt=some text} -You -8:11 PM -Maybe {foo :standalone} {foo open} and {foo /close} instead? -Zibi Braniecki -8:11 PM -Hello World! + +- 1 1 => wins in best match + 1 \* \* => column wins + Mihai ⦅U⦆ Niță + 6:56 PM + Polish rues on an English message (fallback) + Eemeli Aro + 6:59 PM + Could we use the queue, please? + Simon Clark + 6:59 PM + I need to drop for 30 minutes. Will be back. For the record, I'm still in the First Match camp, for cognitive complexity/simplicity reasons, but I fully admit that I've not done my homework on this one. + Zibi Braniecki + 7:00 PM + I have to switch to another meeting. I'll be reading notes alongside and jump back as soon as I can + bbl + Mihai ⦅U⦆ Niță + 7:17 PM + i'm ok with column first + Mihai ⦅U⦆ Niță + 7:40 PM + it does buy us open / close + Mihai ⦅U⦆ Niță + 7:52 PM + {+span :HTML options} .... {-span :html} ... + {+html:span ...} .. {-html:span} + Mihai ⦅U⦆ Niță + 8:00 PM + so: + {span +html}.... {span -html} + ? + Simon Clark + 8:00 PM + sorry, off to another meeting. Have a good day + Tim Chevalier + 8:01 PM + Have to drop as well, have a good day! + You + 8:04 PM + Hello, {# #}user!{# #} + Addison Phillips + 8:05 PM + Hello {+ph}{-ph}hello + Mihai ⦅U⦆ Niță + 8:08 PM + {img :html alt=some text} + You + 8:11 PM + Maybe {foo :standalone} {foo open} and {foo /close} instead? + Zibi Braniecki + 8:11 PM + Hello World! {+ph - type=open - ns="html" - id="foo1" - attr::title="unit2" + type=open + ns="html" + id="foo1" + attr::title="unit2" } - + {-ph} Click me! {+ph - type=close - ns="html" - id="foo1" + type=close + ns="html" + id="foo1" } - + {-ph} -Zibi Braniecki -8:14 PM -Hello World! + Zibi Braniecki + 8:14 PM + Hello World! 
{+html:a - type=open - id="foo1" - attr::title="unit2" + type=open + id="foo1" + attr::title="unit2" } Click me! {+html:a - type=close - id="foo1" + type=close + id="foo1" } -Mihai ⦅U⦆ Niță -8:14 PM -Click me! + Mihai ⦅U⦆ Niță + 8:14 PM + Click me! {+a :html - id="foo1" + id="foo1" } -Zibi Braniecki -8:14 PM -Hello World! + Zibi Braniecki + 8:14 PM + Hello World! {+html:a - id="foo1" - attr::title="unit2" + id="foo1" + attr::title="unit2" } Click me! {-html:a - id="foo1" + id="foo1" } -Addison Phillips -8:15 PM -Click here <- this is a valid pattern -Click {+bpt}<-bpt}here{+bpt}{-bpt} -Mihai ⦅U⦆ Niță -8:15 PM -{+a :html - id="foo1" - attr::title="unit2" + Addison Phillips + 8:15 PM + Click here <- this is a valid pattern + Click {+bpt}<-bpt}here{+bpt}{-bpt} + Mihai ⦅U⦆ Niță + 8:15 PM + {+a :html + id="foo1" + attr::title="unit2" } Click me! {-a html: - id="foo1" + id="foo1" } -Mihai ⦅U⦆ Niță -8:25 PM -i think that the open / close concepts are useful -they exist in xliff, and allows the prevention of certain kind of validation that many l10n tools already do -Zibi Braniecki -8:29 PM -Hello { $link } -link = [MF2.Markup("a"), MF2.Text("Click Me")]; - - + Mihai ⦅U⦆ Niță + 8:25 PM + i think that the open / close concepts are useful + they exist in xliff, and allows the prevention of certain kind of validation that many l10n tools already do + Zibi Braniecki + 8:29 PM + Hello { $link } + link = [MF2.Markup("a"), MF2.Text("Click Me")]; diff --git a/meetings/2023/notes-2023-03-31.md b/meetings/2023/notes-2023-03-31.md index 2c4e2d7790..44704ba2d4 100644 --- a/meetings/2023/notes-2023-03-31.md +++ b/meetings/2023/notes-2023-03-31.md @@ -1,13 +1,14 @@ Mar 31, 2023 | MessageFormat special session ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Elango Cheran - Google (ECH) -* Staś Małolepszy - Google (STA) -* Mihai Nita - Google (MIH) -* Eemeli Aro - Mozilla (EAO) -* Richard Gibson - OpenJSF (RGN) -* Simon Clark - Oracle (SCL) + +- Addison Phillips - Unicode 
(APP) - chair +- Elango Cheran - Google (ECH) +- Staś Małolepszy - Google (STA) +- Mihai Nita - Google (MIH) +- Eemeli Aro - Mozilla (EAO) +- Richard Gibson - OpenJSF (RGN) +- Simon Clark - Oracle (SCL) Scribe: ECH, APP @@ -25,9 +26,9 @@ EAO: PR [#371](https://github.com/unicode-org/message-format-wg/pull/371) and PR APP: We have spent a lot of time talking about things without concretizing things, ex: in the form of PRs, etc. Some of those discussions are necessary, ex: to get a better understanding of the alternatives for selection. Maybe we would make better progress by creating PRs for spec text and reading about and discussing the specifics of those. What do people think? -STA: I like the idea of ____. Now that we have ABNF for the spec, it should be easier. I like the PR #371 that is a holistic PR in that it also adjusts the syntax accordingly. +STA: I like the idea of \_\_\_\_. Now that we have ABNF for the spec, it should be easier. I like the PR #371 that is a holistic PR in that it also adjusts the syntax accordingly. -EAO: Our selection discussion comes from my PR about selection, where we didn’t have a shared or complete understanding, so we have to discuss. How do you propose +EAO: Our selection discussion comes from my PR about selection, where we didn’t have a shared or complete understanding, so we have to discuss. How do you propose ECH: THis meeting is addressing what we’re doing better now than in the past. We kind of get lost in some of the details of the discussion. We need to address some of the pros/cons (higher level). We have a tendency to get lost in the details. Also speaks to some things are not agreed to and thus not on solid ground. Could be moving faster. Can we prioritize things? If we have discussion of markup before details like sigils. @@ -35,9 +36,9 @@ MIH: Thank you APP for our good progress lately. To move forward with written pu APP: Like PR #362 that I drafted to discuss selection? 
-STA: Thanks APP for drafting that selection explainer doc. When we had too many small issues, every PR +STA: Thanks APP for drafting that selection explainer doc. When we had too many small issues, every PR -APP: Would people be okay in merging #362 about selection? No objection, we have consensus. +APP: Would people be okay in merging #362 about selection? No objection, we have consensus. ## Topic: Markup @@ -48,7 +49,7 @@ Markup open issues: #262 #238 -ECH: +ECH: EAO: To have data for this argument, about 4-5% of messages in Mozilla have markup in the message. @@ -58,7 +59,7 @@ ECH: appreciate the stats, but don’t think that’s totally germane/salient. G MIH: I think the format we design should support the tagging of parts of the string, say parts of speech. I think we can have this as a separate concept in the spec. My position is that the placeholders have been sufficient, and can benefit from adding a field to represent open/close/standalone. It would be useful to add a function for how to format / generate markup, not just protect markup tags. -STA: I think having functions and formatToParts means that people will produce markup from the translated messages. The question from last meeting is whether we need open/close/standalone, whether that needs special syntax, and whether we need to have functions in the registry. In the past, we have been successful in outsourcing our problems to other layers, but for protecting markup, we don’t know what our compatibility with XLIFF is. Maybe whatever we do when translating +STA: I think having functions and formatToParts means that people will produce markup from the translated messages. The question from last meeting is whether we need open/close/standalone, whether that needs special syntax, and whether we need to have functions in the registry. In the past, we have been successful in outsourcing our problems to other layers, but for protecting markup, we don’t know what our compatibility with XLIFF is. 
Maybe whatever we do when translating SCL: About 90% of strings that we deliver don’t need MessageFormat, they’re plain strings. So I don’t want to see those messages with markup in them. @@ -68,7 +69,7 @@ APP: Our current spec allows people to type stuff into the pattern string, excep STA: I think #371 is fine as a solution, and I think APP’s idea to reserve sigil prefix syntax is a more general solution. But what I’m missing is a clear statement of the value proposition of why we need the notion of open and close in the syntax in the first place. -MIH: I’m pretty happy with the direction of #371. Regarding what the benefits of open / close. At Google, we have lots of types of content, including types of markup. People invent their own concepts of open and close, but XLIFF has ways of representing +MIH: I’m pretty happy with the direction of #371. Regarding what the benefits of open / close. At Google, we have lots of types of content, including types of markup. People invent their own concepts of open and close, but XLIFF has ways of representing STA: That’s not what I’m asking. What I’m asking is if we need to represent open/close _in the syntax itself_. @@ -78,32 +79,27 @@ STA: Yes, why not handle it in the function registry? MIH: If we have it outside of the function registry, then it would be easier for translators to deal with, without having to touch the function registry. -APP: For me, when I look at #371, the `+` and `-` give us a way for a placeholder to span a substring. The concern that I have before is that I don’t want to have to take their strings, translate them into MF-style syntax, and then change their strings according to how it - +APP: For me, when I look at #371, the `+` and `-` give us a way for a placeholder to span a substring. 
The concern that I have before is that I don’t want to have to take their strings, translate them into MF-style syntax, and then change their strings according to how it MIH: Protected content, no special support needed: + ``` Click {||}here{||} to register. ``` With Eemeli's changes (PR 371): + ``` Hello {user}, click {a +html}here{a +html} to continue {img :html src='foo.gif'}! Your offer expires in {|30| :number} days (on {$expDate :datetime skeleton=yMMMd}). ``` - -APP: My feeling is that we’re not far from a consensus. +APP: My feeling is that we’re not far from a consensus. SCL: It would be useful to see what the function signature that responds to this would be, but I am in favour of the proposal. ECH: I would be okay if the `+` and `-` were just syntax sugar for annotating placeholders that have a specific value for a specific field. EAO: ECH, look at #371 because the PR is more generic than that. It doesn’t specify behavior, and markup as a term is not used, except to describe one example. But the interpretation of the syntax is implementation specific. - - - - - diff --git a/meetings/2023/notes-2023-04-10.md b/meetings/2023/notes-2023-04-10.md index 2c1819942b..7c65edd332 100644 --- a/meetings/2023/notes-2023-04-10.md +++ b/meetings/2023/notes-2023-04-10.md @@ -1,4 +1,5 @@ ### Attendees + - Addison Phillips - Unicode (APP) - chair - Tim Chevalier - Igalia (TIM) - Eemeli Aro - Mozilla (EAO) @@ -6,10 +7,8 @@ - Elango Cheran - Google (ECH) - Richard Gibson - OpenJSF (RGN) - Scribe: TIM - # Agenda: ## Topic: Markup PR 371 @@ -26,9 +25,9 @@ EAO: My reason for agreeing with Stas's proposal to drop the placeholder and use APP: I think that's fair; my observation is spec terminology and ABNF rule names should match as much as possible. I favor "placeholder" as the name of the thing that can occur inside a pattern. We have patterns in two kinds of places. 
I agree from the POV that an expression and a placeholder are not notably distinct once you get rid of the distinction between `markup` and `expression`. Interesting question about reserved sigils and how we incorporate them. That's a thing that's in my PR; basically the reserved sigils become labels for named items and they're indistinguishable from `function` in terms of their placement and how they can be decorated. The question is if that's appropriate or if we need to rewind a bit to allow named things to have other kinds of constructs behind them besides just `option` and so forth. I thought the interesting thing, Eemeli, in your PR was that you allow the reserved sigils to be followed by basically any non-quotable text. That would be interesting for a "comment" placeholder. I think it would be counterproductive for any kind of functional placeholder because it means that any un-reserving we did in the future would break existing parsers. We don't want to do that. -EAO: I think that's also a different discussion. +EAO: I think that's also a different discussion. -MIH: For the name, I see nothing against calling them placeholders everywhere, including in the `let`. +MIH: For the name, I see nothing against calling them placeholders everywhere, including in the `let`. APP: except expressions can appear outside of placeholders @@ -44,7 +43,7 @@ EAO: My intention was to make it minimally opinionated on this topic, not even g MIH: For me, they're all functions -APP: They may all be functions and work like functions, and they can have the same descriptive thing, but I suspect they should have different names. If they're just functions, +APP: They may all be functions and work like functions, and they can have the same descriptive thing, but I suspect they should have different names. If they're just functions, (APP's connection froze) @@ -85,13 +84,14 @@ APP: We can do that. 
For me, the only thing in my mind is whether we call everyt APP: OK, why don't we merge it and then argue about changes from there. ## Topic: Function Registry (continued) + Discussion of the function registry. Two of the three models had sections on this. Deferred b/c Stas isn't here ## Topic: Reserve sigils for future use -* #360, #374 +- #360, #374 https://github.com/unicode-org/message-format-wg/pull/374 @@ -103,7 +103,7 @@ APP: That is permitted, you're just not reading the second line of, I think, `ex MIH: So the annotation is the function name – in an expression we have `literal / variable` and the variable starts with '$', which means we can't have variables starting w/ one of the reserved things. -APP: We can; what we don't allow currently is nesting. What you can't have is '%'foo':'function, whatever. You can just have '%'foo`options`, but can't have '%''foo''function'. +APP: We can; what we don't allow currently is nesting. What you can't have is '%'foo':'function, whatever. You can just have '%'foo`options`, but can't have '%''foo''function'. MIH: Is this allowed? `...{@foo :date }....` @@ -129,15 +129,15 @@ EAO: I'm confused- APP: If we reuse `expression`, the parser would know where to find the pieces of the interior. But it would restrict what you could make the pieces of the interior; would be a limit to what you could do. The parser would return that to you in a structured way. -MIH: I was thinking, step back a little bit and wonder: what do we gain from all this, if we reserve all this? A parser can parse the unknown new syntax, an older parser, but then what? I can't use it, really, at runtime, because I've parsed it and don't know what to do within it anyway. That's the point: what's the benefit? It's useful, for instance- the way we have it now, I look at it and I know exactly that this is a string literal or a '$' something which is a variable, or a function name. That's good for localization. 
When you translate, I don't care what's inside the block, but it's good to know that it's "expiration date", end date, etc. As a translator that's useful for me. Once I get a block and have no idea what it is, that doesn't help me much. +MIH: I was thinking, step back a little bit and wonder: what do we gain from all this, if we reserve all this? A parser can parse the unknown new syntax, an older parser, but then what? I can't use it, really, at runtime, because I've parsed it and don't know what to do within it anyway. That's the point: what's the benefit? It's useful, for instance- the way we have it now, I look at it and I know exactly that this is a string literal or a '$' something which is a variable, or a function name. That's good for localization. When you translate, I don't care what's inside the block, but it's good to know that it's "expiration date", end date, etc. As a translator that's useful for me. Once I get a block and have no idea what it is, that doesn't help me much. APP: Presumably an implementation that was using this would know what it was and could describe it -MIH: I just wonder, what's the benefit of having a parser that supports future things. Ok, I have an old parser in my localization tool, you give me a new message, I won't explode I don't know what it is, will treat it as a black-box but I have zero smartness about it. +MIH: I just wonder, what's the benefit of having a parser that supports future things. Ok, I have an old parser in my localization tool, you give me a new message, I won't explode I don't know what it is, will treat it as a black-box but I have zero smartness about it. APP: Whatever it is would be opaque -EAO: Main benefit is it gives us space to consider the possibility later of doing something like a MessageFormat v2.1. A new release of the spec w/ some new features that doesn't break existing parsers. 
For example, if we introduce the '@' for global variables of some sort, so you could have '@foo:date` and `@foo` would come from some global scope, if we wanted to introduce that and didn't have the reserved space in the spec, all of the stuff that works in 2.0 will break. So whatever we release after that has to be 3.0. If we have the reserved space and we start using it, we can call that 2.1 and not 3.0. +EAO: Main benefit is it gives us space to consider the possibility later of doing something like a MessageFormat v2.1. A new release of the spec w/ some new features that doesn't break existing parsers. For example, if we introduce the '@' for global variables of some sort, so you could have '@foo:date`and`@foo` would come from some global scope, if we wanted to introduce that and didn't have the reserved space in the spec, all of the stuff that works in 2.0 will break. So whatever we release after that has to be 3.0. If we have the reserved space and we start using it, we can call that 2.1 and not 3.0. APP: Not just that, but remember that strings don't identify what version number they are. Lots of people will have strings, if you start using the new features, you won't have tools going "I don't understand what this is, but it's reserved, so I won't throw it away or complain that it's not well-formed. It's not valid to me, but it is well-formed". @@ -157,7 +157,7 @@ APP: So we should have that discussion. Is it a named item followed by opaque te EAO: So we could have something like a `:foo` with options and then it's followed by `@| = thing`, which would currently be an error, or are you saying we would reserve that for future use? -APP: Is the sigil always attached to a name? Or just, after me, before whatever the next token separator is, is opaque? I think it wants to be attached to a name. +APP: Is the sigil always attached to a name? Or just, after me, before whatever the next token separator is, is opaque? I think it wants to be attached to a name. 
MIH: I think if it's not attached to a name, it's not a sigil anymore. That's how I've seen it in a lot of places @@ -198,13 +198,14 @@ MIH: I think it's not necessarily a good idea to nest them. Imagine what you do APP: I'm not sure comments are a good idea either. Just bringing it up as a test case. Let's do the PR and then we'll come back and re-examine ## Topic: Allow `name` to start with a digit -* #350 + +- #350 https://github.com/unicode-org/message-format-wg/issues/350 APP: Had a long thread about it in the PR; I proposed allowing a name to start with a number, which we don't currently do -EAO: My preference: it seems like it makes stuff complicated and a little weird, we should do the XML thing of adding an '_' as prefix if we need to +EAO: My preference: it seems like it makes stuff complicated and a little weird, we should do the XML thing of adding an '\_' as prefix if we need to MIH: To summarize, I added some comments, but my take is also that we shouldn't. We are just adding complications. Now what happens if you have a mix of named and non-named arguments; it's becoming a mess @@ -250,7 +251,7 @@ EAO: Related in my head APP: I'm trying to reduce the mental overhead for users as much as we can. I think the more arcane we make things, the more of a tripping hazard there is. I do think the positional argument one is interesting simply b/c people have a ton of code in the world that already uses positional. This is a way they don't have to change their code to adopt our formatter. If we want to say that – if we want to build in a feature to support that, that would be where you have to use an underscore and that's fine. This is potentially a way to get around that. I'm more concerned about – I don't want to think too much about what I can put into the name of a variable. It can even be generated by code from data values over which people may not have much control. Generally will be ASCII but the ASCII can include numbers. That's where I'll stop. 
We should get a sense for where people are at. Eemeli and Mihai, I think you're both in the 'reject' camp. -EAO: Even where we would currently require an '_' prefix, we're allowing the possibility for a runtime implementation to see that it's not getting a bag of named options but rather an array of options. There is a pathway there for an implementation to enable old code to not need any updates. +EAO: Even where we would currently require an '\_' prefix, we're allowing the possibility for a runtime implementation to see that it's not getting a bag of named options but rather an array of options. There is a pathway there for an implementation to enable old code to not need any updates. APP: Outside the scope of what we're defining. We can recommend that. diff --git a/meetings/2023/notes-2023-04-24.md b/meetings/2023/notes-2023-04-24.md index 19862b28ff..637e48fa1e 100644 --- a/meetings/2023/notes-2023-04-24.md +++ b/meetings/2023/notes-2023-04-24.md @@ -1,17 +1,18 @@ ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Simon Clark (SCL) -* Mihai Nita - Google (MIH) -* Staś Małolepszy - Google (STA) -* Tim Chevalier - Igalia (TIM) + +- Addison Phillips - Unicode (APP) - chair +- Simon Clark (SCL) +- Mihai Nita - Google (MIH) +- Staś Małolepszy - Google (STA) +- Tim Chevalier - Igalia (TIM) Scribe: Simon Clark - SCL # Action items -* NEW: APP to check on license for software, test suite -* NEW: APP to update deliverable wording -* NEW: ALL: read function registry PR +- NEW: APP to check on license for software, test suite +- NEW: APP to update deliverable wording +- NEW: ALL: read function registry PR Next call: 8 May 2023 @@ -19,7 +20,7 @@ Next call: 8 May 2023 ## Info Share -SCL: managed to get a couple interns. Want to build an in browser message editor in JS. +SCL: managed to get a couple interns. Want to build an in browser message editor in JS. MIH: does the JS implementation by Eemeli play for you from gh? 
There’s a library STA: there is one for MF1 also MIH: also a sandbox where you can play with FLuent one. @@ -35,29 +36,30 @@ Mihai ⦅U⦆ Niță ICU4J: https://icu4j-demos.unicode.org/icu4jweb/formatTest.jsp ## Reserve sigils for future use -* Requested by: APP -* #360 #374 -Can we merge? EAO reviewed, Stas not blocking it. -Allow reserve sigil to come first. Prevent trailing spaces from becoming part of the message -No objections to merging. Going ahead. +- Requested by: APP +- #360 #374 + Can we merge? EAO reviewed, Stas not blocking it. + Allow reserve sigil to come first. Prevent trailing spaces from becoming part of the message + No objections to merging. Going ahead. ## Schedule and release plans -* requested by STA -* -Function Registry is big outstanding topic needing clarity. Is key large scale thing needs addressing. + +- requested by STA +- Function Registry is big outstanding topic needing clarity. Is key large scale thing needs addressing. Target is August release of ICU MIH - several open discussions tat need resolution -APP - what are our exit criteria to meet the release date? Spec clean up, implementations required, function registry spec, +APP - what are our exit criteria to meet the release date? Spec clean up, implementations required, function registry spec, STA - test suite requires definition APP - Doesn’t formatting numbers and dates require function registry? - required for initial release MIH - big JSON file from EAO that can be run through test at https://github.com/unicode-org/icu/blob/main/icu4j/main/tests/core/src/com/ibm/icu/dev/test/message2/FromJsonTest.java does not give same result as JSON + ``` new TestCase.Builder() .pattern("match {$foo :plural} when 1 {one} when * {other}") @@ -74,11 +76,11 @@ https://github.com/unicode-org/message-format-wg/blob/main/docs/goals.md 1 is easy to complete: STA 2 is well done. -APP: What is the formal definition of the data model? +APP: What is the formal definition of the data model? 
-MIH: There is a typescript representation: +MIH: There is a typescript representation: -EAO JSON schema as formal language description of data model - +EAO JSON schema as formal language description of data model - MIH - not runnable implementation necessarily. @@ -89,13 +91,13 @@ MIH - Typescript schema description can be found at https://github.com/unicode-o EAO - datamodel defines what is important in what is represented by the syntax. A parsed interface level for what a message looks like Gives an implementation an interface to transform messages. Allows, for instance, tooling to authoritatively rename a variable. Should not require an implementation to use the data model. Eg: spec does not define if functions can be redefined. -APP If I want to implement MF2, we have ABNF, what more do I need to write an interoperable implementation. +APP If I want to implement MF2, we have ABNF, what more do I need to write an interoperable implementation. EAO - data model is significant because it has transformed our own internal thinking on what an internationalized message can look like. Universal tool that can represent any message in any language. TIM - analogous to ECMA script lexical definition / spec. Prerequisite for release if we want good implementations -STA - requirement to ensure interoperable implementations. JSON schema would be valid way to define. +STA - requirement to ensure interoperable implementations. JSON schema would be valid way to define. EAO - https://json-schema.org/ @@ -103,7 +105,7 @@ APP - we have similar but slightly definition of what a data model needs to look Can we flesh out what we want to deliver by the fall? 3 - mapping to XLIFF and back? Post release according to MIH and EAO. -STA- point of XLIFF was to ensure nothing blocks working with XLIFF, not necessarily required to define in order to deliver. +STA- point of XLIFF was to ensure nothing blocks working with XLIFF, not necessarily required to define in order to deliver. 
EAO - XLIFF spec of MF2 will likely be dependent on MessageResource2.0 spec as well. @@ -113,13 +115,13 @@ APP - 4 - how to merge data model with current arguments - resolve to string EAO - 4 best way to answer deliverable is ??? -MIH - 4 is to define behaviour that is not described in the spec. +MIH - 4 is to define behaviour that is not described in the spec. EAO - keep “resolving” wording APP - get rid of the “translations” wording -TIM - should define what we mean by resolving in this contex.t It is an overloaded term +TIM - should define what we mean by resolving in this contex.t It is an overloaded term EAO - “resolving/Resolution” should be added to here @@ -147,7 +149,7 @@ APP 5 - Is about validating implementations. Is not code, but a set of test case MIH - challenges - upput may be different for different uses, may change based on function registry. -EAO - attempt to be minimally dependent on any function registry. Requires at least some function registry calls, as that is important part of spec. Hopes someone +EAO - attempt to be minimally dependent on any function registry. Requires at least some function registry calls, as that is important part of spec. Hopes someone else can drive test suite - non implementor, possibly? APP - test-suite is closely tied and coordinated with requirements / spec @@ -164,13 +166,12 @@ EAO - happy with anything in the javascript set of formatters in the function re EAO - Would like to be able to test at least 2 different outputters ??? -MIH - Test suite should be portable, +MIH - Test suite should be portable, EAO - overlap between javascript formatters and ICU formatters. - - ## Function Registry (continued) -* Requested by: STA + +- Requested by: STA Discussion of the function registry. Two of the three models had sections on this. 
diff --git a/meetings/2023/notes-2023-05-08.md b/meetings/2023/notes-2023-05-08.md index 0b646ccffd..b881be7524 100644 --- a/meetings/2023/notes-2023-05-08.md +++ b/meetings/2023/notes-2023-05-08.md @@ -1,20 +1,21 @@ ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Tim Chevalier - Igalia (TIM) -* Mihai Nita - Google (MIH) -* Staś Małolepszy - Google (STA) -* Richard Gibson (RGN) -* Eemeli Aro (EAO) -* Zibi Braniecki - Amazon (ZBI) -Scribe: Mihai Nita +- Addison Phillips - Unicode (APP) - chair +- Tim Chevalier - Igalia (TIM) +- Mihai Nita - Google (MIH) +- Staś Małolepszy - Google (STA) +- Richard Gibson (RGN) +- Eemeli Aro (EAO) +- Zibi Braniecki - Amazon (ZBI) +Scribe: Mihai Nita Action items + - [ ] APP to check on license for software, test suite - [x] APP to update deliverable wording - [ ] ALL: read function registry PR -- [ ] NEW: APP: review PRs for next time for commit +- [ ] NEW: APP: review PRs for next time for commit # Agenda: @@ -25,8 +26,9 @@ Action items **Topic:** Action Item Review **Topic:** Function Registry (continued) -* Requested by: STA -* https://github.com/unicode-org/message-format-wg/pull/368 + +- Requested by: STA +- https://github.com/unicode-org/message-format-wg/pull/368 ZBI: tooling, customer of the registry. Provide a good experience for the tools (CAT tools, refactoring, etc). @@ -40,7 +42,7 @@ APP: consider an antipattern to have different dialects of MF. It would be ideal EAO: if the options match it is less maintenance. With transformat -MIH: Agree with Addison. Difficult to +MIH: Agree with Addison. Difficult to TIM: different layers: does that mean … @@ -82,7 +84,7 @@ EAO: _missing_ STA: no matter what we decide on the minimal set, we need to agree on the schema. No experience maintaining something like this. -APP: we actually have some experience. LDML data + ICU4C / 4J / 4X + ECMAScript are very very close to each other. +APP: we actually have some experience. 
LDML data + ICU4C / 4J / 4X + ECMAScript are very very close to each other. MIH: the current MF group does no have to maintain the registry content. Same as the group that designed BCP 47 is the the group that maintains the IANA Language Registry. @@ -96,8 +98,8 @@ To show that a minimal set can be implemented on top of it. APP: goal for the next 2 weeks: merge this and do PRs on top of it ### Actions items: + - [ ] APP: go through the PRs, commit / not commit - [ ] STA: cleanup registry PR ## Topic: AOB? - diff --git a/meetings/2023/notes-2023-05-22.md b/meetings/2023/notes-2023-05-22.md index 8ca7edbffe..1b89718a46 100644 --- a/meetings/2023/notes-2023-05-22.md +++ b/meetings/2023/notes-2023-05-22.md @@ -3,15 +3,16 @@ Attendees: Please fill “attendee” block with your name, affiliation and a 3-letter acronym for the scribe to use (see examples in “previous attendees”): ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Mihai Niță - Google (MIH) -* Tim Chevalier - Igalia (TIM) -* Eemeli Aro (EAO) -* Ujjwal Sharma (UJJ) -* Staś Małolepszy - Google (STA) -* Romulo Cintra - Igalia (RCA) -* Elango Cheran - Google (ECH) -* Richard Gibson - OpenJSF (RGN) + +- Addison Phillips - Unicode (APP) - chair +- Mihai Niță - Google (MIH) +- Tim Chevalier - Igalia (TIM) +- Eemeli Aro (EAO) +- Ujjwal Sharma (UJJ) +- Staś Małolepszy - Google (STA) +- Romulo Cintra - Igalia (RCA) +- Elango Cheran - Google (ECH) +- Richard Gibson - OpenJSF (RGN) **Scribe:** ECH @@ -21,8 +22,6 @@ STA: formatting specification; a reference implementation? ERO: meta-topic: what is the quality of the spec text required? - - ### Topic: Info Share _none_ @@ -31,7 +30,7 @@ _none_ APP to check on license for software, test suite ALL: read function registry PR -APP: review PRs for next time for commit +APP: review PRs for next time for commit STA: cleanup registry PR APP: I am still working on the license. All of you had homework to read the function registry PR. I have reviewed the PRs. 
STA, what is the status of the registry PR? @@ -64,40 +63,35 @@ EAO: Can we make active PR review a part of our regular meetings? APP: Yes, I will do that going forward. - ### Topic: Active PR review Discussion of active PRs. We will merge or reject them in the call. - -| PR | Description | Recommendation | -|------|-------------|----------------| -| #197 | Consensus 7 | Discuss (see below) | -| #278 | Add examples in other resource languages | Abandon | -| #315 | Bidi support | Discuss (see below) | -| #318 | Format to Parts | Reject (see below) | -| #357 | Unknown Markup error | Reject (obsolete) | -| #364 | Unquoted plain expression arguments | Merge with edits | -| #368 | Draft of registry | Discuss | -| #372 | Column-first | Merge | -| #381 | Variable overrides | Merge with edits | -| #382 | Literal Resolution | Discuss | - - +| PR | Description | Recommendation | +| ---- | ---------------------------------------- | ------------------- | +| #197 | Consensus 7 | Discuss (see below) | +| #278 | Add examples in other resource languages | Abandon | +| #315 | Bidi support | Discuss (see below) | +| #318 | Format to Parts | Reject (see below) | +| #357 | Unknown Markup error | Reject (obsolete) | +| #364 | Unquoted plain expression arguments | Merge with edits | +| #368 | Draft of registry | Discuss | +| #372 | Column-first | Merge | +| #381 | Variable overrides | Merge with edits | +| #382 | Literal Resolution | Discuss | PR #197 is about an old WG consensus. Let's double-check that consensus quickly and merge in the call. PR #315 about bidi needs another round of edits and should be discussed in a future call. PR #318 about formatToParts is not written in a way that fits into the spec. A version that is "spec ready" should be produced instead. The recommendation "discuss" is to ensure there is WG consensus before merging. The recommendation "merge with edits" is to merge once existing comments have been addressed. 
- #### #197 Consensus 7 APP: This is from years ago. Should we merge this in, or should we review it? RCA: The comment that I added on the PR as a review required block was based on meeting feedback. -APP: If I merge it right now, are there any objections? And you can file an issue afterwards if you want to make changes. +APP: If I merge it right now, are there any objections? And you can file an issue afterwards if you want to make changes. STA: Some of this wording is obsolete. But maybe we can make changes in a followup. @@ -115,19 +109,16 @@ APP: This was filed by a former colleague who has not responded. I don’t think APP: I propose that we defer this until we get to the point where we handle it. I can meet with EAO and anyone else interested at that point. - #### #318 Format to Parts APP: I think this should be turned into a proposal, either as a doc or a PR. MIH: That sounds good. - #### #357 Unknown Markup error APP: This is about markup. This is now obsolete, so I will close it. - #### #364 Unquoted plain expression arguments APP: What is the status of this, EAO? @@ -162,15 +153,13 @@ STA: The XML definition of `nmtoken` allows for starting characters that conflic APP: The direction we want to go, as has been said, is that everything provided that isn’t annotated as being some other type is a value, and some values are allowed to be unquoted. - #### #372 Column-first APP: I think the discussion here has become quiet. Thanks EAO for doing the work on this. I think we can merge this, and I hear no objections. - #### #381 Variable overrides -APP: I haven’t heard responses, so I want to leave time for feedback. The example I was giving was that you can refer to the same thing on the left and right hand sides, which is confusing: `let $foo = {$foo} something`. You probably want instead to decorate that +APP: I haven’t heard responses, so I want to leave time for feedback. 
The example I was giving was that you can refer to the same thing on the left and right hand sides, which is confusing: `let $foo = {$foo} something`. You probably want instead to decorate that TIM: Can we not merge this PR until I have time to review. Name shadowing is a tricky concept. @@ -182,10 +171,6 @@ EAO: Let’s continue this in the PR. APP: We need more reviewers for this PR. - - - - ### Topic: Formatting spec STA: Could I create a presentation in 2 weeks to outline my ideas? @@ -206,11 +191,10 @@ EAO: Are you closing #318 and #357? APP: Yes. - ### Topic: Function Registry (continued) -Continued discussion of the function registry. Two of the three models had sections on this. -https://github.com/unicode-org/message-format-wg/pull/368 +Continued discussion of the function registry. Two of the three models had sections on this. +https://github.com/unicode-org/message-format-wg/pull/368 ### Topic: AOB? diff --git a/meetings/2023/notes-2023-06-05.md b/meetings/2023/notes-2023-06-05.md index 912f28907d..375817597d 100644 --- a/meetings/2023/notes-2023-06-05.md +++ b/meetings/2023/notes-2023-06-05.md @@ -1,13 +1,14 @@ # 2023-06-05 MFWG Teleconference ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Eemeli Aro (EAO) -* Elango Cheran - Google (ECH) -* Staś Małolepszy - Google (STA) -* Tim Chevalier - Igalia (TIM) -* Mihai Niță - Google (MIH) -* Richard Gibson - OpenJSF (RGN) + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro (EAO) +- Elango Cheran - Google (ECH) +- Staś Małolepszy - Google (STA) +- Tim Chevalier - Igalia (TIM) +- Mihai Niță - Google (MIH) +- Richard Gibson - OpenJSF (RGN) **Scribe:** RGN (until 10:00 Pacific) @@ -21,7 +22,6 @@ EAO: Demo of TextMate-based syntax highlighting EAO: TextMate applies rules one line at a time, but some of the grammar is actually tricky to get write (for example, `let`, because there’s no clear “end” indication) EAO: It’s currently doing a full reparse on every change and feeling fast enough, but 
supporting incremental parsing will be needed later on. - ### Topic: Action Item Review APP to check on license for software, test suite @@ -29,22 +29,22 @@ EAO: Which license will carry the ABNF? APP: It should be under Software & Data. ECH: The license since 2016 is based on MIT. It is being redone recently to be based on Apache. Don’t quote me on that because I’m not a lawyer. That’s my understanding of how permissive it is. - ## Topic: Function Registry (continued) Requested by: STA Discussion of the function registry. Two of the three models had sections on this. ## Topic: Active PR review + Discussion of active PRs. We will merge or reject them in the call. -| PR | Description | Recommendation | -|------|-------------|----------------| -| #315 | Bidi support | Discuss (see below) | -| #364 | Replace `nmtoken` with `unquoted` | Merge with edits | -| #368 | Draft of registry | Discuss | -| #381 | Variable overrides | Merge | -| #382 | Literal Resolution | Merge | +| PR | Description | Recommendation | +| ---- | ----------------------------------- | ---------------------------- | +| #315 | Bidi support | Discuss (see below) | +| #364 | Replace `nmtoken` with `unquoted` | Merge with edits | +| #368 | Draft of registry | Discuss | +| #381 | Variable overrides | Merge | +| #382 | Literal Resolution | Merge | | #385 | Clarifications to pattern selection | Merge (discussion to follow) | The recommendation "discuss" is to ensure there is WG consensus before merging. The recommendation "merge with edits" is to merge once existing comments have been addressed. @@ -98,4 +98,3 @@ Flat list vs hierarchies. // this meeting's notes are incomplete ## Topic: AOB? 
- diff --git a/meetings/2023/notes-2023-06-19.md b/meetings/2023/notes-2023-06-19.md index 408f4cc9ed..fb0923060f 100644 --- a/meetings/2023/notes-2023-06-19.md +++ b/meetings/2023/notes-2023-06-19.md @@ -3,14 +3,14 @@ Attendees: Please fill “attendee” block with your name, affiliation and a 3-letter acronym for the scribe to use (see examples in “previous attendees”): ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Tim Chevalier - Igalia (TIM) -* Chris Dieringer - Walmart (CMD) -* Mihai Niță - Google (MIH) -* Staś Małolepszy - Google (STA) -* Eemeli Aro (EAO) -* Ujjwal Sharma - Igalia (USA) +- Addison Phillips - Unicode (APP) - chair +- Tim Chevalier - Igalia (TIM) +- Chris Dieringer - Walmart (CMD) +- Mihai Niță - Google (MIH) +- Staś Małolepszy - Google (STA) +- Eemeli Aro (EAO) +- Ujjwal Sharma - Igalia (USA) Today’s Scribe: TIM @@ -18,7 +18,6 @@ Today’s Scribe: TIM EAO: Want to overview the long PR to add the rest of the formatting. Premature for discussion, but want to briefly introduce it to help people review the PR https://github.com/unicode-org/message-format-wg/pull/396 - ### Topic: Info Share Presentation this week (Thursday) by Addison at the Unicode CLDR thingy. @@ -47,13 +46,13 @@ Discussion of active PRs. We will merge or reject them in the call. The recommendation "discuss" is to ensure there is WG consensus before merging. The recommendation "merge with edits" is to merge once existing comments have been addressed. ## Topic: Bidi support -https://github.com/unicode-org/message-format-wg/pull/315 + +https://github.com/unicode-org/message-format-wg/pull/315 Discussion of bidirectional text support and specifically how to handle auto-isolation of placeables. APP: I am satisfied with the PR; could make more comments, but I think merging is the right next step. Does anyone want to discuss? [silence] APP: Should I merge it? Merging. 
- ## Topic: Name shadowing https://github.com/unicode-org/message-format-wg/pull/381 @@ -75,10 +74,11 @@ MIH: I’m trying to clarify overriding what you call “variables defined elsew TIM: [Couldn’t write down my own comment] -STA: I’m on the opposite side. I strongly feel this should be allowed, from the perspective of a native speaker of a Slavic language. To give reasons, Addison’s comments had a list of very good use cases that I think other commenters were asking for. I want to +1. We mentioned complexity and the need to refer back to the spec. I have a sense, this is a double-edged sword. You will also need to check back with the spec if we don’t allow overriding. There are two mental models: why choose the non-intuitive one? Crucially, there is a bit of complexity, true, but I argue that this is very useful complexity in the spec that will be used. There is a big inconsistency with how we handle the message arguments when shadowing them. This is what Addison also mentioned. So maybe we need to solve this first. I don’t see releasing this thing where we can shadow message arguments with local variables, but can’t shadow local variables with each other. +STA: I’m on the opposite side. I strongly feel this should be allowed, from the perspective of a native speaker of a Slavic language. To give reasons, Addison’s comments had a list of very good use cases that I think other commenters were asking for. I want to +1. We mentioned complexity and the need to refer back to the spec. I have a sense, this is a double-edged sword. You will also need to check back with the spec if we don’t allow overriding. There are two mental models: why choose the non-intuitive one? Crucially, there is a bit of complexity, true, but I argue that this is very useful complexity in the spec that will be used. There is a big inconsistency with how we handle the message arguments when shadowing them. This is what Addison also mentioned. So maybe we need to solve this first. 
I don’t see releasing this thing where we can shadow message arguments with local variables, but can’t shadow local variables with each other. EAO: (from chat) asked - : Should this be valid MF2, if $num is originally given as an external variable? + ``` let $num = {$num :number} {The number is {$num}} @@ -102,8 +102,9 @@ MIH: I agree with your use case, sequence of definitions - my solution is to use APP: Nothing prohibits you from doing what you suggest, but we’re suggesting allowing the model that you see -MIH: I’m trying to argue that seeing that thing with 3 different names, it’s instantly obvious what it means. I know Eemeli said “obvious” is a personal thing, but I don’t think anyone would be confused by (the unique names). +MIH: I’m trying to argue that seeing that thing with 3 different names, it’s instantly obvious what it means. I know Eemeli said “obvious” is a personal thing, but I don’t think anyone would be confused by (the unique names). From chat: + ``` let $foo = {$foo :text-transform transform=uppercase} let $foo = {$foo :trim} @@ -111,13 +112,14 @@ let $foo = {$foo :sanitize target=html} ``` can similarly use different names, and it will not affect leveraging: + ``` let $foo_tmp1 = {$foo :text-transform transform=uppercase} let $foo_tmp2 = {$foo_tmp1 :trim} let $foo = {$foo_tmp2 :sanitize target=html} ``` -I think the second is less confusing. For arguments, I’m ok with not allowing shadowing if that’s less confusing. We could also use different sigils for locals vs. globals. I don’t know if it’s worth the trouble. +I think the second is less confusing. For arguments, I’m ok with not allowing shadowing if that’s less confusing. We could also use different sigils for locals vs. globals. I don’t know if it’s worth the trouble. APP: A challenge is that the message can’t “see” the argument list. That might be an argument for changing the sigil for local variables, to prevent accidentally overriding a value that’s being passed. 
I’m not sure we want to reopen that. I promised to timebox; clearly we’re not in agreement yet with my recommendation to merge. I’d like to see if we can work this to “done” by the next call. We need to make a decision and move along. I’m not sure that I’ve heard any new arguments today; we’re still split into two camps. I’d thought we were closer to done. Is that okay? @@ -139,7 +141,7 @@ EAO: This is your thing now, we’ve timeboxed. STA: The fact that I’m silent does not mean I like the idea of this -APP: It would be work to convince me, but let’s see. +APP: It would be work to convince me, but let’s see. STA: I’ll follow up in Github comments @@ -152,6 +154,7 @@ APP: any objections to merging? APP: merged ### Topic: Pattern Selection text + https://github.com/unicode-org/message-format-wg/pull/388 APP: editorial change to [...] @@ -194,9 +197,9 @@ EAO: I think that’s a separate PR MIH: At first glance, this makes total sense, just renaming terms. The trouble is, in grammar productions, that might make the grammar more complicated. Now you need a lot more context to understand: should I produce an operand or not? I see something and I have to know what I produce from the grammar. -APP: Mark made a comment on this earlier in our history. Sometimes you define a production for something that’s not necessarily structurally but makes it easier to talk about things in the spec coherently. +APP: Mark made a comment on this earlier in our history. Sometimes you define a production for something that’s not necessarily structurally but makes it easier to talk about things in the spec coherently. -MIH: If we take the grammar to be kind of like that, where we don’t have to produce exactly the same grammar, it’s fine. +MIH: If we take the grammar to be kind of like that, where we don’t have to produce exactly the same grammar, it’s fine. APP: You can always add a level of indirection. Doesn’t change the grammar’s functional interpretation. 
This is to help people to understand. @@ -207,7 +210,7 @@ EAO: Part of this might be clarified by PR #393 – adding a JSON-based intercha APP: So what do we want to do with #395? Any objection? -MIH: I’m fine to merge it. +MIH: I’m fine to merge it. ## Topic: Function introducers/negative numbers @@ -225,7 +228,7 @@ APP: This is what this topic is for STA: The alternative is to reconsider the open and close prefixes. Today we use +/-. I filed two PRs with two different alternatives. One is more ??, more reasonable, I would call it ?? - the one with colons. The other one is more like “maybe this could fly”. All of this is not ideal. I don’t think we can find prefixes that will feel intuitive. +/- are equally cryptic. I want to acknowledge that as a baseline for us. At least that’s what I claim. I see value in replacing them with something just as cryptic but in the case of :: and :/, it’s something that has some other benefits. The colon option doubles down on : as a generic function introducer. Maybe then we could drop the `reserved` production. -APP: I’m not sure if we have a PR open to reserve some of `reserved` for private use. The two-sigil thing might be a way to do that. But this is reopening the discussion of whether to change +/- or the reserved sigils. I shouldn’t say “reserved”, the non-reserved currently-in-use sigils. +APP: I’m not sure if we have a PR open to reserve some of `reserved` for private use. The two-sigil thing might be a way to do that. But this is reopening the discussion of whether to change +/- or the reserved sigils. I shouldn’t say “reserved”, the non-reserved currently-in-use sigils. EAO: I would like to contest a little bit the assertion that all of these possibilities are equally cryptic. Many are cryptic, yes; of all the options we’ve considered, I am not aware of anything less cryptic than the +/- pair of things. I’ve shown this can be handled with very little complexity cost to an actual implementation. 
When looking at something like the Firefox Fluent set of messages we have, this seems like one of the closest things to what we’re actually working towards everyone being able to do with their formatting. Fluent is structurally somewhat similar to MessageFormat and in this space, I can see that things like markup are used by about 5% of all messages while negative literal numbers are not used at all. My strong preference is to do what’s necessary to make markup as good and as non-cryptic as possible, even if the cost is to make the workaround for negative numbers a little bit less complex. @@ -233,6 +236,7 @@ APP: I don’t disagree. I think my concern with negative numbers is that I want EAO: I added in the chat: We could have: + ``` literal = quoted / unquoted / negative negative = "-" ( digit / "." ) … @@ -254,22 +258,23 @@ APP: We could look at different sigil options. CMD: I’m wondering if there’s a good reference issue that I can go study some of the motivating/originating markup cases. I would like some background so I don’t reopen any past decisions that were heavily debated. To have meaningful input, I feel like I need to be able to see the past, e.g. the origins of markup even being in the spec. -APP: To clarify, we don’t have markup in the spec; what we have are functions and those functions can produce markup if they want to. Some are said to be starters or terminators, opens or closes, but our spec doesn’t say how those are interpreted. They’re not required to be paired or balanced. +APP: To clarify, we don’t have markup in the spec; what we have are functions and those functions can produce markup if they want to. Some are said to be starters or terminators, opens or closes, but our spec doesn’t say how those are interpreted. They’re not required to be paired or balanced. MIH: You described functions being open or close; I look at these as being part of the whole placeholder. 
For example (from chat): + ``` -{b :html} .=> open +{b :html} .=> open {hr :html} => standalone ``` -There’s no real difference between placeholders and markup. +There’s no real difference between placeholders and markup. CMD: I’m going to void my comment and go dig in more. The nomenclature is kind of mixed up in the issue tracker and implementations. APP: it’s a relatively recent set of decisions, so you’ll see some hold-overs esp. In implementation-land, in terms of us trying to build up markup indirectly. TIM: [suggests Chris could add a document based on old minutes/etc. That explains this history, if it also helps him with his understanding, so the explanation would be in one place; submit a PR] -APP: there’s a docs/ directory in the spec repo; this could go there. If you think something needs clarification, feel free to submit a PR +APP: there’s a docs/ directory in the spec repo; this could go there. If you think something needs clarification, feel free to submit a PR ### Topic: Data model PR @@ -281,9 +286,9 @@ STA: I looked at the PR briefly. I know we care about the JSON schema, but we al USA: [...I missed some of this…] I’ve been working on documenting some of this. Especially with the function registry, there’s not much documentation/specification of the built-in functions. The number or datetime functions would need to have a similar feature set on all implementations. -APP: That’s an interesting question, and an important thing that we’ve talked about briefly before; are we going to define a standard function registry that implementors are required to implement, and define how they can extend it? We’ve had the discussion before and I’m a proponent of saying we should have a core set of functions; I think it would be a bad thing if there were different ways on different platforms to do the same thing. We have a clear understanding of how a bunch of these selectors should work. 
It’s fine for me if an implementation wants to extend something in a custom way, but I don’t want to have to change the set of arguments for (e.g.) the number formatters every time I change programming languages, templating languages, etc. This seems like an anti-pattern. I would prefer to have a standard core set. I tend to think we’ll want to make agreements on what the bag of options looks like. +APP: That’s an interesting question, and an important thing that we’ve talked about briefly before; are we going to define a standard function registry that implementors are required to implement, and define how they can extend it? We’ve had the discussion before and I’m a proponent of saying we should have a core set of functions; I think it would be a bad thing if there were different ways on different platforms to do the same thing. We have a clear understanding of how a bunch of these selectors should work. It’s fine for me if an implementation wants to extend something in a custom way, but I don’t want to have to change the set of arguments for (e.g.) the number formatters every time I change programming languages, templating languages, etc. This seems like an anti-pattern. I would prefer to have a standard core set. I tend to think we’ll want to make agreements on what the bag of options looks like. -EAO: regarding specifically the data model proposal: I picked JSON Schema because I wanted to pick something that would work as an interchange format for messages so different implementations could rely on this. I don’t think JSON schema is appropriate for defining the function registry. We have several different formats on the table that we’re using for solving different parts of this. +EAO: regarding specifically the data model proposal: I picked JSON Schema because I wanted to pick something that would work as an interchange format for messages so different implementations could rely on this. 
I don’t think JSON schema is appropriate for defining the function registry. We have several different formats on the table that we’re using for solving different parts of this. EAO: Mihai, where were you planning on documenting the data model? @@ -294,13 +299,10 @@ EAO: That would only be available in the process or program that is using ICU sp MIH: They are Java interfaces with objects implementing them. The interfaces are designed to follow the ideas we had way back with TypeScript, updated for the grammar as it was when I implemented it. {Open issue review was deferred] + ### Topic: Open Issue Review + https://github.com/unicode-org/message-format-wg/issues Currently we have 85 open. - - ## Topic: AOB? - - - diff --git a/meetings/2023/notes-2023-07-03.md b/meetings/2023/notes-2023-07-03.md index 7f216cad61..42b777fd0b 100644 --- a/meetings/2023/notes-2023-07-03.md +++ b/meetings/2023/notes-2023-07-03.md @@ -1,6 +1,7 @@ -# 03 July 2023 | MessageFormat Working Group Regular Teleconference +# 03 July 2023 | MessageFormat Working Group Regular Teleconference ### Attendees + - Addison Phillips - Unicode (APP) - chair - Staś Małolepszy - Google (STA) - Tim Chevalier - Igalia (TIM) @@ -13,13 +14,16 @@ Today’s Scribe: STA --- -## Agenda + +## Agenda + ### Topic: Agenda Review ### Topic: Info Share -* Presentation at CLDR event -* https://thenewstack.io/whats-next-for-javascript-new-features-to-look-forward-to/ -* EAO: PHP is interested in MF2. + +- Presentation at CLDR event +- https://thenewstack.io/whats-next-for-javascript-new-features-to-look-forward-to/ +- EAO: PHP is interested in MF2. ### Topic: Action Item Review @@ -41,13 +45,14 @@ PR_ Proposals: Make local variables use a different sigil - If yes, use one character or two? Which character(s)? +If yes, use one character or two? Which character(s)? Make local variables immutable? -Change open and close sigils to avoid -? -If yes, what sigils or sequences to use? +Change open and close sigils to avoid -? 
+If yes, what sigils or sequences to use? Should name, etc. use Nmtoken or some other rules? ### Topic: Discussion of default registry requirements + _An open question is whether MFv2 will provide a default registry of functions/selectors that implementations are required to implement. If such a registry were created, what should go in it (what are the inclusion criteria)? If we do not create a default registry, how will we prevent divergence of the syntax between implementations?_ **CONSENSUS:** to have a core registry @@ -59,7 +64,8 @@ MIH: propose text and proposed XML for default registry ## Notes ### Topic: Active PR review -Discussing https://github.com/unicode-org/message-format-wg/pull/404 + +Discussing https://github.com/unicode-org/message-format-wg/pull/404 STA: Any reason why there are 2 private-use sigils? @@ -77,7 +83,7 @@ STA: I'd like to ask to postpone merging; didn't have time to review yet. Topic: Discussion of default registry requirements -USA: https://notes.igalia.com/zIhRAUfURuWqRIa18kcjTQ?both#Others +USA: https://notes.igalia.com/zIhRAUfURuWqRIa18kcjTQ?both#Others EAO: Support anything that is a subset of JS. @@ -98,15 +104,15 @@ APP: General agreement to have the core registry; need more work to figure out w MIH: Propose the core registry functions in form of registry definitions. MIH: Draft of the strategy for how to extend and use the core registry. - ### Topic: Open/Close function syntax, naming, and immutability. + Discussing new sigil namespace for local variables. STA: Context: the goal is to enable static analysis and detect typos and referencing unknown local variables. EAO: Any additional sigil/symbol has a high cost to users. -APP: Separate sigils address the immutability question. If they are separate then it's clear that message arguments are immutable. +APP: Separate sigils address the immutability question. If they are separate then it's clear that message arguments are immutable. 
STA: Separate sigils make immutability of local variables orthogonal to immutability of message arguments (which is axiomatic). @@ -146,7 +152,7 @@ TIM: It sounds like we're discussing lazy vs. eager. Maybe we should start with ## Chat (verbatim) -``` +```` You 9:32 AM https://docs.google.com/document/d/1gJ92S0roqvXYmv7mmKb2ICQsZ5Z5XSn6WLgGFNcq6S0/edit @@ -234,7 +240,7 @@ Mihai ⦅U⦆ Niță 11:24 AM $foo = {13} $bar = {$foo} $foo = {$bar} harder to read, but not ambiguous -``` +```` — diff --git a/meetings/2023/notes-2023-07-10.md b/meetings/2023/notes-2023-07-10.md index 291b3ebc26..a20d46fe5b 100644 --- a/meetings/2023/notes-2023-07-10.md +++ b/meetings/2023/notes-2023-07-10.md @@ -1,17 +1,17 @@ -# 10 July 2023 | MessageFormat Working Group Regular Teleconference +# 10 July 2023 | MessageFormat Working Group Regular Teleconference ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Chris Dieringer - Walmart (CMD) -* Eemeli Aro (EAO) -* Elango Cheran - Google (ECH) -* Matt Radbourne - Bloomberg (MRR) -* Mihai Niță - Google (MIH) -* Staś Małolepszy - Google (STA) -* Tim Chevalier - Igalia (TIM) -* Richard Gibson - OpenJSF (RGN) -* Zibi Braniecki - Amazon (ZBI) +- Addison Phillips - Unicode (APP) - chair +- Chris Dieringer - Walmart (CMD) +- Eemeli Aro (EAO) +- Elango Cheran - Google (ECH) +- Matt Radbourne - Bloomberg (MRR) +- Mihai Niță - Google (MIH) +- Staś Małolepszy - Google (STA) +- Tim Chevalier - Igalia (TIM) +- Richard Gibson - OpenJSF (RGN) +- Zibi Braniecki - Amazon (ZBI) Scribe: CMD, ECH @@ -23,7 +23,7 @@ No new community solicited topics ## Topic: Info Share -Introductions: +Introductions: CMD: From Customer Experience org. Would like to unify efforts in supporting our international footprint @@ -42,6 +42,7 @@ APP: provide pro/con comparison for immutability/namespacing discussion (all): Read https://github.com/unicode-org/message-format-wg/issues/299 ## Topic: Active PR review + _Discussion of active PRs. 
We will merge or reject them in the call._ > #414 @@ -95,10 +96,12 @@ MIH: (paraphrase) we need mapping into runtime primitives (not strings) APP: (paraphrase) Acknowledged, read thread https://github.com/unicode-org/message-format-wg/issues/41 ## Topic: Open Issue Review + https://github.com/unicode-org/message-format-wg/issues Currently we have 73 open. ## Topic: Interchange Data Model + _Let’s discuss @eemeli’s proposal for an interchange data model_ EAO: The purpose is to create an optional data model for the representation of a parsed message. @@ -109,7 +112,7 @@ MIH: We already attempted to codify the data model way back when, and we had con CMD: Something like this is definitely needed. We can focus on the syntax, but we need a target that compilers can use across platforms. Right now, everyone has to run their own compilations and source their own stuff. FormatToParts is another layer of serialization. I do want to see some sort of specification. -STA: I agree with CMD. I like that the PR exists because we would have an optional but canonical +STA: I agree with CMD. I like that the PR exists because we would have an optional but canonical ECH: Observations on conflicting terms, “optional vs canonical”, so what does it mean to have such a thing that is canonical yet optional. Also, we’ve been talking about the data model and trying to codify it for the majority of our time, so where did we lose the thread of this to focus on just the syntax? @@ -133,7 +136,7 @@ STA: We all agree that we want a description of the data model, but we don’t h EAO: Would making that change be sufficient? -MIH: No, the problem with JSON / JSON Schema / XML that we experienced previously was that they were not powerful enough to represent all of the concepts that we needed. 
For example, we want to indicate that a map is ordered, but that is not something +MIH: No, the problem with JSON / JSON Schema / XML that we experienced previously was that they were not powerful enough to represent all of the concepts that we needed. For example, we want to indicate that a map is ordered, but that is not something ## Topic: Open/Close function syntax, naming, and immutability. @@ -163,13 +166,10 @@ APP: We already did that when we deviated from `nmtoken`. EAO: I like the `:`, `+`, `-`. Those are sigils that have some meaning to people not yet familiar with MF 2.0. -MIH: To STA’s question about whether `+` and `-` are descriptions of the function or the placeholder, I see them as descriptions of the placeholder. The placeholder types are `OPEN` and `CLOSE` (where `STANDALONE` might be another such type). Whereas, it would be strange to consider them +MIH: To STA’s question about whether `+` and `-` are descriptions of the function or the placeholder, I see them as descriptions of the placeholder. The placeholder types are `OPEN` and `CLOSE` (where `STANDALONE` might be another such type). Whereas, it would be strange to consider them ## Topic: AOB? Next steps APP: As an input to the discussion for the next time, here is the [discussion from the W3C i18n WG](https://www.w3.org/2023/07/06-i18n-minutes.html#t07) regarding shadowing / im-/mutability of local variable definitions. Initially, they had one interpretation, but as they looked at it more, they changed their mind. 
- - - diff --git a/meetings/2023/notes-2023-07-24.md b/meetings/2023/notes-2023-07-24.md index 6ae5d6dfad..80579c409f 100644 --- a/meetings/2023/notes-2023-07-24.md +++ b/meetings/2023/notes-2023-07-24.md @@ -1,15 +1,15 @@ # MessageFormat WG teleconference 2023-07-24 ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Tim Chevalier - Igalia (TIM) -* Eemeli Aro (EAO) -* Mihai Niță - Google (MIH) -* Ujjwal Sharma (USA) -* Staś Małolepszy - Google (STA) -Scribe: USA +- Addison Phillips - Unicode (APP) - chair +- Tim Chevalier - Igalia (TIM) +- Eemeli Aro (EAO) +- Mihai Niță - Google (MIH) +- Ujjwal Sharma (USA) +- Staś Małolepszy - Google (STA) +Scribe: USA To request that the chair add an issue to the agenda, add the label Agenda+ To request that the chair add an agenda item, send email to the message-format-wg group email. @@ -25,8 +25,8 @@ https://github.com/tc39/proposal-intl-messageformat/pull/22 APP: Been performing a cleanup of the repo, don’t be alarmed. Will move the agenda to the wiki since it’s a bit noisy to commit for each update to the agenda. - ## Topic: Action Item Review + MIH: propose text and proposed XML for default registry APP: provide pro/con comparison for immutability/namespacing discussion @@ -34,37 +34,38 @@ APP: provide pro/con comparison for immutability/namespacing discussion APP: Did about half of that in an issue but got sidetracked, should manage to have a productive discussion, let’s see. ## Topic: Active PR review + Discussion of active PRs. We will merge or reject them in the call. The recommendation "discuss" is to ensure there is WG consensus before merging. The recommendation "merge with edits" is to merge once existing comments have been addressed. Discussion of active PRs. We will merge or reject them in the call. -* PR #399 +- PR #399 EAO: Stas mentioned that we should hold off on this. STA: Would appreciate it if we could hold off on this for a bit. 
-* PR #432 +- PR #432 EAO: In favor, but want to highlight but we can’t add a selector with a key “<10” later for instance. The cases where expression for the selector outputs a list of keys for the inputs. Just wanted to highlight it so nobody is surprised later. -STA: While this indeed simplifies the signature, it requires the selector to pass all the values in a bag so it disallows the “mini DSL” use case. +STA: While this indeed simplifies the signature, it requires the selector to pass all the values in a bag so it disallows the “mini DSL” use case. APP: A better way to express it is by making the result ordered by preference since some lists might as well be really big. EAO: I would suggest you two to check out the PR since we’d need to address the issues with selectors one way or another. -* PR #431 +- PR #431 APP: This is ready but we should revisit this later. MIH: These changes were introduced three days ago, process-wise, should there be a deadline? -* PR #421 +- PR #421 EAO: I would prefer you to merge it and make follow-on changes later. -* PR #420 +- PR #420 EAO: Happy to see this be merged, we can iterate on it further. @@ -76,7 +77,7 @@ STA: Let’s break it up in a few issues. USA: Agreed, let’s iterate on this once we have it merged. -* PR #419 +- PR #419 EAO: In the spirit of what STA mentioned, do we have an issue for aligning around errors and how they’re defined? @@ -85,9 +86,10 @@ APP: I don’t think so, but everything should follow that formatting, so no nee APP: Will resolve conflicts and merge this. ## Topic: Summary of ad-hoc of 2023-07-21 + A small group (@mihnita, @stasm, @eemeli, @macchiati, @aphillips) met on Friday to discuss #425, primarily the problem of "default" selectors. Let's discuss the results of that call. -APP: *introduces the resolution* +APP: _introduces the resolution_ EAO: I’d let MIH express himself, but it wasn’t a unanimous decision. 
@@ -161,37 +163,40 @@ APP: We’ve imported typing then, I don’t think that’s our intent. We need MIH: I think we should leave it up to the implementations. They could throw an error if they find the input unwieldy. ## Topic: Refactoring spec.md + @aphillips is proposing to refactor spec.md. Let’s discuss whether to pursue this further. See #429 (discussion, decided to proceed) ## Topic: Use quotes instead of pipes for quoting literals (#414) + @eemeli is proposing to change the quote character from | to single/double quotes -EAO: *explains the change* +EAO: _explains the change_ APP: Nobody loves the pipes but they get the job done and the onus is on Eemeli to prove that we need to reopen this consensus. STA: In our syntax.md doc, we have a goal that says “easily embeddable in any context” ## Topic: Open Issue Review + https://github.com/unicode-org/message-format-wg/issues Currently we have 83 open (up from 73). 17 resolve-candidate ## Topic: Open/Close function syntax, naming, and immutability. + We have multiple proposals for open/close function markup, including the current scheme (+function/-function). Let's resolve how to support open/close functionality. These proposals partly exist to address the problem of negative literals, given our use of -function currently. We have also been discussing whether let statements should be immutable. If they are immutable, there is a proposal that they use a different sigil from $ or that they use a two-character sigil (such as $$localVar). Note that separating the sigil allows for static analysis of local variables as called out by #403. This can be a separate concern from whether they are immutable. - ## Topic: Discussion of default registry requirements MIH: propose text and proposed XML for default registry ## Topic: AOB? 
- #### Link Farm: + https://github.com/unicode-org/message-format-wg/issues/310#issuecomment-1646670556 https://docs.google.com/document/d/13JVPTuhs_SJXWcsSpjFWNIVk3o-T1DQI30RX0qyeK5k/edit diff --git a/meetings/2023/notes-2023-08-07.md b/meetings/2023/notes-2023-08-07.md index d1b983508f..7c425fc9e4 100644 --- a/meetings/2023/notes-2023-08-07.md +++ b/meetings/2023/notes-2023-08-07.md @@ -1,12 +1,13 @@ -# 07 August 2023 | MessageFormat Working Group Regular Teleconference +# 07 August 2023 | MessageFormat Working Group Regular Teleconference ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Eemeli Aro (EAO) -* Mihai Niță - Google (MIH) -* Ujjwal Sharma (USA) -* Staś Małolepszy - Google (STA) -* Matt Radbourne - Bloomberg (MRR) + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro (EAO) +- Mihai Niță - Google (MIH) +- Ujjwal Sharma (USA) +- Staś Małolepszy - Google (STA) +- Matt Radbourne - Bloomberg (MRR) Scribe: MRR @@ -14,9 +15,11 @@ To request that the chair add an issue to the agenda, add the label Agenda+ To r Topic: Agenda Review ## Call for topics + EAO - Sigil expiration ## Topic: Info Share + EAO - Thoroughly refactored intl-messageformat proposal The Intl.MF API refactoring PR: https://github.com/tc39/proposal-intl-messageformat/pull/22 Topic: Action Item Review @@ -26,6 +29,7 @@ APP: provide pro/con comparison for immutability/namespacing discussion Still open - Could do with input from STA ## Topic: Active PR review + Discussion of active PRs. We will merge or reject them in the call. The recommendation "discuss" is to ensure there is WG consensus before merging. The recommendation "merge with edits" is to merge once existing comments have been addressed. Discussion of active PRs. We will merge or reject them in the call. @@ -67,6 +71,7 @@ EAO - If we give it the variable sigil, it’s not clear if we’re talking abou APP - Not forgetting that there are external variables that are not declared. 
STA - Is there middle ground: + ``` variable-declaration = let s variable-name [s] "=" [s] expression variable-reference = variable-name @@ -191,11 +196,10 @@ EAO - Need Tim for this. Let’s skip. ## Topic: Sigils, immutability, naming -@stasm has produced a draft discussing requirements for open/close, function naming, etc. This appears to be a useful starting point for a discussion. See here: https://github.com/stasm/message-format-wg/blob/sigils/exploration/sigils.md +@stasm has produced a draft discussing requirements for open/close, function naming, etc. This appears to be a useful starting point for a discussion. See here: https://github.com/stasm/message-format-wg/blob/sigils/exploration/sigils.md See #449 for a proposal from Addison Phillips to put open/close out of scope for MF2 1.0. - EAO - What we achieved we agreement on was the syntax but not explicitly defining what they mean other than they can use a function with open/close. I don’t think we need to define more than what we have already. Would like to have it in the release. STA - I became frustrated about the volume of discussion vs volume of requirements and agreement. This is underspecified. We should remove for now but there are items that are very important for some people. We should try to focus on them. I don’t know how much time we have to agree though. @@ -204,7 +208,7 @@ MIH - Always against the fact that open/close are properties of functions. They APP - We need to understand the requirements and work from there. We don’t have any open/close functions at present. There are disagreements about the form of open/close, open/close vs markup-detection. This sounds like a place we’d spend a great deal of time. Is it strictly necessary? We wouldn’t push them into ‘reserve’, push them into ‘soon’. -EAO - The messages I care about: 1. have a named variable reference in a message. 2nd most common: - markup - the use case I have for open/close. 
About 7% of all the messages I’m working with at Mozilla, especially with the level of agreement we have. It works fine and we should not be getting rid of open/close. +EAO - The messages I care about: 1. have a named variable reference in a message. 2nd most common: - markup - the use case I have for open/close. About 7% of all the messages I’m working with at Mozilla, especially with the level of agreement we have. It works fine and we should not be getting rid of open/close. APP - If we want to keep it in scope, that’s fine but we need to solve all the problems, even if it’s a temporary solution. It needs to be enough that people can implement and we don’t then break what we ship in the near-term. @@ -271,6 +275,7 @@ I want syntax changes to be an afterthought. Conscious decision, not happenstanc MIH - Don’t care about syntax changes. We have PRs that we submit without full agreement. We end up with things in the registry without full agreement. If we do this, it should be easy to change. 
## Actions -* F2F @ W3C TPAC in Seville -* Meetings to be weekly -* ACTION: addison: comparo matrix for +/- + +- F2F @ W3C TPAC in Seville +- Meetings to be weekly +- ACTION: addison: comparo matrix for +/- diff --git a/meetings/2023/notes-2023-08-14.md b/meetings/2023/notes-2023-08-14.md index cced395655..2ce18e6dff 100644 --- a/meetings/2023/notes-2023-08-14.md +++ b/meetings/2023/notes-2023-08-14.md @@ -1,14 +1,15 @@ -14 August 2023 | MessageFormat Working Group Regular Teleconference +14 August 2023 | MessageFormat Working Group Regular Teleconference Attendees: Please fill “attendee” block with your name, affiliation and a 3-letter acronym for the scribe to use (see examples in “previous attendees”): ### Attendees -* Addison Phillips - Unicode (APP) - chair -* Eemeli Aro (EAO) -* Mihai Niță - Google (MIH) -* Ujjwal Sharma (USA) -* Staś Małolepszy - Google (STA) -* Matt Radbourne - Bloomberg (MRR) + +- Addison Phillips - Unicode (APP) - chair +- Eemeli Aro (EAO) +- Mihai Niță - Google (MIH) +- Ujjwal Sharma (USA) +- Staś Małolepszy - Google (STA) +- Matt Radbourne - Bloomberg (MRR) Scribe: MIH @@ -22,16 +23,19 @@ EAO: Browser extensions considering adopting MF2. EAO: there is interest from Apple and Mozilla for a localisation file format for extension l10n, that supports MF2. ## Topic: Action Item Review + [ ] +/- comparison (addison) APP: In progress with STA https://docs.google.com/document/d/1IHODjPLh_b2pcZlH3KbAdIMXniGKLFzIv_LWsrO9R0E/edit?resourcekey=0-EilhrcWYrQ90N632fB8mGg#heading=h.frldpg1ui9ww ## Topic: F2F Planning + Everybody is going to Seville, right? Here’s who I have so far: -* Addison -* Eemeli -* Mihai -* Stas + +- Addison +- Eemeli +- Mihai +- Stas Regrets: Ujjwal @@ -43,9 +47,9 @@ Regrets: Ujjwal https://github.com/unicode-org/message-format-wg/issues -* Currently we have 70 open (was 72 last time). -* 0 are resolved-candidate and proposed for close. -* 6 are Agenda+ and proposed for discussion. 
+- Currently we have 70 open (was 72 last time). +- 0 are resolved-candidate and proposed for close. +- 6 are Agenda+ and proposed for discussion. Synopsis of #425: This has a really long thread. There is a subsidiary issue (#433) about what we name the default selector (:string, :select, or :equals). A key issue in this thread is whether formatters can also be selectors and, if so, what the default selector is for a given function (e.g. is the default for :number the equivalent of :plural). We previously settled that selectors MUST have an annotation. @@ -63,7 +67,7 @@ Need a document summarizing what we need and what to include in the registry. Start with capabilities (requirements). Then discuss what the options are. Maybe 3 categories: what we must include, we would like to, and we don’t include in 2.0. - + What's left to discuss on markup? #375 The last comment contains a list of items contributed by cdaringe. @@ -86,25 +90,33 @@ Last one: something you can set on the whole message. Another one that Amazon us EAO: the use case I give is a fragment that is in a different language than the rest of the message. APP: + ``` you say 'yes' in French as oui ``` EAO: Could also have: + ``` {In French, "{|bonjour| @locale=fr}" is a greeting} ``` + Or + ``` {In French, "{|bonjour| @locale=fr @canCopy=true}" is a greeting} ``` + MIH in chat :-: Why not: + ``` {In French, "{|bonjour| @canCopy=true :string locale=fr }" is a greeting} ``` + EAO in chat: + ``` {In French, "{|bonjour| :string @locale=fr}" is a greeting} ``` @@ -116,10 +128,13 @@ Locale, id, direction, canCopy & Co? Even if we don’t specify them all in 2.0, are they enough to make a case for non-function attributes? APP: + ``` "@" name "=" unquoted-literal ``` + EAO: + ``` @locale=$foo ``` @@ -154,10 +169,8 @@ STA: one of the main reasons for open / close is tooling, especially in the tool AOE: the proposal to tc39 is that the registry does not care about the registry. The work happens above Intl.MessageFormat. 
- ## Topic: AOB? - Link Farm: https://github.com/unicode-org/message-format-wg/issues/310#issuecomment-1646670556 diff --git a/meetings/chair-group/chair-group-notes-2020-05-11.md b/meetings/chair-group/chair-group-notes-2020-05-11.md index 3620b9de4b..6c5fc2f72d 100644 --- a/meetings/chair-group/chair-group-notes-2020-05-11.md +++ b/meetings/chair-group/chair-group-notes-2020-05-11.md @@ -1,17 +1,18 @@ - # Chair Group Meeting ### Agenda - - Retrospective from last meeting - - Choose next meeting moderator - - Review and share tasks related to chair group management - - Manage MFWG mail list - - Agenda for Chair Group meetings - - Define next meeting agenda - - Review Backlog and open issues - - Label issues - + +- Retrospective from last meeting + - Choose next meeting moderator +- Review and share tasks related to chair group management + - Manage MFWG mail list + - Agenda for Chair Group meetings +- Define next meeting agenda +- Review Backlog and open issues + - Label issues + ### Actions + - Prepare a brief reminder on how to work with TCQ to shared before every meeting [#56](https://github.com/unicode-org/message-format-wg/issues/56) - Prepare(@stasm) PR && Review PR for Goals and Non-Goals(Cont.) [#59](https://github.com/unicode-org/message-format-wg/issues/59) - Try to find collaborators to take notes(Pablo will join use next meetings) @@ -20,11 +21,13 @@ - Schedule Chair Group Meetings - We should start backlog review/clean up after the definition of goals and non-goals -## Decisions +## Decisions + - Group decided in continue with only on chair-group meeting, that will take place the week after of the MFWG one. 
- Next Meeting moderator @echeran -### Next Meeting Agenda 2020-05-18: +### Next Meeting Agenda 2020-05-18: + - Goals and Non-Goals(Cont.)[#59](https://github.com/unicode-org/message-format-wg/issues/59) - Why MessageFormat needs a successor[#49](https://github.com/unicode-org/message-format-wg/issues/49) - Review Terminology [#80](https://github.com/unicode-org/message-format-wg/issues/80) diff --git a/meetings/chair-group/chair-group-notes-2020-06-22.md b/meetings/chair-group/chair-group-notes-2020-06-22.md index 21bbb2da0f..33daef6084 100644 --- a/meetings/chair-group/chair-group-notes-2020-06-22.md +++ b/meetings/chair-group/chair-group-notes-2020-06-22.md @@ -1,48 +1,46 @@ - # Chair Group Meeting ### Agenda - - Retrospective from last meeting - - Choose next meeting moderator (David) ✅ - - Review and share tasks related to chair group management - - Manage MFWG mail list - - Agenda for Chair Group meetings - - Define next meeting agenda - - Review Backlog and open issues - - Plan for design principles - - Round of presentations - - David(XLIFF) [Slides](https://docs.google.com/presentation/d/1MZwkBUnp4hhbGVoWtJ_e1QCIRp07m7DZ5617G8AkfVg/edit?usp=sharing) - - George (SIRI) - - Mihai - [Slides](https://docs.google.com/presentation/d/19US44GNyPRn_oOTTrQHX2frSvm-7XQsTYTpqRPEWtdU/edit#slide=id.g1f2f46a3ff_0_140) - - More sync channel for message format ✅ - - Review [icu4x](https://github.com/unicode-org/icu4x) ✅ - - - + +- Retrospective from last meeting + - Choose next meeting moderator (David) ✅ +- Review and share tasks related to chair group management + - Manage MFWG mail list + - Agenda for Chair Group meetings +- Define next meeting agenda +- Review Backlog and open issues +- Plan for design principles +- Round of presentations + - David(XLIFF) [Slides](https://docs.google.com/presentation/d/1MZwkBUnp4hhbGVoWtJ_e1QCIRp07m7DZ5617G8AkfVg/edit?usp=sharing) + - George (SIRI) + - Mihai - 
[Slides](https://docs.google.com/presentation/d/19US44GNyPRn_oOTTrQHX2frSvm-7XQsTYTpqRPEWtdU/edit#slide=id.g1f2f46a3ff_0_140) +- More sync channel for message format ✅ +- Review [icu4x](https://github.com/unicode-org/icu4x) ✅ + ### Actions -Concrete message format use case #93 + +Concrete message format use case #93 + - Selectors and Placeholders(Number Format) using the [example](https://github.com/unicode-org/message-format-wg/blob/c23e1bcba06b2a34d6f077d93d3dab213bc76d33/exploration/variants.md) - - How to serialize data model ? - - Use cases of data model used by different implementations([Slides with examples](https://docs.google.com/presentation/d/1RujNFCq3gH9TUEKDB_uFdKWNG1A1j2_NBCdnTmnEqv0/edit?usp=sharing))? - + - How to serialize data model ? + - Use cases of data model used by different implementations([Slides with examples](https://docs.google.com/presentation/d/1RujNFCq3gH9TUEKDB_uFdKWNG1A1j2_NBCdnTmnEqv0/edit?usp=sharing))? + ### Actions for August Meeting + - Work in presentation to fit both in 20-25 minutes timebox - -## Decisions -- Elango shared information about icu4x project ✅ -- More sync channel for message format, we decide to use the unicode slack (Channel is already working) ✅ -- Round of presentations David & Mihai ✅ +## Decisions + +- Elango shared information about icu4x project ✅ +- More sync channel for message format, we decide to use the unicode slack (Channel is already working) ✅ +- Round of presentations David & Mihai ✅ +### Interesting info +- [i18n Namespace](https://w3c.github.io/json-ld-syntax/#the-i18n-namespace) + +### Next Meeting Agenda 2020-07-20: - - - ### Interesting info - - [i18n Namespace](https://w3c.github.io/json-ld-syntax/#the-i18n-namespace) - -### Next Meeting Agenda 2020-07-20: - Chair Group Status - POC's progress overview -- Zibi's Experimental AST Overview - +- Zibi's Experimental AST Overview diff --git a/meetings/chair-group/chair-group-notes-2020-07-27.md 
b/meetings/chair-group/chair-group-notes-2020-07-27.md index b407dc522c..fe8a3dea50 100644 --- a/meetings/chair-group/chair-group-notes-2020-07-27.md +++ b/meetings/chair-group/chair-group-notes-2020-07-27.md @@ -1,30 +1,33 @@ # Chair Group Meeting ### Agenda - - Retrospective from last meeting - - Choose next meeting moderator(We should add all github user on TCQ) - Mihai - - Review Backlog and open issues - - Label issues - - Pending actions from Monthly meeting - (Summary about pending issues on slack) - * Open / Closed / List Other list of available selectors - * Top-level "MultiVariant" vs nested "SelectorExpression" in "Placeholder" - - * Defaults - * Multi-selector variants - - - Review and share tasks related to chair group management - - Manage MFWG mail list - - Define next meeting agenda - - + +- Retrospective from last meeting + - Choose next meeting moderator(We should add all github user on TCQ) - Mihai +- Review Backlog and open issues + - Label issues +- Pending actions from Monthly meeting + (Summary about pending issues on slack) + + - Open / Closed / List Other list of available selectors + - Top-level "MultiVariant" vs nested "SelectorExpression" in "Placeholder" - + - Defaults + - Multi-selector variants + +- Review and share tasks related to chair group management +- Manage MFWG mail list +- Define next meeting agenda + ### Actions -- Agenda : - Moderator must create the next TCQ meeting + +- Agenda : + Moderator must create the next TCQ meeting - Prepare a proposal for labelling issues and use projects in Github [#111](https://github.com/unicode-org/message-format-wg/issues/111) - Add Chair Group members as Git members - Actions for Pending Issues [#110](https://github.com/unicode-org/message-format-wg/issues/110) - * Create task forces (Small Groups) to do the work to be presented at MFWG Meetings their responsibilities are: - + - Create task forces (Small Groups) to do the work to be presented at MFWG Meetings their responsibilities are: + 
+ - Is the owner of the Task(Issue/Project) - Organize meetings and documentation - Reports the results to all group diff --git a/meetings/task-force/#103-2020-08-31.md b/meetings/task-force/#103-2020-08-31.md index 10592681b4..823e9cfd4c 100644 --- a/meetings/task-force/#103-2020-08-31.md +++ b/meetings/task-force/#103-2020-08-31.md @@ -1,6 +1,7 @@ ## Executive summary ([Original Doc](https://docs.google.com/document/d/1-6t6Yl5RHZI9QZwBDrFrl1fqSKSA4IMs1ef60IxD3lU/edit#)) -*Participants:* +_Participants:_ + - DAF: David Filip - ECH: Elango Cheran - MIH: Mihai Nita @@ -14,6 +15,7 @@ The discussion was focused on issue [#103](https://github.com/unicode-org/messag Despite the title of issue #103, the main topic of discussion was more precisely: supporting in-message and full-message selectors vs full-message selectors only. ### Full-message selection: + ``` ICU: {count, plural, =1 {You deleted # file from the folder {folder_name}!} @@ -63,101 +65,89 @@ Hard to track what happens on the boundary of segments Harder to grep for text Integration with TMS is more difficult - - > Approval Stamps for Executive Summary -*ECH,NIC,DAF,SMY,RCA* - - +_ECH,NIC,DAF,SMY,RCA_ ## Notes from meeting on 2020-08-31 +SMY: We have to be careful about decisions because of tradeoffs. Flexibility vs disallowing bad practices. -SMY: We have to be careful about decisions because of tradeoffs. Flexibility vs disallowing bad practices. - -MIH: It’s like programming languages (ex: static vs dynamic). - -NIC: We’ve tried to document pros and cons. Maybe we can find more concrete factual data to support each argument. If we want to support this in TMS, then we need to do work to integrate. +MIH: It’s like programming languages (ex: static vs dynamic). +NIC: We’ve tried to document pros and cons. Maybe we can find more concrete factual data to support each argument. If we want to support this in TMS, then we need to do work to integrate. 
MIH: Did I capture the similarities and differences correctly in the updated comparison slides doc? Slides doc: https://docs.google.com/presentation/d/1xi4cyLmLVADNXSb-xNtZoz1xZNbcwQa4khuqt-21MOo/edit#slide=id.g94e31aa88b_1_0 -SMY: Edge cases = the combinatorial number of combinations of cases. Let’s start with NIC’s suggestion of specific use cases +SMY: Edge cases = the combinatorial number of combinations of cases. Let’s start with NIC’s suggestion of specific use cases -MIH: Core decision to make is “do we allow selector type of constructs within the message or not” ? In other words, do we let our message pattern cases be full sentences +MIH: Core decision to make is “do we allow selector type of constructs within the message or not” ? In other words, do we let our message pattern cases be full sentences -DAF: From a round-trip localization point of view, I am for the message-level selectors. I don’t think the other approach is possible for GUIs. +DAF: From a round-trip localization point of view, I am for the message-level selectors. I don’t think the other approach is possible for GUIs. -MIH: I’ve seen that, actually, but it is very unfriendly to translators. You as the translator have to coordinate translations across sub-message parts to ensure subject / verb / inflection agreement, and it is very difficult. Also, it ruins integrations with Translation Memories (TMs). +MIH: I’ve seen that, actually, but it is very unfriendly to translators. You as the translator have to coordinate translations across sub-message parts to ensure subject / verb / inflection agreement, and it is very difficult. Also, it ruins integrations with Translation Memories (TMs). ZIB: (chat) in context of what Nicolas said, I deeply recommend https://www.youtube.com/watch?v=2ajos-0OWts -ZIB: I think the fundamental question is are we designing a system for today or for 10 years from now? Role model example is CSS. Previously, they questioned CSS as unnecessary and unsupported. 
+ZIB: I think the fundamental question is are we designing a system for today or for 10 years from now? Role model example is CSS. Previously, they questioned CSS as unnecessary and unsupported. -MIH: In the localization industry, things move very slowly. So I think 10 years from now, things will look similar to what they are now, just my opinion. +MIH: In the localization industry, things move very slowly. So I think 10 years from now, things will look similar to what they are now, just my opinion. -ZIB: Responding, wearing an optimistic hat. Unprecedented situation, we have the backing of Unicode and W3C and avenue for landing straight into JS. Landing into JS gives us adoption by the largest pool of developers and reach. We have more push power than before. +ZIB: Responding, wearing an optimistic hat. Unprecedented situation, we have the backing of Unicode and W3C and avenue for landing straight into JS. Landing into JS gives us adoption by the largest pool of developers and reach. We have more push power than before. -SMY: Just as ZIB is an optimist, I’m perceived as a pessimist, which is why we work very well. I urge the side of modesty and caution. We got here to this meeting to discuss how selection works, but it is good to discuss principles of design. The crucial difference between CSS, OpenGL, and localization is that there were millions of dollars to be made with CSS and OpenGL, but localization is already established. But the quality of translation is not from supporting 4 plurals at the same time, but the quality of the text that we put in there. So we have to support a format that enables translators to write great prose translations. +SMY: Just as ZIB is an optimist, I’m perceived as a pessimist, which is why we work very well. I urge the side of modesty and caution. We got here to this meeting to discuss how selection works, but it is good to discuss principles of design. 
The crucial difference between CSS, OpenGL, and localization is that there were millions of dollars to be made with CSS and OpenGL, but localization is already established. But the quality of translation is not from supporting 4 plurals at the same time, but the quality of the text that we put in there. So we have to support a format that enables translators to write great prose translations. -MIH: I would like to exclude from discussion about what tooling will look like in 10 years. Both CSS and OpenGL are developer oriented technologies. What is not going away for localization is the linguistic part -- some languages will require one to bring information from outside to inside a selector. The main argument is that the 2 selection models are perfectly equivalent, they allow the same things, and I have code that can convert between the 2. +MIH: I would like to exclude from discussion about what tooling will look like in 10 years. Both CSS and OpenGL are developer oriented technologies. What is not going away for localization is the linguistic part -- some languages will require one to bring information from outside to inside a selector. The main argument is that the 2 selection models are perfectly equivalent, they allow the same things, and I have code that can convert between the 2. -DAF: I would agree that they are equivalent in expressivity, and certainly there is a difference in verbosity. But verbosity also makes things translation friendly, and the verbosity won’t cost extra money due to TM leveraging. +DAF: I would agree that they are equivalent in expressivity, and certainly there is a difference in verbosity. But verbosity also makes things translation friendly, and the verbosity won’t cost extra money due to TM leveraging. -SMY: I wanted to add on what DAF said to a similar effect. I’ve seen the rise of Translation Memory (TM) and machine translation (MT). So I would be interested in which model supports these functionalities better. 
+SMY: I wanted to add on what DAF said to a similar effect. I’ve seen the rise of Translation Memory (TM) and machine translation (MT). So I would be interested in which model supports these functionalities better. DAF: I asked in my first monthly meeting if we’re trying to build a universal Rule Based translation engine, and it sounds like we are, in some ways. MIH: If both approaches map to each other exactly, choosing one over the other right now doesn’t mean we can’t change it later. -And it doesn’t prevent us from having a different syntax that has “internal selection” (nested selector sub-messages). We can still convert to the data model. +And it doesn’t prevent us from having a different syntax that has “internal selection” (nested selector sub-messages). We can still convert to the data model. SMY: For conversion, this is a lossy conversion, right? -DAF: I would support a layered approach. At the syntax level, we can support the nested selector approach in syntax and then have full-message messages when sending off to translation. -In reaction to SMY, the conversion would not be necessarily lossy. In L10n we work with this paradigm of extraction and merging. L10n happens between these brackets. And it is always assumed that merger has the full knowledge of the extraction process. So you should be able to work losslessly in your proprietary bracket, knowing what you have done to expand to the canonical format.. +DAF: I would support a layered approach. At the syntax level, we can support the nested selector approach in syntax and then have full-message messages when sending off to translation. +In reaction to SMY, the conversion would not be necessarily lossy. In L10n we work with this paradigm of extraction and merging. L10n happens between these brackets. And it is always assumed that merger has the full knowledge of the extraction process. 
So you should be able to work losslessly in your proprietary bracket, knowing what you have done to expand to the canonical format.. -ZIB: I am mesmerized by the sentiment that we are talking about having the data model user-friendly and the syntax is okay to be verbose. Shouldn’t the syntax be user-friendly and allow the data model to handle more and be more flexible? +ZIB: I am mesmerized by the sentiment that we are talking about having the data model user-friendly and the syntax is okay to be verbose. Shouldn’t the syntax be user-friendly and allow the data model to handle more and be more flexible? -MIH: Yes, and no. There are no “generic humans” in this equation. Developers can handle certain syntax that translators cannot, we can’t put them into the same bucket. What we can’t currently handle in the data model but I see in some proprietary tools is to automatically add missing plural cases. In this example, you really need to do this at the full-message level. +MIH: Yes, and no. There are no “generic humans” in this equation. Developers can handle certain syntax that translators cannot, we can’t put them into the same bucket. What we can’t currently handle in the data model but I see in some proprietary tools is to automatically add missing plural cases. In this example, you really need to do this at the full-message level. -DAF: I think there is some misunderstanding in hearing ZIB’s reaction. It’s an interoperability effort. We are trying to come up with an interoperability vehicle, not prescribe the whole thing. We can have the more verbose representation be the canonical, but syntax can always be adjusted as we see fit. So in your proprietary bracket, you can make the syntax be whatever you want, so long as you can convert to the canonical representation. +DAF: I think there is some misunderstanding in hearing ZIB’s reaction. It’s an interoperability effort. We are trying to come up with an interoperability vehicle, not prescribe the whole thing. 
We can have the more verbose representation be the canonical, but syntax can always be adjusted as we see fit. So in your proprietary bracket, you can make the syntax be whatever you want, so long as you can convert to the canonical representation. -SMY: So to make sense of that for the Fluent case, it sounds like Fluent can have its own syntax, so long as it can be converted into the canonical format. Right? +SMY: So to make sense of that for the Fluent case, it sounds like Fluent can have its own syntax, so long as it can be converted into the canonical format. Right? -DAF: Yes. Example: Markdown is simplified, but you can always convert to HTML, and you can allow plain HTML in Markdown to support the full range of HTML. +DAF: Yes. Example: Markdown is simplified, but you can always convert to HTML, and you can allow plain HTML in Markdown to support the full range of HTML. MIH: For me, the core question is do we support selection happening within (nested in) the message pattern, or does it happen at the message level. -DAF: I think having the internal selection would be too costly. If we support internal selection as part of the standard, it would need to create too many syntax peculiarities. +DAF: I think having the internal selection would be too costly. If we support internal selection as part of the standard, it would need to create too many syntax peculiarities. MIH: Should we try to list the pros and cons of each approach on the topic of allowing nested selectors? - - -| Message Level Selectors | | Sub-selectors | | -| :---: | --- | :---: | --- | -| Pros | Cons | Pros | Cons | -| Friendly for translators

L10n tools friendly

More “implementor” friendly | Unfriendly for developers (verbose)

Verbose (to move through the wire)

Developers must decide to make a message a selector even if the target language doesn’t need it. | Friendly for developers | Unfriendly for translators

Hard to track what happens on the boundary of segments

Harder to grep for text

Integration with TMS is more difficult | - - - +| Message Level Selectors | | Sub-selectors | | +| :----------------------------------------------------------------------------------------: | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :---------------------: | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Pros | Cons | Pros | Cons | +| Friendly for translators

L10n tools friendly

More “implementor” friendly | Unfriendly for developers (verbose)

Verbose (to move through the wire)

Developers must decide to make a message a selector even if the target language doesn’t need it. | Friendly for developers | Unfriendly for translators

Hard to track what happens on the boundary of segments

Harder to grep for text

Integration with TMS is more difficult | MIH: Nesting breaks a lot of things in a TMS, like validation (do the number of placeholders match?), not just translation (MT) and memory (TM). - ============================= #### ICU MF (source lang) ``` -You have {COUNT, plural, +You have {COUNT, plural, one {1 unread message} other {{COUNT} unread messages}} in your inbox. @@ -172,11 +162,10 @@ CASES: { ``` -SMY: Can I bring your attention to discussion from pros and cons to the issue of whether we can have the data model be the backend and having syntaxes be the front end? +SMY: Can I bring your attention to discussion from pros and cons to the issue of whether we can have the data model be the backend and having syntaxes be the front end? In the example of translating to Polish, do we want this to be driven by the translator themselves, or do we always try to represent the maximum number of plural cases? - #### Canonical data model MFWG (target lang) ``` @@ -190,12 +179,12 @@ CASES: { ICU MF (target lang) -{COUNT, plural, +{COUNT, plural, one {Masz 1 wiadomość w skrzynce odbiorczej.} few {Masz {COUNT} wiadomości w skrzynce odbiorczej.} other {Masz {COUNT} wiadomości w skrzynce odbiorczej.} -Masz {COUNT, plural, +Masz {COUNT, plural, one {1 wiadomość} few {{COUNT} wiadomości} other {{COUNT} wiadomości} @@ -203,8 +192,7 @@ w skrzynce odbiorczej. ``` - -MIH: I think we always want to represent the maximum number of plural cases. Translators always should be shown the full set of translations, and let TM handle the redundancy. +MIH: I think we always want to represent the maximum number of plural cases. Translators always should be shown the full set of translations, and let TM handle the redundancy. Now, if we want to collapse the representation in syntax to a more compact form, that can be handled by the TMS as necessary (ex: can’t afford the extra bandwidth caused by verbosity). @@ -220,26 +208,25 @@ SMY: You can encode very complex sentences in a succinct manner? 
MIH: This is just a restatement of verbosity. -These are edge cases, really. I tried to count in our internal codebase how many messages have selectors, and it was maybe 2-3%, sometimes 5% depending on the application, so let’s say 5%. +These are edge cases, really. I tried to count in our internal codebase how many messages have selectors, and it was maybe 2-3%, sometimes 5% depending on the application, so let’s say 5%. -ZIB: I think the argument you made is a fallacy. There is a lot of tooling that make selectors discouraged and thus less likely. If we had more representation of selection, then it would make more natural sounding messages in other languages. English might be an exception where selection is less necessary and most other languages have more selection messages than English. +ZIB: I think the argument you made is a fallacy. There is a lot of tooling that make selectors discouraged and thus less likely. If we had more representation of selection, then it would make more natural sounding messages in other languages. English might be an exception where selection is less necessary and most other languages have more selection messages than English. SMY: So it’s a self-fulfilling prophecy to say that because we don’t have it, we don’t need it. DAF: It can cost you market share in Japan or Poland if you don’t have natural-sounding translations, even if for instance Czech Republic is more forgiving. -ZIB: It’s hard to get signal on direct correlation of impact of translation and - -Closest thing to having data is Paypal. A person at Paypal was exploring using Fluent for translation, because they saw that the older users had less trust of the product if the translation was not natively written. +ZIB: It’s hard to get signal on direct correlation of impact of translation and -SMY: I agree, although I wasn’t trying to be data-driven here. The trust issue is important, as a user of Polish language software. 
But the major problem isn’t about plural support so much as it is just bad translations. +Closest thing to having data is Paypal. A person at Paypal was exploring using Fluent for translation, because they saw that the older users had less trust of the product if the translation was not natively written. -DAF: I agree with ZIB and SMY. I totally agree with the approach that English is just another language/translation. But I think that the translator must be given the whole message and all the lines she needs to serve all cases. +SMY: I agree, although I wasn’t trying to be data-driven here. The trust issue is important, as a user of Polish language software. But the major problem isn’t about plural support so much as it is just bad translations. -MIH: I don’t disagree with anything that was said here. We can discuss whether the syntax is friendly or not to developers in certain ways. I don’t think the space saving considerations are significant. I don’t think these issues matter so much as whether we make it easy for translators or not? Does the data model support making things easy for translators? We can design syntaxes to be concise or verbose for developers however we want (ex: JS friendly, Java friendly). But should this be in the data model? +DAF: I agree with ZIB and SMY. I totally agree with the approach that English is just another language/translation. But I think that the translator must be given the whole message and all the lines she needs to serve all cases. -SMY: I wanted to add one thing to the list of pros and cons. Both approaches can be difficult to debug. Look at the following example. If we translate this into Arabic, there are 1300 variants/cases. It’s likely to make small mistakes that create inconsistencies. +MIH: I don’t disagree with anything that was said here. We can discuss whether the syntax is friendly or not to developers in certain ways. I don’t think the space saving considerations are significant. 
I don’t think these issues matter so much as whether we make it easy for translators or not? Does the data model support making things easy for translators? We can design syntaxes to be concise or verbose for developers however we want (ex: JS friendly, Java friendly). But should this be in the data model? +SMY: I wanted to add one thing to the list of pros and cons. Both approaches can be difficult to debug. Look at the following example. If we translate this into Arabic, there are 1300 variants/cases. It’s likely to make small mistakes that create inconsistencies. ``` activity-needed-calculation-plural = { NUMBER($totalHours) -> @@ -257,15 +244,13 @@ activity-needed-calculation-plural = { NUMBER($totalHours) -> } a day. ``` - - -MIH: I don’t find this more readable, though. I think you end up with more errors than just spaces and commas, you get problems of sentence agreement (https://en.wikipedia.org/wiki/Agreement_(linguistics)). +MIH: I don’t find this more readable, though. I think you end up with more errors than just spaces and commas, you get problems of sentence agreement (https://en.wikipedia.org/wiki/Agreement_(linguistics)). I still don’t see this example as an argument to represent nested selector messages in the data model. -SMY: I’m trying to picture the scenario in 10 years from now, let’s say you have a UI, you have different use case scenarios. Can we support it? +SMY: I’m trying to picture the scenario in 10 years from now, let’s say you have a UI, you have different use case scenarios. Can we support it? -MIH: I don’t think the existence of new tooling will make a difference, because the linguistic concerns of sentence agreement will still be there. It still makes things mentally complex. +MIH: I don’t think the existence of new tooling will make a difference, because the linguistic concerns of sentence agreement will still be there. It still makes things mentally complex. 
SMY: This is a bit of an extreme case, yes, and I’m just trying to play devil’s advocate a little bit. @@ -274,25 +259,25 @@ Can you split the message into separate sentences, translate, and then combine s MIH: The problem is that the combination of sentence parts has to be managed by the translator. And sentences still need to agree: “You deleted 21 files. You can recover them from the Recycle Bin” -DAF: In XLIFF 1.2, support of segmentation was added as an afterthought, it was not well supported. So the model was changed in XLIFF 2. In the XLIFF 2 data model, you can do segmentation at the time of translation. It is something that can potentially be forced at the time of extraction of messages from the source, but it is not a general best practice, segmentation is a linguistic process, so it should happen in L10n tools not in the engineering Extractor. +DAF: In XLIFF 1.2, support of segmentation was added as an afterthought, it was not well supported. So the model was changed in XLIFF 2. In the XLIFF 2 data model, you can do segmentation at the time of translation. It is something that can potentially be forced at the time of extraction of messages from the source, but it is not a general best practice, segmentation is a linguistic process, so it should happen in L10n tools not in the engineering Extractor. -MIH: Look at slide 19 of my l10n concepts doc. Segmentation is localization level. Taking a full message for translation, it turns it into a single Text Unit. That Text Unit is split into segments. The only benefit of segmentation is increased leverage. But has drawbacks (for example, how to deal with spaces between sentences in CJK, Thai, etc.) +MIH: Look at slide 19 of my l10n concepts doc. Segmentation is localization level. Taking a full message for translation, it turns it into a single Text Unit. That Text Unit is split into segments. The only benefit of segmentation is increased leverage. 
But has drawbacks (for example, how to deal with spaces between sentences in CJK, Thai, etc.) DAF: As a supporting detail, according to the XLIFF standard, Text Units are indivisible, but it is okay for Segments inside a unit to be split or joined. -MIH: A Text Unit cannot be altered, but it can be split up into whatever number of Segments, and it gets recombined back into a Text Unit when returning the translated Text Unit. +MIH: A Text Unit cannot be altered, but it can be split up into whatever number of Segments, and it gets recombined back into a Text Unit when returning the translated Text Unit. -Extraction produces Text Units, which is usually paragraphs. Segmentation converts Text Units into Segments, which are usually sentences. +Extraction produces Text Units, which is usually paragraphs. Segmentation converts Text Units into Segments, which are usually sentences. -For example, if you have a source like, “There are 21 files. Together, they are 82 MB.” The word “they” refers to the 21 files. So the sentences have to be taken together. +For example, if you have a source like, “There are 21 files. Together, they are 82 MB.” The word “they” refers to the 21 files. So the sentences have to be taken together. NIC: I see, so it is up to the author to keep the context of the sentences within a text unit. MIH: Are there any other pros and cons? -NIC: Is one approach or another more compatible with TMSes? I have a gut feeling, but not sure. +NIC: Is one approach or another more compatible with TMSes? I have a gut feeling, but not sure. -MIH: I have more than just gut feeling, I tried both and implemented internally at Google, the approach without internal selection is more compatible with TMSes. I worked on an internal TMS tool a couple of years ago, and it takes the full message approach. 
For plural message, it supports the representation of missing plural cases (English as ONE and OTHER, but Russian might have 4 cases, and the tool supports the expansion to the plural cases in the target language and Cartesian product of cases when there are multiple selectors in the source message). +MIH: I have more than just gut feeling, I tried both and implemented internally at Google, the approach without internal selection is more compatible with TMSes. I worked on an internal TMS tool a couple of years ago, and it takes the full message approach. For plural message, it supports the representation of missing plural cases (English as ONE and OTHER, but Russian might have 4 cases, and the tool supports the expansion to the plural cases in the target language and Cartesian product of cases when there are multiple selectors in the source message). NIC: Perhaps that is important so that we avoid the pattern of trying to solve a problem with a solution but not actually solving it. @@ -300,41 +285,37 @@ DAF: Perhaps we can add to the goals a set of examples that show the mappings of We can add these mappings to the goals document, for example “Mapping to Fluent”, “Mapping to ICU”. -MIH: I have code that does that mapping algorithmically for ICU MessageFormat messages. And note, there is no concept of full message selection in ICU. Fluent is similar in that regard (of no full message selection), right? +MIH: I have code that does that mapping algorithmically for ICU MessageFormat messages. And note, there is no concept of full message selection in ICU. Fluent is similar in that regard (of no full message selection), right? -DAF: I think this is great progress. I think that we here can say that we all understand each other. And we have the pros and cons. We can make the notes of this meeting available, and provide a digestible summary to the full meeting. +DAF: I think this is great progress. I think that we here can say that we all understand each other. 
And we have the pros and cons. We can make the notes of this meeting available, and provide a digestible summary to the full meeting. ECH: We also don’t have to provide just one proposal from the group, we could provide a small number of proposals along with the notes and summary to help the group think and decide. DAF: I would want us to sleep on it to not make a rash decision. After all, our decision making process gives all the power to decide to the monthly plenary meeting. -RCA: I have been listening all along, and I agree with most of the opinions of the meeting, even if they diverge a little bit from each other. I like ZIB’s thought to think about 10 years in the future and think about what we have now. We can support existing tools. But I don’t think it is a correct way to limit the ability in the future of what we can represent so that we don’t regret things in 10 years’ time. +RCA: I have been listening all along, and I agree with most of the opinions of the meeting, even if they diverge a little bit from each other. I like ZIB’s thought to think about 10 years in the future and think about what we have now. We can support existing tools. But I don’t think it is a correct way to limit the ability in the future of what we can represent so that we don’t regret things in 10 years’ time. -NIC: What if we have an example with 3 selectors? Or 1000 selectors? +NIC: What if we have an example with 3 selectors? Or 1000 selectors? -MIH: Sure, there is no problem in the data model to represent 1000 selectors, and the tooling can be adapted to support that, too. The issue is that if we allow nested selectors, then we move the complexity out of the syntax and into the head of the translator. +MIH: Sure, there is no problem in the data model to represent 1000 selectors, and the tooling can be adapted to support that, too. The issue is that if we allow nested selectors, then we move the complexity out of the syntax and into the head of the translator. 
-DAF: This brings me back to the question of are we trying to support natural language generation, or are we trying to create a rule based machine translation system? I don’t think we should try to create a rule based machine translation system. +DAF: This brings me back to the question of are we trying to support natural language generation, or are we trying to create a rule based machine translation system? I don’t think we should try to create a rule based machine translation system. -SMY: For the simple message example, _________ For example, if you have the message: +SMY: For the simple message example, ****\_**** For example, if you have the message: “You have {X} available items.” In many languages, you need message-level selectors. +--- -______ +MIH: You cannot add this as a translator. \_**\_ You can do this for gender. \_\_\_** You need to have a tool that supports messages that are formatted this way +SMY: This is a con of message-level selectors. Unless +DAF: The other way around, it won’t be there, either. The linguist won’t be able to add options in their GUI. -MIH: You cannot add this as a translator. ____ You can do this for gender. _____ You need to have a tool that supports messages that are formatted this way - - -SMY: This is a con of message-level selectors. Unless - -DAF: The other way around, it won’t be there, either. The linguist won’t be able to add options in their GUI. - -ZIB: It’s not necessary in Fluent. That would be terrible to force the translator to work with plurals for languages that don’t need it. Isolation of languages (per-language support) is one of the fundamental design principles of Fluent. +ZIB: It’s not necessary in Fluent. That would be terrible to force the translator to work with plurals for languages that don’t need it. Isolation of languages (per-language support) is one of the fundamental design principles of Fluent. Does this mean that every single message needs selection? 
@@ -344,21 +325,21 @@ SMY: But there are messages with noun declensions (aka inflections) MIH: Messages without placeholders don’t need selection. -SMY: But for messages with noun declensions, it sounds like +SMY: But for messages with noun declensions, it sounds like -DAF: Developers are responsible for making their messages internationalizable. They don’t need to know everything about the possible target languages to do so. +DAF: Developers are responsible for making their messages internationalizable. They don’t need to know everything about the possible target languages to do so. -ZIB: Are we talking about making all messages that are plurals into selection messages (?), or are we saying ___________. +ZIB: Are we talking about making all messages that are plurals into selection messages (?), or are we saying ****\_\_\_****. SMY: I think what we’re saying is that the data model for a message with a single placeholder is a message level selection with a single case/variant. DAF: A feature request that Steven Loomis filed for XLIFF 2.2 is to allow passing on semantic data for placeholders as metadata. I think it would be a great feature complementing the current core capability. -ZIB: It is not a bug in the system that there is ambiguity in the system as to whether an argument should be an integer or a string or not. It is just +ZIB: It is not a bug in the system that there is ambiguity in the system as to whether an argument should be an integer or a string or not. It is just -I am less concerned about this direction. I am more concerned about the concern that DAF brought up with the example of gettext tooling, where if you have a message with plurals, you have to use a different call site and go off somewhere to fetch the message. +I am less concerned about this direction. 
I am more concerned about the concern that DAF brought up with the example of gettext tooling, where if you have a message with plurals, you have to use a different call site and go off somewhere to fetch the message. -SMY: I think the confusion is because I used the phrase “simple message”, but what I meant was _______. +SMY: I think the confusion is because I used the phrase “simple message”, but what I meant was **\_\_\_**. MIH: If you have a placeholder, and the placeholder is numeric, you should have validation that says that “you probably want a message level selection string”. @@ -368,13 +349,13 @@ MIH: I would say no. DAF: I can’t think of any example that wouldn’t. -ECH: What about values coming in from runtime, example you have a date. They are represented via as a placeholder, but it doesn’t require selection. +ECH: What about values coming in from runtime, example you have a date. They are represented via as a placeholder, but it doesn’t require selection. DAF: Well, depending on the use of the date in the sentence in Czech, it can either be in nominative or accusative case, so there is selection there. NIC: What about CJK languages that don’t have any plurals? -MIH: Well, not supporting plurals properly is bad i18n. (?) +MIH: Well, not supporting plurals properly is bad i18n. (?) DAF: I wanted to add as a follow up topic how to support the per-language message expansion in XLIFF. I would think that the only option is to create separate XLIFF files (with set @trgLang) for each locale with all the relevant cases covered.. @@ -389,40 +370,26 @@ SMY: It would be good to create an executive summary. ECH: Fill in the blanks in the notes, too. -SMY: When is the next meeting? It’s Sept 21 -- 3 weeks from now. +SMY: When is the next meeting? It’s Sept 21 -- 3 weeks from now. RCA: Should we, on Slack, if you want to meet before the next plenary meeting, we can use that. 
-SMY: We only talked about 2 items in our Tier 1 issues in our Task Force issues at https://github.com/unicode-org/message-format-wg/projects/3. We can meet again in 1-2 weeks to discuss the other 2 issues #103 and #106, which are really the same thing. +SMY: We only talked about 2 items in our Tier 1 issues in our Task Force issues at https://github.com/unicode-org/message-format-wg/projects/3. We can meet again in 1-2 weeks to discuss the other 2 issues #103 and #106, which are really the same thing. MIH: I think #104 is the same thing, no? -SMY: #104 is about how to represent selection in the data model. But #103 and #106 are different, and are about “do we allow” and “how”. +SMY: #104 is about how to represent selection in the data model. But #103 and #106 are different, and are about “do we allow” and “how”. RCA: Should we schedule a meeting to discuss those remaining topics on Sept 14? -DAF: I was going to argue against meeting again in 2 weeks. If we meet in 2 weeks, we won’t have enough time to prepare, discuss, summarize and present to the main meeting afterwards.. Also I feel this should be finalized before progressing.. +DAF: I was going to argue against meeting again in 2 weeks. If we meet in 2 weeks, we won’t have enough time to prepare, discuss, summarize and present to the main meeting afterwards.. Also I feel this should be finalized before progressing.. RCA: I think 2 weeks would be a good checkpoint if nothing else, so that we don’t allow 3 weeks to go by without more progress. MIH?: Sure, we can check if the meeting results are ready to present to the monthly meeting, let’s talk on Slack? - SMY?: Agree with DAF, there are dependencies, we shouldn’t progress before this is settled.. MIH: I volunteer for the first pass on the executive summary, should be checked by “the other camp” DAF: Slack sounds good to me, let’s add individual approval stamps for the executive summary.. 
- - - - - - - - - - - - - diff --git a/meetings/task-force/#103-2020-09-28.md b/meetings/task-force/#103-2020-09-28.md index f0bbddf8ed..05f2205c56 100644 --- a/meetings/task-force/#103-2020-09-28.md +++ b/meetings/task-force/#103-2020-09-28.md @@ -1,6 +1,7 @@ ## Executive summary ([Original Doc](https://docs.google.com/document/d/1lAyBZR2VQR8ILqvcg5Gad_wf7QWUbsoJ13wGZFSmtbE/edit#)) -*Participants:* +_Participants:_ + - DAF: David Filip - ECH: Elango Cheran - MIH: Mihai Nita @@ -22,11 +23,13 @@ Also, some of the concerns raised recently about message-level selectors would b Action items include all of the following: Collecting stakeholders + - Listing all categories of stakeholders - Inviting more representatives from stakeholder categories - Goal is collect information that will help us decide on priorities among the categories of stakeholders Collecting use cases + - [Issue #119](https://github.com/unicode-org/message-format-wg/issues/119) - Including corner cases for current approaches (ICU MessageFormat, Fluent, etc.) - Examples that seem practical IRL but potentially unwieldy, from all perspectives @@ -34,32 +37,31 @@ Collecting use cases Describe / depict the scenario of a UI (of a CAT tool) for professional translators in dealing with internal selectors (is this feasible or more difficult when compared with only full message selectors). - > Approval Stamps for Executive Summary -*DAF,ECH,STA,NIC* +_DAF,ECH,STA,NIC_ ## Minutes -This meeting is a continuation of the last task force ([minutes from the 1st task force meeting](https://docs.google.com/document/d/1-6t6Yl5RHZI9QZwBDrFrl1fqSKSA4IMs1ef60IxD3lU/edit#heading=h.tulel52cgapk)). In [issue #103](https://github.com/unicode-org/message-format-wg/issues/103). +This meeting is a continuation of the last task force ([minutes from the 1st task force meeting](https://docs.google.com/document/d/1-6t6Yl5RHZI9QZwBDrFrl1fqSKSA4IMs1ef60IxD3lU/edit#heading=h.tulel52cgapk)). 
In [issue #103](https://github.com/unicode-org/message-format-wg/issues/103). STA: EAO has use cases of internal selectors, we can start from there. -MIH: ZIB raises concerns that I don't totally agree with. I think we are past technical arguments or pros and cons, and we have fundamental philosophical differences in how we evaluate the pros and cons. By arguing about our positions, we've ended up at a point where we need to compromise somehow. On how to compromise, we need some clear guiding principles. I think the first 4 bullets (of doc _____) still apply. We have to be opinionated and be willing to fix what we learned that was wrong. +MIH: ZIB raises concerns that I don't totally agree with. I think we are past technical arguments or pros and cons, and we have fundamental philosophical differences in how we evaluate the pros and cons. By arguing about our positions, we've ended up at a point where we need to compromise somehow. On how to compromise, we need some clear guiding principles. I think the first 4 bullets (of doc **\_**) still apply. We have to be opinionated and be willing to fix what we learned that was wrong. -LHS: What is the point of having a standard in the first place? Why should we not each of us just create something and keep it within our respective organizations? For example, we've created a format internally for better supporting more necessary aspects of localization for our localization tools ecosystem. +LHS: What is the point of having a standard in the first place? Why should we not each of us just create something and keep it within our respective organizations? For example, we've created a format internally for better supporting more necessary aspects of localization for our localization tools ecosystem. EAO: So where are we supposed to do that transformation? -MIH: Yes, that is the core argument. 
My preference among the 4 options [in this issue comment](https://github.com/unicode-org/message-format-wg/issues/103#issuecomment-699432663) is option 2, and the other argument being discussed is option 3. +MIH: Yes, that is the core argument. My preference among the 4 options [in this issue comment](https://github.com/unicode-org/message-format-wg/issues/103#issuecomment-699432663) is option 2, and the other argument being discussed is option 3. -RCA: What I meant to start off the reason is for EAO to explain his concerns. We want to hear more about EAO and ZIB have to say. +RCA: What I meant to start off the reason is for EAO to explain his concerns. We want to hear more about EAO and ZIB have to say. -STA: Can I try to summarize? Some of us prefer the message-level selectors with the internal selectors are "exploding" (Cartesian product of combinatorial options) to full message patterns with top-level selectors. +STA: Can I try to summarize? Some of us prefer the message-level selectors with the internal selectors are "exploding" (Cartesian product of combinatorial options) to full message patterns with top-level selectors. And others would prefer this conversion to happen in the round-trip. -EAO: Not quite. The conversion from internal selectors to top-level selectors is an easy operation, and can happen when we need top-level only selectors. Allowing for internal-only selectors doesn't impose an appreciable cost when we require the top-level selectors. +EAO: Not quite. The conversion from internal selectors to top-level selectors is an easy operation, and can happen when we need top-level only selectors. Allowing for internal-only selectors doesn't impose an appreciable cost when we require the top-level selectors. If we make explicit what the operation is for converting from internal selectors to top-level selectors, then the reverse-conversion will be clear, even though we don't need to specify how it should be done in the specification. 
@@ -71,11 +73,11 @@ MIH: yes but implementation is harder, things are messier...once we allow intern LHS: When you say recursivity, do you mean nesting? -MIH: Yes. But I also mean what happens when you have message references. +MIH: Yes. But I also mean what happens when you have message references. DAF: Isn’t recursivity an issue also in case of top level selectors? -EAO: I think what we're talking about is nesting, not recursivity. Recursivity needs to be addressed if we allow references between messages, ex: A includes B, B includes C, ... +EAO: I think what we're talking about is nesting, not recursivity. Recursivity needs to be addressed if we allow references between messages, ex: A includes B, B includes C, ... LHS: Can you explain why nesting would be worse for internal selectors than full-message selectors? @@ -101,25 +103,25 @@ MIH: Yes, but once it's in the standard, then we're stuck with it. EAO: It's not all that difficult, having worked with that for a while. -MIH: But it's not adopted by others, especially in the localization world. This is not very well supported. +MIH: But it's not adopted by others, especially in the localization world. This is not very well supported. -LHS: Some of us at Google are sensitive to the concerns of the localization industry due to the internal localization we do. I don't think you've explained why it’s messier with internal selectors than external/full-message selectors. +LHS: Some of us at Google are sensitive to the concerns of the localization industry due to the internal localization we do. I don't think you've explained why it’s messier with internal selectors than external/full-message selectors. EAO: How simple would the algorithm need to be that converts from internal to top-level selectors so that the localization industry would be able to support messages with internal selectors? -MIH: The algorithm is not that complicated. What becomes messier is the data structures - basically, we make it more complicated. 
We then need to ensure that we are dealing with the normalized form before working on it. +MIH: The algorithm is not that complicated. What becomes messier is the data structures - basically, we make it more complicated. We then need to ensure that we are dealing with the normalized form before working on it. EAO: Or we add a flag that indicates whether we are using internal selectors. -ZIB: I have 3 thoughts on what I just heard. 1) __________. I am not sympathetic to the reason that "localization tools haven't used it so far" as a reason for not supporting it. Saying that CAT tools are not powerful enough to handle it, so that they never will be, limits what we are willing to try. +ZIB: I have 3 thoughts on what I just heard. 1) ****\_\_****. I am not sympathetic to the reason that "localization tools haven't used it so far" as a reason for not supporting it. Saying that CAT tools are not powerful enough to handle it, so that they never will be, limits what we are willing to try. -LHS: I'm sure you're as frustrated with the current ICU MessageFormat as the rest of us are. At Google, we have a bespoke localization toolchain, we've spent resources in the tooling and l10n infrastructure. Even for us, with all that we do, it's still very difficult to support these features (in existing MessageFormat). +LHS: I'm sure you're as frustrated with the current ICU MessageFormat as the rest of us are. At Google, we have a bespoke localization toolchain, we've spent resources in the tooling and l10n infrastructure. Even for us, with all that we do, it's still very difficult to support these features (in existing MessageFormat). MIH: That's not really the argument. RCA: Point of order, let's stick to the original topic. -MIH: No, that's not what I was saying. We've been using ICU MF for 15 years. The main argument is that the places where it is not adopted / banned, it was banned for the reason "There are internal selectors -- don't do this". 
So to argue for putting that back in is ignoring those lessons, and we know what works and what doesn't. +MIH: No, that's not what I was saying. We've been using ICU MF for 15 years. The main argument is that the places where it is not adopted / banned, it was banned for the reason "There are internal selectors -- don't do this". So to argue for putting that back in is ignoring those lessons, and we know what works and what doesn't. ZIB: Can I respond to that, because I think it links to what LHS is saying. Rust allows “unsafe” option, we should have the default Lint option be external message selectors, but still provide the other option. The experience of l10n teams is a good justification for defaults, and this limitation as a default, but saying “let’s not allow it ever by anyone” seems cocky. MIH please distinguish complicated & complex. @@ -131,35 +133,34 @@ MIH: let’s not say “let’s make it as flexible as possible” because then STA: We agreed that these approaches are equivalent, so it’s just a question of convenience, convenience for developers vs. translators. - STA: I think we agreed previously that these difference approaches are equivalent, so things are inherently complex, but the tooling can be there. -I’d like to take a step back in heated conversations and think about our goals and principles. I want to go back to what LHS wsa saying about the l10n industry. With current MF, we're so fragmented. So I think we shouldn't try to add too many features. But maybe we're ready to challenge previous assumptions. +I’d like to take a step back in heated conversations and think about our goals and principles. I want to go back to what LHS wsa saying about the l10n industry. With current MF, we're so fragmented. So I think we shouldn't try to add too many features. But maybe we're ready to challenge previous assumptions. -It's another axis to think about. Unification or innovation. +It's another axis to think about. Unification or innovation. 
-DAF: First off, I do support message only selectors.. I tried to wear the hat of internal selectors in case of MIH’s recursivity argument. I think both approaches support recursivity. In full message selectors, the recursivity is only allowed upwards and no hidden complexity is allowed. But inside selectors allow for infinitesimal hidden complexity via recursivity. I am for saying that the standard canonical solution is message level selectors only, and having internal selector capable roundtrips is okay as far as we want to go there. +DAF: First off, I do support message only selectors.. I tried to wear the hat of internal selectors in case of MIH’s recursivity argument. I think both approaches support recursivity. In full message selectors, the recursivity is only allowed upwards and no hidden complexity is allowed. But inside selectors allow for infinitesimal hidden complexity via recursivity. I am for saying that the standard canonical solution is message level selectors only, and having internal selector capable roundtrips is okay as far as we want to go there. We finished our goals and non goals, but we haven't finished our design principles. This discussion is one of the axes discussions, continuing the design principles development. We agreed that interoperability with L10n is one of our goals and I don’t see internal selectors contributing to that. -ECH: When it comes to discussion of complexity & simplicity, there’s a talk I go back to, [“Simple Made Easy”](https://www.infoq.com/presentations/Simple-Made-Easy/), talks about how this relates to programming. The talk does a better job than I could to explain this. Simplicity & complexity are objective terms, but difficult/easy are subjective terms. Complicated sounds a little ambiguous because it's not clear whether it's meant in the objective meaning of complex or the relative meaning of difficult. 
There’s inherent complexity (in the problem we’re trying to solve) and incidental complexity (that we’re adding)...I go to this perspective repeatedly and it always proves to help me make successful decisions every time. +ECH: When it comes to discussion of complexity & simplicity, there’s a talk I go back to, [“Simple Made Easy”](https://www.infoq.com/presentations/Simple-Made-Easy/), talks about how this relates to programming. The talk does a better job than I could to explain this. Simplicity & complexity are objective terms, but difficult/easy are subjective terms. Complicated sounds a little ambiguous because it's not clear whether it's meant in the objective meaning of complex or the relative meaning of difficult. There’s inherent complexity (in the problem we’re trying to solve) and incidental complexity (that we’re adding)...I go to this perspective repeatedly and it always proves to help me make successful decisions every time. MIH: I want to go back to me being emotional, and I am not getting emotional about making my point, but I'm getting emotional about repeating my points, and we keep repeating the arguments like in a flame war without discussing the underlying principles and finding a way to compromise, in order to break this impasse. EAO: A lot of my exp with MF1 has in fact been at a different scale of Google’s size. Small teams, in-house work. Most of the stuff we do needs to have support for Finnish and English. The other end is: no need for external toolchain, no need for LSP companies, just working with developers who know what they’re doing. -EAO: The other point is that I think there is a compromise solution here to satisfy both our desires. That is, require top-level selectors, but require messages to build themselves out of parts of other messages. That would give us the benefits of message level selectors without having them. Would that work, MIH? 
+EAO: The other point is that I think there is a compromise solution here to satisfy both our desires. That is, require top-level selectors, but require messages to build themselves out of parts of other messages. That would give us the benefits of message level selectors without having them. Would that work, MIH? MIH: Yes, with caveats. EAO: That way, we're not specifying the format explicitly, and we let the data model for a message or a bundle of messages, and to not allow recursivity. -MIH: Yes. Although, there was a request in the GH issue discussions for allowing recursivity, so we should revisit that. Back to the English / Finnish examples, is there value in gathering the stakeholders together and seeing how we can serve them all? +MIH: Yes. Although, there was a request in the GH issue discussions for allowing recursivity, so we should revisit that. Back to the English / Finnish examples, is there value in gathering the stakeholders together and seeing how we can serve them all? -ZIB: I'm happy to _________. Last month, we agreed that those 2 approaches are equivalent, but then EAO pointed out that it's not true, for example version control roundtrip. If we allow nested selectors (?), then we can support recursivity in either approach, but one approach doesn't allow round trip and the other doesn't. +ZIB: I'm happy to ****\_****. Last month, we agreed that those 2 approaches are equivalent, but then EAO pointed out that it's not true, for example version control roundtrip. If we allow nested selectors (?), then we can support recursivity in either approach, but one approach doesn't allow round trip and the other doesn't. STA: But if we say that we only require top level selectors, then round trip works. -ZIB: ________, and from the top level, you cannot get back to what was provided. +ZIB: **\_\_\_\_**, and from the top level, you cannot get back to what was provided. STA: One solution is to not allow the transfer between the 2. 
@@ -168,13 +169,14 @@ ZIB: But then the data model doesn't allow for the storage of one approach. MIH: **You can’t roundtrip this (with info from data model only):** + ``` You {count, plural, =1 {deleted # file from} other {deleted # files from}} income. ``` -DAF: You can do the conversion if you have knowledge of the expansion mechanism that had been used. Without that, it's impossible just algorithmically. +DAF: You can do the conversion if you have knowledge of the expansion mechanism that had been used. Without that, it's impossible just algorithmically. -EAO: And that gets tricky when you deal with transformations in the other form (in the message level form if originally supplying internal selector form). Which is why I say that we allow it but don't try to specify it. +EAO: And that gets tricky when you deal with transformations in the other form (in the message level form if originally supplying internal selector form). Which is why I say that we allow it but don't try to specify it. DAF : Last meeting we were tending to a consensus that all public exchanges should happen with full message selectors and internal selectors would be allowed in private roundtrips. @@ -182,36 +184,33 @@ MIH : Ok we only can roundtrip this if you have internal information of data mod ZIB: I want to point out that I agree that the question MIH posed is the right question, but I want to counter the point that they are completely equivalent. - DAF: The only argument for internal selectors that makes business sense to me is that internal is cheaper over the wire.. EAO: And it's more future-proof, if you're asserting that tools up until now haven't used it, it doesn't mean that tools won't use it. -STA: Back to axis (unification vs innovation). Why do you think we want to have this standard out? What chances do you see for MF 3.0? Will that [have to] happen, or is MF 2.0 enough. +STA: Back to axis (unification vs innovation). Why do you think we want to have this standard out? 
What chances do you see for MF 3.0? Will that [have to] happen, or is MF 2.0 enough. EAO: If we do it right, we can maintain backwards compatibility and never have to break. -STA: We have to look at the issue and implications. What do we do with a message with 7 different selectors. There are 3 different options: 1) favor compatibility and accept that we won’t be able to express messages with many selectors in a terse manner; 2) favor innovation and challenge the existing toolchains, 3) perhaps add it in MF3.0? And considering the complexity involved, I'm in favor of adding less features and keeping better backwards compatibility. +STA: We have to look at the issue and implications. What do we do with a message with 7 different selectors. There are 3 different options: 1) favor compatibility and accept that we won’t be able to express messages with many selectors in a terse manner; 2) favor innovation and challenge the existing toolchains, 3) perhaps add it in MF3.0? And considering the complexity involved, I'm in favor of adding less features and keeping better backwards compatibility. -RCA: Can we take a look at 2 options, and vote. So, message level selectors only, or allows both message level and internal selectors. We're going back and forth now. +RCA: Can we take a look at 2 options, and vote. So, message level selectors only, or allows both message level and internal selectors. We're going back and forth now. MIH: It's not so easy. -DAF: I agree that it's not so easy. If we allow message level selectors only, that’s the only simple option. But if we allow internal selectors, we get combinatorial possibilities. Which is canonical, how do u get from one to another? If you allow both, we’d need to define these equivalences, u cannot simply allow two options and let people use both, that’s not how u standardize interoperability.. +DAF: I agree that it's not so easy. If we allow message level selectors only, that’s the only simple option. 
But if we allow internal selectors, we get combinatorial possibilities. Which is canonical, how do u get from one to another? If you allow both, we’d need to define these equivalences, u cannot simply allow two options and let people use both, that’s not how u standardize interoperability.. -EAO: We're also not just dealing with this question in a vacuum. Like I said, I'm perfectly okay with supporting message level selectors on the condition that we can allow for message bundles (groupings of messages) to be able to be passed. +EAO: We're also not just dealing with this question in a vacuum. Like I said, I'm perfectly okay with supporting message level selectors on the condition that we can allow for message bundles (groupings of messages) to be able to be passed. MIH: I can clarify my position. If you don’t mean “Bundle”, I’m fine with references. I want to allow for references that are loaded from an arbitrary place, not necessarily from the same “Bundle”. I don’t want to care if it’s the same file or not. -EAO: In the data model it does matter because ____ +EAO: In the data model it does matter because \_\_\_\_ EAO: It’s not only about loading references to messages, but also scope and arguments. +ZIB: I think we agree on the round trip, but differences in **\_** -ZIB: I think we agree on the round trip, but differences in _____ - -Is the goal to allow a standard that allows a particular localization to be built on top? Or are we trying to create a data model - +Is the goal to allow a standard that allows a particular localization to be built on top? Or are we trying to create a data model STA: I have thoughts. @@ -222,29 +221,29 @@ MIH: I agree about references, but I don’t think we need to support hierarchy ECH: So, you want to have references, but you don’t need them in the same bundle, you just need to be able to refer to one another. -NIC: Re-using string is normally considered a bad practice. 
In many languages the context will change the translation of a string (e.g. “yes/no” in Vietnam). From what I understand from ZIB's bundles, the main use case is context, and that's a much bigger problem to solve if we tackle this part of this group. A lot of UI also required images for example to provide the best translation. +NIC: Re-using string is normally considered a bad practice. In many languages the context will change the translation of a string (e.g. “yes/no” in Vietnam). From what I understand from ZIB's bundles, the main use case is context, and that's a much bigger problem to solve if we tackle this part of this group. A lot of UI also required images for example to provide the best translation. ZIB: I want to point out that EAO and MIH are using "bundle" to refer to as a single packages of resources/files grouped together, and Fluent means it to refer to context that gets evaluated/interpolated at runtime. -DAF: If I understand correctly, EAO would be happy with top-level selectors only if we allow messages, variables, or scope passed through. As a part of l10n interchange format standards, I am potentially worried that this could violate the boundaries of text units. +DAF: If I understand correctly, EAO would be happy with top-level selectors only if we allow messages, variables, or scope passed through. As a part of l10n interchange format standards, I am potentially worried that this could violate the boundaries of text units. -MIH: There are use cases for that such as alt text in HTML.. +MIH: There are use cases for that such as alt text in HTML.. DAF: I see, subflows.. I am fine with that, as the subflows mechanism is well established in L10n exchange.. -STA: I would like to caution about message references. I still don't know what to think about them myself. They do cause problems for tooling, from experience. And they do cause problems with runtime resolution if you don't have that reference ready, yet. 
So there are reasons to be cautious, and it's good to not conflate these 2 discussions. +STA: I would like to caution about message references. I still don't know what to think about them myself. They do cause problems for tooling, from experience. And they do cause problems with runtime resolution if you don't have that reference ready, yet. So there are reasons to be cautious, and it's good to not conflate these 2 discussions. -Back to what ZIB was saying earlier, whether we want to have a data model that can express different forms of translation to support different types of l10n systems. I think that is based on the idea that it is necessary to support the internal selectors. But that's not necessarily true. It can be useful for developers, but there are cases where they are an abomination and cause problems. So I would be okay just not allowing them at all. +Back to what ZIB was saying earlier, whether we want to have a data model that can express different forms of translation to support different types of l10n systems. I think that is based on the idea that it is necessary to support the internal selectors. But that's not necessarily true. It can be useful for developers, but there are cases where they are an abomination and cause problems. So I would be okay just not allowing them at all. -MIH: That is a problem. Languages in English might be more conducive to expression in internal selectors, but when you get to Slavic languages, then you are forced to expand it or find workarounds. +MIH: That is a problem. Languages in English might be more conducive to expression in internal selectors, but when you get to Slavic languages, then you are forced to expand it or find workarounds. -ZIB: I know you're aware of the problem of the explosion of permutations (expansion of combinations, which is a large number). Do you not think that is a problem? 
+ZIB: I know you're aware of the problem of the explosion of permutations (expansion of combinations, which is a large number). Do you not think that is a problem? STA: I think there is a tradeoff that we're not really naming, which is, if we can say that there are a few messages that we won't be able to support, but if we do, then we can enable a very wide adoption, and become a good standard, and the large benefit that it provides would, in my mind, outweigh the costs of not supporting a few types of messages. -ZIB: Can we collect all the use cases that can only be represented with internal selectors? Give ourselves 2 weeks to do that, and see what we get. +ZIB: Can we collect all the use cases that can only be represented with internal selectors? Give ourselves 2 weeks to do that, and see what we get. -EAO: I think it would be hard to get current users of current MessageFormat ["1.0"] without a safe and trusted way to convert their messages and be able to recover their messages in the old format. And that is why, if we only use message level selectors, we have to use message references, etc. +EAO: I think it would be hard to get current users of current MessageFormat ["1.0"] without a safe and trusted way to convert their messages and be able to recover their messages in the old format. And that is why, if we only use message level selectors, we have to use message references, etc. MIH: I think it is possible to do. @@ -252,7 +251,7 @@ RCA: How can we collect use cases? Create a new issue? MIH: Also propose collecting stakeholders (“beneficiaries of this standard”), so when we do our compromises we know who is affected -RCA: When the group was formed, I wanted translation companies to be part of us, but wasn’t able to bring enough people from that side to our meeting/work group. Should include L10n industry *and* simple developers that want to localize their own stuff. 
+RCA: When the group was formed, I wanted translation companies to be part of us, but wasn’t able to bring enough people from that side to our meeting/work group. Should include L10n industry _and_ simple developers that want to localize their own stuff. MIH: we keep saying “translators” but there are a number of types of translators (from open source contributors to paid L10n vendors) @@ -294,17 +293,16 @@ EAO: I’m curious what other people's opinions might be there. STA: I see benefits of them, I also see benefits of not having them. I can write a short doc on it and put it on github. +ZIB: I would also like to say that we have an increasing number of requests for dynamic elements, where the best thing to pass is a declaration to an argument, and this is causing people to write dirty hacks in JS. It's another angle to think from. Ex: you decide only at runtime which name from a set of five names you use for a message. -ZIB: I would also like to say that we have an increasing number of requests for dynamic elements, where the best thing to pass is a declaration to an argument, and this is causing people to write dirty hacks in JS. It's another angle to think from. Ex: you decide only at runtime which name from a set of five names you use for a message. - -STA: I think this is similar to how Siri works. I think that is something interesting. +STA: I think this is similar to how Siri works. I think that is something interesting. -MIH: Basically, you get all the same problems that you get for placeholders. Ex: the problem +MIH: Basically, you get all the same problems that you get for placeholders. Ex: the problem -ZIB: But the extra problem is that declaration is synchronous, but the resolution of the declaration is asynchronous. That results in _______. That creates churn and is a paper cut. +ZIB: But the extra problem is that declaration is synchronous, but the resolution of the declaration is asynchronous. That results in **\_\_\_**. 
That creates churn and is a paper cut. EAO: This conversation is presupposing that we have message references. -MIH: I will file an issue, if we don't, to discuss message references. Also, I created and applied the "requirements" tags to our issues. Fix as needed. +MIH: I will file an issue, if we don't, to discuss message references. Also, I created and applied the "requirements" tags to our issues. Fix as needed. -RCA: Thanks, this type of repo maintenance is necessary, so thanks for that. Let's define a little bit more the organizing work to be done by the chair group in the chair group meeting next week, and create the project planning board. \ No newline at end of file +RCA: Thanks, this type of repo maintenance is necessary, so thanks for that. Let's define a little bit more the organizing work to be done by the chair group in the chair group meeting next week, and create the project planning board. diff --git a/meetings/task-force/#103-2020-10-26.md b/meetings/task-force/#103-2020-10-26.md index 3fb932ae4e..1ea710fa5a 100644 --- a/meetings/task-force/#103-2020-10-26.md +++ b/meetings/task-force/#103-2020-10-26.md @@ -1,6 +1,7 @@ ## Executive summary ([Original Doc](https://docs.google.com/document/d/1QvzmpbVsPfW0MFajXGIqPhXxp54oGV6xaVShOjaV-qk/edit#)) -*Participants:* +_Participants:_ + - DAF: David Filip - ECH: Elango Cheran - MIH: Mihai Nita @@ -15,25 +16,25 @@ Consensus 1: Include message references in the data model. -Discussion: The implementers would find a way to include references anyways, but including it in the data model (standard) can make it subject to best practices. It’s still possible for users to do “the wrong thing” (ex: concatenation of strings/messages), but then you would find it more difficult to achieve. +Discussion: The implementers would find a way to include references anyways, but including it in the data model (standard) can make it subject to best practices. 
It’s still possible for users to do “the wrong thing” (ex: concatenation of strings/messages), but then you would find it more difficult to achieve. -One of the drawbacks of message references is that referenced messages effectively have a public API (names of parameters, variables, variants, etc.) which must be consistent across all callsites. This leads us to consensus 2. +One of the drawbacks of message references is that referenced messages effectively have a public API (names of parameters, variables, variants, etc.) which must be consistent across all callsites. This leads us to consensus 2. Consensus 2: Allow parameters passed with message references to the message being referenced and validate it. -Discussion: The variables/fields passed should not be completely untyped and unchecked. We want a validation mechanism that can allow providing early error feedback to the translators & developers. We need to decide on when the validation can & should happen, including the meaning of “build time” and “run time” in regards to validation. +Discussion: The variables/fields passed should not be completely untyped and unchecked. We want a validation mechanism that can allow providing early error feedback to the translators & developers. We need to decide on when the validation can & should happen, including the meaning of “build time” and “run time” in regards to validation. > Approval Stamps for Executive Summary -*ECH,MIH,RCA,NIC,DAF,STA,EAO,CLS* +_ECH,MIH,RCA,NIC,DAF,STA,EAO,CLS_ ## Minutes -LHS: Summary of last meeting (2020-10-19 monthly meeting). Continuation of previous discussions, but closer to a compromise. One side would prefer to keep all selectors external to the message. The other side would prefer to allow selectors inside / internal to the message. Argument for external selectors is simplicity of data model, and compatibility with existing l10n tooling. Argument for internal selectors is that it allows for more flexibility in the future. 
It makes a more compact notation (for cases where compactness matters). And it makes it easier for translators who are programmers. It allows for the possibility of lossless round tripping. EAO was suggesting a compromise that we either allow message references or internal selectors, and it seems like MIH and others were OK with this…? +LHS: Summary of last meeting (2020-10-19 monthly meeting). Continuation of previous discussions, but closer to a compromise. One side would prefer to keep all selectors external to the message. The other side would prefer to allow selectors inside / internal to the message. Argument for external selectors is simplicity of data model, and compatibility with existing l10n tooling. Argument for internal selectors is that it allows for more flexibility in the future. It makes a more compact notation (for cases where compactness matters). And it makes it easier for translators who are programmers. It allows for the possibility of lossless round tripping. EAO was suggesting a compromise that we either allow message references or internal selectors, and it seems like MIH and others were OK with this…? -MIH: Message references are about using a message id that points to an external selector, but not including the message itself. So it’s “message-by-reference”, not “message-by-value”. +MIH: Message references are about using a message id that points to an external selector, but not including the message itself. So it’s “message-by-reference”, not “message-by-value”. -ZIB: For message references, there would need to be tooling made to support it, but it can be done. But one advantage is that they can be used dynamically. That is a powerful model that we don’t support in Fluent but would like to explore support of. If they solve what EAO is looking for, then that is great. +ZIB: For message references, there would need to be tooling made to support it, but it can be done. But one advantage is that they can be used dynamically. 
That is a powerful model that we don’t support in Fluent but would like to explore support of. If they solve what EAO is looking for, then that is great. LHS: One thing to point out is that MIH said last time that allowing message references means that users can concatenate messages, but we recognized that no matter what we do, it’s always possible for users to “do the wrong thing”. @@ -41,31 +42,31 @@ ZIB: And we can use linters to help users detect potential problems. MIH: And if we don’t allow programmers to do this, then it will be “behind our backs”, but if we let people do references in the standard, then we can catch it, advise, etc. -RCA: It’s nice to have guidelines to help the user, but it’s a cool-to-have or nice-to-have feature, but it’s good to +RCA: It’s nice to have guidelines to help the user, but it’s a cool-to-have or nice-to-have feature, but it’s good to -EAO: I’m not sure that linters are outside the scope [of the WG]. It is something that could be included in the spec. +EAO: I’m not sure that linters are outside the scope [of the WG]. It is something that could be included in the spec. -RCA: I think we can specify best practices, but it is good to not interfere in the +RCA: I think we can specify best practices, but it is good to not interfere in the EAO: I suspect that if we delve deeper into this topic, that it would be more complicated than we realize. -DAF: It’s true that we don’t have specific guidelines on the external user deliverables, but we do have a specific goal on interoperability with XLIFF, and that might surface +DAF: It’s true that we don’t have specific guidelines on the external user deliverables, but we do have a specific goal on interoperability with XLIFF, and that might surface -STA: I want to acknowledge the shortcomings from Fluent regarding message references. The first drawback is that message references means that you reference messages. People tend to abuse it. 
They put nouns into message reference and then use it as a reference instead of just using the noun directly in the pattern. The second problem is that there is a loss of context for the translator. The third problem is that a single message is no longer independent. That means that you need a CAT tool to find out what these other messages are / where to find them, and then you need to construct a graph of message dependencies. +STA: I want to acknowledge the shortcomings from Fluent regarding message references. The first drawback is that message references means that you reference messages. People tend to abuse it. They put nouns into message reference and then use it as a reference instead of just using the noun directly in the pattern. The second problem is that there is a loss of context for the translator. The third problem is that a single message is no longer independent. That means that you need a CAT tool to find out what these other messages are / where to find them, and then you need to construct a graph of message dependencies. -DAF: STA, you’re right, but MIH is also right -- if you don’t give this option, then they will do this anyways, so this gives the user some guidance on what to do. XLIFF can represent this with subflows. You’re right there might be subflows of logic, and there can be business logic around that. And this is a continuation of the discussion on the thread about “localization units”. +DAF: STA, you’re right, but MIH is also right -- if you don’t give this option, then they will do this anyways, so this gives the user some guidance on what to do. XLIFF can represent this with subflows. You’re right there might be subflows of logic, and there can be business logic around that. And this is a continuation of the discussion on the thread about “localization units”.
EAO: If our current discussion is about where to put the message references in the data model, then later discussions can determine how to layer in how to use it, but we haven’t finished the first discussion. LHS: One thing that I wanted to make sure is that companies/groups that want to use internal selectors, that there is a mechanism to convert to external selectors losslessly, and that companies/groups that want to keep things simpler (e.g. due to STA’s concerns) could just not allow references at the company/group level (even though it’s allowed in the data model/standard) -ZIB: What I was talking about last time was exploration about extensions, using metadata, that gives us the ability to turn on features using flags and maintain that information. +ZIB: What I was talking about last time was exploration about extensions, using metadata, that gives us the ability to turn on features using flags and maintain that information. EAO: It’s nice to have consensus on this topic of , which I am seeing here. RCA: Can we ask for consensus, here? -STA: I have my doubts about them, and I’m less enthusiastic about them, but it’s really easy to work around them, and if you do, it’s even worse. It’s not perfect, but all the alternatives are worse. +STA: I have my doubts about them, and I’m less enthusiastic about them, but it’s really easy to work around them, and if you do, it’s even worse. It’s not perfect, but all the alternatives are worse. MIH: I agree with all the concerns that STA has, too. @@ -77,13 +78,13 @@ MIH: They’re different. LHS: But can we talk about not having internal selectors if we use message references? -MIH: ZIB, is it okay to share info from our 1-on-1 meeting earlier today?
We are cautiously optimistic that there is a way to work with this without needing to use internal selectors. ZIB: I will write up a summary of our meeting and post it as a GH issue. EAO: The (external) message that is being referenced by another message should be able to have access to some variable representing the context of the message that uses the reference. -MIH: Is this example correct? “I am visiting {city_name}” and the translator wants to know, when translating the message that {city_name} points to, that it is a dative or locative case. +MIH: Is this example correct? “I am visiting {city_name}” and the translator wants to know, when translating the message that {city_name} points to, that it is a dative or locative case. STA: One thing about genders and casing was that developers define the messages with cases, etc., and once they do, they’ve created a sort of public API for the messages, whether they realize it or not. @@ -93,11 +94,11 @@ STA: For example, someone might call it genitive, but in Polish, I may not call GRH: Linguists come in and define it, but over time we’ve improved after we realize that we want to standardize the names of terms. -MIH: A proposal: we don’t put it in the standard, but instead we create a registry like what Unicode does for locale identifiers (BCP 47) that uses IANA for a registry. With a clear expectation that it is subject to change. +MIH: A proposal: we don’t put it in the standard, but instead we create a registry like what Unicode does for locale identifiers (BCP 47) that uses IANA for a registry. With a clear expectation that it is subject to change. -STA: That is something that we were thinking about, too. Maybe on a per-application basis. +STA: That is something that we were thinking about, too. Maybe on a per-application basis. -NIC: We had discussions on Github about special file formats for references. Is that necessary for references? 
+NIC: We had discussions on Github about special file formats for references. Is that necessary for references? MIH: I think that it can be designed to be file-format agnostic. @@ -105,9 +106,9 @@ EAO: My feeling is that we should be able to provide _a_ file format that can su LHS: Like a reference implementation? -EAO: Yes. I think YAML is the only one that does ______. But I don’t think we should talk about file formats right now. +EAO: Yes. I think YAML is the only one that does **\_\_**. But I don’t think we should talk about file formats right now. -DAF: Going back to the topic of registry. We have quite a lot of things already that would benefit from a registry. Maybe a repo for all of them? General linguistics define many of the grammatical concepts that map to variants of messages. But we also have to think about where the repo goes, who maintains it, etc.? Would Unicode be maintaining it? For XLIFF, we have a place where we allow people to register their own custom values so that they don’t just go off and do things opaquely. +DAF: Going back to the topic of registry. We have quite a lot of things already that would benefit from a registry. Maybe a repo for all of them? General linguistics define many of the grammatical concepts that map to variants of messages. But we also have to think about where the repo goes, who maintains it, etc.? Would Unicode be maintaining it? For XLIFF, we have a place where we allow people to register their own custom values so that they don’t just go off and do things opaquely. MIH: XLIFF 2 has a model that defines standard ways to add extensions http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#extensions @@ -116,7 +117,7 @@ DAF: also it is a good idea to reserve some authority for ourselves and make cle EAO: This overlaps with the discussion of references (?) and we need a registry for that. -DAF: We are talking about one registry, but maybe we need more than one. 
For example, a registry of variants per language is a different kind of information than an inter-language registry of general linguistic categories Unicode CLDR acts as registry for several external specs such as BCP 47 extensions U and T.. We also have to start thinking about where to place the registry technically and politically. +DAF: We are talking about one registry, but maybe we need more than one. For example, a registry of variants per language is a different kind of information than an inter-language registry of general linguistic categories Unicode CLDR acts as registry for several external specs such as BCP 47 extensions U and T.. We also have to start thinking about where to place the registry technically and politically. EAO: What we do not know yet is whether these registries will have 100 entries or 1000 entries, and these sorts of matters will shape the discussions of who owns that registry and how does it operate. @@ -126,14 +127,13 @@ STA: Anyone -- is there any sustained opposition to MessageFormat 2.0 having ref RCA: We seem to have consensus. -DAF: We should bring this up in the monthly meeting. Only there can we make decisions. The consensus here is still helpful for the taskforce. +DAF: We should bring this up in the monthly meeting. Only there can we make decisions. The consensus here is still helpful for the taskforce. -ZIB: I would like to see the [Apache voting system](https://www.apache.org/foundation/voting.html) to make it quick and clear, from -1.0 to 1.0. It helps see the temperature on the discussion and not a difficult binary system. +ZIB: I would like to see the [Apache voting system](https://www.apache.org/foundation/voting.html) to make it quick and clear, from -1.0 to 1.0. It helps see the temperature on the discussion and not a difficult binary system. RCA: We should still have the official vote in our monthly meeting, based on our rules. 
-DAF: But maybe we can quickly vote - +DAF: But maybe we can quickly vote Apache style Voting on including message references in data model: @@ -158,7 +158,7 @@ RCA: Should we count? ZIB: It looks good, everything is fairly strongly positive with a low standard deviation. -RCA: We should have someone bring this to the plenary meeting to describe the discussion and the consensus and temperature reading mechanism using Apache voting. Can someone do that? +RCA: We should have someone bring this to the plenary meeting to describe the discussion and the consensus and temperature reading mechanism using Apache voting. Can someone do that? MIH: I can. @@ -168,7 +168,7 @@ RCA: I see this as an extension of the message reference itself. MIH: It kind of is. -EAO: I would like us to consider the proposal and get consensus for the ability to pass context in the message reference. If we do, then we can call for consensus on not allowing internal selectors. +EAO: I would like us to consider the proposal and get consensus for the ability to pass context in the message reference. If we do, then we can call for consensus on not allowing internal selectors. MIH: We could take a look at the metadata that we tie to the references, but we should not connect it to the topic of internal selectors. @@ -178,7 +178,6 @@ MIH: a need for metadata for message references Example: `You visited {$company} headquarters` It is useful to allow a translator to add some extra info to the ref, for example the fact that it should use a locative grammatical case. - EAO: STA, can you formulate a statement that you would be willing to support? STA: That might be too much to do right now. @@ -208,9 +207,9 @@ EAO: That _a_ possible implementation, but what I am looking for is a consensus MIH: Okay, I wouldn’t call that “strongly typed”, but it should be defined in the registry. 
-ZIB: Another way of thinking about it is having some form of meta information that defines which selectors/keys and/or values provided to them that are passed to the message that is being referenced. But we still allow companies/groups to define their own selector types. +ZIB: Another way of thinking about it is having some form of meta information that defines which selectors/keys and/or values provided to them that are passed to the message that is being referenced. But we still allow companies/groups to define their own selector types. -DAF: We need 3 levels: things in the standard, things in the registry, and things in the control of the code owners. That’s basically defining extension points that say what we additionally accept and where. +DAF: We need 3 levels: things in the standard, things in the registry, and things in the control of the code owners. That’s basically defining extension points that say what we additionally accept and where. We need to define extension points. @@ -218,7 +217,7 @@ MIH: Yes, XLIFF 1.2 defines so many extension points that tool makers extend it DAF: we need to be clear that the private extensions must not compete for functionality with the standard or the registry when adding their own private values. -CLS: Another simple example: +CLS: Another simple example: ``` color-ball = "You picked the {$color} ball." @@ -239,11 +238,10 @@ MIH: maybe color-toy = "Escojiste el juguete {$color, { grammatical-gender: masculine, grammatical-plural: singular} }." ``` -`$` in front of color means reference (Fluent style), but it is not a proposed syntax, just to exemplify. +`$` in front of color means reference (Fluent style), but it is not a proposed syntax, just to exemplify. STA: Message references create an implicit API, and what I’m trying to solve is preventing someone from breaking this API. - RCA: Should we bring this to the plenary and discuss there? EAO: Let’s do the Apache voting on this topic. 
@@ -287,13 +285,13 @@ ECH: I think that could be an implementation detail, but I agree that it’s nic NIC: How will linguists handle variables that are managed in registries, especially for big datasets? Are we expecting them to have access to variable values during translations and if so, I presume we would expect TMSes to implement this new standard for this to work? -MIH: Yes… The metadata stays the same no matter how big the dataset is. Things like grammatical case / gender / number don’t depend on how many items we apply them to. But +MIH: Yes… The metadata stays the same no matter how big the dataset is. Things like grammatical case / gender / number don’t depend on how many items we apply them to. But STA: If we make this lenient and the parameters don’t match at runtime, then it would be useful for the parent message to know that the parameters don’t match. -MIH: An example I had was `hello {$username}`, and in Polish the case of the `$username` changes to vocative, but what happens when there is no vocative form of the name available? Can the translator have a default option that doesn’t use the placeholder altogether? +MIH: An example I had was `hello {$username}`, and in Polish the case of the `$username` changes to vocative, but what happens when there is no vocative form of the name available? Can the translator have a default option that doesn’t use the placeholder altogether? -DAF: I agree, we need an exception payload +DAF: I agree, we need an exception payload EAO: Do we need the referenced messages to return not only the value (string, array, etc) but also the case/variant that was chosen, so that the referencing message has a chance to react? @@ -301,7 +299,7 @@ STA: There needs to be 2-way communication, I agree EAO. EAO: Given the mini-consensus that the task force has come to on these topics, I would be okay in not allowing internal selectors in MF 2.0. -RCA: We can bring these issues to the next plenary meeting. 
I would like to ask for another volunteer to bring this issue about characterizing metainformation passed with a message reference. +RCA: We can bring these issues to the next plenary meeting. I would like to ask for another volunteer to bring this issue about characterizing metainformation passed with a message reference. MIH: Let’s summarize our discussion. @@ -313,6 +311,6 @@ MIH: So we’ll move fast, but let’s summarize what we have right here. RCA: MIH, do you want to have another task force meeting on this? -MIH: For message references + metadata, we can take it to the plenary meeting without further discussion. Not yet for internal selectors. +MIH: For message references + metadata, we can take it to the plenary meeting without further discussion. Not yet for internal selectors. -RCA : We should define the terminology of build time & runtime, here is the place [#126](https://github.com/unicode-org/message-format-wg/issues/126) \ No newline at end of file +RCA : We should define the terminology of build time & runtime, here is the place [#126](https://github.com/unicode-org/message-format-wg/issues/126) diff --git a/meetings/task-force/#130-2021-01-11.md b/meetings/task-force/#130-2021-01-11.md index 10f108cb47..d8a1713120 100644 --- a/meetings/task-force/#130-2021-01-11.md +++ b/meetings/task-force/#130-2021-01-11.md @@ -1,7 +1,9 @@ ## Executive summary ([Original Doc](https://docs.google.com/document/d/1P7qhnxUDUpD5AKpcQp_nfIYj2ZBDoXS8YspmN3eV3f8/edit#)) + Executive summary Participants: + - RCA: Romulo Cintra - NIC: Nicolas Bouvrette - Expedia - EAO: Eemeli Aro - OpenJSF @@ -13,13 +15,10 @@ Participants: - GWR: George Rhoten - Apple - MIH: Mihai Nita - -There is a general consensus around supporting dynamic references. There are some valid use cases to support, we probably can't prevent people from working around dynamic references, and by supporting them we “gain back” some control (conformance levels, lint, etc.). 
This can simplify messages that could otherwise have thousands of related static messages, but it brings along the risk of extra complexity and indirection. There are concerns about testing & validation -- ex: what happens at word boundaries, agreement between selector name, context completeness checking. Conformance levels with regards to this feature depend on the capability to switch off dynamic- and static message referencing. +There is a general consensus around supporting dynamic references. There are some valid use cases to support, we probably can't prevent people from working around dynamic references, and by supporting them we “gain back” some control (conformance levels, lint, etc.). This can simplify messages that could otherwise have thousands of related static messages, but it brings along the risk of extra complexity and indirection. There are concerns about testing & validation -- ex: what happens at word boundaries, agreement between selector name, context completeness checking. Conformance levels with regards to this feature depend on the capability to switch off dynamic- and static message referencing. We agree to pause meetings of this particular task force for issue #130 until we have progress on the data model that requires clarifying these details. - - > Approval Stamps for Executive Summary ECH @@ -29,7 +28,6 @@ MIH DAF NIC - ## Minutes ZBI: Summary of dynamic selectors. Previously, we wanted to provide a design for developers to communicate to translators. Challenges at Mozilla happen at build time, developers don't know what messages they want to reference from another -- it is a runtime decision. Workarounds are messy, proposal is dynamic references -- have references to another message within a message that are only resolved at runtime. Avoids previous errors from workarounds of string concatenation and different fallback (locale?) of 2 different message patterns. @@ -46,7 +44,7 @@ Is there anyone opposing the proposal? 
STA: I don't want to sound opposing, but I want to raise some red flags, and raise the idea of the registry. With message references, we have the API that is effectively created by the message, which includes the name of the selector. If you provide the wrong selector name, the message will break, as has happened in Fluent, and you could use static analysis somehow to catch them. But you could also create a registry of defined names and values for selectors, and the developers can override the registry to customize that to their specific needs. I hope we can talk about this because not having a registry creates a dangerous solution. -ZBI: Every solution has its soft spots, so any solution should address this problem, and I hope +ZBI: Every solution has its soft spots, so any solution should address this problem, and I hope DAF: The dynamic references create the same problem as internal selectors for translators (localization), but they address the same problem. The difference is the time at which they are resolved (compile/build time, runtime). But I don't think the standard should define when these references are resolved. I do like the conversation about checking, validation, etc. I think these problems can be avoided by not using message references, but I think we are moving in the right direction. @@ -82,7 +80,7 @@ DAF: I want to agree here, we need to construct examples that show the problem. NIC: I agree with DAF. There is a huge implication with supporting references correctly - does this require the right tooling, how does it affect translation memory? -ZBI: Is this example here a better example? https://github.com/projectfluent/fluent/issues/80 +ZBI: Is this example here a better example? https://github.com/projectfluent/fluent/issues/80 NIC: It brings up the question of what is the limit on the number of static strings before you resort to message references. @@ -94,7 +92,7 @@ RCA: I agree with STA, it is an interesting point-of-view.
I don't know how we s EAO: It shouldn't be that hard to write tooling that does checking to verify whether messages and references are broken. -RCA: True, but this is adding some complexity, and it +RCA: True, but this is adding some complexity, and it NIC: I'm also worried about tooling because existing tooling around current MessageFormat is so bad that I do not want to give it to linguists and translators. @@ -110,7 +108,7 @@ NIC: Is that similar to a lexicon? GWR: It's like a highly structured lexicon. For example, in the Fluent#80 example with bone-dragon, you have to describe what it looks like as singular vs plural, if it's definite or indefinite article. -DAF: The capabilities that GWR describe should be features that describe the context. We should make the context capable of completeness tests. I see this as a local instance of the registry at various levels. +DAF: The capabilities that GWR describe should be features that describe the context. We should make the context capable of completeness tests. I see this as a local instance of the registry at various levels. To STA, who mentioned limiting references to just nouns, it is hard to tell what parts of speech are sometimes ("stop the steal" - is "steal" a noun or a verb?). I think the standard should be limited at the standard level, it should only be limited at the linter level. @@ -128,7 +126,7 @@ EAO: That's a separate detailed implementation discussion, I don't want to sidet DAF: I think the standard describes all 3 levels, but then there could be a conformance statement (from a checker), but the levels could be described in the standard. -RCA: I don't understand what the levels mean about what kind of support we provide or not, but I don't see how the linter +RCA: I don't understand what the levels mean about what kind of support we provide or not, but I don't see how the linter DAF: There could be a combination of features that go together that cannot be easily checked independently by a linter. 
@@ -138,7 +136,7 @@ RCA: What are the takeaways from this meeting. STA: This is interesting, the more complex the data model gets, the more ways to break it there will be, so there will be more work needed to be put in to the linter to help people use it in the best ways. -RCA: How do we go about working on a linter. We can start working on this now, starting from 0; or we can have this in mind as we continue designing the data model; or do we want to clarify using more examples about what is needed? +RCA: How do we go about working on a linter. We can start working on this now, starting from 0; or we can have this in mind as we continue designing the data model; or do we want to clarify using more examples about what is needed? EAO: I think it is too early to start defining levels of the data model. @@ -161,7 +159,3 @@ DAF: Let's make a summary for the plenary. STA: What I take away from this meeting is that there is not a big difference between implementing regular references, and go ahead and work on the data model. ZBI: One difference we encountered with dynamic references is that without them, there were people trying to resolve regular references at build time, but dynamic references can be used in such compile/build-time systems. It's important to at least discuss early on to avoid those friction points. Either DAF and EAO mentioned it in chat, that List Formatting can affect the type of arguments that you pass in. But we can discuss that when we discuss the details of that after deciding whether and how to distinguish dynamic references. - - - - diff --git a/spec/README.md b/spec/README.md index f37e333543..b091d7be16 100644 --- a/spec/README.md +++ b/spec/README.md @@ -24,15 +24,15 @@ to make it culturally accepted and grammatically correct. 
> For example, if your US English (`en-US`) interface has a message like: > ->> Your item had 1,023 views on April 3, 2023 +> > Your item had 1,023 views on April 3, 2023 > > You want the translated message to be appropriately formatted into French: > ->> Votre article a eu 1 023 vues le 3 avril 2023 +> > Votre article a eu 1 023 vues le 3 avril 2023 > > Or Japanese: > ->> あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。 +> > あなたのアイテムは 2023 年 4 月 3 日に 1,023 回閲覧されました。 This specification defines the data model, syntax, processing, and conformance requirements diff --git a/spec/data-model/README.md b/spec/data-model/README.md index 50d924c6f0..6632211e90 100644 --- a/spec/data-model/README.md +++ b/spec/data-model/README.md @@ -14,6 +14,7 @@ Implementations are not required to use this data model for their internal repre To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. Two equivalent definitions of the data model are also provided: + - [`message.json`](./message.json) is a JSON Schema definition, for use with message data encoded as JSON or compatible formats, such as YAML. - [`message.dtd`](./message.dtd) is a document type definition (DTD), @@ -25,19 +26,19 @@ A `SelectMessage` corresponds to a syntax message that includes _selectors_. A message without _selectors_ and with a single _pattern_ is represented by a `PatternMessage`. ```ts -type Message = PatternMessage | SelectMessage +type Message = PatternMessage | SelectMessage; interface PatternMessage { - type: 'message' - declarations: Declaration[] - pattern: Pattern + type: "message"; + declarations: Declaration[]; + pattern: Pattern; } interface SelectMessage { - type: 'select' - declarations: Declaration[] - selectors: Expression[] - variants: Variant[] + type: "select"; + declarations: Declaration[]; + selectors: Expression[]; + variants: Variant[]; } ``` @@ -48,8 +49,8 @@ The `name` does not include the initial `$` of the _variable_. 
```ts interface Declaration { - name: string - value: Expression + name: string; + value: Expression; } ``` @@ -60,13 +61,13 @@ This is always `'*'` in MessageFormat 2 syntax, but may vary in other formats. ```ts interface Variant { - keys: Array<Literal | CatchallKey> - value: Pattern + keys: Array<Literal | CatchallKey>; + value: Pattern; } interface CatchallKey { - type: '*' - value?: string + type: "*"; + value?: string; } ``` @@ -84,17 +85,17 @@ A `body` with an unrecognized value SHOULD be treated as an `Unsupported` value. ```ts interface Pattern { - body: Array<Text | Expression> + body: Array<Text | Expression>; } interface Text { - type: 'text' - value: string + type: "text"; + value: string; } interface Expression { - type: 'expression' - body: Literal | VariableRef | FunctionRef | Unsupported + type: "expression"; + body: Literal | VariableRef | FunctionRef | Unsupported; } ``` @@ -112,13 +113,13 @@ In a `VariableRef`, the `name` does not include the initial `$` of the _variable ```ts interface Literal { - type: 'literal' - value: string + type: "literal"; + value: string; } interface VariableRef { - type: 'variable' - name: string + type: "variable"; + name: string; } ``` @@ -134,16 +135,16 @@ Each _option_ is represented by an `Option`. ```ts interface FunctionRef { - type: 'function' - kind: 'open' | 'close' | 'value' - name: string - operand?: Literal | VariableRef - options?: Option[] + type: "function"; + kind: "open" | "close" | "value"; + name: string; + operand?: Literal | VariableRef; + options?: Option[]; } interface Option { - name: string - value: Literal | VariableRef + name: string; + value: Literal | VariableRef; } ``` @@ -171,10 +172,10 @@ that the implementation attaches to that _annotation_. ```ts interface Unsupported { - type: 'unsupported' - sigil: '!' | '@' | '#' | '%' | '^' | '&' | '*' | '<' | '>' | '/' | '?' | '~' - source: string - operand?: Literal | VariableRef + type: "unsupported"; + sigil: "!" | "@" | "#" | "%" | "^" | "&" | "*" | "<" | ">" | "/" | "?"
| "~"; + source: string; + operand?: Literal | VariableRef; } ``` diff --git a/spec/formatting.md b/spec/formatting.md index e3c7cefcf3..9d01f3ee3d 100644 --- a/spec/formatting.md +++ b/spec/formatting.md @@ -151,9 +151,9 @@ the following steps are taken: and use a _fallback value_ for the _expression_. 3. Resolve the _option_ values to a mapping of string identifiers to values. For each _option_: - * If its right-hand side successfully resolves to a value, - bind the _name_ of the _option_ to the resolved value in the mapping. - * Otherwise, do not bind the _name_ of the _option_ to any value in the mapping. + - If its right-hand side successfully resolves to a value, + bind the _name_ of the _option_ to the resolved value in the mapping. + - Otherwise, do not bind the _name_ of the _option_ to any value in the mapping. 4. Call the function implementation with the following arguments: - The current _locale_. @@ -239,7 +239,7 @@ _Pattern selection_ is not supported for _fallback values_. When a _message_ contains a _match_ construct with one or more _expressions_, the implementation needs to determine which _variant_ will be used -to provide the _pattern_ for the formatting operation. +to provide the _pattern_ for the formatting operation. This is done by ordering and filtering the available _variant_ statements according to their _key_ values and selecting the first one. @@ -248,21 +248,23 @@ The number of _keys_ in each _variant_ MUST equal the number of _expressions_ in Each _key_ corresponds to an _expression_ in the _selectors_ by its position in the _variant_. > For example, in this message: +> > ``` > match {:one} {:two} {:three} > when 1 2 3 { ... } > ``` +> > The first _key_ `1` corresponds to the first _expression_ in the _selectors_ (`{:one}`), -> the second _key_ `2` to the second _expression_ (`{:two}`), +> the second _key_ `2` to the second _expression_ (`{:two}`), > and the third _key_ `3` to the third _expression_ (`{:three}`). 
To determine which _variant_ best matches a given set of inputs, each _selector_ is used in turn to order and filter the list of _variants_. Each _variant_ with a _key_ that does not match its corresponding _selector expression_ -is omitted from the list of _variants_. +is omitted from the list of _variants_. The remaining _variants_ are sorted according to the _expression_'s _key_-ordering preference. -Earlier _expressions_ in the _selector_'s list of _expressions_ have a higher priority than later ones. +Earlier _expressions_ in the _selector_'s list of _expressions_ have a higher priority than later ones. When all of the _selector expressions_ have been processed, the earliest-sorted _variant_ in the remaining list of _variants_ is selected. @@ -703,7 +705,7 @@ These are divided into the following categories: > when * {Value is not one} > ``` - - **Duplicate Option Name errors** occur when the same _name_ + - **Duplicate Option Name errors** occur when the same _name_ appears on the left-hand side of more than one _option_ in the same _expression_. diff --git a/spec/syntax.md b/spec/syntax.md index 90d62464a7..099d5a4696 100644 --- a/spec/syntax.md +++ b/spec/syntax.md @@ -77,7 +77,7 @@ The author of a _message_ can also assign _local variables_, including variables that modify _external variables_. This part of the MessageFormat specification defines the syntax for a _message_, -along with the concepts and terminology needed when processing a _message_ +along with the concepts and terminology needed when processing a _message_ during the [formatting](./formatting.md) of a _message_ at runtime. The complete formal syntax of a _message_ is described by the [ABNF](./message.abnf). @@ -104,18 +104,24 @@ A **_message_** is the complete template for a specific message forma > In general (and except where required by the syntax), whitespace carries no meaning in the structure > of a _message_. 
While many of the examples in this spec are written on multiple lines, the formatting > shown is primarily for readability. ->> **Example** This _message_: ->>``` ->>let $foo = { |horse| } ->>{You have a {$foo}!} ->>``` ->> Can also be written as: ->>``` ->>let $foo={|horse|}{You have a {$foo}!} ->>``` -> An exception to this is: whitespace inside a _pattern_ is **always** significant. +> +> > **Example** This _message_: +> > +> > ``` +> > let $foo = { |horse| } +> > {You have a {$foo}!} +> > ``` +> > +> > Can also be written as: +> > +> > ``` +> > let $foo={|horse|}{You have a {$foo}!} +> > ``` +> > +> > An exception to this is: whitespace inside a _pattern_ is **always** significant. A _message_ consists of two parts: + 1. an optional list of _declarations_, followed by 2. a _body_ @@ -142,20 +148,23 @@ All _messages_ MUST contain a _body_. An empty string is not a _well-formed_ _message_. > A simple _message_ containing only a _body_: +> > ``` > {Hello world!} > ``` ->The same _message_ defined in a `.properties` file: > ->```properties ->app.greetings.hello = {Hello, world!} ->``` ->The same _message_ defined inline in JavaScript: +> The same _message_ defined in a `.properties` file: +> +> ```properties +> app.greetings.hello = {Hello, world!} +> ``` > ->```js ->let hello = new MessageFormat('{Hello, world!}') ->hello.format() ->``` +> The same _message_ defined inline in JavaScript: +> +> ```js +> let hello = new MessageFormat("{Hello, world!}"); +> hello.format(); +> ``` ## Pattern @@ -171,11 +180,12 @@ pattern = "{" *(text / expression) "}" A _pattern_ MAY be empty. > An empty _pattern_: +> > ``` > {} > ``` -A _pattern_ MAY contain an arbitrary number of _placeholders_ to be evaluated +A _pattern_ MAY contain an arbitrary number of _placeholders_ to be evaluated during the formatting process. ### Text @@ -195,6 +205,7 @@ various formats regardless of the container's whitespace trimming rules. 
> In a Java `.properties` file, the values `hello` and `hello2` both contain > an identical _message_ which consists of a single _pattern_. > This _pattern_ consists of _text_ with exactly three spaces before and after the word "Hello": +> > ```properties > hello = { Hello } > hello2={ Hello } @@ -228,11 +239,12 @@ determined at runtime. A _matcher_ consists of the keyword `match` followed by at least one _selector_ and at least one _variant_. -When the _matcher_ is processed, the result will be a single _pattern_ that serves +When the _matcher_ is processed, the result will be a single _pattern_ that serves as the template for the formatting process. A _message_ can only be considered _valid_ if the following requirements are satisfied: + - The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. - At least one _variant_ MUST exist whose _keys_ are all equal to the "catch-all" key `*`. @@ -241,6 +253,7 @@ matcher = match 1*(selector) 1*(variant) ``` > A _message_ with a _matcher_: +> > ``` > match {$count :number} > when 1 {You have one notification.} @@ -248,6 +261,7 @@ matcher = match 1*(selector) 1*(variant) > ``` > A _message_ containing a _matcher_ formatted on a single line: +> > ``` > match {:platform} when windows {Settings} when * {Preferences} > ``` @@ -256,7 +270,7 @@ matcher = match 1*(selector) 1*(variant) A **_selector_** is an _expression_ that ranks or excludes the _variants_ based on the value of its corresponding _key_ in each _variant_. -The combination of _selectors_ in a _matcher_ thus determines +The combination of _selectors_ in a _matcher_ thus determines which _pattern_ will be used during formatting. ```abnf @@ -266,27 +280,27 @@ selector = expression There MUST be at least one _selector_ in a _matcher_. There MAY be any number of additional _selectors_. 
->A _message_ with a single _selector_ that uses a custom `:hasCase` _function_, ->allowing the _message_ to choose a _pattern_ based on grammatical case: +> A _message_ with a single _selector_ that uses a custom `:hasCase` _function_, +> allowing the _message_ to choose a _pattern_ based on grammatical case: > ->``` ->match {$userName :hasCase} ->when vocative {Hello, {$userName :person case=vocative}!} ->when accusative {Please welcome {$userName :person case=accusative}!} ->when * {Hello!} ->``` - ->A message with two _selectors_: +> ``` +> match {$userName :hasCase} +> when vocative {Hello, {$userName :person case=vocative}!} +> when accusative {Please welcome {$userName :person case=accusative}!} +> when * {Hello!} +> ``` + +> A message with two _selectors_: > ->``` ->match {$photoCount :number} {$userGender :equals} ->when 1 masculine {{$userName} added a new photo to his album.} ->when 1 feminine {{$userName} added a new photo to her album.} ->when 1 * {{$userName} added a new photo to their album.} ->when * masculine {{$userName} added {$photoCount} photos to his album.} ->when * feminine {{$userName} added {$photoCount} photos to her album.} ->when * * {{$userName} added {$photoCount} photos to their album.} ->``` +> ``` +> match {$photoCount :number} {$userGender :equals} +> when 1 masculine {{$userName} added a new photo to his album.} +> when 1 feminine {{$userName} added a new photo to her album.} +> when 1 * {{$userName} added a new photo to their album.} +> when * masculine {{$userName} added {$photoCount} photos to his album.} +> when * feminine {{$userName} added {$photoCount} photos to her album.} +> when * * {{$userName} added {$photoCount} photos to their album.} +> ``` ### Variant @@ -307,7 +321,7 @@ key = literal / "*" #### Key A **_key_** is a value in a _variant_ for use by a _selector_ when ranking -or excluding _variants_ during the _matcher_ process. +or excluding _variants_ during the _matcher_ process. 
A _key_ can be either a _literal_ value or the "catch-all" key `*`. The **_catch-all key_** is a special key, represented by `*`, @@ -318,11 +332,11 @@ that matches all values for a given _selector_. An **_expression_** is a part of a _message_ that will be determined during the _message_'s formatting. -An _expression_ MUST begin with U+007B LEFT CURLY BRACKET `{` +An _expression_ MUST begin with U+007B LEFT CURLY BRACKET `{` and end with U+007D RIGHT CURLY BRACKET `}`. An _expression_ MUST NOT be empty. -An _expression_ can contain an _operand_, -an _annotation_, +An _expression_ can contain an _operand_, +an _annotation_, or an _operand_ followed by an _annotation_. ```abnf @@ -333,6 +347,7 @@ annotation = (function *(s option)) / private-use / reserved There are several types of _expression_ that can appear in a _message_. All _expressions_ share a common syntax. The types of _expression_ are: + 1. The value of a _declaration_ 2. A _selector_ 3. A _placeholder_ in a _pattern_ @@ -340,15 +355,20 @@ All _expressions_ share a common syntax. The types of _expression_ are: > Examples of different types of _expression_ > > Declarations: +> > ``` > let $x = {|This is an expression|} > let $y = {$operand :function option=operand} > ``` +> > Selectors: +> > ``` > match {$selector :functionRequired} > ``` +> > Placeholders: +> > ``` > {This placeholder contains an {|expression with a literal|}} > {This placeholder references a {$variable}} @@ -366,7 +386,7 @@ operand = literal / variable ### Annotation -An **_annotation_** is part of an _expression_ containing either +An **_annotation_** is part of an _expression_ containing either a _function_ together with its associated _options_, or a _private-use_ or _reserved_ sequence. @@ -383,13 +403,13 @@ A **_function_** is named functionality in an _annotation_. _Functions_ are used to evaluate, format, select, or otherwise process data values during formatting. -Each _function_ is defined by the runtime's _function registry_. 
-A _function_'s entry in the _function registry_ will define +Each _function_ is defined by the runtime's _function registry_. +A _function_'s entry in the _function registry_ will define whether the _function_ is a _selector_ or formatter (or both), -whether an _operand_ is required, +whether an _operand_ is required, what form the values of an _operand_ can take, -what _options_ and _option_ values are valid, -and what outputs might result. +what _options_ and _option_ values are valid, +and what outputs might result. See [function registry](./) for more information. _Functions_ can be _standalone_, or can be an _opening element_ or _closing element_. @@ -401,19 +421,22 @@ A **_closing element_** is a _function_ that SHOULD be paired with an An _opening element_ MAY be present in a message without a corresponding _closing element_, and vice versa. ->A _message_ with a _standalone_ _function_ operating on the _variable_ `$now`: ->``` ->{{$now :datetime}} ->``` ->A _message_ with two markup-like _functions_, `button` and `link`, ->which the runtime can use to construct a document tree structure for a UI framework: +> A _message_ with a _standalone_ _function_ operating on the _variable_ `$now`: +> +> ``` +> {{$now :datetime}} +> ``` +> +> A _message_ with two markup-like _functions_, `button` and `link`, +> which the runtime can use to construct a document tree structure for a UI framework: > ->``` ->{{+button}Submit{-button} or {+link}cancel{-link}.} ->``` +> ``` +> {{+button}Submit{-button} or {+link}cancel{-link}.} +> ``` A _function_ consists of a prefix sigil followed by a _name_. 
The following sigils are used for _functions_: + - `:` for a _standalone_ function - `+` for an _opening element_ - `-` for a _closing element_ @@ -442,44 +465,44 @@ option = name [s] "=" [s] (literal / variable) > > A _message_ with a `$date` _variable_ formatted with the `:datetime` _function_: > ->``` ->{Today is {$date :datetime weekday=long}.} ->``` +> ``` +> {Today is {$date :datetime weekday=long}.} +> ``` ->A _message_ with a `$userName` _variable_ formatted with ->the custom `:person` _function_ capable of ->declension (using either a fixed dictionary, algorithmic declension, ML, etc.): +> A _message_ with a `$userName` _variable_ formatted with +> the custom `:person` _function_ capable of +> declension (using either a fixed dictionary, algorithmic declension, ML, etc.): > ->``` ->{Hello, {$userName :person case=vocative}!} ->``` +> ``` +> {Hello, {$userName :person case=vocative}!} +> ``` ->A _message_ with a `$userObj` _variable_ formatted with ->the custom `:person` _function_ capable of ->plucking the first name from the object representing a person: +> A _message_ with a `$userObj` _variable_ formatted with +> the custom `:person` _function_ capable of +> plucking the first name from the object representing a person: > ->``` ->{Hello, {$userObj :person firstName=long}!} ->``` - +> ``` +> {Hello, {$userObj :person firstName=long}!} +> ``` #### Private-Use A **_private-use_** _annotation_ is an _annotation_ whose syntax is reserved -for use by a specific implementation or by private agreement between multiple implementations. +for use by a specific implementation or by private agreement between multiple implementations. Implementations MAY define their own meaning and semantics for _private-use_ annotations. A _private-use_ annotation starts with either U+0026 AMPERSAND `&` or U+005E CIRCUMFLEX ACCENT `^`. - + Characters, including whitespace, are assigned meaning by the implementation. 
The definition of escapes in the `reserved-body` production, used for the body of -a _private-use_ annotation is an affordance to implementations that +a _private-use_ annotation is an affordance to implementations that wish to use a syntax exactly like other functions. Specifically: + - The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}` respectively -when they appear in the body of a _private-use_ annotation. + when they appear in the body of a _private-use_ annotation. - The character `|` is special: it SHOULD be escaped as `\|` in a _private-use_ annotation, -but can appear unescaped as long as it is paired with another `|`. This is an affordance to -allow _literals_ to appear in the private use syntax. + but can appear unescaped as long as it is paired with another `|`. This is an affordance to + allow _literals_ to appear in the private use syntax. A _private-use_ _annotation_ MAY be empty after its introducing sigil. @@ -494,6 +517,7 @@ private-start = "&" / "^" ``` > Here are some examples of what _private-use_ sequences might look like: +> > ``` > {Here's private use with an operand: {$foo &bar}} > {Here's a placeholder that is entirely private-use: {&anything here}} @@ -501,7 +525,7 @@ private-start = "&" / "^" > {The character \| has to be paired or escaped: {&private || |something between| or isolated: \| }} > {Stop {& "translate 'stop' as a verb" might be a translator instruction or comment }} > {Protect stuff in {^ph}{^/ph}private use{^ph}{^/ph}} ->``` +> ``` #### Reserved @@ -559,9 +583,9 @@ when = %x77.68.65.6E ; "when" A **_literal_** is a character sequence that appears outside of _text_ in various parts of a _message_. -A _literal_ can appear in a _declaration_, +A _literal_ can appear in a _declaration_, as a _key_ value, -as an _operand_, +as an _operand_, or in the value of an _option_. A _literal_ MAY include any Unicode code point except for surrogate code points U+D800 through U+DFFF. 
@@ -569,7 +593,7 @@ except for surrogate code points U+D800 through U+DFFF. All code points are preserved. A **_quoted_** literal begins and ends with U+007C VERTICAL BAR `|`. -The characters `\` and `|` within a _quoted_ literal MUST be +The characters `\` and `|` within a _quoted_ literal MUST be escaped as `\\` and `\|`. An **_unquoted_** literal is a _literal_ that does not require the `|` @@ -623,7 +647,7 @@ name-char = name-start / DIGIT / "-" / "." / ":" / %xB7 / %x300-36F / %x203F-2040 ``` -> **Note** +> **Note**\ > _External variables_ can be passed in that are not valid _names_. > Such variables cannot be referenced in a _message_, > but are not otherwise errors.