Add consensus decisions #154

Merged 9 commits on Mar 11, 2021
69 changes: 69 additions & 0 deletions docs/consensus_decisions.md
@@ -0,0 +1,69 @@
# Consensus Decisions

During its proceedings, the working group has reached internal consensus on a number of issues.
This document enumerates those, and provides a reference for later actions.

### Sources

For more details on the process that led to these decisions, please refer to the following:

- **Consensus 1 & 2:**
Identified as prerequisites for maintaining backwards-compatibility with MessageFormat 1 once Consensus 3 & 4 are agreed upon.
Reached during the meetings of the [issue #103](https://github.com/unicode-org/message-format-wg/issues/103) task-force, and codified during the [October 2020 task-force meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/task-force/%23103-2020-10-26.md).
Accepted at the [November 2020 meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2020/notes-2020-11-16.md) of the working group.
- **Consensus 3 & 4:**
The core result of the [issue #103](https://github.com/unicode-org/message-format-wg/issues/103) task-force ([minutes](https://github.com/unicode-org/message-format-wg/tree/master/meetings/task-force)).
Reached in principle during the [December 2020 meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2020/notes-2020-12-14.md) of the working group.
Codified in [issue #137](https://github.com/unicode-org/message-format-wg/issues/137).
Discussed and accepted at the [January 2021](https://github.com/unicode-org/message-format-wg/issues/146) and [February 2021](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2021/notes-2021-02-15.md) meetings of the working group.
- **Consensus 5 & 6:**
The solution for [issue #127](https://github.com/unicode-org/message-format-wg/issues/127).
Codified in [issue #137](https://github.com/unicode-org/message-format-wg/issues/137) during the [January 2021 meeting](https://github.com/unicode-org/message-format-wg/issues/146) of the working group.
Discussed and accepted at the [February 2021 meeting](https://github.com/unicode-org/message-format-wg/blob/HEAD/meetings/2021/notes-2021-02-15.md) of the working group.

## 1: Include message references in the data model.

**Discussion:**
The implementers would find a way to include references anyway, but including them in the data model (standard) can make them subject to best practices.
It will unfortunately still be possible, but much more difficult, for users to do “the wrong thing” by concatenating strings or messages.

One of the drawbacks of message references is that referenced messages effectively have a public API (names of parameters, variables, variants, etc.) which must be consistent across all callsites.
This leads us to consensus 2.

## 2: Allow message references to include parameters in a form that enables their validation.

**Discussion:**
The variables/fields passed should not be completely untyped and unchecked.
We want a validation mechanism that can allow providing early error feedback to the translators and developers.
We need to decide when the validation can and should happen, including the meaning of “build time” and “run time” with regard to validation.
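
As an illustration of the kind of early feedback Consensus 2 has in mind, the following is a minimal sketch of a build-time check, not an agreed design. It assumes each message declares the parameter names it expects in a `params` list and that a reference passes values in a `scope` object; both names, and the `validateReferences` helper, are hypothetical and do not come from any data model candidate.

```javascript
// Hypothetical shapes: `params` and `scope` are illustrative names only.
const messages = {
  'unread-emails': {
    params: ['name', 'unreadCount'],
    value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' emails.']
  },
  'inbox-summary': {
    params: ['name', 'unreadCount'],
    value: [
      {
        msg_path: ['unread-emails'],
        // Deliberate mistake: `count` is passed where `unreadCount` is expected.
        scope: { name: { var_path: ['name'] }, count: { var_path: ['unreadCount'] } }
      }
    ]
  }
};

// Returns a list of human-readable problems; an empty list means every
// message reference passes exactly the parameters its target declares.
function validateReferences(messages) {
  const errors = [];
  for (const [id, msg] of Object.entries(messages)) {
    for (const part of msg.value) {
      // Skip literal text and plain variable references.
      if (typeof part !== 'object' || part === null || !part.msg_path) continue;
      const targetId = part.msg_path.join('.');
      const target = messages[targetId];
      if (!target) {
        errors.push(`${id}: unknown message "${targetId}"`);
        continue;
      }
      const passed = Object.keys(part.scope ?? {});
      for (const name of target.params) {
        if (!passed.includes(name)) errors.push(`${id}: missing parameter "${name}" for "${targetId}"`);
      }
      for (const name of passed) {
        if (!target.params.includes(name)) errors.push(`${id}: unexpected parameter "${name}" for "${targetId}"`);
      }
    }
  }
  return errors;
}

const problems = validateReferences(messages);
```

Here `problems` reports both the missing `unreadCount` and the unexpected `count`. A check like this only compares names; validating the types of the passed values would need additional information that this sketch does not model.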

## 3: Allow for selectors to select a case depending on the value of one or more input arguments.

**Discussion:**
This is a prerequisite for top-level selectors to be able to represent complex messages, without requiring those messages to be split up in an unergonomic manner.
This is an extension or relaxation of what's allowed in MessageFormat 1.

While message references make it technically possible for the data model to represent multi-argument selectors otherwise, this requires the use of n²-1 artificial "messages", where n is the number of arguments. This is not desirable.

Member

The artificiality/undesirability of the n^2-1 messages is in the eye of the beholder. While it is the case that nested select/plural structures beyond a couple of levels produce a lot of rather repetitious (nearly identical but often subtly different--particularly in a forgiving language like English) messages, this is not necessarily undesirable. When the localization system can participate actively, the burden on translators can be reduced.

I am not arguing for folks to have 10-level nested structures, please note. The size of the data is a problem and consistency management becomes a chore. But I often see teams writing 2-3 levels of nest. The best part is, we can make tools that understand this stuff and make it easier to author and it is something I can teach to developers. If we didn't provide this, developers would go back to their bad old ways of writing switches in code or doing substring replacement (grammatical consistency forgotten).

I don't have an alternate suggestion for wording just now, but this is something I have my eye on.

Member

It also occurs to me that I may not understand the term "nested" in this context. For me, this is a "nested" structure (in this case a select in a plural--I copied this from a code review and I'd have written it with the select on the outside, but still...):

{
  "value": {
    "param": "unreadCount",
    "pluralItems": {
      "=0": {
        "param": "msgType",
        "selectItems": {
          "email": "Hello {name}, you have no emails in your inbox.",
          "notification": "Hello {name}, you have no notifications in your inbox.",
          "other": "Hello {name}, you have no new items in your inbox."
        }
      },
      "one": {
        "param": "msgType",
        "selectItems": {
          "email": "Hello {name}, you have {unreadCount} email in your inbox.",
          "notification": "Hello {name}, you have {unreadCount} notification in your inbox.",
          "other": "Hello {name}, you have {unreadCount} item in your inbox."
        }
      },
      "other": {
        "param": "msgType",
        "selectItems": {
          "email": "Hello {name}, you have {unreadCount} emails in your inbox.",
          "notification": "Hello {name}, you have {unreadCount} notifications in your inbox.",
          "other": "Hello {name}, you have {unreadCount} items in your inbox."
        }
      }
    }
  }
}

Is this what we mean by nested? Or would nested mean the more classical MessageFormat 1 like:

Hello {name}, you have {unreadCount,plural, =0 {{msgType,select, email {{unreadCount,number,integer} emails}} one ...//  etc... won't bother to type the whole thing

Collaborator Author

The current aim is not to allow for one selector to be inside another selector in the same message, instead enabling the representation of messages such as your example by having that top-level selector be able to use more than one variable (e.g. both unreadCount and msgType) to select one of the cases. Here's what your message would look like with one of the data model candidates we're developing:

{
  id: 'unread-somethings',
  value: {
    select: [
      { func: 'plural', args: [{ var_path: ['unreadCount'] }] },
      { var_path: ['msgType'] }
    ],
    cases: [
      {
        key: [0, 'email'],
        value: ['Hello ', { var_path: ['name'] }, ', you have no emails in your inbox.']
      },
      {
        key: [0, 'notification'],
        value: ['Hello ', { var_path: ['name'] }, ', you have no notifications in your inbox.']
      },
      {
        key: [0, 'other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have no new items in your inbox.']
      },
      {
        key: ['one', 'email'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' email in your inbox.']
      },
      {
        key: ['one', 'notification'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' notification in your inbox.']
      },
      {
        key: ['one', 'other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' item in your inbox.']
      },
      {
        key: ['other', 'email'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' emails in your inbox.']
      },
      {
        key: ['other', 'notification'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' notifications in your inbox.']
      },
      {
        key: ['other', 'other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' items in your inbox.']
      }
    ]
  }
}

The n^2 -1 reference in the text is for one of the alternative representations of this message, where we avoid using more than one argument for a selector by extracting variants into separate messages, and one of those is then picked by a parent message. Kind of like this:

{
  id: 'unread-emails',
  value: {
    select: [{ func: 'plural', args: [{ var_path: ['unreadCount'] }] }],
    cases: [
      {
        key: [0],
        value: ['Hello ', { var_path: ['name'] }, ', you have no emails in your inbox.']
      },
      {
        key: ['one'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' email in your inbox.']
      },
      {
        key: ['other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' emails in your inbox.']
      }
    ]
  }
},

{
  id: 'unread-notifications',
  value: {
    select: [{ func: 'plural', args: [{ var_path: ['unreadCount'] }] }],
    cases: [
      {
        key: [0],
        value: ['Hello ', { var_path: ['name'] }, ', you have no notifications in your inbox.']
      },
      {
        key: ['one'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' notification in your inbox.']
      },
      {
        key: ['other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' notifications in your inbox.']
      }
    ]
  }
},

{
  id: 'unread-items',
  value: {
    select: [{ func: 'plural', args: [{ var_path: ['unreadCount'] }] }],
    cases: [
      {
        key: [0],
        value: ['Hello ', { var_path: ['name'] }, ', you have no new items in your inbox.']
      },
      {
        key: ['one'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' item in your inbox.']
      },
      {
        key: ['other'],
        value: ['Hello ', { var_path: ['name'] }, ', you have ', { var_path: ['unreadCount'] }, ' items in your inbox.']
      }
    ]
  }
},

{
  id: 'unread-somethings',
  value: {
    select: [{ var_path: ['msgType'] }],
    cases: [
      { key: ['email'], value: [{msg_path: ['unread-emails']}] },
      { key: ['notification'], value: [{msg_path: ['unread-notifications']}] },
      { key: ['other'], value: [{msg_path: ['unread-items']}] }
    ]
  }
}

The "actual" message is still the last one, but with only one argument being used for the top-level selectors, the three others are also required.

Member

@eemeli thanks for this. I had followed the discussion of matrixed selectors elsewhere, which is why I hadn't come out of the woodwork with comments earlier. Your top example and mine are just "misspellings" of each other--functionally equivalent while being syntactically different. As long as I was mentally able to map the two functionally, I wasn't concerned. When I saw and paused to read the proposed consensus, though, I went "huh? must have missed something here..."

> This is a prerequisite for top-level selectors to be able to represent complex messages, without requiring those messages to be split up in an unergonomic manner.

I think the more relevant detail here is: "This allows selectors to represent complex messages while avoiding linguistically problematic constructs that can occur when selectors operate on only part of a message" (... such as found in MF1)

> This is an extension or relaxation of what's allowed in MessageFormat 1.

I don't find this sentence helpful. This doesn't relax any constraints of MF1 (since it's a separate syntax altogether). I suppose it could be "an extension". But really it's just different from the philosophy of MF1.

> While message references make it technically possible for the data model to represent multi-argument selectors otherwise, this requires the use of n²-1 artificial "messages", where n is the number of arguments. This is not desirable.

The latter example is a logical outgrowth of message references. But I don't see how the proposed text leads here: this should be reserved for a discussion of references, as it doesn't have anything to do with number 3?
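
To make the multi-argument selection discussed in this thread concrete, here is a minimal sketch of how a case might be picked from a `select`/`cases` structure like the candidate shown above. It is an assumption-laden illustration rather than the working group's algorithm: the `candidates` and `selectCase` helpers are hypothetical, plural categories are hard-wired to English through `Intl.PluralRules`, case values are shortened to plain strings, and the first matching case wins, so cases are assumed to be listed from most to least specific.

```javascript
const message = {
  select: [
    { func: 'plural', args: [{ var_path: ['unreadCount'] }] },
    { var_path: ['msgType'] }
  ],
  cases: [
    { key: [0, 'email'], value: 'no emails' },
    { key: ['one', 'email'], value: 'one email' },
    { key: ['other', 'email'], value: 'many emails' },
    { key: [0, 'other'], value: 'no items' },
    { key: ['one', 'other'], value: 'one item' },
    { key: ['other', 'other'], value: 'many items' }
  ]
};

// Lists the key values one selector would accept for the given arguments:
// the exact value, the plural category (for plural selectors), and the
// catch-all 'other'.
function candidates(selector, args) {
  if (selector.func === 'plural') {
    const n = args[selector.args[0].var_path[0]];
    return [n, new Intl.PluralRules('en').select(n), 'other'];
  }
  return [args[selector.var_path[0]], 'other'];
}

// Returns the first case whose key matches every selector position.
function selectCase(message, args) {
  const accepted = message.select.map((sel) => candidates(sel, args));
  return message.cases.find((c) => c.key.every((k, i) => accepted[i].includes(k)));
}
```

For example, `selectCase(message, { unreadCount: 1, msgType: 'sms' })` falls through to the `['one', 'other']` case, because `'sms'` has no case of its own.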


## 4: Only allow for selectors at the top level of a message.

**Discussion:**
Requiring selectors to only be available at the top level is a good way of helping to maintain the translatability of messages, as well as otherwise guiding MessageFormat 2 users towards good practices.

After an in-depth exploration of the problem space, we have determined that while selectors are a necessary feature of MessageFormat, it is not necessary for them to be available within the body of a message, or directly within a case of a parent selector.

All identified use cases of such constructions may be cleanly represented using a top-level selector that may use more than one input argument to select among a set of messages.
Furthermore, we may enable complete reversibility of message transformations to and from languages such as MessageFormat 1 and Fluent by using message references.
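
The claim that nested constructions can be represented by one top-level selector can be illustrated with a small flattening routine. This is a sketch under stated assumptions, not a specified transformation: the nested input shape (`param`, `pluralItems`, `selectItems`) is borrowed from the example in the review discussion, the `flatten` helper is hypothetical, every branch at a given depth is assumed to select on the same parameter, and the leaf patterns are carried over as unparsed strings.

```javascript
// Flattens a nested selector tree into { select, cases }, producing one key
// entry per selector argument for every leaf message.
function flatten(node, path = [], out = { select: [], cases: [] }) {
  if (typeof node === 'string') {
    out.cases.push({ key: path, value: node });
    return out;
  }
  const isPlural = 'pluralItems' in node;
  const items = isPlural ? node.pluralItems : node.selectItems;
  // Record the selector the first time this nesting depth is reached.
  if (out.select.length === path.length) {
    const arg = { var_path: [node.param] };
    out.select.push(isPlural ? { func: 'plural', args: [arg] } : arg);
  }
  for (const [key, child] of Object.entries(items)) {
    // Exact-number plural keys such as '=0' become the number 0.
    const k = isPlural && key.startsWith('=') ? Number(key.slice(1)) : key;
    flatten(child, [...path, k], out);
  }
  return out;
}

const nested = {
  param: 'unreadCount',
  pluralItems: {
    '=0': { param: 'msgType', selectItems: { email: 'no emails', other: 'no items' } },
    one: { param: 'msgType', selectItems: { email: '{unreadCount} email', other: '{unreadCount} item' } },
    other: { param: 'msgType', selectItems: { email: '{unreadCount} emails', other: '{unreadCount} items' } }
  }
};

const flat = flatten(nested);
```

With this input, `flat.select` holds two selectors (plural on `unreadCount`, then `msgType`) and `flat.cases` holds six cases keyed by pairs such as `[0, 'email']`. The reverse direction, and selectors that appear in the middle of a message body, are outside the scope of this sketch.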

## 5: Top-level selectors together with message references provide the same value as nested selectors at a lower cost.

**Discussion:**
Nested selectors provide capabilities that may be useful in avoiding variant permutation explosion in edge cases, but their use has not been evaluated in production localization systems to date.
The group believes that the known value of this feature can be sufficiently covered by the combination of message references and top-level selection features, which together provide a sufficient feature set at a lower cost to the ecosystem than nested selectors would.
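
As a rough illustration of how top-level selection and message references can combine, the sketch below formats a message whose cases point at other messages, each of which has its own top-level selector. Everything here is an assumption made for the example: the `resource` map, the `selectorValue` and `format` helpers, English plural rules via `Intl.PluralRules`, and the choice to pass the caller's arguments unchanged to the referenced message.

```javascript
// Hypothetical resource: one parent message that delegates to two others.
const resource = {
  'unread-emails': {
    select: [{ func: 'plural', args: [{ var_path: ['unreadCount'] }] }],
    cases: [
      { key: ['one'], value: [{ var_path: ['unreadCount'] }, ' email'] },
      { key: ['other'], value: [{ var_path: ['unreadCount'] }, ' emails'] }
    ]
  },
  'unread-items': {
    select: [{ func: 'plural', args: [{ var_path: ['unreadCount'] }] }],
    cases: [
      { key: ['one'], value: [{ var_path: ['unreadCount'] }, ' item'] },
      { key: ['other'], value: [{ var_path: ['unreadCount'] }, ' items'] }
    ]
  },
  'unread-somethings': {
    select: [{ var_path: ['msgType'] }],
    cases: [
      { key: ['email'], value: [{ msg_path: ['unread-emails'] }] },
      { key: ['other'], value: [{ msg_path: ['unread-items'] }] }
    ]
  }
};

// Resolves one selector to the value that case keys are compared against.
function selectorValue(selector, args) {
  if (selector.func === 'plural') {
    return new Intl.PluralRules('en').select(args[selector.args[0].var_path[0]]);
  }
  return args[selector.var_path[0]];
}

// Picks the first matching case, then renders its parts, following
// message references recursively with the same arguments.
function format(resource, id, args) {
  const msg = resource[id];
  const wanted = msg.select.map((sel) => String(selectorValue(sel, args)));
  const chosen = msg.cases.find((c) => c.key.every((k, i) => String(k) === wanted[i] || k === 'other'));
  return chosen.value
    .map((part) => {
      if (typeof part === 'string') return part;
      if (part.msg_path) return format(resource, part.msg_path[0], args);
      return String(args[part.var_path[0]]);
    })
    .join('');
}
```

Calling `format(resource, 'unread-somethings', { msgType: 'email', unreadCount: 3 })` yields `'3 emails'`. The sketch omits exact-number keys and error handling for missing messages or unmatched cases.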

## 6: The model will be designed to leave the door open for nested selectors being potentially reconsidered in the future.

**Discussion:**
The cost analysis of the nested selectors feature was performed in the absence of sufficient in-field experience of use in production systems.
In result, the group's decision to not currently incorporate the feature is based on the lack of sufficient known value that would require them, which the group recognizes may change in the future.

Member

Both this paragraph and the following one say "in result". This is also oddly worded. I'd suggest:

Because the inclusion of nested selectors is not a blocker for the initial release and because the need for nested selectors remains subject to debate, ultimately the group decided not to incorporate the feature into this version of the standard.

Collaborator Author

@aphillips Would you be open to having the current text be accepted provisionally in this PR, and then improving the language of Consensus 6 in a separate PR? I'm getting the sense that this one is a bit more divisive than the rest, and that it might be good to have a more focused discussion on its particulars.

Member

Go for it, although these suggestions are more about English prosody than content and I think we've about wrung any additional value out of this thread.

In result, it is the intent of the group to design MessageFormat 2 in a way that wouldn't prevent future revisions of the standard to be extended with nested selectors feature.

Member

This reads oddly. I would suggest saying this in a positive way:

As a result, the design of MessageFormat 2 is being created in such a way as to potentially allow future revisions of the standard to be extended with the nested selectors feature.

Or, if we prefer to talk about group intentions:

As a result, it is the intention of the group to design MessageFormat 2 in a way that might allow future versions of the standard to provide a nested selectors feature.