
Delimited text looks like code #275

Closed
eemeli opened this issue May 20, 2022 · 4 comments
Labels: blocker (Blocks the release), syntax (Issues related with syntax or ABNF)

Comments

@eemeli
Collaborator

eemeli commented May 20, 2022

In general, existing templating and markup languages use delimiters such as {...}, ${...} and {{...}} to mark parts of the source as various forms of "code". In fact, I am not aware of any prior art other than MF1 select/plural variants that uses curly braces or square brackets around content that is intended to be treated as "text", i.e. localisable content that is intended to become a top-level part of the formatted output. Fluent comes close, using {...} around an entire in-message select expression.

Given this state of affairs, I posit that humans, when scanning a source string, will initially pass over any delimited parts when looking for the content of a message. Furthermore, I posit that breaking from this assumption has made it more difficult for MF1 to reach a greater market share.

Given the above, I think MF2 should have a syntax which delimits variant keys, but not values. In past conversations on this, the strongest argument against such an approach has been that without value delimiters, the variant value would need to have its surrounding whitespace trimmed, making messages with leading or trailing whitespace more difficult to represent.

I am aware of at least two ways of trimming whitespace around variant values, either of which could provide a workable solution:

  1. Trim all whitespace from the start and end of the variant value. This would require any in-content whitespace at the start and end to be explicitly escaped, e.g. as {" "}. As we are not including any in-message selectors in MF2, this would only be relevant when the whole message is expected to start or end with whitespace. I am not intimately familiar with how all translation tooling handles such cases, but I suspect that making such surrounding whitespace explicit would make it far less common for it to be accidentally removed, in particular when translating between languages that handle whitespace very differently.

  2. Define clear and minimal trimming rules for content. A likely pair of replacement rules might be to remove exactly one space from the start, and either a newline followed by spaces or a single space from the end. Expressed in code, that looks like this:

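    // Remove at most one leading space, then at most one trailing
    // "newline followed by spaces" or single trailing space.
    function trimVariant(raw: string): string {
      return raw.replace(/^ ?/, '').replace(/(\n *| )?$/, '');
    }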

    With this approach, leading and trailing whitespace would not need to be escaped.
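
As a rough side-by-side comparison of the two options, the sketch below applies each rule to the same message. It assumes that option 1's escape is a whitespace-only quoted literal such as {" "} (or {" + newline + "}), and it handles only that form; trimAll and expandWhitespaceLiterals are hypothetical names for illustration, not part of any MF2 proposal.

```typescript
// Option 1 (hypothetical helper): trim all surrounding whitespace, then
// expand explicit whitespace literals like {" "} into their contents.
function trimAll(raw: string): string {
  return expandWhitespaceLiterals(raw.trim());
}

// Replace a whitespace-only literal placeholder, e.g. {" "} or a {"..."}
// containing an actual newline, with the whitespace inside the quotes.
// Non-whitespace literals are deliberately not handled in this sketch.
function expandWhitespaceLiterals(text: string): string {
  return text.replace(/\{"(\s*)"\}/g, '$1');
}

// Option 2: the minimal trimming rules quoted above.
function trimVariant(raw: string): string {
  return raw.replace(/^ ?/, '').replace(/(\n *| )?$/, '');
}

// The same intended value, " and one more", written for each option.
const option1 = trimAll(' {" "}and one more ');
const option2 = trimVariant('  and one more');
console.log(JSON.stringify(option1)); // " and one more"
console.log(JSON.stringify(option2)); // " and one more"
```

Under option 1 the leading space survives only because it is spelled out; under option 2 it survives because exactly one of the two source spaces is consumed by the trimming rule.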

To illustrate the difference between these approaches, I've included below examples extracted from the syntax proposal and subsequent discussions. In each case, the original (with some variations in syntax) is first, followed by one or two examples of possible delimited-keys syntaxes.

1 [You have one notification.]
_ [You have {$count} notifications.]

———

[1] You have one notification.
[_] You have {$count} notifications.

vocative [Hello, {$userName: person case=vocative}!]
accusative [Please welcome {$userName: person case=accusative}!]
_ [Hello!]

———

[vocative] Hello, {$userName: person case=vocative}!
[accusative] Please welcome {$userName: person case=accusative}!
[_] Hello!

1 masculine [{$userName} added a new photo to his album.]
1 feminine [{$userName} added a new photo to her album.]
1 _ [{$userName} added a new photo to their album.]
_ masculine [{$userName} added {$photoCount} photos to his album.]
_ feminine [{$userName} added {$photoCount} photos to her album.]
_ _ [{$userName} added {$photoCount} photos to their album.]

———

[1 masculine] {$userName} added a new photo to his album.
[1 feminine] {$userName} added a new photo to her album.
[1 _] {$userName} added a new photo to their album.
[_ masculine] {$userName} added {$photoCount} photos to his album.
[_ feminine] {$userName} added {$photoCount} photos to her album.
[_ _] {$userName} added {$photoCount} photos to their album.

———

[1 masculine] {$userName} added a new photo to his album.
[1 feminine ] {$userName} added a new photo to her album.
[1 _        ] {$userName} added a new photo to their album.
[_ masculine] {$userName} added {$photoCount} photos to his album.
[_ feminine ] {$userName} added {$photoCount} photos to her album.
[_ _        ] {$userName} added {$photoCount} photos to their album.

one one [Some text...] one few [etc.]

———

[one one] Some text... [one few] etc.

As the original sources of the last two examples below do not use [...] as delimiters, the corresponding key delimiters in those examples are {[...]}. This issue isn't meant to be about the specific delimiters being used, but about which parts of the syntax are delimited.

=1   fem  {the message 1 F}
=1    _   {the message 1 O}
one  masc {the message One M}
_    masc {the message O M}
_     _   {the message O O}

———

{[  1  fem  ]} the message 1 F
{[  1   _   ]} the message 1 O
{[ one masc ]} the message One M
{[  _  masc ]} the message O M
{[  _   _   ]} the message O O

[1 female] {{$name} added you to her circles.}
[1 male] {{$name} added you to his circles.}
[1 _] {{$name} added you to their circles.}
[_ _] {{$name} added you and {#count} others to their circles.}

———

{[1 female]} {$name} added you to her circles.
{[1 male]} {$name} added you to his circles.
{[1 _]} {$name} added you to their circles.
{[_ _]} {$name} added you and {#count} others to their circles.

———

{[1 female]} {$name} added you to her circles.
{[1 male  ]} {$name} added you to his circles.
{[1 _     ]} {$name} added you to their circles.
{[_ _     ]} {$name} added you and {#count} others to their circles.

Finally, to consider the case of strings with surrounding whitespace, the following examples show how they could look, first with delimited values, then with delimited keys and escaped whitespace, and finally with the option-2 minimal trimming rules.

1 { and one more}
_ { and {$count} more}

———

{[ 1 ]} {" "}and one more
{[ _ ]} {" "}and {$count} more

———

{[ 1 ]}  and one more
{[ _ ]}  and {$count} more

1 {end of this paragraph
}
_ {end of these paragraphs
}


———

{[ 1 ]} end of this paragraph{"
"}
{[ _ ]} end of these paragraphs{"
"}

———

{[ 1 ]} end of this paragraph

{[ _ ]} end of these paragraphs
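
The option-2 examples above can be checked with a minimal parsing sketch. It assumes the {[...]} key delimiters used in these examples, assumes that a key block can only start at the two-character sequence {[, splits keys on whitespace (so alignment padding such as [1 _     ] is insignificant), and applies the option-2 trimVariant rules. parseVariants and the Variant type are hypothetical names; selectors, placeholders and escaping are ignored.

```typescript
// One parsed variant: its keys and its trimmed value.
interface Variant {
  keys: string[];
  value: string;
}

// Option-2 trimming rules, as proposed in the original post.
function trimVariant(raw: string): string {
  return raw.replace(/^ ?/, '').replace(/(\n *| )?$/, '');
}

// Hypothetical parser: the text between one {[...]} key block and the
// next one (or the end of the source) is that variant's value.
function parseVariants(source: string): Variant[] {
  const keyBlock = /\{\[([^\]]*)\]\}/g;
  const blocks: { keys: string[]; start: number; end: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = keyBlock.exec(source)) !== null) {
    blocks.push({
      // Split on runs of whitespace, so padding inside the keys is ignored.
      keys: m[1].trim().split(/\s+/),
      start: m.index,
      end: keyBlock.lastIndex,
    });
  }
  return blocks.map((b, i) => {
    const valueEnd = i + 1 < blocks.length ? blocks[i + 1].start : source.length;
    return { keys: b.keys, value: trimVariant(source.slice(b.end, valueEnd)) };
  });
}

const spaced = '{[ 1 ]}  and one more\n{[ _ ]}  and {$count} more\n';
console.log(JSON.stringify(parseVariants(spaced)));
// [{"keys":["1"],"value":" and one more"},{"keys":["_"],"value":" and {$count} more"}]
```

Note that the parser only knows where a value ends because the next {[ starts a new key block, which is exactly the dependency on "the next variant key" that is debated in the replies.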

eemeli added the syntax (Issues related with syntax or ABNF) and blocker-candidate (The submitter thinks this might be a block for the next release) labels on May 20, 2022
@stasm
Collaborator

stasm commented May 23, 2022

I think there's value in additionally delimiting variant keys (#253), but I'm currently against dropping the requirement for delimiting variant values.

You mention trimming as one of the reasons to support delimiting variant values, but there's also the argument of being explicit about where the value ends, rather than relying on the presence (or absence) of the next variant key. For example, in the message below, we'd need to specify up front the list of all characters that are meant to start a new variant (or a new syntax concept which we haven't yet considered and which would come after the list of variants).

[1] You have one notification.
[_] You have {$count} notifications.
# What if in the future we want to put metadata here prefixed with a hash?
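
A minimal sketch of this ambiguity, assuming a hypothetical parser for the undelimited-values style in which a new variant starts only at a line beginning with [. Because # is not in that parser's list of characters that can start a new construct, the metadata line is silently absorbed into the last variant's value. splitUndelimited is an illustrative name only.

```typescript
// Hypothetical rule: a line starting with "[...]" begins a new variant;
// any other line continues the previous variant's value.
function splitUndelimited(source: string): string[] {
  const values: string[] = [];
  for (const line of source.split('\n')) {
    const m = /^\[[^\]]*\]\s?(.*)$/.exec(line);
    if (m) {
      values.push(m[1]);
    } else if (values.length > 0) {
      // Not a key line, so it is appended to the current value.
      values[values.length - 1] += '\n' + line;
    }
  }
  return values;
}

const message = [
  '[1] You have one notification.',
  '[_] You have {$count} notifications.',
  '# What if in the future we want to put metadata here prefixed with a hash?',
].join('\n');

// The "#" line ends up inside the value of the [_] variant.
console.log(splitUndelimited(message)[1]);
```

With a delimited value such as [_] [You have {$count} notifications.], the closing bracket ends the value explicitly, so a following # line could be given a new meaning later without changing how existing values are read.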

Furthermore, on the topic of scanning, I'm sure a lot of what we share here is subjective, but I would just like to offer an interpretation that I'm seeing when I look at the following example:

[1 masculine] {$userName} added a new photo to his album.

What I see here are 3 things on an equal level: "[1 masculine]", "{$userName}", and "added a new photo to his album." In fact, it looks like perhaps it's OK to reorder them?

{$userName} [1 masculine] added a new photo to his album.

This doesn't make sense anymore content-wise, but that's the thing about scanning: it happens before we get a chance to parse the sentence and evaluate its correctness.

To my eyes, it's helpful when the entire variant value is enclosed in delimiters, so that I can see that it's a coherent whole, distinct from the variant key. Like so:

[1 masculine] [{$userName} added a new photo to his album.]

romulocintra added the blocker (Blocks the release) label and removed the blocker-candidate (The submitter thinks this might be a block for the next release) label on May 23, 2022
@markusicu
Member

Eemeli's examples are missing the lists of selectors before the first key-pattern pairs.

At the start of the message, if you start in "text" mode, you need to distinguish between user text, variable definitions, and selectors. You also need to distinguish placeholders from the other syntax elements, for a message with selectors vs. one without. For this to work, you either need to reserve several syntax characters that are increasingly common in normal text, which then require escaping (or use different escaping rules in different parts of user text), or use the same {} for pretty much every piece of syntax, like ICU.

Also, it would look like you could write user text before and after a message-with-selectors, like you are allowed to in ICU MessageFormat. We should not allow that.

Whitespace trimming is also tricky around the edges. If you use {( )}, you prevent over-trimming, but you also prevent translators from choosing whether the pattern should start with a space in their language. If you use something like \ to escape a space, you get into trouble with trimming at the end of the message: it could work when the trimming happens inside the message formatting function, but when the message is embedded in some other syntax that trims whitespace before handing over the message, you could easily lose the space and get a message that ends with the backslash.
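
The backslash failure mode can be sketched as follows. Both pieces are hypothetical stand-ins: unescapeSpaces models a formatter that treats a backslash followed by a space as a literal space, and hostTrim models an embedding format that strips trailing whitespace before handing the message over.

```typescript
// Hypothetical formatter step: "\ " (backslash + space) means a literal space.
function unescapeSpaces(pattern: string): string {
  return pattern.replace(/\\ /g, ' ');
}

// Hypothetical host format that trims trailing whitespace before
// passing the message string on to the formatter.
function hostTrim(value: string): string {
  return value.replace(/\s+$/, '');
}

// Source text: "end of this sentence\ " (escaped trailing space).
const authored = 'end of this sentence\\ ';

// Handed over untouched, the escape yields the intended trailing space...
const direct = unescapeSpaces(authored);
// ...but if the host trims first, the escaped space is already gone.
const viaHost = unescapeSpaces(hostTrim(authored));

console.log(JSON.stringify(direct)); // "end of this sentence "
console.log(JSON.stringify(viaHost)); // "end of this sentence\\"
```

In the second output, JSON.stringify displays the single stray backslash as \\.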

Trimming exactly one space but not adjacent spaces would be utterly confusing.
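
For reference, the option-2 trimVariant function from the original post behaves as follows on a few edge cases. The function is copied unchanged from the proposal; only the loop around it is added for illustration.

```typescript
// Option-2 trimming rules, copied from the original post.
function trimVariant(raw: string): string {
  return raw.replace(/^ ?/, '').replace(/(\n *| )?$/, '');
}

// Print each raw value next to its trimmed result.
for (const raw of [' x', '  x', ' ', '  ', '   ']) {
  console.log(JSON.stringify(raw), '->', JSON.stringify(trimVariant(raw)));
}
// " x" -> "x"
// "  x" -> " x"
// " " -> ""
// "  " -> ""
// "   " -> " "
```

One and two leading spaces give different results, while a value of one or two spaces collapses to the empty string and three spaces leave one.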

I really think we are better off starting in "code" mode and delimiting user text.

@stasm
Collaborator

stasm commented Jun 1, 2022

Quoting @markusicu: "Also, it would look like you could write user text before and after a message-with-selectors, like you are allowed to in ICU MessageFormat. We should not allow that."

I hadn't considered this before, thanks for mentioning it. For me, it's a strong argument against mixing starting modes.

@eemeli
Collaborator Author

eemeli commented Jun 28, 2022

Closed by #287.

eemeli closed this as completed on Jun 28, 2022

4 participants