Delimited text looks like code #275
Comments
I think there's value in additionally delimiting variant keys (#253), but I'm currently against dropping the requirement for delimiting variant values. You mention trimming as one of the reasons to support delimiting variant values, but there's also the argument of being explicit about where the value ends, rather than relying on the presence (or the lack) of the next variant key. For example in the message below, we'd need to specify up front the list of all characters which are meant to start a new variant (or start a new syntax concept which we haven't yet considered and which would come after the list of variants).
Furthermore, on the topic of scanning, I'm sure a lot of what we share here is subjective, but I would just like to offer an interpretation that I'm seeing when I look at the following example:
What I see here are 3 things on an equal level:
This doesn't make sense anymore content-wise, but that's the thing about scanning: it happens before we get a chance to parse the sentence and evaluate its correctness. To my eyes, it's helpful when the entire variant value is enclosed in delimiters, so that I can see that it's a coherent whole, different from the variant key. Like so:
Eemeli's examples are missing the lists of selectors before the first key-pattern pairs. At the start of the message, if you start in "text" mode, you need to distinguish between user text, variable definitions, and selectors. You also need to distinguish placeholders from the other syntax elements, for a message with selectors vs. one without. For this to work you either need to use and reserve several increasingly common-in-normal-text syntax characters that then require escaping (or use different escaping rules in different parts of user text), or use the same Also, it would look like you could write user text before and after a message-with-selectors, like you are allowed to in ICU MessageFormat. We should not allow that. Whitespace trimming is also tricky around the edges. If you use Trimming exactly one space but not adjacent spaces would be utterly confusing. I really think we are better off starting in "code" mode and delimiting user text.
I've not considered this before, thanks for mentioning it. It's a strong argument to not mix starting modes for me.
Closed by #287.
In general, existing templating and markup languages use delimiters such as `{...}`, `${...}` and `{{...}}` to indicate parts of the source as various forms of "code". In fact, I am not aware of any prior art other than MF1 select/plural variants that use curly braces or square brackets around content that is intended for consideration as "text", i.e. localisable content that is intended to become a top-level part of the formatted output. Fluent comes close, using `{...}` around an entire in-message select expression.

Given this state of affairs, I posit that humans, when scanning a source string, will initially pass over any delimited parts when looking for the content of a message. Furthermore, I posit that breaking from this assumption has made it more difficult for MF1 to reach a greater market share.
Given the above, I think MF2 should have a syntax which delimits variant keys, but not values. In past conversations on this, the strongest argument against such an approach has been that the lack of such delimiters means that the variant value would need to have its surrounding white space trimmed, making the representation of messages with leading or trailing white space more difficult.
I am aware of at least two possible solutions for trimming whitespace around variant values, either of which could provide a workable solution:
1. Trim all whitespace from the start and end of the variant value. This would require that in-content whitespace at the start and end be explicitly escaped, e.g. as `{" "}`. As we are not including any in-message selectors in MF2, this would only be relevant when the whole message is expected to start or end with whitespace. I am not intimately familiar with how all translation tooling handles such cases, but my suspicion is that making such surrounding whitespace explicit would make it far less common for it to be accidentally removed, in particular when translating between languages that handle whitespace very differently.
2. Define clear and minimal trimming rules for content. A likely pair of replacement rules might be to remove exactly one space from the start, and either a newline followed by spaces or a single space from the end. Expressed in code, that looks like this:
With this approach, leading and trailing whitespace would not need to be escaped.
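The two trimming options described above can be compared with a minimal sketch. It is written in Python under stated assumptions: the function names `trim_all` and `trim_minimal` are hypothetical, and the regular expression is one reading of "a newline followed by spaces or a single space", not necessarily the exact rule from the proposal.

```python
import re

# Option 2, trailing rule: either a newline followed by any number of
# spaces, or exactly one space, anchored at the very end of the value.
# (Assumed reading of the rule described in the issue.)
_TRAILING = re.compile(r"(?:\n *| )\Z")


def trim_all(value: str) -> str:
    """Option 1: strip all leading and trailing whitespace."""
    return value.strip()


def trim_minimal(value: str) -> str:
    """Option 2: remove exactly one leading space and one trailing unit."""
    if value.startswith(" "):
        value = value[1:]
    return _TRAILING.sub("", value, count=1)


if __name__ == "__main__":
    for raw in [" hello ", "  hello  ", " hello\n    "]:
        print(repr(raw), repr(trim_all(raw)), repr(trim_minimal(raw)))
```

Under option 1, a value that should keep a leading or trailing space has to spell it out with an escape such as `{" "}`, since `trim_all` removes every surrounding space. Under option 2, writing one extra space on each side is enough, because `trim_minimal` only removes a single one.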
To illustrate the difference in these approaches, I've included below examples extracted from the syntax proposal and subsequent discussions. In each case, the original (with some variations in syntax) is first, followed by one or two examples of possible delimited-keys syntaxes.
As the original sources of these last two examples do not use `[...]` as delimiters, the corresponding key delimiters here are `{[...]}`. This issue isn't meant to be about the specific delimiters that are being used, but about which parts are being delimited in the syntax.

Finally, to consider the case of strings with surrounding whitespace, the following examples show how they could look first with delimited values, then with delimited keys and escaped whitespace, and finally when working with the option-2 minimal trimming rules.