Pick a delimiter for literals other than the double quote #263

Closed
stasm opened this issue May 12, 2022 · 19 comments · Fixed by #359
Labels: Action-Item (action item assigned by the WG), blocker (blocks the release), syntax (issues related with syntax or ABNF)

Comments

@stasm
Collaborator

stasm commented May 12, 2022

@markusicu wrote in #230 (comment):

Speaking of literals, don't enclose them in quotes. One of the stated goals here is to make messages usable as string literals in programming languages. Enclose literals in parentheses {... {(5) :number} books} or angle brackets {... {<5> :number} books} or pairs of pipe symbols {... {|5| :number} books} or similar.

@stasm stasm added the syntax Issues related with syntax or ABNF label May 12, 2022
@stasm
Collaborator Author

stasm commented May 13, 2022

I like this suggestion a lot. It's a simple change that can have a significant impact on how easy it will be to embed translations in code or other containers.

Parentheses look good to me ({(5) :number}), although as a non-native English speaker, I wonder if there's a strong connotation to negative numbers here? If there is, then a pair of pipes would get my vote instead. I don't think we need directional delimiters here (i.e. separate open and close ones) like I recommend we have for patterns. Literals are usually short and already enclosed in a placeholder (which has its own delimiters) or are outside patterns (when used as variant keys).

Using the current EBNF to visualize the options:

{$foo :func}
{$bar :func}
    (a key with spaces) (another one) [Some text with an interpolated {(123) :number} literal.]
    |a key with spaces| |another one| [Some text with an interpolated {|123| :number} literal.]
    _ [Default foo]

@aphillips
Member

If we encode @stasm's example to a string it becomes gibberish:

"{$foo :func}\n{$bar :func}\n    (a key with spaces) (another one) [Some text...]\n    (a key with spaces) (another one) [etc.]"

The discussion of a syntax without consideration for the serialization form makes me nervous that we'll have to embed our nifty new syntax in an impenetrable layer of syntactic goo from any eventual resource format. Maybe we should reconsider and just "bite the bullet" to define the "source format", which can then be consumed/compiled into a runtime format? Thinking of MFv2 as "a pattern string" makes my head hurt 😵

I think the only things missing currently are a way to identify the "resource key" (pattern identifier) and the outsides of the MFv2 structure.

@zbraniecki
Member

@aphillips I am aligned with you that we should draft the "MF2 Resource" proposal before we freeze MF2 Message Format.

The list of items to consider for resource is (brain dump) - pattern identifier, recover from broken message mechanism, groupings, group and resource level meta information.

@stasm
Collaborator Author

stasm commented May 13, 2022

I formatted my example for readability, but I think it's realistic to assume that many translations will be formatted on a single line when embedded into a generic container. That's why I was against making the newline a syntax-significant character in MF2.

My example also has long variant keys with spaces to demonstrate what they would look like with delimiters, but the EBNF allows bare variant keys too. I think a more realistic example would be this:

"{$foo :func} {$bar :func} one one [Some text...] one few [etc.]"

There's #251 about making the preamble with selectors stand out more from the rest of the message, but otherwise I think this is actually pretty minimal. Do you think a special-purpose resource format can improve this significantly?

@eemeli
Collaborator

eemeli commented May 13, 2022

I added #265 to continue the discussion on developing a resource syntax in parallel.

Regarding the original topic and the related #245, we should keep in mind the potential of using escaped literals as a way of signaling non-translatability, so a message could include e.g.

Text with {"untranslatable term"}.

in addition to

Text with {"42" :number}.

Furthermore, the syntax currently also uses quotes for option values:

Text with {$var :func foo="value with spaces"}.

With the alternatives suggested by @markusicu, those would look like this:

Text with {(untranslatable term)}.
Text with {(42) :number}.
Text with {$var :func foo=(value with spaces)}.
Text with {<untranslatable term>}.
Text with {<42> :number}.
Text with {$var :func foo=<value with spaces>}.
Text with {|untranslatable term|}.
Text with {|42| :number}.
Text with {$var :func foo=|value with spaces|}.

Whichever option we pick, I would think that it should be the same for each of the above use cases.

@stasm
Collaborator Author

stasm commented May 20, 2022

Re. angle brackets, I have another concern on top of the need to escape them in XML containers. Consider the examples from above:

Text with {<untranslatable term>}.
Text with {<42> :number}.
Text with {$var :func foo=<value with spaces>}.

Am I the only one to whom the <angle bracketed literals> look like the unofficial placeholders that are used in docs or in chat between users? I don't know the right name for them; it's like a second-derivative templating used when explaining how a DSL works.
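
To make the XML concern concrete, here is a minimal sketch of what each candidate delimiter looks like once a message is stored as XML text content. The `escapeXmlText` helper is hypothetical and written only for this illustration; it is not part of any MF2 tooling.

```javascript
// Hypothetical helper: escapes characters that are unsafe in XML text content.
// Escaping ">" is conventional rather than strictly required in most positions.
function escapeXmlText(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// The same message written with three of the proposed literal delimiters.
const candidates = {
  parens: "Text with {(42) :number}.",
  angle: "Text with {<42> :number}.",
  pipes: "Text with {|42| :number}.",
};

for (const [name, message] of Object.entries(candidates)) {
  console.log(name, "->", escapeXmlText(message));
}
```

Only the angle-bracket variant changes when escaped; the parenthesized and piped variants pass through untouched.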

@markusicu
Member

Whichever option we pick, I would think that it should be the same for each of the above use cases.

+1

Given agreement on pull request #276 (comment) could this issue be closed?

@eemeli
Collaborator

eemeli commented Jun 7, 2022

Agreed. I had thought that GitHub's automation would take care of that, but apparently that isn't the case. @romulocintra, could we close this right away or do we need to wait for the next call?

@eemeli eemeli added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Jun 7, 2022
@stasm
Collaborator Author

stasm commented Feb 14, 2023

I'd like to revisit this issue, and with it, the decision to delimit literals with round parentheses. I'm concerned that we may be making a mistake, even if I don't think it's a very serious mistake. My rationale is based on my personal, subjective impression. I wonder if anyone has similar impressions about this.

With the current grammar, we have:

{Text with {(an untranslatable term)}.}
{Text with {(42) :number}.}
{Text with {$var :func foo=(value with spaces)}.}

When I see literals with parens around them, I can't help the feeling of optionality that I associate with parens in prose. They look like some sort of addendum, an annotation, an extra, or even as if they weren't part of the syntax at all, and instead were used to denote missing content.

I realize that we don't expect delimited literals to be very common. After all, most variant keys will probably just be single ASCII words; the same goes for option values. Keeping this in mind, it's reasonable to be happy with (literals), even if they're not perfect.

My point, however, is that this is also the reason to be a little bit more cautious. Developers won't be used to seeing (literals) often, and may be confused by them. My proposition is that a more exotic choice can improve readability.

I'd like to propose to use the vertical pipe |, on both sides of the literal. It was one of the three options originally considered in this issue.

{Text with {|an untranslatable term|}.}
{Text with {|42| :number}.}
{Text with {$var :func foo=|value with spaces|}.}

In #263 (comment) I said:

I don't think we need directional delimiters here (i.e. separate open and close ones) like I recommend we have for patterns. Literals are usually short and already enclosed in a placeholder (which has its own delimiters) or are outside patterns (when used as variant keys).

@stasm stasm reopened this Feb 14, 2023
@zbraniecki
Member

zbraniecki commented Feb 15, 2023

I'd like to suggest we revisit Markus' position on not using double quotes:

{Text with {"an untranslatable term"}.}
{Text with {"42" :number}.}
{Text with {$var :func foo="value with spaces"}.}

In isolation, I think it works perfectly. It conveys exactly the right meaning to me - this is a "plain string" - string literal - enclosed in my pattern.

Markus's point was that patterns are meant to be used in programming languages, and in them they will be enclosed in strings.

I'd like to challenge that on two levels:

  1. I do not think most MF2 patterns will be written explicitly in C++, Java, JS or Python code. That's not how localization works. In most cases patterns are stored in resource files.
  2. Even if one disagrees with (1), the scenario where there is a pattern in source code, and that pattern has a string literal inside it, is not that unfamiliar! Nested string literals happen, albeit rarely, and SDEs know how to handle them.

For (2), in JS one can do multiple things, from writing a string in single quotes to writing it in backticks, without having to resort to \". In other programming languages we have f'PATTERN', """PATTERN""", #"PATTERN"#, etc.

I am not saying that double quotes in patterns are great. They are definitely a paper cut, but I claim that writing patterns in source code is rare, and literals in patterns are rare. When those two things coexist, modern programming languages provide solutions, because MessageFormat is not the only scenario where double quotes appear inside string literals.

Example in JS:

let source = `{Text with {"an untranslatable term"}.}`;

let mf2 = new Intl.MessageFormat(["en-US"]);
let result = mf2.formatToString(source);

I really don't think it looks that bad, and the double quotes convey the intention better than (), || or {} would.

@stasm
Collaborator Author

stasm commented Feb 15, 2023

I think almost everyone agrees that double quotes look the best. At the same time, a lot of us also believed that they introduced too much friction.

@eemeli
Collaborator

eemeli commented Feb 15, 2023

I'd be fine with |42| rather than our current (42), esp. as the | is likely to be a rarer character than parentheses, and it allows us to avoid the question of whether ( needs to be escaped within a literal. There is a corner case with || having a conceptual overloading as "logical OR" rather than "empty value", but I don't think that's a blocker.
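
As an illustration of why a single, rare delimiter keeps scanning simple, here is a minimal sketch of reading a pipe-delimited literal. The function name `readPipeLiteral` and the escape rules (only `\|` and `\\`) are assumptions made for this sketch; the thread does not define them.

```javascript
// Sketch only. Assumption: inside a literal, "\|" and "\\" are the only escapes.
// Reads a literal that starts at source[start] and returns its value plus the
// index just past the closing "|".
function readPipeLiteral(source, start) {
  if (source[start] !== "|") {
    throw new SyntaxError(`Expected "|" at index ${start}`);
  }
  let value = "";
  let i = start + 1;
  while (i < source.length) {
    const ch = source[i];
    if (ch === "\\") {
      const next = source[i + 1];
      if (next !== "|" && next !== "\\") {
        throw new SyntaxError(`Invalid escape at index ${i}`);
      }
      value += next;
      i += 2;
    } else if (ch === "|") {
      return { value, end: i + 1 };
    } else {
      // Parentheses, quotes, and everything else are ordinary characters here.
      value += ch;
      i += 1;
    }
  }
  throw new SyntaxError("Unterminated literal");
}

console.log(readPipeLiteral("{|value (with parens)| :func}", 1));
console.log(readPipeLiteral("||", 0)); // the empty-literal corner case
```

Under these assumptions parentheses never need escaping inside a literal, and `||` reads as an empty value.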

As for "42", I like it less than either parentheses or pipes, for at least the following reasons:

  1. It's more common in real-world messages, so in general it would lead to more escaping being required.
  2. There's the eternal battle between "straight" and “curly” quotes that we could here avoid.
  3. It communicates "string" rather than "value". As in,
    {"42" :number}
    
    somewhat forces the reader to consider the value first as a string, and only secondarily as something that becomes a number. Meanwhile,
    {(42) :number}
    
    or
    {|42| :number}
    
    makes it easier to skip that first part and see the whole as a value that's defined to be a number.
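
Point (2) can be illustrated with a small sketch. The `curlQuotes` function below is a hypothetical stand-in for an editor's "smart quotes" feature, not a real editor API; it alternates opening and closing curly quotes in place of straight ones.

```javascript
// Hypothetical stand-in for an editor's automatic quote curling.
function curlQuotes(text) {
  let open = true;
  return text.replace(/"/g, () => {
    const quote = open ? "\u201C" : "\u201D";
    open = !open;
    return quote;
  });
}

const quoted = '{"42" :number}';
const piped = "{|42| :number}";

console.log(curlQuotes(quoted)); // delimiters are no longer straight quotes
console.log(curlQuotes(piped) === piped); // pipes are left alone
```

A parser that expects straight quotes would no longer find the literal delimiters in the first result, while the piped form is unaffected.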

@stasm
Collaborator Author

stasm commented Feb 15, 2023

(3) is a great point. I didn't consider that we expect to use literals for numerals, too. It contradicts my "everyone agrees" from my earlier comment; thanks for mentioning it.

@aphillips
Member

(as contributor)

I was going to propose using backtick (` U+0060), since it is an ASCII character, is quote-like, doesn't have typesetting variations, and is rare in actual text. It also isn't widely used by programming languages. I would be okay with |.

I agree with @stasm that we don't actually need paired characters (although an advantage of paired characters is that many editors will help you match them up). I agree with @eemeli's points about quotes and would emphasize that external editors won't hork the pattern by attempting to curl the quotes.

I disagree slightly with @zbraniecki in that messages may sometimes appear in code but will definitely appear in file formats. Using double-quotes in e.g. a JSON format would require lots of quoting and produce visual clutter wherever literals are used. Developers, translators, and tools don't write messages directly: they write them for the serialization form where they are stored. Since what we're developing isn't hewing to any specific existing format, support for our syntax in editors will probably be rare.
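
The JSON cost can be sketched as follows, using only the standard `JSON.stringify`. The `addedEscapes` helper is a name chosen for this illustration; it counts the backslashes that serialization introduces for each candidate delimiter.

```javascript
// The same message written with each proposed literal delimiter.
const messages = {
  doubleQuote: '{Text with {"42" :number}.}',
  parens: "{Text with {(42) :number}.}",
  pipes: "{Text with {|42| :number}.}",
  backtick: "{Text with {`42` :number}.}",
};

// Counts the backslashes that appear once the message is a JSON string value.
function addedEscapes(message) {
  const json = JSON.stringify(message);
  return (json.match(/\\/g) || []).length;
}

for (const [name, message] of Object.entries(messages)) {
  console.log(name, JSON.stringify(message), addedEscapes(message));
}
```

Only the double-quoted variant picks up escapes, one per quote character; the other three serialize unchanged apart from the surrounding JSON quotes.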

For Amazon's format we chose a dialect of JSON and didn't invent any special syntax goo, even though we expected to use a resource compiler, so that existing mature editors could be used with no special anything.

(as chair)

If we're going to reopen a decision, I think we need to hold the same standard for everyone.

  • Reopening the issue (where one exists) is fine
  • The person reopening the issue needs to provide a proposal with pros and cons for the current and proposed change and any supporting material before the WG will consider it; the proposal can be discussed in advance and can just take the form of issue conversation, but it must be made in writing
  • The WG will then vote in the next teleconference on whether (a) to consider the issue and, if yes, (b) on the request. The decision of the WG is final; if the submitter is unsatisfied, they can appeal to the TC.

@aphillips aphillips added the Agenda+ Requested for upcoming teleconference label Feb 15, 2023
@alerque
Contributor

alerque commented Feb 15, 2023

As noted, double quotes are problematic for embedding content in other formats, JSON being a prevalent example, but most programming languages also use them for string literals. Since message content will be embedded in languages as strings (even if this isn't the normative use case), anything used to denote strings should be avoided, double quotes being probably the de facto one to avoid if possible.

Backticks are problematic as well, for slightly different reasons. The troubles with backticks are (a) how dangerous they are in all shell languages and a few others and (b) how difficult they are to escape in common markups like Markdown. @aphillips got off easy in his comment above, with one unbalanced backtick being invalid markup, but would you even have known how to output the string {Text with {`42` :number}.}? It's not a simple backslash escape; you need double backticks around the string for GitHub Flavored Markdown, and there are differences in other parsers.

Of the things I've seen suggested, | seems to make the most sense to me. I agree that both parentheses and braces suggest other meanings.

@aphillips
Member

It's not luck: I quoted the backtick because I knew it would be a problem in markdown otherwise. To be clear, I support |.

There just aren't enough ASCII characters 😜 and as a result all of them are meaningful to some syntax.

I would have preferred braces, since that reduces the number of special characters in use, but these pose problems:

when foo {bar} baz {this is the pattern but the parser wants "bar" to be the pattern}
when foo {{bar}} baz {doubling might work, but is {{{moo}} :number} unattractive and requires more lookahead}

@zbraniecki
Member

zbraniecki commented Feb 15, 2023

I am convinced by the JSON argument - it is indeed common. As a result, I retract my request to revisit the double quote.

@eemeli :

It communicates "string" rather than "value".

I am confused by your argument here. You say {"42" :number} "forces the reader to think of 42 here as a string" - because it is. It is a string literal passed to a number formatter. What else should it make the reader think?
{|42| :number} may indeed hide that 42 is a string here, but I think that's just confusing.

What about using a back tick?

{Text with {`an untranslatable term`}.}

It will work well in JSON and when embedded in string literals in almost all programming languages, except JS, where the backtick may be used for parametrized strings. It still looks like a string.
The downside is that the character is harder to find on the keyboard.

edit: Ahh, I see the conversation below about the backtick. GitHub showed it to me after I posted my comment. I agree we're low on available characters, but I'm not convinced by the shell argument.

First, I wouldn't consider input/output in a shell to be a significant use case of MF2; second, if we want to go ad extremum, all characters are used somewhere: "there are not enough ASCII chars".

@gibson042
Collaborator

gibson042 commented Feb 19, 2023

I would have preferred braces, since that reduces the number of special characters in use, but these pose problems:

when foo {bar} baz {this is the pattern but the parser wants "bar" to be the pattern}
when foo {{bar}} baz {doubling might work, but is {{{moo}} :number} unattractive and requires more lookahead}

@aphillips I don't think you were around for #269, but it proposed double braces for placeholders in alignment with Mustache, Jinja(2) and Angular. If it had been adopted, the examples here might look like one of these (depending upon the choice of literal delimiter):

  • parentheses:
    when foo (bar) [{{(42) :number}} is numeric; {{(untranslatable term) :quote pretty=(true)}} is quoted]
  • pipe:
    when foo |bar| [{{|42| :number}} is numeric; {{|untranslatable term| :quote pretty=|true|}} is quoted]
  • backtick:
    when foo `bar` [{{`42` :number}} is numeric; {{`untranslatable term` :quote pretty=`true`}} is quoted]
  • single quote (with gratuitous Jinja-subset function syntax variation):
    when foo 'bar' [{{'42' |number}} is numeric; {{'untranslatable term' | quote(pretty='true')}} is quoted]

However, things instead settled on using single braces to wrap both patterns and placeholders. And given that already unique syntax, there doesn't seem to be much difference between wrapping literals in parentheses vs. pipes vs. backticks vs. single quotes (which I think covers all the JSON- and XML-friendly options that have been suggested above)—none are really objectionable, but none really compelling either.

@aphillips aphillips removed resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Agenda+ Requested for upcoming teleconference labels Feb 27, 2023
@aphillips
Member

The 2023-02-27 call resolved that we would replace (/) with |. @stasm took the action to create a PR to that effect. This issue depends on the change to ABNF.

8 participants