Pick a delimiter for literals other than the double quote #263
Comments
I like this suggestion a lot. It's a simple change that can have a significant impact on how easy it will be to embed translations in code or other containers. Parentheses look good to me ( Using the current EBNF to visualize the options:
|
If we encode @stasm's example to a string it becomes gibberish:
The discussion of a syntax without consideration for the serialization form makes me nervous that we'll have to embed our nifty new syntax in an impenetrable layer of syntactic goo from any eventual resource format. Maybe we should reconsider and just "bite the bullet" to define the "source format", which can then be consumed/compiled into a runtime format? Thinking of MFv2 as "a pattern string" makes my head hurt 😵 I think the only things missing currently are a way to identify the "resource key" (pattern identifier) and the outsides of the MFv2 structure. |
@aphillips I am aligned with you that we should draft the "MF2 Resource" proposal before we freeze MF2 Message Format. The list of items to consider for resource is (brain dump) - pattern identifier, recover from broken message mechanism, groupings, group and resource level meta information. |
I formatted my example for readability, but I think it's realistic to assume that many translations will be formatted on a single line when embedded into a generic container. That's why I was against making the newline a syntax-significant character in MF2. My example also has long variant keys with spaces to demonstrate what they would look like with delimiters, but the EBNF allows bare variant keys too. I think a more realistic example would be this:
There's #251 about making the preamble with selectors stand out more from the rest of the message, but otherwise I think this is actually pretty minimal. Do you think a special-purpose resource format can improve this significantly? |
I added #265 to continue the discussion on developing a resource syntax in parallel. Regarding the original topic and the related #245, we should keep in mind the potential of using escaped literals as a way of signaling non-translatability, so a message could include e.g.
in addition to
Furthermore, the syntax currently also uses quotes for option values:
With the alternatives suggested by @markusicu, those would look like this:
Whichever option we pick, I would think that it should be the same for each of the above use cases. |
Re. angle brackets, I have another concern on top of the need to escape them in XML containers. Consider the examples from above:
Am I the only one to whom the |
+1 Given agreement on pull request #276 (comment) could this issue be closed? |
Agreed. I had thought that GitHub's automation would take care of that, but apparently that isn't the case. @romulocintra, could we close this right away or do we need to wait for the next call? |
I'd like to revisit this issue, and with it, the decision to delimit literals with round parentheses. I'm concerned that we may be making a mistake, even if I don't think it's a very serious mistake. My rationale is based on my personal, subjective impression. I wonder if anyone has similar impressions about this. With the current grammar, we have:
When I see literals with parens around them, I can't help the feeling of optionality that I associate with parens in prose. They look like some sort of addendum, an annotation, an extra, or even as if they weren't part of the syntax at all, and instead were used to denote missing content. I realize that we don't expect delimited literals to be very common. After all, most variant keys will probably just be single ASCII words; the same goes for option values. Keeping this in mind, it's reasonable to be happy with My point, however, is that this is also the reason to be a little bit more cautious. Developers won't be used to seeing I'd like to propose to use the vertical pipe
In #263 (comment) I said:
|
I'd like to suggest we revisit Markus' position on not using double quotes:
In isolation, I think it works perfectly. It conveys exactly the right meaning to me - this is a "plain string" - string literal - enclosed in my pattern. Markus' point was that patterns are meant to be used in programming languages, and in them they will be enclosed in strings. I'd like to challenge that on two levels:
For (2), in JS one can do multiple things, from writing a string in single quotes, to writing it in backticks, without having to resort to use of I am not saying that double quotes in patterns are great. They are definitely a paper cut, but I make a claim that writing patterns in source code is rare, and literals in patterns are rare. When those two things coexist, modern programming languages provide solutions because MessageFormat is not the only scenario where double quotes in string literals happen. Example in JS:

let source = `{Text with {"an untranslatable term"}.}`;
let mf2 = new Intl.MessageFormat(["en-US"]);
let result = mf2.formatToString(source);

I really don't think it looks that bad and the double quotes convey the intention better than |
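As a minimal sketch of the JS quoting options mentioned above (the variable names are illustrative and not from the thread), the same pattern can be written with three kinds of JS string delimiters. Only the double-quoted spelling has to escape the literal's quotes.

```javascript
// The same pattern written with three JS string delimiters.
// Only the double-quoted form needs to escape the literal's quotes.
const withBackticks = `{Text with {"an untranslatable term"}.}`;
const withSingleQuotes = '{Text with {"an untranslatable term"}.}';
const withDoubleQuotes = "{Text with {\"an untranslatable term\"}.}";

// All three spellings produce the identical string value.
const allEqual =
  withBackticks === withSingleQuotes && withSingleQuotes === withDoubleQuotes;
console.log(allEqual);
```

This only covers patterns written directly in JS source; it says nothing about how the pattern looks once stored in a resource file.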
I think almost everyone agrees that double quotes look the best. At the same time, a lot of us also believed that they introduced too much friction. |
I'd be fine with As for
|
(3) is a great point. I didn't consider that we expect to use literals for numerals, too. It contradicts my "everyone agrees" from my earlier comment; thanks for mentioning it. |
(as contributor) I was going to propose using backtick (` U+0060), since it is an ASCII character, is quote-like, doesn't have typesetting variations, and is rare in actual text. It also isn't widely used by programming languages. I would be okay with I agree with @stasm that we don't actually need paired characters (although an advantage of paired characters is that many editors will help you match them up). I agree with @eemeli's points about quotes and would emphasize that external editors won't hork the pattern by attempting to curl the quotes. I disagree slightly with @zbraniecki in that messages may sometimes appear in code but will definitely appear in file formats. Using double-quotes in e.g. a JSON format would require lots of quoting and produce visual clutter wherever literals are used. Developers, translators, and tools don't write messages directly: they write them for the serialization form where they are stored. Since what we're developing isn't hewing to any specific existing format, support for our syntax in editors will probably be rare. For Amazon's format we chose a dialect of JSON and didn't invent any special syntax goo, even though we expected to use a resource compiler, so that existing mature editors could be used with no special anything. (as chair) If we're going to reopen a decision, I think we need to hold the same standard for everyone.
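To make the JSON clutter argument concrete, here is a rough sketch that serializes one pattern with several candidate delimiters and counts the extra characters JSON escaping adds. The sample patterns and the helper name `jsonEscapeOverhead` are assumptions for illustration, not part of any MF2 draft.

```javascript
// Hypothetical patterns that differ only in the literal delimiter.
const candidates = {
  doubleQuote: '{Text with {"an untranslatable term"}.}',
  parens: "{Text with {(an untranslatable term)}.}",
  pipe: "{Text with {|an untranslatable term|}.}",
  backtick: "{Text with {`an untranslatable term`}.}",
};

// Count the characters JSON escaping adds, ignoring the two enclosing quotes.
function jsonEscapeOverhead(pattern) {
  return JSON.stringify(pattern).length - 2 - pattern.length;
}

// Print each candidate as it would appear inside a JSON resource file.
for (const [name, pattern] of Object.entries(candidates)) {
  console.log(name, JSON.stringify(pattern), jsonEscapeOverhead(pattern));
}
```

In this sketch only the double-quoted variant picks up backslashes, one per quote character; the other delimiters pass through `JSON.stringify` unchanged.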
|
As noted, double quotes are problematic for embedding content in other formats, JSON being a prevalent example, but also most programming languages use them for string literals. Since message content will be embedded in languages as strings (even if this isn't the normative use case), anything used to denote strings should be avoided, double quotes being probably the de facto one to avoid if possible. Backticks are problematic as well for slightly different reasons. One of the troubles with backticks is (a) how dangerous they are in all shell languages and a few others and (b) how difficult they are to escape in common markups like Markdown. @aphillips got off easy in his comment above with one unbalanced backtick being invalid markup, but would you even have known how to output the string Of the things I've seen suggested |
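For the Markdown half of that concern, a sketch of what quoting a backtick-bearing pattern involves under CommonMark's code span rules: the delimiter must be a backtick run longer than any run inside the text, and a text that starts or ends with a backtick needs one space of padding. The function name `markdownCodeSpan` is hypothetical, and the sketch ignores the separate rule about text that both starts and ends with a space.

```javascript
// Wrap text in a CommonMark code span, even when the text contains backticks.
function markdownCodeSpan(text) {
  // Find the longest run of consecutive backticks inside the text.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);

  // The delimiter must be one backtick longer than that run.
  const fence = "`".repeat(longest + 1);

  // A leading or trailing backtick would merge with the delimiter, so pad
  // with one space on each side (CommonMark strips that pair when rendering).
  const needsPadding = text.startsWith("`") || text.endsWith("`");
  const body = needsPadding ? ` ${text} ` : text;

  return fence + body + fence;
}

console.log(markdownCodeSpan("{|term|}"));
console.log(markdownCodeSpan("{`term`}"));
console.log(markdownCodeSpan("`"));
```

A pipe-delimited literal needs only the ordinary single-backtick span, while a backtick-delimited one forces the longer delimiter.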
It's not luck: I quoted the backtick because I knew it would be a problem in markdown otherwise. To be clear, I support There just aren't enough ASCII characters 😜 and as a result all of them are meaningful to some syntax. I would have preferred braces, since that reduces the number of special characters in use, but these pose problems:
|
I am convinced by the JSON argument - it is indeed common. As a result, I retract my request to revisit double quotes. @eemeli :
I am confused by your argument here. You say What about using a backtick?
It will work well in JSON and in string literals in almost all programming languages, except for JS, where backticks may be used for parametrized strings. It still looks like a string.
First, I wouldn't consider input/output in shell to be a significant use case of MF2, and second, if we want to go ad extremum, all characters are used somewhere: "there are not enough ASCII chars". |
@aphillips I don't think you were around for #269, but it proposed double braces for placeholders in alignment with Mustache, Jinja(2) and Angular. If it had been adopted, the examples here might look like one of these (depending upon the choice of literal delimiter):
However, things instead settled on using single braces to wrap both patterns and placeholders. And given that already unique syntax, there doesn't seem to be much difference between wrapping literals in parentheses vs. pipes vs. backticks vs. single quotes (which I think covers all the JSON- and XML-friendly options that have been suggested above)—none are really objectionable, but none really compelling either. |
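A rough way to check the "JSON- and XML-friendly" claim is to test whether a pattern survives both containers verbatim. The sketch below assumes XML text content (not attribute values), and the sample patterns and helper names are invented for illustration.

```javascript
// Minimal escaping for XML text content (assumption: not attribute values).
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// True when the pattern can be stored verbatim in both containers.
function survivesUnescaped(pattern) {
  const jsonBody = JSON.stringify(pattern).slice(1, -1);
  return jsonBody === pattern && escapeXmlText(pattern) === pattern;
}

// Hypothetical one-literal patterns, one per delimiter discussed in the thread.
const delimiterSamples = {
  "double quotes": '{a {"b"}}',
  "angle brackets": "{a {<b>}}",
  parentheses: "{a {(b)}}",
  pipes: "{a {|b|}}",
  backticks: "{a {`b`}}",
  "single quotes": "{a {'b'}}",
};

for (const [name, pattern] of Object.entries(delimiterSamples)) {
  console.log(name, survivesUnescaped(pattern));
}
```

Under these assumptions only the double-quote and angle-bracket samples need escaping, which matches the shortlist of parentheses, pipes, backticks, and single quotes; single quotes would need another look if patterns were stored in single-quoted XML attributes.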
The 2023-02-27 call resolved that we would replace |
@markusicu wrote in #230 (comment):