Delimiting patterns: [ ... ] vs { ... } #255
Comments migrated from the slides "Options for consideration":
Slides comment, Stanisław Małolepszy (@stasm), 11:10 PM Apr 21 (forked from another comment): @mihnita wrote: I suggested … With … With …
Slides comment, Mihai Nita (@mihnita), 2:03 PM Apr 21 (edited 2:05 PM Apr 21): Note for A and B: why … Any extra thing with special meaning needs a justification.
As I stated in the comment reproduced above, I have a strong preference for using square brackets (…). From the point of view of discoverability of syntax rules, I'd argue that it's easier to teach users that (…). For example, switching to …
This is made worse by the fact that the CLDR names like (…). This is why I'm advocating that one of our design goals be to make it as obvious as possible which parts of the message are meant to be translated. I'm proposing that we use square brackets to that effect:
I've been trying to collect some data on this. I estimate that fewer than 0.5% of messages contain square brackets in English and many other European languages. The percentage goes up for Japanese and Chinese (~1.5%) because these languages use square brackets as regular punctuation marks. I think it's fair to assume that brackets are on average more common than curly braces (…). Also, we might want to consider which languages are most common as source languages, as they will be edited by hand more often than others. OTOH, target languages are more likely to be edited via tooling, in which case escaping requirements matter less.
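To make this kind of estimate reproducible, one could run a simple count over a message catalog. The sketch below is only illustrative: `share_containing` is a hypothetical helper, the sample messages are made up, and real figures would have to come from an actual corpus of source and target messages.

```python
def share_containing(messages: list[str], chars: str) -> float:
    """Return the fraction of messages containing at least one of `chars`."""
    if not messages:
        return 0.0
    hits = sum(1 for msg in messages if any(c in msg for c in chars))
    return hits / len(messages)


# Made-up sample; real estimates need a real message catalog.
sample = [
    "Hello, world!",
    "See [1] for details",
    "[Beta] New feature",
    "Use {braces} here",
]

print(share_containing(sample, "[]"))  # 0.5
print(share_containing(sample, "{}"))  # 0.25
```

Running the same count per locale would show whether, as estimated above, the share of square brackets is noticeably higher in Japanese and Chinese catalogs.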
I can only offer my personal experience as well as some anecdotal data from my Mozilla days. I've always found it very confusing that ICU MF used braces for both code and text. This was perhaps made harder for me because ICU MF also didn't use any prefixes for argument names or functions, which could result in messages like …
In my experience, any requirement for escaping in the plain text is a source of error. Translators and other non-technical people are not prone to get it right and tooling only gets us so far. I agree that the MF keywords provoke overtranslation and protection of placeables is a constant worry. I do prefer that in-text placeables look as code-like as possible and, as mentioned, that escapes be kept to an absolute minimum. The outer "quotes" don't matter so much (translators and others mostly won't see them) but using square brackets would require them to be escaped inline. Hence a preference for curly brackets.
I echo "Translators and other non-technical people are not prone to get it right and tooling only gets us so far."
If we assume that translators and others mostly won't see the outer delimiters (which I agree with), it would be helpful from the tooling point of view to also be able to assume that a literal delimiter typed by a translator is meant to be part of the translation, and hence requires escaping when serialized to MF2 syntax. This is the case for square brackets (…). Consider a message
OTOH, consider a message
Only to a developer editing the message string, and they need to learn which characters may or must be escaped. A translator normally works in a tool that hides as much syntax as possible, so if they type any syntax-relevant character, the tool must escape it. For the reasons stated here and echoed by Addison and Mark, I quite strongly favor curly braces for enclosing patterns. I also favor using square brackets for selector lists and variant key lists. If we could agree on a syntax for those lists that is just as easy to parse and read but does not use square brackets, then maybe square brackets could be OK for enclosing patterns. But I don't think that they are better for enclosing patterns than curly braces.
I see and acknowledge the agreement forming about using curly braces, but I'd like someone to address the concerns I had about mixing translatable and non-translatable parts of the message by using the same syntax for them. So far we're in a situation where one group says "we don't think it's overly confusing", to which the other group responds "indeed, it is confusing to us and we've seen others confused as well". How do we get out of this? I think we should take a step back and:
I would say: developers. I see developers all the time using raw formats (strings and layout files in Android, HTML in other cases) even when a friendlier WYSIWYG tool is available. And I include in this "bucket" technical people who contribute translations to a small project. I actually did that before. As a developer I focus mainly on the code: algorithms, logic, security, correctness, etc. But translation (even as a dev contributing to open source) is a full-time job. It's similar to "hey, I can write a quick and dirty script in Notepad".
I very strongly advocate that translators (when working inside a translation tool) should not have to do any escaping. WYSIWYG.
Reminder: for a refresher on localization tools and escaping (and other concerns), see the "Localization Concepts" document that I shared a while ago.
Translators should not type placeholder names. The placeholders come from the source (or in some cases a dev can add info saying "btw, these are 5 extra placeholders available to you"). So they choose "insert placeholder" (from a menu or with a hotkey) and get a list of what is available. If they type "Hello {curly", then that is a correct string, WYSIWYG, and when it is exported from the l10n tool it is escaped as …
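The export step described here can be sketched as a pair of escape/unescape functions. This is a minimal sketch under the assumption that pattern text escapes the delimiter characters and the backslash itself with a backslash; `escape_text`, `unescape_text`, and the `special` parameter are hypothetical names, and the exact escape rules were still under discussion at the time. Passing a different set of special characters shows how choosing square brackets for patterns would add `[` and `]` to what the tool must escape.

```python
# Assumption: in pattern text, a backslash escapes the delimiters and itself.
CURLY_SPECIAL = "\\{}"


def escape_text(text: str, special: str = CURLY_SPECIAL) -> str:
    """Escape characters that would otherwise be parsed as syntax."""
    return "".join("\\" + ch if ch in special else ch for ch in text)


def unescape_text(source: str, special: str = CURLY_SPECIAL) -> str:
    """Reverse escape_text by dropping the backslash before escaped characters."""
    out = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\\" and i + 1 < len(source) and source[i + 1] in special:
            out.append(source[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


typed = "Hello {curly"          # what the translator sees and types
exported = escape_text(typed)   # what ends up in the MF2 source
print(exported)                 # Hello \{curly
assert unescape_text(exported) == typed

# If patterns were delimited by square brackets, "[" and "]" would need escaping too.
print(escape_text("See [1]", special="\\[]{}"))  # See \[1\]
```

The translator never sees the backslashes: the tool applies `escape_text` on export and `unescape_text` on import, so the WYSIWYG view stays unchanged whichever delimiters are chosen.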
Right, I agree with this. I was just trying to make a point that even if it works, it may look like a broken placeholder to a subset of translators who cared enough to learn that …
I thought of one more risk associated with choosing (…). The current spec only allows placeholder expressions as local variables / named expressions.
However, we did talk in the past about extending this to allow whole patterns to be bound to local variables. See #149 for reference and the examples in my January proposal. For example:
I acknowledge that such a feature is controversial: it creates risks related to concatenation, but it also allows things that don't currently have good built-in alternatives in MF. During the discussions with the CLDR TC, this idea was removed from the scope of MF 2.0, but I could see us revisiting it in the future. If we choose the same delimiter for patterns as we do for placeholders, we'll have a syntax conflict:
(This may be an argument to rethink the local variable definition syntax, too, BTW.) I realize that this is a hypothetical risk. My point here is that by being more explicit and choosing different delimiters for different things, we'd make the MF syntax more resilient and more flexible for future extensions. In fact, I now realize that the same is true for my arguments in #286. We can find a minimal set of syntax productions that parse and work well for our current scope, but by doing so we also increase the risk of not being able to extend the syntax in the future.
Arguments against
Arguments in favor of
I understand that not all of these arguments are equal. I also acknowledge that they are all correct and valid, and that we're in fact discussing which consequences we're willing to accept.
My extra arguments against
My comment about the 3rd argument above in favor of
If we consider the alternative of "validation/recovery in localization tools" that would catch and prevent such errors before they happen at runtime (which is after the fact), we can also consider that we can iterate on error reporting independently of the syntax. For example, in your hypothetical syntax error output in the …
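A localization tool could run a check like the following before export, reporting errors with their position instead of letting them surface at runtime. This is a sketch, assuming placeholders are delimited by `{` and `}`, cannot nest, and that a backslash escapes the next character; `check_braces` is a hypothetical name, not part of any MF2 implementation.

```python
def check_braces(pattern: str) -> list[str]:
    """Report unbalanced or nested placeholder braces, honoring backslash escapes."""
    errors = []
    open_at = None  # offset of the currently open "{", if any
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            if open_at is not None:
                errors.append(f"offset {i}: '{{' inside placeholder opened at offset {open_at}")
            open_at = i
        elif ch == "}":
            if open_at is None:
                errors.append(f"offset {i}: unmatched '}}'")
            open_at = None
        i += 1
    if open_at is not None:
        errors.append(f"offset {open_at}: '{{' is never closed")
    return errors


for text in ["Hello {$name}", "Hello {curly", "Hello \\{curly", "a}b"]:
    print(repr(text), check_braces(text))
```

Because a check like this lives in the tool rather than in the grammar, its wording and recovery suggestions can evolve without any change to the message syntax.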
This isn't an argument against using (…). Also, arguably, a pattern is sequential data too: it's an ordered sequence of pattern parts.
I'm using recoverability from errors as one potential benefit of having a syntax with good separation of concerns, clear rules, no overloading, and no exceptions. From the note that you highlighted, it sounds like we should talk about this more for the purpose of MF2.0?
The
I think it's only related if we add one more syntax character after the opening
Well, some things will simply not be possible given any kind of syntax, and our choices influence how many such things there are.
During the plenary meeting today, I acknowledged that the majority of the WG is in favor of using curly braces. I think that the discussion about keywords in #286 and the proposed implementation in #287 have made it easier for me to accept curly braces. The consensus reached during the meeting was to:
* Use curly braces to delimit patterns
  Closes #255.
* Reword "nested" to "inner"
  Make it clearer that there's no deep nesting possible.
Co-authored-by: Eemeli Aro <[email protected]>
@markusicu wrote in #230 (comment):