Drop restriction on using keywords in the syntax #286
Closed
eemeli opened this issue Jun 22, 2022 · 15 comments
Labels
blocker-candidate The submitter thinks this might be a block for the next release syntax Issues related with syntax or ABNF

Comments

@eemeli
Collaborator

eemeli commented Jun 22, 2022

In recent conversations with @stasm and @mihnita, it's become obvious that, especially with more complex messages involving local variables and multiple variants, it is not reasonable to expect a translator who is unfamiliar with and untrained in MF2 syntax to work with the raw plaintext representation of a message and immediately see which parts of it contain the actual message or messages to translate. Consider for instance this example from the current spec:

$countInt = {$count :number maximumFractionDigits=0}
$itemAcc = {$item :noun count=$count case=accusative}
one [You bought {$color :adjective article=indefinite accord=$itemAcc} {$itemAcc}.]
* [You bought {$countInt} {$color :adjective accord=$itemAcc} {$itemAcc}.]

There's so much going on here that it's effectively a necessity to use some form of tooling assistance to extract the two variants from this message and present them individually to a translator, along with the contextual information provided by the rest of the message.
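
As a rough illustration of the kind of tooling assistance meant here, the sketch below separates the local declarations from the two variants of the example above. It is not a parser for the actual MF2 grammar: it assumes one declaration or variant per line, exactly as the example happens to be laid out, and the names `split_message`, `DECLARATION` and `VARIANT` are hypothetical.

```python
import re

MESSAGE = """\
$countInt = {$count :number maximumFractionDigits=0}
$itemAcc = {$item :noun count=$count case=accusative}
one [You bought {$color :adjective article=indefinite accord=$itemAcc} {$itemAcc}.]
* [You bought {$countInt} {$color :adjective accord=$itemAcc} {$itemAcc}.]
"""

# Hypothetical line-based patterns (an assumption for this sketch only;
# a real tool would follow the ABNF instead of relying on line breaks).
DECLARATION = re.compile(r"^(\$\w+)\s*=\s*(\{.*\})$")
VARIANT = re.compile(r"^(\S+)\s+\[(.*)\]$")


def split_message(source):
    """Return (declarations, variants) found in a line-per-construct message."""
    declarations = {}
    variants = {}
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if m := DECLARATION.match(line):
            declarations[m.group(1)] = m.group(2)
        elif m := VARIANT.match(line):
            variants[m.group(1)] = m.group(2)
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return declarations, variants


declarations, variants = split_message(MESSAGE)
for key, pattern in variants.items():
    # Each variant can now be shown to a translator on its own.
    print(f"{key}: {pattern}")
```

A translation tool could then present each entry of `variants` individually and show `declarations` alongside as context.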

As a consequence, it is not reasonable or beneficial for us to keep this restriction on our design:

The syntax should not use nor reserve any keywords in any natural language, such as if, match, or let.

Following #252, we'll need to refactor the current preamble into two parts. When we do that, we should not be afraid of using keywords such as let and match in our syntax, esp. as it seems we have something like a consensus forming in #256 on starting in "code" mode, where such terms would be quite natural.

This change will help us reach a syntax that's easier to read and which has slightly fewer mysterious symbols and sigils that need to be deciphered.

@eemeli eemeli added syntax Issues related with syntax or ABNF blocker-candidate The submitter thinks this might be a block for the next release labels Jun 22, 2022
@echeran
Collaborator

echeran commented Jun 22, 2022

I still think the guidance to not use keywords like if, match, or let is reasonable. The way this issue is written, it sounds like the corresponding PR #287 is a fait accompli, but the compromise syntax PR #266 shows a syntax that is readable (reasonably so, IMO, at least) and satisfies the guidance.

I would prefer the syntax to represent the fact that a message is data that is the input to a formatting API, and my concern is that if, let, and match start to look like a programming language (and even if we ensure it's not Turing complete, etc., that still is confusing).

At the least, this feels like another divergence point that should be better handled as not a blocker.

@stasm
Collaborator

stasm commented Jun 23, 2022

Let me try to provide some context about this.

  1. The design restriction that we'd like to see removed specifically mentions "natural language" keywords. The intent of this restriction was to limit the risk of localizers also translating these keywords if they are regular English words. However, it seems like we're in agreement that the audience of the raw syntax will be developers, whom we can expect to understand that these are "code" rather than "text".

  2. At some point and level of nuance, it's difficult to evaluate readability without conducting user studies. I think that one reason for the stalemate that we're seeing with the syntax right now is that we have competing proposals that are rather similar in their complexity and readability. The way the design cycle goes right now is that we have a proposal, then we judge that some part of the syntax is not as clear as we'd like it to be, which we attempt to solve with stricter ordering rules, added prefixes, added delimiters, or all of those. This gives us a more complex proposal, which is then thought of as "less clear" by one group. Realistically, if you squint and look at the proposals in #253 (Variants: should variant keys be delimited, too?), it's really hard to imagine even a developer understanding what is going on in any of them.

  3. Add to this the fact that for what I think will be many developers, the MF syntax will be an afterthought and merely a tool to get their job done. We're designing for people who won't even want to learn what we're designing for them.

  4. I find that to break the cycle it's sometimes helpful to go back to the design goals and restrictions, and see if changing some of them can potentially unblock us. This is what this issue proposes. I want us to answer the following questions:

    • Why did we not want to use natural language keywords?
    • Does this reason still hold, given the current state of the discussion?
    • What changes when this restriction is removed?

My take:

  • To minimize the risk of localizers translating them, because they are English words and look like source text.
  • No, given that the primary audience are developers.
  • We can use keywords instead of adding prefixes.

When we did this exercise, we realized that by using keywords we think we can improve the discoverability and the readability of the syntax significantly. The syntax becomes explicit and self-explanatory, which helpfully means that developers don't really have to spend any time learning it. #287 is one potential incarnation of this idea.

The idea of being able to induce the rules of the syntax from just looking at it comes from another design goal:

The syntax inside translatable content should be easy to understand for humans. This includes making it clear which parts of the message body are translatable content, which parts inside it are placeholders, as well as making the selection logic predictable and easy to reason about.

@echeran Are your concerns about which specific keywords were proposed in #287, or more generally about the idea of using keywords in the first place?

@aphillips
Member

One observation: one of our goals is to provide e.g. an XLIFF binding for MFv2. This is how we can (help) separate code elements such as keywords from being exposed to translation and is a better way of ensuring that sort of protection. Unless we make our syntax utterly mysterious we will always have "language-like" words that are part of the syntax to the benefit of developers who need to write the source pattern. The problem historically has been that translators see {count,number,integer} and figure those words need translation.

I think we should avoid adding additional keywords except-and-unless they add value. But I don't object to having keywords if the consensus is that we need them.

@echeran
Collaborator

echeran commented Jun 23, 2022

@stasm My concerns are about the general idea of using keywords if they're not significantly useful. And they don't seem necessary to me in order to achieve a reasonably readable syntax, as #266 demonstrates.

Even though the words 'readable', 'useful', 'significant', 'value', etc. can have a bit of subjectivity in these discussions, mixing in these keywords into the syntax feels reminiscent of programming languages. With this MF syntax, what we're talking about is the syntax of the serialization of the data that is the input to the API. In general, data does not need a PL in order to exist. But PLs need data along with control flow constructs, etc., and they use keywords for such constructs.

When I see the match ... when ... being inserted, it is taking a series of key-value pairs (which are data -- ex: I see them as the key-value entries of an undelimited map) and using control flow keywords instead to represent that data. (Also, in this example, these control flow keywords imply first-match over best-match, but I don't think that topic (#271) is resolved... personally, I would like to see how things work in practice because I don't have the intuition yet to lean strongly in a single direction.)
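
To make the first-match versus best-match distinction concrete, here is a minimal sketch of the two strategies over a list of variants. It is not taken from #271: the wildcard handling and especially the scoring rule in `best_match` (more exact keys wins, ties go to source order) are assumptions chosen only to show how the two can disagree.

```python
def matches(keys, values):
    """A variant matches when every key is the wildcard or equals its value."""
    return all(k == "*" or k == v for k, v in zip(keys, values))


def first_match(variants, values):
    """Pick the first matching variant in source order."""
    for keys, pattern in variants:
        if matches(keys, values):
            return pattern
    return None


def best_match(variants, values):
    """Pick the matching variant with the most exact (non-wildcard) keys.

    Hypothetical scoring rule for this sketch; max() keeps the earliest
    candidate when scores tie.
    """
    candidates = [(keys, pattern) for keys, pattern in variants if matches(keys, values)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: sum(k != "*" for k in c[0]))[1]


# The catch-all variant is deliberately listed first.
variants = [
    (("*", "*"), "generic"),
    (("1", "female"), "specific"),
]
values = ("1", "female")

print(first_match(variants, values))  # generic
print(best_match(variants, values))   # specific
```

Under first-match the order of the variants carries meaning, so the catch-all has to come last; under best-match the same data gives the same result in any order.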

Funny enough, I think my argument against introducing keywords is the same reasoning that was the origin of this idea in the first place -- this issue+PR seems to be a way to have a workaround that is compatible with the desire to have [...] delimiters for patterns instead of {...} delimiters because {...} looks like the delimitation of code blocks. But I assert that keywords harken to PL code more so than {...} -- I am also used to seeing plain data written with {...} (ex: JSON and EDN). However, I don't see match, when, let used to demarcate plain data.

The lesser nitpicky concerns come in when you notice that match in PLs operates on a single value, so when you need to match on multiple values, languages like Scala and Rust that use match require putting them into a destructurable compound value (ex: a tuple). The data literals for tuples and collections in PLs have delimiters, but #287 is inconsistent with that. And if you do add delimiters to be consistent with PLs, then you're making decisions in the direction of the compromise syntax in #266.

I don't know if there is a better way to accommodate [...] for delimiting patterns, but my original knee-jerk reaction to addressing that by introducing these identifiers in syntax would be a -0.8 on the Apache [-1.0, 1.0] voting scale, which I can now articulate for the reasons above.

@stasm
Collaborator

stasm commented Jun 24, 2022

@echeran I don't think there's an equivalence between having keywords and looking like a PL, is there? For instance, Jsonnet is a syntax for describing data, and yet it comes with the local keyword. Would it help if #287 proposed something more declarative than let, match, when? For example, local, selectors, variant?

Before we get to discussing particular choices for keywords, I'd like to reach an agreement on whether we're OK with allowing keywords as a general rule, assuming we come to a conclusion that they are necessary or that they improve the syntax significantly.

@stasm
Collaborator

stasm commented Jun 24, 2022

One observation: one of our goals is to provide e.g. an XLIFF binding for MFv2. This is how we can (help) separate code elements such as keywords from being exposed to translation and is a better way of ensuring that sort of protection.

Agreed. I think the issue of separating translatable from non-translatable parts of the message is largely solvable for translators, as you say. This is why we're suggesting to remove the explicit design restriction forbidding us from using keywords in the syntax, if needed.

The problem historically has been that translators see {count,number,integer} and figure those words need translation.

I'd extend this to developers, as well. Even if developers don't translate, they also need to be able to understand what's going on, in order to be able to make changes to the source code. If the count variable name changes, is it OK to change it inside {count,number,integer}? Do number and integer need any changes, too? Etc.

I'm happy with the current syntax for placeholders precisely because I think it makes it easier for the reader to identify the functions of words inside them. {$count :number style=integer} is more verbose, but in a good kind of way (in my opinion). The price that we pay for this is the increased number of special characters in the syntax.
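
A small sketch of why the sigils help: with `$` marking a variable, `:` marking a function and `key=value` marking an option, the role of each word in a placeholder can be read off mechanically. The function below is hypothetical and splits on whitespace only, so it assumes option values contain no spaces; it is not the real MF2 placeholder grammar.

```python
def describe_placeholder(placeholder):
    """Classify the whitespace-separated parts of one {...} placeholder by sigil."""
    text = placeholder.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a {...} placeholder")
    parts = {"variable": None, "function": None, "options": {}}
    for token in text[1:-1].split():
        if token.startswith("$"):
            parts["variable"] = token[1:]
        elif token.startswith(":"):
            parts["function"] = token[1:]
        elif "=" in token:
            key, value = token.split("=", 1)
            parts["options"][key] = value
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return parts


print(describe_placeholder("{$count :number style=integer}"))
# {'variable': 'count', 'function': 'number', 'options': {'style': 'integer'}}
```

No comparable rule works for {count,number,integer}, where the role of each word depends only on its position.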

I imagine that there's a budget of special characters that we can use for prefixes and delimiters. We've used part of it to make the placeholder syntax more clear. If we agree that keywords are OK (when designed well), then I think we can avoid going over the budget for the syntax around patterns. #287 is one attempt, but first we need to agree that keywords are acceptable as a general rule. Hence this issue.

I think we should avoid adding additional keywords except-and-unless they add value. But I don't object to having keywords if the consensus is that we need them.

+1.

@eemeli
Collaborator Author

eemeli commented Jun 24, 2022

In order to get some user input on this topic, at the end of my presentation on "The Road to Intl.MessageFormat" at React Norway earlier today, I asked my audience for their preferences between two options:

Option A

$relDate={$date :relativeDateTime fields=Mdjm}
[{$count :plural offset=1} {$gender}]
[1 female] {{$name} added you to her circles {$relDate}.}
[1 male] {{$name} added you to his circles {$relDate}.}
[1 *] {{$name} added you to their circles {$relDate}.}
[* *] {{$name} added you and {#count} others to their circles {$relDate}.}

Option B

let $relDate = {$date :relativeDateTime fields=Mdjm}
match {$count :plural offset=1} {$gender}
when 1 female {{$name} added you to her circles {$relDate}.}
when 1 male {{$name} added you to his circles {$relDate}.}
when 1 * {{$name} added you to their circles {$relDate}.}
when * * {{$name} added you and {#count} others to their circles {$relDate}.}

Option A is lifted directly from here; Option B applies the changes of #287 on that base, keeping the {...} pattern delimiters. A brief explanation of plurals and variants was given before presenting the options, but no part of the syntax(es) was explained.

This was an audience of front-end developers who were completely unfamiliar with the details of the syntax, but had just gotten a general explanation of the sort of work we're doing. The talk itself barely touches on or presents the MF2 syntax; the only example I use is:

helsinki = {+a href=(https://fi.wikipedia.org/wiki/Helsinki)}Helsinki{-a} on Suomen pääkaupunki.

Based on a quick count of hands, about 20 people preferred Option A and about 50 people preferred Option B.

Once it's available, I'll update this comment to include a timestamped link to the livestream where I present this question.

Edit: Link to video, unfortunately only showing the slides and with slightly broken audio: https://youtu.be/AQRlDs92XFA?t=19010 -- the relevant part is the subsequent 5 mins or so.

@zbraniecki
Member

Based on a quick count of hands, about 20 people preferred Option A and about 50 people preferred Option B.

This is strongly counter-intuitive for me.
I'm wondering how your audience reasoned about visually identifying that female {{$name} added you to her circles is composed of a keyword and a string. I have an urge to question whether the { was sufficient.

I understand that what I'm writing now is at risk of researcher bias and belief perseverance, but I'm wondering if it's possible that your audience in fact did not visually parse the syntax of this message and rather assessed their aesthetic reaction to looking at it - in such a case, seeing fewer sigils may be correlated with preference.

We could test this hypothesis by constructing a simple HCI test in which a subject has to look at a message like this and then answer a question or perform an action that requires reasoning about the message.

Saying that, my opinion is weakly held and I do not intend to oppose Option B if the group decides to pursue it.

@echeran
Collaborator

echeran commented Jun 24, 2022

Would it help if #287 proposed something more declarative than let, match, when? For example, local, selectors, variant?

Before we get to discussing particular choices for keywords, I'd like to reach an agreement on whether we're OK with allowing keywords as a general rule, assuming we come to a conclusion that they are necessary or that they improve the syntax significantly.

It's not just about the names of the keywords or their descriptiveness. Yes, different names can avoid confusing implications when interpreting meaning, which is important. This keywords topic highlights a still unaddressed readability concern of the syntax in develop -- you don't have the typical [...] visual cues for grouping the sequence of selectors and their value tuple cases.

But I still recognize the readability argument that is prompting the proposal to use these keywords. I would be okay with keywords only on the condition that they're not used to replace delimiters or other syntax used to represent the plain data -- in other words, I still prefer the visual cues of matching delimiters on the selectors definition list and the selector value tuple. So in the examples that @eemeli presented above as Option A and Option B, it could look like something in between, like:

let $relDate={$date :relativeDateTime fields=Mdjm}
selectors [{$count :plural offset=1} {$gender}]
cases
[1 female] {{$name} added you to her circles {$relDate}.}
[1 male] {{$name} added you to his circles {$relDate}.}
[1 *] {{$name} added you to their circles {$relDate}.}
[* *] {{$name} added you and {#count} others to their circles {$relDate}.}

(Actually, in general, I'm personally on the other end of most people on verbosity vs. readability -- I prefer clarity much more over terseness or shaving off characters, so while others might find delimiter characters "verbose", I think they actually improve readability. Judicious use of keywords can also help. But it seems like most people find some amount of terseness more readable than my preferred extreme, and the question is how much and where they prefer the terseness and whether that creates confusion or irregularities.)

To support people who may want to keep the syntax terse because they find some level of terseness more readable, I would also suggest that these keywords are optional because I don't think they should take the place of delimiters. But I don't know if optional syntax for readability is considered "bad form" by people who have more experience with or opinions on syntax.

@eemeli
Collaborator Author

eemeli commented Jun 24, 2022

Link to video, unfortunately only showing the slides and with slightly broken audio: https://youtu.be/AQRlDs92XFA?t=19010 -- the relevant part is the subsequent 5 mins or so.

@aphillips
Member

replying to @stasm:

I'd extend this to developers, as well. Even if developers don't translate, they also need to be able to understand what's going on, in order to be able to make change to the source code. If the count variable name changes, is it OK to change it inside {count,number,integer}? Do number and integer need any changes, too? Etc.

While I agree in principle, in practice I rarely have to explain the MFv1 pattern syntax (beyond just the nuts-and-bolts of it) to developers: developers understand what it is, as they are familiar with using microsyntaxes, whereas translators very often "don't get it"--even when told explicitly.

I imagine that there's a budget of special characters that we can use for prefixes and delimiters. We've used part of it to make the placeholder syntax more clear. If we agree that keywords are OK (when designed well), then I think we can avoid going over the budget for the syntax around patterns. #287 is one attempt, but first we need to agree that keywords are acceptable as a general rule. Hence this issue.

If by "budget of special characters", you mean the (I think it's unstated?) assumption all of us are working with is that we should only use printable ASCII characters in the syntax--and that we want to avoid MFv1's adventures with apostrophe, I guess that's true.

My problem with at least some of the proposed keywords is that they feel superfluous to me--syntactic goo that we require developers to emit that doesn't really do anything.

For example, in let $foo = {expression}, "let" doesn't really add any functionality that $foo = {expression} doesn't have (although note that there is no expression terminator).

replying to @eemeli:

That sounds well done and I look forward to seeing the video. My concern is that seeing might not produce the same result as using the syntax in anger. Your Option B looks like structured code, so a developer might mentally go "uh-huh, uh-huh... let's assigning a variable and match is a switch-statement... I get it..." whereas Option A has to be explained. But one might tire of typing let/match/when all the time?

I do think there is something to providing syntax that identifies the selectors vs. the cases.

replying to @echeran:

(Actually, in general, I'm personally on the other end of most people on verbosity vs. readability -- I prefer clarity much more over terseness or shaving off characters, so while others might find delimiter characters "verbose", I think they actually improve readability. Judicious use of keywords can also help. But it seems like most people find some amount of terseness more readable than my preferred extreme, and the question is how much and where they prefer the terseness and whether that creates confusion or irregularities.)

I think you and I are in violent agreement here. We need developers to learn and use the syntax; debug it visually; and apply it effectively. Terseness can sometimes help this, but sometimes becomes a barrier. I personally prefer syntaxes that have strong visual cues (to assist the parser and the reader).

In fact, we are all guilty of using newlines to make the examples clear. This fragment:

selectors [{$count :plural offset=1} {$gender}]
cases
[1 female] {{$name} added you to her circles {$relDate}.}
[1 male] {{$name} added you to his circles {$relDate}.}
[1 *] {{$name} added you to their circles {$relDate}.}
[* *] {{$name} added you and {#count} others to their circles {$relDate}.}

is not an "array of cases". It's really this string:

selectors [{$count :plural offset=1} {$gender}] cases [1 female] {{$name} added you to her circles {$relDate}.} [1 male] {{$name} added you to his circles {$relDate}.} [1 *] {{$name} added you to their circles {$relDate}.} [* *] {{$name} added you and {#count} others to their circles {$relDate}.}

... and @eemeli's match/when syntax is easier to read when written that way:

match {$count :plural offset=1} {$gender} when [1 female] {{$name} added you to her circles {$relDate}.} when [1 male] {{$name} added you to his circles {$relDate}.} when [1 *] {{$name} added you to their circles {$relDate}.} when [* *] {{$name} added you and {#count} others to their circles {$relDate}.}

@stasm
Collaborator

stasm commented Jun 27, 2022

Interesting point about using newlines in our discussions, @aphillips. Related, my preference for the recommended (but not required) style would be to use capital letters for the keywords, to make them stand out more:

MATCH [{$count :plural offset=1} {$gender}] WHEN [1 female] {{$name} added you to her circles {$relDate}.} WHEN [1 male] {{$name} added you to his circles {$relDate}.} WHEN [1 *] {{$name} added you to their circles {$relDate}.} WHEN [* *] {{$name} added you and {#count} others to their circles {$relDate}.}

Such keywords are visible enough to me, which is why I'm proposing that we not use delimiters for variant keys, like so:

MATCH {$count :plural offset=1} {$gender} WHEN 1 female {{$name} added you to her circles {$relDate}.} WHEN 1 male {{$name} added you to his circles {$relDate}.} WHEN 1 * {{$name} added you to their circles {$relDate}.} WHEN * * {{$name} added you and {#count} others to their circles {$relDate}.}

In fact, this may be clear enough for me that I don't mind the {...} around patterns here. They create "islands" of content, while the "code" stays in the background. And this way, we could not use [...] in the syntax at all.
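
The "islands of content" idea can be sketched as a scanner that treats every top-level {...} group as opaque and reads everything outside the braces as keywords or keys. This is only an illustration of the single-line MATCH/WHEN form shown above, not the grammar proposed in #287; the function names are hypothetical, and escaping inside patterns is not handled.

```python
def top_level_tokens(source):
    """Split into bare words and whole {...} groups, honouring nested braces."""
    tokens = []
    depth = 0
    start = 0   # index where the current top-level {...} group began
    word = ""   # bare word being collected outside any braces
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                if word:
                    tokens.append(word)
                    word = ""
                start = i
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
            if depth == 0:
                tokens.append(source[start:i + 1])
        elif depth == 0:
            if ch.isspace():
                if word:
                    tokens.append(word)
                    word = ""
            else:
                word += ch
    if depth != 0:
        raise ValueError("unbalanced braces")
    if word:
        tokens.append(word)
    return tokens


def parse_match(source):
    """Return (selectors, variants) for a MATCH ... WHEN ... message."""
    tokens = top_level_tokens(source)
    if not tokens or tokens[0] != "MATCH":
        raise ValueError("expected MATCH")
    i = 1
    selectors = []
    while i < len(tokens) and tokens[i] != "WHEN":
        selectors.append(tokens[i])
        i += 1
    variants = []
    while i < len(tokens):
        if tokens[i] != "WHEN":
            raise ValueError(f"expected WHEN, got {tokens[i]!r}")
        i += 1
        keys = []
        while i < len(tokens) and not tokens[i].startswith("{"):
            keys.append(tokens[i])
            i += 1
        if i == len(tokens):
            raise ValueError("variant without a pattern")
        variants.append((tuple(keys), tokens[i]))
        i += 1
    return selectors, variants


source = (
    "MATCH {$count :plural offset=1} {$gender} "
    "WHEN 1 female {{$name} added you to her circles {$relDate}.} "
    "WHEN * * {{$name} added you and {#count} others to their circles {$relDate}.}"
)
selectors, variants = parse_match(source)
for keys, pattern in variants:
    print(keys, pattern)
```

Because the scanner never looks inside a brace group, a word such as WHEN appearing in translatable text would not be mistaken for a keyword, and no line breaks are needed to find the variant boundaries.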

@stasm
Collaborator

stasm commented Jun 27, 2022

If by "budget of special characters", you mean the (I think it's unstated?) assumption all of us are working with is that we should only use printable ASCII characters in the syntax--and that we want to avoid MFv1's adventures with apostrophe, I guess that's true.

What I meant here is that I'd prefer a syntax which uses few special characters. Special characters are hard to learn: it's difficult to Google them, and some people don't even know what they're called (and that's OK). We're already treading on thin ice with three kinds of brackets ({...}, [...], (...)), $, :, and *. There's also the idea to use # for local variable references. Every time we add a new special character to the syntax, we're depleting the budget. A judicious use of keywords can help by opening up the possibility of adding contextual keywords rather than adding new special characters.

For example, if in the future we want to add some runtime meta information which follows the variants, we can:

  1. either introduce new special characters:

     [{$count}] [1] {One} [*] {Other} #{meta goes here}
    
  2. or introduce a new keyword:

     MATCH {$count} WHEN 1 {One} WHEN * {Other} META {meta goes here}
    

My position is that (2) is better because it scales to many more instances, and also is self-explanatory: it gives names to concepts, which means it's easier to talk about them (e.g. on StackOverflow).

@aphillips
Member

I agree, @stasm, that lots of different syntax characters with subtly different meanings are hard to use and difficult to teach.

Previously my reaction might have been "well, it doesn't really matter which we choose--I'm going to strip this all away and hide the details behind my resource format", but that's a cop-out. To be honest, I don't really want to teach people to write strings like this vs. using a structured format approach--but that is not an option given the goals we have.

I don't see this as a choice between symbols and keywords. What we're creating is a grammar which will use a combination of keywords and symbols to create a pattern string. Whatever form that grammar takes will have to be learned by developers and checked by tools.

@eemeli
Collaborator Author

eemeli commented Jun 28, 2022

Closed by #287.

@eemeli eemeli closed this as completed Jun 28, 2022

5 participants