
Commit 29e16f1

stasm, eemeli, and aphillips authored
Open/close design: Add more alternatives (#517)
* Alternative: do nothing

* Alternative: exact HTML syntax

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* Apply suggestions from code review

Co-authored-by: Addison Phillips <[email protected]>

* s/support/allow/ any markup in text

* Minor clarifications to A3

* Add A4. Poundslash

* Add A5: Square Brackets

* A4: Mention Mustache as prior art

* Add pros and cons to the proposed design

* Apply suggestions from code review

Co-authored-by: Eemeli Aro <[email protected]>

---------

Co-authored-by: Eemeli Aro <[email protected]>
Co-authored-by: Addison Phillips <[email protected]>
1 parent 85e1639 commit 29e16f1


1 file changed, +170 -3 lines changed


exploration/open-close-expressions.md

Lines changed: 170 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -149,6 +149,8 @@ without access to the other parts of the selected pattern.
149149
This design relies on the recognition that the formatted output of MF2
150150
may be further processed by other tools before presentation to a user.
151151
152+
### Syntax
153+
152154
Let us add _markup_ as a new type of _placeholder_,
153155
in parallel with _expression_:
154156
@@ -170,11 +172,32 @@ Unlike annotations, markup expressions may not have operands.
170172

171173
Markup is not valid in _declarations_ or _selectors_.
172174

175+
#### Pros
176+
177+
* Doesn't conflict with any other placeholder expressions.
178+
179+
* Agnostic syntax, different from HTML or other markup and templating systems.
180+
181+
#### Cons
182+
183+
* Adds 3 new sigils to the expression syntax.
184+
185+
* Because they're agnostic, the meaning of the sigils must be learned or deduced.
186+
187+
* Requires the special-casing of negative numeral literals,
188+
to distinguish `{-foo}` and `{-42}`.
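To make the last cons item concrete, here is a minimal TypeScript sketch of the special-casing a parser would need. It assumes, as the `{-foo}` example implies, that a leading `-` introduces a closing element, and it approximates number literals with a regular expression rather than the spec's exact grammar. `classifyDash` and `DashPlaceholder` are hypothetical names used only for illustration.

```typescript
// Hypothetical result of classifying a placeholder body that starts with "-".
type DashPlaceholder =
  | { kind: "number"; value: number }
  | { kind: "markup-close"; name: string };

// Approximation of a number literal; assumed here, not copied from the spec's ABNF.
const NUMBER_LITERAL = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?$/;

function classifyDash(body: string): DashPlaceholder {
  if (!body.startsWith("-")) {
    throw new Error(`expected a leading "-": ${body}`);
  }
  // The numeric check must run before the markup interpretation;
  // looking only at the leading "-" would misread "-42" as a closing element.
  if (NUMBER_LITERAL.test(body)) {
    return { kind: "number", value: Number(body) };
  }
  const name = body.slice(1);
  if (name === "") {
    throw new Error('missing markup name after "-"');
  }
  return { kind: "markup-close", name };
}

console.log(classifyDash("-42"));  // { kind: 'number', value: -42 }
console.log(classifyDash("-foo")); // { kind: 'markup-close', name: 'foo' }
```

Because the numeric check runs first, `{-42}` stays a number literal in this sketch; any body that is not a valid number falls through to the markup interpretation.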
189+
190+
### Runtime Behavior
191+
192+
#### Formatting to a String
193+
173194
When formatting to a string,
174195
markup placeholders format to an empty string by default.
175196
An implementation may customize this behaviour,
176197
e.g. emitting XML-ish tags for each open/close placeholder.
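A minimal TypeScript sketch of this default and of one possible customization. The part shapes, `partsToString`, `dropMarkup`, and `xmlishMarkup` are hypothetical; the actual part objects would be defined by the formatting-to-parts proposal.

```typescript
// Hypothetical part shapes, assumed for this sketch only.
type MessagePart =
  | { type: "text"; value: string }
  | { type: "markup"; kind: "open" | "close"; name: string };

type MarkupPart = Extract<MessagePart, { type: "markup" }>;
type MarkupFormatter = (part: MarkupPart) => string;

// Default behaviour: markup placeholders format to an empty string.
const dropMarkup: MarkupFormatter = () => "";

// One possible customization: emit XML-ish tags for each open/close placeholder.
const xmlishMarkup: MarkupFormatter = (part) =>
  part.kind === "open" ? `<${part.name}>` : `</${part.name}>`;

function partsToString(parts: MessagePart[], formatMarkup: MarkupFormatter = dropMarkup): string {
  return parts
    .map((part) => (part.type === "text" ? part.value : formatMarkup(part)))
    .join("");
}

const parts: MessagePart[] = [
  { type: "text", value: "This is " },
  { type: "markup", kind: "open", name: "b" },
  { type: "text", value: "bold" },
  { type: "markup", kind: "close", name: "b" },
  { type: "text", value: "." },
];

console.log(partsToString(parts));               // This is bold.
console.log(partsToString(parts, xmlishMarkup)); // This is <b>bold</b>.
```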
177198

199+
#### Formatting to Parts
200+
178201
When formatting to parts (as proposed in <a href="https://github.com/unicode-org/message-format-wg/pull/463">#463</a>),
179202
markup placeholders format to an object including the following properties:
180203

@@ -222,14 +245,89 @@ _What other solutions are available?_
222245
_How do they compare against the requirements?_
223246
_What other properties they have?_
224247

225-
### HTML-like syntax
248+
### A1. Do Nothing
249+
250+
We could choose to not provide any special support for spannables or markup.
251+
This would delegate the problem to tools and downstream processing layers.
252+
253+
```
254+
This is <strong>bold</strong> and this is <img alt="an image" src="{$imgsrc}">.
255+
```
256+
257+
#### Pros:
258+
259+
* No work required from us right now. We can always add support later, provided we reserve adequate placeholder syntax.
260+
* We already allow (and are required to allow) in-line literal markup and other templating syntax in messages, since they are just character sequences.
261+
* Unlike other solutions, does not require MessageFormat to reinterpret or process markup to create the desired output.
262+
* It's HTML.
263+
* The least surprising syntax for developers and translators.
264+
* Some CAT tools already support HTML and other markup in translations.
265+
266+
#### Cons:
267+
268+
* Requires quoting in XML-based containers.
269+
* Relies on a best-effort convention; is not a standard.
270+
* Markup becomes a completely alien concept in MessageFormat:
271+
* It cannot be validated via the AST or the registry.
272+
* It cannot be protected, unless put inside literal expressions.
273+
* It is not supported by `formatToParts`, which in turn makes double-parsing difficult.
274+
* It requires special handling when inserting messages into the DOM.
275+
* It requires "sniffing" the message to detect embedded markup. XSS prevention becomes much more complicated.
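The XSS concern can be illustrated with a deliberately naive TypeScript sketch. `formatNaive` is a hypothetical stand-in for a formatter that only substitutes `{$name}` placeholders and passes everything else, including HTML tags, through as text; `escapeHtml` covers only HTML text and quoted attribute values. It shows that, once markup is plain text, argument values must be escaped for the target markup language while the message's own tags must not be.

```typescript
// Hypothetical stand-in for a formatter: substitutes {$name} placeholders only,
// treating everything else in the pattern (including HTML tags) as plain text.
function formatNaive(
  pattern: string,
  args: Record<string, string>,
  escapeArg: (value: string) => string,
): string {
  return pattern.replace(/\{\$([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = args[name];
    if (value === undefined) throw new Error(`missing argument: ${name}`);
    return escapeArg(value);
  });
}

// Minimal HTML escaping for text content and quoted attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const pattern = 'This is <strong>bold</strong> and this is <img alt="an image" src="{$imgsrc}">.';
const hostile = { imgsrc: '"><script>alert(1)</script>' };

// Without escaping, the argument breaks out of the attribute and injects markup.
const unsafe = formatNaive(pattern, hostile, (s) => s);
// Escaping only the argument keeps the message's own markup intact.
const safe = formatNaive(pattern, hostile, escapeHtml);

console.log(unsafe.includes("<script>")); // true
console.log(safe.includes("<script>"));   // false
console.log(safe.includes("<strong>"));   // true
```

Escaping the whole formatted string instead would also escape `<strong>`. In this sketch, a formatter that knows nothing about markup has no basis for choosing an escaping scheme, so that responsibility falls on whoever configures formatting for a given output.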
276+
277+
### A2. XML Syntax
278+
279+
> `<foo>`, `</foo>`, or `<foo/>`
280+
281+
We could parse the HTML syntax as part of MessageFormat parsing,
282+
and represent markup as first-class data-model concepts of MessageFormat.
283+
284+
```
285+
This is <strong>bold</strong> and this is <img alt="an image">.
286+
```
287+
288+
To represent HTML's auto-closing tags, like `<img>`,
289+
we could follow HTML's syntax to the letter, similar to the snippet above,
290+
and use the *span-open* syntax for them.
291+
This would be consistent with HTML, but would require:
292+
293+
* Either the parser to hardcode which elements are standalone;
294+
this approach wouldn't scale well beyond the current set of HTML elements.
295+
296+
* Or, the validation and processing that leverage the open/close and standalone concepts
297+
to be possible only when the registry is available.
298+
299+
Alternatively, we could diverge from proper HTML,
300+
and use the stricter XML syntax: `<img/>`.
301+
302+
```
303+
This is <html:strong>bold</html:strong> and this is <html:img alt="an image" />.
304+
```
305+
306+
The same approach would be used for self-closing elements defined by other dialects of XML.
307+
308+
#### Pros:
309+
310+
* Looks like HTML.
311+
* The least surprising syntax for developers and translators.
312+
313+
#### Cons:
314+
315+
* Looks like HTML, but isn't *exactly* HTML, unless we go to great lengths to make it so.
316+
See the differences between HTML and React's JSX for a case study of the consequences.
317+
* Requires quoting in XML-based containers.
318+
* It only supports HTML.
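The trade-off between following HTML to the letter and requiring the XML form `<img/>` can be sketched in TypeScript. `scanTags`, `HTML_VOID_ELEMENTS`, and the tag regular expression are hypothetical and simplified (attributes are skipped rather than parsed); the void-element list is a deliberately incomplete subset.

```typescript
type MarkupKind = "open" | "close" | "standalone";
interface MarkupToken {
  kind: MarkupKind;
  name: string;
}

// Hypothetical and incomplete: HTML has more void elements than these four,
// and every other XML dialect would need a list of its own.
const HTML_VOID_ELEMENTS = new Set(["br", "hr", "img", "input"]);

// Matches <name ...>, </name>, and <name .../>; attribute text is skipped, not parsed,
// and a ">" inside an attribute value is not handled.
const TAG_SOURCE = "<(/?)([A-Za-z][\\w:-]*)([^>]*?)(/?)>";

function scanTags(message: string, followHtmlToTheLetter: boolean): MarkupToken[] {
  const re = new RegExp(TAG_SOURCE, "g");
  const tokens: MarkupToken[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) {
    const isClose = m[1] === "/";
    const name = m[2];
    const isSelfClosing = m[4] === "/";
    let kind: MarkupKind;
    if (isClose) {
      kind = "close";
    } else if (isSelfClosing) {
      kind = "standalone";
    } else if (followHtmlToTheLetter && HTML_VOID_ELEMENTS.has(name.toLowerCase())) {
      // <img> is standalone only because the parser has the list hardcoded.
      kind = "standalone";
    } else {
      kind = "open";
    }
    tokens.push({ kind, name });
  }
  return tokens;
}

const summarize = (tokens: MarkupToken[]) => tokens.map((t) => `${t.kind}:${t.name}`).join(" ");

const htmlMessage = 'This is <strong>bold</strong> and this is <img alt="an image">.';
const xmlMessage = 'This is <html:strong>bold</html:strong> and this is <html:img alt="an image" />.';

console.log(summarize(scanTags(htmlMessage, true)));  // open:strong close:strong standalone:img
console.log(summarize(scanTags(htmlMessage, false))); // open:strong close:strong open:img
console.log(summarize(scanTags(xmlMessage, false)));  // open:html:strong close:html:strong standalone:html:img
```

With the hardcoded list, `<img>` becomes standalone; without it, the same tag is read as an opening element that is never closed. The XML form needs no list because the trailing `/` carries the information.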
319+
320+
### A3. XML-like syntax
321+
322+
> `{foo}`, `{/foo}`, `{foo/}`
226323
227324
The goal of this solution is to avoid adding new sigils to the syntax.
228325
Instead, it leverages the familiarity of the `foo`...`/foo` idiom,
229326
inspired by HTML and BBCode.
230327

231328
This solution consists of adding new placeholder syntax:
232329
`{foo}`, `{/foo}`, and `{foo/}`.
330+
The data model and the runtime considerations are the same as in the proposed solution.
233331

234332
```
235333
This is {html:strong}bold{/html:strong} and this is {html:img alt=|an image|/}.
@@ -239,7 +337,7 @@ Markup names are *effectively namespaced* due to their not using any sigils;
239337
they are distinct from `$variables`, `:functions`, and `|literals|`.
240338

241339
> [!NOTE]
242-
> This requires dropping unquoted literals as operands,
340+
> This requires dropping unquoted non-numeric literals as operands,
243341
> so that `{foo}` is not parsed as `{|foo|}`.
244342
> See [#518](https://github.com/unicode-org/message-format-wg/issues/518).
245343
@@ -251,7 +349,7 @@ The exact meaning of the new placeholder types is as follows:
251349

252350
#### Pros
253351

254-
* Doesn't add new sigils except for `/`,
352+
* Only adds `/` as a new sigil,
255353
which is universally known thanks to the widespread use of HTML.
256354

257355
* Using syntax inspired by HTML makes it familiar to most translators.
@@ -269,3 +367,72 @@ The exact meaning of the new placeholder types is as follows:
269367
* Requires changes to the existing MF2 syntax: dropping unquoted non-numeric literals as expression operands.
270368

271369
* Regular placeholders, e.g. `{$var}`, use the same `{...}` syntax, and may be confused for *open* elements.
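A minimal TypeScript sketch of how a parser might classify the body of a `{...}` placeholder under this alternative, assuming unquoted non-numeric literals have been dropped as described in the note above. Only the first token is inspected (options such as `alt=|an image|` are ignored), number literals are approximated with a regular expression, and `classifyA3` is a hypothetical name.

```typescript
type Placeholder =
  | { kind: "variable"; name: string }
  | { kind: "function"; name: string }
  | { kind: "literal"; value: string }
  | { kind: "number"; value: number }
  | { kind: "markup"; markup: "open" | "close" | "standalone"; name: string };

// Approximation of a number literal; assumed here, not copied from the spec.
const NUMBER_LITERAL = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?$/;

// Classifies the text between "{" and "}" by its first token only.
function classifyA3(body: string): Placeholder {
  const trimmed = body.trim();
  if (trimmed.startsWith("|")) {
    const end = trimmed.indexOf("|", 1);
    if (end < 0) throw new Error(`unterminated literal: {${body}}`);
    return { kind: "literal", value: trimmed.slice(1, end) };
  }
  const standalone = trimmed.endsWith("/");
  const withoutSlash = standalone ? trimmed.slice(0, -1) : trimmed;
  const head = withoutSlash.trim().split(/\s+/)[0];
  if (head === "") throw new Error(`empty placeholder: {${body}}`);
  if (head.startsWith("$")) return { kind: "variable", name: head.slice(1) };
  if (head.startsWith(":")) return { kind: "function", name: head.slice(1) };
  if (NUMBER_LITERAL.test(head)) return { kind: "number", value: Number(head) };
  if (head.startsWith("/")) return { kind: "markup", markup: "close", name: head.slice(1) };
  // With unquoted non-numeric literals dropped, a bare name can only be markup.
  return { kind: "markup", markup: standalone ? "standalone" : "open", name: head };
}

for (const body of ["$var", "42", "|foo|", "foo", "/foo", "foo/", "html:img alt=|an image|/"]) {
  console.log(`{${body}}`, classifyA3(body));
}
```

The final fallback is the point of the note: a bare name such as `foo` can no longer be an unquoted literal, so it is read as an opening element, while `42` is still a number.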
370+
371+
### A4. Hash & Slash
372+
373+
> `{#foo}`, `{/foo}`, `{#foo/}`
374+
375+
This solution is similar to A3 in that
376+
it also proposes to use the forward slash `/` for the closing element syntax.
377+
However, opening elements are decorated with a pound sign `#`,
378+
resulting in `{#foo}` and `{/foo}`.
379+
380+
This is similar to [Mustache](http://mustache.github.io/mustache.5.html)'s control flow syntax.
381+
382+
Standalone elements combine the sigil in front and HTML's forward slash `/` at the end of the placeholder: `{#foo/}`.
383+
384+
The data model and the runtime considerations are the same as in the proposed solution.
385+
386+
```
387+
This is {#html:strong}bold{/html:strong} and this is {#html:img alt=|an image|/}.
388+
```
389+
390+
Markup names are *namespaced* by their use of the pound sign `#` and the forward slash `/` sigils.
391+
They are distinct from `$variables`, `:functions`, and `|literals|`.
392+
393+
#### Pros
394+
395+
* Leverages the familiarity of the forward slash `/` used for closing spans.
396+
397+
* Doesn't conflict with any other placeholder expressions.
398+
399+
* Prior art exists: Mustache.
400+
401+
#### Cons
402+
403+
* Introduces two new sigils, the pound sign `#` and the forward slash `/`.
404+
405+
* The standalone syntax is a bit clunky (but logical): `{#foo/}`.
406+
407+
* In Mustache, the `{{#foo}}`...`{{/foo}}` syntax is used for *control flow* statements rather than printable data.
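Under this alternative the first character alone tells markup apart from expressions, which a short TypeScript sketch can show. `classifyA4` and `markupName` are hypothetical names; only the first token after the sigil is read, and options are ignored.

```typescript
type A4Markup = { markup: "open" | "close" | "standalone"; name: string };

// Returns the first whitespace-separated token, which is taken as the markup name.
function markupName(rest: string, source: string): string {
  const name = rest.trim().split(/\s+/)[0];
  if (name === "") throw new Error(`missing markup name: {${source}}`);
  return name;
}

// Returns null for bodies that stay ordinary expressions ({$var}, {42}, {|lit|}, {-42}, ...).
function classifyA4(body: string): A4Markup | null {
  const trimmed = body.trim();
  if (trimmed.startsWith("#")) {
    const standalone = trimmed.endsWith("/");
    const rest = standalone ? trimmed.slice(1, -1) : trimmed.slice(1);
    return { markup: standalone ? "standalone" : "open", name: markupName(rest, body) };
  }
  if (trimmed.startsWith("/")) {
    return { markup: "close", name: markupName(trimmed.slice(1), body) };
  }
  return null;
}

for (const body of ["#html:strong", "/html:strong", "#html:img alt=|an image|/", "$var", "-42"]) {
  console.log(`{${body}}`, classifyA4(body));
}
```

Unlike the proposed design and A3, neither `{-42}` nor a bare `{foo}` needs special handling in this sketch: both remain ordinary expressions.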
408+
409+
### A5. Square Brackets
410+
411+
> `[foo]`, `[/foo]`, `[foo/]`
412+
413+
```
414+
This is [html:strong]bold[/html:strong] and this is [html:img alt=|an image|/].
415+
```
416+
417+
#### Pros
418+
419+
* Concise and less noisy than the alternatives.
420+
421+
* Doesn't add new sigils except for the forward slash `/`,
422+
which is universally known thanks to the widespread use of HTML.
423+
424+
* Leverages the familiarity of the forward slash `/` used for closing spans.
425+
426+
* Makes it clear that `{42}` and `[foo]` are different concepts:
427+
one is a standalone placeholder and the other is an open-span element.
428+
429+
* Makes it clear that markup and spans are not expressions,
430+
and thus cannot be used in declarations or selectors.
431+
432+
* Established prior art: the [BBCode](https://en.wikipedia.org/wiki/BBCode) syntax.
433+
Despite being a niche language, BBCode can be argued to be many people's first introduction to markup-like syntax.
434+
435+
#### Cons
436+
437+
* Requires making `[` (and possibly `]`) special in text.
438+
Arguably, however, markup is more common in translations than the literal `[ ... ]`.
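The cost of making `[` special in text can be sketched with a small TypeScript tokenizer. The backslash escapes `\[` and `\]` are an assumption made for this example (the alternative does not specify an escaping mechanism), `{...}` placeholders are left inside text tokens, and a `]` inside a quoted option value is not handled. `tokenizeA5` and `parseMarkup` are hypothetical names.

```typescript
type Token =
  | { type: "text"; value: string }
  | { type: "markup"; markup: "open" | "close" | "standalone"; name: string };

// Parses the text between "[" and "]"; only the first token (the name) is kept.
function parseMarkup(body: string): Token {
  const closing = body.startsWith("/");
  const standalone = !closing && body.endsWith("/");
  const inner = closing ? body.slice(1) : standalone ? body.slice(0, -1) : body;
  const name = inner.trim().split(/\s+/)[0];
  if (name === "") throw new Error(`missing markup name: [${body}]`);
  return { type: "markup", markup: closing ? "close" : standalone ? "standalone" : "open", name };
}

// Splits pattern text into text and [markup] tokens, honoring the assumed \[ and \] escapes.
function tokenizeA5(pattern: string): Token[] {
  const tokens: Token[] = [];
  let text = "";
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === "\\" && (pattern[i + 1] === "[" || pattern[i + 1] === "]")) {
      text += pattern[i + 1]; // escaped bracket becomes literal text
      i += 2;
    } else if (ch === "[") {
      const end = pattern.indexOf("]", i + 1);
      if (end < 0) throw new Error(`unterminated markup at offset ${i}`);
      if (text !== "") tokens.push({ type: "text", value: text });
      text = "";
      tokens.push(parseMarkup(pattern.slice(i + 1, end)));
      i = end + 1;
    } else {
      text += ch;
      i += 1;
    }
  }
  if (text !== "") tokens.push({ type: "text", value: text });
  return tokens;
}

console.log(tokenizeA5("This is [html:strong]bold[/html:strong] and this is [html:img alt=|an image|/]."));
console.log(tokenizeA5("Press \\[Enter\\] to continue."));
```

Every literal bracket in a translation would need an escape like this, which is the trade-off the cons item weighs against how often markup appears.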
