exploration/open-close-expressions.md
+170 -3 (170 additions & 3 deletions)
@@ -149,6 +149,8 @@ without access to the other parts of the selected pattern.
149
149
This design relies on the recognition that the formatted output of MF2
150
150
may be further processed by other tools before presentation to a user.
151
151
152
+
### Syntax
153
+
152
154
Let us add _markup_ as a new type of _placeholder_,
153
155
in parallel with _expression_:
154
156
@@ -170,11 +172,32 @@ Unlike annotations, markup expressions may not have operands.
170
172
171
173
Markup is not valid in _declarations_ or _selectors_.
172
174
175
+
#### Pros
176
+
177
+
* Doesn't conflict with any other placeholder expressions.
178
+
179
+
* Agnostic syntax, different from HTML or other markup and templating systems.
180
+
181
+
#### Cons
182
+
183
+
* Adds 3 new sigils to the expression syntax.
184
+
185
+
* Because they're agnostic, the meaning of the sigils must be learned or deduced.
186
+
187
+
* Requires the special-casing of negative numeral literals,
188
+
to distinguish `{-foo}` and `{-42}`.
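The special-casing mentioned in the last item can be illustrated with a small classifier. This is only a sketch in Python, assuming (as the `{-foo}` example suggests) that a leading `-` is one of the markup sigils. The function name and the number pattern are hypothetical and are not taken from the MF2 grammar.

```python
import re

# Hypothetical pattern for a negative number literal; the real MF2 grammar may differ.
NEGATIVE_NUMBER = re.compile(r"-\d+(\.\d+)?\Z")


def classify_dash_placeholder(body: str) -> str:
    """Classify the body of a placeholder that starts with `-`.

    `body` is the text between the braces, e.g. "-foo" for `{-foo}`.
    """
    if not body.startswith("-"):
        raise ValueError("expected a body starting with '-'")
    # A dash followed only by digits (and an optional fraction) is a number.
    if NEGATIVE_NUMBER.match(body):
        return "number-literal"
    # Anything else that starts with a dash is treated as markup.
    return "markup"


print(classify_dash_placeholder("-42"))   # number-literal
print(classify_dash_placeholder("-foo"))  # markup
```

The parser cannot decide on the sigil alone; it has to look at what follows the `-`, which is the extra cost this con describes.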
189
+
190
+
### Runtime Behavior
191
+
192
+
#### Formatting to a String
193
+
173
194
When formatting to a string,
174
195
markup placeholders format to an empty string by default.
175
196
An implementation may customize this behaviour,
176
197
e.g. emitting XML-ish tags for each open/close placeholder.
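A minimal sketch of this behaviour in Python follows. The in-memory shape of a message (strings for text, `(kind, name)` tuples for markup) and every function name are assumptions made for illustration; they are not the data model or API of any MessageFormat implementation.

```python
from typing import Callable, Iterable, Tuple, Union

# Hypothetical shape: plain text is a str, markup is a (kind, name) tuple
# where kind is "open" or "close".
Markup = Tuple[str, str]
Piece = Union[str, Markup]


def default_markup(kind: str, name: str) -> str:
    """Default behaviour: markup placeholders contribute an empty string."""
    return ""


def xmlish_markup(kind: str, name: str) -> str:
    """One possible customization: emit XML-ish tags for open/close."""
    return f"<{name}>" if kind == "open" else f"</{name}>"


def format_to_string(
    pieces: Iterable[Piece],
    render_markup: Callable[[str, str], str] = default_markup,
) -> str:
    out = []
    for piece in pieces:
        if isinstance(piece, str):
            out.append(piece)  # literal text is copied as-is
        else:
            kind, name = piece
            out.append(render_markup(kind, name))
    return "".join(out)


message = ["This is ", ("open", "b"), "bold", ("close", "b"), "."]
print(format_to_string(message))                 # This is bold.
print(format_to_string(message, xmlish_markup))  # This is <b>bold</b>.
```

Passing the renderer as a parameter keeps the default (empty output) separate from any implementation-specific customization.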
177
198
199
+
#### Formatting to Parts
200
+
178
201
When formatting to parts (as proposed in <a href="https://github.com/unicode-org/message-format-wg/pull/463">#463</a>),
179
202
markup placeholders format to an object including the following properties:
180
203
@@ -222,14 +245,89 @@ _What other solutions are available?_
222
245
_How do they compare against the requirements?_
223
246
_What other properties they have?_
224
247
225
-
### HTML-like syntax
248
+
### A1. Do Nothing
249
+
250
+
We could choose to not provide any special support for spannables or markup.
251
+
This would delegate the problem to tools and downstream processing layers.
252
+
253
+
```
254
+
This is <strong>bold</strong> and this is <img alt="an image" src="{$imgsrc}">.
255
+
```
256
+
257
+
#### Pros:
258
+
259
+
* No work required from us right now. We can always add support later, provided we reserve adequate placeholder syntax.
260
+
* We already allow (and are required to allow) in-line literal markup and other templating syntax in messages, since they are just character sequences.
261
+
* Unlike other solutions, does not require MessageFormat to reinterpret or process markup to create the desired output.
262
+
* It's HTML.
263
+
* The least surprising syntax for developers and translators.
264
+
* Some CAT tools already support HTML and other markup in translations.
265
+
266
+
#### Cons:
267
+
268
+
* Requires quoting in XML-based containers.
269
+
* Relies on a best-effort convention; is not a standard.
270
+
* Markup becomes a completely alien concept in MessageFormat:
271
+
* It cannot be validated via the AST or the registry.
272
+
* It cannot be protected, unless put inside literal expressions.
273
+
* It is not supported by `formatToParts`, which in turn makes double-parsing difficult.
274
+
* It requires special handling when inserting messages into the DOM.
275
+
* It requires "sniffing" the message to detect embedded markup. XSS prevention becomes much more complicated.
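To make the XSS concern concrete: under A1 the message is plain text that happens to contain HTML, so the caller has to escape variable values itself. The sketch below uses Python's standard `html.escape`; the naive `{$name}` substitution and the function name are stand-ins for illustration, not a MessageFormat implementation.

```python
import html
import re

# Naive stand-in for variable placeholders such as {$imgsrc}.
VARIABLE = re.compile(r"\{\$(\w+)\}")


def format_html_message(pattern: str, variables: dict) -> str:
    """Substitute `{$name}` placeholders, escaping each value for HTML.

    Literal markup in `pattern` passes through untouched: the formatter
    cannot tell trusted markup from ordinary text.
    """
    def replace(match):
        value = str(variables[match.group(1)])
        return html.escape(value, quote=True)

    return VARIABLE.sub(replace, pattern)


pattern = 'This is <strong>bold</strong> and this is <img alt="an image" src="{$imgsrc}">.'
hostile = '"><script>alert(1)</script>'
print(format_html_message(pattern, {"imgsrc": hostile}))
```

Escaping the values is the easy part. Whether the literal markup in the pattern itself can be trusted is something the formatter has no way to check, which is why the cons above call for "sniffing".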
276
+
277
+
### A2. XML Syntax
278
+
279
+
> `<foo>`, `</foo>`, or `<foo/>`
280
+
281
+
We could parse the HTML syntax as part of MessageFormat parsing,
282
+
and represent markup as first-class data-model concepts of MessageFormat.
283
+
284
+
```
285
+
This is <strong>bold</strong> and this is <img alt="an image">.
286
+
```
287
+
288
+
To represent HTML's void elements (tags that never take a closing tag), like `<img>`,
289
+
we could follow HTML's syntax to the letter, similar to the snippet above,
290
+
and use the *span-open* syntax for them.
291
+
This would be consistent with HTML, but would require:
292
+
293
+
* Either the parser to hardcode which elements are standalone;
294
+
this approach wouldn't scale well beyond the current set of HTML elements.
295
+
296
+
* Or, the validation and processing which leverages the open/close and standalone concepts
297
+
to be possible only when the registry is available.
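The first option, a parser that hardcodes the standalone elements, might look like the following Python sketch. The set lists the void elements defined by the HTML Living Standard; the regular expression and the function name are hypothetical and deliberately simplistic (for example, attribute values containing `>` are not handled).

```python
import re

# HTML void elements, per the HTML Living Standard.
HTML_VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

# Groups: optional leading "/", tag name, optional trailing "/".
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)[^<>]*?(/?)>")


def classify_tag(tag: str) -> str:
    """Return "open", "close" or "standalone" for a single tag."""
    match = TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a tag: {tag!r}")
    closing, name, self_closing = match.groups()
    if closing:
        return "close"
    # Either explicit XML-style `<foo/>` or a hardcoded HTML void element.
    if self_closing or name.lower() in HTML_VOID_ELEMENTS:
        return "standalone"
    return "open"


print(classify_tag('<img alt="an image">'))  # standalone
print(classify_tag("<strong>"))              # open
print(classify_tag("</strong>"))             # close
```

Any element missing from the hardcoded set is classified as *open*, which illustrates why this approach would not scale beyond the current set of HTML elements.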
298
+
299
+
Alternatively, we could diverge from proper HTML,
300
+
and use the stricter XML syntax: `<img/>`.
301
+
302
+
```
303
+
This is <html:strong>bold</html:strong> and this is <html:img alt="an image" />.
304
+
```
305
+
306
+
The same approach would be used for self-closing elements defined by other dialects of XML.
307
+
308
+
#### Pros:
309
+
310
+
* Looks like HTML.
311
+
* The least surprising syntax for developers and translators.
312
+
313
+
#### Cons:
314
+
315
+
* Looks like HTML, but isn't *exactly* HTML, unless we go to great lengths to make it so.
316
+
See the differences between HTML and React's JSX as a case study of the consequences.
317
+
* Requires quoting in XML-based containers.
318
+
* It only supports HTML.
319
+
320
+
### A3. XML-like syntax
321
+
322
+
> `{foo}`, `{/foo}`, `{foo/}`
226
323
227
324
The goal of this solution is to avoid adding new sigils to the syntax.
228
325
Instead, it leverages the familiarity of the `foo`...`/foo` idiom,
229
326
inspired by HTML and BBCode.
230
327
231
328
This solution consists of adding new placeholder syntax:
232
329
`{foo}`, `{/foo}`, and `{foo/}`.
330
+
The data model and the runtime considerations are the same as in the proposed solution.
233
331
234
332
```
235
333
This is {html:strong}bold{/html:strong} and this is {html:img alt=|an image|/}.
@@ -239,7 +337,7 @@ Markup names are *effectively namespaced* due to their not using any sigils;
239
337
they are distinct from `$variables`, `:functions`, and `|literals|`.
240
338
241
339
> [!NOTE]
242
-
> This requires dropping unquoted literals as operands,
340
+
> This requires dropping unquoted non-numeric literals as operands,
243
341
> so that `{foo}` is not parsed as `{|foo|}`.
244
342
> See [#518](https://github.com/unicode-org/message-format-wg/issues/518).
245
343
@@ -251,7 +349,7 @@ The exact meaning of the new placeholer types is as follows:
251
349
252
350
#### Pros
253
351
254
-
* Doesn't add new sigils except for `/`,
352
+
* Only adds `/` as a new sigil,
255
353
which is universally known thanks to the widespread use of HTML.
256
354
257
355
* Using syntax inspired by HTML makes it familiar to most translators.
@@ -269,3 +367,72 @@ The exact meaning of the new placeholer types is as follows:
269
367
* Requires changes to the existing MF2 syntax: dropping unquoted literals as expression operands.
270
368
271
369
* Regular placeholders, e.g. `{$var}`, use the same `{...}` syntax, and may be confused for *open* elements.
370
+
371
+
### A4. Hash & Slash
372
+
373
+
> `{#foo}`, `{/foo}`, `{#foo/}`
374
+
375
+
This solution is similar to A3 in that
376
+
it also proposes to use the forward slash `/` for the closing element syntax.
377
+
However, opening elements are decorated with a pound sign `#`,
378
+
resulting in `{#foo}` and `{/foo}`.
379
+
380
+
This is similar to [Mustache](http://mustache.github.io/mustache.5.html)'s control flow syntax.
381
+
382
+
Standalone elements combine the sigil in front and HTML's forward slash `/` at the end of the placeholder: `{#foo/}`.
383
+
384
+
The data model and the runtime considerations are the same as in the proposed solution.
385
+
386
+
```
387
+
This is {#html:strong}bold{/html:strong} and this is {#html:img alt=|an image|/}.
388
+
```
389
+
390
+
Markup names are *namespaced* by their use of the pound sign `#` and the forward slash `/` sigils.
391
+
They are distinct from `$variables`, `:functions`, and `|literals|`.
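As a rough illustration of how the two sigils separate markup from other placeholders, here is a Python sketch that tokenizes the example message above. The regular expression, the token shape, and the function name are assumptions for illustration only; options are skipped rather than parsed, and quoted literals containing `}` are not handled.

```python
import re

# Matches {#name ...}, {/name} and {#name .../}.
MARKUP = re.compile(
    r"\{(?P<sigil>[#/])(?P<name>[\w:-]+)(?P<options>[^{}]*?)(?P<slash>/?)\}"
)


def tokenize(message: str):
    """Split a message into ("text", str) and (kind, name) tokens."""
    tokens = []
    pos = 0
    for match in MARKUP.finditer(message):
        if match.start() > pos:
            tokens.append(("text", message[pos:match.start()]))
        if match["sigil"] == "/":
            kind = "close"
        elif match["slash"]:
            kind = "standalone"  # `#` in front and `/` at the end
        else:
            kind = "open"
        tokens.append((kind, match["name"]))
        pos = match.end()
    if pos < len(message):
        tokens.append(("text", message[pos:]))
    return tokens


example = "This is {#html:strong}bold{/html:strong} and this is {#html:img alt=|an image|/}."
for token in tokenize(example):
    print(token)
```

Because every markup placeholder starts with `#` or `/`, the first character after `{` is enough to tell markup apart from `$variables`, `:functions`, and `|literals|`.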
392
+
393
+
#### Pros
394
+
395
+
* Leverages the familiarity of the forward slash `/` used for closing spans.
396
+
397
+
* Doesn't conflict with any other placeholder expressions.
398
+
399
+
* Prior art exists: Mustache.
400
+
401
+
#### Cons
402
+
403
+
* Introduces two new sigils, the pound sign `#` and the forward slash `/`.
404
+
405
+
* The standalone syntax is a bit clunky (but logical): `{#foo/}`.
406
+
407
+
* In Mustache, the `{{#foo}}`...`{{/foo}}` syntax is used for *control flow* statements rather than printable data.
408
+
409
+
### A5. Square Brackets
410
+
411
+
> `[foo]`, `[/foo]`, `[foo/]`
412
+
413
+
```
414
+
This is [html:strong]bold[/html:strong] and this is [html:img alt=|an image|/].
415
+
```
416
+
417
+
#### Pros
418
+
419
+
* Concise and less noisy than the alternatives.
420
+
421
+
* Doesn't add new sigils except for the forward slash `/`,
422
+
which is universally known thanks to the widespread use of HTML.
423
+
424
+
* Leverages the familiarity of the forward slash `/` used for closing spans.
425
+
426
+
* Makes it clear that `{42}` and `[foo]` are different concepts:
427
+
one is a standalone placeholder and the other is an open-span element.
428
+
429
+
* Makes it clear that markup and spans are not expressions,
430
+
and thus cannot be used in declarations or selectors.
431
+
432
+
* Established prior art: the [BBCode](https://en.wikipedia.org/wiki/BBCode) syntax.
433
+
Despite being a niche language, BBCode can be argued to be many people's first introduction to markup-like syntax.
434
+
435
+
#### Cons
436
+
437
+
* Requires making `[` (and possibly `]`) special in text.
438
+
Arguably, however, markup is more common in translations than the literal `[ ... ]`.