
[Discussion] {{Spannables}} #537


Closed
aphillips opened this issue Nov 28, 2023 · 16 comments · Fixed by #541
Labels
design Design document or issues related to design LDML45 LDML45 Release (Tech Preview) resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. specification Issue affects the specification syntax Issues related with syntax or ABNF

Comments

@aphillips
Member

aphillips commented Nov 28, 2023

Per the 2023-11-27 teleconference, this issue is for discussing the design of spannables (also known as open/close/standalone markup).

The design document lives here and should be used as a reference in this discussion.

@aphillips aphillips added syntax Issues related with syntax or ABNF design Design document or issues related to design blocker-candidate The submitter thinks this might be a block for the next release Agenda+ Requested for upcoming teleconference specification Issue affects the specification labels Nov 28, 2023
@aphillips
Member Author

aphillips commented Nov 28, 2023

(chair hat on)

There is a proposal to use option A4 "Hash and Slash" as the design based on a tepid "lack of opposition" consensus in the 2023-11-20 call, so perhaps pay close attention to this option, even though the design document uses the +/-/# sigils in examples. We merged #535 which makes "Hash and Slash" the design option per 2023-11-27 call, but the options available for discussion remain the same.

The foregoing is not a finding of consensus. Our goal will be to choose an approach in the 2023-12-04 call. The more and better we discuss ahead of that, the better.

@eemeli
Collaborator

eemeli commented Nov 29, 2023

I'm ok with the currently proposed syntax.

My personal preference order for these choices is:

  1. <foo>…</foo> — In practice, this is the syntax currently used by markup in messages, and it's almost universally recognised as such. And while its origins are in XML, it's currently widely used at least as XML, HTML, and JSX, with slightly different parsing rules in each. I honestly think this would be the least surprising choice for users, and this would explicitly require those who wish to re-parse their output rather than formatting it to parts to do some heavier lifting rather than being able to default to this syntax, should we choose something else.
  2. {+foo}…{-foo} — If we need to go with curly braces, this is pretty good. It's what's in the spec right now, and the +/- pairing is pretty self-evident. Yes, it means that the rules need some fiddling to still allow for the really rare literal negative number operand, but I don't see that as a real cost to humans: -foo and -42 "read" differently, with the first one seeming to have two separate tokens - and foo, while -42 is a single number.
  3. {#foo}…{/foo} — It's fine, and works. It replaces the - problem by reusing what's control flow syntax elsewhere, but maybe that's not such a bad thing? It does mean that others who also wanted to span a pattern with some open/close indicators ended up with this syntax.
  4. {foo}…{/foo} — Unlike with all the other options, there's nothing about a bare {foo} that primes you as the reader to expect it to 1) not render in text and 2) start something. Especially if we still allow numbers so {42} would parse and render as a placeholder. To make this option work, we need to get rid of unquoted literals, or at least the non-numeric ones. And if we do that, then I'd prefer that we use some sigil at the start, and that brings us to one of the two preceding options.
  5. [foo]…[/foo] — If we're going to go beyond curly braces, let's just go with angle brackets. Outside of their use as markup, angle brackets are way less common in real-world text than square brackets (not going to dig up numbers, because it seems no-one else cares about data) and more surprising to need special treatment in syntax.

I don't have a strong stance on standalone markup, except to note that it's much less common in practice than open-close pairs, and that its use cases can be accounted for by either a purpose-built {:function} or an opening element like <foo>. With the {+foo} syntax in particular it does feel a bit clumsy as the + kinda expects the subsequent - to balance out (absolutely a feature, btw), so if we go with that I'd be more open than with the others to considering separate standalone syntax; the design doc includes {#foo} for that alternative.

As far as I can tell, the only place where using the same syntax for open & standalone adds some friction is for source message validators that do not access a registry and which do want to require open-close pairing within each single message. In all other cases, we can rely on the registry, the source message, or the implementation to tell us whether the element is open or standalone.

So for me the cost-benefit analysis of standalone markup makes it a pretty expensive addition providing rather little gain.
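To make the validator friction concrete, here is a minimal Python sketch of a source-message validator that has no registry and only checks open/close pairing. The regular expression is a simplified stand-in for the real grammar (it ignores quoted literals containing braces), and the function name `unpaired_markup` is an assumption made for this illustration:

```python
import re

# Simplified: matches {#name ...} and {/name ...}; not the real MF2 grammar.
MARKUP = re.compile(r"\{([#/])([A-Za-z][\w:.-]*)[^{}]*\}")

def unpaired_markup(message):
    """Return markup that is opened but never closed, or closed without an open."""
    stack, problems = [], []
    for sigil, name in MARKUP.findall(message):
        if sigil == "#":
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append("/" + name)
    return ["#" + name for name in stack] + problems

print(unpaired_markup("This is {#b}bold{/b}."))  # []
print(unpaired_markup("An image: {#img}"))       # ['#img']
```

Without a registry, the validator cannot tell whether `{#img}` is an intentional standalone element or an open element whose close was lost, so it has to either report both or accept both.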

@eemeli eemeli changed the title from "[Discusson] {{Spannables}}" to "[Discussion] {{Spannables}}" Nov 29, 2023
@aphillips
Member Author

@eemeli mentions:

<foo>…</foo> — In practice, this is the syntax currently used by markup in messages, and it's almost universally recognised as such.

Note that this "works" in our current syntax without any changes, since nothing prevents the literal part of a pattern from containing markup. However, the markup doesn't participate in formatting in any way. Making it participate in formatting would require recognizing the sigils < and > and adding more escapes to our syntax to account for them.

I generally agree with your other comments.

As far as I can tell, the only place where using the same syntax for open & standalone adds some friction is for source message validators that do not access a registry...

I think it would be useful to add an example, such as "If a tool preparing a message for translation adds XLIFF around placeholders, it might need to know whether the placeholder is paired or not, as this affects which tags are generated, even if the tool doesn't know the tag set being marked up."

@aphillips
Member Author

Some observations.

In the design doc, we currently have name for markup productions and I think this should be changed to identifier to ensure that namespaces are permitted:

markup-open  = "#" name ; should be identifier
markup-close = "/" name   ; should be identifier
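To illustrate what switching from name to identifier buys, here is a small Python sketch that recognizes markup-open and markup-close with an optional namespace prefix. The `NAME` pattern is an ASCII-only approximation of the real `name` production, and `parse_markup` is a hypothetical helper, not part of any implementation:

```python
import re

# ASCII-only approximation; the real `name` production allows many more characters.
NAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
# identifier = [namespace ":"] name
IDENTIFIER = rf"(?:(?P<namespace>{NAME}):)?(?P<name>{NAME})"
MARKUP = re.compile(rf"\{{(?P<sigil>[#/]){IDENTIFIER}\}}")

def parse_markup(placeholder):
    """Parse '{#ns:name}' or '{/ns:name}'; return None for anything else."""
    m = MARKUP.fullmatch(placeholder)
    if m is None:
        return None
    kind = "open" if m["sigil"] == "#" else "close"
    return {"kind": kind, "namespace": m["namespace"], "name": m["name"]}

print(parse_markup("{#html:strong}"))
print(parse_markup("{/b}"))
```

With a bare name in place of identifier, `{#html:strong}` would not parse in this sketch, because `:` is not a name character here.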

The primary "disagreement" we have is about the fate of standalone. Hash-and-slash allows open (or close) placeholders to appear unpaired and @eemeli proposes that we just let # be standalone. Assuming that we buy into the cases for separating standalone syntax, the cost of adding standalone is simply one more sigil. Could we agree to choose one more sigil for "standalone" and make it identical to markup-open save for the "standalone" connotation in the data model?

@stasm
Collaborator

stasm commented Dec 6, 2023

The proposed “hash and slash” solution is acceptable to me precisely due to making room for standalone syntax which doesn’t need a third sigil. So it’s not “just one more sigil” for me; it’s still three.

#542 proposed three solutions to how we can support standalone markup without adding another sigil.

@aphillips
Member Author

(thinking out loud)

Looking at the use cases in the spannables design this morning with an eye towards the discussion about requirements for the selected design, I see a class of cases where what translators want is:

  • code-like elements to be protected during the translation process--visible and moveable, but not something the translator has to retype
    • when the items are paired and ordered, they should stay in the correct order and enforce open/close
    • when there is something inside the element that needs translation, it should be exposed

That is, translators want tools to produce XLIFF's placeholders for them. We could code that in our syntax, I suppose:

This has {#bpt}<strong>{/bpt}bold{#ept}</strong>{/ept} needs and
       {#ph}<img alt="{#sub}Translate me!{/sub}" href=$url>{/ph}.

This has the benefit that it allows unpaired open or close code while allowing validation that the translation tooling markup is paired and syntactically correct. Formatting to parts can produce single-pass non-reparsed results.

This is different from what developers want, since it is a PITA to type and difficult to look at--and adds no value to developers (except the deferred benefit of non-borken translations). CAT tools have to process messages anyway and would be better at inserting and removing (and maintaining) this protection than developers.

Some developers won't mind learning a message-specific variation on their code syntax and will want direct participation in rendering (that is, single-step format-and-process). This is mostly what we've been talking about as spannables. The above example could then look like this (using @stasm's #/ markup for standalone):

This has {#html:strong}bold{/html:strong} needs and {#html:img alt=|Translate me?| href=$url /}.

This doesn't quite satisfy what translators want, since it loses a number of checks they'd like to have (and which they get from raw XLIFF processing of HTML or other markup languages). Specifically, the open and close can get out of order without producing an error. To that end, we might want to introduce non-option expression attributes to help tooling, e.g.:

This has {#html:strong @id=s1}bold{/html:strong @id=s1} and ...
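As a rough sketch of how tooling might use such an attribute, the Python below checks that every close placeholder follows an open placeholder carrying the same name and `@id`. The `@id=` spelling follows the example above; the regular expression and the function `id_order_errors` are assumptions for illustration only:

```python
import re

# Simplified: {#name @id=x} or {/name @id=x}; other options are not handled.
MARKUP = re.compile(r"\{([#/])([\w:.-]+)(?:\s+@id=([\w-]+))?\s*\}")

def id_order_errors(message):
    """Report closes that precede their open, and opens that are never closed."""
    opened, errors = set(), []
    for sigil, name, ident in MARKUP.findall(message):
        key = (name, ident)
        if sigil == "#":
            opened.add(key)
        elif key in opened:
            opened.discard(key)
        else:
            errors.append(f"close of {name} @id={ident} before its open")
    errors += [f"open of {name} @id={ident} never closed"
               for name, ident in sorted(opened)]
    return errors

ok = "This has {#html:strong @id=s1}bold{/html:strong @id=s1} and more"
bad = "This has {/html:strong @id=s1}bold{#html:strong @id=s1} and more"
print(id_order_errors(ok))   # []
print(id_order_errors(bad))
```

A check like this would run in the translation tool rather than in the formatter, which is why the attribute does not need to affect formatting.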

@eemeli
Collaborator

eemeli commented Dec 13, 2023

I spy with my little eye another concern that we've somewhat implicitly chosen not to address in the 2.0 release: sub-flows, to use the XLIFF term.

In essence, in a message

This has {#html:strong}bold{/html:strong} needs and {#html:img alt=|Translate me?| href=$url /}.

the Translate me? part could (should?) be considered a separate translation unit rather than a literal value that can't contain a variable reference. Are we really okay with this? Or should we leave space for later reconsideration that would allow for something like a .local taking a pattern value?


As for the message in question, my expectation would be that in the real world it ends up either as

This has <b>bold</b> needs and <img alt="Translate me!" href="$url">.

or as

This has {#b}bold{/b} needs and {#img alt=|Translate me?|/}.

In the first case, the developer is formatting to a string and just presumes that HTML will be fine, and that translators will know how to deal with localizable attributes. XSS is a concern that's dealt with Elsewhere™.

In the second case, the developer is formatting to parts and separately merging in the href, and therefore needs to play according to our rules. Their localization uses tools that also need to be MF2-aware.

In neither case do I believe that strings which may include HTML will use an html: namespace.

With the latter case, the "MF2-awareness" of the tools may well be encoded in an MF2-XLIFF transformer, so the translator's view of this string could be something completely different. And for localizable attributes, it might even be able to extract the sub-flow from the parent message.

@aphillips
Member Author

the Translate me? part could (should?) be considered a separate translation unit rather than a literal value that can't contain a variable reference. Are we really okay with this? Or should we leave space for later reconsideration that would allow for something like a .local taking a pattern value?

One could solve that using a .local, but we don't provide something at the moment.

I am thinking that we should keep our eyes on the XLIFF transform. Curious what you think about using attributes here.

In neither case do I believe that strings which may include HTML will use an html: namespace.

I'm also curious why you think so? A namespace would make the type of markup visible to tooling as well as to the formatter runtime. I know you're mostly thinking about the case in which a data model or "format-to" part is handled by the formatter's caller (rather than as part of formatting), but even there I can see how users will want to plug in and differentiate different markup regimes. Having a namespace prefix tells me if {#span} is HTML or TTML or something else and provides a hook from which to dangle the implementation code.

@macchiati
Member

If we are comfortable requiring that a single namespace be used for the spannables, that could be in the 'preface' section:

In pseudocode:

.namespace=html5

or

-namespace=html5 scope=spannables

@macchiati
Member

Also, I don't think we need the id=x. The only case where that would be necessary is with 2 identically named items. But even there, I don't think the tooling needs anything. The IDs can be purely internal, derived from the original message:

x{#b stuff1}y{/b stuff2}z{#b stuff3}w{/b stuff4}
=> x{#b stuff1 id=1a}y{/b stuff2 id=1b}z{#b stuff3 id=2a}w{/b stuff4 id=2b}

The tooling would require that the a/b pairs be in order in the translation, but the id numbers can occur in any order.
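The derivation of internal IDs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the regular expression is not the real grammar, the function name `add_internal_ids` is hypothetical, and an unmatched close raises an error instead of being reported:

```python
import re

# Simplified: {#name rest} or {/name rest}, with no braces inside `rest`.
MARKUP = re.compile(r"\{([#/])(\w+)([^{}]*)\}")

def add_internal_ids(message):
    """Append id=<n>a to each open and id=<n>b to its matching close."""
    counter = 0
    open_ids = {}  # name -> stack of pair numbers still waiting for a close

    def annotate(match):
        nonlocal counter
        sigil, name, rest = match.groups()
        if sigil == "#":
            counter += 1
            open_ids.setdefault(name, []).append(counter)
            tag = f"{counter}a"
        else:
            # Raises KeyError or IndexError if there is no matching open.
            tag = f"{open_ids[name].pop()}b"
        return f"{{{sigil}{name}{rest} id={tag}}}"

    return MARKUP.sub(annotate, message)

print(add_internal_ids("x{#b stuff1}y{/b stuff2}z{#b stuff3}w{/b stuff4}"))
# x{#b stuff1 id=1a}y{/b stuff2 id=1b}z{#b stuff3 id=2a}w{/b stuff4 id=2b}
```

Because the IDs are computed from the source message, nothing extra has to be typed by the developer; the tool can regenerate them whenever it extracts the message.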

@eemeli
Collaborator

eemeli commented Dec 13, 2023

the Translate me? part could (should?) be considered a separate translation unit rather than a literal value that can't contain a variable reference. Are we really okay with this? Or should we leave space for later reconsideration that would allow for something like a .local taking a pattern value?

One could solve that using a .local, but we don't provide something at the moment.

Are you thinking of some custom sub-syntax-formatter function? Like this:

.local $x = {|Translate $foo here.| :template foo=$foo}
... {#img alt=$x/} ...

With the way we're now going, that'll be a pretty likely outcome.

I am thinking that we should keep our eyes on the XLIFF transform. Curious what you think about using attributes here.

Can you clarify which attributes you're thinking of here?

In neither case do I believe that strings which may include HTML will use an html: namespace.

I'm also curious why you think so?

Because in most cases it's not necessary as systems which may include HTML in their messages will only use HTML for markup. And when something else is needed as well, then namespaces like ttml: will make that easy.

Developers are lazy, and they'll go with {#b} rather than {#html:strong} because the former will work just as well as the latter. They'll know and control how in code the message is used, and how the formatter for the message is called. In practice, it's for the exact same reason why most current localizable messages that include HTML or XML <tags> don't namespace them.

@aphillips
Member Author

@macchiati

Also, I don't think we need the id=x.

The point of id (and other attributes) would be compatibility with XLIFF, not anything internal to MF2. The id attributes are how XLIFF keeps track of where elements are paired. Other attributes track whether tags can be reordered or removed, etc.

That is, I'm thinking about the problem "how do we enable CAT tools to generate the XLIFF markup the developer intends?" while simultaneously letting developers put markup into messages.

@eemeli

Are you thinking of some custom sub-syntax-formatter function? Like this:

Maybe even less specific than that:

.local $x = {|Translate $foo here| @translate=yes}  // :string implied
{{You have some {#img alt=$x /} in this pattern}}

Developers are lazy, and they'll go with {#b} rather than {#html:strong} because the former will work just as well as the latter.

Yes, that's true. But we should keep an eye out to enabling (not requiring) ways to do more complex things. I've been including namespacing in examples not because I don't think folks will use {#b} when being lazy, but instead thinking about non-lazy cases where namespacing becomes useful. Your comment was close to saying that folks would never use namespaces, which is different from "mostly won't bother with"

I also remain concerned about "two syntaxes in the same message"--I have multiple examples of places where this has bothered me in the past.

@eemeli
Collaborator

eemeli commented Dec 14, 2023

Are you thinking of some custom sub-syntax-formatter function? Like this:

Maybe even less specific than that:

.local $x = {|Translate $foo here| @translate=yes}  // :string implied
{{You have some {#img alt=$x /} in this pattern}}

That won't work, because the implicit (custom) :string won't have access to the value of $foo unless it's explicitly passed in as an option.

Your comment was close to saying that folks would never use namespaces, which is different from "mostly won't bother with"

The latter is what I intended to communicate.

I also remain concerned about "two syntaxes in the same message"--I have multiple examples of places where this has bothered me in the past.

Indeed. Which is why I started to wonder whether we should effectively reserve enough space in the syntax for a .local to take a pattern rather than expression value.

@markusicu
Member

In HTML, the lack of syntactic distinction between "open" and "standalone" causes problems and hardcoded lists of elements that can be one or the other. Let's not start a new standard with these problems and hacks.

I don't feel strongly about the particular syntax, whether {#standalone} or maybe even {+-standalone} to save another "sigil". I just feel fairly strongly that we need a syntactic distinction.
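The difference between the two approaches can be sketched in Python. The set below lists HTML's void elements; the trailing `/}` form for standalone follows the syntax used in examples earlier in this thread, and both function names are assumptions for illustration:

```python
import re

# HTML needs a hardcoded list because <img> and <b> look alike syntactically.
HTML_VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

def kind_from_list(name):
    """Classify by vocabulary knowledge; wrong for any name not in the list."""
    return "standalone" if name in HTML_VOID else "open"

# Simplified: {#name options} is open, {#name options /} is standalone.
PLACEHOLDER = re.compile(r"\{#([\w:.-]+)[^{}]*?(/?)\}")

def kind_from_syntax(placeholder):
    """Classify by syntax alone; no knowledge of the vocabulary is needed."""
    m = PLACEHOLDER.fullmatch(placeholder)
    if m is None:
        raise ValueError(f"not an open or standalone placeholder: {placeholder}")
    return "standalone" if m.group(2) else "open"

print(kind_from_list("img"))                              # standalone
print(kind_from_list("ttml:br"))                          # open (list has no entry)
print(kind_from_syntax("{#img alt=|Translate me?| /}"))   # standalone
print(kind_from_syntax("{#b}"))                           # open
```

The list-based classifier silently gives the wrong answer for any vocabulary it was not written for, which is the interoperability problem a syntactic distinction avoids.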


Do I understand correctly that "markup" is not going to be in the registry? That makes me nervous. It seems like different organizations will invent different sets of things and how to process them, making messages with markup not-interoperable.

@aphillips
Member Author

This is the discussion thread for spannables. Keeping it open in spite of merging the design doc.

@aphillips
Member Author

I intend to close this thread after the 2024-01-15 call.

@aphillips aphillips added resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. LDML45 LDML45 Release (Tech Preview) and removed blocker-candidate The submitter thinks this might be a block for the next release Agenda+ Requested for upcoming teleconference labels Jan 14, 2024

5 participants