What's left to discuss on markup? #375

Closed
eemeli opened this issue Apr 10, 2023 · 15 comments · Fixed by #541
Labels
Agenda+ (Requested for upcoming teleconference), resolve-candidate (This issue appears to have been answered or resolved, and may be closed soon.), syntax (Issues related with syntax or ABNF)

Comments

@eemeli
Collaborator

eemeli commented Apr 10, 2023

As we merged #371, we didn't explicitly discuss if this allows us to close all or some of the following issues:

Rather than commenting on each of those, I thought it might be appropriate to reflect as a whole what parts of markup we ought to still discuss at this time, and then close or highlight those issues.

My own sense is that all of the above could and should be closed.

@eemeli eemeli added the Agenda+ Requested for upcoming teleconference label May 22, 2023
@cdaringe
Contributor

At Walmart, our C, Java, and JavaScript ICU MFv1 implementations use extensive markup, without the need for direct markup support in ICU MF. I was surprised to see markup supported here in MFv2. As a heavy user of markup, I am biased toward MF offering a pattern of IOC (inversion of control)--giving me full control of all things formatting and templating--as opposed to a limited markup DSL within MF. That's a bit abstract, so allow me to share a micro-demo below.

To support not only markup, but all other use cases where a user may want to tap or map over formatted strings, we extended MFv1 to add an optional fmt callback at each visit to a variable:

Demo:

termsAndConditions: Read and agree to our {termsText, select, other {Terms of Use}} and ...SNIP...
m(messages, "termsAndConditions", {
  termsText: {
    fmt: (x) => (
      <Link className="dark-gray" href={externalTermsHref}>
        {x}
      </Link>
    ),
  }
})

Observations:

  • on visit-exit of termsText, our MFv1 compiler says "hey, i see fmt present. i'll pass the result of the select into this format function before continuing to fold my full formatting."
  • in my fmt function, i unlock the following features
    • unconstrained markup: I decorate select output with rich html/javascript
    • unconstrained data: I add externally scoped data directly into my markup. My MF string needs no knowledge of externalTermsHref, which is powerful.
    • unconstrained return type: I return a React component, not a string!
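The visit-exit hook described in these observations can be sketched roughly as follows. This is a minimal illustration, not the Walmart implementation: the `Segment` type, the `Hooks` shape, and `formatWithHooks` are hypothetical names, the message is assumed to be pre-parsed, and a plain object stands in for the React element so the sketch has no dependencies.

```typescript
// Hypothetical: a message already parsed into literal and variable segments.
type Segment =
  | { kind: "literal"; text: string }
  | { kind: "variable"; name: string; text: string }; // text = already-formatted value

// A per-variable hook receives the formatted string and may return any type O.
type Hooks<O> = Record<string, { fmt: (formatted: string) => O } | undefined>;

// Fold over the segments; on "visit-exit" of a variable, apply its hook if present.
function formatWithHooks<O>(segments: Segment[], hooks: Hooks<O>): Array<string | O> {
  return segments.map((seg) => {
    if (seg.kind === "literal") return seg.text;
    const hook = hooks[seg.name];
    return hook ? hook.fmt(seg.text) : seg.text;
  });
}

// Usage: the href lives in application code, not in the message string.
const externalTermsHref = "https://example.com/terms"; // assumed value
const termsParts = formatWithHooks(
  [
    { kind: "literal", text: "Read and agree to our " },
    { kind: "variable", name: "termsText", text: "Terms of Use" },
  ],
  { termsText: { fmt: (x) => ({ tag: "a", href: externalTermsHref, children: x }) } },
);
```

Here `termsParts` is an ordered list mixing plain strings and whatever the hooks return, which is the "unconstrained return type" point in miniature.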

With respect to the markup DSL,

  1. the markup DSL forces me to encode data that I don't want in my MF strings (e.g. button hrefs, or other non translation relevant content).
  2. the markup DSL encourages churn--visual changes likely invoke translation workflows. Changing a button href, for example, ought not to trigger translation submissions, but I'd wager that in practice it would for development teams
  3. the markup DSL locks me into strings

I do not mean to disparage the markup DSL--I see its pragmatism too, especially for highly static content!

However, as a MF+markup power-user, I don't want the markup DSL--I want a more powerful format capability that puts markup generation in my control and out of MF entirely.

  1. I'd need a better way to enter my markup formatting logic.
    1. In my demo, you can see i abuse the select to enter my formatting. Just invoking my format on vars in MFv2 could possibly do the trick:
         termsAndConditions: Read and agree to our $termsText and ...SNIP...
  2. If MFv2 could offer extension points natively, like my fmt callback, really interesting and portable designs could be achieved.
    • For instance, you saw me above map MF => React. Other mappings could occur. For example, I could take my MF outputs and map them into some AndroidViewPrimitive, simply by being allowed to participate in the fold/map cycle of MF formatting. The markup design limits MF to just string output, when the output could possibly support other runtimes with an IOC-formatting pattern.

Food for thought!

@zbraniecki
Member

the markup DSL forces me to encode data that I don't want in my MF strings (e.g. button hrefs, or other non translation relevant content).

It doesn't. MF2 markup elements do not have to store attributes that are not localizable, just as Fluent markup doesn't - the merge happens in bindings between l10n markup and DOM markup.
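One way to picture that merge, as a rough sketch: the message carries only the element name and any localizable options, and the binding layer supplies the non-localizable attributes when it builds the UI element. The `MarkupStart` shape, the `DeveloperAttrs` map, and the `mergeAttributes` helper are illustrative assumptions, not part of MF2 or Fluent, and the "developer attributes win" rule is just one possible policy.

```typescript
// Hypothetical shape for a markup-open part coming out of a formatter.
interface MarkupStart {
  name: string;                    // element name used in the message, e.g. "a"
  options: Record<string, string>; // localizable options from the message (may be empty)
}

// Developer-owned attributes, keyed by element name; never sent to translators.
type DeveloperAttrs = Record<string, Record<string, string>>;

// Merge so that developer-supplied attributes override anything in the message.
function mergeAttributes(part: MarkupStart, dev: DeveloperAttrs): Record<string, string> {
  return { ...part.options, ...(dev[part.name] ?? {}) };
}

const merged = mergeAttributes(
  { name: "a", options: { title: "Terms of Use" } },
  { a: { href: "https://example.com/terms", class: "dark-gray" } },
);
```

With this split, changing the href touches only application code, so the translatable message is unaffected.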

the markup DSL encourages churn--visual changes likely invoke translation workflows. Changing a button href, for example, ought not to trigger translation submissions, but I'd wager that in practice it would for development teams

It doesn't, see above.

the markup DSL locks me into strings

Can you expand on that point? I do not understand this concern.

@zbraniecki
Member

Demo:

Your architecture forces new DOM generation on each translation, since it calls fmt which generates new <Link/>. This is a papercut that we should try to avoid.
Updating translation should preserve identity of elements where possible.

See list of use cases we collected for markup scenarios at Mozilla - https://github.com/zbraniecki/fluent-domoverlays-js/wiki/New-Features-%28rev-3%29 - I believe your architecture won't scale to those.

@cdaringe
Contributor

cdaringe commented Jun 14, 2023

Your architecture forces new DOM generation on each translation

This is actually not the case. I am using react in my demo, and react uses VDOM. There are other mitigations to this in React to avoid creating a new ad-hoc react component, but i will omit as the demo above was for MVP only 😄 .

Updating translation should preserve identity of elements where possible.

I agree. I'd posit that the markup API and the IOC pattern I've discussed above are subject to the same amount of re-ordering/re-parenting. I don't see any characteristics that really support one solution over the other w.r.t. this topic, although in my demo there is bulk de-duplication of markup tags/attributes, as they are consolidated to a single callsite. In MFv2, because of the top-level flat nature of nested translations, the markup would likely need to be duplicated for n of N match branches (thus arguably many opportunities for hierarchy shifts), but that is really of negligible risk.

I believe your architecture won't scale to those.

Maybe! I looked at your link, and I didn't see anything resonate, even weakly so. You probably see something I don't. #11 is avoid churn, and I think the markup DSL offers up a handy footgun for churn.

I need to study the spec a little more deeply, candidly, because it could be the case that my proposal is solvable through :function formatters. However, I'd want ad-hoc formatter functions, not top-level formatters, as the formatting in my demo is local only to the message being formatted.

@cdaringe
Contributor

cdaringe commented Jun 14, 2023

the markup DSL forces me to encode data...

It doesn't. MF2 markup elements do not have to store attributes that are not localizable

Sorry, maybe I was unclear. To be more complete, I could have said:

If my application demands rich attributes in my HTML, the MFv2 markup DSL invites me to put this content in translation files. MFv2 does not have another means to attach these attributes to my markup around translated entities, thus they must be placed here. Consequently, I must now expand my markup generation from one system to two systems--my usual HTML generation system (e.g. react, jekyll, gatsby, hugo, vanilla-html|js) AND my MFv2 translation system.

@zbraniecki
Member

This is actually not the case. I am using react in my demo, and react uses VDOM. There are other mitigations to this in React to avoid creating a new ad-hoc react component, but i will omit as the demo above was for MVP only 😄 .

Assuming some secondary strategy will mitigate the architectural choice to avoid new element generation on each pass is suboptimal. MF2 is not React specific, nor any other high level UI toolkit.
I'd love to avoid architectural choices that require such footguns/workarounds later.

I encourage you to evaluate scenarios where a markup element is:

  • passed to MF2 formatting from developer
  • generated out of a function (say MF2 function "STRONG" that wraps some parts in opening/closing pair of markup elements)
  • introduced by the localizer (things like "em" or "sup" in HTML)

and how your model handles them. I think once you desugar, the proposal is very aligned with what you want to achieve, but more flexible than what you showed in the demo.

@cdaringe
Contributor

cdaringe commented Jun 14, 2023

the markup DSL locks me into strings

Can you expand on that point? I do not understand this concern.

Sure, gladly. Thanks for the callout.

MF's goal is to offer templatized string => string translations. Markup formats can be supported because they happen to be string based. That's all fine and good!

Styling/formatting/semantic wrapping of translated content is highly desirable. We all agree that such capability is needed sometimes within a translation. That's certainly settled, as evidenced by the markup DSL to begin with. e.g. User, please <button href='...'>click here!</button>.

The core problem is that stringy markup is not portable between runtimes/environments. Not all users of MFv2 necessarily support stringy-markup-based rendering capabilities. Android and iOS, for instance, are two extremely common environments where translations get in front of users' eyes & ears, but they do not use a string-based markup DSL (sans webview) as their primary rendering primitive. Even in my web example, I use React as the mechanism for providing renderable content, not strings. MFv2 may support markup, but I needed to put my content in a React component, not a string. Thus, I posit that there is a more portable mechanism that retains the core value of MF (translatable template string generation) whilst supporting any given runtime.

MF operates as follows:

Current state:

Given an input I,
produce a translated output string.

<I>(input: I) => string

As discussed, this practically works only in limited environments.

Desired state:

Given an input I, and
given a formatter to adapt translated string to my environment/runtime,
produce a translated output O,
where the available signatures are:

  • default: <I>(input: I) => string
  • <I, O>(input: I) => (formattedTranslation: string) => O

MF can continue to do all of the great stuff it does. However, rather than making the assumption that the target runtime wants strings only, allow user-space and/or adapters to tailor the output to work in their environment. This would not only allow formatting/styling to happen in a technology-agnostic way, but also greatly increase the capability of MF to run portably across different systems. TL;DR: let users provide formatting and map translated outputs. It arguably could void the need for a markup DSL (which I weakly suggest may not be required), because a generic formatting API removes the need to embed styling/formatting/markup concerns in my translation content files at all.
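The current and desired states can be sketched as one function with an optional adapter. This is only an illustration under assumed names (`translate`, `formatTo`); the `translate` stand-in does naive placeholder substitution and is nothing like a real MF implementation.

```typescript
// Stand-in for the existing string => string step; assumed for illustration only.
function translate(pattern: string, input: Record<string, string>): string {
  return pattern.replace(/\{(\w+)\}/g, (_m: string, key: string) => input[key] ?? `{${key}}`);
}

// Default: behaves like the current state and returns a string.
function formatTo(pattern: string, input: Record<string, string>): string;
// With an adapter: the caller maps the translated string into any output type O.
function formatTo<O>(pattern: string, input: Record<string, string>, adapt: (s: string) => O): O;
function formatTo<O>(
  pattern: string,
  input: Record<string, string>,
  adapt?: (s: string) => O,
): string | O {
  const translated = translate(pattern, input);
  return adapt ? adapt(translated) : translated;
}

// Usage: same message, two targets.
const asString = formatTo("Hello, {name}!", { name: "Ada" });
const asNode = formatTo("Hello, {name}!", { name: "Ada" }, (s) => ({ type: "text", value: s }));
```

The adapter here sees only the final string; adapting individual parts would need the parts-based output discussed elsewhere in this thread.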

@cdaringe
Contributor

cdaringe commented Jun 14, 2023

MF2 is not React specific, nor any other high level UI toolkit.

Strongly agreed.

The problem though, is UI toolkits that apply formatting don't all use stringy markup for formatting.
Text is for UIs, but only a few UIs support string markup. Raw HTML does... but is HTML generation really the only possible UI target MF should be compatible with? I think with some small change, MF could be much more widely applicable.

I suggest that MF could offer the option to not concern itself with markup at all. Instead, let the MF caller own taking the translated strings, and converting them into the UI primitive of choice.

MF should organize formatted raw string content (it already does). It is an over-step, or an undersight, IMHO, for MF to assume that it should do a formattedParts.join("") at the end of the formatting process. By default, I think MF really should be doing something like:

this.compiler.envAdapter.format(opts, ...orderedTranslatedStringParts)

MF could get out of the game of producing final strings, and instead focus on the production and ordering of translatable entities, and let the runtime figure out how to present them. Pseudocode examples:

// js-strings
// opts.fmt maps a part key to an optional formatter for that part's value
const format = (opts, parts: TranslatedStringPart[]) =>
  parts.map((p) => (opts.fmt[p.key] ? opts.fmt[p.key](p.value) : p.value));

// react strings
const format = (opts, parts: TranslatedStringPart[]) => (
  <>{parts.map((p) => (opts.fmt[p.key] ? opts.fmt[p.key](p.value) : p.value))}</>
);

// java strings
FormatFn format = (opts, parts) -> parts.stream()
  .map(p -> opts.fmt.containsKey(p.key) ? opts.fmt.get(p.key).apply(p.value) : p.value)
  .collect(Collectors.joining(""));

Sorry if I'm too verbose 🤓 . Just trying to articulate clearly. We're also clearly both at our keyboards at the same time, so our responses are a bit out-of-order 😄 .

@zbraniecki
Member

The problem though, is UI toolkits that apply formatting don't all use stringy markup for formatting.

MF2 markup is not stringy markup. It is a DSL so in MF2 markup you annotate markup as "string" (well, function call), but it's not stringy. It is meant to be merged with an actual UI Element by the bindings.

I suggest that MF could offer the option to not concern itself with markup at all. Instead, let the MF caller own taking the translated strings, and converting them into the UI primitive of choice.

How can a CAT tool, validation tooling, etc. work with such a model? How can a CAT tool support a localizer being able to reorder elements in a message, add an open element but require a close, etc.?

MF should organize formatted raw string content (it already does). It is an over-step, or an undersight, IMHO, for MF to assume that it should do a formattedParts.join("") at the end of the formatting process.

I do not believe MF assumes that, parts.join("") is just an option like toString() in JS. Bindings, TTS and other engines will consume parts.

@eemeli
Collaborator Author

eemeli commented Jun 14, 2023

@cdaringe It may be useful for you to play around with the polyfill for the Intl.MessageFormat proposal; it's available on npm:

npm i messageformat@next

With that, you get results like this:

import { MessageFormat } from 'messageformat'

const mf = new MessageFormat('{Click {+a href=$url}here{-a} to continue}', 'en')
mf.resolveMessage({ url: 'http://example.com' })
{
  type: 'message',
  value: [
    { type: 'literal', value: 'Click ' },
    {
      type: 'markup-start',
      value: 'a',
      options: { href: 'http://example.com' }
    },
    { type: 'literal', value: 'here' },
    { type: 'markup-end', value: 'a' },
    { type: 'literal', value: ' to continue' }
  ]
}

This low-level API is intended to serve as a building block for formatting to exactly the sort of React or other non-flat-string targets that I understand you to also be working with. Crucially, that API isn't actually defined by the MF2 spec, but by the Intl.MessageFormat spec, much like the ICU libraries will define their interfaces separately from the MF2 language spec.
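As a hedged illustration of building on such output, the sketch below folds a flat list of parts, shaped like the result above, into a nested tree that a React or native-UI binding could walk. The `Part` and `TreeNode` types and the `toTree` function are assumptions written for this example; they are not part of the polyfill or of any spec.

```typescript
// Assumed part shape, modeled on the output shown above.
type Part =
  | { type: "literal"; value: string }
  | { type: "markup-start"; value: string; options?: Record<string, string> }
  | { type: "markup-end"; value: string };

type TreeNode =
  | string
  | { name: string; options: Record<string, string>; children: TreeNode[] };

function toTree(parts: Part[]): TreeNode[] {
  const root: TreeNode[] = [];
  const stack: TreeNode[][] = [root]; // child lists of the currently open elements
  for (const part of parts) {
    const current = stack[stack.length - 1] ?? root;
    if (part.type === "literal") {
      current.push(part.value);
    } else if (part.type === "markup-start") {
      const children: TreeNode[] = [];
      current.push({ name: part.value, options: part.options ?? {}, children });
      stack.push(children);
    } else if (stack.length > 1) {
      stack.pop(); // sketch only: assumes the close matches the most recent open
    }
  }
  return root;
}

const tree = toTree([
  { type: "literal", value: "Click " },
  { type: "markup-start", value: "a", options: { href: "http://example.com" } },
  { type: "literal", value: "here" },
  { type: "markup-end", value: "a" },
  { type: "literal", value: " to continue" },
]);
```

A renderer can then map each object node to its own primitive (a React element, a native text span, and so on) without ever producing a flat string.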

Given that, the question I'd like to pose to you is this: Are your concerns related to the shape of MF2 messages, or the APIs for formatting them?

@cdaringe
Contributor

cdaringe commented Jun 14, 2023

Hey @eemeli! As usual, great points. Thanks for helping disambiguate. Your feedback prompted me to challenge some of my assumptions.

resolveMessage is precisely the type of output i'm talking about. Thanks for sharing that snippet. In my naive perception of the world, such an output would indeed be part of the MF2 spec, vs the interface being a pure implementation detail for any given runtime, such as the Intl API.

Are your concerns related to the shape of MF2 messages, or the APIs for formatting them?

The APIs for formatting them.

In my myopic understanding of the ecosystem, I would think that this WG could specify some amount of interface definitions for binding impls to satisfy. By specifying some well-known interfaces, such as resolveMessage, it could promote language or runtime impls that ensure MessageFormat works great for all end-user developers. For example, my iOS friend had a very challenging time adapting his Clang MFv1 impl to support formatting/styling/layout. MFv2 markup wouldn't help him. Markup pretty much only works for web. That's not an absolute claim (there are obviously more markup languages than HTML), but practically speaking, MFv2 now has a subjectively "web only" feature in it. I think that's a mistake. We're giving a formatting API to web devs who use direct HTML explicitly via markup, but we're not giving any similar API to any other type of user. The iOS dev had to hack his MF impl deeply to de-stringify it. What a delight it would have been to say "the MFv1 API output yields structured data of known form (see resolveMessage output). Because of this, you can easily adapt it to your runtime".

practically speaking, MFv2 now has a subjectively "web only" feature in it. I think that's a mistake.

I think it's fair to posit that defining implementation interfaces is not MF's role. MF could define the input only, and any given engine could define how to parse and produce output freely. I do, however, think it would be beneficial to specify at least some subjective amount of API contract. I believe cases just like this are actually quite easy to overlook. A naive implementer for any given runtime may produce a string-only output API, not a lovely resolveMessage API as you have presented. You have the experience and wisdom to author smart bindings like that. Could MF itself not promote such wise bindings as part of its specification? If messageformat-haxe-bindings came into existence tomorrow, should we not recommend resolveMessage, or give the haxe implementer a suite of golden cases to test compliance against?

I'm thinking out loud. Thanks for reading this far.

I was looking at the goals page. Goals 4 & 6 both somewhat suggest this WG could promote such an interface, but goal 6 also kind of demotes the idea too 😄 .

I can imagine something like:

# MessageFormat

Definition: A suite of specifications promoting developer-friendly translation workstreams regarding the production of user-facing text.

## Specifications

- Syntax: ...
- Required APIs:
   - (t: JSONInput) => ResolvedMessageOutput
- Recommended APIs:
   - ResolvedMessageOutput => string
   - <O>(o: ResolvedMessageOutput) => O 
- Golden data (input/output cases): ... 
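To make the outline a little more concrete, here is a rough sketch of how golden data might exercise the required and recommended APIs. Everything here is hypothetical: the `ResolvedMessageOutput` shape borrows from the `resolveMessage` output quoted earlier in this thread, and `GoldenCase`, `toFlatString`, `runGolden`, and `toyResolve` are invented names.

```typescript
type ResolvedPart = { type: string; value: string };
type ResolvedMessageOutput = { type: "message"; value: ResolvedPart[] };

// Recommended API: ResolvedMessageOutput => string (keeps literals, skips markup parts).
function toFlatString(out: ResolvedMessageOutput): string {
  return out.value
    .filter((p) => p.type === "literal")
    .map((p) => p.value)
    .join("");
}

// A golden case pairs an input with the output every implementation should produce.
interface GoldenCase<I> {
  name: string;
  input: I;
  expected: ResolvedMessageOutput;
}

// Run an implementation's required API against the cases; return the names of failures.
function runGolden<I>(
  resolve: (input: I) => ResolvedMessageOutput,
  cases: GoldenCase<I>[],
): string[] {
  return cases
    .filter((c) => JSON.stringify(resolve(c.input)) !== JSON.stringify(c.expected))
    .map((c) => c.name);
}

// Usage with a toy implementation that only knows literals.
const toyResolve = (input: { text: string }): ResolvedMessageOutput => ({
  type: "message",
  value: [{ type: "literal", value: input.text }],
});
const failures = runGolden(toyResolve, [
  {
    name: "plain literal",
    input: { text: "Hello" },
    expected: { type: "message", value: [{ type: "literal", value: "Hello" }] },
  },
]);
```

Comparing serialized output keeps the sketch short; a real compliance suite would likely want a structural comparison that is not sensitive to key order.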

I'll be mulling over this more.

@zbraniecki
Member

Markup pretty much only works for web. That's not an absolute claim (there are obviously more markup languages than HTML), but practically speaking, MFv2 now has a subjectively "web only" feature in it.

Can you please elaborate on this point? The WG does not share your perspective - we actually worked hard to ensure that the markup is not HTML or Web specific and can scale to any UI concepts (including GUI and VUI models for TTS).

@cdaringe
Contributor

@zbraniecki, i didn't realize until this morning that #272 is more or less a dupe of this conversation. i'll take it up over there or in #356. apologies for leaking this topic between issues

@stasm
Collaborator

stasm commented Jun 19, 2023

@cdaringe There's also #41 with more discussion about formatting to something other than text.

@cdaringe
Contributor

cdaringe commented Jun 21, 2023

Alright! Disregard all prior comments here :) I've read through all of the history on the matter, and created a timeline of markup related events: #401

I think markup has both settled design and unsettled design. I'd like to make some assertions about both. Please correct my incorrect assertions 😄

Settled

  • MFv2 has no opinion on "markup", in the traditional definition of the word
    • Users may enter markup into messages, but MFv2 makes no guarantee that your markup-lang of choice is strictly compatible. Escapes may be needed.
  • MFv2 has a "mf-markup" capability (my own wording). "mf-markup" is not really markup at all--it's just functions. These functions can be used to decorate text with markup, or map into any other UI primitive. Alternatively, markup can be used by a formatToParts API to apply markup.
    • Aside, we may consider renaming our internal documentation/syntax/verbiage away from the term markup in order to avoid ambiguity, but that's a secondary discussion!

Unsettled

I think each of the following needs assertive answers, based on outstanding community discussions (as captured in #401):

  1. MARKUP_FN_SIGNATURES: What will the signature for markup functions be? Will function calls be truly standard, or will extra data be present in invocations?
  2. RUNTIME_FORMATTING_EXTENSIBILITY: Where in the spec shall we declare that MF implementations must offer a runtime extension for formatting functions? (This may already be completed 😄 )
  3. MARKUP_SPANS: Will the opening and closing of markup syntax have any influence on the formatting function inputs? e.g., if markup is open-closed balanced, is the formatting function called with the internal contents? If not, see: FORMAT_TO_PARTS, because it would be the only other mechanism to identify spans, and becomes much more important for spec-promotion.
  4. FORMAT_TO_PARTS: Will the formatToParts API input/output interfaces be rolled into the specification? If so, where?
  5. MARKUP_XML: Will we support or drop any squishy desires to pair our syntax with something XML-ish? This question came and went with the tides 😄 🌊
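For the MARKUP_SPANS question, one possible way to decide whether markup is open/close balanced is a simple stack check over formatted parts. This is a sketch under assumptions: the part shape mirrors the polyfill output quoted earlier in this thread, and `isBalanced` is a hypothetical helper, not something the spec defines.

```typescript
// Assumed part shape, modeled on the resolveMessage output quoted earlier.
type MarkupPart =
  | { type: "literal"; value: string }
  | { type: "markup-start"; value: string }
  | { type: "markup-end"; value: string };

// True when every markup-start is closed by a matching markup-end, properly nested.
function isBalanced(parts: MarkupPart[]): boolean {
  const open: string[] = [];
  for (const part of parts) {
    if (part.type === "markup-start") {
      open.push(part.value);
    } else if (part.type === "markup-end" && open.pop() !== part.value) {
      return false; // close without a matching most-recent open
    }
  }
  return open.length === 0; // anything left open is unbalanced
}
```

If an implementation only hands spans to formatting functions when this kind of check passes, unbalanced messages would still need the parts-based path, which is why the two questions are linked.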

@aphillips aphillips added the syntax Issues related with syntax or ABNF label Jul 16, 2023
@aphillips aphillips added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Dec 3, 2023
5 participants