diff --git a/meetings/2020/notes-2020-10-19.md b/meetings/2020/notes-2020-10-19.md new file mode 100644 index 0000000000..edc5e5864e --- /dev/null +++ b/meetings/2020/notes-2020-10-19.md @@ -0,0 +1,289 @@ +#### October 19 Attendees: +- Romulo Cintra - CaixaBank (RCA) +- Nicolas Bouvrette - Expedia (NIC) +- Staś Małolepszy - Google (STA) +- George Rhoten - Apple (GWR) +- Nicolas Bouvrette - Expedia (NIC) +- Mihai Nita (MIH) +- Elango Cheran - Google (ECH) +- Stan Rygal - Expedia (STR) +- Luke Swartz - Google (LHS) +- Richard Gibson - OpenJSF (RGN) +- David Filip - ADAPT Centre @ Trinity College Dublin (DAF) +- Colin Sprague - DocuSign (CLS) +- Zibi Braniecki - Mozilla (ZBI) +- Eemeli Aro - OpenJSF (EAO) + + + +## MessageFormat Working Group Contacts: + +- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg) + + +## Next Meeting + +November 16, 10am PDT (6pm GMT) + +## Agenda +- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/115) + +Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70) + +We should go ahead and get started + +###Moderator : + +### Chair Group Announcements + +## Review Taskforce progress for #103 (Taskforce notes) + +Taskforce notes: https://docs.google.com/document/d/1lAyBZR2VQR8ILqvcg5Gad_wf7QWUbsoJ13wGZFSmtbE/edit#heading=h.tulel52cgapk + +ECH : Presenting taskforce notes, on the taskforce meeting we decided to evaluate the pros & cons based, we decided to have a more diverse group of stakeholders to help us find + + + +## MFWG - Stakeholder definition and debate + +RCA: This was to follow up from the task force meeting to discuss how we bring in stakeholders that help us decide on the message selector question. + +NIC: Do we have an issue for that? + +RCA: We didn't discuss doing this last meeting, but maybe it makes more sense to discuss a little bit in this meeting -- who to bring in and + +NIC: Some of the action items from the task force #2 (regarding issue #103) meeting notes: + +Collecting stakeholders +Listing all categories of stakeholders +Inviting more representatives from stakeholder categories +Goal is collect information that will help us decide on priorities among the categories of stakeholders + + + +Stakeholders (by type) + +Developers (single language & i18n) +Translators +Nature of engagement +Professional +Amateur (community) +Multi-language developers +Type of content +Editor +Reviewer (client side or 3rd party) → including QA/Quality Check +Terminologist +Language specialist +TMS (Translation Management System) representatives +Users (want something grammatically correct) +Copywriting +UX + +* Check what other communities are doing in terms of stakeholder categories as an action item + +MIH: There are at least 2 types of translators: translators who are professional translators, you pay the bills by doing this all day long, you get paid by the word, and "community translators" which includes unpaid/amateur translators and multilingual developers. + +ZIB: To the distinction between professional and amateur translators, we're not the first community to encounter this distinction. The list of categories of stakeholders and translators must be a superset of the categories that individual companies would need. + +NIC: Should we add developer communities as a stakeholder? + +ZIB: Maybe we should look at how other organizations categorize their translators, organizations like W3C or TC39, instead of trying to invent categories ourselves. + +DAF: I want to chime in and agree that "translator" is not a single category. We should further subdivide professional translators. Some professional translators are employed full time by companies or government agencies, and they are able to go deep on their subject area. There are freelance translators who still pay their bills through translation and work for different companies or through vendors, and they still might be able to volunteer their services. Then there are people who are terminology experts (terminologists), language specialists, etc. For all these people, the GUI and filtering capabilities, the variability of syntax, etc. are a huge barrier. As MIH mentioned, they don't want to be slowed down by new syntax. + +In companies like Google or Oracle, these companies have employees who might go through the codebase, resolve issues, do quality checks, etc. And they may not even do translation themselves as much as managing vendors (freelancers) who do translation. + +GRH: be sure to consider users (end users) who want to hear “natural sounding” language—they are also important stakeholders + +RCA: Create persona for each category of stakeholders. Flesh out the persona's story -- who the person is, how they engage / what their use case is, what their challenges are, etc. + +ECH: Personas are like when people apply Design Thinking, or people who apply design thinking to UX development design. + +STR: I would consider language specialists and terminologists both in the same category called Localization Quality Manager (in Expedia terminology). My point being, if we have L.S. and Trm, we don't need a separate quality-focused category. + +DAF: There is the EC translation service that splits 60% of the translation work in-house, and 40% get outsourced. The more work that they outsource, the more holistic the role becomes for the in-house employees to manage the translation supply chain. + +STA: What about UX and copywriters? Especially copywriters. + +RCA: We can take this as our list of stakeholders, but we can also look at other organizations as ZIB said. + +DAF: WRT, Zibi’s point to look at how other consortia structure their audiences. W3C ha sthe internationalization core group that mainly looks at developers and concentrates on making HTML output “print ready”. In OASIS, only the XLIFF committees are aware of I18n/L10n. We use mainly three categories, buyers (content owners), CAT tool vendors, professional translators. We believe that ideally a translator should be able to keep working in their one editor of choice that would shield them from all technica complexity and variety, they should not be exposed to code/markup and related technical variances.. + +ZIB: My suggestion was not to see how other organizations classify engineers, translators, etc., but how they classify their target audiences. W3C creates HTML, which targets both professional and amateur developers. + +DAF: Html is no longer under W3C control, it went rogue. But the technologies that are still in W3C such as linked data or ODRL target I’d say an “extreme” developer.. + +ZIB: Since we're targeting ECMA TC39 as part of our output, maybe that is a good place to start. + +RCA: Let's advance onto the next topic. I am creating the issue to capture this, and ZIB and [#124](https://github.com/unicode-org/message-format-wg/issues/124) + +DAF and others, please review that issue. + +## Review collected "extreme" MessageFormat use cases #119 + +Issue #119 to collect corner cases of message formatting. + +NIC: I called it "extreme" use cases, but maybe we call this "corner cases". + +STA: The general theme in these examples is that it is a long sentence with many selectors. We wanted to find real-world examples of messages with multiple selectors. For me, this is only one kind of corner case. Are we missing something? + +NIC: Mihai are you creating your example (of rewriting a previous example) just using concatenation? + +MIH: Yes. If you don’t do some kind of concatenation or list formatting, then you’re forced to say odd things like “This resort has no pools, no golf courses…” But the example that I rewrote is just a list, and it is just asking for a list formatter. And it is not really in the same class as the other examples, where the items of the list don't have to match the other items grammatically (in terms of sentence agreement) as is necessary for the other examples. + +NIC: Is there consensus that there is no real blocker to message-level selector? + +ZIB: I don't not agree that there is a consensus. + +RCA: In the task force meeting, we agreed to come up with corner cases to help us make a decision. How many examples do we need? + +STA: I don't think there is one clear answer. + +RCA: We can spend multiple more months discussing this. + +MIH: Do we have an agreement that we can convert between internal selectors and message-level selectors? If we agree that we can do that, then why would it matter? + +ZIB: We don't have an agreement that we can do this losslessly. + +MIH: Define losslessly. + +ZIB: We send for translation and then translate it back to the original language + +MIH: similar to Unicode normalization, you can normalize things but not round-trip, that we do not have a diff. For example, we could say “everything must be precomposed”, or we can say that all case comparison/string comparison/etc has to handle both normalized & non-normalized forms → my position is that you should normalize at the edge, and then simplify processing on the inside + +MIH: Do we care? I'd compare it to Unicode normalization. For example, if we have a Unicode string that has both composed and decomposed normalized characters, then we can normalize to composed or decomposed forms, but thereafter, we cannot recover the original string. + +ZIB: It worries me that we cannot losslessly convert without + +EAO: We can losslessly convert to message-level selectors, but we cannot losslessly convert to internal selectors. + +ZIB: I think this formulation is interesting. + +LHS: I think it's not an issue of converting, b/c it's like saying we're converting ASCII to Unicode, but ASCII is already a subset of Unicode. + +MIH: The question of + +ZIB: What about a scenario of having 2 data models, where an organization can choose a simpler + +MIH: Why support 2 data models in the standard, where we already know that one style is messy? + +ZIB: That's just your opinion that it's messy. + +EAO: What is messy + +ZIB: What’s different from your analogy is that we don’t have agreement that one model is universally better, so we can’t just use it. If we use “lossy” versions, then anyone who is using another model will be “canonicalized” into another version, and it worries me to say that “since we can convert, it’s not a problem”, since it is. + + +STA: Where does the complexity live? We're discussing where to put the complexity, but if we keep the standard simple, then we can design syntaxes that have more complex, so long as they convert to the standard, that is simpler. + +MIH: Yes, you summarized my point well. + +ECH: Complexity and Simplicity are opposite, Easy and Difficult are opposite, and different. I'm going to reference info from Simple Made Easy which does a good job defining precisely what simplicity and complexity mean. Complexity/simplicity is an objective measure, while easy/difficult are relative measures. When we talk about whether the combinatorial expansion that occurs when converting internal selector messages into message-level selector messages, the increase in messages is harder for humans, which is a relative measure. The level of difficulty can be mitigated, ex: with existing CAT tools. But the increase in message count is not a sign of complexity. Cardinality is not the same as complexity. Simplicity is not about having just 1 thing. In fact, simplicity means increasing the number of things because it is about taking apart intertwined distinct concerns and keeping them separate, so it increases the number of things. + +ECH: There's a certain amount of inherent complexity in the problem, which means that at least that amount of complexity will live one way or another. The question of how much complexity should live in the standard is still a good one to ask. + +LHS: ...but a form that allowed internal selectors would also allow external selectors, it’s just a superset + +ZIB: Let’s say your translation tools only offer external selectors, so a data model that works with internal, you could compile to external selectors on the outside + +MIH: So it goes back to where we normalize. We could normalize when we go into the data model, or when we convert to XLIFF. + +ZIB: Let’s say an organization doesn’t accept anything but external. We could just convert to this. + +MIH: But why complicate the standard data model when we know that internal is horribly messy? + +EAO: It is only messy from a human point of view, from a machine point of view it can be very straightforward + +STA: This is about hiding or showing complexity, and who needs to deal with that complexity? If the inline approach is there, then everyone dealing with the standard has to deal with that complexity. If we do external selectors, then the complexity is dealt with by parties that want to deal with that, and the standard is more “primitive” (in a good way). Maybe you care about size, or expressiveness, you can still do it...and then you deal with that complexity, but the underlying standard is simple. + +MIH: That’s a good summary of my take. The producers (mostly developers) write a string with internal/inline selectors, and many consumers are translators, which we know are difficult to handle...especially when we move to XLIFF. + +ECH: One point worth putting in: in case our discussions of simplicity aren’t making sense (see the talk). Complexity & cardinality are not the same thing. Complexity & simplicity are opposites, it’s more about “easy and difficult”...it doesn’t mean “the number of things”, it’s the opposite (it’s about separate things being separate), so teasing apart things that are different means there are more things. Maybe you have more subparts of the message that need to be consistent with each other, there might be some more. + +EAO: The problem we’re solving is that we have complex & complicated messages by any definition, and how to express those, one solution is internal selectors. Another possible solution is to allow 1 message to be built up from parts of other messages (e.g. Mihai’s example with ListFormatter). We need to resolve both of these issues...maybe not together, but they correlate with each other. I believe that if we don’t have either, we have a bad standard, and if we have 1 we might not need the other. + +EAO: We need to talk about message references, because this will help decide how to handle one model or the other. + +ZIB: We keep going back and forth on the topic. It shows that the problem is non-trivial, because if we were, we wouldn't keep going back and forth. I would like us to be very humble, because the task is enormous, and what we decide will be used by many people for several years, because if this goes into JS, then many people will be using it. So maybe we can design the data model via layers, where we have a base layer in the standard that is simple, but that we enable other features in subsequent layers. + +MIH: I would like to see how that looks in the data model. Natural languages don't change as much as programming languages. French + +STA: The problem with layering is that if we design things in a top-down way to allow for both, then it allows for both in a way that could be messy if we have 2 different ways to interpret things. + +Back to the question of where the complexity lives, I am trying to dig down to the question that addresses the fundamental issue. If we have top-level selectors in the standard, then we say that messages authored using internal selectors can convert back and forth, and use message references to do so. But what I want to know is can we define the data model as the data of + +LHS: As Zibi messaged in the Unicode Conf, the stakes are high because if we get this wrong, we will live the consequences for a long time in the future. I agree with MIH that it is unlikely that natural languages will come up with new grammar constructs any time soon. Tech moves quickly, so we can have humility about the pace of change with technology. As STA was talking about, maybe there is + +ECH: Tying message references <-> internal selectors (as EAO said), so if we define data model to cover the message in transit to translators (as STA said), then + +ZIB: I agree that languages don't change often, but I do expect the way humans interact with technology will change in the next 5 years. We should define the message processing to be as lossless as possible. My concern is that if we use message references instead of internal selectors, maybe it can work, even though it is not a part of the standard, but then maybe it goes unsupported when it should have been supported, which will cause the need to create v3 of MessageFormat to fix. + +EAO: Can we support function calls that take more than one variable? It is true that natural languages don't change often, but how we expose language functionality changes more frequently. We're still understanding how to represent things, ex: plural range selectors -- "0 - 1 items" (not "0 - 1 item") needs 2 arguments to determine. + +RCA: We need to have a followup discussion on this topic. When can we do this? Hopefully can we come to a resolution so that we can avoid + +STA: ZIB and LHS's comments make me realize that lossiness / losslessness is also a spectrum, and sometimes lossiness is okay. When we speak about user-visible functionality, the losslessness is okay because once it is consumed by the user, it is done, the effect is achieved. But if we care about the effects of tooling like version control, then losslessness matters, and I worry if we make it difficult for internal selectors in this regard, then we encourage them to go off and solve the problem in their own way. + +EAO: I think this question of lossiness is the wrong question to be focusing on. I think the question comes down to whether we allow message references or internal selectors. + +MIH: My position is that we should support message references. At times, it can be useful. We can't stop people from using message references in ways that are bad, like concatenation of the content of those message references, but we should still allow it. + +RCA: We will have a followup meeting as another round of the task force for issue #103 next Monday, and defer the Chair Group meeting to the following week, like we did last month. + +NIC: Agreement for task force meeting on this topic? Anyone disagree? + +RCA: Will have as an agenda topic + +ZIB: Going back-and-forth between different models, etc. It doesn’t seem like it’s trivial, I think we’d all agree that if it were trivial we would already reach it. This is probably not the first group that has dealt with this kind of issue. I’d like us to be humble about our ability to say what is certain for the future. Everything we’re standardizing → maybe “internal selectors” is some kind of “extension” that includes a script that converts back & forth, so if we come to a conclusion that something isn’t sustainable, they can easily plug the next step. Maybe think of it as “layers”?? + +MIH: I would like to see how that looks in the data model. If we say we can add internal selectors later, that’s fine, but if we say “let’s put some fancy hooks in now” then I don’t understand the reason. I understand being humble, but we don’t want a Turing Machine in there. Languages don’t change that fast (it’s not like French will develop some new grammar in the next few years)...so it’s a finite problem. + +STA: The problem with layering is that we can start with a top-level, and then we’d end up with both of them in the standard, and I’d like to avoid that situation. If we have both, it’s probably not good (too many choices, etc.). The reason the in-message approach might be preferable...I’m trying to dig down into the question that will help solve this...I think I have a question: if we push complexity to tool authors, and say “the standard only supports external but you can support however you want” → Zibi’s concern is that the tool author can’t actually go back to the in-message selector way, so a question is should the standard be the translation, or the party that implements only uses the standard for “transport”, or canonicalization? + +LHS: Zibi said something useful at the Unicode Conference re: being stuck with this standard so we need to get this right! Also languages might not change, but technology might change. One way of thinking about it is “how hard would it be to add this later?”...it seems easier to add internal selectors later than take them out later? Another question is how important “lossless” conversion is, as long as it’s functionally equivalent…? + +ECH: If you had a simpler data model for “in transit”, then it’s up to the system that supports internal selectors, that system would have to handle conversion + +ZIB: I agree with Mihai, and Luke somewhat responded: the way we respond to software might change very much even though languages might not change much. It’s hard to predict what aspects will be necessary. Languages will always be more complex than we can encode them, and it will always be “lossy” for human languages, what are we OK not including from human language? This is a very challenging question when we are trying to handle the future. I agree that we should be more humble about technology than languages. How important is lossless? It’s important if we want internal selectors to be a “first class citizen” not a “second class citizen”. What does the layer look like? It looks like it accepts internal selectors, but it doesn’t support it right now, with a flag that turns it off and on? If the data model doesn’t allow for it at all, then adding it later would require MF 3.0 + +MIH example in chat: +message = You deleted {fancyFiles, reference} from this folder. +fancyFiles = {fileCount, plural, one {# file} other {# files}} + +EAO: A couple of things, a bit related, 1) we do need to get things right in 2.0, we shouldn’t have a “Messageformat Basic” with too many variants, we should have 1 that we are publishing. 2) If we decide we don’t allow internal selectors, are we allowing internal function calls with things inside them? We might have things that look like an internal selector but technically aren’t (like MIH’s example). 3) Languages don’t change quickly but the way we represent them can. For example, the CLDR data for the rules on the pluralization of ranges (e.g. “0-1 items”) isn’t well-identified for many languages. We might later identify that there’s a part of a language that ought to be represented in the standard but isn’t, and that part of the language may require changing some of our presumptions. + +NIC: similar to list selectors! + +RCA: to close → when is the best time for the next Task Force meeting? Next Monday? → general agreement + +STA: What I was trying to get at about lossy vs. lossless conversion: it’s also a spectrum, in some cases it’s OK...when you shift things to users, it’s OK to convert lossily since you’re not coming back (just being displayed to users). So in this case, it’s OK to be lossy, but if you care about round-trip to tooling then lossiness is more of a problem. If we go with the top-level approach, then other parties might come up with “ASCII Latin 1 Exetended” to handle what we don’t. + +RCA: ??? versioning + +ZIB: Imagine a company that decides to use internal selectors, and on every step of their tools, it gets converted to external selectors and then back, and then it changes, and on roundtrip it gets lost. So this company would have a papercut and realize that + +MIH: A typical use case is that a developer writes a message, and then it gets translated into X languages, and they all go into version control, so the translations are “normalized” but not the English one. + +EAO: ...but if the translator fixes the English version, then they’d need to change it + +RCA: but a conversion needs a new version + +MIH: like using spaces vs. tabs → if I have a tool that does the proper style for my company, if I don't respect the style, and it changes behind my back, then it is what it is. If someone changes a translation directly, it’s OK (although they shouldn’t edit things directly) + +EAO: Lossiness is the wrong question to be focusing on, only valid if we do transformations, which are only valid if we have either internal selectors or references. We could say that it’s possible to move back & forth but maybe not losslessly. + +MIH: I believe that we should allow external references. It’s true that you could do “horrible” things like concatenation, but developers will do this anyway. By having a standard way to represent this, I’ve allowed explicitly in the data model/syntax so I can put tools around that, maybe even Lint / detect that you’re doing something problematic. + + + + +## Quora React Localization Framework + +IUC44 Presentation + +## MFWG - Stakeholder definition and debate + +## Summary/Review : Unicode Conf slide deck for MFWG + +## Quora React L10n Framework data model alignment +