Create notes-2021-05-17.md

romulocintra · mihnita · commit 56a48ef3e83f · 2021-07-10T16:19:26.000-07:00
Close #171
diff --git a/meetings/2021/notes-2021-05-17.md b/meetings/2021/notes-2021-05-17.md
@@ -0,0 +1,226 @@
+[Automatic Transcription](https://docs.google.com/document/d/1qK6tKs5HwysOad0ch6eqAfNo-x2Zvp5ffQMTcY-4OsY/edit?usp=sharing)
+
+
+#### May 17, Meeting Attendees:
+- Romulo Cintra - CaixaBank (RCA)
+- Elango Cheran - Google (ECH)
+- George Rhoten - Apple (GWR)
+- Eemeli Aro - OpenJSF (EAO)
+- Erik Nordin - Mozilla (ETN)
+- Mihai Nita - Google (MIH)
+- Greg Tatum - Mozilla (GPT)
+- Standa Rygal - Expedia (STR)
+- Staś Małolepszy - Google (STA)
+- David Filip - Huawei, OASIS XLIFF TC (DAF)
+- Luke Swartz - Google (LHS)
+- Zibi Braniecki - Mozilla (ZBI)
+- Jean Aurambault - Pinterest (JAU)
+
+
+## MessageFormat Working Group Contacts: 
+
+- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)
+
+## Next Meeting
+
+May 31, 11am PST (6pm GMT)  - Extended
+June 21, 11am PST (6pm GMT)
+
+## Agenda
+- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/171)
+
+### Moderator : Romulo Cintra 
+ 
+## Q&A - Summary Documents for each proposal
+ 
+Proposal that Leans Towards Making Things Possible
+Question (LHS): in https://github.com/unicode-org/message-format-wg/blob/master/guidelines/goals.md we say a “non-goal” is “Support all grammatical features of all languages. Instead, focus on features most commonly encountered in user interfaces, textual, graphical and spoken ones alike.” but this proposal states that “We believe that the role of the data model is to express the breadth of human communication. To achieve that end, it’s more important to make their expression possible than to establish strict or artificial limits on the complexity of those messages.” ...these two things seem to me opposed. Do the proposal authors want us to change the MFWG’s goals/non-goals, or am I misinterpreting how the two things can align?
+Comment/request for details (LHS): This proposal says a lot of things that I can get behind like (I’m paraphrasing), “Let’s design something that should work for 10-20 years” and “Let’s not let existing patterns/technologies unnecessarily constrain us”...BUT the devil’s in the details, and I think the tradeoffs matter. For example, I think we should seriously consider breaking with existing patterns/technologies (e.g. XLIFF) but only if it’s worth the cost. This proposal doesn’t give examples of things that it thinks are/aren’t worth the cost, so it’s hard to evaluate. Can the proposal authors be more explicit or give examples?
+Questions (LHS): I’ve added a few other comments/questions in the doc
+Proposal that Leans Towards Simplicity
+Comment/request for details (LHS): similar to the other proposal, it’s hard for me to get a handle on the tradeoffs without examples (I thought these summaries are supposed to mostly stand alone so that the wider MFWG can read them & respond...is the expectation that everyone dig into the detailed code to understand how the 2 proposals differ? if so, can you give examples for why you think 1 proposal is better than the other?)
+Put another way: the current documents seem like “philosophical statements” and thus it’s hard to evaluate them, especially compared to each other (and especially because my instinctive response is “well of course we’d want to find a good compromise between simplicity and power/flexibility, understanding the tradeoffs”...but the specific tradeoffs/compromises really matter!)
+Questions (LHS): I’ve added a few other comments/questions in the doc
+ 
+ 
+ 
+ 
+ 
+ 
+### [Strawman Proposla for an XLIFF 2 MessageFormat Module](https://docs.google.com/document/d/1D702OBAzT-Crb9XXUiZYJnFO9Yq5duRy4Zc3Br6JwRU/edit )
+ 
+EAO: This is based on the data model prototype that EAO and ZBI have been working on. The proposal is complete based on the XLIFF 2 documentation. I'll present how this connects to MFWG model. This is based on some understanding on XLIFF 2.
+ 
+What we are adding here are having these placeholders to representing a value, a variable reference, or a data reference. You can have `hello {foo}`, where `foo` is looked up. The second place where these placeholders are used is a variable, where this refers to a group of messages, and the selector is must be used to select a message out of the group.  See excerpt starting with `<messageformat>`. In another example, within a `<file>` element, messages exist within the `<unit>`, and they are referenced through a message id that is stored in the `<mf:messageformat>` element, which is in turned referenced within a `<ph>` unit of an XLIFF `<segment>`.
+ 
+There is code to convert Fluent and MFv1 syntax to and from XLIFF according to the MFv2 data model proposal.
+ 
+The code is available in open source, with e.g. a test suite [here](https://github.com/messageformat/messageformat/blob/mf2/packages/xliff/src/mf2xliff.test.ts).
+ 
+GRH: It looks like a good start. I had a question. Let's use Arabic as an example because I like to go for the hard examples. There's grammatical gender for the noun, and there's grammatical gender of the pronoun attached to the noun. How would you handle such a scenario?
+ 
+EAO: In the example between Finnish and English, you have to work with both case and gender as inputs.
+ 
+DAF: I'm seeing this for the first time, I think it is well designed from teh XLIFF 2.0 point of view. I think it will be interoperable even if the agent doesn't share the namespace. You assume that you will be able to used the namespaced attributes inline, which is a core assumption. There is a caveat which is that until you achieve the status of a module, you won't be able to insert your namespaced attributes into core inlines.
+ 
+EAO: Clarify what you mean by core inline?
+ 
+DAF: You assume that you can use this namespace in placeholders, etc., and `<ph>` is a core inline.
+ 
+EAO: `<ph>` accepts open inlines.
+ 
+DAF: Until you register --
+ 
+EAO: I have checked with the spec when writing this, but please tell me offline where in the spec it says so.
+ 
+DAF: Sure. I want to hear what MIH says. I think it respects the core data model.
+ 
+(via chat) http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph
+Constraints
+ 
+The following XLIFF Module attributes are explicitly allowed by the wildcard other:
+ 
+- attributes from the namespace urn:oasis:names:tc:xliff:fs:2.0, OPTIONAL, provided that the Constraints specified in the Format Style Module are met.
+- attributes from the namespace urn:oasis:names:tc:xliff:sizerestriction:2.0, OPTIONAL, provided that the Constraints specified in the Size and Length Restriction Module are met.
+No other attributes MUST be used.
+ 
+MIH: Overall, it looks fine to me. Maybe some of the areas that I would discuss would be just typical XML discussion, like why do you decide to make something an attribute versus an element. But it's no biggie. The other one comment I have is that I already have a proposal for handling gender, plural, and select for XLIFF, and I think I've shared it already. Maybe we should unify? Did you look at it already? Did you deem it not good enough?
+ 
+EAO: I already looked at the GPS proposal, and it links up with the discussion on ____.
+ 
+MIH: At first look, there's not anything that I can't reconcile. I'm cautiously optimistic because at the first look of the data models, I thought we could reconcile those easily, and here we are several months level.
+ 
+STA: A question about XLIFF -- what if the source language doesn't have a select, but the target language needs a select?
+ 
+DAF: XLIFF would normally be created under a corporate structure as source-only. Target would have to be created later. Target is not required, which allows content creators to create read-only content. The translators are not allowed to modify source, but they are totally free to create structures for target languages.
+ 
+STA: It seems that you want to insert a group, then, to capture all of the messages in the target language.
+ 
+DAF: You have to know what grammatical features that you are implementing.
+ 
+STA: A really quick comment -- I like the fact that the default value for the selector is specified on the selector rather than on the branch.
+ 
+DAF: This is what makes it core interoperability friendly. You don't have to navigate too far into the data structure itself.
+ 
+GRH: I know that there are frequently languages that are more linguistically variable. For instance, if you translate from Mandarin into English, you might find a word that could be singular or plural. If there is a translator to produce functions for the language that they are creating, independent of the content creators, that would be useful.
+ 
+MIH: I wanted to explain what DAF already explained with different wording, and I think it will answer GRH's comment. I do think that there is no way for anything down the line, after a XLIFF doc was created, to create new units. For example, if you create a Chinese unit, there is no way to insert a new unit in the translation. You have to know what your target language is at time of XLIFF creation. So I will have no idea where the XLIFF doc goes, whether it needs gender selects, etc., or whether it will be passed to MessageFormat.
+ 
+EAO: I think what we are identifying here is a possible weakness with how we work with XLIFF. At the very least, I have code that shows that this is not a limitation with how we translate to and from XLIFF. Also, I thought I would mention something that STA mentioned about selectors. I also posted a [PR for a clear spec for how case selection could work](https://github.com/unicode-org/message-format-wg/pull/170), which describes how it will work with the model that ZBI and I have worked on.
+ 
+STA: Coming back to what DAF and MIH said, does this imply that the XLIFF producer needs to be aware of nay functions, including custom registry functions that we were once talking about, that might introduce variability?
+ 
+MIH: Yes.
+ 
+STA: Is that new?
+ 
+MIH: It's been true forever. The whole l10n world works on the idea that you give me a simple message, and I'll give you a simple message back, not something programmatic.
+ 
+DAF: It is modular now, so if you can ask for the right thing, you will get it back.
+ 
+STA: So it would be the responsibility of the user to use the producer such that you get teh right XLIFF.
+ 
+DAF: Yes, to use the extractor or agent. We can define a new type of agent to do this. We can tell translators to not delete placeholder or some other item. If we don't produce that, then the translator cannot work with that.
+ 
+STA: If we have a registry of functions, then they will have short declarations that we can use.
+ 
+DAF: Yes, and we can define a new type of agent to do that. But it would always have to be target-language specific. You cannot work without a specified target.
+ 
+GRH: At least in Siri's use cases, we already use XLIFF. It works fine, and we can specify language-specific grammatical features. It works pretty well, it saves the translator to write out every single possibility, and allows the source creator to not predict every grammatical detail of every possible target language. So my preference is to have something similar.
+ 
+MIH: I'm really interested in how that works. Maybe not right now.
+ 
+GRH: Sure. I'm trying to say that I don't want to put the burden on the source author, which is good for long-term viability.
+ 
+MIH: Sure. Let's say that you have source in Arabic and you need the gender of the pronouns, are you saying that the translator doesn't need to include that in the translation? I'm not quite understanding, but maybe we can discuss over email.
+ 
+DAF: There is nothing programmatic that you do in XLIFF with a node, right now.
+ 
+RCA: I have a question, about the match of this proposal with what Mihai said earlier. Is there an opportunity to achieve that?
+ 
+EAO: Probably. It depends on the exact specifics of the data model that we end up with. But given that in all our specs, this is based on a data model that is more complex than that one, there is no reason why this can't work with the other data model.
+ 
+RCA: The other question here is what are the next steps? Do work on the data model and review later on?
+ 
+EAO: One reason that I went ahead and did this all thing is to have a proof of concept that all of this is possible, and have it map to the data model structure that we have. Specifically, if you all can review it and find things that can be improved, then we can make this a better proposal, but I am not aware of needful things to work on this.
+ 
+MIH: I think it's really beneficial to look into this kind of mapping XLIFF, whether we choose this data model or the one I have been working on. The sooner we look, the sooner we bump into issues of what works and what doesn't.
+ 
+DAF: A small pointer, there is a related ideas markup. The related features was added in XLIFF 2.1. Through ITS, you can specify the intended target languages. ITS is the markup for internationalization. That's just a pointer if you're looking into it.
+ 
+RCA: Can you share progress on implementing the features in the data model, how is that going?
+ 
+MIH: We discussed implementing features, I did that. I also added an implementation in Java, which is also in the `experiments` branch, [link here](https://github.com/unicode-org/message-format-wg/tree/experiments/experiments/data_model/java_mihai). It implements everything we talked about. It implements some ideas that I came across as I was implementing it. The tests directory in the source code folder shows how everything works. In FancyListTest, the testListWithItemMultiProcess test shows how 3 functions are chained. The first name of 3 person objects are retrieved, the dative form in Romanian is computed with another function, and then it is combined into a list with another function.
+ 
+The GetPersonName class shows what it takes to implement a custom function. They are all functions, so they are all equal, one is not more special than another.
+ 
+RCA: Is this `personName|grammarBB` a function too?
+ 
+MIH: Yes, I'm not fully happy with calling this a function, but is a something that is callable at runtime.
+ 
+RCA: Any comments?
+ 
+EAO: I noticed in your dynamic message reference example, the message of the browser messages is flat. Did you want to support deeper links with e.g. dot notation, or something else?
+ 
+MIH: Our structure doesn't need to have that structure.
+ 
+EAO: I thought you had message groups that can contain message groups.
+ 
+MIH: Yes. I don't want to tie the message files to the message ids. I don't wnat to encode where teh messages actually live. They may live in source files, or a database, or your Android or iOS resources. So the exact format of the message id doesn't matter, if you want to segment them with dots or slashes or colons, you can figure out how to interpret the id and retrieve the message. The RescManager is the abstraction on top of fetching the message, I don't want to expose that in the data model.
+ 
+GRH: I think this is a good first step. There's frequently a lot of cases where you need to reduce the burden where you require the translators to provide every single possibility. For some words, you provide a definite article for a specific word. Or for prepositions like "on Hawaii" vs. "in Calfornia" because Hawaii is an island. Maybe you had that in "BB"?
+ 
+MIH: Yes, I named it GrammarCasesBlackBox because I don't wnat translators to get hung up with how things are implemented.
+ 
+GRH: I have a function in house to do such work, so maybe I don't want to name it "block blox".
+ 
+MIH: No, of course you don't, you want to name it something meaningful. When you implement it, all you do is point it to your own logic function.
+ 
+GRH: How does this work? Do the translators do this?
+ 
+MIH: You just implement the function, register 
+ 
+GRH: Can you do this for multiple inflection engines?
+ 
+MIH: Yes, because in theory, the data model doesn't know and doesn't care about the implementation of the functions.
+ 
+GRH: It's kind of like namespaces.
+ 
+MIH: Yes, it's kind of like namespaces. You have a black box in the background that does the linguistic work, you just expose which switches with labels and how to use them. XLIFF will be written to indicate whether an argument (?) can or cannot be deleted. But you cannot add any new switches.
+ 
+GRH: The translator cannot add new switches?
+ 
+MIH: Yes, this goes back to the discussion about having a schema for the functions in the registry. You c
+ 
+GRH: I like this idea, there are a few things that I would like to expand on it. Maybe there needs to be advertisement of supported functionality, because maybe one function has inflection of nouns and maybe another function has inflection of verbs.
+ 
+I recognize that I will have several comments on this, what is the best way to add comments?
+ 
+RCA: Maybe we can create a pull request that we don't merge, so that we can have comments for specific lines.
+ 
+EAO: You mentioned using dotfiles and other formats for representing the messages. And that brings up the topic of the syntax that we use for MFv2. So maybe we can bring that up in our next extended meeting. Would that be okay?
+ 
+RCA: Sure I will add an item to the next meeting agenda.
+ 
+EAO: In our data model, I wanted to point out a nested message for dynamic message references where the messages references are more of a path (sequential data structure) to index into the nested message structure. I also wanted to show a rough implementation of formatToParts and a hacky attempt at a function that can correct the article in English "a" / "an" depending on the main noun.
+ 
+MIH: We should be able to do these kind of corrections across formatting tags, ex: `an <b>hour</b>`. Also, in the final formatToParts, we should be able to represent overlapping fields.
+ 
+RCA: I actually found more similarities than differences between the data models, to be honest, so I think we are on a good path. I know that we want to start discussing the syntax during the next meeting, but I don't want to lose the focus on choosing or unifying a data model.
+ 
+EAO: One reason I proposed working on the syntax is because I think the syntax informs the data model.
+ 
+MIH: My take on the syntax is that I don't think it affects much of our decisions, which is why we are working on a data model, not a syntax model. I hope the JavaScript world makes its implementation idiomatic.
+ 
+EAO: Just a short observation, this would require us to go back and redefine our deliverables if we don't think we don't need a syntax.
+ 
+MIH: No, I am not saying we should go back on that, I'm just saying that I'm not very opinionated.
+ 
+### Proposal that Leans Towards Making Things Possible
+ 
+ 
+ 
+### Proposal that Leans Towards Simplicity
+ 
+ 
+