Skip to content

Create notes-2020-03-23.md #71

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 6, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
190 changes: 190 additions & 0 deletions meetings/2020/2020-03-23.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,190 @@
#### March 23 Attendees:
- Romulo Cintra - CaixaBank (RCA)
- David Filip - ADAPT Centre, Trinity College Dublin (DAF)
- Pu Chen - Netflix (PCN)
- Staś Małolepszy - Mozilla (SMY)
- John Watson - Facebook (JRW)
- Elango Cheran - Google (ECH)
- Zibi Braniecki - Mozilla (ZBI)
- Jan Mühlemann - Locize (JMU)
- Mike McKenna - PayPal (MGM)
- Shane F. Carr - Google (SFC)
- Eemeli Aro - OpenJSF (EAO)
- Elango Cheran - Google (ECH)
- Jan Mühlemann - Locize (JMU)
- Rafael Xavier - Paypal (RXR)
- Nicolas Bouvrette - Expedia (NIC)
- Mihai Nita - Google (MIH)

## MessageFormat Working Group Contacts:

- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)


## Next Meeting

April 20, 10am PDT (6pm GMT)

## Agenda
- [ Agenda on Github ](https://github.com/unicode-org/message-format-wg/issues/57)


## Presentations
### Chair Group Announcements

Chair Group Guidelines [Link](https://github.com/unicode-org/message-format-wg/blob/master/guidelines/chair-group.md)

Working Guidelines Draft about Chair Group processes [Link](https://docs.google.com/document/d/1U6PiFopoOqPyAgJ_KSzEfZOyKIcCGA-qqb_PyRrv_Mc/edit?usp=sharing)

Easy way to volunteer to participate in Chair Group : [Link](https://github.com/unicode-org/message-format-wg/pull/70)

RCA: Working group should be more than one person, it should be led by more people, people who are regular attendees. Have created a Chair Group document to describe responsibilities, now added to MF WG repo, link is above.

Proposal will help handle coordination, help make proposals in offline mode between the monthly meetings to help drive momentum so that we accomplish more outside of our 1 hr / month meetings.

NIC: I would be interested in joining as a best-effort member but given travel restrictions and COVID I'm not sure if I can fully commit yet.

DAF: It sounds like a steering group, but we can keep the name Chair Group.

RCA: It's for monthly meetings and sync up.

DAF: How many people do you foresee?

RCA: Not sure, did not know if there would be volunteers. If everyone volunteers, that might be too many. Maximum maybe ⅓ of total members. But maybe 20 people too many.

DAF: I'm intrigued, but also new to the group, so I'm hesitant to volunteer.

EAO: Steering group sounds like a more appropriate name.

SFC: We could name it something. Reason why I suggested "Chair Group" to make it clear that all of us as a full body during the 90 min monthly meeting is when decisions are actually made. Whereas Steering Group / Committee sounds like they make all the decisions and only receive feedback. Chair Group is modeled after a city council where the staffers do all the legwork and the council meetings are where decisions are made.

EAO: "Chair Group" sounds like a group that makes decisions, but perhaps this is all semantics. I don't think the name matters all that much.

RCA: Do we have consensus?

ZBI, SMY: +1

RCA: Procedures and guidelines of how this will work are described in the document in the repository.

## Open Issues

### Establish the decision making process (#58)

RCA: Do you have any comments on this?

(silence)

RCA: Okay, I will create the decision making process doc.

### Goals and Non-Goals (#59)

SMY: This is related to the discussion from last months, where there are many takes on what we should be doing. It's been a long month. I remembered how we said how important it is to talk about the data model, interop with XLIFF, etc. I think the charter summarizes pretty well the general sentiment, but I think it would be interesting to expand more on what it means to supersede MessageFormat, why it warrants a new solution to be created. Because it will help us understand the tasks in front of us. So I'm hoping we can agree on a core set of goals to help us navigate.

DAF: I think this is really important, the non-goals are also very important. Can we discuss now?

SMY: I'm not sure if we discuss now or during it in a chair group.

DAF: Scope, including goals / non-goals, is very important for chartering a group.

SMY: Maybe that is a reason to have this discussion asynchronously on the repo issues.

RCA: We don't have a packed agenda for this meeting, so maybe we have time to discuss now and move towards consensus.

NIC: We have been discussing many issues for weeks. How are we planning to triage them and get consensus?

SMY: Goals and non-goals tell us scope. Design principles tell us how we should go about them. Requirements gathering has been helping us gathering the finer details of what we are building. Goals are higher-level. I hope that helps, the distinction can be tricky.

NIC: I think it's a very good approach. Practically, are we having a side group who;s going to work on this?

ECH: MessageFormat for ECMAScript has been a long-standing issue. There's a glut of libraries for this in JavaScript, but not as much in other languages (?). Being able to provide data literals in the language of JavaScript: that seems to be something users want to use, which MessageFormat doesn't allow. And making a solution for the larger

ZBI: When I did presentations on Fluent in 2017 and 2018, I got a lot of questions about MessageFormat 2.0. The temperature of the room was that MessageFormat mostly works, but has certain deficiencies. I've noticed that presentations early on were focused on Java and C and server-side technologies. However, the industry has largely been moving toward client-side and hand-crafted solutions. MessageFormat is built on 2002 technologies.

SFC: I think going through the specific bullet points in issue #59 is a very good use of time. The list Stas put together is an example of the kind of thing that the chair group would put together.

SMY: To give some background, these five points largely came from the charter. I rephrased them a bit to make them sound more like goals. I specifically put "messages" in the bullet point because there's been a lot of discussion on whether we are designing for messages or a file format. So this list of goals is not exhaustive.

DAF: I would like to start with the non-goals. I agree that we shouldn't design a general interchange format, and not support every grammatical feature. ??? About many-to-many, the industry says that you should be working on a pair. Situations where you're dealing with many-to-many are not always solvable. About the canonical syntax, I think we should focus on the data model, rather than syntaxes. Something different will work for Java versus JavaScript. For number 5, I don't think we should mix up MessageFormat with formatting.

ECH: I agree with everything DAF just said. Now that I'm thinking about the things he said for the goals and non-goals, I want to give some context on where the goals may have come from. We've been talking about the difference between data model and syntax. Many-to-many translations, at least in my work recently on a CAT tool, there are use cases where, especially when you are translating for a voice assistant, you may have different variants, like 3 variants in a source language and 5 variants in a target language.

ZBI: I think that this is a fairly sensitive point. I think that point 5 is the most important one for me. The criticism I heard so far from ECH and DAF is about dismissing the need to build a full-stack localization system. Only looking at the source model, and not thinking about how the UX side, is going to give us another MessageFormat in 5 years. There are a lot of requirements across the boundaries. Something like language switching at runtime on the mobile phone may influence our data model. If we think of ourselves as living in the silo of designing a data model, and anyone can run with it, we will make us design something that is much more limited, such that people need to extend it. If we end up with a turing-complete data model, sure, but I think it's better to focus on use cases and making sure that our API supports them. I think we should think across different layers and not isolate ourselves only to a single layer.

JMU: I agree with ZBI

EAO: I agree with ZBI

SMY: I have a question for David. On goal 5, you weren't sure if we should conflate combining the syntax with storage? Can you clarify?

DAF: There should be an API… there are 3 layers. Data model, syntax, and API. If this is all in scope, that's fine. Interpreting the message… my problem is with formatting. The formatting should be implementation-dependent. It conflates goals with non-goals.

SMY: When you say "formatting", you mean the runtime process of interpolating strings? Do you mean for dates?

DAF: "formatting" is a fuzzy word to use. We cannot ignore UX. We need to be aware of use cases.

NIC: Can you clarify your terminology?

SMY: When I say "many to many", I mean that the source string may have multiple variants that correspond to the variants of the source language, which may not be applicable to the translation. Say the source language is English, which it often is in our industry, and it requires plurals in a certain string, but a translation does not. So that's two variants to one variant.

NIC: So it's like plural rules today? You can have different rules for different languages?

SMY: Yeah. Plurals are one example, but other interesting grammatical features can be expressed in the same model.

ZBI: +1 to SMY

ECH: My interpretation of "many to many" is a bit more fuzzy than that.

RCA: All, please add your own goals and non goals to the topic.

ECH: As I understand it, the data model is what goes as the input to the API that gets implemented in each implementation. It's more about that you describe the data to represent all the relationships of the input correctly, and the implementations choose what the function call arguments and types look like. But data is equal to itself, it is not a language, nor one that could become Turing complete.

SFC: Regarding the second and third bullets in the non-goals: I do actually think that it should be in scope to support arbitrary grammatical features. For example, we should at least support gender, inflection, and plural. The mechanism for determining the correct gender, inflection, or plural should be implementation-dependent, but the syntax and data model should support a plug-and-play system so that users can plug in an inflection model and then be able to use messages that were translated in an inflection-aware way.

SMY: What I meant by "transforming parts of speech" is that transforming strings with AI is not in scope; so I had something in mind where the translator has to put the cases in manually. So I would like to discuss this further.

SFC: I think syntax is in scope, after we decide on the data model.

DAF: I agree; I was just saying that the data model and syntax should not be conflated. Data model first.

DAF: About many-to-many, there are not necessarily the same number of source and target language messages. It is in scope, but it should be clarified and grounded in CLDR data. The combinatorics should be deterministic..

SFC: I think a goal should be to separate authoring format to runtime format. Authoring format is what the programmer writes. They check it into source control. Runtime format has a collection of strings and gets interpolated at runtime.

MIH: We should discuss this separately, since I disagree with certain points.

SMY: Yeah, I think runtime versus authoring format is worth discussing further. I didn't have it as a goal here because I pulled the goals only from the charter.

EAO: My understanding is runtime format is a re-representation of the data model within the runtime. Or is it something else?

SMY: That is my understanding, too.

MIH: If you look at the formats now, there's the one the programmer writes: it might be in English, it might be in French if you're a French developer, etc. That gets translated and might end up in the same format or a different format. You don't need everything, like comments. Those files can be compiled into a binary form, etc. Even JSON might be compressed. At runtime, it's pulled from the binary form and passed to the API. So in many cases, the authoring and runtime formats could be the same.

Maybe we should have an agenda item in the future to go through and discuss the terms in the glossary thread that Elango started.

NIC: Every little word can have a different meaning for people in this list. If we continue like this, if it's the right approach.

RCA: I agree; we should continue moving on this topic. The chair group should make a definitive list.

NIC: On top of that, all these things are still very high-level. I don't have a solution, but I don't know if what we're currently doing is going to work.

SFC: What we're doing today is giving specific critiques on SMY's bullet points. SMY or the chair group should apply the feedback and come back next month with a revised list.

SMY: I think the last 2 months of discussions have been good. Requirements gathering broadened

SMY: A while ago, we talked about API, and whether it would or would not be in scope. I think this approach doesn't give us a concept of error scenarios. What happens when something goes wrong, like an input argument is missing? Leaving that out as an undefined behavior wouldn't be a good choice for our working group. So I think we should have a definition of a step-by-step process of how formatting works.

ZBI: +1

SFC: +1

ECH: +1

MIH: The comment about "many to many" translations was about well-specified in CLDR. But I think it's useful to be more free-form. We have use cases in Assistant where there may be several replies, like "it's raining cats and dogs", "bring your umbrella", etc. So there was a need to represent variations of the same message. (please fill in)

RCA: Time's up!

EAO: Data model should be resource-centric, even if canonical syntax isn't.

[Full discussion and chat notes](https://docs.google.com/document/d/1icPqRiGXkbIGE46Y9H1hh4fZt9BL-PbJqoJKg7e9oA4/edit?usp=sharing)