Commit e21091a

romulocintra authored and mihnita committed
Create notes-2021-05-31-extended.md
1 parent 56a48ef commit e21091a

1 file changed: +148 -0 lines changed

@@ -0,0 +1,148 @@
1+
[Automatic Transcription](https://docs.google.com/document/d/1DN9BDkJqtnY3UoI28k3PYUcLhsjlhk3fJK5J2LC_Atk/edit)
2+
3+
### May 31, extended meeting: Attendees
4+
- Romulo Cintra - CaixaBank (RCA)
5+
- David Filip - Huawei, OASIS XLIFF TC (DAF)
6+
- Daniel Minor - Mozilla (DLM)
7+
- Eemeli Aro - OpenJSF (EAO)
8+
- Richard Gibson - OpenJSF (RGN)
9+
- Elango Cheran - Google (ECH)
10+
- Zibi Braniecki - Mozilla (ZBI)
11+
- Staś Małolepszy - Google (STA)
12+
13+
14+
## MessageFormat Working Group Contacts:
15+
16+
- [Mailing list](https://groups.google.com/a/chromium.org/forum/#!forum/message-format-wg)
17+
18+
## Next Meeting
19+
20+
June 21, 11am PDT (6pm GMT)
21+
July 5, 11am PDT (6pm GMT) - Extended
22+
23+
24+
25+
### Moderator : Rômulo Cintra
26+
27+
Related issues :
28+
[#68](https://github.com/unicode-org/message-format-wg/issues/68)
29+
[#48](https://github.com/unicode-org/message-format-wg/issues/48)
30+
RCA: How should we start this discussion?
31+
32+
EAO: My understanding is that we want to define a particular syntax for MFv2 just like we have for current MessageFormat. That would be parseable and handled by tooling.
33+
34+
ECH: Do we say "the canonical syntax" or "a canonical syntax"?
35+
36+
EAO: It says "the syntax".
37+
38+
ECH: That's fine, so that means to me that we maintain one syntax, and there will be other syntaxes, but there will be only one data model, which is the important part to me.
39+
40+
RCA: Do you have any examples in which you have started looking at them?
41+
42+
EAO: Those examples are the JSON files and the Fluent examples which can be read by my prototyping code. But we need our own syntax that supports features that cannot be supported by current MessageFormat or Fluent, like having selections on messages using more than 1 selector arg to define a selection case.
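To illustrate what selecting on more than one selector argument could look like, here is a minimal sketch. The `MultiSelectMessage` shape, the `pickCase` helper, and the use of `"other"` as a catch-all key are assumptions made for this example only; they are not the prototype code EAO refers to, and the selector values are assumed to be already resolved to category strings.

```typescript
// Hypothetical shapes for illustration; not the working group's data model.
type MultiSelectCase = { keys: string[]; value: string };
type MultiSelectMessage = { selectors: string[]; cases: MultiSelectCase[] };

const inboxMessage: MultiSelectMessage = {
  // Two selector arguments jointly define each selection case.
  selectors: ["count", "gender"],
  cases: [
    { keys: ["one", "feminine"], value: "She has one new message." },
    { keys: ["one", "other"], value: "They have one new message." },
    { keys: ["other", "feminine"], value: "She has new messages." },
    { keys: ["other", "other"], value: "They have new messages." },
  ],
};

// Returns the value of the first case whose keys all match.
// Assumption: the key "other" matches any value for its selector.
function pickCase(
  message: MultiSelectMessage,
  args: Record<string, string>
): string | undefined {
  const found = message.cases.find((candidate) =>
    candidate.keys.every((key, index) => {
      const selector = message.selectors[index];
      if (selector === undefined) return false;
      return key === "other" || key === args[selector];
    })
  );
  return found?.value;
}

console.log(pickCase(inboxMessage, { count: "one", gender: "feminine" }));
```

In this sketch the cases are ordered from most to least specific, so a first-match rule is enough; a real design would have to decide how ordering and fallback work.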
43+
44+
RCA: What is our starting point?
45+
46+
EAO: I think we can start with select messages. The question is whether the syntax for the selection should be embedded in a message, or whether it should be part of the structure of the larger message that approaches a file format.
47+
48+
ECH: What does "approaches a file format" mean?
49+
50+
EAO: MessageFormat defines a simple message format and selection message. But Fluent designed its own format for representing collections of messages.
51+
52+
RCA: STA, could you share the principles or drivers that made Fluent come up with the new format?
53+
54+
STA: Well, EAO, was that a question of a single version or a collection of messages? Or was the question how Fluent compares to MFv1?
55+
56+
EAO: MFv1 defines a simple message and a selection message. But if we define a collection of messages in a way that makes it clear how selects happen, that could work for a simple message as well. But
57+
58+
STA: Are you suggesting that we have a hierarchy of messages?
59+
60+
EAO: I'm saying that this is a decision that we should address and find an answer to.
61+
62+
ECH: I don’t think that supporting a collection of messages implies that there is a file format that needs to be designed. I don’t think it needs to be a file either.
63+
64+
STA: Talking about collections is useful. I’m not sure we need to solve syntax right now, that is ahead of us.
65+
66+
ZBI: I think that ECH is conflating two concepts. I don’t think we should be narrowing ourselves and we need to make sure we can express our data in a non-file format. But at the same time, we do need to define a file format to target the web. Once we move beyond pure JavaScript, we need to think about how what we’re doing will be used by further projects without trying to scope creep our current project. I hope whatever we design will be a good candidate for a localization system for HTML, and that will require a file format. But we need to recognize that it isn’t the only way to store data.
67+
68+
ECH: Having a grouping of messages is something [we agreed upon early](https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/ts_data_models_name_mapping.md) in the data model huddle meetings. I think if you can create something that can be serialized to a stream of bytes, either a message or a group of messages, then where it is persisted is a detail exterior to MF2.0. There has to be some syntax, at least one syntax that we maintain. Which syntax is not important: whatever is commonplace, requires the least amount of adoption effort, and can express associative data (maps) and sequential data (lists) is sufficient. Pretty much any syntax does that. JSON is common and does that; we’d have to define a schema.
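As a rough illustration of "maps plus lists, with a schema", the sketch below checks parsed JSON against a hand-written type guard. The `GroupData` and `PatternPart` shapes and both guard functions are hypothetical names invented for this example; they are not taken from the linked data model document.

```typescript
// Hypothetical shapes: a group is a map from message id to a list of parts.
type PatternPart = string | { variable: string };
type GroupData = { [id: string]: PatternPart[] };

// A part is either literal text or a reference to a variable.
function isPatternPart(value: unknown): value is PatternPart {
  if (typeof value === "string") return true;
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { variable?: unknown }).variable === "string"
  );
}

// A group must be a plain object whose values are all lists of parts.
function isGroupData(value: unknown): value is GroupData {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  return Object.values(value).every(
    (parts) => Array.isArray(parts) && parts.every(isPatternPart)
  );
}

const serializedGroup = '{"greeting":["Hello, ",{"variable":"name"},"!"]}';
const parsedGroup: unknown = JSON.parse(serializedGroup);

console.log(isGroupData(parsedGroup));
```

A hand-written guard stands in here for whatever schema mechanism would actually be chosen; the point is only that a map-and-list structure is straightforward to validate after parsing.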
69+
70+
EAO: I think we’re conflating two different things. One is the canonical source for how you write a message, e.g. in Fluent or MF1.0. The second is the expression of some set of messages that has been parsed, and how that is to be represented. These are not the same and are optimized for different purposes. How well do we support the expression / embedding of MF2.0 messages into data structures that are used by traditional systems that are technically capable of expressing a single message? Is this concern important enough that we should discard the possibility of using the structures and keys to drive how we do select?
71+
72+
ECH: I don’t understand the distinction between the serialized format and the representation in memory.
73+
74+
EAO: If I’m writing a message, I want to write it in something humane and easy to write, something terse and easy to read and usable that way. Once this is parsed into the data model, the expression of this can be transferred in a different format that is useful for computers talking to computers. E.g. using YAML vs. using JSON.
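The distinction between a hand-written source and a machine exchange format can be sketched as follows. The one-entry-per-line `id = text` syntax and the `parseTerse` function are toy inventions for this example only; they are not a proposal from the meeting and not Fluent or MessageFormat syntax.

```typescript
// Toy, hypothetical source syntax: one "id = text" entry per line,
// with "#" starting a comment line.
function parseTerse(source: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) throw new Error(`Missing "=" in line: ${trimmed}`);
    result[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return result;
}

// What a person might write by hand.
const humanSource = `
# Written by hand
greeting = Hello, world!
farewell = Goodbye!
`;

// What one program might send to another after parsing.
const machineForm = JSON.stringify(parseTerse(humanSource));

console.log(machineForm);
```

Both forms carry the same two messages; only the first is pleasant to type, and only the second is trivially consumed by any JSON-capable tool.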
75+
76+
ZBI: I think I have a better way to frame this: why is CSS not expressed in JSON? There were debates on how to encode CSS for the web in the early days, and they chose something other than JavaScript. So it’s worth considering we’d have a syntax that is not JavaScript based, we should not assume that JSON is the best way of handling this.
77+
ECH: I consider this not metadata but data. I think the priority is that it is unambiguous to the computer, and making it human friendly is good. I think there’s an assumption in the analogy to CSS that doesn’t apply in the way I see things. CSS is a language that is mostly written by programmers and edited by hand. Translators are not usually programmers, so to what extent are things being written by translators? Is CSS really crafted by hand these days?
78+
79+
ZBI: Is CSS really written by programmers these days? That is a good question.
80+
81+
ECH: Does optimization for presentation really matter? I don’t see what is special about a MF syntax that requires something that is specialized.
82+
83+
ZBI: Do you see it for CSS? Would you still create a separate syntax for CSS if you were designing it today?
84+
85+
ECH: Yes, because you’re editing it directly, so the text format is important. I don’t see that as being the case in MF. If we require that you need to be a programmer to do translation work, then how is the industry working?
86+
87+
EAO: Example from <…> When we’re talking about syntax for humans it needs to be easy to read, but when serialized for computers, it needs a structure that is easy to process. The other question is whether everything that is representable in the data model should be representable in the syntax. We can imagine something that the data model can support that we would not necessarily want to be directly in the syntax, something that results from processing the syntax.
88+
89+
STA: I want to respond to ZBI, who is talking about CSS. I do think it is a good analogy, but what is implicit is that we consider CSS a well-designed language. Is it? It represents complex ideas, but I’d hope that the result of our work is simpler than CSS. The syntax is simple, but the semantics are complex. We want to design something that is easy for non-programmers, but we have to recognize that we’re working with the complex matter of grammar and languages.
90+
91+
RCA: I just want to share some thoughts on this analogy to CSS. Nowadays I'm seeing more and more different ways of expressing CSS, e.g. CSS-in-JS. It's quite easy to represent CSS as it is to others who know CSS. These are a good starting point for representing it with a different structure, in a very non-standard way, and the existing approaches show how flexible we can be in representing CSS. But it brings up the question of how effective they were in designing the original CSS file format. In this year
92+
93+
ECH: I’m not going to touch that question for now, because it is more of a programming language question. I wanted to respond to EAO’s example. EAO, you were talking about two different use cases; the more important use case is the one in which computers talk. And that goes with the idea that we’re including functionality. I’m not precluding the idea that we should have a compact, concise representation for humans, but we shouldn’t maintain as a group a syntax which limits what you can do in the data model. If we choose JSON and also have a YAML representation that doesn’t have all of the functionality, that is fine, but we shouldn’t limit our canonical syntax. I prioritize computers over humans when push comes to shove.
94+
95+
EAO: I think computer exchange of the data model is relatively easy and non-controversial if we ensure that the data model is representable in JSON. If it is supported by JSON, it is easy to guarantee that machine exchange will work. I don’t think that that expression is the best expression for humans to use. It is verbose and not suitable for humans to write by hand.
96+
97+
RCA: Adoption of MF2.0 could be affected by this decision.
98+
99+
ECH: I was going to say we don’t need to consider the human representation, our job should just be the version that works with computers and if other people want to make something human readable, that is fine. How different is that going to be from JSON for simple messages? If things get more complicated then we need something fully functional. If that does affect adoption, then human friendly representation might be important. How often are people editing things by hand that are complicated?
100+
101+
EAO: At least with MF1, the only way to do it is to write the source by hand. We don’t know where the future will take us, but at the moment, anything complicated needs to be written by hand.
102+
103+
STA: With Fluent, we designed it such that it could be edited by hand by pretty much anyone. Once we started using it, I realized that the only people who were interested in editing Fluent by hand were mostly programmers, and a few translators with programming experience. My recollection is that I personally felt some disappointment that we weren’t able to convince “regular” translators to use Fluent syntax. Instead we jumped through hoops to hide syntax from them and design rich UIs so they don’t have to see syntax. This could be part of a larger conversation about who we expect to edit these files. It is easy to think about translators, but they will likely favour graphical UIs over syntax. But programmers create those localizable strings when they write code, and they will favour a text syntax.
104+
105+
ZBI: We shared this experience, but I think I see it slightly differently. This is more in alignment with how ECH sees it. Editing by hand is a fallback. Predominantly localizers will work with UI. I see three scenarios where this doesn’t happen:
106+
1) Programmers adding new localizable sources.
107+
2) Some organizations will lack resources for UI design / development, and will want to just edit files for simplicity.
108+
3) Fallback, in a big project we create a chain through localization UI, and then there is a last second mess that needs to be fixed quickly. E.g. mistranslated string days before release, can be fixed directly, skipping all of the UI steps.
109+
110+
ZBI: I think that it is a fallacy to say that because UI is the primary target, that we can discount the fallback to text, even if it is the minority of use cases. We could let other people design the human representation, but I think it is a fallacy to say that it is not necessary.
111+
112+
ECH: We can build tooling to handle complicated messages and tooling in text editors can help programmers with this. It is important to make things work for translators, let programmers deal with complexity and textual representations if they need to. Tooling can solve some complications from verbosity. I trust programmers to handle this.
113+
114+
RCA: Do we have an idea of next steps?
115+
116+
EAO: We need a decision on whether we consider human friendly syntax to be a deliverable. We agree on computer friendly representation, but we need to decide on what will fulfill our deliverable with regard to syntax.
117+
118+
RCA: I’m not sure if we can decide this now, or after we have one data model.
119+
120+
EAO: I think how the data model looks and whether it has a single syntax are orthogonal questions.
121+
122+
RCA: We might not want to go deep on syntax while we still have to merge data models. The syntax is the representation of the data model, for the end user the syntax is what matters, not the data model.
123+
124+
STA: I think we should proceed in parallel, since we’re somewhat blocked on the data model front. The syntax discussions might inform the data model discussions. The select logic would be a good action item for this group. We should limit ourselves to a single message for now. I think syntax is tricky whenever people use ‘human friendly’ or ‘readable’ because no one understands these terms the same way.
125+
126+
RCA: If we parallelize this, everything we do requires pushing something further in the future. This might require more effort if we split, we might not finish our first goal.
127+
128+
EAO: I second what STA said. Since we’ve postponed the decision on the data model discussion, one thing that the syntax discussion might give is feedback on how we can represent parts of the data model in the syntax which might inform the data model design. If something is very difficult to represent in the syntax, it might not be worthwhile including in the data model.
129+
130+
ECH: I think the syntax discussion won’t be super in depth and abstract like the data model discussion. So I think it makes sense to do things in parallel. I do want to clarify that we haven’t postponed the data model discussion, it is just taking a long time.
131+
132+
EAO: I meant that we postponed making a decision on the data model, not postponed the data model itself.
133+
134+
RCA: I think we’ve made a good start.
135+
136+
ECH: We should check with the larger group on whether to proceed with the syntax in parallel.
137+
EAO: We should open an issue for this.
138+
139+
ECH: Clarifying in an issue, with some of the points about machine readable vs. human friendly.
140+
141+
RCA: Creating an issue would help people who are not here. We can make things more concrete.
142+
143+
ECH: We should also discuss adoption, but that is subjective in the absence of more data. At some point we’ll have to make a decision and we probably won’t have data to base it on.
144+
145+
EAO: We’re 15 minutes over, the meeting is officially over.
146+
147+
148+
