diff --git a/meetings/2023/notes-2023-10-02.md b/meetings/2023/notes-2023-10-02.md new file mode 100644 index 0000000000..0b5c5f3fb5 --- /dev/null +++ b/meetings/2023/notes-2023-10-02.md @@ -0,0 +1,217 @@ +# 2023-10-02 | MessageFormat Working Group Regular Teleconference + +### Attendees +- Addison Phillips - Unicode (APP) - chair +- Mihai Niță - Google (MIH) +- Elango Cheran - Google (ECH) +- Staś Małolepszy - Google (STA) +- Matt Radbourne - Bloomberg (MRR) + +Scribe: ECH + + +## Topic: Agenda Review + +## Topic: Info Share + +## Topic: Action Item Review + +- MRR: Design proposal for tests + +MRR: I am still working on the PR for creating the test data. ECH had sent the link to the Unicode Conformance repo. It stores the test data with separate files for the test input and expected output, which is different from what I am used to, but I think that is workable. I will try to stick with that format. +I will also work with STA on how to create a formatter whose behavior is well-characterized for testing purposes. +If the work isn’t ready next week, then I hope to have it ready for the following week. + +## Topic: Text mode design issues + +https://github.com/unicode-org/message-format-wg/pull/474 + +APP: I was responding to comments on the PR. I also considered the comments from STA, who said that when you have sigils on both sides, then the syntax starts to look like a statement. + +STA: I keep coming back to questions that I don’t have answers to, but don’t have the tools to answer them. For example, what happens when you have a newline after a message. If that pattern is not delimited, then what happens to the newline? Does it get trimmed? Does it get delimited? The group has different answers. Can we do both? What is our design principle in offering choice by allowing something being expressible in two different ways? + +APP: [Here is a link](https://github.com/unicode-org/message-format-wg/pull/474#discussion_r1340592954) to a comment I made in discussion with EAO on the PR. I said that we are getting hung up on the whitespace. Because we are not defining a resource format, a message that is authored will become one line, and any newlines will become `\n` characters, and then that will look like text. Another topic of conversation with EAO is being able to delimit whitespace characters belonging to the pattern get delimited (ex: `\t`, `\n`, etc.) My point of view is that when you want whitespace inside of the pattern, you quote the whole pattern, which is what we were doing when we started in code mode. In the comment in the above link, I show the different possibilities and the consequences of them. The question is: which option is the bigger tripping hazard for developers? A key assertion between EAO and me is that EAO asserts that any whitespace between message patterns are non-translatable, whereas I assert that sometimes they are, such as in CLI interfaces, or in CJK languages. + +STA: To expand on this, I think this whitespace handling business is tricky. Sometimes, it is part of the localizable content, and sometimes it is part of the required environment that makes that message work in code as you mentioned last week where a `\n\n` was needed before each bullet point in a list. And sometimes, whitespace is needed to make the presentation of the text look nice, such as inserting tabs and spaces. There are more questions that I have to think about, and I don’t have experience to draw on. + +APP: This is interesting because, if we wrote a resource format, the resource format would handle all of that. However, when we write our MF2 examples, we write them across multiple lines, but that is not how they would actually be authored. There are some file formats in which spaces are meaningful, but I find such formats fraught. Maybe if we have statement terminators, that would help, ex: the parser would have something to grab onto. I don’t have answers yet, but I do believe that people will author messages in the way that we provide examples for them. + +MIH: I agree with you, APP and STA. Once you are in the message, it is what-you-see-is-what-you-get. With the newlines, tabs, etc., you are not thinking about the storage format. In the non-HTML world, spaces and newlines matter to formatting, and they have ways to escape that in the storage format. I would really prefer WYSIWYG / no-escaping. + +STA: Putting the message on one line is educational, so thanks for pushing for us to look at that angle. We should consider the main formats in which the messages will be embedded. I also don’t like whitespace-sensitive languages. In actuality, there will be messages authored in those languages. I would like the rules to be simple, whatever they turn out to be. The other question that I want to discuss is auto-trimming. My gut reaction is that I don’t like it, but I don’t know how to articulate it. I did research on programming languages that don’t require string delimiters, etc., which is good for beginners, but it turns out that such languages rely heavily on heuristics. I came up with an example in which auto-trimming might be puzzling. Imagine a string + +``` +#input {$count} + + Hello, + {$user}. + +// -> results in "Hello,\n ${user}" when all leading whitespace is trimmed. +``` + +The problem here is that all of the whitespace that occur at the beginning of the pattern get automatically removed (ex: before `Hello,`, but the whitespace in the middle is preserved (the spaces preceding `{$user}`). + +APP: Don’t forget the trailing spaces of the message pattern. EAO’s argument is that you have to quote all meaningful whitespace. I’m a little nervous about quoting whitespace because it makes the ws in to a placeholder and perhaps hides it from translation. + +MRR: I keep thinking about our two audiences, which is desktop / native app audience and the web audience. For the desktop / native app audience, things are always clearly spelled out (ex: newlines are `\n`), whereas the web audience uses a short of shorthand. We could make it work for the web audience, and for the desktop audience, they can be more explicit because they will, anyways. + +MIH: I have a problem with the idea of “if you need spaces, quote them, otherwise you don’t have to”. There is set of rules for the file format, like Java .properties files where whitespace gets trimmed, and then a set of rules for MF2, and they are not the same as each other, so you have to keep them both in mind. + +STA: + +Take this example: +``` +input {$count} + + {Hello, + {$user}.} +``` + +We create a little bit of friction because it doesn't look right, but it is explicit in creating the equivalent output of the previous example. Maybe we can warn users who try to mix styles between auto-trimming and explicitly delimited message pattern. + +MIH: Just one note: this 2nd example looks perfectly fine to me. It works as expected, and there is no ambiguity. + + +ECH: I’ve used languages where syntax is optional except where it’s not. Invariably there are situations where things don’t work and its unexpected and I find that really annoying. We are trying hard to knock of characters that are… think it’s important to have clear unambiguous syntax. Don’t have to put a semi colon or don’t have to put a dot to indicate field access. Don’t know how beneficial that stuff really is. If that’s our motivation, that’s not good. Think that things APP and MIH said resonate. If you quote whole pattern; don’t have to put braces. If you want whitespace, quote whole pattern and be done with it. + +APP: I agree with that. To make a couple of points: MRR, to what you said, tabs and newlines can interact with the containing resource format. In HTML, whitespace always gets eaten, where multiple spaces get returned as one space, and that can cause problems for Chinese and other CJK languages and Thai. + +In the end, there is a two- or three-part set of options for whitesapce handling, and we will choose one. With whatever we pick, there will be tradeoffs for certain classes of users. Maybe we come up with + + +ECH: In HTML, if you put non-whitespace characters like tabs or newlines at the beginning or end of a message, they will get rendered as a single space. The fact that the whitespace gets rendered as a different character is really confusing, and creating a format where we design for that type of use case seems problematic. + +APP: That is interesting, but it doesn’t get us closer to deciding on whitespace handling. EAO has been forcefully arguing that preceding or trailing whitespace is a bug, and I can’t say that I dsiaggree. But the example that STA showed draws out the problem of whitespace in the middle, which we aren’t talking about. + +STA: +You could make the following messages. + +``` +#input {$count} +Hello, you. +``` + +``` +#input {$count}\nHello, you. +``` + +``` +#input {$count}\n\nHello, you. +``` + +``` +#input {$count}{|\n|}Hello, you. +``` + +APP: In the second message, if you have a bare `\n` without delimiters, it’s not clear if it’s part of the message pattern or if it was just pretty printing of the message syntax. EAO is arguing that we should delimit the preceding whitespace. + +MIH: It’s not about prettiness. If I wanted them there, then I would have put them there. And the caveat with `{|\n|}` is that it becomes a placeholder, and therefore is not localizable. + +APP: + +``` +{|bonjour| @translate=yes} +``` +If you have a literal inside a placeholder, then it is potentially localizable + + +ECH: I think we’re nitpicking. Mihai is saying what is generally the case. Placeholders are generally non-translatable inline elements. + +APP: How do we get out of this box and decide on how to handle whitespace? + +ECH: I still favor the idea of always starting in code mode. If there really is a strong push to allow developers to author simple messages starting in text mode, then we should keep that ruleset simple: delete preceding and trailing whitespace, and if you need it, then delimit the whole pattern. + +However, I think this makes it cumbersome for Chinese and other CKJ languages and Thai, and not for other languages, which I think is unfair. + +MIH: yellow is text, WYSIWYG + + Hello {$username} ! + +{# +let $exp = {$expDate :datetime year=numeric month=full} +{ Your care expires on {$exp} } +} + +{# +match {$foo :string} {$bar :string} +when bar bar { All bar } +when foo foo { All foo } +when * * { Otherwise } +} + +APP: Here are our options: + +- Option 1:quote patterns in code mode +- Option 2:trim all exterior whitespace +- Option 3:all exterior whitespace is meaningful +- Option 4: some language have special whitespace behavior a la html + +STA: If we allow for starting in text mode, we will have problems detecting whether an initial `{...}` is a placeholder or a delimited pattern. Instead of the parser being LL1, it would become LL-infinite. I don’t have a problem with delimiting preceding and trailing whitespace characters as placeholders. + +MIH: I don’t agree with that statement. It is not the same to have whitespace as placeholders, and that those placeholders are localizable. They are not localizable. + +APP: The other problem we haven’t discussed is the discoverability of a quoted pattern. Previously, a curly brace was + +ECH: sorry to not be on the bandwagon. More i hear start in text mode but allow fully delimited patterns. Don’t know if it is a pattern or placeholder. More and more complicated. Trying to revisit code mode vs. text mode. Hitting all edge cases that we solved previously. + +APP: I want to push back that. You’re right that starting in code mode would solve a lot of these problems. But it doesn’t solve the usability problem. It doesn’t support the types of messages that people want to write by default. It would makes messages extra code-y with all of this extra curly brace goo. I know that I would be unhappy having to write messages like that. I think the only problem to support that is what to do with the handling of whitespace around the message patterns. + +MRR: How does the group solicit the opinions of people outside of the group? + +MIH: We talked about that in the meeting in person in Seville. The tricky part is to get people with web and non-web backgrounds. If the question is “what would it take for you to live with the alternate solution”, I would be okay with trimming the whitespaces, but we have to solve the problem of how to do deal with representing preceding and trailing whitespace because using a placeholder means the inside content is non-translatable. We have it documented as such. + +APP: + +Take this example. Would that be okay? + +``` +#input {$user :function} +\n\n Hello {$user} + +{{\n\n Hello {$user}}} +``` + +(APP: my network connection seems to work but not meet) + +MIH: Are you saying that we trim the whitespace in the first message? + +APP: That is the discussion to be had. + +MIH: I am confused by both messages depicted above. + +STA: Have we considered using backslash to escape whitespace? + +MIH: We have discussed that, and the problem is that backspace is used for escaping in other contexts like the containing resource formats. I’ve had lots of problems with this when working with developers in Google who are embedding messages. + +STA: We could also use `\ ` to represent a space, and `\` followed by a newline would represent a newline. + +ECH: what if there is whitespace that is not ascii + +APP: we define whitespace as ASCII whitespace + +ECH: so the rest are just characters and not trimmed. + +MIH: What do people think about the example I pasted above? + +APP: This looks like the option to use sigils to indicate code mode, as depicted in the [PR #485](https://github.com/unicode-org/message-format-wg/pull/485/files). + +STA: It sounds like the option there “use blocks for declaration”. + +MIH: Yes, or maybe a mix of that and sigils for code mode. But once you enter code mode, then you always delimit the message pattern. It makes everything consistent within code mode. + +STA: I’m considered about the + +APP: An exercise that I got us to work through is to first consider `Hello, world!`, then `Hello, {$user}!`, then having a `match ` statement selection pattern. Where I’m landing is that I’m okay with trimming of whitespace for simple messages and patterns within a match statement. The question is how to design the syntax of the match statement. + +STA: What if we didn’t have match statements? What if asked a non-coder to design a match statement? It might look something like: + +``` +[$count = one, $gender = masculine] His +[$count = few, $gender = masculine] Theirs +``` + +MIH: The problem is that people in Google start including a lot of `if` statements in their code, or they misusing the syntax to achieve the equivalent of if/else statements. + + + +