Develop a resource syntax together with the message syntax #265
Comments
I think that if we design the syntax for MF2 to work nicely with the most popular formats out there (.xml / .html, .json, .properties, .strings, .rc, embedded in code / gettext), then we will have little friction with an MF2 resource format. Claiming otherwise is a red flag for me. So, should we keep it in mind?
I feel pretty strongly that this WG, and the CLDR spec portion that it is tasked to produce, should not define a resource file format. This WG should define a data model, a syntax that is usable in many places, and a function registry. There are lots of resource file formats, database structures, etc., defined by lots of organizations and projects to fit their needs and workflows. We will not replace existing formats, and we should not design something that requires a particular format. We should focus on message strings that can be carried reasonably easily in a wide variety of such formats and systems. @macchiati FYI
We are in agreement. This WG is scoped down to the per-message syntax and API. The request in this issue is to recognize that a separate group should explore a resource-level syntax, and that insights from that work should inform this WG's design of the per-message syntax. In other words, I advocate against serializing the work, where we would finalize the per-message syntax and only then start looking into a resource syntax. I think such an approach would miss an opportunity to inform the per-message syntax with the needs of the per-resource syntax, and would limit the quality of the per-resource syntax in areas such as resilience, readability, recovery, meta information, etc. How much time this WG should allow for insights from the per-resource work to arrive, I'm not sure yet, and I don't want to block on it indefinitely. I hope we can overlap those two workstreams and design them so that they reinforce each other.
Would it be appropriate to ask the CLDR-TC for a remit to start a separate subgroup to discuss the resource syntax?
I think it would be a good course of action. I'm reluctant to push it further than an advisory suggestion, because I don't like asking for work that I don't have the cycles to commit to. I am not currently able to commit my time to work on the resource syntax, so I'm merely pointing out that serializing those two steps is, IMHO, a recipe for a bad design on both sides.
I would think that if we keep in mind a rich set of existing formats (.properties, .json, .strings, .rc, XML, HTML, YAML, hard-coded strings & .po (gettext), maybe a few more), we should be fine. I would find it a bit worrisome if some newly invented l10n format designed for MF2 introduced "revolutionary concepts" that don't already exist in the existing formats. That is independent of low-level concerns like "how do we escape a newline".
I disagree with that assessment for two reasons:
That's for the resource format WG to decide. Depending on how such a resource format decides to handle message storage, meta-information storage, groups, and relations, it may be. I'm particularly concerned by what I see as the dismissive tone of the message toward my belief that this is a significant space to explore, one that should be able to alter the per-message syntax and thus should be explored before the per-message syntax is frozen.
Then feel free to add other existing formats. Yes, JSON / XML / YAML are not l10n file formats. In fact, I would argue they are not file formats at all; they are "meta-formats". So there are in fact l10n formats based on XML and JSON, if you want to split hairs. I don't understand what you mean by one-pass / two-pass.
They have nothing to do with that?
Sorry, that was not the intent. I find the idea of a storage syntax that would "alter per-message syntax" suspect, at least if we are designing a Unicode format that aims to be universally adopted, especially since I haven't seen an example of what that would look like, other than "we have to wait and see". So again: what kind of format do you envision that is not already expressible in the existing formats?
One important aspect that is not currently expressible in existing resource formats is the explicit association of a comment containing translator-relevant metadata with a message or a group of messages. There are certainly some common practices around this, but those practices are in fact for the most part against the specs of the underlying formats.
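To make the point concrete: Java's `java.util.Properties.load` skips comment lines entirely, so any link between a comment and the key below it exists only by convention. The following is a minimal, hypothetical Python sketch of one such convention (the function name `parse_with_comments` and the "blank line detaches a comment" rule are assumptions for illustration, not part of any spec). It handles only simple `key=value` lines.

```python
# Hypothetical sketch: a .properties-style reader that attaches the comment
# lines directly above a key to that key. java.util.Properties.load ignores
# comments, so this association is a convention, not part of the format.
# Simplified: only "key=value" lines; no ':' separators, escapes, or line
# continuations.


def parse_with_comments(text: str) -> dict[str, tuple[str, str]]:
    """Return {key: (comment, value)} for each 'key=value' entry."""
    entries: dict[str, tuple[str, str]] = {}
    pending: list[str] = []  # comment lines seen since the last entry
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Assumed convention: a blank line detaches any pending comment.
            pending.clear()
            continue
        if line[0] in "#!":  # both '#' and '!' start comments in .properties
            pending.append(line[1:].strip())
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"unsupported line: {raw!r}")
        entries[key.strip()] = ("\n".join(pending), value.strip())
        pending.clear()
    return entries


sample = """# Shown on the checkout button; keep it under 20 characters
checkout.button=Pay now

# Orphaned comment that belongs to nothing

greeting=Hello, {$name}!
"""

for key, (comment, value) in parse_with_comments(sample).items():
    print(f"{key}: {value!r} (comment: {comment!r})")
```

A standard `Properties.load` on the same input would keep both keys but silently drop both comments, which is why tools that rely on this association are working outside the format's own rules.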
In a one-pass file format, the parser parses the resource syntax and gets parsed messages directly. In a two-pass format, it first parses the container format (JSON, XML, TOML, YAML) and then extracts the messages, which another parser then parses. A human editing a two-pass format can introduce errors at either of the two levels.
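A minimal sketch of the two-pass model, assuming a flat JSON object of message strings and a toy placeholder syntax (`{$name}`) that stands in for a real MF2 parser. The names `load_resource`, `parse_message`, and `MessageSyntaxError` are hypothetical; the point is only to show that the two passes fail in different places.

```python
import json
import re

# Toy message syntax, NOT MF2: literal text plus {$name} placeholders.
PLACEHOLDER = re.compile(r"\{\$([A-Za-z_][A-Za-z0-9_]*)\}")


class MessageSyntaxError(ValueError):
    """Raised by the second (message-level) pass."""


def parse_message(source: str) -> list:
    """Pass 2: split one message into literal strings and ('var', name) parts."""
    parts: list = []
    pos = 0
    for match in PLACEHOLDER.finditer(source):
        parts.append(source[pos:match.start()])
        parts.append(("var", match.group(1)))
        pos = match.end()
    parts.append(source[pos:])
    for part in parts:
        # Any brace left in literal text means a malformed placeholder.
        if isinstance(part, str) and ("{" in part or "}" in part):
            raise MessageSyntaxError(f"unbalanced placeholder in {source!r}")
    return [part for part in parts if part != ""]


def load_resource(text: str) -> dict:
    """Two-pass load of a flat JSON object mapping keys to message strings."""
    try:
        container = json.loads(text)  # pass 1: container-level (JSON) errors
    except json.JSONDecodeError as err:
        raise ValueError(f"container error: {err}") from err
    if not isinstance(container, dict):
        raise ValueError("container error: expected a JSON object")
    messages = {}
    for key, source in container.items():
        if not isinstance(source, str):
            raise ValueError(f"container error: {key!r} is not a string")
        try:
            messages[key] = parse_message(source)  # pass 2: message-level errors
        except MessageSyntaxError as err:
            raise ValueError(f"message error in {key!r}: {err}") from err
    return messages


print(load_resource('{"greeting": "Hello, {$name}!"}'))
for broken in ('{"greeting": "Hello, {$name}!",}',  # invalid JSON: trailing comma
               '{"greeting": "Hello, {$name!"}'):   # valid JSON, broken message
    try:
        load_resource(broken)
    except ValueError as err:
        print(err)
```

The first broken input is rejected by the JSON parser before any message is looked at; the second is valid JSON whose message text is malformed. A one-pass parser would report both kinds of mistakes from a single grammar.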
What kind of format is CSS: storage or content?
The problem I see is not expressiveness. You can encode absolutely anything in JSON and XML. The problem is how to create a human-readable/editable/writable resource format for MF2. I believe that a group of five MF2 messages, with metadata, variants, and multiline content, encoded in JSON will not be readable/editable/writable by a human.
I think my understanding may have evolved somewhat. If we're only creating a pattern string format consumed by the runtime API, then I'm still free to create a resource format over the top of that with rich support for (for example) localization, such as comments and other metadata. This is similar to the existing I do think that some of our initial discussions/tenets are called into question by this. In particular, I think that the XLIFF binding will be (necessarily) incomplete. For example, the resource format we had at Amazon has base direction metadata or string- and file-level comments that this spec cannot know about. Still, we were compiling our resource format into the runtime format. This spec would supply that runtime format.
Again, nothing prevents one from writing a parser that handles JSON + messages in one pass, other than the desire to save programming effort. I don't see any conceptual difference between these three formats:
vs
vs
The number of passes is an implementation detail.
Storage.
We can debate whether this is "against the specs" of Java properties:
It is very much in the spirit of what Java does in Javadoc. But I will not argue that. Android strings https://developer.android.com/guide/topics/resources/localization
The W3C ITS (Internationalization Tag Set) standard is designed to work with any XML (and HTML) format.
And we (of course) have in-house formats where this kind of meta info for localization is very much standard.
OK, here is a Google format that is not internal and I can share: https://github.com/google/app-resource-bundle/wiki/ApplicationResourceBundleSpecification Take:
These are 100% meta for localization:
I've checked the "TC Message Format 2.0 Resolution" from 2022-03-31, and there is an entry on this topic:
A separate working group on message resources is now being bootstrapped.
@eemeli Can I close this?
@macchiati Would it be possible to move the resource WG repo under
@macchiati Pinging to see if we can bring the resource WG somewhere visible. Otherwise, I intend to close this issue.
This is forked from #263, which went a bit off-topic but in a constructive way.
Quoting @aphillips from #263 (comment):
And @zbraniecki from #263 (comment):