Enumerate supported metadata/properties for messages, sections & resources #19
Comments
I agree with all the proposed tags. In addition to the above, I could see a metadata tag that marks a particular string as not to be translated being useful, especially for things like copyright or other legal lines. This would also be useful for excluding certain messages in a resource file from being exposed to localization. I believe someone else mentioned this in one of the issues tagged here, but metadata for indicating character limitations would also be useful.
Maybe something like
Would just characters be a sufficient unit for this? If so, then we could have It would probably also be useful to comment on tags that are mostly handled programmatically. So something like:
In syntax terms, this could be handled by only considering the first N space-separated tokens as significant, and allowing a
Obsolete strings
Do not translate
How do tags apply to the equivalent of a message with attributes? For example, this string.
The "do not translate" only applies to the
Length limits
I agree with Bryan on the
Fun fact: we have strings where the length limitation is measured not in characters but in bytes… Do we need to cover that by providing an optional unit of measurement (characters by default)?
Content validation
We have cases where the string is not really a string but a boolean value (true, false, empty equals false). Do we want something like
Agreed, that's a better term.
TBD, probably depends on the tag. Something like
At least for now, this syntax does not support attributes like FTL does, so the example above would probably end up in MF2 as
[about-logins]
...
login-filter.placeholder = Search Passwords
login-filter.key = F
and then be referred to in code as
That's pretty much the question, isn't it? Is the case for alternative length units (bytes, lines, anything else?) significant enough that we should support an optional unit?
There are also use cases for numerical values, and for messages consisting of CSS rules. And probably many other limitations as well. I mentioned
I've updated the list in the first comment here with suggestions from above. Left out
The one in Firefox is pretty weird; IIRC it's a limitation caused by legacy hardware, which causes strings to be stored as 32 and 64 bytes. Based on that, not worth covering. At the same time, it would be worth investigating how app stores calculate max lengths (example from Apple), because that's the most common use case nowadays. Is it obvious what a character means in this context, e.g. when looking at Latin-based writing vs CJK?
I was not able to find a character definition from Apple. For Google Play I found this:
Twitter provides their own definition for which characters count double; I think that's specific to them? At least treating every URL as having a length of 23 chars is Twitter-only.
Not sure how accurate this is, but comparing the ja vs en version of this page for the iOS app store, it seems that both languages have the same character limits (30 chars for title, 170 for descriptions). Coming from a ja->en translation background, I could see a case where being able to specify x number of characters for Japanese and a larger number of characters for different scripts would be useful, though I have no idea how complicated such an implementation would be.
I've been pondering something related to this, but it may be unique enough to our use case that it doesn't justify general functionality. We are looking for a mechanism to purge unused strings from the system, but the nature of the product is that no static analysis can ensure 100% that a string key is unused. If static analysis and test automation find no usages of a string key, then I intend to mark it as "suspected unused". That way, if it is used in production, the string repository can be notified, and the flag can be removed. Similar to the @obsolete tag, but maybe semantically different enough to warrant its own tag.
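To make the workflow concrete, here is a rough Python sketch of the "suspected unused" idea. Everything in it is hypothetical: `StringRepository`, its method names, and the in-memory storage are stand-ins for illustration, not part of any proposed spec or existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class StringRepository:
    """Hypothetical in-memory stand-in for a string repository."""

    messages: dict[str, str]
    suspected_unused: set[str] = field(default_factory=set)

    def flag_unreferenced(self, referenced_keys: set[str]) -> None:
        # Keys that static analysis and test automation never saw
        # are flagged as "suspected unused" rather than deleted.
        self.suspected_unused |= set(self.messages) - referenced_keys

    def lookup(self, key: str) -> str:
        # A production lookup shows the key is still in use,
        # so the flag is removed again.
        self.suspected_unused.discard(key)
        return self.messages[key]

    def purge_candidates(self) -> list[str]:
        # Keys still flagged, e.g. after some observation period.
        return sorted(self.suspected_unused)
```

With messages `greeting`, `farewell` and `legacy-banner`, calling `flag_unreferenced({"greeting"})` flags the other two; a later `lookup("farewell")` clears that flag, leaving only `legacy-banner` as a purge candidate.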
This does seem different from I think we have three choices here:
My sense would be that custom tags should be allowed, but require or strongly suggest that they be namespaced.
Allow custom tags, but require them to be namespaced, probably with : as a separator (as in the MF2 identifier). Pragmatically, if someone is introducing custom tags in the resource, then they almost certainly have a custom implementation of the resource parser (even if it is customized via extension). Portability of resource bundles with the custom tags is less of a concern, in my mind, because of that. Non-conflict with future evolution is the primary driver, I think. Either way, namespacing gives the most safety and flexibility.
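A minimal Python sketch of what such a rule could look like in a parser or linter. The `STANDARD_TAGS` set just mirrors the names proposed in this issue, and `classify_tag` is a hypothetical helper; neither reflects a settled spec.

```python
# Assumption: the standard tag names are those proposed in this issue.
STANDARD_TAGS = {
    "version", "param", "obsolete", "locale", "format",
    "schema", "do-not-translate", "max-length", "allow-empty",
}


def classify_tag(name: str) -> str:
    """Return "standard" or "custom"; raise ValueError for anything else."""
    namespace, sep, local = name.partition(":")
    if sep:
        # Custom tags need exactly one ":" with text on both sides.
        if not namespace or not local or ":" in local:
            raise ValueError(f"malformed namespaced tag: {name!r}")
        return "custom"
    if name in STANDARD_TAGS:
        return "standard"
    # Un-namespaced unknown tags are rejected so that they cannot
    # collide with tags standardized later.
    raise ValueError(f"unknown tag {name!r}; custom tags must be namespaced")
```

Here `classify_tag("max-length")` is treated as standard and `classify_tag("acme:suspected-unused")` as custom (the `acme` namespace is made up), while a bare `suspected-unused` is rejected.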
Hi everyone, I found out about this at FOSDEM and I'm looking forward to this becoming a standard (and using it in Python 😄) I was wondering if you've considered supporting a property/metadata for screenshots? We don't use this ourselves, but there are platforms (e.g. Transifex & Weblate) that let you attach a screenshot to a specific message to give translators more information about where and how the message is used in the UI. There would be no runtime impact, it would simply be something that GUI tools built on top of MF2 could display if they choose to do so. To give a concrete example:
Admittedly, this could be achieved with a custom property, but perhaps having a standardized name is useful?
I think that This would enable some nice usecases:
I had not considered requiring any of the properties, but with an explicit locale combined with decent defaults for other values, a resource can indeed be considered complete just by itself. Often, the locale is also stored and read separately from the resource as a part of its filename or path, or otherwise. However, there is no universal specification for exactly how this is done. Including the locale in the resource is also done by other resource formats, such as gettext and XLIFF; the latter in fact has at least
Adding a required resource property would make the frontmatter always required, so a minimal resource would be something like
@locale en-US
---
key = value
This is... okay? While it adds a little bit of bulk to the format, the locale does still need to be stored somewhere, and this has the side benefit of making the file format distinguishable from just its contents. Based on the research I did under #14, I'm relatively confident that
In other words, I'm struggling a bit to find any significant downsides to requiring a locale for the resource.
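As a rough illustration of what "required locale" could mean for a loader, here is a Python sketch that handles only the minimal shape shown above: `@name value` lines, a `---` separator, then single-line `key = value` entries. The function name `parse_resource` and the error messages are hypothetical, and the real syntax (sections, multi-line values, comments) is deliberately ignored.

```python
def parse_resource(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a minimal resource into (properties, messages).

    Assumes frontmatter lines of the form "@name value", a "---"
    separator line, and one "key = value" message per line.
    """
    head, sep, body = text.partition("\n---\n")
    if not sep:
        raise ValueError("missing frontmatter separator")

    properties: dict[str, str] = {}
    for line in head.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError(f"unexpected frontmatter line: {line!r}")
        name, _, value = line[1:].partition(" ")
        properties[name] = value.strip()

    # The resource is only considered complete if it names its locale.
    if not properties.get("locale"):
        raise ValueError("resource must declare @locale")

    messages: dict[str, str] = {}
    for line in body.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            messages[key.strip()] = value.strip()
    return properties, messages
```

Under these assumptions, a file without frontmatter fails immediately, which is also what makes the format recognizable from its contents alone.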
Returning to this once again, the concern raised by @SimonClark in #20 could perhaps be resolved at the resource loader level: If we allow and expect more than one resource file to be potentially loaded into the same context (such that their messages and sections would share a namespace), then the sort of message variance presented in #20 (comment) in particular could be supported by a property like This way, you could for instance load first an
In addition to supporting @tags in general at the resource syntax level, we should figure out common meanings for some property tags. Doing so would also inform further discussion on how to determine which (if any) properties might have any formatting runtime impact.
The following prior art may be relevant, especially as it'll define developer expectations. Are there other similar definitions that are relevant?
To get this started, at least these tags should be considered (in no particular order):
Note: List updated 2024-12-18 based on comments from @bcolsson and @flodolo.
@version
Allows for explicitly versioning a source string, so that it can be changed. This allows for differentiating typo fixes from actual changes in message contents. This doesn't have a runtime impact, but the (id, version) tuple can be used by tooling instead of just the message id to uniquely identify a message and its translations. The @version value probably should not be fixed to mandate semver or any other spec, but also allow date strings or anything else -- as long as the value is new for this message, it can be treated as a new version.
@param
For documenting variables. No runtime impact, but very significant for translators. Having a well-defined structure for this tag is pretty important, at least to identify the variable its description is pertaining to. In addition to describing the variable in words, it could include:
@obsolete (was @deprecated)
Explicitly mark a message (also a section/entire resource?) as obsolete. This could be used in workflows where messages are not immediately removed when they are no longer referenced by code, but kept in to support patch releases for previous versions. During translation, this can be used to de-prioritize such messages. This tag could include a way to note some version or timestamp when the removal happened, or be paired with a second @removed-in or similar tag.
@locale
Establish the locale code for messages, probably only at the resource level? Many localization systems depend on the locale code being effectively encoded in the path, but being able to represent it within the resource could prove very useful (much like XLIFF). This could well have a runtime impact as well, esp. when accounting for fallbacking and the formatting of messages in resources coming from a different locale.
@format
While this resource format is being designed primarily with MF2 in mind, it's at least possible to consider supporting other message formats within it as well. Experiences with .properties have shown that being able to explicitly define the format for a resource (or even a single message) would make its processing significantly easier, both during translation and formatting.
@schema
MF2 messages will use functions. The core/default set will include a few like :number and :string, but implementations and users are free to extend and override these with their own. We should be able to define a reference in a resource to the schema or registry that's defining such messages. This tag may have a formatting runtime impact, if an implementation can use it to load the required functions dynamically.
@do-not-translate
Mark a message or section as fixed for all locales. It should be available in all locales, but always hold the same value.
@max-length
Limit the length of a formatted message. Requires at least a numerical qualifier, possibly with a units indicator. Default should be "characters" or "code points", but alternatives like "bytes" and "lines" could also be supported. Probably no formatting runtime impact?
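To show how the choice of unit changes the result, here is a small Python sketch. The function names and the unit keywords are hypothetical, "characters" is taken to mean Unicode code points, and "bytes" assumes UTF-8; other encodings or a grapheme-based count would give different numbers.

```python
def message_length(text: str, unit: str = "characters") -> int:
    """Measure a formatted message in the given (assumed) unit."""
    if unit == "characters":
        # Python's len() counts Unicode code points.
        return len(text)
    if unit == "bytes":
        # Assumption: the limit applies to the UTF-8 encoding.
        return len(text.encode("utf-8"))
    if unit == "lines":
        return len(text.splitlines())
    raise ValueError(f"unsupported unit: {unit!r}")


def exceeds_max_length(text: str, limit: int, unit: str = "characters") -> bool:
    """True if the message is longer than the limit in that unit."""
    return message_length(text, unit) > limit
```

For example, the Japanese string パスワード is 5 code points but 15 bytes in UTF-8, so it passes a limit of 10 characters while failing a limit of 10 bytes.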
@allow-empty
Explicitly mark a message with an empty pattern as valid. Most empty messages are mistakes, so being able to mark ones that may be empty would be useful. Should probably be accompanied by an explanatory note.
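A sketch of how tooling might check @allow-empty together with @do-not-translate, written in Python. The `lint_translation` helper and its message strings are hypothetical, and tags are modelled simply as a set of names without the leading @.

```python
def lint_translation(source: str, target: str, tags: set[str]) -> list[str]:
    """Return a list of problems found for one translated message."""
    problems: list[str] = []
    # Empty messages are treated as mistakes unless explicitly allowed.
    if not target.strip() and "allow-empty" not in tags:
        problems.append("empty message without @allow-empty")
    # A do-not-translate message must hold the same value in every locale.
    if "do-not-translate" in tags and target != source:
        problems.append("@do-not-translate message differs from source")
    return problems
```

Neither check needs to run at formatting time; both fit in a build step or in the translation tool.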
Do you agree with all of the above? Are there aspects that I've not accounted for? What other tags should we be considering?
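As one illustration of the @version idea above, tooling could key translations on the (id, version) tuple and treat any source message whose current pair has no translation as needing work. This Python sketch uses a hypothetical `stale_translations` helper and plain dictionaries in place of a real translation store.

```python
def stale_translations(
    source_versions: dict[str, str],
    translations: dict[tuple[str, str], str],
) -> list[str]:
    """List message ids whose current (id, version) has no translation.

    source_versions maps message id -> current @version value;
    translations is keyed by the (id, version) tuple.
    """
    return sorted(
        message_id
        for message_id, version in source_versions.items()
        if (message_id, version) not in translations
    )
```

A typo fix that leaves @version unchanged keeps the existing translation valid, while bumping the value (to any new string, not necessarily semver) makes the message show up as stale.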