Valid or well-formed language tags? #36
This was discussed during the i18n telecon, and @aphillips wrote a proposed edit to clarify why well-formedness is usually sufficient. See https://github.com/w3c/bp-i18n-specdev/pull/34/files/105cc74bae89d08312d87736c5bb15b26fc450a8..8ce3958fea79166494296788c7e6162999d4d5fc for the PR, and https://aphillips.github.io/bp-i18n-specdev/#sec_lang_values for the rendered version.
I realize user agents probably don’t care about every part of the tag, but there’s no middle ground between strict adherence and no checking, and that still makes it hard (at least for me!) to judge which one to pick. If we only require that tags be well-formed, then, as I understand it, I can write this:
instead of “en”, and it won’t produce a warning because it’s well-formed. The problem is that this leads to subtle bugs: the only sign of a mistake may be a user agent failing to load a dictionary or preload a TTS engine, for example, and that may not be noticed until a publication has already reached the user. If we chose strict validity, then every subtag has to be valid, and I agree that in most cases that’s information the user agent doesn’t care about. For us, it’s probably also information that isn’t going to be specified or checked. But given the two extremes, it seems more practical to warn users when the language subtag is invalid, even if the rest of the subtags go unexamined. How do we go about this, though? Is it reasonable to require well-formedness and, as an additional requirement, a valid language subtag?
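The middle ground described above (require well-formedness, then warn if the primary language subtag is unknown) can be sketched as follows. This is a minimal sketch, assuming a regular expression that follows the RFC 5646 `langtag` and `privateuse` ABNF (grandfathered tags such as `i-klingon` are deliberately not handled) and a tiny, hypothetical `KNOWN_PRIMARY_LANGUAGES` set standing in for the IANA Language Subtag Registry. The name `check_language_tag` is illustrative and does not come from any of the specifications discussed here.

```python
import re

# Well-formedness per the RFC 5646 "langtag" and "privateuse" ABNF productions.
# Simplification: grandfathered tags (e.g. "i-klingon") are not handled here.
LANGTAG_RE = re.compile(
    r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language + extlang
    (?:-[a-z]{4})?                                # script, e.g. Latn
    (?:-(?:[a-z]{2}|[0-9]{3}))?                   # region, e.g. US or 419
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*      # variants, e.g. 1996
    (?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*           # extensions (singleton is not x)
    (?:-x(?:-[a-z0-9]{1,8})+)?                    # private use, e.g. x-foo
    |
    x(?:-[a-z0-9]{1,8})+                          # private-use-only tag
    """,
    re.IGNORECASE | re.VERBOSE,
)

# HYPOTHETICAL stand-in for the IANA Language Subtag Registry. A real checker
# would load the registry file, and its answers depend on the registry date.
KNOWN_PRIMARY_LANGUAGES = {"de", "en", "fr", "ja", "zh"}


def check_language_tag(tag: str) -> list[str]:
    """Return a list of warnings for `tag`; an empty list means no problems found.

    Only the primary language subtag is checked against the registry snapshot;
    script, region, variant, and extension subtags are checked for syntax only.
    """
    match = LANGTAG_RE.fullmatch(tag)
    if match is None:
        return [f"{tag!r} is not a well-formed BCP 47 language tag"]
    language = match.group("language")
    if language is None:
        return []  # private-use-only tags such as "x-foo" have no registered subtags
    primary = language.split("-")[0].lower()
    if primary not in KNOWN_PRIMARY_LANGUAGES:
        return [f"{tag!r} is well-formed, but {primary!r} is not a known primary language subtag"]
    return []
```

With this sketch, `"en_US"` is rejected as not well-formed, `"english"` passes the syntax check (a 5 to 8 letter primary subtag is allowed by the ABNF) but yields a warning, and `"en-US"` produces no warnings. Returning warnings instead of raising an error is one possible answer to the "die? warn? what?" question; a specification would still have to state which behavior it wants.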
@mattgarrish Thanks for the comment. Generally, you want to require valid language tags in content, even if your normative requirement on implementations extends only to well-formedness checking. Most specifications are second-order consumers of language metadata: they use data already provided in the document format (HTML @lang, XML xml:lang, or the document format's own language fields/attributes). Most specifications are concerned with selecting resources (such as spell checkers, tokenizers, fonts, etc.) or with matching (selecting which string to show, for example) and don't directly care about the content of the language tag. Invalid-but-well-formed tags just don't match anything, and fallback schemes usually provide some appropriate behavior. There might be cases where a specification really wants implementation-level checking. In those cases, the result of a tag failing to be valid has to be specified (die? warn? what?). It's also a problem that the registry changes over time, so each implementation depends on a particular registry version. The changes over time are small, minor, and mostly "not that interesting", but they do exist, and real users may encounter interoperability issues if random (out-of-date) spec implementations start barfing on their (perfectly valid) language tags. So I generally agree with you, and the edit I'm working on hopefully spells this out better than the current text. Thoughts?
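The claim that invalid-but-well-formed tags "just don't match anything" and land on fallback behavior can be illustrated with a simplified version of the Lookup scheme from RFC 4647 (section 3.4). This is a sketch under assumptions: the function name `lookup`, its parameters, and the example resource sets are illustrative rather than taken from any specification discussed here, and it skips refinements such as ignoring wildcards inside extended language ranges.

```python
from collections.abc import Iterable


def lookup(priority_list: Iterable[str], available: Iterable[str], default: str) -> str:
    """Simplified RFC 4647 "Lookup": return the best available tag, or `default`.

    Matching is case-insensitive. Each language range is progressively truncated
    from the right until it equals an available tag.
    """
    available_by_lower = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        if language_range == "*":
            continue  # a lone wildcard says nothing about which tag to pick
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available_by_lower:
                return available_by_lower[candidate]
            subtags.pop()
            # Don't leave a dangling singleton (e.g. "x" or "a") at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    # Nothing matched, including tags that are well-formed but invalid.
    return default
```

For example, `lookup(["en-Latn-US"], {"en", "fr"}, "fr")` truncates the range down to `en` and returns `"en"`, while `lookup(["english"], {"en", "fr"}, "fr")` matches nothing and quietly returns the default `"fr"`. No error is raised, which is the "no harm" case described here, and also why a mistyped tag can go unnoticed, as Matt points out.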
Sure, that's true. We inherit language from JSON-LD, and even the … I just brought this up with the Publishing Working Group, and it appears that no one cares too much about invalid language tags, assuming, as you say, that no harm results. So I guess I'm just the outlier worrying too much. :)
I think this is already dealt with by the current set of recommendations. I am closing this issue, but will file new ones on the section about BCP47, since there's actually text in that section suggesting more work :-(. |
Comments arising from self-review at either w3c/json-ld-wg#93 or w3c/pub-manifest#38 (not clear which).
from Matt Garrish:
from Ivan Herman: