Description
Currently, JSON-LD doesn't do any language-tag validation. However, [RDF Concepts](https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal) does require that language tags be well formed according to BCP47:
... a non-empty language tag as defined by [BCP47]. The language tag *must* be well-formed according to section 2.2.9 of [BCP47].
The other RDF syntaxes use something like the following to validate language tags:
[144s] LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )*
I could see us changing from MAY validate language tags to MUST be valid according to either [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* or the stricter ABNF from BCP47:
obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8ALPHA
subtag = 1*8(ALPHA / DIGIT)
This should probably be done in context processing, expansion, and fromRDF algorithms to be consistent. The RDF Concepts regex is probably the one to use, as we're in that family.
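To make the difference between the two candidate grammars concrete, here is a minimal Python sketch. The names `LANGTAG_RE`, `STRICT_LANGTAG_RE`, and `matches_langtag` are made up for illustration and are not part of any JSON-LD API. The patterns are direct transcriptions of the two productions quoted above, without the leading "@"; neither checks the full BCP47 grammar.

```python
import re

# Transcription of the LANGTAG production used by other RDF syntaxes,
# without the leading "@" (an assumption: JSON-LD stores the bare tag).
LANGTAG_RE = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# Transcription of the stricter ABNF quoted above:
# every subtag is limited to 1-8 characters.
STRICT_LANGTAG_RE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def matches_langtag(tag: str, strict: bool = False) -> bool:
    """Return True if the whole tag matches the chosen grammar."""
    pattern = STRICT_LANGTAG_RE if strict else LANGTAG_RE
    return pattern.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["en", "en-US", "en_US", "abcdefghi", ""]:
        print(repr(candidate),
              matches_langtag(candidate),
              matches_langtag(candidate, strict=True))
```

The two grammars only disagree on subtag length: a nine-letter primary subtag such as "abcdefghi" passes the looser regex but fails the stricter one, while "en_US" and the empty string fail both.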
We do say that a language tag MUST be well formed:
language-tagged string
A language-tagged string consists of a string and a non-empty language tag as defined by [BCP47]. The language tag MUST be well-formed according to section 2.2.9 Classes of Conformance of [BCP47], and is normalized to lowercase.
The Conformance Section of the API document waffles on this:
JSON-LD Processors MUST NOT attempt to correct malformed IRIs or language tags; however, they MAY issue validation warnings. IRIs are not modified other than conversion between relative and absolute IRIs.
Note the fine distinction between correction and validation, and that warnings MAY be issued, rather than errors that MUST be generated.
This should probably change to processors MUST generate an error and either abort or ignore invalid IRIs or language tags (or base directions). We do require that these be well formed when generating RDF.
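As a rough illustration of what "generate an error and either abort or ignore" could look like for language tags, here is a Python sketch. Everything in it is hypothetical: `InvalidLanguageTagError`, `check_language_tag`, and the `on_invalid` parameter are invented for this example and do not correspond to any existing JSON-LD processor API or error code. The check uses the looser regex from the other RDF syntaxes, and the tag is never corrected.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)

# Looser LANGTAG grammar from the other RDF syntaxes, without the "@".
LANGTAG_RE = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")


class InvalidLanguageTagError(ValueError):
    """Hypothetical error type, not defined by the JSON-LD API."""


def check_language_tag(tag: str, on_invalid: str = "abort") -> Optional[str]:
    """Return the tag unchanged if it matches the grammar.

    on_invalid="abort": raise InvalidLanguageTagError.
    on_invalid="ignore": log an error and return None, so the caller
    can drop the tag. The tag is never modified or "corrected".
    """
    if LANGTAG_RE.fullmatch(tag):
        return tag
    if on_invalid == "abort":
        raise InvalidLanguageTagError(f"invalid language tag: {tag!r}")
    logger.error("ignoring invalid language tag: %r", tag)
    return None
```

In this sketch, `check_language_tag("en-US")` returns the tag as given, `check_language_tag("en_US")` raises, and `check_language_tag("en_US", on_invalid="ignore")` logs the problem and returns `None`. The same shape would presumably apply to IRIs and base directions, with their own checks.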