Add '@language' container type #133
RESOLVED: Attempt to add other @container options, such as "@container": "@language" to support Wikidata's language-map use case. |
I think the general idea can work. I'm against the compaction example that Markus provided, however, because I think it's confusing. It's particularly confusing because I see @container as currently defining how to interpret the JSON value associated with the term. For example, {"foo": {"@container": "@set"}} currently means to me: the JSON value for the key "foo" is an unordered array. It follows then that {"foo": {"@container": "@language"}} means: the JSON value for the key "foo" is a language map, that is, a JSON object where the keys are language identifiers and the values are language strings. I'm fine with that. However, in the compaction example, the JSON value for "foo" is instead an array, which is actually a @set (presumably), that presumably somewhere contains a language map but also other values that aren't maps at all. I'd much prefer us to take a different route for handling the compaction case. I might be ok with the JSON value of "foo" being an array, but only so long as each JSON value within that array is a language map, nothing else. This would permit multiple language maps (is there actually a use case for this?). However, IMO, there should definitely be nothing other than language maps for the term "foo". If there exists a value for "foo" that cannot be unambiguously converted to a language map, then it should not be used with that term -- which means selecting the absolute IRI if there are no other applicable terms available. I think this same compaction behavior should apply to @id and @type maps, etc. (See: #134). |
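A minimal Python sketch of the reading described in the comment above, where a term with "@container": "@language" only accepts JSON objects keyed by language identifier. The helper names `is_language_map` and `all_language_maps` are hypothetical and not taken from any JSON-LD processor; rejecting keys that start with "@" is an assumption made here to keep keyword objects out.

```python
def is_language_map(value):
    """Return True if value looks like a language map: a dict whose keys
    are language identifiers and whose values are strings or lists of
    strings. Keys starting with '@' are rejected (an assumption of this
    sketch, so that keyword objects are not mistaken for language maps)."""
    if not isinstance(value, dict):
        return False
    for key, val in value.items():
        if not isinstance(key, str) or key.startswith("@"):
            return False
        if isinstance(val, str):
            continue
        if isinstance(val, list) and all(isinstance(v, str) for v in val):
            continue
        return False
    return True


def all_language_maps(values):
    """The stricter array variant: every entry must be a language map."""
    return isinstance(values, list) and all(is_language_map(v) for v in values)
```

Under this reading, a value that fails the check would not be placed under the language-map term at all and would stay under another term or the absolute IRI.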
So, this is not going in a direction that I was originally proposing, based on the WikiData use-cases. Basically, there are two ways in which you might consider using language maps:
Consider a typical internationalization use case, where you have a resource with values expressed in multiple languages; for example, an abbreviated example from DBPedia:
Using the value-map syntax, this could be represented as follows:
The problem here is that indexing of values always requires dereferencing through the property before looking for a language variant. If a developer is looking for all resources that have descriptions in some language, this requires deeper navigation. If, however, the property-map version is used, all values sharing a common language are nicely contained together. This might be represented as follows:
This mapping chunks properties together sharing a common language, and makes it easier to see all relevant information in the same place, and do a common query for a language (object.en) to find all keys appropriate for that language. It's possible that we could include both representations. Consider a possible change to Expansion and Value Expansion Algorithms: For value-map; in Expansion: Before 2.2:
For property-map; in Expansion: Before 2.2.2:
We may want to use different @container values to distinguish the use-cases, but the algorithm can handle each case as is without anything further. |
Now I see where you are coming from Gregg but I'm still not sure I agree with the proposal. If you model your data in that way isn't that basically the same as having multiple documents - one per language? And if so, wouldn't that be simply solved by using an array at the document's top-level (or
The access is not as convenient as in your proposed solution but I think it would address the same use cases -- even though I doubt you agree :-) -- and wouldn't require any further changes to JSON-LD. I agree with what Dave said regarding compaction of language maps. Values that can't be brought to language map form should stay under their absolute IRI (or go under another term). |
RESOLVED: Support language-maps via the |
Based on today's call, the use of value-map proposal where the value can be a node, as well as a string, makes sense to me. In this case, the node would be anonymous, and use the skos-xl pattern to designate the primary value, but this would also allow other properties to be asserted on the value (such as pronunciation). Extending the example from above, this might be represented as follows:
If the
In Turtle, this might look like the following:
|
Should this
compact to:
or to:
|
July 10th, 2012 telecon: [16:36] markus: What should we do about @value's that are not language-tagged? |
... and what about when a property with a language container has a value similar to this one:
What language should be used for the language map? The first one? That might be non-deterministic (properties are not guaranteed to be processed in order) if we don't sort them ourselves first. |
For the simple case of creating a language map of various literals it is quite usable. But deep application of Still, for the given example, we also discussed another variant of expression. It could be more appropriate in this case to use The corresponding example would be:
Yielding:
That is, (.. Nor a non-string value? Or we could specify that at least for this generic variant the actual mapped property and value must also be explicitly repeated in the object? It's certainly something to iron out.) |
I'm pretty sure that niklasl's proposal above would work for Drupal's multilingual field use case. This would be helpful, since it would mean we wouldn't have to deal with the complexity of named graphs. |
RESOLVED: Add support for language maps via the |
Manu, the text you added only allows strings or arrays of strings as values of language tags. As I recall, we were considering more things, such as skos-xl representations for those values. This was certainly part of Denny's use case: the ability to have other values hanging off of the language tag. Was this an oversight, or am I mis-remembering? |
It was an oversight, you're remembering correctly. I forgot about that part of it. The language-map approach was intended to be a short-hand for setting the @language in the @context. So, I think we can allow any value that is allowed in a regular value position, IIRC. I'll update the spec and try to think through all of the potential values. |
Sorry, I also have to reopen this issue until the API algorithms have been updated and we decided how this works when compacting. We also need to check whether this really fixes @linclark's problem - which I still doubt. |
RESOLVED: The group is committed to support language maps and property generators in JSON-LD 1.0. |
Issue #159 deals with how round-tripping of language maps (required by the Drupal community) could be supported. |
Members of the RDF working group have expressed some concern about JSON-LD diverging from the RDF data model, and our proposed solution in Issue #159 specifically adds syntactic information that is not based on the RDF data model. Other than that, I think most everything else actually is. The issue relates to round-tripping properties with @container: @language from compact form to expanded form and back. Consider the following node definition:
With the current proposal, this would expand as follows:
The problem is that the @language added to the node definitions does not relate to the RDF model; in fact, if it is translated to RDF, you get something like the following:
Any language association is lost. As an alternative, we could consider using a specific blank-node pattern which does generate a reasonable RDF translation. The data could instead expand as follows:
Note that we're now using node definitions with a The resulting Turtle representation would look like the following:
The compaction rules would need to consider such node definitions when selecting values for each language tag. Another advantage is that it allows for all value types to be round-tripped, including typed values and those represented as value objects. |
Gregg, are you proposing to generate those blank nodes also in expanded JSON-LD or just when converting to RDF? I fear it’s the former :-) |
They need to be generated when expanding, or it won't come out in the RDF. I believe it's round-trip-able in JSON-LD, so that the blank nodes would be consumed in compaction, but of course not if compaction didn't include a language container term. Still, it's an odd corner case anyway. |
Couldn’t we just generate them when going to RDF and just keep an @language in expanded JSON-LD? |
@niklasl It may be that it's more complex than necessary, but I have yet to see what problems the added complexity introduces, and it seems to solve our use case without any contortions in our own data handling. If you have a chance to outline the practical implications that you predict from the added complexity of named graphs in the other issue, it would really help in fully evaluating whether it's a viable approach. Thanks! |
We are no longer pursuing language maps for our use case, but one proposal has come up offline a couple of times now. In case the CG plans to continue development of language maps, I want to make sure that the following flaw with the proposal is recorded in this thread. The proposal is to wrap the object in a blank node. However, this would limit the vocabularies that you can use with language maps. For example:
This would violate the range constraint of schema:articleBody. |
Another thing we discussed, rather than using BNodes, is to use property extensions. For example, this could result in the following:
Schema.org doesn't actually need the subPropertyOf, but it allows other reasoners to know that the properties are related. |
Yes, it's good to lay that proposal to rest. It should also be noted that (AFAIK) the only reason it came up was to attempt to preserve data which the JSON desired by Drupal expresses in an unusual shape (where our interpretation violated most known vocabularies, as was noted when proposed). The original language map proposal on the other hand is (was originally, and can now continue to be) only about expressing as keys the languages of language-tagged literals. That does not violate these constraints. It just provides a syntactical language map in JSON for what otherwise has to be iterated over. (The other, extended, more complex proposal for The current problem (separate from this issue) is that Drupal doesn't want this information (the language-like keys) preserved at all, only to syntactically preserve the shape. The reason is (IIUC) to hide the treatment of language-based versions of the descriptions from being exposed in RDF. That seems to require either:
Again, I believe that the simplest solution would be to just acknowledge and express these language versions in public data, regardless of how they are diffused over node properties internally in Drupal. I must also stress that these things are much easier to reason about if the data is first expressed as RDF, which has grounded semantics, and only once the meaning is established seek any possible compact syntactical forms of that, for the purpose of matching desirable usage patterns (in programming or templating). |
We already resolved a while ago
PROPOSAL 3: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a We could also allow other values such as plain literals or nodes but, as the language information would be lost during expansion, I don't think we should do that. If we disallow this now we drastically simplify the introduction of more sophisticated mechanisms at a later point in time since it won't change existing data. Therefore the MUST in the proposal above. |
@niklasl As I've pointed out, named graphs actually do handle our use case while also expressing the information in RDF. I believe you and I disagree about how odd this is. For example, JeniT has written about named graphs used for versioning UK government data. But that is beside the point and I don't want to hijack the thread with this debate, or with more attempts to convince us to handle our language-based entity variants as separate resources. I believe the current problem is how the CG deals with other (non-Drupal) use cases. For example, when I met with Gregg in Berkeley, it seemed that he had his own use case for language maps that could contain node references. @lanthaler I would prefer to use named graphs, and thus would still like to see #195 developed. However, Manu discussed another way which would not preserve the information in RDF, but would at least be good enough for us. |
To be clear, I have a use case where I have RDF data including information for separate languages, that I need to serialize in RDF. It was not necessarily the case that it needed to be done with language maps. In fact, named graphs may very well be the best way to do it. The Wikia case is different, though, as there are different resources for each language version (like WikiPedia), so named graphs might make sense, if you use named graphs to describe the resources of each page. |
@linclark I've replied to this in issue 195, since you're right that this issue should focus on the effect of Also, I hope you have time to consider issue #196, which is an attempt to handle a bunch of related topics regarding syntactic extensibility with no defined semantics. I'm not sure if it'll have traction, but I believe it is close to the variant that has been discussed offline that you thought may be useful. |
+1 to PROPOSAL 3. For expansion, I would say that non-string (or array of string) values of a property with language maps are expanded to use the property, but lose the language association. That is, they don't round-trip. As a general principle, I'm fine with syntactic constructs that allow for zero-edits when expanding, but opposed to them for round-tripping through expansion, unless they also have a representation that can be round-tripped through RDF. |
+1 to PROPOSAL 3 (for the behavior of I also agree with the general principle. However, as noted in the last part of the #196 description, I may be willing to compromise that principle in that case if it is proven essential to usage and doesn't wreak havoc upon the expansion algorithm. Not for this issue though (since preserving |
RESOLVED: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a |
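The expansion half of this resolution can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the algorithm text from the spec: the function name `expand_language_map` is hypothetical, raising `ValueError` is just one way to enforce the MUST, and keys are sorted only to make the output order deterministic.

```python
def expand_language_map(language_map):
    """Expand a language map into a list of language-tagged value objects.

    Each key is treated as a language tag; each value must be a string or
    a list of strings. Anything else raises ValueError (an assumption of
    this sketch about how the MUST could be enforced).
    """
    expanded = []
    # Sorting the keys is a choice made here for deterministic output.
    for language in sorted(language_map):
        values = language_map[language]
        if isinstance(values, str):
            values = [values]
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(
                "language map values must be strings or arrays of strings"
            )
        for text in values:
            expanded.append({"@value": text, "@language": language})
    return expanded
```

For example, `expand_language_map({"en": "Hello", "de": ["Hallo", "Guten Tag"]})` yields one value object per string, each carrying the language of the key it was found under.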
To be clear: JSON-LD 1.0 will support simple language maps. When using a language map and expanding, if the term's language key's value is not a simple string, the rule for using the language map does not apply (all language-map values get dropped). When compacting, if all statements in the list are not simple @value/@language objects, then the term that defines the language map does not match (the statements are kept in expanded form). |
@msporny, could you please explain what you mean with the last sentence:
Every value will be evaluated separately and there might be values without an |
I just want to clarify that arrays of strings, or strings within an @list are also considered as being appropriate for use with language maps. To clarify @msporny's description of compaction, a language map term is only appropriate for values which have @language. Otherwise, other terms (at a lower term rank) can also be considered, defaulting to an expanded IRI if none is found. |
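A small Python sketch of the per-value compaction reading described above: each expanded value is checked on its own, language-tagged strings go into the language map, and everything else is left for another term or the expanded IRI. The names `is_language_tagged_string` and `compact_with_language_map` are hypothetical, and @list handling is deliberately left out.

```python
def is_language_tagged_string(value):
    """True for a value object with exactly @value and @language, both strings."""
    return (
        isinstance(value, dict)
        and set(value) == {"@value", "@language"}
        and isinstance(value["@value"], str)
        and isinstance(value["@language"], str)
    )


def compact_with_language_map(values):
    """Split expanded values into (language_map, remaining).

    language_map holds the values that match a language-container term;
    remaining holds the values that must be compacted some other way.
    """
    language_map = {}
    remaining = []
    for value in values:
        if not is_language_tagged_string(value):
            remaining.append(value)
            continue
        language, text = value["@language"], value["@value"]
        if language not in language_map:
            language_map[language] = text
        elif isinstance(language_map[language], list):
            language_map[language].append(text)
        else:
            # Second string for the same language: promote to an array.
            language_map[language] = [language_map[language], text]
    return language_map, remaining
```

Because each key appears at most once in the resulting dict, repeated languages are collected into an array rather than producing duplicate keys.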
|
I would prefer to not do that. |
@gkellogg I added the "@container": "@language" algorithms to the spec, but without the support for "@list" as you mentioned above as it would complicate the algorithm and seems like we could always add that feature later, if necessary. If the folks that have been active in this thread could look at the commit diff and make sure I implemented this correctly, I'd appreciate it. It took a very long time to figure out where to hook into the various algorithms on this feature and even once I did the work, it was difficult to figure out if there were going to be any side-effects from the modification to the algorithms. |
I'm fine with not having @list support. |
At a first glance the changes to the algorithms look to be OK. There are a few minor things that I would change such as, e.g.,
This sounds like the result could be an object containing non-unique keys, which we certainly don't want. I will review all the algorithms as soon as I have updated my processor. |
… a competing term This addresses #133.
…nguage maps are available Also removed unnecessary data from compact-0026-context.jsonld. This addresses #133.
This leads to much better performance as the keys don't have to be lower-cased multiple times. This addresses #133.
The idea of this issue is to set @container of a property to @language to allow L10n of JSON property values as shown in the following example:
When expanded, this should result in:
Compaction might be a bit trickier if there are other properties that are not language tagged for the same property. They either have to stay under the full IRI in that case or contain at least one keyword to be distinguishable from language maps, something like:
Something similar was discussed before under the term "language map" (#29) and came up again in a discussion Gregg had with @vrandezo. There has also been some discussion on the mailing list:
Gregg originally proposed to use something he called "folding" for this and #134: