Add '@language' container type #133

Closed
lanthaler opened this issue Jun 8, 2012 · 70 comments
Comments

@lanthaler
Member

The idea of this issue is to set @container of a property to @language to allow L10n of JSON property values as shown in the following example:

{
  "@context": {
    "label": {
      "@id": "http://example.com/label",
      "@container": "@language"
    }
  },
  "@id": "http://buckingham.uk/queenie",
  "label": {
    "en": "The Queen",
    "de": "Die Königin"
  }
}

When expanded, this should result in:

[
  {
    "@id": "http://buckingham.uk/queenie",
    "http://example.com/label": [
      { "@value": "The Queen", "@language": "en" },
      { "@value": "Die Königin", "@language": "de" }
    ]
  }
]
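
A minimal Python sketch of the expansion step above, assuming the language map holds only strings or lists of strings. The function name `expand_language_map` is hypothetical and not part of any JSON-LD API; keys are sorted only to make the output order predictable.

```python
def expand_language_map(language_map):
    """Turn {"en": "The Queen", ...} into a list of JSON-LD value objects."""
    expanded = []
    # Sort the language keys so the result does not depend on key order.
    for lang in sorted(language_map):
        values = language_map[lang]
        # Accept either a single string or a list of strings per language.
        if not isinstance(values, list):
            values = [values]
        for value in values:
            expanded.append({"@value": value, "@language": lang})
    return expanded


label = {"en": "The Queen", "de": "Die Königin"}
print(expand_language_map(label))
```

The result would then be stored under the expanded property IRI (here `http://example.com/label`).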

Compaction might be a bit trickier if the same property also has values that are not language-tagged. Those values either have to stay under the full IRI or contain at least one keyword to be distinguishable from language maps, something like:

{
  "@context": {
    "label": {
      "@id": "http://example.com/label",
      "@container": "@language"
    }
  },
  "@id": "http://buckingham.uk/queenie",
  "label": [
    {
      "en": "The Queen",
      "de": "Die Königin"
    },
    "No language",
    5,
    true,
    {
      "@id": "_:b1",   <-- a keyword MUST be present to distinguish an object from a language map
      "prop": "value"
    }
  ]
}

Something similar was discussed before under the term "language map" (#29) and came up again in a discussion Gregg had with @vrandezo. There has also been some discussion on the mailing list:


Gregg originally proposed to use something he called "folding" for this and #134:

{
  "@context": {
    "en": {"@id": null, "@language": "en", "@fold": true},
    "de": {"@id": null, "@language": "de", "@fold": true},
    "queenie": {"@id": null, "@fold": true}
  },
  "queenie": {
    "@id": "http://buckingham.uk/queenie",
    "label": {
      "en": { "@value": "The Queen" },
      "de": { "@value": "Die Königin"}
    }
  }
}
@gkellogg
Member

RESOLVED: Attempt to add other @container options, such as "@container": "@language" to support Wikidata's language-map use case.

@dlongley
Member

I think the general idea can work. I'm against the compaction example that Markus provided, however, because I think it's confusing. It's particularly confusing because I see @container as currently defining how to interpret the JSON value associated with the term.

For example, {"foo": {"@container": "@set"}} currently means to me: the JSON value for the key "foo" is an unordered array. It follows then that {"foo": {"@container": "@language"}} means: the JSON value for the key "foo" is a language map, that is, a JSON object where the keys are language identifiers and the values are language strings. I'm fine with that. However, in the compaction example, the JSON value for "foo" is instead an array, which is actually a @set (presumably), that presumably somewhere contains a language map but also other values that aren't maps at all.

I'd much prefer us to take a different route for handling the compaction case. I might be ok with the JSON value of "foo" being an array, but only so long as each JSON value within that array is a language map, nothing else. This would permit multiple language maps (is there actually a use case for this?). However, IMO, there should definitely be nothing other than language maps for the term "foo". If there exists a value for "foo" that cannot be unambiguously converted to a language map, then it should not be used with that term -- which means selecting the absolute IRI if there are no other applicable terms available.
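
The rule described here can be sketched in Python as a partition of the expanded values: plain language-tagged strings go into the language map, and everything else is left for the absolute IRI (or another term). This is only an illustration under that assumption; `split_for_language_map` is a hypothetical helper, not an algorithm from the spec.

```python
def split_for_language_map(expanded_values):
    """Separate values that fit a language map from those that do not."""
    language_map, leftovers = {}, []
    for item in expanded_values:
        # Only value objects made of exactly @value + @language with a
        # string value can be represented as a language map entry.
        is_tagged_string = (
            isinstance(item, dict)
            and set(item) == {"@value", "@language"}
            and isinstance(item["@value"], str)
        )
        if is_tagged_string:
            language_map.setdefault(item["@language"], []).append(item["@value"])
        else:
            leftovers.append(item)
    # Unwrap single-element lists for a more compact map.
    language_map = {
        lang: values[0] if len(values) == 1 else values
        for lang, values in language_map.items()
    }
    return language_map, leftovers
```

The first return value would be emitted under the `@container: @language` term, the second under the full IRI.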

I think this same compaction behavior should apply to @id and @type maps, etc. (See: #134).

@gkellogg
Member

So, this is not going in a direction that I was originally proposing, based on the WikiData use-cases. Basically, there are two ways in which you might consider using language maps:

  1. As a value-map, where a property contains an object whose keys are language elements and whose values in turn are language-tagged strings.
  2. As a property-map, where some or all of the keys of a subject definition are language elements, whose values are objects containing properties of the original subject definition.

Consider a typical internationalization use case, where you have a resource with values expressed in multiple languages; for example, an abbreviated example from DBPedia:

@prefix dbpedia:    <http://dbpedia.org/resource/> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix yago:   <http://dbpedia.org/class/yago/> .

dbpedia:Linked_Data rdf:type    yago:Buzzwords ;
owl:sameAs  <http://rdf.freebase.com/ns/m/02r2kb1>, dbpedia:Linked_Data .
dbpedia:Linked_Data rdfs:comment
  "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web."@es ,
    "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist."@de ,
    "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili."@it ,
    "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried."@en ,
    "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations."@fr ,
    "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002"@zh ;
rdfs:label
  "Linked Open Data"@de ,
    "Datos vinculados"@es ,
    "\u9375\u9023\u8CC7\u6599"@zh ,
    "Linked Data"@en ,
    "Dati collegati"@it ,
    "Web des donn\u00E9es"@fr .

Using the value-map syntax, this could be represented as follows:

{
  "@context": {
    ...
    "rdfs:comment": {"@container": "@language"},
    "rdfs:label": {"@container": "@language"}
  },
  "@id": "dbpedia:Linked_Data",
  "owl:sameAs": ["http://rdf.freebase.com/ns/m/02r2kb1", "dbpedia:Linked_Data"],
  "rdfs:comment": {
    "es": "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web.",
    "de": "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist.",
    "it": "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili.",
    "en": "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.",
    "fr": "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations.",
    "zh": "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002"
  },
  "rdfs:label": {
    "es": "Datos vinculados",
    "de": "Linked Open Data",
    "it": "Dati collegati",
    "en": "Linked Data",
    "fr": "Web des donn\u00E9es",
    "zh": "\u9375\u9023\u8CC7\u6599"
  }
}

The problem here is that indexing of values always requires dereferencing through the property before looking for a language variant. If a developer is looking for all resources that have descriptions in some language, this requires deeper navigation.

If, however, the property-map version is used, all values sharing a common language are nicely contained together. This might be represented as follows:

{
  "@context": {
    ...
    "es": {"@container": "@language"},
    "de": {"@container": "@language"},
    ...
  },
  "@id": "dbpedia:Linked_Data",
  "owl:sameAs": ["http://rdf.freebase.com/ns/m/02r2kb1", "dbpedia:Linked_Data"],
  "es": {
    "rdfs:comment": "La propuesta de datos vinculados (linked data) surge dentro de marco general de la Web sem\u00E1ntica. El t\u00E9rmino \"datos vinculados\" hace referencia al m\u00E9todo con el que se pueden mostrar, intercambiar y conectar datos a trav\u00E9s de URIs desreferenciables en la Web.",
    "rdfs:label": "Datos vinculados"
  },
  "de": {
    "rdfs:comment": "Linked Open Data (LOD) bezeichnet im World Wide Web frei verf\u00FCgbare Daten, die per Uniform Resource Identifier (URI) identifiziert sind und dar\u00FCber direkt per HTTP abgerufen werden k\u00F6nnen und ebenfalls per URI auf andere Daten verweisen. Idealerweise werden zur Kodierung und Verlinkung der Daten das Resource Description Framework (RDF) und darauf aufbauende Standards wie SPARQL und die Web Ontology Language (OWL) verwendet, so dass Linked Open Data gleichzeitig Teil des Semantic Web ist.",
    "rdfs:label": "Linked Open Data"
  },
  "it": {
    "rdfs:comment": "I dati collegati (linked data in inglese) sono un aspetto del web semantico. Il termine dati collegati \u00E8 usato per descrivere un metodo di esporre, condividere e connettere dati attraverso URI deferenziabili.",
    "rdfs:label": "Dati collegati"
  },
  "en": {
    "rdfs:comment": "In computing, linked Data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.",
    "rdfs:label": "Linked Data"
  },
  "fr": {
    "rdfs:comment": "Le Web des donn\u00E9es (Linked Data, en Anglais) est une initiative du W3C(Consortium World Wide Web) visant \u00E0 favoriser la publication de donn\u00E9es structur\u00E9es sur le Web, non pas sous la forme de silos de donn\u00E9es isol\u00E9s les uns des autres, mais en les reliant entre elles pour constituer un r\u00E9seau global d'informations.",
    "rdfs:label": "Web des donn\u00E9es"
  },
  "zh": {
    "rdfs:comment": "\u9375\u9023\u8CC7\u6599\u662F\u6B63\u5728\u5FEB\u901F\u767C\u5C55\u7684\u8A9E\u7FA9\u7DB2\u7684\u4E00\u7CFB\u5217\u7684\u6D3B\u52D5\uFF0C\u5B83\u63CF\u8FF0\u4E86\u4E00\u5957\u5728\u5168\u7403\u8CC7\u8A0A\u7DB2\u4E0A\u767C\u4F48\u3001\u5206\u4EAB\u3001\u53CA\u9023\u7D50\u8CC7\u6599\u7684\u65B9\u6CD5\u3002\u4E3B\u8981\u4EE5\u53EF\u53C3\u7167\u7684URI\u4F5C\u70BA\u6700\u57FA\u672C\u7684\u8981\u7D20\u3001\u4EE5RDF\u4F5C\u70BA\u63CF\u8FF0\u9023\u7D50\u7684\u8A9E\u8A00\u3002",
    "rdfs:label": "\u9375\u9023\u8CC7\u6599"
  }
}

This mapping groups properties that share a common language, which makes it easier to see all relevant information in the same place and to run a single query for a language (object.en) to find all keys available in that language.
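
To make the difference between the two layouts concrete, here is a small Python sketch that pivots a value-map node into the property-map shape. It assumes each listed term holds a plain language map of strings; `value_map_to_property_map` and its parameters are hypothetical names used only for illustration.

```python
def value_map_to_property_map(node, language_terms):
    """Pivot {term: {lang: text}} into {lang: {term: text}}."""
    pivoted = {}
    for term in language_terms:
        # Each term is assumed to hold a language map (possibly absent).
        for lang, text in node.get(term, {}).items():
            pivoted.setdefault(lang, {})[term] = text
    return pivoted


node = {
    "@id": "dbpedia:Linked_Data",
    "rdfs:label": {"en": "Linked Data", "de": "Linked Open Data"},
    "rdfs:comment": {"en": "A method of publishing structured data."},
}
by_language = value_map_to_property_map(node, ["rdfs:label", "rdfs:comment"])
print(by_language["en"])  # everything available in English, in one place
```

With the value-map layout the same question needs one lookup per property (`node["rdfs:label"]["en"]`, `node["rdfs:comment"]["en"]`, and so on).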

It's possible that we could include both representations. Consider a possible change to Expansion and Value Expansion Algorithms:

For value-map; in Expansion:

Before 2.2:

  • If active property has @container: @language, and every key in element is of the form language (from BCP47) and does not map to an absolute IRI, the return value is an array constructed from the result of performing Value Expansion on each value using a copy of context with @language set to each key from element in turn.

For property-map; in Expansion:

Before 2.2.2:

  • If property does not expand to a keyword or absolute IRI and property has @container: @language, value MUST be a JSON object.
    • Process the object using a copy of context with @language set to property using the existing active subject and active property.
    • For each key in the resulting expanded object, either merge value into an existing property property of element, or create a new property property with value as value.

We may want to use different @container values to distinguish the use-cases, but the algorithm can handle each case as is without anything further.
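
The property-map steps above can be sketched roughly as follows. This is a simplified Python illustration, not the Expansion Algorithm itself: it assumes a flat lookup table `term_iris` from terms to absolute IRIs, a set `language_keys` of keys declared with `@container: @language`, and string values only. All three names are hypothetical, and properties outside the language keys are ignored for brevity.

```python
def expand_property_map(node, term_iris, language_keys):
    """Expand a node whose language keys hold {term: text} objects."""
    expanded = {}
    for key, value in node.items():
        if key == "@id":
            expanded["@id"] = value
        elif key in language_keys:
            # Per the proposal, the value of a language key MUST be an object.
            if not isinstance(value, dict):
                raise ValueError(f"value of {key!r} must be a JSON object")
            for term, text in value.items():
                iri = term_iris[term]
                # Merge into an existing property or create a new one.
                expanded.setdefault(iri, []).append(
                    {"@value": text, "@language": key}
                )
    return expanded
```

Each language key acts as the default language for everything nested beneath it, and values for the same property from different languages are merged into one array.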

@lanthaler
Member Author

Now I see where you are coming from, Gregg, but I'm still not sure I agree with the proposal. If you model your data in that way, isn't that basically the same as having multiple documents, one per language? And if so, wouldn't that be solved simply by using an array at the document's top level (or @graph) and setting a new default language in each "sub-document"? Something like:

{
  "@context": { .. shared terms .. },
  "@graph": [
    {
      "@context": { "@language": "en" },
      ... English "sub-document" ...
    },
    {
      "@context": { "@language": "de" },
      ... German "sub-document" ...
    }
  ]
}

The access is not as convenient as in your proposed solution but I think it would address the same use cases -- even though I doubt you agree :-) -- and wouldn't require any further changes to JSON-LD.
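
As a rough illustration of this alternative, the following Python sketch builds such a document from per-language sub-documents. The helper name `per_language_graph` and the sample data are assumptions made for the example; nothing here requires new JSON-LD features, since it only uses a scoped `@context` with `@language`.

```python
def per_language_graph(shared_context, sub_documents):
    """Wrap one sub-document per language in a top-level @graph."""
    graph = []
    for lang, doc in sub_documents.items():
        # Each entry sets its own default language via a local context.
        entry = {"@context": {"@language": lang}}
        entry.update(doc)
        graph.append(entry)
    return {"@context": shared_context, "@graph": graph}


document = per_language_graph(
    {"label": "http://example.com/label"},
    {
        "en": {"@id": "http://buckingham.uk/queenie", "label": "The Queen"},
        "de": {"@id": "http://buckingham.uk/queenie", "label": "Die Königin"},
    },
)
```

Finding the German label then means scanning `@graph` for the entry whose context sets `"@language": "de"`, which is the less convenient access mentioned above.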

I agree with what Dave said regarding compaction of language maps. Values that can't be brought to language map form should stay under their absolute IRI (or go under another term).

@lanthaler
Member Author

RESOLVED: Support language-maps via the "@container": "@language" pattern in @context.

@gkellogg
Member

gkellogg commented Jul 3, 2012

Based on today's call, the value-map proposal, where the value can be a node as well as a string, makes sense to me. In this case, the node would be anonymous and use the skos-xl pattern to designate the primary value, but this would also allow other properties to be asserted on the value (such as pronunciation). Extending the example from above, this might be represented as follows:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "label": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

If the @container: @language means to apply the property as a language within a context, this could expand to the following:

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2000/01/rdf-schema#label": [
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Queen Elizabeth",
        "@language": "en"
      }],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ",
        "@language": "en"
      }]
    },
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Königin Elisabeth",
        "@language": "de"
      }],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "ˈkøːnɪɡɪn ʔeˈliːzabɛt",
        "@language": "de"
      }]
    }
  ]
}]

In Turtle, this might look like the following:

<http://dbpedia.org/resource/Queen_Elizabeth> rdfs:label
  [ a skosxl:Label;
    skosxl:literalForm "Queen Elizabeth"@en;
    ipa: "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"@en ],
  [ a skosxl:Label;
    skosxl:literalForm "Königin Elisabeth"@de;
    ipa: "ˈkøːnɪɡɪn ʔeˈliːzabɛt"@de ] .

@lanthaler
Member Author

Should this

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2000/01/rdf-schema#label": [
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Queen Elizabeth",
        "@language": "en"
      }],
      "http://www.example.com": [ { "@value": "not language tagged" } ],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ",
        "@language": "en"
      }]
    },
    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": [{
        "@value": "Königin Elisabeth",
        "@language": "de"
      }],
      "http://www.example.com": [ { "@value": "not language tagged" } ],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": [{
        "@value": "ˈkøːnɪɡɪn ʔeˈliːzabɛt",
        "@language": "de"
      }]
    }
  ]
}]

compact to:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ex": "http://www.example.com",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "label": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ex": [ { "@value": "not language tagged" } ],
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ex": [ { "@value": "not language tagged" } ],
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

or to:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ex": "http://www.example.com",
    "label": {"@id": "rdfs:label", "@container": "@language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "rdfs:label": {
      "@type": "skosxl:Label",
      "ex": "not language tagged"
  },
  "label": {
    "en": {
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

@lanthaler
Member Author

July 10th, 2012 telecon:

[16:36] markus: What should we do about @value's that are not language-tagged?
[16:36] gkellogg: I think it should remain in expanded form.
[16:37] gkellogg: The way that I was proposing it was that the result is to set the language specified in the key as the default language in the context.
[16:37] gkellogg: The other way to do it would be to override the language definition of 'ex' to say that the language is null...
[16:38] niklasl: It's hard to know what X means here.
[16:38] gkellogg: We need to be careful here about how to set xsd:string - it's an RDF 1.1 model issue, so a back-end should implement it this way, though. A plain literal gets the datatype of xsd:string.
[16:39] gkellogg: From RDF Concepts: "A language-tagged string is any literal whose datatype IRI is equal to http://www.w3.org/1999/02/22-rdf-syntax-ns#langString."

@lanthaler
Member Author

... and what about when a property with a language container has a value similar to this one:

    {
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.example.com/deutsch": [ { "@value": "deutsch", "@language": "de" } ],
      "http://www.example.com/english": [ { "@value": "english", "@language": "en" } ],
      "http://www.example.com/italiano": [ { "@value": "italiano", "@language": "it" } ]
   }

What language should be used for the language map? The first one? That might be non-deterministic (properties are not guaranteed to be processed in order) if we don't sort them ourselves first.
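
One way to sidestep the ordering problem is to not pick a "first" language at all: collect every language tag found in the node and only treat the node as belonging to a language when exactly one tag occurs. The Python sketch below illustrates that idea; it is an assumption about one possible rule, not something the group resolved, and `single_language` is a hypothetical helper.

```python
def single_language(expanded_node):
    """Return the node's language if all tagged values agree, else None."""
    languages = set()
    for key, values in expanded_node.items():
        if key.startswith("@"):
            continue  # skip keywords such as @id and @type
        for value in values:
            if isinstance(value, dict) and "@language" in value:
                languages.add(value["@language"])
    # A set has no order, so the answer never depends on key order.
    return next(iter(languages)) if len(languages) == 1 else None
```

For the node shown above (tagged `de`, `en` and `it`) this returns `None`, so the node would not be placed in a language map under such a rule.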

@niklasl
Member

niklasl commented Jul 10, 2012

For the simple case of creating a language map of various literals it is quite usable. But deep application of @language might make compaction very (too?) complex.

Still, for the given example, we also discussed another variant of expression. It could be more appropriate in this case to use dc:language to describe the bnode itself as being in/about a specific language. That could warrant a generic extension of @context to take any property references as values (as well as the special @set, @list or @language (and possibly @id or @type)). A property using such a construct would then provide mappings for specific property values within the referenced objects.

The corresponding example would be:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/terms/",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
    "ipa": "http://dbpedia.org/resource/International_Phonetic_Alphabet",
    "labelByLang": {"@id": "skosxl:prefLabel", "@container": "dc:language"}
  },
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "labelByLang": {
    "en": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Queen Elizabeth",
      "ipa": "kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"
    },
    "de": {
      "@type": "skosxl:Label",
      "skosxl:literalForm": "Königin Elisabeth",
      "ipa": "ˈkøːnɪɡɪn ʔeˈliːzabɛt"
    }
  }
}

Yielding:

[{
  "@id": "http://dbpedia.org/resource/Queen_Elizabeth",
  "http://www.w3.org/2008/05/skos-xl#prefLabel": [
    {
      "http://purl.org/dc/terms/language": ["en"],
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": ["Queen Elizabeth"],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": ["kwiːn ʔeˈliːzabɛt əˈlɪzəbəθ"]
    },
    {
      "http://purl.org/dc/terms/language": ["de"],
      "@type": "http://www.w3.org/2008/05/skos-xl#Label",
      "http://www.w3.org/2008/05/skos-xl#literalForm": ["Königin Elisabeth"],
      "http://dbpedia.org/resource/International_Phonetic_Alphabet": ["ˈkøːnɪɡɪn ʔeˈliːzabɛt"]
    }
  ]
}]

That is, labelByLang in this context would match anything related to via skosxl:prefLabel which itself has a property dc:language. The @container mechanism would create an object map with keys for each value of that property per related resource. It would not match a resource lacking that property or having more than one value for it.
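
The matching rule in the previous paragraph can be sketched in Python as follows. The sketch assumes expanded-style nodes whose property values are lists, treats only single string values as usable keys (the non-string case is left open in this thread), and leaves a node unmatched if its key is already taken. `map_by_property` is a hypothetical name.

```python
def map_by_property(nodes, key_property):
    """Build {key value: node} for nodes with exactly one value of key_property."""
    mapped, unmatched = {}, []
    for node in nodes:
        keys = node.get(key_property, [])
        usable = len(keys) == 1 and isinstance(keys[0], str)
        if usable and keys[0] not in mapped:
            # The key property moves into the map key, so drop it from the node.
            mapped[keys[0]] = {k: v for k, v in node.items() if k != key_property}
        else:
            # Missing, repeated, non-string, or duplicate key: no match.
            unmatched.append(node)
    return mapped, unmatched
```

Unmatched nodes would have to be compacted under a different term or the absolute IRI.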

(.. Nor a non-string value? Or we could specify that at least for this generic variant the actual mapped property and value must also be explicitly repeated in the object? It's certainly something to iron out.)

@linclark
Contributor

I'm pretty sure that niklasl's proposal above would work for Drupal's multilingual field use case. This would be helpful, since it would mean we wouldn't have to deal with the complexity of named graphs.

@lanthaler
Member Author

RESOLVED: Add support for language maps via the "@container": "@language" annotation in @context. For example: "tags": { "@id": "http://example.com/vocab#tags", "@container": "@language"}. The child property of the term MUST be an associative array. All associative array keys MUST be BCP47 language strings.
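
A minimal Python sketch of the two MUST conditions in this resolution. The regular expression is a deliberately simplified approximation of the BCP47 tag shape (a 2 to 8 letter primary subtag followed by optional alphanumeric subtags); it does not validate against the language subtag registry, and `check_language_map` is a hypothetical helper.

```python
import re

# Simplified shape check only; real BCP47 validation is considerably stricter.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*")


def check_language_map(value):
    """Raise ValueError unless value is an object keyed by language tags."""
    if not isinstance(value, dict):
        raise ValueError("a language map must be a JSON object")
    for key in value:
        if not LANGUAGE_TAG.fullmatch(key):
            raise ValueError(f"{key!r} does not look like a BCP47 language tag")
    return True
```

Keywords such as `@id` and absolute IRIs fail the shape check, which is what keeps a language map distinguishable from an ordinary node object.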

@gkellogg
Member

Manu, the text you added only allows strings or arrays of strings as values of language tags. As I recall, we were considering more things, such as skos-xl representations for those values. This was certainly part of Denny's use case: the ability to have other values hanging off of the language tag.

Was this an oversight, or am I mis-remembering?

@msporny msporny reopened this Aug 19, 2012
@msporny
Member

msporny commented Aug 19, 2012

It was an oversight, you're remembering correctly. I forgot about that part of it. The language-map approach was intended to be a short-hand for setting the @language in the @context. So, I think we can allow any value that is allowed in a regular value position, IIRC. I'll update the spec and try to think through all of the potential values.

@lanthaler
Member Author

Sorry, I also have to reopen this issue until the API algorithms have been updated and we decided how this works when compacting. We also need to check whether this really fixes @linclark's problem - which I still doubt.

@lanthaler
Member Author

RESOLVED: The group is committed to support language maps and property generators in JSON-LD 1.0.

@lanthaler
Member Author

Issue #159 deals with how round-tripping of language maps (required by the Drupal community) could be supported.

lanthaler added a commit that referenced this issue Sep 19, 2012
This test assumes that the language is injected as the default language into the active context. Depending on the outcome of issue #159 this test might need to be updated.

This addresses #133.
@gkellogg
Member

Members of the RDF working group have expressed some concern about JSON-LD diverging from the RDF data model, and our proposed solution in Issue #159 specifically adds syntactic information that is not based on the RDF data model. Other than that, I think most everything else actually is. The issue relates to round-tripping properties with @container: @language from compact form to expanded form and back. Consider the following node definition:

{
 "@context": {
   "label": {"@id": "http://example.com/label", "@container": "@language"}
 },
 "@id": "http://buckingham.uk/queenie",
 "label": {
   "en": ["The Queen", {"@id": "http://example.com/the_queen"}],
   "de": ["Die Königin", {"@id": "http://example.de/die_königin"}]
 }
}

With the current proposal, this would expand as follows:

[
 {
   "@id": "http://buckingham.uk/queenie",
   "http://example.com/label": [
     { "@value": "The Queen", "@language": "en" },
     { "@value": "Die Königin", "@language": "de" },
     {"@id": "http://example.com/the_queen", "@language": "en"},
     {"@id": "http://example.de/die_königin", "@language": "de"}
   ]
 }
]

The problem is that the @language added to the node definitions does not relate to the RDF model; in fact, if it is translated to RDF, you get something like the following:

<http://buckingham.uk/queenie> <http://example.com/label>
 "The Queen"@en,
 "Die Königin"@de,
 <http://example.com/the_queen>,
 <http://example.de/die_königin>.

Any language association is lost.
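
The loss can be reproduced with a small Python sketch that turns an expanded node into simple (subject, predicate, object) tuples. It is a toy conversion written for this illustration, not the JSON-LD to RDF algorithm: literals keep their language tag, while IRI objects have no slot for one, so any `@language` next to an `@id` is silently dropped.

```python
def to_triples(expanded_node):
    """Convert one expanded node into (subject, predicate, object) tuples."""
    subject = expanded_node["@id"]
    triples = []
    for predicate, values in expanded_node.items():
        if predicate.startswith("@"):
            continue
        for value in values:
            if "@id" in value:
                # An IRI object cannot carry a language tag in RDF.
                obj = "<" + value["@id"] + ">"
            else:
                obj = '"' + value["@value"] + '"'
                if "@language" in value:
                    obj += "@" + value["@language"]
            triples.append((subject, predicate, obj))
    return triples
```

Converting these triples back to JSON-LD yields node references with no language, so a language map cannot be rebuilt for them.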

As an alternative, we could consider using a specific blank-node pattern which does generate a reasonable RDF translation. The data could instead expand as follows:

 [{
   "@id": "http://buckingham.uk/queenie",
   "http://example.com/label": [
     { "@value": "The Queen", "@language": "en" },
     { "@value": "Die Königin", "@language": "de" },
     {
       "http://purl.org/dc/terms/language": "en",
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.com/the_queen"}
     },
     {
       "http://purl.org/dc/terms/language": "de",
       "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": {"@id": "http://example.de/die_königin"}
     }
   ]
 }]

Note that we're now using node definitions with a dc:language property, and an rdf:value that references the other value. Of course, a major downside of this is placing built-in dependencies on external vocabularies. We could consider creating equivalents in a json-ld namespace (jsonld:language, jsonld:value), but I don't know if that really helps too much.

The resulting Turtle representation would look like the following:

<http://buckingham.uk/queenie> <http://example.com/label>
  "The Queen"@en,
  "Die Königin"@de,
  [dc:language "en"; rdf:value <http://example.com/the_queen>], 
  [dc:language "de"; rdf:value <http://example.de/die_königin>].

The compaction rules would need to consider such node definitions when selecting values for each language tag. Another advantage is that it allows for all value types to be round-tripped, including typed values and those represented as value objects.
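
A Python sketch of the expansion side of this blank-node pattern, assuming that plain strings stay as language-tagged value objects and every other value is wrapped in a node carrying dc:language and rdf:value. The function name `expand_with_wrappers` is hypothetical, and the output order simply follows the input map.

```python
DC_LANGUAGE = "http://purl.org/dc/terms/language"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def expand_with_wrappers(language_map):
    """Expand a language map, wrapping non-string values in blank nodes."""
    expanded = []
    for lang, values in language_map.items():
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if isinstance(value, str):
                expanded.append({"@value": value, "@language": lang})
            else:
                # The wrapper node records the language as ordinary RDF data.
                expanded.append({DC_LANGUAGE: lang, RDF_VALUE: value})
    return expanded
```

Compaction would do the reverse: recognise nodes consisting of exactly these two properties and move the rdf:value back under the matching language key.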

@lanthaler
Member Author

Gregg, are you proposing to generate those blank nodes also in expanded JSON-LD or just when converting to RDF? I fear it’s the former :-)

@gkellogg
Member

They need to be generated when expanding, or it won't come out in the RDF.

I believe it's round-trip-able in JSON-LD, so that the blank nodes would be consumed in compaction, but of course not if compaction didn't include a language container term. Still, it's an odd corner case anyway.

@lanthaler
Member Author

Couldn’t we just generate them when going to RDF and just keep an @language in expanded JSON-LD?

@linclark
Contributor

linclark commented Nov 9, 2012

@niklasl It may be that it's more complex than necessary, but I have yet to see what problems the added complexity introduces, and it seems to solve our use case without any contortions in our own data handling.

If you have a chance to outline the practical implications that you predict from the added complexity of named graphs in the other issue, it would really help in fully evaluating whether it's a viable approach. Thanks!

@linclark
Contributor

We are no longer pursuing language maps for our use case, but one proposal has come up offline a couple of times now. In case the CG plans to continue development of language maps, I want to make sure that the following flaw with the proposal is recorded in this thread.

The proposal is to wrap the object in a blank node. However, this would limit the vocabularies that you can use with language maps.

For example:

<node/1> schema:articleBody
    [ :en "This is the body text"^^rdf:HTML; ] .

This would violate the range constraint of schema:articleBody.

@gkellogg
Member

Another thing we discussed, rather than using BNodes, is to use property extensions. For example, this could result in the following:

<node/1> schema:articleBody/en "This is the body of text"^^rdf:HTML .
schema:articleBody/en rdfs:subPropertyOf schema:articleBody .

Schema.org doesn't actually need the subPropertyOf, but it allows other reasoners to know that the properties are related.

@niklasl
Member

niklasl commented Nov 11, 2012

Yes, it's good to lay that proposal to rest. It should also be noted that (AFAIK) the only reason it came up was to attempt to preserve data which the JSON desired by Drupal expresses in an unusual shape (where our interpretation violated most known vocabularies, as was noted when proposed).

The original language map proposal on the other hand is (was originally, and can now continue to be) only about expressing as keys the languages of language-tagged literals. That does not violate these constraints. It just provides a syntactical language map in JSON for what otherwise has to be iterated over. (The other, extended, more complex proposal for @container about mapping on regular properties (see e.g. point 2 in my comment above), is also free of any odd data patterns, RDF-wise.)

The current problem (separate from this issue) is that Drupal doesn't want this information (the language-like keys) preserved at all, only to syntactically preserve the shape. The reason is (IIUC) to hide the treatment of language-based versions of the descriptions from being exposed in RDF. That seems to require either:

  • splicing faux language keys, actually representing these versions as named graphs, between term and value (issue Add '@graph' container type #195), or
  • some kind of probing (akin to Support for controlled probing of unlinked objects #84, but quite different in detail: instead ignoring the object with the special key but needing linkage preserved, plus extended for expansion), or
  • a way to concatenate a term with its object's keys, as Gregg suggests above (to be used in combination with schema.org's particular property extension mechanism).

Again, I believe that the simplest solution would be to just acknowledge and express these language versions in public data, regardless of how they are diffused over node properties internally in Drupal. I must also stress that these things are much easier to reason about if the data is first expressed as RDF, which has grounded semantics, and only once the meaning is established seek any possible compact syntactical forms of that, for the purpose of matching desirable usage patterns (in programming or templating).

@lanthaler
Member Author

@linclark, just out of curiosity, how are you going to address your use case?

Does that also mean that you no longer need "@container": "@graph" (#195)?

@lanthaler
Member Author

We already resolved this a while ago:

RESOLVED: Add support for language maps via the "@container": "@language" annotation in @context. For example: "tags": { "@id": "http://example.com/vocab#tags", "@container": "@language"}. The child property of the term MUST be an associative array. All associative array keys MUST be BCP47 language strings.

PROPOSAL 3: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a "@container": "@language" mapping. Terms that have a "@container": "@language" mapping MUST NOT be type-coerced.

We could also allow other values such as plain literals or nodes but, as the language information would be lost during expansion, I don't think we should do that. If we disallow this now we drastically simplify the introduction of more sophisticated mechanisms at a later point in time since it won't change existing data. Therefore the MUST in the proposal above.
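
As a rough illustration of the expansion half of PROPOSAL 3, the following Python sketch tags each string with the language given by its key. It is not taken from the spec or from any existing processor: the function name `expand_language_map`, the choice to raise `ValueError` on invalid values, and the second German label are all assumptions made for the example.

```python
def expand_language_map(language_map):
    """Expand a language map into a list of value objects.

    Hypothetical helper: follows PROPOSAL 3 in requiring every value to be
    a string or an array of strings, and rejects anything else.
    """
    expanded = []
    for language, values in language_map.items():
        if isinstance(values, str):
            values = [values]          # treat a single string as a one-item array
        elif not isinstance(values, list):
            raise ValueError(f"value for {language!r} must be a string or an array of strings")
        for value in values:
            if not isinstance(value, str):
                raise ValueError(f"value for {language!r} must be a string or an array of strings")
            expanded.append({"@value": value, "@language": language})
    return expanded


# "Ihre Majestät" is made-up sample data to show the array case.
label = {"en": "The Queen", "de": ["Die Königin", "Ihre Majestät"]}
expanded_label = expand_language_map(label)
```

Rejecting non-string values outright mirrors the MUST in the proposal; a processor could instead choose to skip such values, which is one of the points still under discussion in this thread.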

@linclark
Contributor

@niklasl As I've pointed out, named graphs actually do handle our use case while also expressing the information in RDF. I believe you and I disagree about how odd this is. For example, JeniT has written about named graphs used for versioning UK government data. But that is beside the point and I don't want to hijack the thread with this debate, or with more attempts to convince us to handle our language-based entity variants as separate resources.

I believe the current problem is how the CG deals with other (non-Drupal) use cases. For example, when I met with Gregg in Berkeley, it seemed that he had his own use case for language maps that could contain node references.

@lanthaler I would prefer to use named graphs, and thus would still like to see #195 developed. However, Manu discussed another way which would not preserve the information in RDF, but would at least be good enough for us.

@gkellogg
Member

I believe the current problem is how the CG deals with other (non-Drupal) use cases. For example, when I met with Gregg in Berkeley, it seemed that he had his own use case for language maps that could contain node references.

To be clear, I have a use case where I have RDF data including information for separate languages, that I need to serialize in RDF. It was not necessarily the case that it needed to be done with language maps. In fact, named graphs may very well be the best way to do it. The Wikia case is different, though, as there are different resources for each language version (like Wikipedia), so named graphs might make sense, if you use named graphs to describe the resources of each page.

@niklasl
Member

niklasl commented Nov 12, 2012

@linclark I've replied to this in issue #195, since you're right that this issue should focus on the effect of @container: @language only.

Also, I hope you have time to consider issue #196, which is an attempt to handle a bunch of related topics regarding syntactic extensibility with no defined semantics. I'm not sure if it'll have traction, but I believe it is close to the variant that has been discussed offline that you thought may be useful.

@gkellogg
Member

+1 to PROPOSAL 3.

For expansion, I would say that non-string (or array of string) values of a property with language maps are expanded to use the property, but lose the language association. That is, they don't round-trip.

As a general principle, I'm fine with syntactic constructs that allow for zero-edits when expanding, but opposed to them for round-tripping through expansion, unless they also have a representation that can be round-tripped through RDF.

@niklasl
Member

niklasl commented Nov 12, 2012

+1 to PROPOSAL 3 (for the behavior of @container: @language).

I also agree with the general principle. However, as noted in the last part of the #196 description, I may be willing to compromise that principle in that case if it is proven essential to usage and doesn't wreak havoc upon the expansion algorithm. Not for this issue though (since preserving @language would add ambiguous or nonsensical information).

@lanthaler
Member Author

RESOLVED: The values of the key-value pairs of a language map MUST be strings or arrays of strings. When expanded, the strings are tagged with the language specified by the key. When compacting, only language-tagged strings will match a term that has a "@container": "@language" mapping. Terms that have a "@container": "@language" mapping MUST NOT be type-coerced.

@msporny
Member

msporny commented Nov 19, 2012

To be clear: JSON-LD 1.0 will support simple language maps. When using a language map and expanding, if the term's language key's value is not a simple string, the rule for using the language map does not apply (all language-map values get dropped). When compacting, if all statements in the list are not simple @value/@language objects, then the term that defines the language map does not match (the statements are kept in expanded form).

@lanthaler
Member Author

@msporny, could you please explain what you mean by the last sentence:

When compacting, if all statements in the list are not simple @value/@language objects, then the term that defines the language map does not match (the statements are kept in expanded form).

Every value will be evaluated separately and there might be values without an @language that weren't part of a language map... but maybe I don't understand what you are saying.

@gkellogg
Member

I just want to clarify that arrays of strings, or strings within an @list are also considered as being appropriate for use with language maps.

To clarify @msporny's description of compaction, a language map term is only appropriate for values which have @language. Otherwise, other terms (at a lower term rank) can also be considered, defaulting to an expanded IRI if none is found.
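
The per-value matching described here could be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names `is_language_tagged_string` and `partition_values` are hypothetical, and term ranking among several candidate terms is left out entirely. Values that do not carry an @language are simply set aside for some other term or the expanded IRI.

```python
def is_language_tagged_string(value):
    """True for value objects of the exact form {"@value": <str>, "@language": <str>}."""
    return (
        isinstance(value, dict)
        and set(value) == {"@value", "@language"}
        and isinstance(value["@value"], str)
        and isinstance(value["@language"], str)
    )


def partition_values(values):
    """Split expanded values into (language-map candidates, everything else).

    Each value is judged on its own, so one untagged value does not stop
    the tagged ones from matching the language map term.
    """
    matching, other = [], []
    for value in values:
        (matching if is_language_tagged_string(value) else other).append(value)
    return matching, other


values = [
    {"@value": "The Queen", "@language": "en"},
    {"@value": "No language"},   # plain string: no @language
    {"@value": 5},               # number: not a string
    {"@id": "_:b1"},             # node reference
]
matching, other = partition_values(values)
```

Here only the first value is a candidate for the language map term; the other three would fall through to another term or stay under the full IRI.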

@lanthaler
Member Author

I just want to clarify that arrays of strings, or strings within an @list are also considered as being appropriate for use with language maps.

@list as well? Really? That would make everything a lot more complex.

@gkellogg
Member

@list could work for multiple lists just like the example in #172 distinguishes based on language. However, I don't have a strong opinion on this.

@lanthaler
Member Author

I would prefer to not do that.

@msporny msporny closed this as completed in 51de407 Dec 3, 2012
@msporny
Member

msporny commented Dec 3, 2012

@gkellogg I added the "@container": "@language" algorithms to the spec, but without the support for "@list" as you mentioned above as it would complicate the algorithm and seems like we could always add that feature later, if necessary. If the folks that have been active in this thread could look at the commit diff and make sure I implemented this correctly, I'd appreciate it. It took a very long time to figure out where to hook into the various algorithms on this feature and even once I did the work, it was difficult to figure out if there were going to be any side-effects from the modification to the algorithms.

@gkellogg
Member

gkellogg commented Dec 3, 2012

I'm fine with not having @list support.

@lanthaler
Member Author

At first glance the changes to the algorithms look OK. There are a few minor things that I would change, such as:

For each item in multilingual array, add a key-value pair to the language map

This sounds like the result could be an object containing non-unique keys, which we certainly don't want. I will review all the algorithms as soon as I have updated my processor.
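
One way to avoid non-unique keys, sketched in Python, is to merge values that share a language into an array under a single key. This is an assumption about how the step could be worded, not the spec text: `build_language_map` is a hypothetical name, "Her Majesty" is made-up sample data, and lower-casing the tags is a choice made here because BCP47 tags are case-insensitive.

```python
def build_language_map(tagged_values):
    """Build a language map with exactly one key per language.

    Expects value objects of the form {"@value": <str>, "@language": <str>}.
    A language seen once maps to a string; a language seen several times
    maps to an array of strings in input order.
    """
    language_map = {}
    for item in tagged_values:
        language = item["@language"].lower()   # assumption: normalise tag case
        value = item["@value"]
        if language not in language_map:
            language_map[language] = value
        elif isinstance(language_map[language], list):
            language_map[language].append(value)
        else:
            # Second value for this language: promote the string to an array.
            language_map[language] = [language_map[language], value]
    return language_map


compacted_label = build_language_map([
    {"@value": "The Queen", "@language": "en"},
    {"@value": "Die Königin", "@language": "de"},
    {"@value": "Her Majesty", "@language": "EN"},
])
```

Because the result is an ordinary dictionary, duplicate keys cannot occur by construction; the only decision left is whether a single value is emitted as a string or as a one-item array.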

lanthaler added a commit that referenced this issue Dec 8, 2012
lanthaler added a commit that referenced this issue Dec 8, 2012
lanthaler added a commit that referenced this issue Dec 8, 2012
…nguage maps are available

Also removed unnecessary data from compact-0026-context.jsonld.

This addresses #133.
lanthaler added a commit that referenced this issue Dec 11, 2012
…SON object

See Gregg's changes in 8c546b9.

This addresses #133 and #196.
lanthaler added a commit that referenced this issue Dec 11, 2012
This leads to much better performance as the keys don't have to be lower-cased multiple times.

This addresses #133.