Detail how IRI conflicts are resolved when compacting/expanding #74

Closed
msporny opened this issue Jan 31, 2012 · 19 comments

Comments

@msporny
Member

msporny commented Jan 31, 2012

From Gregg:

So, one significant complication of looking up term coercions rather than IRI coercions arises when compacting IRIs. IRI compaction is now more complicated, as there may be multiple terms associated with an absolute IRI. We now must take into consideration the datatypes of values associated with a specific key to choose between multiple possible term mappings.

Are we making this too complex? Before, the algorithms specified coercion based on expanded IRIs, not unexpanded terms, CURIEs or IRIs.

For example, consider the following:

{
  "@context":
  {
    "a": {"@id": "http://example.com/foo", "@type": "xsd:integer"},
    "b": {"@id": "http://example.com/foo"}
  },
  "a": "1",
  "b": "1"
}

I've now declared the following RDF:

[ <http://example.com/foo> 1, "1" ] .

When expanded, I get the following:

{
  "http://example.com/foo": {"@literal": "1", "@type": "http://www.w3.org/2001/XMLSchema#integer"},
  "http://example.com/foo": "1"
}

Of course, this is illegal, so expansion rules need to consider that multiple keys may need to be resolved:

{
  "http://example.com/foo":
  [
    {"@literal": "1", "@type": "http://www.w3.org/2001/XMLSchema#integer"},
    "1"
  ]
}

Now when we go to compact, it becomes even more difficult. Do we use two different keys and re-split this? Looking up the appropriate term becomes much more complicated.

This problem would be simplified if we restricted a context to having at most one mapping from a term/CURIE/IRI to an IRI, allowing a reverse map. As the context algorithm is specified now, this is pretty much what happens:

It is also used to maintain coercion mappings from IRIs associated with terms to datatypes, and list mappings for IRIs associated with terms

This describes that the mapping is from IRIs, not terms. I suggest we keep this algorithm and restrict the context to have only a single mapping from term to IRI and no mapping for CURIEs or IRIs to IRI, depending on the prefix being a term.

Still, even as it is now, a user could specify two terms mapping to the same IRI, in which case the one which is used for compaction becomes undefined, as is any coercion rule used.

@lanthaler
Member

I think we always have this issue, as we allow an author to use an IRI directly in the key position. That way, expansion and compaction always have to deal with the scenarios described above. Example:

{
  "@context":
  {
    "a": {"@id": "http://example.com/foo", "@type": "xsd:integer"}
  },
  "a": "1",
  "http://example.com/foo": "1"
}

So I think we should try to handle this complexity and eliminate all ambiguity instead of restricting the syntax.

@lanthaler
Member

The intent for how IRI conflicts are resolved when compacting/expanding: when compacting, any conflict between terms that use the same IRI is resolved by picking the most specific solution, considering both @type and @container. For example, when compacting "foo": "5" and having to pick between a term that specifies "xsd:integer" as the type and one that doesn't, the one that specifies "xsd:integer" is selected. If no solution is more specific than the others, a lexicographical comparison is made between the terms in the @context, and the lexicographically least term and its associated @type and other information is used to expand the data.

When expanding multiple keys that resolve to the same IRI, the expanded value will have all of the values associated with the IRI merged into a single JSON array (the order of the values in the resulting JSON array is undefined).
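
The merge rule above can be illustrated with a minimal Python sketch. The names `expand_keys` and `term_to_iri` are hypothetical and not taken from any processor, and the sketch deliberately skips type coercion. It only shows values from several keys collecting under one expanded IRI. The thread leaves the order of the merged values undefined; this sketch happens to keep input order.

```python
def expand_keys(node, term_to_iri):
    """Merge the values of all keys that resolve to the same IRI into one list.

    Assumption: term_to_iri is a plain dict of term -> absolute IRI, and any
    key not found in it is treated as an absolute IRI already.
    """
    expanded = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context itself is not part of the expanded output
        iri = term_to_iri.get(key, key)
        values = value if isinstance(value, list) else [value]
        expanded.setdefault(iri, []).extend(values)
    return expanded


terms = {"a": "http://example.com/foo", "b": "http://example.com/foo"}
doc = {"a": "1", "b": "1", "http://example.com/foo": "2"}
result = expand_keys(doc, terms)
# All three keys end up under the single key "http://example.com/foo".
```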

lanthaler added a commit that referenced this issue Mar 18, 2012
Added an example which shows how compact IRIs can be used in a context.

This addresses #74. I'll leave the issue open as long as the API spec hasn't been updated.
@msporny
Member Author

msporny commented Mar 19, 2012

More on this discussion here: http://json-ld.org/minutes/2012-03-06/#topic-3

@dlongley
Member

I would be ok with Markus' suggestion of choosing the most specific context definition (the one with the most @type/@container attributes) and if there isn't one, the lexicographically least term. Although, we might want to pick the shortest term before checking for the lexicographically least one. I believe this makes the most sense in a "compaction" algorithm.
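
As a rough illustration of that ordering (most specific definition, then shortest term, then lexicographically least), here is a small Python sketch. `pick_term` is a hypothetical helper. "Most specific" is read literally as the number of @type/@container attributes in the definition, and lexicographic order is assumed to mean plain code point comparison. Neither reading is confirmed by the spec text.

```python
def pick_term(candidates):
    """Pick one term from candidates, a dict of term -> term definition.

    All candidates are assumed to map to the same IRI already.
    """
    def sort_key(term):
        definition = candidates[term]
        # Count how many of @type/@container the definition carries.
        specificity = sum(1 for k in ("@type", "@container") if k in definition)
        # Higher specificity first, then shorter term, then code point order.
        return (-specificity, len(term), term)

    return min(candidates, key=sort_key)


candidates = {
    "b": {"@id": "http://example.com/foo"},
    "a": {"@id": "http://example.com/foo", "@type": "xsd:integer"},
}
chosen = pick_term(candidates)  # "a", because it carries an @type
```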

@dlongley
Member

I think the algorithm for compacting a property IRI should be modified to be something like this:

Inputs:
active context,
active property (full IRI),
type (the @type for the associated value or null),
language (the @language for the associated value or null),
container (the @container for the associated value or null)

Create an empty list X of compaction choices.
For each term entry in the active context where there is a matching IRI:
  If type is not null and container is not null:
    If the entry has a matching @type and @container, add it to X.
  Otherwise, if type is not null:
    If the entry has a matching @type, add it to X.
  Otherwise, if language is not null:
    If the entry has a matching @language, add it to X.
  Otherwise add the entry to X.
If X is non-empty, sort it first by shortest string length and then by lexicographically least string.
  Return the first entry as the compact form of the IRI.
Otherwise repeat the above for each prefix (CURIE) entry in the active context.
If there is no term or prefix match (the compacted result is equal to the IRI):
  If container is not null, recurse with container set to null.
  Otherwise if type is not null, recurse with type set to null.
  Otherwise if language is not null, recurse with language set to null.
  Otherwise return the result.

The processors I work on do keyword aliasing at the same time (via the same compaction method) -- and an alias is picked using the same sorting operation of shortest string and then lexicographically least string (and there is obviously no need to check @type, @language, or @container). I believe my context processing handling is a little different from the algorithm in the spec, but if this isn't already there, we should add mappings of keywords to arrays of aliases (and sort them) during the context processing step.
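
The keyword-to-aliases mapping suggested above might look like the following Python sketch. `build_alias_map` and `compact_keyword` are hypothetical names, and the context is assumed to be a flat dict in which an alias is any term whose value is a string starting with "@".

```python
def build_alias_map(context):
    """Map each keyword to its aliases, sorted by length and then lexicographically."""
    aliases = {}
    for term, definition in context.items():
        if isinstance(definition, str) and definition.startswith("@"):
            aliases.setdefault(definition, []).append(term)
    for terms in aliases.values():
        terms.sort(key=lambda t: (len(t), t))
    return aliases


def compact_keyword(keyword, aliases):
    """Return the preferred alias, or the keyword itself if it has none."""
    return aliases.get(keyword, [keyword])[0]


context = {"uri": "@id", "id": "@id", "type": "@type", "a": "@type"}
alias_map = build_alias_map(context)
```

Sorting once during context processing means compaction only has to take the first entry of each list.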

@gkellogg
Member

I think this mostly works, but we also need to check for variations on @language and @container.

Otherwise, if language is not null and container is not null:
  If the entry has a matching @language and @container, add it to X.

Also, note that an entry might not have a @language, but the context has @language and a term definition exists where @language: null; we would want to use that term in this case. We'd need to fold that interpretation in as well.

@dlongley
Member

Yeah, I was just coming back in to comment on that. We also need to check on the existence of just @container.

@dlongley
Member

Actually, there are a few more cases that need to be covered ... and I don't think the algorithm was working properly for a few reasons. Anyway, here's another attempt at it that is no longer recursive and maybe covers all the cases? I don't recall what we do with plain string literals ... but we might need to tweak something for them. I think we also might have to start passing the parent container (or whether or not it's a @list) when we recurse in the compaction algorithm. We were thinking of doing this anyway to ensure we throw exceptions for lists of @lists.

Inputs:
active context,
active property (full IRI),
value (the associated value),
container (the @container for the associated value or null)

Set an integer 'bestMatch' to 0.
Create an empty list 'X' of compaction choices.
For each term entry in the active context where there is a matching IRI:
  // container with type or language
  If the entry has a matching @container (can both be null) and
  the value has a matching @type OR the value has no @type and
  the entry has a @language that matches (null matches none):
    If bestMatch is less than 3:
      Clear X and set bestMatch to 3.
    Add term to X.
  Otherwise, if bestMatch is less than 3:
    // no container with type or language
    If the entry has no @container and the value has a matching
    @type OR the value has no @type and the entry has a @language
    that matches (null matches none):
      If bestMatch is less than 2:
        Clear X and set bestMatch to 2.
      Add term to X.
  Otherwise, if bestMatch is less than 2:
    // container with no type or language
    If the entry has a matching @container (can both be null) and
    no @type and no @language:
      If bestMatch is less than 1:
        Clear X and set bestMatch to 1.
      Add term to X.
  Otherwise, if bestMatch is less than 1:
    // no container, no type, no language
    If the entry has no @container, no type, and no @language:
      Add term to X.
If X is non-empty, sort it first by shortest string length and then by lexicographically least string.
  Return the first entry as the compact form of the IRI.
Otherwise repeat the above for each prefix (CURIE) entry in the active context.
Return the result.
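
One possible reading of the pseudocode above, as a Python sketch. This is not the spec algorithm. `match_level` and `compact_property` are hypothetical names, the entries are assumed to be filtered to the matching IRI already, and "null matches none" is interpreted as: an entry with an explicit `"@language": None` matches a value that has no language. The fallback to prefix (CURIE) entries is left out.

```python
def match_level(entry, value_type, value_language, container):
    """Return 3, 2, 1 or 0 for the four cases in the pseudocode, else None."""
    entry_container = entry.get("@container")
    if value_type is not None:
        typed = entry.get("@type") == value_type
    else:
        # Assumed reading: the entry must declare @language explicitly.
        typed = "@language" in entry and entry["@language"] == value_language
    untyped = "@type" not in entry and "@language" not in entry
    if typed and entry_container == container:
        return 3  # container with type or language
    if typed and entry_container is None:
        return 2  # no container with type or language
    if untyped and entry_container == container:
        return 1  # container with no type or language
    if untyped and entry_container is None:
        return 0  # no container, no type, no language
    return None


def compact_property(entries, value_type=None, value_language=None, container=None):
    """Pick the best term from entries (term -> definition), or None."""
    best, choices = -1, []
    for term, entry in entries.items():
        level = match_level(entry, value_type, value_language, container)
        if level is None or level < best:
            continue
        if level > best:
            best, choices = level, []  # a better level discards earlier choices
        choices.append(term)
    choices.sort(key=lambda t: (len(t), t))  # shortest, then lexicographically least
    return choices[0] if choices else None


entries = {
    "a": {"@id": "http://example.com/foo", "@type": "xsd:integer"},
    "b": {"@id": "http://example.com/foo"},
}
typed_choice = compact_property(entries, value_type="xsd:integer")
plain_choice = compact_property(entries)
```

Computing a level per entry and keeping only the best one sidesteps the chained "Otherwise" branches of the pseudocode while keeping the same four-way priority.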

@dlongley
Member

We will also need to permit matching of @container type @set to a null container to get the most expected behavior, IMO. Also, the "Otherwise" lines within the for loop should be nested. We can clean it up if people agree that this is the right way to go. We will also need to discuss in greater detail how we want to handle @lists and the ambiguities that arise from values that aren't in lists that match the same term that a list does. (Do we just concatenate, throw an exception? etc.)

@lanthaler
Member

PROPOSAL: In IRI compaction for each term mapped to the input IRI a term rank is calculated depending on the @type, @language, and @container mappings for the term matching the value of the property to compact. The highest ranked term is chosen. If two terms have the same rank, the lexicographically least is selected.

@gkellogg
Member

The algorithm described and implemented does a bit more than this proposal suggests. There are more sophisticated selection criteria based on whether it's a list or not and how to consider compact IRIs. Also, the selection among terms of equal rank looks for the shortest match before the lexicographically first.

@lanthaler
Member

Any suggestion how we could formulate that in a short proposal? I thought the one I put up is already specific enough to get consensus.

@gkellogg
Member

If necessary, we could enter the body of the current algorithm as the proposed resolution, or just resolve that to accept the current spec text.

@lanthaler
Member

I would like to first upgrade my processor to the current spec before +1'ing on the exact algorithm. I think the proposal describes the current spec well enough to accept the current algorithm.

@msporny
Member Author

msporny commented Apr 29, 2012

I'd rather we vote on the spec text than the proposal Markus put in here - namely because we've gotten tripped up on this multiple times, it's not very easy to see all of the "moving parts", and because having the text in front of you is easier than spending time discussing whether or not we've captured everything on the call.

@lanthaler - what's missing from the current spec text that your proposal addresses?

@lanthaler
Member

The algo is much more complex and I haven't implemented it yet, so I don't understand its full consequences yet. I think it's easier to agree on what we are trying to achieve; the exact algorithm is a consequence thereof (and depends on many more details).

@dlongley
Member

I'm ok with voting on the spec text since it's the very specific solution to the problem. Perhaps we can come up with some proposal text that broadly outlines the goal but then says that we think the spec text accomplishes the specifics, if this would be helpful/a good compromise.

@lanthaler
Member

I started updating my processor to the current spec but I'm not done yet. I think the current spec is not complete so I would really not like to vote on it (yet).

@lanthaler
Member

I'm closing this issue as I created a new one (#113) to agree on how compaction is supposed to work in detail.
