Define exactly how (IRI) compaction is supposed to work #113
We'll have to look more closely at what happens between the two algorithms to see what the different outputs would be. I don't think we should only consider "what removes the most expanded forms or IRIs" without considering the semantics behind what's going on. In other words, given the option of compacting to a term with a type that doesn't match, or to a prefix with a type that does match (or with no type at all), I think the prefix should be preferred. An application would expect to find only things of type X in a term with a coercion of type X; in my view, if it didn't care what the type was, the type wouldn't have been specified along with the term. The same principle applies to language.

Also, I don't think we want to break up @lists, so we should be careful there. If we agree that lists shouldn't be broken up, then there is something to be said for picking the term that results in the fewest expanded forms there; however, that might result in an overly complex algorithm (compared to what's in the spec now) with little gain, IMO.
For example, when compacting with this input:
And this context:
I think the output ought to be:
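The JSON snippets from this comment did not survive the copy of the page, so here is a hedged Python sketch of the preference being argued for: given a value whose type does not match a term's coercion, the plain prefix form should win over the term. All names in it (`ex`, `term1`, `http://example.com/prop`) are illustrative assumptions, not the lost example.

```python
# Illustrative context: "ex" is a plain prefix; "term1" is a term whose
# definition coerces values to xsd:integer.
context = {
    "ex": "http://example.com/",
    "term1": {"@id": "http://example.com/prop",
              "@type": "http://www.w3.org/2001/XMLSchema#integer"},
}

# An expanded value with no @type, so term1's integer coercion does NOT match.
value = {"@value": "a string"}

def candidates(iri, ctx):
    """Yield possible compact forms for an IRI from the context:
    compact IRIs built from plain prefixes, and terms with their coercion."""
    for name, defn in ctx.items():
        if isinstance(defn, str) and iri.startswith(defn):
            yield name + ":" + iri[len(defn):], None      # prefix form, no coercion
        elif isinstance(defn, dict) and defn.get("@id") == iri:
            yield name, defn.get("@type")                 # term with type coercion

def pick(iri, ctx, val):
    """Prefer a form whose coercion matches the value's @type, or has none."""
    best = None
    for form, coercion in candidates(iri, ctx):
        if coercion is None or coercion == val.get("@type"):
            if best is None or len(form) < len(best):
                best = form
    return best or iri

print(pick("http://example.com/prop", context, value))  # → "ex:prop"
```

Under this rule the mismatching term `term1` is skipped and the prefix form `ex:prop` is chosen, which matches the semantics argued for above.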
Perhaps the example seems even more obvious when using a language:
And this context:
I think the output ought to be:
Whereas, unless I'm mistaken, the algorithm above would result in:
Which I don't think is preferable. I think when you include a @type or a @language along with your term or prefix, you expect only those things with matching types or languages to be present. If you specifically say you want a null language, then I think you should get anything without a language as well.
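The language-matching rule described in this comment can be sketched as a small predicate; this is my hedged reading (the function name and shapes are mine): equal languages match, and an explicit null language also accepts values that carry no language at all.

```python
def language_matches(term_lang, value):
    """Does a term's @language coercion accept this expanded value?
    An explicit null language accepts only values without a language."""
    val_lang = value.get("@language")
    if term_lang is None:
        return val_lang is None
    return term_lang == val_lang

# A term coerced to "en" accepts English strings only...
assert language_matches("en", {"@value": "hello", "@language": "en"})
assert not language_matches("en", {"@value": "hallo", "@language": "de"})
# ...while a null-language term accepts plain, language-less strings.
assert language_matches(None, {"@value": "plain"})
```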
In my implementation, the prefix with the matching type would be chosen; if there is no such prefix, the term would be.
This would be a reasonable design choice and I wouldn't be opposed to changing my algorithm to do that. The question remains, though, whether we should (mis-)use term definitions (i.e., terms that have type/language/container set) as prefixes in that case.
Agree.
Please note that I do NOT break up lists. I don't think that would even be possible without breaking the data.
I think the algorithm I outlined above is simple (and understandable) enough to achieve that. A related question is what we do with properties in the input document that have multiple lists (sets of lists). Would it be OK to compact the first list to a property and put the others in other properties? Should all of them be put in a property without a list container? Should an exception be thrown?
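The "sets of lists" hazard mentioned here is easiest to see in code. This is a hedged sketch with made-up data: a property holding two separate @list values must not be collapsed into one list, which is exactly the implementation bug pointed out later in this thread.

```python
# A property with two distinct @list values in expanded form (illustrative).
expanded = {
    "http://example.com/prop": [
        {"@list": [{"@value": 1}, {"@value": 2}]},
        {"@list": [{"@value": 3}]},
    ]
}

def compact_lists_naive(values):
    """WRONG: flattens every @list into one array, merging distinct lists
    and silently changing the data."""
    out = []
    for v in values:
        out.extend(item["@value"] for item in v["@list"])
    return out

def compact_lists_safe(values):
    """Keeps each @list intact as its own array."""
    return [[item["@value"] for item in v["@list"]] for v in values]

vals = expanded["http://example.com/prop"]
print(compact_lists_naive(vals))  # → [1, 2, 3]  (data loss: list boundary gone)
print(compact_lists_safe(vals))   # → [[1, 2], [3]]
```

Whether the safe output should use a list-container term for the first list, plain properties for the rest, or raise an error is precisely the open question in the comment above.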
Regarding your example, that's true. At the moment I prefer terms, but I'm more than willing to change that behavior. Please note that the following would also be a valid output that is even more compact (I'm not advocating this!):
The question remains, though, what should happen if no prefix exists. Should we keep the full IRI or use the term then?
And what happens with terms that fall back to the default language?
Would that be correct or should the full IRI be used instead due to the (implicit) language mismatch?
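The default-language question can be made concrete with a hedged simplification (the function and data are mine, not from the thread): during expansion, a plain string with no explicit @language effectively picks up the context's default language, so a term coerced to a different language becomes an implicit mismatch.

```python
def effective_language(value, default_lang):
    """Hedged simplification: a value with no explicit @language behaves
    as if it carried the context's default language."""
    return value.get("@language", default_lang)

# With a default language of "en", a plain string behaves as English...
assert effective_language({"@value": "hi"}, "en") == "en"
# ...an explicit language always wins over the default...
assert effective_language({"@value": "hallo", "@language": "de"}, "en") == "de"
# ...so a term coerced to @language "de" would implicitly mismatch "hi".
assert effective_language({"@value": "hi"}, "en") != "de"
```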
I don't think we should do empty suffixes, it seems odd to me -- so I can see others considering it unexpected behavior. I'll have to think more about what should happen with a default language. I can see it going either way right now, which means we might just go with whatever is easiest (probably what we currently have ... which is to reject the term because of the implicit language mismatch). I'm not sold either way at the moment, so maybe we need a few more examples that could highlight whether one way would seem more unnatural than the other.
Agree, empty suffixes seem odd to me as well. I've updated my term rank implementation to not simply check whether an expanded object form can be eliminated but to check whether the term's type/language match. So my IRI compaction algorithm still looks like this:
But the term rank calculation for a term (could also be a (compact) IRI) has to be changed to:
The type/language match check is also quite simple:
This code passes all the compaction/expansion tests we currently have. I think these algorithms are much easier to understand and to implement. If you are OK with it, I'll go ahead and update the spec accordingly.
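The code blocks referenced in this comment were lost from the page, but a later comment describes the ranking as +1 per match and -1 per mismatch, summed over @list members. This is a hedged Python sketch of that idea; the helper names and data shapes are mine, not the original PHP.

```python
def type_language_match(term_def, value):
    """Does the term's @type/@language coercion match this expanded value?
    A term with no coercion matches values carrying no type or language."""
    if "@type" in term_def:
        return term_def["@type"] == value.get("@type")
    if "@language" in term_def:
        return term_def["@language"] == value.get("@language")
    return "@type" not in value and "@language" not in value

def term_rank(term_def, value):
    """+1 for a match, -1 for a mismatch; for a @list, sum over members,
    so one mismatching member drags the whole list's rank down."""
    if isinstance(value, dict) and "@list" in value:
        return sum(term_rank(term_def, m) for m in value["@list"])
    return 1 if type_language_match(term_def, value) else -1

print(term_rank({"@type": "T"}, {"@value": "x", "@type": "T"}))  # → 1
print(term_rank({"@type": "T"}, {"@value": "x"}))                # → -1
```

Note how this differs from averaging: a list whose members are half matches and half mismatches ranks 0, so a fully matching but less specific candidate can still beat it.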
Hmm, I wouldn't update the spec just yet; I'm not sure exactly how different the two algorithms we have right now are or which one is actually simpler. I'll take a closer look when I get a chance. Any thoughts, @gkellogg?
I'm not sure it's simpler either, just different. And, don't have time to re-implement right now anyway, so I'll go with the flow for now. The current algorithm is more strict on container-set and container-list; it seems like this one might allow some cross-over. I think there's value in being deterministic in going straight to a CURIE or IRI if there's no exact term match. The main difference is requiring all @list members to have a type/language match, rather than the average of them.
No, it won't cross over except if the full IRI is coerced to a list and no prefixes are defined... but you have no choice then anyway.
I don't understand what you are trying to say with this.
Both algorithms calculate the sum over the members. Mine uses +1 for matches and -1 for mismatches, whereas yours adds something between 0 and 3 for every term -- which I found quite difficult to reproduce on a sheet of paper, for example.
At this point, I'd say just do what you want. I don't really have the energy to continue debating such changes. Certainly, if algorithms that we're using can be simplified, and still do the same thing, we should do this. However, it seems that the trend is for everyone to take the algorithms and instead of implementing them, come up with their own thing. Perhaps we'd be better to focus on areas of the spec that are not done, rather than re-working those that are complete with multiple implementations.
There have been some further discussions in the 2012-09-18 telecon which yielded the following action:
I spoke with @dlongley about this as he updated the JavaScript, PHP, and Python implementations last week based on the latest text in the spec. He did not agree that we should try to re-write the term ranking algorithm because the original approach he used when writing the IRI compaction algorithm was to do multiple rounds. He said that there are so many conditionals that you end up having a ton of rounds and the term ranking algorithm ends up being much easier than doing multiple rounds. So, at this point, I agree with him as the re-write I proposed would end up further complicating the spec text.
Issue #172 is related to this.
PROPOSAL 1: Clarify parts of the IRI compaction algorithm that need to change, but do not change the algorithm in any large way as it works and has been implemented by two different people.

PROPOSAL 2: Adopt Markus' proposed algorithm above for the IRI compaction algorithm.
PROPOSAL 1: +1
PROPOSAL 1: +1
What does this mean? Which parts can be discussed/clarified? Term ranking apparently can't, and that's the meat of the algorithm.
As I just found out, one of the two implementations you mentioned is buggy, as it collapses multiple lists into one: http://bit.ly/UQG2C7
PROPOSAL 1: +1
RESOLVED: When compacting lists, the most specific term that matches all of the elements in the list, taking into account the default language, must be selected. |
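This resolution can be sketched as code. The reading below is hedged: "most specific" is approximated as "has the most constraints in its definition", with lexicographic tie-breaking as discussed earlier in the thread; all names and data are illustrative.

```python
def term_matches(term_def, member, default_lang):
    """Does a term accept one @list member, taking the default language
    into account when neither side states a language explicitly?"""
    if "@type" in term_def:
        return term_def["@type"] == member.get("@type")
    member_lang = member.get("@language", default_lang)
    return term_def.get("@language", default_lang) == member_lang

def select_list_term(terms, members, default_lang=None):
    """Pick the most specific term matching ALL list members; specificity
    is approximated by the number of keys in the term definition, with
    ties broken lexicographically (a hedged reading of the resolution)."""
    ok = [name for name, defn in terms.items()
          if all(term_matches(defn, m, default_lang) for m in members)]
    return min(ok, key=lambda n: (-len(terms[n]), n)) if ok else None

terms = {
    "typed": {"@container": "@list", "@type": "T"},
    "plain": {"@container": "@list"},
}
# All members typed "T": the more specific "typed" term wins.
print(select_list_term(terms, [{"@value": "x", "@type": "T"}]))  # → "typed"
# One untyped member breaks the match, so only "plain" qualifies.
print(select_list_term(terms, [{"@value": "x", "@type": "T"},
                               {"@value": "y"}]))                # → "plain"
```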
I've updated all the algorithms; unless I hear objections, I will close this issue in 24 hours.
In the context of issue #74 we were already discussing how compaction should work and came to the conclusion that:
We didn't go any further in describing it more specifically. This led to the situation that Gregg and Dave implemented the algorithm completely differently than I did. My strategy is to choose the term that eliminates the most expanded object forms (in lists and arrays, every value is compacted separately now). If multiple terms achieve the same ranking according to that criterion, the lexicographically least term is chosen.
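The strategy described above can be sketched in a few lines of Python; this is a hedged illustration (the helper names and the simplified notion of "eliminating an expanded object form" are mine): a term eliminates the expanded form when its coercion lets the value be written as a bare string instead of a @value object.

```python
def eliminates_expanded_form(term_def, value):
    """True if compacting under this term lets the value drop its
    expanded @value-object form (hedged simplification)."""
    if "@type" in term_def or "@type" in value:
        return term_def.get("@type") == value.get("@type")
    return term_def.get("@language") == value.get("@language")

def choose_term(candidates, values):
    """Rank candidates by how many expanded forms they eliminate;
    break ties with the lexicographically least term name."""
    def score(item):
        name, defn = item
        return (-sum(eliminates_expanded_form(defn, v) for v in values), name)
    return min(candidates.items(), key=score)[0]

# The typed term wins when the value's type matches its coercion...
print(choose_term({"t1": {"@type": "T"}, "t2": {}},
                  [{"@value": "x", "@type": "T"}]))   # → "t1"
# ...and equally ranked terms fall back to lexicographic order.
print(choose_term({"b": {}, "a": {}}, [{"@value": "x"}]))  # → "a"
```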
Edit: I updated my algorithms based on the feedback, please read the algorithms in the comments below instead!
My IRI compaction algorithm looks like this:
The term rank calculation for a term (could also be a (compact) IRI) and a value works as follows:
You can find the full implementation in Processor.php of JsonLD.
I think this algorithm is much easier to implement and much easier to understand than the current algorithm in the spec so I would propose to change it. The only open issue is whether a term with a mismatching type will ever match.
Consider for example the following:
Should this be compacted to
or not?
Another question is whether we compact to empty suffixes, so the example above could also be compacted to
"term1:"
(note that the suffix is empty, which would drop the type coercion).
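The empty-suffix trick works because compact IRI expansion simply concatenates the prefix's IRI with the (here empty) suffix, so "term1:" expands to the same IRI as "term1" while the term's coercion no longer applies. A hedged sketch, with an illustrative context:

```python
# Illustrative term definition: "term1" maps to an IRI and coerces to integer.
context = {
    "term1": {"@id": "http://example.com/prop",
              "@type": "http://www.w3.org/2001/XMLSchema#integer"},
}

def expand_compact_iri(compact, ctx):
    """Expand 'prefix:suffix' by concatenating the prefix's IRI and the
    suffix; an empty suffix therefore yields the prefix's IRI itself."""
    prefix, _, suffix = compact.partition(":")
    defn = ctx[prefix]
    base = defn["@id"] if isinstance(defn, dict) else defn
    return base + suffix

print(expand_compact_iri("term1:", context))  # → "http://example.com/prop"
```

So "term1:" round-trips to the same property IRI as "term1", but as a compact IRI rather than a term, which is exactly why it sidesteps the type coercion, and why it strikes several participants as unexpected behavior.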