Description
I've been working on a generic browser concept based on JSON Reference. It's still in very early stages, but I think the model I came up with is a candidate solution to the issues JSON Schema has with $id. I've been meaning to share this for a while, but have hesitated because I wasn't sure how to present it. I've finally decided that getting something out there is better than nothing, so I'm presenting this brief overview and will let questions drive the discussion.
I've found this model easy and efficient to implement. It has strong parallels to existing web constructs. It simplifies the concepts without losing anything of value.
One of the goals of this model is to fully decouple JSON Pointer, JSON Reference, and JSON Schema. Each can be implemented independently of the others. I wrote a JSON Schema-ish validation proof of concept that builds on JSON Reference (rather than on plain JSON). This implementation has full support for $ref/$id without dedicating a single line of code to supporting it.
JSON Reference for JSON Schema Implementors
The features of JSON Reference are very similar to the features of $ref and $id in JSON Schema. However, the concepts are slightly different and the keywords are slightly more constrained (in a good way) than their JSON Schema counterparts.
Documents vs Values
All JSON Reference documents have a "value". The fragment part of the document's
URI identifies the portion of the document that is considered the "value" of the
document.
If the fragment is empty, the value of the document is the whole document.
If the fragment starts with a "/", it is interpreted as a JSON Pointer, and the value of the document is the result of resolving that JSON Pointer against the document.
If the fragment is not a JSON Pointer, then it's an anchor fragment. The $anchor keyword provides a label that marks a portion of the document. Given an anchor fragment, the value of the document is the portion of the document identified by an $anchor keyword that matches the anchor fragment.
The value of a document whose URI fragment does not point to a valid part of the document is undefined. Implementations must not cross document boundaries in an attempt to resolve a fragment.
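The fragment rules above can be sketched in JavaScript as follows. This is a minimal sketch, not code from the proof of concept: it assumes the document is already-parsed JSON, that the fragment has already been percent-decoded, and that anchor names were collected beforehand into a Map from anchor name to JSON Pointer. resolvePointer and documentValue are hypothetical names. "Undefined" is modeled as JavaScript's undefined, and lookups never leave the document they are given.

```javascript
// Sketch of the "document value" rules: empty fragment, JSON Pointer
// fragment, or anchor fragment. All helper names are hypothetical.

// Resolve an RFC 6901 JSON Pointer (e.g. "/a/0/b") against parsed JSON.
// Returns undefined if the pointer does not reach a valid part of the document.
function resolvePointer(data, pointer) {
  if (pointer === "") return data;
  let current = data;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape "~1" -> "/" before "~0" -> "~", as RFC 6901 requires.
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (Array.isArray(current)) {
      // Array members are addressed only by non-negative integer indexes.
      if (!/^(0|[1-9][0-9]*)$/.test(token) || Number(token) >= current.length) return undefined;
    } else if (current === null || typeof current !== "object" ||
               !Object.prototype.hasOwnProperty.call(current, token)) {
      return undefined;
    }
    current = current[token];
  }
  return current;
}

// Value of a document, given the (percent-decoded) fragment of its URI and
// a Map from $anchor names to JSON Pointers within this same document.
function documentValue(data, fragment, anchors) {
  if (fragment === "") return data;                                    // whole document
  if (fragment.startsWith("/")) return resolvePointer(data, fragment); // JSON Pointer
  const pointer = anchors.get(fragment);                               // anchor fragment
  // An unknown anchor yields undefined; other documents are never consulted.
  return pointer === undefined ? undefined : resolvePointer(data, pointer);
}
```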
$ref indicates an embedded document
$ref indicates a document to be embedded. It's analogous to an <iframe/> in an HTML document. Even $refs that point to the current document are embedded documents. Notice that the entire document is embedded, not just the value of the document. However, a user agent that encounters an embedded document should use the value of the document. It's necessary to embed the entire document in order to properly handle any $ref within the embedded document.
$id is an embedded $ref
An $id indicates an inlined $ref. This is similar to using the HTTP/2 push feature to send the document identified by the src attribute of an <iframe>. It's just a network optimization for a $ref. This means that, unlike in JSON Schema, an $id can have a fragment, and that fragment is meaningful.
$anchor is not an embedded document
$anchor provides a path-independent way to identify a document's value without creating a document boundary.
jdesrosiers commented on Mar 1, 2019
I'll be posting a walk-through of how I derived this model within a few days. If the above isn't clear, you might want to wait until my next post before asking questions.
NOTE: This is not a proposal, it's a description of what I'm doing. Any changes to JSON Schema's model should be proposed as separate issues. This issue is just for clarifying this model.
handrews commented on Mar 1, 2019
Thanks, @jdesrosiers . Whether it becomes a proposal or not it is great to have this for a reference for discussion.
All: Let's limit discussion here to clarifying questions on this idea, and I recommend waiting on that until @jdesrosiers has posted his walkthrough. If this starts to turn into a debate, I'll lock this (but since I'm the person most likely to mess that up... um... 🤣 )
jdesrosiers commented on Mar 10, 2019
Deriving JSON Reference
JSON
We start with JSON.
http://example.com/example1
Now we need a way to retrieve that JSON. We'll use URIs as identifiers.
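A first step can be sketched as follows. The post doesn't show the store, so Data here is a hypothetical synchronous in-memory Map from URI to raw JSON text, and the sample URI and content are made up; a real Data.fetch could just as well read from the network or the filesystem.

```javascript
// Hypothetical stand-in for "wherever you store data": a synchronous
// in-memory map from URI to raw JSON text.
const Data = {
  store: new Map(),
  add(uri, text) {
    this.store.set(uri, text);
  },
  fetch(uri) {
    if (!this.store.has(uri)) throw new Error(`Not found: ${uri}`);
    return this.store.get(uri);
  },
};

// get(uri): retrieve the JSON identified by a URI and parse it.
const get = (uri) => JSON.parse(Data.fetch(uri));

// Usage with made-up content.
Data.add("http://example.com/sample", '{"foo": "bar"}');
console.log(get("http://example.com/sample").foo); // bar
```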
We will assume that Data.fetch will retrieve a document by URI from wherever you store data. It could be on the network, the filesystem, or in memory. It doesn't matter. For simplicity, we will also assume this is a synchronous operation.
Document Value
In HTML, a URI fragment changes the viewpoint for the document. It doesn't identify the document in any way. It only changes the way the browser presents the document. For JSON Reference, we want to use a similar concept of changing the viewpoint of the document. When working with data, it makes sense to consider a fragment to be some portion of the data. I call this the "value" of the document. Let's add support for a JSON Reference fragment. The value of the document will be the JSON Pointer fragment applied to the document.
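A minimal sketch of this step, reusing the names the post mentions (get, value, contextDoc, Data.fetch). The envelope fields (uri, pointer, data) and the in-memory Data store are assumptions, not the original code. The whole document is always fetched and parsed; the fragment is only kept as metadata and applied by value.

```javascript
// Hypothetical synchronous in-memory store keyed by normalized URI.
const Data = {
  store: new Map(),
  add(uri, text) { this.store.set(new URL(uri).href, text); },
  fetch(uri) {
    const text = this.store.get(new URL(uri).href);
    if (text === undefined) throw new Error(`Not found: ${uri}`);
    return text;
  },
};

// Minimal RFC 6901 resolution; undefined if the pointer leads nowhere.
function resolvePointer(data, pointer) {
  if (pointer === "") return data;
  let current = data;
  for (const raw of pointer.slice(1).split("/")) {
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    const ok = Array.isArray(current)
      ? /^(0|[1-9]\d*)$/.test(token) && Number(token) < current.length
      : current !== null && typeof current === "object" &&
        Object.prototype.hasOwnProperty.call(current, token);
    if (!ok) return undefined;
    current = current[token];
  }
  return current;
}

// get(uri, contextDoc): resolve a possibly relative URI against the context
// document, fetch and parse the whole document, and keep the fragment
// (a JSON Pointer) as metadata in an envelope.
function get(uri, contextDoc) {
  const url = contextDoc ? new URL(uri, contextDoc.uri) : new URL(uri);
  const fullUri = url.href;
  const pointer = decodeURIComponent(url.hash.slice(1)); // "" if no fragment
  url.hash = "";                                          // fetch without the fragment
  return { uri: fullUri, pointer, data: JSON.parse(Data.fetch(url.href)) };
}

// value(doc): the part of the document the fragment points to.
const value = (doc) => resolvePointer(doc.data, doc.pointer);
```

With this envelope, get("#/foo/bar", doc) only changes the fragment relative to doc; skipping the refetch in that case is the optimization the post mentions leaving out.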
We've modified our document representation to envelope the data so we can also store metadata like the JSON Pointer from the URI fragment. Then we added a value function to return the value with respect to the JSON Pointer. We can now get the value of any part of the document with just a URI. Finally, we added the contextDoc so we can use relative URIs and thus make it easier to use. We can also use the contextDoc to avoid a fetch and parse if the document is the same and only the fragment changes, but I left that optimization out this time.
$ref
Next we'll add support for references.
http://example.com/example2
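A sketch of the $ref step, under the same assumptions as before (hypothetical in-memory Data store, envelope fields uri/pointer/data). get now checks whether the document's value is a reference and, if so, returns the referenced document instead, resolving the $ref against the URI of the document it was found in. The cycle check via a seen set is an addition of this sketch, not something the post describes.

```javascript
// Hypothetical synchronous in-memory store keyed by normalized URI.
const Data = {
  store: new Map(),
  add(uri, text) { this.store.set(new URL(uri).href, text); },
  fetch(uri) {
    const text = this.store.get(new URL(uri).href);
    if (text === undefined) throw new Error(`Not found: ${uri}`);
    return text;
  },
};

// Minimal RFC 6901 resolution; undefined if the pointer leads nowhere.
function resolvePointer(data, pointer) {
  if (pointer === "") return data;
  let current = data;
  for (const raw of pointer.slice(1).split("/")) {
    const token = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    const ok = Array.isArray(current)
      ? /^(0|[1-9]\d*)$/.test(token) && Number(token) < current.length
      : current !== null && typeof current === "object" &&
        Object.prototype.hasOwnProperty.call(current, token);
    if (!ok) return undefined;
    current = current[token];
  }
  return current;
}

const value = (doc) => resolvePointer(doc.data, doc.pointer);
const isRef = (v) => v !== null && typeof v === "object" && typeof v.$ref === "string";

// get(uri, contextDoc): like before, but if the value is a reference,
// navigate to the referenced document and return that instead.
function get(uri, contextDoc, seen = new Set()) {
  const url = contextDoc ? new URL(uri, contextDoc.uri) : new URL(uri);
  const fullUri = url.href;
  const pointer = decodeURIComponent(url.hash.slice(1));
  url.hash = "";
  const key = `${url.href}#${pointer}`;   // normalized identity for cycle detection
  if (seen.has(key)) throw new Error(`Reference cycle at ${key}`);
  seen.add(key);
  const doc = { uri: fullUri, pointer, data: JSON.parse(Data.fetch(url.href)) };
  const v = value(doc);
  return isRef(v) ? get(v.$ref, doc, seen) : doc;
}
```

Code written against get and value never sees the $ref objects themselves, which is the sense in which an implementation "doesn't have to know" about $ref.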
We've modified our get function so that if the value of the document is a reference, then we get and return the referenced document instead. Now, when implementing JSON Schema, instead of working with JSON data directly, we can use get and value on the document. An entire JSON Schema implementation can be written like this without any concern for $ref. It just works. A few generic helper functions can make it nearly as natural to write code with JSON Reference as it is to work with normal data. Here's one example.
$id
I wasn't going to support $id at first, mostly because I've never found a use for it. But then I realized that I could define $id to be nothing more than an inlined $ref. That model is so simple that it could be added to our implementation in just a few lines, so why not.
http://example.com/example3
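A sketch of "$id is an inlined $ref". The post does this in a JSON.parse reviver; this sketch swaps in a plain top-down walk over the parsed JSON instead, because a reviver runs bottom-up and so cannot easily tell which embedded document a nested relative $id belongs to. The Data store, extractEmbedded, and parseDocument are hypothetical names. Each object with an $id is stored (without its $id) under its fragment-less URI and replaced by a $ref whose fragment is kept.

```javascript
// Hypothetical synchronous in-memory store keyed by normalized URI.
const Data = {
  store: new Map(),
  add(uri, text) { this.store.set(new URL(uri).href, text); },
  fetch(uri) {
    const text = this.store.get(new URL(uri).href);
    if (text === undefined) throw new Error(`Not found: ${uri}`);
    return text;
  },
};

const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// Walk parsed JSON top-down. An object with a string $id is an embedded
// document: store it (minus $id) under its own fragment-less URI and
// replace it with {"$ref": <resolved $id, fragment included>}.
// baseUri is the URI of the document currently being walked, so nested
// $ids resolve against the embedded document that contains them.
function extractEmbedded(node, baseUri) {
  if (Array.isArray(node)) return node.map((item) => extractEmbedded(item, baseUri));
  if (!isPlainObject(node)) return node;
  if (typeof node.$id === "string") {
    const id = new URL(node.$id, baseUri);
    const ref = id.href;                 // keeps the fragment: it is meaningful here
    id.hash = "";
    const { $id: _dropped, ...rest } = node;
    Data.add(id.href, JSON.stringify(extractEmbedded(rest, id.href)));
    return { $ref: ref };
  }
  // Object.fromEntries defines own properties, so a "__proto__" key stays data.
  return Object.fromEntries(
    Object.entries(node).map(([key, child]) => [key, extractEmbedded(child, baseUri)]));
}

// Parse a document retrieved from uri; the result contains no $id at all.
const parseDocument = (uri, text) => extractEmbedded(JSON.parse(text), uri);
```

Note that a root-level $id is also turned into a $ref here, so the parsed document becomes nothing but a reference; this matches how the model treats $id at the root.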
If you aren't familiar with the reviver function of JSON.parse, it allows you to modify JSON as it's being parsed. We are using it here to identify any documents embedded with $id, replace them with their $ref versions, and add them to wherever Data.fetch gets data by using Data.add.
With just these few lines of code, we now have full support for $id. A JSON Schema implementation built with get and value doesn't have to do anything special to support $id.
$anchor
Location-independent identifiers are another JSON Schema feature I wasn't going to support, but I changed my mind. Unlike the JSON Schema version of $id, we can't use $id for this feature. $id is necessarily a document boundary. We don't want to create an embedded document just to mark a location for easy reference. We need a new keyword that is nothing but a label. I've called that new keyword $anchor.
A plain name can be used in the URI fragment instead of a JSON Pointer. If the fragment is a plain name, it refers to a location within the document with an $anchor whose value matches the fragment.
http://example.com/example4
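The anchor bookkeeping can be sketched without pretending JSON.parse supplies pointers: parse normally, then walk the result while tracking the JSON Pointer of each node. collectAnchors and parseWithAnchors are hypothetical names. This sketch assumes $id-embedded documents have already been extracted (they are separate documents with their own anchors), and if an anchor name repeats, the last one found simply wins.

```javascript
// RFC 6901 escaping for one reference token: "~" -> "~0" first, then "/" -> "~1".
const encodeToken = (key) => key.replace(/~/g, "~0").replace(/\//g, "~1");

// Walk parsed JSON top-down, tracking the JSON Pointer of each node, and
// record where every $anchor lives. Returns a Map from anchor name to pointer.
function collectAnchors(node, pointer = "", anchors = new Map()) {
  if (node === null || typeof node !== "object") return anchors;
  if (!Array.isArray(node) && typeof node.$anchor === "string") {
    anchors.set(node.$anchor, pointer);
  }
  // Object.entries also works for arrays, yielding index keys "0", "1", ...
  for (const [key, child] of Object.entries(node)) {
    collectAnchors(child, `${pointer}/${encodeToken(key)}`, anchors);
  }
  return anchors;
}

// Parse text and keep the anchor-to-pointer map as document metadata.
function parseWithAnchors(text) {
  const data = JSON.parse(text);
  return { data, anchors: collectAnchors(data) };
}
```

Resolving an anchor fragment is then a Map lookup followed by the same pointer resolution used for JSON Pointer fragments.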
Now we extend our reviver function to identify $anchors and their locations in the document. We store those anchor-to-pointer mappings in the document metadata for use in resolving anchor fragments. If the fragment is an anchor, we look up the pointer associated with that anchor and then use that pointer just as we did before.
Note: The arguments of the reviver function are actually key and value rather than pointer and value. A slightly more sophisticated parser is necessary to parse $anchors in JavaScript. For simplicity, I'm pretending that the necessary functionality exists in JSON.parse.
jdesrosiers commented on Mar 10, 2019
I hope this illustrates how simple and consistent this model is by how easy it is to implement. I hope you find it useful. I would appreciate any feedback or questions anyone has.
@handrews @awwright @Relequestual @johandorland @KayEss @epoberezkin
handrews commented on Mar 10, 2019
@jdesrosiers while I agree that this system is consistent within itself and supports an elegant implementation, it is not compatible with how URIs actually work.
You assert:
The first sentence is true in the sense that, when a browser navigates to a URI (due to clicking on an <a href="#foo">...</a> element), the fragment is used to navigate to a location within the document. But that is not because of the nature of fragments. It absolutely does not imply that "[the fragment] doesn't identify the document in any way."
Per RFC 3986 (emphasis mine):
URIs are used in two ways: identification and navigation.
Identification involves associating a URI with all or (when using fragments) part of a resource. Directly giving the URI is one way to perform such identification, which JSON Schema currently does with "$id": "#foo". Another way is for a media type to have a way to set part of the URI indirectly. JSON Schema does this by the structure of the document (JSON Pointer fragments). HTML uses its own 'id' attribute to indirectly create a usable fragment.
You have conflated these things. You notably omitted from this explanation the case of $id with a fragment, which is the problematic case (otherwise, this all works fine).
In examples elsewhere, though, you used:
equivalent to these two documents together:
I believe this works as you stated before with the code you have given here.
$ref is a navigation element. We all agree that for the non-fragment part, you retrieve the document with that base URI. We also all agree that when evaluating $ref, you apply the fragment to the retrieved document to get the schema object that is the target of the reference.
In order to make your system work, you have changed $id from being solely an identification element to being a hybrid identification + navigation element. It identifies the base URI, but then you have it apply the fragment portion the same way that it is applied when evaluating $ref.
That is just not how URIs work. It's either identification or navigation. It's not both.
$id conform to RFC 3986 suggestion for base URI elements #729
jdesrosiers commented on Mar 10, 2019
This model uses URIs exactly the same way HTML does. If this is incompatible with URI then so is HTML. When I read what you quoted from the specification, I definitely see how my choice of words was misleading. I'll try to come up with a better way to explain it. My point is that the response you get from a server will always be the same regardless of the fragment. When I retrieve an HTML document with a URI that includes a fragment, I get the entire HTML document, not just the bit identified by the fragment. Correct me if I'm wrong, but I don't think that's disputed.
I have certainly changed the semantics of $id, but I've changed it from being an identification element to being a navigation element. There is no hybridization. The whole concept of base URI change does not exist in this model. It's just documents and document values. $id always means the same thing as $ref, except that it saves you a fetch.
I'm not sure I agree with or even fully understand your distinction between identification and navigation, but hopefully this clarification makes it unnecessary to go down that road.
I actually left that out on purpose as a way to gauge whether people were understanding what I am presenting. If they understood the implications of the change, this question should come up. However, there is nothing problematic about this case. It is completely consistent with the model and existing web standards. Hopefully the above clarification of $id having exactly the same semantics as $ref addresses this as well.
handrews commented on Mar 11, 2019
No, you have not made $id a navigation element. You are using it to identify part of the single application/json document as an embedded but logically separate application/schema+json document. That is not navigation.
You're not wrong, but that is navigation (resolving the URL and pointing the client at the resource it identifies, and then applying the fragment within the client as opposed to sending the fragment to the server). That is not what is happening with $id, which, as an embedded document identifier, is identifying, not navigating.
Look at the difference between <base> (identification) and <a> (navigation). In HTML 4.01, fragments are forbidden in <base>. In HTML5 they are not mentioned, but it is made clear that the value is used as an absolute URI, which means fragments must be ignored.
I don't see anywhere in HTML that behaves the way you assert.
Also, you can't just wave away the concept of base URI. Your $ref values have to resolve against something, and that is the URI of the document in which they are found, which is a base URI whether you call it that or not.
I'm going to stop here and see if anyone else wants to advocate for this proposal. I am against it, but let's see what others say.
handrews commented on Mar 11, 2019
@jdesrosiers I guess if you define $id as some sort of internal navigation, you could call it navigation? But without dealing with base URIs, how do you resolve $refs in subschemas of subschemas that have an $id? From your examples it seems to be relative to the $id, which means $id does set the base URI for the document.
(yeah, I know I said I would stop commenting, but I'm really trying to understand still)
jdesrosiers commented on Mar 11, 2019
@handrews
I don't know how we can have a meaningful conversation when you refuse to accept that the things that I define mean what I say they mean. I know you're trying, but I don't see where to go from here. I'll try to describe it in a different way, but until we find some common ground, I'm sure it won't make any difference.
$id is a poor man's HTTP/2 push. It sends the document along in anticipation that you're going to request it later. Notice that the implementation I presented above does exactly that. It finds embedded documents marked with $id, stores them in the document store, and replaces them with their equivalent $ref. The implementation then deals exclusively with $ref. Once the document has been parsed, there are no longer any $ids.
Maybe an example will help.
http://example.com/foo
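The hops described in the next paragraph can be traced with a small sketch. This is a hypothetical reconstruction that contains only the values named in the walkthrough, not the original example (whose $id-embedded layout isn't reproduced). trace follows $refs from a starting URI, resolving each one against the URI of the document it appears in, and records every URI it navigates to.

```javascript
// Hypothetical reconstruction: only the values mentioned in the walkthrough.
const documents = new Map([
  ["http://example.com/foo", { aaa: { $ref: "/bar#/bbb" } }],
  ["http://example.com/bar", { bbb: { $ref: "#/ccc" }, ccc: "ddd" }],
]);

// Minimal JSON Pointer lookup (no ~0/~1 escapes are needed for this data).
function resolvePointer(data, pointer) {
  let node = data;
  for (const token of pointer.split("/").slice(1)) {
    if (node === null || typeof node !== "object" ||
        !Object.prototype.hasOwnProperty.call(node, token)) return undefined;
    node = node[token];
  }
  return node;
}

// Follow $refs starting from uri, recording each URI navigated to.
function trace(uri) {
  const hops = [];
  let current = new URL(uri);
  for (;;) {
    if (hops.includes(current.href)) throw new Error(`Cycle at ${current.href}`);
    hops.push(current.href);
    const pointer = decodeURIComponent(current.hash.slice(1));
    const docUrl = new URL(current.href);
    docUrl.hash = "";                      // the document itself has no fragment
    const value = resolvePointer(documents.get(docUrl.href), pointer);
    if (value === null || typeof value !== "object" || typeof value.$ref !== "string") {
      return { hops, value };
    }
    current = new URL(value.$ref, current.href); // resolve against the current document
  }
}

console.log(trace("http://example.com/foo#/aaa"));
```

The recorded hops are http://example.com/foo#/aaa, http://example.com/bar#/bbb, and http://example.com/bar#/ccc, ending at the value "ddd". Only one document is consulted at each step.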
Notice that fetching http://example.com/foo#/aaa takes several hops before it gets to where it ends up. The "value" of the document it finds is {"$ref":"/bar#/bbb"}, which triggers navigation to http://example.com/bar#/bbb, whose value is {"$ref":"#/ccc"}, which triggers navigation to http://example.com/bar#/ccc, whose value is "ddd", and that's where it ends up.
This is the primary benefit of considering $id and $ref equivalent. We can convert all $ids to $refs, and the implementation only ever has to know how to navigate from one document to another using $ref. The concept of base URI change is sidestepped entirely by turning it into a standard run-of-the-mill navigation operation. You are always working with only one document at a time, and the base URI is always the URI of that document. The base URI only changes when you navigate to another document.
KayEss commented on Mar 11, 2019
I've been trying to work out what the differences with the current behaviour are. It seems to me that they are:
$ref works as per draft v7, but the behaviour for draft v8 doesn't look possible. Or at least, it would require very different handling.
$id boundary will no longer work
There might be other more subtle differences, but those two stand out.
handrews commented on Mar 11, 2019
@jdesrosiers OK, so you are changing the base URI.
The initial base is http://example.com/foo
You resolve "/bar#/bbb" against that to get http://example.com/bar#/bbb
And "#/ccc" is resolved against that to get http://example.com/bar#/ccc
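Both steps are ordinary relative-reference resolution, which is easy to check with the WHATWG URL API (a global in browsers and Node.js). For these particular inputs it gives the same results as RFC 3986 section 5; in both, the fragment of the base URI plays no part in the result.

```javascript
// Step 1: resolve "/bar#/bbb" against the initial base.
const step1 = new URL("/bar#/bbb", "http://example.com/foo").href;
// Step 2: resolve "#/ccc" against the result of step 1.
const step2 = new URL("#/ccc", step1).href;

console.log(step1); // http://example.com/bar#/bbb
console.log(step2); // http://example.com/bar#/ccc
```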
I mean, maybe you don't call it a base URI but that's what it looks like to me.
Regardless, I find this very confusing: the behavior of $id you propose is much harder to explain, and I'm still not convinced that I could sell it as RFC 3986-compliant. Maybe you could, but I'm not seeing it. There are intermediate processing steps of converting to a $ref before using it, and I don't even entirely see why that is valuable.
I understand embedding a document, and $ref-ing into that document, but I don't understand why the act of embedding it needs to produce a value. That is just not something that I'm finding compelling. And in fact it is still confusing despite following all your steps (I think) and agreeing that, as a self-contained thing, it is quite elegant.
But I don't think it makes the conceptual model in JSON Schema easier. The $id behavior really just feels like doing everything possible to translate it to something else, which produces unintuitive behavior. It's no longer an id, really.
ucarion commented on Mar 11, 2019
Perhaps the test of whether this is truly lightening the cognitive load is to attempt to summarize -- with words, not code -- what is going on under this proposal? The simplicity of the proposal can then be judged by the merits of its terseness and clarity.
One question I have is how such a proposal, which appears to be centered upon the notion of the implicit inlining of resources, would work in the context of error reporting. Validators producing standardized errors will need to be able to know "which" schema they're currently in, so the notion of following a $ref cannot be made completely transparent.
jdesrosiers commented on Mar 12, 2019
@handrews
Of course there's a base URI. I'm not claiming there isn't. What I'm saying is that there is always only one base URI per document. If it's necessary to change the base URI, you have to navigate to a different document. I know I'm repeating myself, but I don't know how else to say it.
Yes! This is the other thing I was hoping people would realize. The only way I found to salvage $id was to change its nature. It serves almost every case $id serves in JSON Schema and some more, but it doesn't really mean the same thing it used to. The biggest implication of this is what it means for $id at the root of the document. Under this model, putting an $id at the root of a document is the same as having a document that is nothing but a $ref. It would work in most cases, but it would effectively block anyone from referencing a fragment of the document (rather than the whole document). I can see reasons why this restriction could be beneficial, but it's certainly surprising if you are used to the way JSON Schema works. If this model were to be adapted for JSON Schema, you'd probably want to add another keyword ($self?) for identifying the document. Or, keep $id as a document identifier restricted to the root and rename this version of $id ($embedded?).
The intermediate processing step is optional. You could handle $id more directly. It doesn't change much. There are a couple of things you get by replacing $ids with $refs:
$id.
$ref rather than $ref and $id, you simplify the implementation, making it faster (fewer branches to check) and leaving less potential for bugs (less code).
@KayEss
I think that's correct. I haven't put much thought into the full implications, but at the least, it complicates things. I'm glad you called that out.
There are two major ones. @handrews identified them almost immediately. The first is the meaning of fragments in $ids. See the example @handrews posted above. The second is the $id-is-no-longer-an-id issue I just addressed.
@ucarion
I thought that's what I did in the description of this issue? Clearly this isn't as straightforward as I thought. I'm biased, of course, but I think if you took someone who didn't already understand the nuances of the way $id/$ref works in JSON Schema and asked them to implement both the JSON Reference and the JSON Schema versions of $id/$ref, I'd bet they would find JSON Reference easier to understand and to implement.
I'm not sure I understand the question. How would you not know which schema you are currently in? Nothing about how $ref works has changed. To be honest, I haven't fully implemented error handling in my not-quite-JSON-Schema implementation, so I could be missing something, but I don't see any potential issues.
handrews commented on Apr 1, 2019
@jdesrosiers while this is great work and makes sense for your standalone project, and I have shamelessly lifted elements into separate issues, I don't see support building for the more drastic changes to the nature of $id, plus we don't really want to reverse the change in behavior for $ref with respect to adjacent keywords. That change got an unusually large number of people expressing support (if mostly in the form of thumbs-ups) when we discussed it.
Unless a surge of people advocating for this show up in the next week, I think it's best that we close this.
jdesrosiers commented on Apr 5, 2019
I know you're joking, but there's nothing shameless about it! This is exactly how I expected this to go down (maybe better than expected). And don't worry, I'll wait until draft-08 is out before rocking the boat again 😉