"$id": Eliminate base URI shadowing #726

Closed
@handrews

"URI shadowing" (h/t to @johandorland for the term) refers to the scenario when there are multiple $ids in a schema, with at least one $id in a subschema of another object containing $id (such as the root schema object).

When did this become a thing?

Here is the schema and various resolved URI examples, copied and pasted directly from the current spec:

{
       "$id": "http://example.com/root.json",
       "definitions": {
           "A": { "$id": "#foo" },
           "B": {
               "$id": "other.json",
               "definitions": {
                   "X": { "$id": "#bar" },
                   "Y": { "$id": "t/inner.json" }
               }
           },
           "C": {
               "$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"
           }
       }
   }

   The schemas at the following URI-encoded JSON Pointers [RFC6901]
   (relative to the root schema) have the following base URIs, and are
   identifiable by any listed URI in accordance with Section 5 above:

   # (document root)

         http://example.com/root.json

         http://example.com/root.json#

   #/definitions/A

         http://example.com/root.json#foo

         http://example.com/root.json#/definitions/A

   #/definitions/B

         http://example.com/other.json

         http://example.com/other.json#

         http://example.com/root.json#/definitions/B

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

         http://example.com/root.json#/definitions/B/definitions/X

   #/definitions/B/definitions/Y

         http://example.com/t/inner.json

         http://example.com/t/inner.json#

         http://example.com/other.json#/definitions/Y

         http://example.com/root.json#/definitions/B/definitions/Y

   #/definitions/C

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#

         http://example.com/root.json#/definitions/C

Notice that all of the locations (the list section headers) are done as JSON Pointer fragments, relative to the root of the entire example schema.
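The listing above can be reproduced mechanically. The following is a minimal sketch, not the spec's algorithm: it assumes that every nested object is worth walking, that an $id resolving to a non-empty fragment is a plain name that does not change the base, and that percent-encoding can be ignored. The function name `shadowed_uris` is hypothetical.

```python
from urllib.parse import urljoin, urldefrag

SCHEMA = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def shadowed_uris(schema):
    """Map each $id-bearing location to its URIs, including shadowed bases."""
    found = {}

    def walk(node, location, bases):
        # bases: list of (base URI, JSON Pointer from that base's root),
        # outermost first, innermost last.
        if not isinstance(node, dict):
            return
        schema_id = node.get("$id")
        if isinstance(schema_id, str):
            parent = bases[-1][0] if bases else ""
            resolved = urljoin(parent, schema_id)
            base, fragment = urldefrag(resolved)
            if fragment:
                # Plain-name fragment: valid only with the innermost base.
                uris = [resolved]
            else:
                # New base URI: the pointer relative to it restarts at "".
                uris = [base]
                bases = bases + [(base, "")]
            # Every in-scope base yields a JSON Pointer URI (innermost first).
            for base_uri, pointer in reversed(bases):
                uris.append(base_uri + "#" + pointer)
            found["#" + location] = uris
        for key, value in node.items():
            token = "/" + key.replace("~", "~0").replace("/", "~1")
            walk(value, location + token, [(b, p + token) for b, p in bases])

    walk(schema, "", [])
    return found


if __name__ == "__main__":
    for location, uris in shadowed_uris(SCHEMA).items():
        print(location, uris)
```

Note that the walker has to carry the whole stack of bases down the tree, each with its own relative pointer, which is the bookkeeping this issue is about.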

Prior to draft-handrews-json-schema-01, the analogous example section did not show any examples with JSON Pointer fragments. Using the exact same schema, it showed these resolved URIs only:

   # (document root)  http://example.com/root.json#

   #/definitions/A  http://example.com/root.json#foo

   #/definitions/B  http://example.com/other.json

   #/definitions/B/definitions/X  http://example.com/other.json#bar

   #/definitions/B/definitions/Y  http://example.com/t/inner.json

   #/definitions/C  urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

But you can see that the locations were done the same way: as a relative JSON Pointer fragment from the overall document root.

The key point here is that we added examples of URI shadowing in handrews-*-01, which was a clarification of draft-07 (draft-07 was originally published as handrews-*-00). So we didn't even introduce these with draft-07. There was some confusion over it (and other aspects of $id and $ref) and we added the examples when we clarified the text.

Why did we do this?

Good question. The change was made in PR #550, which was primarily about removing the terminology around "internal references" vs "external references". That terminology had itself been a great improvement by @awwright over the draft-04 language around "inline resolution" and "canonical resolution". None of these prior approaches directly explained how JSON Pointer fragments work across the presence of $id.

The issue mentioned by that PR only mentions JSON Pointer fragments once, at the end, after the PR was submitted: #545 (comment)

I know I had questions over how that should be handled when I tried implementing what were then the new draft-06 proposals. And I know that I ended up implementing URI shadowing. I don't remember why.

Quite a few people weighed in on and reviewed #545 and #550, so this isn't something that slipped in by accident.

Did anyone rely on this before handrews-*-01?

No clue.

Did anyone actually implement what's in handrews-*-01?

I think @johandorland did? He commented on the issue/pr. Not sure if anyone else did. @Julian?

Does the test suite test this?

No.

So, really, why did we do this?

I think the rationale was that since the locations in the existing example were given as JSON Pointer fragments, that must mean that those fragments were valid, which would definitely imply that every parent $id introduces a base URI that must be tracked throughout all subschemas, even if another $id appears.

Can we just not?

I think so. JSON Pointer fragment evaluation is just defined in terms of JSON document structure, without any thought given to changing base URIs or embedding one document in another. But media types get to specify fragment syntax and semantics, so I think we can reasonably say that it stops when the base URI is reset, and from that point on, the prior base URI no longer applies at all.

That is actually how we interpret plain name fragments. Note in this part of the example:

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

         http://example.com/root.json#/definitions/B/definitions/X

the #bar plain name fragment can be used with the innermost base http://example.com/other.json, but not with the outer base http://example.com/root.json.

So restricting the use of JSON Pointer fragments to not cross $id boundaries would make their behavior more consistent with how we handle plain name fragments.

Similar to #724, this considers $id to establish a document boundary, with the new base URI it creates applying to the document within that boundary. Unlike #724, what is proposed here does not make any other changes to $id's behavior. In particular, the effect of an $id with a JSON Pointer fragment remains undefined, although we can address that separately later if we want to.

So what would this look like?

There are two options WITHIN THE SCOPE OF THIS ISSUE. If you want to talk about other options, file them yourself :-) Comments that go off-topic here will be deleted.

Here's the example schema again so you don't have to scroll so much:

{
       "$id": "http://example.com/root.json",
       "definitions": {
           "A": { "$id": "#foo" },
           "B": {
               "$id": "other.json",
               "definitions": {
                   "X": { "$id": "#bar" },
                   "Y": { "$id": "t/inner.json" }
               }
           },
           "C": {
               "$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"
           }
       }
   }

Option 1: Exactly one valid URI involving a JSON Pointer fragment for each location:

   # (document root)

         http://example.com/root.json

         http://example.com/root.json#

   #/definitions/A

         http://example.com/root.json#foo

         http://example.com/root.json#/definitions/A

   #/definitions/B

         http://example.com/other.json

         http://example.com/other.json#

   #/definitions/B/definitions/X

         http://example.com/other.json#bar

         http://example.com/other.json#/definitions/X

   #/definitions/B/definitions/Y

         http://example.com/t/inner.json

         http://example.com/t/inner.json#

   #/definitions/C

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f

         urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#

This is a lot more straightforward, and all of these addresses work whether the schemas were loaded as one document (as in this example), or if each of root.json, other.json, and inner.json had been loaded separately, and $ref-ed each other.
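As a rough illustration of Option 1, here is a sketch that tracks only the innermost base URI while walking the example schema. It makes the same simplifying assumptions as the spec example (an $id with a non-empty fragment is a plain name, no percent-encoding), and the name `option1_uris` is hypothetical.

```python
from urllib.parse import urljoin, urldefrag

SCHEMA = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def option1_uris(schema):
    """Map each $id-bearing location to its URIs when bases are not shadowed."""
    found = {}

    def walk(node, location, base, pointer):
        # Only one (base, pointer) pair is carried: the innermost one.
        if not isinstance(node, dict):
            return
        schema_id = node.get("$id")
        if isinstance(schema_id, str):
            resolved = urljoin(base, schema_id)
            new_base, fragment = urldefrag(resolved)
            if fragment:
                uris = [resolved]            # plain-name fragment
            else:
                base, pointer = new_base, ""  # the outer base is forgotten here
                uris = [base]
            uris.append(base + "#" + pointer)  # the single JSON Pointer URI
            found["#" + location] = uris
        for key, value in node.items():
            token = "/" + key.replace("~", "~0").replace("/", "~1")
            walk(value, location + token, base, pointer + token)

    walk(schema, "", "", "")
    return found


if __name__ == "__main__":
    for location, uris in option1_uris(SCHEMA).items():
        print(location, uris)
```

Compared with tracking shadowed bases, the walker carries one base and one pointer instead of a stack, and each location ends up with exactly one JSON Pointer URI.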

Option 2: But what about the URI used to fetch the document?

If it's possible to retrieve this example from http://example.com/alias.json (as opposed to or in addition to its declared $id of http://example.com/root.json), then should fragments also work from that base?

I can see it either way.

One option is to treat fetching this from alias.json and discovering that its declared $id is root.json as a redirect, in which case the fragment is applied to the redirect target (at least in HTTP). h/t to @jdesrosiers for this mental model. In this sense, anywhere we have root.json we would also require supporting alias.json.
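A minimal sketch of the redirect mental model, under the assumption that an implementation knows both the URI a document was retrieved from and the $id declared at its root. The helper name `apply_redirect` is hypothetical and is not part of any spec or library.

```python
from urllib.parse import urldefrag


def apply_redirect(ref, retrieval_uri, declared_id):
    """Rewrite a reference made against the retrieval URI so that it targets
    the declared $id, carrying the fragment over as an HTTP redirect would."""
    target, fragment = urldefrag(ref)
    if target == retrieval_uri:
        target = declared_id
    return f"{target}#{fragment}" if fragment else target


if __name__ == "__main__":
    print(apply_redirect(
        "http://example.com/alias.json#/definitions/A",
        "http://example.com/alias.json",
        "http://example.com/root.json",
    ))
```

Under this model alias.json never becomes a second base of its own; references to it are simply rewritten to root.json before any fragment is evaluated.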

On the other hand, JSON Schema repeatedly talks about URIs as identifiers rather than locators, so I think we could rationally say that the $id is how the schema needs to be referenced. It's fine if it's fetched from elsewhere outside of the reference process (like being pre-loaded from a local filesystem) but does that mean that we should support those loading URIs when an $id is provided? (obviously if no $id, then the loading URI is the only available base).

Supporting the loading URI as a shadowed base, but nothing else, might be a bit odd, or it might be more intuitive. I really have no idea at this point 😆

Activity

added this to the draft-08 milestone on Mar 1, 2019

awwright (Member) commented on Mar 1, 2019

I'm not sure this improves schema authoring or writing implementations. As far as I can tell, this would mean I'd have to add a condition to implementations saying "Evaluate JSON Pointer fragments until an $id is encountered, so as to make anything under a subschema with $id inaccessible."

At that point, it seems like we just want to get rid of JSON Pointer fragments altogether.

If I've got this right, why would this be desirable?

handrews (Contributor, Author) commented on Mar 1, 2019

@awwright I found having to manage a stack of in-scope base URIs and the current relative position to each to be one of the most complicated things to implement. But @johandorland can perhaps speak to a more real-world scenario.

handrews (Contributor, Author) commented on Mar 1, 2019

@awwright so far what I have established with recent issues is that there are people who think every possible thing to do or not do with $id is horrible and obviously wrong, and no one particularly likes any of them more than barely tolerating the idea.

I should probably just ignore all comments/requests/complaints regarding $id, I guess. I have no idea at this point.

handrews (Contributor, Author) commented on Mar 2, 2019

@awwright

At that point, it seems like we just want to get rid of JSON Pointer fragments altogether.

Do you mean in $id or just in general? Getting rid of fragments in $id was part of #719 but I filed this more modest version due in part to pushback there. Although I suppose I should re-file the full version for proper comparison, separated from the back-and-forth in #719.

If I've got this right, why would this be desirable?

The most important question, I think, is "was this ever intended to be a feature at all?"

We added it to examples just in the most recent draft for... um... reasons? Honestly I cannot tell why. It's not clear to me that it was ever intentionally made a feature.

I basically filed this issue because I think I made a mistake putting the examples in. Do you remember why that seemed like a good idea at the time?

johandorland (Collaborator) commented on Mar 2, 2019

The way I look at this, shadowed base URIs are not really a feature, but more of an unintended consequence. In the past I've had a hard time explaining to people in the JSON schema slack that a single part of the schema can be addressed by an arbitrary number of $refs, based on how many levels of nested $ids there are.

Conceptually it'd be nice if every part of a schema only has a single URI it can be $reffed by. However even in your example this is not the case, as http://example.com/other.json#bar also remains accessible by http://example.com/other.json#/definitions/X as it has the same base URI. Even so, as long as every part of the schema is only allowed to have one base URI, it'd make it easier to parse the JSON by walking over the entire JSON structure and registering each $reffable URI in a lookup map, so when you encounter a $ref in the second pass you know every valid $ref is in that lookup map.

@awwright has a very good point though. Currently the way I, and I suspect most others, do this first pass is to register all the different base URIs contained in the JSON; when a $ref is encountered, the implementation looks at its base URI and then gets the proper JSON by using the JSON pointer fragment. It's kind of odd to say use JSON pointer for the fragment, but then amend it by saying it doesn't always work across $id boundaries.
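To make the trade-off concrete, here is a sketch of the "register every base URI, then evaluate the JSON Pointer from that base's root" approach. It is an illustration under assumptions, not any particular implementation: the names `register_bases`, `resolve`, and the `cross_id_boundaries` flag are hypothetical, percent-decoding of the fragment is omitted, and only JSON Pointer fragments are handled.

```python
from urllib.parse import urljoin, urldefrag

SCHEMA = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def register_bases(schema):
    """First pass: map every base URI declared via $id to its subschema."""
    bases = {}

    def walk(node, base):
        if not isinstance(node, dict):
            return
        schema_id = node.get("$id")
        if isinstance(schema_id, str):
            new_base, fragment = urldefrag(urljoin(base, schema_id))
            if not fragment:  # plain-name $ids do not create a new base
                base = new_base
                bases[base] = node
        for value in node.values():
            walk(value, base)

    walk(schema, "")
    return bases


def sets_new_base(node):
    # Assumption: any $id that is not a bare "#name" establishes a new base.
    schema_id = node.get("$id") if isinstance(node, dict) else None
    return isinstance(schema_id, str) and not schema_id.startswith("#")


def resolve(ref, bases, cross_id_boundaries=True):
    """Second pass: look up the base, then walk the JSON Pointer fragment."""
    base, fragment = urldefrag(ref)
    if fragment and not fragment.startswith("/"):
        raise LookupError("only JSON Pointer fragments are handled here")
    node = bases[base]
    for token in fragment.split("/")[1:]:
        node = node[token.replace("~1", "/").replace("~0", "~")]
        # The extra condition needed to forbid shadowed bases:
        if not cross_id_boundaries and sets_new_base(node):
            raise LookupError(f"{ref} crosses an $id boundary")
    return node
```

With the default flag, http://example.com/root.json#/definitions/B/definitions/X resolves without any special effort, so shadowing falls out of the simplest implementation. Forbidding it means adding the boundary check, which is why leaving the behavior undefined is less work for implementations than requiring an error.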

So if anyone thinks Option 1 is a good idea, we should just say that the behaviour of shadowed base URIs is undefined. That way an implementation is a bit more free in how it wants to implement behind the scenes, but others do not have to go out of their way to explicitly disable it.

Even though I suggested this I'm not actually sure it's a good idea. Even though I don't like shadowed base URIs, the cure might be worse than the problem itself as I'm not sure how we disallow it in a proper manner without making it more complicated.

All I have to say about option 2 is that I don't get the mental model described, so I can't really comment on that.

handrews (Contributor, Author) commented on Mar 2, 2019

@johandorland

It's kind of odd to say use JSON pointer for the fragment, but then amend it by saying it doesn't always work across $id boundaries.

I think it's odd that if I embed a document using $id, it suddenly becomes accessible by one or more sets of URIs that are not available if it is hosted separately.

However even in your example this is not the case as http://example.com/other.json#bar also remains accessible by http://example.com/other.json#/definitions/X as it has the same base URI.

But that's intentional, with a clear use case. There are two defined fragment syntaxes, and the plain name syntax is for labeling schemas, kind of analogous to exported functions from a module. Plain names are independent of the schema's location within the document, so you can refer to them from outside the document without worrying if the internal structure gets rearranged somehow.

Note that you cannot use plain name fragments across $id boundaries, and I still think that is the more sensible behavior, but 🤷‍♂️

jdesrosiers (Member) commented on Mar 3, 2019

👍 for Option 1

As far as I can tell, this would mean I'd have to add a condition to implementations saying "Evaluate JSON Pointer fragments until an $id is encountered, so as to make anything under a subschema with $id inaccessible."

I'd be ok with calling this behavior "undefined" as @johandorland suggests. We can let linters check for document boundary violations. It'd be no worse than what JSON does with calling the behavior of duplicate properties "undefined".

At that point, it seems like we just want to get rid of JSON Pointer fragments altogether.

I have no idea what you mean by this. I'm sure it can't mean what it sounds like.


Option 2

I've always found it awkward to put $id at the root of a document. (Apparently I'm the only one because it seems to be accepted as best practice today.) Coupling a representation's identifier with its representation makes it harder to change that identifier or have alias identifiers. HTML documents don't declare their identifiers. They are identified by the URI used to retrieve them. I don't see why JSON Schema should be any different.

The approach I prefer is that it just doesn't make sense to put an $id at the root. Doing so would be like having a schema that is nothing but a $ref. Any pointer would point across a document boundary and the value of the document would be meaningless.

However, the approach that seems to make most sense given the path JSON Schema has chosen, is to consider $id the one and only way to identify a schema.

I think defining an $id at the root as a redirect or some other kind of alias is a reasonable compromise.

awwright (Member) commented on Mar 3, 2019

Do you mean in $id or just in general?

Not using JSON Pointer at all, only named schemas. (I'm not sure it's viable to remove outright, but at least for authoring I don't see any reason to use pointer fragments anymore.)

johandorland (Collaborator) commented on Mar 3, 2019

I've always found it awkward to put $id at the root of a document. (Apparently I'm the only one because it seems to be accepted as best practice today.) Coupling a representation's identifier with its representation makes it harder to change that identifier or have alias identifiers. HTML documents don't declare their identifiers. They are identified by the URI used to retrieve them. I don't see why JSON Schema should be any different.

The major difference between HTML and JSON schema is that HTML is online by default whilst JSON schema is not. It's even discouraged that remote references are downloaded by default. Having identifiers within each JSON document makes it very helpful to keep track of them. In my opinion https://json-schema.org/latest/json-schema-core.html#rfc.section.8.3.1 explains this very well. You can just pass a schema to an implementation and it will sort out all the references and you don't have to worry about telling which schema is identified by what URI.

jdesrosiers (Member) commented on Mar 3, 2019

@johandorland

The major difference between HTML and JSON schema is that HTML is online by default whilst JSON schema is not. It's even discouraged that remote references are downloaded by default.

I don't think it makes any difference whether it's remote or local. Either way you are retrieving schemas from somewhere.

@awwright

Not using JSON Pointer at all, only named schemas.

I'm still not entirely clear, but I'm assuming that by "named schemas" you are referring to the location-independent identifiers as currently defined in the spec. If so, that's interesting. It's a little less efficient than using pointers because you would have to scan the document for all the named schemas before you could process the schema. But if you support named schemas at all, you have to do that anyway. Without pointers, schemas wouldn't be able to arbitrarily reference any part of a schema. Especially with regard to remote references, this might be a good thing. It makes only part of a schema "public" (can be $reffed from another schema). Schema authors can change their schemas with confidence that they aren't breaking another schema that depends on it.

As for removing JSON Pointer support entirely, I'll point out that JSON Pointer (or something similar) is required for my use-case of JSON Reference. My use case doesn't involve JSON Schema, so it's not relevant by itself, but with the introduction of vocabularies we might want to keep options open. If my use-case requires JSON Pointer, some vocabulary might need it as well.

awwright (Member) commented on Mar 3, 2019

But if you support named schemas at all, you have to do that anyway.

Also consider: All parsers have to do some sort of indexing. If you're a JSON parser, you're indexing a byte stream into a set of properties or items. We're just so used to the JSON parser doing it for us that when we add an additional thing to index ("$id"), it seems foreign: We have to take the parsed result, and scan over it again.

If you were parsing a JSON Schema document using a SAX-like event stream instead, it wouldn't look like an additional step at all. It would just be an event you handle, just like all the others.

handrews (Contributor, Author) commented on Mar 3, 2019

@awwright

If you were parsing a JSON Schema document using a SAX-like event stream instead, it wouldn't look like an additional step at all. It would just be an event you handle, just like all the others.

🐡 Mind blown! (there's no github exploding head emoji, so... blowfish 🤷‍♂️ )

25 remaining items

handrews (Contributor, Author) commented on Apr 1, 2019

I'm going to write a PR for this, possibly in combination with other $id things as we have several that all affect each other fairly directly.

handrews (Contributor, Author) commented on May 11, 2019

I don't think I have it in me to get this through in draft-08

handrews (Contributor, Author) commented on Aug 10, 2019

OK, I read back through all of this and think we all did a really great job of sorting through the issues and settling on the "shadowed URIs have undefined behavior" change.

Let's get this in draft-08 after all: I'm writing the PR right now.

added the clarification label (Items that need to be clarified in the specification) and removed a label on Jul 17, 2024