Question about $id #349

Closed
vivin opened this issue Aug 17, 2017 · 17 comments

@vivin

vivin commented Aug 17, 2017

Perhaps this has been covered elsewhere, but I couldn't find it. I found the definition for $id at http://json-schema.org/latest/json-schema-core.html#rfc.section.9.2 a bit confusing; specifically this part:

To name subschemas in a JSON Schema document, subschemas can use "$id" to give themselves a document-local identifier. This is done by setting "$id" to a URI reference consisting only of a fragment. The fragment identifier MUST begin with a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

The effect of defining an "$id" that neither matches the above requirements nor is a valid JSON pointer is not defined.

The example that is provided is:

{
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": { "$id": "#foo" },
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": { "$id": "#bar" },
                "Y": { "$id": "t/inner.json" }
            }
        },
        "C": {
            "$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"
        }
    }
}

The specification says that $id is a URI reference consisting only of a fragment, which I assume is anything starting with a #. It then goes on to say that the effect of defining an $id that doesn't match the above requirements, or is not a valid JSON pointer, is not defined.

Doesn't that mean that other.json and t/inner.json don't meet these criteria? Neither of those is a fragment. Also, what does it mean that the fragment can be a JSON pointer? Can you have an id like #foo/bar/baz?

@handrews
Contributor

@vivin Thank you for catching this! I believe the last sentence you quoted should read:

The effect of defining a URI fragment "$id" that neither matches the above requirements nor is a valid JSON pointer is not defined.

(the words "a URI fragment" were added/changed).

That would exclude URIs with more than just a fragment, such as other.json and t/inner.json. Or the URN example for that matter.

You can have an "$id" like #/foo/bar/baz (note that the leading slash is required, except for the root pointer which is an empty string and therefore becomes the fragment #). I think we should clarify that explicitly declaring a JSON Pointer $id that conflicts with the actual position of the subschema would also result in undefined behavior.

@awwright did we talk about that at some point? I think the case that motivates an explicit clarification here is when someone declares an "$id" with the JSON Pointer that would otherwise point to a different subschema within the document. I would also argue for stating that any declared JSON Pointer that does not match its position should be considered undefined behavior, but declaring a duplicate definitely cannot be resolved in any sensible manner by an implementation.
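The distinction drawn above between plain-name fragments and JSON Pointer fragments can be sketched in a few lines of Python. This is only an illustration of the quoted spec wording, not part of any implementation; the helper name `classify_id` and its return labels are made up for this example.

```python
import re

# Plain-name fragment syntax as quoted from the spec text above:
# a letter, then any number of letters, digits, "-", "_", ":", or ".".
PLAIN_NAME = re.compile(r"#[A-Za-z][A-Za-z0-9\-_:.]*")


def classify_id(value: str) -> str:
    """Classify an "$id" value by its fragment syntax (hypothetical helper)."""
    if not value.startswith("#"):
        # More than just a fragment, e.g. "other.json" or a URN.
        return "not fragment-only"
    if value == "#" or value.startswith("#/"):
        # The root pointer is the empty string; all others start with "/".
        return "json-pointer fragment"
    if PLAIN_NAME.fullmatch(value):
        return "plain-name fragment"
    return "undefined by the spec"


for candidate in ["#foo", "#/foo/bar/baz", "#", "#foo/bar/baz", "other.json"]:
    print(candidate, "->", classify_id(candidate))
```

Under these assumptions, `#foo/bar/baz` from the original question falls into the undefined category: it is neither a plain name (slashes are not allowed) nor a JSON Pointer (the leading slash is missing).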

@handrews handrews self-assigned this Aug 21, 2017
@handrews
Contributor

PR #356 is also clarifying this same concept (and a bit more), so I'm going to add the "URI fragment" words in. They fit with other changes to clarify the syntax and usage of exactly this sort of $id.

@handrews
Contributor

There is still the question of what happens if a JSON Pointer fragment is defined that does not match the actual position with respect to the nearest base URI. I'm keeping this issue open for that topic, as it is too unclear to add to the existing PR.

handrews added a commit to handrews/json-schema-spec that referenced this issue Aug 21, 2017

This addresses the trivial portion of issue json-schema-org#349.
@handrews handrews added this to the draft-07 (wright-*-02) milestone Aug 21, 2017
@toddobryan

Adding my own confusion to this.

From Section 9.2 of the core spec (http://json-schema.org/latest/json-schema-core.html#rfc.section.9.2):

The "$id" keyword defines a URI for the schema, and the base URI that other URI references within the schema are resolved against. The "$id" keyword itself is resolved against the base URI that the object as a whole appears in.

It also says

To name subschemas in a JSON Schema document, subschemas can use "$id" to give themselves a document-local identifier. This is done by setting "$id" to a URI reference consisting only of a fragment. The fragment identifier MUST begin with a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

I read that to mean that only the root schema could have an absolute URI, that all $id elements were resolved against that base URI (not any intervening $ids in a lower scope), and that you could only name non-root schemas with fragments.

I think that is consistent with https://github.com/json-schema/json-schema/wiki/The-%22id%22-conundrum, but it's inconsistent with some of the tests in the v6 test suite. In particular, this case:

    "description": "base URI change",
    "schema": {
        "$id": "http://localhost:1234/",
        "items": {
            "$id": "folder/",
            "items": {"$ref": "folderInteger.json"}
        }
    },

would require you to resolve both the $id and $ref elements against URIs other than the base URI.

Is this supposed to be legal in v6?

@handrews
Contributor

@toddobryan "$id" is always a URI reference. It SHOULD be an absolute URI with no fragment in the root schema. "$id" changes the base wherever it is used, so the closest containing (or adjacent within the same subschema) "$id" is always the base.

Since a fragment is always ignored when considering base URIs, declaring a plain name fragment with "$id" has no real effect on the base URI. It is technically different, but since resolving a URI reference against a base URI per RFC 3986 discards the base's fragment, it does not matter.

I don't recommend paying attention to anything in the old repository. There's a reason it has a big "this is out of date" heading at the top of every wiki page. It's there to be available for historical research only.

The full URI for the "$ref" in your example is "http://localhost:1234/folder/folderInteger.json"
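The resolution chain for the test-suite case can be reproduced with Python's standard `urllib.parse.urljoin`, which implements RFC 3986 reference resolution. This is a minimal sketch; the variable names are chosen for illustration only.

```python
from urllib.parse import urljoin

# Root "$id" from the test case: this is the initial base URI.
root_base = "http://localhost:1234/"

# The nested "$id" is resolved against the enclosing base and then
# becomes the base for everything inside that subschema.
items_base = urljoin(root_base, "folder/")

# The "$ref" is resolved against the closest enclosing base.
ref_target = urljoin(items_base, "folderInteger.json")

print(items_base)
print(ref_target)
```

The second printed value is the same full URI given above, http://localhost:1234/folder/folderInteger.json.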

@toddobryan

Please clarify this in the spec. In particular, "$id changes the base wherever it is used, so the closest containing (or adjacent within the same subschema) $id is always the base" is not at all clear from the current spec.

@handrews
Contributor

@toddobryan Looking at

The "$id" keyword itself is resolved against the base URI that the object as a whole appears in.

the problem appears to be the phrase "object as a whole". If that were replaced with something like:

The base URI for the "$id" keyword in a subschema is that of its containing schema. In the root schema, the base is determined per RFC 3986 section 5.

would you find that sufficiently clear?

@toddobryan

Yes. That would help a lot.

Also, please explicitly state that non-fragment URIs are allowed for subschemas. Specifically calling out that authors can use fragments on subschemas without specifically stating that non-fragment $ids are also allowed may lead other people like me to assume that the lack of mention is meaningful.

Also, just to make sure I've got this--there is nothing to stop me from declaring a new absolute URI in a subschema, correct?

{
  "$id": "http://foo.com/main.json",
  "type": "integer",
  "definitions": {
    "baz": {
      "$id": "http://bar.com/sub.json",
      ...stuff inside baz...
    }
  }
}

So, in this case, the only way to reference the stuff inside baz through its containing schema is by either using its full URI or by using a fragment with a JSON Pointer? And similarly, to escape from the baz subschema, any $ref element would have to use the full, absolute URI of the containing schema. Is that all correct?

BTW, according to the proposed core spec, you can't declare fragment $ids with slashes. A slash is not one of the allowed characters it lists:

To name subschemas in a JSON Schema document, subschemas can use "$id" to give themselves a document-local identifier. This is done by setting "$id" to a URI reference consisting only of a fragment. The fragment identifier MUST begin with a letter ([A-Za-z]), followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), or periods (".").

I gather this was to make clear whether a $ref was meant to be a JSON Pointer or not.

Actually, this brings up something. I guess that a JSON Pointer fragment is also resolved according to the closest defined $id, so that if that $id is not a fragment, you have to provide a full URI to escape out of its scope. By which I mean:

{
  "$id": "http://foo.com/main.json",
  "type": "integer",
  "definitions": {
    "baz": {
      "$id": "sub",
      "definitions": {
        "baz": { "$ref": "#/json/pointer/here" }
      }
    }
  }
}

I think I understand you to mean that the $ref cannot refer to anything above the $id sub unless it specifies the whole base URI of the schema (because any fragment would be resolved against http://foo.com/sub, which is where evaluation of the JSON Pointer would start).

Is that all correct?

@handrews
Contributor

@toddobryan I will add some clarifications, but it is not JSON Schema's responsibility to repeat RFC 3986's information. When we say something is a URI Reference per RFC 3986 as a normative reference, the expectation is that the reader will refer to the definition there. We do not repeat all of the different variations on URI references.

The fragment syntax is examined in detail because fragments are media-type specific. Likewise for defining what parts of the media type's content change the base URI. But everything else is per RFC 3986, and repeating it in the spec is just clutter.

The specification is just that: a specification defining the requirements for conforming implementations. It is not a schema author's guide or user's guide. Those belong on the website, and you can request a guide by filing an issue at https://github.com/json-schema-org/json-schema-org.github.io (and we would really love to have more people writing guide articles).

I believe that all of your examples are correct, although forbidding slashes is just establishing the plain name fragment syntax, as it behaves differently from the JSON Pointer fragment syntax. PR #356 clarifies the distinctions between plain name and JSON Pointer fragments and their usage. You can still declare an "$id" that is a JSON Pointer fragment, but if you declare a pointer that conflicts with the subschema's position in the document, the behavior is not well-defined, nor should it be. In the partial implementation that I did, it would warn you if you declared a conflicting pointer fragment, but that's really up to implementations.
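To make the "conflicting pointer" case concrete, here is a rough sketch of the kind of check an implementation might run before warning. It is not the partial implementation mentioned above, and the spec leaves the behavior undefined. The function names are hypothetical, every nested object is naively treated as a possible subschema, and an "$id" with a non-fragment part is assumed to restart the pointer at its own root.

```python
from urllib.parse import unquote


def escape(token: str) -> str:
    """Escape one JSON Pointer reference token per RFC 6901."""
    return token.replace("~", "~0").replace("/", "~1")


def pointer_conflicts(node, pointer="", out=None):
    """Collect (actual, declared) pairs where a JSON Pointer "$id"
    disagrees with the subschema's position under the nearest base."""
    if out is None:
        out = []
    if isinstance(node, dict):
        declared = node.get("$id")
        if isinstance(declared, str):
            if declared.startswith("#"):
                fragment = unquote(declared[1:])
                is_pointer = fragment == "" or fragment.startswith("/")
                if is_pointer and fragment != pointer:
                    out.append((pointer, fragment))
            else:
                # A non-fragment "$id" sets a new base, so positions
                # below it are counted from this subschema.
                pointer = ""
        for key, child in node.items():
            pointer_conflicts(child, pointer + "/" + escape(key), out)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            pointer_conflicts(child, pointer + "/" + str(index), out)
    return out


schema = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#/definitions/A"},  # matches its position
        "B": {"$id": "#/definitions/A"},  # claims A's position
        "C": {
            "$id": "other.json",
            "definitions": {"D": {"$id": "#/definitions/D"}},
        },
    },
}

print(pointer_conflicts(schema))
```

In this made-up schema only `B` is reported, because it declares a pointer that actually belongs to `A`; `D` is fine because its position is counted from the new base set by `other.json`.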

@handrews
Contributor

@toddobryan BTW the use case for absolute URI "$id"s in subschemas is to allow "packing" multiple schema documents into a single file/resource for easy distribution.
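One way to picture "packing" is to index a single document by every fully resolved "$id" it contains. The sketch below walks the example schema from the top of this thread; `index_ids` is a hypothetical helper, and it naively treats every nested object as a possible subschema, which a real implementation would not do.

```python
from urllib.parse import urljoin


def index_ids(node, base="", found=None):
    """Map every fully resolved "$id" to its subschema (hypothetical helper)."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        declared = node.get("$id")
        if isinstance(declared, str):
            # Resolve against the nearest enclosing base; the result becomes
            # the base for everything nested inside this subschema.
            base = urljoin(base, declared)
            found[base] = node
        for child in node.values():
            index_ids(child, base, found)
    elif isinstance(node, list):
        for child in node:
            index_ids(child, base, found)
    return found


packed = {
    "$id": "http://example.com/root.json",
    "definitions": {
        "A": {"$id": "#foo"},
        "B": {
            "$id": "other.json",
            "definitions": {
                "X": {"$id": "#bar"},
                "Y": {"$id": "t/inner.json"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}

for uri in index_ids(packed):
    print(uri)
```

Each printed URI can then serve as a lookup key, so a "$ref" to http://example.com/other.json finds subschema `B` without fetching a second file.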

@toddobryan

OK. Now that I know what it says, the sentence is perfectly understandable. Thanks!

(My confusion resulted from the fact that I thought each document could only have one base URI. I now realize that every $id element defines a (potentially) new base URI for the part of the document that it encloses. Sorry I was being dense. I never should have read your "id conundrum" post, because I keep thinking about all the ways this could potentially break instead of just doing what it says. :-) )

@handrews
Contributor

@toddobryan LOL, thanks!
BTW, I did not write any of the content on the old wiki, nor do I endorse it. My name is only there because I put the "OUT OF DATE" notices on all the pages. I wanted to delete them to avoid exactly this kind of thing, but other people wanted to keep them "archived".

@handrews
Contributor

I've added more wording tweaks to #372 to make all of this more clear.

@vivin
Author

vivin commented Aug 30, 2017

@handrews @toddobryan this discussion really cleared some things up. Just to clarify, I want to make sure that my understanding is correct. Assuming that the value of $id in the parent scope is http://example.org/blah.json, then a subschema with:

  • $id set to foo.json would have the fully-resolved id http://example.org/foo.json.
  • $id set to foo would have the fully-resolved id http://example.org/foo.
  • $id set to bar/baz would have the fully-resolved id http://example.org/bar/baz
  • $id set to #/definitions/foo would have the fully-resolved id http://example.org/blah.json#/definitions/foo.
  • $id set to bar#/definitions/bar would have the fully-resolved id http://example.org/bar#/definitions/bar. (is this allowed?)
  • $id set to http://example.org/other.json would have the fully-resolved id http://example.org/other.json.

Is that correct? Also, would we get the same resolved ids if the id in the parent scope was http://example.org/schema/blah.json instead (that is, does the fact that blah.json is under the schema path matter)? What if the id was http://example.org/schema/blah.json#/definitions/foo? Based on my understanding, in both of these cases, the ids would resolve similarly. Do let me know if I've got this completely wrong.

Thanks again!

@handrews
Contributor

@vivin all of your bullet point examples are absolutely correct!

bar#/definitions/bar is allowed, although probably confusing for human readers. I've seen it in tools that "dereference" schemas by replacing all "$ref"s with the things to which they refer (this only works when there are no circular references). To make that work right, you need to preserve the id of the schema when you replace "$ref" with it, and that often includes a JSON Pointer fragment. But I've only seen that done as a processing step to produce a single in-memory non-circular data structure, and not as something that is presented to a schema user directly.

If your parent scope is http://example.org/schema/blah.json, then your 1st, 2nd, 3rd, and 5th bullet point fully-resolved ids will also have /schema in the path. However, if you had an $id of /foo.json instead of foo.json, you would get http://example.org/foo.json instead of http://example.org/schema/foo.json

Handling leading / trailing slashes in the base URI and URI reference is covered in detail in RFC 3986, so you can get a full explanation in the examples there.
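For anyone who wants to check the cases discussed here without reading through the RFC 3986 examples, the bullet-point values can be run through Python's standard `urllib.parse.urljoin`. This is a minimal sketch; the `resolve_all` helper and the variable names are made up for the example.

```python
from urllib.parse import urljoin

# "$id" values from the bullet list, plus the leading-slash variant.
declared_ids = [
    "foo.json",
    "foo",
    "bar/baz",
    "#/definitions/foo",
    "bar#/definitions/bar",
    "http://example.org/other.json",
    "/foo.json",
]


def resolve_all(base: str) -> dict:
    """Resolve every declared id against one parent base (hypothetical helper)."""
    return {value: urljoin(base, value) for value in declared_ids}


for base in (
    "http://example.org/blah.json",
    "http://example.org/schema/blah.json",
    "http://example.org/schema/blah.json#/definitions/foo",
):
    print("base:", base)
    for value, resolved in resolve_all(base).items():
        print("  ", value, "->", resolved)
```

The last two bases give identical results, since the base's fragment is discarded during resolution. A fragment-only value such as `#/definitions/foo` keeps the whole base, so under the `/schema` base it resolves to http://example.org/schema/blah.json#/definitions/foo.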

@vivin
Author

vivin commented Aug 30, 2017

@handrews Thank you -- this really helped clear things up for me!

@handrews
Contributor

handrews commented Sep 3, 2017

@vivin I've filed a web site issue to put more explanations on the web site. Since you seem to be happy with the answers and we can track adding things to the web site in the other issue, I'm going to go ahead and close this.
