Clarify usage of $ref with properties not provided in definitions #1097

Closed
1 task
cybtachyon opened this issue May 4, 2021 · 24 comments

Comments

@cybtachyon

cybtachyon commented May 4, 2021

From @ThatGuyCND @rjmill and myself:

Based on reading https://json-schema.org/understanding-json-schema/structuring.html#using-id-with-ref:
In an effort to reuse definitions and lessen code duplication, common schemas are abstracted into smaller “chunks.”
As noted by the documentation linked above, you would either:

  1. Use local definitions to define these and then reference them via fragment #/definitions
  2. Define them with $id (id in draft-04) and reference them that way: #<$id>
  3. Need to use fragment #/properties

Assumptions:

  • Given the structure of the files below: because id is not defined, the properties are specified in an external file, and the JSON Schema documentation notes that definitions are not required, one would need to use #/properties/… to reference the defined values.
  • When you use $id, you're explicitly setting the "canonical" URI for that schema resource. Unless $id specifies an absolute URI, the URI-reference is resolved against the current base URI to create an absolute URI.
    The thing to note here is the base URI. $ref (unless it's an absolute URI) is resolved against the current base URI to create a full URI.
    When the base URI isn't explicitly set in the root schema via $id, it defaults to the retrieval URI. So if you retrieved the first one from http://example.com/foo/image_embed.json, the "$ref": "link.json" would be resolved as "$ref": "http://example.com/foo/link.json" as per standard URI resolution rules.
    As long as your JSON Schema implementation fetches unknown URIs, and the two schemas are retrievable next to each other, then that should work.
    Worth noting, JSON Schema implementations aren't required to fetch unknown URIs. Many do, but it's not a requirement. The URI is being used as an identifier, not a location.
  • $id as a root-level value and $id as a property-level value in a schema act on different sides of the JSON Pointer. A root level $id acts on the left side of the #/ pointer, while a property-level $id acts on the right side.
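
The base-URI rule in the second assumption can be checked with Python's standard urllib.parse. This is only a minimal sketch: the retrieval URI is the one from the example above, while the resolve_ref helper and the registry dict are hypothetical names for illustration, not part of any validator's API.

```python
from urllib.parse import urljoin, urldefrag

def resolve_ref(base_uri: str, ref: str) -> tuple:
    """Resolve a $ref against the base URI; return (absolute URI, fragment)."""
    absolute = urljoin(base_uri, ref)
    uri, fragment = urldefrag(absolute)
    return uri, fragment

# With no $id in the root schema, the retrieval URI acts as the base URI.
base = "http://example.com/foo/image_embed.json"
uri, fragment = resolve_ref(base, "link.json#/properties/href")
print(uri)       # http://example.com/foo/link.json
print(fragment)  # /properties/href

# The URI is an identifier: a preloaded registry (hypothetical) can
# satisfy the lookup without any network fetch.
registry = {"http://example.com/foo/link.json": {"type": "object"}}
print(uri in registry)  # True
```

The registry lookup reflects the point that implementations are not required to fetch unknown URIs; schemas can be loaded up front and matched by identifier.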

Examples:

image_embed.json:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Image embed",
  "description": "A method for displaying images",
  "category": "component",
  "type": "object",
  "format": "grid",
  "properties": {
    "link": {
      "entity": "link",
      "type": "object",
      "format": "grid",
      "options": {
        "grid_columns": 6
      },
      "properties": {
        "href": {
          "$ref": "link.json#/properties/href",
          "options": {
            "grid_columns": 3
          }
        },
        "title": {
          "$ref": "link.json#/properties/title",
          "options": {
            "grid_columns": 3
          }
        }
      },
      "required": ["href"]
    }
  }
}

link.json:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "category": "atom",
  "title": "Link",
  "entity": "link",
  "type": "object",
  "format": "grid",
  "properties": {
    "href": {
      "title": "URL",
      "type": "string",
      "format": "url",
      "options": {
        "grid_columns": 4
      }
    },
    "title": {
      "title": "Title attribute",
      "description": "Shown on mouseover.",
      "type": "string",
      "options": {
        "grid_columns": 4
      }
    }
  }
}

The bits in question are "$ref": "link.json#/properties/href" and "$ref": "link.json#/properties/title".
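
For readers unfamiliar with the fragment part of those references: it is a JSON Pointer (RFC 6901) evaluated against the target document. The following is a rough sketch of that evaluation, assuming a trimmed copy of link.json; resolve_pointer is a hypothetical helper, not a library function.

```python
from urllib.parse import unquote

def resolve_pointer(document, pointer: str):
    """Evaluate an RFC 6901 JSON Pointer taken from a URI fragment."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Undo URI percent-encoding, then pointer escapes (~1 before ~0).
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current

# Trimmed copy of link.json from the example above.
link_schema = {
    "properties": {
        "href": {"title": "URL", "type": "string", "format": "url"},
        "title": {"title": "Title attribute", "type": "string"},
    }
}

print(resolve_pointer(link_schema, "/properties/href")["title"])  # URL
```

Nothing in the pointer evaluation itself checks that the result is a schema; it simply walks the JSON structure.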

Why is this important enough to merit changes to the docs?

Goal

@handrews
Contributor

handrews commented May 5, 2021

@cybtachyon

"definitions"/"$defs" are not required by the spec, therefore most developers are unlikely to use them.

That is highly dependent on whose schemas you look at. Pretty much every large project I've seen (ranging from internal APIs to a large biomedical information system) uses definitions or $defs heavily. Some other projects (notably OpenAPI with their components/schemas section, but also a non-public project I've worked on) define alternative keywords that have the same function but are more suitable for whatever reason.

Do you specifically need draft-04 explanations? The way id and $ref were explained in draft-04 was mind-bending, and it hasn't looked anything like that for a good long while. Using $ref to point into a properties definition is generally something to be avoided. It's a bit like ignoring the leading underscore on a Python method- Python doesn't enforce that such methods are "private" but it's a convention that is generally understood.
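
To make the Python analogy concrete, here is a small illustration. The Link class and its methods are made up for this example; the point is only that the language permits the call while the naming convention discourages it.

```python
class Link:
    def __init__(self, href: str):
        self.href = href

    def _normalize(self) -> str:
        # Leading underscore: "internal, may change", by convention only.
        return self.href.strip().lower()

    def render(self) -> str:
        # Public entry point that callers are expected to use.
        return f'<a href="{self._normalize()}">'

link = Link("  HTTP://Example.com/  ")
print(link.render())      # intended usage
print(link._normalize())  # allowed by Python, but relies on an internal detail
```

A $ref into another schema's properties is in the same position as the last line: it works, but it couples the caller to a location that was not declared as reusable.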

I'm not quite sure what "canonical response" means here, but if I were working with these schemas they wouldn't look like this at all. Anything re-usable would be factored out under $defs (or definitions). In recent drafts, $id is used to identify JSON Schema resources, since you can embed multiple resources in a single JSON Schema document.

@gregsdennis
Member

Just take a look at any of the large libraries that use JSON Schema to define their schema - for example https://explore.fast.design/components/fast-accordion

Can you provide a link to where they use JSON Schema? That page is an examples site where the only code that's displayed is XML/HTML.

@handrews
Contributor

handrews commented May 5, 2021

To show an example of definitions usage (in draft-06), the FHIR JSON Schema has over 650 schemas under definitions.

@ThatGuyCND

Can you provide a link to where they use JSON Schema?

@gregsdennis there is a tab in the bottom frame of the page labeled Schema

@ryangalamb
Contributor

Using $ref to point into a properties definition is generally something to be avoided. It's a bit like ignoring the leading underscore on a Python method- Python doesn't enforce that such methods are "private" but it's a convention that is generally understood.

This is a great way of looking at it, thanks @handrews.

My point in the slack discussion was that using definitions and $defs isn't required. But if I were working on a codebase that had schemas with $refs to schemas that weren't inside $defs, I'd move the reusable schemas to $defs. It's one of those "low hanging fruit" "easy win" refactors.

While $ref-ing schemas that aren't in $defs/definitions is valid, it should be considered a code smell.

Regarding the "Understanding JSON Schema" link, that example with $id is incorrect. I'm not sure if that's how it worked in earlier drafts, but that's not how it works now. The value of $id should be an absolute URI (with no fragment). But that example has a fragment-only URI-reference. (For more info, see the official spec)

The other sections do explain how the JSON Pointer fragment behavior works with $ref. They just don't include examples with properties. And I think that's good. We should steer developers away from patterns that consistently cause headaches in the future.

Maybe a section on JSON Pointers could be worthwhile? It's probably out of scope for that document, but worth considering.

@cybtachyon
Author

Thanks for the productive responses, team.

"definitions"/"$defs" are not required by the spec,

Does this need to change then, based on the phrasing in the maintainers' responses here?

To show an example of definitions usage

This is not necessary, as it is not the problem we are attempting to solve. Thanks though.

Do you specifically need draft-04 explanations?

No, latest is fine.

Using $ref to point into a properties definition is generally something to be avoided. It's a bit like ignoring the leading underscore on a Python method- Python doesn't enforce that such methods are "private" but it's a convention that is generally understood.

So it sounds like the docs need to be updated to strongly discourage usage here? As it stands, there are multiple massive codebases by different companies I am responsible for consuming in production environments that do not use definitions at all.

Since the spec currently does not require "definitions"/"$defs", what we're looking for here is exactitude on either:

  • Support for pathing through properties being removed / defs being required or soft-required in a future draft
    or
  • A clear example of using $ref with properties so codebases without definitions have something to point to.

@gregsdennis
Member

gregsdennis commented May 5, 2021

there is a tab in the bottom frame of the page labeled Schema

Thanks. I see that now. But what do they do when they want to compose these components into a larger schema, e.g. to represent a form? It looks to me that they've elected to store their component schemas in separate files. My guess is that they'd just $ref those files. This is perfectly fine, but it does mean that either the validator or the client using it would have to make multiple web requests to get those schemas, which is less optimal than composing them into a single file.
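
One way to picture "composing them into a single file" is to embed each external schema under $defs with its own $id, so that references resolve inside the document. The sketch below assumes 2020-12-style embedded resources (draft-04 behaves differently); bundle and the example URIs are hypothetical and do not come from any particular library.

```python
import copy

def bundle(root: dict, externals: dict) -> dict:
    """Embed externally identified schemas under $defs, keyed by their URI."""
    bundled = copy.deepcopy(root)
    defs = bundled.setdefault("$defs", {})
    for uri, schema in externals.items():
        embedded = copy.deepcopy(schema)
        embedded["$id"] = uri  # the embedded resource keeps its own identity
        defs[uri] = embedded
    return bundled

root = {
    "$id": "http://example.com/foo/image_embed.json",
    "properties": {"link": {"$ref": "link.json"}},
}
externals = {"http://example.com/foo/link.json": {"type": "object"}}

bundled = bundle(root, externals)
print(list(bundled["$defs"]))  # ['http://example.com/foo/link.json']
```

The "$ref": "link.json" in the root is left untouched; it still resolves to the same absolute URI, which now identifies a schema that is already in the document.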


Since the spec currently does not require "definitions"/"$defs"...

The $defs keyword is already defined in such a way that it encourages storage of reusable subschemas.

(8.2.4) The "$defs" keyword reserves a location for schema authors to inline re-usable JSON Schemas into a more general schema.

Reserving a location for this purpose is sufficient. We generally try to leave authorship options open. If a schema author wants to (or has good reason to) $ref into properties or allOf or any other random location within a schema or even an external JSON file (so long as it resolves to a valid schema), then we don't want to prevent them from doing so, even if such a practice is discouraged.

I'm not sure anything needs to change in the spec. This

Support for pathing through properties being removed / $defs being required or soft-required in a future draft.

is not going to be an acceptable outcome from this issue.

If the "how to schema" site could use more clarification in some areas, that's fine.

@handrews
Contributor

handrews commented May 5, 2021

@cybtachyon

"definitions"/"$defs" are not required by the spec,

Does this need to change then, based on the phrasing in the maintainers' responses here?

This is less straightforward than "not required by the spec" implies, although that is not at all obvious from draft-04. Really, I advise never looking at draft-04 for anything ever again 😝 There was a lot of great work put into that draft, as demonstrated by its longevity. However, many of the choices made in an attempt to clarify turned out to do the opposite, so in some ways it is uniquely confusing.

This is all much clearer in 2020-12, so from here on out I will use those keyword names and terminology exclusively. For one thing, fragments are no longer allowed in $id (except for an empty fragment, which is equivalent to not having a fragment, and is discouraged and only allowed for confusing historical reasons, so please ignore it).

To elaborate on what @gregsdennis said:

Keywords fall into one of five classifications. $defs is a reserved location keyword, specifically one that reserves locations for schemas. While this may seem like a no-op, reserved schema location keywords have important implications when it comes to identifying and referencing schemas.

When you load a schema document, you need to scan for $id and $anchor to see if there are referenceable URIs defined within the document. These keywords are only valid in schema objects, so you have to know which objects are schema objects. That is what $defs and keywords like it do. They say "these things here are schemas, so if you're looking for things defined in schemas, look here."
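
A rough sketch of that scan follows, under some assumptions: the keyword tuples are a partial list chosen for illustration, and collect_identifiers is a hypothetical helper rather than the API of any real implementation. It only descends through keywords known to contain schemas, which is exactly why reserved locations matter.

```python
from urllib.parse import urljoin

# Keywords whose values are known to hold subschemas (partial, illustrative).
SCHEMA_MAPS = ("$defs", "properties", "patternProperties")
SCHEMA_LISTS = ("allOf", "anyOf", "oneOf", "prefixItems")
SCHEMA_SINGLE = ("items", "not", "if", "then", "else", "additionalProperties")

def collect_identifiers(schema, base_uri, registry=None):
    """Map every URI declared via $id or $anchor to its schema object."""
    if registry is None:
        registry = {}
    if not isinstance(schema, dict):
        return registry  # boolean schemas declare nothing
    if "$id" in schema:
        # $id is resolved against the current base and becomes the new base.
        base_uri = urljoin(base_uri, schema["$id"])
        registry[base_uri] = schema
    if "$anchor" in schema:
        registry[f"{base_uri}#{schema['$anchor']}"] = schema
    for key in SCHEMA_MAPS:
        for sub in schema.get(key, {}).values():
            collect_identifiers(sub, base_uri, registry)
    for key in SCHEMA_LISTS:
        for sub in schema.get(key, []):
            collect_identifiers(sub, base_uri, registry)
    for key in SCHEMA_SINGLE:
        if key in schema:
            collect_identifiers(schema[key], base_uri, registry)
    return registry

doc = {
    "$id": "https://example.com/root",
    "$defs": {"link": {"$id": "link", "$anchor": "top", "type": "object"}},
}
for uri in collect_identifiers(doc, "https://example.com/root"):
    print(uri)
```

With this document, the scan registers the root URI, https://example.com/link, and https://example.com/link#top, all of which become valid $ref targets.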

Often, you can get away without doing this. You can $ref using JSON Pointer fragments appended to the base URI of the entire document, and just assume that the result is a schema. But as described in the section linked above, sometimes that doesn't work. And sometimes it seems to work but shouldn't.

So, technically you do not need to use $defs, but you do need to use some sort of reserved location keyword. You can define your own in a new vocabulary, which is valid. I'm looking at doing that for a client project right now, because there are some unusual things about that project that mean that the UX of developing the schemas will be better with a different name. But it's not just a random spot- there will be a keyword that is defined as a reserved location for use in place of (or alongside) $defs.

And yes, I agree that we don't want to forbid $ref to properties, just like Python doesn't forbid calling single-leading-underscore names from outside of a class. It's not advised to do that, but sometimes you have to do ill-advised things. The spec shouldn't lock that down (but a linter should probably complain unless configured not to).

@cybtachyon
Author

cybtachyon commented May 5, 2021

Thanks again for the background.

Really, I advise never looking at draft-04 for anything ever again

That would be preferable, however one of the massive codebases I'm consuming was primarily authored back in draft-04, with a few draft-07 sprinkled in. I do not have a choice here.

we don't want to forbid $ref to properties, just like Python doesn't forbid calling single-leading-underscore names from outside of a class. It's not advised to do that, but sometimes you have to do ill-advised things. The spec shouldn't lock that down (but a linter should probably complain unless configured not to).

Ok. So it sounds like there should be no spec changes, but the documentation site should have a line something along the lines of:

While you can also $ref properties with a JSON pointer by doing "$ref": "otherSchema.json#/properties/link", you should avoid doing so and instead move the schema definition to the $defs reserved location to avoid issues with schema validation and references.

If we can agree on that, I can make an Issue/PR to the Understanding Schema repo and close this one out.

This does end up making things more difficult for a massive library of individual JSON schemas with complex interdependencies like the ones I work with, but we'll have to accept that the approach taken is not considered a primary use case by the maintainers.

@handrews
Contributor

handrews commented May 5, 2021

@cybtachyon

we'll have to accept that the approach taken is not considered a primary use case by the maintainers.

Just as with programming languages or XML, there is a learning curve with JSON Schema, and we all made some sub-optimal choices as we learned.

This does end up making things more difficult for a massive library of individual JSON schemas with complex interdependencies like the ones I work with

I would like to understand if you think this is true in an absolute sense, or only because of the way the existing JSON Schemas are designed. I have managed very large schema libraries worked on by globally distributed teams, and in my experience proper use of reserved locations like $defs (meaning that the various uses of those schemas are all $ref'd rather than one being inline and others $ref-ing it) makes such tasks much simpler. If you believe that is not the case, we should try to understand what's going on.

@cybtachyon
Author

cybtachyon commented May 6, 2021

I would like to understand if you think this is true in an absolute sense

Yes, although making the recommended usage more obvious through examples on the website will help with adoption.

Right now, according to the Understanding JSON Schema site, only "complex schemas" use definitions for reuse, when in the real world I would suggest based on my own experience that most schemas are likely intended for re-use elsewhere in the system eventually.

To fit with the above recommended method for reuse, we'd have to refactor thousands of schema files (most of which are re-used or use another schema) to use definitions:

Before:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "category": "atom",
  "title": "Link",
  "entity": "link",
  "type": "object",
  "format": "grid",
  "properties": {
    "href": {
      "title": "URL",
      "type": "string",
      "format": "url",
      "options": {
        "grid_columns": 4
      }
    },
    "title": {
      "title": "Title attribute",
      "description": "Shown on mouseover.",
      "type": "string",
      "options": {
        "grid_columns": 4
      }
    }
  }
}

After:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "category": "atom",
  "title": "Link",
  "entity": "link",
  "type": "object",
  "format": "grid",
  "$defs": {
    "href": {
      "title": "URL",
      "type": "string",
      "format": "url",
      "options": {
        "grid_columns": 4
      }
    },
    "title": {
      "title": "Title attribute",
      "description": "Shown on mouseover.",
      "type": "string",
      "options": {
        "grid_columns": 4
      }
    }
  },
  "properties": {
    "href": { "$ref": "#/$defs/href" },
    "title": { "$ref": "#/$defs/title" }
  }
}
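
A refactor like the Before/After pair above is mechanical enough to script across many files. The following is a minimal sketch, not a vetted migration tool: hoist_properties is a hypothetical helper, it only handles top-level properties, and it does not touch $schema or other keywords.

```python
import copy

def hoist_properties(schema: dict, defs_keyword: str = "$defs") -> dict:
    """Move each inline property subschema under $defs and $ref it instead."""
    result = copy.deepcopy(schema)
    defs = result.setdefault(defs_keyword, {})
    for name, subschema in result.get("properties", {}).items():
        if not isinstance(subschema, dict) or "$ref" in subschema:
            continue  # boolean schema or already a reference; leave as-is
        if name in defs:
            raise ValueError(f"{defs_keyword}/{name} already exists")
        defs[name] = subschema
        # Escape the name per JSON Pointer rules before building the fragment.
        escaped = name.replace("~", "~0").replace("/", "~1")
        result["properties"][name] = {"$ref": f"#/{defs_keyword}/{escaped}"}
    return result

link = {
    "title": "Link",
    "type": "object",
    "properties": {
        "href": {"title": "URL", "type": "string", "format": "url"},
        "title": {"title": "Title attribute", "type": "string"},
    },
}

refactored = hoist_properties(link)
print(refactored["properties"]["href"])  # {'$ref': '#/$defs/href'}
```

Existing references of the form link.json#/properties/href should keep resolving after this change, since that location now holds a schema consisting of a $ref to the hoisted definition.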

A second example, just for completeness' sake:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Image embed",
  "description": "A method for displaying images",
  "category": "component",
  "type": "object",
  "format": "grid",
  "$defs": {
    "link": {
      "entity": "link",
      "type": "object",
      "format": "grid",
      "options": {
        "grid_columns": 6
      },
      "properties": {
        "href": {
          "$ref": "link.json#/$defs/href",
          "options": {
            "grid_columns": 3
          }
        },
        "title": {
          "$ref": "link.json#/$defs/title",
          "options": {
            "grid_columns": 3
          }
        }
      },
      "required": ["href"]
    }
  },
  "properties": {
    "link": { "$ref": "#/$defs/link" }
  }
}

@gregsdennis
Member

gregsdennis commented May 6, 2021

I think this is really a developer decision of what schemas are worth being factored out for reuse. Typically, the Rule of Three is followed.

The idea is that you extract functionality when it's been proven to be reused. Premature factorization results in wasted effort.

If you want to extract all of your subschemas into definitions, then you're welcome to. We won't stop you. But that's your philosophy of how you want to organize your schemas. We have to remain open to different development patterns.

I personally wouldn't suggest extracting all of your subschemas into definitions. I'd follow the Ro3 mentioned above. In fact, I have an extension for my library which uses a "rule of two" to refactor common subschemas (Optimize() method) while generating schemas from C# types. If a subschema only appears once, the generator leaves it in place.
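
As an illustration of the rule-of-N idea (this is not the Optimize() method mentioned above, whose implementation is not shown in this thread), a script could count structurally identical property subschemas and flag only those that reach a threshold. The function names and sample schemas are hypothetical.

```python
import json
from collections import Counter

def count_property_schemas(schemas: list) -> Counter:
    """Count how often each distinct property subschema appears."""
    counts = Counter()
    for schema in schemas:
        for sub in schema.get("properties", {}).values():
            # Canonical JSON text makes structurally equal dicts compare equal.
            counts[json.dumps(sub, sort_keys=True)] += 1
    return counts

def reuse_candidates(schemas: list, threshold: int = 3) -> list:
    """Return subschemas seen at least `threshold` times."""
    counts = count_property_schemas(schemas)
    return [json.loads(text) for text, n in counts.items() if n >= threshold]

schemas = [
    {"properties": {"href": {"type": "string", "format": "url"}}},
    {"properties": {"href": {"format": "url", "type": "string"},
                    "title": {"type": "string"}}},
    {"properties": {"url": {"type": "string", "format": "url"}}},
]

# Only the url-format string schema appears three times.
print(reuse_candidates(schemas))
```

Lowering threshold to 2 gives the "rule of two" variant; subschemas that appear once are never flagged.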

@handrews
Contributor

handrews commented May 6, 2021

@cybtachyon yeah what @gregsdennis said.

I'll also add, regarding:

Right now, according to the Understanding JSON Schema site, only "complex schemas" use definitions for reuse, when in the real world I would suggest based on my own experience that most schemas are likely intended for re-use elsewhere in the system eventually.

Please keep in mind that we inherited Understanding JSON Schema, which was a completely independent project. The authors generously donated it to us when they were no longer in a position to keep it up to date, but we have not had time to do much more than some minor tweaks to add draft-07 notes. It was originally written for draft-04, which was very different (and also not written by the current spec team).

So the book is not the official view of the current JSON Schema spec team, and every once in a while we find something we really would have done differently. We are finally in a position where someone is dedicating substantial time to updating and reworking it, which is exciting! But that's a very recent development.

On the plus side, now is the perfect time to get this sort of feedback in!

@karenetheridge
Member

BTW, there are existing tests of $refs to locations not under $defs, here: https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/ref.json#L38

There are also tests of $refs to locations under keywords that do not contain subschemas (https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/optional/refOfUnknownKeyword.json#L4-L9). If the spec is explicit that this use case should be allowed, then we should move this test out of optional/.

@handrews
Contributor

handrews commented May 6, 2021

@karenetheridge testing $ref to another known schema location is fine, and is a good idea to catch the people who mistakenly think you can only $ref to $defs.

Regarding unknown keywords, it's complicated. Given the following schema (with totally unknown keywords, not a vocabulary):

(YAML because I'm lazy)

$id: https://example.com/foo
unknown1:
  $id: bar
  $anchor: bbb
  unknown2:
    type: string

You can $ref: "#/unknown1" because you can evaluate a JSON pointer just fine, whether the implementation previously noticed that as a subschema or not. So in that sense, it "works," but mostly by coincidence.

You cannot $ref bar or bar#bbb because the implementation didn't know to check under unknown1 for schema keywords including $id and $anchor.

You can try to $ref: "#/unknown1/unknown2" and most implementations will probably do it, but you crossed an embedded resource boundary with that JSON Pointer so now the results are officially not defined.

However, if your implementation scans and pre-calculates which possible JSON Pointers point to schemas and only allows those to be dereferenced, then none of these "unknown" cases would work.
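
A sketch of such a strict implementation, under stated assumptions: known_schema_pointers is a hypothetical helper, the keyword tuples are a deliberately short list, property names are not pointer-escaped, and the "properties" entry is added to the YAML example only so that the allow-list is not trivial.

```python
SUBSCHEMA_MAPS = ("$defs", "properties")
SUBSCHEMA_SINGLE = ("items", "not", "additionalProperties")

def known_schema_pointers(schema, pointer=""):
    """Collect JSON Pointers that reach schemas through known keywords only."""
    found = {pointer}
    if not isinstance(schema, dict):
        return found
    for key in SUBSCHEMA_MAPS:
        for name, sub in schema.get(key, {}).items():
            found |= known_schema_pointers(sub, f"{pointer}/{key}/{name}")
    for key in SUBSCHEMA_SINGLE:
        if key in schema:
            found |= known_schema_pointers(schema[key], f"{pointer}/{key}")
    return found

doc = {
    "$id": "https://example.com/foo",
    "properties": {"name": {"type": "string"}},  # added for illustration
    "unknown1": {
        "$id": "bar",
        "$anchor": "bbb",
        "unknown2": {"type": "string"},
    },
}

allowed = known_schema_pointers(doc)
print(sorted(allowed))                  # ['', '/properties/name']
print("/unknown1" in allowed)           # False
print("/unknown1/unknown2" in allowed)  # False
```

An implementation that dereferences only pointers in the allowed set would reject both "#/unknown1" and "#/unknown1/unknown2", even though plain JSON Pointer evaluation would find something at those locations.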

Optional seems like a good place for it. Probably a compromise solution to a past discussion, I'm guessing.

@karenetheridge
Member

karenetheridge commented May 8, 2021

I didn't mean to suggest that we should be allowing $refs to identifiers defined under unknown keywords, nor are we testing for that now, either in the main test section or in optional/ (in fact I added tests that check that we DON'T allow this).

We should definitely NOT be scanning for identifiers under unknown keywords, because we don't know where the subschemas are, or even if there are any. The only $refs that would work to that area of the document would be those using JSON Pointer fragments with identifiers that are already known outside that section.

@handrews
Contributor

handrews commented May 8, 2021

@karenetheridge oops, sorry to misinterpret! I think I don't really understand what that test is doing, then, but I am quite happy to take your word for it that it's doing the appropriate thing 🙂 My understanding of how the test suite works is a little vague.

@karenetheridge
Member

Both the tests I linked to above use fragment-only JSON Pointer URI references (that's a mouthful).

...and I take it back, there aren't tests for $refs to locations below unknown keywords, but there are similar tests that check that identifiers aren't extracted from non-keyword locations (https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/id.json#L208-L257
and https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/master/tests/draft2020-12/anchor.json#L81-L137), and I'll write some missing tests now based on these.

@handrews
Contributor

handrews commented May 8, 2021

@karenetheridge thank you!

@handrews
Contributor

handrews commented May 9, 2021

@cybtachyon is there anything more here that needs to be answered or clarified, or can this be closed?

@cybtachyon
Author

cybtachyon commented May 10, 2021

@handrews We need to create an issue for the docs, and I need to finish an issue for https://github.com/json-editor/json-editor, and then this can be closed.

@Relequestual
Member

@cybtachyon Feel free to create an issue for the attention of @jdesrosiers. He is doing major work on it, so I’m not sure if a PR from elsewhere would be counterproductive or not.

@handrews
Contributor

@cybtachyon I'm happy to keep this open until you file the Understanding JSON Schema issue as long as that is done soon, but it should not be held open for a PR there (use the issue there to track the PR) or anything else. We don't keep issues open here waiting for other people (JSON Editor) to update their projects. This repository only tracks work we can do on this repository.

@cybtachyon
Author

This can now be closed. Thanks all for clarifying and contributing!


7 participants