
Recursive schema composition #558

Closed

@handrews

TL;DR:

Re-using recursive schemas is a challenge.

  • $recurse is a specialized version of $ref with a context-dependent target
  • The target is the root schema of the document where schema processing began
  • Processing can be either static schema walking or dynamic evaluation with an instance
  • The value of $recurse is always true (discussed in the "alternatives" section)
  • This is based on a keyword we have long used in Doca

Example

APPARENTLY MANDATORY DISCLAIMER: This is a minimal contrived example, please do not point out all of the ways in which it is unrealistic or fails to be a convincing use case because you can refactor it. It's just showing the mechanism.

foo-schema:

{
    "$id": "http://example.com/foo-schema",
    "type": "object",
    "properties": {
        "foo": {"$recurse": true}
    }
}

bar-schema:

{
    "$id": "http://example.com/bar-schema",
    "allOf": [{"$ref": "http://example.com/foo-schema"}],
    "required": ["bar"],
    "properties": {"bar": {"type": "boolean"}}
}

The instance:

{
    "bar": true,
    "foo": {
        "bar": false,
        "foo": {
            "foo": {}
        }
    }
}

is valid against the first schema, but not the second.

It is valid against foo-schema because the "$recurse": true is in foo-schema, which is the same document that we started processing. Therefore it behaves exactly like "$ref": "#". The recursive "foo" works as you'd expect with "$ref": "#", and foo-schema doesn't care about "bar" being there (additional properties are not forbidden).

However, it is not valid against bar-schema because in that case, the "$recurse": true in foo-schema behaves like "$ref": "http://example.com/bar-schema", as bar-schema is the document that we started processing. Taking this step by step from the top down:

  • Processing the root of the instance, we have the "bar" property required by bar-schema; we got this directly from the root schema of bar-schema, without $recurse being involved
  • Still at the root, processing follows the allOf and $ref to foo-schema. The top-level instance is an object, so we pass the type constraint
  • Still processing foo-schema, for the contents of the "foo" property, we have "$recurse": true. Since we started processing with bar-schema, this is the equivalent of "$ref": "bar-schema"
  • So now we apply bar-schema to the contents of foo. This works fine: there is a boolean "bar", and we follow allOf and $ref back to foo-schema, and pass the "type": "object" constraint
  • Now, once again, we look at "$recurse": true to go into the next level "foo", and once again this is treated as "$ref": "bar-schema"
  • Now validation fails, because the innermost "foo" does not have the required "bar" property.
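The walkthrough can be reproduced with a minimal Python sketch. It is a toy validator that understands only the keywords in this example, with "type" at the top level of foo-schema as the walkthrough assumes, and an in-memory REGISTRY keyed by $id standing in for retrieval (REGISTRY, validate and validate_document are hypothetical names). Each call carries two URIs: the base URI of the document containing the current subschema, used to resolve $ref, and the URI of the entry document, which is always the $recurse target. Keywords next to $ref are simply AND-ed here, which is an assumption of the sketch rather than draft-07 behavior.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory registry standing in for retrieval by URI.
FOO_SCHEMA = {
    "$id": "http://example.com/foo-schema",
    "type": "object",
    "properties": {"foo": {"$recurse": True}},
}
BAR_SCHEMA = {
    "$id": "http://example.com/bar-schema",
    "allOf": [{"$ref": "http://example.com/foo-schema"}],
    "required": ["bar"],
    "properties": {"bar": {"type": "boolean"}},
}
REGISTRY = {s["$id"]: s for s in (FOO_SCHEMA, BAR_SCHEMA)}
PY_TYPES = {"object": dict, "boolean": bool}


def validate(instance, schema, base_uri, entry_uri):
    """Toy validator. base_uri: document containing `schema`;
    entry_uri: document where processing began."""
    if "$ref" in schema:
        # Static target: resolved against the containing document's base URI.
        target = urldefrag(urljoin(base_uri, schema["$ref"])).url
        if not validate(instance, REGISTRY[target], target, entry_uri):
            return False
    if schema.get("$recurse") is True:
        # Dynamic target: always the root of the entry document.
        if not validate(instance, REGISTRY[entry_uri], entry_uri, entry_uri):
            return False
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, PY_TYPES[expected]):
        return False
    for sub in schema.get("allOf", []):
        if not validate(instance, sub, base_uri, entry_uri):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not validate(instance[name], sub, base_uri, entry_uri):
                return False
    return True


def validate_document(instance, uri):
    """Start processing at the root of the document identified by uri."""
    return validate(instance, REGISTRY[uri], uri, uri)


INSTANCE = {"bar": True, "foo": {"bar": False, "foo": {"foo": {}}}}
print(validate_document(INSTANCE, "http://example.com/foo-schema"))  # True
print(validate_document(INSTANCE, "http://example.com/bar-schema"))  # False
```

Passing entry_uri down unchanged through every $ref is the whole mechanism: the same subschema in foo-schema resolves $recurse differently depending only on which document processing started from.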

Use cases

The primary use case for this is meta-schemas. For example, the hyper-schema meta-schema has to re-define all of the applicator keywords from the core and validation meta-schema. And if someone wanted to extend hyper-schema, not only would they have to re-declare all of the core applicators a third time, but they would also have to re-declare all of the LDO keywords that use "$ref": "#".

As we make more vocabularies and encourage more extensions, this rapidly becomes untenable.

I will show what the hyper-schema meta-schema would look like with $recurse in a subsequent comment.

There are some other use cases in hypermedia with common response formats, but they are all simpler than the meta-schema use case.

Alternatives

Doca's cfRecurse

This is a simplified version of an extension keyword, cfRecurse, used with Doca. That keyword takes a JSON Pointer (not a URI fragment) that is evaluated with respect to the post-$ref-resolution in-memory data structure. [EDIT: Don't try it right now, though; it's broken, for reasons that are a long story and totally irrelevant to the proposal.]

If that has you scratching your head, that's part of why I'm not proposing cfRecurse's exact behavior.

In fact, Doca only supports "" (the root JSON Pointer) as a cfRecurse value, and no one has ever asked for any other path. The use case really just comes up for us with pure recursion.

Specifying any other pointer requires knowing the structure of the in-memory document. And when the whole point is that you don't know what your original root schema (where processing began) will be until runtime, you cannot know that structure.

One could treat the JSON Pointer as an interface constraint ("this schema may only be used with an initial document that has a /definitions/foo schema"), but that is a lot of complexity for something that has never come up in practice.

For this reason, $recurse does not take a meaningful value. I chose true because false or null would be counter-intuitive (you'd expect those values to not do things), and a number, string, array, or object would be much more subject to error or misinterpretation.
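To see why only the root pointer is practical for a cfRecurse-style keyword, here is a minimal RFC 6901 JSON Pointer resolver in Python (resolve_pointer and the two sample roots are hypothetical). The empty pointer resolves against any root, while "/definitions/foo" only resolves if the root chosen at runtime happens to have that structure.

```python
def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against an in-memory JSON document."""
    if pointer == "":
        return document  # the root pointer: resolvable against any document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    node = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


root_a = {"definitions": {"foo": {"type": "string"}}}
root_b = {"properties": {"bar": {"type": "boolean"}}}

print(resolve_pointer(root_a, "") is root_a)        # True
print(resolve_pointer(root_a, "/definitions/foo"))  # {'type': 'string'}
try:
    resolve_pointer(root_b, "/definitions/foo")
except KeyError as err:
    print("not resolvable against root_b:", err)
```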

Parametrized schemas

#322 proposes a general schema parametrization feature, which could possibly be used to implement this feature. It would look something like:

Parameterized schema for oneOf:

{
    "$id": "http://example.com/oneof",
    "properties": {
        "oneOf": {
            "items": {"$ref": {"$param": "rootPointer"}}
        }
    }
}

Using the parametrized schema:

{
    "$id": "http://example.com/caller",
    "allOf": [
        {
            "$ref": "http://example.com/oneof",
            "$params": {
                "rootPointer": "http://example.com/caller"
            }
        }
    ],
    ...
}

See #322 for an explanation of how this works.
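For readers who have not read #322, one plausible reading of the example above is plain substitution: every {"$param": name} object in the referenced schema is replaced by the matching value from $params before evaluation. The Python sketch below implements only that assumed reading (apply_params is a hypothetical helper); #322 itself may define the mechanism differently.

```python
def apply_params(schema, params):
    """Return a copy of schema with each {"$param": name} object replaced by params[name]."""
    if isinstance(schema, dict):
        if set(schema) == {"$param"}:
            return params[schema["$param"]]
        return {key: apply_params(value, params) for key, value in schema.items()}
    if isinstance(schema, list):
        return [apply_params(item, params) for item in schema]
    return schema


ONEOF = {
    "$id": "http://example.com/oneof",
    "properties": {"oneOf": {"items": {"$ref": {"$param": "rootPointer"}}}},
}
bound = apply_params(ONEOF, {"rootPointer": "http://example.com/caller"})
print(bound["properties"]["oneOf"]["items"])  # {'$ref': 'http://example.com/caller'}
```

Even in this simplified form, $ref has to tolerate an object where it normally expects a URI string until substitution has happened.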

I'd rather not open the schema parametrization can of worms right now. $recurse is a much simpler, easier-to-implement proposal, and it meets the core need for meta-schema extensibility. It does not preclude implementing schema parametrization, either in a later draft or as an extension vocabulary of some sort (it makes an interesting test case for vocabulary support, actually).

Summary

  • extending recursive schemas is a fundamental use case of JSON Schema as seen in meta-schemas, which happens to require knowledge of where runtime processing started
  • referring to something inside a schema document determined at runtime adds a lot of complexity and has no apparent use case (neither from Doca nor from any issue I've ever seen here), so let's not do it

Runtime resolution (whether $recurse or parametrized schemas) is sufficiently new and powerful that I feel we should lock it down to the simplest case with a clear need. We can always extend it later, but it's hard to pull these things back.

added this to the draft-08 milestone on Mar 6, 2018

handrews commented on Mar 6, 2018

Note that $recurse would have the same behavior with respect to adjacent keywords as $ref, and the same conceptual model. So both delegation (#514) and allowing adjacent keywords by AND-ing results (#523) would apply to $recurse as well.
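As a rough illustration of the AND-ing model from #523 applied to $recurse, the Python sketch below evaluates the dynamic target and any adjacent keywords independently and requires all of them to pass. ENTRY_ROOT, evaluate, and the tiny keyword set (type, maxProperties) are assumptions made for the example, not part of any proposal text.

```python
# Hypothetical root schema of the document where processing began.
ENTRY_ROOT = {"type": "object"}

PY_TYPES = {"object": dict, "string": str}


def evaluate(instance, schema, entry_root=ENTRY_ROOT):
    """Evaluate each keyword separately and AND the results together."""
    results = []
    if schema.get("$recurse") is True:
        # The $recurse target is just one more assertion alongside its siblings.
        results.append(evaluate(instance, entry_root, entry_root))
    if "type" in schema:
        results.append(isinstance(instance, PY_TYPES[schema["type"]]))
    if "maxProperties" in schema and isinstance(instance, dict):
        results.append(len(instance) <= schema["maxProperties"])
    return all(results)


schema = {"$recurse": True, "maxProperties": 1}
print(evaluate({"a": 1}, schema))          # True
print(evaluate({"a": 1, "b": 2}, schema))  # False: maxProperties fails
print(evaluate("text", schema))            # False: the recursive target fails
```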

handrews commented on Mar 6, 2018

If we were to replace all occurrences of {"$ref": "#"} in the core and validation meta-schema with {"$recurse": true}, then we could re-write the hyper-schema meta-schema as follows:

{
    "$schema": "http://json-schema.org/draft-07/hyper-schema#",
    "$id": "http://json-schema.org/draft-07/hyper-schema#",
    "title": "JSON Hyper-Schema",
    "allOf": [ { "$ref": "http://json-schema.org/draft-07/schema#" } ],
    "properties": {
        "base": {
            "type": "string",
            "format": "uri-template"
        },
        "links": {
            "type": "array",
            "items": {
                "$ref": "http://json-schema.org/draft-07/links#"
            }
        }
    },
    "links": [
        {
            "rel": "self",
            "href": "{+%24id}"
        }
    ]
}

This is 24 lines. The current file is 69 lines, and every time we add, remove, or change an applicator those other lines need to be updated.


If we also replaced {"$ref": "http://json-schema.org/draft-07/hyper-schema#"} with {"$recurse": true} in the Link Description Object schema, then adding a schema keyword "abc" and an LDO keyword "xyz" would look like this:

{
    "$id": "http://example.com/abcxyz",
    "$schema": "http://json-schema.org/draft-07/hyper-schema#",
    "allOf": [ {"$ref": "http://json-schema.org/draft-07/hyper-schema#" } ],
    "properties": {
        "abc": {...},
        "links": {
            "items": {
                "properties": {
                    "xyz": {...}
                }
            }
        }
    }
}

Without $recurse, an extension schema like this would need to both re-re-declare all of the core applicators, and re-declare all four schema fields in the link object.

And of course, if "abc" and "xyz" are schema fields, then without $recurse, any extension of the extension would need to do all of that re-declaration, plus re-declaring "abc" and "xyz".

etc. etc. etc.

Relequestual commented on Mar 7, 2018

I think I understand this, but let me check.
The proposed $recurse keyword behaves as if it were a $ref whose target is the root document that included the schema by way of other $refs, right?

So the two schemas are the equivalent of:

{
  "$id": "http://example.com/bar-schema",
  "allOf": [{
    "type": "object",
    "properties": {
        "foo": {"$ref": "#"}
    }
  }],
  "required": ["bar"],
  "properties": {"bar": {"type": "boolean"}}
}

However, because bar-schema has an $id in your example, inlining it wouldn't work, as the base URI for $refs within that schema is reset to that schema, right?

handrews commented on Mar 7, 2018

@Relequestual yes that is the equivalent of bar-schema (with foo-schema more or less inlined).

However, the presence or absence of $id doesn't matter. bar-schema and foo-schema have separate base URIs even without $id in either of them. Per RFC 3986, the base URI is the URI from which the document was retrieved (which would be a file:// URI if read from the local filesystem) or if none can be determined, an application-dependent fabricated URI:

If none of the conditions described above apply, then the base URI is
defined by the context of the application. As this definition is
necessarily application-dependent, failing to define a base URI by
using one of the other methods may result in the same content being
interpreted differently by different types of applications.

So even if you are just creating and working with these in-memory, they implicitly have different base URIs so "$ref": "#" in one can only ever refer to its own root. You cannot simulate the cross-file behavior of "$recurse" with "$ref".
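The point about base URIs can be checked with Python's standard urllib.parse, which implements RFC 3986 reference resolution. In the sketch below (resolve is a hypothetical helper name), "#" always resolves to the document that contains it, so a static "$ref": "#" in foo-schema can never land on bar-schema.

```python
from urllib.parse import urljoin, urldefrag


def resolve(base_uri, ref):
    """Resolve a $ref value against the base URI of the document containing it."""
    return urldefrag(urljoin(base_uri, ref)).url


FOO_BASE = "http://example.com/foo-schema"
BAR_BASE = "http://example.com/bar-schema"

# "#" always points back at the document the reference appears in.
print(resolve(FOO_BASE, "#"))  # http://example.com/foo-schema
print(resolve(BAR_BASE, "#"))  # http://example.com/bar-schema

# Relative references are also resolved against the containing document.
print(resolve(BAR_BASE, "foo-schema"))  # http://example.com/foo-schema
```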

Relequestual commented on Mar 7, 2018

Right, but my point was that if I took the two schemas and merged them into one with equivalent behaviour, that is what it would look like: as if I'd dereferenced the ref but not included the $id.

I'm not suggesting an alternative; rather, that's what this feature is aiming to achieve.

handrews commented on Mar 7, 2018

but not included the $id.

Right, I was responding to your

However because bar-schema has an $id in your example, inlining it wouldn't work

which to me implies that it would have worked without an $id, and it doesn't (to inline a schema without an explicit $id, you MUST assign it an $id in the inlined version). I'm being pedantic about this because so many people are confused about $id and how it fits with $ref.

handrews commented on Mar 8, 2018

Note that I've updated the "Alternatives" section in the initial comment with a discussion of #322 (parametrized/higher-order/templatized schemas) as a possible alternative solution.

awwright commented on Apr 10, 2018

It looks like the problem statement is: "We want to be able to extend some schema, and have recursive references refer back to the extended version."

So what if I'm writing a JSON document, and I want a JSON Schema to be one of the values?

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "label": { "type": "string" },
    "range": { "$ref": "http://json-schema.org/draft-07/hyper-schema#" }
  }
}

awwright commented on Apr 10, 2018

Perhaps there can be an argument that specifies substitutions to make when evaluating sub-schemas: "in sub-schemas, when it refers to <http://json-schema.org/draft-07/schema#>, actually use <http://json-schema.org/draft-07/hyper-schema#>"
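A minimal Python sketch of that substitution idea, assuming the referring schema supplies a simple mapping from one absolute URI to another that is consulted every time a reference is resolved. The SUBSTITUTIONS table and the resolve_with_substitutions helper are hypothetical; nothing in this thread specifies a syntax for them.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical substitution table: "when a sub-schema refers to the core
# meta-schema, actually use the hyper-schema meta-schema instead".
SUBSTITUTIONS = {
    "http://json-schema.org/draft-07/schema": "http://json-schema.org/draft-07/hyper-schema",
}


def resolve_with_substitutions(base_uri, ref, substitutions):
    """Resolve ref normally, then apply any substitution for the resulting URI."""
    target = urldefrag(urljoin(base_uri, ref)).url
    return substitutions.get(target, target)


core = "http://json-schema.org/draft-07/schema#"
# A recursive "$ref": "#" inside the core meta-schema is redirected.
print(resolve_with_substitutions(core, "#", SUBSTITUTIONS))
# -> http://json-schema.org/draft-07/hyper-schema
# Unrelated references are left alone.
print(resolve_with_substitutions(core, "http://example.com/other", SUBSTITUTIONS))
# -> http://example.com/other
```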

12 remaining items

handrews commented on Jun 15, 2018

Thanks, @awwright! I've been thinking about this more since our discussion.

I feel that if we really want to go for the aliasing approach, we need to consider a generic syntax such as that proposed by #322 (parametrized/templatized/higher-order schemas). We might want to restrict where it can be used at least at first, but what I like about that proposal is that the thing being replaced needs to opt-in with $param.

What concerns me about the #322 proposal is that it raises a lot of questions about keywords needing to allow the $param object as a value, and how that works with values that are already objects allowing any keys, and how it changes the concept of a schema's identity. You also end up needing to pipe parameters through intermediate schemas, so your $params values need to support $param themselves. That's not a horrible thing, but it does demonstrate the complexity.


I'd like to try again to stay focused on the recursion case, as I still feel that it is better motivated. And I think that "this schema allows recursive extension" is less of a problem for managing schema identity than full-on parameterization, although I don't have a clear argument for that, so I might be wrong.

I think a double-opt-in approach is key: It needs to be clear which references are dynamic, and the dynamic target needs to be explicit rather than implicit. The latter point is where $recurse got into trouble. Since the dynamic target was implicitly the entry point schema, no matter how the schemas were structured, it would not work when embedded in another schema (such as links.json, the LDO meta-schema, when used on its own).

We can solve that with a keyword I'm calling $recursiveRoot, paired with $recursiveRef which is somewhat like $recurse except that it still takes a URI reference which it uses when no recursive root has been set.

Like $schema, $recursiveRoot is only respected in root schemas. Furthermore, when walking references to new documents, only the first $recursiveRoot encountered takes effect, pinning the target of all $recursiveRefs to point back to it, rather than to the reference provided as a value.

This provides the double opt-in:

  • A recursive schema that may be extended must use $recursiveRef appropriately, which also indicates to readers of that schema that the URI reference value may or may not be the actual reference target at runtime.
  • A schema referring to a recursive schema must use $recursiveRoot to get the recursive extension behavior. Otherwise, the embedding behavior (which is all we've had up until now so is the default) is what you get.

Rather than paste a bunch of examples in here, I have created PR #605 to show how schema.json, hyper-schema.json, links.json, and a hypothetical hyper-operations.json that further extends hyper-schema.json, work with these keywords. This covers several extension and embedding cases (I think all of them, but I might be missing something somewhere).
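The pinning rule described in this comment can be sketched in a few lines of Python. This is a minimal reading of the prose above and does not reproduce PR #605: enter_document is assumed to be called for the entry document and for every document root reached through a reference, and the CORE, HYPER and EMBED documents and their URIs are hypothetical stand-ins.

```python
from urllib.parse import urljoin, urldefrag


def enter_document(pinned_root, doc_uri, doc_root):
    """Update the pinned recursive root when processing reaches a document root.

    Callers pass only document root schemas, since $recursiveRoot is only
    respected there; once a root is pinned, later ones are ignored.
    """
    if pinned_root is None and doc_root.get("$recursiveRoot") is True:
        return doc_uri
    return pinned_root


def resolve_recursive_ref(pinned_root, base_uri, ref):
    """A $recursiveRef targets the pinned root if there is one, else its own URI reference."""
    if pinned_root is not None:
        return pinned_root
    return urldefrag(urljoin(base_uri, ref)).url


# Hypothetical documents: a recursive core meta-schema, an extension that
# opts in to recursive extension, and a schema that merely embeds the core.
CORE_URI, CORE = "http://example.com/core", {"$recursiveRoot": True}
HYPER_URI, HYPER = "http://example.com/hyper", {"$recursiveRoot": True}
EMBED_URI, EMBED = "http://example.com/embed", {}


def target_after(path):
    """Walk (uri, root) pairs in order, then resolve "$recursiveRef": "#" in the last one."""
    pinned = None
    for uri, root in path:
        pinned = enter_document(pinned, uri, root)
    return resolve_recursive_ref(pinned, path[-1][0], "#")


print(target_after([(CORE_URI, CORE)]))                      # http://example.com/core
print(target_after([(HYPER_URI, HYPER), (CORE_URI, CORE)]))  # http://example.com/hyper
print(target_after([(EMBED_URI, EMBED), (CORE_URI, CORE)]))  # http://example.com/core
```

The second call is the extension case (the entry document opted in, so the core's recursive references point back to it); the third is the embedding case, where the referring document did not opt in.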

handrews commented on Jun 16, 2018

I think a better term to use here than "extending" is "refining". JSON Schema is a constraint system. The empty (meta-)schema allows everything. Using type, properties, etc. refines that unbounded set of possible instance documents (which are schemas in the case of meta-schemas) to a smaller set.

Hyper-Schema's meta-schema doesn't really add links and base as keywords. The core/validation meta-schema allows them with any value, and any semantics (this is how extension keywords work in general: the assumption is that some specific implementation can handle them).

The hyper-schema meta-schema refines the core/validation meta-schema by constraining those two keywords into a syntax that supports well-defined semantics.
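A small Python sketch of this "refining" view, using a toy validator that understands only type and properties (is_valid, EMPTY_META, REFINED_META and the candidate documents are all hypothetical, and REFINED_META is a heavily simplified stand-in for what hyper-schema actually requires of base and links). The empty meta-schema accepts every candidate, including ones with meaningless links and base values; the refining meta-schema accepts a subset of them without adding anything.

```python
PY_TYPES = {"object": dict, "array": list, "string": str}


def is_valid(instance, schema):
    """Toy validator supporting only "type" and "properties"."""
    expected = schema.get("type")
    if expected is not None and not isinstance(instance, PY_TYPES[expected]):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], sub):
                return False
    return True


# The empty meta-schema: every candidate (meta-)schema is allowed.
EMPTY_META = {}
# A simplified stand-in for hyper-schema's refinement of "base" and "links".
REFINED_META = {"properties": {"base": {"type": "string"}, "links": {"type": "array"}}}

candidates = [{"links": 5}, {"links": []}, {"base": "{id}", "links": []}, {"base": 3}]
allowed_by_empty = [c for c in candidates if is_valid(c, EMPTY_META)]
allowed_by_refined = [c for c in candidates if is_valid(c, REFINED_META)]

print(len(allowed_by_empty))  # 4
print(allowed_by_refined)     # [{'links': []}, {'base': '{id}', 'links': []}]
```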

This is a bit mind-bending at first, but I've found that when I can get it across, usually by starting from explaining the empty schema, a lot of things seem to click for people. We could explain this in detail with examples on the web site (not just for the $recursiveRef case, but in general), and hopefully start changing people's assumptions about JSON Schema from being OO-ish to being what it really is.

ghost commented on Oct 29, 2018

Hyper-Schema's meta-schema doesn't really add links and base as keywords. The core/validation meta-schema allows them with any value, and any semantics (this is how extension keywords work in general: the assumption is that some specific implementation can handle them).

Very interesting. Given this, if I have some custom schema keywords should I bother creating my own meta-schema with all the complexity it entails? Or just informally document that we've added 2 keywords and explain what they mean?

Given that this issue is still open, it appears that the easiest way to create a custom meta-schema is to copy/paste/tweak the hyper-schema, using it as an example of how to provide semantics for new keywords.

handrews commented on Nov 1, 2018

@mgwelch at the moment, either option is fine. If you use a validator that pays attention to $schema and would be confused by a custom meta-schema, then it makes more sense to just informally document it. However, a custom meta-schema can be useful for ensuring that your new keywords are used correctly.

And yes, copy-pasting the hyper-schema meta-schema is the best option for now. This will improve in draft-08 (I just need to update those PRs with review feedback and do the other part with the $vocabularies keyword).

ghost commented on Nov 1, 2018

Thanks @handrews, I really appreciate the feedback. I think I will take a crack at the custom meta-schema but would appreciate a recommendation for a tool you use for validating schemas against custom meta-schemas. I find it hard to google for answers to anything involving meta-schemas.

handrews commented on Nov 1, 2018

@mgwelch most validators will work if you just pass the schema as the instance and the meta-schema as the schema. A few will handle meta-schemas specially (Ajv in JavaScript, for example).

handrews commented on Nov 13, 2018

PRs merged!

ghost commented on Nov 13, 2018

PRs merged!

Awesome! And this is a real, live example? https://github.com/json-schema-org/json-schema-spec/blob/master/hyper-schema.json

That's nice.

handrews commented on Nov 14, 2018

@mgwelch yup! And yes, it is! As we write fine-grained meta-schemas for vocabularies (assuming that goes the way I expect; you can see a bit of it in #671), you'll see us depend on the feature even more.

I don't know how much it will get used outside of that context, but it will be available in conforming validators, so I guess we'll find out :-)
