Description
TL;DR:
- Re-using recursive schemas is a challenge.
- $recurse is a specialized version of $ref with a context-dependent target
- The target is the root schema of the document where schema processing began
- Processing can be either static schema walking or dynamic evaluation with an instance
- The value of $recurse is always true (discussed in the "alternatives" section)
- This is based on a keyword we have long used in Doca
Example
APPARENTLY MANDATORY DISCLAIMER: This is a minimal, contrived example. Please do not point out all of the ways in which it is unrealistic or fails to be a convincing use case because you can refactor it. It's just showing the mechanism.
foo-schema:
{
    "$id": "http://example.com/foo-schema",
    "type": "object",
    "properties": {
        "foo": {"$recurse": true}
    }
}
bar-schema:
{
    "$id": "http://example.com/bar-schema",
    "allOf": [{"$ref": "http://example.com/foo-schema"}],
    "required": ["bar"],
    "properties": {"bar": {"type": "boolean"}}
}
The instance:
{
    "bar": true,
    "foo": {
        "bar": false,
        "foo": {
            "foo": {}
        }
    }
}
is valid against the first schema, but not the second.
It is valid against foo-schema because the "$recurse": true is in foo-schema, which is the same document that we started processing. Therefore it behaves exactly like "$ref": "#". The recursive "foo" works as you'd expect with "$ref": "#", and foo-schema doesn't care about "bar" being there (additional properties are not forbidden).
However, it is not valid against bar-schema because in that case, the "$recurse": true in foo-schema behaves like "$ref": "http://example.com/bar-schema", as bar-schema is the document that we started processing. Taking this step by step from the top down:
- Processing the root of the instance, we have the "bar" property required by bar-schema; we got this directly from the root schema of bar-schema, without $recurse being involved
- Still at the root, processing follows the allOf and $ref to foo-schema. The top-level instance is an object, so we pass the type constraint
- Still processing foo-schema, for the contents of the "foo" property, we have "$recurse": true. Since we started processing with bar-schema, this is the equivalent of "$ref": "bar-schema"
- So now we apply bar-schema to the contents of "foo". This works fine: there is a boolean "bar", and we follow allOf and $ref back to foo-schema, and pass the "type": "object" constraint
- Now, once again, we look at "$recurse": true to go into the next level "foo", and once again this is treated as "$ref": "bar-schema"
- Now validation fails, because this second-level "foo" object ({"foo": {}}) does not have the required "bar" property.
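The walkthrough can be checked mechanically. The following is a minimal Python sketch, not Doca's implementation or any validator's API. It makes several assumptions: a hypothetical in-memory REGISTRY, support for only type, properties, required, allOf, absolute-URI $ref, and $recurse, and the example schemas written with "type" at foo-schema's top level and bar-schema's allOf pointing at http://example.com/foo-schema, as the walkthrough assumes. The only extra state is root_uri, the document where processing began. $recurse resolves to that document, while $ref simply passes root_uri along.

```python
# Minimal sketch of "$recurse" as described above. Assumptions (hypothetical,
# not Doca's or any validator's API): an in-memory REGISTRY, absolute-URI
# "$ref" values only, and just type/properties/required/allOf support.
from typing import Any, Dict

FOO = "http://example.com/foo-schema"
BAR = "http://example.com/bar-schema"

REGISTRY: Dict[str, Dict[str, Any]] = {
    FOO: {
        "$id": FOO,
        "type": "object",
        "properties": {"foo": {"$recurse": True}},
    },
    BAR: {
        "$id": BAR,
        "allOf": [{"$ref": FOO}],
        "required": ["bar"],
        "properties": {"bar": {"type": "boolean"}},
    },
}

TYPE_CHECKS = {
    "object": lambda value: isinstance(value, dict),
    "boolean": lambda value: isinstance(value, bool),
}


def is_valid(schema: Dict[str, Any], instance: Any, root_uri: str) -> bool:
    """Evaluate `schema`; `root_uri` is the document where processing began."""
    if "$recurse" in schema:
        # Context-dependent target: the root of the entry-point document.
        if not is_valid(REGISTRY[root_uri], instance, root_uri):
            return False
    if "$ref" in schema:
        # Static target; root_uri is passed through unchanged.
        if not is_valid(REGISTRY[schema["$ref"]], instance, root_uri):
            return False
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    for subschema in schema.get("allOf", []):
        if not is_valid(subschema, instance, root_uri):
            return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(subschema, instance[name], root_uri):
                return False
    return True


def validate(entry_uri: str, instance: Any) -> bool:
    # Processing begins at entry_uri, so it becomes the $recurse target.
    return is_valid(REGISTRY[entry_uri], instance, root_uri=entry_uri)


INSTANCE = {"bar": True, "foo": {"bar": False, "foo": {"foo": {}}}}
print(validate(FOO, INSTANCE))  # True
print(validate(BAR, INSTANCE))  # False
```

Starting from foo-schema, every $recurse lands back on foo-schema, which is the "$ref": "#" behavior. Starting from bar-schema, the same $recurse lands on bar-schema, so the "bar" requirement is enforced at every level.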
Use cases
The primary use case for this is meta-schemas. For example, the hyper-schema meta-schema has to re-define all of the applicator keywords from the core and validation meta-schema. And if something wanted to extend hyper-schema, not only would it have to re-declare all of the core applicators a third time, but it would also have to re-declare all of the LDO keywords that use "$ref": "#".
As we make more vocabularies and encourage more extensions, this rapidly becomes untenable.
I will show what the hyper-schema meta-schema would look like with $recurse in a subsequent comment.
There are some other use cases in hypermedia with common response formats, but they are all simpler than the meta-schema use case.
Alternatives
Doca's cfRecurse
This is a simplified version of an extension keyword, cfRecurse, used with Doca. That keyword takes a JSON Pointer (not a URI fragment) that is evaluated with respect to the post-$ref-resolution in-memory data structure. [EDIT: Although don't try it right now, it's broken, long story that is totally irrelevant to the proposal.]
If that has you scratching your head, that's part of why I'm not proposing cfRecurse's exact behavior.
In fact, Doca only supports "" (the root JSON Pointer) as a cfRecurse value, and no one has ever asked for any other path. The use case really just comes up for us with pure recursion.
Specifying any other pointer requires knowing the structure of the in-memory document. And when the whole point is that you don't know what your original root schema (where processing began) will be until runtime, you cannot know that structure.
One could treat the JSON Pointer as an interface constraint: "this schema may only be used with an initial document that has a /definitions/foo schema", but that is a lot of complexity for something that has never come up in practice.
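The reason only "" is safe can be seen with a plain RFC 6901 JSON Pointer evaluator. In this Python sketch, the two root documents are hypothetical stand-ins for "whatever schema processing happened to start from". The pointer "" resolves against any root. The pointer /definitions/foo resolves only if the runtime root happens to have that structure.

```python
# Sketch of RFC 6901 JSON Pointer evaluation, used only to illustrate why a
# cfRecurse-style pointer other than "" depends on the (unknown) root.
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Return the value at `pointer`; raises KeyError/IndexError/ValueError if absent."""
    if pointer == "":
        return document  # the empty pointer always means "the whole document"
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: first "~1" -> "/", then "~0" -> "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Two hypothetical entry-point documents; which one is the root is only
# known at runtime.
root_a = {"definitions": {"foo": {"type": "string"}}}
root_b = {"properties": {"foo": {"type": "string"}}}

for root in (root_a, root_b):
    assert resolve_pointer(root, "") is root  # "" works for any root
    try:
        print(resolve_pointer(root, "/definitions/foo"))
    except KeyError:
        print("no /definitions/foo in this root")
# {'type': 'string'}
# no /definitions/foo in this root
```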
For this reason, $recurse does not take a meaningful value. I chose true because false or null would be counter-intuitive (you'd expect those values to not do things), and a number, string, array, or object would be much more subject to error or misinterpretation.
Parametrized schemas
#322 proposes a general schema parametrization feature, which could possibly be used to implement this feature. It would look something like:
Parameterized schema for oneOf:
{
    "$id": "http://example.com/oneof",
    "properties": {
        "oneOf": {
            "items": {"$ref": {"$param": "rootPointer"}}
        }
    }
}
Using the parametrized schema:
{
    "$id": "http://example.com/caller",
    "allOf": [
        {
            "$ref": "http://example.com/oneof",
            "$params": {
                "rootPointer": "http://example.com/caller"
            }
        }
    ],
    ...
}
See #322 for an explanation of how this works.
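To make the comparison concrete, here is a hypothetical Python sketch of the binding step such a parametrized schema might need. It assumes, for illustration only, that binding means replacing each {"$param": name} object with the caller's value before evaluation. This is a guess at the general idea, not the algorithm #322 specifies.

```python
# Hypothetical sketch of "$param" substitution; the semantics here are an
# assumption for illustration, not the behavior specified in #322.
import copy
from typing import Any, Dict


def substitute_params(schema: Any, params: Dict[str, Any]) -> Any:
    """Return a copy of `schema` with each {"$param": name} replaced by params[name]."""
    if isinstance(schema, dict):
        if set(schema) == {"$param"}:
            return copy.deepcopy(params[schema["$param"]])
        return {key: substitute_params(value, params) for key, value in schema.items()}
    if isinstance(schema, list):
        return [substitute_params(item, params) for item in schema]
    return schema


oneof_schema = {
    "$id": "http://example.com/oneof",
    "properties": {
        "oneOf": {"items": {"$ref": {"$param": "rootPointer"}}},
    },
}

bound = substitute_params(oneof_schema, {"rootPointer": "http://example.com/caller"})
print(bound["properties"]["oneOf"]["items"])
# {'$ref': 'http://example.com/caller'}
```

Even this toy hits one of the open questions. The placeholder test set(schema) == {"$param"} cannot tell a real placeholder from a properties object that happens to declare a single property named "$param".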
I'd rather not open the schema parametrization can of worms right now. $recurse is a much simpler, easier-to-implement proposal and meets the core need for meta-schema extensibility. It does not preclude implementing schema parametrization, either in a later draft or as an extension vocabulary of some sort (it makes an interesting test case for vocabulary support, actually).
Summary
- extending recursive schemas is a fundamental use case of JSON Schema as seen in meta-schemas, which happens to require knowledge of where runtime processing started
- referring to something inside a schema document determined at runtime adds a lot of complexity and has no apparent use case (neither from Doca nor from any issue I've ever seen here), so let's not do it
Runtime resolution (whether $recurse or parametrized schemas) is sufficiently new and powerful that I feel we should lock it down to the simplest case with a clear need. We can always extend it later, but it's hard to pull these things back.
handrews commented on Mar 6, 2018
Note that $recurse would have the same behavior with respect to adjacent keywords as $ref, and the same conceptual model. So delegation (#514) and allowing adjacent keywords by AND-ing results (#523) would apply to $recurse as well.
handrews commented on Mar 6, 2018
If we were to replace all occurrences of {"$ref": "#"} in the core and validation meta-schema with {"$recurse": true}, then we could re-write the hyper-schema meta-schema as follows:
This is 24 lines. The current file is 69 lines, and every time we add, remove, or change an applicator, those other lines need to be updated.
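The replacement of {"$ref": "#"} with {"$recurse": true} is mechanical. The Python sketch below runs it over a small hand-written fragment in the style of the core meta-schema. The fragment is not the actual draft-07 file. The sketch rewrites only subschemas that are exactly {"$ref": "#"} and leaves other references such as "#/definitions/schemaArray" alone.

```python
# Sketch: rewrite every exact {"$ref": "#"} into {"$recurse": True}.
# The fragment below is a hypothetical excerpt, not the real meta-schema.
from typing import Any


def use_recurse(schema: Any) -> Any:
    """Return a copy of `schema` with every exact {"$ref": "#"} replaced."""
    if schema == {"$ref": "#"}:
        return {"$recurse": True}
    if isinstance(schema, dict):
        return {key: use_recurse(value) for key, value in schema.items()}
    if isinstance(schema, list):
        return [use_recurse(item) for item in schema]
    return schema


fragment = {
    "properties": {
        "additionalProperties": {"$ref": "#"},
        "not": {"$ref": "#"},
        "allOf": {"$ref": "#/definitions/schemaArray"},
        "items": {"anyOf": [{"$ref": "#"}, {"$ref": "#/definitions/schemaArray"}]},
    }
}

rewritten = use_recurse(fragment)
print(rewritten["properties"]["not"])    # {'$recurse': True}
print(rewritten["properties"]["allOf"])  # {'$ref': '#/definitions/schemaArray'}
```

This naive walk cannot tell schema positions from literal data. A {"$ref": "#"} appearing inside a const, default, or enum value would also be rewritten, so a real rewrite would need to know which keywords hold subschemas.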
If we also replaced {"$ref": "http://json-schema.org/draft-07/hyper-schema#"} with {"$recurse": true} in the Link Description Object schema, then adding a schema keyword "abc" and an LDO keyword "xyz" would look like this:
Without $recurse, an extension schema like this would need to both re-re-declare all of the core applicators and re-declare all four schema fields in the link object.
And of course, if "abc" and "xyz" are schema fields, then without $recurse, any extension of the extension would need to do all of that re-declaration, plus re-declaring "abc" and "xyz".
etc. etc. etc.
Relequestual commented on Mar 7, 2018
I think I understand this, but let me check.
The proposed $recurse keyword behaves as if it were a $ref where the ref is the root document that has included the schema by use of other $refs, right? So the two schemas are the equivalent of:
However, because bar-schema has an $id in your example, inlining it wouldn't work, as the base URI for $refs within that schema is re-set to that schema, right?
handrews commented on Mar 7, 2018
@Relequestual yes that is the equivalent of bar-schema (with foo-schema more or less inlined).
However, the presence or absence of $id doesn't matter. bar-schema and foo-schema have separate base URIs even without $id in either of them. Per RFC 3986, the base URI is the URI from which the document was retrieved (which would be a file:// URI if read from the local filesystem) or, if none can be determined, an application-dependent fabricated URI:
So even if you are just creating and working with these in-memory, they implicitly have different base URIs, so "$ref": "#" in one can only ever refer to its own root. You cannot simulate the cross-file behavior of "$recurse" with "$ref".
Relequestual commented on Mar 7, 2018
Right, but my point was: if I took the two schemas and made one with equivalent behaviour, then that would be it. Which looks like I've dereferenced the ref, but not included the $id.
It's not suggesting an alternative, but rather showing what this feature is aiming to achieve.
handrews commented on Mar 7, 2018
Right, I was responding to your
which to me implies that it would have worked without an $id, and it doesn't (to inline a schema without an explicit $id, you MUST assign it an $id in the inlined version). I'm being pedantic about this because so many people are confused about $id and how it fits with $ref.
handrews commented on Mar 8, 2018
Note that I've updated the "Alternatives" section in the initial comment with a discussion of #322 (parametrized/higher-order/templatized schemas) as a possible alternative solution.
awwright commented on Apr 10, 2018
It looks like the problem statement is: "We want to be able to extend some schema, and have recursive references refer back to the extended version."
So what if I'm writing a JSON document, and I want a JSON Schema to be one of the values?
awwright commented on Apr 10, 2018
Perhaps there can be an argument that specifies substitutions to make when evaluating sub-schemas: "in sub-schemas, when it refers to <http://json-schema.org/draft-07/schema#>, actually use <http://json-schema.org/draft-07/hyper-schema#>"
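One way to read this suggestion is sketched below in Python, under loud assumptions. The alias map stands in for a hypothetical argument (no such keyword exists). Only literal, already-absolute "$ref" strings are compared. The two meta-schemas are heavily trimmed, hand-written stand-ins. The substitution is applied to the sub-schema being pulled in (the core meta-schema), not to hyper-schema's own reference to it. Aliasing that reference too would make hyper-schema refer to itself instead of to its base.

```python
# Hypothetical alias-map sketch; not a specified keyword or algorithm.
from typing import Any, Dict

CORE = "http://json-schema.org/draft-07/schema#"
HYPER = "http://json-schema.org/draft-07/hyper-schema#"

# The substitution "argument": in sub-schemas, CORE actually means HYPER.
ALIASES: Dict[str, str] = {CORE: HYPER}


def apply_aliases(schema: Any, aliases: Dict[str, str]) -> Any:
    """Return a copy of `schema` whose "$ref" strings are redirected via `aliases`.

    Only literal "$ref" values are compared; a fuller version would first
    resolve relative references (such as "#") against the base URI.
    """
    if isinstance(schema, dict):
        out = {key: apply_aliases(value, aliases) for key, value in schema.items()}
        ref = out.get("$ref")
        if isinstance(ref, str) and ref in aliases:
            out["$ref"] = aliases[ref]
        return out
    if isinstance(schema, list):
        return [apply_aliases(item, aliases) for item in schema]
    return schema


# Trimmed stand-ins for the two meta-schemas.
core_meta = {"$id": CORE, "properties": {"not": {"$ref": CORE}}}
hyper_meta = {"$id": HYPER, "allOf": [{"$ref": CORE}]}

# hyper_meta's own "$ref" to CORE is left alone; only the sub-schema it
# pulls in is aliased, so "not" now recurses into the hyper-schema.
core_as_seen_from_hyper = apply_aliases(core_meta, ALIASES)
print(core_as_seen_from_hyper["properties"]["not"])
# {'$ref': 'http://json-schema.org/draft-07/hyper-schema#'}
```

The aliased copy still carries "$id": CORE while behaving differently from the real core meta-schema. That is the kind of schema-identity question any aliasing approach has to answer.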
handrews commented on Jun 15, 2018
Thanks, @awwright! I've been thinking about this more since our discussion.
I feel that if we really want to go for the aliasing approach, we need to consider a generic syntax such as that proposed by #322 (parametrized/templatized/higher-order schemas). We might want to restrict where it can be used, at least at first, but what I like about that proposal is that the thing being replaced needs to opt in with $param.
What concerns me about the #322 proposal is that it raises a lot of questions about keywords needing to allow the $param object as a value, how that works with values that are already objects allowing any keys, and how it changes the concept of a schema's identity. You also end up needing to pipe parameters through intermediate schemas, so your $params values need to support $param themselves. Which is not a horrible thing, but does demonstrate the complexity.
I'd like to try again to stay focused on the recursion case, as I still feel that it is better motivated. And I think that "this schema allows recursive extension" is less of a problem for managing schema identity than full-on parameterization. Although I don't have a clear argument for that, so I might be wrong.
I think a double-opt-in approach is key: it needs to be clear which references are dynamic, and the dynamic target needs to be explicit rather than implicit. The latter point is where $recurse got into trouble. Since the dynamic target was implicitly the entry-point schema, no matter how the schemas were structured, it would not work when embedded in another schema (such as links.json, the LDO meta-schema, when used on its own).
We can solve that with a keyword I'm calling $recursiveRoot, paired with $recursiveRef, which is somewhat like $recurse except that it still takes a URI reference, which it uses when no recursive root has been set.
Like $schema, $recursiveRoot is only respected in root schemas. Furthermore, when walking references to new documents, only the first $recursiveRoot encountered takes effect, pinning the target of all $recursiveRefs to point back to it, rather than to the reference provided as a value.
This provides the double opt-in:
- The schema being extended must use $recursiveRef appropriately, which also indicates to readers of that schema that the URI reference value may or may not be the actual reference target at runtime.
- The extending schema must set $recursiveRoot to get the recursive extension behavior. Otherwise, the embedding behavior (which is all we've had up until now, so it is the default) is what you get.
Rather than paste a bunch of examples in here, I have created PR #605 to show how schema.json, hyper-schema.json, links.json, and a hypothetical hyper-operations.json that further extends hyper-schema.json work with these keywords. This covers several extension and embedding cases (I think all of them, but I might be missing something somewhere).
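The resolution rules for $recursiveRoot and $recursiveRef can be sketched in Python. This sketch follows the wording of this comment, not any published spec text. Its assumptions are hypothetical: "$recursiveRoot": true is the opt-in value, "$ref" and "$recursiveRef" hold absolute URIs, the bar-embed and bar-extend URIs are invented, and a toy REGISTRY supports only a few keywords. The same instance passes when foo-schema is merely embedded, but fails when the entry-point schema opts in with $recursiveRoot.

```python
# Sketch of the $recursiveRoot / $recursiveRef rules as described above.
# Assumptions (hypothetical): "$recursiveRoot": true is the opt-in value,
# "$ref"/"$recursiveRef" hold absolute URIs, and only "type": "object",
# properties, required, and allOf are supported.
from typing import Any, Dict, Optional

FOO = "http://example.com/foo-schema"
BAR_EMBED = "http://example.com/bar-embed"
BAR_EXTEND = "http://example.com/bar-extend"

REGISTRY: Dict[str, Dict[str, Any]] = {
    # The schema being extended opts in by using $recursiveRef.
    FOO: {"type": "object", "properties": {"foo": {"$recursiveRef": FOO}}},
    # Embeds foo-schema without $recursiveRoot: recursion stays in foo-schema.
    BAR_EMBED: {"allOf": [{"$ref": FOO}], "required": ["bar"]},
    # Extends foo-schema and opts in with $recursiveRoot.
    BAR_EXTEND: {"$recursiveRoot": True, "allOf": [{"$ref": FOO}], "required": ["bar"]},
}


def enter(uri: str, instance: Any, recursive_root: Optional[str]) -> bool:
    """Evaluate the root schema of document `uri`."""
    document = REGISTRY[uri]
    # $recursiveRoot is only read from document roots, and the first one wins.
    if recursive_root is None and document.get("$recursiveRoot") is True:
        recursive_root = uri
    return is_valid(document, instance, recursive_root)


def is_valid(schema: Dict[str, Any], instance: Any, recursive_root: Optional[str]) -> bool:
    if "$recursiveRef" in schema:
        # Pinned to the recursive root if one is set, else the URI given.
        target = recursive_root if recursive_root is not None else schema["$recursiveRef"]
        if not enter(target, instance, recursive_root):
            return False
    if "$ref" in schema and not enter(schema["$ref"], instance, recursive_root):
        return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if not all(is_valid(sub, instance, recursive_root) for sub in schema.get("allOf", [])):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not is_valid(sub, instance[name], recursive_root):
                return False
    return True


def validate(uri: str, instance: Any) -> bool:
    return enter(uri, instance, recursive_root=None)


INSTANCE = {"bar": True, "foo": {"foo": {}}}
print(validate(FOO, INSTANCE))         # True: foo-schema used on its own
print(validate(BAR_EMBED, INSTANCE))   # True: nested "foo" stays in foo-schema
print(validate(BAR_EXTEND, INSTANCE))  # False: nested "foo" must also have "bar"
```

The function enter only inspects document roots and never overwrites a root that is already set. As a result, a $recursiveRoot nested inside a subschema is ignored, and only the first one encountered takes effect.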
handrews commented on Jun 16, 2018
I think a better term to use here than "extending" is "refining". JSON Schema is a constraint system. The empty (meta-)schema allows everything. Using type, properties, etc. refines that unbounded set of possible instance documents (which are schemas in the case of meta-schemas) to a smaller set.
Hyper-Schema's meta-schema doesn't really add links and base as keywords. The core/validation meta-schema allows them with any value and any semantics (this is how extension keywords work in general: the assumption is that some specific implementation can handle them).
The hyper-schema meta-schema refines the core/validation meta-schema by constraining those two keywords into a syntax that supports well-defined semantics.
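The refinement view becomes concrete when schemas are treated as predicates. In this Python sketch, accepts is a hypothetical toy checker that knows only "type" and "properties". core_like and hyper_like are drastically simplified stand-ins for the two meta-schemas (the real core meta-schema, for instance, also allows boolean schemas). Each added constraint can only shrink the set of accepted candidates.

```python
# Toy illustration of "refining": absent keywords constrain nothing, and
# each added keyword can only shrink the accepted set. Hypothetical checker.
from typing import Any, Dict, List

# Python types for the few JSON types this toy checker understands.
TYPES = {"object": dict, "array": list, "string": str}


def accepts(schema: Dict[str, Any], instance: Any) -> bool:
    """Absent keywords impose no constraint, so {} accepts every instance."""
    if "type" in schema and not isinstance(instance, TYPES[schema["type"]]):
        return False
    if isinstance(instance, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in instance and not accepts(sub, instance[name]):
                return False
    return True


# Candidate instances (here: would-be schemas checked against a meta-schema).
candidates: List[Any] = [42, {}, {"links": []}, {"links": "oops"}, {"type": "string"}]

empty: Dict[str, Any] = {}
core_like = {"type": "object"}  # "links" allowed with any value
hyper_like = {"type": "object", "properties": {"links": {"type": "array"}}}


def accepted(schema: Dict[str, Any]) -> List[int]:
    """Indexes of the candidates that `schema` accepts."""
    return [i for i, c in enumerate(candidates) if accepts(schema, c)]


print(accepted(empty))       # [0, 1, 2, 3, 4]
print(accepted(core_like))   # [1, 2, 3, 4]
print(accepted(hyper_like))  # [1, 2, 4]
```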
This is a bit mind-bending at first, but I've found that when I can get it across, usually by starting from explaining the empty schema, a lot of things seem to click for people. We could explain this in detail with examples on the web site (not just for the $recursiveRef case, but in general), and hopefully start changing people's assumptions about JSON Schema from being OO-ish to being what it really is.
ghost commented on Oct 29, 2018
Very interesting. Given this, if I have some custom schema keywords, should I bother creating my own meta-schema with all the complexity it entails? Or just informally document that we've added 2 keywords and explain what they mean?
Given that this issue is still open, it appears that the easiest way to create a custom meta-schema is to copy/paste/tweak the hyper-schema, using it as an example of how to provide semantics for new keywords.
handrews commented on Nov 1, 2018
@mgwelch at the moment, either option is fine. If you use a validator that pays attention to $schema and would be confused by a custom meta-schema, then it makes more sense to just informally document it. However, a custom meta-schema can be useful for ensuring that your new keywords are used correctly.
And yes, copy-pasting the hyper-schema meta-schema is the best option for now. This will improve in draft-08 (I just need to update those PRs with review feedback and do the other part with the $vocabularies keyword).
ghost commented on Nov 1, 2018
Thanks @handrews, I really appreciate the feedback. I think I will take a crack at the custom meta-schema but would appreciate a recommendation for a tool you use for validating schemas against custom meta-schemas. I find it hard to google for answers to anything involving meta-schemas.
handrews commented on Nov 1, 2018
@mgwelch most validators will work if you just pass the schema as the instance and the meta-schema as the schema. A few will handle meta-schemas specially (Ajv in JavaScript, for example).
handrews commented on Nov 13, 2018
PRs merged!
ghost commented on Nov 13, 2018
Awesome! And this is a real live example?: https://github.com/json-schema-org/json-schema-spec/blob/master/hyper-schema.json
That's nice.
handrews commented on Nov 14, 2018
@mgwelch yup! And yes, it is! As we write fine-grained meta-schemas for vocabularies (assuming that goes the way I expect; you can see a bit of it in #671), you'll see us depend on the feature even more.
I don't know how much it will get used outside of that context, but it will be available in conforming validators, so I guess we'll find out :-)