Skip to content

spec: remove/change extending meta-schemas recommendation #86

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
epoberezkin opened this issue Oct 11, 2016 · 23 comments
Closed

spec: remove/change extending meta-schemas recommendation #86

epoberezkin opened this issue Oct 11, 2016 · 23 comments
Labels
clarification Items that need to be clarified in the specification core
Milestone

Comments

@epoberezkin
Copy link
Member

epoberezkin commented Oct 11, 2016

https://github.com/json-schema-org/json-schema-spec/blob/master/jsonschema-core.xml#L254

The suggestion to extend meta-schemas using allOf (assuming that one of subschemas is $ref) is not feasible because meta-schemas are recursive and they have references to the meta-schema itself ("#"). Therefore if meta-schema is extended using $ref to the original meta-schema the # ref in that meta-schema will not point to the extended one, it will still point to the original, so the schemas would not be properly validated.

The only working meta-schema extension mechanism at the moment is copy-pasting and adding keywords, suggestion to use allOf is misleading because it implies $ref.

We have several options:

  1. Suggest copy-pasting with explanation why it is the only working option
  2. Adopt $merge/$patch keywords proposed by @fge
  3. Drop this suggestion from this version of the spec until we agree what is the right thing.

Happy to draft 1, can draft 2 as well...

@awwright awwright added this to the draft-6 milestone Oct 11, 2016
@handrews
Copy link
Contributor

For any extension of JSON Schema, if an instance validates against the extended version, it should also validate against the regular JSON Schema. So while you do need to re-declare the properties with child schemas so that those children reference the extended meta-schema, doing that within the allOf is sufficient to make everything work.

If your extension sets additionalProperties to false (which is not advised), then we're back in a specific re-use problem, and I have a very detailed proposal that I hope to post today or tomorrow (it's derived from use cases and vague proposals from the email thread on the google group a while back). So, @epoberezkin , please wait before writing up $merge/$patch. For one thing, it's already written up as issue #15 , even if not specifically for this case.

@awwright
Copy link
Member

Without thinking too much about it, I'd venture to guess that patching the official meta-schema to replace all meta-schema references references to your own meta-schema would be really painful (at least if absolute URIs were used exclusively)

@epoberezkin
Copy link
Member Author

epoberezkin commented Oct 11, 2016

patching the official meta-schema to replace all meta-schema references references to your own meta-schema would be really painful

@awwright you don't need to. See the example below.

doing that within the allOf is sufficient to make everything work.

@handrews if you limit the definition of "work" to "pass validation of valid schemas", allOf keyword would indeed be sufficient. Actually with this definition of "working" you don't have to extend the original meta-schema at all. If you include into the definition of "work" the requirement "to fail validation of invalid schemas" then it won't work.

Consider the example: you add a new keyword my to the schema vocabulary that must be boolean. To validate your new keyword you extend the meta-schema in this way:

{
  "id": "draft6_with_my",
  "$schema": "draft6",
  "allOf": [
    { "$ref": "draft6" },
    { "properties": { "my": { "type": "boolean" } } }
  ]
}

That's what I think the suggestion of using allOf implies.

Now this schema would indeed be invalid, as it should be:

{
  "$schema": "draft6_with_my",
  "my": "foo"
}

But this schema will be valid, although it must be invalid:

{
  "$schema": "draft6_with_my",
  "properties": {
    "prop": { "my": "foo" }
  }
}

The schema above will be valid because the subschema for "prop" property will be validated using the original rather than the extended meta-schema. I hope it makes sense.

Now, to address this issue without using copy-paste with the proposed @merge keyword you can write the meta-schema in this way, (@awwright without the need to patch all the references):

{
  "id": "draft6_with_my",
  "$schema": "draft6",
  "$merge": {
    "source": { "$ref": "draft6" },
    "with": { "properties": { "my": { "type": "boolean" } } }
  }  
}

The result of this merge should be the new schema where all the refs to # would point to this new schema, rather than to the original, and this new schema would have "my" keyword added to the list of standard keywords, so it would validate it on all levels. Or at least that's my interpretation of what @merge should be doing.

In any case, whether we add @merge or not is another issue, I agree. But "allOf" approach doesn't work as you can see from the example, unless you copy paste the whole schema instead of using $ref (which would not be different from simply editing this copy without using allOf).

@epoberezkin
Copy link
Member Author

epoberezkin commented Oct 11, 2016

@awwright Actually, that also means that $ref in $merge should mean a different thing from $ref outside of $merge (inlining vs validating - see #85 (comment)). So maybe we should add one more $ keyword for inlining instead of using $ref for both purposes. That would make sense actually because in $ref'd (=referenced) schemas refs will be resolved based on the ID of $ref'd schema and in inlined schemas ($inline?) refs will be resolved based on the ID in the current schema. You kind of need both for different use-cases...

@awwright
Copy link
Member

I... I'll need to contemplate this when I'm feeling a little more awake

@epoberezkin
Copy link
Member Author

Yep... Needs some thinking indeed

@epoberezkin
Copy link
Member Author

I will sleep on it and create another issue proposing this $inline (or whatever it's called) thing with motivation, explanation and examples.

@handrews
Copy link
Contributor

@epoberezkin : You don't need to change $ref to get $merge to work. We implemented it at Riverbed and used it extensively. You just need to ensure that $merge is also late-binding.

Here's the (I think) most concise way of handling this without $merge:

{
    "id": "draft6_with_my",
    "$schema": "draft6",
    "allOf": [
        { "$ref": "draft6" },
        {
            "properties": {
                "my": { "type": "boolean" },
                "properties": { "$ref": "#" },
                "patternProperties": { "additionalProperties": { "$ref": "#" } },
                "additionalProperties": { "oneOf": [ { "type": "boolean" }, { "$ref": "#" } ] },
                "items": { "oneOf": [ { "$ref": "#" }, { "type": "array", "items": { "$ref": "#" } } ] },
                "additionalItems": { "oneOf": [ { "type": "boolean" }, { "$ref": "#" } ] },
                "dependencies": { "additionalProperties": { "oneOf": [ { "type": "string" }, { "$ref": "#" } ] } },
                "anyOf": { "type": "array", "items": { "$ref": "#" } },
                "allOf": { "type": "array", "items": { "$ref": "#" } },
                "oneOf": { "type": "array", "items": { "$ref": "#" } },
                "not": { "$ref": "#" },
                "definitions": { "additionalProperties": { "$ref": "#" } }
            }
        }
    ]
}

While "concise" might be stretching it, it is actually quite clear. A shorthand would be useful, but as you'll soon see with my forthcoming issue on re-using schemas that set additionalProperties to be false, trying to have schemas that are being re-used together produce a result that depends on both of them is highly problematic.

I have also come around to the idea that $merge and $patch are too generic and powerful to be part of JSON Schema. The correct option might simply be for someone to implement those separately and then folks can have a build step that processes them out before feeding the results to any actual JSON Schema validator.

@jdesrosiers
Copy link
Member

The Hyper-Schema meta schema is a real-life example of the difficulties with extending JSON Schema.

@handrews
Copy link
Contributor

One thing worth considering is whether this is a use case that really needs optimizing. How often will people extend the schema? Also, the meta-schemas shouldn't be changing. Even if we release new updates frequently while driving this to completion, each new revision should be a new document, with the old ones left valid. So while I loathe copy-pasting, it's pretty safe here.

@epoberezkin
Copy link
Member Author

Here's the (I think) most concise way of handling this without $merge

Copying 50% of the schema is marginally better than copying it all. You still need to repeat all the keywords that contain schemas, and the new keywords are coming.

How often will people extend the schema?

The meta-schema extension is needed to validate application specific custom keywords. A lot of people use custom keywords in their schemas and the proper way is to define your own meta-schema that would validate both standard and custom keywords. At the moment, because there is no extension mechanism, Ajv allows to define micro-meta-schemas for each custom keyword separately, but it seems a wrong solution - the correct one is to be able to validate the whole schema.

The fact that $merge is powerful seems to be the argument to use it, rather than to not use it. We use $merge not only for meta-schema extension, but to derive similar schemas from one schema without having "additionalProperties" issue. So I don't really understand the opposition.

In any case, this issue is not about $merge, it's about the paragraph in the spec that encourages users to extend meta-schemas giving them misleading recommendations how to do it. Nobody will make a mental leap from the suggestion to use allOf to the realisation that you need to copy/paste 50% of the original meta-schema inside allOf.

@epoberezkin
Copy link
Member Author

epoberezkin commented Oct 12, 2016

@handrews see this comment: #85 (comment)

In addition to being not concise, the meta-schema you suggest will validate many keywords twice. I don't think there are any validators smart enough to avoid it.

@jdesrosiers
Copy link
Member

@epoberezkin, I'm for option 1

  1. Suggest copy-pasting with explanation why it is the only working option

I sincerely hope we come up with a better way in the future, but for now, this is how it has to be done. Even if this isn't done often, it should be documented for the people who are interested. I answered a question about this on StackOverflow a while back, so I can also testify that meta schema extension is something people want to do sometimes.

@handrews
Copy link
Contributor

@epoberezkin @jdesrosiers I wasn't really asking how many people want to extend the meta-schema (although that's a relevant question), I was asking how often any one person or group will need to extend / re-extend a meta-schema. If it's something that no one entity has to do very often, a slightly annoying manual process (perhaps we could provide a starter extension template) is better than a complicated and/or costly additional feature.

@jdesrosiers
Copy link
Member

a slightly annoying manual process is better than a complicated and/or costly additional feature.

Agreed. That's why I voted for option 1 (documenting the difficulties).

@epoberezkin
Copy link
Member Author

@handrews For this particular use-case, it's acceptable to copy/paste, it is indeed not too often for a single user.

But schema extension is a wider issue than meta-schema extension, and whenever you want to extend a recursive schema (which is quite common) there is no mechanism in JSON-schema at the moment. I tried several approaches:

  • copy/paste
  • build process to generate from a common schema (either with code or with template).
  • $merge/$patch

The last one seems to me the most straightforward and the easiest to maintain. Copy paste makes a big chunk of you code duplicated and very difficult to see what is the difference (applies to meta-schema as well). Build process introduces complexity, additional code and entities - so it is prone to bugs. $merge/$patch keeps it concise, contained and easy to maintain. Try using it, it may grow on you.

@handrews
Copy link
Contributor

But schema extension is a wider issue than meta-schema extension, and whenever you want to extend a recursive schema (which is quite common) there is no mechanism in JSON-schema at the moment.

That is a more convincing argument- perhaps you should change the name of this issue to focus on that instead of on the meta-schema aspect?

$merge/$patch keeps it concise, contained and easy to maintain. Try using it, it may grow on you.

I (or people on the teams I advised) tended to make use of it 10+ times in each of the 20 or so schema-based service definitions involved in my last project, so I'd say I'm pretty familiar with it's pros and cons.

I don't like the possibility of arbitrary transformations applied. It makes schema validation algebra extremely difficult to work with, because you can't reason about the validation without first resolving the $merge/$patch. It would be much better to come up with a proposal targeted specifically at solving the recursion problem, as that is the problem that needs to be solved.

@handrews
Copy link
Contributor

handrews commented Nov 28, 2016

Here is an idea that I'm not even sure that I like :-P

We could introduce another form of JSON Reference that, instead of indicating the current schema document "#", it would specifically indicate the root of the overall schema in which it is being used.

I'm not sure what the syntax would be. Perhaps in place of the URI Reference string, it would take an object that (at this time) can only indicate that it is referencing the full root:

{
    "someProperty": {
        "$ref": {"$root": true}
    }
}

Or maybe it's just a different object with the same "MUST ignore all other properties" approach as $ref:

{
    "someProperty":  {"$root": true}
}

In both cases I can't think of any use for {"$root": false} which indicates that this is probably not a great design.

Even aside from that I'm not sure if I think this is a good idea or not. Just tossing it out there. It would not be hard to implement- the validator just has to remember the id of its initial entry point and use it whenever it encounters the root reference.

I was just working on the hyper-schema meta-schema and noticed a number of bugs caused by the duplication from the core/validation meta-schema. So it reminded me of this issue.

@handrews
Copy link
Contributor

Another idea for syntax would just be {"$recurse": true} used in either of the ways proposed for "$root".

@awwright awwright modified the milestones: draft-future, draft-next Nov 30, 2016
@handrews handrews modified the milestones: draft-07 (wright-*-02), draft-future May 16, 2017
@handrews handrews added the core label May 16, 2017
@handrews
Copy link
Contributor

handrews commented Aug 19, 2017

Note that Cloudflare's "doca" project has a keyword that implements exactly this recurse/root behavior. Currently, it's an overloaded "rel": "self" used outside of an LDO, but I'm fixing that to be a distinct namespaced extension keyword.

The implementation is quite straightforward. You do need to make note of the root schema when you start processing, and remember it as you lazily evaluate references. In practice, this has not been difficult, but it is a significant new requirement that breaks the independence of subschema evaluation.

(I had not taken over the maintenance of doca yet when I wrote the above comments, so I didn't realize it was already existing in the wild at the time!)

@handrews
Copy link
Contributor

handrews commented Sep 1, 2017

but it is a significant new requirement that breaks the independence of subschema evaluation.

The more I think about this statement, the less I think it is true. Having to remember the entry point schema where you started processing is no more burdensome than remembering the base URI. In fact it is less burdensome, as the base URI can change during processing. The entry point schema, by definition, does not.

handrews added a commit to handrews/json-schema-spec that referenced this issue Sep 1, 2017
This addresses the original question posed in json-schema-org#86, which
noted that simply saying that meta-schemas can be extended
with "allOf" is misleading and likely to lead to errors.
@handrews
Copy link
Contributor

handrews commented Sep 1, 2017

@epoberezkin I've added wording addressing the initial concern in PR #389.

We have #314 for a more detailed examination of how to extend meta-schemas, and if we want to pursue the $recurse/$root idea that we use in Doca I can file a separate issue to track that idea on its own.

Given that, is it OK to close this if/when you approve #389?

@epoberezkin
Copy link
Member Author

@handrews thank you, closing.

handrews added a commit to handrews/json-schema-spec that referenced this issue Sep 2, 2017
This addresses the original question posed in json-schema-org#86, which
noted that simply saying that meta-schemas can be extended
with "allOf" is misleading and likely to lead to errors.
@gregsdennis gregsdennis added clarification Items that need to be clarified in the specification and removed Type: Bug labels Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification Items that need to be clarified in the specification core
Projects
None yet
Development

No branches or pull requests

6 participants