
complex schema with parent-child-grandchild references in multiple subdirectories aren't resolved correctly #601

Closed
kuba-lilz opened this issue Sep 10, 2019 · 6 comments
Labels: Invalid (Not a bug, PEBKAC, or an unsupported setup)

Comments

@kuba-lilz

kuba-lilz commented Sep 10, 2019

There are already quite a few issues related to RefResolver not working correctly for complex schemas, but many of them don't include much code, and none that I saw includes multiple subdirectories.

In this issue I included code that shows that jsonschema doesn't resolve file references correctly for nested references in multiple directories.

System info:
OSX
Python 3.6.8
jsonschema version: '3.0.1'

Project setup:

.
└── my_complex_schemas
   ├── basic_shapes
   │   ├── circle.json
   │   └── rectangle.json
   ├── complex_shapes
   │   └── rectangle_with_hole.json
   └── point.json

my_complex_schemas/point.json

{
  "type": "object",
  "properties": {
    "x": { "type": "number" },
    "y": { "type": "number" }
  },
  "required": ["x", "y"]
}

my_complex_schemas/basic_shapes/circle.json

{
    "type": "object",
    "properties": {
        "center": { "$ref": "point.json" },
        "radius": {"type": "number"}
    },
    "required": ["center", "radius"]
}

my_complex_schemas/basic_shapes/rectangle.json

{
    "type": "object",
    "properties": {
      "top_left": { "$ref": "point.json" },
      "top_right": { "$ref": "point.json" },
      "bottom_left": { "$ref": "point.json" },
      "bottom_right": { "$ref": "point.json" }
    },
    "required": ["top_left", "top_right", "bottom_left", "bottom_right"]
}

my_complex_schemas/complex_shapes/rectangle_with_hole.json

{
    "type": "object",

    "allOf": [
       { "$ref": "basic_shapes/rectangle.json" }
    ],

    "properties": {
      "hole": { "$ref": "basic_shapes/circle.json" }
    },
    "required": ["hole"]
}

With the above, when running from the directory that contains my_complex_schemas, I can correctly validate against the circle schema, which references the point schema, e.g.:

import json
import os

import jsonschema

with open("./my_complex_schemas/basic_shapes/circle.json") as file:
    schema = json.load(file)

base_uri = "file://{}/".format(os.path.abspath("./my_complex_schemas"))
resolver = jsonschema.RefResolver(base_uri=base_uri, referrer=schema)

valid_data = {"center": {"x": 10, "y": 20}, "radius": 2}
jsonschema.validate(valid_data, schema, resolver=resolver)

A similar test passes for the rectangle schema.
However, I can't validate against the rectangle_with_hole schema, which references the circle and rectangle schemas.

For this code

with open("./my_complex_schemas/complex_shapes/rectangle_with_hole.json") as file:
    schema = json.load(file)

base_uri = "file://{}/".format(os.path.abspath("./my_complex_schemas"))
resolver = jsonschema.RefResolver(base_uri=base_uri, referrer=schema)

valid_data = {
    "top_left": {"x": 10, "y": 10},
    "top_right": {"x": 20, "y": 10},
    "bottom_left": {"x": 10, "y": 20},
    "bottom_right": {"x": 20, "y": 20},
    "hole": {"center": {"x": 20, "y": 20}, "radius": 7}
}
jsonschema.validate(valid_data, schema, resolver=resolver)

I get exception

jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/Users/kuba/Projects/code/sketchpad/python/python_sketchpad/my_complex_schemas/basic_shapes/point.json'>

This means that when the resolver encounters "$ref": "basic_shapes/circle.json" inside my_complex_schemas/complex_shapes/rectangle_with_hole.json, enters basic_shapes/circle.json, and then encounters "$ref": "point.json", it tries to read that file from my_complex_schemas/basic_shapes/point.json instead of my_complex_schemas/point.json, even though base_uri is set to "file://{}/".format(os.path.abspath("./my_complex_schemas")).
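
The lookup order described here matches ordinary relative URI resolution, which can be reproduced with the standard library alone. The following is a minimal sketch, assuming a hypothetical project root /proj in place of the real path from the traceback. It uses urllib.parse.urljoin rather than jsonschema itself, so it only illustrates the URI arithmetic, not the library's internals.

```python
from urllib.parse import urljoin

# Hypothetical absolute location of the schema directory (note the trailing slash).
base_uri = "file:///proj/my_complex_schemas/"

# Step 1: the "$ref" in the root schema is resolved against base_uri.
circle_uri = urljoin(base_uri, "basic_shapes/circle.json")

# Step 2: a "$ref" found inside circle.json is resolved against
# circle.json's own URI, not against the original base_uri.
point_uri = urljoin(circle_uri, "point.json")

print(circle_uri)
print(point_uri)
```

Under these assumptions the second URI ends in basic_shapes/point.json, which is the location named in the exception.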

I can get my_complex_schemas/complex_shapes/rectangle_with_hole.json to validate if I change references in basic_shapes/circle.json and basic_shapes/rectangle.json to "$ref": "../point.json". But if I do that,

with open("./my_complex_schemas/basic_shapes/circle.json") as file:
    schema = json.load(file)

base_uri = "file://{}/".format(os.path.abspath("./my_complex_schemas"))
resolver = jsonschema.RefResolver(base_uri=base_uri, referrer=schema)

valid_data = {"center": {"x": 10, "y": 20}, "radius": 2}
jsonschema.validate(valid_data, schema, resolver=resolver)

fails with

jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/Users/kuba/Projects/code/sketchpad/python/python_sketchpad/point.json'>

that is, in this case the resolver looks for point.json in ./my_complex_schemas/.. instead of ./my_complex_schemas, which is expected.

I hope the above helps in writing a unit test for a resolver fix.
Or maybe the problem is between the chair and the keyboard, and I missed some important setting?

@Julian
Member

Julian commented Sep 10, 2019

Thank you!

There are already quite a few issues related to RefResolver not working correctly for complex schemas, but many of them don't include much code, and none that I saw includes multiple subdirectories.

Very much agreed! Which obviously makes them hard to fix, or know when there's some user error involved.

I'll have to read your example more carefully, but the only concrete thing I'd say immediately is "the known bugs are to me essentially the skipped tests in the test suite".

So IIRC there are basically 2 there, one for location-independent identifiers, and one for change-of-ID in a subschema. The former is unlikely to be relevant but you might want to have a look at the second one.

And the "rule" there is basically that jsonschema's spec related unit tests are the upstream test suite at this point (which I also maintain), so every RefResolver fix essentially should correspond to either adding or un-skipping a test there.

But very much do appreciate the full example!

Julian added the "Bug" label (Something doesn't work the way it should.) on Sep 10, 2019
Julian added the "Needs Simplification" label (An issue which is in need of simplifying the example or issue being demonstrated for diagnosis.) on Mar 24, 2020
@willson-chen
Contributor

Here is what I think causes this problem: the layer 1 reference is resolved against base_uri, and the layer 2 reference is expected to be resolved the same way. However, once the layer 1 file is loaded, base_uri changes to the path of that file, while the file referenced at layer 2 is written as a path relative to the original base_uri, so it cannot be found. If base_uri were kept unchanged, the problem would be resolved. I think I can submit a patch.

Hi @Julian, what do you think of the solution?

@Julian
Member

Julian commented Jul 14, 2020

@willson-chen certainly happy to review a patch!

@Julian
Member

Julian commented Jul 25, 2022

So hopefully I haven't confused myself here, and apologies for this taking so long, but I think having now looked at this yet again that everything is actually correct here as-is. (I still want to add another test upstream regardless, since this case isn't covered, but that's a different story).

I can get my_complex_schemas/complex_shapes/rectangle_with_hole.json to validate if I change references in basic_shapes/circle.json and basic_shapes/rectangle.json to "$ref": "../point.json"

This is/was the correct fix -- the base_uri you pass to RefResolver is the base URI for the root document. It doesn't mean that all refs are resolved relative to that -- just that it's that URI that applies to the root document.

If you have a schema which references basic_shapes/circle.json, indeed point.json is in the parent directory, not basic_shapes, so you need ../.
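
To make the rule concrete, here is a small standard-library sketch that follows whole-file "$ref" values and resolves each one against the URI of the document that contains it. It is a simplified stand-in, not jsonschema's implementation: the helper names (iter_refs, resolve_all, write) are hypothetical, fragments are ignored, and the layout is recreated in a temporary directory with trimmed-down schemas.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import url2pathname


def iter_refs(node):
    """Yield every "$ref" string found anywhere inside a parsed schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolve_all(document_uri, schema, seen=None):
    """Map each referenced file URI to True/False depending on whether it exists."""
    seen = {} if seen is None else seen
    for ref in iter_refs(schema):
        # Resolve against the URI of the document that holds this "$ref".
        target_uri, _fragment = urldefrag(urljoin(document_uri, ref))
        if target_uri in seen:
            continue
        target_path = Path(url2pathname(urlparse(target_uri).path))
        seen[target_uri] = target_path.exists()
        if seen[target_uri]:
            # The loaded file becomes the base for the refs it contains.
            resolve_all(target_uri, json.loads(target_path.read_text()), seen)
    return seen


def write(path, schema):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "my_complex_schemas"
    base_uri = root.as_uri() + "/"
    top = {"properties": {"hole": {"$ref": "basic_shapes/circle.json"}}}
    write(root / "point.json", {"type": "object"})

    # circle.json reaches point.json through its parent directory.
    write(root / "basic_shapes/circle.json",
          {"properties": {"center": {"$ref": "../point.json"}}})
    found = resolve_all(base_uri, top)
    all_found = all(found.values())

    # Same layout, but with the original "point.json" reference.
    write(root / "basic_shapes/circle.json",
          {"properties": {"center": {"$ref": "point.json"}}})
    broken = resolve_all(base_uri, top)
    missing = [uri for uri, exists in broken.items() if not exists]

print(all_found, missing)
```

In this sketch the "../point.json" variant finds every file, while the "point.json" variant reports one missing file under basic_shapes, mirroring the behaviour discussed in this thread.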

In your last example, i.e. after you changed to $ref: "../point.json", your exception is also correct --

jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/Users/kuba/Projects/code/sketchpad/python/python_sketchpad/point.json'>

As you say, it's looking for point.json in ./my_complex_schemas/.. instead of ./my_complex_schemas, but that's because that's the URI you gave for base_uri.
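
One way to read this, sketched below with the standard library only, is to compare a base URI that points at the schema directory with one that points at the root document itself. The /proj prefix is a hypothetical path, and whether the document's own URI is the right base_uri for a given setup is an assumption to check against how the schemas are actually laid out.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Hypothetical absolute location of the schema loaded as the root document.
circle_path = PurePosixPath("/proj/my_complex_schemas/basic_shapes/circle.json")

# Base URI pointing at the schema directory, as in the report.
directory_base = circle_path.parent.parent.as_uri() + "/"
# Base URI pointing at the root document itself.
document_base = circle_path.as_uri()

from_directory = urljoin(directory_base, "../point.json")
from_document = urljoin(document_base, "../point.json")

print(from_directory)
print(from_document)
```

With the directory as the base, "../point.json" lands one level above my_complex_schemas; with the document's own URI as the base, it lands on my_complex_schemas/point.json.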

Specifically, if I take the same layout you mentioned:

⊙  tree my_complex_schemas
my_complex_schemas
├── basic_shapes
│   ├── circle.json
│   └── rectangle.json
├── complex_shapes
│   └── rectangle_with_hole.json
└── point.json

Then with the modification you mentioned:

⊙  git diff
diff --git a/my_complex_schemas/basic_shapes/circle.json b/my_complex_schemas/basic_shapes/circle.json
index 3cf3c39..d73d86d 100644
--- a/my_complex_schemas/basic_shapes/circle.json
+++ b/my_complex_schemas/basic_shapes/circle.json
@@ -1,7 +1,7 @@
 {
     "type": "object",
     "properties": {
-        "center": { "$ref": "point.json" },
+        "center": { "$ref": "../point.json" },
         "radius": {"type": "number"}
     },
     "required": ["center", "radius"]
diff --git a/my_complex_schemas/basic_shapes/rectangle.json b/my_complex_schemas/basic_shapes/rectangle.json
index 30bce78..bdf4797 100644
--- a/my_complex_schemas/basic_shapes/rectangle.json
+++ b/my_complex_schemas/basic_shapes/rectangle.json
@@ -1,10 +1,10 @@
 {
     "type": "object",
     "properties": {
-      "top_left": { "$ref": "point.json" },
-      "top_right": { "$ref": "point.json" },
-      "bottom_left": { "$ref": "point.json" },
-      "bottom_right": { "$ref": "point.json" }
+      "top_left": { "$ref": "../point.json" },
+      "top_right": { "$ref": "../point.json" },
+      "bottom_left": { "$ref": "../point.json" },
+      "bottom_right": { "$ref": "../point.json" }
     },
     "required": ["top_left", "top_right", "bottom_left", "bottom_right"]
 }

and with code like:

from pathlib import Path
import json
import jsonschema


base = Path("./my_complex_schemas/").absolute()
rectangle = base / "basic_shapes/rectangle.json"
rectangle_with_hole = base / "complex_shapes/rectangle_with_hole.json"

# schema = json.loads(rectangle.read_text())
schema = json.loads(rectangle_with_hole.read_text())

resolver = jsonschema.RefResolver(base_uri=f"{base.as_uri()}/", referrer=schema)

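# Instance from the original report; it satisfies rectangle_with_hole.json
# (the circle-only instance lacks the required corners and "hole").
valid_data = {
    "top_left": {"x": 10, "y": 10},
    "top_right": {"x": 20, "y": 10},
    "bottom_left": {"x": 10, "y": 20},
    "bottom_right": {"x": 20, "y": 20},
    "hole": {"center": {"x": 20, "y": 20}, "radius": 7}
}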
jsonschema.validate(valid_data, schema, resolver=resolver)

-- which is essentially the same as what you had, just with some pathlib conveniences that didn't exist a while ago -- I can now validate with either schema. You do need to be careful to pass a base URI that's really the base URI you mean and have used in whichever root schema you pass -- if you move around which directory your schema lives in, you may need to adjust it.

Hopefully some of the above helps -- as I say, I'm reasonably confident this is all working as-is, but $ref is hard, so I still could have made an error :). Closing, but comments welcome if anything's unclear.

Julian closed this as completed on Jul 25, 2022
@kuba-lilz
Author

kuba-lilz commented Jul 26, 2022

OK, so in short: I thought that references to schemas should be defined with respect to the base URI, but they should be defined with respect to the current schema instead! My bad, and thank you for taking the time to understand and answer my problem :)

@Julian
Member

Julian commented Jul 26, 2022

Precisely! And no problem!

Julian added the "Invalid" label (Not a bug, PEBKAC, or an unsupported setup) and removed the "Bug" and "Needs Simplification" labels on Jul 28, 2022