recursive relative references don't seem to work correctly #274

Closed
fleur opened this issue Feb 10, 2016 · 22 comments
Labels
Invalid Not a bug, PEBKAC, or an unsupported setup

Comments

@fleur

fleur commented Feb 10, 2016

Maybe I'm misunderstanding JSON Schema, but the resolver seems to be adding directories onto its resolver path as it goes down a relative path. If I have a 'project' schema that references a 'container' schema, which in turn references a 'generic' schema, the references have to look like this to work, even with all files in the same directory:

project: "allOf": [ { "$ref": "file:schemas/container.json" } ],
container: "allOf": [ { "$ref": "file:generic.json" } ],

I've got a github project here because the example has to be a little involved: https://github.com/fleur/jsonschema-bug

@Julian
Member

Julian commented Feb 12, 2016

I'm not sure when I'll have a second to look at this (I of course appreciate you filing the ticket and especially providing an example!) -- my guess would be that this is possibly related to caching behavior, but as I've said in the past (not to you obviously :P), relative URIs don't really make much sense in schemas AFAICT, let alone URIs that reference file, despite their utility for development. But I'll have a look and see if I can get to the bottom of this.

@gregbillock

I've run into this as well. I have a group of schemas I'm working on which have nested dependencies like this. The error is below.

It looks to me like the scoping URL is getting a "/" added to it in the scope process. Is there a better mental image for how to refer to groups of schema files than with $ref? I couldn't find another way to think about this for schema files I'm working with locally. Is the mental image to simply wrap up all the schemas into one larger file and then reference into that as needed?

Traceback (most recent call last):
  [... my code]
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 478, in validate
    cls(schema, *args, **kwargs).validate(instance)
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 122, in validate
    for error in self.iter_errors(*args, **kwargs):
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 98, in iter_errors
    for error in errors:
  File "/Library/Python/2.7/site-packages/jsonschema/_validators.py", line 291, in properties_draft4
    schema_path=property,
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 114, in descend
    for error in self.iter_errors(instance, schema):
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 98, in iter_errors
    for error in errors:
  File "/Library/Python/2.7/site-packages/jsonschema/_validators.py", line 203, in ref
    for error in validator.descend(instance, resolved):
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 114, in descend
    for error in self.iter_errors(instance, schema):
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 98, in iter_errors
    for error in errors:
  File "/Library/Python/2.7/site-packages/jsonschema/_validators.py", line 291, in properties_draft4
    schema_path=property,
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 114, in descend
    for error in self.iter_errors(instance, schema):
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 98, in iter_errors
    for error in errors:
  File "/Library/Python/2.7/site-packages/jsonschema/_validators.py", line 199, in ref
    scope, resolved = validator.resolver.resolve(ref)
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 336, in resolve
    return url, self._remote_cache(url)
  File "/Library/Python/2.7/site-packages/functools32/functools32.py", line 400, in wrapper
    result = user_function(*args, **kwds)
  File "/Library/Python/2.7/site-packages/jsonschema/validators.py", line 346, in resolve_from_url
    raise RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/FrequencyRange.schema.json'>

@uvsmtid

uvsmtid commented Aug 9, 2016

I'm using python-jsonschema-2.4.0-4.fc24.noarch RPM on Fedora 24 and hit the same issue.

I created an example to demonstrate this.
The trivial schema, Python script, and sample JSON data can be checked out from here:
https://github.com/uvsmtid/incubator/tree/7792ba00ab58b91eafc1e567c070a05ac755ba9f/json_shema

There are three schema.json files:

level_1/schema.json
level_1/level_2/schema.json
level_1/level_2/level_3/schema.json

CASE 1: The level_1/schema.json file refers
to level_1/level_2/schema.json as:

"$ref": "file:level_2/schema.json"

CASE 2: The level_1/level_2/schema.json file refers
to level_1/level_2/level_3/schema.json as:

"$ref": "file:level_3/schema.json"

There is no problem with CASE 1: it loads the relative schema file perfectly.
However, the identical CASE 2 (which is only one level down) fails:

  File "/usr/lib/python2.7/site-packages/jsonschema/validators.py", line 292, in resolving
    raise RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/level_2/level_3/schema.json'>

@uvsmtid

uvsmtid commented Aug 10, 2016

Relative references to files should work when nested at any depth within the filesystem tree.
While this should be the default, without the need to provide a resolver as explained below, the workaround makes it work right now.

Workaround

As suggested by @traut here, providing a resolver (see original example) makes all relative references work:

import os

from jsonschema import validate, RefResolver
# ...
json_schema = read_json(json_schema_path)
json_schema_dir = os.path.dirname(os.path.realpath(json_schema_path))
# python-jsonschema needs base_uri (with a trailing /)
# to be able to resolve local file references
resolver = RefResolver(referrer=json_schema, base_uri='file://' + json_schema_dir + '/')
json_data = read_json(json_data_path)
# Pass the loaded schema (not its directory) as the second argument
validate(json_data, json_schema, resolver=resolver)

@Julian
Member

Julian commented Aug 10, 2016

I still would like to give this a look at some point, apologies for it taking so long, but just to be clear again -- no, in my opinion, relative references should never work by default. They're unportable and out-of-spec-ish. Having to provide an additional configured object to get them to work is I think a reasonable amount of work to have to do.

@uvsmtid

uvsmtid commented Aug 10, 2016

@Julian, I support the stance on portability and sticking to the specs.

What is deceptive is that it works when an additional file is referenced from level 1 to level 2, but it unexpectedly fails from level 2 to level 3 (and 3-to-4, 4-to-5, ... (N-1)-to-N, I guess). This feature could be treated as unportable, but it could still be complete.

From the general public's perspective (away from specs), I also guess that relative references in JSON Schema are expected to work like relative links in HTML (they make a subdirectory of HTML files in a filesystem tree "mountable" at any URL prefix served by an HTTP server without breaking links within that sub-tree). Such functionality might be useful enough to become part of the specs eventually.

@holvianssi

The problem is that for the first file the references are resolved against the base_uri, but for referenced files the references are resolved against those files' paths.

The fix is to use the file's full URL as base_uri instead of a path to the directory containing the file. So, if you have file '/foo/bar/example.json', then your base_uri should be '/foo/bar/example.json'.

I guess the problem is that it's too easy to think of the base_uri as the base to resolve all references against, when in fact it's the URI to resolve the first file's references against. A small documentation fix might be in order here.
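A minimal sketch of that suggestion, using only the standard library rather than jsonschema itself. It assumes a POSIX filesystem and expresses the base as a file: URI built with Path.as_uri(), which is an assumption of this example and not something the comment above prescribes. The file names generic.json and sub/other.json are hypothetical.

```python
from pathlib import Path
from urllib.parse import urljoin

# Path taken from the comment above; the file does not need to exist for this demo.
schema_path = Path("/foo/bar/example.json")

# Use the schema file's own URI as the base, not its directory.
base_uri = schema_path.as_uri()

# Relative references then resolve next to the schema file,
# because the last path component of the base is replaced.
sibling = urljoin(base_uri, "generic.json")
nested = urljoin(base_uri, "sub/other.json")

print(base_uri)
print(sibling)
print(nested)
```

With the file's own URI as the base, no trailing slash is needed, since the file name is the component that gets replaced during resolution.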

@holvianssi

Another solution might be to just have some validator method for which you give the full path of the schema as parameter, nothing else. This way the validator would immediately know the base_uri for the first file and could always resolve relative to the file being currently under processing.

@gregbillock

@Julian I'm not sure I understand the argument that relative filenames are out-of-spec. It looks to me that the spec refers to JSON-Reference as the way of scoping URLs. The JSON-Ref spec has rules about combining URIs which look to me like they will support relative paths and actually talk explicitly about how to join paths and use base uri scheme in such cases (https://tools.ietf.org/html/rfc3986#section-5.2.3 and nearby).

So does the $ref resolving logic use the right scheme inheritance rules in RFC 3986, so that relative URIs in schemas work right? That way the RefResolver/base_uri indicator would still be needed, but schemas could use just "$ref": "relative/path/schema.json". I suspect from my own experimentation that this does not work. Instead, the resolving logic seems to want to see "$ref": "file:schema.json", and seems to have a bug in the path merges that is not RFC 3986 compliant, as well as non-compliance with the scheme logic in RFC 3986 for relative URIs. But I'm not a JSON-Ref expert, so it could be that this interpretation is not right.
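For reference, the RFC 3986 merge rules discussed here can be tried directly with Python's urllib.parse.urljoin. This is only an illustration of the RFC behavior using the example base URI from RFC 3986 section 5.4; it says nothing about what jsonschema does internally.

```python
from urllib.parse import urljoin

# Base URI used in the reference-resolution examples of RFC 3986, section 5.4.
BASE = "http://a/b/c/d;p?q"

# A scheme-less relative reference inherits scheme and authority from the base;
# its path is merged with the base path after dropping the last segment ("d;p").
examples = {
    "g": urljoin(BASE, "g"),        # sibling of the last segment
    "../g": urljoin(BASE, "../g"),  # one directory up
    "/g": urljoin(BASE, "/g"),      # absolute path, same authority
    "#s": urljoin(BASE, "#s"),      # fragment only, same document
}

for ref, resolved in examples.items():
    print(f"{ref!r:8} -> {resolved}")
```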

@holvianssi

In many cases it would make a lot of sense to build the schema validator with a file reference directly. Something along the lines of:

validator = jsonschema.build_draft4validator('file://foo/bar.json')

The build_draft4validator() could have the following default kwargs:

resolver_base=jsonschema.RefResolver
format_checker=jsonschema.FormatChecker

This way you'd get a usable validator in a single line, and in addition it would be impossible to get the base_uri wrong.

@handrews

According to section 7 of the current JSON Schema core spec:

The value of the $ref is a URI Reference. Resolved against the current URI base, it identifies the URI of a schema to use.

Unless I am really misunderstanding something here, that means that relative references are part of the specification.

Relative references are useful when you have an API suite that is present on many hosts (for instance, for appliance configuration), and those hosts do not necessarily have connectivity to the wider internet.

@ashb

ashb commented Jan 20, 2017

@Julian Would it perhaps make sense somewhere around here https://github.com/Julian/jsonschema/blob/ee2de6d0e0b6fed6f580cf2d744fb2790fe06a54/jsonschema/validators.py#L459-L461 to check if scheme == "" and, in that case, get the scheme from the top of the _scopes_stack? That way a relative URL would be relative to what was last resolved.
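The idea of resolving against whatever was resolved last can be sketched with a small scope stack. This is a standalone illustration, not jsonschema's actual resolver: the ScopeStack class and its method names are hypothetical, and the level_N paths are borrowed from the example earlier in the thread.

```python
from urllib.parse import urljoin


class ScopeStack:
    """Hypothetical helper: resolve each reference against the innermost scope."""

    def __init__(self, base_uri):
        self._scopes = [base_uri]

    def resolve(self, ref):
        # urljoin fills in the missing scheme and path prefix from the current scope.
        return urljoin(self._scopes[-1], ref)

    def push(self, uri):
        # Entering a referenced document: its URI becomes the new base.
        self._scopes.append(uri)

    def pop(self):
        # Leaving that document: restore the previous base.
        return self._scopes.pop()


stack = ScopeStack("file:///schemas/level_1/schema.json")
level_2 = stack.resolve("level_2/schema.json")
stack.push(level_2)
level_3 = stack.resolve("level_3/schema.json")
stack.pop()

print(level_2)
print(level_3)
```

Each nested reference is resolved relative to the document that contains it, so the level 2 to level 3 hop works the same way as the level 1 to level 2 hop.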

@Julian
Member

Julian commented Jan 21, 2017

Apologies for being behind here guys -- the most helpful thing I think for me at this point would be a suggested new test case.

@ashb I have to think about that a bit more carefully, but it's possible

And to clarify -- what I was referring to with "out of spec" is relative, scheme-unqualified file references without declaring an ID, but again I'm not too familiar with the URI parts of the spec unfortunately, so it's possible I'm wrong even there.

@ashb

ashb commented Jan 21, 2017

@Julian we understand. Being an open source maintainer is a hard, often thankless task. I appreciate this module!

@erickpeirson

@Julian Late to the party, here -- I second @ashb's suggestion as a possible way forward. In jsonschema==2.6.0, resolution of relative references is relative to the execution path, not the schema path. That behavior is counter-intuitive, and a blocker to local development using jsonschema. It's also inconsistent with the (perhaps implied) behavior described at https://spacetelescope.github.io/understanding-json-schema/structuring.html#reuse, specifically:

$ref can also be a relative or absolute URI, so if you prefer to include your definitions in separate files, you can also do that. For example:

{ "$ref": "definitions.json#/address" }
would load the address schema from another file residing alongside this one.

Happy to raise a PR with a test case if that would still be helpful. Thanks for a great package!

@Julian
Member

Julian commented Jan 24, 2018

@erickpeirson a PR would be great! And thanks :)

@ghost

ghost commented May 9, 2018

Having the same issue here myself. I have a staff.schema file with a "$ref": "file:occupation.schema" reference and within the occupation.schema file I also have a "$ref": "occupation_template.json" reference.

I get the error jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/organisation_template.schema'> when trying to validate.

@sidscry

sidscry commented Jul 20, 2018

How about always using relative paths with respect to the root folder of the project?

grandparent_json has a reference to parent_json ("$ref" : "parent_json")
parent_json has a reference to child_json ("$ref" : "child_json")
All 3 JSON files can be located anywhere in the project

def jsonschema_validate(validated_json, schema_filepath):
    base_uri = <root folder>
    store = {
        "child_json" : load_json_file(os.path.join(base_uri, "path/to/child_json/relative/to/base_uri")),
        "parent_json" : load_json_file(os.path.join(base_uri, "path/to/parent_json/relative/to/base_uri")),
        "grandparent_json" : load_json_file(os.path.join(base_uri, "path/to/grandparent_json/relative/to/base_uri")),
    }
    schema_json = load_json_file(schema_filepath)
    resolver = jsonschema.RefResolver(base_uri="", referrer=None, cache_remote=False, store=store)
    try:
        jsonschema.validate(validated_json, schema_json, resolver=resolver)
        return True
    except jsonschema.SchemaError:
        raise ValueError("Invalid schema JSON")
    except jsonschema.ValidationError:
        raise ValueError("schema does not follow the definition in " + schema_filepath)
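A variation on the store idea above, as a rough sketch: instead of listing every schema by hand, walk a directory and key each loaded document by its file: URI. The build_store name is hypothetical, and keying by file: URI (rather than by the bare names used above) is an assumption of this example. The resulting dict is only the mapping itself; how it gets passed to a resolver is left as in the snippet above.

```python
import json
from pathlib import Path


def build_store(schema_dir):
    """Map the file: URI of every *.json file under schema_dir to its parsed content."""
    store = {}
    for path in sorted(Path(schema_dir).resolve().rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Key by absolute URI so lookups do not depend on the working directory.
            store[path.as_uri()] = json.load(handle)
    return store
```

Because every key is an absolute URI, a reference that resolves to one of these URIs can be answered from memory without touching the filesystem again.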

Julian added the Bug (Something doesn't work the way it should.) label Aug 13, 2018
@reece

reece commented Feb 27, 2019

The new jsonschema draft seems to make it clear that $refs should be resolved per https://tools.ietf.org/html/rfc3986#section-4.2. That is, in the absence of a scheme or base uri, these should be taken from the referring document. That implies knowing the base uri of the referring document (or passing it explicitly).

The use case I'm particularly interested in is having a path (in openapi) like $ref: "otherfile.yaml#/components/...", where otherfile.yaml is adjacent to the source file.

Julian added the Needs Simplification (An issue which is in need of simplifying the example or issue being demonstrated for diagnosis.) label Mar 24, 2020
@kratsg

kratsg commented Apr 2, 2020

(hopefully not too noisy) I made this recursive ref resolver here: https://gist.github.com/kratsg/96cec81df8c0d78ebdf14bf7b800e938

The idea was that (on some systems), one wants to recursively resolve the entire schema at once into a single local-file. So this was an implementation that relies on the referrer document to be used as the base for resolving refs located in that document.
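The approach described here (not the gist itself) might look roughly like the following standard-library sketch. It assumes all schemas are local JSON files addressed by file: URIs, that references are acyclic, and that fragments are plain JSON Pointers without percent-encoding. The function names load_document, follow_pointer and inline_refs are hypothetical.

```python
import json
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import url2pathname


def load_document(uri):
    """Load a JSON document from a file: URI (this sketch does no network access)."""
    path = url2pathname(urlsplit(uri).path)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def follow_pointer(document, fragment):
    """Follow a JSON Pointer fragment such as '/definitions/address'."""
    node = document
    tokens = fragment.split("/")[1:] if fragment else []
    for token in tokens:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_refs(node, base_uri):
    """Return a copy of node with every $ref replaced by the schema it points to."""
    if isinstance(node, list):
        return [inline_refs(item, base_uri) for item in node]
    if not isinstance(node, dict):
        return node
    if "$ref" in node:
        # Resolve against the URI of the document that contains the reference.
        target_uri, fragment = urldefrag(urljoin(base_uri, node["$ref"]))
        target = follow_pointer(load_document(target_uri), fragment)
        # References inside the target resolve against the target's own URI.
        return inline_refs(target, target_uri)
    return {key: inline_refs(value, base_uri) for key, value in node.items()}
```

A call such as inline_refs(load_document(uri), uri) would produce a single self-contained schema. A cyclic reference would recurse forever in this sketch, so a real implementation would need cycle detection.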

@Julian
Member

Julian commented Jul 26, 2022

Sorry for taking so long to get to some of these, but I'm going through a bunch of $ref related tickets, and, though this one is long and has some other things in the comments above, I believe this is an example of the same issue as in e.g. #915 (comment) or #601.

Specifically, the base URI being provided in the linked repo has no trailing slash, so relative URIs resolved against it indeed are supposed to strip the last component.

If I change the code in that repo to:

diff --git a/schemas/container.json b/schemas/container.json
index e9b1598..43b8516 100644
--- a/schemas/container.json
+++ b/schemas/container.json
@@ -3,7 +3,7 @@
     "title": "container",
     "description": "JSON schema for container",
     "type": "object",
-    "allOf": [ { "$ref": "file:schema/generic.json" } ],
+    "allOf": [ { "$ref": "file:generic.json" } ],
     "properties": {
         "type":     { "type": "string" },
         "children": {
diff --git a/schemas/project.json b/schemas/project.json
index e5c8d37..a33e4dd 100644
--- a/schemas/project.json
+++ b/schemas/project.json
@@ -3,7 +3,7 @@
     "title": "project",
     "description": "JSON schema for project",
     "type": "object",
-    "allOf": [ { "$ref": "file:schemas/container.json" } ],
+    "allOf": [ { "$ref": "file:container.json" } ],
     "properties": {
         "type":     { "type": "string", "pattern": "^project$" },
         "children": {
diff --git a/validate.py b/validate.py
index bca15c8..892b98f 100644
--- a/validate.py
+++ b/validate.py
@@ -11,11 +11,11 @@ type_hierarchy_dict = None
 # puts them in a dict, indexed by title
 for filename in [each for each in os.listdir('schemas') if each.endswith('.json')]:
     path = os.path.join('schemas', filename)
-    print "reading schema:", path
+    print("reading schema:", path)
 
     with open(path) as json_data:
         tmp = json.load(json_data)
-        if (tmp['properties'].has_key('type')):
+        if 'type' in tmp['properties']:
             schemas_dict[tmp['title']] = tmp
 
 schema_types = schemas_dict.keys()
@@ -24,8 +24,8 @@ schema_types = schemas_dict.keys()
 def validate(data):
 
     try:
-        uri = 'file://'+os.path.join(os.getcwd(), 'schemas')
-        print "uri:", uri
+        uri = 'file://'+os.path.join(os.getcwd(), 'schemas') + '/'
+        print("uri:", uri)
         r = jsonschema.RefResolver(uri, None)
         v = jsonschema.Draft4Validator(schemas_dict[data['type']], resolver=r)
         v.validate(data)

everything works as it should.

Closing this, though I'll be adding a FAQ entry since this seems to be a common mistake, but if I've missed anything (either in this comment or in others here) please feel free to open a new issue.
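The trailing-slash behavior described in this comment can be reproduced with Python's urllib.parse.urljoin alone. The /project/schemas path below is a made-up stand-in for os.getcwd() plus 'schemas', and the sketch only shows the URI arithmetic, not jsonschema's resolver.

```python
from urllib.parse import urljoin

# Hypothetical base URIs; only the trailing slash differs.
without_slash = "file:///project/schemas"
with_slash = "file:///project/schemas/"

# Without the slash, "schemas" is treated as a file name and is replaced.
resolved_without = urljoin(without_slash, "container.json")

# With the slash, "schemas/" is treated as a directory and is kept.
resolved_with = urljoin(with_slash, "container.json")

print(resolved_without)  # file:///project/container.json
print(resolved_with)     # file:///project/schemas/container.json
```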

Julian closed this as completed Jul 26, 2022
Julian added the Invalid (Not a bug, PEBKAC, or an unsupported setup) label and removed the Bug (Something doesn't work the way it should.) and Needs Simplification (An issue which is in need of simplifying the example or issue being demonstrated for diagnosis.) labels Jul 28, 2022
@kakyoism

kakyoism commented Aug 26, 2022

For some reason I'm still trapped in this one, is this already fixed in 4.14.0?

My setup

  • macOS
  • py3.9
  • jsonschema 4.14.0

JSON files and schemas are all in the same folder.

Code

import jsonschema as jsch

schema = load_json('/path/to/root.schema.json')

jsch.validate(instance=ins, schema=schema)

root.schema.json refers to target.schema.json through two layers of references.

The first layer: root.schema.json

...
"external": {
      "anyOf": [
        {
          "$ref": "file:mid1.schema.json"
        },
        {
          "$ref": "file:mid2.schema.json"
        }
      ]
    }
...

The second layer: mid1.schema.json

...
"external": {
      "anyOf": [
        {
          "$ref": "file:target.schema.json"
        },
        {
          "$ref": "file:else.schema.json"
        }
      ]
    }
...

Expected

There should be no missing references.

Got

E     jsonschema.exceptions.RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/target.schema.json'>

I've also tried running the jsonschema CLI program from various working directories, including the folder that has all the .json files. Still the same error.

This error looks like jsonschema found the immediate references but not the second-order indirect references.
Where am I wrong?
