String date format check allows unexpected formats with Python 3.11 and later #1056

Closed
jvtm opened this issue Mar 15, 2023 · 4 comments · Fixed by #1076
Labels
Bug Something doesn't work the way it should.

Comments

@jvtm
Contributor

jvtm commented Mar 15, 2023

It looks like the date format check uses datetime.date.fromisoformat(). Its behavior changed in Python 3.11, which now also accepts strings such as 2023-W01 and 20230315.

Running this with Python 3.11.x and having jsonschema[format] installed:

#!/usr/bin/env python
"""
Show jsonschema `date` string format check behavior.

Looks like jsonschema uses `datetime.date.fromisoformat()` for the validation.
Python 3.11 changed `.fromisoformat()` behavior so that it accepts more
variants than just `YYYY-MM-DD`.

https://docs.python.org/3/library/datetime.html#datetime.date.fromisoformat
> Changed in version 3.11: Previously, this method only supported the format YYYY-MM-DD.

https://json-schema.org/understanding-json-schema/reference/string.html#dates-and-times

https://www.rfc-editor.org/rfc/rfc3339#section-5.6

"""
import sys

import jsonschema

schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "string",
    "format": "date",
}

values = [
    ("2023-03-15", True),
    ("2023-02-29", False),
    ("2023-W01", False),
    ("2023-001", False),
    ("20230315", False),
    ("1969-12-31", True),
    ("2038-01-20", True),
]

print("Python", sys.version.split()[0])

cls = jsonschema.validators.validator_for(schema)
cls.check_schema(schema)
validator = cls(schema, format_checker=cls.FORMAT_CHECKER)
print(validator)

for value, expected in values:
    is_valid = validator.is_valid(value)
    print(f" {value=} {is_valid=} {expected=}")

results in this:

3.11.1
Draft202012Validator(schema={'$schema': 'https://json...020-12/schema', 'format': 'date', 'type': 'string'}, format_checker=<FormatChecker checkers=['date', 'date-time', 'duration', 'email', 'hostname', 'idn-email', 'idn-hostname', 'ipv4', 'ipv6', 'iri', 'iri-reference', 'json-pointer', 'regex', 'relative-json-pointer', 'time', 'uri', 'uri-reference', 'uri-template', 'uuid']>)
value='2023-03-15' is_valid=True expected=True
value='2023-02-29' is_valid=False expected=False
value='2023-W01' is_valid=True expected=False
value='2023-001' is_valid=False expected=False
value='20230315' is_valid=True expected=False
value='1969-12-31' is_valid=True expected=True
value='2038-01-20' is_valid=True expected=True

From a quick look at the code, the easiest fix seems to be adding a regular expression match (^\d{4}-\d{2}-\d{2}$) in front of the existing fromisoformat() call.
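A minimal sketch of that idea, for illustration only. The helper name `looks_like_rfc3339_date` and the use of `re.ASCII` are assumptions made here, not jsonschema's actual code. The regex restricts the shape to `YYYY-MM-DD`, and `fromisoformat()` is kept to reject impossible calendar dates:

```python
import datetime
import re

# Hypothetical helper, not taken from jsonschema itself.
# re.ASCII makes \d match only 0-9; without it, \d also matches
# non-ASCII Unicode digits.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def looks_like_rfc3339_date(value: str) -> bool:
    """Return True only for YYYY-MM-DD strings naming a real calendar date."""
    # Shape check first: rejects 2023-W01, 2023-001, 20230315, etc.
    if not _DATE_RE.fullmatch(value):
        return False
    # Calendar check: rejects dates such as 2023-02-29.
    try:
        datetime.date.fromisoformat(value)
    except ValueError:
        return False
    return True


for candidate in ["2023-03-15", "2023-02-29", "2023-W01", "2023-001", "20230315"]:
    print(candidate, looks_like_rfc3339_date(candidate))
```

Because the shape check runs before `fromisoformat()`, the result should not depend on which Python version is doing the parsing.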

See

@Julian
Member

Julian commented Mar 15, 2023

Thanks! The proposed change sounds reasonable (to use a regex) -- but the first step as it sounds like you found is to add some (failing) test cases upstream to the test suite. Any chance you're up for doing so?

@Julian Julian added the Bug Something doesn't work the way it should. label Mar 15, 2023
@jvtm
Contributor Author

jvtm commented Mar 15, 2023

> Thanks! The proposed change sounds reasonable (to use a regex) -- but the first step as it sounds like you found is to add some (failing) test cases upstream to the test suite. Any chance you're up for doing so?

I might have some spare time during the weekend, but no promises...

@jvtm
Contributor Author

jvtm commented Mar 26, 2023

Some progress: after some tox 4.x oddities (a bug in envlist command line option handling?), I managed to run the tests locally for only certain Python versions.

I also spotted that datetime.time.fromisoformat() parsing has been extended in Python 3.11, but AFAICT the jsonschema code is not affected by that. It's a bit annoying anyway, and it might make sense to add the additional styles as negative cases to the time format tests.
https://docs.python.org/3.11/library/datetime.html#datetime.time.fromisoformat
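As a rough illustration of what such negative cases could look like, here is a sketch that checks only the RFC 3339 `full-time` shape with a regex. The helper name `has_rfc3339_time_shape` is hypothetical, and the candidate strings are modeled on the styles shown in the Python 3.11 `time.fromisoformat()` documentation (leading `T`, compact form, comma as fraction separator). This is not jsonschema's time checker:

```python
import re

# Hypothetical shape-only check for RFC 3339 "full-time":
# HH:MM:SS, optional .fraction, then Z or a +HH:MM / -HH:MM offset.
# It does not validate value ranges (e.g. hour < 24).
_TIME_RE = re.compile(
    r"\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:[Zz]|[+-]\d{2}:\d{2})",
    re.ASCII,
)


def has_rfc3339_time_shape(value: str) -> bool:
    return _TIME_RE.fullmatch(value) is not None


# Candidate negative cases: ISO 8601 styles that are not RFC 3339 full-time.
negative_candidates = ["T04:23:01Z", "042301Z", "04:23:01,000Z", "04:23Z"]
positive_candidates = ["04:23:01Z", "04:23:01.000384+04:00"]

for text in negative_candidates + positive_candidates:
    print(text, has_rfc3339_time_shape(text))
```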

Hopefully I'll have some more time in a few days.

jvtm added a commit to jvtm/jsonschema that referenced this issue Mar 28, 2023
Python 3.11 and later allow additional ISO8601 formats in `datetime` module
ISO8601 parsing. These new formats are not RFC3339 section 5.6 compliant.

Especially `datetime.date.fromisoformat()` now allows strings like:
 * `20230328` (2023-03-28)
 * `2022W527` (2023-01-01)
 * `2023-W01` (2023-01-02)
 * `2023-W13-2` (2023-03-28)

Fix by doing a regular expression check before passing the value to `datetime`
module. This made the original `.isascii()` check unnecessary.

See:
 * https://docs.python.org/3/whatsnew/3.11.html#datetime
 * python/cpython@1303f8c927
 * https://docs.python.org/3.11/library/datetime.html#datetime.date.fromisoformat
 * https://www.rfc-editor.org/rfc/rfc3339#section-5.6

Tests covering the invalid values to be sent to json-schema-org/JSON-Schema-Test-Suite

Fixes python-jsonschema#1056.
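
The calendar dates listed next to the week-based strings in the commit message can be cross-checked without relying on the Python 3.11 parser. The sketch below uses `datetime.date.fromisocalendar()` (available since Python 3.8); the `examples` mapping is written by hand here, assuming Monday when the weekday is omitted:

```python
import datetime

# ISO week-date strings from the commit message, mapped by hand to
# (ISO year, ISO week, ISO weekday). Weekday 1 is Monday, 7 is Sunday.
examples = {
    "2022W527": (2022, 52, 7),
    "2023-W01": (2023, 1, 1),  # weekday omitted, Monday assumed
    "2023-W13-2": (2023, 13, 2),
}

resolved = {
    text: datetime.date.fromisocalendar(*parts).isoformat()
    for text, parts in examples.items()
}

for text, iso_date in resolved.items():
    print(f"{text} -> {iso_date}")
```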