Skip to content

Coerce data to text for JSON parsing #429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 19, 2019
Merged

Coerce data to text for JSON parsing #429

merged 11 commits into from
Apr 19, 2019

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Apr 15, 2019

Cleans up some Python 2/3 code for handling JSON parsing by simply always coercing metadata to text regardless of Python version.

xref: #372
xref: #401

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • Docs build locally (e.g., run tox -e docs)
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

To simplify the branching required for Python 2/3 compatibility.
Rename `ensure_str` to `ensure_text_type` and rework the code to
coerce data that is `bytes` or `bytes`-like to `bytes` and then to
text data. It appears JSON on Python 2 or Python 3 handles this just
fine. So should make handling these two cases a bit more
straightforward.
`MongoDBStore` inherited the behavior on `pymongo` with respect to
returning `bson.Binary` for blob values on Python 2. As this caused
some issues on Python 2 when parsing JSON content (as the parser was
unable) to work with objects that were not `bytes` type (i.e.
`bson.Binary`), a workaround was needed to coerce `bson.Binary` to
`bytes` on Python 2. It's worth noting that this workaround is not
needed for loading binary data from chunks as we use the buffer
protocol there.

As we have now fixed our handling of JSON data to coerce data to text
on Python 2/3 and leverage the buffer protocol in the effort, we no
longer need this workaround in `MongoDBStore`. Hence we go ahead and
drop it.
@jakirkham jakirkham requested review from alimanfoo and jhamman April 15, 2019 05:07
Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

value = binary_type(value)

return value
return doc[self._value]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes me very happy to see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise. 🙂

FWIW it turns out this is not Python 2 specific. We just only handled decoding before parsing JSON on Python 3 (hence avoiding the issue there). With this change we just always decode to text before parsing JSON. Here's a short reproducer.

>>> import json
>>> json.loads(b"{}")
{}
>>> json.loads(b"{\x00}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jkirkham/miniconda/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Much as we have a helper function for writing JSON, this adds a
helper function for loading JSON. Mainly it ensure data is coerced to
text before handing it off to the JSON parser. Should simplify code
that is loading JSON.
Changes other library code to use `json_loads` for handling text
encoding and JSON parsing. Should simplify things a bit and avoid
having some errors sneak in.
@jakirkham
Copy link
Member Author

Planning on merging end of day Friday if no comments.

@meggart
Copy link
Member

meggart commented Apr 17, 2019

Maybe this is completely unrelated, but since you touched the JSON code anyway, is there a chance that #412 gets fixed during the process?

@jakirkham
Copy link
Member Author

It’s unrelated. Though I agree it’s important to fix. Let’s discuss after.

@jakirkham jakirkham merged commit a7546b7 into zarr-developers:master Apr 19, 2019
@jakirkham jakirkham deleted the add_ensure_text_type branch April 19, 2019 23:07
@jakirkham
Copy link
Member Author

Thanks all 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants