Coerce data to text for JSON parsing #429

jakirkham · 2019-04-15T03:42:32Z

Cleans up some Python 2/3 code for handling JSON parsing by simply always coercing metadata to text regardless of Python version.

xref: #372
xref: #401

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/tutorial.rst
Changes documented in docs/release.rst
Docs build locally (e.g., run tox -e docs)
AppVeyor and Travis CI passes
Test coverage is 100% (Coveralls passes)

To simplify the branching required for Python 2/3 compatibility, rename `ensure_str` to `ensure_text_type` and rework the code to coerce data that is `bytes` or `bytes`-like to `bytes` and then to text. JSON parsing appears to handle text just fine on both Python 2 and Python 3, so this should make handling the two cases a bit more straightforward.
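As a rough illustration of the coercion described above, here is a minimal Python 3 only sketch. It is not the code from this PR: the body, the `encoding` parameter, and its `"utf-8"` default are assumptions made for the example, and only the name `ensure_text_type` comes from the description.

```python
def ensure_text_type(data, encoding="utf-8"):
    """Return `data` as text (hypothetical sketch, not zarr's implementation).

    Text passes through unchanged. Anything else is treated as bytes-like:
    it is read through the buffer protocol, copied to `bytes`, and decoded.
    """
    if isinstance(data, str):
        return data
    # memoryview accepts any object that exposes the buffer protocol
    # (bytes, bytearray, memoryview, array.array, ...).
    raw = memoryview(data).tobytes()
    return raw.decode(encoding)


if __name__ == "__main__":
    for value in (b"{}", bytearray(b"{}"), memoryview(b"{}"), "{}"):
        print(type(value).__name__, "->", repr(ensure_text_type(value)))
```

Going through the buffer protocol means subclasses and wrappers of `bytes` need no special casing, which is the property the description relies on.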

`MongoDBStore` inherited the behavior of `pymongo` with respect to returning `bson.Binary` for blob values on Python 2. This caused some issues on Python 2 when parsing JSON content, as the parser was unable to work with objects that were not of `bytes` type (i.e. `bson.Binary`), so a workaround was needed to coerce `bson.Binary` to `bytes` on Python 2. It's worth noting that this workaround is not needed for loading binary data from chunks, as we use the buffer protocol there. As we have now fixed our handling of JSON data to coerce data to text on Python 2/3 and leverage the buffer protocol in the effort, we no longer need this workaround in `MongoDBStore`. Hence we go ahead and drop it.

jhamman

LGTM!

jhamman · 2019-04-15T06:26:14Z

zarr/storage.py

-                value = binary_type(value)
-
-            return value
+            return doc[self._value]


this makes me very happy to see

Likewise. 🙂

FWIW it turns out this is not Python 2 specific. We just only handled decoding before parsing JSON on Python 3 (hence avoiding the issue there). With this change we just always decode to text before parsing JSON. Here's a short reproducer.

>>> import json
>>> json.loads(b"{}")
{}
>>> json.loads(b"{\x00}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jkirkham/miniconda/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Much as we have a helper function for writing JSON, this adds a helper function for loading JSON. Mainly it ensures data is coerced to text before handing it off to the JSON parser. This should simplify code that is loading JSON.

Changes other library code to use `json_loads` for handling text encoding and JSON parsing. Should simplify things a bit and avoid having some errors sneak in.
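For readers who want a concrete picture, a loading helper along these lines could look like the sketch below. Only the name `json_loads` comes from this PR. The body, the `encoding` parameter, and its default are assumptions for illustration (Python 3 only), not the merged implementation.

```python
import json


def json_loads(data, encoding="utf-8"):
    """Parse JSON after coercing `data` to text (hypothetical sketch)."""
    if not isinstance(data, str):
        # Read any bytes-like object through the buffer protocol,
        # then decode it so the parser only ever sees text.
        data = memoryview(data).tobytes().decode(encoding)
    return json.loads(data)


if __name__ == "__main__":
    print(json_loads(b'{"zarr_format": 2}'))
    print(json_loads(memoryview(b"[1, 2, 3]")))
```

Callers then hand over whatever a store returned without checking its type first. On Python 3, `json.loads` itself accepts `str`, `bytes`, and `bytearray` but rejects other bytes-like objects such as `memoryview`, which is one reason to decode up front.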

jakirkham · 2019-04-17T03:42:09Z

Planning on merging end of day Friday if no comments.

meggart · 2019-04-17T06:43:57Z

Maybe this is completely unrelated, but since you touched the JSON code anyway, is there a chance that #412 gets fixed during the process?

jakirkham · 2019-04-17T07:46:31Z

It’s unrelated. Though I agree it’s important to fix. Let’s discuss after.

jakirkham · 2019-04-19T23:08:05Z

Thanks all 😄

jakirkham added 8 commits April 14, 2019 23:31

Use ensure_text_type elsewhere in meta

ca34304

Rework convenience to use ensure_text_type

987f287

Use ensure_text_type in n5

5b6794c

Use ensure_text_type in attribute tests

cbd8d31

Simplify MongoDBStore's __getitem__'s return

5267895

Drop unused import of binary_type in storage

d4734a8

jakirkham requested review from alimanfoo and jhamman April 15, 2019 05:07

jhamman reviewed Apr 15, 2019

jakirkham added 3 commits April 16, 2019 19:35

Add a helper function for loading JSON

0755773

Rewrite code to use json_loads directly

5529456

Note JSON changes in release notes

bba219a

jakirkham merged commit a7546b7 into zarr-developers:master Apr 19, 2019

jakirkham deleted the add_ensure_text_type branch April 19, 2019 23:07

jakirkham mentioned this pull request Apr 20, 2019

Collect JSON utility functions #430

Merged

7 tasks

jakirkham mentioned this pull request Nov 3, 2019

Ensure subok zarr-developers/numcodecs#173

Closed

8 tasks

jhamman mentioned this pull request Nov 12, 2020

Connecting with numpytiles zarr-developers/community#37

Open
