-
-
Notifications
You must be signed in to change notification settings - Fork 329
Coerce data to text for JSON parsing #429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coerce data to text for JSON parsing #429
Conversation
To simplify the branching required for Python 2/3 compatibility. Rename `ensure_str` to `ensure_text_type` and rework the code to coerce data that is `bytes` or `bytes`-like to `bytes` and then to text data. It appears JSON on Python 2 or Python 3 handles this just fine. So should make handling these two cases a bit more straightforward.
`MongoDBStore` inherited the behavior on `pymongo` with respect to returning `bson.Binary` for blob values on Python 2. As this caused some issues on Python 2 when parsing JSON content (as the parser was unable) to work with objects that were not `bytes` type (i.e. `bson.Binary`), a workaround was needed to coerce `bson.Binary` to `bytes` on Python 2. It's worth noting that this workaround is not needed for loading binary data from chunks as we use the buffer protocol there. As we have now fixed our handling of JSON data to coerce data to text on Python 2/3 and leverage the buffer protocol in the effort, we no longer need this workaround in `MongoDBStore`. Hence we go ahead and drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
value = binary_type(value) | ||
|
||
return value | ||
return doc[self._value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes me very happy to see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise. 🙂
FWIW it turns out this is not Python 2 specific. We just only handled decoding before parsing JSON on Python 3 (hence avoiding the issue there). With this change we just always decode to text before parsing JSON. Here's a short reproducer.
>>> import json
>>> json.loads(b"{}")
{}
>>> json.loads(b"{\x00}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jkirkham/miniconda/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/jkirkham/miniconda/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Much as we have a helper function for writing JSON, this adds a helper function for loading JSON. Mainly it ensure data is coerced to text before handing it off to the JSON parser. Should simplify code that is loading JSON.
Changes other library code to use `json_loads` for handling text encoding and JSON parsing. Should simplify things a bit and avoid having some errors sneak in.
Planning on merging end of day Friday if no comments. |
Maybe this is completely unrelated, but since you touched the JSON code anyway, is there a chance that #412 gets fixed during the process? |
It’s unrelated. Though I agree it’s important to fix. Let’s discuss after. |
Thanks all 😄 |
Cleans up some Python 2/3 code for handling JSON parsing by simply always coercing metadata to text regardless of Python version.
xref: #372
xref: #401
TODO:
tox -e docs
)