bpo-30193: Allow to load buffer objects with json.loads() #1334

Closed
wants to merge 1 commit into from
25 changes: 15 additions & 10 deletions Lib/json/__init__.py
@@ -242,12 +242,13 @@ def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,


 def detect_encoding(b):
-    bstartswith = b.startswith
-    if bstartswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
-        return 'utf-32'
-    if bstartswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
-        return 'utf-16'
-    if bstartswith(codecs.BOM_UTF8):
+    for prefix in (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE):
Contributor

Why was this change made?

Contributor Author

memoryview does not provide a "startswith" method.

Contributor

@DimitrisJim, Apr 27, 2017

D'oh, completely missed that, it's late here :) Anyway, support for bytes was added in https://bugs.python.org/issue17909 by Serhiy so it might make sense to ping him.
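
The point about "startswith" can be checked directly. The snippet below is a minimal sketch, not part of the patch; the variable names are illustrative. It shows that a memoryview lacks the method while a slice comparison against a BOM constant still works for a byte-format view.

```python
import codecs

# A byte-format buffer that begins with the UTF-8 BOM.
data = memoryview(b'\xef\xbb\xbf[1, 2, 3]')

# Unlike bytes and bytearray, memoryview has no startswith() method.
has_startswith = hasattr(data, 'startswith')

# Slicing yields another memoryview; comparing it with a bytes object
# compares the underlying items, so this works for format 'B' views.
prefix = codecs.BOM_UTF8
starts_with_bom = data[:len(prefix)] == prefix

print(has_startswith, starts_with_bom)
```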

+        if b[:len(prefix)] == prefix:
Member

This doesn't work for e.g. memoryview(array.array('u', '\ufeff[1,2,3]')).
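
A related illustration of why slice comparison is fragile for views whose items are wider than one byte. This is a sketch under an assumption: the typecode 'H' (unsigned 16-bit) is swapped in for the deprecated 'u' typecode, so the exact failure mode for 'u' may differ. The underlying issue shown here is that slicing counts items rather than bytes.

```python
import array
import codecs

# Assumption: 'H' stands in for the deprecated 'u' typecode from the comment.
units = array.array('H', [0xFEFF, ord('[')])
view = memoryview(units)

# The raw bytes do start with a UTF-16 BOM (which one depends on endianness).
raw = view.tobytes()
bom_in_raw_bytes = raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The slice below covers len(prefix) *items*, not len(prefix) bytes,
# and the items are compared by value, so the comparison fails.
prefix = codecs.BOM_UTF16_LE
sliced = view[:len(prefix)]
slice_matches = sliced == prefix

print(bom_in_raw_bytes, slice_matches, sliced.nbytes)
```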

Contributor Author

The purpose of this change is to allow the use of C extension classes that support the buffer protocol.

I can remove memoryview from the allowed types and fail with a TypeError if encoding detection fails.

A proper solution would be to write a memoryview-like object that supports only the "b|B" formats.

Contributor Author

Any conclusion on this ticket?

Is it reasonable to require unicode arrays for this patch? They are deprecated after all.

Member

I suggest starting simple and raising a TypeError if it's a memoryview with view.format != 'B'.
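
The suggestion above could look roughly like the following. This is a minimal sketch, not the code that was merged anywhere: `loads_buffer` is a hypothetical helper name, and it copies the buffer with `tobytes()` before handing it to `json.loads()`.

```python
import array
import json


def loads_buffer(obj):
    """Hypothetical helper (not part of the json module): parse JSON from
    any object exposing a buffer of unsigned bytes."""
    # memoryview() raises TypeError if obj lacks the buffer protocol.
    view = memoryview(obj)
    if view.format != 'B':
        raise TypeError(
            "expected a buffer of unsigned bytes (format 'B'), "
            "got format {!r}".format(view.format))
    # tobytes() copies the data; json.loads() accepts bytes since Python 3.6.
    return json.loads(view.tobytes())


buf = array.array('B', b'{"key": "val"}')
print(loads_buffer(buf))
```

Copying keeps the sketch simple; avoiding the copy would require encoding detection that works on the view itself, which is what the discussion above is about.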

+            return 'utf-32'
+    for prefix in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
+        if b[:len(prefix)] == prefix:
+            return 'utf-16'
+    if b[:len(codecs.BOM_UTF8)] == codecs.BOM_UTF8:
         return 'utf-8-sig'
 
     if len(b) >= 4:
@@ -343,10 +344,14 @@ def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
             raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                   s, 0)
     else:
-        if not isinstance(s, (bytes, bytearray)):
-            raise TypeError('the JSON object must be str, bytes or bytearray, '
-                            'not {!r}'.format(s.__class__.__name__))
-        s = s.decode(detect_encoding(s), 'surrogatepass')
+        if not isinstance(s, (bytes, bytearray, memoryview)):
+            try:
+                s = memoryview(s)
+            except TypeError:
+                raise TypeError('the JSON object must be str, bytes or '
+                                'bytearray or memoryview compatible object, '
+                                'not {!r}'.format(s.__class__.__name__))
+        s = str(s, detect_encoding(s), 'surrogatepass')

     if (cls is None and object_hook is None and
             parse_int is None and parse_float is None and
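
The switch from `s.decode(...)` to `str(s, ...)` in the hunk above matters because a memoryview has no `decode()` method, whereas `str()` with an explicit encoding accepts bytes-like objects. A small sketch of that behaviour, with illustrative variable names and a fixed 'utf-8' encoding in place of `detect_encoding()`:

```python
data = memoryview(b'{"key": "val"}')

# memoryview does not offer decode(), unlike bytes and bytearray.
has_decode = hasattr(data, 'decode')

# str() with an encoding argument decodes any bytes-like object.
text = str(data, 'utf-8', 'surrogatepass')

print(has_decode, text)
```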
10 changes: 10 additions & 0 deletions Lib/test/test_json/test_decode.py
@@ -1,3 +1,4 @@
+import array
 import decimal
 from io import StringIO
 from collections import OrderedDict
@@ -20,6 +21,15 @@ def test_empty_objects(self):
         self.assertEqual(self.loads('[]'), [])
         self.assertEqual(self.loads('""'), "")
 
+    def test_memoryview(self):
+        data = memoryview(b'{"key": "val"}')
+        self.assertEqual(self.loads(data), {"key": "val"})
+
+    def test_buffer(self):
+        data = array.array('B')
+        data.frombytes(b'{"key": "val"}')
+        self.assertEqual(self.loads(data), {"key": "val"})
+
     def test_object_pairs_hook(self):
         s = '{"xkd":1, "kcw":2, "art":3, "hxm":4, "qrt":5, "pad":6, "hoy":7}'
         p = [("xkd", 1), ("kcw", 2), ("art", 3), ("hxm", 4),
2 changes: 2 additions & 0 deletions Misc/NEWS
@@ -317,6 +317,8 @@ Extension Modules
 Library
 -------
 
+- bpo-30193: Allow to load buffer objects with ``json.loads()``
Member

You should add a NEWS entry using the blurb tool, and revert this change.

Looks like that is being done in @flavianh's follow-up PR ( #14977 ).


- bpo-30101: Add support for curses.A_ITALIC.

- bpo-29822: inspect.isabstract() now works during __init_subclass__. Patch