
bpo-27413: add --no-ensure-ascii argument to json.tool #201


Closed
wants to merge 13 commits

Conversation

dhimmel
Contributor

@dhimmel dhimmel commented Feb 20, 2017

Work in progress, pending consensus on whether feature should be added

This pull request increases the utility of python -m json.tool by adding support for setting ensure_ascii and indent in the json.dump call.

Happy to also address any other arguments that need updating. Or other issues with json.tool.

@dhimmel
Contributor Author

dhimmel commented Feb 20, 2017

Here's a related issue from the issue tracker: Add an option to json.tool to bypass non-ASCII characters. This is my first time contributing to Python, so let me know if there's anything I need to do related to the issue tracker.

Lib/json/tool.py Outdated
@@ -25,24 +37,36 @@ def main():
help='a JSON file to be validated or pretty-printed')
parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
help='write the output of infile to outfile')
parser.add_argument('--no_escape', action='store_true', default=False,
help='Do not set ensure_ascii to escape non-ASCII characters')
Member

This option name should be --ensure_ascii (and the default should be True).

Contributor Author

@methane are you suggesting:

parser.add_argument('--ensure_ascii', action='store_false', default=True)
# And then when calling `json.dump`:
ensure_ascii=options.ensure_ascii

The problem here is that it's counterintuitive for --ensure_ascii to result in ensure_ascii=False. In https://bugs.python.org/issue27413, the discussion settled on --no-ensure-ascii, which I'm happy to switch to:

parser.add_argument('--no-ensure-ascii', action='store_true', default=False)
# And then when calling `json.dump`:
ensure_ascii=not options.no_ensure_ascii

Just let me know what's preferred.
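For illustration, a runnable sketch of the second alternative (editor's sketch of the parser wiring only, not the PR's actual tool.py code):

```python
# Editor's sketch: a --no-ensure-ascii flag whose value is inverted before
# being passed to json.dumps.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument('--no-ensure-ascii', action='store_true', default=False)

options = parser.parse_args(['--no-ensure-ascii'])
# argparse maps '--no-ensure-ascii' to the attribute 'no_ensure_ascii'.
print(json.dumps('\u03b4', ensure_ascii=not options.no_ensure_ascii))  # "δ"
```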

Lib/json/tool.py Outdated
if indent == 'None':
return None
if indent == r'\t':
return '\t'
Member

I don't like such special casing.
I think json.tool should be minimal, just exposing json.dumps as a CLI.

Contributor Author

@methane can you think of any other way to expose setting indent? The issue is that indent can take an int, string, or None. Command line arguments are not ideal for type flexibility. Furthermore, the most common string is a tab, which is difficult to pass via the command line without such special casing.

Another option would be to pass options.indent through ast.literal_eval. Therefore, users could specify any python value, but the command line usage would be more cumbersome.
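That ast.literal_eval approach could look like this (parse_indent is a hypothetical helper, not code from the PR):

```python
# Hypothetical helper: accept any Python literal for --indent, so a user
# could pass 4, None, or a quoted string such as '\t' on the command line.
import ast
import json

def parse_indent(value):
    # '4' -> 4, 'None' -> None, "'\\t'" -> '\t'
    return ast.literal_eval(value)

print(json.dumps([1, 2], indent=parse_indent('None')))  # [1, 2]
```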

Member

I don't like this either. You can use external commands like unexpand for converting spaces to tabs.
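As a concrete illustration of that suggestion (assuming GNU unexpand from coreutils is available):

```shell
# Pretty-print with json.tool's default 4-space indent, then convert the
# leading spaces to tabs with unexpand (--first-only touches only leading
# whitespace; -t 4 sets the tab stops to match the indent width).
echo '[1]' | python3 -m json.tool | unexpand --first-only -t 4
```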

Contributor Author

Cross-referencing bpo-29636.

@serhiy-storchaka so do you think it's best to simplify so that --indent must be passed either an int or None? Anything besides None that cannot be converted to an int would raise an error?

Member

Just pass an integer with --indent. Add --no-indent for None.

@methane
Member

methane commented Feb 21, 2017

Sorry, please file an issue on bugs.python.org.
We use github only for code review, not for design discussion.

@dhimmel
Contributor Author

dhimmel commented Feb 21, 2017

We use github only for code review, not for design discussion.

@methane no problem. I'll open an issue on bugs.python.org. Give me a couple days... currently traveling.

@dhimmel
Contributor Author

dhimmel commented Feb 23, 2017

please file an issue on bugs.python.org.

@methane I opened bpo-29636 for design discussion of --indent. In addition, this PR would close the existing issue bpo-27413 regarding the --no-ensure-ascii argument.

rc, out, err = assert_python_ok('-m', 'json.tool', '--sort-keys', infile)
self.assertEqual(rc, 0)
self.assertEqual(out.splitlines(),
self.expect_without_sort_keys.encode().splitlines())
self.assertEqual(err, b'')

def test_no_ascii_flag(self):
Member

test_no_ensure_ascii_flag

Contributor Author

Done in a3e2b23.

infile = support.TESTFN
with open(infile, "w") as fp:
self.addCleanup(os.remove, infile)
fp.write(self.data)
fp.write(text)
Member

Perhaps it would be better just to add non-ascii data to self.data.

Contributor Author

Done in a3e2b23.

}
''')
infile = self._create_infile(data)
rc, out, err = assert_python_ok('-m', 'json.tool', '--no-ensure-ascii', infile)
Member

I'm afraid this fails on a non-UTF-8 locale.


Does this still fail with Python 3.7? The docs at https://docs.python.org/3.7/library/json.html no longer say that --ensure-ascii can cause problems, while you can still find that note in e.g. https://docs.python.org/2.7/library/json.html

Contributor Author

Haven't tested, but I assume so. The issue is that Python uses the system encoding, which cannot represent certain Unicode characters. I could imagine a few possible behaviors:

  1. json.tool fails when trying to output a character that cannot be represented by the output encoding
  2. json.tool always outputs utf-8
  3. json.tool attempts to output using the system's encoding. On failure, the exception is handled and the procedure is repeated with --no-ensure-ascii disabled.
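Behavior 3 could be sketched like this (dump_with_fallback is a hypothetical helper for illustration, not part of the PR):

```python
# Hypothetical sketch of behavior 3: try unescaped output first, and fall
# back to ASCII-escaped output if the stream's encoding can't represent a
# character.
import io
import json

def dump_with_fallback(obj, fp, **kwargs):
    try:
        fp.write(json.dumps(obj, ensure_ascii=False, **kwargs))
    except UnicodeEncodeError:
        # The stream's encoding (e.g. Latin-1) lacks this character:
        # escape everything instead.
        fp.write(json.dumps(obj, ensure_ascii=True, **kwargs))

# A Latin-1 stream cannot encode the euro sign, so the fallback kicks in.
stream = io.TextIOWrapper(io.BytesIO(), encoding='latin-1')
dump_with_fallback(["\u20ac"], stream)
stream.seek(0)
print(stream.read())  # ["\u20ac"]
```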

Member

@serhiy-storchaka left a comment

The main problem is outputting characters on a locale that doesn't support them.

@@ -11,7 +11,7 @@ class TestTool(unittest.TestCase):
data = """

[["blorpie"],[ "whoops" ] , [
],\t"d-shtaeou",\r"d-nthiouh",
],\t"d-shtaeou",\r"🐍 and δ",
Member

It would be better to avoid using non-ASCII characters in sources; use escape sequences. Non-ASCII characters are not always displayed correctly (for example, I see just a rectangle instead of the first character) and it is not clear which range they come from. Use characters from three different ranges: U+0080-U+00FF, U+0100-U+FFFF and U+10000-U+10FFFF. They are encoded differently in Python and JSON.

Contributor Author

Okay, in d699070 I use the following Unicode characters, which should cover the three ranges:

  • δ U+03B4 \u03B4
  • 🐍 U+1F40D \N{snake}
  • 𝀷 U+1D037 \U0001D037

Member

Something in the range U+0080-U+00FF is still needed. In Python string literals those are encoded with \x, but JSON always uses \u.
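The distinction can be seen in a short snippet (editor's illustration):

```python
# Python's ascii() escapes U+0080-U+00FF with \x, while JSON only has \u
# escapes, so the same character is written differently in the two notations.
import json

print(ascii('\xa7'))       # '\xa7'
print(json.dumps('\xa7'))  # "\u00a7"
```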


Contributor Author

For reference (in Python 3.6):

>>> print("\xA7 \N{snake} \u03B4 and \U0001D037")
§ 🐍 δ and 𝀷
>>> json.dumps("\xA7 \N{snake} \u03B4 and \U0001D037")
'"\\u00a7 \\ud83d\\udc0d \\u03b4 and \\ud834\\udc37"'

Lib/json/tool.py Outdated
help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
help='Indent level for pretty-printing.')
Member

Document the default value.

Contributor Author

Done in 66a173a.

Lib/json/tool.py Outdated
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
help='Indent level for pretty-printing.')
group.add_argument('--no-indent', action='store_true', default=False,
Member

Maybe use action='store_const', dest='indent', const=None?
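The suggested pattern as a runnable sketch (editor's illustration of the argparse wiring, not the PR's code):

```python
# --no-indent stores None into the same 'indent' destination, so downstream
# code reads a single options.indent value instead of two flags.
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int)
group.add_argument('--no-indent', action='store_const', dest='indent',
                   const=None)

print(parser.parse_args([]).indent)               # 4
print(parser.parse_args(['--no-indent']).indent)  # None
```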

Contributor Author

@dhimmel Feb 26, 2017

Nice idea. Implemented in 0f38c18.

As an aside, argparse in Python 3.6 seems to have a bug where mutual exclusivity is not checked if you don't change the default value. So for example:

# error: argument --no-indent: not allowed with argument --indent
echo "[1,2,3]" | python Lib/json/tool.py --indent=5 --no-indent

# No error, compact mode
echo "[1,2,3]" | python Lib/json/tool.py --indent=4 --no-indent

# No error, indented
echo "[1,2,3]" | python Lib/json/tool.py --no-indent --indent=4

Something to keep in mind... assuming this will be fixed.

parser.add_argument('--no-ensure-ascii', action='store_true', default=False,
help='Do not set ensure_ascii to escape non-ASCII characters')
group = parser.add_mutually_exclusive_group()
group.add_argument('--indent', default=4, type=int,
Member

Tests are needed for --indent and --no-indent.

@dhimmel force-pushed the json_tool branch 2 times, most recently from e8a7cee to da24e13 on February 26, 2017
@dhimmel
Contributor Author

dhimmel commented Feb 27, 2017

@serhiy-storchaka and @methane, thanks for the review thus far. It's been very helpful.

Builds are now passing.

The main problem -- output characters on the locale that doesn't support them.

I'm not sure whether I've addressed this. Could you be a little more specific on the locale issues? Is this just something that affects the tests?

@serhiy-storchaka
Member

serhiy-storchaka commented Feb 27, 2017

test_ensure_ascii fails with a non-UTF-8 encoding.

$ LC_ALL=en_US.iso88591 ./python -m test.regrtest -v -m test_ensure_ascii test_json
...
======================================================================
FAIL: test_ensure_ascii (test.test_json.test_tool.TestTool)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_json/test_tool.py", line 137, in test_ensure_ascii
    self.assertEqual(expect.splitlines(), json_stdout.splitlines())
AssertionError: Lists differ: [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'] != [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[39 chars]b7"']

First differing element 0:
b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"'
b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\[38 chars]0b7"'

- [b'"\\u00a7 \\ud83d\\udc0d \\u03b4 \\ud834\\udc37"']
+ [b'"\\u00c2\\u00a7 \\u00f0\\u009f\\u0090\\u008d \\u00ce\\u00b4 \\u00f0\\u009d'
+  b'\\u0080\\u00b7"']

----------------------------------------------------------------------

The other tests pass by accident: input data is encoded in UTF-8 and decoded with the locale encoding in json.tool, and output data is encoded with the locale encoding in json.tool, so you get back the initial UTF-8 representation.

Try passing escaped data that is not encodable with the locale encoding.

$ echo '["\u20ac"]' | LC_ALL=en_US.iso88591 ./python -m json.tool --no-ensure-ascii
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/serhiy/py/cpython/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/serhiy/py/cpython/Lib/json/tool.py", line 64, in <module>
    main()
  File "/home/serhiy/py/cpython/Lib/json/tool.py", line 58, in main
    sort_keys=options.sort_keys,
  File "/home/serhiy/py/cpython/Lib/json/__init__.py", line 180, in dump
    fp.write(chunk)
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 7: ordinal not in range(256)

@berkerpeksag
Member

By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.

out, err = proc.communicate(self.data.encode())
self.assertEqual(out.splitlines(), self.expect.encode().splitlines())
self.assertEqual(err, None)
self.assertEqual(err, b'')
Contributor Author

@berkerpeksag is it okay to bundle a change to an unrelated test within these PRs? Previously stderr wasn't being piped, and err would invariably be None even if the process were to fail.

Contributor Author

By the way, --no-ensure-ascii and --indent/--no-indent are two separate enhancements so there should be two pull requests.

@berkerpeksag will keep this PR for --no-ensure-ascii and open a new one for --indent/--no-indent. I think it would make sense to merge the indent PR first, as it's closer to complete.

Member

I'd say open a third pull request for this one (that way we can quickly merge the pipe change). No need to create a new issue on bugs.python.org.

Contributor Author

@berkerpeksag opened at #346

@dhimmel dhimmel changed the title Increase the utility of json.tool bpo-27413: add --no-ensure-ascii argument to json.tool Feb 27, 2017
dhimmel added a commit to dhimmel/cpython that referenced this pull request Feb 27, 2017
@dhimmel
Contributor Author

dhimmel commented Mar 10, 2017

@berkerpeksag @serhiy-storchaka @methane: I'm excited to continue work on this pull request and now have a better grasp of the locale-dependent encoding issues that I still have to deal with. However, I was hoping that we could get the following PRs merged (in the following order since they're dependent):

  1. Fix stderr bug in json.tool test #346: fixes the stderr bug in the existing json.tool test
  2. bpo-29636: Add --indent / --no-indent arguments to json.tool #345: adds indentation options to json.tool, which affects the structure of the json.tool code.
  3. Then it will be easier to address the outstanding issues with this PR.

Of course, review thoroughly; it just seems that review momentum has stalled.

berkerpeksag pushed a commit that referenced this pull request Mar 15, 2017
berkerpeksag pushed a commit to berkerpeksag/cpython that referenced this pull request Mar 15, 2017
berkerpeksag added a commit that referenced this pull request Mar 16, 2017
dhimmel added a commit to dhimmel/cpython that referenced this pull request Jul 20, 2017
@brettcannon
Member

To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request.

If/when the requested changes have been made, please leave a comment that says, I have made the requested changes; please review again. That will trigger a bot to flag this pull request as ready for a follow-up review.

akruis pushed a commit to akruis/cpython that referenced this pull request Jan 6, 2019
akruis pushed a commit to akruis/cpython that referenced this pull request Jan 12, 2019
@brettcannon brettcannon added the type-feature A feature request or enhancement label Aug 22, 2019
@txtsd

txtsd commented Dec 1, 2019

Any chance of this getting merged?

@dhimmel
Contributor Author

dhimmel commented Dec 5, 2019

Closing in favor of #17472.

@dhimmel dhimmel closed this Dec 5, 2019
jaraco pushed a commit that referenced this pull request Dec 2, 2022
Labels
awaiting changes type-feature A feature request or enhancement
8 participants