Treat ASCII-only literals as str when using unicode_literals #3648

ilevkivskyi · 2017-07-02T19:53:19Z

This PR only affects code that uses from __future__ import unicode_literals.

It looks like there is a solution that does not require modifying typed_ast. Note that this will treat ASCII-only literals with an explicit prefix, such as u'foo', as str, not unicode. This can be changed, but only by updating typed_ast: Str has an attribute has_b indicating the presence of a b prefix, and we would also need has_u.
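As a rough illustration of the rule described above, here is a minimal sketch. The function `infer_literal_type` and its `parsed_as_unicode` input are hypothetical stand-ins (not mypy or typed_ast API) for whatever the parser reports about a literal. The sketch also reproduces the stated limitation: under unicode_literals, `u'foo'` cannot be told apart from `'foo'`.

```python
def is_ascii(text: str) -> bool:
    # True if every character is in the 7-bit ASCII range.
    return all(ord(ch) < 128 for ch in text)


def infer_literal_type(text: str, parsed_as_unicode: bool, unicode_literals: bool) -> str:
    """Hypothetical sketch of this PR's rule; returns a Python 2 type name.

    parsed_as_unicode is an assumed input: whether the parser produced a
    unicode value for the literal (true for unprefixed literals under
    unicode_literals, and for explicit u'...' literals).
    """
    if not parsed_as_unicode:
        # A byte string: b'...', or a plain literal without unicode_literals.
        return "str"
    if unicode_literals and is_ascii(text):
        # ASCII-only literal under unicode_literals: treat it as str.
        # Limitation: an explicit u'foo' looks the same here and also becomes str.
        return "str"
    return "unicode"
```

For example, `infer_literal_type("foo", True, True)` gives `"str"`, while a non-ASCII literal such as `"h\u00e9llo"` stays `"unicode"`.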

I know unicode_literals are bad, but I have seen several people complaining about this recently.
@JukkaL @JelleZijlstra what do you think?

JelleZijlstra · 2017-07-02T20:01:36Z

mypy/fastparse2.py

@@ -637,6 +638,9 @@ def visit_ImportFrom(self, n: ast27.ImportFrom) -> ImportBase:
                           n.level,
                           [(a.name, a.asname) for a in n.names])
        self.imports.append(i)
+        if (n.module == '__future__' and len(n.names) == 1 and


Will this work if I do from __future__ import absolute_import, unicode_literals? That works at runtime.

Also, I don't think the asname check is necessary. This program:

    from __future__ import unicode_literals as why_would_you_do_this
    print(type(''))

prints "unicode".
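A minimal sketch of the relaxed check, written against Python 3's stdlib `ast` module rather than typed_ast's `ast27` (an assumption made so the example runs on a current interpreter; the function name is hypothetical). It looks for the feature anywhere in the imported names and ignores any `as` alias:

```python
import ast


def uses_unicode_literals(source: str) -> bool:
    """Return True if the module contains any form of
    `from __future__ import unicode_literals` (hypothetical helper)."""
    tree = ast.parse(source)
    for node in tree.body:
        # Future imports are module-level, absolute (level 0) ImportFrom nodes.
        if isinstance(node, ast.ImportFrom) and node.module == "__future__" and node.level == 0:
            # Check every imported feature; the "as" alias does not matter.
            if any(alias.name == "unicode_literals" for alias in node.names):
                return True
    return False
```

This covers both variations raised above: `from __future__ import absolute_import, unicode_literals` and an aliased `unicode_literals as why_would_you_do_this`.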

OK, good points. I will relax the check now.

JelleZijlstra · 2017-07-02T20:23:05Z

I left some comments on the implementation. If it's not too hard to do in typed_ast, I think we should also make sure that strings with an explicit u prefix remain unicode.

I'm also not entirely sure this is a good idea, since it introduces a difference from how these things actually work at runtime. For example, perhaps somebody is passing ['a', 'list', 'of', 'strings'] to a function that accepts List[unicode], and with this change the list will become a List[str] instead of List[unicode]. We should see if that is an issue in real-world codebases that use unicode_literals; @rowillia could you help check that?

JelleZijlstra · 2017-07-02T20:23:57Z

test-data/unit/python2eval.test

+
+[case testUnicodeLiteralsFutureImportAllASCII_python2]
+# coding=utf-8
+from __future__ import unicode_literals


Could you add tests for some of the other syntactic variations, like from __future__ import absolute_import, unicode_literals and from __future__ import unicode_literals as something_else?

OK, will do.

ilevkivskyi · 2017-07-02T20:28:32Z

@JelleZijlstra

> If it's not too hard to do in typed_ast, I think we should also make sure that strings with an explicit u prefix remain unicode.

This is basically one or two lines (plus some boilerplate, but that is auto-generated). The main problem is that this will require a typed_ast release and an update to its stub. I can do this if we agree we need this change.

ilevkivskyi · 2017-07-21T11:57:21Z

Are we going to push in this direction? If it is not that important, maybe we should just close this?

JukkaL · 2017-07-21T12:58:51Z

This requires more analysis/experimentation to proceed. Also, I think that we'd need to handle the explicit u prefix.

ilevkivskyi · 2017-07-21T13:01:08Z

> Also, I think that we'd need to handle the explicit u prefix.

OK, then I will prepare a typed_ast PR (however, this PR can only be updated after typed_ast is released; otherwise the tests will fail).
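Once the literal prefix is available, handling the explicit u prefix could look roughly like this. This is a hypothetical sketch: `py2_literal_type` is not a mypy function, and it assumes the prefix arrives as a short string such as `'u'`, `'b'` or `'ur'` (lowercased here in case the original case is preserved). Unprefixed ASCII-only literals under unicode_literals still follow this PR's heuristic.

```python
def py2_literal_type(kind: str, text: str, unicode_literals: bool) -> str:
    """Hypothetical sketch: map a Python 2 literal's prefix to a type name."""
    prefix = kind.lower()  # assumed forms: "", "u", "b", "r", "ur", "br"
    if "b" in prefix:
        # b'...' is always a byte string.
        return "str"
    if "u" in prefix:
        # An explicit u prefix stays unicode, as requested in review.
        return "unicode"
    if unicode_literals and not all(ord(ch) < 128 for ch in text):
        # Unprefixed non-ASCII literal under unicode_literals.
        return "unicode"
    # Unprefixed ASCII-only literal (this PR's heuristic), or no unicode_literals.
    return "str"
```

Compared with relying on has_b alone, the only behavioral change is that `u'foo'` now stays unicode even when unicode_literals is active.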

ilevkivskyi · 2017-07-24T16:34:24Z

The corresponding typed_ast PR python/typed_ast#49

This will help with experimenting on python/mypy#3648. Currently we preserve only the ``b`` string modifier on Python 2, which is a bit arbitrary, and for the above-mentioned PR we also need to keep ``u``. Instead of adding another special case, I just always preserve all string modifiers in a short string ``kind`` on the ``Str`` node, for example:

```python
>>> st = ast3.parse("u'hi'")
>>> st.body[0].value.kind
'u'
```
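For comparison, the stdlib `ast` module in Python 3.8+ exposes a similar `kind` field on `ast.Constant`, but it only records `'u'` for u-prefixed strings and is `None` otherwise. A small sketch that lists string constants together with their `kind` (the helper name is hypothetical):

```python
import ast
from typing import List, Optional, Tuple


def string_prefix_kinds(source: str) -> List[Tuple[str, Optional[str]]]:
    """Return (value, kind) for each str constant in source (Python 3.8+)."""
    tree = ast.parse(source)
    return [
        (node.value, node.kind)
        for node in ast.walk(tree)
        # Bytes literals are skipped: their value is bytes, not str.
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
```

For `"a = u'hi'\nb = 'hi'\n"` this yields `[('hi', 'u'), ('hi', None)]`. Unlike the typed_ast `kind` described above, the stdlib field does not record `b` or `r`.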

ilevkivskyi · 2018-05-24T19:18:34Z

I am closing this since it seems many people don't like unicode_literals. Instead, I propose a separate discussion about how to use a typed_ast feature we currently don't use: typed_ast preserves the original prefix of string/bytes literals in both Python 2 and 3.

gvanrossum · 2019-01-20T00:54:54Z

Is that separate discussion somewhere? Because in the end we did release the typed_ast feature (with a lot of delays and some fumblings). Was it all for naught? Or is there actually some use for the kind attribute on Str?

ilevkivskyi · 2019-01-20T01:21:46Z

I think there was an attempt to use Str.kind for a better bytes vs. unicode story (in the typing repo, IIRC). In mypy it is currently used only for literal types (this is why this feature was released).

gvanrossum · 2019-01-20T02:10:30Z

Oh, that's a good enough use case for me!

Treat ASCII-only literals as bytes even with unicode_literals

8c343a1

JelleZijlstra reviewed Jul 2, 2017


Relax check for future import; try to fix Windows encoding problem

ff699d9

JelleZijlstra reviewed Jul 2, 2017


Add more tests as per CR

6ece9f9

JukkaL added the blocked label Jul 21, 2017

ilevkivskyi mentioned this pull request Jul 24, 2017

Preserve string kind modifiers python/typed_ast#49

Merged

Merge branch 'master' into unicode_literals

144041f

ilevkivskyi removed the blocked label Sep 17, 2017

ilevkivskyi mentioned this pull request May 13, 2018

Decide how to handle str/unicode python/typing#208

Closed

ilevkivskyi closed this May 24, 2018

ilevkivskyi deleted the unicode_literals branch May 24, 2018 19:18
