Skip to content

Treat ASCII-only literals as str when using unicode_literals #3648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

ilevkivskyi
Copy link
Member

Fixes #3619

This PR only affects code that uses from __future__ import unicode_literals.

It looks like there is a solution that will not require modifying typed_ast. Note that this will treat ASCII-only literals with an explicit prefix u'foo' as str, not unicode. This can be changed, but only by updating typed_ast (the point is that Str has an attribute has_b indicating the presence of b prefix, we will need also has_u).

I know unicode_literals are bad, but I have seen several people complaining about this recently.
@JukkaL @JelleZijlstra what do you think?

@@ -637,6 +638,9 @@ def visit_ImportFrom(self, n: ast27.ImportFrom) -> ImportBase:
n.level,
[(a.name, a.asname) for a in n.names])
self.imports.append(i)
if (n.module == '__future__' and len(n.names) == 1 and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work if I do from __future__ import absolute_import, unicode_literals? That works at runtime.

Also, I don't think the asname check is necessary. This program:

from __future__ import unicode_literals as why_would_you_do_this
print(type(''))

prints "unicode".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, good points. I will relax the check now.

@JelleZijlstra
Copy link
Member

I left some comments on the implementation. If it's not too hard to do in typed_ast, I think we should also make sure that strings with an explicit u prefix remain unicode.

I'm also not entirely sure this is a good idea, since it does introduce a difference with how these things actually work at runtime. For example, perhaps somebody is passing ['a', 'list', 'of', 'strings'] to a function that accepts List[unicode], and with this change the list will become a List[str] instead of List[unicode]. We should see if that is an issue in real-world codebases that use unicode_literals; @rowillia could you help check that?


[case testUnicodeLiteralsFutureImportAllASCII_python2]
# coding=utf-8
from __future__ import unicode_literals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for some of the other syntactic variations, like from __future__ import absolute_import, unicode_literals and from __future__ import unicode_literals as something_else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will do.

@ilevkivskyi
Copy link
Member Author

@JelleZijlstra

If it's not too hard to do in typed_ast, I think we should also make sure that strings with an explicit u prefix remain unicode.

This is basically one-two lines (plus some boilerplate, but it is auto-generated). The main problem is that this will require a typed_ast release and updating its stub. I can do this if we agree we need this change.

@ilevkivskyi
Copy link
Member Author

Are we going to push in this direction? If it is not that important, maybe better just close this?

@JukkaL JukkaL added the blocked label Jul 21, 2017
@JukkaL
Copy link
Collaborator

JukkaL commented Jul 21, 2017

This requires more analysis/experimentation to proceed. Also, I think that we'd need to handle the explicit u prefix.

@ilevkivskyi
Copy link
Member Author

Also, I think that we'd need to handle the explicit u prefix.

OK, I will then prepare a typed_ast PR (this PR can be updated however only after the release of typed_ast, otherwise tests will fail).

@ilevkivskyi
Copy link
Member Author

ilevkivskyi commented Jul 24, 2017

The corresponding typed_ast PR python/typed_ast#49

gvanrossum pushed a commit to python/typed_ast that referenced this pull request Sep 15, 2017
This will help experimenting with python/mypy#3648

Currently we preserve only ``b`` string modifier on Python 2, this is a bit arbitrary and for the above mentioned PR we need to keep ``u``. Instead of adding another special-casing I just always preserve all string modifiers in a short string ``kind`` on ``Str`` node, for example:
```python
>>> st = ast3.parse("u'hi'")
>>> st.body[0].value.kind
'u'
```
@ilevkivskyi
Copy link
Member Author

I am closing this since it seems many people don't like unicode_literals. Instead I propose to have a separate discussion about how to use the typed_ast feature we are not using: typed_ast preserves original prefix for string/bytes literals in both Python 2 and 3.

@ilevkivskyi ilevkivskyi deleted the unicode_literals branch May 24, 2018 19:18
@gvanrossum
Copy link
Member

Is that separate discussion somewhere? Because in the end we did release the typed_ast feature (with a lot of delays and some fumblings). Was it all for naught? Or is there actually some use for the kind attribute on Str?

@ilevkivskyi
Copy link
Member Author

I think there was an attempt to use Str.kind for better bytes vs unicode story (in the typing repo IIRC). In mypy it is only currently used for literal types (this is why this feature was released).

@gvanrossum
Copy link
Member

Oh, that's a good enough use case for me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Treat ascii-only literals as str, not unicode when using unicode_literals
4 participants