
Be able to handle str/bytes in Python 2/3 code #19


Closed
brettcannon opened this issue Oct 24, 2014 · 11 comments
Comments

@brettcannon
Member

In Python 2/3 code, what str and bytes represent can be considered somewhat muddled, depending on which interpreter you are executing under (and unicode should simply be left out). I'm not sure whether the preferred approach is to have tools like mypy assume that str means unicode in Python 2 and str in Python 3, and that bytes means Python 3 bytes, or to have typing.py provide Str and Bytes types to make it abundantly clear. Since function annotations are Python 3 syntax, I'm assuming the tools will more or less assume Python 3 semantics, but it might be good to state that expectation upfront for Python 2/3 code. Going with the former approach does diminish the usefulness for Python 2-only code, since the concept of native strings becomes hard to express. The latter solution has the annoyance of not using the built-in types.
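To make the "native string" idea concrete, here is a minimal sketch of a hypothetical helper, `to_native_str`, that returns the interpreter's native str from either text or bytes input. The helper name and the UTF-8 default are assumptions for illustration; the six library offers comparable names (six.text_type, six.binary_type, six.ensure_str).

```python
import sys

PY2 = sys.version_info[0] == 2

# Text type: unicode on Python 2, str on Python 3.
text_type = type(u"")
# Binary type: str on Python 2 (where bytes is an alias), bytes on Python 3.
binary_type = type(b"")


def to_native_str(value, encoding="utf-8"):
    """Hypothetical helper: coerce text or bytes to the native str type."""
    if isinstance(value, str):
        return value  # already the native type on either interpreter
    if PY2:
        # On Python 2 the only non-native case is unicode, so encode it.
        return value.encode(encoding)
    # On Python 3 the only non-native case is bytes, so decode it.
    return value.decode(encoding)


print(to_native_str(b"caf\xc3\xa9") == u"caf\u00e9")  # True on Python 3
```

The point of the sketch is that "native str" means different concrete types per interpreter, which is exactly what a single static annotation struggles to express.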

@gvanrossum
Member

I'm not sure I have the answer here, but I do want to point out that perhaps a useful feature for mypy (or any other linter for that matter) might be to check code under the assumption that it must be valid in both Python 2 and Python 3. A hackish way to implement this might even be very simple -- just run the typechecker twice with the same input, once using its Python 2 mode and once using its Python 3 mode. But that's probably something for the mypy tracker (https://github.com/JukkaL/mypy/issues).

As far as what we should put in the PEP about this, the PEP is rather skewed towards Python 3 (after all Python 2 doesn't have annotations, even though it's possible to fake them using the mypy codec hack).

The typing module will mostly mirror collections.abc, and that has ByteString (which tries to capture the behaviors common to bytes and bytearray, but is not very formal about it), but not UnicodeString.

This absence can presumably be explained because Python 3 really only has one Unicode string type (str), but multiple byte string types: bytes, bytearray, and arguably anything that supports memoryview(). IIRC Nick just did a big update to the docs to use uniform terminology for this, and it seems we converged on bytes-like object: https://docs.python.org/3.5/glossary.html#term-bytes-like-object.
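Since the glossary defines a bytes-like object as one that supports the buffer protocol, a minimal Python 3 check is to try wrapping the object in memoryview(). The helper name `is_bytes_like` is hypothetical; Python 3.12 and later also provide collections.abc.Buffer for the same purpose.

```python
import array


def is_bytes_like(obj):
    """Hypothetical helper: True if obj supports the buffer protocol."""
    try:
        memoryview(obj)  # raises TypeError for non-buffer objects such as str
    except TypeError:
        return False
    return True


# bytes, bytearray and array.array all expose buffers; str does not.
print([is_bytes_like(x) for x in (b"abc", bytearray(b"abc"), array.array("b", [1, 2]), "abc")])
# [True, True, True, False]
```

This illustrates the asymmetry described above: several types count as binary data, while text has a single type.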

I'll keep this issue open in case someone else has a good insight. And perhaps we can find a way to make type hinting useful for the conversion from strict Python 2 code to Python 2/3 straddling code. Perhaps the datatypes that modernize and/or futurize define for this purpose can serve as starting points? (Not everything needs to be specified in the PEP. An important mechanism is the ability to define stub modules, which describe the interfaces of various stdlib and 3rd party modules to type checkers. The PEP should define the subset of Python usable for stub modules, but it can't specify what stub modules should exist; that will be a problem for the community to solve.)
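For readers unfamiliar with stub modules: a stub declares only signatures, with `...` in place of every body, and lives in a .pyi file. The sketch below uses a hypothetical module (`textutils.pyi`) with invented function names. Because stub syntax is a subset of ordinary Python syntax, the stub can be checked with the standard ast module.

```python
import ast

# Contents of a hypothetical stub file, textutils.pyi.
# Only signatures are declared; "..." stands in for bodies and defaults.
STUB_SOURCE = '''
def to_text(data: bytes, encoding: str = ...) -> str: ...
def to_bytes(text: str, encoding: str = ...) -> bytes: ...
'''


def is_stub_body(func):
    # A stub body is a single expression statement whose value is Ellipsis.
    return (len(func.body) == 1
            and isinstance(func.body[0], ast.Expr)
            and isinstance(func.body[0].value, ast.Constant)
            and func.body[0].value.value is Ellipsis)


tree = ast.parse(STUB_SOURCE)
funcs = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
print([f.name for f in funcs])  # ['to_text', 'to_bytes']
```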

@ambv
Contributor

ambv commented Jan 7, 2015

One thing we started doing at Facebook is this:

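# Step 1: rewrite existing calls with s/str(/bytes(/g (on Python 2, bytes is an alias for str)

from __future__ import unicode_literals
str = type('str')  # 'str' is a unicode literal here, so this is unicode on Python 2 and str on Python 3

# Step 2: rewrite existing calls with s/unicode(/str(/g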

This code won't behave exactly the same in all circumstances (for instance, when indexing bytes), but it covers enough cases to be usable (with tests!). It puts Python 3 syntax first, which I like.
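The indexing caveat is easy to see on Python 3: indexing bytes yields an int, whereas on Python 2 (where bytes is str) it yields a one-character string. Slicing returns bytes on both interpreters, so a minimal workaround is to slice instead of index. The helper `byte_at` is hypothetical; six.indexbytes covers the opposite case (an int on both versions). The sketch uses `type(u"")`, which gives the same text type as the `type('str')` trick without needing the `__future__` import.

```python
data = b"abc"

# Python 3: indexing bytes returns an int (Python 2 would return "a").
first_indexed = data[0]
# Slicing returns bytes on both Python 2 and Python 3.
first_sliced = data[0:1]

# Same result as str = type('str') under unicode_literals.
text_type = type(u"")


def byte_at(buf, i):
    """Hypothetical helper: the i-th byte as a length-1 bytes object."""
    return buf[i:i + 1]


print(first_indexed, first_sliced)  # 97 b'a'
```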

@gvanrossum
Member

Honestly I don't see anything actionable for the PEP in this issue. The PEP is for Python 3. What to do for Python 2 is up to mypy.

@ambv ambv self-assigned this Jan 8, 2015
@ambv ambv added resolution: wontfix A valid issue that most likely won't be resolved for reasons described in the issue resolution: out of scope The idea or discussion was out of scope for this repository labels Jan 8, 2015
@ambv
Contributor

ambv commented Jan 8, 2015

Fair enough. Closing for now. @brettcannon, sorry!

@ambv ambv closed this as completed Jan 8, 2015
@vlasovskikh
Member

@gvanrossum In the context of the newly-added comment-based syntax for Python 2.7 + Python 3 should we reopen this issue? It's not clear how to annotate text/bytes types in the standard way.

@gvanrossum
Member

We have a huge discussion on the mypy tracker about this too. It's not at all clear what to do...

@gvanrossum gvanrossum reopened this Feb 4, 2016
@gvanrossum gvanrossum removed resolution: out of scope The idea or discussion was out of scope for this repository resolution: wontfix A valid issue that most likely won't be resolved for reasons described in the issue labels Feb 4, 2016
@gvanrossum
Member

I propose to do only one thing in PEP 484 and typing.py: add a new type alias Text. On Python 3 this is an alias for str; on Python 2 it is unicode.
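A minimal sketch of how the proposed alias reads in straddling code, using the comment-based signature syntax so that the same source also parses on Python 2.7 (where typing comes from the PyPI backport). The function `shout` is hypothetical. On Python 3, typing.Text is simply str; current Python docs mark it as deprecated in favor of plain str.

```python
from typing import Text


def shout(message):
    # type: (Text) -> Text
    """Hypothetical example: the type comment is valid on 2.7 and 3."""
    return message.upper() + u"!"


print(shout(u"hello"))  # HELLO!
```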

@gvanrossum gvanrossum added this to the 3.5.2 milestone Apr 5, 2016
@JukkaL
Contributor

JukkaL commented Apr 5, 2016

That seems reasonable for 3.5.2. It's at least a strict improvement over what we have now.

@vlasovskikh
Member

I agree with @JukkaL. The discussion in python/mypy/issues/1141 will take some time to arrive at a generally satisfactory solution. Meanwhile, people will be able to experiment with text/binary types for Python 2 and 3 using the updated typing module.

@gvanrossum
Member

OK, I've added Text to both versions and to the stdlib and the PEP.

I'll close this issue now, since the full spec is probably going to be a separate PEP.

@gvanrossum
Member

FWIW I think this has been superseded by #208


5 participants