Skip to content

Use old-style buffer on Python 2 #119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Use old-style buffer on Python 2 #119

wants to merge 8 commits into from

Conversation

jakirkham
Copy link
Member

Makes use of the old-style buffer on Python 2 to more easily coerce objects to memoryviews on Python 2. The code here is a bit cleaner by smoothing over Python 2/3 differences. That said, it results in a ~100ns slowdown for Python 3 (due to the lambda) and a ~400ns slowdown for Python 2. This is pretty negligible on the grand scheme of things, but it is worth noting. Figured this might be worth discussion.

TODO:

  • Unit tests and/or doctests in docstrings
  • tox -e py37 passes locally
  • tox -e py27 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • tox -e docs passes locally
  • AppVeyor and Travis CI passes
  • Test coverage to 100% (Coveralls passes)

As Python 2 objects can be comfortably coerced to the old-style buffer
interface, which can cleanly be converted to a `memoryview`, go ahead
and coerce everything to an old-style buffer in Python 2. Then it is a
straightforward matter to get a `memoryview` from the object and handle
it the same as one might on Python 3.
Once we have an old-style buffer object in Python 2, use the same
`memoryview` handling code for both Python 2 and Python 3.
In Python 2, everything can be comfortably coerced to the old-style
buffer interface, which is easily coerced to the new buffer interface.
This is a nice path to proceed down as it can cleanly get a `memoryview`
on Python 2 without copying. This strategy works well on Python 3 as
well as long as we make `buffer` a no-op.
@jakirkham
Copy link
Member Author

That said, it results in a ~100ns slowdown for Python 3 (due to the lambda)...

As a minor note, we eliminated a conditional branch that would have been ~25ns (not factored into the previous benchmark), so this is more accurately ~75ns slowdown for Python 3.

...and a ~400ns slowdown for Python 2.

Similarly we did not include the isinstance check that was here previously. When combined with the conditional branch, this is ~160ns. So the slowdown here is more accurately ~240ns for Python 2.

@jakirkham jakirkham mentioned this pull request Nov 8, 2018
@alimanfoo
Copy link
Member

alimanfoo commented Nov 8, 2018

Thanks @jakirkham, didn't know about this.

FWIW I wouldn't be too concerned about performance of buffer_tobytes, if code is having to go down that path then it usually implies there will be a memory copy (because of creating a new bytes object) so smaller timing changes probably won't have a noticeable impact. I.e., it is already a sub-optimal path for working around functions that don't support any buffer protocol.

I'm being picky but I'd be slightly reluctant to introduce buffer = lambda a: a in PY3, it makes it slightly obscure that there is PY2/3 compat going on within buffer_tobytes. I'd be happy to keep a conditional PY2 branch within buffer_tobytes to be more explicit, even if the extra conditional makes it slightly slower.

@jakirkham
Copy link
Member Author

Yeah, it's not well known. Just one of the subtleties of how the new buffer interface got added to Python 2 (much like the fact that Python 2's array.array doesn't support it 😄).

Sure. Not actually concerned personally. Just trying to be transparent is all. Was using largish (100s to 1000s) of elements to make sure there wasn't a copy.

Completely understandable. Actually thought you might say that. Have reverted the last commit, which leaves a Python 2 branch in there. Technically this makes Python 3 slightly faster (though I don't want to micro-optimize 😉).

@jakirkham
Copy link
Member Author

So one thing that is a little concerning is the tobytes method of memoryview. This seems fine for most sizes of arrays, but if the array gets really large this starts to take a startling amount of time, which makes me wonder if there is a copy happening. Benchmarks taken on Python 3.

In [1]: for i in range(9): 
    ...:     n = 10 ** i 
    ...:     r = 50 
    ...:     t = timeit.timeit( 
    ...:         "memoryview(a).tobytes()", 
    ...:         "import array; a = array.array('d', {n} * [1.0]);".format(n=n),
    ...:  
    ...:         number=r 
    ...:     ) 
    ...:  
    ...:     print("Took %f secs for %i elements." % (t / float(r), n))         
Took 0.000000 secs for 1 elements.
Took 0.000001 secs for 10 elements.
Took 0.000001 secs for 100 elements.
Took 0.000001 secs for 1000 elements.
Took 0.000004 secs for 10000 elements.
Took 0.000065 secs for 100000 elements.
Took 0.001999 secs for 1000000 elements.
Took 0.029316 secs for 10000000 elements.
Took 0.628060 secs for 100000000 elements.

@alimanfoo
Copy link
Member

alimanfoo commented Nov 8, 2018 via email

@jakirkham
Copy link
Member Author

Guess that makes sense. Would think that if the data was already read-only (e.g. another bytes object) then it would just reuse the data. Though that doesn't seem to be the case either. In any event, if this is expected, then I won't worry about.

In the GZip and Zlib encoders, coerce data to old-style buffers on
Python 2 as both GZip and Zlib can handle these inputs. By doing this,
we are able to avoid inducing a copy as would have been the case when
converting the data to `bytes`. We also are able to handle a wide
variety of inputs include `bytearray`s and `array`s.
Basically just define `buffer` in the Python 2 world as itself so that
`flake8` thinks it is defined and doesn't raise false positives.
This is sort of a lie. That said, they are pretty close to one another
and most compat documentation suggests to do this to migrate Python 2
code to Python 3. We only need this so importing `buffer` to appease
`flake8` doesn't become even more of a mess, which seems a bit odd.
@jakirkham
Copy link
Member Author

Have reworked this a bit in PR ( #121 ), which seems like a nicer approach. So will close this out.

@jakirkham jakirkham closed this Nov 9, 2018
@jakirkham jakirkham deleted the use_old_style_buffer_py2 branch November 9, 2018 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants