This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Optimize transport.write #389

Closed
1st1 opened this issue Jul 27, 2016 · 8 comments

@1st1
Member

1st1 commented Jul 27, 2016

So we have two PRs atm: #339 and #385.

I reviewed the second one, and I think it's actually quite ready to be merged. Now I'm a bit lost between two separate PRs fixing the same thing in different ways.

Let's discuss both PRs in this one issue and decide which one is the way to go.

FWIW, I updated uvloop quite a while ago to avoid using custom bytearray buffers, preferring the writev syscall instead.

@methane
Member

methane commented Jul 28, 2016

First of all, a bytearray-based buffer is already used for most transports, except _UnixWritePipeTransport.
#385 just uses a bytearray in _UnixWritePipeTransport too.

bytearray-based buffer

Pros:

  • Simple implementation.
  • Faster for small data.
  • .write(data) raises TypeError immediately when data is not a compatible bytes-like object (e.g. memoryview(array.array('H', [1, 2, 3, 4]))).
  • The caller can reuse a bytearray passed to .write(), because the data is copied into the buffer.
  • Easy to swap in a more efficient, bytearray-like buffer implemented in C in the future.

Cons:

  • Slower for large data (roughly 1000 bytes and more).
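To make the trade-offs concrete, here is a minimal sketch of the bytearray-based scheme (the class and method names are mine for illustration, not asyncio's actual internals): write() copies into one shared bytearray, and each flush trims the front with del, which is where the large-data cost comes from.

```python
import socket


class BytearrayBuffer:
    """Illustrative sketch of the bytearray-based write buffer
    (names are invented here, not asyncio's internals)."""

    def __init__(self, sock):
        self._sock = sock
        self._buffer = bytearray()

    def write(self, data):
        # extend() copies the data, so the caller may freely reuse a
        # bytearray it passed in, and a non bytes-like argument fails
        # with TypeError right here in write().
        self._buffer.extend(data)

    def flush_once(self):
        # One non-blocking send attempt; returns the bytes sent.
        if not self._buffer:
            return 0
        n = self._sock.send(self._buffer)
        del self._buffer[:n]  # shifts the remainder: the large-data cost
        return n
```

Note how the TypeError and the caller-reuse properties from the pros list both fall out of the copy in write().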

#339

Pros:

  • Faster for large data

Cons:

  • Complex implementation.
  • Slow on Windows (until Python has sendmsg on Windows).
  • The caller can't reuse a bytearray passed to .write(); if they do, they may hit a bug that is rarely reproducible and hard to debug.
  • TypeError for bad data may be raised from _write_ready() rather than from write() itself.

Summary of benchmark

From #339 (comment)

small data:

bytearray:
bench_sockets: 10000000 buffers by 10 bytes : 11.32 seconds (8.42 MiB/sec)
bench_sockets: 1000000 buffers by 100 bytes : 1.26 seconds (75.42 MiB/sec)

writev:
bench_sockets: 10000000 buffers by 10 bytes : 15.59 seconds (6.12 MiB/sec)
bench_sockets: 1000000 buffers by 100 bytes : 1.61 seconds (59.39 MiB/sec)

large data:

bytearray:
bench_sockets: 100000 buffers by 1000 bytes : 0.26 seconds (370.33 MiB/sec)
bench_sockets: 10000 buffers by 10000 bytes : 0.15 seconds (620.15 MiB/sec)
bench_sockets: 2000 buffers by 50000 bytes : 0.15 seconds (649.65 MiB/sec)

writev:
bench_sockets: 100000 buffers by 1000 bytes : 0.23 seconds (419.86 MiB/sec)
bench_sockets: 10000 buffers by 10000 bytes : 0.08 seconds (1131.09 MiB/sec)
bench_sockets: 2000 buffers by 50000 bytes : 0.07 seconds (1324.88 MiB/sec)
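For reference, numbers of this shape can be reproduced with a toy loop like the following (illustrative only; the real figures above come from the bench_sockets script in #339):

```python
import socket
import time


def bench_sockets(nbuffers, size):
    """Send `nbuffers` payloads of `size` bytes over a socketpair and
    report elapsed seconds and MiB/sec (toy model, not the PR's bench)."""
    a, b = socket.socketpair()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(nbuffers):
        a.sendall(payload)
        b.recv(size * 2)  # drain so the kernel buffer never fills up
    elapsed = time.perf_counter() - start
    mib_per_sec = nbuffers * size / elapsed / (1024 * 1024)
    return elapsed, mib_per_sec
```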

@1st1
Member Author

1st1 commented Aug 2, 2016

@methane @socketpair Alright, I think we should first merge #385.

As for #339 -- I feel a bit uneasy about it. I want asyncio to be fast in all use cases, but the PR is invasive and makes the code much more complex. An ideal solution would be to adaptively use bytearray to aggregate small writes in one buffer, and use vectorized writes for series of big bytes objects.

I'm wondering if we can design a nice, fast buffer abstraction to do this. It should be separated from the IO code and thoroughly tested. Then we can add os.writev support to the Unix selector loop, using that buffer.
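The adaptive idea could look roughly like this (a sketch under my own assumptions: the SMALL threshold and all names are invented, and the IO side is deliberately left out):

```python
from collections import deque

SMALL = 8192  # invented threshold for what counts as a "small" write


class AdaptiveBuffer:
    """Sketch of the adaptive scheme: coalesce small writes into a
    shared bytearray, keep large writes as separate chunks so a flush
    could hand them to a vectorized os.writev call."""

    def __init__(self):
        self._chunks = deque()  # bytes-like items, ready for writev
        self._tail = None       # bytearray collecting small writes

    def write(self, data):
        if len(data) >= SMALL:
            self._tail = None                 # stop coalescing into old tail
            self._chunks.append(bytes(data))  # copy guards caller reuse
        else:
            if self._tail is None:
                self._tail = bytearray()
                self._chunks.append(self._tail)
            self._tail.extend(data)           # copy: small writes aggregate

    def iovec(self):
        # What a flush would pass to os.writev().
        return list(self._chunks)
```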

The caller can't reuse a bytearray passed to .write(); if they do, they may hit a bug that is rarely reproducible and hard to debug.

This is something we need to fix if we are to merge #339. A quick fix is to convert anything that's not bytes to bytes (if the first optimistic write fails with a BlockingIOError).

@socketpair

socketpair commented Aug 2, 2016

  1. I agree that unix pipes should be fixed first (as bytearray).
  2. The next commit (which I can make) is to fix (simplify) the tests (assertSequenceEqual).
  3. After that we can think about merging the writev() variant.

OK ?

@methane
Member

methane commented Aug 3, 2016

I will prototype my idea, a faster buffer implementation.
But I hope the faster Future implementation is reviewed
before the next Python 3.6 alpha.

@1st1
Member Author

1st1 commented Aug 3, 2016

The next commit (which I can make) is to fix (simplify) the tests (assertSequenceEqual).

@socketpair If you can simplify the tests then sure, make a PR.

@methane If I don't review the C Future patch this week feel free to ping me again on the issue tracker sometime next week.

@1st1
Member Author

1st1 commented Sep 15, 2016

Alright, I merged #385 in 6f8f833. Does it mean we can close #339? I think we can still add vectorized writes in 3.7.

@socketpair

Well, please don't close it. It seems I will find time to fix/refactor that.

@1st1
Member Author

1st1 commented Oct 5, 2016

Closing this issue now.

@1st1 1st1 closed this as completed Oct 5, 2016