This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Optimize transport.write #389

Closed
1st1 opened this issue Jul 27, 2016 · 8 comments

@1st1
Member

1st1 commented Jul 27, 2016

So we have two PRs atm: #339 and #385.

I reviewed the second one, and I think it's actually quite ready to be merged. Now I'm a bit lost between two separate PRs fixing the same thing in different ways.

Let's discuss both PRs in this one issue and decide which one is the way to go.

FWIW, I updated uvloop quite a while ago to avoid using custom bytearray buffers, preferring the writev syscall instead.

@methane
Member

methane commented Jul 28, 2016

First of all, a bytearray-based buffer is already used for most transports, except _UnixWritePipeTransport.
#385 just uses a bytearray in _UnixWritePipeTransport too.

bytearray-based buffer

Pros:

  • Simple implementation.
  • Faster for small data.
  • .write(data) raises TypeError immediately when data is not a compatible bytes-like object (e.g. memoryview(array.array('H', [1, 2, 3, 4]))).
  • The caller can reuse a bytearray passed to .write(), because the data is copied into the buffer.
  • Easy to swap in a more efficient, bytearray-like buffer implemented in C in the future.

Cons:

  • Slower for large data (roughly 1000 bytes and more).
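To make the trade-offs concrete, here is a minimal sketch of the bytearray-based scheme (the class and method names are mine for illustration, not asyncio's actual internals): write() copies into one shared bytearray, and each flush trims the front with del, which is where the large-data cost comes from.

```python
import socket


class BytearrayBuffer:
    """Illustrative sketch of the bytearray-based write buffer
    (names are invented here, not asyncio's internals)."""

    def __init__(self, sock):
        self._sock = sock
        self._buffer = bytearray()

    def write(self, data):
        # extend() copies the data, so the caller may freely reuse a
        # bytearray it passed in, and a non bytes-like argument fails
        # with TypeError right here in write().
        self._buffer.extend(data)

    def flush_once(self):
        # One non-blocking send attempt; returns the bytes sent.
        if not self._buffer:
            return 0
        n = self._sock.send(self._buffer)
        del self._buffer[:n]  # shifts the remainder: the large-data cost
        return n
```

Note how the TypeError and the caller-reuse properties from the pros list both fall out of the copy in write().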

#339

Pros:

  • Faster for large data

Cons:

  • Complex implementation.
  • Slow on Windows (until Python has sendmsg on Windows).
  • The caller can't reuse a bytearray passed to .write(); if they do, they may hit a bug that is rarely reproducible and hard to debug.
  • TypeError for bad data may be raised from _write_ready() rather than from write() itself.

Summary of benchmark

From #339 (comment)

small data:

bytearray:
bench_sockets: 10000000 buffers by 10 bytes : 11.32 seconds (8.42 MiB/sec)
bench_sockets: 1000000 buffers by 100 bytes : 1.26 seconds (75.42 MiB/sec)

writev:
bench_sockets: 10000000 buffers by 10 bytes : 15.59 seconds (6.12 MiB/sec)
bench_sockets: 1000000 buffers by 100 bytes : 1.61 seconds (59.39 MiB/sec)

large data:

bytearray:
bench_sockets: 100000 buffers by 1000 bytes : 0.26 seconds (370.33 MiB/sec)
bench_sockets: 10000 buffers by 10000 bytes : 0.15 seconds (620.15 MiB/sec)
bench_sockets: 2000 buffers by 50000 bytes : 0.15 seconds (649.65 MiB/sec)

writev:
bench_sockets: 100000 buffers by 1000 bytes : 0.23 seconds (419.86 MiB/sec)
bench_sockets: 10000 buffers by 10000 bytes : 0.08 seconds (1131.09 MiB/sec)
bench_sockets: 2000 buffers by 50000 bytes : 0.07 seconds (1324.88 MiB/sec)
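For reference, numbers of this shape can be reproduced with a toy loop like the following (illustrative only; the real figures above come from the bench_sockets script in #339):

```python
import socket
import time


def bench_sockets(nbuffers, size):
    """Send `nbuffers` payloads of `size` bytes over a socketpair and
    report elapsed seconds and MiB/sec (toy model, not the PR's bench)."""
    a, b = socket.socketpair()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(nbuffers):
        a.sendall(payload)
        b.recv(size * 2)  # drain so the kernel buffer never fills up
    elapsed = time.perf_counter() - start
    mib_per_sec = nbuffers * size / elapsed / (1024 * 1024)
    return elapsed, mib_per_sec
```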

@1st1
Member Author

1st1 commented Aug 2, 2016

@methane @socketpair Alright, I think we should first merge #385.

As for #339 -- I feel a bit uneasy about it. I want asyncio to be fast in all use cases, but the PR is invasive and makes the code much more complex. An ideal solution would be to adaptively use bytearray to aggregate small writes in one buffer, and use vectorized writes for series of big bytes objects.

I'm wondering if we can design a nice, fast buffer abstraction to do this. It should be separated from the IO code and thoroughly tested. Then we can add os.writev support to the Unix selector loop, using that buffer.
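The adaptive idea could look roughly like this (a sketch under my own assumptions: the SMALL threshold and all names are invented, and the IO side is deliberately left out):

```python
from collections import deque

SMALL = 8192  # invented threshold for what counts as a "small" write


class AdaptiveBuffer:
    """Sketch of the adaptive scheme: coalesce small writes into a
    shared bytearray, keep large writes as separate chunks so a flush
    could hand them to a vectorized os.writev call."""

    def __init__(self):
        self._chunks = deque()  # bytes-like items, ready for writev
        self._tail = None       # bytearray collecting small writes

    def write(self, data):
        if len(data) >= SMALL:
            self._tail = None                 # stop coalescing into old tail
            self._chunks.append(bytes(data))  # copy guards caller reuse
        else:
            if self._tail is None:
                self._tail = bytearray()
                self._chunks.append(self._tail)
            self._tail.extend(data)           # copy: small writes aggregate

    def iovec(self):
        # What a flush would pass to os.writev().
        return list(self._chunks)
```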

The caller can't reuse a bytearray passed to .write(); if they do, they may hit a bug that is rarely reproducible and hard to debug.

This is something we need to fix if we are to merge #339. A quick fix is to convert anything that's not bytes to bytes (if the first optimistic write fails with a BlockingIOError).

@socketpair

socketpair commented Aug 2, 2016

  1. I agree that unix pipes should be fixed first (as bytearray).
  2. The next commit (which I can make) is to fix (simplify) the tests (assertSequenceEqual).
  3. After that we can think about merging the writev() variant.

OK ?

@methane
Member

methane commented Aug 3, 2016

I will prototype my idea, a faster buffer implementation.
But I hope the faster Future implementation is reviewed
before the next Python 3.6 alpha.

@1st1
Member Author

1st1 commented Aug 3, 2016

The next commit (which I can make) is to fix (simplify) the tests (assertSequenceEqual).

@socketpair If you can simplify the tests then sure, make a PR.

@methane If I don't review the C Future patch this week feel free to ping me again on the issue tracker sometime next week.

@1st1
Member Author

1st1 commented Sep 15, 2016

Alright, I merged #385 in 6f8f833. Does it mean we can close #339? I think we can still add vectorized writes in 3.7.

@socketpair

Well, please don't close it. It seems I will find time to fix/refactor that.

@1st1
Member Author

1st1 commented Oct 5, 2016

Closing this issue now.

@1st1 1st1 closed this as completed Oct 5, 2016