WIP speedup #2734

Closed
wants to merge 1 commit into from

Conversation

socketpair

No description provided.

Member

@bdarnell bdarnell left a comment


Thanks for this! Here are some preliminary comments:

    @@ -1552,6 +1587,9 @@ def write_to_fd(self, data: memoryview) -> int:
        finally:
            # Avoid keeping to data, which can be a memoryview.
            # See https://github.com/tornadoweb/tornado/pull/2008
            if multiple:
                for i in data:
                    del i
Member

This doesn't do anything - it just creates a reference and then unreferences it. del data is sufficient even in the multiple case.
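A minimal sketch (plain Python, not Tornado code) of why the loop is a no-op while a single `del` is enough:

```python
buffers = [memoryview(b"spam"), memoryview(b"eggs")]

def loop_del(data):
    # Each iteration binds the local name `i` to an element and then
    # unbinds it again; `data` itself still references every element.
    for i in data:
        del i
    return len(data)  # every memoryview is still reachable

assert loop_del(buffers) == 2

def single_del(data):
    # One `del data` drops this frame's reference in a single step,
    # whether `data` is one memoryview or a list of them.
    del data

single_del(buffers)
```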

    if data:

    if not isinstance(data1, list):
        data1 = (data1,)
Member

I think I'd rather have separate methods for write and write_multiple instead of adding more options to the single write method. Using a list internally even when there's only one seems less efficient.
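A sketch of that split (hypothetical names, not the final Tornado interface): `write` keeps a single-buffer fast path and `write_multiple` takes the list, so a lone buffer is never wrapped.

```python
class SketchStream:
    """Hypothetical illustration of separate write entry points."""

    def __init__(self) -> None:
        self._write_buffer = bytearray()

    def write(self, data: bytes) -> None:
        # Single-buffer fast path: no list is allocated.
        self._write_buffer += data

    def write_multiple(self, chunks) -> None:
        # List-aware path for callers that already hold several pieces.
        for chunk in chunks:
            self._write_buffer += chunk
```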

Author

fixed

    try:
        return self.socket.send(data)  # type: ignore
    if multiple:
        return self.socket.sendmsg(data)
Member

According to the docs, sendmsg is not available on Windows, so this needs a fallback similar to the SSL case (same for writev in the pipe case).

Given the limited availability of sendmsg and writev, I wonder if this part is worth it. We avoid copies with the changes to websocket.py; the changes at this level are about avoiding additional system calls. I'm not sure how much of a difference that makes.

Note that even if we leave write_to_fd and _handle_write alone, we may still want an IOStream.write_multiple method: it lets us avoid the Future creation and _add_io_state calls without any compatibility costs. (and gives us a stable interface for applications in the event we decide to use sendmsg in the future).
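The kind of fallback described could be sketched like this (hypothetical helper name `send_buffers`, not Tornado API): probe for sendmsg at runtime and degrade to a plain send of the first piece where it is missing (e.g. Windows).

```python
import socket

def send_buffers(sock, buffers):
    # sendmsg() performs a scatter-gather write of all pieces in one
    # syscall, but it only exists on POSIX platforms.
    if hasattr(sock, "sendmsg"):
        return sock.sendmsg(buffers)
    # Fallback (e.g. Windows): send just the first piece and let the
    # caller retry with the remainder, as it would after a partial write.
    return sock.send(buffers[0])
```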

Author

I changed the code so that this will never be triggered on Windows.

    data = compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
    assert data.endswith(b"\x00\x00\xff\xff")
    return data[:-4]
    if isinstance(data, list):
Member

This is an internal interface; it should probably be changed to either always use a list or always use a single bytes object. The single object case still does a copy; it could return a two-item list instead.
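One way to avoid that copy, sketched with a hypothetical helper (not websocket.py's actual code): the sync-flush trailer always ends the flush() output, so only the small tail piece needs slicing.

```python
import zlib

def compress_frame(compressor, data):
    # Hypothetical helper returning a two-item list instead of one
    # concatenated bytes object.
    body = compressor.compress(data)
    tail = compressor.flush(zlib.Z_SYNC_FLUSH)
    # The 4-byte sync-flush trailer ends the flush output, so trimming
    # it here never touches the (potentially large) compressed body.
    assert tail.endswith(b"\x00\x00\xff\xff")
    return [body, tail[:-4]]
```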

    flags = 0

    if isinstance(message, list):
Member

Is writing one message in multiple pieces desirable? The examples in #2102 (comment) show many messages, but each one is written as one shot. @bryevdv would it be helpful to bokeh to have this interface, or are you already working with a single input buffer per message?

On the other hand, it looks like it might be interesting to have an interface for writing multiple messages at once to reduce overhead when there are a lot of small messages.

Author

If one websocket message actually consists of multiple parts, it is desirable to pass all of them to this function as an array. Instead, Bokeh splits one big logical message into small ones in its exchange protocol. This is not optimal, and new versions might take advantage of the new Tornado feature.

Writing multiple websocket messages with one function call will not help against memory copying, though.

@bryevdv bryevdv Sep 2, 2019

@bdarnell Currently Bokeh splits up messages like this:

    [
        # these are required
        b'{header}',        # serialized header dict
        b'{metadata}',      # serialized metadata dict
        b'{content}',       # serialized content dict

        # these are optional, and come in pairs; header contains num_buffers
        b'{buf_header}',    # serialized buffer header dict
        b'array',           # raw buffer payload data
        ...
    ]

The header fragment will have the form:

    header = {
        # these are required
        'msgid'       : <str> # a unique id for the message
        'msgtype'     : <str> # a message type, e.g. 'ACK', 'PATCH-DOC', etc

        # these are optional
        'num_buffers' : <int> # the number of additional buffers, if any
    }

The metadata fragment has never been used. We could remove it for 2.0. The content and buffer arrays are potentially large, which is why I made the protocol split them out, so that the potentially large copies/concats to assemble a single message would be avoided.

If it is better to concat smaller fragments in order to send a single message rather than multiple, then an optimization we could potentially make on the Bokeh side would be to concat the main header and any buffer headers up front, and just leave content or array fragments separate. Do you think that is advisable?

Or if I am completely off base, and even the large content fragments should be concatenated to favor a single very large message, we could try that too. I think the array buffers always have to be separate? They are binary, unlike the others.


@bdarnell @socketpair any feedback on these comments? Is there something I can elaborate on?

Author

@socketpair socketpair Sep 16, 2019

Well, I think there are three ways to make Bokeh faster:

  1. Change the websocket internal protocol, i.e. move from sending many small messages to one contiguous message. The split into multiple messages was made in order to eliminate memory copying in Tornado; that is no longer the case.
  2. Think about the yield on the left side of every websocket send. If present, it is correct: it catches errors, waits for send completion, and stops the server from sending the next message if the previous one has not been sent. Without it, memory consumption may rise, but latency should be much lower and throughput higher (since there are no pauses between messages).
  3. Switch to aiohttp :). Tornado is way slower than aiohttp in all cases I know of, from websocket handling to coroutines. It also has uvloop. Possibly only some of the requests can be moved to aiohttp. I don't think the move would be difficult. Yes, you have to change every source file by hand, but the changes are simple and template-like.
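Point 2 can be sketched like this (hypothetical handler; `ws` stands in for any object exposing Tornado's `write_message` coroutine):

```python
async def push_updates(ws, messages):
    for m in messages:
        # With the await, errors surface here and the next message is
        # not queued until this one has been flushed (backpressure).
        # Dropping the await pipelines the writes instead: lower
        # latency and higher throughput, but unbounded buffering.
        await ws.write_message(m, binary=True)
```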

    if isinstance(message, str):
        message = tornado.escape.utf8(message)
    assert isinstance(message, (bytes, memoryview))
    self._message_bytes_out += len(message)

This length computation is not correct for memoryviews AFAIK:

    In [13]: a = np.linspace(0, 1, 100)

    In [15]: mv = memoryview(a)

    In [16]: len(mv)
    Out[16]: 100

    In [17]: mv.nbytes
    Out[17]: 800

Author

@socketpair socketpair Sep 10, 2019

This bug has been present since Tornado 4.5: BaseIOStream.write() mentions that since version 4.5 it supports memoryviews. Actually it does not :) because of their shape.

Anyway, I've also fixed this issue.
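A sketch of the kind of fix involved (hypothetical helper, using the stdlib array module to reproduce the mismatch without numpy):

```python
import array

def message_length(message):
    # len() on a memoryview counts items in its shape; nbytes is the
    # size that actually goes on the wire.
    if isinstance(message, memoryview):
        return message.nbytes
    return len(message)

buf = memoryview(array.array("d", range(100)))  # 100 C doubles
assert len(buf) == 100             # item count...
assert message_length(buf) == 800  # ...but 800 bytes of payload
```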


Great, I will take a look tonight


bryevdv commented Sep 11, 2019

@socketpair I don't know how to obtain a good benchmark on just the data xfer in the context of Bokeh. I did a very simple thing: a time diff between each send event on the Bokeh server side after scrubbing the slider as fast as possible to the end and back. Under this simplistic scenario the PR version shows improvements:

    #  PR
    0.01871204376220703
    0.03457999229431152
    0.03917407989501953
    0.0618898868560791
    0.07506299018859863
    0.08429718017578125
    0.09840774536132812
    0.11544418334960938
    0.08565402030944824
    0.06834793090820312
    0.04084205627441406
    0.024216175079345703
    0.0087738037109375
    0.0076751708984375
    # without PR
    0.054023027420043945
    0.07475113868713379
    0.07443594932556152
    0.09787774085998535
    0.11354708671569824
    0.15288114547729492
    0.16255998611450195
    0.14158391952514648
    0.08508515357971191
    0.06563305854797363
    0.03964400291442871
    0.04157900810241699
    0.03808784484863281
    0.011606931686401367

I expect that the increasing times in this image blur example comes from the blur operation taking more time as the convolution mask size is increased. I will work on a more neutral example that does the same amount of compute on every update.
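The rough measurement described above could be done with something like this (hypothetical helper, not Bokeh code): record the interval between successive send events.

```python
import time

class SendTimer:
    """Record the interval between successive send events."""

    def __init__(self):
        self._last = None
        self.deltas = []

    def mark(self):
        # Call once per send; the first call only sets the baseline.
        now = time.perf_counter()
        if self._last is not None:
            self.deltas.append(now - self._last)
        self._last = now
```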

@socketpair
Author

@bryevdv I don't understand these numbers. Also, performance depends on the presence of a yield on the left side of websocket.send_message() in the Bokeh code. Without it, things will be faster, but it may eat too much memory because of buffered websocket messages.


bryevdv commented Sep 11, 2019

@socketpair sorry, I just ran a very rough (bad) test to time successive calls to the code that generates the buffer, with and without the PR (and also using a memoryview with the PR). I was only trying to indicate that, absent any other changes, there seems to be an improvement with this PR.

3 participants