WIP speedup #2734

Closed
wants to merge 1 commit into from

Conversation

socketpair

No description provided.

Member

@bdarnell bdarnell left a comment


Thanks for this! Here are some preliminary comments:

    @@ -1552,6 +1587,9 @@ def write_to_fd(self, data: memoryview) -> int:
        finally:
            # Avoid keeping to data, which can be a memoryview.
            # See https://github.com/tornadoweb/tornado/pull/2008
            if multiple:
                for i in data:
                    del i
Member

This doesn't do anything - it just creates a reference and then unreferences it. del data is sufficient even in the multiple case.
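A minimal sketch (plain Python, not Tornado code) of why the loop is a no-op while a single `del` is enough:

```python
buffers = [memoryview(b"spam"), memoryview(b"eggs")]

def loop_del(data):
    # Each iteration binds the local name `i` to an element and then
    # unbinds it again; `data` itself still references every element.
    for i in data:
        del i
    return len(data)  # every memoryview is still reachable

assert loop_del(buffers) == 2

def single_del(data):
    # One `del data` drops this frame's reference in a single step,
    # whether `data` is one memoryview or a list of them.
    del data

single_del(buffers)
```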

    if data:

    if not isinstance(data1, list):
        data1 = (data1,)
Member

I think I'd rather have separate methods for write and write_multiple instead of adding more options to the single write method. Using a list internally even when there's only one seems less efficient.
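A sketch of that split (hypothetical names, not the final Tornado interface): `write` keeps a single-buffer fast path and `write_multiple` takes the list, so a lone buffer is never wrapped.

```python
class SketchStream:
    """Hypothetical illustration of separate write entry points."""

    def __init__(self) -> None:
        self._write_buffer = bytearray()

    def write(self, data: bytes) -> None:
        # Single-buffer fast path: no list is allocated.
        self._write_buffer += data

    def write_multiple(self, chunks) -> None:
        # List-aware path for callers that already hold several pieces.
        for chunk in chunks:
            self._write_buffer += chunk
```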

Author

fixed

    try:
        return self.socket.send(data)  # type: ignore
    if multiple:
        return self.socket.sendmsg(data)
Member

According to the docs, sendmsg is not available on Windows, so this needs a fallback similar to the SSL case (same for writev in the pipe case).

Given the limited availability of sendmsg and writev, I wonder if this part is worth it. We avoid copies with the changes to websocket.py; the changes at this level are about avoiding additional system calls. I'm not sure how much of a difference that makes.

Note that even if we leave write_to_fd and _handle_write alone, we may still want an IOStream.write_multiple method: it lets us avoid the Future creation and _add_io_state calls without any compatibility costs. (and gives us a stable interface for applications in the event we decide to use sendmsg in the future).
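The kind of fallback described could be sketched like this (hypothetical helper name `send_buffers`, not Tornado API): probe for sendmsg at runtime and degrade to a plain send of the first piece where it is missing (e.g. Windows).

```python
import socket

def send_buffers(sock, buffers):
    # sendmsg() performs a scatter-gather write of all pieces in one
    # syscall, but it only exists on POSIX platforms.
    if hasattr(sock, "sendmsg"):
        return sock.sendmsg(buffers)
    # Fallback (e.g. Windows): send just the first piece and let the
    # caller retry with the remainder, as it would after a partial write.
    return sock.send(buffers[0])
```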

Author

I changed the code so that this will never be triggered on Windows.

    data = compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
    assert data.endswith(b"\x00\x00\xff\xff")
    return data[:-4]
    if isinstance(data, list):
Member

This is an internal interface; it should probably be changed to either always use a list or always use a single bytes object. The single object case still does a copy; it could return a two-item list instead.
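One way to avoid that copy, sketched with a hypothetical helper (not websocket.py's actual code): the sync-flush trailer always ends the flush() output, so only the small tail piece needs slicing.

```python
import zlib

def compress_frame(compressor, data):
    # Hypothetical helper returning a two-item list instead of one
    # concatenated bytes object.
    body = compressor.compress(data)
    tail = compressor.flush(zlib.Z_SYNC_FLUSH)
    # The 4-byte sync-flush trailer ends the flush output, so trimming
    # it here never touches the (potentially large) compressed body.
    assert tail.endswith(b"\x00\x00\xff\xff")
    return [body, tail[:-4]]
```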

    flags = 0

    if isinstance(message, list):
Member

Is writing one message in multiple pieces desirable? The examples in #2102 (comment) show many messages, but each one is written as one shot. @bryevdv would it be helpful to bokeh to have this interface, or are you already working with a single input buffer per message?

On the other hand, it looks like it might be interesting to have an interface for writing multiple messages at once to reduce overhead when there are a lot of small messages.

Author

If one websocket message actually consists of multiple parts, it is desirable to pass all of them to this function as an array. Instead, Bokeh splits one big logical message into small ones in its exchange protocol. This is not optimal, and new versions might take advantage of the new Tornado feature.

Writing multiple websocket messages with one function call will not help against memory copying, though.

@bryevdv bryevdv Sep 2, 2019

@bdarnell Currently Bokeh splits up messages like this:

    [
        # these are required
        b'{header}',        # serialized header dict
        b'{metadata}',      # serialized metadata dict
        b'{content}',       # serialized content dict

        # these are optional, and come in pairs; header contains num_buffers
        b'{buf_header}',    # serialized buffer header dict
        b'array',           # raw buffer payload data
        ...
    ]

The header fragment will have the form:

    header = {
        # these are required
        'msgid'       : <str> # a unique id for the message
        'msgtype'     : <str> # a message type, e.g. 'ACK', 'PATCH-DOC', etc

        # these are optional
        'num_buffers' : <int> # the number of additional buffers, if any
    }

The metadata fragment has never been used. We could remove it for 2.0. The content and buffer arrays are potentially large, which is why I made the protocol split them out, so that the potentially large copies/concats to assemble a single message would be avoided.

If it is better to concat smaller fragments in order to send a single message rather than multiple, then an optimization we could potentially make on the Bokeh side would be to concat the main header and any buffer headers up front, and just leave content or array fragments separate. Do you think that is advisable?

Or if I am completely off base, and even the large content fragments should be concatenated to favor a single very large message, we could try that too. I think the array buffers always have to be separate? They are binary, unlike the others.


@bdarnell @socketpair any feedback on these comments? Is there something I can elaborate on?

Author

@socketpair socketpair Sep 16, 2019

Well, I think there are three ways to make Bokeh faster:

  1. Change the websocket internal protocol, i.e. move from sending many small messages to one contiguous message. The split into multiple messages was made in order to eliminate memory copying in Tornado; that is no longer the case.
  2. Think about the yield on the left side of every websocket send. If present, it is correct: it catches errors, waits for send completion, and stops the server from sending the next message if the previous one has not been sent. Without it, memory consumption may rise, but latency should be much lower and throughput higher (since there are no pauses between messages).
  3. Switch to aiohttp :). Tornado is way slower than aiohttp in all cases I know of, from websocket handling to coroutines. It also has uvloop. Possibly only some of the requests can be moved to aiohttp. I don't think the move would be difficult. Yes, you have to change every source file by hand, but the changes are simple and template-like.
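Point 2 can be sketched like this (hypothetical handler; `ws` stands in for any object exposing Tornado's `write_message` coroutine):

```python
async def push_updates(ws, messages):
    for m in messages:
        # With the await, errors surface here and the next message is
        # not queued until this one has been flushed (backpressure).
        # Dropping the await pipelines the writes instead: lower
        # latency and higher throughput, but unbounded buffering.
        await ws.write_message(m, binary=True)
```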

    if isinstance(message, str):
        message = tornado.escape.utf8(message)
    assert isinstance(message, (bytes, memoryview))
    self._message_bytes_out += len(message)

This length computation is not correct for memoryviews AFAIK:

    In [13]: a = np.linspace(0, 1, 100)

    In [15]: mv = memoryview(a)

    In [16]: len(mv)
    Out[16]: 100

    In [17]: mv.nbytes
    Out[17]: 800

Author

@socketpair socketpair Sep 10, 2019

This bug has been present since Tornado 4.5: BaseIOStream.write() mentions that since version 4.5 it supports memoryviews. Actually it does not :) because of their shape.

Anyway, I've also fixed this issue.
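A sketch of the kind of fix involved (hypothetical helper, using the stdlib array module to reproduce the mismatch without numpy):

```python
import array

def message_length(message):
    # len() on a memoryview counts items in its shape; nbytes is the
    # size that actually goes on the wire.
    if isinstance(message, memoryview):
        return message.nbytes
    return len(message)

buf = memoryview(array.array("d", range(100)))  # 100 C doubles
assert len(buf) == 100             # item count...
assert message_length(buf) == 800  # ...but 800 bytes of payload
```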


Great, I will take a look tonight


bryevdv commented Sep 11, 2019

@socketpair I don't know how to obtain a good benchmark on just the data xfer in the context of Bokeh. I did a very simple thing: a time diff between each send event on the Bokeh server side after scrubbing the slider as fast as possible to the end and back. Under this simplistic scenario the PR version shows improvements:

    #  PR
    0.01871204376220703
    0.03457999229431152
    0.03917407989501953
    0.0618898868560791
    0.07506299018859863
    0.08429718017578125
    0.09840774536132812
    0.11544418334960938
    0.08565402030944824
    0.06834793090820312
    0.04084205627441406
    0.024216175079345703
    0.0087738037109375
    0.0076751708984375
    # without PR
    0.054023027420043945
    0.07475113868713379
    0.07443594932556152
    0.09787774085998535
    0.11354708671569824
    0.15288114547729492
    0.16255998611450195
    0.14158391952514648
    0.08508515357971191
    0.06563305854797363
    0.03964400291442871
    0.04157900810241699
    0.03808784484863281
    0.011606931686401367

I expect that the increasing times in this image blur example comes from the blur operation taking more time as the convolution mask size is increased. I will work on a more neutral example that does the same amount of compute on every update.
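The rough measurement described above could be done with something like this (hypothetical helper, not Bokeh code): record the interval between successive send events.

```python
import time

class SendTimer:
    """Record the interval between successive send events."""

    def __init__(self):
        self._last = None
        self.deltas = []

    def mark(self):
        # Call once per send; the first call only sets the baseline.
        now = time.perf_counter()
        if self._last is not None:
            self.deltas.append(now - self._last)
        self._last = now
```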

@socketpair
Author

@bryevdv I don't understand these numbers. Also, performance depends on the presence of a yield on the left side of websocket.send_message() in the Bokeh code. Without it, things will be faster, but it may eat too much memory because of buffered websocket messages.


bryevdv commented Sep 11, 2019

@socketpair sorry, I just ran a very rough (bad) test to time successive calls to the code that generates the buffer, with and without the PR (and also using a memoryview with the PR). I was only trying to indicate that, absent any other changes, there seems to be an improvement with this PR.

3 participants