This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Set TCP_NODELAY on TCP transports by default #373

Merged
merged 1 commit into from
Sep 12, 2016


1st1
Member

@1st1 1st1 commented Jul 5, 2016

This PR enables TCP_NODELAY for all TCP transports.

@gvanrossum
Member

Shouldn't we at least have a way to turn it off? Or why do sockets not have this on by default?

@jimfulton

jimfulton commented Jul 5, 2016

Assuming that appveyor eventually goes green, LGTM. Thanks!

@1st1
Member Author

1st1 commented Jul 6, 2016

@gvanrossum Please see #286, where I wanted to add another base Transport class - TCPTransport, with a set_nodelay method. I still think it'd be a good idea ;)

> Or why do sockets not have this on by default?

NODELAY can be a bad thing in cases where you want to send as few packets as possible. Setting it causes writes to be sent as soon as possible, instead of waiting until the buffer is full or for a TCP ACK.

Without TCP_NODELAY socket operations can have a huge latency. It's particularly important to set for database drivers, HTTP servers and clients etc, where we want the latency to be as small as possible. For modern applications always setting TCP_NODELAY is a good default (and, for instance, that's what Golang does for all TCP connections).
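
For reference, the option discussed here is set per socket with setsockopt(). The following is a minimal sketch using only the standard socket module; open_tcp_socket is a hypothetical helper name used for illustration, not something from this PR. Passing nodelay=False shows the "turn it off" case asked about above.

```python
import socket


def open_tcp_socket(nodelay=True):
    """Create an unconnected TCP socket with TCP_NODELAY switched on or off.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # 1 disables Nagle's algorithm (send small writes right away),
    # 0 restores the default kernel behaviour.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if nodelay else 0)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

The same setsockopt() call can be made on an already connected socket, so an application that prefers fewer packets can presumably switch the flag back off itself.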

@Martiusweb
Member

If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of whether the client has ACK'ed the previous packet (as long as the congestion window isn't full). When it is disabled, the packet is emitted only when there is as much as the MTU to send (or after a well-known delay, usually 200ms).

This is a clear win when the user sends a whole message (as seen by the application protocol) in one call, and for "synchronized" protocols such as HTTP, where peers don't usually write to the socket "at the same time".

Regarding the PR, isn't TCP_NODELAY available on Windows/with proactor?

@1st1
Member Author

1st1 commented Jul 6, 2016

> If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of whether the client has ACK'ed the previous packet (as long as the congestion window isn't full). When it is disabled, the packet is emitted only when there is as much as the MTU to send (or after a well-known delay, usually 200ms).

I'm curious what happens when you use an os.writev() call, i.e. writing a bunch of buffers with one syscall. Will a bunch of small buffers sent with writev be aggregated into one TCP packet?

> Regarding the PR, isn't TCP_NODELAY available on Windows/with proactor?

Yes, good catch. I'll add it to the proactor too.

@1st1
Member Author

1st1 commented Jul 6, 2016

> If TCP_NODELAY is enabled, a call to write()/send() means that a TCP packet is emitted regardless of whether the client has ACK'ed the previous packet (as long as the congestion window isn't full). When it is disabled, the packet is emitted only when there is as much as the MTU to send (or after a well-known delay, usually 200ms).

> I'm curious what happens when you use an os.writev() call, i.e. writing a bunch of buffers with one syscall. Will a bunch of small buffers sent with writev be aggregated into one TCP packet?

I think I found the answer -- writev gathers the output and transfers the data in a single operation. This is important because we want to start using it when #339 is merged.
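
To make the writev idea concrete, here is a rough sketch of a gathered write on a blocking socket. It assumes a Unix platform (os.writev is not available on Windows), and send_gathered is a hypothetical helper name, not part of asyncio or of #339. It only shows that several buffers can be handed to the kernel in one syscall; how the kernel splits them into packets is not something the code can demonstrate.

```python
import os
import socket


def send_gathered(sock, buffers):
    """Write all buffers to a blocking socket using os.writev().

    Hypothetical helper: retries after partial writes and returns the
    total number of bytes written.
    """
    # Skip empty buffers so that every pending entry has data in it.
    pending = [memoryview(b) for b in buffers if len(b)]
    total = 0
    while pending:
        written = os.writev(sock.fileno(), pending)
        total += written
        # Drop fully written buffers and trim a partially written one.
        while written and pending:
            if written >= len(pending[0]):
                written -= len(pending[0])
                pending.pop(0)
            else:
                pending[0] = pending[0][written:]
                written = 0
    return total


if __name__ == "__main__":
    left, right = socket.socketpair()
    with left, right:
        send_gathered(left, [b"GET / ", b"HTTP/1.1\r\n", b"\r\n"])
        print(right.recv(4096))
```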

@Martiusweb
Member

Only one packet is sent, even with TCP_NODELAY (tested with Linux 4.6.3 and Wireshark).

# Disable the Nagle algorithm -- small writes will be
# sent without waiting for the TCP ACK. This generally
# decreases the latency (in some cases significantly.)
_set_nodelay(self._sock)


This will work only for TCP sockets, and not for UNIX stream sockets...
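
A guard along the following lines would skip UNIX stream sockets. This is a minimal sketch and not necessarily the code that was eventually merged; set_nodelay_if_tcp is a hypothetical name, and accepting proto 0 (the value reported by a socket created without an explicit protocol) is an assumption made for this example.

```python
import socket

# Address families that can carry TCP.
_TCP_FAMILIES = {socket.AF_INET, socket.AF_INET6}


def set_nodelay_if_tcp(sock):
    """Enable TCP_NODELAY on TCP sockets and leave other sockets alone.

    Hypothetical helper. Returns True if the option was set.
    """
    if not hasattr(socket, "TCP_NODELAY"):
        return False  # platform does not expose the option
    if sock.family not in _TCP_FAMILIES:
        return False  # e.g. AF_UNIX stream sockets
    if sock.type != socket.SOCK_STREAM:
        return False  # e.g. UDP sockets
    # Assumption: proto 0 on an AF_INET/AF_INET6 stream socket means TCP.
    if sock.proto not in (0, socket.IPPROTO_TCP):
        return False
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return True
```

Returning a boolean instead of swallowing errors makes it possible for a test to tell whether the option was applied.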

Member Author


Good catch. We need to add functional unit tests for TCP/UNIX transports to catch these sorts of errors...


Also, accepted sockets should be set as NODELAY too. Please check that this is true.

Member Author


Alright. I'll prepare a new patch some time later. Thanks for reviewing this iteration!


@socketpair socketpair Jul 7, 2016


> Good catch. We need to add functional unit tests for TCP/UNIX transports to catch these sorts of errors...

Tests will probably show nothing, since the error is swallowed. Alternatively, the tests should check that TCP_NODELAY has actually been set.
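
A check of that kind could look roughly like the sketch below. It assumes a current Python 3 where asyncio.run() is available and the loopback interface is usable; nodelay_flags is a hypothetical name, not a test from this PR. It reads the flag from both the client socket and the accepted server-side socket.

```python
import asyncio
import socket


async def nodelay_flags():
    """Return (client_flag, server_flags) for one loopback connection.

    Hypothetical helper: each flag is the raw TCP_NODELAY value read
    back from the socket behind an asyncio transport.
    """
    server_flags = []

    def on_client(reader, writer):
        # Runs on the server side for every accepted connection.
        sock = writer.get_extra_info("socket")
        server_flags.append(
            sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
        writer.close()

    server = await asyncio.start_server(on_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    client_flag = writer.get_extra_info("socket").getsockopt(
        socket.IPPROTO_TCP, socket.TCP_NODELAY)

    # Wait for EOF, which arrives once on_client() has closed its side.
    await reader.read()
    writer.close()
    await writer.wait_closed()
    server.close()
    return client_flag, server_flags


if __name__ == "__main__":
    print(asyncio.run(nodelay_flags()))
```

A non-zero value means the option is set on that socket, so a silently swallowed error would show up here as 0.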

@1st1 1st1 force-pushed the nodelay branch 2 times, most recently from 7db527f to bea3a42 Compare September 12, 2016 01:36
@1st1 1st1 merged commit bea3a42 into python:master Sep 12, 2016
@1st1
Member Author

1st1 commented Sep 12, 2016

@socketpair I don't care that much about "performance" here. Questions like "what is faster to build, set or tuple" should only be addressed when you have an extremely tight and performance critical loop in your algorithm. Even then your decision will be very CPython specific. Anyways, I like the code as it is now.

@sethmlarson

@1st1 Sorry, misunderstood the usage of _set_nodelay, I thought it was applied to each socket created by asyncio, not just server sockets. If that's the case then how it is now is fine.

@1st1
Member Author

1st1 commented Sep 12, 2016

> @1st1 Sorry, misunderstood the usage of _set_nodelay, I thought it was applied to each socket created by asyncio, not just server sockets. If that's the case then how it is now is fine.

@SethMichaelLarson @socketpair Hm, NODELAY should be applied to client connections & server connections created with loop.create_connection & loop.create_server. Transports created by those functions are instances of _SelectorSocketTransport which applies NODELAY in its constructor, so all of them have the flag set. Am I missing something?

@sethmlarson

@1st1 Oh, so it applies to all sockets as I thought previously. Disregard my above comment. If that's the case then I think I'm still +1 on not using a tuple/set to check socket.family just because of how often it's used.

@asvetlov

I believe the cost of the set membership check is negligible compared to a syscall.
But using TCP_NODELAY makes sense for almost all use cases and should be enabled by default.

@AndreLouisCaron

@1st1

> Yes, good catch. I'll add it to the proactor too.

I just installed Python 3.6 and it seems like the proactor doesn't use TCP_NODELAY. Was there some kind of limitation that prevented adding it? If it's a simple omission, would you be open to a PR that adds it there too?

@AndreLouisCaron

For context, I opened issue aio-libs/aiomysql#149 because it has its own switch for TCP_NODELAY (presumably for Python 3.4 and 3.5) that breaks when run against the proactor event loop.

It would be neat if we could have this fixed upstream and if we can remove that broken code from aiomysql.

@@ -640,6 +651,11 @@ def __init__(self, loop, sock, protocol, waiter=None,
self._eof = False
self._paused = False

# Disable the Nagle algorithm -- small writes will be
# sent without waiting for the TCP ACK. This generally

@socketpair socketpair Feb 15, 2017


No, TCP_NODELAY is not related to ACKs. Actually, when TCP_NODELAY is NOT set, the kernel delays sending a small TCP packet, waiting for possible additional data to form one big packet. Since we buffer data in our own way, we already concatenate a sequence of small writes into one big write. So the kernel never sees a sequence of small writes, and there is no need for it to wait for more data to concatenate.

Member


@socketpair see https://en.wikipedia.org/wiki/Nagle%27s_algorithm
Nagle waits for an ACK or for enough data in the send buffer. That's why delayed ACK combined with Nagle causes trouble.


You have described only part of the algorithm (the part connected with ACKs). The much more important part is what I tried to describe (the part connected with writes < MSS).
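
Both halves of this exchange (the ACK condition and the "writes < MSS" condition) can be put into one decision function. The sketch below is a simplified model based on the commonly cited description of Nagle's algorithm, not the behaviour of any particular kernel; nagle_sends_now and its parameters are hypothetical names, and delayed-ACK timers and congestion control are deliberately left out.

```python
def nagle_sends_now(pending_bytes, window_bytes, mss, unacked_in_flight,
                    nodelay=False):
    """Simplified model: would the queued data be sent immediately?

    pending_bytes     -- bytes waiting in the send buffer
    window_bytes      -- bytes the peer's window currently allows
    mss               -- maximum segment size
    unacked_in_flight -- True if earlier data has not been ACKed yet
    nodelay           -- True models TCP_NODELAY (Nagle disabled)
    """
    if pending_bytes <= 0:
        return False  # nothing to send
    if nodelay:
        return True  # Nagle is bypassed; window limits are ignored here
    if pending_bytes >= mss and window_bytes >= mss:
        return True  # a full segment is ready and fits in the window
    # A small write is held back only while earlier data is unacknowledged.
    return not unacked_in_flight
```

For example, a 100-byte write with nothing in flight goes out at once, while the same write behind unacknowledged data waits unless nodelay is set.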


P.S. I'm not entirely right either. It's best not to describe Nagle's algorithm in the comment.


9 participants