Locking issue with asyncio StreamWriter.drain() with SSL Transport #102792
Labels
stdlib
Python modules in the Lib dir
topic-asyncio
topic-SSL
type-bug
An unexpected behavior, bug, or error
Bug report
Hello,
I'm writing about a change in behavior for StreamReader/StreamWriter objects with SSL transports in Python 3.11.x. When assessing an application's support for Python 3.11 (vertexproject/synapse#3025), I ran across an issue with some of our application tests locking up. This occurred in tests that rely on an asyncio task which reads data from a socket and writes that data back to a StreamWriter. This relay task was blocking on a drain() call.
We wrap the StreamReader/StreamWriter objects in our own Link class, which provides some common configuration. In this class we do two notable things: we set the high water mark of the StreamWriter transport's write buffer limits to 0, and we ensure that all of our writer.write() calls have a corresponding drain() call. We do this to ensure that we've written the application buffer into the kernel socket buffer, as we've had issues in the past with the Python application buffer failing to actually flush data to the OS socket after drain() calls have returned (this is described in https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-3-closing-time).
The application's main use of this code is to relay data between the main process and a subprocess via a socket (since a socket file descriptor can be shared between the two processes). I've removed this multiprocess aspect from the reproduction, and I'm still seeing my relay task blocking. I have only been able to reproduce this with a server-side task which uses an SSL transport.
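For readers who want a concrete picture of the write pattern described above (high water mark of 0, a drain() after every write(), and a relay task that echoes data back), here is a minimal sketch. The `Link` class, `relay`, and `roundtrip` names below are hypothetical stand-ins and not the actual application code. The sketch uses a plain TCP connection on localhost, so it is not expected to reproduce the lock-up, which has only been seen with an SSL transport.

```python
import asyncio


class Link:
    """Hypothetical stand-in for the Link wrapper described in this report."""

    def __init__(self, reader, writer):
        self.reader = reader
        self.writer = writer
        # High water mark of 0: any buffered data pauses the protocol, so
        # drain() waits until the transport's write buffer is empty.
        writer.transport.set_write_buffer_limits(high=0)

    async def send(self, data):
        # Every write() is paired with a drain().
        self.writer.write(data)
        await self.writer.drain()

    async def close(self):
        self.writer.close()
        await self.writer.wait_closed()


async def relay(reader, writer):
    """Server-side relay task: read from the socket and write the bytes back."""
    link = Link(reader, writer)
    while True:
        data = await reader.read(1024)
        if not data:  # EOF from the client
            break
        await link.send(data)
    await link.close()


async def roundtrip(payload):
    """Send payload through the relay and return what comes back."""
    server = await asyncio.start_server(relay, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        link = Link(reader, writer)
        await link.send(payload)
        echoed = await reader.readexactly(len(payload))
        await link.close()
    return echoed


if __name__ == "__main__":
    data = b"x" * 4096
    print(asyncio.run(roundtrip(data)) == data)
```

Over plain TCP this round trip completes; the report is that the equivalent server-side drain() call can block when the transport is SSL on Python 3.11.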
I've attached two files that can be used to recreate this issue:
To reproduce this, do the following:
1. ./make_certs.sh to make some SSL certificates.
2. python ssllink_repro.py server to start the server.
3. python ssllink_repro.py client to run the client.
The client accepts a -n argument to control the number of bytes it sends. It defaults to 1024 bytes (which always works). Providing values > 1024 usually fails as the relay locks up. This behavior is non-deterministic: sometimes the client runs, but the majority of the time it fails.
This locking behavior seems to happen only when the server code is running Python 3.11. When using Python 3.10, this locking is not observed.
| Client  | Server  | May Lock |
|---------|---------|----------|
| 3.10.10 | 3.10.10 | No       |
| 3.10.10 | 3.11.2  | Yes      |
| 3.11.2  | 3.10.10 | No       |
| 3.11.2  | 3.11.2  | Yes      |
Things I have experimented with that seem to unblock the relay loop:
That seems dangerous and against the recommendation for using StreamWriter.write().
This removes the blocking behavior in the example code, but re-introduces the problem of drain() leaving data in the Python application buffer, which doesn't always end up in the OS socket buffer. That leads to some race-condition behavior that our unit tests expose.
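To make the trade-off around the high water mark more concrete, the following sketch inspects a transport's write buffer limits before and after setting the high water mark to 0, and checks the buffer size once drain() has returned. It uses only documented transport methods (`get_write_buffer_limits`, `set_write_buffer_limits`, `get_write_buffer_size`); the `show_limits` helper and its `handler` are hypothetical names for illustration, and the connection is plain TCP rather than SSL.

```python
import asyncio


async def show_limits():
    """Return (default limits, limits after high=0, bytes buffered after drain)."""

    async def handler(reader, writer):
        # Consume everything until the client closes, then close our side.
        await reader.read()
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        transport = writer.transport

        # (low, high) as configured by the event loop by default.
        default_limits = transport.get_write_buffer_limits()

        # With high=0 the low water mark also becomes 0, so drain() only
        # resumes once nothing is left in the transport's write buffer.
        transport.set_write_buffer_limits(high=0)
        zero_limits = transport.get_write_buffer_limits()

        writer.write(b"ping")
        await writer.drain()
        pending = transport.get_write_buffer_size()

        writer.close()
        await writer.wait_closed()
    return default_limits, zero_limits, pending


if __name__ == "__main__":
    print(asyncio.run(show_limits()))
```

With the default limits, drain() may return while data below the high water mark is still sitting in the Python-level buffer, which is the behavior the high water mark of 0 is meant to avoid.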
There is also a --no-ssl option on both the client and the server, which runs the code without an SSL link.
Reproduction Code
make_certs.sh
ssllink_repro.py
Expected output for a successful run
Server:
client
Expected failure case
server
client
Your environment
CPython 3.10.6, 3.10.10, 3.11.1, 3.11.2
Ubuntu 22.04, Debian Bookworm, x86_64