Skip to content

Clarify asyncio API in a backward compatible manner #92679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
7 tasks
arhadthedev opened this issue May 11, 2022 · 3 comments
Closed
7 tasks

Clarify asyncio API in a backward compatible manner #92679

arhadthedev opened this issue May 11, 2022 · 3 comments
Labels
3.12 only security fixes docs Documentation in the Doc dir topic-asyncio type-feature A feature request or enhancement

Comments

@arhadthedev
Copy link
Member

arhadthedev commented May 11, 2022

While studying asyncio to add the TLS shutdown in a non-hacky way, I noticed that the transport/protocol concept is quite misrepresented.

The documentation uses terms colliding with the ISO model. It leaves the impression that asyncio.BaseTransport is related to the transport layer (TCP/UDP), and asyncio.BaseProtocol is related to the application layer (HTTP, FTP).

However, the source code reveals that such an assumption is incorrect and the transport and the protocol are facing sides of links in the chain that starts from a data source outwards:

<<<--------------------------------------------------------------------------------------------<<<

(another I/O API)       (inter-API wrapper)      (for a user,
┌──────────────┐                                  pretends to be the transport)
│ StreamReader │<-+  ┌────────────────────────┐   ┌───────────────────────┐
└──────────────┘  |  │                        │<->│ _SSLProtocolTransport │
                  +->│  StreamReaderProtocol  │   └───────────┐┄┄┄┄┄┄┄┄┄┄┄└─┐   ┌───────────⸾───
┌──────────────┐  |  │                        │               │ SSLProtocol │<->│ Transport ⸾ Unix socket
│ StreamWriter │<-+  └────────────────────────┘               └─────────────┘   └───────────⸾───
└──────────────┘                                    (for a socket,
                                                    pretends to be the protocol)
    (daemon I/O processor)
                                  ┌──────────────────────────┐     ┌──────────────────────────⸾───
┌───────────────────────────┐     │ SubprocessStreamProtocol │     │ _UnixSubprocessTransport ⸾
│                           │     ├──────────────────────────┤     ├──────────────────────────⸾ subprocess
│ _SendfileFallbackProtocol │ <-> ├──────────────────────────┤ <-> ├──────────────────────────⸾  module
│                           │     │ + get_pipe_transport     │     │ + _start                 ⸾
└───────────────────────────┘     │ + get_returncode         │     └──────────────────────────⸾───
                                  │ + pipe_connection_lost   │
                                  │ + pipe_data_received     │
                                  │   ...                    │
                                  └──────────────────────────┘

As a result, I have an idea of doing some improvements and I'd like to get an opinion on whether anything should be changed or postponed to not break the world:

  • Merge _SSLProtocolTransport class into SSLProtocol (need a feedback before a PR)

    The TLS codec is implemented as two tightly coupled classes-interfaces of the same socketless BIO (OpenSSL Binary I/O) object. Moreover, _SSLProtocolTransport is a thin do-nothing wrapper with methods like this:

    def get_extra_info(self, name, default=None):
        """Get optional transport information."""
        return self._ssl_protocol._get_extra_info(name, default)
    
    def set_protocol(self, protocol):
        self._ssl_protocol._set_app_protocol(protocol)
    
    def get_protocol(self):
        return self._ssl_protocol._app_protocol

    Backward compatibility: the public SSLProtocol doesn't change semantics or composition of already existing methods. For third-party hacks that add TLS start/shutdown to StreamWriter, we can permanently soft deprecate _SSLProtocolTransport and replace all of its content with the redirecting __setitem__/__getitem__. The permadeprecation is required because SSL switch-on/off is a common use case (STARSSL and others) so we can be sure that many private hacks exist here, both homebrew and enterprise.

  • Extend https://docs.python.org/3/library/asyncio-protocol.html with a howto (need a feedback before a PR)

    Currently, a concept of the transport/protocol looks as a byside internal detail that leaks through loop methods. However, this concept is powerful. A programmer can spawn a new socket/process transport from an event loop and start threading arbitrary amount of modules onto it.

    For now, the only such a module ("transprotocol") is SSLProtocol but there can be a stream compressor, a base64-cator, or a verifier/monitor/scanner. There is a huge potential of and for third party PyPI packages in a field of stream data processing.

  • Extend loop.start_tls docs with an explanation on parameters (gh-92679: Clarify asyncio.loop.start_tls parameters #92682)

    The method requires transport and protocol parameters but gives no explanation what they're used for (to insert the TLS "transportocol" between them).

  • Allow transports to have multiple outputs (need a feedback before a PR)

    Transports like *SubprocessTransport naturally have multiple outputs, one for each standard stream. Currently, they're provided via SubprocessStreamProtocol with extra methods. However, we can officially recognize that any transport can have one anonymous or many named outputs and allow a user to connect any random protocol to any individual output.

    The named outputs can be presented as a plain dictionary with keys documented for each class. For example, SubprocessTransport could look like this:

    class UnixSubprocessTransport(...):
    
        def __init__(...)
            # [...]
            self.outputs = {stdin: ..., stdout: ..., stderr: ...}
            # [...]

    Backward compatibility: single-output transports continue to behave the same so the compatibility is preserved.

  • Allow protocols to have multiple inputs (need a feedback before a PR)

    With multiple outputs, there can be a need to merge them for further links of the chain. For example, SubprocessTransport's stdout and stderr.

    Backward compatibility: the same as above.

  • Generalize SSLProtocol (need a feedback before a PR)

    Currently, each transport defines a private _start_tls_compatible field used by loop.start_ssl. However, the asyncio module uses in-memory OpenSSL BIO (Binary I/O) that inherently allows to connect any transport and feed any data.

    Also, a programmer could create and insert SSLProtocol into whatever place of the chain they want. Since the class works passively, waiting for events on both sides, it isn't bound to the loop so does not need a special loop.start_tls method.

    Backward compatibility: the loop.start_tls method may be left as a wrapper.

  • Break the reference loop (need a feedback before a PR)

    Currently protocols and transports that face each other hold mutual strong links. As a result, to destroy the whole chain a programmer needs to manually traverse it and break all links, or delegate it to the loop.

    However, a chain has no sence if there is no consumer. This allows to either replace strong Transport → Interface references with the weak ones, or transfer to a model where a transport provides self.on_data = lambda: 0-like connection points, and a protocol connects/disconnects these points to itself by itself. As a result, a transport feeds data into the void hoping that there will be a listner (or listeners).

    Backward compatibility: [help needed]


Conclusion: the changes above will turn asyncio into a powerful tool of asynchronous dataflow processing. Maybe (a little of pipe dream) there'll be a pandas-grade library on this topic.


cc @1st1 @asvetlov as asyncio experts.

@kumaraditya303
Copy link
Contributor

FWIW I doubt that we should touch the ssl implementation at this point. Anyways I leave this to @gvanrossum to decide what to do here.

@gvanrossum
Copy link
Member

Yeah, sadly I have to agree with Kumar, it looks like you know what you're talking about but nobody here has the expertise and bandwidth to help get such a project merged. Maybe you could implement it as a 3rd party library that acts as a replacement for the TLS support in asyncio (still based on the ssl module in the stdlib) and then we can eventually deprecate and remove asyncio's native TLS support? The more we offload this stuff away from the stdlib the better.

@gvanrossum gvanrossum closed this as not planned Won't fix, can't repro, duplicate, stale Oct 27, 2022
Repository owner moved this from Todo to Done in asyncio Oct 27, 2022
@arhadthedev
Copy link
Member Author

arhadthedev commented Oct 27, 2022

Thank you for the feedback. After rereading the issue I would also add that such an overhauling proposal requires a PEP with a mature and moderately adopted proof-of-concept anyway, to gather as wide feedback as possible. So I totally agree that making a PyPI library is a must.

By the way, it's not only about expertise or bandwidth. The deprecation itself cannot be started in a few years until I achieve a solution than naturally makes people dropping the current asyncio to use the library for enhanced SSL and pipelines.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 2, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Nov 2, 2022
miss-islington added a commit that referenced this issue Nov 2, 2022
(cherry picked from commit 898d0d9)

Co-authored-by: Oleg Iarygin <[email protected]>
miss-islington added a commit that referenced this issue Nov 2, 2022
(cherry picked from commit 898d0d9)

Co-authored-by: Oleg Iarygin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 only security fixes docs Documentation in the Doc dir topic-asyncio type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

4 participants