Skip to content

asyncio SSL contexts leak sockets after calling close with certain Apache servers #73592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
thehesiod mannequin opened this issue Feb 1, 2017 · 37 comments
Closed

asyncio SSL contexts leak sockets after calling close with certain Apache servers #73592

thehesiod mannequin opened this issue Feb 1, 2017 · 37 comments
Assignees
Labels
3.11 only security fixes build The build process and cross-build performance Performance or resource usage topic-asyncio

Comments

@thehesiod
Copy link
Mannequin

thehesiod mannequin commented Feb 1, 2017

BPO 29406
Nosy @tiran, @ned-deily, @asvetlov, @ambv, @dimaqq, @1st1, @fafhrd91, @thehesiod, @grzgrzgrz3, @Mariatta, @arthurdarcet, @kumaraditya303
PRs
  • bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers #409
  • bpo-29870: ssl socket leak #981
  • [3.6] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) #2062
  • [3.5] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) #2063
  • Revert "bpo-29406: asyncio SSL contexts leak sockets after calling cl… #2111
  • Revert "[3.6] bpo-29406: asyncio SSL contexts leak sockets after call… #2112
  • Revert "[3.5] bpo-29406: asyncio SSL contexts leak sockets after call… #2113
  • bpo-29406: asyncio SSL protocol leaks sockets after calling close with certain servers #2269
  • bpo-29406: Prevent SSL socket leaking in asyncio #4402
  • Files
  • scratch_1.py: asyncio SSL socket leak after close
  • ssl_shutdown_timeout.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/asvetlov'
    closed_at = <Date 2022-03-14.22:50:24.009>
    created_at = <Date 2017-02-01.00:39:04.997>
    labels = ['3.11', 'build', 'expert-asyncio', 'performance']
    title = 'asyncio SSL contexts leak sockets after calling close with certain Apache servers'
    updated_at = <Date 2022-03-14.22:50:24.008>
    user = 'https://github.com/thehesiod'

    bugs.python.org fields:

    activity = <Date 2022-03-14.22:50:24.008>
    actor = 'asvetlov'
    assignee = 'asvetlov'
    closed = True
    closed_date = <Date 2022-03-14.22:50:24.009>
    closer = 'asvetlov'
    components = ['Build', 'asyncio']
    creation = <Date 2017-02-01.00:39:04.997>
    creator = 'thehesiod'
    dependencies = []
    files = ['46470', '46474']
    hgrepos = []
    issue_num = 29406
    keywords = ['patch']
    message_count = 37.0
    messages = ['286573', '286574', '286586', '286664', '288841', '295588', '295589', '295605', '295636', '295637', '295659', '295660', '295661', '295670', '295671', '295690', '295691', '295692', '295714', '295715', '295716', '295717', '295765', '296359', '296360', '297493', '297678', '302985', '306228', '306230', '306232', '309822', '312892', '312916', '332042', '332074', '414509']
    nosy_count = 13.0
    nosy_names = ['christian.heimes', 'ned.deily', 'fafhrd', 'asvetlov', 'lukasz.langa', 'Dima.Tisnek', 'yselivanov', 'fafhrd91', 'thehesiod', 'grzgrzgrz3', 'Mariatta', 'rthr', 'kumaraditya']
    pr_nums = ['409', '981', '2062', '2063', '2111', '2112', '2113', '2269', '4402']
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue29406'
    versions = ['Python 3.11']

    @thehesiod
    Copy link
    Mannequin Author

    thehesiod mannequin commented Feb 1, 2017

    with the attached code note how HttpClient.connection_lost callback is never called for the madis server. The madis server is an apache server, I tried with the OSX apache server and could not reproduce the issue so it seems something particular about their apache version or configuration. This is a pretty critical issue as close() does not release the socket.

    @thehesiod thehesiod mannequin added performance Performance or resource usage topic-asyncio labels Feb 1, 2017
    @thehesiod
    Copy link
    Mannequin Author

    thehesiod mannequin commented Feb 1, 2017

    updating to make default the error case (madis)

    @fafhrd91
    Copy link
    Contributor

    fafhrd91 commented Feb 1, 2017

    madis-data.ncep.noaa.gov side does not complete ssl shutdown process.

    @thehesiod
    Copy link
    Mannequin Author

    thehesiod mannequin commented Feb 1, 2017

    Thanks so much for the patch!

    may want to change spelling of what was supposed to be "shutdown" =) Also think it's worth a comment stating why it's needed? Like certain Apache servers were noticed to not complete the SSL shutdown process.

    @1st1
    Copy link
    Member

    1st1 commented Mar 2, 2017

    Can you guys create a PR on github.com/python/cpython?

    @Mariatta Mariatta added the 3.7 (EOL) end of life label Jun 9, 2017
    @Mariatta
    Copy link
    Member

    Mariatta commented Jun 9, 2017

    There are two PRs for this issue. Is one depending on the other?

    PR 409 has merge conflict that needs to be resolved.

    @1st1
    Copy link
    Member

    1st1 commented Jun 9, 2017

    There are two PRs for this issue. Is one depending on the other?

    No, they all address slightly different bugs. All need to be merged though.

    PR 409 has merge conflict that needs to be resolved.

    Yes, it seems that it's based on 3.6 branch. I asked the author to rebase it.

    @1st1
    Copy link
    Member

    1st1 commented Jun 10, 2017

    New changeset a608d2d by Yury Selivanov (Nikolay Kim) in branch 'master':
    bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (#409)
    a608d2d

    @1st1
    Copy link
    Member

    1st1 commented Jun 10, 2017

    New changeset 6e14fd2 by Yury Selivanov in branch '3.6':
    [3.6] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) (bpo-2062)
    6e14fd2

    @1st1
    Copy link
    Member

    1st1 commented Jun 10, 2017

    New changeset 1395c58 by Yury Selivanov in branch '3.5':
    [3.5] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) (bpo-2063)
    1395c58

    @ned-deily
    Copy link
    Member

    While testing current 3.6 top of trunk, I noticed spurious error messages being reported that were not causing the tests to fail. Investigating further, I found the culprit to be this issue's 6e14fd2. Curiously, the errors aren't showing up on buildbots although I can reproduce 100% on both platforms I've tried: Debian Linux and macOS 10.12. For some reason that needs to be explored, using -W on regrtest causes the messages to be suppressed and the buildbots use -w by default.

    $ ./python
    Python 3.6.1+ (remotes/upstream/3.6:76eabd3a21, Jun 10 2017, 15:20:44)
    [GCC 6.3.0 20170516] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>

    # without -w or -W
    $ ./python -m test test_asyncio
    Run tests sequentially
    0:00:00 [1/1] test_asyncio
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb55423b4>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb5542504>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb639f8bc>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb6126f84>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb6392fbc>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb671dedc>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    1 test OK.

    Total duration: 13 sec
    Tests result: SUCCESS

    # with -w
    $ ./python -m test -w test_asyncio
    Run tests sequentially
    0:00:00 [1/1] test_asyncio
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb4b15b5c>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb581f264>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb53f068c>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb65faf14>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb5816c04>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    Fatal error on SSL transport
    protocol: <asyncio.sslproto.SSLProtocol object at 0xb582dea4>
    transport: None
    Traceback (most recent call last):
      File "/py/3x/unix/source/Lib/asyncio/sslproto.py", line 660, in _process_write_backlog
        self._transport.write(chunk)
    AttributeError: 'NoneType' object has no attribute 'write'
    1 test OK.

    Total duration: 13 sec
    Tests result: SUCCESS

    # with -W -> no messages
    $ ./python -m test -W test_asyncio
    Run tests sequentially
    0:00:00 [1/1] test_asyncio
    1 test OK.

    Total duration: 13 sec
    Tests result: SUCCESS

    @ned-deily
    Copy link
    Member

    Er, "For some reason that needs to be explored, using -W on regrtest causes the messages to be suppressed and the buildbots use -w by default."

    That should be "use -W by default".

    @ned-deily
    Copy link
    Member

    FTR, same behavior with master (3.7) top of trunk.

    @1st1
    Copy link
    Member

    1st1 commented Jun 10, 2017

    Nikolay, can you please take a look?

    @fafhrd91
    Copy link
    Contributor

    yes, i am on it

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    yes, i am on it

    Nick, have you been able to find what the problem is? If not, we'll need to pull this change out of 3.6.2.

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    > yes, i am on it

    Nick, have you been able to find what the problem is? If not, we'll need to pull this change out of 3.6.2.

    It's interesting to see how _process_write_backlog gets called after _finalize. One option would be to simply guard transport.write with if transport is not None, but I'm afraid that we'll only mask the bug then.

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    It's interesting to see how _process_write_backlog gets called after _finalize. One option would be to simply guard transport.write with if transport is not None, but I'm afraid that we'll only mask the bug then.

    OTOH we weren't setting the transport to None before, we were just closing it. Is setting it to None really necessary?

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    Alright, let's play it safe. I'm going to be reverting the change from all branches. Nick, if you are able to figure this out please create a new PR.

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    New changeset 09663de by Yury Selivanov in branch 'master':
    Revert "bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (#409)" (bpo-2111)
    09663de

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    New changeset 83d30bd by Yury Selivanov in branch '3.6':
    Revert "[3.6] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) (bpo-2062)" (bpo-2112)
    83d30bd

    @1st1
    Copy link
    Member

    1st1 commented Jun 11, 2017

    New changeset 4e9dfe2 by Yury Selivanov in branch '3.5':
    Revert "[3.5] bpo-29406: asyncio SSL contexts leak sockets after calling close with certain servers (GH-409) (bpo-2063)" (bpo-2113)
    4e9dfe2

    @1st1
    Copy link
    Member

    1st1 commented Jun 12, 2017

    See also bpo-29970.

    @grzgrzgrz3
    Copy link
    Mannequin

    grzgrzgrz3 mannequin commented Jun 19, 2017

    This is not problem with madis-data.ncep.noaa.gov not doing ssl shutdown, this is problem with asyncio not doing it.

    Patch from this bpo-30698 issue fix this too.

    @fafhrd
    Copy link
    Mannequin

    fafhrd mannequin commented Jun 19, 2017

    Let’s close this issue then. I don’t like it anyway.

    On Jun 19, 2017, at 10:21 AM, Grzegorz Grzywacz <[email protected]> wrote:

    Grzegorz Grzywacz added the comment:

    This is not problem with madis-data.ncep.noaa.gov not doing ssl shutdown, this is problem with asyncio not doing it.

    Patch from this bpo-30698 issue fix this too.

    ----------
    nosy: +grzgrzgrz3


    Python tracker <[email protected]>
    <http://bugs.python.org/issue29406\>


    @ned-deily
    Copy link
    Member

    Yury, based on the most recent comments, can this issue be closed now?

    @1st1
    Copy link
    Member

    1st1 commented Jul 4, 2017

    Yury, based on the most recent comments, can this issue be closed now?

    Not sure, I'm reviewing the patch for bpo-30698, let's see if it fixes the problem.

    @ambv
    Copy link
    Contributor

    ambv commented Sep 25, 2017

    This is marked as Critical so either we should change the priority or fix it soon :)

    @asvetlov
    Copy link
    Contributor

    I'm picking up the issue.
    Will provide an updated PR soon.

    @asvetlov asvetlov assigned asvetlov and unassigned 1st1 Nov 14, 2017
    @asvetlov
    Copy link
    Contributor

    I'm skeptical about critical priority.

    The bugfix could be backported to Python 3.6 but I very not sure about the need for 3.5.
    It is a desirable fix but not secure.

    Yury Selivanov please confirm.

    @1st1
    Copy link
    Member

    1st1 commented Nov 14, 2017

    The bugfix could be backported to Python 3.6 but I very not sure about the need for 3.5.
    It is a desirable fix but not secure.

    The attached script looks very innocent, and it's worrisome that it's that easy to make an asyncio SSL server to leak sockets in some contexts. I think it's OK to fix 3.5, but let's make the final decision when we have an approved PR.

    I merged Nick's PR once, but then it introduced some problems on buildbots (https://bugs.python.org/msg295659), so I reverted the commit.

    I'd suggest to look at the attached script again (scratch_1.py) and try to understand what exactly causes the problem with that specific Apache server version. I'd also suggest to come up with a *functional* unittest that reproduces it (you can grab some testing utilities from the uvloop project).

    @thehesiod
    Copy link
    Mannequin Author

    thehesiod mannequin commented Jan 11, 2018

    my understanding is that the PR in https://bugs.python.org/issue30698 fixes this issue no? If so can we get it merged?

    @AngieG AngieG mannequin added build The build process and cross-build topic-SSL performance Performance or resource usage and removed performance Performance or resource usage labels Jan 18, 2018
    @tiran
    Copy link
    Member

    tiran commented Feb 26, 2018

    Andrew, Yury, PR 4402 is still open. Should the fix land in 3.7?

    I have removed the SSL component because it's not a bug in the ssl module.

    @tiran tiran added 3.8 (EOL) end of life and removed topic-SSL labels Feb 26, 2018
    @asvetlov
    Copy link
    Contributor

    I'll take a look on the evening

    @dimaqq
    Copy link
    Mannequin

    dimaqq mannequin commented Dec 18, 2018

    ping...

    @1st1
    Copy link
    Member

    1st1 commented Dec 18, 2018

    Dima, we'll likely address this in 3.8, when we land new SSL implementation. Meanwhile you can use uvloop where all these bugs should be fixed.

    @kumaraditya303
    Copy link
    Contributor

    Since bpo-issue44011 is fixed, this can be closed now @asvetlov.

    @asvetlov asvetlov added 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 (EOL) end of life labels Mar 14, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes build The build process and cross-build performance Performance or resource usage topic-asyncio
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants