Skip to content

Intermittent failures of loop.subprocess_exec() to capture output #85760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kyleam mannequin opened this issue Aug 20, 2020 · 5 comments
Closed

Intermittent failures of loop.subprocess_exec() to capture output #85760

kyleam mannequin opened this issue Aug 20, 2020 · 5 comments
Assignees
Labels
3.10 only security fixes 3.11 only security fixes 3.12 only security fixes topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@kyleam
Copy link
Mannequin

kyleam mannequin commented Aug 20, 2020

BPO 41594
Nosy @asvetlov, @1st1, @kyleam
Files
  • reproducer.py: script to trigger issue
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-08-20.00:24:11.509>
    labels = ['type-bug', '3.7', '3.10', 'expert-asyncio']
    title = 'Intermittent failures of loop.subprocess_exec() to capture output'
    updated_at = <Date 2020-08-27.11:23:44.419>
    user = 'https://github.com/kyleam'

    bugs.python.org fields:

    activity = <Date 2020-08-27.11:23:44.419>
    actor = 'kyleam'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2020-08-20.00:24:11.509>
    creator = 'kyleam'
    dependencies = []
    files = ['49406']
    hgrepos = []
    issue_num = 41594
    keywords = []
    message_count = 3.0
    messages = ['375680', '375890', '375985']
    nosy_count = 4.0
    nosy_names = ['asvetlov', 'Yaroslav.Halchenko', 'yselivanov', 'kyleam']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41594'
    versions = ['Python 3.7', 'Python 3.10']

    @kyleam
    Copy link
    Mannequin Author

    kyleam mannequin commented Aug 20, 2020

    I've been debugging an intermittent test failure in code that calls
    loop.subprocess_exec with an asyncio.SubprocessProtocol subclass.
    Here's a minimal example that I hope captures the issue. It's based
    closely off of the DateProtocol example in the asyncio protocol
    documentation.

        import asyncio
    
        class Protocol(asyncio.SubprocessProtocol):
            def __init__(self, exit_future):
                self.exit_future = exit_future
                self.output = bytearray()
    
            def pipe_data_received(self, fd, data):
                self.output.extend(data)
    
            def process_exited(self):
                self.exit_future.set_result(True)
    
        async def get_stdout():
            loop = asyncio.get_running_loop()
            exit_future = asyncio.Future(loop=loop)
            transport, protocol = await loop.subprocess_exec(
                lambda: Protocol(exit_future),
                "printf", "ok", stdin=None, stderr=None)
            await exit_future
            transport.close()
            return bytes(protocol.output)

    The attached script adds some debugging statements to what's above and
    then repeats the specified number of calls to
    asyncio.run(get_stdout()), aborting if asyncio.run(get_stdout())
    returns an empty byte string rather then the expected b'ok'. With
    Python 3.7.3 on a Debian system, I see occasional aborts. For
    example, after two runs of the script with no unexpectedly empty
    output, I saw

        $ python3 reproducer.py 500
        Iteration:  32 of 500
        Failed on iteration 32
        logfile: /tmp/asyncio-debug-bqehu8f3.log
    
        $ tail /tmp/asyncio-debug-bqehu8f3.log
        DEBUG: process_exited() called
        DEBUG: Starting iteration 31
        DEBUG: Using selector: EpollSelector
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: process_exited() called
        DEBUG: Starting iteration 32
        DEBUG: Using selector: EpollSelector
        DEBUG: process_exited() called
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: Failed on iteration 32

    Based on the debug statements, it looks like pipe_data_received does
    get called with the output for the run that comes up empty, but only
    after process_exited is called.

    On my first try with a python built from a recent commit (0be7c21),
    I triggered the failure:

        $ python -V
        Python 3.10.0a0
    
        $ python reproducer.py 500
        Iteration:   8 of 500
        Failed on iteration 8
        logfile: /tmp/asyncio-debug-m5fba4ga.log
    
        $ tail /tmp/asyncio-debug-m5fba4ga.log
        DEBUG: process_exited() called
        DEBUG: Starting iteration 7
        DEBUG: Using selector: EpollSelector
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: process_exited() called
        DEBUG: Starting iteration 8
        DEBUG: Using selector: EpollSelector
        DEBUG: process_exited() called
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: Failed on iteration 8

    As I'm following the example from the documentation closely, I hope
    I'm not committing a silly error that leads to an expected race here.

    @kyleam kyleam mannequin added 3.7 (EOL) end of life topic-asyncio type-bug An unexpected behavior, bug, or error labels Aug 20, 2020
    @YaroslavHalchenko
    Copy link
    Mannequin

    YaroslavHalchenko mannequin commented Aug 25, 2020

    Might (although unlikely) be related to https://bugs.python.org/issue40634 which is about BlockingIOError being raised (and ignored) if SelectorEventLoop is reused (not the case here) also in the case of short lived processes.

    @kyleam
    Copy link
    Mannequin Author

    kyleam mannequin commented Aug 27, 2020

    I should have thought to provide the output of when debug=True is
    passed to asyncio.run(). Here it is, with the python on my Debian
    system:

        $ python3 -V
        Python 3.7.3
        
        $ python3 reproducer.py
        Iteration:   1 of 100
        Failed on iteration 1
        logfile: /tmp/asyncio-debug-rqfsxyth.log
        
        $ cat /tmp/asyncio-debug-rqfsxyth.log 
        DEBUG: Starting iteration 1
        DEBUG: Using selector: EpollSelector
        DEBUG: execute program 'printf' stdout=<pipe>
        DEBUG: process 'printf' created: pid 20488
        DEBUG: process 20488 exited with returncode 0
        INFO: <_UnixSubprocessTransport pid=20488 running> exited with return code 0
        DEBUG: Read pipe 8 connected: (<_UnixReadPipeTransport fd=8 polling>, <ReadSubprocessPipeProto fd=1 pipe=<_UnixReadPipeTransport fd=8 polling>>)
        DEBUG: process_exited() called
        INFO: execute program 'printf': <_UnixSubprocessTransport pid=20488 returncode=0 stdout=<_UnixReadPipeTransport fd=8 polling>>
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: Close <_UnixSelectorEventLoop running=False closed=False debug=True>
        DEBUG: Failed on iteration 1

    And with a python built from a recent commit (8e19c8b):

        $ python -V
        Python 3.10.0a0
        
        $ python reproducer.py
        Iteration:   1 of 100
        Failed on iteration 1
        logfile: /tmp/asyncio-debug-9eyhuas4.log
        
        $ cat /tmp/asyncio-debug-9eyhuas4.log 
        DEBUG: Starting iteration 1
        DEBUG: Using selector: EpollSelector
        DEBUG: execute program 'printf' stdout=<pipe>
        DEBUG: process 'printf' created: pid 20524
        DEBUG: process 20524 exited with returncode 0
        INFO: <_UnixSubprocessTransport pid=20524 running> exited with return code 0
        DEBUG: Read pipe 8 connected: (<_UnixReadPipeTransport fd=8 polling>, <ReadSubprocessPipeProto fd=1 pipe=<_UnixReadPipeTransport fd=8 polling>>)
        DEBUG: process_exited() called
        INFO: execute program 'printf': <_UnixSubprocessTransport pid=20524 returncode=0 stdout=<_UnixReadPipeTransport fd=8 polling>>
        DEBUG: pipe_data_received(): fd=1, data=b'ok'
        DEBUG: Close <_UnixSelectorEventLoop running=False closed=False debug=True>
        DEBUG: Failed on iteration 1

    It looks like I can work around the issue (i.e. I don't observe any
    lost output) by adding this line to the attached script:

    diff --git a/reproducer.py b/reproducer.py
    index 5e04c36..a462898 100644
    --- a/reproducer.py
    +++ b/reproducer.py
    @@ -25,6 +25,7 @@ async def get_stdout():
         transport, protocol = await loop.subprocess_exec(
             lambda: Protocol(exit_future),
             "printf", "ok", stdin=None, stderr=None)
    +    await asyncio.ensure_future(transport._wait())
         await exit_future
         transport.close()
         return bytes(protocol.output)
    

    @kyleam kyleam mannequin added 3.10 only security fixes labels Aug 27, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @ezio-melotti ezio-melotti moved this to Todo in asyncio Jul 17, 2022
    @kumaraditya303 kumaraditya303 added 3.11 only security fixes 3.12 only security fixes and removed 3.7 (EOL) end of life labels Sep 22, 2022
    @kumaraditya303
    Copy link
    Contributor

    kumaraditya303 commented Sep 22, 2022

    So after some debugging, this is happening because the process_exited callback is executed too early before the data received and connection lost callbacks are executed.

    The following patch fixes the issue and the test suite has no failure:

    diff --git a/Lib/asyncio/unix_events.py b/Lib/asyncio/unix_events.py
    index cf7683fee64..96e6d73a759 100644
    --- a/Lib/asyncio/unix_events.py
    +++ b/Lib/asyncio/unix_events.py
    @@ -223,7 +223,8 @@ async def _make_subprocess_transport(self, protocol, args, shell,
             return transp
     
         def _child_watcher_callback(self, pid, returncode, transp):
    -        self.call_soon_threadsafe(transp._process_exited, returncode)
    +        # Skip one iteration for callbacks to be executed
    +        self.call_soon_threadsafe(self.call_soon, transp._process_exited, returncode)
     
         async def create_unix_connection(
                 self, protocol_factory, path=None, *,

    Output:

    @kumaraditya303 ➜ /workspaces/cpython (main ✗) $ ./python main.py 
    Iteration: 100 of 100
    Every iteration captured stdout as expected

    Without fix sometimes pipe_data_received is executed after process_exited:

    DEBUG: Starting iteration 1
    DEBUG: Using selector: EpollSelector
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: process_exited() called
    DEBUG: Starting iteration 2
    DEBUG: Using selector: EpollSelector
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: process_exited() called
    DEBUG: Starting iteration 3
    DEBUG: Using selector: EpollSelector
    DEBUG: process_exited() called
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: Failed on iteration 3

    With fix every process_exited is preceded by pipe_data_received:

    # Truncated for brevity 
    DEBUG: Starting iteration 98
    DEBUG: Using selector: EpollSelector
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: process_exited() called
    DEBUG: Starting iteration 99
    DEBUG: Using selector: EpollSelector
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: process_exited() called
    DEBUG: Starting iteration 100
    DEBUG: Using selector: EpollSelector
    DEBUG: pipe_data_received(): fd=1, data=b'ok'
    DEBUG: process_exited() called

    @kumaraditya303 kumaraditya303 self-assigned this Sep 22, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 22, 2022
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 22, 2022
    miss-islington added a commit that referenced this issue Sep 22, 2022
    miss-islington added a commit that referenced this issue Sep 22, 2022
    @kumaraditya303
    Copy link
    Contributor

    kumaraditya303 commented Sep 22, 2022

    Fixed by #97009 and backported to 3.11 & 3.10. Thanks @gvanrossum!

    Repository owner moved this from Todo to Done in asyncio Sep 22, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes 3.11 only security fixes 3.12 only security fixes topic-asyncio type-bug An unexpected behavior, bug, or error
    Projects
    Status: Done
    Development

    No branches or pull requests

    1 participant