Skip to content

bpo-33550: Warn not to set SIGPIPE to SIG_DFL #6773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 17, 2018

Conversation

splbio
Copy link
Contributor

@splbio splbio commented May 12, 2018

https://bugs.python.org/issue33550

An anti-pattern often seen in python programs is reset SIGPIPE handling to SIG_DFL (process should terminate) for cli tools in order to avoid "messy" output on stderr when the cli tool is piped into a program such as head(1).

The problem with doing this is that if your program happens to talk to anything via sockets or pipes (even via a library) setting SIGPIPE handling to SIG_DFL means that if the remote side closes the connection, or the network drops, your program will instantly terminate instead of following any retry logic. This results in difficult to debug "why is my tool exiting for no reason?" bugs for cli tools.

Since over the years I've seen this anti-pattern often enough, and often enough it's caused problems in the various environments I've worked in, it deserves some form of documentation somewhere.

https://bugs.python.org/issue33550

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

Thanks again to your contribution and we look forward to looking at it!

@splbio
Copy link
Contributor Author

splbio commented May 12, 2018

I have signed the CLA and made an account under bug.python.org as "splbio".

  1. Does this documentation seem clear enough? It is a weird corner case, but it's clearly a pretty problematic anti-pattern. Would love feedback on what would make this more clear.

  2. Unfortunately, my attempt to find a funny "sarah connor hiding from terminator robots" image to attach to this pull request in response to the bots was unsuccessful. Even so I hope all is clear now for the administrative part of doing a pull req.

thank you.

for x in range(10000):
print("y")

if __name__ == '__main__':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues:

  • this combines example code with test code: the flushes are there to force the issue, while the Exception handler is there solve the issue. A user might think they need the flushes.
  • It is more typical, in code I've seen, to have nothing under the if __name__ check other than a call to main and to have setup code happen in main or similar. Note that this might not be standard python style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this combines example code with test code: the flushes are there to force the issue, while the Exception handler is there solve the issue. A user might think they need the flushes.

The user does need the flushes, otherwise some stdout may be buffered until after the python program exits the try/except block. You can demonstrate this by removing the flush and piping this program into head -1.

That said the flush on stderr is probably not needed and I can remove it.

I've spun another version moving the code into main() and removing the flush of stderr. The flush of stdout must remain as far as I can tell.

@splbio
Copy link
Contributor Author

splbio commented May 15, 2018

Diff feedback taken, moved all code into main().

@splbio
Copy link
Contributor Author

splbio commented May 16, 2018

@grimreaper Do I need to open an issue in the bug tracker for this and/or create a news entry? Or are small doc changes such as this exempt from both or one of these requirements? ty.

@grimreaper
Copy link
Contributor

I am not a core developer and am not fully aware of the customs here. From what I've seen even more trivial changes get a bug and a news entry (via "blurb") so I'd expect this one to as well.

@splbio splbio changed the title Warn not to set SIGPIPE to SIG_DFL BPO-33550: Warn not to set SIGPIPE to SIG_DFL May 16, 2018
Note on SIGPIPE
---------------

Do not set set :const:`SIGPIPE`'s disposition to :const:`SIGDFL`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set is doubled.

I think SIG_DFL has an underscore, both in Python and in C.


This will cause your program to exit in unexpected ways because Python
will exit if a write to any socket which is the remote side has closed
or dropped occurs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would make more sense without is. Also may be easier to understand with occurs closer to write: “a write occurs to any socket which the remote side has closed or dropped.”

@splbio
Copy link
Contributor Author

splbio commented May 19, 2018

@vadmium The issues noted are cleaned up. Thank you.

@splbio splbio changed the title BPO-33550: Warn not to set SIGPIPE to SIG_DFL bpo-33550: Warn not to set SIGPIPE to SIG_DFL May 22, 2018
@splbio splbio force-pushed the docs_warn_sigpipe branch from 651e623 to 51bb421 Compare June 13, 2018 19:14
@ned-deily
Copy link
Member

@vadmium, is this ready to go?

@splbio splbio force-pushed the docs_warn_sigpipe branch 2 times, most recently from a5a7f0f to 3240b8a Compare June 13, 2018 20:16
@@ -0,0 +1 @@
Warn not to set SIGPIPE to SIG_DFL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A NEWS entry is not required for small documentation patches.

@splbio splbio force-pushed the docs_warn_sigpipe branch from 88a057b to e307258 Compare June 14, 2018 06:40
@splbio
Copy link
Contributor Author

splbio commented Jun 15, 2018

@berkerpeksag I have removed the unneeded news entry.

Copy link
Member

@vadmium vadmium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t read the proposal thoroughly before, sorry. I agree that Python’s documentation or library could support better handling of interrupts and broken pipes in end-user programs. But I have a few concerns.

One is that you seem to confuse the signal SIGPIPE with the BrokenPipeError exception. Yes, outputting to a broken pipe can cause a SIGPIPE signal, but so can os.kill. If Python ignores this signal, it has no effect on its own. It is the OS setting errno to EPIPE that triggers BrokenPipeError.

The other is about the second paragraph. You discourage setting SIGPIPE to SIG_DFL, because that may “cause your program to exit unexpectedly”. But the alternative seems to be calling os._exit without even an error message. This is subtler than the default Python exception message, and the SIG_DFL behaviour, where you may at least discover that Python was terminated by SIGPIPE by checking the equivalent of os.WTERMSIG.

A better solution is often to ignore BrokenPipeError or skip the output and continue to the next stage in the program.

# while inside this try block.
sys.stdout.flush()
except BrokenPipeError:
os._exit(1) # Python exits with error code 1 on EPIPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t understand why you suggest os._exit. Usually people use SystemExit or sys.exit, which triggers try / finally handlers, flushes sys.__stderr__, calls atexit handlers, and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use os.exit() the following output happens:

22:56:46 ~ % cat t2.py 
import os
import sys

def main():
    try:
        for x in range(10000):
            print("y")
        # flush output here to force SIGPIPE to be triggered
        # while inside this try block.
        sys.stdout.flush()
    except BrokenPipeError:
        sys.exit(1)  # Python exits with error code 1 on EPIPE

if __name__ == '__main__':
    main()

22:56:50 ~ % python3 t2.py | head -1
y
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
22:56:52 ~ % 

Since I can't seem to find a way for python NOT to flush on sys.exit(), this code works, but is very ugly, however it's safer since you are correct about os._exit() skipping shutdown handlers....

This works, so maybe we should amend the code to be like this (removing the unneeded comments and writes to stderr):

22:59:21 ~ % cat t.py 
import os
import sys

def main():
    try:
        for x in range(10000):
            print("y")
        # flush output here to force SIGPIPE to be triggered
        # while inside this try block.
        sys.stdout.flush()
    except BrokenPipeError:
        sys.stderr.write("BPE 1\n")
        try:
            sys.stdout.close()
            #sys.stdout.flush()
            pass
        except BrokenPipeError:
            sys.stderr.write("BPE 2\n")
            pass
        #os._exit(1)  # Python exits with error code 1 on EPIPE
        sys.exit(1)  # Python exits with error code 1 on EPIPE
    finally:
        sys.stdout.close()


if __name__ == '__main__':
    main()

22:59:26 ~ % python3 t.py | head -1 
y
BPE 1
BPE 2
22:59:47 ~ % 

The problem here is that python doesn't seem to have a method to "fpurge(3)" a stream which would work better than double closing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work... I wish there was a way to "fpurge(3)" a python file object instead.

import os
import sys

def main():
    try:
        for x in range(10000):
            print("y")
        # flush output here to force SIGPIPE to be triggered
        # while inside this try block.
        sys.stdout.flush()
    except BrokenPipeError:
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)  # Python exits with error code 1 on EPIPE

if __name__ == '__main__':
    main()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also python doesn't exit with the correct status code when it catches sigpipe, it returns with exit code 1. (I tested this with Lukasz)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the latest iteration. We just need to comment on why we're duping stdout to devnull. As simple as "Python flushes standard streams on exit; redirect remaining output to devnull to avoid another BrokenPipeError at shutdown".

print("y")
# flush output here to force SIGPIPE to be triggered
# while inside this try block.
sys.stdout.flush()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be clearer if you made it an indefinite while True loop; then you wouldn’t need the flush

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to give a recipe which covers common uses. Showing that it's important to use sys.stdout.flush() because not all programs are an infinite loop, a significant portion do a single task outputting some output and that flush is required to avoid the pipe error.

@@ -503,3 +503,32 @@ be sent, and the handler raises an exception. ::

signal.alarm(0) # Disable the alarm

Note on SIGPIPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I don't think this is suitable for the signal module documentation. Maybe we can create a HOWTO document for signals? A HOWTO document would also allow us adding more best practices in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Until we have such document, I'm fine putting this in the documentation as is so it's searchable.

---------------

Piping output of your program to tools like :manpage:`head(1)` will
cause a SIGPIPE signal to be sent to your process when the receiver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIGPIPE -> :const:`SIGPIPE`

@splbio splbio force-pushed the docs_warn_sigpipe branch 2 times, most recently from 7a0511b to f9c05c4 Compare June 19, 2018 15:18
@splbio
Copy link
Contributor Author

splbio commented Jun 19, 2018

Updates from @berkerpeksag (const for SIGPIPE) and @ambv (comments why devnull dup2'd over stdout) taken. Also rebased onto master.

I wonder if there should be a standard decorator for this since it's pretty problematic pattern? I may make a package shortly. Advice please?

@splbio
Copy link
Contributor Author

splbio commented Jun 26, 2018

@vadmium

I've inlined here a program that illustrates why setting SIGPIPE to SIG_DFL is insidious and the subtle nature of the bugs it introduces.

Please run it with:

./foo.py dfl

And you will see it exit when trying to do network communications.
This is the crux of the issue, basically by setting SIGPIPE to SIGDFL you break expected behavior for most libraries. There is really no truly safe way to set SIGPIPE to SIGDFL in anything but the most naive of programs, specifically filters such as cut(1), grep(1), col(1) iff those programs do not contact a central logging system for grabbing statistics as one typically does in a large scale enterprise. Even if one does write a very simple filter program then it is bad idea due to the fact that most programs eventually expand to do some form of network communication for example logging or ldap.

This is the "unexpected behavior/exit" that I am warning against and why os._exit() while not optimal is actually not as bad as SIGPIPE->SIG_DFL.

I understand that this is a complex corner case, but really a python program should almost never, ever, set SIGPIPE to SIG_DFL or it risks the entire program shutting down unexpectedly if it does interactions with sockets as part of its lifecycle.

#
# Program showing the dangers of setting the SIG_PIPE handler to the default handler (SIG_DFL).
#
# To illustrate the problem run with:
# ./foo.py dfl
#
# The program will exit in do_network_stuff() even though there is a an "except" clause.
# The do_network_stuff() simulates a remote connection that closes before it can be written to
# which happens often enough to be a hazard in practice.
#
# 
#

import signal
import sys
import socket
import os

def sigpipe_handler(sig, frame):
    sys.stderr.write("Got sigpipe \n\n\n")
    sys.stderr.flush()

def get_server_connection():
    # simulate making a connection to a remote service that closes the connection
    # before we can write to it.  (In practice a host rebooting, or otherwise exiting while we are
    # trying to interact with it will be the true source of such behavior.)
    s1, s2 = socket.socketpair()
    s2.close()
    return s1


def do_network_stuff():
    # simulate interacting with a remote service that closes its connection
    # before we can write to it.  Example: connecting to an http service and
    # issuing a GET request, but the remote server is shutting down between
    # when our connection finishes the 3-way handshake and when we are able
    # to write our "GET /" request to it.
    # In theory this function should be resilient to this, however if SIGPIPE is set
    # to SIGDFL then this code will cause termination of the program. 
    if 'dfl' in sys.argv[1:]:
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)

    for x in range(5):
        server_conn = get_server_connection()
        print("about to write to server socket...")
        try:
            server_conn.send(b"GET /")
        except BrokenPipeError as bpe:
            print("caught broken pipe on talking to server, retrying...")

def work():
    do_network_stuff()
    for x in range(10000):
        print("y")
    sys.stdout.flush()


def main():
    if 'nocatch' in sys.argv[1:]:
        work()
    else:
        try:
            work()
        except BrokenPipeError as bpe:
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            os.kill(os.getpid(), signal.SIGPIPE)


if __name__ == '__main__':
    main()


@splbio splbio force-pushed the docs_warn_sigpipe branch from f9c05c4 to f129008 Compare June 26, 2018 15:51
@ambv ambv added the skip news label Aug 17, 2018
@ambv ambv merged commit a251073 into python:master Aug 17, 2018
@bedevere-bot
Copy link

@ambv: Please replace # with GH- in the commit message next time. Thanks!

@miss-islington
Copy link
Contributor

Thanks @splbio for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-8798 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 17, 2018
(cherry picked from commit a251073)

Co-authored-by: Alfred Perlstein <[email protected]>
carljm added a commit to carljm/cpython that referenced this pull request Aug 19, 2018
* master: (107 commits)
  bpo-22057: Clarify eval() documentation (pythonGH-8812)
  bpo-34318: Convert deprecation warnings to errors in assertRaises() etc. (pythonGH-8623)
  bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (pythonGH-8741)
  bpo-34415: Updated logging.Formatter docstring. (pythonGH-8811)
  bpo-34432: doc Mention complex and decimal.Decimal on str.format not about locales (pythonGH-8808)
  bpo-34381: refer to 'Running & Writing Tests' in README.rst (pythonGH-8797)
  Improve error message when mock.assert_has_calls fails (pythonGH-8205)
  Warn not to set SIGPIPE to SIG_DFL (python#6773)
  bpo-34419: selectmodule.c does not compile on HP-UX due to bpo-31938 (pythonGH-8796)
  bpo-34418: Fix HTTPErrorProcessor documentation (pythonGH-8793)
  bpo-34391: Fix ftplib test for TLS 1.3 (pythonGH-8787)
  bpo-34217: Use lowercase for windows headers (pythonGH-8472)
  bpo-34395: Fix memory leaks caused by incautious usage of PyMem_Resize(). (pythonGH-8756)
  bpo-34405: Updated to OpenSSL 1.1.0i for Windows builds. (pythonGH-8775)
  bpo-34384: Fix os.readlink() on Windows (pythonGH-8740)
  closes bpo-34400: Fix undefined behavior in parsetok(). (pythonGH-4439)
  bpo-34399: 2048 bits RSA keys and DH params (python#8762)
  Make regular expressions in test_tasks.py raw strings. (pythonGH-8759)
  smtplib documentation fixes (pythonGH-8708)
  Fix misindented yaml in logging how to example (pythonGH-8604)
  ...
Mariatta pushed a commit that referenced this pull request Aug 25, 2018
(cherry picked from commit a251073)

Co-authored-by: Alfred Perlstein <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants