Error message "Signal SIGCHLD received, but no signal handler set." #17662

Open
gflohr opened this issue Mar 20, 2020 · 14 comments

@gflohr

gflohr commented Mar 20, 2020

Module:

Description

In a web server based on Net::Server (versions 2.007 and the most recent, 2.009) with the Net::Server::Single personality, I get the error "Signal SIGCHLD received, but no signal handler set." and the server terminates. With the Net::Server::PreFork personality, the signal is SIGHUP instead of SIGCHLD, and the server keeps running because only the affected forked child terminates.

The error occurs after a handler in the server has forked a background process. In the child, POSIX::setsid() is called, all open file descriptors are closed, and STDIN is redirected to /dev/null. STDOUT and STDERR are not reopened because the same code must also work in a mod_perl environment. %SIG is not modified. The child then runs another Perl script via exec().

There is a corresponding question on stackoverflow.com: https://stackoverflow.com/questions/60708194/error-message-signal-sigchld-received-but-no-signal-handler-set/60761593#60761593

I have answered it myself with additional findings/information: https://stackoverflow.com/a/60761593/5464233

My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} (or $SIG{HUP}, respectively) instead of leaving the old value undef, but that is not a satisfying solution. It is very similar to the workaround recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html

Steps to Reproduce

Not possible: it is a proprietary application with more than 350,000 lines of Perl code. Modifying the code or running the application in the Perl debugger makes the error disappear immediately. If there is interest in debugging the problem, I can provide access to a test installation.

I also wasn't able to run the program in gdb because the server did not come up.

Expected behavior

The script should not be able to make the interpreter terminate prematurely.

Perl configuration

(The perl -V output follows a little further below.)

I have tested on Mac OS X with these Perl versions using perlbrew:

$ perlbrew list
  perl-5.30.2                               
* perl-5.28.0                               
  perl-5.26.2                               
  perl-5.26.0                               
  perl-5.18.4                               
  perl-5.16.3                               
  perl-5.14.4                               
  perl-5.8.9                                

All versions starting from 5.18.4 have the problem. The older ones do not have it.

The same behavior was reported for 5.26.x to 5.30.x on Linux.

 $ perl -V
Summary of my perl5 (revision 5 version 28 subversion 0) configuration:
   
  Platform:
    osname=darwin
    osvers=18.0.0
    archname=darwin-2level
    uname='darwin hostname.example.com 18.0.0 darwin kernel version 18.0.0: wed aug 22 20:13:40 pdt 2018; root:xnu-4903.201.2~1release_x86_64 x86_64 '
    config_args='-de -Dprefix=/Users/myname/perl5/perlbrew/perls/perl-5.28.0 -Aeval:scriptdir=/Users/myname/perl5/perlbrew/perls/perl-5.28.0/bin'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -I/opt/local/include -DPERL_USE_SAFE_PUTENV'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -I/opt/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.14 -fstack-protector-strong -L/usr/local/lib -L/opt/local/lib'
    libpth=/usr/local/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0/lib /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/lib /opt/local/lib /usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.14 -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_USE_SAFE_PUTENV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
  Locally applied patches:
    Devel::PatchPerl 1.52
  Built under darwin
  Compiled at Nov  9 2018 16:47:19
  %ENV:
    PERLBREW_HOME="/Users/myname/.perlbrew"
    PERLBREW_MANPATH="/Users/myname/perl5/perlbrew/perls/perl-5.28.0/man"
    PERLBREW_PATH="/Users/myname/perl5/perlbrew/bin:/Users/myname/perl5/perlbrew/perls/perl-5.28.0/bin"
    PERLBREW_PERL="perl-5.28.0"
    PERLBREW_ROOT="/Users/myname/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.87"
    PERLBREW_VERSION="0.87"
  @INC:
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-2level
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0/darwin-2level
    /Users/myname/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0

@gflohr
Author

gflohr commented Mar 20, 2020

This is the code snippet that spawns the child process:

            if ($self->{__daemonize}) {
                if (fork) {
                    CORE::exit(0);
                }

                unless (POSIX::setsid()) {
                    # make child session leader and detach from terminal
                    die "Unable to detach process\n";
                }

                ## Close open file descriptors.
                POSIX::close($_) foreach open_files;

                ## Reopen stdin to /dev/null
                open(STDIN,  "+>/dev/null")
                     or warn "Cannot redirect standard input to /dev/null: $!\n";

                # XXX
                # Re-opening STDOUT collides with it being tied during the request.
                # open(STDOUT,  "+>/dev/null")
                #     or warn "Cannot redirect standard output to /dev/null: $!\n";
                # Re-opening STDERR collides with mod_perl.
                # open(STDERR,  "+>/dev/null")
                #      or warn "Cannot redirect standard error to /dev/null: $!\n";
            }

            exec($self->{__path}, @{$self->{__argv}})
                or die "Exec failed: " . $!;

The function open_files() returns a list of open file descriptors, $self->{__path} is the path to the script to be executed, and $self->{__argv} is the list of arguments.
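For anyone trying to reproduce this outside the application, the snippet above can be turned into a standalone sketch. It is not the application's code: spawn_detached is a hypothetical name, the fixed cap of 1024 descriptors stands in for the application's own open_files() helper, and the intermediate child exits with POSIX::_exit rather than CORE::exit so that END blocks and destructors inherited from the caller are not run. It also applies the $SIG{CHLD}/$SIG{HUP} workaround from this issue before forking.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not the application's code): start a detached process
# that exec()s $path with @argv, following the steps of the snippet above.
# Returns the PID of a short-lived intermediate child; reap it with waitpid().
sub spawn_detached {
    my ($path, @argv) = @_;

    # Workaround from this issue: give CHLD and HUP an explicit disposition
    # instead of leaving the %SIG entries undef.
    $SIG{CHLD} ||= 'DEFAULT';
    $SIG{HUP}  ||= 'DEFAULT';

    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    return $pid if $pid;    # original process

    # Intermediate child: fork again and exit at once, so the detached
    # grandchild is reparented and the caller only reaps this process.
    # POSIX::_exit skips END blocks and destructors inherited from the caller.
    my $grandchild = fork;
    POSIX::_exit(1) unless defined $grandchild;
    POSIX::_exit(0) if $grandchild;

    # Grandchild: become a session leader, detached from any terminal.
    POSIX::setsid() or POSIX::_exit(1);

    # Close inherited descriptors above STDERR. The 1024 cap is an arbitrary
    # assumption standing in for the application's open_files() helper.
    my $max_fd = POSIX::sysconf(POSIX::_SC_OPEN_MAX()) || 1024;
    $max_fd = 1024 if $max_fd > 1024;
    POSIX::close($_) for 3 .. $max_fd - 1;

    # Standard input only needs to be readable.
    open(STDIN, '<', '/dev/null')
        or warn "Cannot redirect standard input to /dev/null: $!\n";

    exec { $path } $path, @argv
        or do { warn "Exec failed: $!\n"; POSIX::_exit(127) };
}
```

Usage would be along the lines of `my $pid = spawn_detached('/path/to/script.pl', @args); waitpid($pid, 0);`, where waitpid reaps only the intermediate child; the detached grandchild is no longer a child of the caller.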

@jkeenan
Contributor

jkeenan commented Mar 20, 2020

Is the code snippet from your own code -- or from perl or CPAN code?

@gflohr
Author

gflohr commented Mar 20, 2020

The code snippet is not from a CPAN module but from the application (which is not on CPAN).

@iabyn
Contributor

iabyn commented Mar 20, 2020 via email

@gflohr
Author

gflohr commented Mar 22, 2020

> It sounds like somehow there is a discrepancy between what the OS and perl think about the state of the signal handler.

The bug is not OS-dependent: it is reproducible on both Mac OS X and Linux.

> Without code that reproduces the issue I don't think there's much we can do; we don't even know whether it's a bug in perl.

I can provide a tarball if anybody is willing to debug the issue. My customer and I have found the workaround described above, so it is not urgent. But I think it is quite likely a bug in perl, given that it happens on completely different operating systems.

@iabyn
Contributor

iabyn commented Mar 22, 2020 via email

@gflohr
Author

gflohr commented Mar 23, 2020

> I didn't say that it was a bug in the OS.

I see. Then I misunderstood it.

> If the tarball contains a standalone reliable reproducer, we'd be more interested.

Yes, I can prepare that; I am just not allowed to make it public. But you can always contact me at the email address on my GitHub page, @gflohr.

@tonycoz
Contributor

tonycoz commented Mar 25, 2020

I could see this happening under threads: IIRC, %SIG isn't synchronized between threads, so if one thread sets a handler for CHLD and the OS sends the signal to a different thread, this error could occur.

Does your code use threads at all?

@gflohr
Author

gflohr commented Mar 26, 2020

> Does your code use threads at all?

No, and the interpreters are all compiled without interpreter threads support. I added "use threads" to the main script and it died right away with "This Perl not built to support threads ...", so I can also rule out that I am accidentally using a different interpreter.
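A less intrusive way to confirm this, without editing the main script, is to ask the core Config module how the running perl was built; as far as I know, threads.pm performs essentially this check before dying with "This Perl not built to support threads". A minimal sketch (the helper name is illustrative):

```perl
use strict;
use warnings;
use Config;

# Illustrative helper: true if this perl was built with interpreter threads
# (perl -V reports this as useithreads=define or useithreads=undef).
sub perl_has_ithreads {
    return $Config{useithreads} ? 1 : 0;
}

print 'useithreads: ', (perl_has_ithreads() ? 'yes' : 'no'), "\n";
```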

@tonycoz
Contributor

tonycoz commented Apr 1, 2020

> My workaround is to explicitly assign "DEFAULT" to $SIG{CHLD} (or $SIG{HUP}, respectively) instead of leaving the old value undef, but that is not a satisfying solution. It is very similar to the workaround recommended for an almost identical problem in GNU parallel several years ago: https://lists.gnu.org/archive/html/bug-parallel/2016-10/msg00000.html

It's unclear to me - are you setting/resetting $SIG{CHLD} continuously over the life of the program, or setting it once?

Net::Server appears to only set it on startup.

@gflohr
Author

gflohr commented Apr 1, 2020

The original program itself did not touch $SIG{CHLD} at all. This workaround has now been added:

    $SIG{CHLD} ||= 'DEFAULT';
    $SIG{HUP} ||= 'DEFAULT';

This is now executed before any fork(). At that point, when using the Net::Server::Single personality, $SIG{CHLD} was undef. Setting it to "DEFAULT" instead should be a no-op, but in fact it makes the error vanish.

When using Net::Server::PreFork, the children died with the same error message but about SIGHUP. The "workaround" cures this behavior as well.
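Since the workaround only changes what is stored in %SIG, it may help to log perl's own view of the relevant entries at startup and again immediately before each fork(), to see exactly when an entry is undef rather than "DEFAULT". A minimal sketch; describe_sig is a hypothetical helper, and it shows only what perl reports through %SIG, not the disposition the kernel actually has installed:

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper: report what perl itself currently stores in
# %SIG for each named signal ('undef', 'DEFAULT', 'IGNORE', a handler name,
# or the reference type of a code reference).
sub describe_sig {
    my %state;
    for my $name (@_) {
        my $handler = $SIG{$name};
        $state{$name} = !defined $handler ? 'undef'
                      : ref $handler      ? ref $handler
                      :                     $handler;
    }
    return \%state;
}

# Example: log the state at startup and again just before fork().
my $state = describe_sig(qw(CHLD HUP));
print join(', ', map { "$_=$state->{$_}" } sort keys %$state), "\n";
```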

@tonycoz tonycoz self-assigned this Apr 2, 2020
@tonycoz
Contributor

tonycoz commented Apr 6, 2020

Which other libraries are being used by the process? For example database drivers.

@gflohr
Author

gflohr commented Apr 13, 2020

DBD::SQLite is in use. Otherwise nothing suspicious.

@tonycoz tonycoz removed their assignment Apr 13, 2020
@tonycoz
Contributor

tonycoz commented Apr 14, 2020

I can see a race if a signal is received (as a deferred "safe" signal) and marked pending, but the handler is removed before it can be delivered. I don't see another possibility right now.

Without a reproducible example I don't see a way to debug this.
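For context on the mechanism mentioned above: with perl's deferred ("safe") signals, the C-level handler only records the signal as pending, and the Perl handler stored in %SIG is called later, at one of the interpreter's safe points. The race described would need the %SIG entry to be cleared inside that window. A minimal sketch of the normal deferred path, using a self-sent SIGUSR1 (an arbitrary choice); it does not reproduce the race:

```perl
use strict;
use warnings;

my $received = 0;
$SIG{USR1} = sub { $received++ };    # Perl-level handler

# Send a signal to ourselves: the C-level handler only marks it as pending,
# and perl runs the %SIG handler at its next safe point.
kill('USR1', $$) or die "kill failed: $!\n";

print "handler ran $received time(s)\n";
```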
