Skip to content

Recursive call to mortal_getenv() causes deadlock #18341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mhx opened this issue Nov 20, 2020 · 1 comment
Closed

Recursive call to mortal_getenv() causes deadlock #18341

mhx opened this issue Nov 20, 2020 · 1 comment
Milestone

Comments

@mhx
Copy link
Contributor

mhx commented Nov 20, 2020

I've just built blead with a configuration that I've been building quite regularly back in the days:

./Configure -des -Uinstallusrbinperl -Dusethreads -DDEBUGGING=both -Doptimize=-ggdb3 -Dprefix=/home/mhx/perl/blead-mem -Dusedevel -Uversiononly -Aappend:ccflags="-DPERL_MEM_LOG -DPERL_MEM_LOG_STDERR -DPERL_MEM_LOG_ENV -DPERL_MEM_LOG_ENV_FD -DPERL_MEM_LOG_TIMESTAMP -DPERL_TRACK_MEMPOOL -DPERL_POISON -DDEBUG_LEAKING_SCALARS -DDEBUG_LEAKING_SCALARS_FORK_DUMP -DPERL_DEBUGGING_MSTATS -Wall -Wextra"

I'm pretty certain not all of this is needed for to reproduce the issue, but at least some of the MEM_LOG stuff is from a quick glance at the stack trace below.

The first time miniperl gets run during the build, it hangs immediately. Here's a stack trace:

(gdb) bt
#0  0x00007ffff7f8852b in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff7f811e3 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00005555557300d2 in Perl_mortal_getenv (str=0x555555b92fff "PERL_MEM_LOG") at inline.h:2608
#3  0x0000555555740882 in S_mem_log_common (mlt=MLT_NEW_SV, n=0, typesize=0, type_name=0x555555b919e8 "", sv=0x555555c89ea8, 
    oldalloc=0x0, newalloc=0x0, filename=0x555555ba2368 "sv.c", linenumber=9366, 
    funcname=0x555555bb5a58 <__FUNCTION__.24917> "Perl_newSVpv") at util.c:5011
#4  0x0000555555740fcc in Perl_mem_log_new_sv (sv=0x555555c89ea8, filename=0x555555ba2368 "sv.c", linenumber=9366, 
    funcname=0x555555bb5a58 <__FUNCTION__.24917> "Perl_newSVpv") at util.c:5150
#5  0x00005555557b40b5 in S_new_SV (my_perl=0x555555c862a0, file=0x555555ba2368 "sv.c", line=9366, 
    func=0x555555bb5a58 <__FUNCTION__.24917> "Perl_newSVpv") at sv.c:342
#6  0x00005555557fc810 in Perl_newSVpv (my_perl=0x555555c862a0, s=0x7fffffffe30e "en_US.UTF-8", len=0) at sv.c:9366
#7  0x000055555595709b in Perl_mortal_getenv (str=0x555555c44a27 "LANG") at inline.h:2613
#8  0x000055555595bab1 in Perl_init_i18nl10n (my_perl=0x555555c862a0, printwarn=1) at locale.c:3336
#9  0x00005555555d792d in perl_construct (my_perl=0x555555c862a0) at perl.c:430
#10 0x0000555555995871 in main (argc=1, argv=0x7fffffffdae8, env=0x7fffffffdaf8) at miniperlmain.c:111

I did a quick blame and reverted 24f3e84 and that fixed the deadlock. I didn't take a closer look yet, hopefully @khwilliamson has an idea :)

@khwilliamson khwilliamson added this to the 5.32.1 milestone Nov 26, 2020
khwilliamson added a commit that referenced this issue Nov 26, 2020
This fixes GH #18341

getenv() call allocates memory to squirrel safely away the result of
that getenv() call.  It does this while in a critical section so as to
make sure another thread can't interrupt it and destroy it.

The problem when using PERL_MEM_LOG is that the allocation of that
memory causes a recursive call to getenv() to see how to log that
allocation.  And it deadlocks trying to enter the critical section.

There are various solutions.  One is to use or emulate a general semaphore
instead of a binary one.  This is effectively what
PL_lc_numeric_mutex_depth does for another mutex, and the code for that
could be used as a template.

But given that this is an extreme edge case which requires Perl to be
specially compiled to enable this feature which is used only for
debugging, I thought it would be sufficient to just make a special case
to not try to lock during that recursive call.
khwilliamson added a commit that referenced this issue Nov 26, 2020
This fixes GH #18341

The Perl wrapper for getenv() was changed in 5.32 to allocate memory to
squirrel safely away the result of the wrapped getenv() call.  It does
this while in a critical section so as to make sure another thread can't
interrupt it and destroy it.

Unfortunately, when Perl is compiled for debugging memory problems and
has PERL_MEM_LOG enabled, that allocation causes a recursive call to
getenv() for the purpose of checking an environment variable to see how
to log that allocation.  And hence it deadlocks trying to enter the
critical section.

There are various solutions.  One is to use or emulate a general semaphore
instead of a binary one.  This is effectively what
PL_lc_numeric_mutex_depth does for another mutex, and the code for that
could be used as a template.

But given that this is an extreme edge case which requires Perl to be
specially compiled to enable this feature which is used only for
debugging, a much simpler, if less safe if it were to ever be used in
production, solution should suffice.  Tony Cook suggested just avoiding
the wrapper for this particular purpose.
khwilliamson added a commit that referenced this issue Nov 27, 2020
This fixes GH #18341

The Perl wrapper for getenv() was changed in 5.32 to allocate memory to
squirrel safely away the result of the wrapped getenv() call.  It does
this while in a critical section so as to make sure another thread can't
interrupt it and destroy it.

Unfortunately, when Perl is compiled for debugging memory problems and
has PERL_MEM_LOG enabled, that allocation causes a recursive call to
getenv() for the purpose of checking an environment variable to see how
to log that allocation.  And hence it deadlocks trying to enter the
critical section.

There are various solutions.  One is to use or emulate a general semaphore
instead of a binary one.  This is effectively what
PL_lc_numeric_mutex_depth does for another mutex, and the code for that
could be used as a template.

But given that this is an extreme edge case which requires Perl to be
specially compiled to enable this feature which is used only for
debugging, a much simpler, if less safe if it were to ever be used in
production, solution should suffice.  Tony Cook suggested just avoiding
the wrapper for this particular purpose.
@tonycoz
Copy link
Contributor

tonycoz commented Nov 30, 2020

This was fixed by 0cc28fe

@tonycoz tonycoz closed this as completed Nov 30, 2020
khwilliamson added a commit that referenced this issue Dec 12, 2020
This fixes GH #18341

There are problems with getenv() on threaded perls wchich can lead to
incorrect results when compiled with PERL_MEM_LOG.

Commit 0b83dfe fixed this for some
platforms, but as Tony Cook, pointed out there may be
standards-compliant platforms that that didn't fix.

The detailed comments outline the issues and (complicated) full solution.
khwilliamson added a commit that referenced this issue Dec 17, 2020
This fixes GH #18341

There are problems with getenv() on threaded perls wchich can lead to
incorrect results when compiled with PERL_MEM_LOG.

Commit 0b83dfe fixed this for some
platforms, but as Tony Cook, pointed out there may be
standards-compliant platforms that that didn't fix.

The detailed comments outline the issues and (complicated) full solution.
khwilliamson added a commit that referenced this issue Dec 17, 2020
This fixes GH #18341

There are problems with getenv() on threaded perls wchich can lead to
incorrect results when compiled with PERL_MEM_LOG.

Commit 0b83dfe fixed this for some
platforms, but as Tony Cook, pointed out there may be
standards-compliant platforms that that didn't fix.

The detailed comments outline the issues and (complicated) full solution.
khwilliamson added a commit that referenced this issue Dec 20, 2020
This fixes GH #18341

There are problems with getenv() on threaded perls wchich can lead to
incorrect results when compiled with PERL_MEM_LOG.

Commit 0b83dfe fixed this for some
platforms, but as Tony Cook, pointed out there may be
standards-compliant platforms that that didn't fix.

The detailed comments outline the issues and (complicated) full solution.
steve-m-hay pushed a commit that referenced this issue Jan 9, 2021
This fixes GH #18341

The Perl wrapper for getenv() was changed in 5.32 to allocate memory to
squirrel safely away the result of the wrapped getenv() call.  It does
this while in a critical section so as to make sure another thread can't
interrupt it and destroy it.

Unfortunately, when Perl is compiled for debugging memory problems and
has PERL_MEM_LOG enabled, that allocation causes a recursive call to
getenv() for the purpose of checking an environment variable to see how
to log that allocation.  And hence it deadlocks trying to enter the
critical section.

There are various solutions.  One is to use or emulate a general semaphore
instead of a binary one.  This is effectively what
PL_lc_numeric_mutex_depth does for another mutex, and the code for that
could be used as a template.

But given that this is an extreme edge case which requires Perl to be
specially compiled to enable this feature which is used only for
debugging, a much simpler, if less safe if it were to ever be used in
production, solution should suffice.  Tony Cook suggested just avoiding
the wrapper for this particular purpose.

(cherry picked from commit 0cc28fe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants