-
Notifications
You must be signed in to change notification settings - Fork 577
Crash: Embedded perl on FreeBSD: Perl__inverse_folds #17774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For what its worth, it seems like this occurs after some interpreters that we created in process have been destroyed. |
[snip] For reference (only):
|
Also in the core distribution, but a bit more generally:
|
This looks like the same type of issue as #17154 |
I have a fix that I hope fixes the problem. If you can reproduce this with any regularity, you could try out https://github.com/Perl/perl5/commits/smoke-me/khw-intrpvar |
I've applied the patch locally, will put it under some stress. |
So far no new crashes on 5.31.11. I've applied the patch to 5.30.2 as well, will know soon enough. |
Hi. #include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>
int main(int argc, char** argv, char** env) {
int i;
PERL_SYS_INIT3(&argc, &argv, &env);
for (i=0; i<2; i++) {
PerlInterpreter *perl;
char *args[] = { "perl", "-e", "sub some_func { 'some line' =~ /[ab]/ }", NULL };
perl = perl_alloc();
PL_perl_destruct_level = 1;
perl_construct(perl);
perl_parse(perl, NULL, 3, args, (char **)NULL);
perl_run(perl);
PL_perl_destruct_level = 1;
perl_destruct(perl);
perl_free(perl);
}
PERL_SYS_TERM();
return 0;
} Stack trace is:
I'll try patch proposed by @khwilliamson and report feedback. |
It's not easy to apply the patch to 5.30.2. If I apply the changes from 1c98fee then some tests fail. |
I've not seen any more crashes after applying the patch. Will this still make it into 5.32.0? |
I might be willing to make an exception for this. (RC0 was out yesterday, so it will go mid-RC.) It depends on approval from @tonycoz and @khwilliamson. |
It looks fine to me. Note that this can't be backported if not included in 5.32 (and can't be backported to 5.30) since it would break binary compatibility. |
While I don't claim to have any of the expertise that @tonycoz and @khwilliamson have in our C-level code, I must say that changing fundamental source code files after we've already begun RCs is an unwise course. Do we have any idea at all as to what the impact of this patch will be downstream on CPAN? No, we do not. If we apply this patch, I think we should suspend the 5.32 release process and issue a 5.31.12 so that we can assess the downstream impact. Conversely, is this bug reported so late in our release cycle important enough to shut down our forward movement? Thank you very much. |
One additional point. The branch proposed to address this problem, AFAICT, contains no regression tests. So, are we at the eleventh hour to insert source code that is not tested in blead and has not been tested downstream? Saying a patch like this one "looks good to me" may be acceptable in July or October, but it's not acceptable when most of the world is already in June. Thank you very much. |
On 5/31/20 7:47 PM, James E Keenan wrote:
It looks fine to me.
Note that this can't be backported if not included in 5.32 (and
can't be backported to 5.30) since it would break binary
compatibility.
While I don't claim to have any of the expertise that @tonycoz
<https://github.com/tonycoz> and @khwilliamson
<https://github.com/khwilliamson> have in our C-level code, I must
say that changing fundamental source code files after we've already
begun RCs is an unwise course.
Do we have any idea at all as to what the impact of this patch will
be downstream on CPAN?
No, we do not.
If we apply this patch, I think we should suspend the 5.32 release
process and issue a 5.31.12 so that we can assess the downstream
impact. Conversely, is this bug reported so late in our release
cycle important enough to shut down our forward movement?
One additional point. The branch proposed to address this problem,
AFAICT, contains no regression tests. So, are we at the eleventh hour to
insert source code that is not tested in blead and has not been tested
downstream? Saying a patch like this one "looks good to me" may be
acceptable in July or October, but it's not acceptable when most of the
world is already in June.
I believe the bug is only known to manifest when running multiple
embedded perls. (And that is consistent with what the patch does.)
Multiple embedded perls is not something that we have any smokers set up
to do. Any regression tests would pass everywhere except in such a
configuration, and even then it apparently happens quite rarely.
I think the patch should go in. I think it is riskier to leave it out.
First of all, the patch merely brings one outlier into conformance with
the paradigm established by a commit much earlier in the cycle. For
some reason, I failed to see this one, and omitted it from that patch.
Secondly, the patch restores things to how they worked for several
release cycles. We know that that logic worked. What happened to cause
this problem is I changed things to speed up interpreter cloning. But
neither I nor the people I consulted with at the time considered the
multiple embedded interpreter case. It took quite a while for the bug
to surface. My patch earlier in the cycle effectively (almost) reverted
the original patch. And the word 'almost' is crucial here.
Thirdly, we would be negligent in releasing a major version with a known
bug that cannot be fixed until the next major version. There will be
shops which can't upgrade to 5.32 if we don't fix this; and some that
have regretted their previous upgrades to 5.28.
The patch has been run in our normal smokers, and by the OP. From my
knowledge of the actual nuts and bolts, any problems should have
surfacted by now.
I think doing another development release would be from an
over-abundance of caution. But better that than not including this patch.
|
@khwilliamson, I think 1c98fee is the patch. Am I right? Can you provide a cleaner commit of this that includes a clear summary and explanation of the problem? |
This resolves #17774. This ticket is because the fixes in GH #17154 failed to get every case, leaving this one outlier to be fixed by this commit. The text in #17154 gives extensive details as to the problem. But briefly, in an attempt to speed up interpreter cloning, I moved certain SVs from interpreter level to global level in e80a011 (v5.27.11, March 2018). This was doable, we thought, because the content of these SVs is constant throughout the life of the program, so no need to copy them when cloning a new interpreter or thread. However when an interpreter exits, all its SVs get cleaned up, which caused these to become garbage in applications where another interpreter remains running. This circumstance is rare enough that the bug wasn't reported until September 2019, #17154. I made an initial attempt to fix the problem, and closed that ticket, but I overlooked one of the variables, which was reported in #17774, which this commit addresses. Effectively the behavior is reverted to the way it was before e80a011.
The rebased patch is 535f99a68c01fd437bfdebcb547ec50f81b2d272(url) |
This resolves #17774. This ticket is because the fixes in GH #17154 failed to get every case, leaving this one outlier to be fixed by this commit. The text in #17154 gives extensive details as to the problem. But briefly, in an attempt to speed up interpreter cloning, I moved certain SVs from interpreter level to global level in e80a011 (v5.27.11, March 2018). This was doable, we thought, because the content of these SVs is constant throughout the life of the program, so no need to copy them when cloning a new interpreter or thread. However when an interpreter exits, all its SVs get cleaned up, which caused these to become garbage in applications where another interpreter remains running. This circumstance is rare enough that the bug wasn't reported until September 2019, #17154. I made an initial attempt to fix the problem, and closed that ticket, but I overlooked one of the variables, which was reported in #17774, which this commit addresses. Effectively the behavior is reverted to the way it was before e80a011.
Thank you for fixing and putting it into 5.32.0. |
@khwilliamson 's patch has been merged into blead. However, before that happened I figured that additional data would be beneficial. So built a This is similar to what I have done monthly with the CPAN River 3000, but to expedite matters I tested on a host which is fbsd 11 rather than in the VM which is fbsd 12. So the results are not strictly comparable. Nonetheless, I was surprised to get some good news. Karl's branch appears to have resolved certain BBC issues from earlier in this production cycle. Please see:
Note that, among other things, we're once again getting a PASS for AnyEvent. Thank you very much. |
@jkeenan, thank you for the thorough work! |
@khwilliamson Thank you, I've actually reworked the existing patches in git to test it with perl 5.30.2 a while ago already to test the fix. We're currently on 5.26, and with this merged, we're looking at jumping straight to 5.32.0. |
Description
We embed multiple perl interpreters into some of our applications. After upgrading from 5.26.0 to 5.30.2 (also confirmed that it exists on 5.31.11, 5.28.2 is still unknown), we see the following intermittent crash
Steps to Reproduce
Unknown
Expected behavior
Should not seg fault.
Perl configuration
The text was updated successfully, but these errors were encountered: