Commit 1f621a8

iabynrjbs
authored and committed
RT #119125: fix two issues with /[#]/x
(This is a maint-specific patch, not a cherry-pick from blead)

A hash within a character class in an expanded pattern is an odd beast. It is handled twice: first by the perl toker, which is looking for things like embedded variables that need interpolating, and second by the regex parser. The toker has only limited knowledge of regex syntax, and for things like /#$foo/x and /[#$foo]/x it struggles to work out whether the '#' starts a regex comment, and so whether '$foo' is part of the comment string or a variable to be interpolated. Up to and including 5.18.0 it got very confused when the '#' was within a character class, and usually got it wrong.

5.18.0 also introduced the additional complication that (?{}) code-blocks were now normally handled by the perl toker rather than by the regex parser. A side-effect of this was that if for any reason the toker didn't spot a code block (because it erroneously thought it was part of a regex comment, for example), then the literal code block text would be passed through uncompiled to the regex parser, which would then refuse to compile it unless "use re eval" was in scope.

All these problems have been fixed in blead. However, the fixes couldn't be fully back-ported to maint, since there was a fair bit of code on CPAN that would (erroneously) do things like /[#$^]/, which the author expected to match one of three special characters, and which indeed does on older perls. On blead, however, this (correctly) expands to /[#STDOUT_TOP]/ (based on what $^ is currently set to). So we decided to keep the old (broken) behaviour on maint. These fixes and half-fixes were included in 5.18.2.

However, it turns out that 5.18.2 still has a couple of issues, one of which is a regression from 5.16.x. The table below shows the behaviours of certain regex constructs under various flavours of perl. "5.18.3" represents the changes included in this commit, and the entries marked "*******" represent changes in behaviour since 5.18.2 (i.e. they are what this commit fixes).

    /[#$b]/x
       5.16.3 - $b not expanded
       5.18.0 - $b not expanded
       5.18.2 - $b not expanded - keep bug for backwards compatibility
       5.18.3 - $b not expanded - keep bug for backwards compatibility
       blead  - $b expanded

    /[#]$c/x
       5.16.3 - $c not expanded
       5.18.0 - $c not expanded
       5.18.2 - $c not expanded
       5.18.3 - $c expanded *******
       blead  - $c expanded

    /[#]
     (?{})/x     # i.e. this pattern includes a literal newline
       5.16.3 - re eval not needed
       5.18.0 - re eval needed
       5.18.1 - re eval needed
       5.18.2 - re eval needed
       5.18.3 - re eval not needed *******
       blead  - re eval not needed

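The first two rows of the table can be illustrated with a small stand-alone model. This is only a sketch under simplifying assumptions, not perl's toker: the function name first_interpolated_dollar is hypothetical, every unescaped '$' is treated as a potential variable, and code blocks and case modifiers are ignored. It returns the offset of the first '$' that a 5.18.3-style scan would still consider for interpolation.

```c
#include <string.h>

/* Hypothetical model of the 5.18.3 rules for '#' in an /x pattern body.
 * Returns the offset of the first '$' still considered for interpolation,
 * or -1 if there is none. Not taken from toke.c. */
static int first_interpolated_dollar(const char *pat)
{
    int in_charclass = 0;   /* currently inside [...] */
    int skipping = 0;       /* a '#' was seen; variables are ignored */
    size_t i, len = strlen(pat);

    for (i = 0; i < len; i++) {
        char c = pat[i];

        if (skipping && !in_charclass) {
            /* ordinary /x comment: ignore everything up to end of line */
            if (c == '\n')
                skipping = 0;
            continue;
        }
        if (c == '\\' && i + 1 < len) {
            i++;            /* an escaped character is never special here */
            continue;
        }
        if (skipping) {
            /* '#' inside a character class: ignore variables only until
             * the class closes (or the line ends) */
            if (c == ']') {
                in_charclass = 0;
                skipping = 0;
            }
            else if (c == '\n')
                skipping = 0;
            continue;
        }
        if (c == '[' && !in_charclass)
            in_charclass = 1;
        else if (c == ']' && in_charclass)
            in_charclass = 0;
        else if (c == '#')
            skipping = 1;
        else if (c == '$')
            return (int)i;
    }
    return -1;
}
```

Under this model, first_interpolated_dollar("[#$b]") finds no candidate (the bug kept for compatibility), while first_interpolated_dollar("[#]$c") reports the '$' at offset 3, matching the "$c expanded" entry for 5.18.3.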
1 parent 43c6e0a commit 1f621a8

2 files changed: +33 -15 lines changed

t/re/pat.t

+9 -1

@@ -20,7 +20,7 @@ BEGIN {
     require './test.pl';
 }
 
-plan tests => 672;  # Update this when adding/deleting tests.
+plan tests => 674;  # Update this when adding/deleting tests.
 
 run_tests() unless caller;
 
@@ -1399,6 +1399,14 @@ EOP
         $s = 'abcd$%#&';
         $s =~ s/[a#$b%]/X/gx;
         is ($s, 'XXcdXXX&', 'RT #119125 with /x');
+
+        $s = 'abYcd$Y#Y&';
+        my $c = 'Y';
+        $s =~ s/[#$b]$c/X/gx;
+        is ($s, 'aXcdXX&', 'RT #119125 with /x and trailing var');
+
+        ok("a#b" =~ /a[#]
+        b(?{})/x, 'RT #119125 with newline and codeblock');
     }
 
 } # End of sub run_tests

toke.c

+24 -14

@@ -3178,22 +3178,32 @@ S_scan_const(pTHX_ char *start)
 
         /* likewise skip #-initiated comments in //x patterns */
         else if (*s == '#' && PL_lex_inpat &&
-          ((PMOP*)PL_lex_inpat)->op_pmflags & RXf_PMf_EXTENDED) {
-            while (s+1 < send && *s != '\n') {
+          ((PMOP*)PL_lex_inpat)->op_pmflags & RXf_PMf_EXTENDED)
+        {
+            if (in_charclass) {
                 /* for maint-5.18, half-fix #-in-charclass bug:
-                 * *do* recognise codeblocks: /[#](?{})/
-                 * *don't* recognise interpolated vars: /[#$x]/
-                 */
-                if (in_charclass && !PL_lex_casemods && s+3 < send &&
-                    s[0] == '(' &&
-                    s[1] == '?' &&
-                    (    s[2] == '{'
-                     || (s[2] == '?' && s[3] == '{')))
-                    break;
-                *d++ = NATIVE_TO_NEED(has_utf8,*s++);
+                 * strictly speaking, #-in-charclass has no special
+                 * meaning; however, for backwards compatibility,
+                 * ignore $variables etc for the rest of the charclass
+                 * scope */
+                while (in_charclass && s+1 < send && *s != '\n') {
+                    if (*s == ']') {
+                        char *s1 = s-1;
+                        int esc = 0;
+                        while (s1 >= start && *s1-- == '\\')
+                            esc = !esc;
+                        if (!esc)
+                            in_charclass = FALSE;
+                    }
+                    *d++ = *s++;
+                }
+                continue;
+            }
+            else {
+                /* normal /...#.../x; skipt to end of line */
+                while (s+1 < send && *s != '\n')
+                    *d++ = *s++;
             }
-            if (s+ 1 < send && *s != '\n')
-                break; /* we stopped on (?{}), not EOL */
         }
 
         /* no further processing of single-quoted regex */
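
The new loop decides whether a ']' really closes the character class by counting the backslashes immediately before it: an odd count means the ']' is escaped. The stand-alone sketch below isolates that parity check so it can be tried outside perl. The helper name is_escaped is hypothetical and does not exist in toke.c; unlike the patch, this variant tests the previous byte before stepping back, so it never moves the pointer before start.

```c
#include <stddef.h>

/* Hypothetical helper, not part of toke.c.
 * Returns 1 if the character at p is preceded by an odd number of
 * backslashes (so it is escaped), 0 otherwise. start is the first byte
 * of the buffer and p must point into that buffer. */
static int is_escaped(const char *start, const char *p)
{
    int esc = 0;

    /* walk backwards over the run of backslashes, flipping parity */
    while (p > start && p[-1] == '\\') {
        esc = !esc;
        p--;
    }
    return esc;
}
```

For the pattern text [#\] the ']' at offset 3 is escaped and the class stays open, whereas for [#\\] the two backslashes pair up, so the ']' at offset 4 closes the class.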
