Commit 8e84dec

Add missing deprecation message for unescaped '{' in regexes
The use of literal '{' without being escaped has been deprecated since 5.16, and warned on since 5.20. In 5.24, this has been made illegal, with a bunch of CPAN modules broken by it, in spite of the long deprecation period. See
https://rt.perl.org/Ticket/Display.html?id=128139

Unfortunately, I overlooked a code path, and not all instances that should have warned did so in fact. This was spotted by Tom Wyant in
https://rt.perl.org/Ticket/Display.html?id=128213

This commit adds that warning, and rewords the fatal one slightly, and clarifies the whole thing in perldiag.
1 parent a139980 commit 8e84dec

4 files changed: +141 -19 lines changed

pod/perldelta.pod

Lines changed: 11 additions & 2 deletions
@@ -271,7 +271,12 @@ Perl yourself. The #! line at the top of your file could look like

 =item *

-XXX L<message|perldiag/"message">
+L<Unescaped left brace in regex is deprecated here, passed through in regex; marked by S<<-- HERE> in mE<sol>%sE<sol>|perldiag/"Unescaped left brace in regex is deprecated here, passed through in regex; marked by S<<-- HERE> in m/%s/">
+
+Unescaped left braces are already illegal in some contexts in regular
+expression patterns, but, due to an oversight, no deprecation warning
+was raised in other contexts where they are intended to become illegal.
+This warning is now raised in these contexts.

 =back

@@ -293,7 +298,11 @@ XXX Changes (i.e. rewording) of diagnostic messages go here

 =item *

-XXX Describe change here
+L<Unescaped left brace in regex is illegal here in regex; marked by S<<-- HERE> in mE<sol>%sE<sol>|perldiag/"Unescaped left brace in regex is illegal here in regex; marked by S<<-- HERE> in m/%s/">
+
+The word "here" has been added to the message that was raised in
+v5.25.1. This is to indicate that there are contexts in which unescaped
+left braces are not (yet) illegal.

 =back

pod/perldiag.pod

Lines changed: 107 additions & 10 deletions
@@ -6139,20 +6139,117 @@ C<undef *foo>.
 (A) You've accidentally run your script through B<csh> instead of Perl.
 Check the #! line, or manually feed your script into Perl yourself.

-=item Unescaped left brace in regex is illegal in regex;
+=item Unescaped left brace in regex is deprecated here, passed through in
+regex; marked by S<<-- HERE> in m/%s/
+
+(D deprecated, regexp) The simple rule to remember, if you want to
+match a literal C<"{"> character (U+007B C<LEFT CURLY BRACKET>) in a
+regular expression pattern, is to escape each literal instance of it in
+some way. Generally easiest is to precede it with a backslash, like
+C<"\{"> or enclose it in square brackets (C<"[{]">). If the pattern
+delimiters are also braces, any matching right brace (C<"}">) should
+also be escaped to avoid confusing the parser, for example,
+
+    qr{abc\{def\}ghi}
+
+Forcing literal C<"{"> characters to be escaped will enable the Perl
+language to be extended in various ways in future releases. To avoid
+needlessly breaking existing code, the restriction is not enforced in
+contexts where there are unlikely to ever be extensions that could
+conflict with the use there of C<"{"> as a literal.
+
+In this release of Perl, some literal uses of C<"{"> are fatal, and some
+still just deprecated. This is because of an oversight: some uses of a
+literal C<"{"> that should have raised a deprecation warning starting in
+v5.20 did not warn until v5.26. By making the already-warned uses fatal
+now, some of the planned extensions can be made to the language sooner.
+
+The contexts where no warnings or errors are raised are:
+
+=over 4
+
+=item *
+
+as the first character in a pattern, or following C<"^"> indicating to
+anchor the match to the beginning of a line.
+
+=item *
+
+as the first character following a C<"|"> indicating alternation.
+
+=item *
+
+as the first character in a parenthesized grouping like
+
+    /foo({bar)/
+    /foo(?:{bar)/
+
+=item *
+
+as the first character following a quantifier
+
+    /\s*{/
+
+=back
+
+=for comment
+The text of the message above is duplicated below to allow splain (and
+'use diagnostics') to work. Since one is fatal, and one not, they can't
+be combined as one message. And since the non-fatal one is temporary,
+there's no real need to enhance perldiag to handle this transient case.
+
+=item Unescaped left brace in regex is illegal here in regex;
 marked by S<<-- HERE> in m/%s/

-(F) You used a literal C<"{"> character in a regular
-expression pattern. You should change to use C<"\{"> or C<[{]> instead.
-If the pattern delimiters are also braces, any matching
-right brace (C<"}">) should also be escaped to avoid confusing the parser,
-for example,
+(F) The simple rule to remember, if you want to
+match a literal C<"{"> character (U+007B C<LEFT CURLY BRACKET>) in a
+regular expression pattern, is to escape each literal instance of it in
+some way. Generally easiest is to precede it with a backslash, like
+C<"\{"> or enclose it in square brackets (C<"[{]">). If the pattern
+delimiters are also braces, any matching right brace (C<"}">) should
+also be escaped to avoid confusing the parser, for example,
+
+    qr{abc\{def\}ghi}
+
+Forcing literal C<"{"> characters to be escaped will enable the Perl
+language to be extended in various ways in future releases. To avoid
+needlessly breaking existing code, the restriction is not enforced in
+contexts where there are unlikely to ever be extensions that could
+conflict with the use there of C<"{"> as a literal.
+
+In this release of Perl, some literal uses of C<"{"> are fatal, and some
+still just deprecated. This is because of an oversight: some uses of a
+literal C<"{"> that should have raised a deprecation warning starting in
+v5.20 did not warn until v5.26. By making the already-warned uses fatal
+now, some of the planned extensions can be made to the language sooner.
+
+The contexts where no warnings or errors are raised are:
+
+=over 4
+
+=item *

-    qr{abc\{def\}ghi}
+as the first character in a pattern, or following C<"^"> indicating to
+anchor the match to the beginning of a line.

-This restriction is not enforced if the C<"{"> is the first character in
-the pattern; nor is a warning generated for this case, as there are no
-current plans to forbid it.
+=item *
+
+as the first character following a C<"|"> indicating alternation.
+
+=item *
+
+as the first character in a parenthesized grouping like
+
+    /foo({bar)/
+    /foo(?:{bar)/
+
+=item *
+
+as the first character following a quantifier
+
+    /\s*{/
+
+=back

 =item unexec of %s into %s failed!
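
The perldiag entries above reduce to one rule: escape every literal `{` (and, when the pattern delimiters are braces, the matching `}`). When a program splices literal text into a pattern, that rule can be applied mechanically. The following is a minimal sketch in C and is not code from this commit. `escape_braces` and its parameters are hypothetical names, and it assumes the input is plain text rather than an existing pattern, so it makes no attempt to recognise quantifiers such as `{3,4}` or braces that are already escaped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Perl): copy the literal text `in` to
 * `out`, inserting a backslash before each '{', and before each '}' as
 * well when `brace_delims` is true.  Returns the number of characters
 * written (excluding the terminating NUL), or (size_t)-1 if `out` is
 * too small. */
static size_t escape_braces(const char *in, char *out, size_t out_size,
                            bool brace_delims)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        bool needs_escape = (*in == '{') || (brace_delims && *in == '}');
        size_t need = needs_escape ? 2 : 1;

        /* Always keep one byte free for the terminating NUL. */
        if (n + need >= out_size)
            return (size_t)-1;

        if (needs_escape)
            out[n++] = '\\';
        out[n++] = *in;
    }

    if (n >= out_size)      /* covers out_size == 0 */
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With `brace_delims` true, the input `abc{def}ghi` becomes `abc\{def\}ghi`, the form used in the `qr{abc\{def\}ghi}` example. This sketch only handles braces; text that may contain other regex metacharacters needs full quoting, which in Perl itself is what `quotemeta` provides.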

regcomp.c

Lines changed: 8 additions & 3 deletions
@@ -13259,7 +13259,7 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
              * something like "\b" */
             if (len || (p > RExC_start && isALPHA_A(*(p -1)))) {
                 RExC_parse = p + 1;
-                vFAIL("Unescaped left brace in regex is illegal");
+                vFAIL("Unescaped left brace in regex is illegal here");
             }
             /*FALLTHROUGH*/
         default: /* A literal character */
@@ -13664,8 +13664,6 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
             RExC_parse = p - 1;
             Set_Node_Cur_Length(ret, parse_start);
             RExC_parse = p;
-            skip_to_be_ignored_text(pRExC_state, &RExC_parse,
-                                    FALSE /* Don't force to /x */ );
             {
                 /* len is STRLEN which is unsigned, need to copy to signed */
                 IV iv = len;
@@ -13677,6 +13675,13 @@ S_regatom(pTHX_ RExC_state_t *pRExC_state, I32 *flagp, U32 depth)
         break;
     } /* End of giant switch on input character */

+    /* Position parse to next real character */
+    skip_to_be_ignored_text(pRExC_state, &RExC_parse,
+                            FALSE /* Don't force to /x */ );
+    if (PASS2 && *RExC_parse == '{' && OP(ret) != SBOL && ! regcurly(RExC_parse)) {
+        ckWARNregdep(RExC_parse + 1, "Unescaped left brace in regex is deprecated here, passed through");
+    }
+
     return(ret);
 }
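
The condition added at the end of `S_regatom` warns when the next real character after an atom is `{`, the atom just parsed is not a start-of-line anchor (`SBOL`), and `regcurly()` does not accept the brace as the start of a quantifier. The standalone sketch below mirrors that decision outside the regex compiler. It assumes that a quantifier has the form `{n}`, `{n,}` or `{n,m}` with decimal digits. `looks_like_quantifier` and `should_warn_deprecated_brace` are hypothetical stand-ins rather than Perl core functions, and the `PASS2` check and the skipping of ignorable text are left out.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical stand-in for regcurly(): true if `s` points at a brace
 * quantifier of the form "{n}", "{n,}" or "{n,m}". */
static bool looks_like_quantifier(const char *s)
{
    if (*s++ != '{')
        return false;
    if (!isdigit((unsigned char)*s))
        return false;               /* at least one digit is required */
    while (isdigit((unsigned char)*s))
        s++;
    if (*s == ',') {
        s++;
        while (isdigit((unsigned char)*s))
            s++;                    /* the upper bound may be empty */
    }
    return *s == '}';
}

/* Hypothetical mirror of the new condition: `next` points at the first
 * character after the atom just parsed, and `prev_is_sbol` says whether
 * that atom was a start-of-line anchor. */
static bool should_warn_deprecated_brace(const char *next, bool prev_is_sbol)
{
    return *next == '{' && !prev_is_sbol && !looks_like_quantifier(next);
}
```

Under these assumptions a lone `{` after an ordinary atom triggers the warning, while `{3,4}` or a `{` directly after `^` does not.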

t/re/reg_mesg.t

Lines changed: 15 additions & 4 deletions
@@ -268,10 +268,11 @@ my @death =
     '/(?[\ |!])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[\ |!{#}])/', # [perl #126180]
     '/(?[()-!])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[()-!{#}])/', # [perl #126204]
     '/(?[!()])/' => 'Incomplete expression within \'(?[ ])\' {#} m/(?[!(){#}])/', # [perl #126404]
-    '/\w{/' => 'Unescaped left brace in regex is illegal {#} m/\w{{#}/',
-    '/\q{/' => 'Unescaped left brace in regex is illegal {#} m/\q{{#}/',
-    '/:{4,a}/' => 'Unescaped left brace in regex is illegal {#} m/:{{#}4,a}/',
-    '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal {#} m/xa{{#}3\,4}y/',
+    '/\w{/' => 'Unescaped left brace in regex is illegal here {#} m/\w{{#}/',
+    '/\q{/' => 'Unescaped left brace in regex is illegal here {#} m/\q{{#}/',
+    '/\A{/' => 'Unescaped left brace in regex is illegal here {#} m/\A{{#}/',
+    '/:{4,a}/' => 'Unescaped left brace in regex is illegal here {#} m/:{{#}4,a}/',
+    '/xa{3\,4}y/' => 'Unescaped left brace in regex is illegal here {#} m/xa{{#}3\,4}y/',
     '/abc/xix' => 'Only one /x regex modifier is allowed',
     '/(?xmsixp:abc)/' => 'Only one /x regex modifier is allowed {#} m/(?xmsixp{#}:abc)/',
     '/(?xmsixp)abc/' => 'Only one /x regex modifier is allowed {#} m/(?xmsixp{#})abc/',
@@ -621,6 +622,16 @@ my @experimental_regex_sets = (
 );

 my @deprecated = (
+    '/^{/' => "",
+    '/foo|{/' => "",
+    '/foo|^{/' => "",
+    '/foo({bar)/' => "",
+    '/foo(:?{bar)/' => "",
+    '/\s*{/' => "",
+    '/a{3,4}{/' => "",
+    '/.{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/.{{#}/',
+    '/[x]{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/[x]{{#}/',
+    '/\p{Latin}{/' => 'Unescaped left brace in regex is deprecated here, passed through {#} m/\p{Latin}{{#}/',
 );

 for my $strict ("", "use re 'strict';") {
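
The new `@deprecated` entries pair patterns that stay silent (expected message `""`) with patterns that now warn. As a rough reading of the four exempt contexts listed in perldiag, and not a reimplementation of the regex compiler, the sketch below classifies a `{` by looking only at what precedes it. All three function names are hypothetical. It assumes the brace is not inside a bracketed character class, and it treats only `*`, `+`, `?` and `{n}`, `{n,}`, `{n,m}` as quantifiers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* True if pat[open..end] is a quantifier "{n}", "{n,}" or "{n,m}". */
static bool is_quantifier(const char *pat, size_t open, size_t end)
{
    size_t i = open + 1;

    if (pat[open] != '{' || !isdigit((unsigned char)pat[i]))
        return false;
    while (isdigit((unsigned char)pat[i]))
        i++;
    if (pat[i] == ',') {
        i++;
        while (isdigit((unsigned char)pat[i]))
            i++;
    }
    return i == end && pat[i] == '}';
}

/* True if pat[i] is preceded by an odd number of backslashes. */
static bool is_escaped(const char *pat, size_t i)
{
    size_t n = 0;

    while (i > n && pat[i - n - 1] == '\\')
        n++;
    return (n % 2) == 1;
}

/* Hypothetical classifier: is the '{' at pat[pos] in one of the contexts
 * where perldiag says no warning or error is raised? */
static bool brace_is_exempt(const char *pat, size_t pos)
{
    if (pat[pos] != '{')
        return false;
    if (pos == 0)
        return true;                 /* first character of the pattern */

    size_t prev = pos - 1;
    if (is_escaped(pat, prev))
        return false;                /* e.g. "\|{": the '|' is a literal */

    switch (pat[prev]) {
    case '^': case '|': case '(':    /* anchor, alternation, group start */
    case '*': case '+': case '?':    /* simple quantifiers */
        return true;
    case ':':                        /* start of a "(?:" group */
        return prev >= 2 && pat[prev - 1] == '?' && pat[prev - 2] == '('
               && !is_escaped(pat, prev - 2);
    case '}': {                      /* possibly the end of "{n,m}" */
        size_t open = prev;
        while (open > 0 && pat[open] != '{')
            open--;
        return is_quantifier(pat, open, prev);
    }
    default:
        return false;
    }
}
```

Called on the `{` in the silent test patterns such as `^{`, `foo|{`, `\s*{` and `a{3,4}{`, the sketch reports the brace as exempt. For `.{`, `[x]{` and `\p{Latin}{` it reports not exempt, which matches the entries that expect the deprecation message.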
