Commit cd9d511

Only allow punct delimiter for regex subpattern
The experimental feature that allows wildcard subpatterns in finding Unicode properties is supposed to allow only ASCII punctuation for delimiters. But if the delimiter was preceded by a backslash, the check was skipped. This commit fixes that. It may be that we will eventually want to loosen the restriction and allow a wider range of delimiters. But until we have valid use cases that would push us in that direction, I don't want to get into supporting stuff that we might later regret, such as invisible characters for delimiters. This feature is not really required for programs to work, so I don't view it as necessary to be as general as possible.
1 parent 11fcdeb commit cd9d511
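
The rule described in the commit message can be sketched as a standalone predicate. This is only an illustration under stated assumptions, not the code in regcomp.c: `is_ascii_punct` is a hypothetical stand-in for Perl's `isPUNCT_A`, and `starts_subpattern` is a made-up name. The sketch bounds-checks `i + 1` because, unlike the real parser, it does not assume anything about the byte after the end of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Perl's isPUNCT_A(): true only for the
 * ASCII punctuation ranges. */
static bool is_ascii_punct(unsigned char c)
{
    return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40)
        || (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}

/* Hypothetical helper: does name[i] (the character right after the
 * equals sign) begin a wildcard subpattern such as \p{foo=/bar/}? */
static bool starts_subpattern(const char *name, size_t name_len, size_t i)
{
    assert(name != NULL);

    if (i >= name_len || !is_ascii_punct((unsigned char) name[i]))
        return false;

    /* These punctuation characters never act as delimiters. */
    if (name[i] == '-' || name[i] == '+' || name[i] == '_' || name[i] == '{')
        return false;

    /* A backslash defers to the next character, which must itself be
     * ASCII punctuation.  Before the fix this case skipped the check. */
    if (name[i] == '\\')
        return i + 1 < name_len
            && is_ascii_punct((unsigned char) name[i + 1]);

    return true;
}
```

With this predicate, `nv=/5/` and `nv=\/5\/` are treated as subpatterns, while `nv=\b5\b` is not, because `b` is not punctuation.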

2 files changed: +10 -6 lines changed


regcomp.c

Lines changed: 9 additions & 6 deletions
@@ -23290,10 +23290,13 @@ Perl_parse_uniprop_string(pTHX_
         /* Most punctuation after the equals indicates a subpattern, like
          * \p{foo=/bar/} */
         if ( isPUNCT_A(name[i])
-            && name[i] != '-'
-            && name[i] != '+'
-            && name[i] != '_'
-            && name[i] != '{')
+            && name[i] != '-'
+            && name[i] != '+'
+            && name[i] != '_'
+            && name[i] != '{'
+            /* A backslash means the real delimitter is the next character,
+             * but it must be punctuation */
+            && (name[i] != '\\' || (i < name_len && isPUNCT_A(name[i+1]))))
         {
             /* Find the property. The table includes the equals sign, so we
              * use 'j' as-is */
@@ -23309,8 +23312,8 @@ Perl_parse_uniprop_string(pTHX_
             const char * pos_in_brackets;
             bool escaped = 0;
 
-            /* A backslash means the real delimitter is the next character.
-             * */
+            /* Backslash => delimitter is the character following. We
+             * already checked that it is punctuation */
             if (open == '\\') {
                 open = name[i++];
                 escaped = 1;
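
The second hunk relies on the earlier check having already validated the character after a backslash. A minimal sketch of that step in isolation follows. The `struct delim` type and the `read_open_delim` name are hypothetical, and the initial `name[i++]` read is an assumption about code that the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: the opening delimiter, whether it was
 * backslash-escaped, and the index of the first character after it. */
struct delim {
    char   open;
    bool   escaped;
    size_t next;
};

/* Precondition (assumed): the caller has already verified that name[i]
 * is an acceptable delimiter, or a backslash followed by one, so no
 * further validation or bounds checking happens here. */
static struct delim read_open_delim(const char *name, size_t i)
{
    assert(name != NULL);

    struct delim d = { name[i++], false, 0 };

    /* Backslash => the delimiter is the character following it. */
    if (d.open == '\\') {
        d.open = name[i++];
        d.escaped = true;
    }

    d.next = i;
    return d;
}
```

For `nv=/5/` starting at index 3, this yields `/` unescaped with the pattern body beginning at index 4. For `nv=\/5\/`, it yields `/` escaped with the body beginning at index 5.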

t/re/reg_mesg.t

Lines changed: 1 addition & 0 deletions
@@ -319,6 +319,7 @@ my @death =
  '/\x{100}(?(/' => 'Unknown switch condition (?(...)) {#} m/\\x{100}(?({#}/', # [perl #133896]
  '/(?[\N{KEYCAP DIGIT NINE}/' => '\N{} here is restricted to one character {#} m/(?[\\N{U+39.FE0F.20E3{#}}/', # [perl #133988]
  '/0000000000000000[\N{U+0.00}0000/' => 'Unmatched [ {#} m/0000000000000000[{#}\N{U+0.00}0000/', # [perl #134059]
+ '/\p{nv=\b5\b}/' => 'Can\'t find Unicode property definition "nv=\\b5\\b" {#} m/\\p{nv=\\b5\\b}{#}/',
  );
 
  # These are messages that are death under 'use re "strict"', and may or may
