Skip to content

Commit d742518

Browse files
author
Father Chrysostomos
committed
[perl #74022] Parser hangs on some Unicode characters
This changes the definition of isIDFIRST_utf8 to avoid any characters that would put the parser in a loop. isIDFIRST_utf8 is used all over the place in toke.c. Almost every instance is followed by a call to S_scan_word. S_scan_word is only called when it is known that there is a word to scan. What was happening was that isIDFIRST_utf8 would accept a character, but S_scan_word in toke.t would then reject it, as it was using is_utf8_alnum, resulting in an infinite number of zero-length identifiers. Another possible solution was to change S_scan_word to use isIDFIRST_utf8 or similar, but that has back-compatibility problems, as it stops q·foo· from being a strings and makes it an identi- fier instead.
1 parent b5d9a95 commit d742518

File tree

2 files changed

+16
-5
lines changed

2 files changed

+16
-5
lines changed

handy.h

+10-4
Original file line numberDiff line numberDiff line change
@@ -849,10 +849,16 @@ patched there. The file as of this writing is cpan/Devel-PPPort/parts/inc/misc
849849
#define isBLANK_LC_uni(c) isBLANK(c) /* could be wrong */
850850

851851
#define isALNUM_utf8(p) is_utf8_alnum(p)
852-
/* The ID_Start of Unicode is quite limiting: it assumes a L-class
853-
* character (meaning that you cannot have, say, a CJK character).
854-
* Instead, let's allow ID_Continue but not digits. */
855-
#define isIDFIRST_utf8(p) (is_utf8_idcont(p) && !is_utf8_digit(p))
852+
/* The ID_Start of Unicode was originally quite limiting: it assumed an
853+
* L-class character (meaning that you could not have, say, a CJK charac-
854+
* ter). So, instead, perl has for a long time allowed ID_Continue but
855+
* not digits.
856+
* We still preserve that for backward compatibility. But we also make sure
857+
* that it is alphanumeric, so S_scan_word in toke.c will not hang. See
858+
* http://rt.perl.org/rt3/Ticket/Display.html?id=74022
859+
* for more detail than you ever wanted to know about. */
860+
#define isIDFIRST_utf8(p) \
861+
(is_utf8_idcont(p) && !is_utf8_digit(p) && is_utf8_alnum(p))
856862
#define isALPHA_utf8(p) is_utf8_alpha(p)
857863
#define isSPACE_utf8(p) is_utf8_space(p)
858864
#define isDIGIT_utf8(p) is_utf8_digit(p)

t/comp/parser.t

+6-1
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
# Checks if the parser behaves correctly in edge cases
44
# (including weird syntax errors)
55

6-
print "1..122\n";
6+
print "1..123\n";
77

88
sub failed {
99
my ($got, $expected, $name) = @_;
@@ -355,6 +355,11 @@ is($@, "", "multiline whitespace inside substitute expression");
355355

356356
# Add new tests HERE:
357357

358+
# bug #74022: Loop on characters in \p{OtherIDContinue}
359+
# This test hangs if it fails.
360+
eval chr 0x387;
361+
is(1,1, '[perl #74022] Parser looping on OtherIDContinue chars');
362+
358363
# More awkward tests for #line. Keep these at the end, as they will screw
359364
# with sane line reporting for any other test failures
360365

0 commit comments

Comments
 (0)