Commit 233d2f7

Exclude RtoL characters from paired string delimiters
Fixes Perl#22228.

Some scripts, such as Arabic and Hebrew, are written right-to-left. This can cause confusion for quote-like string delimiters that were chosen on the assumption of left-to-right text. Therefore exclude all such characters. Currently, the only pair in this category that is not already excluded for other reasons is SYRIAC COLON SKEWED LEFT/RIGHT.
1 parent 440bec8 commit 233d2f7
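
To see which candidate delimiters the commit message is talking about, one can look up each character's Unicode bidirectional class. The following is a minimal sketch, not part of the commit; it assumes the core `Unicode::UCD` module is available and treats classes `R` and `AL` as "right-to-left", which is the same criterion the commit's change to regen/unicode_constants.pl uses.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Report the Unicode bidirectional class of a few delimiter candidates.
# Classes "R" and "AL" mark right-to-left characters.
for my $cp (0x007B, 0x00AB, 0x0706, 0x0707) {
    my $info = charinfo($cp);
    printf "U+%04X %-45s bidi=%s\n", $cp, $info->{name}, $info->{bidi};
}
```

The two SYRIAC COLON SKEWED characters report class `AL`, while the curly bracket and the angle quotation mark do not report `R` or `AL`, so only the Syriac pair is affected by the new rule.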

4 files changed, +37 -11 lines changed


pod/perlop.pod

Lines changed: 0 additions & 2 deletions
@@ -3850,7 +3850,6 @@ The complete list of accepted paired delimiters as of Unicode 14.0 is:
 { } U+007B, U+007D LEFT/RIGHT CURLY BRACKET
 « » U+00AB, U+00BB LEFT/RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 » « U+00BB, U+00AB RIGHT/LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
-܆ ܇ U+0706, U+0707 SYRIAC COLON SKEWED LEFT/RIGHT
 ༺ ༻ U+0F3A, U+0F3B TIBETAN MARK GUG RTAGS GYON, TIBETAN MARK GUG
 RTAGS GYAS
 ༼ ༽ U+0F3C, U+0F3D TIBETAN MARK ANG KHANG GYON, TIBETAN MARK ANG
@@ -4231,5 +4230,4 @@ The complete list of accepted paired delimiters as of Unicode 14.0 is:
 🢩 🢨 U+1F8A9, U+1F8A8 RIGHT/LEFTWARDS BACK-TILTED SHADOWED WHITE ARROW
 🢫 🢪 U+1F8AB, U+1F8AA RIGHT/LEFTWARDS FRONT-TILTED SHADOWED WHITE
 ARROW
-
 =cut

regen/unicode_constants.pl

Lines changed: 10 additions & 0 deletions
@@ -378,6 +378,7 @@ END
 my $illegal = "Mirror illegal";
 my $no_encoded_mate = "Mirrored, but Unicode has no encoded mirror";
 my $bidirectional = "Bidirectional";
+my $r2l = "Is in a Right to Left script";
 
 my %unused_bidi_pairs;
 my %inverted_unused_bidi_pairs;
@@ -634,6 +635,15 @@ END
 next;
 }
 
+# Exclude characters that are R to L ordering, as this can cause
+# confusion. See GH #22228
+if ($chr =~ / (?[ \p{Bidi_Class:R} + \p{Bidi_Class:AL} ]) /x) {
+    $discards{$code_point} = { reason => $r2l,
+                               mirror => $mirror_code_point
+                             };
+    next;
+}
+
 # We enter the pair with the original code point on the left; if it
 # should instead be on the R, swap. Most Symbols that contain the
 # word REVERSE go on the rhs, except those whose names explicitly
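
The filtering step added in the hunk above can be tried in isolation. The following is a rough sketch under stated assumptions, not the script's actual code: the `@candidates` list and the `%pairs` hash are hypothetical stand-ins for the script's real data structures, and the test is written as an ordinary bracketed character class, which matches the same union of `Bidi_Class` values as the `(?[ ... + ... ])` form in the diff.

```perl
use strict;
use warnings;

# Hypothetical candidate list: [ opening code point, closing code point ].
my @candidates = (
    [ 0x007B, 0x007D ],   # LEFT/RIGHT CURLY BRACKET
    [ 0x00AB, 0x00BB ],   # LEFT/RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    [ 0x0706, 0x0707 ],   # SYRIAC COLON SKEWED LEFT/RIGHT
);

my $r2l = "Is in a Right to Left script";
my (%pairs, %discards);   # %pairs is an assumed name for the accepted set

for my $pair (@candidates) {
    my ($code_point, $mirror_code_point) = @$pair;
    my $chr = chr $code_point;

    # Discard any pair whose opening character is Bidi_Class R or AL,
    # recording the reason and the would-be mirror.
    if ($chr =~ /[\p{Bidi_Class:R}\p{Bidi_Class:AL}]/) {
        $discards{$code_point} = { reason => $r2l,
                                   mirror => $mirror_code_point };
        next;
    }
    $pairs{$code_point} = $mirror_code_point;
}

printf "kept:      U+%04X / U+%04X\n", $_, $pairs{$_}
    for sort { $a <=> $b } keys %pairs;
printf "discarded: U+%04X / U+%04X (%s)\n",
    $_, $discards{$_}{mirror}, $discards{$_}{reason}
    for sort { $a <=> $b } keys %discards;
```

With these three candidates, only the Syriac pair lands in `%discards`; the bracket and quotation-mark pairs are kept.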

t/lib/croak/toke

Lines changed: 18 additions & 0 deletions
@@ -133,6 +133,24 @@ EXPECT
 Use of '«' is deprecated as a string delimiter at - line 3.
 Can't find string terminator "«" anywhere before EOF at - line 5.
 ########
+# NAME mirrored delimiters in R-to-L scripts are invalid
+BEGIN { binmode STDERR, ":utf8" }
+use utf8;
+use feature 'extra_paired_delimiters';
+my $good = q܈this string is delimitted by a symbol in a R-to-L script܈;
+$good = q܇this string is delimitted by a symbol in a R-to-L script܇;
+my $bad = q܈Can't use mirrored R-to-L script delimiters܇;
+EXPECT
+Can't find string terminator "܈" anywhere before EOF at - line 6.
+########
+# NAME mirrored delimiters in R-to-L scripts are invalid in the other order too
+BEGIN { binmode STDERR, ":utf8" }
+use utf8;
+use feature 'extra_paired_delimiters';
+my $bad = q܇Can't use mirrored R-to-L script delimiters܈;
+EXPECT
+Can't find string terminator "܇" anywhere before EOF at - line 4.
+########
 # NAME paired above Latin1 delimiters need feature enabled
 BEGIN { binmode STDERR, ":utf8" }
 use utf8;