feature 'extra_paired_delimiters': Please remove U+0706/U+0707 from the list #22228

Closed
HaraldJoerg opened this issue May 21, 2024 · 2 comments · Fixed by #22229

Comments

@HaraldJoerg
Contributor

Description

The feature 'extra_paired_delimiters' allows many delimiter pairs, and at least one of them is likely to cause confusion: U+0706 and U+0707 (SYRIAC COLON SKEWED LEFT and SYRIAC COLON SKEWED RIGHT). These characters have the Bidi_Class property "Right-to-Left Arabic" [AL].
Unicode-aware editors and terminals honor this property, with the following consequence. If at least one character between the delimiters is left-to-right, everything looks normal. However, if the text between the delimiters contains only Bidi-neutral characters, then that text, including the delimiters, is displayed right-to-left. As a side effect, any character with the Bidi_Mirrored property is displayed with its mirrored glyph: a "<" is shown as ">".
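The properties this report relies on can be inspected from Perl itself. The following sketch uses the core Unicode::UCD module's charinfo(), whose `bidi` field holds the Bidi_Class abbreviation from UnicodeData.txt and whose `mirrored` field is 'Y' or 'N'. It is only an illustration, and the helper name bidi_info is made up here. On a reasonably current perl it should report AL for U+0706 and U+0707 and a mirrored flag of Y for "<".

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: return the Bidi_Class abbreviation (as listed in
# UnicodeData.txt) and the Bidi_Mirrored flag ('Y' or 'N') for a code point.
sub bidi_info {
    my ($cp) = @_;
    my $info = charinfo($cp) or return;    # charinfo returns undef for unassigned code points
    return ($info->{bidi}, $info->{mirrored});
}

# U+0706/U+0707 are the Syriac delimiters from the report; '<', '.' and 'a'
# are the characters placed between them in the example.
for my $cp (0x0706, 0x0707, ord('<'), ord('.'), ord('a')) {
    my ($bidi, $mirrored) = bidi_info($cp);
    my $name = charinfo($cp)->{name};
    printf "U+%04X %-26s bidi=%-3s mirrored=%s\n", $cp, $name, $bidi, $mirrored;
}
```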

Perl is not confused by this, but humans reading the code might be. Can we please remove this pair (and perhaps other right-to-left delimiters) from the list?

use utf8;
use feature 'extra_paired_delimiters';

my $alphabetic  = q܆a܇; # this is correct
my $punctuation = q܆.܇; # this is also correct but looks backwards

my $weird = q܆<-܇;
print $weird; # prints "<-", because that's actually the value

Perl configuration

Summary of my perl5 (revision 5 version 39 subversion 11) configuration:
Local Commit: 2fc32c2
Ancestor: e3f226d
Platform:
osname=linux
osvers=6.5.0-35-generic
archname=x86_64-linux
uname='linux hajbuntu 6.5.0-35-generic #35~22.04.1-ubuntu smp preempt_dynamic tue may 7 09:00:52 utc 2 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Dusedevel -Dprefix=/home/haj/localperl/ -Dman1dir=none -Dman3dir=none'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='11.4.0'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib/x86_64-linux-gnu /usr/lib /usr/lib64
libs=-lpthread -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
libc=/lib/x86_64-linux-gnu/libc.so.6
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.35'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl):
Compile-time options:
HAS_LONG_DOUBLE
HAS_STRTOLD
HAS_TIMES
PERLIO_LAYERS
PERL_COPY_ON_WRITE
PERL_DONT_CREATE_GVSV
PERL_HASH_FUNC_SIPHASH13
PERL_HASH_USE_SBOX32
PERL_MALLOC_WRAP
PERL_OP_PARENT

khwilliamson added a commit to khwilliamson/perl5 that referenced this issue May 21, 2024

Fixes Perl#22228

Some scripts in the world are written right-to-left, such as Arabic and
Hebrew.  This can result in confusion for regex pattern delimiters that
we have chosen based on left-to-right.  Therefore exclude all such.
Currently, the only two that fall into this category that we don't
exclude for other reasons are SYRIAC COLON SKEWED LEFT/RIGHT.
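The exclusion rule described in this commit can be illustrated with a small, hypothetical Perl sketch. The candidate list is invented for illustration and is not Perl's real delimiter table. The sketch also classifies a character as right-to-left by its Bidi_Class (R or AL) rather than by the script it belongs to, which may differ from how the actual commit identifies right-to-left scripts.

```perl
use strict;
use warnings;

# Hypothetical candidate list of [open, close] pairs, for illustration only;
# this is NOT Perl's actual extra_paired_delimiters table.
my @candidates = (
    [ "\x{0028}", "\x{0029}" ],    # LEFT/RIGHT PARENTHESIS
    [ "\x{00AB}", "\x{00BB}" ],    # LEFT/RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    [ "\x{0706}", "\x{0707}" ],    # SYRIAC COLON SKEWED LEFT/RIGHT
);

# Assumption: a character counts as right-to-left when its Bidi_Class is
# R (Right_To_Left, e.g. Hebrew) or AL (Arabic_Letter, e.g. Arabic, Syriac).
sub is_rtl {
    my ($str) = @_;
    return $str =~ /[\p{Bidi_Class=R}\p{Bidi_Class=AL}]/ ? 1 : 0;
}

# Keep a pair only if neither of its delimiters is right-to-left.
my @kept = grep { !is_rtl($_->[0]) && !is_rtl($_->[1]) } @candidates;

printf "kept U+%04X/U+%04X\n", ord($_->[0]), ord($_->[1]) for @kept;
```

With this list, only the Syriac pair is dropped. Checking both members of a pair, rather than just the opener, keeps the rule symmetric.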
@khwilliamson
Contributor

Thank you for reporting this. Of course you are right; I just hadn't thought of issues with R-to-L scripts. I added an exclusion for all characters in such scripts, and the ones you mentioned are the only ones that were problematic.

I'm marking this as a release blocker, so that it gets considered for 5.40.

khwilliamson added a commit to khwilliamson/perl5 that referenced this issue May 22, 2024
Fixes Perl#22228

Some scripts in the world are written right-to-left, such as Arabic and
Hebrew.  This can result in confusion for quote-like string delimiters
that we have chosen based on left-to-right.  Therefore exclude all such.
Currently, the only pair that falls into this category that we don't
exclude for other reasons is SYRIAC COLON SKEWED LEFT/RIGHT.
book pushed a commit that referenced this issue May 23, 2024
Fixes #22228

Some scripts in the world are written right-to-left, such as Arabic and
Hebrew.  This can result in confusion for quote-like string delimiters
that we have chosen based on left-to-right.  Therefore exclude all such.
Currently, the only pair that falls into this category that we don't
exclude for other reasons is SYRIAC COLON SKEWED LEFT/RIGHT.
@rjbs
Member

rjbs commented May 30, 2024

@HaraldJoerg I echo Karl's sentiment: thanks for this report.
