sprintf %d printing \d matches #13395

Closed
p5pRT opened this issue Nov 3, 2013 · 19 comments
Comments

@p5pRT

p5pRT commented Nov 3, 2013

Migrated from rt.perl.org#120448 (status was 'rejected')

Searchable as RT120448$

@p5pRT

p5pRT commented Nov 3, 2013

From [email protected]

Created by [email protected]

Had someone file a bug against 'P' because it used a numeric format
from 'sprintf' to print out a string matching "\d".... (running
w/UTF-8...)

"2.٩" is like writing "2.IX", or 2.9. Since sprintf's "%d" (and %f
for that matter) are grouped under the heading "universally-known
conversions".

By such reasoning, many people would know that '\d' is a universal
match for numbers.

Should the universal match '\d' be printable by sprintf using the
universal corresponding format of '%d'?

Should it print it "as is" and not convert it to arabic numerals
(though U+669 is the arabic-indic digit nine).

It *seems* that if the universally known pattern matching groups are
being allowed to match numbers in other scripts, that the
corresponding universally known format (\d => %d) be usable to print
it out. Otherwise, how might one be ABLE to print the value of the
foreign number that "\d" now matches?

Sure could complicate formatted printing, but it's not like \d has
ever meant anything other than arabic numerals [0-9]. Seeing how
precedent has been set to repurpose the "universally known character
specifications" to match Unicode w.r.t. '\d', it only seems logical to
continue the trend and do the same with '%d'.

(Note... as I write the above, I wince at the size of such a task, but
I ask again -- how would one print it in a helpful way? It seems like
just printing the whole thing as a string is *A WAY*, but it also
feels a bit like a cop-out.)

Comments, ideas and discourse that would make me feel better about
taking the 'low' road (presuming that fixing sprintf's %d to match
pattern matching's \d isn't already planned for and near completion).
;-)

Perl Info

Flags:
    category=core
    severity=medium

This perlbug was built using Perl 5.16.2 - Fri Feb 15 01:17:37 UTC 2013
It is being executed now by  Perl 5.16.2 - Fri Feb 15 01:12:05 UTC 2013.

Site configuration information for perl 5.16.2:

Configured by abuild at Fri Feb 15 01:12:05 UTC 2013.

Summary of my perl5 (revision 5 version 16 subversion 2) configuration:
   
  Platform:
    osname=linux, osvers=3.4.6-2.10-default, archname=x86_64-linux-thread-multi
    uname='linux build34 3.4.6-2.10-default #1 smp thu jul 26 09:36:26 utc 2012 (641c197) x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Dd_dbm_open -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV -Dotherlibdirs=/usr/lib/perl5/site_perl'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -fno-strict-aliasing -pipe -fstack-protector'
    ccversion='', gccversion='4.7.2 20130108 [gcc-4_7-branch revision 195012]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64 -fstack-protector'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.17.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.17'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.16.2/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64 -fstack-protector'

Locally applied patches:
    


@INC for perl 5.16.2:
    /home/law/bin/lib
    /usr/lib/perl5/site_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.16.2
    /usr/lib/perl5/vendor_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.16.2
    /usr/lib/perl5/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/5.16.2
    /usr/lib/perl5/site_perl/5.16.2/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.16.2
    /usr/lib/perl5/site_perl
    .


Environment for perl 5.16.2:
    HOME=/home/law
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_COLLATE=C
    LC_CTYPE=en_US.UTF-8
    LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
    LOGDIR (unset)
    PATH=/home/law/bin/lib:/sbin:/usr/local/sbin:/usr/lib64/mpi/gcc/openmpi/bin:/home/law/bin:/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:.:/usr/lib/qt3/bin:/opt/dell/srvadmin/bin:/usr/sbin:/etc/local/func_lib:/home/law/lib
    PERL5OPT=-Mutf8 -CSA -I/home/law/bin/lib
    PERL_BADLANG (unset)
    SHELL=/bin/bash

@p5pRT

p5pRT commented Nov 3, 2013

From @mauke

On 03.11.2013 21​:50, Linda Walsh (via RT) wrote​:

Had someone file a bug against 'P' because it used a numeric format
from 'sprintf' to print out a string matching "\d".... (running
w/UTF-8...)

"2.٩" is like writing "2.IX", or 2.9. Since sprintf's "%d" (and %f
for that matter) are grouped under the heading "universally-known
conversions".

By such reasoning, many people would know that '\d' is a universal
match for numbers.

Should the universal match '\d' be printable by sprintf using the
universal corresponding format of '%d'?

This is a category error. Regexes (such as \d) match strings. sprintf %d
takes numbers (specifically integers). So no, strings matched by \d
should not be printable via %d because %d takes integers, not strings.
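
The distinction can be seen in a small sketch (the variable names are illustrative, not from the report): the regex inspects characters, while `%d` formats whatever number the string coerces to, and Perl's string-to-number coercion only recognises ASCII digits.

```perl
use strict;
use warnings;

# U+0669 ARABIC-INDIC DIGIT NINE, written as an escape so the source stays ASCII.
my $nine = "\x{669}";

# The regex looks at characters: U+0669 has General_Category Nd, so \d matches.
my $is_digit_char = $nine =~ /\A\d\z/ ? 1 : 0;

# sprintf %d looks at the numeric value: the string contains no ASCII digits,
# so it numifies to 0 (normally with an "isn't numeric" warning).
my $as_decimal = do {
    no warnings 'numeric';    # silence the warning for this demonstration
    sprintf '%d', $nine;
};

printf "matches \\d: %d, sprintf %%d gives: %s\n", $is_digit_char, $as_decimal;
```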

sprintf %d doesn't "correspond" to \d in regex. I don't know what you
mean by "universal" here.

The two 'd's don't even refer to the same thing. sprintf %d stands for
"decimal" (there's also %o for "octal" and %x for "hexadecimal"), regex
\d stands for "digit" (and there's \w for "word character" and \s for
"(white)space character").

@p5pRT

p5pRT commented Nov 3, 2013

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

p5pRT commented Nov 4, 2013

From @rjbs

I agree with Lukas Mai.

The place one would make a change like this would be to change the interpretation of a string-as-a-number to accept all digits, rather than only the ASCII digits.

Simply making the change would be a massive backcompat issue. Adding a lexical pragma would not be much of a help, because to avoid bizarre effects at a distance, the scope would need to be exceedingly controlled.

Easier and already possible to just use Unicode::UCD::num to convert strings of digits (or single numeric characters) into their numeric value, as needed.

--
rjbs
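
A minimal sketch of the Unicode::UCD::num approach mentioned above, assuming Perl 5.14 or later (where `num` is available). The sample strings are made up for illustration. `num` is documented to return undef when a multi-character string mixes digits from different scripts, so the caller decides what to do in that case.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Arabic-Indic digits ONE, TWO, NINE (illustrative input).
my $digits = "\x{661}\x{662}\x{669}";

# num() converts a run of same-script decimal digits to an ordinary number,
# which sprintf %d can then format as usual.
my $value = num($digits);
my $out   = defined $value ? sprintf('%d', $value) : 'not a number';
print "$out\n";

# ASCII "2" followed by Arabic-Indic NINE mixes scripts, so num() gives undef.
my $mixed = num("2\x{669}");
print defined $mixed ? "mixed: $mixed\n" : "mixed: undef\n";
```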

@p5pRT p5pRT closed this as completed Nov 4, 2013
@p5pRT

p5pRT commented Nov 4, 2013

@rjbs - Status changed from 'open' to 'rejected'

@p5pRT

p5pRT commented Nov 4, 2013

From [email protected]

Ricardo SIGNES via RT wrote​:

Adding a lexical pragma would not be much of a help, because to avoid
bizarre effects at a distance, the scope would need to be exceedingly
controlled.

This is one of those cases where lexical scoping doesn't help at all.
Is "2.\x{666}" a numeric scalar, and what is its numeric value?
It's normal to pass numeric strings across module boundaries, and this
usage breaks entirely if the modules have different string->number
coercion semantics. If there's a change in the coercion behaviour,
it has to be everywhere at once with no lexical or dynamic option.
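
For reference, a short sketch of how current coercion treats the string in question. It uses `looks_like_number` from the core Scalar::Util module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $s = "2.\x{666}";    # "2." followed by ARABIC-INDIC DIGIT SIX

# Perl's own numeric syntax does not include U+0666, so this is false.
my $numeric_looking = looks_like_number($s) ? 1 : 0;

# Coercion parses the leading "2." and stops at the first non-ASCII digit.
my $value = do {
    no warnings 'numeric';    # would otherwise warn "isn't numeric"
    $s + 0;
};

print "looks numeric: $numeric_looking, value: $value\n";
```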

I think the OP actually wanted sprintf("%.2f", "2.\x{666}") to yield
"2.\x{666}0", or possibly "2.\x{666}\x{660}". That would involve
attaching extra script information to numerical values, which would
be ridiculous.

The original bug report to which the OP referred was from me. The OP's
bug was to assume that /\d/ was a suitable thing to use in a regexp
meant to match Perl's numeric syntax for string->number coercion.
It's a somewhat common bug. Changing the coercion behaviour would fix
those modules that share this bug; or, rather, would fix *this aspect*
of them, as this bug tends to go alongside other bugs such as the use
of /$/ where /\z/ is required. (The OP's code has the /$/ bug, reported
separately.) Changing coercion behaviour (or /\d/) to fix these modules,
at the expense of breaking modules that got it right, should be viewed
similarly to the idea of changing what /$/ means for the same purpose.

-zefram
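
Both pitfalls named above (`\d` versus ASCII digits, and `/$/` versus `/\z/`) can be illustrated with a small sketch. `is_ascii_integer` is a hypothetical helper written for this example, not code from the module under discussion.

```perl
use strict;
use warnings;

# Hypothetical validator: ASCII digits only, anchored at the true start and
# end of the string, so a trailing newline is rejected.
sub is_ascii_integer {
    my ($s) = @_;
    return $s =~ /\A[0-9]+\z/ ? 1 : 0;
}

for my $input ("42", "42\n", "4\x{662}") {
    # The loose pattern accepts a trailing newline ($ matches before it)
    # and any Unicode decimal digit (\d on a character string).
    my $loose  = $input =~ /^\d+$/ ? 1 : 0;
    my $strict = is_ascii_integer($input);

    # Show non-printable and non-ASCII characters as escapes.
    (my $shown = $input) =~ s/([^ -~])/sprintf '\\x{%x}', ord $1/ge;
    print "$shown: loose=$loose strict=$strict\n";
}
```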

@p5pRT

p5pRT commented Nov 4, 2013

From @demerphq

On 4 November 2013 17​:04, Ricardo SIGNES via RT
<perlbug-followup@​perl.org> wrote​:

I agree with Lukas Mai.

The place one would make a change like this would be to change the interpretation of a string-as-a-number to accept all digits, rather than only the ASCII digits.

Simply making the change would be a massive backcompat issue. Adding a lexical pragma would not be much of a help, because to avoid bizarre effects at a distance, the scope would need to be exceedingly controlled.

Easier and already possible to just use Unicode::UCD::num to convert strings of digits (or single numeric characters) into their numeric value, as needed.

This subject comes up rather regularly. Unfortunately the Unicode
folks included non-ascii "digits" in their definition of a digit.

One plan that was kicking around for a while was a regex modifier that
made \d match only [0-9]. I was very much in favour of this plan.

I'm not sure how possible this is anymore given Karl's new modifiers,
and it might even already be covered. I haven't checked.

But anyway, the root point here is that the problem is with \d not with sprintf.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Nov 4, 2013

From @ikegami

On Mon, Nov 4, 2013 at 12​:01 PM, demerphq <demerphq@​gmail.com> wrote​:

One plan that was kicking around for a while was a regex modifier that
made \d match only [0-9]. I was very much in favour of this plan.

I'm not sure how possible this is anymore given Karl's new modifiers,
and it might even already be covered. I haven't checked.

/\d/a
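
A brief sketch of what the `/a` modifier changes, assuming Perl 5.14 or later. The `use re '/a'` form applies the same restriction to every pattern in a lexical scope; the variable names are illustrative.

```perl
use strict;
use warnings;

my $six = "\x{666}";    # ARABIC-INDIC DIGIT SIX

# Without /a, \d on a character string follows Unicode rules.
my $unicode_match = $six =~ /\d/ ? 1 : 0;

# With /a, \d is restricted to the ASCII digits 0-9.
my $ascii_match = $six =~ /\d/a ? 1 : 0;
my $plain_match = "6"  =~ /\d/a ? 1 : 0;

# The same restriction for all patterns in a lexical scope.
my $scoped_match = do {
    use re '/a';
    $six =~ /\d/ ? 1 : 0;
};

print "unicode=$unicode_match ascii=$ascii_match plain=$plain_match scoped=$scoped_match\n";
```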

@p5pRT

p5pRT commented Nov 6, 2013

From [email protected]

On Mon Nov 04 09​:02​:15 2013, demerphq wrote​:

This subject comes up rather regularly. Unfortunately the Unicode
folks included non-ascii "digits" in their definition of a digit.

One plan that was kicking around for a while was a regex modifier that
made \d match only [0-9]. I was very much in favour of this plan.

I'm not sure how possible this is anymore given Karl's new modifiers,
and it might even already be covered. I haven't checked.

But anyway, the root point here is that the problem is with \d not
with sprintf.

Yves


The above hits it right on. I wasn't suggesting, *necessarily*, that sprintf
change.

To the folks who thought that I was, please note the following quote from the original​:

  "Note... as I write the above, I wince at the size of such a task, but I ask again --
  how would one print it in a helpful way? It seems like just printing the whole thing as
  a string is *A WAY*, but it also feels a bit like a cop-out.)

  Comments, ideas and discourse that would make me feel better about taking the 'low'
  road (presuming that fixing sprintf's %d to match pattern matching's \d isn't
  already planned for and near completion). ;-) "

I thought changing sprintf would be a huge task, and wasn't comfortable with the idea.

I do feel there is an inconsistency between interpreting "\d" as a string of numeric digits, and "%d" printing out a signed integer (note, \d doesn't match "\.", so it isn't a floating point number, it would be an integer) -- same goes for including "\." and trying to print with "%f".

It's the inconsistency introduced by expanding the meaning of a digit in pattern matching, yet having the same mnemonically named "%d" fail on the digits that "\d" matches.

I'm not sure that having \d match digits that %d cannot handle is a *great idea*.... in fact, I'd lean toward the opposite.

As for fixing the above to fix anything in my code -- that's ridiculous, as the above *design flaw* (note, that isn't a bug against a spec, but a defect in design -- something I do all the time, and later correct (that's called learning)) has already been worked around in my development code.

I'm not saying, necessarily "\d" needs to be changed either -- only that the mismatch was a bad choice that either needs some alternate workaround, or change.

The pattern matching got enhanced with modal charset matching. That being the case, it would be a logical and helpful solution if sprintf got similar treatment.

Rejecting this as not being a problem is called "burying one's head in the sand".

@p5pRT

p5pRT commented Nov 6, 2013

From @ikegami

On Tue, Nov 5, 2013 at 7​:15 PM, Linda Walsh via RT <
perlbug-followup@​perl.org> wrote​:

I do feel there is an inconsistency between interpreting "\d" as a string
of numeric digits, and "%d" printing out a signed integer

Then you must really hate the difference between
%u and \u
%x and \x

@p5pRT

p5pRT commented Nov 6, 2013

From [email protected]

On Tue Nov 05 20​:12​:29 2013, ikegami@​adaelis.com wrote​:

Then you must really hate the difference between
%u and \u
%x and \x


None of those have the historical usage.

Perl's usage of \u conflicts with gnu and shell usage (likely POSIX as well given gnu's and bash's posix bent lately), though if hex was my primary counting system, that %x works w/numbers and \x works with characters might bug me more. But since neither are nearly as commonly used as \d & %d, neither are nearly the ripe area for inconsistent usage.

@p5pRT

p5pRT commented Nov 6, 2013

From @mauke

On 06.11.2013 05​:26, Linda Walsh via RT wrote​:

On Tue Nov 05 20​:12​:29 2013, ikegami@​adaelis.com wrote​:

Then you must really hate the difference between
%u and \u
%x and \x
----
None of those have the historical usage.

Perl's usage of \u conflicts with gnu and shell usage (likely POSIX as well given gnu's and bash's posix bent lately), though if hex was my primary counting system, that %x works w/numbers and \x works with characters might bug me more. But since neither are nearly as commonly used as \d & %d, neither are nearly the ripe area for inconsistent usage.

What about the difference between %s and \s?

--
Lukas Mai <plokinom@​gmail.com>

@p5pRT

p5pRT commented Nov 6, 2013

From @ikegami

On Tue, Nov 5, 2013 at 11​:26 PM, Linda Walsh via RT <
perlbug-followup@​perl.org> wrote​:

On Tue Nov 05 20​:12​:29 2013, ikegami@​adaelis.com wrote​:

Then you must really hate the difference between
%u and \u
%x and \x
----
None of those have the historical usage.

huh? %u and %x are just as old!

sprintf "%d" # Converts a number to (d)ecimal
//d # (d)efault
tr///d # (d)elete
/\d/ # Matches a digit (has nothing to do with numbers, even if it just matched [0-9]).

And of course there's %s and \s.

sprintf "%s" # Interpolates a (s)tring
//s # (s)ingle
s/// # (s)ubstitute
/\s/ # Matches a white(s)pace character.

Anyway, there's no design flaw. /\d/ doesn't match numbers. There are too
many definitions of numbers. Even sprintf's definition varies, and it
definitely differs from Perl's.

@p5pRT

p5pRT commented Nov 6, 2013

From [email protected]

On Wed Nov 06 06​:29​:05 2013, ikegami@​adaelis.com wrote​:

On Tue, Nov 5, 2013 at 11​:26 PM, Linda Walsh via RT <
perlbug-followup@​perl.org> wrote​:

On Tue Nov 05 20​:12​:29 2013, ikegami@​adaelis.com wrote​:

Then you must really hate the difference between
%u and \u
%x and \x
----
None of those have the historical usage.

huh? %u and %x are just as old!


You are missing the "and" between the pairs.

Or are you claiming \u used to present strings that were
printable by %u, and that \x returned strings that were printable via %x? If that is your claim, I will stand corrected.

No, it was anything read by what \d matched, was printable by %d (until perl broke it).

\d was created as a short-hand for [0-9] -- not all forms of integers in any format. It didn't match abcdef -- even though in a different encoding, they are hex digits. It didn't match I II III IV, either, as they are in a different locale. So \d wasn't designed to match any "number symbol" -- it never did. It only matched [0-9].

In perl, it has been made worthless in all locales and languages. GREAT JOB GUYS!

Give me any good usage for "\d" in common usage.

(You can't.)

This all comes down to the same problem that contributed to my not being on the devel list --- I wanted to make that work, like Unicode, based on locale settings. I have
en_US.utf8 (used to be en_US.UTF-8, but that seems to no longer be in vogue, likely so "-" could be significant) in all entries except collation ("C").

I see this being the same *type* of bug that caused problems when Unicode was first implemented.

Perl knows the difference between different ranges in Unicode, but, unlike most other multi-lingual programs, it refuses to use locale settings at all, by default, and then does it wrong when you do enable them.

My locale clearly says (among other things)​:
LANG=en_US.utf8
LC_CTYPE=en_US.utf8
LC_NUMERIC=en_US.utf8

Yet this "works"[sic]​:

echo "ENV=$PERL5OPTS"
ENV=
perl -CL -we 'use 5.16.0;use utf8;my $num="1٦";
printf "%s\n", $num =~ /^\d+$/?"T":"F";'
T
#or equiv:
perl -CL -we 'use 5.16.0;use utf8;my $num="1\x{666}";
printf "%s\n", $num =~ /^\d+$/?"T":"F";'

I would assert this bug is valid, since the matching code isn't paying attention to the locale as specified.

The Arabic-Indic number "6", is not a number in locale
en_US.* (including utf8).

Thank you for the "discussion" that helped me find a bug I'd call important.

I would also point out that this would make \d useful again in every locality.

If someone specifies *no* language or country code but only "UTF8" in LC_NUMERIC, then the current behavior might be correct.

@p5pRT

p5pRT commented Nov 7, 2013

From [email protected]

TLDR:
+---------------------------------------------------------------------------+
| This ticket should be closed until Perl's UTS18 Level 3 conformance is a  |
| documented, non-experimental, and supported element of its regex match.   |
+---------------------------------------------------------------------------+

"Linda Walsh via RT" <perlbug-followup@​perl.org> wrote
  on Wed, 06 Nov 2013 15​:19​:56 PST​:

The Arabic-Indic number "6", is not a number in locale
en_US.* (including utf8).

That's not the way it works. Code point U+0666 is a numeric digit by
definition, and this has nothing to do with your so-called "locale".

It is a digit because under UAX#44, the Unicode Standard assigns the
character property General_Category=Decimal_Number to that code point, and
this in turn derives from its being a Numeric_Type=Decimal.

Character properties are not optional. That's part of the real standard,
Unicode Standard Annex #44: "Unicode Character Database". This isn't like
some optional UTS or something. It's a UAX, which means you *have* to do
what it says. This cannot be argued.

What you are hissing over is Annex C​: "Compatibility Properties" from
Unicode Technical Standard #18​: "Unicode Regular Expressions", in which
it gives two possible interpretations of \d​:

  Property      Standard Recommendation    POSIX Compatible
  =========================================================
  digit (\d)    \p{gc=Decimal_Number}      [0..9]

Perl has always​:

  (1) Taken UTS 18 as part of the de-facto standard.
  (2) Followed the Standard Recommendation.
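
The two interpretations in the table can be checked from Perl itself. The following is a sketch using `charinfo` from the core Unicode::UCD module and the `\p{Nd}` and `\p{PosixDigit}` properties; the variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Look up the Unicode Character Database entry for U+0666.
my $info = charinfo(0x666);
print "$info->{name}: category=$info->{category}, decimal=$info->{decimal}\n";

my $six = chr 0x666;

# "Standard Recommendation" column: any character with gc=Decimal_Number.
my $is_nd = $six =~ /\p{Nd}/ ? 1 : 0;

# "POSIX Compatible" column: only the ASCII digits 0 through 9.
my $is_posix_digit = $six =~ /\p{PosixDigit}/ ? 1 : 0;

print "Nd=$is_nd PosixDigit=$is_posix_digit\n";
```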

If you want locale-tailoring, then you need something like

  \T{<locale_id>}..\E

Where <locale_id> is a CLDR locale, not some pansy-sass POSIX locale,
thus admitting this solution​:

  m{ \T{<us>} \d \E }x

However, that comes from UTS 18's Level 3, a conformance level that Perl
has never purported to have anything whatsoever to do with. At all.
We barely squeak by through Level 1 (arguably), and have several Level 2
features. But Level 3? No.

It is behaving precisely as documented and indeed as the Standard requires,
irrespective of your personal likes or dislikes.

+---------------------------------------------------------------------------+
| This ticket should be closed until Perl's UTS18 Level 3 conformance is a  |
| documented, non-experimental, and supported element of its regex match.   |
+---------------------------------------------------------------------------+

In the meanwhile, you are welcome to support a match bringing Perl up to
speed with Level 3. I'm sure many would appreciate that.

--tom

@p5pRT

p5pRT commented Nov 7, 2013

From @ikegami

On Wed, Nov 6, 2013 at 6​:19 PM, Linda Walsh via RT <
perlbug-followup@​perl.org> wrote​:

On Wed Nov 06 06​:29​:05 2013, ikegami@​adaelis.com wrote​:

On Tue, Nov 5, 2013 at 11​:26 PM, Linda Walsh via RT <
perlbug-followup@​perl.org> wrote​:

On Tue Nov 05 20​:12​:29 2013, ikegami@​adaelis.com wrote​:

Then you must really hate the difference between
%u and \u
%x and \x
----
None of those have the historical usage.

huh? %u and %x are just as old!
---
You are missing the "and" between the pairs.

No, I'm not. My whole point is that there is no parallel.

Or are you claiming \u used to present strings that were

I'm "claiming" that \u doesn't match an unsigned integer, just like \d
doesn't match a signed integer.

\d was created as a short-hand for [0-9]

Perhaps, but not for "-1,234" (or can %d only do "-1234"? no matter). You
need to do more than limit \d to [0-9] to match your locale's numbers.

Give me any good usage for "\d" in common usage.

(You can't.)

Was that supposed to be addressed to me? That has nothing to do with my
comments.

Perl knows the difference between different ranges in Unicode, but, unlike
most other multi-lingual programs, it refuses to use locale settings at
all, by default, and then does it wrong when you do enable them.

Are you now suggesting \d *should* match something other than [0-9]???

@p5pRT

p5pRT commented Nov 7, 2013

From [email protected]

On Wed Nov 06 16​:19​:13 2013, tom christiansen wrote​:

In English: Having to do with locale support, how?
| This ticket should be closed until Perl's UTS18 Level 3 conformance is a |
| documented, non-experimental, and supported element of its regex match. |

That's not the way it works. Code point U+0666 is a numeric digit by
definition, and this has nothing to do with your so-called "locale".


It's **YOUR** "so called "locale". Go read the
1) perlre page. Specifically :

Perl continues to support the old locale system, and starting in v5.16,
  provides a hybrid way to use the Unicode character set, along with the
  other portions of locales that may not be so problematic.

Except that perl never really supported the locale system -- it doesn't seem to support the language or country codes that are part of the locale system perl claims to support.

"

It is a digit because under UAX#44, the Unicode Standard assigns the
character property General_Category=Decimal_Number to that code point,
and
this in turn derives from its being a Numeric_Type=Decimal.


But it is NOT a digit in my locale. It is a digit in someplace that uses Arabic-Indic numbers.

I stated that I asked for matching that was specific to my locale, which is EN_US.<some charset>.

Character properties are not optional.
Neither are they currently being locally appropriate or useful.

If I use the -CL switch -- I see that as conforming to my locale settings. As such, \d should match my locale's definition of digits -- not the whole world's definition.

Otherwise, I will ask you the same Q that Eric dodged. What would be the use case of "\d" outside of anything that is unicode-project specific?

Can I use it to check my input forms (no)... It no longer serves the purpose for which it was intended.

Perl has always​:

(1) Taken UTS 18 as part of the de-facto standard.
(2) Followed the Standard Recommendation.

In the meanwhile, you are welcome to support a match


I'll inquire about why the standard for CLDR doesn't include regex matching. The current implementation is conceptually broken (even if technically accurate).

I ask ANYONE, how is "\d" still useful in standard day-to-day programming?

Its usage has been appropriated by some pay-to-play standards group that is not open.

Adhering to such standards (the new "prescriptive" POSIX also falls into this category; new=post 2002) mindlessly reduces humans to little more than machine parts...

@p5pRT

p5pRT commented Nov 7, 2013

From @kentfredric

On 7 November 2013 16:09, Linda Walsh via RT <perlbug-followup@perl.org> wrote:

It's **YOUR** "so called "locale". Go read the
1) perlre page. Specifically :

Perl continues to support the old locale system, and starting in v5.16,
provides a hybrid way to use the Unicode character set, along with
the
other portions of locales that may not be so problematic.

Maybe you could clarify your request as follows​:

Your request is not that sprintf %d should interpret its parameter
specially.

Your request is more that, sprintf should render its parameter in a locale
sensitive way.

ie​:

sprintf "%d", 2.5 # should emit a character string that represents 2.5 in the relevant locale

and by proxy, sprintf "%d", $locale_specific_numeric_string should first
decode that numeric string via its intended locale to internal
representation, convert it to an integer, and then re-emit it in a locale
sensitive way.

This entirely side steps the argument about regexp's \d

Then the question becomes "Should sprintf do that, or is sprintf intended
to be lower level".

Though for lower level things, we have pack/unpack where you want
machine-level interpretation.

sprintf seems more "human oriented" than machine oriented, so it makes
sense to have some locale support.

But as sprintf is a *print formatting tool*, not a binary interface tool,
it makes sense that it would be tasked with interpreting values to users in
locale relevant forms.

Either that, or we need a function similar to sprintf tasked with
formatting things in locale-sensitive ways.

--
Kent
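
As a rough sketch of what such a separate formatting function might look like: `format_int_with_digits` below is a hypothetical helper, not an existing Perl or CPAN API. It assumes the target script's decimal digits occupy ten consecutive code points starting at its zero, which holds for U+0660 to U+0669 (Arabic-Indic). Note that `%d` truncates, so an input of 2.5 would be rendered as 2.

```perl
use strict;
use warnings;

# Hypothetical helper: format an integer with %d, then replace each ASCII
# digit with the digit of the same value in the block that starts at $zero.
sub format_int_with_digits {
    my ($number, $zero) = @_;
    my $ascii = sprintf '%d', $number;
    (my $out = $ascii) =~ s/([0-9])/chr(ord($zero) + $1)/ge;
    return $out;
}

# Render 29 with Arabic-Indic digits (zero is U+0660).
my $arabic_indic = format_int_with_digits(29, "\x{660}");

binmode STDOUT, ':encoding(UTF-8)';    # avoid "Wide character" warnings
print "$arabic_indic\n";
```

A real locale-aware formatter would also need to handle signs, grouping, and decimal separators, which this sketch leaves untouched.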

@p5pRT

p5pRT commented Nov 7, 2013

From [email protected]

On Wed Nov 06 20​:07​:14 2013, kentfredric@​gmail.com wrote​:

Maybe you could clarify your request as follows​:

Your request is not that sprintf %d should interpret its parameter
specially.

Your request is more that, sprintf should render its parameter in a
locale
sensitive way.
....
But as sprintf is a *print formatting tool*, not a binary interface
tool,
it makes sense that it would be tasked with interpreting values to
users in
locale relevant forms.

Either that, or we need a function similar to sprintf tasked with
formatting things in locale-sensitive ways.


I agree with what you are saying, wholeheartedly.

Given that, and given the situation where a user has asked that their regex pattern match according to their locale, then would it make sense to also have "\d" **match** in a locale-sensitive way?

I.e. in Indic-arabia, (?!?) it would match its numbers. In latin1 based locales, it would match with the numbers in the basic latin set.

I don't understand why both are not equally valid needs (I.e. one doesn't obviate or preclude the other).
