sprintf %d printing \d matches #13395
From [email protected]
Created by [email protected]
Had someone file a bug against 'P' because it used a numeric format [...]
"2.٩" is like writing "2.IX", or 2.9. Since sprintf's "%d" (and %f [...]
By such reasoning, many people would know that '\d' is a universal [...]
Should the universal match '\d' be printable by sprintf using the [...]
Should it print it "as is" and not convert it to Arabic numerals [...]
It *seems* that if the universally known pattern matching groups are [...]
Sure could complicate formatted printing, but it's not like \d has [...]
(Note... as I write the above, I wince at the size of such a task, but [...]
Comments, ideas and discourse that would make me feel better about [...]
From @mauke: On 03.11.2013 21:50, Linda Walsh (via RT) wrote:
This is a category error. Regexes (such as \d) match strings. sprintf %d [...]
sprintf %d doesn't "correspond" to \d in regex. I don't know what you [...]
The two 'd's don't even refer to the same thing. sprintf %d stands for [...]
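The category error described above can be seen in a small runnable analogue. This is Python rather than Perl (Python's `re` module, like Perl under Unicode rules, treats `\d` in a text pattern as any Unicode decimal digit), and `regex_sees_digit` and `format_decimal` are hypothetical helper names. The pattern side inspects characters in a string; the `%d` side needs a numeric value, so a matched digit character is not something `%d` can consume directly. How each language fails on a non-number differs.

```python
import re

def regex_sees_digit(s: str) -> bool:
    """Pattern side: does s contain any character in the \\d class?"""
    return re.search(r"\d", s) is not None

def format_decimal(n: int) -> str:
    """Format side: render an integer *value* in (d)ecimal notation."""
    return "%d" % n

six = "\u0666"  # U+0666 ARABIC-INDIC DIGIT SIX: a character, not a number
print(regex_sees_digit(six))  # True: Unicode-aware \d matches it
print(format_decimal(6))      # 6: %d works on a numeric value
try:
    format_decimal(six)       # a string is not a number as far as %d is concerned
except TypeError:
    print("TypeError")
```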
The RT System itself - Status changed from 'new' to 'open'
From @rjbs:
I agree with Lukas Mai. The place one would make a change like this would be in the interpretation of a string-as-a-number, so that it accepts all digits rather than only the ASCII digits. Simply making the change would be a massive backcompat issue. Adding a lexical pragma would not be much of a help, because to avoid bizarre effects at a distance, the scope would need to be exceedingly controlled. It is easier, and already possible, to just use Unicode::UCD::num to convert strings of digits (or single numeric characters) into their numeric value, as needed.
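To make the "convert as needed" approach concrete, here is a minimal Python sketch in the spirit of Unicode::UCD::num. It is not that module's implementation: `digits_to_int` is a hypothetical name, and rejecting strings that mix digit sets is an assumption modelled loosely on the behaviour num's documentation describes. It relies on Unicode laying out each set of decimal digits as ten consecutive code points, so every digit's "zero" can be recovered as `ord(ch) - value`.

```python
import unicodedata
from typing import Optional

def digits_to_int(s: str) -> Optional[int]:
    """Convert a run of decimal digits from a single digit set to an int.

    Returns None for empty input, non-decimal characters, or strings that
    mix digit sets (e.g. ASCII "4" followed by ARABIC-INDIC DIGIT TWO).
    """
    if not s:
        return None
    value = 0
    zero = None  # code point of the "0" of the digit set in use
    for ch in s:
        d = unicodedata.decimal(ch, None)
        if d is None:
            return None  # not a decimal digit (General_Category Nd)
        ch_zero = ord(ch) - d
        if zero is None:
            zero = ch_zero
        elif ch_zero != zero:
            return None  # digits from two different sets
        value = value * 10 + d
    return value

print(digits_to_int("\u0661\u0662"))      # 12 (Arabic-Indic one, two)
print("%d" % digits_to_int("\u0666"))     # 6: convert first, then format
```

Converting explicitly at the boundary keeps ordinary string-to-number conversion unchanged, which is the backcompat concern raised above.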
@rjbs - Status changed from 'open' to 'rejected'
From [email protected]: Ricardo SIGNES via RT wrote:
This is one of those cases where lexical scoping doesn't help at all. [...]
I think the OP actually wanted sprintf("%.2f", "2.\x{666}") to yield [...]
The original bug report to which the OP referred was from me. The OP's [...]
-zefram
From @demerphq: On 4 November 2013 17:04, Ricardo SIGNES via RT [...]
This subject comes up rather regularly. Unfortunately the Unicode [...]
One plan that was kicking around for a while was a regex modifier that [...]
I'm not sure how possible this is anymore given Karl's new modifiers, [...]
But anyway, the root point here is that the problem is with \d, not with sprintf.
Yves
From @ikegami: On Mon, Nov 4, 2013 at 12:01 PM, demerphq <demerphq@gmail.com> wrote:
/\d/a
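The reply above points at Perl's /a modifier, which restricts \d (along with \s, \w and the POSIX classes) to the ASCII range. Python's `re.ASCII` flag plays the same role, so a runnable stand-in looks like this; the sample string is illustrative.

```python
import re

SAMPLE = "2\u0669"  # "2" followed by U+0669 ARABIC-INDIC DIGIT NINE

unicode_digits = re.findall(r"\d", SAMPLE)          # Unicode-aware \d: both characters
ascii_digits = re.findall(r"\d", SAMPLE, re.ASCII)  # \d limited to [0-9]: only "2"

print(len(unicode_digits), len(ascii_digits))  # 2 1
```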
From [email protected]: On Mon Nov 04 09:02:15 2013, demerphq wrote:
The above hits it right on. I wasn't suggesting, *necessarily*, that sprintf [...]
To the folks who thought that I was, please note the following quote from the original: "Note... as I write the above, I wince at the size of such a task, but [...]"
I ask again -- Comments, ideas and discourse that would make me feel better about taking the 'low' [...]
I thought changing sprintf would be a huge task, and wasn't comfortable with the idea. I do feel there is an inconsistency between interpreting "\d" as a string of numeric digits, and "%d" printing out a signed integer (note, \d doesn't match "\.", so it isn't a floating point number, it would be an integer) -- same goes for including "\." and trying to print with "%f".
It's the inconsistency introduced by expanding the meaning of a digit in pattern matching, yet having the same mnemonically named "%d" fail on the digits that "\d" matches. I'm not sure that having \d match digits that %d cannot handle is a *great idea*.... in fact, I'd lean toward the opposite.
As for fixing the above to fix anything in my code -- that's ridiculous, as the above *design flaw* (note, that isn't a bug against a spec, but a defect in design -- something I do all the time, and later correct (that's called learning)) has already been worked around in my development code.
I'm not saying, necessarily, that "\d" needs to be changed either -- only that the mismatch was a bad choice that either needs some alternate workaround, or change. The pattern matching got enhanced with modal charset matching. That being the case, it would be a logical and helpful solution if sprintf got similar treatment.
Rejecting this as not being a problem is called "burying one's head in the sand".
From @ikegami: On Tue, Nov 5, 2013 at 7:15 PM, Linda Walsh via RT [...]
Then you must really hate the difference between [...]
From [email protected]: On Tue Nov 05 20:12:29 2013, ikegami@adaelis.com wrote:
None of those have the historical usage. Perl's usage of \u conflicts with GNU and shell usage (likely POSIX as well, given GNU's and bash's POSIX bent lately), though if hex were my primary counting system, the fact that %x works with numbers and \x works with characters might bug me more. But since neither is nearly as commonly used as \d and %d, neither is nearly as ripe an area for inconsistent usage.
From @mauke: On 06.11.2013 05:26, Linda Walsh via RT wrote:
What about the difference between %s and \s?
From @ikegami: On Tue, Nov 5, 2013 at 11:26 PM, Linda Walsh via RT [...]
huh? %u and %x are just as old!
sprintf "%d" # Converts a number to (d)ecimal
[...]
And of course there's %s and \s.
sprintf "%s" # Interpolates a (s)tring
[...]
Anyway, there's no design flaw. /\d/ doesn't match numbers. There are too [...]
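The claim that /\d/ matches digit characters rather than numbers is easy to check: even restricted to ASCII, a run of \d characters does not accept a sign, a grouping comma, or a decimal point. A Python sketch, where `is_digit_run` is a hypothetical helper:

```python
import re

def is_digit_run(s: str) -> bool:
    """True only if s is entirely ASCII digits (\\d+ under re.ASCII)."""
    return re.fullmatch(r"\d+", s, re.ASCII) is not None

for text in ["1234", "-1234", "1,234", "1.5"]:
    print(text, is_digit_run(text))
# Only "1234" is a pure digit run; the other three are numbers, but
# recognising them needs more than \d.
```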
From [email protected]: On Wed Nov 06 06:29:05 2013, ikegami@adaelis.com wrote:
You are missing the "and" between the pairs. Or are you claiming \u used to present strings that were [...]
No, anything that \d matched was printable by %d (until perl broke it). \d was created as a shorthand for [0-9] -- not all forms of integers in any format. It didn't match abcdef -- even though in a different encoding, they are hex digits. It didn't match I II III IV, either, as they are in a different locale. So \d wasn't designed to match any "number symbol" -- it never did. It only matched [0-9].
In perl, it has been made worthless in all locales and languages. GREAT JOB GUYS! Give me any good usage for "\d" in common usage. (You can't.)
This all comes down to the same problem that contributed to my not being on the devel list --- I wanted to make that work, like Unicode, based on locale settings. I have [...]
I see this being the same *type* of bug that caused problems when Unicode was first implemented. Perl knows the difference between different ranges in Unicode, but, unlike most other multi-lingual programs, it refuses to use locale settings at all, by default, and then does it wrong when you do enable them.
My locale clearly says (among other things): [...]
Yet this "works"[sic]: [...]
I would assert this bug is valid, since the matching code isn't paying attention to the locale as specified. The Arabic-Indic number "6" is not a number in locale [...]
Thank you for the "discussion" that helped me find a bug I'd call important. I would also point out that this would make \d useful again in every locality. If someone specifies *no* language or country code but only "UTF8" in LC_NUMERIC, then the current behavior might be correct.
From [email protected]:
TLDR: [...]
"Linda Walsh via RT" <perlbug-followup@perl.org> wrote [...]
That's not the way it works. Code point U+0666 is a numeric digit by [...]
It is a digit because under UAX#44, the Unicode Standard assigns the [...]
Character properties are not optional. That's part of the real standard, [...]
What you are hissing over is Annex C: "Compatibility Properties" from [...]
Property | Standard Recommendation | POSIX Compatible [...]
Perl has always: (1) Taken UTS 18 as part of the de facto standard. [...]
If you want locale-tailoring, then you need something like \T{<locale_id>}..\E [...]
Where <locale_id> is a CLDR locale, not some pansy-sass POSIX locale, [...]
m{ \T{<us>} \d \E }x
However, that comes from UTS 18's Level 3, a conformance level that Perl [...]
It is behaving precisely as documented and indeed as the Standard requires, [...]
In the meanwhile, you are welcome to support a patch bringing Perl up to [...]
--tom
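The character property Tom refers to can be inspected directly. U+0666 has General_Category Nd (Number, decimal digit), which is the category Unicode-aware \d is defined against (Python's `re` documentation says so explicitly for text patterns). Python's `unicodedata` module is used here as a convenient way to look it up; it is not how Perl consults the database internally.

```python
import re
import unicodedata

ch = "\u0666"
print(unicodedata.name(ch))           # ARABIC-INDIC DIGIT SIX
print(unicodedata.category(ch))       # Nd: General_Category "decimal digit"
print(unicodedata.decimal(ch))        # 6: its value as a decimal digit
print(bool(re.fullmatch(r"\d", ch)))  # True: \d follows Nd, not any locale
```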
From @ikegami: On Wed, Nov 6, 2013 at 6:19 PM, Linda Walsh via RT [...]
No, I'm not. My whole point is that there is no parallel.
I'm "claiming" that \u doesn't match an unsigned integer, just like \d [...]
\d was created as a short-hand for [0-9]
Perhaps, but not for "-1,234" (or can %d only do "-1234"? no matter). You [...]
Give me any good usage for "\d" in common usage.
Was that supposed to be addressed to me? That has nothing to do with my [...]
Are you now suggesting \d *should* match something other than [0-9]???
From [email protected]: On Wed Nov 06 16:19:13 2013, tom christiansen wrote:
It's **YOUR** so-called "locale". Go read the [...]
Perl continues to support the old locale system, and starting in v5.16, [...]
Except that perl never really supported the locale system -- it doesn't seem to support the language or country codes that are part of the locale system perl claims to support.
But it is NOT a digit in my locale. It is a digit in someplace that uses Arabic-Indic numbers. I stated that I asked for matching that was specific to my locale, which is EN_US.<some charset>.
If I use the -CL switch, I see that as conforming to my locale settings. As such, \d should match my locale's definition of digits -- not the whole world's definition. Otherwise, I will ask you the same question that Eric dodged: what would be the use case of "\d" outside of anything that is Unicode-project specific? Can I use it to check my input forms (no)... It no longer serves the purpose for which it was intended.
I'll inquire about why the standard for CLDR doesn't include regex matching. The current implementation is conceptually broken (even if technically accurate). I ask ANYONE: how is "\d" still useful in standard day-to-day programming? Its usage has been appropriated by some pay-to-play standards group that is not open. Adhering to such standards mindlessly (the new "prescriptive" POSIX also falls into this category; new = post-2002) reduces humans to little more than machine parts...
From @kentfredric: On 7 November 2013 16:09, Linda Walsh via RT <perlbug-followup@perl.org> wrote:
Maybe you could clarify your request as follows:
Your request is not that sprintf %d should interpret its parameter [...]
Your request is more that sprintf should render its parameter in a locale [...]
i.e.:
sprintf "%d", 2.5 # should emit a character string that represents 2.5 [...]
and by proxy, sprintf "%d", $locale_specific_numeric_string should first [...]
This entirely sidesteps the argument about regexp's \d [...]
Then the question becomes "Should sprintf do that, or is sprintf intended [...]
Though for lower level things, we have pack/unpack where you want [...]
sprintf seems more "human oriented" than machine oriented, so it makes [...]
But as sprintf is a *print formatting tool*, not a binary interface tool, [...]
Either that, or we need a function similar to sprintf tasked with [...]
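The "render in locale digits" reading can be sketched as a two-step pipeline: format the number as usual, then map ASCII 0-9 onto a target digit set. This is a Python sketch, not a proposal for Perl's sprintf; `localize_digits` is a hypothetical name, choosing the digit set for a locale is left to the caller, and decimal and grouping separators are deliberately left untouched. It relies on each Unicode decimal digit set occupying ten consecutive code points starting at its zero.

```python
def localize_digits(text: str, zero: str) -> str:
    """Replace ASCII 0-9 in text with the ten digits starting at `zero`.

    `zero` is the digit-zero character of the target set, e.g. "\u0660"
    (ARABIC-INDIC DIGIT ZERO) or "\u0966" (DEVANAGARI DIGIT ZERO).
    Signs, decimal points and other characters pass through unchanged.
    """
    target = "".join(chr(ord(zero) + i) for i in range(10))
    return text.translate(str.maketrans("0123456789", target))

formatted = "%d" % 1234  # numeric formatting happens first, in ASCII
print(localize_digits(formatted, "\u0660") == "\u0661\u0662\u0663\u0664")  # True
```

Keeping the digit mapping separate from the numeric formatting means `%d` itself keeps its current meaning, which is closer to the "a function similar to sprintf" option than to changing sprintf.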
From [email protected]: On Wed Nov 06 20:07:14 2013, kentfredric@gmail.com wrote:
I agree wholeheartedly with what you are saying. Given that, and given the situation where a user has asked that their regex pattern match according to their locale, would it make sense to also have "\d" **match** in a locale-sensitive way? I.e., in Indic-arabia (?!?) it would match its numbers; in Latin-1-based locales, it would match the digits in the Basic Latin set. I don't understand why both are not equally valid needs (i.e., one doesn't obviate or preclude the other).
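A locale-sensitive digit match can be approximated today by building the character class from one chosen digit set instead of using \d. In this Python sketch the keys are CLDR numbering-system identifiers (`latn`, `arab`, `deva`); the small table, `digit_class`, and `find_digits` are hypothetical, and mapping a user's locale to a numbering system (for example from CLDR locale data) is assumed to happen elsewhere.

```python
import re

# Zero digit of a few CLDR numbering systems; each set runs zero..zero+9.
DIGIT_ZERO = {
    "latn": "0",       # Basic Latin 0-9
    "arab": "\u0660",  # Arabic-Indic digits
    "deva": "\u0966",  # Devanagari digits
}

def digit_class(numbering_system: str) -> str:
    """Regex character class matching only that system's ten digits."""
    zero = ord(DIGIT_ZERO[numbering_system])
    return "[%s-%s]" % (chr(zero), chr(zero + 9))

def find_digits(text: str, numbering_system: str) -> list:
    return re.findall(digit_class(numbering_system), text)

sample = "2\u0666"  # ASCII "2" followed by ARABIC-INDIC DIGIT SIX
print(find_digits(sample, "latn"))                 # ['2']
print(len(find_digits(sample, "arab")))            # 1: only U+0666
```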
Migrated from rt.perl.org#120448 (status was 'rejected')
Searchable as RT120448