strangeness with Unicode #2281
From [email protected]
Created by [email protected]

This is another one where I hesitate to say it's a bug, since this is my

Here's a test program that inspects the length of strings in a number

    #!/usr/local/bin/perl -w
    my $smiley = "\x{263a}"; ## a smiley character
    my $count = 0;
    "" . $smiley, # 3
    $smiley . "", # 5
    "\x{263a}" . "\x{263a}", # 7
    "\x{263a}\x{263a}", # 9
    "\x{263a}" x 2, # 11
    my $chars = length($string); ## Unicode characters
    my @regexchars = $string =~ m/(.)/g;
    my @splitchars = split //, $string;
    print "$count: string [$string] has chars=$chars/$regexchars/$splitchars, bytes=$bytes\n";

Here's the output, piped through less (which shows hex codes for non-ASCII):

    1: string [<E2><98><BA>] has chars=1/1/1, bytes=3

The first four look fine to me, as <E2><98><BA> are the utf8 for the smiley:

    % utf8-decode

and indeed, when I view the output on a utf8 xterm, I see the smiley.

Lines 5 and 6 seem odd, since the length() is 3 instead of the 1 I'd expect.

As for the rest, 7-12, I'd expect them all to be like #9, which shows

#11 and 12 just have the length() wrong, but the other three are really

    % utf8-decode

But, at least the length() is correct for them.

So, it seems that there are two separate problems:

* length() not working correctly (examples 5,6, 11, 12)

But hey, I'm learning a lot about Unicode :-)

Perl Info
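For anyone re-running this check on a current Perl, here is a minimal sketch of the same comparison. It is not the original test program, which is only partly preserved above. The helper name `char_and_byte_counts` is hypothetical, and computing the byte count with `Encode::encode` is an assumption, since the original's way of computing `$bytes` is not shown.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the original report): returns the number of
# characters in a string and the number of bytes in its UTF-8 encoding.
sub char_and_byte_counts {
    my ($string) = @_;
    my $chars = length($string);                    # counts characters
    my $bytes = length(encode('UTF-8', $string));   # counts encoded bytes (assumed method)
    return ($chars, $bytes);
}

my $smiley = "\x{263a}";    # the smiley character used in the report

# Emit UTF-8 on output so wide characters print without warnings.
binmode(STDOUT, ':encoding(UTF-8)');

# A few of the constructions from the report: plain, concatenated with an
# empty string on either side, interpolated twice, and repeated with x.
my @strings = (
    $smiley,
    "" . $smiley,
    $smiley . "",
    "\x{263a}\x{263a}",
    "\x{263a}" x 2,
);

my $count = 0;
for my $string (@strings) {
    $count++;
    my ($chars, $bytes) = char_and_byte_counts($string);
    my @regexchars = $string =~ m/(.)/g;    # characters as seen by the regex engine
    my @splitchars = split //, $string;     # characters as seen by split
    printf "%d: string [%s] has chars=%d/%d/%d, bytes=%d\n",
        $count, $string, $chars, scalar(@regexchars), scalar(@splitchars), $bytes;
}
```

On a Perl where the behaviour is consistent, the three character counts should agree for every string, with one smiley giving one character and three UTF-8 bytes.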
From [Unknown Contact. See original ticket]

Jeffrey Friedl (lists.p5p):
Could you try these with a bleeding-edge Perl, and then contact

Fun, isn't it?
From The RT System itself

Seems to have been fixed, works in the bleeding edge post-5.7.0 Perl.
Migrated from rt.perl.org#3599 (status was 'resolved')
Searchable as RT3599$