Perl bug -- split function #5454
Comments
From [email protected]

Hi,

I've attached a program that opens each file of a directory one at a
The result varies based on the input. For some files (when there are not
Sometimes, after having read 5-6 files, the split function begins to
Sometimes, the split function does not slow down too much but the memory
I've tried the program on two computers running Linux (i386) on
Am I doing something really stupid in the program or is there a bug in Perl?

Thanks a lot,
Laurent Birtz
From [email protected]

#!/usr/bin/perl -w
use strict;

$| = 1;
main();

sub main
{
    my $file_data;
    my @array;
    my $saved_sep;
    my @file_list;

    ### Get all files in current working directory.
    @file_list = <*>;
    print("The following files will be tried: @file_list\n\n");

    foreach my $file (@file_list)
    {
        if(! -f $file)
        {
            print("Skipping $file: not regular\n");
            next;
        }

        print("Opening file '$file'...\n");
        open(INPUT_FILE, "<$file")
            || die("Cannot open $file: $!");

        ### Slurp file in.
        $saved_sep = $/;
        $/ = undef;
        $file_data = <INPUT_FILE>;
        $/ = $saved_sep;

        ### Attempt to split file data.
        print("Split..\n");

        ### On my system it hangs here.
        @array = split(//, $file_data);
        print("End split..\n");

        close(INPUT_FILE)
            || die("Cannot close $file: $!");
    }
}
From @iabyn

On Mon, May 13, 2002 at 04:47:42PM -0400, Delta wrote:

The problem here is that when you split large strings into 1-character
So, when you split your first large file, it takes a while, and memory
The moral is: don't split 1Mbyte strings into individual characters :-)

Note to perl5porters:
This behaviour appears in bleedperl too; after splitting a 2.3Mb file, memory
* for a future Perl, do we have the technology to presize the array in

Dave.
--
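One way to follow the "don't split into individual characters" advice is to visit the characters by index instead of materialising them as a list. The sketch below is an illustration only: `count_chars` is a hypothetical helper that does not appear in the ticket, and it assumes the goal is some per-character processing (here, a frequency tally).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the ticket): tally how often each character
# occurs, reading the string one character at a time with substr() so that
# no per-character list is ever built.
sub count_chars {
    my ($data) = @_;
    my %count;
    # A foreach over a range is iterated lazily; the range is not expanded.
    for my $i (0 .. length($data) - 1) {
        $count{ substr($data, $i, 1) }++;
    }
    return \%count;
}

my $counts = count_chars("abracadabra");
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

This keeps peak memory close to the size of the string itself, at the cost of one substr() call per character.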
From @paulg1973

Dave Mitchell [mailto:davem@fdgroup.com] wonders, based on an inquiry from

I've worked with a number of different operating-system-supplied storage
The allocator we use on VOS allocates storage that is at least 64-byte
The short answer is that 1M of anything is expensive. The long answer is
Any good algorithms course and / or textbook goes into great depth on these

HTH
From @paulg1973

s/icebert/iceberg/

With apologies to Bert.

PG
From @iabyn

On Mon, May 13, 2002 at 07:12:43PM -0400, Green, Paul wrote:

I guess what I meant to say was:
I would expect freeing 1M scalars to take a long time, but it took

Dave.
--
From @jkeenan

On Mon May 13 20:31:02 2002, RT_System wrote:

Reviewing this ticket today, my impression is that it falls into the
So, is Perl "doing anything silly here"? If not, do we need to keep

Thank you very much.
From @ikegami

On Mon May 13 20:31:02 2002, RT_System wrote:

It takes 0.1 seconds for me. 32-bit Windows threaded build from

1,000,000
5,000,000
10,000,000

use strict;
use Time::HiRes qw( time );
my $x = "x" x 10_000_000;
printf("split and alloc: %6.1f\n", $time2-$time1);
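Only fragments of the benchmark survive above, so here is a minimal sketch of how such a timing could be taken. It is a reconstruction under assumptions, not ikegami's script: the `time_split` helper and the input sizes are hypothetical, and only `Time::HiRes` and the "split and alloc" label come from the original.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical helper: time a single `split //` over an $n-character string.
# Returns (elapsed seconds, number of elements produced).
sub time_split {
    my ($n) = @_;
    my $x  = "x" x $n;
    my $t1 = time;
    my @a  = split //, $x;
    my $t2 = time;
    return ($t2 - $t1, scalar @a);
}

# Sizes are kept small here so the sketch runs quickly; scale them up to
# look for non-linear growth on a given platform.
for my $n (10_000, 20_000, 40_000) {
    my ($elapsed, $count) = time_split($n);
    printf("%8d chars: split and alloc: %8.4f s (%d elements)\n",
           $n, $elapsed, $count);
}
```

If the time roughly doubles when the input doubles, the split is behaving linearly; a fourfold jump per doubling would point at the quadratic behaviour discussed later in the thread.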
From @bulk88

On Sat Apr 21 14:58:41 2012, jkeenan wrote:

I looked at this in C, I think (by eye, not profiler) most of the time

while (--limit) { if (make_mortal) PUSHs(dstr); s++; if (s >= strend)
From @bulk88

I thought a bit more. I don't have a Perl build with a C profiler
Maybe pp_split should directly write into AVs targ style. But targ is a
Made with Perl 5.12 and B Concise.

# 7: my $time1 = time;
From @bulk88

Hacked NYTProf to profile opcodes aassign padav pushmark split.

Long story short. pp_split or a call from pp_split is the quadratic problem.

6 1 0s my $x = "x" x 1000000;
6 1 0s my $x = "x" x 2000000;
6 1 15.6ms my $x = "x" x 4000000;
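A claim that something is "quadratic" can be checked from two timings taken at different input sizes. The sketch below uses a hypothetical helper, `growth_exponent`, and made-up timings that are not taken from the profile output above; it estimates k in "time grows like n**k".

```perl
use strict;
use warnings;

# Hypothetical helper: estimate k in "time ~ n**k" from two measurements
# (n1, t1) and (n2, t2). Returns undef if either timing is not positive,
# since a reading of 0s (below the timer resolution) carries no information.
sub growth_exponent {
    my ($n1, $t1, $n2, $t2) = @_;
    return undef unless $t1 > 0 && $t2 > 0 && $n1 > 0 && $n2 > 0 && $n1 != $n2;
    return log($t2 / $t1) / log($n2 / $n1);
}

# Made-up timings for illustration only: doubling the input quadruples the time.
my $k = growth_exponent(1_000_000, 0.5, 2_000_000, 2.0);
printf("estimated exponent: %.2f\n", $k);
```

An estimate near 1 suggests linear behaviour and one near 2 suggests quadratic behaviour; with real measurements, several runs per size are needed before trusting the number.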
From @bulk88

I made this XS func.

void

called it as

use Blah;

got

C:\Perl>perl svspeed.pl
C:\Perl>

So it's not those 2 calls, on a PER call basis. I noticed there was 1
From @bulk88

rewrote as

void

result

C:\Perl>perl svspeed.pl
C:\Perl>

We have the culprit. I am done.
From @bulk88

Ok, actually the problem is the mortal system.

C:\Perl>perl svspeed.pl
C:\Perl>

void
From @iabyn

On Sun, Apr 22, 2012 at 12:53:06AM -0700, bulk 88 via RT wrote:

I don't see similar slowdowns on linux with perl 5.15.9; I presume it must

#include "EXTERN.h"
#include "ppport.h"
#include <sys/time.h>

MODULE = Mytest PACKAGE = Mytest

void
for (j=1; j<=5; j++) {
gettimeofday(&t1, NULL);
printf("\n");

[davem@pigeon Mytest]$ perl5159o -Mblib /tmp/p
newsv 2 time=0.185385s
newsv 3 time=0.184901s
newsv 4 time=0.185011s
newsv 5 time=0.184997s

Nor can I see any quadratic slowdown with doing @a = split //, $long_string, so again, perhaps it's very sensitive to the malloc library?

--
From @demerphq

On 22 April 2012 13:25, Dave Mitchell <davem@iabyn.com> wrote:

This is a known issue, windows realloc did/does not extend segments.

Yves

--
From @iabyn

On Sun, Apr 22, 2012 at 04:04:12PM +0200, demerphq wrote:

Since 5.005_03, the tmps stack has been grown in chunks of 128 up to 512,

--
From @nwc10

["bulk 88", thanks for diagnosing that the problem on Win32 is with the

On Mon, Apr 23, 2012 at 10:08:18AM +0100, Dave Mitchell wrote:

This is the underlying Win32 runtime realloc that has this wonderful

Given that the perl "malloc" for ithreads & fork emulation already ends
It seems crazy to have to keep putting in place workarounds for what is a

Nicholas Clark

PS I looked to see whether it would be viable to replace the contiguous
From @bulk88

On Mon Apr 23 08:12:05 2012, nicholas wrote:

I suggest a percentage growth for the mortal stack realloc rather than
I made a couple examples of the windows alloc system. I compared Perl's
I have an issue with Perl using the C lib's malloc system which is

Summary of my perl5 (revision 5 version 12 subversion 2) configuration:
Platform:
Characteristics of this binary (from libperl):
From @bulk88

ptr changed at 5, ptr=00822ED4

From @bulk88

ptr changed at 520185, ptr=00B20020

From @bulk88

ptr changed at 9, ptr=002874E0

From @bulk88

From @bulk88

forgot to say, the reason why the Heap* test didn't move the block until

From @bulk88

Staring at the data I generated. I'm jaw dropped. It's not that windows
@dcollinsn - Status changed from 'open' to 'stalled' |
Since this ticket stalled, growth of the tmps stack was changed from linear (fixed-size increases each time) to exponential (relative to the current size of the stack): fea90cf

This was done to address similar pathological behaviour on Windows when creating a lot of tmps. It likely addresses the main performance difference seen between Windows and other platforms in this issue.

Running ikegami's sample code on a 64-bit build of blead on Linux now,

This seems reasonable, so this issue might now be closable.
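To see why the growth policy matters when realloc cannot extend a block in place, the following sketch counts reallocations and copied elements under the worst-case assumption that every realloc moves the whole block. It is a simulation, not Perl's actual tmps stack code: `simulate_growth`, the 512-entry chunk, and the grow-by-half rule are illustrative choices.

```perl
use strict;
use warnings;

# Hypothetical simulation: grow a buffer from empty until it can hold $target
# elements. $next_size maps the current capacity to the new capacity.
# Assumes every reallocation copies all existing elements to a new block.
# Returns (number of reallocations, total elements copied).
sub simulate_growth {
    my ($target, $next_size) = @_;
    my ($capacity, $reallocs, $copied) = (0, 0, 0);
    while ($capacity < $target) {
        $copied  += $capacity;               # old contents move to the new block
        $capacity = $next_size->($capacity);
        $reallocs++;
    }
    return ($reallocs, $copied);
}

# Linear policy: a fixed chunk each time (illustrative chunk size).
my $linear = sub { $_[0] + 512 };
# Exponential policy: grow by half the current size, with a small floor
# so that the capacity always advances (illustrative rule).
my $exponential = sub { my $c = $_[0]; $c < 512 ? $c + 128 : $c + int($c / 2) };

for my $policy ([ linear => $linear ], [ exponential => $exponential ]) {
    my ($name, $rule) = @$policy;
    my ($reallocs, $copied) = simulate_growth(1_000_000, $rule);
    printf("%-12s reallocs=%d copied=%d\n", $name, $reallocs, $copied);
}
```

Under the linear rule the number of copied elements grows with the square of the target size, while under the exponential rule it stays proportional to the target size, which matches the direction of the change described above.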
This ticket has been open for more than 22 years. We should put it out of its misery. @ikegami, @demerphq, @iabyn, do you have any additional thoughts? Otherwise, I propose we close it by Dec 31 2024. |
This ticket is excellent engineering/design talk, but there isn't a fixed, solvable problem here. Most of the slowdowns were my own stats, with WinXP, which are other tickets/patches on their own. OP/others never posted a demo of a "flaw", a "this is a fix" goalpost, and a "this is still broken" goalpost.

Nothing has really changed in 10 years Windows perf-wise; I can still reproduce the pathological move of 200KB for every 4096 bytes of alloc size increase.

updated code.

But I have other tickets open right now, or patches in tickets up, regarding Perl and the perl win32 port's backend design problems.
NOTICE PERL's
This is still a problem in 5.41.8/Win64/Win7. IDK if MS ever improved this in Win10/11. |
Migrated from rt.perl.org#9319 (status was 'stalled')
Searchable as RT9319$