[win32] Both IO and global match are VERY SLOW since 5.12 and up to 5.38 unless pre-heated #21654
Benchmark code, copied from the PerlMonks thread:
```perl
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';

if ( !@ARGV ) {
    # Driver: re-run this script in a fresh perl process for every
    # combination of size, method and pre-heat setting.
    say $^V;
    for my $size ( 5e5, 1e6, 5e6 ) {
        say "Array size: $size";
        for my $method ( 0, 1, 2 ) {
            for my $heat ( 0, 1 ) {
                system $^X, $0, $method, $heat, $size;
            }
        }
    }
}
else {
    my ( $method, $preheat, $size ) = @ARGV;

    # Build $size lines of "0 1 2 ... 9", without a trailing newline.
    my $s = join ' ', 0 .. 9;
    $s .= "\n";
    $s x= $size;
    chomp $s;

    my $t = time;    # note: the timing includes the pre-heat step
    if ( $preheat ) {
        # Build a large throwaway array that goes out of scope immediately.
        my @garbage = ( undef ) x $size;
    }

    my @a;
    if ( $method == 0 ) {       # split
        @a = split /(?<=\n)/, $s;
    }
    elsif ( $method == 1 ) {    # global match
        @a = $s =~ /(.*?\n|.+)/gs;
    }
    elsif ( $method == 2 ) {    # IO (list context)
        open my $fh, '>', 'garbage.tmp' or die "Can't write garbage.tmp: $!";
        binmode $fh;
        print $fh $s;
        close $fh;
        open $fh, '<', 'garbage.tmp' or die "Can't read garbage.tmp: $!";
        binmode $fh;
        @a = <$fh>;
        close $fh;
    }
    printf "\t%s, %s:\t%.3f\n",
        ( qw< split match io(list) > )[ $method ],
        ( $preheat ? 'pre-heat' : 'no pre-heat' ),
        time - $t;
}
```

Result:
I'll just observe that "preheat" only touches malloc, not the array:
So it seems the problem must be in malloc, or in our use of it.
Looking at @LorenzoTa's results in the PerlMonks thread, notwithstanding the pre-heating behaviour, there appear to be significant performance differences between 32-bit and 64-bit builds. For example, the 32-bit and 64-bit
realloc() of large blocks is slow on Windows relative to Linux and, e.g., FreeBSD, since those can use mremap() to move such blocks cheaply. This code would EXTEND() the stack and prevent all those reallocs when incrementally growing the stack.
IIRC the stack starts off quite small; we could easily change it to be much larger at startup. (Note to casual readers: this is one of the Perl stacks, not the C stack.)
A slower realloc() (O(size of allocation) vs ~O(1) for large sizes) explains why it's slower on Windows than on Linux, but doesn't explain the apparent slowdown over releases. From what I can see, EXTEND() ends up calling av_extend(), which via av_extend_guts() grows the stack exponentially, by 20% each time. That hasn't changed since 5.000, so I wouldn't expect the growth in time we see here. The ratios between the times […] do imply we have […]. If we are seeing […]
I plan to do some tests today.
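A quick back-of-the-envelope count shows why 20% exponential growth alone should not produce a steep slowdown: the number of reallocations grows only logarithmically with the final size. This sketch assumes a simplified growth rule, new capacity = needed + old capacity / 5 (an approximation of av_extend_guts(), not its exact code), and a hypothetical starting capacity of 128 slots, with elements pushed one at a time.

```perl
use strict;
use warnings;

# Count reallocations needed to grow a stack from $start to at least
# $target slots, pushing one element at a time. Simplified model of
# 20% exponential growth: new capacity = (old capacity + 1) + old / 5.
sub count_reallocs {
    my ( $start, $target ) = @_;
    my ( $cap, $reallocs ) = ( $start, 0 );
    while ( $cap < $target ) {
        # the stack is full and one more slot is needed
        $cap = ( $cap + 1 ) + int( $cap / 5 );
        $reallocs++;
    }
    return $reallocs;
}

for my $n ( 5e5, 1e6, 5e6 ) {
    printf "%8d elements: %d reallocs\n", $n, count_reallocs( 128, $n );
}
```

Going from 5e5 to 5e6 elements adds only a handful of reallocations under this model, which is consistent with the expectation above.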
And it is the tmps stack, which grows linearly:
After making tmps growth exponential:
As with the value stack and the save stack, this gives us constant amortized growth per element. After this patch the profiler shows the "SvPV_shrink_to_cur(sv)" and "sv = sv_2mortal(newSV(80))" calls in do_readline as the hotspots for the io unheated test case, using 55% of the measured time in total. Fixes Perl#21654
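The difference between linear and exponential growth becomes clear when each realloc() has to copy the whole old block, as it does when mremap() is not available. The sketch below counts the total number of slots copied while growing a stack to N slots. The linear model grows by a fixed step of 128 slots and the exponential model by roughly 20%; both the step and the factor are illustrative assumptions, not the values used in perl or in the patch.

```perl
use strict;
use warnings;

# Total slots copied while growing a stack from $start to at least
# $target slots, assuming every realloc copies the entire old block.
# $grow is a code ref mapping the old capacity to the new capacity.
sub copies {
    my ( $start, $target, $grow ) = @_;
    my ( $cap, $copied ) = ( $start, 0 );
    while ( $cap < $target ) {
        $copied += $cap;    # realloc moves the old contents
        $cap = $grow->($cap);
    }
    return $copied;
}

my $linear      = sub { $_[0] + 128 };                     # hypothetical fixed step
my $exponential = sub { $_[0] + 1 + int( $_[0] / 5 ) };    # ~20% growth

for my $n ( 5e5, 1e6, 5e6 ) {
    my $lin = copies( 128, $n, $linear );
    my $exp = copies( 128, $n, $exponential );
    printf "%8d: linear %.3g copies (%.1f/element), exponential %.3g (%.1f/element)\n",
        $n, $lin, $lin / $n, $exp, $exp / $n;
}
```

With linear growth the copies per element grow in proportion to N (quadratic total work), while with exponential growth they stay bounded by a small constant, which is the "constant amortized growth per element" the commit message refers to.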
Hello,
In this thread at PerlMonks, an anonymous contributor spotted a severe performance drop in Perl on Windows for both IO and global matching.
In my answer there, I tested the code against several Strawberry Perl distributions, and it seems something started getting worse after 5.12 and is still present in 5.38.
Pre-heating (pre-dimensioning an array) seems to mitigate the issue after 5.26.
Someone in the chat suggested it could be something related to malloc on Win32.
I'm not in a position to dig into it further.
Thanks for looking. If something is found, I'll take care to update the thread above.
L*