COW related performance regression in 5.19 #13800

Closed · p5pRT opened this issue May 4, 2014 · 47 comments

p5pRT commented May 4, 2014

Migrated from rt.perl.org#121796 (status was 'resolved')

Searchable as RT121796$


p5pRT commented May 4, 2014

From @demerphq

The attached benchmark shows a marked performance degradation between
earlier perls and 5.19; I see an approx. 20% drop.

The performance drop has been bisected back to:

commit 13b0f67
Author: David Mitchell <davem@iabyn.com>
Date:   Wed May 22 16:38:29 2013 +0100

  re-enable Copy-on-Write by default.

  COW was first introduced (and enabled by default) in 5.17.7.
  It was disabled by default in 5.17.10, because it was thought to have too
  many rough edges for the 5.18.0 release.

  By re-enabling it now, early in the 5.19.x release cycle, hopefully it
  will be ready for production use by 5.20.

  This commit mainly reverts 9f351b4 and e1fd413 (with
modifications),
  then updates perldelta.

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 4, 2014

From @demerphq

hash_bench.pl


p5pRT commented May 6, 2014

From @rjbs

* yves orton <perlbug-followup@​perl.org> [2014-05-04T06​:08​:46]

The attached benchmark shows a marked performance degradation between
earlier perls and 5.19, I see approx 20% drop.

Some discussion on IRC suggested that the fundamental problem is that COW is
faster on big strings and slower on short strings. There is a constant setting
the minimum length at which strings should be COW, and it is currently set to
zero, and was intended to be tuned up... but never was.

Is a reasonable next step to find a decent setting for this value to have a
better trade-off? It seems so to me. We don't need it to be optimal in every
way, of course, just to find a happier value.
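The trade-off rjbs describes can be sketched as a length-gated copy routine. The struct, names, and threshold value below are illustrative assumptions, not perl's actual SV code: under the threshold a plain copy wins; at or above it the buffer is shared and a refcount bumped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model only -- not perl's SV machinery. The threshold is a
 * stand-in for the tunable constant discussed above. */
#define COW_THRESHOLD 1250  /* min length at which sharing beats copying */

typedef struct {
    char   *pv;      /* string buffer */
    size_t  cur;     /* length in use */
    int    *refcnt;  /* shared refcount when COW-able, NULL otherwise */
} Str;

/* Copy src: share the buffer for long strings, memcpy short ones. */
static Str str_copy(const Str *src) {
    Str dst;
    dst.cur = src->cur;
    if (src->cur >= COW_THRESHOLD && src->refcnt) {
        dst.pv = src->pv;               /* share the buffer */
        dst.refcnt = src->refcnt;
        (*dst.refcnt)++;                /* bump the shared refcount */
    } else {
        dst.pv = malloc(src->cur + 1);  /* short string: plain copy */
        memcpy(dst.pv, src->pv, src->cur + 1);
        dst.refcnt = NULL;
    }
    return dst;
}
```

Tuning then reduces to picking the crossover length where the refcount bookkeeping costs less than the memcpy it avoids.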

--
rjbs


p5pRT commented May 6, 2014

The RT System itself - Status changed from 'new' to 'open'


p5pRT commented May 6, 2014

From @demerphq

On 6 May 2014 16​:46, Ricardo Signes <perl.p5p@​rjbs.manxome.org> wrote​:

* yves orton <perlbug-followup@​perl.org> [2014-05-04T06​:08​:46]

The attached benchmark shows a marked performance degradation between
earlier perls and 5.19, I see approx 20% drop.

Some discussion on IRC suggested that the fundamental problem is that COW
is
faster on big strings and slower on short strings. There is a constant
setting
the minimum length at which strings should be COW, and it is currently set
to
zero, and was intended to be tuned up... but never was.

Is a reasonable next step to find a decent setting for this value to have a
better trade-off? It seems so to me. We don't need it to be optimal in
every
way, of course, just to find a happier value.

I thought I tried setting this and it had no effect whatsoever.

Confusingly there are actually two vars for this, and it isn't clear which
is the right one to set.

I will try to play further.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 8, 2014

From @demerphq

I have further details on this issue.

$ cat ../read_bench.pl
use strict;
use warnings;

open my $fh, "<", "/usr/share/dict/american-english"
  or die "failed to open file: $!";

my @got;
while (<$fh>) {
  push @got, $_;
}
__END__

Performance in 5.14.2:

$ perl -E'say $]'; dumbbench -- perl ../read_bench.pl
5.014002
cmd: Ran 25 iterations (2 outliers).
cmd: Rounded run time per iteration: 1.0204e-01 +/- 7.4e-04 (0.7%)

Performance in stock blead from this morning:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd: Ran 21 iterations (1 outliers).
cmd: Rounded run time per iteration: 4.0526e-01 +/- 9.6e-04 (0.2%)
$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd: Ran 20 iterations (0 outliers).
cmd: Rounded run time per iteration: 4.089e-01 +/- 2.0e-03 (0.5%)

Setting SV_COW_THRESHOLD to 1250, which is the same value
as SV_COWBUF_THRESHOLD, results in this:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd: Ran 26 iterations (6 outliers).
cmd: Rounded run time per iteration: 1.0298e-01 +/- 2.5e-04 (0.2%)
$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd: Ran 27 iterations (7 outliers).
cmd: Rounded run time per iteration: 1.0416e-01 +/- 3.2e-04 (0.3%)

However I do not see simply setting SV_COW_THRESHOLD as the solution. I
struggle to understand why using COW should make this code four times
slower. And I wonder whether a file containing strings longer than the
threshold would show the same 4x slowdown. More research required.

cheers,
Yves


p5pRT commented May 8, 2014

From @demerphq

Further to this, I pushed the following commit as a branch. It moves the
threshold definitions to sv.c so that if you change them you don't have to
rebuild everything. The definitions are not used anywhere else in the core.

commit ff6e449
Author: Yves Orton <demerphq@gmail.com>
Date:   Thu May 8 11:45:35 2014 +0200

Move PERL_NEW_COPY_ON_WRITE thresholds to sv.c so its faster to rebuild
when tweaking them

On 8 May 2014 11:36, demerphq <demerphq@gmail.com> wrote:

I have further details on this issue.

$ cat ../read_bench.pl
use strict;
use warnings;

open my $fh, "<", "/usr/share/dict/american-english"
  or die "failed to open file: $!";

my @got;
while (<$fh>) {
  push @got, $_;
}
__END__

Performance is 5.14.2​:

$ perl -E'say $]'; dumbbench -- perl ../read_bench.pl
5.014002
cmd​: Ran 25 iterations (2 outliers).
cmd​: Rounded run time per iteration​: 1.0204e-01 +/- 7.4e-04 (0.7%)

Performance in stock blead from this morning​:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd​: Ran 21 iterations (1 outliers).
cmd​: Rounded run time per iteration​: 4.0526e-01 +/- 9.6e-04 (0.2%)
$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd​: Ran 20 iterations (0 outliers).
cmd​: Rounded run time per iteration​: 4.089e-01 +/- 2.0e-03 (0.5%)

Setting SV_COW_THRESHOLD 1250 which is the same value
as SV_COWBUF_THRESHOLD results in this​:

$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd​: Ran 26 iterations (6 outliers).
cmd​: Rounded run time per iteration​: 1.0298e-01 +/- 2.5e-04 (0.2%)
$ ./perl -Ilib -E'say $]'; dumbbench -- ./perl -Ilib ../read_bench.pl
5.019012
cmd​: Ran 27 iterations (7 outliers).
cmd​: Rounded run time per iteration​: 1.0416e-01 +/- 3.2e-04 (0.3%)

However I do not see simply setting SV_COW_THRESHOLD as the solution. I
struggle to understand why using COW should make this code four times
slower. And I wonder if using a file with longer strings in it which are
longer than the threshold will also see the same 4 times slowdown. More
research required.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 8, 2014

From [email protected]

On 05/04/2014 12:08 PM, yves orton (via RT) wrote:

# New Ticket Created by yves orton
# Please include the string: [perl #121796]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

Shorter test case. The two perls are the commit before the sha1 that
Yves pointed out and AT that exact commit (re-enabling COW).

$ time install-COW/bin/perl5.19.1 -e 'my $s = "x\n" x 1e5; open my $fh,
"<", \$s or die $!; my @got; while (<$fh>) {$a{$_}=$_}'

real 0m0.069s
user 0m0.065s
sys 0m0.004s

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my
$fh, "<", \$s or die $!; my @got; while (<$fh>) {push @got, $_}'

real 0m1.326s
user 0m0.254s
sys 0m1.050s

If I do actual file IO, the same happens, so don't be put off by the
"open string ref" in there. If $_ is not used, is appended to a scalar, or
is stored in a hash, the slowdown doesn't happen. But if it's pushed onto
an array... well, see above. A COW-enabled perl with a high COW threshold
set does NOT exhibit this problem either.

We're still reading code (and profiler output) to find the root cause.

--Steffen


p5pRT commented May 8, 2014

From @demerphq

On 8 May 2014 13​:33, Steffen Mueller <mail@​steffen-mueller.net> wrote​:

On 05/04/2014 12​:08 PM, yves orton (via RT) wrote​:

# New Ticket Created by yves orton
# Please include the string​: [perl #121796]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

Shorter test case. The two perls are the commit before the sha1 that Yves
pointed out and AT that exact commit (re-enabling COW).

$ time install-COW/bin/perl5.19.1 -e 'my $s = "x\n" x 1e5; open my $fh,
"<", \$s or die $!; my @​got; while (<$fh>) {$a{$_}=$_}'

real 0m0.069s
user 0m0.065s
sys 0m0.004s

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my $fh,
"<", \$s or die $!; my @​got; while (<$fh>) {push @​got, $_}'

real 0m1.326s
user 0m0.254s
sys 0m1.050s

If I do actual file IO, the same happens, so don't be put off by the "open
string ref" in there. If $_ is not used, appended to a scalar, or stored in
a hash, the slowdown doesn't happen. But if it's pushed in an array... well
see above. A COW-enabled perl with a high COW threshold set does NOT
exhibit this problem either.

We're still reading code (and profiler output) to find the root cause.

The only place the threshold is checked is via GE_COW_THRESHOLD(cur),
which is used exactly once in our code, in sv.c on line 4447 (give or
take), in a complex condition. Something in the block that condition
enables is the source of this performance hit.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 8, 2014

From [email protected]

RT 121815 was reduced down to a script very similar to the benchmark here (I put it in the body of the email instead of as an attachment at http://nntp.perl.org/group/perl.perl5.porters/215113); can/should that be made a child ticket of this one (or some such? I don't really know RT).


p5pRT commented May 8, 2014

From @iabyn

On Thu, May 08, 2014 at 01:33:16PM +0200, Steffen Mueller wrote:

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my
$fh, "<", \$s or die $!; my @got; while (<$fh>) {push @got, $_}'

real 0m1.326s
user 0m0.254s
sys 0m1.050s

The fault here appears to lie with sv_gets() rather than COW per se.
What happens is that for some reason, sv_gets() creates a new SVPV with an
SvLEN of 200_000 (equal to the length of $s, or how much data is in the
buffer). It then reads two characters into it. So it sets $_ to​:

  SV = PV(0xa98c18) at 0xac4638
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x7ffff180c028 "x\n"\0
  CUR = 2
  LEN = 200000

The sv_setsv() called from pp_push() then does a COW copy of $_, so the
SV pushed onto the array has an SvLEN() of 200_000.

At the next call to sv_gets(), the COW is dropped from $_ and another 200K
buffer is malloc()ed for it.

So the net effect is that every SV pushed onto the array has a 200K
buffer.

How much better this is when reading from a file, I don't yet know; I suspect
that it will create lots of 4K SVs.

I had a quick look at sv_gets() in 5.18.2 and it was creating 80-byte
strings, so this is probably a regression. I suspect that post 5.18.2
(i.e. after it started allocating large buffers), but before COW was
enabled, it was just copying $_ each time, so only one 200K buffer was
being allocated (and repeatedly reused), and all the SVs being pushed were
only a few bytes.
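Dave's account of the pathology can be put into a toy cost model. Everything here is an assumption for illustration, not perl source: each sv_gets() sizes the new line SV to the bytes *remaining* in the read buffer, and the COW'ed copy pushed onto the array keeps that whole allocation alive, which compounds across the file.

```c
#include <assert.h>

/* Toy model: total bytes retained by the array after reading buf_len
 * bytes in line_len-sized lines.  With cow_steals_buffer, each pushed SV
 * keeps a buffer sized to what was left in the read buffer; without it,
 * each SV is right-sized to the line. */
static unsigned long long bytes_kept(unsigned long buf_len,
                                     unsigned long line_len,
                                     int cow_steals_buffer) {
    unsigned long long kept = 0;
    unsigned long remaining = buf_len;
    while (remaining >= line_len) {
        kept += cow_steals_buffer ? remaining : line_len;
        remaining -= line_len;
    }
    return kept;
}
```

For the 200K string of "x\n" lines above, the model retains 200KB of line data pre-COW but roughly 10GB of dead buffer space with the steal, which is consistent with both the slowdown and the out-of-memory reports.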

If I'm right, then the real fix probably needs to be in sv_gets();
although possibly the code in sv_setsv() could be tweaked to not COW
for a string with small SvCUR() but very large SvLEN() ?

Anyway I thought I'd report this early, since other people are working on
this in parallel.

--
I before E. Except when it isn't.


p5pRT commented May 8, 2014

From @iabyn

On Thu, May 08, 2014 at 04​:28​:41PM +0100, Dave Mitchell wrote​:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

  my $s = "x\n" x 1e5;
  open my $fh, "<", \$s or die $!;
  #open my $fh, "<", "/usr/share/dict/words" or die;
  my @got;
  study;
  while (<$fh>) {
      #$a{$_}=$_
      push @got, $_;
  }
  use Devel::Peek; Dump $got[1];

gives​:

  $ perl5190 ~/tmp/p
  SV = PV(0x1ce9e80) at 0x1d0db40
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1d26d60 "x\n"\0
  CUR = 2
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1e43e80) at 0x1e67cd0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x7fe865c0a010 "x\n"\0
  CUR = 2
  LEN = 200000
  COW_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives​:

  $ perl5190 ~/tmp/p
  SV = PV(0x1573e90) at 0x15871f8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1595d20 "10-point\n"\0
  CUR = 9
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1714e90) at 0x17282c8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1775840 "10-point\n"\0
  CUR = 9
  LEN = 8192
  COW_REFCNT = 0

--
"Foul and greedy Dwarf - you have eaten the last candle."
  -- "Hordes of the Things", BBC Radio.


p5pRT commented May 8, 2014

From @demerphq

On 8 May 2014 17​:48, Dave Mitchell <davem@​iabyn.com> wrote​:

On Thu, May 08, 2014 at 04​:28​:41PM +0100, Dave Mitchell wrote​:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

  my $s = "x\n" x 1e5;
  open my $fh, "<", \$s or die $!;
  #open my $fh, "<", "/usr/share/dict/words" or die;
  my @got;
  study;
  while (<$fh>) {
      #$a{$_}=$_
      push @got, $_;
  }
  use Devel::Peek; Dump $got[1];

gives:

  $ perl5190 ~/tmp/p
  SV = PV(0x1ce9e80) at 0x1d0db40
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1d26d60 "x\n"\0
  CUR = 2
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1e43e80) at 0x1e67cd0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x7fe865c0a010 "x\n"\0
  CUR = 2
  LEN = 200000
  COW_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives:

  $ perl5190 ~/tmp/p
  SV = PV(0x1573e90) at 0x15871f8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1595d20 "10-point\n"\0
  CUR = 9
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1714e90) at 0x17282c8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1775840 "10-point\n"\0
  CUR = 9
  LEN = 8192
  COW_REFCNT = 0

This probably also explains the out-of-memory errors that people are
seeing.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 8, 2014

From @Leont

On Thu, May 8, 2014 at 5:28 PM, Dave Mitchell <davem@iabyn.com> wrote:

On Thu, May 08, 2014 at 01:33:16PM +0200, Steffen Mueller wrote:

$ time install-COW/bin/perl5.19.1 -e 'my $s = ("x\n") x 1e5; open my
$fh, "<", \$s or die $!; my @got; while (<$fh>) {push @got, $_}'

real 0m1.326s
user 0m0.254s
sys 0m1.050s

The fault here appears to lie with sv_gets() rather than COW per se.
What happens is that for some reason, sv_gets() creates a new SVPV with an
SvLEN of 200_000 (equal to the length of $s, or how much data is in the
buffer). It then reads two characters into it. So it sets $_ to​:

  SV = PV(0xa98c18) at 0xac4638
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x7ffff180c028 "x\n"\0
  CUR = 2
  LEN = 200000

The sv_setsv() called from pp_push() then does a COW copy of $_, so the
SV pushed onto the array has an SvLEN() of 200_000.

At the next call to sv_gets(), the COW is dropped from $_ and another 200K
buffer is malloc()ed for it.

So the net effect is that every SV pushed onto the array has a 200K
buffer.

How much better this is when reading from a file, I don't yet know; I
suspect
that it will create lots of 4K SVs.

I had a quick look at sv_gets() 5.18.2 and it was creating 80 byte
strings, so this is probably a regression. I suspect that post 5.18.2
(i.e. after it started allocating large buffers), but before COW was
enabled, it was just copying $_ each time, so only one 200K buffer was
being allocated (and repeatedly reused), and all the SVs being pushed were
only a few bytes.

sv_gets is begging for a rewrite, for more reasons than just this one. It's
full of optimizations that are not necessarily optimal.

If I'm right, then the real fix probably needs to be in sv_gets();
although possibly the code in sv_setsv() could be tweaked to not COW
for a string with small SvCUR() but very large SvLEN() ?

That is a good point too.

Leon


p5pRT commented May 9, 2014

From @demerphq

On 8 May 2014 17​:48, Dave Mitchell <davem@​iabyn.com> wrote​:

On Thu, May 08, 2014 at 04​:28​:41PM +0100, Dave Mitchell wrote​:

The fault here appears to lie with sv_gets() rather than COW per se.

And looking further: with this code:

  my $s = "x\n" x 1e5;
  open my $fh, "<", \$s or die $!;
  #open my $fh, "<", "/usr/share/dict/words" or die;
  my @got;
  study;
  while (<$fh>) {
      #$a{$_}=$_
      push @got, $_;
  }
  use Devel::Peek; Dump $got[1];

gives:

  $ perl5190 ~/tmp/p
  SV = PV(0x1ce9e80) at 0x1d0db40
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1d26d60 "x\n"\0
  CUR = 2
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1e43e80) at 0x1e67cd0
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x7fe865c0a010 "x\n"\0
  CUR = 2
  LEN = 200000
  COW_REFCNT = 0

Note the massive LEN.

And changing the code to read from a regular file gives:

  $ perl5190 ~/tmp/p
  SV = PV(0x1573e90) at 0x15871f8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x1595d20 "10-point\n"\0
  CUR = 9
  LEN = 16

  $ perl5191 ~/tmp/p
  SV = PV(0x1714e90) at 0x17282c8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1775840 "10-point\n"\0
  CUR = 9
  LEN = 8192
  COW_REFCNT = 0

Thanks, this gave me the clue. See branch yves/cow_threshold_in_sv_c for a
possible fix.

BTW, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces a
measurable slowdown in dumbbench.

The code probably needs a comment too.

Patch​:

commit b53d1dc
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri May 9 02:38:32 2014 +0200

  in sv_gets() if the sv is empty then preallocate a small buffer

  When the COW logic is triggered there is an unfortunate slowdown.

  https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796

  What happens is that the buffer gets "stolen" by COW,
  and $_ ends up with SvLEN()==0, which through a tortuous
  pathway leads us to allocate the same amount of memory as there
  is in the current buffer. Which then gets COW'ed out, and so on.

  By initializing the string to at least something, this does not happen.

  sv_gets() seriously needs love. :-(

Inline Patch
diff --git a/sv.c b/sv.c
index 439b973..3c52e0d 100644
--- a/sv.c
+++ b/sv.c
@@ -46,6 +46,8 @@
   char *gconvert(double, int, int,  char *);
 #endif

+#define PERL_PREALLOC_EMPTY_IN_SV_GETS 20
+
 #ifdef PERL_NEW_COPY_ON_WRITE
 #   ifndef SV_COW_THRESHOLD
 #    define SV_COW_THRESHOLD 0 /* min string length for cow */
@@ -8154,6 +8156,10 @@ Perl_sv_gets(pTHX_ SV *const sv, PerlIO *const fp, I32 append)
 	(*fp)->_cnt++;
 #endif

+    if (!SvLEN(sv)) {
+        SvGROW(sv, PERL_PREALLOC_EMPTY_IN_SV_GETS);
+    }
+
     /* Here is some breathtakingly efficient cheating */

     cnt = PerlIO_get_cnt(fp);	/* get count into register */

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 9, 2014

From @tsee

On 05/09/2014 03​:00 AM, demerphq wrote​:

Thanks, this gave me the clue. See branch yves/cow_threshold_in_sv_c for
a possible fix.

BTW, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces
a measurable slowdown in dumbbench.

The code probably needs a comment too.

Oh, wow. Not to take away the achievement of having figured this nasty
out, but this is a _workaround_ if I've ever seen one. :(

I've only scratched the surface, but I think this is really multiple
bugs (or unfortunate bits of behavior) flying in close formation.
Whatever allocates "the same amount of memory as there is in the current
buffer" clearly needs its head adjusted. I'm sure that's intended as an
optimization in some situations, but that'll need very careful reassessment.

The downside of preallocating a buffer of some arbitrary (small) size is
that it's likely going to mean an additional malloc and that hurts.

--Steffen


p5pRT commented May 9, 2014

From @demerphq

On 9 May 2014 08​:35, Steffen Mueller <smueller@​cpan.org> wrote​:

On 05/09/2014 03​:00 AM, demerphq wrote​:

Thanks, this gave me the clue. See branch yves/cow_threshold_in_sv_c for
a possible fix.

BTW, setting PERL_PREALLOC_EMPTY_IN_SV_GETS to a higher number produces
a measurable slowdown in dumbbench.

The code probably needs a comment too.

Oh, wow. Not to take away the achievement of having figured this nasty
out, but this is a _workaround_ if I've ever seen one. :(

Yeah, I guess so. I think the alternative is a complete rewrite of the
relevant part of sv_gets().

Which I am actually game for, and started at some point (to support regex
$/), but which is not going to be ready in time for 5.20.

I've only scratched the surface, but I think this is really multiple bugs
(or unfortunate bits of behavior) flying in close formation. Whatever
allocates "the same amount of memory as there is in the current buffer"
clearly needs its head adjusted. I'm sure that's intended as an
optimization in some situations, but that'll need very careful reassessment.

I was a bit tired when I wrote that. It actually allocates the same amount
as is *left* in the read buffer. And the code has that assumption built
deep into its logic.

The downside of preallocating a buffer of some arbitrary (small) size is
that it's likely going to mean an additional malloc and that hurts.

Well, when the alternative is what it is now, I think that is an acceptable
burden until 5.22.

For what it's worth, I think the code in sv_gets() was written from the
point of view that it would mostly affect only $_, which would preserve the
allocations over time, so that preallocating a big buffer would be a net
speedup. When we started *dropping* the COW'ed data we fell into an "odd"
codepath in sv_gets(), and ended up doing this weird proportional
allocation.

So I definitely agree there is a mess of weird interactions here.

However from what I can tell, the way it works with my patch is pretty close
to how it worked pre-COW.

So I think we are good for now.

Sorry if this is a bit stream-of-consciousness; I didn't get a lot of sleep
last night. :-)

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"


p5pRT commented May 9, 2014

From @tsee

On 05/09/2014 10​:08 AM, demerphq wrote​:

So I definitely agree there is a mess of weird interactions here.

However from what I can tell, the way it works with my patch is pretty
close to how it worked pre-COW.

So I think we are good for now.

Don't get me wrong. I'll take your patch over the current broken
behaviour for 5.20 any day!

--Steffen


p5pRT commented May 9, 2014

From @Leont

On Fri, May 9, 2014 at 10​:08 AM, demerphq <demerphq@​gmail.com> wrote​:

Yeah, I guess so. I think the alternative is a complete rewrite of the
relevant part of sv_gets().

Which I am actually game for, and started at some point (to support regex
$/), but which is not going to be ready in time for 5.20

I had similar plans for slightly different reasons. Part of the logic
really belongs in PerlIO, not having a low-level readline really bites us
in a few tricky corner cases. Combining this with regexps is a bit of a
complication though.

For what it's worth, I think the code in sv_gets() was written from the
point of view that it would mostly affect only $_, which would preserve the
allocations over time, so that preallocating a big buffer would be a net
speedup. When we started *dropping* the COW'ed data we fell into an "odd"
codepath in sv_gets(), and ended up doing this weird proportional
allocation.

So I definitely agree there is a mess of weird interactions here.

There is also the assumption that an extra read through the buffer is more
expensive than the malloc costs. This may or may not be warranted.

Leon


p5pRT commented May 9, 2014

From @iabyn

On Fri, May 09, 2014 at 10​:21​:35AM +0200, Steffen Mueller wrote​:

On 05/09/2014 10​:08 AM, demerphq wrote​:

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty
close to how it worked pre-COW.

So I think we are good from now.

Don't get me wrong. I'll take your patch over the current broken
behaviour for 5.20 any day!

Unfortunately I don't think the patch is sufficient. If the line being
read is longer than the PVX buffer size (20 in this case), it still does a
silly grow.

For a file full of 76-char lines, and with your cow_threshold_in_sv_c
branch, I get with this code:

  open my $fh, "<", "/tmp/s" or die;
  my @got;
  while (<$fh>) {
      push @got, $_;
  }
  use Devel::Peek;

  for my $i (0..20) {
      Dump $got[$i];
  }

$ head /tmp/s
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

$ wc -l /tmp/s
1003 /tmp/s

$ ./perl -Ilib ~/tmp/p 2>&1 | grep LEN
  LEN = 117
  LEN = 8120
  LEN = 8048
  LEN = 7968
  LEN = 7888
  LEN = 7816
  LEN = 7736
  LEN = 7664
  LEN = 7584
  LEN = 7504
  LEN = 7432
  LEN = 7352
  LEN = 7272
  LEN = 7200
  LEN = 7120
  LEN = 7048
  LEN = 6968
  LEN = 6888
  LEN = 6816
  LEN = 6736
  LEN = 6656

I still suspect that it may be better to (also) hack the COW/swipe
decision code in sv_setsv() to prefer a copy over a COW if there is a lot
of wasted space in the buffer. A condition something like, say

  SvLEN(sv) > 16 && SvLEN(sv) > 2*SvCUR(sv)

This would just be making perl revert to pre-COW behaviour for certain
classes of strings, and is therefore very conservative and thus likely
safe for 5.20. We can always tweak this post 5.20.
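Dave's proposed guard is small enough to state as a predicate. The function name below is invented for illustration; only the SvLEN/SvCUR relationship comes from his suggestion: prefer a plain copy over COW when most of the buffer is slack.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the suggested sv_setsv() tweak (illustrative name, not a
 * perl API): skip COW when the buffer is mostly wasted space, i.e. when
 * the allocation is more than double the bytes actually in use. */
static int prefer_copy_over_cow(size_t len, size_t cur) {
    return len > 16 && len > 2 * cur;
}
```

Against the dumps earlier in the thread, this fires for the pathological cases (CUR=2/LEN=200000 and CUR=9/LEN=8192) while leaving well-filled buffers on the COW path.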

I intend to spend today looking at this issue further (i.e. daytime,
GMT), and try and understand the issues better. If anyone else plans to
work on it as well during this time, let me know so we can coordinate.

--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
  -- old farmers' sayings #14


p5pRT commented May 9, 2014

From @demerphq

On 9 May 2014 12​:46, Dave Mitchell <davem@​iabyn.com> wrote​:

On Fri, May 09, 2014 at 10​:21​:35AM +0200, Steffen Mueller wrote​:

On 05/09/2014 10​:08 AM, demerphq wrote​:

So I definitely agree there is a mess of weird interactions here.

However from what i can tell the way it works with my patch is pretty
close to how it worked pre-COW.

So I think we are good from now.

Don't get me wrong. I'll take your patch over the current broken
behaviour for 5.20 any day!

Unfortunately I don't think the patch is sufficient. If the line being
read is longer than the PVX buffer size (20 in this case), it still does a
silly grow.

I agree its not great. I think maybe the threshold should be raised.

For a file full of 76-char lines, and with your cow_threshold_in_sv_c
branch, I get with this code:

  open my $fh, "<", "/tmp/s" or die;
  my @got;
  while (<$fh>) {
      push @got, $_;
  }
  use Devel::Peek;

  for my $i (0..20) {
      Dump $got[$i];
  }

$ head /tmp/s
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
daaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

$ wc -l /tmp/s
1003 /tmp/s

$ ./perl -Ilib ~/tmp/p 2>&1 | grep LEN
LEN = 117
LEN = 8120
LEN = 8048
LEN = 7968
LEN = 7888
LEN = 7816
LEN = 7736
LEN = 7664
LEN = 7584
LEN = 7504
LEN = 7432
LEN = 7352
LEN = 7272
LEN = 7200
LEN = 7120
LEN = 7048
LEN = 6968
LEN = 6888
LEN = 6816
LEN = 6736
LEN = 6656

I still suspect that it may be better to (also) hack the COW/swipe
decision code in sv_setsv() to prefer a copy over a COW if there is a lot
of wasted space in the buffer. A condition something like, say

  SvLEN(sv) > 16 && SvLEN(sv) > 2*SvCUR(sv)

This would just be making perl revert to pre-COW behaviour for certain
classes of strings, and is therefore very conservative and thus likely
safe for 5.20. We can always tweak this post 5.20.

I intend to spend today looking at this issue further (i.e. daytime,
GMT), and try and understand the issues better. If anyone else plans to
work on it as well during this time, let me know so we can coordinate.

I did this last night but apparently forgot to push it​:

$ git log -1 origin/yves/check_cow_threshold
commit 94992f5
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri May 9 00:34:38 2014 +0200

  check_cow_threshold

However I think it is also the wrong fix, in that I don't see why strings
like this should not be COWed, although maybe it together with my other
patch is a usable workaround.

The problem seems to really be related to *uncowing* a string.

A strategy where the "original owner" of the string got to keep its string,
and everybody else got properly sized copies, would avoid all this. Instead
we do this weird "the owner drops its ownership of the string and the
sharer gets its copy" thing, which leads to this overallocation. This works
against any code that wants to preallocate a buffer and then write into it
to save a bunch of mallocs.
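The alternative un-sharing strategy described here can be sketched in a few lines. The types and names below are invented for illustration, not perl internals: when the sharing ends, the reader takes a right-sized copy while the original owner keeps its big, reusable buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative buffer type (not perl's SV). */
typedef struct {
    char  *pv;
    size_t cur;  /* bytes in use */
    size_t len;  /* bytes allocated */
} Buf;

/* On un-sharing, the sharer's copy is sized to cur, not to the owner's
 * (possibly huge) len; the owner's buffer is left untouched for reuse. */
static Buf reader_copy(const Buf *owner) {
    Buf b;
    b.cur = owner->cur;
    b.len = owner->cur + 1;  /* right-sized allocation */
    b.pv  = malloc(b.len);
    memcpy(b.pv, owner->pv, owner->cur);
    b.pv[b.cur] = '\0';
    return b;
}
```

Under this scheme the array ends up holding right-sized strings and $_ keeps its one large buffer across iterations, which is essentially the pre-COW behaviour being described.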

I was also thinking that we really could use a "DONT COW THIS SV" flag so
that we could mark a scalar like $_ as such.

Also I think the behavior of sv_gets() is not perfect. Maybe it made sense
at one point, but I don't think it makes much sense now. At the very least
it is suboptimal for delimiters longer than one byte.

We need to bisect when this behavior of setting the size of $_ to the size
of the remaining buffer was introduced.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented May 11, 2014

From @demerphq

Thoughts on sv_gets() and COW.

sv_gets() implements readline, and sets from <>. It handles the various
interesting cases of $/ by delegation, and handles the simple case of
delimited lines itself. It has two implementations of this logic, one a
"fast" path which "cheats" the IO system and processes block by block by
direct pointer manipulation, and another a slow path intended for IO
styles where there is no read-ahead buffer. This mail is about the fast
branch of sv_gets().

The part of sv_gets() that implements the "find a delimiter" logic starts with
the comment

/* Here is some breathtakingly efficient cheating */

The uninitiated then finds a dense and nearly impenetrable set of logic,
with seemingly no structure.

Being uninitiated I thought I would try to rewrite it to be friendly to
COW. :-)

This was a useful if perhaps unfruitful exercise as it taught me that the
problem is more complex than it appeared to me at first, and what the
current solution does is vastly more clever than meets the eye.

First off, resist the tendency to think about the problem in terms of line
endings like "\n". This is a mistake. Delimiters of length 1 are a
special, simple case that, because of their size, sidesteps the real
challenge at hand: identifying long delimiters (2 or more chars) that
cross a read buffer boundary.

The strategy that sv_gets() uses is to preallocate the target SV with as
many bytes as could be read from the file buffer (via SvGROW), and then it
scans forward, copying as it goes, looking for the char that matches the
last character in the delimiter. If it has just copied that last character,
it checks the relevant last K bytes of the target SV; if they match the
delimiter, it stops the copy/scan, and if it runs out of buffer it simply
does another grow/read and continues the pattern. (It is smart enough,
however, to first try to fill any unused space in the string before
reallocing; if the string fits, no realloc is done at all.)

The cleverness of this strategy is that Perl performs relatively few
realloc calls on the SV's pv, and by using the target SV's pv for checking
long delimiters it side steps any issues of multi-byte characters crossing
buffer boundaries, and the same code handles single and multi-byte
delimiters more or less as efficiently as each other.

The problem however is that it *really* wants its target strings to have
enough room, and it expects that the buffer is reusable over subsequent
calls. (There is even an "append" interface internally).

This last part plays really badly with how COW is currently implemented.
The way that COW works is that each string has a single byte refcount at
its tail, and sharing is achieved by all users having a pointer to the same
pv. Sharing and unsharing is by changing refcounts and copying or
overwriting a pv. A byproduct of this is that "ownership" of the pv is
"last-writer-owns-the-first-copy".

Consider the following code​:

while (<>) {
  push @​got, $_;
}

What this code does is first load $_ with sv_gets(). Let's assume that
$_ ends up holding a long string, say 100 characters, inside an 8192
byte allocated buffer.

Then the push @​got, $_ results in $got[0] and $_ containing a COW'ed pv.
When on the next read $_ is assigned to again, its copy of the PV is
dropped, so now $got[0] "owns" the string alone, and sv_gets() has to
allocate a new 8192 byte pv for the SV. A workaround is to ensure that an
SV with no pv is pre-initialized to a certain size, but this only moves the
problem around, short strings still overallocate, and long strings still
trigger the full buffer alloc. Changing the algorithm to read in smaller
chunks would solve the latter problem but not the former.

Pre COW this problem does not come up because $got[0] ends up with a copy
of $_, properly sized, and sv_gets() most of the time reuses whatever space
it allocated in $_.

The most obvious short term fix for this is what Dave M proposed, impose an
upper bound on how much wastage can occur before a string is COWed. A
combined scaling and hard threshold will avoid the worst of this kind of
behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue, and that
there aren't other ways to approach this issue. It occurs to me that if COW
somehow respected "original ownership" then most previous expectations
would be fulfilled. Another thought is that perhaps we need a way to flag
an SV as unCOWable.

Yves

On 4 May 2014 12​:08, yves orton <perlbug-followup@​perl.org> wrote​:

# New Ticket Created by yves orton
# Please include the string​: [perl #121796]
# in the subject line of all future correspondence about this issue.
# <URL​: https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121796 >

The attached benchmark shows a marked performance degradation between
earlier perls and 5.19, I see approx 20% drop.

The performance drop has been bisected back to​:

commit 13b0f67
Author​: David Mitchell <davem@​iabyn.com>
Date​: Wed May 22 16​:38​:29 2013 +0100

re-enable Copy-on-Write by default.

COW was first introduced (and enabled by default) in 5.17.7.
It was disabled by default in 5.17.10, because it was though to have too
many rough edges for the 5.18.0 release.

By re-enabling it now, early in the 5.19.x release cycle, hopefully it
will be ready for production use by 5.20.

This commit mainly reverts 9f351b45f4 and e1fd41328c (with modifications),
then updates perldelta.

--
perl -Mre=debug -e "/just|another|perl|hacker/"

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented May 11, 2014

From [email protected]

On 05/11/2014 11​:44 AM, demerphq wrote​:

Thoughts on sv_gets() and COW.

sv_gets() implements readline, and sets from <>. It handles the various
interesting cases of $/ by delegation, and handles the simple case of
delimited lines itself. It has two implementations of this logic, one a
"fast" set which "cheats" the IO system and processes block by block by
direct pointer manipulation, and another a slow system intended for IO
styles where there is no read ahead buffer. This mail is about the fast
branch of sv_gets().

The part of sv_gets() that implements the "find a delimiter" logic starts
with the comment

/* Here is some breathtakingly efficient cheating */

The uninitiated then finds a dense and nearly impenetrable set of logic,
with seemingly no structure.

Being uninitiated I thought I would try to rewrite it to be friendly to
COW. :-)

This was a useful if perhaps unfruitful exercise as it taught me that
the problem is more complex than it appeared to me at first, and what
the current solution does is vastly more clever than meets the eye.

First off, resist the tendency to think about the problem in terms of
line endings like "\n". This is a mistake. Delimiters with length of 1
are a special simple case that because of their size side step the real
challenge at hand which is how to identify long delimiters (2 or more
chars) which cross a read buffer boundary.

The strategy that sv_gets() uses is to preallocate the target SV with as
many bytes as could be read from the file buffer (via SvGROW), and then
it scans forward, copying as it goes, looking for the char that matches
the last character in the delimiter. If it has just copied the last
character then it checks the relevant last K bytes of the target SV, and
if it matches it stops the copy/scan, if it runs out of buffer it simply
does another grow/read and continues the pattern. (It is smart enough
however that when the string contains unused space to first try to fill
that before reallocing, if the string fits then no realloc is done at all.)

The cleverness of this strategy is that Perl performs relatively few
realloc calls on the SV's pv, and by using the target SV's pv for
checking long delimiters it side steps any issues of multi-byte
characters crossing buffer boundaries, and the same code handles single
and multi-byte delimiters more or less as efficiently as each other.

The problem however is that it *really* wants its target strings to have
enough room, and it expects that the buffer is reusable over subsequent
calls. (There is even an "append" interface internally).

This last part plays really badly with how COW is currently implemented.
The way that COW works is that each string has a single byte refcount at
its tail, and sharing is achieved by all users having a pointer to the
same pv. Sharing and unsharing is by changing refcounts and copying or
overwriting a pv. A byproduct of this is that "ownership" of the pv is
"last-writer-owns-the-first-copy".

Consider the following code​:

while (<>) {
push @​got, $_;
}

What this code does is first load $_ with sv_gets(). Let's assume
that $_ ends up holding a long string, say 100 characters, inside an
8192 byte allocated buffer.

Then the push @​got, $_ results in $got[0] and $_ containing a COW'ed pv.
When on the next read $_ is assigned to again, its copy of the PV is
dropped, so now $got[0] "owns" the string alone, and sv_gets() has to
allocate a new 8192 byte pv for the SV. A workaround is to ensure that
an SV with no pv is pre-initialized to a certain size, but this only
moves the problem around, short strings still overallocate, and long
strings still trigger the full buffer alloc. Changing the algorithm to
read in smaller chunks would solve the latter problem but not the former.

Pre COW this problem does not come up because $got[0] ends up with a
copy of $_, properly sized, and sv_gets() most of the time reuses
whatever space it allocated in $_.

The most obvious short term fix for this is what Dave M proposed, impose
an upper bound on how much wastage can occur before a string is COWed. A
combined scaling and hard threshold will avoid the worst of this kind of
behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue, and
that there aren't other ways to approach this issue. It occurs to me
that if COW somehow respected "original ownership" then most previous
expectations would be fulfilled. Another thought is that perhaps we need
a way to flag an SV as unCOWable.

I'm really not a fan of yet another state bit on an SV. It's already all
so twisted that

@p5pRT

p5pRT commented May 11, 2014

From @tsee

Somehow I think I just accidentally sent my reply long before I was done
with it. Apologies.

On 05/11/2014 11​:44 AM, demerphq wrote​:

The most obvious short term fix for this is what Dave M proposed, impose
an upper bound on how much wastage can occur before a string is COWed. A
combined scaling and hard threshold will avoid the worst of this kind of
behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue, and
that there aren't other ways to approach this issue. It occurs to me
that if COW somehow respected "original ownership" then most previous
expectations would be fulfilled. Another thought is that perhaps we need
a way to flag an SV as unCOWable.

I don't think another bit of state that needs to be taken into account
all the time is a very good idea. Few really manage to use SVs safely as
it stands.

How about changing the scanning logic to NOT copy as it goes but really
mostly just do scanning a la strchr() and then memcpy()? An extra pass,
but then you _know_ it'll be exactly one malloc for the string buffer. I
think that may often be more efficient and it will also be simpler.

--Steffen

@p5pRT

p5pRT commented May 11, 2014

From @Leont

On Sun, May 11, 2014 at 12​:34 PM, Steffen Mueller <smueller@​cpan.org> wrote​:

I don't think another bit of state that needs to be taken into account all
the time is a very good idea. Few really manage to use SVs safely as it
stands.

Maybe. I don't think we have flag bits available, so the point is moot right
now anyway.

How about changing the scanning logic to NOT copy as it goes but really

mostly just do scanning a la strchr() and then memcpy()? An extra pass, but
then you _know_ it'll be exactly one malloc for the string buffer. I think
that may often be more efficient and it will also be simpler.

That's what I suggested earlier. It would make more sense to me too, though
it's still a trade-off; given the buffer should be in L1 cache I'd expect
the extra read to be relatively cheap, but that's just hand-waving. Also,
you don't always know how much to allocate, in particular when a line
crosses a buffer boundary; moving the strchr step into PerlIO would be an
extremely helpful step there, also for other more complicated reasons
(#113424 for example).

Leon

@p5pRT

p5pRT commented May 11, 2014

From @tsee

On 05/11/2014 02​:25 PM, Leon Timmermans wrote​:

On Sun, May 11, 2014 at 12​:34 PM, Steffen Mueller <smueller@​cpan.org
<mailto​:smueller@​cpan.org>> wrote​:

I don't think another bit of state that needs to be taken into
account all the time is a very good idea. Few really manage to use
SVs safely as it stands.

Maybe. I don't think we have flag bits available to the point is moot
right now anyway.

How about changing the scanning logic to NOT copy as it goes but
really mostly just do scanning a la strchr() and then memcpy()? An
extra pass, but then you _know_ it'll be exactly one malloc for the
string buffer. I think that may often be more efficient and it will
also be simpler.

That's what I suggested earlier. It would make more sense to me too,
though it's still a trade-off; given the buffer should be in L1 cache
I'd expect the extra read to be relatively cheap, but that's just
hand-waving.

I think that's a tad optimistic. We're talking about string processing
after all. I'd already be happy if most OP's were in L1. ;)

Also, you don't always know how much to allocate, in
particular when a line crosses a buffer boundary; moving the strchr step
into PerlIO would be an extremely helpful step there, also for other more
complicated reasons (#113424 for example).

Agreed.

--Steffen

@p5pRT

p5pRT commented May 12, 2014

From @bulk88

On Sun May 11 02​:45​:34 2014, demerphq wrote​:

The most obvious short term fix for this is what Dave M proposed, impose an
upper bound on how much wastage can occur before a string is COWed. A
combined scaling and hard threshold will avoid the worst of this kind of
behaviour in sv_gets() and maybe elsewhere.

I wonder however if this is really just masking the deeper issue, and that
there aren't other ways to approach this issue. It occurs to me that if COW
somehow respected "original ownership" then most previous expectations
would be fulfilled. Another thought is that perhaps we need a way to flag
an SV as unCOWable.

Yves

Here is an idea: realloc to SvCUR.

--
bulk88 ~ bulk88 at hotmail.com

@p5pRT

p5pRT commented May 12, 2014

From @rjbs

Yves noted that the oversizing of strings may be the cause of OOM in https://rt.perl.org/Ticket/Display.html?id=121826 and (I believe) https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

--
rjbs


@p5pRT

p5pRT commented May 12, 2014

From @iabyn

I wonder however if this is really just masking the deeper issue, and that
there aren't other ways to approach this issue. It occurs to me that if COW
somehow respected "original ownership"

Note however that COW has different use cases which would imply
different priorities for when to copy.

For example, sometimes the original string is just a temporary value,
and a following assignment is where you want the value to end up​:

  $x = (string-expression);

Here, the expression creates a temporary SV, which we want to pass to $x
without copying, then free the temporary SV.

Conversely, in something like

  $s =~ /(.)/;

we likely want to keep $s, and temporarily make use of $1.

--
The optimist believes that he lives in the best of all possible worlds.
As does the pessimist.

@p5pRT

p5pRT commented May 12, 2014

From @demerphq

On 12 May 2014 17​:25, Ricardo SIGNES via RT <perlbug-comment@​perl.org>wrote​:

Yves noted that the oversizing of strings may be the cause of OOM in
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe)
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against
the offending modules.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented May 13, 2014

From @demerphq

On 11 May 2014 15​:02, Steffen Mueller <smueller@​cpan.org> wrote​:

On 05/11/2014 02​:25 PM, Leon Timmermans wrote​:

On Sun, May 11, 2014 at 12​:34 PM, Steffen Mueller <smueller@​cpan.org
<mailto​:smueller@​cpan.org>> wrote​:

I don't think another bit of state that needs to be taken into
account all the time is a very good idea. Few really manage to use
SVs safely as it stands.

Maybe. I don't think we have flag bits available to the point is moot
right now anyway.

How about changing the scanning logic to NOT copy as it goes but
really mostly just do scanning a la strchr() and then memcpy()? An
extra pass, but then you _know_ it'll be exactly one malloc for the
string buffer. I think that may often be more efficient and it will
also be simpler.

That's what I suggested earlier. It would make more sense to me too,
though it's still a trade-off; given the buffer should be in L1 cache
I'd expect the extra read to be relatively cheap, but that's just
hand-waving.

I think that's a tad optimistic. We're talking about string processing
after all. I'd already be happy if most OP's were in L1. ;)

Also, you don't always know how much to allocate, in

particular when a line crosses a buffer boundary; moving the strchr step
into PerlIO would be a extremely helpful step there, also for other more
complicated reasons (#113424 for example).

Agreed.

Steffen and I continued discussing this off-list (the off-list part was
accidental), and he really challenged me on my claim.

And I have to admit I was wrong.

After a bunch of benchmarking, analysis, and some late night hacking
Steffen and I have concluded that we can speed up the code in sv_gets() by
something like an order of magnitude (or so) by using a scan and then copy
strategy.

I do not have an implementation yet, but we definitely can do better than
we do now in sv_gets().

It is on my todo list. More in the future. Thanks to Steffen for not
believing me. Working with people like him makes people like me raise their
game. :-)

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented May 13, 2014

From @Leont

On Sun, May 11, 2014 at 3​:02 PM, Steffen Mueller <smueller@​cpan.org> wrote​:

On 05/11/2014 02​:25 PM, Leon Timmermans wrote​:

That's what I suggested earlier. It would make more sense to me too,
though it's still a trade-off; given the buffer should be in L1 cache
I'd expect the extra read to be relatively cheap, but that's just
hand-waving.

I think that's a tad optimistic. We're talking about string processing
after all. I'd already be happy if most OP's were in L1. ;)

I meant after the strchr(): right after the buffer is accessed I'd hope
it to be cached, especially if the cache is split by data/code. I may well
be mistaken though.

Leon

@p5pRT

p5pRT commented May 19, 2014

From @rjbs

I've removed this from blocking 5.20.0.

Yves, do you want to keep this ticket open for the further changes, or make a new one?

--
rjbs


@p5pRT

p5pRT commented May 19, 2014

From @demerphq

On 19 May 2014 16​:42, Ricardo SIGNES via RT <perlbug-comment@​perl.org>wrote​:

I've removed this from blocking 5.20.0.

Yves, do you want to keep this ticket open for the further changes, or
make a new one?

I think we should start a new ticket. So please feel free to close this one.

Yves

@p5pRT

p5pRT commented May 19, 2014

From @rjbs

Closed per discussion with Yves.

--
rjbs

@p5pRT

p5pRT commented May 19, 2014

@rjbs - Status changed from 'open' to 'resolved'

@p5pRT p5pRT closed this as completed May 19, 2014
@p5pRT

p5pRT commented Aug 23, 2014

From @cpansprout

On Mon May 12 09​:22​:19 2014, demerphq wrote​:

On 12 May 2014 17​:25, Ricardo SIGNES via RT <perlbug-comment@​perl.org>wrote​:

Yves noted that the oversizing of strings may be the cause of OOM in
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe)
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run against
the offending modules.

Finding this whole discussion late, I would like to thank Yves Orton and Dave Mitchell for diagnosing and fixing this. That’s quite a bit of detective work!

--

Father Chrysostomos


@p5pRT

p5pRT commented Aug 24, 2014

From @demerphq

On 23 August 2014 07​:20, Father Chrysostomos via RT <
perlbug-comment@​perl.org> wrote​:

On Mon May 12 09​:22​:19 2014, demerphq wrote​:

On 12 May 2014 17​:25, Ricardo SIGNES via RT <perlbug-comment@​perl.org
wrote​:

Yves noted that the oversizing of strings may be the cause of OOM in
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121826 and (I believe)
https://rt-archive.perl.org/perl5/Ticket/Display.html?id=121815

We should know soon. The patch I pushed this morning should be run
against
the offending modules.

Finding this whole discussion late, I would like to thank Yves Orton and
Dave Mitchell for diagnosing and fixing this. That’s quite a bit of
detective work!

Now you are back I think there are issues raised in this thread that bear
further discussion.

Specifically​:

Should we have a "do not COW" flag?

Also, I had a question about the implementation of the refcount data for
COW/shared strings. Specifically the refcount is at the *end* of the string.

I was wondering if you had thought about doing something like the OOK
strategy and putting it at the *head* of the string instead of at the tail?
We could then store buf+sizeof(refcount) in the pv slot, so to pretty much
all code there would be no difference between a shared or unshared string,
the SvCUR/SvLEN data would be identical, etc. The only place that would
have to know about the difference would be code that does a realloc(),
which would have to be taught about the offset trick.

The reason I ask is twofold: first, there is a lack of locality of reference
with the refcount being at the end of the string, since more operations on a
string look at its first bytes than its last; and second, the refcount is
currently in user-"visible" data at the end of the string. I can imagine
dodgy XS code not aware of the refcount data deciding to use the full
SvLEN of the string.

Just curious for your thoughts.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Aug 24, 2014

From @cpansprout

On Sun Aug 24 04​:17​:31 2014, demerphq wrote​:

Now you are back I think there are issues raised in this thread that bear
further discussion.

Specifically​:

Should we have a "do not COW" flag?

It doesn’t really matter to me.

I think that in each case where it seems that COW shouldn’t happen, it’s because the COW algorithm needs adjustment. Fixing the algorithm for one particular inefficient case should also fix it for other similar cases.

That said, I would not be opposed to such a flag. (BTW, SVf_READONLY already means that, but I assume you want a flag for mutable strings.) We have plenty of room in sv_flags. You just have to look closely to see it. :-) Specifically, SVf_FAKE is unused on strings now and SVf_OOK|SVf_IsCOW is an unused combination. SVf_AMAGIC is free for non-HVs.

Also, I had a question about the implementation of the refcount data for
COW/shared strings. Specifically the refcount is at the *end* of the string.

I was wondering if you had thought about doing something like the OOK
strategy and putting it at the *head* of the string instead of at the tail?
We could then store buf+sizeof(refcount) in the pv slot, so to pretty much
all code there would be no difference between a shared or unshared string,
the SvCUR/SvLEN data would be identical, etc. The only place that would
have to know about the difference would be code that does a realloc(),
which would have to be taught about the offset trick.

The whole point of this COW mechanism was to upgrade existing strings to COW strings. That precludes using a byte at the beginning of the string, because that byte is already in use.

If you are thinking of allocating most strings with an extra initial byte to begin with, that could work, but would effectively disable COW* until we comb through the entire codebase changing every instance of SvPVX_set to account for it. Is it worth that much trouble?

* Bear in mind that I have an innate tendency to exaggerate.

The reason I ask is twofold, first there is a lack of locality of reference
with the refcount being at the end of the string, more operations on a
string look at its first bytes than last, and second the refcount is
currently in user "visible" data at the end of the string. I can imagine
dodgy XS code not aware off the refcount data deciding to use the full
SvLEN of the string.

That is why I sent patches to many CPAN modules to stop them from diddling with SvPVX without calling sv_force_normal first. :-)

I hope that COW strings are common enough that such dodgy behaviour will result in test failures.

--

Father Chrysostomos

@p5pRT

p5pRT commented Aug 26, 2014

From @cpansprout

On Sun Aug 24 07​:06​:26 2014, sprout wrote​:

On Sun Aug 24 04​:17​:31 2014, demerphq wrote​:

I was wondering if you had thought about doing something like the OOK
strategy and putting it at the *head* of the string instead of at the
tail?
We could then store buf+sizeof(refcount) in the pv slot, so to pretty
much
all code there would be no difference between a shared or unshared
string,
the SvCUR/SvLEN data would be identical, etc. The only place that
would
have to know about the difference would be code that does a
realloc(),
which would have to be taught about the offset trick.

The whole point of this COW mechanism was to upgrade existing strings
to COW strings. That precludes using a byte at the beginning of the
string, because that byte is already in use.

How about SvLEN(sv)-- when turning on COWness? (And SvLEN(sv)++ when turning it off.)

--

Father Chrysostomos

@p5pRT

p5pRT commented Aug 26, 2014

From @rurban

Since you now realized the problem with COW, maybe I can add my old
argument again about how it should have been done, and how all others
are doing it.

The cow refcount needs to be in its own xpv field. It should not be
in the pv data.
COW is used for shared readonly data. Changing readonly data is not possible.

The two use cases for COW are threads and the compiler. Proper threads
support is too futuristic to think about (it would need a full word
for CAS then), but the compiler already uses shared readonly strings
and needs to pessimize on the current COW strings, as the current COW
strings are not readonly. They change the ending byte.

@p5pRT

p5pRT commented Aug 26, 2014

From @demerphq

On 26 August 2014 15​:49, Reini Urban <rurban@​x-ray.at> wrote​:

Since you now realized the problem with COW,

Which problem? There were at least two I mentioned.

maybe I can add my old
argument again how it should have be done, and all others are doing
it.

The cow refcount needs to be in its own xpv field. It should not be
in the pv data.
COW is used for shared readonly data. Changing readonly data is not
possible.

Which devolves down to some specific overhead for all strings.

I think you are right that there is a debate to be had about whether adding
such a field is a better strategy, but I dont think the case is clear-cut.

The two use cases for COW are threads and the compiler

Huh?

What do you mean by "*the* two use cases"?

The use cases I can think of for COW have nothing to do with threads or
the compiler.

. proper threads
support is too futuristic to think about (it would need a full word
for CAS then), but the compiler already uses shared readonly strings
but needs to pessimize on the current COW strings, as the current COW
strings are not readonly. They change the ending byte.

The current implementation is not about making COW strings read only. I
agree there is a relationship but not a direct one.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT

p5pRT commented Sep 1, 2014

From @iabyn

On Tue, Aug 26, 2014 at 08​:49​:25AM -0500, Reini Urban wrote​:

The cow refcount needs to be in its own xpv field. It should not be
in the pv data.

How would that work? Each SV has its own private XPV struct, but a shared
PV. If the refcount was stored in the XPV, each SV would have its own
private refcount, which would quickly get out of sync.

Or are you suggesting instead that a single XPV should be shared between
multiple SV heads?

--
In England there is a special word which means the last sunshine
of the summer. That word is "spring".
