Add optimization for simple multideref function calls #20991

atrodo · 2023-03-30T03:02:42Z

As an experiment and research into the perl internals, I am attempting to add an optimization for simple, but somewhat common, functions that simply return values from the first argument. This draft PR represents my efforts so far, but I am running into issues with my poor understanding of how the stack operates.

The optimization looks for functions of the form `{ return $_[0]->{X} }`, marks them as multideref accessors, and then runs the multideref directly instead of creating a stack frame and calling the body of the function. It is currently disabled unless the PERL_MULTIDEREF_ACC environment variable is set to a non-zero number. Below is a script that exercises the optimization when enabled. Currently the functions x and n are optimized, but I am hoping to make all of the functions except new optimizable.

```perl
use strict;
use warnings;

# Blessed into Point (not main) so that $a->n->x below can resolve Point::x
my $n = bless { x => rand }, 'Point';

package Point {
    sub new {
        return bless { x => rand, n => $n }, 'Point';
    }
    sub x {                        # currently optimized
        return $_[0]->{x};
    }
    sub y {
        my $self = shift;
        return $self->{x};
    }
    sub z {
        1;
        return $_[0]->{x};
    }
    sub n  { return $_[0]->{n} }   # currently optimized
    sub nn { 1; return $_[0]->{n} }
    sub s  { $_[0]->{s} = $_[1] }
    sub m {
        # getter when called with no argument, setter otherwise
        return $_[0]->{m}
          if @_ == 1;
        $_[0]->{m} = $_[1];
    }
}

my $a = Point->new;
warn $a->n;
warn $a->{x};
warn $a->x;
warn $a->n->x;

# The benchmark below is disabled by __END__; move it above to run it.
__END__
use Benchmark qw/cmpthese/;

cmpthese(
    30_000_000,
    {
        'multideref'   => sub { $a->x },
        'static_fn'    => sub { Point::x($a) },
        'two_step'     => sub { $a->y },
        'const_prefix' => sub { $a->z },
        'direct'       => sub { $a->{x} },
    }
);
```

There are several things I'm unsure about. The first is where the optimization lives: no optimization currently happens in peep, although that felt like the most appropriate place. The others are the addition of the multideref accessor flag on control ops, the global static AV I used (which I know I still need to fix), and, more generally, whether I am doing and structuring things correctly, even though it mostly works.

However, the main issue I'm having is this. In the script above, calling `$a->n->x` causes a `Can't call method "x" on an undefined value` error, even though `$a->n` returns the correct object. I suspect it's because I'm putting the stack into an unexpected state, but I don't understand the stack well enough to know what the expected state is supposed to be. Any help understanding what I am doing wrong would be appreciated.


iabyn · 2023-03-30T15:51:33Z

On Wed, Mar 29, 2023 at 08:03:00PM -0700, Jon Gentle wrote:

As an experiment and research into the perl internals, I am attempting to add an optimization for simple, but somewhat common, functions that simply return values from the first argument. This draft PR represents my efforts so far, but I am running into issues with my poor understanding of how the stack operates.

While we try and encourage new contributors, I'm not minded to devote more than the few minutes I've already spent trying to review your work because you have made it hard to do so. Your PR has 4 commits, each of which has just a single one-line cryptic description, with no indication of what the commit does. Then your email doesn't help by providing any clear overview of what the optimisation actually does.

I *think* it works by marking candidate optimisable subs (ones consisting of just a single multideref op) by adding some info in the sub's first NEXTSTATE op. Then in pp_entersub, you've got some runtime code which, when about to call a perl (as opposed to XS) sub, checks whether the sub has this info in the NS op, and instead calls the body of pp_multideref, which has been extracted from the pp function as a static sub.

Is this right?

First, some major observations.

Optimisations like this make me *extremely* twitchy. It potentially slows down *every* sub call and every multideref, and it's not clear whether that slowdown is worth it for any possible speed up when calling accessor-type methods. As an aside, I can't see the results of benchmarking in your email.

Also, this sort of code tends to have many edge cases: what about a sub called without args, such as C<&foo;>, or called via goto, such as C<goto &foo>? What about methods declared using the new class {} syntax?

Also also, the two major projects I'm currently working on or intend to work on next (reference-counted stack and eliminating @_ in subs with signatures) both mess around a lot with how arguments are passed to subs and/or copied to @_, so I'd prefer it if this area of code was left alone by others for a while (for the next year, say).

Part of my twitching is that this is a very special-case optimisation: pp_entersub is being trained to handle just *one* particular type of sub inline. What about other possible forms of inlining? Will pp_entersub become a morass of special cases for each type of possible inlining? Surely a more general approach should be taken. For example, inlining the ops from the sub into the call-site when the call-site is compiled, as a form of speculative optimisation if the compiler can make a guess at what sub/method is likely to be called, with a runtime check and fallback to calling the real sub if the guess was wrong. (This is not easy - I'm just putting it out as a top-of-my-head example of something more general that might be aimed for.)

I can't answer most of the questions in your email as I don't understand them (which is partly a side-effect of the lack of description and of my lack of examination alluded to above.)

Other observations:

Setting an environment variable isn't the usual way to enable an optimisation. In perl it's usually always enabled (or via a build option if it's new and controversial). It absolutely should not be used at run-time - i.e. being checked for by every sub call.

A static var is obviously wrong: perl supports multithreading with multiple interpreters, so a static variable is almost never what's required. As to what is actually required, I don't know, as I don't know what the static var's purpose is.

You say you're confused by the stack, but it's difficult to help as you don't make it clear what areas of it confuse you. In general, perl ops expect pointers to their args on the stack, pop them off, and push their results onto the stack. So pp_add() pops two args off the stack and pushes one result. List ops expect a variable number of args, the base of which is indicated by the top entry on the markstack. pp_entersub is different in that it pops a list of args off the stack, and then moves them to @_.

pp_entersub also switches PL_curpad to point to the pad belonging to the new sub, so that lexical variable accesses (as done by pp_padsv, pp_multideref etc) are array accesses within the called sub's PL_curpad[] rather than the caller's PL_curpad[].

If you want to indicate that the sub is optimised for a direct multideref call, it would be better to set a CVf_ flag in the CV's CvFLAGS() field: it would then be quicker for pp_entersub to check for this special case.

Where do warning messages and errors (such as undefined variable) appear to come from: from the appropriate line in the method, or from the call site? Perl normally handles this by pp_nextstate() setting PL_curcop as the first action on calling a sub.

Is the correct void/scalar/list context and rvalue/lvalue context preserved?

For fine-grained benchmarks, you should be looking into adding tests to t/perf/benchmarks and running Porting/bench.pl on them.


-- Red sky at night - gerroff my land! Red sky at morning - gerroff my land! -- old farmers' sayings #14
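The stack discipline described in the review can be sketched as a toy model in C. This is only an illustration under simplifying assumptions. The names `toy_stack`, `push`, `pushmark`, `toy_add` and `toy_list_sum` are hypothetical. Values are plain ints rather than `SV *` pointers, and marks are stored as stack offsets in the spirit of `PL_markstack_ptr`. A fixed-arity op in the style of `pp_add` pops two items and pushes one. A list op consumes everything above the most recent mark and pops that mark.

```c
#include <assert.h>

/* Toy model of an argument stack plus a mark stack (hypothetical names).
 * stack[1..sp] are live items; stack[0] is an unused base slot.
 * No overflow checks: this is an illustration only. */
enum { STACK_MAX = 64 };

typedef struct {
    int stack[STACK_MAX];
    int sp;                 /* index of the top item, 0 == empty */
    int marks[STACK_MAX];   /* each mark records sp at the time it was pushed */
    int markp;              /* number of marks currently pushed */
} toy_stack;

static void push(toy_stack *s, int v) { s->stack[++s->sp] = v; }
static int  pop(toy_stack *s)         { return s->stack[s->sp--]; }
static void pushmark(toy_stack *s)    { s->marks[s->markp++] = s->sp; }
static int  popmark(toy_stack *s)     { return s->marks[--s->markp]; }

/* Fixed-arity op, like pp_add: pops exactly two args, pushes one result. */
static void toy_add(toy_stack *s) {
    int right = pop(s);
    int left  = pop(s);
    push(s, left + right);
}

/* List op: the args are stack[mark+1 .. sp], however many there are.
 * It pops the mark, drops the whole frame, and pushes one result. */
static void toy_list_sum(toy_stack *s) {
    int base = popmark(s);
    int total = 0;
    for (int i = base + 1; i <= s->sp; i++)
        total += s->stack[i];
    s->sp = base;
    push(s, total);
}
```

The list op leaves everything below its mark untouched, which is what lets nested calls share a single stack.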

atrodo · 2023-03-31T04:11:09Z

While we try and encourage new contributors, I'm not minded to devote more
than the few minutes I've already spent trying to review your work because
you have made it hard to do so.

I am very sorry; I did not mean to cause any frustration on anyone's part. I created this as a draft PR in GitHub and thought about it in those terms, but I see now that I could have done a lot more to make this easier on others. This optimisation and PR are part of an educational process for me, and I should have made that clearer. I am trying to understand more of the perl internals personally, and was using this idea as a way to do so. I know that this optimisation has a long way to go before it could be included, if it's viable at all.

I greatly appreciate and value the amount of time you have taken to respond and as such will keep my responses short.

I think it works by marking candidate optimisable subs (ones
consisting of just a single multideref op) by adding some info in the
sub's first NEXTSTATE op.

Then in pp_entersub, you've got some runtime code which, when about to
call a perl (as opposed to XS) sub, checks whether the sub has this info
in the NS op, and instead calls the body of pp_multideref, which has
been extracted from the pp function as a static sub.

Is this right?

Your understanding of the operation is correct.

First, some major observations.

Optimisations like this make me extremely twitchy. It potentially slows
down every sub call and every multideref, and it's not clear whether
that slowdown is worth it for any possible speed up when calling
accessor-type methods. As an aside, I can't see the results of
benchmarking in your email.

I agree; I am not yet sure this optimization has any value. I am currently working through it to answer, for myself, whether it is even viable. You have some great questions and I don't have answers to them at this point. I will make sure I can answer these questions and concerns in any future PRs.

I intentionally did not add any benchmarking numbers as I have no faith in whatever results I do have. Since the code currently breaks some normal function calls, I am unable to conduct any good benchmarks yet.

Other observations:

Setting an environment variable isn't the usual way to enable an
optimisation. In perl it's usually always enabled (or via a build option
if it's new and controversial). It absolutely should not be used at
run-time - i.e. being checked for by every sub call.

The environment variable is a side effect of my process: without a way to selectively disable the optimisation, make cannot finish. It is not a permanent fixture.

You say you're confused by the stack, but it's difficult to help as you
don't make it clear what areas of it confuse you. In general, perl ops
expect pointers to their args on the stack, pop them off, and push their
results onto the stack. So pp_add() pops two args off the stack and pushes
one result. List ops expect a variable number of args, the base of which
is indicated by the top entry on the markstack.

I appreciate the explanation. The actual issue I am having trouble with is around the actual calling of S_multideref. My understanding of the code that I have right now is that it clears the stack (`SP = MARK`) and closes the arguments list (`PUTBACK`) before calling S_multideref. The ordering of those doesn't make sense to me, but swapping the order of them means the argument stack is not cleared. In simple cases like `$object->accessor`, the result ends up on the stack. In the slightly more complex case of `$object->accessor->accessor`, undef ends up on the stack instead. I suspect it's because of my misunderstanding of how these macros interact with each other. I will investigate more.
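For reference, in perl's C source `dSP` declares a local `sp` initialised from `PL_stack_sp`, `SP` names that local, and `PUTBACK` writes it back (`PL_stack_sp = sp`). That makes `SP = MARK; PUTBACK;` the expected order: the reset only becomes visible to other code once it has been published. The toy model below uses plain ints and hypothetical names (`global_sp` stands in for `PL_stack_sp`; these are not perl's macros). It shows why swapping the two steps leaves the arguments on the stack.

```c
#include <assert.h>

/* global_sp stands in for PL_stack_sp; the local `sp` mirrors what dSP declares. */
static int global_sp;

/* Order used in the PR: change the local copy, then publish it. */
static void clear_then_putback(int mark) {
    int sp = global_sp;   /* dSP: snapshot the global pointer */
    sp = mark;            /* SP = MARK: local change only */
    global_sp = sp;       /* PUTBACK: publish the local value */
}

/* Swapped order: publish first, then change only the local copy. */
static void putback_then_clear(int mark) {
    int sp = global_sp;   /* dSP */
    global_sp = sp;       /* PUTBACK: publishes the unchanged value */
    sp = mark;            /* SP = MARK: never written back */
    (void)sp;
}
```

In the swapped version the reset is applied only to a local that is then discarded. That matches the observation that the argument stack is not cleared.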

If you want to indicate that the sub is optimised for a direct multideref
call, it would be better to set a CVf_ flag in the CV's CvFLAGS() field:
it would then be quicker for pp_entersub to check for this special case.

This is a good suggestion. The current structure reflects the order in which I discovered how things work. The peephole optimizer is passed the body of a CV, not the CV itself, and that was easier to use since every CV body passes through it. A single if statement in pp_entersub would be better than what I have now, so I will aim for that.
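A minimal sketch of the single flag test suggested in review, assuming a hypothetical flag bit. `TOY_CVf_MD_ACCESSOR` and `toy_cv` are illustrative stand-ins, not real perl `CVf_` constants or the `CV` struct. The decision is made once, when the optimiser inspects the sub. The per-call cost then becomes one bit test on the CV instead of a look inside the sub's first NEXTSTATE op.

```c
#include <assert.h>

/* Hypothetical flag bit; real perl defines its own CVf_* constants. */
#define TOY_CVf_MD_ACCESSOR 0x1u

typedef struct {
    unsigned flags;   /* stands in for CvFLAGS(cv) */
    int fast_calls;   /* counters so the two paths are observable */
    int slow_calls;
} toy_cv;

/* Set once, at compile time, when the body is found to qualify. */
static void mark_accessor(toy_cv *cv) { cv->flags |= TOY_CVf_MD_ACCESSOR; }

/* Runtime dispatch in an entersub-like function: one bit test per call. */
static void toy_entersub(toy_cv *cv) {
    if (cv->flags & TOY_CVf_MD_ACCESSOR)
        cv->fast_calls++;   /* would run the inlined multideref path */
    else
        cv->slow_calls++;   /* would push a frame and run the sub's ops */
}
```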

For fine-grained benchmarks, you should be looking into adding tests to
t/perf/benchmarks and running Porting/bench.pl on them.

I appreciate the tip; I will use that to check whether this optimisation creates a gain rather than a general slowdown.

Again, I am very sorry about making this difficult. I should have done a better job of expressing the problem I was actually looking for help with. I will take all of your feedback and plan to close this PR in the near future.

demerphq · 2023-03-31T09:37:56Z

The environment variable is a side effect of my process: without being able to selectively disable the optimisation, make cannot finish. It is not a permanent fixture.

Add a var to intrpvar.h and then initialize it from the environment in perl_construct(). You can find similar logic for how hash seed initialization is done. Be aware that anything set up in intrpvar.h needs support in Perl_parser_dup() in sv.c.
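A sketch of that pattern, assuming a hypothetical per-interpreter field. The environment is read once at construction time and the result is kept in interpreter state. The hot path then never calls `getenv()`, and each interpreter has its own copy rather than sharing a file-scope static. `toy_interp`, `md_accessor_enabled`, `toy_init_from` and `toy_construct` are illustrative names only. In perl itself the field would be declared in intrpvar.h and set in perl_construct().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for per-interpreter state; the field name is hypothetical. */
typedef struct {
    int md_accessor_enabled;
} toy_interp;

/* Enabled only for a non-zero number, matching the PR's description. */
static void toy_init_from(toy_interp *interp, const char *value) {
    interp->md_accessor_enabled =
        (value != NULL && strtol(value, NULL, 10) != 0);
}

/* Called once per interpreter, never on the per-call hot path. */
static void toy_construct(toy_interp *interp) {
    toy_init_from(interp, getenv("PERL_MULTIDEREF_ACC"));
}
```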

iabyn · 2023-03-31T11:45:54Z

On Thu, Mar 30, 2023 at 09:11:19PM -0700, Jon Gentle wrote:

I appreciate the explanation. The actual issue I am having trouble with is around the actual calling of S_multideref. My understanding of the code that I have right now is that it clears the stack (`SP = MARK`) and closes the arguments list (`PUTBACK`) before calling S_multideref. The ordering of those doesn't make sense to me, but swapping the order of them means the argument stack is not cleared. In simple cases like `$object->accessor`, the result ends up on the stack. In the slightly more complex case of `$object->accessor->accessor`, undef ends up on the stack instead. I suspect it's because of my misunderstanding of how these macros interact with each other. I will investigate more.

pp_multideref is an optimisation that merges several simple array and hash access ops into a single op which runs a mini state machine. Depending on the exact sequence of actions, the op may expect zero or one arguments passed to it on the stack, but will always push one result SV onto the stack before returning. You can see these in the bits of code in pp_multideref() which declare dSP.

pp_entersub is a *list* op. On entry to it, `*PL_markstack_ptr` holds an index into the stack. The SVs from `PL_stack_base[*PL_markstack_ptr + 1] .. PL_stack_sp` are the arguments to the function - or, to put it another way, the current stack frame.

Normally pp_entersub() is expected to pop the CV off the stack, copy all the remaining arguments into @_, push a CXt_SUB context onto the context stack, set the blk_oldsp in that context struct to the base of the current stack frame, pop the mark (`PL_markstack_ptr--`), then pass control to the first op in the sub. This first op is always an OP_NEXTSTATE, and one of its actions is to reset the stack: `PL_stack_sp = PL_stack_base + CX_CUR()->blk_oldsp;`. This has the effect of popping all the sub arguments off the stack (which have already been copied to @_). This is a rather clumsy approach, but it works.

On the other hand, pp_multideref() is *not* a list op: it is a unary or nullary [is that the right word?] op which consumes zero or one items off the stack depending on circumstances. Since it is not a list op, it does not pop a mark off the mark stack when it has finished.

Your problem is that for a simple method like `sub foo { $_[0]->{foo} }`, pp_multideref is expecting to access @_, which has already been populated by pp_entersub. If pp_entersub has skipped setting @_, then the pp_multideref actions need to be changed to tell it to get its arg from the stack rather than $_[0]. Similarly, pp_entersub needs to know that the method only consumes one arg, so it needs to convert its arg list (which it can assert contains only one element) into a single arg on the stack (by popping the mark), which is then consumed by pp_multideref.

So amongst other things, your code in the optimiser needs to carefully examine the multideref op and only inline those functions where the multideref gets its initial value from $_[0] (and change it so it expects the argument on the stack).


-- Please note that ash-trays are provided for the use of smokers, whereas the floor is provided for the use of all patrons. -- Bill Royston
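The calling convention described above can be sketched as a toy model in C for a chained call like `$a->n->x`. Everything here is hypothetical and simplified. Objects are two-slot key/value records. The CV is passed directly instead of being popped off the stack. `toy_multideref` always takes its input from the stack, which is the change the reviewer says the optimiser would have to make to the real multideref op. The fast path fires only when exactly one item sits above the current mark. It then pops that mark so the item becomes the multideref's single argument, which is consumed and replaced by one result.

```c
#include <assert.h>
#include <string.h>

/* Toy objects: up to two key -> object slots (hypothetical layout). */
typedef struct obj obj;
struct obj { const char *key[2]; obj *val[2]; };

enum { MAXS = 32 };
typedef struct {
    obj *stack[MAXS]; int sp;      /* stack[1..sp] are live */
    int  marks[MAXS]; int markp;   /* each mark records sp when pushed */
} vm;

typedef struct { const char *field; } accessor_cv;  /* sub { $_[0]->{field} } */

static void push(vm *m, obj *o) { m->stack[++m->sp] = o; }
static void pushmark(vm *m)    { m->marks[m->markp++] = m->sp; }

/* multideref-like: consumes one arg from the stack, pushes one result. */
static void toy_multideref(vm *m, const char *field) {
    obj *self = m->stack[m->sp--];
    obj *result = NULL;                       /* NULL plays the role of undef */
    for (int i = 0; i < 2 && self; i++)
        if (self->key[i] && strcmp(self->key[i], field) == 0)
            result = self->val[i];
    push(m, result);
}

/* entersub fast path. Assumes a mark has been pushed for this call.
 * Returns 0 and leaves all state untouched if the frame is not one arg. */
static int toy_entersub_fast(vm *m, const accessor_cv *cv) {
    int base = m->marks[m->markp - 1];        /* peek at the current mark */
    if (m->sp - base != 1)
        return 0;                             /* caller takes the normal path */
    m->markp--;                               /* pop the mark: frame -> plain arg */
    toy_multideref(m, cv->field);
    return 1;
}
```

The inner call pops only its own mark. The outer mark is therefore still in place when the inner result becomes the outer call's invocant.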

atrodo · 2023-04-01T02:22:55Z

Thank you for that explanation, it is extremely helpful in understanding what is happening in both pp_multideref and pp_entersub. That will be very helpful as I continue my discovery.

atrodo added 5 commits March 28, 2023 23:03

WIP Add a fast path for simple multideref methods

9639bf2

WIP Add addl requirements on mulideref accessors including a env variable

49da5bf

Make a copy of the multideref items for the accessor

1bc554a

Add a private flag to signal md_accessor is in play

d294a66

It appears more reliable than checking the struct in the COP

Change to temporarily setting @_ for the multideref call

a2a4485

atrodo closed this Apr 2, 2023
