Add optimization for simple multideref function calls #20991
It appears more reliable than checking the struct in the COP
On Wed, Mar 29, 2023 at 08:03:00PM -0700, Jon Gentle wrote:
As an experiment and research into the perl internals, I am attempting
to add an optimization for simple, but somewhat common, functions that
simply return values from the first argument. This draft PR represents
my efforts so far, but I am running into issues with my poor
understanding of how the stack operates.
While we try and encourage new contributors, I'm not minded to devote more
than the few minutes I've already spent trying to review your work because
you have made it hard to do so.
Your PR has 4 commits, each of which has just a single one-line
cryptic description, with no indication of what the commit does. Then
your email doesn't help by providing any clear overview of what the
optimisation actually does.
I *think* it works by marking candidate optimisable subs (ones
consisting of just a single multideref op) by adding some info in the
sub's first NEXTSTATE op.
Then in pp_entersub, you've got some runtime code which, when about to
call a perl (as opposed to XS) sub, checks whether the sub has this info
in the NS op, and instead calls the body of pp_multideref, which has
been extracted from the pp function as a static sub.
Is this right?
First, some major observations.
Optimisations like this make me *extremely* twitchy. It potentially slows
down *every* sub call and every multideref, and it's not clear whether
that slowdown is worth it for any possible speed up when calling
accessor-type methods. As an aside, I can't see the results of
benchmarking in your email.
Also, this sort of code tends to have many edge cases: what about a sub
called without args, such as C<&foo;> or called via goto: C<goto &foo> for
example? What about methods declared using the new class {} syntax?
Also also, the two major projects I'm currently working on or intend to
work on next (reference-counted stack and eliminating @_ in subs with
signatures), both mess around a lot with how arguments are passed to subs
and/or copied to @_, so I'd prefer it if this area of code was left alone
by others for a while (for the next year say).
Part of my twitching is that this is a very special-case optimisation:
pp_entersub is being trained to handle just *one* particular type of sub
inline. What about other possible forms of inlining? Will pp_entersub
become a morass of special-cases for each type of possible inlining?
Surely a more general approach should be taken.
For example, inlining the ops from the sub into the call-site when the
call-site is compiled, as a form of speculative optimisation if the
compiler can make a guess at what sub/method is likely to be called, with
a runtime check and fallback to calling the real sub if the guess was
wrong. (This is not easy - I'm just putting it out as a top-of-my-head
example of something more general that might be aimed for.)
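To illustrate the guard-and-fallback idea, here is a toy sketch in plain C. It has nothing to do with perl's real op tree: `toy_callsite`, `toy_call` and the accessor functions are invented for illustration, and the "inlined" body is simply hard-coded for the guessed target.

```c
#include <assert.h>

struct toy_point { int x; int y; };
typedef int (*toy_method)(const struct toy_point *);

static int toy_get_x(const struct toy_point *p) { return p->x; }
static int toy_get_y(const struct toy_point *p) { return p->y; }

/* A call site that was specialised at "compile time" on the guess that
 * toy_get_x is the sub that will be called. */
struct toy_callsite {
    toy_method guess;   /* the sub the compiler guessed would be called */
    int hits;           /* calls served by the inlined body */
    int misses;         /* calls that fell back to the real sub */
};

static int toy_call(struct toy_callsite *site, toy_method target,
                    const struct toy_point *p)
{
    assert(site != 0 && target != 0 && p != 0);
    if (target == site->guess) {   /* runtime check: was the guess right? */
        site->hits++;
        return p->x;               /* inlined body of toy_get_x */
    }
    site->misses++;
    return target(p);              /* guess was wrong: call the real sub */
}
```

The point of the shape is that the check is paid once per call site rather than being built into the general sub-calling path.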
I can't answer most of the questions in your email as I don't understand
them (which is partly a side-effect of the lack of description and the lack
of examination by me alluded to above.)
Other observations:
Setting an environment variable isn't the usual way to enable an
optimisation. In perl it's usually always enabled (or via a build option
if it's new and controversial). It absolutely should not be used at
run-time - i.e. being checked for by every sub call.
A static var is obviously wrong: perl supports multithreading with
multiple interpreters, so a static variable is almost never what's
required. As to what is actually required, I don't know, as I don't know
what the static var's purpose is.
You say you're confused by the stack, but it's difficult to help as you
don't make it clear what areas of it confuse you. In general, perl ops
expect pointers to their args on the stack, pop them off, and push their
results onto the stack. So pp_add() pops two args off the stack and pushes
one result. List ops expect a variable number of args, the base of which
is indicated by the top entry on the markstack.
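The stack discipline described above can be sketched with a toy model in plain C. This is only an illustration: the real stack holds SV pointers rather than ints, and `toy_add`, `toy_sum_list` and the helper names are made up here rather than taken from the perl source.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

static int    stack[STACK_MAX];      /* argument stack; slot 0 is never used */
static size_t sp = 0;                /* index of the top item, 0 when empty */
static size_t markstack[STACK_MAX];  /* saved stack indices */
static size_t mark_top = 0;          /* number of marks currently pushed */

static void   push(int v)    { assert(sp + 1 < STACK_MAX); stack[++sp] = v; }
static int    pop(void)      { assert(sp > 0); return stack[sp--]; }
static void   pushmark(void) { assert(mark_top < STACK_MAX); markstack[mark_top++] = sp; }
static size_t popmark(void)  { assert(mark_top > 0); return markstack[--mark_top]; }

/* Binary op in the style of pp_add(): pops two args, pushes one result. */
static void toy_add(void)
{
    int right = pop();
    int left  = pop();
    push(left + right);
}

/* List op: its args are everything above the most recent mark. */
static void toy_sum_list(void)
{
    size_t base = popmark();   /* items base+1 .. sp are the args */
    int total = 0;
    while (sp > base)
        total += pop();
    push(total);               /* one result replaces the whole list */
}
```

A binary op never touches the mark stack, whereas a list op always pops exactly one mark; mixing the two conventions up leaves a stale mark behind for the next list op to trip over.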
pp_entersub is different in that it pops a list of args off the stack, and
then moves them to @_.
pp_entersub also switches PL_curpad to point to the pad belonging to the
new sub, so that lexical variable accesses (as done by pp_padsv,
pp_multideref etc) are array accesses within the called sub's PL_curpad[]
rather than the caller's PL_curpad[].
If you want to indicate that the sub is optimised for a direct multideref
call, it would be better to set a CVf_ flag in the CV's CvFLAGS() field:
it would then be quicker for pp_entersub to check for this special case.
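As a rough sketch of why a flag test is cheap, here is a self-contained toy in C. The flag names and values are hypothetical (perl's real CVf_* flags are defined in cv.h), and `toy_cv` merely stands in for a CV with a flags field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TOY_CVf_SOMETHING_ELSE   0x0001u
#define TOY_CVf_MDEREF_ACCESSOR  0x0002u

struct toy_cv {
    uint32_t flags;   /* stands in for CvFLAGS(cv) */
};

enum toy_path { TOY_PATH_NORMAL, TOY_PATH_ACCESSOR };

/* One bit test decides which path a call takes. */
static enum toy_path toy_choose_path(const struct toy_cv *cv)
{
    assert(cv != 0);
    if (cv->flags & TOY_CVf_MDEREF_ACCESSOR)
        return TOY_PATH_ACCESSOR;
    return TOY_PATH_NORMAL;
}
```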
Where do warning messages and errors (such as undefined variable) appear
to come from: from the appropriate line in the method, or from the call
site? Perl normally handles this by pp_nextstate() setting PL_curcop as
the first action on calling a sub.
Is the correct void/scalar/list context and rvalue/lvalue context
preserved?
For fine-grained benchmarks, you should be looking into adding tests to
t/perf/benchmarks and running Porting/bench.pl on them.
--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
-- old farmers' sayings #14
I am very sorry, I did not mean to cause any frustration on anyone's part. I created this as a draft PR in GitHub and thought about it in those terms, but I see now that I could have done a lot more to make this easier on others. This optimisation and PR are part of an educational process for me, I should have made that clearer. I am in the process of trying to understand more of the perl internals personally, and was using this idea as a way to do such. I know that this optimisation has a long way to go before it could be included, if it's viable at all. I greatly appreciate and value the amount of time you have taken to respond and as such will keep my responses short.
Your understanding of the operation is correct.
I agree, I am not yet sure this optimization has any value. I am currently working through this optimisation, asking the question, personally, if this is even viable. You have some great questions and I don't have answers to them at this point. I will make sure I can answer these questions and concerns in any future PRs. I intentionally did not add any benchmarking numbers as I have no faith in whatever results I do have. Since the code currently breaks some normal function calls, I am unable to conduct any good benchmarks yet.
The environment variable is a side effect of my process: without being able to selectively disable the optimisation, make cannot finish. It is not a permanent fixture.
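I appreciate the explanation. The actual issue I am having trouble with is around the actual calling of S_multideref. My understanding of the code that I have right now is that it clears the stack (`SP = MARK`) and closes the arguments list (`PUTBACK`) before calling S_multideref. The ordering of those doesn't make sense to me, but swapping the order of them means the argument stack is not cleared. In simple cases like `$object->accessor`, the result ends up on the stack. In the slightly more complex case of `$object->accessor->accessor`, undef ends up on the stack instead. I suspect it's because of my misunderstanding of how these macros interact with each other. I will investigate more.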
This is a good suggestion. The way it is structured now is because of how I discovered things work; the body of a CV is passed to the peephole optimizer, not the CV itself, and that was easier to use since all CV bodies are passed to it. A single if statement in pp_entersub would be better than what I have now, so I will aim for that.
I appreciate the tip, I will make sure to use that to validate if this optimisation creates a gain, not a general slowdown. Again, I am very sorry about making this difficult. I should have done a better job of expressing the problem I was actually looking for help with. I will take all of your feedback and plan to close this PR in the near future.
Add a var to intrpvar.h and then initialize it from the environment in perl_construct(). You can find similar logic for how hash seed initialization is done. Be aware that anything set up in intrpvar.h needs support in Perl_parser_dup() in sv.c.
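The general shape of that suggestion, reading the environment once at construction time and caching the result per interpreter, might look something like the toy below. This is a sketch under assumptions, not perl's actual code: `toy_interp`, its `multideref_acc` field and the helper functions are invented for illustration; only the `PERL_MULTIDEREF_ACC` name comes from the PR description.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for the interpreter struct. */
struct toy_interp {
    int multideref_acc;   /* hypothetical per-interpreter switch */
};

/* Interpret the raw environment value: unset, non-numeric or zero
 * means disabled; any other number means enabled. */
static int toy_parse_switch(const char *raw)
{
    if (raw == NULL)
        return 0;
    return strtol(raw, NULL, 10) != 0;
}

/* Called once per interpreter, in the spirit of perl_construct(). */
static void toy_interp_construct(struct toy_interp *in)
{
    assert(in != NULL);
    in->multideref_acc = toy_parse_switch(getenv("PERL_MULTIDEREF_ACC"));
}

/* The hot path reads the cached field and never touches the environment. */
static int toy_fast_path_enabled(const struct toy_interp *in)
{
    return in->multideref_acc;
}
```

Because the switch lives in the interpreter struct rather than in a C static, two interpreters in the same process can hold different values without interfering with each other.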
On Thu, Mar 30, 2023 at 09:11:19PM -0700, Jon Gentle wrote:
I appreciate the explanation. The actual issue I am having trouble with
is around the actual calling of S_multideref. My understanding of the
code that I have right now is that it clears the stack (`SP = MARK`) and
closes the arguments list (`PUTBACK`) before calling S_multideref. The
ordering of those doesn't make sense to me, but swapping the order of
them means the argument stack is not cleared. In simple cases like
`$object->accessor`, the result ends up on the stack. In the slightly
more complex case of `$object->accessor->accessor`, undef ends up on the
stack instead. I suspect it's because of my misunderstanding of how
these macros interact with each other. I will investigate more.
pp_multideref is an optimisation that merges several simple array and hash
access ops into a single op which runs a mini state machine. Depending on
the exact sequence of actions, the op may expect zero or one arguments
passed to it on the stack, but will always push one result SV onto the
stack before returning. You can see these in the bits of code in
pp_multideref() which declare dSP.
pp_entersub is a *list* op. On entry to it, *PL_markstack_ptr holds an
index into the stack. The SVs from
PL_stack_base[*PL_markstack_ptr + 1] .. PL_stack_sp
are the arguments to the function - or to put it another way, the current
stack frame. Normally pp_entersub() is expected to pop the CV off the
stack, copy all the remaining arguments into @_, push a CXt_SUB context
onto the context stack, set the blk_oldsp in that context struct to the
base of the current stack frame, pop the mark (PL_markstack_ptr--), then
pass control to the first op in the sub.
This first op is always an OP_NEXTSTATE, and one of its actions is to
reset the stack:
PL_stack_sp = PL_stack_base + CX_CUR()->blk_oldsp;
This has the effect of popping all the sub arguments off the stack (which
have already been copied to @_). This is a rather clumsy approach, but it
works.
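The sequence just described can be modelled with a toy in plain C. It is a deliberately simplified sketch: ints stand in for SVs and for the CV, a plain array stands in for @_, a single `blk_oldsp` field stands in for the context stack, and all the `toy_` names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 32

struct toy_interp {
    int    stack[TOY_MAX];   /* argument stack; slot 0 is never used */
    size_t sp;               /* index of the top item */
    size_t marks[TOY_MAX];   /* mark stack */
    size_t nmarks;
    int    argv[TOY_MAX];    /* stands in for @_ */
    size_t argc;
    size_t blk_oldsp;        /* frame base saved in the "context" */
};

static void toy_push(struct toy_interp *in, int v)
{
    assert(in->sp + 1 < TOY_MAX);
    in->stack[++in->sp] = v;
}

static void toy_pushmark(struct toy_interp *in)
{
    assert(in->nmarks < TOY_MAX);
    in->marks[in->nmarks++] = in->sp;
}

/* Entry: stack is  ... MARK arg1 .. argN cv.  Returns the "CV". */
static int toy_entersub(struct toy_interp *in)
{
    size_t base, i;
    int cv;

    assert(in->nmarks > 0);
    base = in->marks[--in->nmarks];        /* pop the mark */
    assert(in->sp > base);                 /* at least the CV is present */
    cv = in->stack[in->sp--];              /* pop the CV */
    in->argc = 0;
    for (i = base + 1; i <= in->sp; i++)   /* copy remaining args to "@_" */
        in->argv[in->argc++] = in->stack[i];
    in->blk_oldsp = base;                  /* remember the frame base */
    return cv;
}

/* First op of the sub body: reset the stack to the saved frame base. */
static void toy_nextstate(struct toy_interp *in)
{
    in->sp = in->blk_oldsp;
}
```

Note that in this model the args are still sitting on the stack when `toy_entersub` returns; it is `toy_nextstate` that discards them.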
On the other hand, pp_multideref() is *not* a list op: it is a unary or
nullary [is that the right word?] op which consumes zero or one items off
the stack depending on circumstances. Since it is not a list op, it does
not pop a mark off the mark stack when it has finished.
Your problem is that for a simple method like
sub foo { $_[0]->{foo} }
pp_multideref is expecting to access @_, which has already been
populated by pp_entersub. If pp_entersub has skipped setting @_, then
the pp_multideref actions need to be changed to tell it to get its arg
from the stack rather than $_[0]. Similarly, pp_entersub needs to know
that the method only consumes one arg, so it needs to convert its arg list
(which it can assert contains only one element) into a single arg on the
stack (by popping the mark), which is then consumed by pp_multideref.
So amongst other things, your code in the optimiser needs to carefully
examine the multideref op and only inline those functions where
the multideref gets its initial value from $_[0] (and change it so it
expects the argument on the stack).
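A toy model of that conversion, in plain C, may help to show the stack invariants involved. It is a sketch only: ints stand in for SVs, a small table of structs stands in for hash-based objects, and `toy_fetch` and `toy_call_accessor` are invented names, not functions from the perl source or from this PR.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 32

static int    stack[TOY_MAX];   /* argument stack; slot 0 is never used */
static size_t sp;               /* index of the top item */
static size_t marks[TOY_MAX];   /* mark stack */
static size_t nmarks;

/* Two toy "objects": n holds the index of another object, x a plain value. */
struct toy_obj { int n; int x; };
static const struct toy_obj objs[2] = { { 1, 10 }, { 0, 20 } };

enum toy_field { FIELD_N, FIELD_X };

/* Unary op: consumes one object index from the stack, pushes one result.
 * It does not touch the mark stack. */
static void toy_fetch(enum toy_field f)
{
    int obj;
    assert(sp > 0);
    obj = stack[sp--];
    assert(obj >= 0 && obj < 2);
    stack[++sp] = (f == FIELD_N) ? objs[obj].n : objs[obj].x;
}

/* Fast path for an accessor call. Entry: stack is  ... MARK obj cv.
 * Returns 1 if the fast path was taken, 0 if the caller must fall back. */
static int toy_call_accessor(enum toy_field f)
{
    size_t base;
    assert(nmarks > 0);
    base = marks[nmarks - 1];
    if (sp != base + 2)       /* need exactly one arg plus the CV */
        return 0;
    nmarks--;                 /* pop the mark: the list becomes one plain arg */
    sp--;                     /* pop the CV */
    toy_fetch(f);
    assert(sp == base + 1);   /* exactly one result replaces the frame */
    return 1;
}
```

For a chained call in the style of `$a->n->x`, each call in this model has its own mark; the result of the inner call is left at the base of the outer frame, where it becomes the single argument of the outer call.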
--
Please note that ash-trays are provided for the use of smokers,
whereas the floor is provided for the use of all patrons.
-- Bill Royston
Thank you for that explanation, it is extremely helpful in understanding what is happening in both pp_multideref and pp_entersub. That will be very helpful as I continue my discovery.
As an experiment and research into the perl internals, I am attempting to add an optimization for simple, but somewhat common, functions that simply return values from the first argument. This draft PR represents my efforts so far, but I am running into issues with my poor understanding of how the stack operates.
The optimization looks for functions that are of the form `{ return $_[0]->{X} }`, marks them as a multideref accessor, and then runs the multideref instead of creating a stack frame and calling the body of the function. It is currently disabled unless the `PERL_MULTIDEREF_ACC` environment variable has a non-zero number. Below is a script that exercises the optimization when enabled; currently functions `x` and `n` are optimized, but I am hoping to get all the functions, except for new, to be optimizable.
There are several things that I'm unsure about, specifically the location of the optimization since no optimization is in `peep` although that felt the most appropriate place, the addition of the multideref accessor flag on control ops, the global static AV optimization I used (which I know I need to figure out), and generally if I am doing/structuring things right, even though it mostly works.
However, the main issue I'm having is that, in the script above, calling `$a->n->x` causes a `Can't call method "x" on an undefined value` error even though `$a->n` returns the correct object. I suspect it's because I'm putting the stack into an unexpected state, but I don't know how the stack works well enough in order to know what the expected state is supposed to be. Any assistance in understanding what I am doing wrong would be appreciated.