Add combined opcodes #87850
I'm lining up some PRs (inspired by some of Mark Shannon's ideas) that add new opcodes which are straightforward combinations of existing opcodes. For example, ADD_INT is equivalent to LOAD_CONST + BINARY_ADD, for certain small (common) integer constants. Each of these adds only a minor speedup, but after a dozen or so of these the speedup is (hopefully) significant enough to warrant the opcode churn.
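For readers unfamiliar with the bytecode, here is a quick illustration (mine, not from the original report) of the pair such an opcode would fuse. The exact disassembly depends on the CPython version, and ADD_INT itself is hypothetical:

```python
import dis

def bump(x):
    return x + 1

# On CPython 3.10 and earlier this disassembles to roughly:
#     LOAD_FAST     x
#     LOAD_CONST    1      <- small, common integer constant
#     BINARY_ADD           <- LOAD_CONST + BINARY_ADD would become ADD_INT 1
# (3.11+ emits BINARY_OP 0 (+) instead of BINARY_ADD.)
dis.dis(bump)
```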
We have usually rejected such propositions unless there was evidence of a significant effect on performance. I myself withdrew several of my patches after analyzing statistics. What percentage of opcodes in the compiled bytecode will be ADD_INT? What percentage of executed opcodes will be ADD_INT? Compare it with other opcodes, in particular BINARY_ADD. What are the microbenchmark results?
Microbench results for ADD_INT:

I have a few others like this lined up -- Mark recommended submitting them one at a time to make review easier. My philosophy here (which I learned from Tim Peters in the early 2000s) is that even though each individual improvement has no measurable effect on a general benchmark (as shown in the same comment), the combined effect of a number of tiny improvements can be significant. We have reached a point where there are very few individual changes possible that still have a significant effect, but we should not use this as an excuse to stop trying to improve performance.

I have mislaid the results that made me think this particular opcode is a good candidate; I will go back to reproduce that result and get back to you here.
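The numbers themselves were attached to the original comment and are not reproduced here. As a rough sketch of the kind of microbenchmark involved (the loop body and repeat counts below are my own choices, not the ones used above), an add-heavy loop timed with the stdlib timeit module is about the only workload where a faster path for small-constant addition would show up:

```python
# Sketch only: a tight loop dominated by `+ 1`, so a faster bytecode
# path for LOAD_CONST + BINARY_ADD is essentially the only thing that
# can move this number.
import timeit

stmt = """
total = 0
for _ in range(10_000):
    total = total + 1
"""

# Report the best of five runs to reduce noise from the OS scheduler.
print(min(timeit.repeat(stmt, number=1_000, repeat=5)))
```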
""" And sometimes more so than the naïve sum of their individual contributions! Although this was often much clearer on older architectures and compilers (both more predictable). Indeed, in the old days I routinely checked "optimizations" in that _slowed_ microbenchmarks, provided they reduced the work on the critical path, and didn't make the code markedly harder to follow (but reducing the critical path usually makes the code clearer!). Because, sooner or later, compilers will get smart enough to see what I saw, and generate better code. And then these can compound. Like "oh! these three temp variables don't actually need to be materialized at all anymore, and suddenly I have few enough that do need to be materialized that _all_ of them can live in fast registers...". So, at some point, the next seemingly insignificant optimization checked in could _appear_ to have an "inexplicable" benefit, by breaking the bottleneck on _some_ inscrutable HW resource overtaxed on the critical path by the original code. So optimize for what a smart compiler will eventually do, not what they happen to do today ;-) Profile-guided optimization was a major step in that direction. Saddest to me: layers of indirection introduced to support marginal features, because "well, they don't slow down pybench by much at all". That's the opposite: over time, they can compound to do worse damage than the sum of their individual burdens. In the end, Everything Counts™. |
Statically, in the stdlib, about 1 in 1000 opcodes is an ADD_INT candidate. I'll have to do some more work to come up with a dynamic count.
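A rough sketch of how such a static count can be taken (this is my own illustration, not the script behind the 1-in-1000 figure; the BINARY_ADD opname assumes CPython 3.10 or earlier, and the stdlib glob path is a placeholder to adjust for your system):

```python
import dis
import glob

SMALL_INTS = range(-5, 257)  # arbitrary cutoff for "small, common" constants

def count_add_int_candidates(paths):
    """Count LOAD_CONST <small int> immediately followed by BINARY_ADD."""
    total = candidates = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                code = compile(f.read(), path, "exec")
        except (SyntaxError, ValueError, UnicodeDecodeError, OSError):
            continue
        todo = [code]
        while todo:                        # walk nested code objects too
            co = todo.pop()
            todo.extend(c for c in co.co_consts if hasattr(c, "co_code"))
            instrs = list(dis.get_instructions(co))
            total += len(instrs)
            for a, b in zip(instrs, instrs[1:]):
                if (a.opname == "LOAD_CONST" and type(a.argval) is int
                        and a.argval in SMALL_INTS
                        and b.opname == "BINARY_ADD"):
                    candidates += 1
    return candidates, total

cand, total = count_add_int_candidates(glob.glob("/usr/lib/python3.10/*.py"))
print(f"{cand} candidates out of {total} opcodes ({cand / total:.3%})")
```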
I would expect that ADD_INT would typically occur in loops, so even if it is a rare opcode statically, it will really count when it is used.
So dynamically, I see about 0.5% of all opcodes are ADD_INT -- higher than statically (which was 0.1%), but still small enough that I'm abandoning work on ADD_INT.

I've got a branch where I'm adding POP_JUMP_IF_NONE and POP_JUMP_IF_NOT_NONE (which capture if x is None and if x is not None), but the static count there is still pretty low, around 0.2%. These numbers are remarkably stable -- I get 0.2% exactly for mypy, 0.18% for the stdlib. (A bit more in the azure-cli source code: 0.35%.) I expect that the dynamic count here would be a bit higher too: without running it, I'd expect around 1%, if we take it that the average loop does about 5 iterations -- my results for ADD_INT would lead to that conclusion, and this was independently verified by an Instagram blog post from 2017. (https://instagram-engineering.com/profiling-cpython-at-instagram-89d4cbeeb898, search for "loopyness")

Is that enough to persevere? I really don't know. I have a few other potential opcode combinations, RETURN_CONST and RETURN_NONE, but since those (by definition) don't occur in a loop, I'm even less optimistic about the value of those.

We really should optimize things like LOAD_FAST + LOAD_FAST, which occurs much more frequently. However, the way to encode that combination is pretty gross, and I'd rather do that another time.
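For context, here is my own illustration (version dependent) of the sequence that the POP_JUMP_IF_(NOT_)NONE opcodes collapse:

```python
import dis

def f(x):
    if x is None:
        return 0
    return x

# On CPython 3.9/3.10 the test compiles to roughly:
#     LOAD_FAST            x
#     LOAD_CONST           None
#     IS_OP                0
#     POP_JUMP_IF_FALSE    <past the body>
# which the combined opcodes fold into a single POP_JUMP_IF_NOT_NONE
# (or POP_JUMP_IF_NONE for the inverted test).  Those opcodes exist in
# current CPython; the exact names and operands vary by version.
dis.dis(f)
```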
Interesting. What code did you use to collect statistics? I used patches in bpo-27255. Perhaps it is worth adding optionally compiled code for collecting dynamic opcode statistics permanently.

We have around 125 opcodes, so 0.5% looks too small. But what is the rank of that opcode? How many opcodes are more frequent? And what is the rank of BINARY_ADD, for comparison?

RETURN_CONST or RETURN_NONE was already discussed before. They can save a few bytes in bytecode, but they are never used in tight loops and the time difference is dwarfed by the frame creation overhead in any case. For frequently used pairs of opcodes we have the PREDICT() macro, which reduces overhead.
For static statistics I wrote my own script: https://github.com/python/cpython/pull/25090/files#diff-994d3592c951c78cbe71084d562590d1507ddfed767e2ec040f5e2610845a11c. I can add that to Tools permanently (I have a slightly improved version that can look for different things).
We have that already: compile with -DDYNAMIC_EXECUTION_PROFILE -DDXPAIRS. There's a script (Tools/scripts/analyze_dxp.py) that works with such data, but it's primitive. Eric is working on improvements; I added my own hacky script to run an app and collect the data. I collected a few different sets of statistics: static stats for the stdlib and for mypy, dynamic stats for running mypy and a few of the benchmarks in pyperformance. Eric and I have plans to do this more systematically; we'll then publish our tools and results.
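As a hedged sketch of what consuming that data can look like: sys.getdxp() only exists in interpreters built with those flags, and the pair layout assumed below (counts indexed as profile[previous][next]) is what Tools/scripts/analyze_dxp.py expects, so verify it against your build before trusting the output.

```python
# Sketch only: requires a CPython built with
#   -DDYNAMIC_EXECUTION_PROFILE -DDXPAIRS
# On a normal build sys.getdxp does not exist at all.
import opcode
import sys

def top_pairs(n=20):
    profile = sys.getdxp()              # assumed: list of per-opcode rows
    pairs = []
    for prev, row in enumerate(profile):
        if not isinstance(row, list):   # non-DXPAIRS build: flat counts
            continue
        for nxt, count in enumerate(row):
            if (count and prev < len(opcode.opname)
                    and nxt < len(opcode.opname)):
                pairs.append((count, opcode.opname[prev], opcode.opname[nxt]))
    for count, a, b in sorted(pairs, reverse=True)[:n]:
        print(f"{count:>12}  {a} -> {b}")

# Run the workload of interest first (import and exercise the app),
# then call top_pairs() to see the most frequent opcode pairs.
```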
I'm sure you've considered this, but I'd be *very* careful using opcode stats from the official benchmarks to inform work on these improvements. It's probably better to stick to in-sample data from the stdlib / mypy / etc. and use the benchmarks *only* for measuring out-of-sample results. Otherwise we may just end up hacking the benchmark suite. :)
Yeah, I've received many warnings about benchmark hacking already, and we
For static statistics, it's easy enough to either download a thousand of
For dynamic statistics, the problem is always to come up with a
Running test suites (whether it's the stdlib tests or some other set of
I wish it was the case that static stats can be a good proxy for dynamic
We now have POP_JUMP_IF_NONE and LOAD_FAST__LOAD_FAST. So most of the ideas here have been implemented, and a few of the others were rejected. If nobody objects, I will close this as completed.