-
Notifications
You must be signed in to change notification settings - Fork 52
Specialize binary ops according to reference count as well as type. #662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I performed an experiment to see how well binary ops specialize for reference count and immortality. They specialize quite well, both for the fraction specialized and the hit rates. Refcount stability seems quite good. Numbers are in millions of instructions. Full stats
So if we specialize just for refcounts, we will want to specialize for Specializing for type and refcountIf we are specializing for refcount and type, we will obviously need to use some sort of table approach to handle the many combinations. We will need to guard on the refcounts, and on the types. Both types of guards can be potentially removed in the tier 2 optimizer. |
We can split specialized For the operations we care about types and where the left or right side has refcount of 1. Here's the data for all
|
Top 100 most common type/op/type/refcount quads, including indexing and comparisons:
|
Uh oh!
There was an error while loading. Please reload this page.
By specializing
BINARY_OP
by reference count as well as by type we can potentially speedup tier 2 jitted binary operations quite a lot.It is quite common that one of the operands of
BINARY_OP
has a reference count of 1, so it is profitable to avoid object allocation and reuse that object. We already do that forfloat
specializations ofBINARY_OP
.The problem is that the resulting code is quite branchy, and thus bulky, which is a problem for the jit.
If we specialize by refcount, the code size shrinks dramatically.
For example, the x86-64 code size for
_BINARY_OP_MULTIPLY_FLOAT
is 232 bytes.If we specialize for the left hand side having a refcount of 1 the equivalent instruction is 143* with a guard for the refcount.
If we assume that the optimizer can remove the guard, then the size drops to 74 bytes.
If we also specialize for the right hand side having a refcount > 1, then the code size (without guards) drops to 41 bytes, less than 20% of the original code size.
Pyston performs this optimization. It is, I believe, the only specialization that Pyston does that we don't.
There isn't much room for more tier 1 instructions, but by combining this approach with #660 we can implement this without excessive tier 1 specialization.
Currently,
BINARY_OP
specializes to one of:We can replace these with:
The
L1
andR1
suffices mean "left has refcount of one" and "right has refcount of one" respectively.int
is still worth treating separately because it is both common and more awkward due to the caching of small values.(ignoring BINARY_OP_INPLACE_ADD_UNICODE as a special case)
*This will go down once we have hot/cold splitting
The text was updated successfully, but these errors were encountered: