Speeding up arithmetic.

Python int arithmetic is very slow. We need to do a lot of work, checking the size and shape of ints before adding them, and then yet more work boxing them.

We want to minimize the amount of work we do. Many arithmetic expressions consume temporary values which we can reuse, if we discard "small ints" and use information that the compiler can gather to determine whether an operation is one of the forms:

temporary_value + ...
var += ... or var = var + ...

In the first case, the refcount of temporary_value is likely to be 1, and we can reuse the object.
In the second case, if the refcount of var is 2, which is likely, we can reuse the object.

The above is impossible with "small ints" because if the result is a small int, we need to use the immortal small object and free the temporary.

We can still have small ints. There are useful in lots of places, but we need to get rid of the requirement that we must use them, that (10 + 10) is 20.

What the specialized form would look like

We want the specialization of + for ints, BINARY_OP_ADD_INT_REUSE_LEFT, to be fully inline and do no allocation when the lhs has the expected refcount:
Something like:

        inst(BINARY_OP_ADD_INT_REUSE_LEFT, (unused/1, left, right -- sum)) {
            assert(cframe.use_tracing == 0);
            DEOPT_IF(!PyLong_CheckExact(left), BINARY_OP);
            DEOPT_IF(Py_TYPE(right) != Py_TYPE(left), BINARY_OP);
            DEOPT_IF(Py_REFCNT(left) != 1, BINARY_OP); /* Or two for the x += ... case */
            DEOPT_IF((left->tagged_bits | right->tagged_bits) & 3, BINARY_OP); /* Both are mortal ints, not longs */
            DEOPT_IF((add_overflows(left->tagged_bits, right->tagged_bits), BINARY_OP);
            STAT_INC(BINARY_OP, hit);
            left->tagged_bits = (left->tagged_bits + right->tagged_bits) & ~(1<<2);
            _Py_DECREF_SPECIALIZED(right, (destructor)PyObject_Free);
        }

We attempted this before, but the code gets messy and very branchy handling all the special cases due to small ints, and refcounts.

We will want a bit of help from the compiler, to mark which binary ops are of the form x += ... or x = x + ..., that way we will know whether check for refcount 1 or 2 at specialization and runtime.

Prototyping on floats

We can implement the reference counting specializations first for floats which don't have to worry about overflow or immortal values. This should tell us whether we can reuse objects often enough for this sort of refcounting optimization will be beneficial.

More efficient implementation of integers #548

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions