Insert shim frames at entry points to the interpreter. #436

Closed
markshannon opened this issue Jul 27, 2022 · 6 comments
Labels
epic-specialization More specialization work for 3.12

Comments

@markshannon
Member

We should insert shim frames where the C-API calls into the interpreter.
The idea is that we can simplify returns and yields, as they can assume that it is safe to just pop the current frame and continue interpretation.

The places where we enter the interpreter from the C-API are:
PyEval_EvalFrame
PyEval_EvalFrameEx
_PyEval_Vector
The interpreter is also called from gen_send_ex2, but that's not directly part of the C-API, although its callers are.

For the simplest case, the three C-API functions listed above (that is, not gen_send_ex2), we need to push a shim frame with a single stack entry and EXIT_INTERPRETER as its sole instruction.

This allows us to simplify RETURN_VALUE and RETURN_GENERATOR as they no longer need to check whether the frame is an entry frame.

RETURN_VALUE goes from

    PyObject *retval = POP();
    _PyFrame_SetStackPointer(frame, stack_pointer);
    TRACE_FUNCTION_EXIT();
    DTRACE_FUNCTION_EXIT();
    _Py_LeaveRecursiveCallTstate(tstate);
    if (!frame->is_entry) {
        frame = cframe.current_frame = pop_frame(tstate, frame);
        _PyFrame_StackPush(frame, retval);
        goto resume_frame;
    }
    /* Restore previous cframe and return. */
    tstate->cframe = cframe.previous;
    tstate->cframe->use_tracing = cframe.use_tracing;
    return retval;

to

    PyObject *retval = POP();
    _PyFrame_SetStackPointer(frame, stack_pointer);
    TRACE_FUNCTION_EXIT();
    DTRACE_FUNCTION_EXIT();
    _Py_LeaveRecursiveCallTstate(tstate);
    assert(!frame->is_entry);
    frame = cframe.current_frame = pop_frame(tstate, frame);
    _PyFrame_StackPush(frame, retval);
    goto resume_frame;

Similarly for RETURN_GENERATOR.

The new EXIT_INTERPRETER instruction is defined as:

    PyObject *retval = POP();
    /* Restore previous cframe and return. */
    tstate->cframe = cframe.previous;
    tstate->cframe->use_tracing = cframe.use_tracing;
    return retval;
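
The effect of the shim frame can be modeled with a toy stack machine in Python. This is only a sketch of the idea, not CPython's implementation: the `Frame` class, the `run` function, and the `CALL`, `ADD` and `LOAD_CONST` handling are hypothetical stand-ins. Only the roles of `RETURN_VALUE` and `EXIT_INTERPRETER` follow the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    code: list                      # list of (opcode, argument) pairs
    pc: int = 0                     # index of the next instruction
    stack: list = field(default_factory=list)


def run(entry_code, functions):
    """Evaluate entry_code; `functions` maps names to instruction lists."""
    # The shim frame has one instruction and receives one stack entry:
    # the value returned by the frame above it.
    shim = Frame(code=[("EXIT_INTERPRETER", None)])
    frames = [shim, Frame(code=entry_code)]
    while True:
        frame = frames[-1]
        op, arg = frame.code[frame.pc]
        frame.pc += 1
        if op == "LOAD_CONST":
            frame.stack.append(arg)
        elif op == "ADD":
            right = frame.stack.pop()
            left = frame.stack.pop()
            frame.stack.append(left + right)
        elif op == "CALL":
            frames.append(Frame(code=functions[arg]))
        elif op == "RETURN_VALUE":
            # No "is this the entry frame?" check: there is always a
            # frame underneath, possibly the shim.
            retval = frame.stack.pop()
            frames.pop()
            frames[-1].stack.append(retval)
        elif op == "EXIT_INTERPRETER":
            # Only the shim executes this; it leaves the interpreter loop.
            return frame.stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


functions = {
    "seven": [("LOAD_CONST", 3), ("LOAD_CONST", 4), ("ADD", None),
              ("RETURN_VALUE", None)],
}
main = [("CALL", "seven"), ("LOAD_CONST", 1), ("ADD", None),
        ("RETURN_VALUE", None)]
result = run(main, functions)
```

In this model the outermost `RETURN_VALUE` pushes its result onto the shim's stack, and the shim's single instruction then hands it back to the caller of `run`.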

So far, so good. But things get a bit more complex with YIELD_VALUE. First off we add a yield_offset to the interpreter frame, so that yielding goes to a different location than RETURN_VALUE. This should allow us to inline generator iteration and yield from in a similar way to calls.
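
A minimal sketch of the `yield_offset` idea, as a toy Python model rather than CPython code. The `Frame` class, the `resume` function, and the `POP_TOP` and `RAISE_STOP` opcodes are hypothetical, and the model ignores values sent into the generator. It shows only that a yield resumes the caller at `yield_offset`, while a return resumes it at the next instruction.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    code: list
    pc: int = 0
    stack: list = field(default_factory=list)
    # Where this frame resumes if the frame above it yields.
    yield_offset: Optional[int] = None


def resume(gen_frame):
    """Resume a suspended toy generator frame once."""
    driver = Frame(code=[
        ("ENTER_GENERATOR", 3),      # 0: set yield_offset = 3, run generator
        ("POP_TOP", None),           # 1: generator returned; drop its value
        ("RAISE_STOP", None),        # 2: signal exhaustion
        ("EXIT_INTERPRETER", None),  # 3: yield target; hand back the value
    ])
    frames = [driver]
    while True:
        frame = frames[-1]
        op, arg = frame.code[frame.pc]
        frame.pc += 1
        if op == "ENTER_GENERATOR":
            frame.yield_offset = arg
            frames.append(gen_frame)
        elif op == "LOAD_CONST":
            frame.stack.append(arg)
        elif op == "YIELD_VALUE":
            value = frame.stack.pop()
            frames.pop()             # generator frame keeps its pc and stack
            caller = frames[-1]
            caller.stack.append(value)
            caller.pc = caller.yield_offset
        elif op == "RETURN_VALUE":
            value = frame.stack.pop()
            frames.pop()
            frames[-1].stack.append(value)  # caller continues at its own pc
        elif op == "POP_TOP":
            frame.stack.pop()
        elif op == "RAISE_STOP":
            raise StopIteration
        elif op == "EXIT_INTERPRETER":
            return frame.stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


gen = Frame(code=[
    ("LOAD_CONST", 1), ("YIELD_VALUE", None),
    ("LOAD_CONST", 2), ("YIELD_VALUE", None),
    ("LOAD_CONST", None), ("RETURN_VALUE", None),
])
values = [resume(gen), resume(gen)]
```

A third `resume(gen)` reaches the generator's `RETURN_VALUE`, so the driver falls through to `RAISE_STOP` and raises `StopIteration` instead of jumping to the yield target.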

To get this to work we will need to implement the following in bytecode:
gen.__next__(), gen.send(), gen.throw(), gen.close(), coro.send(), async_gen.throw(), etc.

gen.__next__() could be implemented as follows:

    LOAD_FAST 0 (self)
    SETUP_FINALLY error
    ENTER_GENERATOR yield_to # sets `yield_offset` to offset of yield_to label, then jumps into the generator
    POP_BLOCK
    LOAD_FAST 0 (self)
    GEN_CLEAR
    LOAD_CONST StopIteration
    RAISE_VARARGS 1
yield_to:
    POP_BLOCK
    RETURN_VALUE
error:
    MATCH StopIteration (Convert StopIteration into RuntimeError)
    ...
    RERAISE

The other functions are left as an exercise for the reader 🙂
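
Any bytecode version of gen.__next__() has to preserve the behavior that is observable today. The sketch below exercises the three outcomes the listing above distinguishes: a yielded value, normal exhaustion, and a StopIteration raised inside the generator body being converted to RuntimeError. It assumes CPython 3.7 or later; `countdown`, `leaky` and `outcome` are hypothetical helper names.

```python
def countdown():
    yield 2
    yield 1


def leaky():
    yield "ok"
    # A StopIteration raised inside the body must not escape unchanged.
    raise StopIteration("leaked")


def outcome(gen):
    """Call gen.__next__() once and describe what happened."""
    try:
        return ("yielded", gen.__next__())
    except StopIteration:
        return ("exhausted", None)
    except RuntimeError as exc:
        # The original exception is kept as the cause of the RuntimeError.
        return ("runtime-error", type(exc.__cause__).__name__)


g = countdown()
normal = [outcome(g), outcome(g), outcome(g)]

h = leaky()
converted = [outcome(h), outcome(h)]
```

The `error:` handler in the listing corresponds to the last branch of `outcome`: it is the path that turns a leaked StopIteration into a RuntimeError.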

We could start by implementing gen_send_ex2 as bytecode, then reimplement its callers in bytecode until we can discard gen_send_ex2.

It might also be useful to implement next() in bytecode to avoid the context swap.

next():

    LOAD_FAST 0 (self)
    FOR_ITER done
    RETURN_VALUE
done:
    LOAD_CONST StopIteration
    RAISE_VARARGS 1
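
The next() listing above corresponds roughly to the following pure-Python function. This is a sketch under assumptions, not a proposed implementation: `next_sketch` is a hypothetical name, it omits the optional default argument of the built-in, and the `for` statement calls `iter()` on its argument first, so unlike the built-in it also accepts plain iterables.

```python
def next_sketch(iterator):
    # FOR_ITER: fetch one item, or jump to "done" when exhausted.
    for value in iterator:
        return value          # RETURN_VALUE
    # done:
    raise StopIteration


it = iter([10, 20])
items = [next_sketch(it), next_sketch(it)]
```

Calling `next_sketch(it)` a third time raises `StopIteration`, matching the `done:` branch of the listing.
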
@gvanrossum
Collaborator

Interesting. The question is whether we save enough on simplified returns (really just one check, on a branch the CPU can probably predict is almost always taken) to pay for dispatching an extra opcode. Is this measurable?

Are there other benefits?

@markshannon
Member Author

> Are there other benefits?

Yes. This should allow us to inline generator iteration and yield from/await in a similar way to calls, which broadens the scope of any trace-based optimization to include generators and coroutines.

@mdboom mdboom added the epic-specialization More specialization work for 3.12 label Aug 2, 2022
@graingert

graingert commented Aug 11, 2022

If the behavior of yield from were changed to unwrap the raise RuntimeError("generator raised StopIteration") from StopIteration back into a regular StopIteration, would that make it easier to "inline generator iteration and yield from/await in a similar way to calls"?

@markshannon
Member Author

markshannon commented Aug 11, 2022

The current behavior of yield from is rather convoluted, but changing it is out of scope for this project.

If you want to propose to the wider community fixing all the weird corner cases, I'd be delighted.
But we are interested in improving performance without changing behavior.

@markshannon
Member Author

#457

@markshannon
Member Author

Done
