Skip to content

gh-120321: Lock gen_send #120327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Jun 10, 2024

Adds per-object lock for gen_send to fix crashes.

@markshannon
Copy link
Member

What's the root cause of #120321?
Until you know that, how do you know this fixes the problem?

This feels like slapping a lock on something and hoping it works.

@Fidget-Spinner
Copy link
Member Author

What's the root cause of #120321? Until you know that, how do you know this fixes the problem?

The crash's root cause is here https://github.com/python/cpython/pull/120327/files#diff-2a0c449b68605ebd0872fd232e60ce7e838a77782a6d2e364764f99514fb508aR219

Those 3 lines can race and cause tstate->exc_info along with gi_exc_state to become invalid.

I locked the whole thing because TSAN is warning not just those few lines. It warns that the _PyFrame_StackPush and all the other stack accessor functions in there are racing as well. Though I don't think that results in a crash.

@markshannon
Copy link
Member

What about all the other places that tstate->exc_info is updated when we push or pop a generator frame?

I suspect that this doesn't crash only because specialization is turned off for free-threading:

def gen():
    while True:
        yield

def my_next(it):
    for val in it:
        return value
    raise StopIteration

it = gen()
with concurrent.futures.ThreadPoolExecutor() as executor:
    while True:
        _ = executor.submit(lambda: my_next(it))

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Jun 11, 2024

I suspect that this doesn't crash only because specialization is turned off for free-threading:

Yeah the entire gen_send has multiple races in multiple places. Likewise for the specializing interpreter. So the entire thing needs to be locked (what the GIL does at the moment).

@markshannon
Copy link
Member

gen_send isn't the problem. It contains an instance of the problematic code, but there are instances in bytecodes.c in _SEND and _FOR_ITER_GEN`.

Pushing and popping generator frames needs to be locked. The best way to do that would be to move popping and pushing into its own helper function.

@Fidget-Spinner
Copy link
Member Author

I agree that pushing and popping frames need fixing, but gen_send is still problematic even with that. It has non-atomic accesses of fields on tstate and the frame which another thread could be concurrently modifying. In fact even after factoring out the generator frame pushing and locking just that, I still get crashes from the other issues in gen send.

@markshannon
Copy link
Member

OOI, what are the other crashes?

@Fidget-Spinner
Copy link
Member Author

A selection of them after I've locked just the frame pushing

python: Objects/genobject.c:235: gen_send_ex2_unlocked: Assertion `gen->gi_exc_state.previous_item == NULL' failed

./Include/internal/pycore_frame.h:313: _PyFrame_GetGenerator: Assertion `frame->owner == FRAME_OWNED_BY_GENERATOR' failed.

./Include/internal/pycore_frame.h:77: _PyFrame_GetCode: Assertion `PyCode_Check(f->f_executable)' failed.

@DinoV
Copy link
Contributor

DinoV commented Jun 12, 2024

I wonder if we really only need the locks up to the transition to gen->gi_frame_state = FRAME_EXECUTING;. That would seem to work better with fixing SEND in the opcode as well as doing the unlocking there isn't going to be as easy.

I think other than the asserts that only leaves if (FRAME_STATE_SUSPENDED(gen->gi_frame_state)) { as the lone racing read. I'm not sure that this is fully really safe even w/ the GIL though. It seems like _PyEval_EvalFrameDefault wouldn't release the GIL between yielding a value and getting back to here, but something that's hooked the evaluation function could indeed release the GIL. So this is really already racing against another thread resuming the generator already (and you shouldn't get an interpreter crash, just odd results from trying to run a generator from multiple threads at the same time, which isn't really a sane thing to do anyway).

@vstinner
Copy link
Member

vstinner commented Nov 6, 2024

What's the status of this PR? It now has multiple merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants