Skip to content

gh-91869: Fix tracing of specialized instructions with extended args #91945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 28, 2022

Conversation

sweeneyde
Copy link
Member

@sweeneyde sweeneyde added type-bug An unexpected behavior, bug, or error interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.11 only security fixes labels Apr 26, 2022
@markshannon
Copy link
Member

I'm worried about keeping the adaptive form of the EXTENDED_ARG and its following opcode in sync. Adding instrumentation is likely to make this even more complex.

I think we should go back to the old approach of simply not quickening any opcode that is preceded by an EXTENDED_ARG.

According to the latest stats, EXTENDED_ARG is only 0.4% of the opcodes executed and only a few of those are candidates for specialization.

The new test looks good.

@sweeneyde
Copy link
Member Author

sweeneyde commented Apr 26, 2022

I think we should go back to the old approach of simply not quickening any opcode that is preceded by an EXTENDED_ARG.

It looks like FOR_ITER has an EXTENDED_ARG 7.7% of the time, and that's one motivation for this.

I'm not sure I understand what you mean by "in sync" -- tracing doesn't modify any opcodes in place, and the only difference in behavior between EXTENDED_ARG/EXTENDED_ARG_QUICK here is whether they do (the same-oparg equivalent of) NEXTOPARG() or TRACING_NEXTOPARG().

The in-sync-ness is guaranteed by the implementation of EXTENDED_ARG/EXTENDED_ARG_QUICK not checking for tracing before jumping straight into the next instruction, and adaptive instructions don't clobber the opcode because of NOTRACE_DISPATCH_SAME_OPARG.

Before this PR, when running EXTENDED_ARG+LOAD_GLOBAL_MODULE with tracing enabled, this happens:

* previous instruction does opcode |= 255
* DO_TRACING for EXTENDED_ARG
* TRACING_NEXTOPARG() takes us back to EXTENDED_ARG
* EXTENDED_ARG sets the big oparg
* EXTENDED_ARG jumps straight to LOAD_GLOBAL_MODULE => assertion failure.

With this PR, running EXTENDED_ARG_QUICK+LOAD_GLOBAL_MODULE with tracing enabled:

* previous instruction does opcode |= 255
* DO_TRACING for EXTENDED_ARG(_QUICK)
* TRACING_NEXTOPARG() takes us back to (deoptimized) EXTENDED_ARG
* EXTENDED_ARG sets the big oparg
* EXTENDED_ARG jumps straight to (deoptimized) LOAD_GLOBAL

This change should also make it impossible for superinstructions like LOAD_FAST__STORE_FAST with EXTENDED_ARG to be executed while tracing is enabled, which I think may fix some bugs as well -- the second instruction in the pair should be able to provoke tracing.

@markshannon
Copy link
Member

My concern was that we might optimize (or deoptimize) the EXTENDED_ARG, but not the following instruction (or vice versa).

But, from what you are saying that should be OK as EXTENDED_ARG is really the special form, only needed when tracing and EXTENDED_ARG_QUICK is the normal form. Is that correct?

Would it better to never quicken EXTENDED_ARG, but have the compiler emit EXTENDED_ARG_QUICK directly (so EXTENDED_ARG only exists for tracing).

@sweeneyde
Copy link
Member Author

My concern was that we might optimize (or deoptimize) the EXTENDED_ARG, but not the following instruction (or vice versa).

But, from what you are saying that should be OK as EXTENDED_ARG is really the special form, only needed when tracing and EXTENDED_ARG_QUICK is the normal form. Is that correct?

Yes, the current PR's EXTENDED_ARG primarily exists for tracing. It should always be safe to switch from EXTENDED_ARG_QUICK to EXTENDED_ARG because it's always safe (right?) to do as tracing does and execute _PyOpcode_Deopt[opcode] instead of opcode.

Would it better to never quicken EXTENDED_ARG, but have the compiler emit EXTENDED_ARG_QUICK directly (so EXTENDED_ARG only exists for tracing).

That would make sense -- I'll see how hard it is to implement EXTENDED_ARG/EXTENDED_ARG_FOR_TRACING instead, such that _PyOpcode_Deopt[EXTENDED_ARG] == EXTENDED_ARG_FOR_TRACING.

@sweeneyde
Copy link
Member Author

#91974 implements that idea. It slightly complicates the meaning of _PyOpcode_Deopt[] and which opcodes are included in dis.opmap and such, but it does make it so that EXTENDED_ARG_TRACE is only ever executed if tracing is happening.

I think I'd lean toward this PR (91945), but either is fine by me.

@markshannon
Copy link
Member

You can leave EXTENDED_ARG and EXTENDED_ARG_QUICK; no need for EXTENDED_ARG_FOR_TRACING or to change _PyOpcode_Deopt

I tried removing the case for EXTENDED_ARG from _PyCode_Quicken and emitting EXTENDED_ARG_QUICK instead of EXTENDED_ARG in write_instr and all the test pass.

@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 27, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 6591ba1 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 27, 2022
@markshannon markshannon merged commit 37c6db6 into python:main Apr 28, 2022
@markshannon
Copy link
Member

@sweeneyde Thanks for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants