We need to be consistent in our use of instruction/codeunit/bytecode/opcode, etc. #94437
Comments
I have a feeling that all three (instruction, execution unit, and code unit) can be seen as instructions, just of different tiers. The tiers can be named, for example:
I like instruction, pseudo-instruction, code-unit, and bytecode, but I'm not excited about execution-unit (too long, not quite self-explanatory enough, and I don't think we've been using that term). Maybe the latter could be "concrete instructions"? Longer, but more self-explanatory. If this was a classical assembly language, those pseudo-instructions would be macros, and I wouldn't object to reusing that term instead of pseudo- (or virtual-?) instructions. In Mark's definitions, are cache entries code-units or not? The definition of bytecode seems to imply they are, but the definition of code-unit seems to imply they're not (cache entries don't have an opcode and oparg). Agreed that this documentation doesn't belong in dis.rst.
There's a lot of good information about how the CPython interpreter works in the devguide. I wonder if it makes sense to split that out into a separate document solely focused on how the CPython interpreter (parser, etc.) works? Then such definitions would make sense going there. Maybe it still makes sense to cover this in the devguide in the meantime?
Oh, I always forget about the devguide! The section on the PEG parser is great (thanks Pablo) but the compiler design section seems outdated (there is no NEXT_BLOCK macro in compile.c any more, and it even still lists the long-dead peephole.c file). I don't know whether those chapters would be served by moving them into yet another document -- we already have too many.
No, they are instructions that do not map directly to an execution-unit (or concrete instruction, using your terminology). This is consistent with assemblers for hardware machines, which have pseudo-instructions that are built into the assembler, not defined as macros.
You are correct.
Documentation
We use the terms opcode, bytecode, instruction, and codeunit in the code, comments, and documentation.
However, we aren't consistent, nor do we define those terms properly anywhere. The best docs are in dis.rst, which is the wrong place for them.
A glossary
First of all we want some sort of glossary like this:
Representation of instruction at runtime:
The assembler converts each instruction to zero or more execution-units, and each of those is converted to one or more code-units.
An execution-unit is composed of:
* Zero or more code-units whose opcode == EXTENDED_ARG and whose oparg is 8 of the high bits of the instruction's operand.
* One code-unit whose opcode represents the name of the instruction, and whose oparg == (operand & 255).
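As a rough sketch of the composition just described, the following splits one execution-unit into code-units. The function name `to_code_units` and the use of opcode names as plain strings are hypothetical choices for illustration, not anything in CPython; the 32-bit operand limit is also an assumption of this sketch.

```python
def to_code_units(opname, operand):
    """Split one execution-unit into a list of (opname, oparg) code-units."""
    if not 0 <= operand < 2**32:
        # Assumption for this sketch: operands fit in 32 bits.
        raise ValueError("operand must fit in 32 bits")
    # The final code-unit names the instruction and carries the low 8 bits.
    units = [(opname, operand & 255)]
    operand >>= 8
    # Each remaining byte becomes an EXTENDED_ARG prefix,
    # ending up with the most significant byte first.
    while operand:
        units.insert(0, ("EXTENDED_ARG", operand & 255))
        operand >>= 8
    return units


print(to_code_units("LOAD_ATTR", 515))
# prints [('EXTENDED_ARG', 2), ('LOAD_ATTR', 3)]
```

Cache entries are left out here, since whether they count as code-units is exactly the open question in the comments.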
Although the bytecode, co.co_code, is presented as a sequence of bytes, it should be viewed as a sequence of codeunits, with the opcode preceding the oparg. The dis module will disassemble bytecode to a list of codeunits.
Why do this?
Doing this will expose inconsistencies in our terminology and tools and allow us to consider better tooling in the future.
For example, shouldn't dis output a list of instructions, not codeunits?
Could we support an assembler, allowing backwards compatible assembly code?
We could convert a list of 3.10 instructions to 3.11 bytecode. At the instruction level, they aren't so different, even though the bytecode is quite different.
The set of names is infinite, allowing us more flexibility to add new instructions, and support old ones.
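To make the code-unit view of co.co_code concrete, here is a minimal sketch that walks the bytes two at a time. It assumes the two-byte code-unit layout used by recent CPython versions; `code_units` is a hypothetical helper name, and the exact listing printed depends on the interpreter version.

```python
import dis


def code_units(co):
    """Yield (opcode, oparg) pairs from a code object's co_code."""
    raw = co.co_code
    # Each code-unit is two bytes: the opcode, then the oparg.
    for offset in range(0, len(raw), 2):
        yield raw[offset], raw[offset + 1]


co = compile("spam.eggs + 1", "<demo>", "eval")
for opcode, oparg in code_units(co):
    # dis.opname maps a numeric opcode to its name.
    print(dis.opname[opcode], oparg)
```

Note that this view shows EXTENDED_ARG prefixes (and, on versions that have them, cache entries) as separate rows, which is the gap between codeunits and instructions that the questions above are about.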
Examples
The BINARY_ADD instruction is also an execution-unit in 3.10, but could be a pseudo-instruction in 3.11+. Likewise SETUP_FINALLY. The difference is that the 3.11 front-end emits SETUP_FINALLY, but not BINARY_ADD.
* Instruction: LOAD_METHOD "spam"
* Execution unit: LOAD_ATTR 515
* Code units:
EXTENDED_ARG 2
LOAD_ATTR 3
CACHE 0 *6
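Going the other way, a disassembler that reported execution-units rather than code-units might fold the listing above roughly as follows. This is only a sketch: `to_execution_units` is a hypothetical name, opcode names are plain strings, and skipping CACHE entries is an assumption about how such a tool would treat them.

```python
def to_execution_units(code_units):
    """Fold (opname, oparg) code-units back into (opname, operand) pairs."""
    result = []
    operand = 0
    for opname, oparg in code_units:
        if opname == "CACHE":
            # Assumption: cache entries carry no operand and are dropped.
            continue
        # Shift in the next 8 bits of the operand.
        operand = (operand << 8) | oparg
        if opname == "EXTENDED_ARG":
            # Prefix only: keep accumulating until the real opcode arrives.
            continue
        result.append((opname, operand))
        operand = 0
    return result


units = [("EXTENDED_ARG", 2), ("LOAD_ATTR", 3), ("CACHE", 0)]
print(to_execution_units(units))
# prints [('LOAD_ATTR', 515)]
```

Mapping LOAD_ATTR 515 back to LOAD_METHOD "spam" would need the names table and a rule for the operand encoding, which this sketch does not attempt.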