Skip to content

bpo-43693: Add new internal code objects fields: co_fastlocalnames and co_fastlocalkinds. #26388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jun 3, 2021

Conversation

ericsnowcurrently
Copy link
Member

@ericsnowcurrently ericsnowcurrently commented May 26, 2021

A number of places in the code base (notably ceval.c and frameobject.c) rely on mapping variable names to indices in the frame "locals plus" array (AKA fast locals), and thus opargs. Currently the compiler indirectly encodes that information on the code object as the tuples co_varnames, co_cellvars, and co_freevars. At runtime the dependent code must calculate the proper mapping from those, which isn't ideal and impacts performance-sensitive sections. This is something we can easily address in the compiler instead.

This PR addresses the situation by replacing internal use of co_varnames, etc. with a single combined tuple of names in locals-plus order, along with a minimal array mapping each to its kind (local vs. cell vs. free). These two new PyCodeObject fields, co_fastlocalnames and co_fastllocalkinds, are not exposed to Python code for now, but co_varnames, etc. are still available with the same values as before (though computed lazily).

Aside from the (mild) performance impact, there are a number of other benefits:

  • there's now a clear, direct relationship between locals-plus and variables
  • code that relies on the locals-plus-to-name mapping is simpler
  • marshaled code objects are smaller and serialize/de-serialize faster

Also note that we can take this approach further by expanding the possible values in co_fastlocalkinds to include specific argument types (e.g. positional-only, kwargs). Doing so would allow further speed-ups in _PyEval_MakeFrameVector(), which is where args get unpacked into the locals-plus array. It would also allow us to shrink marshaled code objects even further.

(FYI, I have a later PR that relies on having this direct mapping instead of co_varnames.)

FWIW, if this PR is too big I can split it up between "add the fields" and "use the fields". However, typically in this project we keep those together in one change.

https://bugs.python.org/issue43693

@markshannon markshannon self-assigned this May 28, 2021
@markshannon
Copy link
Member

Some thoughts on the API.

I don't like the word "fast". It has no meaning and just makes names longer. I'd much prefer "variables" to "fastlocals".
The point of your refactoring was to hide implementation details, but by exposing things like the arrays of locals and kinds we are just exposing different ones.

So I was thinking...
What is the use case for accessing this data? Presumably debuggers, profilers and dis.
Would it not be more useful for them to be able to query the metadata, rather than having to parse it?
E.g. (and this is just off the top of my head)

class CodeType:
     def get_variable_name(self, n:int):
           "Return the names of the nth variable"
     def get_variable_scope(self, n:int):
           "Returns the scope of the nth var (one of LOCAL, ESCAPES, FREE)"
     def get_variable_kind(self, n:int):
           "Return the kind of nth var (one of LOCAL, ARG, VARARG, etc)"

or

class CodeType:
     def get_variable_metadata(self, n:int):
          "Returns a triple (name, scope, kind) for nth variable"

What do you think?

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple drive-by comments.

@gvanrossum
Copy link
Member

I don't like the word "fast". It has no meaning and just makes names longer. I'd much prefer "variables" to "fastlocals".

Yeah, they were fast several decades ago when this was an improvement over storing locals in a dict. :-) But it's a traditional name, and changing it would just add a bunch of additional unneeded churn (e.g. we have LOAD_FAST etc. -- to be consistent you'd have to rename those too).

Plus, these locals really are pretty fast. :-)

[...]

Would it not be more useful for them to be able to query the metadata, rather than having to parse it?

At least some debuggers access this data in C, so we'd have to introduce C API functions as well. I don't think we'll actually be able to come up with abstractions that won't have to change if this stuff changes again in the future.

I think that at least for this PR, we should leave well enough alone -- at some point in the future, co_fastlocalnames and friends will be presumably lazily computed, at least when accessed from Python code (e.g. pdb, dis).

@markshannon
Copy link
Member

If the abstraction models the language, then it shouldn't need to change (unless the language does).

The scope (local, escapes, or free), the kind (arg, kwarg, non-arg, etc), and the name are all properties of the variable and independent of the data structures used.

@gvanrossum
Copy link
Member

gvanrossum commented May 28, 2021 via email

@markshannon
Copy link
Member

Why not do it now? What is the advantage of exposing things to hide them later.

@ericsnowcurrently
Copy link
Member Author

Some thoughts on the API.

[...]

What do you think?

I'm fine with taking a more neutral approach, along the lines you suggested. However, I don't want to introduce a new public API (in Python or in C) in this PR.

@ericsnowcurrently
Copy link
Member Author

@markshannon, please take another look at this PR when you get the chance.

@markshannon
Copy link
Member

Windows warnings need fixing: https://github.com/python/cpython/pull/26388/files#diff-ebc983d9f91e5bcf73500e377ac65e85863c4f77fd5b6b6caf4fcdf7c0f0b057R7253

You say that you don't want to add a public API, but that is what you are doing by adding co_fastlocalnames and co_fastlocalkinds.
I'd rather that we added the API first then changed the implementation behind that API, but to get things moving can you just make the attributes "private" by prefixing them with an underscore?

@ericsnowcurrently
Copy link
Member Author

You say that you don't want to add a public API, but that is what you are doing by adding co_fastlocalnames and co_fastlocalkinds.
I'd rather that we added the API first then changed the implementation behind that API, but to get things moving can you just make the attributes "private" by prefixing them with an underscore?

You mean the fields on the struct? I was talking about the attributes exposed to Python code. As far as the struct goes, I'm fine with adding an underscore prefix but I don't see the point since using the fields is required and given the class of expected users. I'd rather make PyCodeObject opaque in the public C-API (moving it to Include/internal/pycore_code.h), but that is a separate matter.

In the meantime I will add an undercore prefix to those fields so we can move this along.

@markshannon
Copy link
Member

I meant the Python attributes, but it looks like you've already removed them 👍

@ericsnowcurrently
Copy link
Member Author

Thanks, @markshannon!

@ericsnowcurrently ericsnowcurrently merged commit 2c1e258 into python:main Jun 3, 2021
@ericsnowcurrently ericsnowcurrently deleted the fast-local-kinds branch June 3, 2021 16:28
pablogsal added a commit to pablogsal/cpython that referenced this pull request Jun 4, 2021
pablogsal added a commit that referenced this pull request Jun 4, 2021
* Revert "bpo-43693: Compute deref offsets in compiler (gh-25152)"

This reverts commit b2bf2bc.

* Revert "bpo-43693: Add new internal code objects fields: co_fastlocalnames and co_fastlocalkinds. (gh-26388)"

This reverts commit 2c1e258.

These two commits are breaking the refleak buildbots.
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this pull request Jun 7, 2021
These were reverted in pythonGH-26530 (commit 17c4edc).

* Compute deref offsets in compiler (pythongh-25152)
* Add new internal code objects fields: co_fastlocalnames and co_fastlocalkinds. (pythongh-26388)
ericsnowcurrently added a commit that referenced this pull request Jun 7, 2021
These were reverted in gh-26530 (commit 17c4edc) due to refleaks.

* 2c1e258 - Compute deref offsets in compiler (gh-25152)
* b2bf2bc - Add new internal code objects fields: co_fastlocalnames and co_fastlocalkinds. (gh-26388)

This change fixes the refleaks.

https://bugs.python.org/issue43693
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants