Skip to content

bpo-45653: Freeze parts of the encodings package #30030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

tiran
Copy link
Member

@tiran tiran commented Dec 10, 2021

The PR is inspired by Kumar's PR GH-29788. I extended _PyCodecRegistry_Init to set the encodings.__path__ to [os.path.join(config->stdlib_dir, "encodings)].

Signed-off-by: Christian Heimes [email protected]
Co-authored-by: Kumar Aditya [email protected]

https://bugs.python.org/issue45653

Comment on lines +1409 to +1417
/* Set encodings.__path__ for frozen encodings package
*
* The encodings package is frozen but most encodings modules are not. The
* __path__ attribute of the encodings package must be reset so importlib is
* able to find the pure Python modules.
* Returns -1 on error
*/
static int
_set_encodings_path(PyObject *mod) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... This should not be necessary. It is already handled by FrozenImporter.find_spec() (in Lib/importlib/_bootstrap.py). If path is not getting set then something went wrong and needs to be fixed.

Is it here to provide a fallback for the config->stdlib_dir == NULL case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is not working right when embedding Python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😞

I'll try to take a look this week.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I spent some time looking at this today. I take back what I said: "This should not be necessary." The approach you took is probably good enough until we can find a better solution. I plan on troubleshooting the test_embed failures if you don't figure them out first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that sys._stdlib_dir is set to None. This can happen in some embedding scenarios (for now). This is the reason why FrozenImporter.find_spec() doesn't populate encodings.__path__ in the failing tests. It is why _set_encodings_path() is failing.

The problem is that sys._stdlib_dir is set to None. sys._stdlib_dir is set from _Py_GetStdlibDir(), which returns the value calculated by the getpath.c code during runtime init. In some embedding cases that code refuses to extrapolate the stdlib dir, so it ends up None.

FrozenImporter.find_spec() uses sys._stdlib_dir to figured out the encodings.__path__ entry to add. If the stdlib dir is unknown then it doesn't add any. This is also why _set_encodings_path() isn't working.

We have several options:

  1. extrapolate sys._stdlib_dir some other way
  2. fall back to using the non-frozen encodings module
  3. freeze all "encodings" submodules

(2) effectively accomplishes the same thing as (1), though it doesn't actually update sys._stdlib_dir. Furthermore, we already know it works. (2) also has the benefit of being very simple, since we'd use the normal import machinery unchanged. (Note that (1) and (2) are not guaranteed to find the stdlib dir. However, with (2) that failure mode already exists, so embedders would already have to deal with it.) (3) would get what we want but would make the compiled binary bigger and would add a bunch of noise to make output when building.

So I recommend (2).


(2) involves 2 things: drop _set_encodings_path() here, and update FrozenImporter.find_spec().

diff --git a/Lib/importlib/_bootstrap.py b/Lib/importlib/_bootstrap.py
index afb95f4e1d..4200afae7c 100644
--- a/Lib/importlib/_bootstrap.py
+++ b/Lib/importlib/_bootstrap.py
@@ -941,6 +941,9 @@ def find_spec(cls, fullname, path=None, target=None):
                                 origin=cls._ORIGIN,
                                 is_package=ispkg)
         filename, pkgdir = cls._resolve_filename(origname, fullname, ispkg)
+        if ispkg and not pkgdir:
+            # We can't resolve __path__, so Fall back to the path finder.
+            return None
         spec.loader_state = type(sys.implementation)(
             filename=filename,
             origname=origname,

@gvanrossum
Copy link
Member

The question is, do we still need this version, or is Kumar's original #29788 sufficient? I'd like to have a single PR that we can actually merge. :-)

@tiran tiran force-pushed the bpo-45653-encodings branch from 0ce223d to eef175c Compare January 7, 2022 22:27
@tiran tiran marked this pull request as ready for review January 8, 2022 11:08
@tiran tiran requested a review from a team as a code owner January 8, 2022 11:08
@tiran tiran closed this Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants