Skip to content

bpo-45653: Freeze encodings package modules #29788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

kumaraditya303
Copy link
Contributor

@kumaraditya303 kumaraditya303 commented Nov 26, 2021

Freezes encodings modules as suggested by @gvanrossum

Oh, and it would be nice to be able to freeze some submodules in a package but not all of them -- e.g. I'd like to freeze encodings/{init,utf_8,aliases}.py but not the remaining 100 submodules in that package.

https://bugs.python.org/issue45653

@FFY00
Copy link
Member

FFY00 commented Nov 26, 2021

I am alright with continuing it but if you want, I can take it. I will close this in favor of your PR 😊

@FFY00 FFY00 closed this Nov 26, 2021
@FFY00 FFY00 reopened this Nov 26, 2021
@FFY00
Copy link
Member

FFY00 commented Nov 26, 2021

Oops, wrong PR 🤦

@gvanrossum
Copy link
Member

Do you have a clue about the test failures yet? (You know you can run tests locally right? Etc.)

@kumaraditya303
Copy link
Contributor Author

I am on Windows, and even in debug builds frozen modules are used which is kinda confusing because on Linux in debug mode frozen modules are not used. I know how to run tests but when I run test_embed on windows it skips it saying that _testembed.exe doesn't exists, even though it exists. I'll see if I can fix it.

@gvanrossum
Copy link
Member

I am on Windows, and even in debug builds frozen modules are used which is kinda confusing because on Linux in debug mode frozen modules are not used. I know how to run tests but when I run test_embed on windows it skips it saying that _testembed.exe doesn't exists, even though it exists. I'll see if I can fix it.

You are seeing the effect of https://bugs.python.org/issue45651. To force frozen modules on/off you can pass -X frozen_modules={on,off}. I don't know what's up with _testembed.exe, but it could be because of the current directory.

@gvanrossum
Copy link
Member

@ericsnowcurrently Do you have any ideas to share on where to look for these failures? Apparently (deep)freezing part of a package is not fully supported yet.

@kumaraditya303 I'll look into this later this week if we don't hear from Eric, but my hunch is that we need to add code (I don't know where yet) that ties the frozen package to the on-disk directory for the same package, so that from encodings import blah will first look for a (deepfrozen) encodings.blah and if that's not found it will look in the encodings directory.

PS. If you want your fix for test_embed incorporated, rebase on the new main branch.

@gvanrossum
Copy link
Member

Looks like test_embed fails on all platforms (and a few others fail on Windows only). It looks like you have your test_embed fix in here. Could you revert that change and push that to this PR to see if that fixes the macOS and Ubuntu tests? (I tried it on Windows and it does fix it, somehow. So maybe it's just wrong?)

@kumaraditya303
Copy link
Contributor Author

I have rebased PR and included the test_embed fix by which I am atleast able to run test_embed, earlier it was skipped moreover that change only affects windows

if MS_WINDOWS:
ext = ("_d" if debug_build(sys.executable) else "") + ".exe"
exename += ext
exepath = builddir
expecteddir = os.path.join(support.REPO_ROOT, builddir)
else:
so I don't think reverting it would fix any other platform. It seems more involved in how import works with deepfreeze and packages.

@gvanrossum
Copy link
Member

Nevermind, reverting that just skips test_embed on Windows. The culprit must be that some way of setting the default encoding causes Python to fail, presumably because it can't find the encoding in the encodings package. I'd focus on finding the culprit here (hopefully you can figure out how to run just the failing command, outside the test harness).

@gvanrossum
Copy link
Member

gvanrossum commented Nov 30, 2021

I can tell you that the test_embed failures seem to ultimately come from line 194-195 in Python/codecs.c::_PyCodec_Lookup, called from Objects/unicodeobject.c::init_stdio_encoding at line 15913 (!). (UPDATE: Via config_get_codec_name at line 15870.)

@ericsnowcurrently
Copy link
Member

@ericsnowcurrently Do you have any ideas to share on where to look for these failures? Apparently (deep)freezing part of a package is not fully supported yet.

The most notable thing to me is that 4 of the 5 failing tests are printing nothing to stdout. This means that _testembed (Programs/_testembed.c) returned 0 and either dump_config() wasn't called or it failed silently. The latter seems more likely. It looks like dump_config() can fail silently. That's where I'd start investigating.

I suspect encodings.__path__ is not set properly. You can verify this with:

./python -c "import encodings; print(encodings.__path__).

It should be something like ['Lib/encodings'].

@kumaraditya303 I'll look into this later this week if we don't hear from Eric, but my hunch is that we need to add code (I don't know where yet) that ties the frozen package to the on-disk directory for the same package, so that from encodings import blah will first look for a (deepfrozen) encodings.blah and if that's not found it will look in the encodings directory.

__path__ on a frozen package is effectively set in FrozenImporter.find_spec() (Lib/importlib/_bootstrap.py), unless PyImport_ImportFrozenModuleObject() was used (which normally shouldn't have been).

@gvanrossum
Copy link
Member

I suspect encodings.__path__ is not set properly. You can verify this with:

./python -c "import encodings; print(encodings.__path__).

It should be something like ['Lib/encodings'].

This it does, even when I pass -X frozen_modules=on:

PS C:\Users\gvanrossum\cpython> .\PCbuild\amd64\python.exe -X frozen_modules=on 
Python 3.11.0a2+ (heads/pr-29575:88610c7f15, Nov 30 2021, 13:53:01) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import encodings   
>>> encodings.__path__
['C:\\Users\\gvanrossum\\cpython\\lib\\encodings']
>>> ^Z

PS C:\Users\gvanrossum\cpython>

However the failing tests seem to be embedding Python (hence test_embed) and then the initialization sequence is sufficiently different that apparently the __path__ value is either not set or it points to a non-existing directory (I added some debug print() calls to one of the test cases and it seemed to have a .zip file for the stdlib, so maybe that's a clue?).

@ericsnowcurrently
Copy link
Member

However the failing tests seem to be embedding Python (hence test_embed) and then the initialization sequence is sufficiently different that apparently the __path__ value is either not set or it points to a non-existing directory

Good point. That's an important observation. If we cannot extrapolate the stdlib dir during runtime init then frozen modules will not have __file__ or __path__ set appropriately. That would definitely cause import of unfrozen submodules to fail. sys._stdlib_dir is where the discovered stdlib dir is recorded (and where FrozenImporter gets the value).

So, if that is the problem then:

One possible fallback for runtime-init-could-not-find-stdlib-dir is to manually walk sys.path looking for os.py (the stdlib dir marker we use elsewhere). This could be done during runtime init (or lazily in FrozenImporter, though there isn't any point in doing it lazily I suppose). I didn't do that when I added sys._stdlib_dir due to the potential cost to runtime init and because it didn't seem necessary. However, it is important if we want __path__ to work for partially frozen packages.

One thing to watch out for when walking sys.path is when the stdlib is located in a zip file (a '.zip' file on sys.path). It might make sense to use zipimporter.find_spec() (Lib/zipimport.py) for those '.zip' files. (We can't just use importlib.util.find_spec() to do the whole thing for us since FrozenImporter will find the module. It would work if we temporarily removed FrozenImporter from sys.meta_path but a hack like that makes me too nervous. We could probably get away with walking sys.path_hooks or using FileFinder.find_spec() directly, but that's overkill.)

Another possible solution is to disable the frozen encodings module if the frozen one doesn't have __path__ populated. This could be done in FrozenImporter.find_spec() for all modules it identifies as packages, returning None if (the eventual) __path__ is empty. We'd probably need to update a few other FrozenImporter methods to match.

FWIW, the matter of runtime init not finding the stdlib dir (and hence sys._stdlib_dir not being set) should probably be handled in a separate issue.

@gvanrossum
Copy link
Member

Kumar, pleased don’t force push. Use merge instead of rebase; and add new commits as you change the code. It makes review easier, and we will squash at the end when we merge upstream.

@kumaraditya303
Copy link
Contributor Author

Kumar, pleased don’t force push. Use merge instead of rebase; and add new commits as you change the code. It makes review easier, and we will squash at the end when we merge upstream.

Okay, will always merge onwards :)

@gvanrossum
Copy link
Member

Hey, you don't have to merge every commit on main!

Also, you need a news item. Follow the instructions in the relevant test failure.

@kumaraditya303
Copy link
Contributor Author

There was a conflict with getpath.c hence I needed to merge main.

@gvanrossum
Copy link
Member

Honestly I think this should work. I will close and reopen the issue so as to re-trigger the CI.

@gvanrossum gvanrossum closed this Jan 6, 2022
@gvanrossum
Copy link
Member

...aand, we're back!

@gvanrossum gvanrossum reopened this Jan 6, 2022
@gvanrossum
Copy link
Member

Oh wait, even if this passes, we still need the fixes from @tiran in #30030. So that one really supersedes this.

@gvanrossum
Copy link
Member

Hey @kumaraditya303, what are your plans for this PR? It seems it currently fails some tests and there's also the problem reported by @tiran in https://bugs.python.org/issue45653#msg408474.

@kumaraditya303
Copy link
Contributor Author

Hey @kumaraditya303, what are your plans for this PR? It seems it currently fails some tests and there's also the problem reported by @tiran in https://bugs.python.org/issue45653#msg408474.

It is failing on the same embedding tests as earlier and I am busy currently so I'll investigate later on this as the tests are a bit tough to debug on this one.

@gvanrossum
Copy link
Member

Okay, no worries!

@kumaraditya303
Copy link
Contributor Author

I wasn't able to fix this despite debugging it for months and the performance improvement is negligible on Linux. I close this PR.

@kumaraditya303 kumaraditya303 deleted the encodings branch June 30, 2022 09:26
@gvanrossum
Copy link
Member

Sorry to hear it, but it sounds like it’s for the best.

@kumaraditya303
Copy link
Contributor Author

Sorry to hear it, but it sounds like it’s for the best.

No worries, I don't usually leave things unfinished but the perf improvement from this if the current issue could be solved is not worth it so I decided to leave it. I may revisit this if I find something but I have some other more interesting thing TODO ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants