Guard script wrapper entrypoint import with if __name__ == "__main__" #242

Open. ichard26 wants to merge 1 commit into base: master

ichard26 (Member) commented:

This way, the entrypoint will only be imported if the script wrapper is run directly. This is beneficial for applications that use `multiprocessing`. `multiprocessing` imports the `__main__` module while initializing new workers to restore any global state the parallelized logic may rely on (e.g., a package-wide logger). Unfortunately, if the application is called using the console script wrapper (e.g., `pip install`), then the wrapper is the main module. For pip, this means every child process will import `venv/bin/pip` and consequently run `from pip._internal.cli.main import main`, which is quite a heavy import.
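To make the change concrete, here is a rough sketch of the two wrapper shapes. The template strings are simplified stand-ins, not distlib's exact text, and `render` and `top_level_imports` are hypothetical helpers written only for this illustration. The `ast` check merely shows that the guarded form leaves no entrypoint import at module level.

```python
import ast

# Simplified, hypothetical wrapper templates (not distlib's exact text).
UNGUARDED = """\
import sys
from {module} import {func}
if __name__ == "__main__":
    sys.exit({func}())
"""

GUARDED = """\
import sys
if __name__ == "__main__":
    from {module} import {func}
    sys.exit({func}())
"""


def render(template: str, module: str, func: str) -> str:
    """Fill in the entry point's module path and callable name."""
    return template.format(module=module, func=func)


def top_level_imports(source: str) -> list:
    """Return modules named in `from X import ...` statements that sit at
    module level, i.e. that run on every import of the wrapper."""
    tree = ast.parse(source)
    return [node.module for node in tree.body if isinstance(node, ast.ImportFrom)]


if __name__ == "__main__":
    for label, template in (("unguarded", UNGUARDED), ("guarded", GUARDED)):
        source = render(template, "pip._internal.cli.main", "main")
        print(label, top_level_imports(source))
```

In the unguarded shape the entrypoint import is a module-level statement, so anything that re-executes the wrapper pays for it. In the guarded shape it only runs when the wrapper itself is `__main__`.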

(And yup, this means that multiprocessing is often slower when running `tool` than when running `python -m tool` if the application's `__main__.py` uses an `if __name__ == "__main__"` guard like pip.)
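The effect on workers can be approximated without starting any processes. To my understanding, with the `spawn` start method CPython re-runs the parent's main script in each worker via `runpy.run_path(..., run_name="__mp_main__")`; the sketch below does the same by hand. The module name `heavy_entrypoint_demo`, the wrapper file name `tool`, and the helper `worker_imports_entrypoint` are all made up for this demo, and the wrapper sources are simplified rather than distlib's real template.

```python
import pathlib
import runpy
import sys
import tempfile

# Hypothetical stand-in for a heavy entry-point module such as pip's CLI.
FAKE_MODULE = "heavy_entrypoint_demo"

UNGUARDED_WRAPPER = f"""\
import sys
from {FAKE_MODULE} import main
if __name__ == "__main__":
    sys.exit(main())
"""

GUARDED_WRAPPER = f"""\
import sys
if __name__ == "__main__":
    from {FAKE_MODULE} import main
    sys.exit(main())
"""


def worker_imports_entrypoint(wrapper_source: str) -> bool:
    """Re-run a wrapper the way a spawned worker would and report whether
    the fake entry-point module ended up imported."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / f"{FAKE_MODULE}.py").write_text("def main():\n    return 0\n")
        wrapper = root / "tool"
        wrapper.write_text(wrapper_source)

        sys.path.insert(0, tmp)
        sys.modules.pop(FAKE_MODULE, None)
        try:
            # The wrapper is executed under a name other than "__main__",
            # so only its module-level statements take effect.
            runpy.run_path(str(wrapper), run_name="__mp_main__")
            return FAKE_MODULE in sys.modules
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(FAKE_MODULE, None)


if __name__ == "__main__":
    print("unguarded:", worker_imports_entrypoint(UNGUARDED_WRAPPER))
    print("guarded:  ", worker_imports_entrypoint(GUARDED_WRAPPER))
```

This only shows whether the import happens, not how long it takes. A real measurement would need to time actual worker startup with the real entrypoint.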

I hope this use case is convincing enough. I do not wish to drag the discussion out as happened with #239. If you'd like a demo that shows the performance implications, I'm happy to write one.

Concretely, this would let me remove this awful hack from my PR parallelizing .pyc compilation.

Finally, I'll note that distlib's original script template had the entrypoint import under if __name__ == "__main__" before pip's template was synced over to distlib ~6 years ago: ec0bcea. That seems to suggest that there shouldn't be any backwards compatibility concerns.

codecov bot commented Apr 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.59%. Comparing base (674a491) to head (ad184c3).
Report is 20 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #242      +/-   ##
==========================================
+ Coverage   81.49%   81.59%   +0.10%     
==========================================
  Files          24       24              
  Lines        8885     8956      +71     
  Branches     1747     1535     -212     
==========================================
+ Hits         7241     7308      +67     
- Misses       1300     1309       +9     
+ Partials      344      339       -5     
Flag Coverage Δ
unittests 80.75% <ø> (+0.13%) ⬆️
