Port PyPy's new unicode name db format to optimize binary CPython size #96954

Closed
isidentical opened this issue Sep 20, 2022 · 5 comments
Labels
type-feature A feature request or enhancement

Comments

@isidentical
Member

isidentical commented Sep 20, 2022

Feature or enhancement

PyPy has a new algorithm that might help us reduce the size of the Unicode name database in our binaries (which would help on our way to better WASM compatibility [through reduced download sizes]). For more details, see the new implementation on the PyPy side by @cfbolz.

CC: @ambv @cfbolz

We'll probably start by adding the missing tests, then build a prototype to see how feasible it is, and share numbers on how much it would help before going forward with the implementation.
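
To illustrate the kind of behavior such tests would need to pin down, here is a minimal sketch of a round-trip check over the public `unicodedata` API. It is not the test added for this issue; the helper name `check_name_roundtrip` and its parameters are made up for illustration. The idea is that any replacement name database must still map every named code point to its name and back.

```python
import sys
import unicodedata


def check_name_roundtrip(start=0, stop=sys.maxunicode + 1):
    """Check name -> lookup round-trips for code points in [start, stop).

    Hypothetical helper: returns how many named code points were checked.
    """
    checked = 0
    for cp in range(start, stop):
        ch = chr(cp)
        # name() returns the default (None here) for unnamed code points,
        # such as control characters and surrogates.
        name = unicodedata.name(ch, None)
        if name is None:
            continue
        # lookup() must resolve the name back to the same character.
        assert unicodedata.lookup(name) == ch, (hex(cp), name)
        checked += 1
    return checked


if __name__ == "__main__":
    # Printable ASCII only; pass no arguments to scan every code point.
    print(check_name_roundtrip(0x20, 0x7F))
```

Running the same check before and after swapping the storage format is one way to confirm there is no behavioral change, independent of how the names are encoded internally.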

@isidentical isidentical added the type-feature A feature request or enhancement label Sep 20, 2022
isidentical added a commit to isidentical/cpython that referenced this issue Sep 20, 2022
They were undertested, and since python#96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
ambv pushed a commit that referenced this issue Sep 21, 2022
They were undertested, and since #96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
ambv pushed a commit to ambv/cpython that referenced this issue Sep 21, 2022
…H-96955)

They were undertested, and since pythonGH-96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
(cherry picked from commit 5a32eec)

Co-authored-by: Batuhan Taskaya <[email protected]>
ambv pushed a commit that referenced this issue Sep 21, 2022
They were undertested, and since GH-96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
(cherry picked from commit 5a32eec)

Co-authored-by: Batuhan Taskaya <[email protected]>
ambv added a commit that referenced this issue Sep 21, 2022
…-96992)

They were undertested, and since GH-96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
(cherry picked from commit 5a32eec)

Co-authored-by: Batuhan Taskaya <[email protected]>
ambv added a commit that referenced this issue Sep 21, 2022
…-96991)

They were undertested, and since GH-96954 might involve a
rewrite of this part of the code we want to ensure that
there won't be any behavioral change.

Co-authored-by: Carl Friedrich Bolz-Tereick <[email protected]>
(cherry picked from commit 5a32eec)

Co-authored-by: Batuhan Taskaya <[email protected]>
ambv added a commit that referenced this issue Nov 4, 2023
…ta codepoint names (#97906)

Co-authored-by: Łukasz Langa <[email protected]>
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Dennis Sweeney <[email protected]>
ambv pushed a commit that referenced this issue Nov 6, 2023
hugovk pushed a commit to hugovk/cpython that referenced this issue Nov 8, 2023
@hroncok
Contributor

hroncok commented Nov 15, 2023

I've noticed that `make regen-unicodedata` fails in an out-of-tree build:

    python3.13 Tools/unicode/makeunicodedata.py
    python3.13: can't open file '.../build/debug/Tools/unicode/makeunicodedata.py': [Errno 2] No such file or directory
    make: *** [Makefile:1498: regen-unicodedata] Error 2

I think I know how to fix this and will submit a PR once I test my assumption.
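
The error message suggests the script path is being resolved against the build directory rather than the source tree. The following is a minimal sketch of that difference, not the actual fix: the directory layout is created in a temporary folder, and the names `script_path` and `demo` are hypothetical.

```python
import tempfile
from pathlib import Path


def script_path(base_dir):
    """Build the path to the generator script under the given base directory."""
    return Path(base_dir) / "Tools" / "unicode" / "makeunicodedata.py"


def demo():
    """Return (found_from_builddir, found_from_srcdir) for a fake out-of-tree layout."""
    with tempfile.TemporaryDirectory() as tmp:
        srcdir = Path(tmp) / "cpython"          # assumed source checkout
        builddir = srcdir / "build" / "debug"   # assumed out-of-tree build dir
        builddir.mkdir(parents=True)

        # The script only exists in the source tree.
        real = script_path(srcdir)
        real.parent.mkdir(parents=True)
        real.write_text("# placeholder\n")

        return script_path(builddir).exists(), script_path(srcdir).exists()


if __name__ == "__main__":
    print(demo())
```

In this layout the lookup relative to the build directory fails while the lookup relative to the source directory succeeds, which matches the `No such file or directory` error shown above.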

@hroncok
Contributor

hroncok commented Nov 15, 2023

#112118

@hroncok
Contributor

hroncok commented Nov 15, 2023

I've also noticed it is now impossible to call `make regen-all` when offline.

Is it necessary to include `regen-unicodedata` in `regen-all`?

vstinner pushed a commit that referenced this issue Nov 15, 2023
This avoids:

    python3.13 Tools/unicode/makeunicodedata.py
    python3.13: can't open file '.../build/debug/Tools/unicode/makeunicodedata.py': [Errno 2] No such file or directory
    make: *** [Makefile:1498: regen-unicodedata] Error 2

Re-run `make regen-unicodedata` to update the script path in generated files.
@vstinner
Member

> I've also noticed it is now impossible to call make regen-all when offline.

`regen-configure` pulls a Docker container image, so `regen-unicodedata` is not the only "regen" command that needs Internet access. But `regen-configure` is excluded from `make regen-all`.

Maybe we can exclude `regen-unicodedata` from `regen-all`, since this data rarely needs updating, whereas `regen-all` is commonly run to update files that change more often.

vstinner added a commit to vstinner/cpython that referenced this issue Nov 15, 2023
The "make regen-unicodedata" target should now be run manually. By
default, it requires an Internet connection, which is not always
available. Some Linux distributions build Linux packages in an isolated
environment (without network).
vstinner added a commit that referenced this issue Nov 15, 2023
The "make regen-unicodedata" target should now be run manually. By
default, it requires an Internet connection, which is not always
available. Some Linux distributions build Linux packages in an isolated
environment (without network).
@vstinner
Member

The initial feature request is now implemented and the known issues have been fixed. I'm closing the issue.

Thanks @cfbolz for this cool optimization!

aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…codedata codepoint names (python#97906)

Co-authored-by: Łukasz Langa <[email protected]>
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Dennis Sweeney <[email protected]>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ython#112118)

This avoids:

    python3.13 Tools/unicode/makeunicodedata.py
    python3.13: can't open file '.../build/debug/Tools/unicode/makeunicodedata.py': [Errno 2] No such file or directory
    make: *** [Makefile:1498: regen-unicodedata] Error 2

Re-run `make regen-unicodedata` to update the script path in generated files.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
The "make regen-unicodedata" target should now be run manually. By
default, it requires an Internet connection, which is not always
available. Some Linux distributions build Linux packages in an isolated
environment (without network).
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…codedata codepoint names (python#97906)

Co-authored-by: Łukasz Langa <[email protected]>
Co-authored-by: Pieter Eendebak <[email protected]>
Co-authored-by: Dennis Sweeney <[email protected]>
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
…ython#112118)

This avoids:

    python3.13 Tools/unicode/makeunicodedata.py
    python3.13: can't open file '.../build/debug/Tools/unicode/makeunicodedata.py': [Errno 2] No such file or directory
    make: *** [Makefile:1498: regen-unicodedata] Error 2

Re-run `make regen-unicodedata` to update the script path in generated files.
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024
The "make regen-unicodedata" target should now be run manually. By
default, it requires an Internet connection, which is not always
available. Some Linux distributions build Linux packages in an isolated
environment (without network).