More efficient representation of integers #245

Closed
lpereira opened this issue Jan 31, 2022 · 4 comments
Labels
epic-compact-objects Reducing size of objects for 3.12

@lpereira

lpereira commented Jan 31, 2022

This is just to track my work on implementing #147. There's already some discussion of this in #231.

@lpereira lpereira self-assigned this Jan 31, 2022
@mdboom mdboom moved this from In Progress to Todo in Fancy CPython Board May 9, 2022
@mdboom mdboom added the epic-compact-objects Reducing size of objects for 3.12 label Aug 2, 2022
@jneb

jneb commented Oct 21, 2022

Maybe a silly question, but are there any statistics on the sizes of ints used in practice?
This could really help in making a wise decision on the boundary between "short" and "long".
I would guess that small ints could be anything down to about 16 bits, but actual statistics would really help.

@gvanrossum
Collaborator

> Maybe a silly question but are there any statistics on int size used in practice? This could really help in making a wise decision on the boundary between "short" and "long". I would guess that the size of the small ints could be anything down to about 16 bits, but actual statistics would really help.

I don't think we have any, but it would be simple to add code, normally #ifdef-ed out, to the int allocator that counts how many allocations there are of integers with k "digits" (a digit being a 15-bit or 30-bit chunk in the current representation), for k between 0 and some reasonable number (binning everything larger in a single overflow bin). Then we could run some test code and dump the stats. Or if you want more precise numbers, at considerable extra cost, we could calculate the number of bits needed to represent the value.
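The counters described above would live in C inside the int allocator. As a rough Python-level approximation, a sample of values can be binned the same way. This is a minimal sketch under stated assumptions: the digit size is read from `sys.int_info.bits_per_digit` (15 or 30), `int.bit_length()` supplies the "more precise" bit count, and `ndigits`, `digit_histogram`, `bit_histogram`, `MAX_DIGITS` and `OVERFLOW_BIN` are hypothetical names chosen for illustration.

```python
import sys
from collections import Counter

# Size of one "digit" in this interpreter's int representation (15 or 30 bits).
BITS_PER_DIGIT = sys.int_info.bits_per_digit

# Hypothetical cutoff: values needing more than MAX_DIGITS digits share a
# single overflow bin, keyed MAX_DIGITS + 1.
MAX_DIGITS = 4
OVERFLOW_BIN = MAX_DIGITS + 1


def ndigits(value: int) -> int:
    """Digits needed for abs(value); zero counts as 0 digits in this sketch."""
    nbits = abs(value).bit_length()
    return -(-nbits // BITS_PER_DIGIT)  # ceiling division


def digit_histogram(values) -> Counter:
    """Count values by digit count, binning everything larger into OVERFLOW_BIN."""
    return Counter(min(ndigits(v), OVERFLOW_BIN) for v in values)


def bit_histogram(values) -> Counter:
    """The more precise variant: exact bit counts of abs(value)."""
    return Counter(abs(v).bit_length() for v in values)


if __name__ == "__main__":
    # Stand-in sample; real numbers would have to come from benchmark or test-suite runs.
    sample = [0, 1, -1, 255, 2**20, 2**40, -(2**70), 2**512]
    print("by digits:", dict(digit_histogram(sample)))
    print("by bits:  ", dict(bit_histogram(sample)))
```

On a build with 30-bit digits, every value up to `2**30 - 1` in magnitude lands in bin 1; the overflow bin keeps the histogram bounded no matter how large the outliers are.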

Of course, then we would have to look for "representative" test code for which to get the stats. Would the pyperformance benchmarks be representative? Some other set of benchmarks (maybe the Pyston ones)? Or maybe we could try running the standard library test suite? I imagine every application has its own distribution.

Nothing is ever easy... Until then, my intuition tells me that the vast, vast majority of integers in almost all Python applications are under 64 bits, and most are easily under 32 bits. We don't really need more precision than that since memory allocation doesn't have much finer granularity.
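To check the "under 64 bits, mostly under 32 bits" intuition against a collected trace of values, one option is to classify each value by the smallest signed machine word that can hold it. This is a minimal sketch assuming two's-complement ranges. The helper names `smallest_signed_width` and `width_fractions` are hypothetical, and the candidate widths are an arbitrary choice.

```python
from collections import Counter

# Candidate signed machine widths, smallest first.
WIDTHS = (8, 16, 32, 64)


def smallest_signed_width(value: int):
    """Smallest two's-complement width in WIDTHS that holds value, or None."""
    for width in WIDTHS:
        lo = -(1 << (width - 1))
        hi = (1 << (width - 1)) - 1
        if lo <= value <= hi:
            return width
    return None  # would need an arbitrary-precision ("long") representation


def width_fractions(values) -> dict:
    """Fraction of values whose smallest fitting width is each entry (None = none fits)."""
    values = list(values)
    if not values:
        return {}
    counts = Counter(smallest_signed_width(v) for v in values)
    return {w: counts[w] / len(values) for w in (*WIDTHS, None)}


if __name__ == "__main__":
    print(width_fractions([1, 200, 2**40, 2**100]))
```

Summing the fractions for 8, 16 and 32 gives the share of values that would fit a 32-bit "short" representation. That figure, compared with the share needing 64 bits, is what the boundary decision depends on.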

@jneb

jneb commented Nov 3, 2022

I did a test on this a very long time ago (~1990) when designing my own language and found that almost everything fit in two bytes, and most in one byte (fewer negative than positive). Since I did cryptography, there were also big numbers, which in my case were about 512 bits (the RSA size in those days). Numbers larger than two bytes but smaller than 64 were almost nonexistent.
Even though that data set had only a few programs in it, I believe you that the vast majority of numbers will fit in 32 bits, and we must make those small and fast.

@markshannon
Member

Superseded by #548

@markshannon markshannon closed this as not planned Mar 16, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in Fancy CPython Board Mar 16, 2023
5 participants