Skip to content

bpo-46406: Faster single digit int division. #30626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 23, 2022

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Jan 16, 2022

This expresses the algorithm in a more basic manner resulting in better
instruction generation by todays compilers.

See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5

https://bugs.python.org/issue46406

This expresses the algorithm in a more basic manner resulting in better
instruction generation by todays compilers.

See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
@gpshead gpshead marked this pull request as ready for review January 17, 2022 03:15
@gpshead gpshead added the performance Performance or resource usage label Jan 17, 2022
Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo a cast needed to silence the compiler on Windows.

gpshead and others added 2 commits January 22, 2022 10:37
@mdickinson
Copy link
Member

Still LGTM. Thank you!

@mdickinson mdickinson merged commit c7f20f1 into python:main Jan 23, 2022
@bedevere-bot
Copy link

@mdickinson: Please replace # with GH- in the commit message next time. Thanks!

@gpshead gpshead deleted the long-faster-divide branch January 23, 2022 10:02
PGO/FDO builds doing value specialization such as a fast path for //10. :)

Verify that 17 isn't specialized and this works as a quick test:
python -m timeit -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserting that division by 10 "works" is irrelevant when the test is actually dividing by 17. But it's a timing statement, so asserting that the division works is beside the point anyway. The point is to get the timing, so

python -m timeit -s 'x = 10**1000' 'x//17'

would be best. And 17 isn't of real interest anyway. Use 10 instead. That's the value we've actually seen special-cased in PGO builds. 17 only came up when Mark temporarily fudged a PGO-build input test to pick on 17 instead of 10, just to see what would happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command is just a paste of what i was using in my terminal as a quick unittest and microbenchmark. The assert in the setup code is a unittest. It's just easy to write a concise test using 10 and what value is used there isn't really relevant to what is used in the x//17 microbenchmark expression itself.

remainder = dividend % n;
pout[size] = quotient;
}
return remainder;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Patterns like *--pin and *--pout are ubiquitous in this file. so converting that style to pin[size] (etc) is jarring in context. The patch would be better if it stuck to changing what needed to be changed.

It would also be good to add a comment explaining why computing both "/" and "%" is faster on most boxes now than doing what the code originally did. "/" and "%" are both very expensive if they're done in isolation, which is presumably why the original code did a "*" and "-" instead of "%".

Copy link
Member Author

@gpshead gpshead Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add the missing comment to this effect while you're in this file in your #30856.

[] vs pointer arithmetic would make me want to rerun benchmarks and examine generated code out of curiosity so i'll leave that alone for now - it might be worth investigating and would be consistent. I just took what Mark had written on python-dev and ran with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants