-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-46406: Faster single digit int division. #30626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This expresses the algorithm in a more basic manner resulting in better instruction generation by todays compilers. See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo a cast needed to silence the compiler on Windows.
Co-authored-by: Mark Dickinson <[email protected]>
Co-authored-by: Mark Dickinson <[email protected]>
Still LGTM. Thank you! |
@mdickinson: Please replace |
PGO/FDO builds doing value specialization such as a fast path for //10. :) | ||
|
||
Verify that 17 isn't specialized and this works as a quick test: | ||
python -m timeit -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asserting that division by 10 "works" is irrelevant when the test is actually dividing by 17. But it's a timing statement, so asserting that the division works is beside the point anyway. The point is to get the timing, so
python -m timeit -s 'x = 10**1000' 'x//17'
would be best. And 17 isn't of real interest anyway. Use 10 instead. That's the value we've actually seen special-cased in PGO builds. 17 only came up when Mark temporarily fudged a PGO-build input test to pick on 17 instead of 10, just to see what would happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The command is just a paste of what i was using in my terminal as a quick unittest and microbenchmark. The assert in the setup code is a unittest. It's just easy to write a concise test using 10 and what value is used there isn't really relevant to what is used in the x//17
microbenchmark expression itself.
remainder = dividend % n; | ||
pout[size] = quotient; | ||
} | ||
return remainder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patterns like *--pin
and *--pout
are ubiquitous in this file. so converting that style to pin[size]
(etc) is jarring in context. The patch would be better if it stuck to changing what needed to be changed.
It would also be good to add a comment explaining why computing both "/" and "%" is faster on most boxes now than doing what the code originally did. "/" and "%" are both very expensive if they're done in isolation, which is presumably why the original code did a "*" and "-" instead of "%".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to add the missing comment to this effect while you're in this file in your #30856.
[] vs pointer arithmetic would make me want to rerun benchmarks and examine generated code out of curiosity so i'll leave that alone for now - it might be worth investigating and would be consistent. I just took what Mark had written on python-dev and ran with it.
This expresses the algorithm in a more basic manner resulting in better
instruction generation by todays compilers.
See https://mail.python.org/archives/list/[email protected]/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5
https://bugs.python.org/issue46406