bpo-46406: Faster single digit int division. #30626

gpshead · 2022-01-16T23:45:40Z

This expresses the algorithm in a more basic manner, resulting in better
instruction generation by today's compilers.

See https://mail.python.org/archives/list/python-dev@python.org/thread/ZICIMX5VFCX4IOFH5NUPVHCUJCQ4Q7QM/#NEUNFZU3TQU4CPTYZNF3WCN7DOJBBTK5

https://bugs.python.org/issue46406

Objects/longobject.c

mdickinson

LGTM, modulo a cast needed to silence the compiler on Windows.

Co-authored-by: Mark Dickinson <[email protected]>

mdickinson · 2022-01-23T10:00:14Z

Still LGTM. Thank you!

bedevere-bot · 2022-01-23T10:00:45Z

@mdickinson: Please replace # with GH- in the commit message next time. Thanks!

tim-one · 2022-01-23T17:04:37Z

Objects/longobject.c

+   PGO/FDO builds doing value specialization such as a fast path for //10. :)
+
+   Verify that 17 isn't specialized and this works as a quick test:
+     python -m timeit -s 'x = 10**1000; r=x//10; assert r == 10**999, r' 'x//17'


Asserting that division by 10 "works" is irrelevant when the test is actually dividing by 17. But it's a timing statement, so asserting that the division works is beside the point anyway. The point is to get the timing, so

python -m timeit -s 'x = 10**1000' 'x//17'

would be best. And 17 isn't of real interest anyway. Use 10 instead. That's the value we've actually seen special-cased in PGO builds. 17 only came up when Mark temporarily fudged a PGO-build input test to pick on 17 instead of 10, just to see what would happen.

gpshead · 2022-01-24

The command is just a paste of what I was using in my terminal as a quick unit test and microbenchmark. The assert in the setup code is the unit test. It is simply easy to write a concise test using 10, and the value used there isn't really relevant to the one used in the x//17 microbenchmark expression itself.

tim-one · 2022-01-23T17:18:27Z

Objects/longobject.c

+        remainder = dividend % n;
+        pout[size] = quotient;
+    }
+    return remainder;


Patterns like *--pin and *--pout are ubiquitous in this file, so converting that style to pin[size] (etc.) is jarring in context. The patch would be better if it stuck to changing what needed to be changed.

It would also be good to add a comment explaining why computing both "/" and "%" is faster on most boxes now than doing what the code originally did. "/" and "%" are both very expensive if they're done in isolation, which is presumably why the original code did a "*" and "-" instead of "%".

gpshead · 2022-01-24

Feel free to add the missing comment to this effect while you're in this file in your #30856.

[] vs pointer arithmetic would make me want to rerun benchmarks and examine the generated code out of curiosity, so I'll leave that alone for now. It might be worth investigating, and it would be consistent. I just took what Mark had written on python-dev and ran with it.
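For readers following the two points above, here is a minimal stand-alone sketch of both remainder strategies, written in the `*--pin` / `*--pout` style the file already uses. It is not the code from the patch: the names `divrem1_mulsub`, `divrem1_divmod`, `SHIFT` and `MASK` are hypothetical, and it assumes 30-bit digits held in `uint32_t`, stored least significant first, with `uint64_t` as the double-width type. The likely reason the second form does well is that compilers commonly merge an adjacent `/` and `%` on the same operands into one division (on x86 the divide instruction yields both results), but that should be confirmed by inspecting the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHIFT 30                               /* assumed digit width */
#define MASK  ((uint32_t)((1u << SHIFT) - 1))  /* largest digit value */

/* Remainder recovered with "*" and "-" after a single "/". */
static uint32_t
divrem1_mulsub(uint32_t *pout, const uint32_t *pin, ptrdiff_t size, uint32_t n)
{
    uint64_t rem = 0;

    assert(n > 0 && n <= MASK);
    pin += size;
    pout += size;
    while (--size >= 0) {
        uint32_t hi;
        rem = (rem << SHIFT) | *--pin;      /* bring down the next digit */
        *--pout = hi = (uint32_t)(rem / n); /* quotient digit */
        rem -= (uint64_t)hi * n;            /* remainder without "%" */
    }
    return (uint32_t)rem;
}

/* Quotient and remainder computed with adjacent "/" and "%". */
static uint32_t
divrem1_divmod(uint32_t *pout, const uint32_t *pin, ptrdiff_t size, uint32_t n)
{
    uint32_t remainder = 0;

    assert(n > 0 && n <= MASK);
    pin += size;
    pout += size;
    while (--size >= 0) {
        uint64_t dividend = ((uint64_t)remainder << SHIFT) | *--pin;
        *--pout = (uint32_t)(dividend / n);
        remainder = (uint32_t)(dividend % n);
    }
    return remainder;
}
```

Both functions walk from the most significant digit down and should produce identical quotient digits and remainders, so any difference between them is purely one of generated code and timing.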

gpshead requested a review from mdickinson January 16, 2022 23:45

the-knights-who-say-ni added the CLA signed label Jan 16, 2022

bedevere-bot added the awaiting core review label Jan 16, 2022

re-add the assert, reword news.

91b1050

gpshead marked this pull request as ready for review January 17, 2022 03:15

gpshead added the performance Performance or resource usage label Jan 17, 2022

mdickinson reviewed Jan 22, 2022

mdickinson reviewed Jan 22, 2022

mdickinson approved these changes Jan 22, 2022

bedevere-bot added awaiting merge and removed awaiting core review labels Jan 22, 2022

raghavthind2005 approved these changes Jan 22, 2022

gpshead and others added 2 commits January 22, 2022 10:37

Spell Mark's name properly.

d09f0c7

Co-authored-by: Mark Dickinson <[email protected]>

Add a cast to silence a compiler warning.

9902dc2

Co-authored-by: Mark Dickinson <[email protected]>

mdickinson merged commit c7f20f1 into python:main Jan 23, 2022

bedevere-bot removed the awaiting merge label Jan 23, 2022

gpshead deleted the long-faster-divide branch January 23, 2022 10:02

tim-one reviewed Jan 23, 2022
