Skip to content

[mypyc] Use optimized implementation for builtins.sum #10268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 10, 2021

Conversation

sinback
Copy link
Contributor

@sinback sinback commented Mar 30, 2021

Description

In mypyc, add an optimized implementation for the builtins iterator idiom sum().

For some reason, in the case of calling sum() over a Generator, the builtins implementation of sum() is significantly slower than a naive implementation which increments return value every time something in the generator evaluates to True. This PR forces the computation of sum() to be been shown through benchmarking to speed up the execution of sum() over integers by about 2-5x.

There is some support for the start argument, but only for if 'start' is a literal expression (has a .value attribute). The current implementation doesn't work with arbitrary values for start, because I couldn't figure out how to get any Expression that could be given to be evaluated fully into something that you can initialize the retval Register to. So for example, these cases will not get optimized:

a = 1
sum((x == 0 for x in [0]), a)        # won't get evaluated because a is a NameExpr
sum((x == 0 for x in [0], 0 + j)    # won't get evaluated because 0 + j is an OpExpr

I did some playing around for the above two cases which works ok, where you evaluate the expressions by calling builder.accept() on the expression given for 'start' until you finally get down to a literal value (something that has '.value') but it doesn't work for arbitrary expressions and I realized that if mypyc does stuff like that it should probably do it in an expression-substituter somewhere else. idk

This does a good chunk of mypyc/mypyc#796 but doesn't finish it because it only implements sum().

Test Plan

I added a bunch of test cases for different ways that the optimized. I think there are probably enough. Happy to add more.

@sinback sinback changed the title Sinback/mypyc 796 sum impl mypyc: use optimized implementation for builtins.sum Mar 30, 2021
@msullivan msullivan self-requested a review March 30, 2021 21:50
Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR! This is a great start.

call_expr = builder.accept(gen.left_expr)
builder.add_bool_branch(call_expr, true_block, false_block)
builder.activate_block(true_block)
builder.assign(retval, builder.binary_op(retval, Integer(1), '+', expr.line), -1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding Integer(1) seems like it is not what you want? Probably you want call_expr

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, oops, I looked closer at this:
Counting up the number of elements that match something in a list was one of our motivating use cases, but we need to work when the value is anything, not just a boolean.

I think you can ditch all the logic that branches on the call_expr and just accumulate the result of + into retval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, because booleans. That makes lots of sense lol

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, hold up, if I do this (and you are right, that is most definitely what sum is actually for, lol - not just for boolean comparisons), then stuff breaks a la what I saw in the comment below about using builder.accept(start_expr) regardless of whether stuff is literals or not:

def fn():
    return 1
print(sum((x == 0 for x in [fn(), fn()])))

leads to

building 'example' extension
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/sinback/repos/mypy/mypyc/lib-rt -I/home/sinback/repos/mypy/env/include -I/home/sinback/.pyenv/versions/3.9.2/include/python3.9 -c build/__native.c -o build/temp.linux-x86_64-3.9/build/__native.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable
build/__native.c: In function ‘CPyDef___top_level__’:
build/__native.c:172:14: error: assignment makes integer from pointer without a cast [-Werror=int-conversion]
     cpy_r_r5 = cpy_r_r21;
              ^
build/__native.c: At top level:
cc1: error: unrecognized command line option ‘-Wno-unknown-warning-option’ [-Werror]
cc1: error: unrecognized command line option ‘-Wno-unused-command-line-argument’ [-Werror]
cc1: all warnings being treated as errors
error: command '/usr/bin/gcc' failed with exit code 1

at compile time. that is too bad

target_type = float_rprimitive
else:
target_type = object_rprimitive
# give up if start_expr is not a literal
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You ought to be able to call builder.accept on start_expr and get the right thing back regardless of whether it has a 'value' attribute. What was going wrong when you tried that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was hoping for that too, but couldn't get it to work. It seemed like stuff always went wrong for me when dealing with non-literals for startval that way. Here's some examples. (For all I just commented out the part that makes us give up, we try to initialize the retval to just builder.accept(start_expr) without requiring anything of start_expr)

a = 1
print(sum((x == 0 for x in [0]), a))

makes builder.accept(), when called on a, resolve it from a NameExpr to an Unbox, which seems probably right, but then gcc fails when it actually runs:

building 'example' extension
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/sinback/repos/mypy/mypyc/lib-rt -I/home/sinback/repos/mypy/env/include -I/home/sinback/.pyenv/versions/3.9.2/include/python3.9 -c build/__native.c -o build/temp.linux-x86_64-3.9/build/__native.o -O3 -Werror -Wno-unused-function -Wno-unused-label -Wno-unreachable-code -Wno-unused-variable -Wno-unused-command-line-argument -Wno-unknown-warning-option -Wno-unused-but-set-variable
build/__native.c: In function ‘CPyDef___top_level__’:
build/__native.c:165:15: error: assignment makes pointer from integer without a cast [-Werror=int-conversion]
     cpy_r_r18 = cpy_r_r17;
               ^
build/__native.c: At top level:
cc1: error: unrecognized command line option ‘-Wno-unknown-warning-option’ [-Werror]
cc1: error: unrecognized command line option ‘-Wno-unused-command-line-argument’ [-Werror]
cc1: all warnings being treated as errors
error: command '/usr/bin/gcc' failed with exit code 1

and it also doesn't really work for calls:

def fn() -> int:
    return 1

print(sum((x == 0 for x in [0]), fn()))

evaluates a CallExpr into a Call, but then the same unsafe pointer operation problem happens.

and doesn't work for arithmetic operations either:

print(sum((x == 0 for x in [0]), 0 + 1))

tries to turn an OpExpr into a CallC and then has the same unsafe pointer operation problem.

I dunno what was going on and figuring out seemed like a can of worms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did something yesterday which involved a while loop until there was a .value attribute present, but I can't figure out exactly what right now, and it would never finish for CallCs I think

if not expr.arg_kinds[1] in (ARG_POS, ARG_NAMED):
return None
start_expr = expr.args[1]
if isinstance(start_expr, IntExpr):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of figuring out the target_type by casing on the expression, we can find it in a more general way with builder.node_type(expr). (This will also work in the case where there is no start_expr)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good function to be pointed at, thanks!

I implemented & pushed this change - note that for the compile errors I am griping about in the other comments, this change seems to replace those errors with mere segfaults at runtime. idk what's up there yet either

# a is a NameExpr and not supported
a = 1
print(sum((x == 0 for x in [0, 1]), a))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some test cases where the sum is of some mathematical function of a list, and some where the values are floats? maybe something like (x**2 for x in [some float list])

sinback added 3 commits March 31, 2021 09:47
To avoid a strange issue where one-element sums that should be 1 were
instead True, initialize the sum return value to 0.0 when its intended
type is not specified as integer. This still leads to sums of boolean
expressions returning integers (as in CPython).
@sinback
Copy link
Contributor Author

sinback commented Apr 1, 2021

haha, this stuff is hard, my first implementation was very silly. All your suggestions ultimately helped me make the solution way less derpy though. I think this way looks basically as right as I can figure out how to make it right now.

I updated the code so that when it's time to initialize the sum, if no start argument was given, you just evaluate a dummy start expression anyway, which returns 0. In the 'object' case I found I had to initialize the sum to 0.0 instead of 0, to prevent sums which should be 1 from being True? I beat my head against it for an hour but I couldn't figure out how to avoid that another way, I tried using a bunch of coercions and stuff but it didn't work.

else:
# IntExpr feels better here, but then if the return value of sum was untypehinted and
# the result should be 1, it seems to be True instead, unless we initialize it this
# way?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this seems like it ought to be an IntExpr, and I think that it is actually important for cases like:

l: List[Any] = [1, 2, 3]
result = sum(x*x for x in l)

That will use object_rprimitive as the type, and we want to produce 14 as the result and not 14.0.

What was an example of a case where a True was getting produced?

I think that probably we want to wrap the compilation of start_expr in a builder.coerce before assigning it to retval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call. all fixed now

@msullivan
Copy link
Collaborator

I'll do a (final, probably!) round of comments this afternoon, but I took a quick look at the test failures and the main issue is that apparently the start argument to sum couldn't be specified by name until Python 3.8, so tests will need to pass it without the name in order to work on earlier versions. Also looks like there are some lint errors from flake8.

Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't get back to you last week like I said I was going to.

Everything looks pretty good now and just needs a couple minor bits:

  • I've asked for one more test case
  • Fixing the CI failures. It looks like it is the start parameter issue and some flake8 lint warnings

print(sum((x == 0 for x in [0, 1]), 1j))
print(sum(c == 'd' for c in 'abcdd'))
print(sum((c == 'd' for c in 'abcdd'), 1))
print(sum((c == 'd' for c in 'abcdd'), start=1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to drop the start, since it isn't supported on all the pythons we need to support. (We could add a test to run-python38.test, but it hardly seems worth it for this.)

print(sum(i + j == 0 for i, j in zip([0, 0, 0], [0, 1, 0])))

print('test misc cases')
print(sum((x == 0 for x in [0, 1]), 0 + 1j))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test that sums up complex numbers but doesn't have a start argument? Just to test the object flow without a start.

@sinback
Copy link
Contributor Author

sinback commented Apr 6, 2021

Thanks Michael for catching all those little things! I'll get to them.

Last week I also realized the sum implementation wasn't actually being used by the run-misc.test file. I'm not sure why not but I'm going to figure out what's up there as well.

I have a bunch of interviews this week but will probably get to everything by the end of the weekend

The start kwarg was only added in Python 3.8, so it should not be tested
as part of the normal testing suite.

[case testSum]
[file driver.py]
print('test sums of numbers')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't actually test anything, since the driver.py file is not compiled. (This is a common mistake so we should try to improve the developer experience here.)

The preferred way is to use test_ functions in the main test case and not include driver.py at all. For example:

[case testFoo]
def test_whatever() -> None:
    assert 1 + 2 == 3

def test_more() -> None:
    assert 'x' * 2 == 'xx'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip!

sinback added 3 commits April 6, 2021 09:52
driver.py is not actually compiled, so move all the tests into the main
test file.
@sinback
Copy link
Contributor Author

sinback commented Apr 6, 2021

Actually I think this PR is good to go already? despite me saying I'd get it done by the end of the weekend just above.

Is it poor form to rebase and force-push the branch after opening a PR? there are a lot of little fixup commits in this PR by now and it seems like not all this history would need to make its way into the main branch. Or do the maintainers like to handle the git history themselves at merge-time?

@msullivan
Copy link
Collaborator

Oh, oops; good catch on the test, Jukka >_>.

The preferred workflow for this project is to push new commits instead of rebasing them, since it makes it easier for reviewers to look at just what is changed. We'll squash it down to one commit when we merge it.

@@ -238,6 +238,7 @@ class StopIteration(Exception):

def any(i: Iterable[T]) -> bool: pass
def all(i: Iterable[T]) -> bool: pass
def sum(i: Iterable[T]) -> int: pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this will need to take a start argument. It looks like that is the cause of a bunch of the test failures.

Copy link
Collaborator

@97littleleaf11 97littleleaf11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! only a few suggestions on tests.


def test_sum_multi() -> None:
assert sum(i + j == 0 for i, j in zip([0, 0, 0], [0, 1, 0]))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert sum == 2

@@ -875,6 +875,56 @@ assert call_all(mixed_110) == 1
assert call_any_nested([[1, 1, 1], [1, 1], []]) == 1
assert call_any_nested([[1, 1, 1], [0, 1], []]) == 0


[case testSum]
from typing import Any, List
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding [typing fixtures/typing-full.pyi] here is the simplest way to solve test errors.

@97littleleaf11
Copy link
Collaborator

@sinback Hi! Any progress on this PR? IMHO it's almost done and it really helps the performance.

@97littleleaf11 97littleleaf11 self-requested a review November 10, 2021 03:23
@97littleleaf11 97littleleaf11 changed the title mypyc: use optimized implementation for builtins.sum [mypyc] Use optimized implementation for builtins.sum Nov 10, 2021
@97littleleaf11 97littleleaf11 merged commit 47b22c5 into python:master Nov 10, 2021
@sinback
Copy link
Contributor Author

sinback commented Nov 10, 2021

excited to see this got merged! :) sorry for the radio silence after the change requests, I worked on this in between jobs and was in the middle of trying to get up to speed on the new one around when I dropped off the map. appreciate you carrying it over the finish line @97littleleaf11

@97littleleaf11
Copy link
Collaborator

:) You don't need to say sorry, it was already a good pr before I picked it up, which not only covers many test cases but also improves the perf a lot.

tushar-deepsource pushed a commit to DeepSourceCorp/mypy that referenced this pull request Jan 20, 2022
Partially fixes mypyc/mypyc#796

This PR forces the computation of sum() to be been shown through benchmarking to speed up the execution of sum() over integers by about 2-5x.

There is some support for the start argument, but only for if 'start' is a literal expression (has a .value attribute). The current implementation doesn't work with arbitrary values for start, because I couldn't figure out how to get any Expression that could be given to be evaluated fully into something that you can initialize the retval Register to. So for example, these cases will not get optimized:

a = 1
sum((x == 0 for x in [0]), a)        # won't get evaluated because a is a NameExpr
sum((x == 0 for x in [0], 0 + j)    # won't get evaluated because 0 + j is an OpExpr

Co-authored-by: 97littleleaf11 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants