Skip to content

Performance regression 0.770 -> 0.780 due to zip() stub #9016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kaste opened this issue Jun 18, 2020 · 13 comments
Closed

Performance regression 0.770 -> 0.780 due to zip() stub #9016

kaste opened this issue Jun 18, 2020 · 13 comments

Comments

@kaste
Copy link

kaste commented Jun 18, 2020

After upgrading to 0.780 I found a severe performance regression. Luckily, on open software so I can reproduce it as follows. This is on Windows. The program is at https://github.com/SublimeLinter/SublimeLinter

> git clone https://github.com/SublimeLinter/SublimeLinter.git
> mypy --follow-imports silent --show-error-codes --show-column-numbers --hide-error-context -v D:\SublimeLinter\panel_view.py

So I cloned into D:\SublimeLinter\, of importance is that you lint panel_view.py using the absolute path. (Note that other top level files here are as fast as before.)

On mypy 0.770, this ends with

LOG:  Build finished in 3.645 seconds with 63 modules, and 3 errors
Found 3 errors in 1 file (checked 1 source file)

On mypy 0.780:

LOG:  Build finished in 47.830 seconds with 64 modules, and 4 errors
Found 4 errors in 1 file (checked 1 source file)
@TH3CHARLie
Copy link
Collaborator

TH3CHARLie commented Jun 18, 2020

On my macOS:
0.780

LOG:  Build finished in 6.396 seconds with 62 modules, and 4 errors
Found 3 errors in 1 file (checked 1 source file)

0.770

LOG:  Build finished in 1.886 seconds with 61 modules, and 3 errors
Found 2 errors in 1 file (checked 1 source file)

Not that severe, but definitely some performance regression.

@JelleZijlstra
Copy link
Member

Interesting that the number of modules checked goes up by one. Running with -v may let you figure out what the new module is.

@kaste
Copy link
Author

kaste commented Jun 18, 2020

Yeah sure, one module up and one error up.

https://www.diffchecker.com/Fs4iNvez

LOG:  Metadata fresh for genericpath: file c:\dev\python38\lib\site-packages\mypy\typeshed\stdlib\2and3\genericpath.pyi

☝️ This is the new module.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 19, 2020

I was also able to reproduce the issue on macOS (timings are similar to what @TH3CHARLie reported). I will investigate this.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 19, 2020

I've narrowed it down to python/typeshed#3830. This has the side effect of causing the type checking of this statement to take several seconds (found in panel_view.py that was mentioned above):

    widths = tuple(
        zip(
            ('line', 'col', 'error_type', 'linter_name', 'code'),
            map(
                max,
                zip(*[
                    (
                        len(str(error['line'] + 1)),
                        len(str(error['start'] + 1)),
                        len(error['error_type']),
                        len(error['linter']),
                        len(str(error['code'])),
                    )
                    for error in flatten(errors_by_file.values())
                ])
            )
        )
    )  # type: Tuple[Tuple[str, int], ...]

There may other things going on as well, but this looks like the biggest issue. The above statement triggers exponential slowdown in mypy, and the typeshed change made things worse.

A simple workaround would be to split the statement into multiple ones, each with only a single zip or map call.

I'll see if there is some simple optimization we can use to speed up cases like this. Related (but not identical) issues have been reported before.

@kaste kaste closed this as completed Jun 19, 2020
@kaste kaste reopened this Jun 19, 2020
@kaste
Copy link
Author

kaste commented Jun 19, 2020

Removing or rewriting the statement brings the execution time down to 2,5-2,8 secs. Esp. it is enough to take the inner starzip zip(*[...]) out. Still, the rewritten version of that file runs in ~1,9s on 0.770.

Unfortunately mypy now doesn't like the map(max, ...) expression. E.g. rewritten as

    widths_per_error = (
        (
            len(str(error['line'] + 1)),
            len(str(error['start'] + 1)),
            len(error['error_type']),
            len(error['linter']),
            len(str(error['code'])),
        )
        for error in flatten(errors_by_file.values())
    )
    widths = tuple(
        zip(
            ('line', 'col', 'error_type', 'linter_name', 'code'),
            map(max, zip(*widths_per_error))  # type: ignore[arg-type]  # <===
        )
    )  # type: Tuple[Tuple[str, int], ...]

I now have to have an ignore comment here. This code checks fine in 0.770. (Hm, looks like mypy doesn't understand map(max, [[2]])...)

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 19, 2020

@kaste After removing the statement performance is similar on 0.780 for me on macOS. Is 0.780 consistently slower on Windows? What if you use --no-incremental?

@kaste
Copy link
Author

kaste commented Jun 19, 2020

Well, yeah, completely removing the statement performs the same on both mypy versions. But the rewritten version I just posted, I cannot just remove the statement obviously, shows a performance regression from ~1,9s -> 2,5-2,8s. If I use --no-incremental I see the same ~0.7s increase.

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 19, 2020

Thanks for clarification. It seems that the performance regression is specific to zip and can be pretty bad even for relatively simple code. I'm going to look at making this faster.

@JukkaL JukkaL changed the title Performance regression 0.770 -> 0.780 Performance regression 0.770 -> 0.780 due to zip() stub Jun 25, 2020
@hauntsaninja
Copy link
Collaborator

Note typeshed reverted the relevant changes to zip in python/typeshed#4254

@JukkaL
Copy link
Collaborator

JukkaL commented Jun 26, 2020

This can be closed after a typeshed sync.

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 3, 2020

Typeshed has been synced, so this should be back to how it worked in 0.770.

@kaste Can you verify that the performance regression has been fixed on GitHub master?

@kaste
Copy link
Author

kaste commented Jul 5, 2020

With current master exec time is ~3.9-4.0 while on 0.770 I have ~3.7s. Checking the whole project takes 9,9-10 on master, and ~9,6-7 with 0.770. Clearly no excess here but a degradation.

@kaste kaste closed this as completed Jul 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants