Performance regression 0.770 -> 0.780 due to zip() stub #9016

kaste · 2020-06-18T07:50:57Z

After upgrading to 0.780 I found a severe performance regression. Luckily, on open software so I can reproduce it as follows. This is on Windows. The program is at https://github.com/SublimeLinter/SublimeLinter

> git clone https://github.com/SublimeLinter/SublimeLinter.git
> mypy --follow-imports silent --show-error-codes --show-column-numbers --hide-error-context -v D:\SublimeLinter\panel_view.py

So I cloned into D:\SublimeLinter\, of importance is that you lint panel_view.py using the absolute path. (Note that other top level files here are as fast as before.)

On mypy 0.770, this ends with

LOG:  Build finished in 3.645 seconds with 63 modules, and 3 errors
Found 3 errors in 1 file (checked 1 source file)

On mypy 0.780:

LOG:  Build finished in 47.830 seconds with 64 modules, and 4 errors
Found 4 errors in 1 file (checked 1 source file)

The text was updated successfully, but these errors were encountered:

TH3CHARLie · 2020-06-18T09:51:19Z

On my macOS:
0.780

LOG:  Build finished in 6.396 seconds with 62 modules, and 4 errors
Found 3 errors in 1 file (checked 1 source file)

0.770

LOG:  Build finished in 1.886 seconds with 61 modules, and 3 errors
Found 2 errors in 1 file (checked 1 source file)

Not that severe, but definitely some performance regression.

JelleZijlstra · 2020-06-18T13:38:22Z

Interesting that the number of modules checked goes up by one. Running with -v may let you figure out what the new module is.

kaste · 2020-06-18T15:55:35Z

Yeah sure, one module up and one error up.

https://www.diffchecker.com/Fs4iNvez

LOG:  Metadata fresh for genericpath: file c:\dev\python38\lib\site-packages\mypy\typeshed\stdlib\2and3\genericpath.pyi

☝️ This is the new module.

JukkaL · 2020-06-19T11:25:26Z

I was also able to reproduce the issue on macOS (timings are similar to what @TH3CHARLie reported). I will investigate this.

JukkaL · 2020-06-19T12:58:02Z

I've narrowed it down to python/typeshed#3830. This has the side effect of causing the type checking of this statement to take several seconds (found in panel_view.py that was mentioned above):

    widths = tuple(
        zip(
            ('line', 'col', 'error_type', 'linter_name', 'code'),
            map(
                max,
                zip(*[
                    (
                        len(str(error['line'] + 1)),
                        len(str(error['start'] + 1)),
                        len(error['error_type']),
                        len(error['linter']),
                        len(str(error['code'])),
                    )
                    for error in flatten(errors_by_file.values())
                ])
            )
        )
    )  # type: Tuple[Tuple[str, int], ...]

There may other things going on as well, but this looks like the biggest issue. The above statement triggers exponential slowdown in mypy, and the typeshed change made things worse.

A simple workaround would be to split the statement into multiple ones, each with only a single zip or map call.

I'll see if there is some simple optimization we can use to speed up cases like this. Related (but not identical) issues have been reported before.

kaste · 2020-06-19T15:04:21Z

Removing or rewriting the statement brings the execution time down to 2,5-2,8 secs. Esp. it is enough to take the inner starzip zip(*[...]) out. Still, the rewritten version of that file runs in ~1,9s on 0.770.

Unfortunately mypy now doesn't like the map(max, ...) expression. E.g. rewritten as

    widths_per_error = (
        (
            len(str(error['line'] + 1)),
            len(str(error['start'] + 1)),
            len(error['error_type']),
            len(error['linter']),
            len(str(error['code'])),
        )
        for error in flatten(errors_by_file.values())
    )
    widths = tuple(
        zip(
            ('line', 'col', 'error_type', 'linter_name', 'code'),
            map(max, zip(*widths_per_error))  # type: ignore[arg-type]  # <===
        )
    )  # type: Tuple[Tuple[str, int], ...]

I now have to have an ignore comment here. This code checks fine in 0.770. (Hm, looks like mypy doesn't understand map(max, [[2]])...)

JukkaL · 2020-06-19T15:56:31Z

@kaste After removing the statement performance is similar on 0.780 for me on macOS. Is 0.780 consistently slower on Windows? What if you use --no-incremental?

kaste · 2020-06-19T16:15:01Z

Well, yeah, completely removing the statement performs the same on both mypy versions. But the rewritten version I just posted, I cannot just remove the statement obviously, shows a performance regression from ~1,9s -> 2,5-2,8s. If I use --no-incremental I see the same ~0.7s increase.

JukkaL · 2020-06-19T16:21:10Z

Thanks for clarification. It seems that the performance regression is specific to zip and can be pretty bad even for relatively simple code. I'm going to look at making this faster.

hauntsaninja · 2020-06-25T18:14:21Z

Note typeshed reverted the relevant changes to zip in python/typeshed#4254

JukkaL · 2020-06-26T15:32:59Z

This can be closed after a typeshed sync.

JukkaL · 2020-07-03T17:17:08Z

Typeshed has been synced, so this should be back to how it worked in 0.770.

@kaste Can you verify that the performance regression has been fixed on GitHub master?

kaste · 2020-07-05T18:50:55Z

With current master exec time is ~3.9-4.0 while on 0.770 I have ~3.7s. Checking the whole project takes 9,9-10 on master, and ~9,6-7 with 0.770. Clearly no excess here but a degradation.

JukkaL added performance priority-0-high labels Jun 19, 2020

JukkaL self-assigned this Jun 19, 2020

kaste closed this as completed Jun 19, 2020

kaste reopened this Jun 19, 2020

JelleZijlstra mentioned this issue Jun 21, 2020

Regression in determining return type of zip() on tuples python/typeshed#4226

Closed

JukkaL changed the title ~~Performance regression 0.770 -> 0.780~~ Performance regression 0.770 -> 0.780 due to zip() stub Jun 25, 2020

kaste closed this as completed Jul 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression 0.770 -> 0.780 due to zip() stub #9016

Performance regression 0.770 -> 0.780 due to zip() stub #9016

kaste commented Jun 18, 2020

TH3CHARLie commented Jun 18, 2020 •

edited

Loading

JelleZijlstra commented Jun 18, 2020

kaste commented Jun 18, 2020

JukkaL commented Jun 19, 2020

JukkaL commented Jun 19, 2020

kaste commented Jun 19, 2020

JukkaL commented Jun 19, 2020

kaste commented Jun 19, 2020

JukkaL commented Jun 19, 2020

hauntsaninja commented Jun 25, 2020

JukkaL commented Jun 26, 2020

JukkaL commented Jul 3, 2020

kaste commented Jul 5, 2020

Performance regression 0.770 -> 0.780 due to zip() stub #9016

Performance regression 0.770 -> 0.780 due to zip() stub #9016

Comments

kaste commented Jun 18, 2020

TH3CHARLie commented Jun 18, 2020 • edited Loading

JelleZijlstra commented Jun 18, 2020

kaste commented Jun 18, 2020

JukkaL commented Jun 19, 2020

JukkaL commented Jun 19, 2020

kaste commented Jun 19, 2020

JukkaL commented Jun 19, 2020

kaste commented Jun 19, 2020

JukkaL commented Jun 19, 2020

hauntsaninja commented Jun 25, 2020

JukkaL commented Jun 26, 2020

JukkaL commented Jul 3, 2020

kaste commented Jul 5, 2020

TH3CHARLie commented Jun 18, 2020 •

edited

Loading