Optimizing speed of the starting and initialisation phase #4814
I ran
However, I think the main problem might be the startup overhead.
Ultimately pylint's characteristic is that it's slow and very thorough, because it infers values, contrary to some faster linters. Your file might be small, but if the dependencies it uses are huge, pylint will analyse all of them and take time to infer the value of things (on your file the lint takes 1.87 s if flask is installed, 0.51 s without flask). Improving performance is a goal, but some checks cannot be done without calculation. For example, a checker that is known to be slow on big code bases is the
Regarding startup time, I would not be surprised if the way this is done right now, with a giant god class (PyLinter), strange callbacks, and a deprecated library (optparse), could be hugely improved upon. A performance analysis of the startup time of this part of the code would be greatly beneficial in order to prioritize what needs to be done if we refactor to improve performance.
Aw, that's discouraging... So I guess Pylint needs some work on the performance front. Are there any efforts to do that that you're aware of? Also, do any real alternatives to Pylint exist? (I.e. a generic linter with configurable checkers.)
Issues are opened for it, and we're aware of the problem. Everyone likes contributing a new check or bug fix, while improving performance is hard work (especially if we need to stop inferring everything and take type hints into account, which would be a major redesign). So yes, there is an effort to do that, but maintainers' time is limited and being fast is not pylint's main value proposition.
For duplicate code there is Simian. For linting, vanilla flake8 is fast because it does fewer checks than pylint and does not infer values. It includes pyflakes, pycodestyle, mccabe and some of its own checks. I would encourage you to run pylint in continuous integration and pre-commit, and to run pycodestyle/pyflakes in your IDE (or simply PyCharm's checks if you use PyCharm).
I noticed that another large amount of time was attributed to
Simply timing the import of
This is 75% due to the import of
I'll try to do the same profiling for
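As an aside, measuring import cost doesn't strictly require patching files: CPython ships a `-X importtime` flag that prints a per-module import time tree to stderr. The helper below is a hypothetical sketch (not from this thread) that runs it in a subprocess and returns the modules with the largest cumulative import time:

```python
import subprocess
import sys


def import_time_report(module, top=5):
    """Run `python -X importtime -c "import <module>"` and return the
    (module_name, cumulative_microseconds) pairs with the largest cost."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Data lines look like: "import time:      123 |        456 | json"
        # The header line contains "[us]" and is skipped.
        if not line.startswith("import time:") or "[us]" in line:
            continue
        _, timings = line.split(":", 1)
        self_us, cum_us, name = timings.split("|")
        rows.append((name.strip(), int(cum_us)))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]
```

Running `import_time_report("astroid")` in an environment with astroid installed would show which of its submodules dominate the import time.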
We're refactoring the nodes module right now in astroid. This is for readability reasons, because a 4500-line file freezes the IDE for seconds while being analyzed (with a fast linter), and because there are hidden circular imports. If there is something to do regarding this import (besides splitting it up, we're doing that anyway) it's the perfect time :D
I ran the same script over the current
Example:

```python
import time

start = time.time()

import sys
from foo import bar

import_end = time.time()

# the following should probably be included when measuring the import time, but I used a very simple script...
try:
    import spam
except ImportError:
    spam = None

class Dummy:
    pass

end = time.time()
print(';'.join([__file__, f'{(import_end - start):.10f}', f'{(end - import_end):.10f}']))
```

The result is the following:
The most taxing individual modules (with the highest
I suggest running the script again after you finish the refactoring to see if it is already better. 😁 If someone wants to use the script themselves, here is the snippet:

```python
from pathlib import Path

all_py_files = Path("astroid").rglob("*.py")
for py_file in all_py_files:
    try:
        print(f"Patching {py_file}...")
        lines = py_file.read_text().splitlines()
        last_import_line = 0
        # walk backwards to find the last top-level import, then advance
        # to the first blank line after it
        for lineno, line in enumerate(reversed(lines)):
            if line.startswith("from") or line.startswith("import"):
                last_import_line = len(lines) - lineno
                while lines[last_import_line] != "":
                    last_import_line += 1
                break
        print(f"Last import is on line {last_import_line}...")
        lines[0:0] = ["import time", "start = time.time()"]
        lines[last_import_line + 2 : last_import_line + 2] = ["import_end = time.time()"]
        lines.extend([
            "end = time.time()",
            "print(';'.join([__file__, f'{(import_end - start):.10f}', f'{(end - import_end):.10f}']))",
        ])
        py_file.write_text("\n".join(lines))
    except IndexError:
        print(f"Could not patch file {py_file}!")
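Since every patched module prints one `file;import_time;body_time` line, the collected output can be ranked with a few lines of Python. This helper is a hypothetical add-on, not part of the original comment:

```python
def top_importers(report_lines, n=5):
    """Parse the 'file;import_time;body_time' lines emitted by the
    patched modules and return the n files with the slowest imports."""
    rows = []
    for line in report_lines:
        path, import_time, body_time = line.strip().split(";")
        rows.append((path, float(import_time), float(body_time)))
    # sort by import time, descending
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

Piping the run's stdout into a file and feeding its lines to `top_importers` gives the "most taxing individual modules" ranking directly.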
Restricted the issue title to something actionable so it does not become an epic ticket we can never close. |
One idea to tackle this problem is to introduce
Implementing a simple protocol would also simplify editor integration, since editors could use this protocol instead of calling the binary directly.

[1] https://mypy.readthedocs.io/en/stable/mypy_daemon.html

P.S. I personally like the blackd implementation, which is simple and straightforward; its protocol is also lightweight (a simple HTTP-based protocol).
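For illustration, a daemon along those lines could be a tiny HTTP server that keeps the linter warm in one long-lived process, so pylint's (and astroid's) imports are paid only once. The sketch below is hypothetical: the request-path convention and the `lint` callable are stand-ins, not an agreed protocol:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, urlparse


def handle_request(raw_path, lint):
    """Map a GET path like /some/file.py to one lint run.

    `lint` is whichever callable actually runs pylint; in a real daemon
    it would call pylint's Python API so the imports stay loaded.
    """
    file_path = unquote(urlparse(raw_path).path).lstrip("/")
    return lint(file_path)


class PylintdHandler(BaseHTTPRequestHandler):
    # Stub linter; a real daemon would plug in an in-process pylint call here.
    lint = staticmethod(lambda path: f"would lint {path}\n")

    def do_GET(self):
        body = handle_request(self.path, self.lint).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8000), PylintdHandler).serve_forever()
```

The design keeps the request handling separate from the lint call so the expensive part (pylint itself) is initialized once per process rather than once per invocation.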
OK, I have finally managed to gather proper information. I have created a very simple

```
$ cat test.py  # empty python file
pass
$ time pylint test.py  # running the pylint command
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)
--------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 10.00/10, -10.00)

real    0m0.744s
user    0m0.572s
sys     0m0.102s
$ time curl http://localhost:8000  # asking pylintd
************* Module test
test.py:1:0: C0114: Missing module docstring (missing-module-docstring)
------------------------------------------------------------------
Your code has been rated at 0.00/10 (previous run: 0.00/10, +0.00)

real    0m0.082s
user    0m0.006s
sys     0m0.011s
```
I have also tried the example directly from the issue report (generic_http_mock_server.py), and it seems that the most expensive part is still the linting itself:

```
$ time curl http://localhost:8000
************* Module generic_http_mock_server
generic_http_mock_server.py:13:0: E0401: Unable to import 'flask' (import-error)
generic_http_mock_server.py:14:0: E0401: Unable to import 'flask.typing' (import-error)
generic_http_mock_server.py:28:0: C0115: Missing class docstring (missing-class-docstring)

real    0m1.732s
user    0m0.006s
sys     0m0.009s
$ time pylint generic_http_mock_server.py
************* Module generic_http_mock_server
generic_http_mock_server.py:13:0: E0401: Unable to import 'flask' (import-error)
generic_http_mock_server.py:14:0: E0401: Unable to import 'flask.typing' (import-error)
generic_http_mock_server.py:28:0: C0115: Missing class docstring (missing-class-docstring)

real    0m2.325s
user    0m4.342s
sys     0m0.523s
```
I'm really not sure a daemon server makes sense for pylint. At least not currently. It might if the linting itself were fast, but as you pointed out and we all already know, that is the most expensive part. Startup doesn't make much of a difference there. Some other things to consider: @Pierre-Sassoulas already mentioned pylint-dev/astroid#792, and I would bet there are also other issues. Adding something like this now only means that we need to support one more thing, and maintenance is hard enough as it is already. There is also the question of how it would interact with caching if that gets added at some point, which would complicate any implementation even further. I do agree that performance is an issue, but IMO this isn't the right step at the moment.
As @Pierre-Sassoulas said earlier, Pylint apparently analyzes all dependencies, which would explain why linting even a tiny Python file can be frustratingly slow. If a Pylint server were to keep this analysis in memory and only analyze changes on demand, it wouldn't surprise me if that could improve the performance dramatically. |
That happens as part of the inference and not the startup itself. Say for example you inherit from some unknown base class in your "small" file, Tbh I'm not sure it would be feasible to preinfer all symbols from dependencies. Much just isn't needed. So far we also don't have a use for it.
I believe what you're looking for is most likely some form of caching of inference results. There have been a few discussions around that, in particular pylint-dev/astroid#1145. The main challenge I see there is to know when to invalidate which cache entries. That isn't trivial as an inference result might depend on multiple other files. Crucially though, caching should not require a dedicated server. The cache results could be stored in temp files and be available for the next pylint run. (Similar to mypy.) |
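The fingerprint-based invalidation described above can be sketched as a toy, single-file cache. The key/dependency model here is an assumption for illustration, not astroid's actual design:

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path):
    """Content hash of a file; changes whenever the file's bytes change."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class InferenceCache:
    """Entries stay valid only while every file they depended on still
    has the same content hash, so a changed dependency invalidates them."""

    def __init__(self, cache_file):
        self.cache_file = Path(cache_file)
        if self.cache_file.exists():
            self.entries = json.loads(self.cache_file.read_text())
        else:
            self.entries = {}

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        for dep, expected in entry["deps"].items():
            if not Path(dep).exists() or fingerprint(dep) != expected:
                return None  # a dependency changed: entry is stale
        return entry["result"]

    def put(self, key, result, deps):
        self.entries[key] = {
            "result": result,
            "deps": {dep: fingerprint(dep) for dep in deps},
        }
        self.cache_file.write_text(json.dumps(self.entries))
```

The hard part this sketch dodges is exactly the one mentioned above: knowing the full dependency set of each inference result, including transitive ones, so that nothing stale is ever served.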
That sounds like a very logical next step for Pylint. Even a "dumb" implementation of an inference cache would be an amazing start:
It sounds to me like this would provide a massive Pylint speed boost for the vast majority of Python projects, and the cache would almost never be an issue. The cache can be invalidated by deleting the cache directory. At least as an opt-in feature until smarter cache invalidation can be figured out, this would be freaking amazing.
Not so fast! How do you handle dependency upgrades between runs? I'm not sure we infer much from the standard library but even there what about patch version updates?
Although I agree a working caching solution would be amazing, I'm not sure
Although I can understand why you (and others) would like that, it's not something I would recommend personally. If people start using it, they generally expect it to work and will be easily frustrated if it doesn't. Even if we mention that it's only
Isn't that what flag names like
But yes, I understand your point, even though I hope they do it anyway 😁
Is it just me, or is most of the time spent in the bottom left quadrant: handling of messages, message states, symbols and ids? I'm not sure if I'm reading the graph correctly, but we're spending 10% of the run in
So if anybody wants to work on this, I think that's where they should be looking. @Pierre-Sassoulas Perhaps we should update the title to reflect this as well?
I'm going to create another issue for optimizing the message store. I'm not sure it's going to be enough to fix the startup time issue entirely but it seems like something we should pursue for sure. |
@DudeNr33 Do you happen to have the command you used to get this profile? It has a lot of noise, so it's not particularly useful for finding any improvable areas.
@DanielNoord unfortunately I can't recall exactly. I think I ran it over the whole pylint codebase, and did not apply any additional options or filtering (my original graph also contains stdlib calls just like yours). It could help to filter the results to only contain calls from What is remarkable in your graph is the big difference in the time spent in |
Hmm, I'll look into the Although I think with my latest PR the issue found in your graph has been mostly fixed? When I run cProfile now the most time is spent in |
Yes, that's what I thought too when I saw your graph. I played around a bit, and while exporting to CSV is not really straightforward with

```python
import pstats

p = pstats.Stats("output.pstats")
p.sort_stats("cumtime")
p.print_stats("pylint/pylint", 10)
```

This gives:

```
Fri Dec 24 11:34:23 2021    output.pstats

         18251477 function calls (17126994 primitive calls) in 4.765 seconds

   Ordered by: cumulative time
   List reduced from 3309 to 507 due to restriction <'pylint/pylint'>
   List reduced from 507 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.787    4.787 /Users/andreas/programming/forks/pylint/pylint/__main__.py:6(<module>)
        1    0.000    0.000    4.787    4.787 /Users/andreas/programming/forks/pylint/pylint/__init__.py:20(run_pylint)
        1    0.000    0.000    4.521    4.521 /Users/andreas/programming/forks/pylint/pylint/lint/run.py:76(__init__)
        1    0.000    0.000    4.366    4.366 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1010(check)
        1    0.000    0.000    4.365    4.365 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1067(_check_files)
      136    0.001    0.000    4.355    0.032 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1089(_check_file)
      136    0.000    0.000    2.949    0.022 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1208(get_ast)
      136    0.010    0.000    1.403    0.010 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1238(check_astroid_module)
      136    0.000    0.000    1.392    0.010 /Users/andreas/programming/forks/pylint/pylint/lint/pylinter.py:1255(_check_astroid_module)
      136    0.010    0.000    0.792    0.006 /Users/andreas/programming/forks/pylint/pylint/utils/utils.py:163(tokenize_module)
```

(Taken from the current main branch, with
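On the CSV point: `pstats` has no CSV export, but its raw `stats` dict can be walked directly. A possible helper, relying on the entry layout used by CPython's `pstats` internals:

```python
import csv
import pstats


def pstats_to_csv(stats_path, csv_path):
    """Dump a cProfile .pstats file to CSV by iterating the raw stats dict."""
    stats = pstats.Stats(stats_path)
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["ncalls", "tottime", "cumtime", "filename", "lineno", "function"])
        # Each entry maps (filename, lineno, function) to
        # (call_count, primitive_call_count, total_time, cumulative_time, callers).
        for (filename, lineno, func), (cc, nc, tt, ct, callers) in stats.stats.items():
            writer.writerow([nc, tt, ct, filename, lineno, func])
```

The resulting file can then be sorted and filtered in a spreadsheet or with pandas, which is often easier than chaining `print_stats` restrictions.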
After #5605 and #5606 I don't see any low-hanging fruit anymore. I tested with
Here's the result of
Maybe we can create an issue directly in astroid to make importing faster?
Sorry I did not respond yet, I haven't found the time to look at it properly. But I trust your analysis: if there is no more low-hanging fruit, we should not keep this issue open "just in case", but rather create a new one if we find spots where concrete improvements can be made. I also noticed that the
I reread prior comments, where you said the startup time was:
The import time from astroid now takes 92% of the total. I think it's safe to say that if we find areas for improvement in pylint, they will probably only be impactful once we take care of astroid's import time. I'm going to close this issue, thank you all who participated!
I opened pylint-dev/astroid#1320 for the follow-up |
Current problem
Pylint is frustratingly slow, taking several seconds to lint even tiny, simple Python files. Pylint is glacially slow compared to any linting tool in any other language I have worked with. This slowness is not too big a problem for CI/CD pipelines, since in this context a few seconds doesn't matter too much and linting of multiple files can be parallelized.
However, for background linting in editors while editing (for example with ALE) this leads to a terrible experience of warnings popping up seconds after I've moved past a line, or warnings staying up seconds after I've resolved them.
Since Python is not a slow language, I assume this slowness has to do with one or more linter settings. However, since there are hundreds of settings, I have no idea where to begin with improving the performance.
My goal would be to get a pylint run on a single file to take somewhere in the ballpark of 100 ms.
Desired solution
I would like a section in the front page README, or a separate guide linked to from the README file, with suggestions for how to improve the performance of pylint, for example disabling certain options that are particularly taxing.
Linting 80 lines of Python should not take 2.5 seconds, at the very most it should take maybe 100 ms.
Additional context
My pylint config: https://gist.github.com/Hubro/7adba88c42e4df0706067bfe9b2cea53#file-pylintrc-ini
The file I'm currently linting: https://gist.github.com/Hubro/7adba88c42e4df0706067bfe9b2cea53#file-generic_http_mock_server-py