Investigate inference cache improvements for improving the performance #2912

Caching (`lru_cache`) is currently used to speed up inference. I believe there may be a few ways to improve this, and this is a general issue to rethink caching (which is currently very naive, I believe). Some ideas:

- Disk caching: someone installs `pandas==X.X.X` and it would build the cache once for that, then assume that it won't change in their code from then on. This is probably an easy first step, still with some pretty big wins (see the sketch below this list).
- Shared caching: once someone has built the cache for `pandas==X.X.X` there's no need for others to have to ...
- Tuning the cache sizes, which are currently fairly arbitrary (e.g. `lru_cache(1024)`). I'm not sure if there's ever been any investigation into how to optimize the caching.

NB - I'll try to keep this up-to-date based on any replies below (including removing any of the above if it's rubbish).
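As a rough illustration of the disk-caching idea, here is a minimal sketch. It is not pylint's or astroid's actual code; the cache location, function names, and pickle format are all hypothetical assumptions:

```python
import pickle
from pathlib import Path

# Hypothetical cache location; one file per installed dependency version.
CACHE_DIR = Path.home() / ".cache" / "pylint-inference"

def cache_path(name: str, version: str) -> Path:
    # Key the cache on "name==version" (e.g. "pandas==1.5.3"), assuming
    # inference results for an installed release never change.
    return CACHE_DIR / f"{name}=={version}.pickle"

def load_cache(name: str, version: str) -> dict:
    path = cache_path(name, version)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}  # first run for this release: build from scratch

def save_cache(name: str, version: str, cache: dict) -> None:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    with cache_path(name, version).open("wb") as fh:
        pickle.dump(cache, fh)
```

The shared-caching idea could reuse the same `name==version` keying, distributing prebuilt cache files instead of rebuilding them on each machine.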
Comments
We could definitely try to see if disk caching helps a bit, but from the looks of it, I would guess it's going to incur a performance penalty for offloading the inferred data to disk, given that we infer a lot during the analysis of a single, relatively large file. We might try a daemon approach where the caching is done separately from the main analysis program, but that sounds like a significant challenge to take on. There hasn't been any investigation into the use of …
Re disk penalties - I guess where I'm coming from is that if it takes 30s to infer e.g. …
For simple repeated runs, would it be enough to use a hash of the file contents as a cache key, and cache the results on that? I could see that being problematic if a dependency were to break something for a file that hasn't changed, but for something that brings linting from 15 seconds down to 1.5 it seems awfully tempting. Not being incredibly familiar with the code yet: would there be any fairly cheap way to construct a module dependency tree and invalidate everything that depends on changed files? That would seem to be the safest way of getting that kind of speed improvement. If anyone wants to see/mess with this approach, it's at https://github.com/0xADD1E/pylint/tree/caching-tests
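On the module-dependency-tree question, here is a minimal sketch of the invalidation step, assuming the import graph has already been extracted into plain dicts (pylint/astroid would need to derive that graph from its own ASTs; none of this is existing pylint code):

```python
from collections import defaultdict

def build_reverse_deps(imports: dict[str, set[str]]) -> dict[str, set[str]]:
    # `imports` maps a module to the modules it imports; invert it so we
    # can ask "who depends on X?" when X changes.
    reverse = defaultdict(set)
    for module, deps in imports.items():
        for dep in deps:
            reverse[dep].add(module)
    return reverse

def invalidate(changed: set[str], reverse: dict[str, set[str]]) -> set[str]:
    # Walk the reverse edges transitively: every module that (directly or
    # indirectly) imports a changed module has its cached results dropped.
    stale, stack = set(), list(changed)
    while stack:
        module = stack.pop()
        if module in stale:
            continue
        stale.add(module)
        stack.extend(reverse.get(module, ()))
    return stale
```

For example, `invalidate({"utils"}, build_reverse_deps({"app": {"utils"}, "utils": set()}))` returns `{"utils", "app"}`, so both caches would be rebuilt.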
@0xADD1E I can't access your code.
@CarliJoy It looks like I may have gotten rid of it a while ago due to inactivity here. I'll see if I can dig up an old copy from backups later today. If memory serves, it was just using stdlib's shelve module and a sha256 key around https://github.com/PyCQA/pylint/blob/8a17bb557a23708ebbe64174c564f9c5741fb5dd/pylint/lint/pylinter.py#L1059 or so...
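For the record, a minimal sketch along the lines that comment describes, stdlib shelve keyed on a SHA-256 of the file's bytes; the function names and the `run_checks` hook are hypothetical stand-ins, not the branch's actual code:

```python
import hashlib
import shelve

def content_key(path: str) -> str:
    # Hash the file's bytes so an unchanged file hits the cache on
    # repeated runs, regardless of timestamps.
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()

def check_file_cached(path, run_checks, cache_path="pylint-cache"):
    # run_checks(path) stands in for the expensive lint/inference pass;
    # its return value must be picklable for shelve to store it.
    key = content_key(path)
    with shelve.open(cache_path) as cache:
        if key in cache:
            return cache[key]  # unchanged file: reuse prior results
        result = run_checks(path)
        cache[key] = result
        return result
```

As the earlier comment notes, this alone does not catch the case where a dependency changes while the file itself does not.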
Disk caching is worth investigating. Will close as a duplicate of #1416, but will call out the promising experiment described at #2912 (comment).