-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Refactor file checking to allow parallel linting in Prospector #3016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I got some encoding errors when trying to run the tests in a Windows VM but the same error can be reproduced on Linux by setting multiprocessing start type to spawn:
So PyLinter should be made picklable and the easiest way would be to split the linting implementation out from PyLinter class into separate class that would contain only the required functionality for linting, i.e. all configuration etc. would be left into PyLinter. |
I updated this PR to contain only the refactoring changes that cleanup copy-paste code and other things to allow easier refactoring of parallel linting. Parallel linting refactoring needs a bit more thinking so I decided to get the changes required for the actual refactoring merged back so there would be no conflicts due to changes in PyLint master |
I tried to split PyLinter into two classes where one would take care of configurations etc. and the second would contain linting logic but that proved to be a very big change. Instead I was able to introduce couple quite small changes that made PyLinter objects pickleable. I will update this PR as soon as I get the changes polished a bit. |
If it makes reviewing easier, I can split this PR into multiple PRs. Also note that each commit in this PR is a atomic change and all commits can be reviewed one by one to be able to see why different changes were made. |
The commits in this PR were rebased on top of commi 41c9522 containing conflicting changes, i.e. the conflicts are now resolved. |
Rebased the changes once more as there was conflicts in the changes merged to master |
@PCManticore I have been resolving conflicts from code pushed to master couple of times and would like to know if this PR is something you are even considering pulling? Is there something I could do to make reviewing easier or help in some other way to get this PR merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to see this area of the code getting some love! Thanks for the change.
My only request is please could you add some more complete docstrings and/or type hints on the new methods that you've added.
pylint/lint.py
Outdated
linter.reporter = None | ||
|
||
# The linter is inherited by all the pool's workers, i.e. the linter | ||
# is identical to the linter object here. This is requirde so that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trivial: Typo here for "required".
Fixed the typo and added docstrings. Anything else you would like to get changed and/or updated? |
3de8339
to
04db81a
Compare
Is there anyyhing I can do to make this PR move forward? |
@PCManticore I've resolved the conflicts and squashed all the commits. Did you want to review before we merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff @jannero ! This looks good to go, can you solve the conflicts and move the changelog to 2.5, same for the documentation?
@@ -15,7 +15,17 @@ Release date: TBA | |||
What's New in Pylint 2.4.3? | |||
=========================== | |||
|
|||
Release date: TBA | |||
* Allow parallel linting when run under Prospector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to the 2.5 section instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pylint/lint.py
Outdated
# The linter is inherited by all the pool's workers, i.e. the linter | ||
# is identical to the linter object here. This is requred so that | ||
# a custom PyLinter object (inherited from PyLinter) can be used. | ||
# See https://github.com/PyCQA/prospector/issues/320 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a link to prospector's issue tracker adds any value here, can we remove and expand the comment if needed more context instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pylint/lint.py
Outdated
def _check_astroid_module(self, ast_node, walker, rawcheckers, tokencheckers): | ||
"""Check given AST node with given walker and checkers | ||
|
||
ast_mode: AST node of the module to check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we switch to restructuredText instead? It's most used structured documentation format used in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This change prepares the code for enabling Prospector to take advantage of running PyLint parallel. Iterating files is moved into generator (_iterate_file_descrs) so that parallel checking can use the same implementation (_check_file) just by providing different kind of generator that reads the files from parent process. The refactoring removes code duplication that existed in PyLinter._do_check method; checking module content from stdin had identical implementation to checking content from a source file. Made PyLinter.expand_files a private method. The previous implementation of parallel linting created new PyLinter objects in the worker (child) process causing failure when running under Prospector because Prospector uses a custom PyLinter class (a class inherited from PyLinter) and PyLint naturally just creates PyLinter object. This caused linting to fail because there is options for Prospector's IndentChecker which was not created in the worker process. The new implementation passes the original PyLinter object into workers when the workers are created. See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods Note that as Windows uses spawn method by default, PyLinter object (and its) members need to be pickleable from now on with the exception being PyLinter.reporter which is not passed to child processes. The performance has remained about the same based on quick tests done with Django project containing about 30 000 lines of code; with the old implementation linting took 26-28 seconds with 8 jobs on quad core i7 and 24-27 seconds with the new implementation.
Thank you @jannero ! |
Does this solve running in parallel with |
@s0undt3ch I don't know if there any issues, but I've documented my understanding of the situation here: #4874 (comment). In short: on macOS with Python 3.8+ and the changes introduced by this PR, astroid transformation plugins won't work. |
Description
The PyLinter.check and related methods are refactored to remove copy-paste code and to allow simpler parallel linting implementation.
PyLinter is made pickleable so that the configured linter can be passed to worker processes (even when not using fork method) so custom PyLinter objects can be used when linting parallel. For example, Prospector, will benefit from this. See prospector-dev/prospector#320
Type of Changes