From 078231aba6a9d545299445134d7e43dd0246e3af Mon Sep 17 00:00:00 2001
From: Jukka Lehtosalo
Date: Tue, 2 Jul 2019 17:12:34 +0100
Subject: [PATCH 1/3] Document how plugins can support the new semantic analyzer

---
 mypy/plugin.py | 139 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 110 insertions(+), 29 deletions(-)

diff --git a/mypy/plugin.py b/mypy/plugin.py
index 8183db576e6a..b1a860f7f4d7 100644
--- a/mypy/plugin.py
+++ b/mypy/plugin.py
@@ -1,40 +1,121 @@
 """Plugin system for extending mypy.
 
 At large scale the plugin system works as following:
-* Plugins are collected from the corresponding config option
-  (either a paths to Python files, or installed Python modules)
-  and imported using importlib
+
+* Plugins are collected from the corresponding mypy config file option
+  (either via paths to Python files, or installed Python modules)
+  and imported using importlib.
+
 * Every module should get an entry point function (called 'plugin' by default,
-  but may be overridden in the config file), that should accept a single string
-  argument that is a full mypy version (includes git commit hash for dev versions)
-  and return a subclass of mypy.plugins.Plugin
+  but may be overridden in the config file) that should accept a single string
+  argument that is a full mypy version (includes git commit hash for dev
+  versions) and return a subclass of mypy.plugins.Plugin.
+
 * All plugin class constructors should match the signature of mypy.plugin.Plugin
-  (i.e. should accept an mypy.options.Options object), and *must* call super().__init__
-* At several steps during semantic analysis and type checking mypy calls special `get_xxx`
-  methods on user plugins with a single string argument that is a full name of a relevant
-  node (see mypy.plugin.Plugin method docstrings for details)
-* The plugins are called in the order they are passed in the config option. Every plugin must
-  decide whether to act on a given full name. The first plugin that returns non-None object
-  will be used
+  (i.e. should accept an mypy.options.Options object), and *must* call
+  super().__init__().
+
+* At several steps during semantic analysis and type checking mypy calls
+  special `get_xxx` methods on user plugins with a single string argument that
+  is a fully qualified name (full name) of a relevant definition
+  (see mypy.plugin.Plugin method docstrings for details).
+
+* The plugins are called in the order they are passed in the config option.
+  Every plugin must decide whether to act on a given full name. The first
+  plugin that returns non-None object will be used.
+
 * The above decision should be made using the limited common API specified by
-  mypy.plugin.CommonPluginApi
-* The callback returned by the plugin will be called with a larger context that includes
-  relevant current state (e.g. a default return type, or a default attribute type) and
-  a wider relevant API provider (e.g. SemanticAnalyzerPluginInterface or
-  CheckerPluginInterface)
-* The result of this is used for further processing. See various `XxxContext` named tuples
-  for details about which information is given to each hook.
+  mypy.plugin.CommonPluginApi.
+
+* The callback returned by the plugin will be called with a larger context that
+  includes relevant current state (e.g. a default return type, or a default
+  attribute type) and a wider relevant API provider (e.g.
+  SemanticAnalyzerPluginInterface or CheckerPluginInterface).
+
+* The result of this is used for further processing. See various `XxxContext`
+  named tuples for details about which information is given to each hook.
 
 Plugin developers should ensure that their plugins work well in incremental and
-daemon modes. In particular, plugins should not hold global state, and should always call
-add_plugin_dependency() in plugin hooks called during semantic analysis, see the method
-docstring for more details.
-
-There is no dedicated cache storage for plugins, but plugins can store per-TypeInfo data
-in a special .metadata attribute that is serialized to cache between incremental runs.
-To avoid collisions between plugins they are encouraged to store their state
-under a dedicated key coinciding with plugin name in the metadata dictionary.
-Every value stored there must be JSON-serializable.
+daemon modes, and with both the old and new semantic analyzers (the old semantic
+analyzer will be removed soon). In particular, plugins should not hold global
+state, and should always call add_plugin_dependency() in plugin hooks called
+during semantic analysis. See the method docstring for more details.
+
+There is no dedicated cache storage for plugins, but plugins can store
+per-TypeInfo data in a special .metadata attribute that is serialized to the
+mypy caches between incremental runs. To avoid collisions between plugins, they
+are encouraged to store their state under a dedicated key coinciding with
+plugin name in the metadata dictionary. Every value stored there must be
+JSON-serializable.
+
+## New semantic analyzer
+
+The new semantic analyzer (enabled through the --new-semantic-analyzer flag)
+changes how plugins are expected to work in several notable ways:
+
+1. The order of processing modules is different. The old semantic analyzer
+   processed modules in textual order, one module at a time. The new semantic
+   analyzer first processes the module top levels, including bodies of any
+   top-level classes. Functions and methods are processed only after module
+   top levels have been finished. If there is an import cycle, all module
+   top levels in the cycle are processed before processing any functions or
+   methods. Each unit of processing (a module top level or a function/method)
+   is called a *target*.
+
+   This also means that function signatures in the same module have not been
+   analyzed yet when analyzing the module top level. If you need access to
+   a function signature, you'll need to explicitly analyze the signature first
+   using `type_anal()`.
+
+2. Each target can be processed multiple times. This may happen if some forward
+   references are not ready yet, for example. This means that semantic analyzer
+   related plugin hooks can be called multiple times for the same full name.
+   These plugin methods must thus be idempotent.
+
+3. The `anal_type` API function returns None if some part of the type is not
+   available yet. If this happens, the current target being analyzed will be
+   *deferred*, which means that it will be processed again soon, in the hope
+   that additional dependencies will be available. This may happen if there are
+   forward references to types or inter-module references to types within an
+   import cycle.
+
+   Note that if there is a circular definition, mypy may decide to stop
+   processing to avoid an infinite number of iterations. When this happens,
+   `anal_type` will generate an error and return an `AnyType` type object
+   during the final iteration (instead of None).
+
+4. There is a new API method `defer()`. This can be used to explicitly request
+   the current target to be reprocessed one more time. You don't need this
+   to call this if `anal_type` returns None, however.
+
+5. There is a new API property `final_iteration`, which is true once the
+   maximum semantic analysis iteration count has been reached. You must never
+   defer during the final iteration, as it will cause a crash.
+
+6. The `node` attribute of SymbolTableNode objects may contain a reference to
+   a PlaceholderNode object. This object means that this definition has not
+   been fully processed yet. If you encounter a PlaceholderNode, you should
+   defer unless it's the final iteration. If it's the final iteration, you
+   should generate an error message. It usually means that there's a cyclic
+   definition that cannot be resolved by mypy. PlaceholderNodes can only refer
+   to references inside an import cycle. If you are looking up things from
+   another module, such as the builtins, that is outside the current module or
+   import cycle, you can safely assume that you won't receive a placeholder.
+
+When testing your plugin with the new semantic analyzer, you should have a test
+case that forces a module top level to be processed multiple times. The easiest
+way to do this is to include a forward reference to a class in a top-level
+annotation. Example:
+
+    c: C  # Forward reference causes second analysis pass
+    class C: pass
+
+Note that a forward reference in a function signature won't trigger another
+pass, since all functions are processed only after the top level has been fully
+analyzed.
+
+You can use `api.options.new_semantic_analyzer` to check whether the new
+semantic analyzer is enabled.
 """
 
 import types

From 64a2254e84b783add73a8f99ba394e686e83a2ab Mon Sep 17 00:00:00 2001
From: Jukka Lehtosalo
Date: Wed, 3 Jul 2019 12:08:29 +0100
Subject: [PATCH 2/3] Respond to feedback

---
 docs/source/extending_mypy.rst | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/docs/source/extending_mypy.rst b/docs/source/extending_mypy.rst
index f524af847212..17db1946ab72 100644
--- a/docs/source/extending_mypy.rst
+++ b/docs/source/extending_mypy.rst
@@ -235,3 +235,30 @@ insert some entries there) before the class body is analyzed.
 module. It is called before semantic analysis. For example, this can
 be used if a library has dependencies that are dynamically loaded
 based on configuration information.
+
+Supporting the new semantic analyzer
+************************************
+
+Support for the new semantic analyzer (enabled through
+``--new-semantic-analyzer``) requires some changes to plugins. Here is
+a short summary of the most important changes:
+
+* The order of processing AST nodes is different. Code outside
+  functions is processed first, and functions and methods are
+  processed afterwards.
+
+* Each AST node can be processed multiple times to resolve forward
+  references. The same plugin hook may be called multiple times, so
+  they need to be idempotent.
+
+* The ``anal_type()`` API method returns ``None`` if some part of
+  the type is not available yet due to forward references, for example.
+
+* When looking up symbols, you may encounter *placeholder nodes* that
+  are used for names that haven't been fully processed yet. You'll
+  generally want to request another semantic analysis iteration by
+  *deferring* in that case.
+
+See the docstring at the top of
+`mypy/plugin.py `_
+for more details.

From 5c258d3f6937d899f2219bcfd09a038b73d41ace Mon Sep 17 00:00:00 2001
From: Jukka Lehtosalo
Date: Wed, 3 Jul 2019 13:31:42 +0100
Subject: [PATCH 3/3] More updates

---
 mypy/plugin.py | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/mypy/plugin.py b/mypy/plugin.py
index b1a860f7f4d7..aac6a41f58df 100644
--- a/mypy/plugin.py
+++ b/mypy/plugin.py
@@ -53,19 +53,20 @@
 The new semantic analyzer (enabled through the --new-semantic-analyzer flag)
 changes how plugins are expected to work in several notable ways:
 
-1. The order of processing modules is different. The old semantic analyzer
-   processed modules in textual order, one module at a time. The new semantic
-   analyzer first processes the module top levels, including bodies of any
-   top-level classes. Functions and methods are processed only after module
-   top levels have been finished. If there is an import cycle, all module
-   top levels in the cycle are processed before processing any functions or
-   methods. Each unit of processing (a module top level or a function/method)
-   is called a *target*.
+1. The order of processing AST nodes in modules is different. The old semantic
+   analyzer processes modules in textual order, one module at a time. The new
+   semantic analyzer first processes the module top levels, including bodies of
+   any top-level classes and classes nested within classes. ("Top-level" here
+   means "not nested within a function/method".) Functions and methods are
+   processed only after module top levels have been finished. If there is an
+   import cycle, all module top levels in the cycle are processed before
+   processing any functions or methods. Each unit of processing (a module top
+   level or a function/method) is called a *target*.
 
    This also means that function signatures in the same module have not been
    analyzed yet when analyzing the module top level. If you need access to
    a function signature, you'll need to explicitly analyze the signature first
-   using `type_anal()`.
+   using `anal_type()`.
 
 2. Each target can be processed multiple times. This may happen if some forward
    references are not ready yet, for example. This means that semantic analyzer
@@ -88,8 +89,9 @@
    the current target to be reprocessed one more time. You don't need this
    to call this if `anal_type` returns None, however.
 
-5. There is a new API property `final_iteration`, which is true once the
-   maximum semantic analysis iteration count has been reached. You must never
+5. There is a new API property `final_iteration`, which is true once mypy
+   detected no progress during the previous iteration or if the maximum
+   semantic analysis iteration count has been reached. You must never
    defer during the final iteration, as it will cause a crash.
 
 6. The `node` attribute of SymbolTableNode objects may contain a reference to