Skip to content

Commit ccd290c

Browse files
authored
Re-work indirect dependencies (#19798)
Wow, this was quite a ride. Indirect dependencies were always supported kind of on best effort. This PR puts them on some principled foundation. It fixes three crashes and three stale types reported. All tests are quite weird/obscure, they are designed to expose the flaws in current logic (plus one test that passes on master, but it covers important corner case, so I add it just in case ). A short summary of various fixes (in arbitrary order): * Update many outdated comments and docstrings * Missing transitive dependency is now considered stale * Handle transitive generic bases in indirection visitor * Handle chained alias targets in indirection visitor * Always record original aliases during semantic analysis * Replace ad-hoc `module_refs` logic in type checker with more principled one during semantic analysis * Delete `qualified_tvars` as a concept, they are not needed since long ago * Remove ad-hoc handling for `TypeInfo`s from `build.py` * Support symbols with setter type different from getter type In general the logic should be more simple/straightforward now: * Get all symbols we try to access in a module and record the modules they were defined in (note this automatically handles problem with possible excessive `get_proper_type()` calls). * Get all types in a type map, for each type _transitively_ find all named types in them (thus aggregating all interfaces the type depends on) Note since this makes the algorithm correct, it may also make it slower (most notably because we must visit generic bases). I tried to offset this by couple optimizations, hopefully performance impact will be minimal. On my machine slow down is ~0.6%
1 parent f09aa57 commit ccd290c

File tree

11 files changed

+402
-156
lines changed

11 files changed

+402
-156
lines changed

mypy/build.py

Lines changed: 25 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@
4747
from mypy.graph_utils import prepare_sccs, strongly_connected_components, topsort
4848
from mypy.indirection import TypeIndirectionVisitor
4949
from mypy.messages import MessageBuilder
50-
from mypy.nodes import Import, ImportAll, ImportBase, ImportFrom, MypyFile, SymbolTable, TypeInfo
50+
from mypy.nodes import Import, ImportAll, ImportBase, ImportFrom, MypyFile, SymbolTable
5151
from mypy.partially_defined import PossiblyUndefinedVariableVisitor
5252
from mypy.semanal import SemanticAnalyzer
5353
from mypy.semanal_pass1 import SemanticAnalyzerPreAnalysis
@@ -1765,26 +1765,24 @@ def delete_cache(id: str, path: str, manager: BuildManager) -> None:
17651765
17661766
For single nodes, processing is simple. If the node was cached, we
17671767
deserialize the cache data and fix up cross-references. Otherwise, we
1768-
do semantic analysis followed by type checking. We also handle (c)
1769-
above; if a module has valid cache data *but* any of its
1770-
dependencies was processed from source, then the module should be
1771-
processed from source.
1772-
1773-
A relatively simple optimization (outside SCCs) we might do in the
1774-
future is as follows: if a node's cache data is valid, but one or more
1775-
of its dependencies are out of date so we have to re-parse the node
1776-
from source, once we have fully type-checked the node, we can decide
1777-
whether its symbol table actually changed compared to the cache data
1778-
(by reading the cache data and comparing it to the data we would be
1779-
writing). If there is no change we can declare the node up to date,
1780-
and any node that depends (and for which we have cached data, and
1781-
whose other dependencies are up to date) on it won't need to be
1782-
re-parsed from source.
1768+
do semantic analysis followed by type checking. Once we (re-)processed
1769+
an SCC we check whether its interface (symbol table) is still fresh
1770+
(matches previous cached value). If it is not, we consider dependent SCCs
1771+
stale so that they need to be re-parsed as well.
1772+
1773+
Note on indirect dependencies: normally dependencies are determined from
1774+
imports, but since our interfaces are "opaque" (i.e. symbol tables can
1775+
contain cross-references as well as types identified by name), these are not
1776+
enough. We *must* also add "indirect" dependencies from symbols and types to
1777+
their definitions. For this purpose, we record all accessed symbols during
1778+
semantic analysis, and after we finished processing a module, we traverse its
1779+
type map, and for each type we find (transitively) on which named types it
1780+
depends.
17831781
17841782
Import cycles
17851783
-------------
17861784
1787-
Finally we have to decide how to handle (c), import cycles. Here
1785+
Finally we have to decide how to handle (b), import cycles. Here
17881786
we'll need a modified version of the original state machine
17891787
(build.py), but we only need to do this per SCC, and we won't have to
17901788
deal with changes to the list of nodes while we're processing it.
@@ -2409,21 +2407,15 @@ def finish_passes(self) -> None:
24092407

24102408
# We should always patch indirect dependencies, even in full (non-incremental) builds,
24112409
# because the cache still may be written, and it must be correct.
2412-
# TODO: find a more robust way to traverse *all* relevant types?
2413-
all_types = list(self.type_map().values())
2414-
for _, sym, _ in self.tree.local_definitions():
2415-
if sym.type is not None:
2416-
all_types.append(sym.type)
2417-
if isinstance(sym.node, TypeInfo):
2418-
# TypeInfo symbols have some extra relevant types.
2419-
all_types.extend(sym.node.bases)
2420-
if sym.node.metaclass_type:
2421-
all_types.append(sym.node.metaclass_type)
2422-
if sym.node.typeddict_type:
2423-
all_types.append(sym.node.typeddict_type)
2424-
if sym.node.tuple_type:
2425-
all_types.append(sym.node.tuple_type)
2426-
self._patch_indirect_dependencies(self.type_checker().module_refs, all_types)
2410+
self._patch_indirect_dependencies(
2411+
# Two possible sources of indirect dependencies:
2412+
# * Symbols not directly imported in this module but accessed via an attribute
2413+
# or via a re-export (vast majority of these recorded in semantic analysis).
2414+
# * For each expression type we need to record definitions of type components
2415+
# since "meaning" of the type may be updated when definitions are updated.
2416+
self.tree.module_refs | self.type_checker().module_refs,
2417+
set(self.type_map().values()),
2418+
)
24272419

24282420
if self.options.dump_inference_stats:
24292421
dump_type_stats(
@@ -2452,7 +2444,7 @@ def free_state(self) -> None:
24522444
self._type_checker.reset()
24532445
self._type_checker = None
24542446

2455-
def _patch_indirect_dependencies(self, module_refs: set[str], types: list[Type]) -> None:
2447+
def _patch_indirect_dependencies(self, module_refs: set[str], types: set[Type]) -> None:
24562448
assert None not in types
24572449
valid = self.valid_references()
24582450

mypy/checker.py

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -378,11 +378,9 @@ class TypeChecker(NodeVisitor[None], TypeCheckerSharedApi):
378378
inferred_attribute_types: dict[Var, Type] | None = None
379379
# Don't infer partial None types if we are processing assignment from Union
380380
no_partial_types: bool = False
381-
382-
# The set of all dependencies (suppressed or not) that this module accesses, either
383-
# directly or indirectly.
381+
# Extra module references not detected during semantic analysis (these are rare cases
382+
# e.g. access to class-level import via instance).
384383
module_refs: set[str]
385-
386384
# A map from variable nodes to a snapshot of the frame ids of the
387385
# frames that were active when the variable was declared. This can
388386
# be used to determine nearest common ancestor frame of a variable's

mypy/checkexpr.py

Lines changed: 0 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -198,7 +198,6 @@
198198
)
199199
from mypy.typestate import type_state
200200
from mypy.typevars import fill_typevars
201-
from mypy.util import split_module_names
202201
from mypy.visitor import ExpressionVisitor
203202

204203
# Type of callback user for checking individual function arguments. See
@@ -248,36 +247,6 @@ def allow_fast_container_literal(t: Type) -> bool:
248247
)
249248

250249

251-
def extract_refexpr_names(expr: RefExpr, output: set[str]) -> None:
252-
"""Recursively extracts all module references from a reference expression.
253-
254-
Note that currently, the only two subclasses of RefExpr are NameExpr and
255-
MemberExpr."""
256-
while isinstance(expr.node, MypyFile) or expr.fullname:
257-
if isinstance(expr.node, MypyFile) and expr.fullname:
258-
# If it's None, something's wrong (perhaps due to an
259-
# import cycle or a suppressed error). For now we just
260-
# skip it.
261-
output.add(expr.fullname)
262-
263-
if isinstance(expr, NameExpr):
264-
is_suppressed_import = isinstance(expr.node, Var) and expr.node.is_suppressed_import
265-
if isinstance(expr.node, TypeInfo):
266-
# Reference to a class or a nested class
267-
output.update(split_module_names(expr.node.module_name))
268-
elif "." in expr.fullname and not is_suppressed_import:
269-
# Everything else (that is not a silenced import within a class)
270-
output.add(expr.fullname.rsplit(".", 1)[0])
271-
break
272-
elif isinstance(expr, MemberExpr):
273-
if isinstance(expr.expr, RefExpr):
274-
expr = expr.expr
275-
else:
276-
break
277-
else:
278-
raise AssertionError(f"Unknown RefExpr subclass: {type(expr)}")
279-
280-
281250
class Finished(Exception):
282251
"""Raised if we can terminate overload argument check early (no match)."""
283252

@@ -370,7 +339,6 @@ def visit_name_expr(self, e: NameExpr) -> Type:
370339
371340
It can be of any kind: local, member or global.
372341
"""
373-
extract_refexpr_names(e, self.chk.module_refs)
374342
result = self.analyze_ref_expr(e)
375343
narrowed = self.narrow_type_from_binder(e, result)
376344
self.chk.check_deprecated(e.node, e)
@@ -3344,7 +3312,6 @@ def check_union_call(
33443312

33453313
def visit_member_expr(self, e: MemberExpr, is_lvalue: bool = False) -> Type:
33463314
"""Visit member expression (of form e.id)."""
3347-
extract_refexpr_names(e, self.chk.module_refs)
33483315
result = self.analyze_ordinary_member_access(e, is_lvalue)
33493316
narrowed = self.narrow_type_from_binder(e, result)
33503317
self.chk.warn_deprecated(e.node, e)

mypy/checkmember.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -543,6 +543,8 @@ def analyze_member_var_access(
543543
if isinstance(v, FuncDef):
544544
assert False, "Did not expect a function"
545545
if isinstance(v, MypyFile):
546+
# Special case: accessing module on instances is allowed, but will not
547+
# be recorded by semantic analyzer.
546548
mx.chk.module_refs.add(v.fullname)
547549

548550
if isinstance(vv, (TypeInfo, TypeAlias, MypyFile, TypeVarLikeExpr)):

mypy/fixup.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -441,4 +441,4 @@ def missing_info(modules: dict[str, MypyFile]) -> TypeInfo:
441441

442442
def missing_alias() -> TypeAlias:
443443
suggestion = _SUGGESTION.format("alias")
444-
return TypeAlias(AnyType(TypeOfAny.special_form), suggestion, line=-1, column=-1)
444+
return TypeAlias(AnyType(TypeOfAny.special_form), suggestion, "<missing>", line=-1, column=-1)

mypy/indirection.py

Lines changed: 56 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -4,17 +4,6 @@
44

55
import mypy.types as types
66
from mypy.types import TypeVisitor
7-
from mypy.util import split_module_names
8-
9-
10-
def extract_module_names(type_name: str | None) -> list[str]:
11-
"""Returns the module names of a fully qualified type name."""
12-
if type_name is not None:
13-
# Discard the first one, which is just the qualified name of the type
14-
possible_module_names = split_module_names(type_name)
15-
return possible_module_names[1:]
16-
else:
17-
return []
187

198

209
class TypeIndirectionVisitor(TypeVisitor[None]):
@@ -23,49 +12,57 @@ class TypeIndirectionVisitor(TypeVisitor[None]):
2312
def __init__(self) -> None:
2413
# Module references are collected here
2514
self.modules: set[str] = set()
26-
# User to avoid infinite recursion with recursive type aliases
27-
self.seen_aliases: set[types.TypeAliasType] = set()
28-
# Used to avoid redundant work
29-
self.seen_fullnames: set[str] = set()
15+
# User to avoid infinite recursion with recursive types
16+
self.seen_types: set[types.TypeAliasType | types.Instance] = set()
3017

3118
def find_modules(self, typs: Iterable[types.Type]) -> set[str]:
3219
self.modules = set()
33-
self.seen_fullnames = set()
34-
self.seen_aliases = set()
20+
self.seen_types = set()
3521
for typ in typs:
3622
self._visit(typ)
3723
return self.modules
3824

3925
def _visit(self, typ: types.Type) -> None:
40-
if isinstance(typ, types.TypeAliasType):
41-
# Avoid infinite recursion for recursive type aliases.
42-
self.seen_aliases.add(typ)
26+
# Note: instances are needed for `class str(Sequence[str]): ...`
27+
if (
28+
isinstance(typ, types.TypeAliasType)
29+
or isinstance(typ, types.ProperType)
30+
and isinstance(typ, types.Instance)
31+
):
32+
# Avoid infinite recursion for recursive types.
33+
if typ in self.seen_types:
34+
return
35+
self.seen_types.add(typ)
4336
typ.accept(self)
4437

4538
def _visit_type_tuple(self, typs: tuple[types.Type, ...]) -> None:
4639
# Micro-optimization: Specialized version of _visit for lists
4740
for typ in typs:
48-
if isinstance(typ, types.TypeAliasType):
49-
# Avoid infinite recursion for recursive type aliases.
50-
if typ in self.seen_aliases:
41+
if (
42+
isinstance(typ, types.TypeAliasType)
43+
or isinstance(typ, types.ProperType)
44+
and isinstance(typ, types.Instance)
45+
):
46+
# Avoid infinite recursion for recursive types.
47+
if typ in self.seen_types:
5148
continue
52-
self.seen_aliases.add(typ)
49+
self.seen_types.add(typ)
5350
typ.accept(self)
5451

5552
def _visit_type_list(self, typs: list[types.Type]) -> None:
5653
# Micro-optimization: Specialized version of _visit for tuples
5754
for typ in typs:
58-
if isinstance(typ, types.TypeAliasType):
59-
# Avoid infinite recursion for recursive type aliases.
60-
if typ in self.seen_aliases:
55+
if (
56+
isinstance(typ, types.TypeAliasType)
57+
or isinstance(typ, types.ProperType)
58+
and isinstance(typ, types.Instance)
59+
):
60+
# Avoid infinite recursion for recursive types.
61+
if typ in self.seen_types:
6162
continue
62-
self.seen_aliases.add(typ)
63+
self.seen_types.add(typ)
6364
typ.accept(self)
6465

65-
def _visit_module_name(self, module_name: str) -> None:
66-
if module_name not in self.modules:
67-
self.modules.update(split_module_names(module_name))
68-
6966
def visit_unbound_type(self, t: types.UnboundType) -> None:
7067
self._visit_type_tuple(t.args)
7168

@@ -105,27 +102,36 @@ def visit_parameters(self, t: types.Parameters) -> None:
105102
self._visit_type_list(t.arg_types)
106103

107104
def visit_instance(self, t: types.Instance) -> None:
105+
# Instance is named, record its definition and continue digging into
106+
# components that constitute semantic meaning of this type: bases, metaclass,
107+
# tuple type, and typeddict type.
108+
# Note: we cannot simply record the MRO, in case an intermediate base contains
109+
# a reference to type alias, this affects meaning of map_instance_to_supertype(),
110+
# see e.g. testDoubleReexportGenericUpdated.
108111
self._visit_type_tuple(t.args)
109112
if t.type:
110-
# Uses of a class depend on everything in the MRO,
111-
# as changes to classes in the MRO can add types to methods,
112-
# change property types, change the MRO itself, etc.
113+
# Important optimization: instead of simply recording the definition and
114+
# recursing into bases, record the MRO and only traverse generic bases.
113115
for s in t.type.mro:
114-
self._visit_module_name(s.module_name)
115-
if t.type.metaclass_type is not None:
116-
self._visit_module_name(t.type.metaclass_type.type.module_name)
116+
self.modules.add(s.module_name)
117+
for base in s.bases:
118+
if base.args:
119+
self._visit_type_tuple(base.args)
120+
if t.type.metaclass_type:
121+
self._visit(t.type.metaclass_type)
122+
if t.type.typeddict_type:
123+
self._visit(t.type.typeddict_type)
124+
if t.type.tuple_type:
125+
self._visit(t.type.tuple_type)
117126

118127
def visit_callable_type(self, t: types.CallableType) -> None:
119128
self._visit_type_list(t.arg_types)
120129
self._visit(t.ret_type)
121-
if t.definition is not None:
122-
fullname = t.definition.fullname
123-
if fullname not in self.seen_fullnames:
124-
self.modules.update(extract_module_names(t.definition.fullname))
125-
self.seen_fullnames.add(fullname)
130+
self._visit_type_tuple(t.variables)
126131

127132
def visit_overloaded(self, t: types.Overloaded) -> None:
128-
self._visit_type_list(list(t.items))
133+
for item in t.items:
134+
self._visit(item)
129135
self._visit(t.fallback)
130136

131137
def visit_tuple_type(self, t: types.TupleType) -> None:
@@ -149,4 +155,9 @@ def visit_type_type(self, t: types.TypeType) -> None:
149155
self._visit(t.item)
150156

151157
def visit_type_alias_type(self, t: types.TypeAliasType) -> None:
152-
self._visit(types.get_proper_type(t))
158+
# Type alias is named, record its definition and continue digging into
159+
# components that constitute semantic meaning of this type: target and args.
160+
if t.alias:
161+
self.modules.add(t.alias.module)
162+
self._visit(t.alias.target)
163+
self._visit_type_list(t.args)

0 commit comments

Comments
 (0)