Description
For example, ./python -m pydoc test.test_enum takes 32 seconds. It takes 20 seconds in 3.12, 15 seconds in 3.11 and only 1.6 seconds in 3.10. Perhaps test.test_enum has grown, but the main culprit is bpo-35113, and further changes like gh-106727 only added to it.
For every class without a docstring, pydoc tries to find its comments by calling inspect.getcomments(), which calls inspect.findsource(), which reads and parses the module source, then traverses it to find the class with the specific qualname. For large modules with many classes this has quadratic complexity.
I tried to optimize the AST traversal code and got 18 seconds on main. It still has quadratic complexity. Further optimization would require introducing a cache and finding the positions of all classes in one pass.
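As a rough illustration of the "one pass plus cache" idea, the sketch below indexes every class in a source string by qualname in a single AST traversal. The helper name index_classes and the sample source are hypothetical, and this is not the code used in inspect; it only shows the shape of such an index.

```python
import ast

# Hypothetical sample module source, used only for illustration.
SOURCE = '''\
class Outer:
    class Inner:
        pass

def factory():
    class Local:
        pass
    return Local
'''


def index_classes(source):
    """Map each class qualname to the line of its `class` statement.

    The tree is traversed once; later lookups are plain dict accesses.
    If a qualname is defined more than once, the first definition wins,
    which is exactly the kind of guess a source search has to make.
    """
    positions = {}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                qualname = prefix + child.name
                positions.setdefault(qualname, child.lineno)
                visit(child, qualname + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Classes defined inside functions get a "<locals>" segment.
                visit(child, prefix + child.name + ".<locals>.")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return positions


CLASS_INDEX = index_classes(SOURCE)
print(CLASS_INDEX)
```

With such an index, looking up N classes costs one parse and one traversal instead of N of each.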
But it would all be much simpler and faster to simply save the value of co_firstlineno of the code object executed during class creation in the class dict (as __firstlineno__, for example).
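The information is already present at compile time: each class body is compiled to its own code object, and that object records the line where the class statement starts. A minimal sketch, with a hypothetical helper name and sample source, and assuming undecorated, non-generic top-level classes:

```python
import types

# Hypothetical sample module source, used only for illustration.
SOURCE = '''\
x = 1

class First:
    pass

class Second:
    pass
'''


def class_body_first_lines(source, filename="<example>"):
    """Return {name: co_firstlineno} for code objects nested in the module code.

    For this sample the only nested code objects are the two class bodies.
    """
    module_code = compile(source, filename, "exec")
    return {
        const.co_name: const.co_firstlineno
        for const in module_code.co_consts
        if isinstance(const, types.CodeType)
    }


FIRST_LINES = class_body_first_lines(SOURCE)
print(FIRST_LINES)
```

Storing that number in the class namespace when the body is executed is what makes a later lookup constant-time, with no source parsing at all.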
Activity
Title changed from "pydoc is obscently slow for some modules" to "pydoc is obscenely slow for some modules"
pythongh-118465: Optimize inspect.findsource() for classes
pythongh-118465: Add __firstlineno__ attribute to class
serhiy-storchaka commented on May 1, 2024
#118471 reduces the time from 32 to 18 seconds.
#118475 reduces the time to just 1 second.
carljm commented on May 1, 2024
Not only does a runtime __firstlineno__ for classes fix the performance problem, it also improves correctness. In cases where there are multiple conditional definitions of a class, the previous code had to guess. Now we know exactly where the runtime class object you are asking about was actually defined.
terryjreedy commented on May 1, 2024
pyclbr produces a custom tree of class and function descriptors, including line numbers, in one visitor pass from the module ast. The consumer can then poke around the tree as desired without rereading and reparsing. There is intentionally no code execution so that unknown-to-be-safe files can be browsed.
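For reference, a small sketch of that pyclbr usage. The module name pyclbr_demo_module, the helper browse and the sample source are made up for the example; pyclbr.readmodule_ex() and the lineno and children attributes of its descriptors are the real API.

```python
import pyclbr
import tempfile
from pathlib import Path

# Hypothetical sample module source, used only for illustration.
SOURCE = '''\
class Shape:
    def area(self):
        return 0

def helper():
    pass
'''


def browse(source, module_name="pyclbr_demo_module"):
    """Write the source to a temporary module and read it with pyclbr.

    The module is parsed, never imported or executed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module_name + ".py").write_text(source)
        tree = pyclbr.readmodule_ex(module_name, path=[tmp])
    summary = {name: (type(obj).__name__, obj.lineno) for name, obj in tree.items()}
    method_line = tree["Shape"].children["area"].lineno
    return summary, method_line


SUMMARY, METHOD_LINE = browse(SOURCE)
print(SUMMARY, METHOD_LINE)
```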
inspect gets information from live objects. So it seems sensible that all the information inspect might need, or a source index thereto, should be included with the object. I presume the first line of Python functions is already available in the code object.
Is it possible to parse and construct an ast for a single statement starting with its first line?
Perhaps when pydoc is given a filename, it should directly call ast and run a custom visitor, like pyclbr now does, and not use inspect. This would be a separate issue from enhancing live objects for the benefit of inspect.
serhiy-storchaka commented on May 2, 2024
Yes, the current code that searches for the class definition in the sources is awful and not completely reliable, but the code that preceded it was worse, although faster.
My concern is that this attribute is only used in inspect.findsource() (and indirectly in inspect.getcomments()). But it is the only reliable solution to the specified problem; otherwise we can only guess. There may be several class definitions with the same name in the file, and the class can be renamed after creation, which breaks any search attempt.
Another solution is to deprecate inspect.findsource() and inspect.getcomments() for classes.
Maybe, but usually (when used as help() in the REPL) it is given a live object: a function, a class or a module. Even if it is given an object path and can load and parse the module code, we have the problem of multiple definitions (depending on conditions), generated classes and functions (see for example turtle) and classes and functions imported from other modules where they are implemented (C implementations, submodules). It makes sense to show what the user gets when they import the module, even if it is implemented elsewhere, and not what they could potentially get on other platforms or in a different environment.
gh-118465: Add __firstlineno__ attribute to class (GH-118475)
pythongh-118465: Add __firstlineno__ attribute to class (pythonGH-118475)
__firstlineno__ class attribute microsoft/pyright#9484