pydoc is obscenely slow for some modules #118465

Closed
@serhiy-storchaka

Description

@serhiy-storchaka
Member

For example, ./python -m pydoc test.test_enum takes 32 seconds. It takes 20 seconds in 3.12, 15 seconds in 3.11 and only 1.6 seconds in 3.10. Perhaps test.test_enum has grown, but the main culprit is bpo-35113, and further changes like gh-106727 only added to it.

For every class without a docstring, pydoc tries to find its comments by calling inspect.getcomments(), which calls inspect.findsource(), which reads and parses the module source and then traverses the tree to find the class with the matching qualname. Because this whole process is repeated for each class, the cost is quadratic for large modules with many classes.
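To make the cost concrete, here is a minimal sketch of that lookup pattern. It is a simplified model, not the actual inspect implementation; find_class_line is a hypothetical helper name introduced only for illustration.

```python
import ast
from typing import Optional


def find_class_line(source: str, qualname: str) -> Optional[int]:
    """Parse the whole module, then walk it to locate one class by qualname."""
    tree = ast.parse(source)  # full parse on every call

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                if name == qualname:
                    return child.lineno
                found = visit(child, name + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                found = visit(child, prefix + child.name + ".<locals>.")
            else:
                found = visit(child, prefix)
            if found is not None:
                return found
        return None

    return visit(tree, "")


# A module with n undocumented classes triggers n full parse-and-walk passes,
# each of which is itself proportional to the module size.
source = "\n".join(f"class C{i}:\n    pass" for i in range(200))
lines = [find_class_line(source, f"C{i}") for i in range(200)]
```

Each call is linear in the size of the module, so looking up every class one by one is roughly quadratic overall.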

I tried to optimize the AST traversal code and got 18 seconds on main. It still has quadratic complexity. Further optimization would require introducing a cache and finding the positions of all classes in one pass.
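The one-pass alternative could look roughly like the sketch below. This is an assumption about the shape of such an index, not code from either PR; index_classes is a hypothetical name.

```python
import ast


def index_classes(source: str) -> dict:
    """Map each class qualname to its first line in a single AST pass."""
    positions = {}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                # Keep the first definition seen; with duplicate names
                # this is still only a guess.
                positions.setdefault(name, child.lineno)
                visit(child, name + ".")
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                visit(child, prefix + child.name + ".<locals>.")
            else:
                visit(child, prefix)

    visit(ast.parse(source), "")
    return positions


source = (
    "class A:\n"
    "    class B:\n"
    "        pass\n"
    "\n"
    "def f():\n"
    "    class C:\n"
    "        pass\n"
)
index = index_classes(source)
```

Caching the resulting dict per module would turn each later lookup into a dictionary access, at the price of managing cache invalidation.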

But it would all be much simpler and faster to simply save the value of co_firstlineno of the code object executed during class creation in the class dict (as __firstlineno__, for example).
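From the consumer's side, reading such an attribute might look like this. It is a sketch that assumes an interpreter including the proposed change; on older interpreters the attribute is absent, so the helper (class_first_line, a hypothetical name) falls back to None.

```python
class Example:
    pass


def class_first_line(cls):
    """Return the line where cls was defined, or None if it is not recorded."""
    # __firstlineno__ is only present on interpreters that store it
    # at class creation time.
    return getattr(cls, "__firstlineno__", None)


line = class_first_line(Example)
```

No source file has to be read or parsed for this lookup, which is where the speedup comes from.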

Activity

changed the title [-]pydoc is obscently slow for some modules[/-] [+]pydoc is obscenely slow for some modules[/+] on May 1, 2024
added a commit that references this issue on May 1, 2024

pythongh-118465: Optimize inspect.findsource() for classes

added a commit that references this issue on May 1, 2024

pythongh-118465: Add __firstlineno__ attribute to class

serhiy-storchaka (Member, Author) commented on May 1, 2024

#118471 reduces the time from 32 to 18 seconds.
#118475 reduces the time to just 1 second.

carljm (Member) commented on May 1, 2024

Not only does a runtime __firstlineno__ for classes fix the performance problem, it also improves correctness. In cases where there are multiple conditional definitions of a class, the previous code had to guess. Now we know exactly where the runtime class object you are asking about was actually defined.
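A small illustration of the ambiguity, under the assumption that the interpreter records __firstlineno__ (the code falls back to None otherwise). The Backend class and the "<demo>" filename are made up for the example.

```python
import ast

SOURCE = '''\
import sys
if sys.platform == "win32":
    class Backend:
        pass
else:
    class Backend:
        pass
'''

# Static view: two definitions share the qualname "Backend".
candidates = sorted(
    node.lineno
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.ClassDef) and node.name == "Backend"
)

# Runtime view: exactly one of them was actually executed.
namespace = {"__name__": "demo"}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
actual = getattr(namespace["Backend"], "__firstlineno__", None)
```

A source search can only pick one of the candidates by heuristic, while the recorded line, where available, identifies the definition that produced the live class.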

terryjreedy (Member) commented on May 1, 2024

pyclbr produces a custom tree of class and function descriptors, including line numbers, in one visitor pass from the module AST. The consumer can then poke around the tree as desired without rereading and reparsing. There is intentionally no code execution so that unknown-to-be-safe files can be browsed.
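For reference, a minimal sketch of using pyclbr this way. The module name gh118465_demo_mod and its contents are invented for the example; the file is written to a temporary directory and is never imported.

```python
import pyclbr
import tempfile
from pathlib import Path

SOURCE = (
    "class Outer:\n"
    "    def method(self):\n"
    "        pass\n"
    "\n"
    "def helper():\n"
    "    pass\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "gh118465_demo_mod.py").write_text(SOURCE)
    # readmodule_ex reads and parses the file; it does not execute it.
    tree = pyclbr.readmodule_ex("gh118465_demo_mod", path=[tmp])

# Top-level descriptors, keyed by name, each carrying its line number.
summary = {name: obj.lineno for name, obj in tree.items()}
# Nested definitions hang off the parent descriptor's children mapping.
method_line = tree["Outer"].children["method"].lineno
```

All line numbers come from a single read of the file, so walking the descriptors afterwards costs nothing extra.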

inspect gets information from live objects. So it seems sensible that all the information inspect might need, or a source index thereto, should be included with the object. I presume the first line of Python functions is already available in the code object.
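For functions this is indeed the case, as the short sketch below shows (sample is a throwaway function for the example):

```python
def sample():
    return 42


# The code object records the line on which the function definition starts.
first_line = sample.__code__.co_firstlineno
```

The proposal in this issue amounts to giving classes an equivalent piece of information.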

Is it possible to parse and construct an ast for a single statement starting with its first line?

Perhaps when pydoc is given a filename, it should directly call ast and run a custom visitor, like pyclbr now does, and not use inspect. This would be a separate issue from enhancing live objects for the benefit of inspect.

serhiy-storchaka (Member, Author) commented on May 2, 2024

Yes, the current code that searches for the class definition in the source is awful and not completely reliable, but the code that preceded it was worse, although faster.

My concern is that this attribute is only used in inspect.findsource() (and indirectly in inspect.getcomments()). But it is the only reliable solution to this problem; otherwise we can only guess. There may be several class definitions with the same name in the file, and the class can be renamed after creation, which breaks any attempt to search for it.
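The renaming case can be reproduced in a few lines. This is a sketch with made-up names (Original, Renamed, "<demo>"), and it assumes __firstlineno__ is only available on interpreters that include the change, hence the getattr fallback.

```python
import ast

SOURCE = "class Original:\n    pass\n"

namespace = {"__name__": "demo"}
exec(compile(SOURCE, "<demo>", "exec"), namespace)
cls = namespace["Original"]

# Rename the live class after creation.
cls.__name__ = cls.__qualname__ = "Renamed"

# A search keyed on the current qualname finds no matching definition.
matches = [
    node.lineno
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.ClassDef) and node.name == cls.__qualname__
]

# The recorded line, where available, is unaffected by the rename.
recorded = getattr(cls, "__firstlineno__", None)
```

Here matches is empty even though the class plainly has a definition in the source.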

Another solution is to deprecate inspect.findsource() and inspect.getcomments() for classes.

> Perhaps when pydoc is given a filename, it should directly call ast and run a custom visitor, like pyclbr now does, and not use inspect. This would be a separate issue from enhancing live objects for the benefit of inspect.

Maybe, but usually (when used as help() in the REPL) it is given a live object: a function, a class or a module. Even if it is given an object path and can load and parse the module code, we have the problem of multiple definitions (depending on conditions), generated classes and functions (see turtle, for example), and classes and functions imported from other modules where they are implemented (C implementations, submodules). It makes sense to show what the user gets when they import the module, even if it is implemented elsewhere, and not what they could potentially get on other platforms or in a different environment.

added a commit that references this issue on May 6, 2024

gh-118465: Add __firstlineno__ attribute to class (GH-118475)

153b3f7
added a commit that references this issue on May 8, 2024

pythongh-118465: Add __firstlineno__ attribute to class (pythonGH-118475

Metadata

Labels: performance (Performance or resource usage)

Participants: @carljm, @serhiy-storchaka, @terryjreedy

pydoc is obscenely slow for some modules · Issue #118465 · python/cpython