use a queue for reading files in parallel #6330

Open
@jbweston

The problem

Some colleagues and I are developing a Sphinx extension (a doctree transform) that executes code blocks embedded in RST documents and inserts the code's output into the final document.

Some of the code blocks take several seconds to run, so Sphinx's parallel build feature is useful. However, when reading in parallel, Sphinx hands the input files to the worker processes in lexicographic order and in contiguous chunks: the first process gets the first N files, the second process the next N files, and so on. If the slow files happen to have names that are lexicographically close, there is a good chance that they will all be assigned to the same worker process, which loses most of the advantage of parallelism.
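To make the imbalance concrete, here is a minimal sketch of the chunking behaviour described above. It is a simplified model, not Sphinx's actual code: the function names, the document names and the per-file costs (in seconds) are all made up for illustration.

```python
def contiguous_chunks(docnames, nproc):
    """Sort the documents and split them into at most nproc contiguous chunks."""
    docs = sorted(docnames)
    size = -(-len(docs) // nproc)  # ceiling division: files per worker
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def chunk_costs(chunks, cost):
    """Total time each worker spends on its chunk."""
    return [sum(cost[doc] for doc in chunk) for chunk in chunks]


# Hypothetical project: the two slow files share the "api/" prefix.
cost = {"api/a": 5, "api/b": 5, "guide/a": 1, "guide/b": 1}

chunks = contiguous_chunks(cost, nproc=2)
print(chunks)                     # [['api/a', 'api/b'], ['guide/a', 'guide/b']]
print(chunk_costs(chunks, cost))  # [10, 2]
```

With these made-up numbers the first worker is busy for 10 seconds while the second is idle after 2, whereas an even split of the slow files would finish in 6.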

A possible solution

Have the main Sphinx process maintain a queue of files to read, and have the worker processes pop files from this shared queue as they become free.
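A rough sketch of the shared-queue idea, using only the standard library `multiprocessing` module. It is not a patch against Sphinx: `worker`, `read_with_shared_queue`, the task tuples and the `time.sleep` stand-in for reading a document are all hypothetical, and the sketch assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp
import time


def worker(task_queue, result_queue, worker_id):
    """Pop documents from the shared queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        docname, seconds = task
        time.sleep(seconds)  # stand-in for actually reading the document
        result_queue.put((worker_id, docname))


def read_with_shared_queue(tasks, nproc):
    """Process (docname, seconds) tasks with nproc workers sharing one queue."""
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    task_queue = ctx.Queue()
    result_queue = ctx.Queue()

    for task in tasks:
        task_queue.put(task)
    for _ in range(nproc):
        task_queue.put(None)  # one sentinel per worker, after all real tasks

    procs = [
        ctx.Process(target=worker, args=(task_queue, result_queue, i))
        for i in range(nproc)
    ]
    for proc in procs:
        proc.start()

    # Collect every result before joining so that no worker blocks on a full pipe.
    results = [result_queue.get() for _ in tasks]
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    tasks = [("api/a", 0.2), ("api/b", 0.2), ("guide/a", 0.01), ("guide/b", 0.01)]
    for worker_id, docname in read_with_shared_queue(tasks, nproc=2):
        print(f"worker {worker_id} read {docname}")
```

Because each worker takes the next file only when it is free, a worker stuck on a slow file no longer holds back the files that would have followed it in the same chunk. The price is one queue round trip per file instead of one per chunk.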

I am willing to have a go at implementing this, provided that it is, in principle, a feature that would be accepted if implemented correctly.
