Installation progress bar ✨ #13220

Merged: 2 commits, Feb 20, 2025

ichard26
Member

Towards #12712.

Installation can be pretty slow so it'd be nice to provide progress feedback to the user.

Implementation notes:

  • The progress bar will wait one refresh cycle (1000ms/6 ≈ 167ms) before appearing. This avoids unsightly very short flashes.

  • The progress bar is transient (i.e. it will disappear once all packages have been installed). This choice was made to avoid adding more clutter to pip install's output (despite the download progress bar being persistent).

  • The progress bar won't be used at all if there's only one package to install.
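
As a rough illustration of the three rules above, here is a stdlib-only sketch. It is not the code in this PR (which builds on rich's `Progress`); the `install_progress` name, its parameters, and the plain `done/total` rendering are all hypothetical and exist only to make the behaviour concrete.

```python
import time
from typing import Callable, Iterable, Iterator, TextIO, TypeVar

T = TypeVar("T")

REFRESH_PER_SECOND = 6
REFRESH_INTERVAL = 1.0 / REFRESH_PER_SECOND  # roughly 0.167 s


def install_progress(
    items: Iterable[T],
    total: int,
    stream: TextIO,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items while drawing a one-line, transient progress indicator.

    Hypothetical helper: illustrates the rules, not pip's implementation.
    """
    if total <= 1:
        # Rule 3: no bar at all when there is only one package.
        yield from items
        return

    start = clock()
    drawn = False
    for done, item in enumerate(items):
        # Rule 1: stay hidden until one refresh interval has elapsed,
        # so very fast installs never flash a bar.
        if clock() - start >= REFRESH_INTERVAL:
            stream.write(f"\r{done}/{total}")
            drawn = True
        yield item
    if drawn:
        # Rule 2: transient, so erase the line once everything is installed.
        stream.write("\r\x1b[2K")
```

Passing the clock in as a parameter is only there so the delay can be exercised without sleeping; a caller would normally leave it at the default.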

Demo

Screencast.from.2025-02-11.17-33-02.webm

Where are the tests?

Turns out there aren't any progress bar tests, so I had nothing to base any new tests on. I'd appreciate suggestions for testing this w/o essentially retesting rich's own functionality.

For an install progress bar, we'd like to emit logs while the progress
bar updates (for uninstallation messages, etc.). To avoid interwoven
logs, we need to log to the same console that the progress bar is using.

This is easiest to achieve by simply storing a global stdout and stderr
console, queried via a get_console() helper.
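
A minimal sketch of that idea, assuming a tiny stand-in `Console` class rather than rich's real one. Only the `get_console()` name comes from the commit message; `setup_consoles()` and the `stderr` keyword are hypothetical.

```python
import sys
from dataclasses import dataclass
from typing import Optional, TextIO


@dataclass
class Console:
    """Hypothetical stand-in for a rich-style console bound to one stream."""

    file: TextIO

    def print(self, message: str) -> None:
        self.file.write(message + "\n")


# One shared console per stream, created once and reused everywhere.
_stdout_console: Optional[Console] = None
_stderr_console: Optional[Console] = None


def setup_consoles(
    stdout: Optional[TextIO] = None, stderr: Optional[TextIO] = None
) -> None:
    """Create the shared consoles (hypothetical helper)."""
    global _stdout_console, _stderr_console
    _stdout_console = Console(stdout if stdout is not None else sys.stdout)
    _stderr_console = Console(stderr if stderr is not None else sys.stderr)


def get_console(*, stderr: bool = False) -> Console:
    """Return the shared console so logs and the bar write to the same place."""
    console = _stderr_console if stderr else _stdout_console
    if console is None:
        raise RuntimeError("setup_consoles() has not been called yet")
    return console
```

Because both the log handler and the progress renderer would ask `get_console()` for the same object, their output goes through a single writer instead of interleaving.
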
Installation can be pretty slow so it'd be nice to provide progress
feedback to the user.

This commit adds a new progress renderer designed for installation:

- The progress bar will wait one refresh cycle (1000ms/6 ≈ 167ms) before
  appearing. This avoids unsightly very short flashes.

- The progress bar is transient (i.e. it will disappear once all
  packages have been installed). This choice was made to avoid adding
  more clutter to pip install's output (despite the download progress
  bar being persistent).

- The progress bar won't be used at all if there's only one package to
  install.
@ichard26
Member Author

For non-TTY use cases, this shouldn't break anything, except that a blank line is (unfortunately) added between the "Installing $packages" and "Successfully installed $packages" log lines.

For example, this is the redirected output of pip install pytest > log:

Collecting pytest
  Using cached pytest-8.3.4-py3-none-any.whl.metadata (7.5 kB)
Collecting iniconfig (from pytest)
  Using cached iniconfig-2.0.0-py3-none-any.whl.metadata (2.6 kB)
Collecting packaging (from pytest)
  Using cached packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting pluggy<2,>=1.5 (from pytest)
  Using cached pluggy-1.5.0-py3-none-any.whl.metadata (4.8 kB)
Using cached pytest-8.3.4-py3-none-any.whl (343 kB)
Using cached pluggy-1.5.0-py3-none-any.whl (20 kB)
Using cached iniconfig-2.0.0-py3-none-any.whl (5.9 kB)
Using cached packaging-24.2-py3-none-any.whl (65 kB)
Installing collected packages: pluggy, packaging, iniconfig, pytest

Successfully installed iniconfig-2.0.0 packaging-24.2 pluggy-1.5.0 pytest-8.3.4

I'd like to say that is fine, but who knows. Do we think this could break some script/pipeline that parses pip's output[1], and do we care?

Footnotes

  1. which they shouldn't do, but alas, I'm sure it's happening

@pfmoore
Member

pfmoore commented Feb 12, 2025

As a side note, the screencast isn't very convincing because all of the uninstallation messages act as a reasonable progress report in themselves. But I assume this would be more useful when doing a fresh install.

+1 on the idea in general. But often for me, it's not so much the "installed 25 out of 96 packages" progress that's the key issue, it's the time it takes to unpack a wheel. For example, pip install numpy takes 8 seconds on my PC, and pip install scipy takes 15 seconds (with numpy already installed). Having a progress bar for unpacking the wheel would be a lot more beneficial than just "installed 1 of 2". Maybe that's something that could be added as a follow-up?

Although, to be brutally honest, being able to install numpy and scipy in 344ms like uv can would make having a progress bar mostly irrelevant 🙁 So maybe we'd be better off trying to optimise the wheel unpacking process...

@ichard26
Member Author

ichard26 commented Feb 12, 2025 via email

@ichard26
Member Author

OK, ouch that email reply did not format well at all.

Anyway, here's another screencast with --ignore-installed so there aren't any uninstallation messages.

Screencast.from.2025-02-12.20-15-17.webm

@ichard26
Member Author

ichard26 commented Feb 13, 2025

^ in that example, bytecode compilation is also quite literally taking 3/4 of the installation step. I think we're going to have to consider parallelizing bytecode compilation despite the complexity (although it may not benefit Windows that much due to high process creation overhead...)

image

OTOH, I am on a Linux box with an SSD and no antivirus, so it's very possible that the file I/O dominates on Windows.

@notatallshaw
Member

> Anyway, here's another screencast with --ignore-installed so there aren't any uninstallation messages.

Yeah, I describe the scenario that triggered me to write the original issue here: #12712 (comment). In that scenario you see a lot of uninstall messages, and then absolutely nothing for a significant amount of time (in my case 40+ seconds) to the point where you really start to worry if pip is frozen. I will run some tests locally, but from the screencast it looks like it completely removes this issue of thinking pip is frozen.

> Although, to be brutally honest, being able to install numpy and scipy in 344ms like uv can would make having a progress bar mostly irrelevant 🙁 So maybe we'd be better off trying to optimise the wheel unpacking process...

My understanding is you only get that performance because the unpacked wheels are cached. uv still takes about the same amount of time to download and unpack a single large wheel, but it does both of these for multiple wheels concurrently. So it will not save you if you have a single very large package that you're missing or need to download a new version of. It also grows the cache much faster than pip does.

Not saying it isn't worth doing, but it's not always better.


bar = Progress(*columns, refresh_per_second=6, console=console, transient=True)
# Hiding the progress bar at initialization forces a refresh cycle to occur
# until the bar appears, avoiding very short flashes.

Member

Nice trick!

@pfmoore
Member

pfmoore commented Feb 13, 2025

> My understanding is you only get that performance because the unpacked wheels are cached

Agreed, I wasn't trying to suggest we have to aim for uv's performance (they make different trade-offs than we do). Progress bars do help us to visualise where we spend our time, though, which can direct our optimisation efforts better.

> although it may not benefit Windows that much due to high process creation overhead

Yes, we absolutely should prefer threads over processes wherever possible, as process creation is very costly on Windows. Maybe we have a single byte-compilation process (because it needs to run in the target env), that handles the individual compilations on multiple threads? That's a discussion for a separate issue, though.

@ichard26
Member Author

> Maybe we have a single byte-compilation process (because it needs to run in the target env), that handles the individual compilations on multiple threads? That's a discussion for a separate issue, though.

Experimenting locally, it doesn't seem like bytecode compilation releases the GIL (fair enough) so this isn't an option 🙁. See #12712 (comment) for additional discussion on parallelizing bytecode compilation.

Benchmarking code

import compileall
import multiprocessing as mp
import time
from contextlib import redirect_stdout
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
from io import StringIO
from pathlib import Path

import click


def compile_single(path: Path):
    compileall.compile_file(path, force=True, quiet=2)


def no_parallel(paths: list[Path]) -> None:
    for p in paths:
        compile_single(p)


def threaded(paths: list[Path]) -> None:
    with ThreadPoolExecutor(max_workers=4) as pool:
        pool.map(compile_single, paths)


def multiprocess(paths, *, workers: int = 3) -> None:
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        pool.map(compile_single, paths)


@click.command
@click.argument("paths", nargs=-1, type=click.Path(exists=True, path_type=Path))
def main(paths: list[Path]) -> None:
    multiprocess_jobs = []
    for n in range(8, 1, -1):
        func = partial(multiprocess, workers=n)
        func.__name__ = f"{func.func.__name__} {n=}"
        multiprocess_jobs.append(func)

    for func in [no_parallel, threaded, *multiprocess_jobs]:
        t0 = time.perf_counter()

        # Swallow anything printed in this process while the job runs.
        with redirect_stdout(StringIO()):
            func(paths)

        elapsed = time.perf_counter() - t0
        print(f"[{func.__name__:<16}] compiled {len(paths)} files in {elapsed:.3f}s")


if __name__ == "__main__":
    main()

@ichard26
Member Author

What's up with the second approval @pradyunsg? :P

I'm leaving this open for the time being as it does make a non-trivial addition to pip install UX, so a longer period for objections is warranted.

@pradyunsg
Member

I had this open in multiple tabs. 😅

@ichard26
Member Author

I'm planning to merge either this weekend or sometime next week. While I do want to be patient and let people share their feedback and objections before landing this, I also recognise the visibility advantages of being on main early in a release cycle. If this turns out to be a problem, we're more likely to hear about it once it's been available on main for a while.

@uranusjr uranusjr merged commit de44d99 into pypa:main Feb 20, 2025
31 checks passed
@uranusjr
Member

Nah I’m just going to merge this. It’s early in the cycle as you mentioned, and revert is cheap.

@ichard26
Member Author

Haha, that works. Thanks!

@ichard26 ichard26 deleted the feat/install-bar branch March 8, 2025 04:19
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 23, 2025