Installation progress bar ✨ #13220
For an install progress bar, we'd like to emit logs while the progress bar updates (for uninstallation messages, etc.). To avoid interwoven logs, we need to log to the same console that the progress bar is using. This is easiest to achieve by simply storing a global stdout and stderr console, queried via a get_console() helper.
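A minimal sketch of the shared-console idea described above, using only the standard library. The `Console` class here is a hypothetical stand-in for a rich-style console, and the module-level variable names and the `stderr` keyword are assumptions for illustration; only the `get_console()` name comes from the commit message.

```python
import sys
from typing import Optional, TextIO


class Console:
    """Hypothetical stand-in for a rich-style console bound to one stream."""

    def __init__(self, file: TextIO) -> None:
        self.file = file

    def print(self, message: str) -> None:
        self.file.write(message + "\n")


# One console per stream, created lazily and then reused everywhere.
_stdout_console: Optional[Console] = None
_stderr_console: Optional[Console] = None


def get_console(*, stderr: bool = False) -> Console:
    """Return the process-wide console for stdout (default) or stderr."""
    global _stdout_console, _stderr_console
    if stderr:
        if _stderr_console is None:
            _stderr_console = Console(sys.stderr)
        return _stderr_console
    if _stdout_console is None:
        _stdout_console = Console(sys.stdout)
    return _stdout_console
```

Because the logger and the progress bar would both call `get_console()`, they end up writing through the same object, which is what lets the console keep log lines and the live bar from interleaving.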
Installation can be pretty slow so it'd be nice to provide progress feedback to the user. This commit adds a new progress renderer designed for installation:
- The progress bar will wait one refresh cycle (1000ms/6 ≈ 170ms) before appearing. This avoids unsightly very short flashes.
- The progress bar is transient (i.e. it will disappear once all packages have been installed). This choice was made to avoid adding more clutter to pip install's output (despite the download progress bar being persistent).
- The progress bar won't be used at all if there's only one package to install.
For non-TTY use cases, this shouldn't break anything, except that a blank line is (unfortunately) added between the "Installing $packages" and "Successfully installed $packages" log lines. For example, this is the redirected output of
I'd like to say that is fine, but who knows. Do we think this could break some script/pipeline that parses pip's output, and do we care?
As a side note, the screencast isn't very convincing because all of the uninstallation messages act as a reasonable progress report in themselves. But I assume this would be more useful when doing a fresh install.

+1 on the idea in general. But often for me, it's not so much the "installed 25 out of 96 packages" progress that's the key issue, it's the time it takes to unpack a wheel. For example,

Although, to be brutally honest, being able to install numpy and scipy in 344ms like uv can would make having a progress bar mostly irrelevant 🙁 So maybe we'd be better off trying to optimise the wheel unpacking process...
> As a side note, the screencast isn't very convincing because all of the uninstallation messages act as a reasonable progress report in themselves. But I assume this would be more useful when doing a fresh install.

Haha, whoops. That was an oversight on my part. It is indeed much more useful when doing a fresh install.

> So maybe we'd be better off trying to optimise the wheel unpacking process...

There has been some effort to optimise the wheel unpacking process already. I've mostly stopped optimising the install step as what remains to optimise is likely nontrivial. Last time I checked, the main contributors to install time are 1) compilation to bytecode, 2) zip unpacking itself, 3) other file I/O (à la the fsync call we discussed a while back). None of those are easy to address...

> Having a progress bar for unpacking the wheel would be a lot more beneficial than just "installed 1 of 2". Maybe that's something that could be added as a follow-up?

Makes sense, although I'd be wary of flashing tasks in the common case where the packages are small/fast to install. As long as we do the same thing with this progress bar and wait a refresh cycle before displaying the subtask, it's probably fine. In addition, I'm a bit worried about the performance implications of displaying the per-file installation progress. It'd be rather ironic if we added progress tracking only for the install step to be slower.
OK, ouch that email reply did not format well at all. Anyway, here's another screencast with Screencast.from.2025-02-12.20-15-17.webm
^ in that example, bytecode compilation is also quite literally taking 3/4 of the installation step. I think we're going to have to consider parallelizing bytecode compilation despite the complexity (although it may not benefit Windows that much due to high process creation overhead...) OTOH, I am on a Linux box with an SSD and no antivirus, so it's very possible that the file I/O dominates on Windows.
Yeah, I describe the scenario that triggered me to write the original issue here: #12712 (comment). In that scenario you see a lot of uninstall messages, and then absolutely nothing for a significant amount of time (in my case 40+ seconds) to the point where you really start to worry if pip is frozen. I will run some tests locally, but from the screencast it looks like it completely removes this issue of thinking pip is frozen.
My understanding is you only get that performance because the unpacked wheels are cached. uv still takes about the same amount of time to download and unpack a single large wheel, but it does both of these for multiple wheels concurrently. So it will not save you if you have a single very large package you're missing or need to download a new version. It also grows the cache much faster than pip does. Not saying it isn't worth doing, but it's not always better.
bar = Progress(*columns, refresh_per_second=6, console=console, transient=True)
# Hiding the progress bar at initialization forces a refresh cycle to occur
# until the bar appears, avoiding very short flashes.
Nice trick!
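To illustrate why starting hidden avoids flashes, here is a toy timing model in plain Python. It is not rich's API and not the code in this PR: `simulate_install` and its arguments are hypothetical, and it simply assumes that the bar can only be painted on refresh ticks, the first of which happens one interval after the start.

```python
from typing import List


def simulate_install(durations: List[float], refresh_per_second: float = 6.0) -> List[str]:
    """Return the frames a hidden-at-start bar would paint.

    `durations` holds the (simulated) install time of each package in seconds.
    The bar is only drawn on refresh ticks, so nothing is painted before the
    first tick at 1 / refresh_per_second seconds.
    """
    interval = 1.0 / refresh_per_second
    frames: List[str] = []
    now = 0.0
    next_tick = interval  # the first paint opportunity is one full cycle in
    total = len(durations)
    for done, duration in enumerate(durations):
        end = now + duration
        # Paint every tick that falls while this package is being installed.
        while next_tick <= end:
            frames.append(f"{done}/{total}")
            next_tick += interval
        now = end
    return frames


# Five tiny packages finish well within one refresh cycle: no frame is painted.
print(simulate_install([0.01] * 5))
# Slower packages cross the first tick, so the bar appears mid-install.
print(simulate_install([0.1, 0.1, 0.1]))
```

Under this model a fast install never reaches a paint opportunity, which, combined with the transient setting, means the bar is never seen at all.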
Agreed, I wasn't trying to suggest we have to aim for uv's performance (they make different trade-offs than we do). Progress bars do help us to visualise where we spend our time, though, which can direct our optimisation efforts better.
Yes, we absolutely should prefer threads over processes wherever possible, as process creation is very costly on Windows. Maybe we have a single byte-compilation process (because it needs to run in the target env), that handles the individual compilations on multiple threads? That's a discussion for a separate issue, though.
Experimenting locally, it doesn't seem like bytecode compilation releases the GIL (fair enough) so this isn't an option 🙁. See #12712 (comment) for additional discussion on parallelizing bytecode compilation.
Benchmarking code:
import compileall
import multiprocessing as mp
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from contextlib import redirect_stdout
from functools import partial
from io import StringIO
from pathlib import Path

import click


def compile_single(path: Path) -> None:
    # force=True recompiles even if an up-to-date .pyc already exists.
    compileall.compile_file(path, force=True, quiet=2)


def no_parallel(paths: list[Path]) -> None:
    for p in paths:
        compile_single(p)


def threaded(paths: list[Path]) -> None:
    # Leaving the with-block waits for all submitted work to finish.
    with ThreadPoolExecutor(max_workers=4) as pool:
        pool.map(compile_single, paths)


def multiprocess(paths: list[Path], *, workers: int = 3) -> None:
    # "spawn" is the only start method on Windows, so use it everywhere.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        pool.map(compile_single, paths)


@click.command
@click.argument("paths", nargs=-1, type=click.Path(exists=True, path_type=Path))
def main(paths: list[Path]) -> None:
    # One multiprocess variant per worker count, from 8 down to 2.
    multiprocess_jobs = []
    for n in range(8, 1, -1):
        func = partial(multiprocess, workers=n)
        func.__name__ = f"{func.func.__name__} {n=}"
        multiprocess_jobs.append(func)

    for func in [no_parallel, threaded, *multiprocess_jobs]:
        t0 = time.perf_counter()
        # Swallow anything written to stdout in this process while timing.
        with redirect_stdout(StringIO()):
            func(paths)
        elapsed = time.perf_counter() - t0
        print(f"[{func.__name__:<16}] compiled {len(paths)} files in {elapsed:.3f}s")


if __name__ == "__main__":
    main()
What's up with the second approval @pradyunsg? :P I'm leaving this open for the time being as it does make a non-trivial addition to pip install UX, so a longer period for objections is warranted.
I had this open in multiple tabs. 😅
I'm planning to merge either this weekend or sometime next week. While I do want to be patient and let people share their feedback and objections before landing this, I also recognise the visibility advantages of being on
Nah I’m just going to merge this. It’s early in the cycle as you mentioned, and revert is cheap.
Haha, that works. Thanks!
Towards #12712.
Installation can be pretty slow so it'd be nice to provide progress feedback to the user.
Implementation notes:
- The progress bar will wait one refresh cycle (1000ms/6 ≈ 170ms) before appearing. This avoids unsightly very short flashes.
- The progress bar is transient (i.e. it will disappear once all packages have been installed). This choice was made to avoid adding more clutter to pip install's output (despite the download progress bar being persistent).
- The progress bar won't be used at all if there's only one package to install.
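The two numeric rules in these notes can be written down as a small sketch. The function and constant names below are hypothetical and are not taken from pip's code; only the refresh rate of 6 per second and the "more than one package" condition come from the notes above.

```python
REFRESH_PER_SECOND = 6  # refresh rate stated in the implementation notes


def first_refresh_delay_ms(refresh_per_second: int = REFRESH_PER_SECOND) -> float:
    """Length of one refresh cycle, i.e. how long the bar stays hidden."""
    return 1000 / refresh_per_second


def should_show_install_progress(package_count: int) -> bool:
    """The bar is skipped entirely when only one package is installed."""
    return package_count > 1


print(f"{first_refresh_delay_ms():.1f} ms")  # 1000 / 6, rounded to ~170ms in the notes
```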
Demo
Screencast.from.2025-02-11.17-33-02.webm
Where are the tests?
Turns out there aren't any progress bar tests, so I had nothing to base any new tests on. I'd appreciate suggestions for testing this w/o essentially retesting rich's own functionality.