docs installation slow on NFS home #1867
What's fascinating here is that while the install took minutes, there are only 6 seconds of syscalls there. I can only assume we're only getting CPU cost, not wait latency, logged by strace. However, profiles support should resolve this issue for you (PR #1673), so let's revisit this after that release.
I don't think I'd assume that; here's the
I'll be watching that PR with great interest!
Yep, so 2m38s of wallclock time, of which 16s is user time (likely decompression, hashing, etc.) and 6.6s is syscall CPU time. The remaining 2m14s or so will be spent waiting for the storage/network. :D
Yep, you are entirely blocked on high-latency IO. Can I get another strace please, with -T this time? Attach it somewhere I can download it; it will be long. Pending verification from that trace, there are a couple of things we can do:
- We can introduce threading into your environment: if the open+write calls are low latency and close is slow, which client-checked but server-enforced filesystems often have, then the current threading will be effective; we just need to enable it for you.
- We can stop doing a second walk over the unpacked files - tar-rs already chmods (fchmod now) the files as it unpacks, yet we completely reset the permissions afterwards. A little bit of smarts (e.g. doing an fchmod on the handle from tar-rs for the common case and then only doing an fchmod for the uncommon case from the manifest) would let us drastically reduce syscalls; see the sketch below. Better still might be to tell tar-rs to not set modes at all, as another tar-rs feature - we'll want to be guided by the profile data.
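To make that second bullet concrete, here is a minimal single-pass sketch - my own illustration using only std, not rustup's or tar-rs's actual extraction code - where the final mode is applied through the still-open handle (fchmod), so each file costs open + write + fchmod + close, with no later chmod walk:

```rust
use std::fs::{File, Permissions};
use std::io::{self, Read};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical single-pass unpack step: write the file's contents and
/// set its final mode via the already-open handle (fchmod under the
/// hood), avoiding a second metadata pass over every extracted file.
fn unpack_one(dest: &Path, contents: &mut impl Read, mode: u32) -> io::Result<()> {
    let mut out = File::create(dest)?;                  // open(2)
    io::copy(contents, &mut out)?;                      // write(2)
    out.set_permissions(Permissions::from_mode(mode))?; // fchmod(2)
    Ok(())                                              // close(2) when `out` drops
}
```

On a high-latency filesystem this saves one full round trip per file compared with a separate chmod pass after extraction.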
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or to a specific number if desired for testing/tuning. This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse lock around the fd-handle table inside WSL.
@saethlin I haven't reviewed the strace yet, and it's late, but that patch of mine is the lowest-hanging fruit that may help. I'd be interested to know whether it has any impact for you at all.
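For reference, here is a minimal sketch of how a knob like RUSTUP_CLOSE_THREADS could be interpreted - the function name and fallback policy are my assumptions, not rustup's actual internals:

```rust
use std::env;
use std::thread;

/// Assumed semantics: "disabled" forces single-threaded IO, a number
/// sets the pool size, and anything else falls back to the CPU count.
fn close_thread_count() -> usize {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    match env::var("RUSTUP_CLOSE_THREADS").as_deref() {
        Ok("disabled") => 1,
        Ok(n) => n.parse().unwrap_or(cpus),
        Err(_) => cpus,
    }
}
```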
Ok, so it's interesting that we get open() rather than openat(), but: about 3ms to create the inode, about 2ms to fchmod it, writes and closes are nearly free, and chmod is also ~1ms. 20k items, so 1ms == 20 seconds; that's roughly 60 seconds in open(), 40 seconds in fchmod(), and 20 seconds in the final chmods. So, the things we need to do to make it fast for HPC filesystems are:
@saethlin how many cores do you have available? Clearly we can oversubscribe, as your actual CPU time is near zero, but I'm wondering if a heuristic of machine CPUs, which we use for dealing with Windows Defender, will be reasonable here too (and save on code-path proliferation).
Usually I have 32 cores available. Just to be clear, any changes you make in the name of installing docs faster specifically on HPC systems (as opposed to everywhere) will be moot as soon as we can disable docs. I'm also not really keen on having an installation process that uses threads to drive enormous load onto the shared filesystem. I've witnessed write latencies of a few seconds because of misuse, and I don't want to see rustup usage be responsible for that.
alexcrichton/tar-rs#207 will permit tackling the chmod duplication. It's worth noting (for @alexcrichton if no-one else) that there are several reasons we don't trust the distribution archives: they have been wrong in the past; we're still used to install those old archives; and when we start doing streaming installs of archives we'll only verify signatures at the end, so we want to trust as little as possible.

@saethlin, there are two parts here. One is actually doing the least work possible. For instance, we used to do a full additional copy of the files, which would have made older releases even slower; that's fixed already. Eliminating a full second pass of chmod metadata updates will reduce load on the metadata server (which HPC filesystem are you using, BTW? Gluster? CephFS?).

The second part, once we have reduced the work done to the bare minimum with no waste, is about scheduling: do we schedule the work to complete as rapidly as possible, or do we throttle it? I understand your concern and share it: I certainly don't want to create headaches for folk by driving up a backlog somewhere. But let's put this in context - this is under a second of IO for a single platter of spinning rust (consumer disks have exceeded 200MB/s for a decade or more now). Let's not assume that we are going to cause a problem until we have data showing it. I know, for instance, that with ext4 you want to create files relatively close together in time on a busy system, because ext4 is going to try to keep them together on disk, and if the fs is too full (and high activity is moving what is used around a lot) it won't be able to. https://www.kernel.org/doc/html/latest/filesystems/ext4/overview.html#block-and-inode-allocation-policy

So my proposal is this: I write various things, and you try them out - I don't have your HPC cluster with its particular tradeoffs. If they cause havoc in test, we don't push them into master. If they don't cause havoc, we can consider moving forward.
We're on Lustre, not entirely sure what version though. Your proposal sounds good. I'm learning about |
As a for-instance, take rust-docs/share/doc/rust/html/rust-by-example/fn/closures/closure_examples (strace for closure_examples):
This has a total syscall time of
Which is representative of the extra work here - we really only need to set the umask appropriately, do the mkdir once, and then never touch the dir again. That would halve the amount of work the HPC filesystem is being asked to do.
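As a rough sketch of that shape - the DirCache type is my illustration, not rustup's internals - set the umask once up front so directories are created with the right mode, then remember which directories already exist so each one costs exactly one mkdir(2):

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Illustrative cache: each directory is created at most once, so the
/// metadata server sees a single mkdir(2) per directory and no repeated
/// stat/chmod traffic afterwards.
struct DirCache(HashSet<PathBuf>);

impl DirCache {
    fn new() -> Self {
        DirCache(HashSet::new())
    }

    fn ensure(&mut self, dir: &Path) -> io::Result<()> {
        if self.0.insert(dir.to_path_buf()) {
            fs::create_dir_all(dir)?; // first and only touch of this dir
        }
        Ok(())
    }
}
```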
http://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips seems like it has good recommendations here. In particular, @saethlin can you
And test the current released rustup? I'm not expecting any radical change, just curious about any impact this might have. The stdlib and rustc etc. are obviously stripeable, and we could look at special-casing docs etc. to make just them single-stripe, but this is more along the lines of information gathering.
(Also, TBH, I think just some sensible docs in README.md might be useful here.)
Oh dangit, I was mistaken. Our home directories are on plain NFS, not Lustre.
I'll surely find a use for those Lustre tips anyway.
Aha! Ok, so if ~ is on plain old NFS, we're probably facing a classic single-machine multi-user situation - but it's also likely that that machine actually has decent IO backing it, if we're not nuts. We're just 1ms away on the network, so anything that has to cache-bust or be synchronous blocks. I'll have a non-aggressive test branch for you soon.
Rustup 1.20 has been released, which supports the profiles feature discussed above (PR #1673).
Absolutely. Thanks so much for all your hard work on this.
Problem
rustup docs installation is remarkably slow on an HPC system (probably most HPC systems). In case it's useful to anyone, I'm observing this on HiPerGator, which uses the Lustre filesystem.
I've attached some strace -c output below, along with what rustup printed during the update. The 433.6 KiB/s is probably mis-stating the situation; for a few seconds at a time the install speed drops to between 8.0 and 50 KiB/s, as if it's being throttled by the filesystem.

Possible Solution(s)
It would be great to be able to opt out of docs altogether; I'm told this feature is under consideration. There's no way for me to use the HTML docs in an HPC environment anyway; I use the local ones on the machine I'm ssh'ing from.
Notes
Output of rustup --version: rustup 1.18.3 (435397f 2019-05-22)
Output of rustup show:
strace -c output while running a rustup update nightly