docs installation slow on NFS home #1867
What's fascinating here is that while the install took minutes, there are only 6 seconds of syscalls there. I can only assume we're only getting CPU cost, not wait latency, logged by strace. However, profiles support should resolve this issue for you (PR #1673), so let's revisit this after that release.
I don't think I'd assume that; here's the
I'll be watching that PR with great interest!
Yep, so 2m38s of wallclock time, of which 16s is user time (likely decompression, hashing, etc.) and 6.6s is syscall CPU time. The remaining 2m14s or so will be spent waiting for the storage/network. :D
Yep, you are entirely blocked on high-latency IO. Can I get another strace please, with -T this time? Attach it somewhere I can download it; it will be long. Pending verification from that trace, there are a couple of things we can do:
- We can introduce threading into your environment: if the open+write calls are low latency and close is slow, which client-checked but server-enforced filesystems often have, then the current threading will be effective; we just need to enable it for you.
- We can stop doing a second walk over the unpacked files - tar-rs already chmods (fchmod now) the files as it unpacks, yet we completely reset the permissions afterwards. A little bit of smarts (e.g. doing an fchmod on the handle from tar-rs for the common case and then only doing an fchmod for the uncommon case from the manifest) would let us drastically reduce syscalls; see the sketch below. Better still might be to tell tar-rs to not set modes at all, as another tar-rs feature - we'll want to be guided by the profile data.
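To make that second bullet concrete, here is a minimal single-pass sketch - my own illustration using only std, not rustup's or tar-rs's actual extraction code - where the final mode is applied through the still-open handle (fchmod), so each file costs open + write + fchmod + close, with no later chmod walk:

```rust
use std::fs::{File, Permissions};
use std::io::{self, Read};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical single-pass unpack step: write the file's contents and
/// set its final mode via the already-open handle (fchmod under the
/// hood), avoiding a second metadata pass over every extracted file.
fn unpack_one(dest: &Path, contents: &mut impl Read, mode: u32) -> io::Result<()> {
    let mut out = File::create(dest)?;                  // open(2)
    io::copy(contents, &mut out)?;                      // write(2)
    out.set_permissions(Permissions::from_mode(mode))?; // fchmod(2)
    Ok(())                                              // close(2) when `out` drops
}
```

On a high-latency filesystem this saves one full round trip per file compared with a separate chmod pass after extraction.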
Set RUSTUP_CLOSE_THREADS=disabled to force single threaded IO, or to a specific number if desired for testing/tuning. This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse lock around the fd-handle table inside WSL.
@saethlin I haven't reviewed the strace yet, and it's late, but that patch of mine is the lowest-hanging fruit that may help. I'd be interested to know whether it has any impact for you at all.
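For reference, here is a minimal sketch of how a knob like RUSTUP_CLOSE_THREADS could be interpreted - the function name and fallback policy are my assumptions, not rustup's actual internals:

```rust
use std::env;
use std::thread;

/// Assumed semantics: "disabled" forces single-threaded IO, a number
/// sets the pool size, and anything else falls back to the CPU count.
fn close_thread_count() -> usize {
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    match env::var("RUSTUP_CLOSE_THREADS").as_deref() {
        Ok("disabled") => 1,
        Ok(n) => n.parse().unwrap_or(cpus),
        Err(_) => cpus,
    }
}
```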
Ok, so it's interesting that we get open() rather than openat(), but: about 3ms to create the inode, about 2ms to fchmod it, writes and closes are nearly free, and chmod is also ~1ms. 20k items, so 1ms == 20 seconds; that's roughly 60 seconds in open(), 40 seconds in fchmod(), and 20 seconds in the final chmods. So, the things we need to do to make it fast for HPC filesystems are:
@saethlin how many cores do you have available? Clearly we can oversubscribe, as your actual CPU time is near zero, but I'm wondering if a heuristic of machine CPUs, which we use for dealing with Windows Defender, will be reasonable here too (and save on code-path proliferation).
Usually I have 32 cores available. Just to be clear, any changes you make in the name of installing docs faster specifically on HPC systems (as opposed to everywhere) will be moot as soon as we can disable docs. I'm also not really keen on having an installation process that uses threads to drive enormous load onto the shared filesystem. I've witnessed write latencies of a few seconds because of misuse, and I don't want to see rustup usage be responsible for that.
alexcrichton/tar-rs#207 will permit tackling the chmod duplication. It's worth noting (for @alexcrichton if no-one else) that there are several reasons we don't trust the distribution archives: they have been wrong in the past; we're still used to install those old archives; and when we start doing streaming installs of archives we'll only verify signatures at the end, so we want to trust as little as possible.

@saethlin, there are two parts here. One is actually doing the least work possible. For instance, we used to do a full additional copy of the files, which would have made older releases even slower; that's fixed already. Eliminating a full second pass of chmod metadata updates will reduce load on the metadata server (which HPC filesystem are you using, BTW? Gluster? CephFS?).

The second part, once we have reduced the work done to the bare minimum with no waste, is about scheduling: do we schedule the work to complete as rapidly as possible, or do we throttle it? I understand your concern and share it: I certainly don't want to create headaches for folk by driving up a backlog somewhere. But let's put this in context - this is under a second of IO for a single platter of spinning rust (consumer disks have exceeded 200MB/s for a decade or more now). Let's not assume that we are going to cause a problem until we have data showing it. I know, for instance, that with ext4 you want to create files relatively close together in time on a busy system, because ext4 is going to try to keep them together on disk, and if the fs is too full (and high activity is moving what is used around a lot) it won't be able to. https://www.kernel.org/doc/html/latest/filesystems/ext4/overview.html#block-and-inode-allocation-policy

So my proposal is this: I write various things, and you try them out - I don't have your HPC cluster with its particular tradeoffs. If they cause havoc in test, we don't push them into master. If they don't cause havoc, we can consider moving forward.
We're on Lustre, not entirely sure what version though. Your proposal sounds good. I'm learning about |
As a for-instance, take rust-docs/share/doc/rust/html/rust-by-example/fn/closures/closure_examples (strace for closure_examples):
This has a total syscall time of
Which is representative of the extra work here - we really only need to set the umask appropriately, do the mkdir once, and then never touch the dir again. That would halve the amount of work the HPC filesystem is being asked to do.
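As a rough sketch of that shape - the DirCache type is my illustration, not rustup's internals - set the umask once up front so directories are created with the right mode, then remember which directories already exist so each one costs exactly one mkdir(2):

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Illustrative cache: each directory is created at most once, so the
/// metadata server sees a single mkdir(2) per directory and no repeated
/// stat/chmod traffic afterwards.
struct DirCache(HashSet<PathBuf>);

impl DirCache {
    fn new() -> Self {
        DirCache(HashSet::new())
    }

    fn ensure(&mut self, dir: &Path) -> io::Result<()> {
        if self.0.insert(dir.to_path_buf()) {
            fs::create_dir_all(dir)?; // first and only touch of this dir
        }
        Ok(())
    }
}
```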
http://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips seems like it has good recommendations here. In particular, @saethlin can you
And test the current released rustup? I'm not expecting any radical change, just curious about any impact this might have. The stdlib and rustc etc. are obviously stripeable, and we could look at special-casing docs etc. to make just them single-stripe, but this is more along the lines of information gathering.
(Also, TBH, I think just some sensible docs in README.md might be useful here.)
Oh dangit, I was mistaken. Our home directories are on plain NFS, not Lustre.
I'll surely find a use for those Lustre tips anyway.
Aha! Ok, so if ~ is on plain old NFS, we're probably facing a classic single-machine multi-user situation - but it's also likely that that machine actually has decent IO backing it, if we're not nuts. We're just 1ms away on the network, so anything that has to cache-bust or be synchronous blocks. I'll have a non-aggressive test branch for you soon.
Rustup 1.20 has been released, which supports the profiles feature discussed above (PR #1673).
Absolutely. Thanks so much for all your hard work on this.
Problem
rustup docs installation is remarkably slow on an HPC system (probably most HPC systems). In case it's useful to anyone, I'm observing this on HiPerGator, which uses the Lustre filesystem.
I've attached some strace -c output below, along with what rustup printed during the update. The 433.6 KiB/s is probably mis-stating the situation; for a few seconds at a time the install speed drops to between 8.0 and 50 KiB/s, as if it's being throttled by the filesystem.

Possible Solution(s)
It would be great to be able to opt out of docs altogether; I'm told this feature is under consideration. There's no way for me to use the HTML docs in an HPC environment anyway; I use the local ones on the machine I'm ssh'ing from.
Notes
Output of rustup --version: rustup 1.18.3 (435397f 2019-05-22)
Output of rustup show:
strace -c output while running a rustup update nightly