Skip to content

on macOS, std.fs.deleteTree() is 40% slower than rm -rf #13048

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Jarred-Sumner opened this issue Oct 3, 2022 · 8 comments · Fixed by #13073
Closed

on macOS, std.fs.deleteTree() is 40% slower than rm -rf #13048

Jarred-Sumner opened this issue Oct 3, 2022 · 8 comments · Fixed by #13073
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@Jarred-Sumner
Copy link
Contributor

Jarred-Sumner commented Oct 3, 2022

Zig Version

0.10.0-dev.2822+b79884eaf

Steps to Reproduce

Zig (zig build-exe -OReleaseFast rm.zig)

const std = @import("std");
pub fn main() anyerror!void {
    try std.fs.cwd().deleteTree(std.mem.span(std.os.argv[std.mem.span(std.os.argv).len - 1]));
}

C (clang -O3 rm.c) using removefile (which turns out to be slower or the same as rm)

#include <errno.h>
#include <removefile.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: rm path");
    return 1;
  }

  if (removefile(argv[argc - 1], NULL, REMOVEFILE_RECURSIVE)) {
    fprintf(stderr, "removefile failed: %s", strerror(errno));
    return 1;
  }

  return 0;
}

Assuming node_modules is a folder with lots and lots of files and directories (~700 MB, xxx,xxx files)

hyperfine "./rm node_modules_copy" "./a.out node_modules_copy" "rm -rf node_modules_copy" --prepare="cp -r node_modules node_modules_copy"

Expected Behavior

Zig should perform similarly or better

Actual Behavior

hyperfine "./rm node_modules_copy" "./a.out node_modules_copy" "rm -rf node_modules_copy" --prepare="cp -r node_modules node_modules_copy"
Benchmark 1: ./rm node_modules_copy
  Time (mean ± σ):      5.796 s ±  0.681 s    [User: 0.041 s, System: 5.363 s]
  Range (minmax):    4.457 s6.761 s    10 runs
 
Benchmark 2: ./a.out node_modules_copy
  Time (mean ± σ):      4.614 s ±  0.536 s    [User: 0.040 s, System: 4.226 s]
  Range (minmax):    4.170 s6.090 s    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (6.090 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark 3: rm -rf node_modules_copy
  Time (mean ± σ):      4.145 s ±  0.198 s    [User: 0.046 s, System: 3.750 s]
  Range (minmax):    3.894 s4.420 s    10 runs
 
Summary
  'rm -rf node_modules_copy' ran
    1.11 ± 0.14 times faster than './a.out node_modules_copy'
    1.40 ± 0.18 times faster than './rm node_modules_copy'

image

@Jarred-Sumner Jarred-Sumner added the bug Observed behavior contradicts documented or intended behavior label Oct 3, 2022
@Vexu Vexu added the standard library This issue involves writing Zig code for the standard library. label Oct 3, 2022
@Vexu Vexu added this to the 0.11.0 milestone Oct 3, 2022
@squeek502
Copy link
Collaborator

squeek502 commented Oct 3, 2022

I get similar results on Linux:

$ hyperfine "./rm my-app-copy" "rm -rf my-app-copy" --prepare="cp -r my-app my-app-copy" --warmup 1
Benchmark #1: ./rm my-app-copy
  Time (mean ± σ):     675.0 ms ±   5.1 ms    [User: 28.9 ms, System: 636.1 ms]
  Range (min … max):   668.6 ms … 685.5 ms    10 runs
 
Benchmark #2: rm -rf my-app-copy
  Time (mean ± σ):     381.4 ms ±   5.1 ms    [User: 12.0 ms, System: 361.1 ms]
  Range (min … max):   373.9 ms … 388.4 ms    10 runs
 
Summary
  'rm -rf my-app-copy' ran
    1.77 ± 0.03 times faster than './rm my-app-copy'

(tested with the folder that resulted from npx create-react-app my-app; 41209 files, 344MiB)

strace -c results:

Zig:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.41    0.570193           8     65368     24159 unlinkat
 33.56    0.403598          13     29036           getdents64
  7.62    0.091679           3     24159           close
  6.81    0.081900           3     24159           openat
  4.61    0.055399           2     24159           lseek
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    1.202769                166884     24159 total

rm -rf:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.81    0.473027          11     41209           unlinkat
  9.31    0.064921           4     14585           getdents64
  7.31    0.050991           2     24303           fcntl
  5.57    0.038845           2     14687           close
  4.61    0.032171           3      9825           openat
  3.21    0.022378           2      9825           fstat
  2.17    0.015122           3      4863           newfstatat
  0.01    0.000051           5         9           brk
  0.00    0.000015          15         1           munmap
  0.00    0.000008           1         8           mmap
  0.00    0.000008           8         1           ioctl
  0.00    0.000007           7         1           lstat
  0.00    0.000004           4         1           fstatfs
  0.00    0.000003           3         1         1 lseek
  0.00    0.000000           0         1           read
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         6           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2         1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.697551                119333         3 total

Worth noting that find my-app -type d | wc -l gives 4863 so it seems likely that Zig is opening/iterating directories much more than necessary. Also worth noting that if the 24159 erroring unlinkat calls were removed from Zig's implementation then the number of unlinkat calls would match with rm -rf.

@squeek502
Copy link
Collaborator

squeek502 commented Oct 3, 2022

As far as I can tell, the extra syscalls come from two sources:

  1. deleteTree is non-allocating and only has one directory open at a time. This means that a directory structure like a/b/c will open/close a, b, and c, then delete c, then re-open/close a and b, then delete b, then re-open/close a and then finally delete a. An allocating API could use a stack a la Walker which would remove these redundant open/closes.
  2. At the top of the start_over loop, there is a deleteFile call. This tries to delete the sub_path as a file over and over and most of the time this call will just fail over and over with error.IsDir since deleteTree is usually called on directories. For example, in the a/b/c structure, it will call deleteFile("a") 3 times and they will all fail with error.IsDir
    • In theory this could be moved outside of the loop (just try to delete it as a file once and then never try again if we get error.IsDir), but I think that'd introduce an edge case where we start calling deleteFile on a directory and then while trying to delete stuff it gets changed into a file, which deleteTree would then fail to delete.
    • Actually we'll get error.IsDir an additional 3 times in the a/b/c example due to the deleteFile within the scan_dir loop as well. This could possibly be addressed by checking entry.kind which isn't done currently.

I'm not totally sure if these are addressable given the non-allocating, single-open-fd API of deleteTree, but an allocating version could address them, so maybe introducing that as an option might be worth looking into.

@nektro
Copy link
Contributor

nektro commented Oct 4, 2022

why allocate when the function can be made recursive? (without reading the source) I would assume its a recursive function given the nature of the operation. was it to save on open file descriptors? stack overflow?

going with recursion the non-allocating aspect can definitely be preserved

@Jarred-Sumner
Copy link
Contributor Author

An interesting note from removefile.h:

REMOVEFILE_RECURSIVE = (1 << 0), // If path is a directory, recurse (depth first traversal)

I wonder what the perf impact here is if we go depth-first instead of breadth-first

@squeek502
Copy link
Collaborator

squeek502 commented Oct 4, 2022

why allocate when the function can be made recursive? (without reading the source) I would assume its a recursive function given the nature of the operation. was it to save on open file descriptors? stack overflow?

Not totally sure, but there is this comment:

zig/lib/std/fs.zig

Lines 2099 to 2101 in ff534d2

// Here we must avoid recursion, in order to provide O(1) memory guarantee of this function.
// Go through each entry and if it is not a directory, delete it. If it is a directory,
// open it, and close the original directory. Repeat. Then start the entire operation over.

I would assume it's to avoid stack overflow

@ifreund
Copy link
Member

ifreund commented Oct 4, 2022

going with recursion the non-allocating aspect can definitely be preserved

recursion is just a different form of allocation. It allocates on the stack instead of the heap.

@nektro
Copy link
Contributor

nektro commented Oct 4, 2022

having deleteTreeRecursive as an option would solve the performance issue though for those that arent concerned about it not allocating at all

@squeek502
Copy link
Collaborator

squeek502 commented Oct 5, 2022

My attempt at a recursive implementation ended up being slightly faster than my system's rm -rf.

$ hyperfine "./rm-recursive my-app-copy" "rm -rf my-app-copy" --prepare="cp -r my-app my-app-copy" --warmup 1 --runs 20
Benchmark #1: ./rm-recursive my-app-copy
  Time (mean ± σ):     388.2 ms ±   7.0 ms    [User: 22.8 ms, System: 357.4 ms]
  Range (min … max):   376.0 ms … 399.9 ms    20 runs
 
Benchmark #2: rm -rf my-app-copy
  Time (mean ± σ):     415.8 ms ±  24.7 ms    [User: 18.1 ms, System: 378.6 ms]
  Range (min … max):   392.7 ms … 489.9 ms    20 runs
 
Summary
  './rm-recursive my-app-copy' ran
    1.07 ± 0.07 times faster than 'rm -rf my-app-copy'

PR here: #13070

EDIT:

Just-as-fast but non-recursive implementation here: #13073

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants