Better benchmarks #684

Open
bjorn3 opened this issue Aug 21, 2019 · 7 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one.

Comments

@bjorn3
Member

bjorn3 commented Aug 21, 2019

Currently, only the compilation and execution of very simple crates are benchmarked. An example of a useful benchmark would be https://github.com/ebobby/simple-raytracer.

bjorn3 added the C-enhancement label Aug 21, 2019
@bjorn3
Member Author

bjorn3 commented Aug 22, 2019

I tried to compile libstd with MIR inlining for a fairer comparison with cg_llvm, as the latter uses an optimized sysroot while cg_clif doesn't. However, I hit a rustc bug: rust-lang/rust#63802.

@bjorn3
Member Author

bjorn3 commented Aug 22, 2019

$ # Bench cg_llvm, cg_clif+cg_clif sysroot, cg_clif+cg_llvm sysroot
$ hyperfine --prepare "cargo clean" "cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     31.611 s ±  1.041 s    [User: 84.973 s, System: 3.935 s]
  Range (min … max):   30.514 s … 33.711 s    10 runs
 
Benchmark #2: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     31.211 s ±  1.130 s    [User: 66.759 s, System: 5.140 s]
  Range (min … max):   29.462 s … 32.760 s    10 runs
 
Benchmark #3: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     29.833 s ±  1.501 s    [User: 66.105 s, System: 4.819 s]
  Range (min … max):   27.988 s … 32.409 s    10 runs
 
Summary
  'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu' ran
    1.05 ± 0.06 times faster than 'CHANNEL=release ../cargo.sh build'
    1.06 ± 0.06 times faster than 'cargo build'

The difference between cg_clif (compiled in release mode) and cg_llvm is within noise. This is despite cg_llvm using multiple threads for optimizations, unlike cg_clif, and despite cg_clif containing a lot of sanity checks.
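The "within noise" reading can be checked from the reported means and standard deviations. The following is a minimal sketch, assuming standard first-order error propagation for a ratio of two independent means. The function names and the noise criterion are illustrative assumptions, not taken from hyperfine itself. It reproduces the ratios in the summary above.

```python
import math


def ratio_with_uncertainty(mean_a, sd_a, mean_b, sd_b):
    """Return (ratio, uncertainty) for mean_a / mean_b.

    Assumes independent measurements and first-order error propagation.
    """
    ratio = mean_a / mean_b
    relative_error = math.sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)
    return ratio, ratio * relative_error


def within_noise(ratio, uncertainty):
    """Rough criterion (an assumption): the gap from 1.0 is below the uncertainty."""
    return abs(ratio - 1.0) <= uncertainty


# (mean, standard deviation) in seconds, copied from the benchmark output above.
cg_llvm = (31.611, 1.041)
cg_clif_own_sysroot = (31.211, 1.130)
cg_clif_llvm_sysroot = (29.833, 1.501)

for name, (mean, sd) in [
    ("cg_llvm", cg_llvm),
    ("cg_clif + cg_clif sysroot", cg_clif_own_sysroot),
]:
    ratio, err = ratio_with_uncertainty(mean, sd, *cg_clif_llvm_sysroot)
    print(f"{name}: {ratio:.2f} ± {err:.2f}, within noise: {within_noise(ratio, err)}")
# cg_llvm: 1.06 ± 0.06, within noise: True
# cg_clif + cg_clif sysroot: 1.05 ± 0.06, within noise: True
```

In both cases the gap from 1.0 is smaller than the propagated uncertainty, which is consistent with the conclusion above.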

$ # Bench cg_llvm with single thread optimization
$ hyperfine --prepare "cargo clean" "RUSTFLAGS=-Ccodegen-units=1 cargo build"
Benchmark #1: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     35.033 s ±  1.427 s    [User: 69.766 s, System: 3.784 s]
  Range (min … max):   33.336 s … 38.439 s    10 runs

@bjorn3
Member Author

bjorn3 commented Aug 22, 2019

Keeping the incremental data gives cg_clif a huge advantage over cg_llvm though:

$ hyperfine --prepare "rm -r target/debug/deps" --warmup 1 "cargo build" "RUSTFLAGS=-Ccodegen-units=1 cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     28.747 s ±  1.363 s    [User: 76.805 s, System: 2.840 s]
  Range (min … max):   26.742 s … 30.305 s    10 runs
 
Benchmark #2: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     34.041 s ±  2.252 s    [User: 63.653 s, System: 2.885 s]
  Range (min … max):   31.641 s … 38.352 s    10 runs
 
Benchmark #3: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     20.232 s ±  1.091 s    [User: 28.533 s, System: 1.291 s]
  Range (min … max):   18.719 s … 21.881 s    10 runs
 
Benchmark #4: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     20.707 s ±  2.022 s    [User: 28.811 s, System: 1.252 s]
  Range (min … max):   18.844 s … 25.844 s    10 runs
 
Summary
  'CHANNEL=release ../cargo.sh build' ran
    1.02 ± 0.11 times faster than 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
    1.42 ± 0.10 times faster than 'cargo build'
    1.68 ± 0.14 times faster than 'RUSTFLAGS=-Ccodegen-units=1 cargo build'

bjorn3 added a commit that referenced this issue Aug 28, 2019
Speeds up simple-raytracer by 7% (cc #684)
bjorn3 added a commit that referenced this issue Aug 28, 2019
Speeds up simple-raytracer by 30% (cc #684)
Also reduces the size of the simple-raytracer binary from 9.2MB to 8.6MB
@bjorn3
Member Author

bjorn3 commented Aug 28, 2019

Runtime of simple-raytracer:

hyperfine ./raytracer-cg_llvm ./raytracer-cg_clif ./raytracer-cg_clif2 ./raytracer-cg_clif3 ./raytracer-cg_clif4
Benchmark #1: ./raytracer-cg_llvm
  Time (mean ± σ):      9.483 s ±  0.099 s    [User: 9.473 s, System: 0.006 s]
  Range (min … max):    9.396 s …  9.710 s    10 runs
 
Benchmark #2: ./raytracer-cg_clif
  Time (mean ± σ):     14.945 s ±  0.026 s    [User: 14.935 s, System: 0.005 s]
  Range (min … max):   14.910 s … 14.980 s    10 runs
 
Benchmark #3: ./raytracer-cg_clif2
  Time (mean ± σ):     14.091 s ±  0.082 s    [User: 14.079 s, System: 0.011 s]
  Range (min … max):   13.990 s … 14.301 s    10 runs
 
Benchmark #4: ./raytracer-cg_clif3
  Time (mean ± σ):     14.164 s ±  0.295 s    [User: 14.156 s, System: 0.007 s]
  Range (min … max):   13.983 s … 14.988 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #5: ./raytracer-cg_clif4
  Time (mean ± σ):     10.750 s ±  0.208 s    [User: 10.744 s, System: 0.004 s]
  Range (min … max):   10.621 s … 11.312 s    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (11.312 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  './raytracer-cg_llvm' ran
    1.13 ± 0.02 times faster than './raytracer-cg_clif4'
    1.49 ± 0.02 times faster than './raytracer-cg_clif2'
    1.49 ± 0.03 times faster than './raytracer-cg_clif3'
    1.58 ± 0.02 times faster than './raytracer-cg_clif'

cg_clif is b9dc950
cg_clif2 is 4062999
cg_clif3 is 6127632
cg_clif4 is 1018a34
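To make the per-commit progress easier to read, the mean runtimes above can be restated as a slowdown relative to cg_llvm. This is a small sketch using only the means reported above; the helper name is illustrative.

```python
# Mean runtimes in seconds, copied from the hyperfine output above.
CG_LLVM_MEAN = 9.483
CG_CLIF_MEANS = {
    "cg_clif (b9dc950)": 14.945,
    "cg_clif2 (4062999)": 14.091,
    "cg_clif3 (6127632)": 14.164,
    "cg_clif4 (1018a34)": 10.750,
}


def slowdown_percent(mean, baseline):
    """Percentage by which `mean` exceeds `baseline`."""
    return (mean / baseline - 1.0) * 100.0


for name, mean in CG_CLIF_MEANS.items():
    print(f"{name}: {slowdown_percent(mean, CG_LLVM_MEAN):.1f}% slower than cg_llvm")
```

By this measure the gap to cg_llvm shrinks from roughly 58% at b9dc950 to roughly 13% at 1018a34.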

bjorn3 added a commit that referenced this issue Aug 30, 2019
Eg when the local is immutable **and** the type is freeze.

This makes the simple raytracer runtime benchmark 1% faster than cg_llvm
without optimizations. Before it was 2% slower.

cc #691
cc #684
@bjorn3
Member Author

bjorn3 commented Aug 30, 2019

After 15b9834:

Benchmark #1: ./raytracer_cg_llvm
  Time (mean ± σ):      7.477 s ±  0.156 s    [User: 7.393 s, System: 0.037 s]
  Range (min … max):    7.237 s …  7.853 s    20 runs
 
Benchmark #2: ./raytracer_cg_clif
  Time (mean ± σ):      7.372 s ±  0.106 s    [User: 7.305 s, System: 0.029 s]
  Range (min … max):    7.240 s …  7.669 s    20 runs
 
Summary
  './raytracer_cg_clif' ran
    1.01 ± 0.03 times faster than './raytracer_cg_llvm'

(benchmarked on a faster machine than the one used in the previous comments)

There is now pretty much no difference between cg_clif and cg_llvm. 🎉

bjorn3 added a commit that referenced this issue Aug 30, 2019
@bjorn3
Member Author

bjorn3 commented Aug 30, 2019

5b17cf2 added simple-raytracer as a benchmark.

@bjorn3
Member Author

bjorn3 commented Jan 6, 2020

Just tried to compile veloren using cg_clif. I was surprised by the huge gap between cg_clif and cg_llvm:

CHANNEL="release" ../cargo.sh build  1613,63s user 64,18s system 315% cpu 8:51,02 total
                        cargo build  3272,94s user 50,07s system 315% cpu 17:33,41 total

I haven't tried running the compiled version, though, as threads are not yet supported.
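The size of the gap can be worked out from the two timing lines. This is a minimal sketch, assuming the wall-clock fields use a minutes:seconds format with a decimal comma, as they appear above; the helper name is illustrative.

```python
def parse_wall_clock(text):
    """Convert 'M:SS,ss' or 'H:MM:SS,ss' (decimal comma) to seconds."""
    seconds = 0.0
    for part in text.replace(",", ".").split(":"):
        seconds = seconds * 60.0 + float(part)
    return seconds


# Values copied from the two timing lines above.
clif_wall = parse_wall_clock("8:51,02")
llvm_wall = parse_wall_clock("17:33,41")
clif_user = float("1613,63".replace(",", "."))
llvm_user = float("3272,94".replace(",", "."))

print(f"wall clock: cg_llvm took {llvm_wall / clif_wall:.2f}x as long as cg_clif")
print(f"user CPU:   cg_llvm took {llvm_user / clif_user:.2f}x as long as cg_clif")
# wall clock: cg_llvm took 1.98x as long as cg_clif
# user CPU:   cg_llvm took 2.03x as long as cg_clif
```

Both measures put the cg_llvm build at about twice the cost of the cg_clif build for this project.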
