Better benchmarks #684

bjorn3 · 2019-08-21T12:45:00Z

Currently only compilation and execution of very simple crates is benchmarked. An example of a useful benchmark would be https://github.com/ebobby/simple-raytracer.

bjorn3 · 2019-08-22T09:05:51Z

I tried to compile libstd with MIR inlining for a fairer comparison with cg_llvm, as the latter uses an optimized sysroot while cg_clif doesn't. However, I hit a rustc bug: rust-lang/rust#63802.

bjorn3 · 2019-08-22T09:44:58Z

$ # Bench cg_llvm, cg_clif+cg_clif sysroot, cg_clif+cg_llvm sysroot
$ hyperfine --prepare "cargo clean" "cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     31.611 s ±  1.041 s    [User: 84.973 s, System: 3.935 s]
  Range (min … max):   30.514 s … 33.711 s    10 runs
 
Benchmark #2: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     31.211 s ±  1.130 s    [User: 66.759 s, System: 5.140 s]
  Range (min … max):   29.462 s … 32.760 s    10 runs
 
Benchmark #3: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     29.833 s ±  1.501 s    [User: 66.105 s, System: 4.819 s]
  Range (min … max):   27.988 s … 32.409 s    10 runs
 
Summary
  'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu' ran
    1.05 ± 0.06 times faster than 'CHANNEL=release ../cargo.sh build'
    1.06 ± 0.06 times faster than 'cargo build'

The difference between cg_clif (compiled in release mode) and cg_llvm is within noise. This is despite cg_llvm using multiple threads for optimization, which cg_clif doesn't, and cg_clif containing a lot of sanity checks.
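For reference, the "X ± Y times faster" figures in the summary can be reproduced from the means and standard deviations: divide the two means, then combine their relative errors in quadrature (standard first-order error propagation). The Rust sketch below is not hyperfine's source, and `relative_speed` is a hypothetical helper name, but it yields the same 1.05 ± 0.06 and 1.06 ± 0.06 shown above.

```rust
/// Relative speed of a slower command versus the fastest one, with
/// first-order error propagation for the quotient of two means.
fn relative_speed(fast_mean: f64, fast_sd: f64, slow_mean: f64, slow_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    // Relative uncertainties of numerator and denominator add in quadrature.
    let rel_err = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // Fastest run: cg_clif + cg_llvm sysroot (29.833 s ± 1.501 s).
    let (r1, e1) = relative_speed(29.833, 1.501, 31.211, 1.130);
    let (r2, e2) = relative_speed(29.833, 1.501, 31.611, 1.041);
    println!("{:.2} ± {:.2} times faster than 'CHANNEL=release ../cargo.sh build'", r1, e1);
    println!("{:.2} ± {:.2} times faster than 'cargo build'", r2, e2);
}
```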

$ # Bench cg_llvm with single thread optimization
$ hyperfine --prepare "cargo clean" "RUSTFLAGS=-Ccodegen-units=1 cargo build"
Benchmark #1: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     35.033 s ±  1.427 s    [User: 69.766 s, System: 3.784 s]
  Range (min … max):   33.336 s … 38.439 s    10 runs
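The `--prepare "cargo clean"` pattern means every timed build starts from an empty target directory. A rough sketch of that measurement loop in Rust, assuming a Unix `sh` is available; `bench` and `mean_stddev` are hypothetical names, and unlike hyperfine this does no warmup, shell-overhead correction or outlier detection:

```rust
use std::process::Command;
use std::time::Instant;

/// Run a shell command via `sh -c`, panicking if it fails.
fn sh(cmd: &str) {
    let status = Command::new("sh").arg("-c").arg(cmd).status().expect("failed to spawn sh");
    assert!(status.success(), "command failed: {cmd}");
}

/// Time `cmd` for `runs` iterations, running `prepare` (untimed) before each one,
/// similar in spirit to `hyperfine --prepare "cargo clean" "cargo build"`.
fn bench(prepare: &str, cmd: &str, runs: usize) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            sh(prepare); // e.g. `cargo clean`, so each build starts from scratch
            let start = Instant::now();
            sh(cmd);
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Mean and sample standard deviation (n - 1 denominator; needs at least 2 samples).
fn mean_stddev(samples: &[f64]) -> (f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    // Placeholder commands; substitute e.g. ("cargo clean", "cargo build").
    let times = bench("true", "sleep 0.1", 3);
    let (mean, sd) = mean_stddev(&times);
    println!("Time (mean ± σ): {mean:.3} s ± {sd:.3} s ({} runs)", times.len());
}
```

To time a build the way the comment does, call `bench("cargo clean", "cargo build", 10)` from inside the crate.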

bjorn3 · 2019-08-22T10:10:17Z

Keeping the incremental data (the prepare step below only deletes target/debug/deps) gives cg_clif a large advantage over cg_llvm, though:

$ hyperfine --prepare "rm -r target/debug/deps" --warmup 1 "cargo build" "RUSTFLAGS=-Ccodegen-units=1 cargo build" "CHANNEL=release ../cargo.sh build" 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
Benchmark #1: cargo build
  Time (mean ± σ):     28.747 s ±  1.363 s    [User: 76.805 s, System: 2.840 s]
  Range (min … max):   26.742 s … 30.305 s    10 runs
 
Benchmark #2: RUSTFLAGS=-Ccodegen-units=1 cargo build
  Time (mean ± σ):     34.041 s ±  2.252 s    [User: 63.653 s, System: 2.885 s]
  Range (min … max):   31.641 s … 38.352 s    10 runs
 
Benchmark #3: CHANNEL=release ../cargo.sh build
  Time (mean ± σ):     20.232 s ±  1.091 s    [User: 28.533 s, System: 1.291 s]
  Range (min … max):   18.719 s … 21.881 s    10 runs
 
Benchmark #4: RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     20.707 s ±  2.022 s    [User: 28.811 s, System: 1.252 s]
  Range (min … max):   18.844 s … 25.844 s    10 runs
 
Summary
  'CHANNEL=release ../cargo.sh build' ran
    1.02 ± 0.11 times faster than 'RUSTFLAGS="-Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so -Cpanic=abort" cargo build --target x86_64-unknown-linux-gnu'
    1.42 ± 0.10 times faster than 'cargo build'
    1.68 ± 0.14 times faster than 'RUSTFLAGS=-Ccodegen-units=1 cargo build'
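One way to read the User column is to divide total CPU time (User + System) by the mean wall-clock time, giving the average number of busy cores. The sketch below applies that to the four results above (numbers copied from the output; `Run` is a hypothetical struct). It can't separate cargo's crate-level parallelism from parallel LLVM codegen units, and dividing means is only an approximation, but it gives roughly 2.8 cores for the default cg_llvm build, about 2.0 with -Ccodegen-units=1, and about 1.5 for both cg_clif builds.

```rust
/// One row of hyperfine output: mean wall time plus mean user and system CPU time (seconds).
struct Run {
    name: &'static str,
    wall: f64,
    user: f64,
    system: f64,
}

impl Run {
    /// Average number of cores kept busy: CPU time divided by wall-clock time.
    fn parallelism(&self) -> f64 {
        (self.user + self.system) / self.wall
    }
}

fn main() {
    let runs = [
        Run { name: "cg_llvm", wall: 28.747, user: 76.805, system: 2.840 },
        Run { name: "cg_llvm, -Ccodegen-units=1", wall: 34.041, user: 63.653, system: 2.885 },
        Run { name: "cg_clif + cg_clif sysroot", wall: 20.232, user: 28.533, system: 1.291 },
        Run { name: "cg_clif + cg_llvm sysroot", wall: 20.707, user: 28.811, system: 1.252 },
    ];
    for r in &runs {
        println!("{:<28} {:.2} cores busy on average", r.name, r.parallelism());
    }
}
```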


bjorn3 · 2019-08-28T16:13:25Z

Runtime of simple-raytracer:

$ hyperfine ./raytracer-cg_llvm ./raytracer-cg_clif ./raytracer-cg_clif2 ./raytracer-cg_clif3 ./raytracer-cg_clif4
Benchmark #1: ./raytracer-cg_llvm
  Time (mean ± σ):      9.483 s ±  0.099 s    [User: 9.473 s, System: 0.006 s]
  Range (min … max):    9.396 s …  9.710 s    10 runs
 
Benchmark #2: ./raytracer-cg_clif
  Time (mean ± σ):     14.945 s ±  0.026 s    [User: 14.935 s, System: 0.005 s]
  Range (min … max):   14.910 s … 14.980 s    10 runs
 
Benchmark #3: ./raytracer-cg_clif2
  Time (mean ± σ):     14.091 s ±  0.082 s    [User: 14.079 s, System: 0.011 s]
  Range (min … max):   13.990 s … 14.301 s    10 runs
 
Benchmark #4: ./raytracer-cg_clif3
  Time (mean ± σ):     14.164 s ±  0.295 s    [User: 14.156 s, System: 0.007 s]
  Range (min … max):   13.983 s … 14.988 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #5: ./raytracer-cg_clif4
  Time (mean ± σ):     10.750 s ±  0.208 s    [User: 10.744 s, System: 0.004 s]
  Range (min … max):   10.621 s … 11.312 s    10 runs
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (11.312 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Summary
  './raytracer-cg_llvm' ran
    1.13 ± 0.02 times faster than './raytracer-cg_clif4'
    1.49 ± 0.02 times faster than './raytracer-cg_clif2'
    1.49 ± 0.03 times faster than './raytracer-cg_clif3'
    1.58 ± 0.02 times faster than './raytracer-cg_clif'

cg_clif is b9dc950
cg_clif2 is 4062999
cg_clif3 is 6127632
cg_clif4 is 1018a34
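The commit messages referencing this issue quote "7%" and "30%" speedups without saying how they were measured, and "X% faster" can mean either old/new - 1 or a runtime reduction of 1 - new/old. As a worked check against this run only (assuming the builds are listed in commit order, which the comment doesn't state explicitly), the sketch below computes both for each consecutive pair; the last step comes out at about 32% by the first measure and 24% by the second.

```rust
/// "X% faster" as a throughput gain: old / new - 1.
fn speedup_pct(old: f64, new: f64) -> f64 {
    (old / new - 1.0) * 100.0
}

/// "X% faster" as a reduction in runtime: 1 - new / old.
fn time_saved_pct(old: f64, new: f64) -> f64 {
    (1.0 - new / old) * 100.0
}

fn main() {
    // Mean runtimes (seconds) from the hyperfine run above, in the order listed.
    let builds = [
        ("b9dc950", 14.945),
        ("4062999", 14.091),
        ("6127632", 14.164),
        ("1018a34", 10.750),
    ];
    for pair in builds.windows(2) {
        let ((a, t_old), (b, t_new)) = (pair[0], pair[1]);
        println!(
            "{a} -> {b}: {:+.1}% faster, {:+.1}% less time",
            speedup_pct(t_old, t_new),
            time_saved_pct(t_old, t_new)
        );
    }
}
```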

E.g. when the local is immutable **and** the type is Freeze. This makes the simple-raytracer runtime benchmark 1% faster than cg_llvm without optimizations; before, it was 2% slower. cc #691 cc #684
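The condition matters because a binding without `mut` can still change through a shared reference when its type has interior mutability (an `UnsafeCell`, for example inside `Cell`); `Freeze` is the compiler-internal marker trait for types without such interior mutability. How cg_clif applies the check isn't shown here, so the sketch below only illustrates the language-level behaviour such an optimization has to respect:

```rust
use std::cell::Cell;

/// Takes only a shared reference, yet can still change what it points at,
/// because `Cell` provides interior mutability.
fn bump(c: &Cell<i32>) {
    c.set(c.get() + 1);
}

/// Returns the values observed after handing out a shared borrow of each local.
fn demo() -> (i32, i32) {
    let plain = 1; // immutable and Freeze: a shared borrow cannot change it
    let cell = Cell::new(1); // immutable binding, but Cell<i32> is not Freeze
    let r = &plain;
    bump(&cell);
    (*r, cell.get())
}

fn main() {
    let (plain, cell) = demo();
    println!("plain = {plain}, cell = {cell}"); // plain = 1, cell = 2
}
```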

bjorn3 · 2019-08-30T14:11:30Z

After 15b9834:

Benchmark #1: ./raytracer_cg_llvm
  Time (mean ± σ):      7.477 s ±  0.156 s    [User: 7.393 s, System: 0.037 s]
  Range (min … max):    7.237 s …  7.853 s    20 runs
 
Benchmark #2: ./raytracer_cg_clif
  Time (mean ± σ):      7.372 s ±  0.106 s    [User: 7.305 s, System: 0.029 s]
  Range (min … max):    7.240 s …  7.669 s    20 runs
 
Summary
  './raytracer_cg_clif' ran
    1.01 ± 0.03 times faster than './raytracer_cg_llvm'

(benchmarked on a faster machine than the previous comments)

There is now pretty much no difference between cg_clif and cg_llvm. 🎉

cc #684

bjorn3 · 2019-08-30T15:33:07Z

5b17cf2 added simple-raytracer as benchmark.

bjorn3 · 2020-01-06T20:06:47Z

Just tried to compile veloren using cg_clif. I was surprised by the huge gap between cg_clif and cg_llvm:

CHANNEL="release" ../cargo.sh build  1613,63s user 64,18s system 315% cpu 8:51,02 total
                        cargo build  3272,94s user 50,07s system 315% cpu 17:33,41 total

I haven't tried running the compiled version, though, as cg_clif doesn't support threads yet.
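Those two lines look like zsh's `time` report, with commas as decimal separators (presumably from the system locale). A small sketch that parses them and computes the wall-clock and CPU-time ratios, assuming exactly that layout and an elapsed field of the form `m:ss,cc` with no hours; the function names are hypothetical. Both ratios come out at about 1.98, i.e. cg_clif took roughly half the time of cg_llvm here.

```rust
/// Parse a number that uses ',' as the decimal separator, e.g. "1613,63s".
fn parse_secs(tok: &str) -> f64 {
    tok.trim_end_matches('s')
        .replace(',', ".")
        .parse()
        .expect("not a number")
}

/// Parse elapsed time "m:ss,cc" (assumes no hours field) into seconds.
fn parse_elapsed(tok: &str) -> f64 {
    let (min, sec) = tok.split_once(':').expect("expected m:ss");
    min.parse::<f64>().expect("bad minutes") * 60.0 + parse_secs(sec)
}

/// Return the token immediately preceding `label`, e.g. the value before "user".
fn value_before<'a>(toks: &[&'a str], label: &str) -> &'a str {
    let i = toks.iter().position(|t| *t == label).expect("label not found");
    toks[i - 1]
}

/// Extract (user, system, total) seconds from one line of `time` output.
fn parse_time_line(line: &str) -> (f64, f64, f64) {
    let toks: Vec<&str> = line.split_whitespace().collect();
    (
        parse_secs(value_before(&toks, "user")),
        parse_secs(value_before(&toks, "system")),
        parse_elapsed(value_before(&toks, "total")),
    )
}

fn main() {
    let clif = "CHANNEL=\"release\" ../cargo.sh build  1613,63s user 64,18s system 315% cpu 8:51,02 total";
    let llvm = "cargo build  3272,94s user 50,07s system 315% cpu 17:33,41 total";
    let (cu, cs, ct) = parse_time_line(clif);
    let (lu, ls, lt) = parse_time_line(llvm);
    println!("cg_clif: {ct:.2} s wall, {:.2} s CPU", cu + cs);
    println!("cg_llvm: {lt:.2} s wall, {:.2} s CPU", lu + ls);
    println!("wall-clock ratio: {:.2}, CPU-time ratio: {:.2}", lt / ct, (lu + ls) / (cu + cs));
}
```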

bjorn3 added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Aug 21, 2019

bjorn3 added a commit that referenced this issue Aug 28, 2019

Don't force RETURN_PLACE to stack

4062999

Speeds up simple-raytracer by 7% (cc #684)

bjorn3 added a commit that referenced this issue Aug 28, 2019

Don't add stack_addr instructions to prelude

1018a34

Speeds up simple-raytracer by 30% (cc #684) Also reduces the size of the simple-raytracer binary from 9.2MB to 8.6MB

bjorn3 added a commit that referenced this issue Aug 30, 2019

Add ebobby/simple-raytracer as benchmark

5b17cf2

cc #684
