Description
It's very useful to have CTFE benchmarked within rustc-perf. But there are some problems with the current benchmarks.
- There are too many: 7 out of 36 benchmarks.
- They have high variation: up to plus or minus 8% or so. I suspect this is because they have high hash table use, and hash table iteration is non-deterministic (though I could be wrong). In combination with the previous point, the `compare` page on perf.rust-lang.org now has a lot of entries that usually need to be ignored.
- They are too long running. For `check` builds, they take between 50 and 100 billion instructions. The only longer-running benchmarks are `style-servo` and `script-servo`. This slows down benchmarking and profiling, esp. with slow profilers such as Callgrind. Many of the other benchmarks take fewer than 10 billion instructions. Furthermore, the CTFE ones are so repetitive that making them smaller would not lose information.
- The names are too long. As a result, on perf.rust-lang.org some of them don't fit three graphs across the screen like they're supposed to.
- There is not much difference between them. First, their source mostly consists of the `expensive_static` and `const_repeat` macros. Second, even though they are nominally stressing different aspects of CTFE, the profiles look pretty similar.
To expand on that last point, here are instruction counts for the hottest four source files for each one:
cgout-Orig-ctfe-stress-cast-Check-Clean
63,415,123,847 TOTAL
13.5% 13.5% 8,590,340,259 librustc/ty/query/plumbing.rs
12.4% 26.0% 7,887,149,209 librustc_mir/interpret/eval_context.rs
8.5% 34.5% 5,408,861,948 libcore/cell.rs
7.9% 42.4% 4,998,540,292 librustc/ty/layout.rs
cgout-Orig-ctfe-stress-const-fn-Check-Clean
49,800,782,339 TOTAL
13.7% 13.7% 6,803,527,890 librustc/ty/query/plumbing.rs
12.5% 26.1% 6,201,217,834 librustc_mir/interpret/eval_context.rs
8.6% 34.7% 4,293,120,813 libcore/cell.rs
5.6% 40.3% 2,775,820,287 libcore/ptr.rs
cgout-Orig-ctfe-stress-force-alloc-Check-Clean
56,259,650,108 TOTAL
13.2% 13.2% 7,444,960,046 librustc_mir/interpret/memory.rs
8.0% 21.3% 4,526,115,767 librustc/ty/query/plumbing.rs
7.5% 28.8% 4,210,254,628 librustc_mir/interpret/place.rs
6.0% 34.7% 3,357,339,539 librustc_mir/interpret/eval_context.rs
cgout-Orig-ctfe-stress-index-check-Check-Clean
54,348,254,144 TOTAL
13.9% 13.9% 7,535,276,480 librustc_mir/interpret/eval_context.rs
12.7% 26.5% 6,887,503,868 librustc/ty/query/plumbing.rs
8.1% 34.6% 4,393,910,195 libcore/cell.rs
6.3% 40.9% 3,427,755,811 librustc/ty/layout.rs
cgout-Orig-ctfe-stress-ops-Check-Clean
100,246,255,577 TOTAL
13.4% 13.4% 13,480,923,396 librustc_mir/interpret/eval_context.rs
12.8% 26.3% 12,843,447,695 librustc/ty/query/plumbing.rs
8.1% 34.3% 8,100,046,914 libcore/cell.rs
7.0% 41.3% 7,008,238,377 librustc/ty/layout.rs
cgout-Orig-ctfe-stress-reloc-Check-Clean
95,647,321,682 TOTAL
18.7% 18.7% 17,888,413,265 librustc_mir/interpret/eval_context.rs
14.5% 33.2% 13,885,565,400 librustc/ty/query/plumbing.rs
9.2% 42.4% 8,810,350,240 libcore/cell.rs
5.9% 48.4% 5,686,371,066 librustc/ty/layout.rs
cgout-Orig-ctfe-stress-unsize-slice-Check-Clean
60,974,886,778 TOTAL
16.9% 16.9% 10,318,564,804 librustc_mir/interpret/eval_context.rs
13.7% 30.6% 8,349,291,251 librustc/ty/query/plumbing.rs
8.7% 39.3% 5,305,447,236 libcore/cell.rs
5.9% 45.2% 3,608,659,105 librustc/ty/layout.rs
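There is not a lot of variation: in six of the seven profiles the top two files are `eval_context.rs` and `plumbing.rs`, and only `ctfe-stress-force-alloc` differs, with `memory.rs` on top.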
I suggest combining all 7 into a single benchmark, called `ctfe-stress`. It would have 7 invocations of the `expensive_static` macro. Also, that macro would be changed so the number of sub-expressions is 5 or 10x smaller.
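As a rough size check for the combined benchmark, the snippet below sums the seven `TOTAL` figures from the profiles above and divides by the shrink factor. It assumes instruction counts scale linearly with the number of sub-expressions, which is only an approximation (fixed compiler startup costs would not shrink). The names `TOTALS` and `combined_estimate` are made up for this sketch.

```rust
// TOTAL instruction counts from the seven profiles above (check builds).
const TOTALS: [u64; 7] = [
    63_415_123_847,  // ctfe-stress-cast
    49_800_782_339,  // ctfe-stress-const-fn
    56_259_650_108,  // ctfe-stress-force-alloc
    54_348_254_144,  // ctfe-stress-index-check
    100_246_255_577, // ctfe-stress-ops
    95_647_321_682,  // ctfe-stress-reloc
    60_974_886_778,  // ctfe-stress-unsize-slice
];

// Estimated instruction count of the combined benchmark if every part is
// made `factor` times smaller. Assumes cost is linear in the number of
// sub-expressions, which is an approximation.
pub fn combined_estimate(factor: u64) -> u64 {
    TOTALS.iter().sum::<u64>() / factor
}
```

Under that assumption, the combined benchmark would come to roughly 96 billion instructions with a 5x reduction and roughly 48 billion with a 10x reduction.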
This would fix all the above problems except for the high variation. The only downside I can see is that the single benchmark would be measuring multiple things, rather than a single thing, which muddies the waters when doing local profiling. But there is a pretty simple workaround for that: if you are doing local profiling, just comment out whichever macro invocations you aren't interested in.
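For concreteness, here is a minimal sketch of what the combined `ctfe-stress` crate could look like. The macro bodies are simplified stand-ins that I wrote for illustration, not the real `const_repeat` and `expensive_static` definitions from the existing benchmarks, and the static names, the helper `inc`, and the expressions are all hypothetical. Only four of the seven invocations are shown.

```rust
// Sketch of a combined CTFE stress benchmark, meant to be built as a library
// crate with `cargo check`. The macro bodies are simplified, hypothetical
// stand-ins for the ones in the existing ctfe-stress-* benchmarks.

// Evaluates `$e` 16 times during constant evaluation and yields its value.
// The real macro repeats far more often; the repeat count is the knob that
// would be reduced to make the benchmark smaller.
macro_rules! const_repeat {
    ($e:expr, $T:ty) => {{
        const fn once() -> $T { $e }
        const fn four() -> $T { once(); once(); once(); once() }
        const fn sixteen() -> $T { four(); four(); four(); four() }
        sixteen()
    }};
}

// Declares a static whose initializer forces the repeated evaluation.
macro_rules! expensive_static {
    ($name:ident : $T:ty = $e:expr) => {
        pub static $name: $T = const_repeat!($e, $T);
    };
}

// Hypothetical helper for the const-fn case.
const fn inc(x: i32) -> i32 { x + 1 }

// One invocation per aspect of CTFE being stressed. For local profiling,
// comment out the invocations you aren't interested in.
expensive_static!(CAST: usize = 42i32 as u8 as u64 as i8 as isize as usize);
expensive_static!(CONST_FN: i32 = inc(41));
expensive_static!(INDEX_CHECK: i32 = [1, 2, 3][1]);
expensive_static!(OPS: i32 = ((1 + 2) * 3) ^ 5);
// ... the force-alloc, reloc and unsize-slice cases would follow the same pattern.
```

Because each aspect is a single top-level macro invocation, isolating one of them for profiling is a matter of commenting out the other lines.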
Thoughts?