
CTFE benchmarks need streamlining #280

Closed
@nnethercote

Description

It's very useful to have CTFE benchmarked within rustc-perf. But there are some problems with the current benchmarks.

  • There are too many: 7 out of 36 benchmarks.
  • They have high variation: up to plus or minus 8% or so. I suspect this is because they have high hash table use, and hash table iteration is non-deterministic (though I could be wrong). In combination with the previous point, the compare page on perf.rust-lang.org now has a lot of entries that usually need to be ignored.
  • They are too long-running. For check builds, they take between 50 and 100 billion instructions; the only longer-running benchmarks are style-servo and script-servo. This slows down benchmarking and profiling, especially with slow profilers such as Callgrind. Many of the other benchmarks take fewer than 10 billion instructions. Furthermore, the CTFE ones are so repetitive that making them smaller would not lose information.
  • The names are too long. As a result, on perf.rust-lang.org some of them don't fit three graphs across the screen like they're supposed to.
  • There is not much difference between them. First, their sources mostly consist of invocations of the expensive_static and const_repeat macros (see the sketch after this list). Second, even though they are nominally stressing different aspects of CTFE, the profiles look pretty similar.
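
For context, both macros boil down to the same idea: evaluate a cheap const expression an enormous number of times, so that CTFE does a lot of work without producing a large result. A minimal sketch of that repetition pattern (my reconstruction, not the exact rustc-perf source) looks like this:

```rust
// A macro that makes CTFE evaluate the same expression 2^N times,
// where N is the number of `2` tokens in the count list. Recursion
// depth stays logarithmic in the amount of work done.
macro_rules! const_repeat {
    // Base case: evaluate the expression once.
    ([1] $e:expr) => { $e };
    // Recursive case: strip one `2` from the count and evaluate the
    // remainder twice, doubling the total work at each level.
    ([2 $($rest:tt)*] $e:expr) => {{
        const_repeat!([$($rest)*] $e);
        const_repeat!([$($rest)*] $e)
    }};
}

// Hypothetical use, in the spirit of ctfe-stress-cast: CTFE performs
// the cast 2^10 = 1024 times, but the final value stays small.
static CAST: usize = const_repeat!([2 2 2 2 2 2 2 2 2 2 1] (42u8 as usize));
```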

To expand on that last point, here are instruction counts for the four hottest source files in each one (columns: self %, cumulative %, instructions, file):

cgout-Orig-ctfe-stress-cast-Check-Clean
              63,415,123,847 TOTAL
13.5% 13.5%    8,590,340,259 librustc/ty/query/plumbing.rs
12.4% 26.0%    7,887,149,209 librustc_mir/interpret/eval_context.rs
 8.5% 34.5%    5,408,861,948 libcore/cell.rs         
 7.9% 42.4%    4,998,540,292 librustc/ty/layout.rs   

cgout-Orig-ctfe-stress-const-fn-Check-Clean
              49,800,782,339 TOTAL
13.7% 13.7%    6,803,527,890 librustc/ty/query/plumbing.rs
12.5% 26.1%    6,201,217,834 librustc_mir/interpret/eval_context.rs
 8.6% 34.7%    4,293,120,813 libcore/cell.rs         
 5.6% 40.3%    2,775,820,287 libcore/ptr.rs          

cgout-Orig-ctfe-stress-force-alloc-Check-Clean
              56,259,650,108 TOTAL
13.2% 13.2%    7,444,960,046 librustc_mir/interpret/memory.rs
 8.0% 21.3%    4,526,115,767 librustc/ty/query/plumbing.rs
 7.5% 28.8%    4,210,254,628 librustc_mir/interpret/place.rs
 6.0% 34.7%    3,357,339,539 librustc_mir/interpret/eval_context.rs

cgout-Orig-ctfe-stress-index-check-Check-Clean
              54,348,254,144 TOTAL
13.9% 13.9%    7,535,276,480 librustc_mir/interpret/eval_context.rs
12.7% 26.5%    6,887,503,868 librustc/ty/query/plumbing.rs
 8.1% 34.6%    4,393,910,195 libcore/cell.rs         
 6.3% 40.9%    3,427,755,811 librustc/ty/layout.rs   

cgout-Orig-ctfe-stress-ops-Check-Clean
             100,246,255,577 TOTAL
13.4% 13.4%   13,480,923,396 librustc_mir/interpret/eval_context.rs
12.8% 26.3%   12,843,447,695 librustc/ty/query/plumbing.rs
 8.1% 34.3%    8,100,046,914 libcore/cell.rs         
 7.0% 41.3%    7,008,238,377 librustc/ty/layout.rs   

cgout-Orig-ctfe-stress-reloc-Check-Clean
              95,647,321,682 TOTAL
18.7% 18.7%   17,888,413,265 librustc_mir/interpret/eval_context.rs
14.5% 33.2%   13,885,565,400 librustc/ty/query/plumbing.rs
 9.2% 42.4%    8,810,350,240 libcore/cell.rs         
 5.9% 48.4%    5,686,371,066 librustc/ty/layout.rs   

cgout-Orig-ctfe-stress-unsize-slice-Check-Clean
              60,974,886,778 TOTAL
16.9% 16.9%   10,318,564,804 librustc_mir/interpret/eval_context.rs
13.7% 30.6%    8,349,291,251 librustc/ty/query/plumbing.rs
 8.7% 39.3%    5,305,447,236 libcore/cell.rs         
 5.9% 45.2%    3,608,659,105 librustc/ty/layout.rs  

As these show, the profiles do not vary much from one benchmark to the next.

I suggest combining all 7 into a single benchmark, called ctfe-stress. It would have 7 invocations of the expensive_static macro, and that macro would be changed so that the number of sub-expressions is 5x or 10x smaller.
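
To make that concrete, the combined benchmark's source could be laid out roughly like this (a sketch with a stand-in macro, not a finished patch; the real expensive_static bodies would be carried over from the seven existing benchmarks):

```rust
// Hypothetical layout for the combined ctfe-stress benchmark. This
// expensive_static! is a stand-in (the real macro lives in the
// existing benchmark sources); one invocation per former benchmark.
macro_rules! expensive_static {
    ($name:ident : $t:ty = $e:expr) => {
        pub static $name: $t = $e;
    };
}

expensive_static!(CAST: usize = 42u8 as usize);    // was ctfe-stress-cast
expensive_static!(CONST_FN: usize = {
    const fn f() -> usize { 1 }
    f()
});                                                // was ctfe-stress-const-fn
expensive_static!(OPS: usize = (1 + 2) * 3 % 4);   // was ctfe-stress-ops
// ...and likewise for force-alloc, index-check, reloc, and unsize-slice,
// each with a 5x or 10x smaller repetition count than today.
```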

This would fix all the above problems except for the high variation. The only downside I can see is that the single benchmark would be measuring multiple things, rather than a single thing, which muddies the waters when doing local profiling. But there is a pretty simple workaround for that: if you are doing local profiling, just comment out whichever macro invocations you aren't interested in.

Thoughts?

CC @Mark-Simulacrum @RalfJung @oli-obk
