CTFE benchmarks need streamlining #280
Comments
The funny thing is that this is exactly what my first PR did, but I was told to split them up. ;) But it is fine for me. However, I'd like to include two more kinds of operations (merging this branch). This was blocked on those benchmarks not terminating in reasonable time due to a regression, but somehow that regression got fixed and I don't even know when... Myself, I have no experience writing such benchmarks, so I am happy for any advice I can get. What you are seeing in terms of where the cost is matches my experience debugging performance regressions in some of them: a huge part of the cost is hits into the query cache. I have two questions related to that:
More generally, it is also rather frustrating that the bottleneck in CTFE is "doing stuff with types" (those queries are monomorphization and layout computation), not the actual CTFE operations. I wonder if there is something we can do about that, but that is a separate topic.
I opened #282 to hopefully fix this. However, I am still interested in figuring out why these have such high variance -- is caching just so non-deterministic, or do we accidentally have a "real" source of non-determinism in the compiler? One thing that comes to mind: maybe we are hashing pointers, and thanks to ASLR that can change even the behavior of FxHashMap.
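To illustrate that suspicion with a standalone sketch (not rustc code): if a map is keyed by addresses, ASLR alone can change the hashes, and therefore the iteration order, from run to run, even when the hash function itself is deterministic. The pointer-keyed map below is a hypothetical stand-in for whatever rustc might be doing, and std's `DefaultHasher` with fixed keys stands in for `FxHashMap`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// A map with a fully deterministic hasher (no per-process random seed).
type DeterministicMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

fn main() {
    // A handful of heap allocations whose addresses depend on ASLR.
    let values: Vec<Box<u32>> = (0..8u32).map(Box::new).collect();

    // Key the map by the address of each allocation.
    let mut by_addr: DeterministicMap<usize, u32> = DeterministicMap::default();
    for v in &values {
        by_addr.insert(&**v as *const u32 as usize, **v);
    }

    // The hasher is deterministic, but the keys (addresses) are not stable
    // across runs, so this iteration order can differ from run to run.
    for (addr, v) in &by_addr {
        println!("{addr:#x} -> {v}");
    }
}
```

If this is what is happening, any work driven by such an iteration order would vary between runs even with identical inputs.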
Great! Thank you for doing this. On the non-determinism front, hash table iteration is another possibility.
style-servo had a non-deterministic build script that generated code. If I recall correctly, that was due to hashmap/set use.
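For concreteness, here is a minimal hypothetical sketch of that failure mode (not the actual Servo build script): code emitted by iterating a hash set directly depends on iteration order, which the standard library's randomized hasher changes on every run; sorting before emitting makes the output reproducible.

```rust
use std::collections::HashSet;

// Emit one constant per name. Without the sort, the order of the emitted
// lines follows HashSet iteration order, which varies from run to run.
fn generate(consts: &HashSet<&str>) -> String {
    let mut names: Vec<&&str> = consts.iter().collect();
    names.sort(); // restores deterministic output
    names
        .iter()
        .enumerate()
        .map(|(i, name)| format!("pub const {name}: u32 = {i};\n"))
        .collect()
}

fn main() {
    let consts: HashSet<&str> = ["ALPHA", "BETA", "GAMMA"].into_iter().collect();
    print!("{}", generate(&consts));
}
```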
It's very useful to have CTFE benchmarked within rustc-perf. But there are some problems with the current benchmarks.
The `compare` page on perf.rust-lang.org now has a lot of entries that usually need to be ignored.

They are also long-running. For `check` builds, they take between 50--100 billion instructions; the only longer-running benchmarks are `style-servo` and `script-servo`. This slows down benchmarking and profiling, especially with slow profilers such as Callgrind. Many of the other benchmarks take fewer than 10 billion instructions. Furthermore, the CTFE ones are so repetitive that making them smaller would not lose information.

They are also highly repetitive. First, in their code, via the `expensive_static` and `const_repeat` macros. Second, even though they are nominally stressing different aspects of CTFE, the profiles look pretty similar.

To expand on that last point, here are instruction counts for the hottest four source files for each one:
There is not a lot of variation.
I suggest combining all 7 into a single benchmark, called `ctfe-stress`. It would have 7 invocations of the `expensive_static` macro. Also, that macro would be changed so the number of sub-expressions is 5 or 10x smaller.

This would fix all the above problems except for the high variation. The only downside I can see is that the single benchmark would be measuring multiple things, rather than a single thing, which muddies the waters when doing local profiling. But there is a pretty simple workaround for that: if you are doing local profiling, just comment out whichever macro invocations you aren't interested in.
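For concreteness, here is a rough sketch of the shape such a combined `ctfe-stress` benchmark could take. The macro and the constant bodies below are illustrative placeholders, not the actual `expensive_static`/`const_repeat` code in rustc-perf, and the counts merely stand in for the proposed 5--10x size reduction.

```rust
// Illustrative only: one crate, one macro, one invocation per CTFE aspect.
// Shrinking the per-invocation sizes scales the whole benchmark down without
// changing what it measures.
macro_rules! expensive_static {
    ($name:ident : $t:ty = $e:expr) => {
        pub static $name: $t = $e;
    };
}

// One invocation per kind of const-eval work (arithmetic, array init, ...).
expensive_static!(ARITHMETIC: u64 = {
    let mut sum = 0u64;
    let mut i = 0u64;
    while i < 100_000 {
        sum = sum.wrapping_add(i * i);
        i += 1;
    }
    sum
});

expensive_static!(ARRAY_INIT: [u8; 100_000] = [42u8; 100_000]);

fn main() {
    // Touch the statics so a quick local build doesn't discard them.
    println!("{} {}", ARITHMETIC, ARRAY_INIT[0]);
}
```

Commenting out individual `expensive_static!` invocations in a layout like this is exactly the local-profiling workaround described above.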
Thoughts?
CC @Mark-Simulacrum @RalfJung @oli-obk