Description
@jorive commented on Fri Aug 31 2018
When tiered jitting was enabled by default on CoreCLR, nothing was done to disable it for the performance runs on CoreFX. Was that intended? What do we want to do here?
@danmosemsft commented on Fri Aug 31 2018
@kouvel my assumption is that for the microbenchmarks we use for corefx, we do not want tiered jitting involved (since it introduces uncertainty and also we are more interested in steady state perf) and ideally could switch it off entirely.
@noahfalk commented on Fri Aug 31 2018
@kouvel is on vacation for a week
I'd be hesitant to disable tiered compilation as a long term measure. It has significant implications for steady state performance in some cases and that is the behavior customers are going to see.
If tests have regressed, I suspect we'll want to figure out where that regression has come from and whether we think it's a real issue that impacts customers (in which case we should improve tiered compilation) or it's an implementation detail of how the test was written that does not match real customer usage (in which case maybe we change the test)
The most likely issues the benchmarks would hit are:
- Tiered compilation doesn't handle hot loops in cold methods very well. Benchmarks tend to follow this pattern far more than real-world use-cases. We have a mitigation planned for 2.2
- Benchmarks that are attempting to measure steady-state perf, but don't warm up long enough to actually reach steady-state now that tiered compilation is present.
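For illustration, a minimal sketch (hypothetical names) of the pattern in the first bullet: a method that is invoked only a few times, so the call-count heuristic never promotes it to tier 1, even though nearly all of the benchmark's time is spent inside its loop:

```csharp
// Hypothetical example of a "hot loop in a cold method": the method is
// called only a handful of times, so the tiering heuristic that triggers
// tier-1 recompilation never fires, and the loop keeps running
// unoptimized tier-0 code for the whole measurement.
public static class ColdMethodHotLoop
{
    public static long SumSquares(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) // hot loop: dominates the benchmark's time
            sum += (long)i * i;
        return sum;
    }
}
```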
@jorive - can you send me + @kouvel more info on the regressed tests so we can take a look?
@kouvel commented on Sat Sep 01 2018
My expectation was that most cases of cold methods with hot loops in the benchmarks would disappear after the switch to use BenchmarkDotNet. It looks like some new benchmarks based on BDN are in dotnet/performance repo (https://github.com/dotnet/performance/tree/master/src/benchmarks/corefx). Are the performance tests in the corefx repo going to be deleted eventually? Is there a regression in the BDN-based benchmarks?
I also wasn't sure if the benchmarks run against R2R'ed binaries. I believe we ship CoreFX libraries as R2R'ed, so the benchmarks should probably also run similarly. In that case tiering should help with steady-state perf. I'll take a closer look when I'm back.
@danmosemsft commented on Sat Sep 01 2018
@adamsitnik commented on Tue Sep 04 2018
It looks like some new benchmarks based on BDN are in dotnet/performance repo
I have recently ported, and partially rewrote, all 3000+ CoreCLR and CoreFX benchmarks to BenchmarkDotNet. They are all available in the performance repo. We plan to open it as soon as we are ready (1-3 weeks). The old benchmarks are going to be removed, and we are going to stop using xunit-performance. Moreover, every new benchmark will go through a detailed, performance-oriented code review.
My expectation was that most cases of cold methods with hot loops in the benchmarks would disappear after the switch to use BenchmarkDotNet.
Yes, it will. How?
- BDN performs a Pilot Experiment to decide how many times a given benchmark should be executed per iteration (it's calculating the xunit InnerIterationCount). By default an iteration takes 500 ms; in the performance repo we are limited by CI time, so it's configured to 250 ms. For short-running benchmarks the pilot typically needs 5 iterations to calculate the InnerIterationCount, so we execute the benchmark for 1250 ms. The tiered JIT heuristic requires the benchmark to be executed more than 30 times over more than 100 ms, so the code gets tiered in the Pilot stage.
- BDN does an extra Warmup. By default it warms up the code until its heuristic is happy, but in the performance repo we have configured it to run only 1 warmup iteration (we know all the benchmarks; only a few of them need more than 1 warmup iteration, and those benchmarks are configured to run more warmups).
- BDN removes outliers by default, so even if the Pilot stage and Warmup do not hit the tiering condition but the actual runs do, BDN will remove the worst results (those where the benchmarked code was still running at tier 0).
The problem is time-consuming benchmarks which take more than 250 ms to execute. Why? Because after the first run the pilot stage determines that we should run them only once per iteration, and Warmup runs them once to avoid wasting too much time. So the tiered JIT condition of executing the method more than 30 times is never met.
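As a sketch, the performance-repo settings described above (250 ms iteration time, a single warmup iteration) map onto BenchmarkDotNet configuration roughly like this. API names are from recent BenchmarkDotNet releases; treat this as an approximation, not the repo's literal config:

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using Perfolizer.Horology;

// Approximation of the settings described in the comment above.
public class PerfRepoLikeConfig : ManualConfig
{
    public PerfRepoLikeConfig()
    {
        AddJob(Job.Default
            .WithIterationTime(TimeInterval.FromMilliseconds(250)) // 250 ms iterations instead of the 500 ms default
            .WithWarmupCount(1));                                  // single warmup iteration
    }
}
```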
Good news: after the port CoreFX has no such benchmarks.
Bad news: the CoreCLR repo does; the BenchmarkGame benchmarks are one example.
What I think we should do: have two CI jobs for benchmarks, one with tiered compilation enabled and one with it disabled. Compare the results, eliminate the regressions, and then remove the tiering-disabled CI job or run it less frequently.
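The two proposed CI configurations could be driven by the runtime's tiering switch. A sketch: `COMPlus_TieredCompilation` is the real CoreCLR knob, while the surrounding command line is illustrative:

```shell
# Job 1: tiered compilation enabled (the runtime default)
COMPlus_TieredCompilation=1 dotnet run -c Release

# Job 2: tiered compilation disabled, for comparison
COMPlus_TieredCompilation=0 dotnet run -c Release
```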
@kouvel commented on Thu Sep 20 2018
That all sounds awesome, thanks for the details. We're planning on adding MethodImplOptions.AggressiveOptimization (https://github.com/dotnet/corefx/issues/32235) to work around the cold-methods-with-hot-loops issue. Benchmarks that take a long time per iteration could be attributed with MethodImplAttribute to have them start at tier 1. It could also be added to all benchmarks, but the attribute could be used for other optimizations in the JIT, and that may skew the results relative to perf in unattributed cases. I also like the idea of having a periodic perf run without tiering to find gaps in new benchmarks or gaps that result from new bugs.
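Once MethodImplOptions.AggressiveOptimization shipped (it landed in .NET Core 3.0), attributing a long-running benchmark would look roughly like this; the benchmark class and method names are hypothetical:

```csharp
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public class LongRunningBenchmarks
{
    // AggressiveOptimization makes the method skip tier 0 and start fully
    // optimized, so a benchmark whose single invocation exceeds the
    // iteration budget still measures tier-1 code.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    [Benchmark]
    public void SimulateLongIteration()
    {
        System.Threading.Thread.Sleep(300); // stand-in for > 250 ms of real work
    }
}
```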
@karelz commented on Sun Jan 20 2019
@kouvel @noahfalk what is the status of this bug? It is not in any area (i.e., nobody pays attention to it) - can you please set the right area for it?
If you're not working on it actively, please unassign yourself ... thanks!
@kouvel commented on Mon Jan 21 2019
I have been looking at perf gaps with tiering from tests in the performance repo. They are mostly break-even or improvements; there are some cases of cold methods with hot loops (https://github.com/dotnet/coreclr/issues/19751), one or two regressions that appear to be real and whose cause still needs to be confirmed, and the rest are noise or unrelated. I'm thinking about what the appropriate mode would be while trying not to disable tiering completely, so that we can get somewhat representative numbers. It would probably be something like skipping tier 0 JIT for methods that don't have prejitted code, to work around the main regressions.
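For reference, later CoreCLR builds exposed configuration knobs that approximate the mode described above. A hedged sketch; exact names and availability depend on the runtime version:

```shell
# Keep tiering on (so prejitted R2R code is still used where it exists),
# but have methods without prejitted code skip the quick tier-0 JIT and
# be jitted fully optimized instead:
export COMPlus_TieredCompilation=1
export COMPlus_TC_QuickJit=0
dotnet run -c Release
```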
@bartonjs commented on Wed Feb 27 2019
@kouvel Does this issue belong in corefx? Or is it better tracked in the performance repository?
@kouvel commented on Mon Mar 04 2019
The CoreCLR repo would be appropriate; moving.