Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Fix premature DisplayList caching #34562

Merged
merged 4 commits into from
Jul 11, 2022

Conversation

flar
Copy link
Contributor

@flar flar commented Jul 9, 2022

This should fix the regressions seen in flutter/flutter#106987 originally caused by #31892

The logic for when to increment the access count for a DisplayList cache entry was tied to whether or not a cache entry existed for it which is convoluted logic. This PR introduces an explicit flag to indicate whether the cache item will be visible in the scene and then the logic for incrementing the access count can key off of that flag instead of a side effect of having a cache entry.

Some benchmark results. In all cases A is the Flutter framework tree using an engine before #31892 was integrated and B is the same tree using an engine with both that PR and this patch. Specifically framework commit flutter/flutter@caadc25 was used.

textfield perf A/B results
═══════╡ ••• Formatted summary for "textfield_perf_ABresults.json" ••• ╞════════

                   Score                     Average A (noise)  Average B (noise)  Speed-up
average_frame_build_time_millis                   2.42 (4.99%)       2.55 (6.11%)   0.95x  
worst_frame_build_time_millis                     8.77 (22.91%)      7.61 (21.72%)  1.15x  
90th_percentile_frame_build_time_millis           3.56 (9.99%)       4.12 (10.75%)  0.86x  
99th_percentile_frame_build_time_millis           8.77 (22.91%)      7.61 (21.72%)  1.15x  
average_frame_rasterizer_time_millis             14.07 (4.94%)      14.03 (3.41%)   1.00x  
worst_frame_rasterizer_time_millis              117.45 (10.28%)    119.28 (5.89%)   0.98x  
90th_percentile_frame_rasterizer_time_millis     32.91 (54.65%)     23.12 (3.32%)   1.42x  
99th_percentile_frame_rasterizer_time_millis    117.45 (10.28%)    119.28 (5.89%)   0.98x  
average_layer_cache_count                         0.00 (0.00%)       0.00 (0.00%)    NaNx  
90th_percentile_layer_cache_count                 0.00 (0.00%)       0.00 (0.00%)    NaNx  
99th_percentile_layer_cache_count                 0.00 (0.00%)       0.00 (0.00%)    NaNx  
worst_layer_cache_count                           0.00 (0.00%)       0.00 (0.00%)    NaNx  
average_layer_cache_memory                        0.00 (0.00%)       0.00 (0.00%)    NaNx  
90th_percentile_layer_cache_memory                0.00 (0.00%)       0.00 (0.00%)    NaNx  
99th_percentile_layer_cache_memory                0.00 (0.00%)       0.00 (0.00%)    NaNx  
worst_layer_cache_memory                          0.00 (0.00%)       0.00 (0.00%)    NaNx  
average_picture_cache_count                       0.17 (2.78%)       0.49 (1.64%)   0.34x  
90th_percentile_picture_cache_count               0.00 (0.00%)       0.00 (0.00%)    NaNx  
99th_percentile_picture_cache_count               3.00 (0.00%)       8.00 (0.00%)   0.38x  
worst_picture_cache_count                         3.00 (0.00%)       8.00 (0.00%)   0.38x  
average_picture_cache_memory                      0.17 (2.74%)       0.51 (1.64%)   0.34x  
90th_percentile_picture_cache_memory              0.00 (0.00%)       0.00 (0.00%)    NaNx  
99th_percentile_picture_cache_memory              3.13 (0.06%)       8.35 (0.00%)   0.38x  
worst_picture_cache_memory                        3.13 (0.06%)       8.35 (0.00%)   0.38x  
new_gen_gc_count                                  4.00 (0.00%)       4.00 (0.00%)   1.00x  
old_gen_gc_count                                  0.00 (0.00%)       0.00 (0.00%)    NaNx  
average_vsync_transitions_missed                  2.86 (23.60%)      2.58 (19.32%)  1.11x  
90th_percentile_vsync_transitions_missed          4.75 (40.43%)      4.50 (36.85%)  1.06x  
99th_percentile_vsync_transitions_missed          6.00 (20.41%)      5.75 (18.95%)  1.04x  
30hz_frame_percentage                             0.00 (0.00%)       0.00 (0.00%)    NaNx  
60hz_frame_percentage                           100.00 (0.00%)     100.00 (0.00%)   1.00x  
80hz_frame_percentage                             0.00 (0.00%)       0.00 (0.00%)    NaNx  
90hz_frame_percentage                             0.00 (0.00%)       0.00 (0.00%)    NaNx  
120hz_frame_percentage                            0.00 (0.00%)       0.00 (0.00%)    NaNx  
illegal_refresh_rate_frame_count                  0.00 (0.00%)       0.00 (0.00%)    NaNx  
raster cache memory use A/B results
═════╡ ••• Formatted summary for "rastercache_mem_use_ABresults.json" ••• ╞═════

                   Score                     Average A (noise)  Average B (noise)   Speed-up
average_frame_build_time_millis                   1.19 (5.67%)       1.41 (6.31%)    0.84x  
worst_frame_build_time_millis                     2.69 (27.53%)      2.63 (20.92%)   1.02x  
90th_percentile_frame_build_time_millis           1.41 (5.54%)       1.58 (7.13%)    0.89x  
99th_percentile_frame_build_time_millis           1.59 (2.96%)       1.88 (21.14%)   0.84x  
average_frame_rasterizer_time_millis              3.44 (1.09%)       3.56 (2.20%)    0.97x  
worst_frame_rasterizer_time_millis                5.88 (18.79%)      6.39 (13.00%)   0.92x  
90th_percentile_frame_rasterizer_time_millis      3.72 (1.71%)       3.84 (2.43%)    0.97x  
99th_percentile_frame_rasterizer_time_millis      4.46 (6.94%)       4.64 (7.58%)    0.96x  
average_layer_cache_count                         4.00 (0.00%)       2.00 (0.00%)    2.00x  
90th_percentile_layer_cache_count                 4.00 (0.00%)       2.00 (0.00%)    2.00x  
99th_percentile_layer_cache_count                 4.00 (0.00%)       2.00 (0.00%)    2.00x  
worst_layer_cache_count                           4.00 (0.00%)       2.00 (0.00%)    2.00x  
average_layer_cache_memory                        2.28 (0.00%)       1.47 (0.00%)    1.55x  
90th_percentile_layer_cache_memory                2.28 (0.00%)       1.47 (0.00%)    1.55x  
99th_percentile_layer_cache_memory                2.28 (0.00%)       1.47 (0.00%)    1.55x  
worst_layer_cache_memory                          2.28 (0.00%)       1.47 (0.00%)    1.55x  
average_picture_cache_count                       1.00 (0.00%)       1.00 (0.00%)    1.00x  
90th_percentile_picture_cache_count               1.00 (0.00%)       1.00 (0.00%)    1.00x  
99th_percentile_picture_cache_count               1.00 (0.00%)       1.00 (0.00%)    1.00x  
worst_picture_cache_count                         1.00 (0.00%)       1.00 (0.00%)    1.00x  
average_picture_cache_memory                      0.06 (0.00%)       0.06 (0.00%)    1.00x  
90th_percentile_picture_cache_memory              0.06 (0.00%)       0.06 (0.00%)    1.00x  
99th_percentile_picture_cache_memory              0.06 (0.00%)       0.06 (0.00%)    1.00x  
worst_picture_cache_memory                        0.06 (0.00%)       0.06 (0.00%)    1.00x  
new_gen_gc_count                                  1.50 (57.74%)      0.50 (173.21%)  3.00x  
old_gen_gc_count                                  0.00 (0.00%)       0.00 (0.00%)     NaNx  
backdropfilter perf A/B results
═════╡ ••• Formatted summary for "backdropfilter_perf_ABresults.json" ••• ╞═════

                   Score                     Average A (noise)   Average B (noise)   Speed-up
average_frame_build_time_millis                   1.62 (4.52%)        1.73 (9.94%)    0.94x  
worst_frame_build_time_millis                     2.33 (2.44%)        3.09 (28.57%)   0.75x  
90th_percentile_frame_build_time_millis           2.05 (3.02%)        2.27 (9.17%)    0.90x  
99th_percentile_frame_build_time_millis           2.33 (2.44%)        3.09 (28.57%)   0.75x  
average_frame_rasterizer_time_millis            321.21 (0.39%)      324.43 (0.32%)    0.99x  
worst_frame_rasterizer_time_millis              342.10 (2.42%)      344.24 (0.88%)    0.99x  
90th_percentile_frame_rasterizer_time_millis    324.40 (0.20%)      330.31 (0.56%)    0.98x  
99th_percentile_frame_rasterizer_time_millis    342.10 (2.42%)      344.24 (0.88%)    0.99x  
average_layer_cache_count                         0.00 (0.00%)        0.00 (0.00%)     NaNx  
90th_percentile_layer_cache_count                 0.00 (0.00%)        0.00 (0.00%)     NaNx  
99th_percentile_layer_cache_count                 0.00 (0.00%)        0.00 (0.00%)     NaNx  
worst_layer_cache_count                           0.00 (0.00%)        0.00 (0.00%)     NaNx  
average_layer_cache_memory                        0.00 (0.00%)        0.00 (0.00%)     NaNx  
90th_percentile_layer_cache_memory                0.00 (0.00%)        0.00 (0.00%)     NaNx  
99th_percentile_layer_cache_memory                0.00 (0.00%)        0.00 (0.00%)     NaNx  
worst_layer_cache_memory                          0.00 (0.00%)        0.00 (0.00%)     NaNx  
average_picture_cache_count                       1.75 (0.23%)        1.81 (0.00%)    0.97x  
90th_percentile_picture_cache_count               2.00 (0.00%)        2.00 (0.00%)    1.00x  
99th_percentile_picture_cache_count               2.00 (0.00%)        2.00 (0.00%)    1.00x  
worst_picture_cache_count                         2.00 (0.00%)        2.00 (0.00%)    1.00x  
average_picture_cache_memory                      6.91 (0.23%)        7.14 (0.00%)    0.97x  
90th_percentile_picture_cache_memory              7.91 (0.00%)        7.91 (0.00%)    1.00x  
99th_percentile_picture_cache_memory              7.91 (0.00%)        7.91 (0.00%)    1.00x  
worst_picture_cache_memory                        7.91 (0.00%)        7.91 (0.00%)    1.00x  
new_gen_gc_count                                  1.00 (100.00%)      1.00 (100.00%)  1.00x  
old_gen_gc_count                                  0.00 (0.00%)        0.00 (0.00%)     NaNx  
average_vsync_transitions_missed                 20.38 (0.48%)       20.53 (0.31%)    0.99x  
90th_percentile_vsync_transitions_missed         21.00 (0.00%)       21.00 (0.00%)    1.00x  
99th_percentile_vsync_transitions_missed         22.25 (7.37%)       21.75 (1.99%)    1.02x  
30hz_frame_percentage                             0.00 (0.00%)        0.00 (0.00%)     NaNx  
60hz_frame_percentage                           100.00 (0.00%)      100.00 (0.00%)    1.00x  
80hz_frame_percentage                             0.00 (0.00%)        0.00 (0.00%)     NaNx  
90hz_frame_percentage                             0.00 (0.00%)        0.00 (0.00%)     NaNx  
120hz_frame_percentage                            0.00 (0.00%)        0.00 (0.00%)     NaNx  
illegal_refresh_rate_frame_count                  0.00 (0.00%)        0.00 (0.00%)     NaNx  

@flutter-dashboard
Copy link

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!).

If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix?

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@flar flar requested a review from JsouLiang July 9, 2022 01:16
@flar
Copy link
Contributor Author

flar commented Jul 9, 2022

I'm still working on a unit (not benchmark) test that will verify that the regression behavior is gone.

@flar
Copy link
Contributor Author

flar commented Jul 10, 2022

I've added a test that verifies the DisplayListLayer caching behavior as it slips in and out of the cull rect to prevent future regressions.

@JsouLiang
Copy link
Contributor

═════════════════════════╡ ••• Final A/B results ••• ╞══════════════════════════

Score	Average A (noise)	Average B (noise)	Speed-up
average_frame_build_time_millis	1.55 (1.71%)	1.66 (6.31%)	0.93x	
worst_frame_build_time_millis	3.99 (6.90%)	5.39 (27.85%)	0.74x	
90th_percentile_frame_build_time_millis	2.88 (7.81%)	2.83 (13.20%)	1.02x	
99th_percentile_frame_build_time_millis	3.99 (6.90%)	4.43 (5.43%)	0.90x	
average_frame_rasterizer_time_millis	7.06 (2.12%)	7.12 (4.54%)	0.99x	
worst_frame_rasterizer_time_millis	53.68 (5.28%)	55.64 (13.31%)	0.96x	
90th_percentile_frame_rasterizer_time_millis	12.57 (9.32%)	11.56 (2.79%)	1.09x	
99th_percentile_frame_rasterizer_time_millis	53.68 (5.28%)	54.25 (17.67%)	0.99x	
average_layer_cache_count	0.00 (0.00%)	0.01 (173.21%)	0.00x	
90th_percentile_layer_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_count	0.00 (0.00%)	0.25 (173.21%)	0.00x	
worst_layer_cache_count	0.00 (0.00%)	0.25 (173.21%)	0.00x	
average_layer_cache_memory	0.00 (0.00%)	0.08 (173.21%)	0.00x	
90th_percentile_layer_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_layer_cache_memory	0.00 (0.00%)	2.11 (173.21%)	0.00x	
worst_layer_cache_memory	0.00 (0.00%)	2.11 (173.21%)	0.00x	
average_picture_cache_count	0.40 (15.03%)	0.63 (7.58%)	0.63x	
90th_percentile_picture_cache_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_count	10.50 (14.29%)	16.00 (7.65%)	0.66x	
worst_picture_cache_count	10.50 (14.29%)	16.25 (7.99%)	0.65x	
average_picture_cache_memory	0.36 (15.03%)	0.57 (7.58%)	0.63x	
90th_percentile_picture_cache_memory	0.00 (0.00%)	0.00 (0.00%)	NaNx	
99th_percentile_picture_cache_memory	9.49 (14.29%)	14.47 (7.65%)	0.66x	
worst_picture_cache_memory	9.49 (14.29%)	14.69 (7.99%)	0.65x	
new_gen_gc_count	2.00 (0.00%)	4.00 (0.00%)	0.50x	
old_gen_gc_count	0.00 (0.00%)	0.50 (173.21%)	0.00x	
average_vsync_transitions_missed	1.96 (6.10%)	1.82 (20.49%)	1.08x	
90th_percentile_vsync_transitions_missed	3.50 (24.74%)	3.75 (39.44%)	0.93x	
99th_percentile_vsync_transitions_missed	5.75 (7.53%)	5.25 (24.74%)	1.10x	
30hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
60hz_frame_percentage	8.93 (0.00%)	8.89 (0.76%)	1.00x	
80hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
90hz_frame_percentage	91.07 (0.00%)	91.11 (0.07%)	1.00x	
120hz_frame_percentage	0.00 (0.00%)	0.00 (0.00%)	NaNx	
illegal_refresh_rate_frame_count	0.00 (0.00%)	0.00 (0.00%)	NaNx	

The result is run the textfield_perf__timeline_summary benchmark

@flar
Copy link
Contributor Author

flar commented Jul 11, 2022

I took a quick look at the textfield_perf benchmark and it is poorly written. Adding a page delay to allow it to settle after the page transition drops the frame rasterizer times by 3x. I'll propose a fix to the benchmark driver for it later, but for now I don't think it makes a good test case for the RasterCache work.

@flar
Copy link
Contributor Author

flar commented Jul 11, 2022

Some figures from new_gallery__transitions_perf

raster times:
                avg       90%       99%       worst
originally    14.7ms    26.4ms    73.6ms     759ms
31892         16.5ms    33.1ms    95.3ms     763ms
This PR       14.7ms    26.2ms    73.7ms     779ms


picture cache counts
                avg       90%       99%      worst
originally      3.4        7        13        13
31892          10.95      15        22        24
This PR         2.9        6         9        10

picture cache memory
                avg       90%       99%       worst
originally    19.8mb    35.7mb    52.3mb     52.3mb
31892         24.5mb    39.1mb    55.7mb     56.8mb
This PR       10.0mb    32.7mb    32.7mb     32.7mb

The raster times are about the same with significant reductions in memory usage. The memory savings was likely due to the elimination of "double caching" which was the purpose of #31892 but some logic errors that caused it to over-cache DisplayLists obscured those savings.

@flar flar requested a review from JsouLiang July 11, 2022 10:03
@zanderso
Copy link
Member

SkiaPerf is reporting a big improvement to worst frame raster time on color_filter_and_fade_perf on Android for this change. The benchmark hadn't regressed with the initial raster cache refactor, so this is an overall improvement. (It's also so big, that I'm wondering if we should run it locally to make sure it's still working correctly.)

https://flutter-flutter-perf.skia.org/e/?begin=1657472407&end=1657567386&keys=X4f552e8924717034f0b067c81cc2acba&num_commits=50&request_type=1&xbaroffset=29803

@flar
Copy link
Contributor Author

flar commented Jul 14, 2022

About the color_filter_and_face benchmark.

I started looking into this because it wasn't entirely clear how the logic changes would have affected the performance here, but after looking at some of the data I believe what is happening is that the churn in the RasterCache ended up causing us to miss sync on some frames. In particular, the worst frames happen when the cache is being invalidated and revalidated when the fade animation hits 0 or 1 where the framework disrupts the tree to add/remove the opacity layer or omit the subtree entirely. Omitting the subtree will cause us to have to re-render the DL when it re-appears as the animation takes the opacity up from 0 again.

Render time graph before RasterCache refactoring pre-refactor-render times
Render time graph after RasterCache refactoring, before this fix post-refactor-render times
Render time graph after Refactor and this fix post-refactor-and-fix-render times

You can see that the middle spike of around 40ms is there on all 3 graphs, but the two spikes early and late in the graphs that take about 120ms are missing on the last graph. The question is what is causing those spikes. Looking at the caching patterns can give us some evidence.

Layer/DL caching flows before refactor pre-refactor cache flows
Layer/DL caching flows after refactor post-refactor cache flows
Layer/DL caching flow after refactor and this fix post-refactor-and-fix cache flows

In the first case before the refactor it looks like the animation caching was mostly handled by the layer cache, likely with the "layer child cache" mechanism. The DisplayList cache flow appears to be the controls that are rendered outside of the animating region being cached. Indeed, looking at the tracing we see that the OpacityLayer renders from the cache while another DisplayListLayer after it in the tree also renders from the cache, which matches that hypothesis.

Single frame trace graph showing OpacityLayer using child caching pre-refactor frame trace

The case after the refactor looks like it switches back and forth between child layer caching from the OpacityLayer to "peephole optimized" caching using the DisplayList cache. Indeed, looking at the tracing for the period of time where the long flow is in the Layer flow graph you see a trace that looks exactly like the one just above from the "pre-refactor" trace. But, looking at the tracing in the time frame where there are 2 DL flows in parallel shows that the peephole optimization is in use during that time frame as can be seen in the following graph:

Single frame trace graph showing OpacityLayer handing opacity down to DL with peephole optimization post-refactor frame trace

Finally, looking at the trace with this fix applied, it appears that the peephole opacity mechanism is always used - which is what that mechanism was designed for, as all of the frames have a similar pattern to the second tracing graph above.

What I haven't dug into, though, is why the first two render graphs both have the large spikes to the 120ms range while the 3rd graph does not. It could be related to the fact that one of the things this fix allows is for the DL cache entries to discover that they are a stable part of the frame even if a parent is already caching them. That doesn't necessarily lead to the double-caching situation we had before where the same content would be cached both as the children of the OpacityLayer in the layer cache and as the DL in the child in the DL cache since the caching of the child will be prohibited as long as the parent is already caching. What it does allow is the switch from layer child caching to the peephole opacity scenario much more reliably.

@flar
Copy link
Contributor Author

flar commented Jul 14, 2022

One more clue to leave here. The following is a graph of the tracing during one of the "long frame times" sections in the "pre-refactor" and "post-refactor-pre-fix" timelines. It shows that we are spending most of our time in SkCanvas::Flush and most of that time is spent in a wait state, so we are running into some kind of resource constraint low down inside the canvas flush mechanism. I'm not sure exactly what is causing those waits, though.

Render time graph before RasterCache refactoring trace of slow frames

@zanderso
Copy link
Member

It shows that we are spending most of our time in SkCanvas::Flush and most of that time is spent in a wait state, so we are running into some kind of resource constraint low down inside the canvas flush mechanism. I'm not sure exactly what is causing those waits, though.

I think we have a flag for more detailed trace events from Skia. I think it's --trace-skia? @jason-simmons may know how to spell it correctly.

@flar
Copy link
Contributor Author

flar commented Jul 14, 2022

It shows that we are spending most of our time in SkCanvas::Flush and most of that time is spent in a wait state, so we are running into some kind of resource constraint low down inside the canvas flush mechanism. I'm not sure exactly what is causing those waits, though.

I think we have a flag for more detailed trace events from Skia. I think it's --trace-skia? @jason-simmons may know how to spell it correctly.

I've tried to diagnose these "stuck in flush" scenarios before and didn't get very far, even with --trace-skia. If someone can figure out exactly where we are getting stuck and how we can avoid it that would go a long way to avoid some of our worst "worst frame rate" issues that come up. There are some benchmarks where we average around 8-10ms/frame and then you can do a run where every frame is 16+ms/frame and if you look at the trace every single frame is stuck in a wait in a flush. We basically get all the work done well before the frame ends, but then the flush doesn't complete until after the next vsync and so we miss every frame at that point. Then you can poke the app and suddenly its back to 8-10ms/frame again. It's almost like you just have to momentarily stop rendering for a frame or two and it will reset itself.

I've been kind of looking at it like a cat. You can pet it all you want, but without warning it may decide "that's too much petting" and inform you by way of claws.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
autosubmit Merge PR when tree becomes green via auto submit App
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants