It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat (don't just cc him here, he won't see it! He's on Discord!). If you are not sure if you need tests, consider this rule of thumb: the purpose of a test is to make sure someone doesn't accidentally revert the fix. Ask yourself, is there anything in your PR that you feel it is important we not accidentally revert back to how it was before your fix? Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.
I'm still working on a unit (not benchmark) test that will verify that the regression behavior is gone.
I've added a test that verifies the DisplayListLayer caching behavior as it slips in and out of the cull rect, to prevent future regressions.
The results below are from running the textfield_perf__timeline_summary benchmark.
I took a quick look at the textfield_perf benchmark and it is poorly written. Adding a page delay to allow it to settle after the page transition drops the frame rasterizer times by 3x. I'll propose a fix to the benchmark driver for it later, but for now I don't think it makes a good test case for the RasterCache work.
Some figures from new_gallery__transitions_perf
The raster times are about the same, with significant reductions in memory usage. The memory savings were likely due to the elimination of "double caching," which was the purpose of #31892, but some logic errors that caused it to over-cache DisplayLists had obscured those savings.
SkiaPerf is reporting a big improvement to worst frame raster time on color_filter_and_fade_perf on Android for this change. The benchmark hadn't regressed with the initial raster cache refactor, so this is an overall improvement. (It's also so big that I'm wondering if we should run it locally to make sure it's still working correctly.)
About the color_filter_and_fade benchmark. I started looking into this because it wasn't entirely clear how the logic changes would have affected the performance here, but after looking at some of the data I believe what is happening is that the churn in the RasterCache ended up causing us to miss sync on some frames. In particular, the worst frames happen when the cache is being invalidated and revalidated as the fade animation hits 0 or 1, where the framework disrupts the tree to add/remove the opacity layer or omit the subtree entirely. Omitting the subtree forces us to re-render the DL when it reappears as the animation takes the opacity up from 0 again.

You can see that the middle spike of around 40ms is there on all 3 graphs, but the two spikes early and late in the graphs that take about 120ms are missing on the last graph. The question is what is causing those spikes. Looking at the caching patterns can give us some evidence.

In the first case, before the refactor, it looks like the animation caching was mostly handled by the layer cache, likely with the "layer child cache" mechanism. The DisplayList cache flow appears to be the controls rendered outside of the animating region being cached. Indeed, looking at the tracing we see that the OpacityLayer renders from the cache while another DisplayListLayer after it in the tree also renders from the cache, which matches that hypothesis.

The case after the refactor looks like it switches back and forth between child layer caching from the OpacityLayer and "peephole optimized" caching using the DisplayList cache. Indeed, looking at the tracing for the period of time where the long flow is in the Layer flow graph, you see a trace that looks exactly like the one just above from the "pre-refactor" trace.
But looking at the tracing in the time frame where there are 2 DL flows in parallel shows that the peephole optimization is in use during that time frame, as can be seen in the following graph.

Finally, looking at the trace with this fix applied, it appears that the peephole opacity mechanism is always used - which is what that mechanism was designed for - as all of the frames have a similar pattern to the second tracing graph above.

What I haven't dug into, though, is why the first two render graphs both have the large spikes to the 120ms range while the 3rd graph does not. It could be related to the fact that one of the things this fix allows is for the DL cache entries to discover that they are a stable part of the frame even if a parent is already caching them. That doesn't necessarily lead to the double-caching situation we had before, where the same content would be cached both as the children of the OpacityLayer in the layer cache and as the DL in the child in the DL cache, since caching of the child is prohibited as long as the parent is already caching. What it does allow is the switch from layer child caching to the peephole opacity scenario much more reliably.
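To make the invalidation churn described above concrete, here is a minimal sketch (illustrative only, not actual engine or framework code) of how the tree shape changes at the animation endpoints. At opacity 0 the subtree is omitted entirely, at opacity 1 the OpacityLayer wrapper is dropped, and anywhere in between the subtree is wrapped in an OpacityLayer; each shape change invalidates whatever the cache had built for the previous shape.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the framework's tree-shaping behavior at the
// fade animation's endpoints (names are illustrative):
// - opacity == 0: the subtree is omitted entirely, so its cache entry goes
//   unreferenced and the DL must be re-rendered when it reappears.
// - opacity == 1: the OpacityLayer wrapper is removed, changing the
//   cacheable unit again.
// - otherwise: the subtree is wrapped in an OpacityLayer, which may cache
//   its children ("layer child cache") or defer to the DisplayList cache
//   ("peephole" opacity).
std::string TreeShapeForOpacity(double opacity) {
  if (opacity <= 0.0) {
    return "omitted";
  }
  if (opacity >= 1.0) {
    return "bare-subtree";
  }
  return "opacity-layer(subtree)";
}
```

The cache churn comes from the fact that an animation sweeping 0 → 1 → 0 passes through all three shapes, so the cache is invalidated and revalidated at each boundary crossing.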
One more clue to leave here. The following is a graph of the tracing during one of the "long frame times" sections in the "pre-refactor" and "post-refactor-pre-fix" timelines. It shows that we are spending most of our time in SkCanvas::flush, and most of that time is spent in a wait state, so we are running into some kind of resource constraint low down inside the canvas flush mechanism. I'm not sure exactly what is causing those waits, though.
I think we have a flag for more detailed trace events from Skia. I think it's --trace-skia.
I've tried to diagnose these "stuck in flush" scenarios before and didn't get very far, even with --trace-skia. If someone can figure out exactly where we are getting stuck and how we can avoid it, that would go a long way toward avoiding some of our worst "worst frame rate" issues that come up.

There are some benchmarks where we average around 8-10ms/frame, and then you can do a run where every frame is 16+ms/frame, and if you look at the trace, every single frame is stuck in a wait in a flush. We basically get all the work done well before the frame ends, but then the flush doesn't complete until after the next vsync, and so we miss every frame at that point. Then you can poke the app and suddenly it's back to 8-10ms/frame again. It's almost like you just have to momentarily stop rendering for a frame or two and it will reset itself.

I've been kind of looking at it like a cat. You can pet it all you want, but without warning it may decide "that's too much petting" and inform you by way of claws.
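The vsync-miss arithmetic above can be sketched in a few lines (numbers taken from the discussion; the function name is illustrative): at 60Hz the frame budget is about 16.67ms, so even 8ms of raster work misses the frame if the flush stalls long enough to push completion past the next vsync.

```cpp
#include <cassert>

// Illustrative arithmetic only: at 60Hz the frame budget is ~16.67ms.
// If raster work takes 8ms but the final flush sits in a wait state for
// 10ms, the frame completes after the next vsync and counts as missed,
// even though the "real" rendering work was cheap.
constexpr double kFrameBudgetMs = 1000.0 / 60.0;  // ~16.67ms at 60Hz

bool FrameMissed(double raster_work_ms, double flush_wait_ms) {
  return raster_work_ms + flush_wait_ms > kFrameBudgetMs;
}
```

This is why the benchmark flips from "every frame on time" to "every frame late": the work never changed, only the flush wait did.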
This should fix the regressions seen in flutter/flutter#106987, originally caused by #31892.
The logic for when to increment the access count for a DisplayList cache entry was tied to whether or not a cache entry already existed for it, which was convoluted. This PR introduces an explicit flag to indicate whether the cache item will be visible in the scene, and the logic for incrementing the access count can then key off of that flag instead of a side effect of having a cache entry.
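A minimal sketch of that logic change, with illustrative names rather than the actual engine API: before, the access count was only bumped as a side effect of an entry already existing in the cache; with the explicit flag, a candidate accumulates accesses whenever it will be visible in the scene, so it can prove it is a stable part of the frame even while a parent layer is currently caching it.

```cpp
#include <cassert>

// Hypothetical model of a DisplayList cache candidate (names are
// illustrative, not the engine's actual types).
struct CacheCandidate {
  int access_count = 0;
  bool has_entry = false;         // does a cache entry already exist?
  bool visible_in_scene = false;  // the new explicit flag from this PR

  // Old behavior: the access count was only incremented as a side effect
  // of a cache entry already existing.
  void PrerollOld() {
    if (has_entry) {
      access_count++;
    }
  }

  // New behavior: the access count keys off the explicit visibility flag,
  // independent of whether an entry exists yet.
  void PrerollNew() {
    if (visible_in_scene) {
      access_count++;
    }
  }
};
```

Under the old scheme a visible item with no entry never accumulated accesses, so it could never cross the caching threshold while a parent was caching it; under the new scheme it can, which enables the more reliable switch to peephole opacity caching described earlier.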
Some benchmark results. In all cases A is the Flutter framework tree using an engine before #31892 was integrated and B is the same tree using an engine with both that PR and this patch. Specifically framework commit flutter/flutter@caadc25 was used.
textfield perf A/B results
raster cache memory use A/B results
backdropfilter perf A/B results