-
Notifications
You must be signed in to change notification settings - Fork 6k
[Impeller] Add guidance for writing shaders #34634
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
impeller/docs/shader_optimization.md
Outdated
simpler shader compilation and better handling of traditionally slow shader | ||
code. In fact, straight forward "unoptimized" shader code filled with branches | ||
may significantly outperform the equivalent branchless optimized shader code | ||
when targeting newer GPU architectures. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"when targeting newer architectures due to ..."
If there's a simple 1-phrase explanation or summary it would be good to fill that in here rather than leaving it a mystery. Absent that, maybe point to the PR or issue where we had that discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment directing readers to the "Don't flatten simple varying" section for an example/explanation of this.
impeller/docs/shader_optimization.md
Outdated
essential for making good tradeoff decisions while writing shaders. | ||
|
||
For these reasons, it's also important to profile shaders against some of the | ||
older devices that Flutter can target (such as the iPhone 4s) when making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer support 32-bit iOS devices. The oldest iOS device we now support is the oldest 64-bit device, which I think is the 5s? Please double-check me on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also mention something here about backend differences and/or targetting multiple platforms (GL, Metal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops -- went with 6s for safety.
We should probably also mention something here about backend differences and/or targetting multiple platforms (GL, Metal)
Good idea, added an excerpt about testing different backends, as the early stages of shader compilation (and the code output by ImpellerC) may vary quite a bit.
impeller/docs/shader_optimization.md
Outdated
virtually all GPU architectures in use today have instruction-level support for | ||
dynamic branching, and it's quite unlikely that we'll come across a mobile | ||
device capable of running Flutter that doesn't. For example, the oldest devices | ||
we test against in CI (iPhone 4s and Moto G4) run GPUs that support dynamic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oldest (and only) iPhone model in CI is (I think) an iPhone 6 (or 6s?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to mention GPU models rather than phone models in this document anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to 6s and updated references to it's GPU below. Lucky for me it still uses non-scalar FUs even though it hit the market in 2015, so I don't have to go looking around for other SIMD GPUs. :)
impeller/docs/shader_optimization.md
Outdated
to a _uniform_ or _constant_, which will remain the same for the whole draw | ||
call). | ||
|
||
On SIMT architectures, this branch incurs very little overhead because, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On SIMT architectures, this branch incurs very little overhead because, and | |
On SIMT architectures, this branch incurs very little overhead because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The same concerns and advice apply to this branch as the scenario under "Avoid | ||
complex varying branches". | ||
|
||
### Use lower precision whenever possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] Maybe add an example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be nice to comment on how and where in the stack we use different precisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a shader compilation concern, and every common shading language supports some kind of floating point precision directive. For Impeller shaders it should be as easy as specifying precision mediump float;
. I plan to update this recommendation with more details after I try it out with some of our shaders and profile the difference.
impeller/docs/shader_optimization.md
Outdated
### Thread-level parallelism | ||
|
||
Newer GPUs (but also some older hardware such as the Adreno 306 GPU found on the | ||
Moto G4's Snapdragon SOC) use scalar functional units (no SIMD/VLIW/MIMD) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SoC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend posting in this in the Flutter & Skia chat and seeing if some folks there have additional feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome and explains the results of the profiling runs well. A great start to keep adding to as we discover and tune our findings.
Barring some nits already mentioned. LGTM.
} | ||
``` | ||
|
||
Note that `color` is _varying_. Specifically, it's an interpolated output from a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's file a separate issue to just define these terms in a separate doc. We use varying
a lot but since our shaders are SL 4.60, we never use the term in the shader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: flutter/flutter#107609
@@ -0,0 +1,280 @@ | |||
# Writing efficient shaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please cross reference this from impeller/README.md. There is a TOC at the bottom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Co-authored-by: Zachary Anderson <[email protected]>
Dropped a link in the Skia & Flutter chat about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
cost of the conditional. The worst case scenario is that some of the warp's | ||
threads fail the conditional and the rest succeed, requiring the program to | ||
execute both paths of the branch back-to-back in the warp. Note that this is | ||
very favorable to the SIMD scenario with non-uniform/varying branches, as SIMT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "this approach is very preferable over"
I'm gonna swing back around and reword the PowerVR stuff. Overall situation with branches isn't really as bad as I made it out to be here. The gist is that the compiler has less flexibility when packing instruction words, but it should still be able to make use of vector registers within branches, etc... |
…er/engine#34634) (#107870) Commit: a1e4cfad5e1ada5b63171009778cb3caf0b920ce
Resolves flutter/flutter#105062 (comment).
This doc describes how widely used mobile GPU architectures parallelize instructions and makes some recommendations to get the best out of this combined behavior. I've only been able to verify the SIMT behavior through profiling so far, as I don't yet have any easily-to-profile devices on hand with an architecture that relies on non-scalar instructions for parallelism (VLIW or SIMD).
I'd like to add some notes around <=16 bit precision, mipmapping, and shader library utilities once we actually start using them a bit in Impeller. Also, rough guidance on how to diagnose poor parallelism/high instruction count using vendor-specific shader analysis tools by Qualcomm/ARM/AMD/Nvidia would probably come in handy (and force me to try using them all at least once :)).