Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

[Impeller] Add guidance for writing shaders #34634

Merged
merged 3 commits into from
Jul 18, 2022
Merged

Conversation

bdero
Copy link
Member

@bdero bdero commented Jul 13, 2022

Resolves flutter/flutter#105062 (comment).

This doc describes how widely used mobile GPU architectures parallelize instructions and makes some recommendations to get the best out of this combined behavior. I've only been able to verify the SIMT behavior through profiling so far, as I don't yet have any easily-to-profile devices on hand with an architecture that relies on non-scalar instructions for parallelism (VLIW or SIMD).

I'd like to add some notes around <=16 bit precision, mipmapping, and shader library utilities once we actually start using them a bit in Impeller. Also, rough guidance on how to diagnose poor parallelism/high instruction count using vendor-specific shader analysis tools by Qualcomm/ARM/AMD/Nvidia would probably come in handy (and force me to try using them all at least once :)).

@bdero bdero force-pushed the bdero/shader-docs branch from 92d9d3a to 0b5f791 Compare July 13, 2022 13:29
Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

simpler shader compilation and better handling of traditionally slow shader
code. In fact, straight forward "unoptimized" shader code filled with branches
may significantly outperform the equivalent branchless optimized shader code
when targeting newer GPU architectures.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"when targeting newer architectures due to ..."

If there's a simple 1-phrase explanation or summary it would be good to fill that in here rather than leaving it a mystery. Absent that, maybe point to the PR or issue where we had that discussion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment directing readers to the "Don't flatten simple varying" section for an example/explanation of this.

essential for making good tradeoff decisions while writing shaders.

For these reasons, it's also important to profile shaders against some of the
older devices that Flutter can target (such as the iPhone 4s) when making
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer support 32-bit iOS devices. The oldest iOS device we now support is the oldest 64-bit device, which I think is the 5s? Please double-check me on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also mention something here about backend differences and/or targetting multiple platforms (GL, Metal)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops -- went with 6s for safety.

We should probably also mention something here about backend differences and/or targetting multiple platforms (GL, Metal)

Good idea, added an excerpt about testing different backends, as the early stages of shader compilation (and the code output by ImpellerC) may vary quite a bit.

virtually all GPU architectures in use today have instruction-level support for
dynamic branching, and it's quite unlikely that we'll come across a mobile
device capable of running Flutter that doesn't. For example, the oldest devices
we test against in CI (iPhone 4s and Moto G4) run GPUs that support dynamic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The oldest (and only) iPhone model in CI is (I think) an iPhone 6 (or 6s?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to mention GPU models rather than phone models in this document anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 6s and updated references to it's GPU below. Lucky for me it still uses non-scalar FUs even though it hit the market in 2015, so I don't have to go looking around for other SIMD GPUs. :)

to a _uniform_ or _constant_, which will remain the same for the whole draw
call).

On SIMT architectures, this branch incurs very little overhead because, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
On SIMT architectures, this branch incurs very little overhead because, and
On SIMT architectures, this branch incurs very little overhead because

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

The same concerns and advice apply to this branch as the scenario under "Avoid
complex varying branches".

### Use lower precision whenever possible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] Maybe add an example here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be nice to comment on how and where in the stack we use different precisions.

Copy link
Member Author

@bdero bdero Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a shader compilation concern, and every common shading language supports some kind of floating point precision directive. For Impeller shaders it should be as easy as specifying precision mediump float;. I plan to update this recommendation with more details after I try it out with some of our shaders and profile the difference.

### Thread-level parallelism

Newer GPUs (but also some older hardware such as the Adreno 306 GPU found on the
Moto G4's Snapdragon SOC) use scalar functional units (no SIMD/VLIW/MIMD) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SoC

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@dnfield dnfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend posting in this in the Flutter & Skia chat and seeing if some folks there have additional feedback

Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome and explains the results of the profiling runs well. A great start to keep adding to as we discover and tune our findings.

Barring some nits already mentioned. LGTM.

}
```

Note that `color` is _varying_. Specifically, it's an interpolated output from a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file a separate issue to just define these terms in a separate doc. We use varying a lot but since our shaders are SL 4.60, we never use the term in the shader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,280 @@
# Writing efficient shaders
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please cross reference this from impeller/README.md. There is a TOC at the bottom.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bdero
Copy link
Member Author

bdero commented Jul 14, 2022

Dropped a link in the Skia & Flutter chat about this.

Copy link
Member

@zanderso zanderso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

cost of the conditional. The worst case scenario is that some of the warp's
threads fail the conditional and the rest succeed, requiring the program to
execute both paths of the branch back-to-back in the warp. Note that this is
very favorable to the SIMD scenario with non-uniform/varying branches, as SIMT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "this approach is very preferable over"

@bdero bdero force-pushed the bdero/shader-docs branch from 9500e34 to 405b443 Compare July 18, 2022 00:30
@bdero
Copy link
Member Author

bdero commented Jul 18, 2022

I'm gonna swing back around and reword the PowerVR stuff. Overall situation with branches isn't really as bad as I made it out to be here. The gist is that the compiler has less flexibility when packing instruction words, but it should still be able to make use of vector registers within branches, etc...

@bdero bdero merged commit efc604e into flutter:main Jul 18, 2022
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jul 18, 2022
zanderso pushed a commit to flutter/flutter that referenced this pull request Jul 19, 2022
sourcegraph-bot pushed a commit to sgtest/megarepo that referenced this pull request Jul 19, 2022
…er/engine#34634) (#107870)

Commit: a1e4cfad5e1ada5b63171009778cb3caf0b920ce
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[Impeller] Document patterns for writing performant shaders.
4 participants