[ET-VK] Minor performance improvements to native layer norm. #9892

Open

trivedivivek wants to merge 9 commits into gh/trivedivivek/74/base from gh/trivedivivek/74/head

Contributor

trivedivivek commented Apr 4, 2025

Stack from ghstack (oldest at bottom):

This diff introduces minor performance improvements to the native layer norm function in the Vulkan backend of ExecuTorch.

In this new approach:
The mean and variance values are calculated in 2 separate passes.
The shader is dispatched based on the input texture size; each input texel is read and stored in shared memory.
The input stored in shared memory is then summed using a reduce function.

This implementation makes better use of the GPU's parallel processing capabilities.
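
To make the description above concrete, here is a minimal CPU-side sketch in Python of the idea: one pass reduces the row to a mean, a second pass reduces the squared deviations to a variance, and each pass sums its inputs with a shared-memory style tree reduce. This is not the shader from this PR. The names `WORKGROUP_SIZE`, `workgroup_sum`, and `layer_norm_row` are hypothetical, scalars stand in for texels, the workgroup size is assumed to be a power of two, and weight and bias are omitted.

```python
from typing import List, Tuple

# Hypothetical workgroup size; assumed to be a power of two so the
# halving loop in workgroup_sum covers every slot.
WORKGROUP_SIZE = 8


def workgroup_sum(values: List[float], workgroup_size: int = WORKGROUP_SIZE) -> float:
    """Sum values the way one workgroup might: strided loads, then a tree reduce."""
    # Each simulated thread accumulates a strided slice of the input
    # into its own slot of the "shared memory" array.
    shared = [0.0] * workgroup_size
    for tid in range(workgroup_size):
        for i in range(tid, len(values), workgroup_size):
            shared[tid] += values[i]

    # A real shader would need a barrier here and after every step below.
    stride = workgroup_size // 2
    while stride > 0:
        # The lower half of the active slots adds in the upper half.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]


def layer_norm_row(row: List[float], eps: float = 1e-5) -> Tuple[List[float], float, float]:
    """Normalize one row using two separate reduction passes."""
    n = len(row)
    # Pass 1: reduce the inputs to get the mean.
    mean = workgroup_sum(row) / n
    # Pass 2: reduce the squared deviations to get the (biased) variance.
    var = workgroup_sum([(x - mean) ** 2 for x in row]) / n
    rstd = 1.0 / (var + eps) ** 0.5
    return [(x - mean) * rstd for x in row], mean, rstd
```

For example, `layer_norm_row([1.0, 2.0, 3.0, 4.0])` yields a mean of 2.5 and a normalized row that sums to roughly zero. In the sketch, every tree-reduce step touches half as many slots as the previous one, which is what lets many threads cooperate on a single sum.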

Differential Revision: D72430290


          [ET-VK] Minor performance improvements to native layer norm.

453b4dc

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

trivedivivek requested a review from SS-JIA as a code owner

April 4, 2025 04:54

pytorch-bot bot commented Apr 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9892

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 667e472 with merge base 047bbc7:

NEW FAILURES - The following jobs have failed:

pull / test-models-linux (emformer_transcribe, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
Missing file at path: /home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_2f9b9877-0074-44a0-9daa-3ba87c1fcd3e
pull / test-models-linux (linear, portable, linux.2xlarge) / linux-job (gh)
Missing file at path: /home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_389cb005-1496-4eb0-9e22-0e42027b4c42

This comment was automatically generated by Dr. CI and updates every 15 minutes.

trivedivivek added a commit that referenced this pull request


          [ET-VK] Minor performance improvements to native layer norm.

ef6fbce

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

ghstack-source-id: 276053981
Pull Request resolved: #9892

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Apr 4, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290

trivedivivek added the topic: not user facing label


          Update on "[ET-VK] Minor performance improvements to native layer norm."

1407ff7

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

trivedivivek added a commit that referenced this pull request


          [ET-VK] Minor performance improvements to native layer norm.

e1d9986

Pull Request resolved: #9892

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)
ghstack-source-id: 276439596

Contributor

facebook-github-bot commented Apr 7, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290

facebook-github-bot added the fb-exported label

SS-JIA approved these changes


          Update on "[ET-VK] Minor performance improvements to native layer norm."

9cb166b

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

trivedivivek added a commit that referenced this pull request


          [ET-VK] Minor performance improvements to native layer norm.

58605bb

Pull Request resolved: #9892

ghstack-source-id: 276575089

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

Contributor

facebook-github-bot commented Apr 7, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290


          Update on "[ET-VK] Minor performance improvements to native layer norm."

cfb8351

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

trivedivivek added a commit that referenced this pull request


          [ET-VK] Minor performance improvements to native layer norm.

0a68f5e

Pull Request resolved: #9892

ghstack-source-id: 276877983

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

Contributor

facebook-github-bot commented Apr 8, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290

trivedivivek mentioned this pull request

[ET-VK] Tuning native layer norm local workgroup size to improve thread occupancy during reduce. #9984

Open


          Update on "[ET-VK] Minor performance improvements to native layer norm."

fbb5f98

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

This was referenced Apr 11, 2025

[ET-VK] Modify quantized linear naive shader to linearly dispatch work to improve performance. #10116

Open

[ET-VK] Minor improvement to permute op. #10117

Open

Contributor

facebook-github-bot commented Apr 11, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290


          Update on "[ET-VK] Minor performance improvements to native layer norm."

d6a49ed

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

Contributor

facebook-github-bot commented Apr 14, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290

github-actions bot mentioned this pull request

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#28

Open


          Update on "[ET-VK] Minor performance improvements to native layer norm."

0f82910

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

Contributor

facebook-github-bot commented Apr 14, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290


          Update on "[ET-VK] Minor performance improvements to native layer norm."

fc5cdde

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

Contributor

facebook-github-bot commented Apr 16, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290


          Update on "[ET-VK] Minor performance improvements to native layer norm."

667e472

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

[ghstack-poisoned]

Contributor

facebook-github-bot commented Apr 16, 2025

This pull request was exported from Phabricator. Differential Revision: D72430290

github-actions bot mentioned this pull request

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#30

Open

github-actions bot mentioned this pull request

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#35

Open

kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request


          [ET-VK] Minor performance improvements to native layer norm.

4b58bea

Pull Request resolved: pytorch/executorch#9892

ghstack-source-id: 278469025

Differential Revision: [D72430290](https://our.internmc.facebook.com/intern/diff/D72430290/)

github-actions bot commented Aug 31, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

github-actions bot added the stale label

Labels

CLA Signed, fb-exported, stale, topic: not user facing