Skip to content

Move default Vela/Regor configurations to Sram_Only #10279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

3l1
Copy link
Contributor

@3l1 3l1 commented Apr 17, 2025

Summary: Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758

@3l1 3l1 requested a review from digantdesai as a code owner April 17, 2025 21:13
Copy link

pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10279

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure

As of commit 758e29c with merge base 3997ae9 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1
Copy link
Contributor Author

3l1 commented Apr 17, 2025

@pytorchbot label "topic: not user facing"

@3l1 3l1 force-pushed the export-D73212758 branch from 4d1f3b0 to cccb1e2 Compare April 21, 2025 02:44
3l1 added a commit to 3l1/executorch that referenced this pull request Apr 21, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request Apr 21, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from cccb1e2 to 87434e3 Compare April 21, 2025 02:48
@@ -93,7 +93,7 @@ def get_tosa_compile_spec_unbuilt(
def get_u55_compile_spec(
macs: int = 128,
system_config: str = "Ethos_U55_High_End_Embedded",
memory_mode: str = "Shared_Sram",
memory_mode: str = "Sram_Only",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rationale for changing the default? I don't see any issues, asking because I am not sure why this was the default before.

cc @freddan80, @zingo

Copy link
Collaborator

@gggekov gggekov May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @digantdesai @3l1 ,
For U55, Shared_Sram is the most widely used memory mode. The reason is that on most embedded SoCs, you have limited amount of SRAM, hence most people prefer to place the NN in the Flash and use the SRAM only for the scratch_buffer(the SRAM will be used also for SW stack outside of the inference, e.g. pre/post processing, running an RTOS, etc). If you have a small model and SoC with enough SRAM for the NN, scratch buffer and your overall SW stack, then yes, it makes sense to use Sram_Only and you will get the performance benefit from the lower latency/higher bandwidth for the NN, but I would say this is the exception rather than the rule.

Also, the Corstone-300 has 2MB of SRAM and 256MB of DDR(can behave like Flash after the Timing Adapter parameters). Right now, for greater flexibility, we place everything in the DDR in the linker script. Then, via the REGIONCFG registers we tell the NPU to read the weights via AXI1(AXI1 providing Flash latency/BW thanks to the Timing Adapter settings) and read the scratch buffer via the AXI0(AXI0 providing SRAM latency & BW thanks to the TAs). On silicon, you can't do that- to get good performance from Sram_Only, you have to place NN & scratch buffer in the SRAM. I have a patch for internal review where i will move the allocation for the scratch buffer into an array in the SRAM.

The same considerations go for the U85 as well. In addition to Shared_Sram, the U85 will often be used on SoCs with Cortex-A and DRAM which is why we provide the Dedicated_Sram mode.

@3l1 maybe we have to look how we can enable you to easily change the memory mode, perhaps from the cmd line ? Is there a way we can do it that doesn't involve changing the default memory mode we test against?

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1 3l1 force-pushed the export-D73212758 branch from 87434e3 to a2fae41 Compare May 1, 2025 23:42
3l1 added a commit to 3l1/executorch that referenced this pull request May 1, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from a2fae41 to 836ea7e Compare May 2, 2025 02:29
@zingo zingo requested a review from gggekov May 2, 2025 07:20
@zingo zingo added the module: arm Issues related to arm backend label May 2, 2025
Copy link
Collaborator

@gggekov gggekov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @digantdesai @3l1 ,
For U55, Shared_Sram is the most widely used memory mode. The reason is that on most embedded SoCs, you have limited amount of SRAM, hence most people prefer to place the NN in the Flash and use the SRAM only for the scratch_buffer(the SRAM will be used also for SW stack outside of the inference, e.g. pre/post processing, running an RTOS, etc). If you have a small model and SoC with enough SRAM for the NN, scratch buffer and your overall SW stack, then yes, it makes sense to use Sram_Only and you will get the performance benefit from the lower latency/higher bandwidth for the NN, but I would say this is the exception rather than the rule.

Also, the Corstone-300 has 2MB of SRAM and 256MB of DDR(can behave like Flash after the Timing Adapter parameters). Right now, for greater flexibility, we place everything in the DDR in the linker script. Then, via the REGIONCFG registers we tell the NPU to read the weights via AXI1(AXI1 providing Flash latency/BW thanks to the Timing Adapter settings) and read the scratch buffer via the AXI0(AXI0 providing SRAM latency & BW thanks to the TAs). On silicon, you can't do that- to get good performance from Sram_Only, you have to place NN & scratch buffer in the SRAM. I have a patch for internal review where i will move the allocation for the scratch buffer into an array in the SRAM.

The same considerations go for the U85 as well. In addition to Shared_Sram, the U85 will often be used on SoCs with Cortex-A and DRAM which is why we provide the Dedicated_Sram mode.

@3l1 perhaps we have to look how we can enable you to easily change the memory mode, perhaps from the cmd line ? Is there a way we can do it that doesn't involve changing the default memory mode?

@@ -93,7 +93,7 @@ def get_tosa_compile_spec_unbuilt(
def get_u55_compile_spec(
macs: int = 128,
system_config: str = "Ethos_U55_High_End_Embedded",
memory_mode: str = "Shared_Sram",
memory_mode: str = "Sram_Only",
Copy link
Collaborator

@gggekov gggekov May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @digantdesai @3l1 ,
For U55, Shared_Sram is the most widely used memory mode. The reason is that on most embedded SoCs, you have limited amount of SRAM, hence most people prefer to place the NN in the Flash and use the SRAM only for the scratch_buffer(the SRAM will be used also for SW stack outside of the inference, e.g. pre/post processing, running an RTOS, etc). If you have a small model and SoC with enough SRAM for the NN, scratch buffer and your overall SW stack, then yes, it makes sense to use Sram_Only and you will get the performance benefit from the lower latency/higher bandwidth for the NN, but I would say this is the exception rather than the rule.

Also, the Corstone-300 has 2MB of SRAM and 256MB of DDR(can behave like Flash after the Timing Adapter parameters). Right now, for greater flexibility, we place everything in the DDR in the linker script. Then, via the REGIONCFG registers we tell the NPU to read the weights via AXI1(AXI1 providing Flash latency/BW thanks to the Timing Adapter settings) and read the scratch buffer via the AXI0(AXI0 providing SRAM latency & BW thanks to the TAs). On silicon, you can't do that- to get good performance from Sram_Only, you have to place NN & scratch buffer in the SRAM. I have a patch for internal review where i will move the allocation for the scratch buffer into an array in the SRAM.

The same considerations go for the U85 as well. In addition to Shared_Sram, the U85 will often be used on SoCs with Cortex-A and DRAM which is why we provide the Dedicated_Sram mode.

@3l1 maybe we have to look how we can enable you to easily change the memory mode, perhaps from the cmd line ? Is there a way we can do it that doesn't involve changing the default memory mode we test against?

@3l1
Copy link
Contributor Author

3l1 commented May 2, 2025

You can ignore this pull request :) (Im not sure how to cancel it in this UI)

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 836ea7e to b1c3398 Compare May 2, 2025 18:51
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from b1c3398 to befb9ff Compare May 2, 2025 19:54
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from befb9ff to 18c26f4 Compare May 2, 2025 19:56
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 18c26f4 to 9f67a99 Compare May 2, 2025 19:58
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 9f67a99 to 847f57e Compare May 2, 2025 20:04
3l1 added a commit to 3l1/executorch that referenced this pull request May 2, 2025
Summary:

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@3l1 3l1 force-pushed the export-D73212758 branch from 847f57e to 873f99e Compare May 2, 2025 23:39
Summary:
Pull Request resolved: pytorch#10279

Move default Vela/Regor configurations to Sram_Only

Differential Revision: D73212758
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D73212758

@3l1 3l1 force-pushed the export-D73212758 branch from 873f99e to 758e29c Compare May 2, 2025 23:42
@zingo zingo marked this pull request as draft May 6, 2025 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported module: arm Issues related to arm backend topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants