Move default Vela/Regor configurations to Sram_Only #10279
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10279
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs
There are 1 currently active SEVs. If your PR is affected, please view them below:
❌ 1 New Failure
As of commit 758e29c with merge base 3997ae9 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D73212758
@pytorchbot label "topic: not user facing"
Summary: Move default Vela/Regor configurations to Sram_Only Differential Revision: D73212758
Summary: Pull Request resolved: pytorch#10279 Move default Vela/Regor configurations to Sram_Only Differential Revision: D73212758
@@ -93,7 +93,7 @@ def get_tosa_compile_spec_unbuilt(
 def get_u55_compile_spec(
     macs: int = 128,
     system_config: str = "Ethos_U55_High_End_Embedded",
-    memory_mode: str = "Shared_Sram",
+    memory_mode: str = "Sram_Only",
What is the rationale for changing the default? I don't see any issues; I'm asking because I'm not sure why this was the default before.
cc @freddan80, @zingo
Hi @digantdesai @3l1,
For U55, Shared_Sram is the most widely used memory mode. Most embedded SoCs have a limited amount of SRAM, so most people prefer to place the NN in Flash and use the SRAM only for the scratch_buffer (the SRAM is also needed by the SW stack outside of inference, e.g. pre/post-processing, running an RTOS, etc.). If you have a small model and an SoC with enough SRAM for the NN, the scratch buffer, and your overall SW stack, then yes, it makes sense to use Sram_Only, and you will get the performance benefit of lower latency and higher bandwidth for the NN. I would say this is the exception rather than the rule, though.
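The sizing rule described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `pick_u55_memory_mode`, its parameters, and the byte figures are hypothetical and are not part of ExecuTorch or Vela. Only the mode names `Sram_Only` and `Shared_Sram` come from this thread, and a real budget would also need to account for alignment and other allocations.

```python
def pick_u55_memory_mode(sram_bytes, model_bytes, scratch_bytes, sw_stack_bytes):
    """Suggest a memory mode from a simple SRAM budget (hypothetical helper)."""
    # The scratch buffer and the rest of the SW stack use SRAM in both modes.
    baseline = scratch_bytes + sw_stack_bytes
    if baseline > sram_bytes:
        raise ValueError("scratch buffer and SW stack do not fit in SRAM")
    # Sram_Only additionally needs room for the NN itself.
    if baseline + model_bytes <= sram_bytes:
        return "Sram_Only"
    # Otherwise keep the NN in Flash and use SRAM for the scratch buffer.
    return "Shared_Sram"


KIB = 1024
# Illustrative numbers only, not measurements from any real SoC or model.
small = pick_u55_memory_mode(2048 * KIB, 512 * KIB, 256 * KIB, 512 * KIB)
large = pick_u55_memory_mode(2048 * KIB, 4096 * KIB, 256 * KIB, 512 * KIB)
print(small, large)
```

With these made-up numbers the small model fits entirely in SRAM, while the larger one has to stay in Flash.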
Also, the Corstone-300 has 2MB of SRAM and 256MB of DDR (which can behave like Flash once the Timing Adapter parameters are applied). Right now, for greater flexibility, we place everything in DDR in the linker script. Then, via the REGIONCFG registers, we tell the NPU to read the weights via AXI1 (which provides Flash latency/BW thanks to the Timing Adapter settings) and to read the scratch buffer via AXI0 (which provides SRAM latency and BW thanks to the TAs). On silicon, you can't do that: to get good performance from Sram_Only, you have to place the NN and the scratch buffer in SRAM. I have a patch in internal review that moves the scratch buffer allocation into an array in SRAM.
The same considerations apply to the U85 as well. In addition to Shared_Sram, the U85 will often be used on SoCs with a Cortex-A and DRAM, which is why we provide the Dedicated_Sram mode.
@3l1 maybe we should look at how we can let you easily change the memory mode, perhaps from the command line? Is there a way to do it that doesn't involve changing the default memory mode we test against?
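One possible shape for such a command-line override is sketched below. Everything here is an assumption for illustration: the `--memory_mode` flag, `build_parser`, and `parse_memory_mode` are hypothetical names and do not describe an existing ExecuTorch option. The default stays at `Shared_Sram`, so the configuration that is tested against would not change unless the flag is passed.

```python
import argparse

# Mode names taken from this thread; which modes are valid for which
# target is not checked in this sketch.
MEMORY_MODES = ("Shared_Sram", "Sram_Only", "Dedicated_Sram")


def build_parser():
    """Build a parser with a hypothetical --memory_mode flag."""
    parser = argparse.ArgumentParser(description="Memory mode override sketch")
    parser.add_argument(
        "--memory_mode",
        choices=MEMORY_MODES,
        default="Shared_Sram",  # keep the existing default unless overridden
        help="Memory mode to forward to the compile spec",
    )
    return parser


def parse_memory_mode(argv=None):
    """Return the selected memory mode for the given argument list."""
    return build_parser().parse_args(argv).memory_mode


# The returned string could then be passed as the memory_mode argument
# of get_u55_compile_spec shown in the diff.
print(parse_memory_mode([]))
print(parse_memory_mode(["--memory_mode", "Sram_Only"]))
```

Restricting the flag with `choices` means a misspelled mode is rejected at parse time instead of surfacing later in the compile flow.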
You can ignore this pull request :) (I'm not sure how to cancel it in this UI)