
Move default Vela/Regor configurations to Sram_Only #10279


Draft: wants to merge 1 commit into main.
backends/arm/test/common.py: 4 changes (2 additions, 2 deletions)
```diff
@@ -93,7 +93,7 @@ def get_tosa_compile_spec_unbuilt(
 def get_u55_compile_spec(
     macs: int = 128,
     system_config: str = "Ethos_U55_High_End_Embedded",
-    memory_mode: str = "Shared_Sram",
+    memory_mode: str = "Sram_Only",
```
Contributor


What is the rationale for changing the default? I don't see any issues; I'm asking because I'm not sure why this was the default before.

cc @freddan80, @zingo

Collaborator

@gggekov, May 2, 2025

Hi @digantdesai, @3l1,
For U55, Shared_Sram is the most widely used memory mode. Most embedded SoCs have a limited amount of SRAM, so most people prefer to place the NN in Flash and use the SRAM only for the scratch buffer (the SRAM is also needed by the rest of the software stack outside the inference, e.g. pre/post-processing, running an RTOS, etc.). If you have a small model and a SoC with enough SRAM for the NN, the scratch buffer and your overall software stack, then yes, it makes sense to use Sram_Only, and you will get the performance benefit of lower latency and higher bandwidth for the NN. I would say this is the exception rather than the rule, though.
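To make that rule of thumb concrete, here is a minimal Python sketch. It assumes you can estimate the size of the model weights, the scratch buffer and the rest of the software stack. `suggest_u55_memory_mode` is a hypothetical helper, not part of Vela or ExecuTorch, and the sizes in the example calls are made-up illustrative numbers. A real decision would also weigh latency and bandwidth requirements.

```python
# Hypothetical helper sketching the SRAM-budget rule of thumb; it is not part
# of Vela or ExecuTorch. All sizes are in bytes.
KiB = 1024
MiB = 1024 * KiB


def suggest_u55_memory_mode(
    model_bytes: int,
    scratch_bytes: int,
    sw_stack_bytes: int,
    sram_bytes: int,
) -> str:
    """Suggest a memory mode from rough SRAM budget figures."""
    # The scratch buffer and the rest of the software stack (RTOS,
    # pre/post-processing) need SRAM in either mode.
    if scratch_bytes + sw_stack_bytes > sram_bytes:
        raise ValueError("scratch buffer and software stack do not fit in SRAM")
    # Choose Sram_Only only when the weights fit in SRAM as well.
    if model_bytes + scratch_bytes + sw_stack_bytes <= sram_bytes:
        return "Sram_Only"
    # Otherwise keep the weights in Flash and the scratch buffer in SRAM.
    return "Shared_Sram"


# Small model, 2 MiB of SRAM: everything fits.
print(suggest_u55_memory_mode(300 * KiB, 200 * KiB, 512 * KiB, 2 * MiB))  # Sram_Only
# Larger model: the weights stay in Flash.
print(suggest_u55_memory_mode(1536 * KiB, 400 * KiB, 512 * KiB, 2 * MiB))  # Shared_Sram
```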

Also, the Corstone-300 has 2MB of SRAM and 256MB of DDR (which can be made to behave like Flash through the Timing Adapter parameters). Right now, for greater flexibility, the linker script places everything in the DDR. Then, through the REGIONCFG registers, we tell the NPU to read the weights via AXI1 (which provides Flash latency and bandwidth thanks to the Timing Adapter settings) and the scratch buffer via AXI0 (which provides SRAM latency and bandwidth thanks to the Timing Adapters). On silicon you can't do that: to get good performance from Sram_Only, you have to place both the NN and the scratch buffer in the SRAM. I have a patch under internal review that moves the allocation of the scratch buffer into an array in the SRAM.
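The difference between the FVP setup and silicon can be summarized as a small data model. This is an illustrative Python sketch, not a REGIONCFG register encoding; `Placement`, `placement` and `AXI_PORTS` are hypothetical names. The AXI assignments per mode are an assumption based on Vela's default `vela.ini` (Shared_Sram: weights on AXI1, scratch buffer on AXI0; Sram_Only: both on AXI0), so check them against the Vela version you use.

```python
from dataclasses import dataclass

# Illustrative model only; NOT a REGIONCFG encoding.
# Assumption: AXI assignments follow Vela's default vela.ini memory modes.
AXI_PORTS = {
    "Shared_Sram": ("AXI1", "AXI0"),  # (weights, scratch buffer)
    "Sram_Only": ("AXI0", "AXI0"),
}


@dataclass(frozen=True)
class Placement:
    weights_memory: str  # physical memory holding the weights
    weights_axi: str  # AXI port the NPU reads the weights through
    scratch_memory: str  # physical memory holding the scratch buffer
    scratch_axi: str  # AXI port used for the scratch buffer


def placement(memory_mode: str, on_fvp: bool) -> Placement:
    if memory_mode not in AXI_PORTS:
        raise ValueError(f"unsupported memory mode: {memory_mode!r}")
    weights_axi, scratch_axi = AXI_PORTS[memory_mode]
    if on_fvp:
        # Corstone-300 FVP: the linker script puts everything in DDR, and the
        # Timing Adapters make AXI0 look like SRAM and AXI1 look like Flash.
        return Placement("DDR", weights_axi, "DDR", scratch_axi)
    # On silicon, each buffer must physically live in the memory its port assumes.
    weights_memory = "SRAM" if weights_axi == "AXI0" else "Flash"
    return Placement(weights_memory, weights_axi, "SRAM", scratch_axi)


print(placement("Shared_Sram", on_fvp=True))
print(placement("Sram_Only", on_fvp=False))
```

The point the sketch captures is that on the FVP only the port (and therefore the Timing Adapter behavior) changes with the memory mode, while on silicon the buffers themselves have to move into SRAM for Sram_Only.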

The same considerations apply to the U85. In addition to Shared_Sram, we provide the Dedicated_Sram mode for the U85, because it will often be used on SoCs with a Cortex-A and DRAM.
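A test helper could reject memory modes that do not make sense for a given target before invoking Vela. The sketch below is hypothetical (neither `VALID_MEMORY_MODES` nor `check_memory_mode` exists in the repository). It assumes, per Vela's documentation as I understand it, that Dedicated_Sram is offered for Ethos-U65/U85 but not for Ethos-U55.

```python
# Hypothetical validation helper. Assumption: Dedicated_Sram is available for
# the U85 but not for the U55, as described in Vela's documentation.
VALID_MEMORY_MODES = {
    "u55": ("Shared_Sram", "Sram_Only"),
    "u85": ("Shared_Sram", "Sram_Only", "Dedicated_Sram"),
}


def check_memory_mode(target: str, memory_mode: str) -> str:
    """Return memory_mode unchanged if it is valid for target, else raise."""
    allowed = VALID_MEMORY_MODES.get(target)
    if allowed is None:
        raise ValueError(f"unknown target: {target!r}")
    if memory_mode not in allowed:
        raise ValueError(
            f"{memory_mode!r} is not a valid memory mode for {target}; "
            f"expected one of {allowed}"
        )
    return memory_mode
```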

@3l1, maybe we should look at how to let you change the memory mode easily, perhaps from the command line? Is there a way to do that without changing the default memory mode we test against?
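One possible answer to that question, sketched in Python: keep Shared_Sram as the tested default and let a developer override it per run through an environment variable. The variable names (`ARM_TEST_U55_MEMORY_MODE`, `ARM_TEST_U85_MEMORY_MODE`) and the helper `memory_mode_for` are hypothetical; nothing in `backends/arm/test/common.py` reads them today. An environment variable is used here only because it needs nothing beyond the standard library. A test-runner command-line option would work similarly.

```python
import os
from collections.abc import Mapping

# Hypothetical: the defaults the test suite keeps testing against.
DEFAULT_MEMORY_MODES = {"u55": "Shared_Sram", "u85": "Shared_Sram"}


def memory_mode_for(target: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the default memory mode for target unless overridden, e.g.
    ARM_TEST_U55_MEMORY_MODE=Sram_Only set in the shell running the tests."""
    var = f"ARM_TEST_{target.upper()}_MEMORY_MODE"
    return env.get(var, DEFAULT_MEMORY_MODES[target])
```

With a helper like this, `get_u55_compile_spec` could default `memory_mode` to `None` and fall back to `memory_mode_for("u55")` when no value is passed, so CI keeps testing Shared_Sram while a developer can opt into Sram_Only locally.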

```diff
     extra_flags: str = "--debug-force-regor --output-format=raw",
     custom_path: Optional[str] = None,
 ) -> list[CompileSpec]:
```
```diff
@@ -112,7 +112,7 @@ def get_u55_compile_spec(
 def get_u85_compile_spec(
     macs: int = 128,
     system_config="Ethos_U85_SYS_DRAM_Mid",
-    memory_mode="Shared_Sram",
+    memory_mode="Sram_Only",
     extra_flags="--output-format=raw",
     custom_path=None,
 ) -> list[CompileSpec]:
```