Skip to content

[AArch64][SME] Propagate desired ZA states in the MachineSMEABIPass #149510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: users/MacDue/sme-loops
Choose a base branch
from

Conversation

MacDue
Copy link
Member

@MacDue MacDue commented Jul 18, 2025

This patch adds a propagation step to the MachineSMEABIPass that propagates desired ZA states forwards (from predecessors to successors).

The aim of this is to pick better ZA states for edge bundles, as when many (or all) blocks in a bundle do not have a preferred ZA state, the ZA state assigned to a bundle can be less than ideal.

An important case is nested loops, where only the inner loop has a preferred ZA state. Here we'd like to propagate the ZA state up from the inner loop to the outer loops (to avoid saves/restores in any loop).

Copy link

github-actions bot commented Jul 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@MacDue MacDue force-pushed the users/MacDue/sme-prop branch 2 times, most recently from c2d3414 to b3c2183 Compare July 18, 2025 14:25
MacDue added 7 commits August 5, 2025 10:13
<h1>Short Summary</h1>

This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). This is
currently not enabled by default (but aims to be by LLVM 22). The goal
is for this new pass to more optimally place ZA saves/restores and to
work with exception handling.

<h1>Long Description</h1>

This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in
a later patch. For now, this is under the flag `-aarch64-new-sme-abi`,
however, we intend for this to replace the current SelectionDAG
implementation once complete.

The approach taken here is to mark instructions as needing ZA to be in a
specific ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).

We already add ZA register uses/definitions to machine instructions,
so no extra work is needed to mark these.

Calls need to be marked by glueing Arch64ISD::INOUT_ZA_USE or
Arch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.

These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points we need to insert code to set up or restore a ZA
save (or initialize ZA).

To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.

These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the ZA state that matches the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.

Change-Id: Iced4a3f329deab3ff8f3fd449a2337f7bbfa71ec
TODO: Make a test case that validates this
On Windows or with stack probes on other targets, additional code needs
to be inserted after dynamic stack allocations to validate stack
accesses and/or ensure enough stack space has been allocated.

Rather than handle this case in the MachineSMEABIPass (like we do for
the standard case), we allocate the memory for the lazy save buffer in
SelectionDAG, which allows the existing expansions to emit the correct
code.

Note: This means in these cases, we may allocate a lazy save buffer when
there are no lazy saves present in the function (as we have to allocate
the buffer before the MachineSMEABIPass runs).

Change-Id: If89ab54c4de79f6fe5513a6b387e9e349f7bc7d1
Change-Id: I5dca6eaca8613a33e89a5cec9cc7d2c0f9cc7fb5
This extends the MachineSMEABIPass to handle agnostic ZA functions. This
case is currently handled like shared ZA functions, but we don't require
ZA state to be reloaded before agnostic ZA calls.

Note: This patch does not yet fully handle agnostic ZA functions that
can catch exceptions. E.g.:

```
__arm_agnostic("sme_za_state") void try_catch_agnostic_za_callee()
{
  try {
    agnostic_za_call();
  } catch(...) {
    noexcept_agnostic_za_call();
  }
}
```

As in this case, we won't commit a ZA save before the
`agnostic_za_call()`, which would be needed to restore ZA in the catch
block. This will be handled in a later patch.

Change-Id: I9cce7b42ec8b64d5442b35231b65dfaf9d149eed
This patch uses the MachineLoopInfo to give blocks within loops a higher
weight when choosing the bundle ZA state. MachineLoopInfo does not find
loop trip counts, so this uses an arbitrary weight (default 10), which
can be configured with the `-aarch64-sme-abi-loop-edge-weight` flag.

This makes the MachineSMEABIPass pass more likely to pick a bundle state
that matches the loop's entry/exit state, which avoids state changes in
the loop (which we assume will happen more than once).

This does require some extra analysis, so this is only enabled at -O1
and above.

Change-Id: If318c809d2f7cc1fca144fbe424ba2a2ca7fb19f
This patch adds a propagation step to the MachineSMEABIPass that
propagates desired ZA states forwards (from predecessors to successors).

The aim of this is to pick better ZA states for edge bundles, as when
many (or all) blocks in a bundle do not have a preferred ZA state, the
ZA state assigned to a bundle can be less than ideal.

An important case is nested loops, where only the inner loop has a
preferred ZA state. Here we'd like to propagate the ZA state up from the
inner loop to the outer loops (to avoid saves/restores in any loop).

Change-Id: I39f9c7d7608e2fa070be2fb88351b4d1d0079041
@MacDue MacDue force-pushed the users/MacDue/sme-loops branch from 2c9e14c to f182168 Compare August 5, 2025 11:04
@MacDue MacDue force-pushed the users/MacDue/sme-prop branch from b3c2183 to 0497093 Compare August 5, 2025 11:04
@MacDue MacDue force-pushed the users/MacDue/sme-loops branch 2 times, most recently from 88c0bb6 to 2d5441c Compare August 20, 2025 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant