[AArch64][SVE] Fold ADD+CNTB to INCB and DECB #118280

Merged: 3 commits, Apr 17, 2025
5 changes: 5 additions & 0 deletions llvm/lib/Target/AArch64/AArch64Features.td
@@ -818,6 +818,11 @@ def FeatureUseFixedOverScalableIfEqualCost : SubtargetFeature<"use-fixed-over-sc
def FeatureAvoidLDAPUR : SubtargetFeature<"avoid-ldapur", "AvoidLDAPUR", "true",
"Prefer add+ldapr to offset ldapur">;

// Some INC/DEC forms have better latency and throughput than ADDVL.
def FeatureDisableFastIncVL : SubtargetFeature<"disable-fast-inc-vl",
"HasDisableFastIncVL", "true",
"Do not prefer INC/DEC, ALL, { 1, 2, 4 } over ADDVL">;

//===----------------------------------------------------------------------===//
// Architectures.
//
2 changes: 2 additions & 0 deletions llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -385,6 +385,8 @@ def UseScalarIncVL : Predicate<"Subtarget->useScalarIncVL()">;

def NoUseScalarIncVL : Predicate<"!Subtarget->useScalarIncVL()">;

def HasFastIncVL : Predicate<"!Subtarget->hasDisableFastIncVL()">;

def UseSVEFPLD1R : Predicate<"!Subtarget->noSVEFPLD1R()">;

def UseLDAPUR : Predicate<"!Subtarget->avoidLDAPUR()">;
23 changes: 23 additions & 0 deletions llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
@@ -2677,6 +2677,29 @@ let Predicates = [HasSVE_or_SME] in {
(DECD_ZPiI ZPR:$op, 31, $imm)>;
}

// Some INCB/DECB forms have better latency and throughput than ADDVL, so we
// prefer using them here.
// We could extend this to other INC/DEC (scalar) instructions.
let Predicates = [HasSVE_or_SME, UseScalarIncVL, HasFastIncVL], AddedComplexity = 6 in {
foreach imm = [ 1, 2, 4 ] in {
def : Pat<(add GPR64:$op, (vscale !mul(imm, 16))),
(INCB_XPiI GPR64:$op, 31, imm)>;

def : Pat<(add GPR32:$op, (i32 (trunc (vscale !mul(imm, 16))))),
(EXTRACT_SUBREG (INCB_XPiI (INSERT_SUBREG (IMPLICIT_DEF),
GPR32:$op, sub_32), 31, imm),
sub_32)>;

def : Pat<(add GPR64:$op, (vscale !mul(imm, -16))),
(DECB_XPiI GPR64:$op, 31, imm)>;

def : Pat<(add GPR32:$op, (i32 (trunc (vscale !mul(imm, -16))))),
(EXTRACT_SUBREG (DECB_XPiI (INSERT_SUBREG (IMPLICIT_DEF),
GPR32:$op, sub_32), 31, imm),
sub_32)>;
}
}

let Predicates = [HasSVE_or_SME, UseScalarIncVL], AddedComplexity = 5 in {
def : Pat<(add GPR64:$op, (vscale (sve_rdvl_imm i32:$imm))),
(ADDVL_XXI GPR64:$op, $imm)>;
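In effect, for a scalar add of a multiple of the vector length in bytes, these patterns select INCB/DECB instead of ADDVL. A minimal IR sketch of the kind of input that matches, adapted from the incb_scalar_i64 test below (the actual selection depends on the subtarget features in effect):

; With the new patterns this selects "incb x0"; previously: "addvl x0, x0, #1".
define i64 @add_one_vl(i64 %a) {
  ; vscale * 16 is the vector length in bytes, i.e. one VL.
  %vscale = call i64 @llvm.vscale.i64()
  %mul = mul i64 %vscale, 16
  %add = add i64 %a, %mul
  ret i64 %add
}
declare i64 @llvm.vscale.i64()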
6 changes: 4 additions & 2 deletions llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
@@ -65,7 +65,8 @@ define void @quux() #1 {
; CHECK-NEXT: mov sp, x9
; CHECK-NEXT: sub x10, x29, #104
; CHECK-NEXT: stur x9, [x10, #-256] // 8-byte Folded Spill
; CHECK-NEXT: addvl x9, x8, #1
; CHECK-NEXT: mov x9, x8
; CHECK-NEXT: incb x9
Contributor Author:
This can still increase code size as exemplified here.
I'm not sure how I can avoid it without doing the fold later.

paulwalker-arm (Collaborator), Apr 10, 2025:
What do you think about having a dedicated SubtargetFeature to use along with UseScalarIncVL, something like HasFastIncDecVL?

Doing this gives us a route to disable the patterns if necessary, be that for code size or some other reason. We can still make the feature default to on so that all targets that enable UseScalarIncVL (i.e. everything from SVE2 onwards) will trigger the new patterns.

Contributor Author:
I think that's a good idea, thanks. I'll give that a try and update the PR.

Contributor Author:
I've added HasFastIncVL (and FeatureDisableFastIncVL) to control this - is this what you had in mind?
I'm happy to invert the logic of FeatureDisableFastIncVL if you'd rather have it enabled on a CPU-by-CPU basis (presumably, based on the SWOGs, at least for the Neoverse V2 and Neoverse V3).

; CHECK-NEXT: mov w0, w9
; CHECK-NEXT: // implicit-def: $x9
; CHECK-NEXT: mov w9, w0
@@ -160,7 +161,8 @@ define void @quux() #1 {
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, #16
; CHECK-NEXT: mov sp, x9
; CHECK-NEXT: addvl x9, x8, #2
; CHECK-NEXT: mov x9, x8
; CHECK-NEXT: incb x9, all, mul #2
; CHECK-NEXT: mov w0, w9
; CHECK-NEXT: // implicit-def: $x9
; CHECK-NEXT: mov w9, w0
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/AArch64/sve-lsrchain.ll
@@ -85,7 +85,7 @@ define void @test(ptr nocapture noundef readonly %kernel, i32 noundef %kw, float
; CHECK-NEXT: ldr z5, [x4, #3, mul vl]
; CHECK-NEXT: fmla z4.h, p0/m, z5.h, z3.h
; CHECK-NEXT: str z4, [x16, #3, mul vl]
; CHECK-NEXT: addvl x16, x16, #4
; CHECK-NEXT: incb x16, all, mul #4
; CHECK-NEXT: cmp x16, x11
; CHECK-NEXT: b.lo .LBB0_4
; CHECK-NEXT: // %bb.5: // %while.cond.i..exit_crit_edge.us
134 changes: 131 additions & 3 deletions llvm/test/CodeGen/AArch64/sve-vl-arith.ll
@@ -1,8 +1,10 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -verify-machineinstrs < %s | FileCheck %s -check-prefix=NO_SCALAR_INC
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -sve-use-scalar-inc-vl=true -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,disable-fast-inc-vl -sve-use-scalar-inc-vl=true -verify-machineinstrs < %s | FileCheck %s -check-prefix=NO_FAST_INC
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 -verify-machineinstrs < %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 -sve-use-scalar-inc-vl=false -verify-machineinstrs < %s | FileCheck %s -check-prefix=NO_SCALAR_INC
; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2,disable-fast-inc-vl -verify-machineinstrs < %s | FileCheck %s -check-prefix=NO_FAST_INC

define <vscale x 8 x i16> @inch_vec(<vscale x 8 x i16> %a) {
; NO_SCALAR_INC-LABEL: inch_vec:
@@ -14,6 +16,11 @@ define <vscale x 8 x i16> @inch_vec(<vscale x 8 x i16> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: inch z0.h
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: inch_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: inch z0.h
; NO_FAST_INC-NEXT: ret
%vscale = call i16 @llvm.vscale.i16()
%mul = mul i16 %vscale, 8
%vl = insertelement <vscale x 8 x i16> poison, i16 %mul, i32 0
@@ -32,6 +39,11 @@ define <vscale x 4 x i32> @incw_vec(<vscale x 4 x i32> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: incw z0.s
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incw_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: incw z0.s
; NO_FAST_INC-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()
%mul = mul i32 %vscale, 4
%vl = insertelement <vscale x 4 x i32> poison, i32 %mul, i32 0
@@ -50,6 +62,11 @@ define <vscale x 2 x i64> @incd_vec(<vscale x 2 x i64> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: incd z0.d
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incd_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: incd z0.d
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 2
%vl = insertelement <vscale x 2 x i64> poison, i64 %mul, i32 0
@@ -68,6 +85,11 @@ define <vscale x 8 x i16> @dech_vec(<vscale x 8 x i16> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: dech z0.h, all, mul #2
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: dech_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: dech z0.h, all, mul #2
; NO_FAST_INC-NEXT: ret
%vscale = call i16 @llvm.vscale.i16()
%mul = mul i16 %vscale, 16
%vl = insertelement <vscale x 8 x i16> poison, i16 %mul, i32 0
@@ -86,6 +108,11 @@ define <vscale x 4 x i32> @decw_vec(<vscale x 4 x i32> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: decw z0.s, all, mul #4
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decw_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: decw z0.s, all, mul #4
; NO_FAST_INC-NEXT: ret
%vscale = call i32 @llvm.vscale.i32()
%mul = mul i32 %vscale, 16
%vl = insertelement <vscale x 4 x i32> poison, i32 %mul, i32 0
@@ -104,6 +131,11 @@ define <vscale x 2 x i64> @decd_vec(<vscale x 2 x i64> %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: decd z0.d, all, mul #8
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decd_vec:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: decd z0.d, all, mul #8
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 16
%vl = insertelement <vscale x 2 x i64> poison, i64 %mul, i32 0
@@ -123,8 +155,13 @@ define i64 @incb_scalar_i64(i64 %a) {
;
; CHECK-LABEL: incb_scalar_i64:
; CHECK: // %bb.0:
; CHECK-NEXT: addvl x0, x0, #1
; CHECK-NEXT: incb x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incb_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: addvl x0, x0, #1
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 16
%add = add i64 %a, %mul
@@ -142,6 +179,11 @@ define i64 @inch_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: inch x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: inch_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: inch x0
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 8
%add = add i64 %a, %mul
@@ -159,6 +201,11 @@ define i64 @incw_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: incw x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incw_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: incw x0
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 4
%add = add i64 %a, %mul
@@ -176,6 +223,11 @@ define i64 @incd_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: incd x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incd_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: incd x0
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 2
%add = add i64 %a, %mul
@@ -193,8 +245,13 @@ define i64 @decb_scalar_i64(i64 %a) {
;
; CHECK-LABEL: decb_scalar_i64:
; CHECK: // %bb.0:
; CHECK-NEXT: addvl x0, x0, #-2
; CHECK-NEXT: decb x0, all, mul #2
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decb_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: addvl x0, x0, #-2
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 32
%sub = sub i64 %a, %mul
@@ -212,6 +269,11 @@ define i64 @dech_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: dech x0, all, mul #3
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: dech_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: dech x0, all, mul #3
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 24
%sub = sub i64 %a, %mul
@@ -229,6 +291,11 @@ define i64 @decw_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: decw x0, all, mul #3
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decw_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: decw x0, all, mul #3
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 12
%sub = sub i64 %a, %mul
@@ -246,6 +313,11 @@ define i64 @decd_scalar_i64(i64 %a) {
; CHECK: // %bb.0:
; CHECK-NEXT: decd x0, all, mul #3
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decd_scalar_i64:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: decd x0, all, mul #3
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 6
%sub = sub i64 %a, %mul
@@ -267,6 +339,13 @@ define i32 @incb_scalar_i32(i32 %a) {
; CHECK-NEXT: addvl x0, x0, #3
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incb_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: addvl x0, x0, #3
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 48
@@ -288,6 +367,13 @@ define i32 @inch_scalar_i32(i32 %a) {
; CHECK-NEXT: inch x0, all, mul #7
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: inch_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: inch x0, all, mul #7
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 56
@@ -309,6 +395,13 @@ define i32 @incw_scalar_i32(i32 %a) {
; CHECK-NEXT: incw x0, all, mul #7
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incw_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: incw x0, all, mul #7
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 28
@@ -330,6 +423,13 @@ define i32 @incd_scalar_i32(i32 %a) {
; CHECK-NEXT: incd x0, all, mul #7
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: incd_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: incd x0, all, mul #7
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 14
@@ -350,9 +450,16 @@ define i32 @decb_scalar_i32(i32 %a) {
; CHECK-LABEL: decb_scalar_i32:
; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
; CHECK-NEXT: addvl x0, x0, #-4
; CHECK-NEXT: decb x0, all, mul #4
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decb_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: addvl x0, x0, #-4
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 64
@@ -374,6 +481,13 @@ define i32 @dech_scalar_i32(i32 %a) {
; CHECK-NEXT: dech x0
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: dech_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: dech x0
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 8
@@ -395,6 +509,13 @@ define i32 @decw_scalar_i32(i32 %a) {
; CHECK-NEXT: decw x0
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decw_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: decw x0
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret

%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 4
@@ -416,6 +537,13 @@ define i32 @decd_scalar_i32(i32 %a) {
; CHECK-NEXT: decd x0
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
;
; NO_FAST_INC-LABEL: decd_scalar_i32:
; NO_FAST_INC: // %bb.0:
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 def $x0
; NO_FAST_INC-NEXT: decd x0
; NO_FAST_INC-NEXT: // kill: def $w0 killed $w0 killed $x0
; NO_FAST_INC-NEXT: ret
%vscale = call i64 @llvm.vscale.i64()
%mul = mul i64 %vscale, 2
%vl = trunc i64 %mul to i32
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AArch64/sve2p1-intrinsics-ld1-single.ll
@@ -33,8 +33,8 @@ define <vscale x 4 x i32> @test_svld1uwq_i32_si(<vscale x 1 x i1> %pred, ptr %ba
define <vscale x 4 x i32> @test_svld1uwq_i32_out_of_bound(<vscale x 1 x i1> %pred, ptr %base) {
; CHECK-LABEL: test_svld1uwq_i32_out_of_bound:
; CHECK: // %bb.0:
; CHECK-NEXT: addvl x8, x0, #2
; CHECK-NEXT: ld1w { z0.q }, p0/z, [x8]
; CHECK-NEXT: incb x0, all, mul #2
; CHECK-NEXT: ld1w { z0.q }, p0/z, [x0]
; CHECK-NEXT: ret
%gep = getelementptr inbounds <vscale x 1 x i32>, ptr %base, i64 8
%res = call <vscale x 4 x i32> @llvm.aarch64.sve.ld1uwq.nxv4i32(<vscale x 1 x i1> %pred, ptr %gep)
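For reference, the opt-out path is exercised by the RUN lines in sve-vl-arith.ll above; a minimal invocation mirroring those flags (sketch only, input.ll being a placeholder):

llc -mtriple=aarch64-linux-gnu -mattr=+sve2,disable-fast-inc-vl -verify-machineinstrs input.ll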