[AArch64] Improve urem by constant costs (#122236)
A urem by a constant, much like a udiv by a constant, can be expanded
into a series of mul/add/shift instructions. The exact sequence of
instructions depends on the constants and the types.
If the constant is a power of 2 then a shift or an and will be used, so
the cost will be 1. This canonicalization happens relatively early, so it
likely has very little effect in practice (it does help the cost of
funnel shifts).
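As a minimal sketch of the power-of-2 case (the helper names `urem_pow2` and `udiv_pow2` are hypothetical and are not part of the patch), the remainder reduces to a single mask and the quotient to a single shift:

```cpp
#include <cstdint>

// Hypothetical helper, not LLVM code: remainder by a power-of-2 constant.
// For pow2 == 2^k, x % pow2 keeps only the low k bits of x, i.e. one AND.
constexpr uint32_t urem_pow2(uint32_t x, uint32_t pow2) {
  return x & (pow2 - 1);
}

// Hypothetical helper: the matching quotient is one logical shift right by k.
constexpr uint32_t udiv_pow2(uint32_t x, unsigned k) {
  return x >> k;
}
```

Either operation is one instruction, which is consistent with the cost of 1 described above.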
For a non-power-of-2 constant the code for div will expand to a series of
UMULH + Add + Shift + Add, depending on the constant. urem is generally
udiv + mul + sub, so it involves a few extra instructions. UMULH is not
always available: i32 will use umull+shift, and vector types will use
umull+shift or umull+umull2+uzp depending on the vector size. v2i64 will
be scalarized because there is no mul available. SVE does have a UMULH
instruction.
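To illustrate the i32 umull+shift shape, here is a sketch for the divisor 3 only. The helper names `udiv3` and `urem3` are hypothetical. The magic constant 0xAAAAAAAB is ceil(2^33 / 3), a well-known value for this divisor and not something taken from the patch. Other constants may need the extra Add steps mentioned above, which this sketch does not cover.

```cpp
#include <cstdint>

// Hypothetical helper: x / 3 for a 32-bit x, using a widening 32x32->64
// multiply followed by a shift (the umull+shift shape).
// 0xAAAAAAAB == ceil(2^33 / 3), so (x * magic) >> 33 == x / 3 for all 32-bit x.
constexpr uint32_t udiv3(uint32_t x) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(x) * 0xAAAAAAABull) >> 33);
}

// Hypothetical helper: urem as udiv + mul + sub, i.e. x - (x / 3) * 3.
constexpr uint32_t urem3(uint32_t x) {
  return x - udiv3(x) * 3u;
}
```

The remainder costs a multiply and a subtract on top of the division, which is why urem is modelled as slightly more expensive than udiv.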
The end result is that the costs should be closer to reality, with
scalable types a little lower in cost than the fixed-width versions. (In
the future we might be able to use umulh for fixed-width when the SVE
instruction is available, but for the moment this should favour scalable
vectorization a little.)
I've tried to make this patch only apply to constant UREM/UDIV
instructions. SDIV and SREM are left until a later patch to prevent this
becoming too complex. The funnel shift costs are changing because the
cost model believes it will need a urem to clamp the shift amount, and
the divisor should be a power of 2 for most common types.
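To show where the urem in a funnel shift comes from, here is a sketch of a 32-bit fshl with a variable third argument. The helper name `fshl32` is hypothetical, and the code models the intrinsic's documented semantics rather than the AArch64 lowering.

```cpp
#include <cstdint>

// Hypothetical model of fshl on i32: concatenate a:b, shift left by
// (c mod 32), and keep the high 32 bits.
constexpr uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
  // Clamp the shift amount with a urem by the bit width. Because 32 is a
  // power of 2, this is the same as c & 31.
  const uint32_t amt = c % 32u;
  // A shift by the full width is undefined in C++, so handle amt == 0 apart.
  return amt == 0 ? a : (a << amt) | (b >> (32u - amt));
}
```

With a power-of-2 bit width the clamp is a single mask, so a cheaper urem-by-constant cost feeds directly into a lower funnel shift cost.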
--- a/llvm/test/Analysis/CostModel/AArch64/fshl.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/fshl.ll
@@ -224,7 +224,7 @@ declare <2 x i64> @llvm.fshl.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)
 define <4 x i30> @fshl_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
 ; CHECK-LABEL: 'fshl_v4i30_3rd_arg_var'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fshl = tail call <4 x i30> @llvm.fshl.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshl
--- a/llvm/test/Analysis/CostModel/AArch64/fshr.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/fshr.ll
@@ -224,7 +224,7 @@ declare <2 x i64> @llvm.fshr.v4i64(<2 x i64>, <2 x i64>, <2 x i64>)
 define <4 x i30> @fshr_v4i30_3rd_arg_var(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c) {
 ; CHECK-LABEL: 'fshr_v4i30_3rd_arg_var'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 34 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
+; CHECK-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %fshr = tail call <4 x i30> @llvm.fshr.v4i30(<4 x i30> %a, <4 x i30> %b, <4 x i30> %c)
 ; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i30> %fshr