-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[CodeGen] Increase NumVisited limit in TwoAddressInstructionPass to 64 #80627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-backend-aarch64 Author: AtariDreams (AtariDreams) ChangesNow that hardware has progressed, we do not need an arbitrary limit anymore. Full diff: https://github.com/llvm/llvm-project/pull/80627.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 74d7904aee33a..9e466391385cd 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -914,16 +914,12 @@ bool TwoAddressInstructionPass::rescheduleMIBelowKill(
}
// Check if the reschedule will not break dependencies.
- unsigned NumVisited = 0;
MachineBasicBlock::iterator KillPos = KillMI;
++KillPos;
for (MachineInstr &OtherMI : make_range(End, KillPos)) {
// Debug or pseudo instructions cannot be counted against the limit.
if (OtherMI.isDebugOrPseudoInstr())
continue;
- if (NumVisited > 10) // FIXME: Arbitrary limit to reduce compile time cost.
- return false;
- ++NumVisited;
if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() ||
OtherMI.isBranch() || OtherMI.isTerminator())
// Don't move pass calls, etc.
@@ -1088,15 +1084,11 @@ bool TwoAddressInstructionPass::rescheduleKillAboveMI(
}
// Check if the reschedule will not break depedencies.
- unsigned NumVisited = 0;
for (MachineInstr &OtherMI :
make_range(mi, MachineBasicBlock::iterator(KillMI))) {
// Debug or pseudo instructions cannot be counted against the limit.
if (OtherMI.isDebugOrPseudoInstr())
continue;
- if (NumVisited > 10) // FIXME: Arbitrary limit to reduce compile time cost.
- return false;
- ++NumVisited;
if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() ||
OtherMI.isBranch() || OtherMI.isTerminator())
// Don't move pass calls, etc.
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
index c7a89612d278f..68f09bf0e5932 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
@@ -236,22 +236,20 @@ define void @sext_v16i8_v16i64(<16 x i8> %a, ptr %out) {
; CHECK-NEXT: sunpklo z4.d, z2.s
; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: sunpklo z0.s, z0.h
-; CHECK-NEXT: mov z7.d, z1.d
-; CHECK-NEXT: sunpklo z2.d, z2.s
+; CHECK-NEXT: sunpklo z7.d, z1.s
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: sunpklo z5.d, z3.s
; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: ext z7.b, z7.b, z1.b, #8
+; CHECK-NEXT: sunpklo z2.d, z2.s
; CHECK-NEXT: sunpklo z1.d, z1.s
-; CHECK-NEXT: mov z6.d, z0.d
+; CHECK-NEXT: sunpklo z6.d, z0.s
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: sunpklo z3.d, z3.s
; CHECK-NEXT: stp q4, q2, [x0]
-; CHECK-NEXT: sunpklo z4.d, z7.s
-; CHECK-NEXT: ext z6.b, z6.b, z0.b, #8
; CHECK-NEXT: sunpklo z0.d, z0.s
+; CHECK-NEXT: stp q7, q1, [x0, #32]
; CHECK-NEXT: stp q5, q3, [x0, #64]
-; CHECK-NEXT: sunpklo z2.d, z6.s
-; CHECK-NEXT: stp q1, q4, [x0, #32]
-; CHECK-NEXT: stp q0, q2, [x0, #96]
+; CHECK-NEXT: stp q6, q0, [x0, #96]
; CHECK-NEXT: ret
%b = sext <16 x i8> %a to <16 x i64>
store <16 x i64> %b, ptr %out
@@ -264,62 +262,60 @@ define void @sext_v32i8_v32i64(ptr %in, ptr %out) {
; CHECK-NEXT: ldp q1, q0, [x0]
; CHECK-NEXT: add z0.b, z0.b, z0.b
; CHECK-NEXT: add z1.b, z1.b, z1.b
-; CHECK-NEXT: mov z2.d, z0.d
+; CHECK-NEXT: sunpklo z2.h, z0.b
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT: sunpklo z3.h, z1.b
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: sunpklo z0.h, z0.b
-; CHECK-NEXT: mov z3.d, z1.d
+; CHECK-NEXT: sunpklo z4.s, z2.h
; CHECK-NEXT: sunpklo z1.h, z1.b
+; CHECK-NEXT: sunpklo z5.s, z3.h
; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: sunpklo z4.s, z0.h
+; CHECK-NEXT: sunpklo z6.s, z0.h
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT: sunpklo z5.s, z1.h
-; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
-; CHECK-NEXT: sunpklo z2.h, z2.b
-; CHECK-NEXT: sunpklo z3.h, z3.b
-; CHECK-NEXT: sunpklo z0.s, z0.h
-; CHECK-NEXT: sunpklo z16.d, z4.s
+; CHECK-NEXT: sunpklo z7.d, z4.s
; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
-; CHECK-NEXT: sunpklo z1.s, z1.h
+; CHECK-NEXT: sunpklo z2.s, z2.h
+; CHECK-NEXT: sunpklo z3.s, z3.h
+; CHECK-NEXT: sunpklo z16.s, z1.h
; CHECK-NEXT: sunpklo z17.d, z5.s
; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
-; CHECK-NEXT: sunpklo z6.s, z2.h
-; CHECK-NEXT: sunpklo z7.s, z3.h
-; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT: sunpklo z0.s, z0.h
; CHECK-NEXT: sunpklo z4.d, z4.s
-; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: sunpklo z19.d, z0.s
-; CHECK-NEXT: sunpklo z5.d, z5.s
-; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT: sunpklo z2.s, z2.h
; CHECK-NEXT: sunpklo z18.d, z6.s
; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
-; CHECK-NEXT: sunpklo z3.s, z3.h
-; CHECK-NEXT: stp q16, q4, [x1, #128]
-; CHECK-NEXT: mov z16.d, z7.d
-; CHECK-NEXT: sunpklo z0.d, z0.s
-; CHECK-NEXT: stp q17, q5, [x1]
-; CHECK-NEXT: sunpklo z5.d, z7.s
-; CHECK-NEXT: sunpklo z4.d, z6.s
-; CHECK-NEXT: mov z6.d, z1.d
-; CHECK-NEXT: ext z16.b, z16.b, z7.b, #8
+; CHECK-NEXT: sunpklo z5.d, z5.s
+; CHECK-NEXT: sunpklo z1.s, z1.h
+; CHECK-NEXT: sunpklo z19.d, z16.s
+; CHECK-NEXT: sunpklo z6.d, z6.s
+; CHECK-NEXT: ext z16.b, z16.b, z16.b, #8
+; CHECK-NEXT: stp q7, q4, [x1, #128]
; CHECK-NEXT: mov z7.d, z2.d
-; CHECK-NEXT: stp q19, q0, [x1, #160]
-; CHECK-NEXT: sunpklo z0.d, z2.s
-; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
-; CHECK-NEXT: sunpklo z1.d, z1.s
-; CHECK-NEXT: stp q18, q4, [x1, #192]
; CHECK-NEXT: mov z4.d, z3.d
-; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
+; CHECK-NEXT: stp q17, q5, [x1]
+; CHECK-NEXT: mov z5.d, z0.d
; CHECK-NEXT: sunpklo z16.d, z16.s
-; CHECK-NEXT: sunpklo z6.d, z6.s
+; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
; CHECK-NEXT: ext z4.b, z4.b, z3.b, #8
-; CHECK-NEXT: sunpklo z2.d, z7.s
+; CHECK-NEXT: stp q18, q6, [x1, #192]
+; CHECK-NEXT: mov z6.d, z1.d
+; CHECK-NEXT: sunpklo z2.d, z2.s
; CHECK-NEXT: sunpklo z3.d, z3.s
-; CHECK-NEXT: stp q5, q16, [x1, #64]
-; CHECK-NEXT: stp q1, q6, [x1, #32]
-; CHECK-NEXT: sunpklo z1.d, z4.s
+; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
+; CHECK-NEXT: sunpklo z0.d, z0.s
+; CHECK-NEXT: sunpklo z7.d, z7.s
+; CHECK-NEXT: sunpklo z4.d, z4.s
+; CHECK-NEXT: stp q19, q16, [x1, #64]
+; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
+; CHECK-NEXT: sunpklo z1.d, z1.s
+; CHECK-NEXT: stp q3, q4, [x1, #32]
+; CHECK-NEXT: sunpklo z3.d, z6.s
+; CHECK-NEXT: stp q2, q7, [x1, #160]
+; CHECK-NEXT: sunpklo z2.d, z5.s
+; CHECK-NEXT: stp q1, q3, [x1, #96]
; CHECK-NEXT: stp q0, q2, [x1, #224]
-; CHECK-NEXT: stp q3, q1, [x1, #96]
; CHECK-NEXT: ret
%a = load <32 x i8>, ptr %in
%b = add <32 x i8> %a, %a
@@ -661,22 +657,20 @@ define void @zext_v16i8_v16i64(<16 x i8> %a, ptr %out) {
; CHECK-NEXT: uunpklo z4.d, z2.s
; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: uunpklo z0.s, z0.h
-; CHECK-NEXT: mov z7.d, z1.d
-; CHECK-NEXT: uunpklo z2.d, z2.s
+; CHECK-NEXT: uunpklo z7.d, z1.s
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: uunpklo z5.d, z3.s
; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: ext z7.b, z7.b, z1.b, #8
+; CHECK-NEXT: uunpklo z2.d, z2.s
; CHECK-NEXT: uunpklo z1.d, z1.s
-; CHECK-NEXT: mov z6.d, z0.d
+; CHECK-NEXT: uunpklo z6.d, z0.s
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: uunpklo z3.d, z3.s
; CHECK-NEXT: stp q4, q2, [x0]
-; CHECK-NEXT: uunpklo z4.d, z7.s
-; CHECK-NEXT: ext z6.b, z6.b, z0.b, #8
; CHECK-NEXT: uunpklo z0.d, z0.s
+; CHECK-NEXT: stp q7, q1, [x0, #32]
; CHECK-NEXT: stp q5, q3, [x0, #64]
-; CHECK-NEXT: uunpklo z2.d, z6.s
-; CHECK-NEXT: stp q1, q4, [x0, #32]
-; CHECK-NEXT: stp q0, q2, [x0, #96]
+; CHECK-NEXT: stp q6, q0, [x0, #96]
; CHECK-NEXT: ret
%b = zext <16 x i8> %a to <16 x i64>
store <16 x i64> %b, ptr %out
@@ -689,62 +683,60 @@ define void @zext_v32i8_v32i64(ptr %in, ptr %out) {
; CHECK-NEXT: ldp q1, q0, [x0]
; CHECK-NEXT: add z0.b, z0.b, z0.b
; CHECK-NEXT: add z1.b, z1.b, z1.b
-; CHECK-NEXT: mov z2.d, z0.d
+; CHECK-NEXT: uunpklo z2.h, z0.b
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT: uunpklo z3.h, z1.b
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: uunpklo z0.h, z0.b
-; CHECK-NEXT: mov z3.d, z1.d
+; CHECK-NEXT: uunpklo z4.s, z2.h
; CHECK-NEXT: uunpklo z1.h, z1.b
+; CHECK-NEXT: uunpklo z5.s, z3.h
; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: uunpklo z4.s, z0.h
+; CHECK-NEXT: uunpklo z6.s, z0.h
; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT: uunpklo z5.s, z1.h
-; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
-; CHECK-NEXT: uunpklo z2.h, z2.b
-; CHECK-NEXT: uunpklo z3.h, z3.b
-; CHECK-NEXT: uunpklo z0.s, z0.h
-; CHECK-NEXT: uunpklo z16.d, z4.s
+; CHECK-NEXT: uunpklo z7.d, z4.s
; CHECK-NEXT: ext z4.b, z4.b, z4.b, #8
-; CHECK-NEXT: uunpklo z1.s, z1.h
+; CHECK-NEXT: uunpklo z2.s, z2.h
+; CHECK-NEXT: uunpklo z3.s, z3.h
+; CHECK-NEXT: uunpklo z16.s, z1.h
; CHECK-NEXT: uunpklo z17.d, z5.s
; CHECK-NEXT: ext z5.b, z5.b, z5.b, #8
-; CHECK-NEXT: uunpklo z6.s, z2.h
-; CHECK-NEXT: uunpklo z7.s, z3.h
-; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT: uunpklo z0.s, z0.h
; CHECK-NEXT: uunpklo z4.d, z4.s
-; CHECK-NEXT: ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT: uunpklo z19.d, z0.s
-; CHECK-NEXT: uunpklo z5.d, z5.s
-; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT: uunpklo z2.s, z2.h
; CHECK-NEXT: uunpklo z18.d, z6.s
; CHECK-NEXT: ext z6.b, z6.b, z6.b, #8
-; CHECK-NEXT: uunpklo z3.s, z3.h
-; CHECK-NEXT: stp q16, q4, [x1, #128]
-; CHECK-NEXT: mov z16.d, z7.d
-; CHECK-NEXT: uunpklo z0.d, z0.s
-; CHECK-NEXT: stp q17, q5, [x1]
-; CHECK-NEXT: uunpklo z5.d, z7.s
-; CHECK-NEXT: uunpklo z4.d, z6.s
-; CHECK-NEXT: mov z6.d, z1.d
-; CHECK-NEXT: ext z16.b, z16.b, z7.b, #8
+; CHECK-NEXT: uunpklo z5.d, z5.s
+; CHECK-NEXT: uunpklo z1.s, z1.h
+; CHECK-NEXT: uunpklo z19.d, z16.s
+; CHECK-NEXT: uunpklo z6.d, z6.s
+; CHECK-NEXT: ext z16.b, z16.b, z16.b, #8
+; CHECK-NEXT: stp q7, q4, [x1, #128]
; CHECK-NEXT: mov z7.d, z2.d
-; CHECK-NEXT: stp q19, q0, [x1, #160]
-; CHECK-NEXT: uunpklo z0.d, z2.s
-; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
-; CHECK-NEXT: uunpklo z1.d, z1.s
-; CHECK-NEXT: stp q18, q4, [x1, #192]
; CHECK-NEXT: mov z4.d, z3.d
-; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
+; CHECK-NEXT: stp q17, q5, [x1]
+; CHECK-NEXT: mov z5.d, z0.d
; CHECK-NEXT: uunpklo z16.d, z16.s
-; CHECK-NEXT: uunpklo z6.d, z6.s
+; CHECK-NEXT: ext z7.b, z7.b, z2.b, #8
; CHECK-NEXT: ext z4.b, z4.b, z3.b, #8
-; CHECK-NEXT: uunpklo z2.d, z7.s
+; CHECK-NEXT: stp q18, q6, [x1, #192]
+; CHECK-NEXT: mov z6.d, z1.d
+; CHECK-NEXT: uunpklo z2.d, z2.s
; CHECK-NEXT: uunpklo z3.d, z3.s
-; CHECK-NEXT: stp q5, q16, [x1, #64]
-; CHECK-NEXT: stp q1, q6, [x1, #32]
-; CHECK-NEXT: uunpklo z1.d, z4.s
+; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
+; CHECK-NEXT: uunpklo z0.d, z0.s
+; CHECK-NEXT: uunpklo z7.d, z7.s
+; CHECK-NEXT: uunpklo z4.d, z4.s
+; CHECK-NEXT: stp q19, q16, [x1, #64]
+; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
+; CHECK-NEXT: uunpklo z1.d, z1.s
+; CHECK-NEXT: stp q3, q4, [x1, #32]
+; CHECK-NEXT: uunpklo z3.d, z6.s
+; CHECK-NEXT: stp q2, q7, [x1, #160]
+; CHECK-NEXT: uunpklo z2.d, z5.s
+; CHECK-NEXT: stp q1, q3, [x1, #96]
; CHECK-NEXT: stp q0, q2, [x1, #224]
-; CHECK-NEXT: stp q3, q1, [x1, #96]
; CHECK-NEXT: ret
%a = load <32 x i8>, ptr %in
%b = add <32 x i8> %a, %a
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
index c110e89326cc0..9d84af1c60cdd 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
@@ -207,36 +207,35 @@ define void @ucvtf_v16i16_v16f64(ptr %a, ptr %b) {
; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: uunpklo z0.s, z0.h
; CHECK-NEXT: uunpklo z1.s, z1.h
-; CHECK-NEXT: mov z4.d, z2.d
+; CHECK-NEXT: uunpklo z4.d, z2.s
+; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: mov z7.d, z3.d
-; CHECK-NEXT: mov z5.d, z0.d
-; CHECK-NEXT: ext z4.b, z4.b, z2.b, #8
+; CHECK-NEXT: uunpklo z5.d, z0.s
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: uunpklo z2.d, z2.s
; CHECK-NEXT: mov z6.d, z1.d
+; CHECK-NEXT: ucvtf z4.d, p0/m, z4.d
; CHECK-NEXT: ext z7.b, z7.b, z3.b, #8
; CHECK-NEXT: uunpklo z3.d, z3.s
-; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
-; CHECK-NEXT: uunpklo z4.d, z4.s
; CHECK-NEXT: uunpklo z0.d, z0.s
; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
; CHECK-NEXT: uunpklo z1.d, z1.s
; CHECK-NEXT: ucvtf z2.d, p0/m, z2.d
-; CHECK-NEXT: ucvtf z3.d, p0/m, z3.d
+; CHECK-NEXT: ucvtf z5.d, p0/m, z5.d
; CHECK-NEXT: uunpklo z7.d, z7.s
-; CHECK-NEXT: uunpklo z5.d, z5.s
-; CHECK-NEXT: ucvtf z4.d, p0/m, z4.d
; CHECK-NEXT: ucvtf z0.d, p0/m, z0.d
; CHECK-NEXT: uunpklo z6.d, z6.s
; CHECK-NEXT: ucvtf z1.d, p0/m, z1.d
-; CHECK-NEXT: ucvtf z5.d, p0/m, z5.d
-; CHECK-NEXT: stp q2, q4, [x1, #64]
-; CHECK-NEXT: movprfx z2, z6
-; CHECK-NEXT: ucvtf z2.d, p0/m, z6.d
-; CHECK-NEXT: stp q1, q2, [x1, #32]
-; CHECK-NEXT: stp q0, q5, [x1, #96]
-; CHECK-NEXT: movprfx z0, z7
-; CHECK-NEXT: ucvtf z0.d, p0/m, z7.d
-; CHECK-NEXT: stp q3, q0, [x1]
+; CHECK-NEXT: stp q4, q2, [x1, #64]
+; CHECK-NEXT: movprfx z4, z6
+; CHECK-NEXT: ucvtf z4.d, p0/m, z6.d
+; CHECK-NEXT: movprfx z2, z3
+; CHECK-NEXT: ucvtf z2.d, p0/m, z3.d
+; CHECK-NEXT: movprfx z3, z7
+; CHECK-NEXT: ucvtf z3.d, p0/m, z7.d
+; CHECK-NEXT: stp q2, q3, [x1]
+; CHECK-NEXT: stp q5, q0, [x1, #96]
+; CHECK-NEXT: stp q1, q4, [x1, #32]
; CHECK-NEXT: ret
%op1 = load <16 x i16>, ptr %a
%res = uitofp <16 x i16> %op1 to <16 x double>
@@ -780,36 +779,35 @@ define void @scvtf_v16i16_v16f64(ptr %a, ptr %b) {
; CHECK-NEXT: ext z1.b, z1.b, z1.b, #8
; CHECK-NEXT: sunpklo z0.s, z0.h
; CHECK-NEXT: sunpklo z1.s, z1.h
-; CHECK-NEXT: mov z4.d, z2.d
+; CHECK-NEXT: sunpklo z4.d, z2.s
+; CHECK-NEXT: ext z2.b, z2.b, z2.b, #8
; CHECK-NEXT: mov z7.d, z3.d
-; CHECK-NEXT: mov z5.d, z0.d
-; CHECK-NEXT: ext z4.b, z4.b, z2.b, #8
+; CHECK-NEXT: sunpklo z5.d, z0.s
+; CHECK-NEXT: ext z0.b, z0.b, z0.b, #8
; CHECK-NEXT: sunpklo z2.d, z2.s
; CHECK-NEXT: mov z6.d, z1.d
+; CHECK-NEXT: scvtf z4.d, p0/m, z4.d
; CHECK-NEXT: ext z7.b, z7.b, z3.b, #8
; CHECK-NEXT: sunpklo z3.d, z3.s
-; CHECK-NEXT: ext z5.b, z5.b, z0.b, #8
-; CHECK-NEXT: sunpklo z4.d, z4.s
; CHECK-NEXT: sunpklo z0.d, z0.s
; CHECK-NEXT: ext z6.b, z6.b, z1.b, #8
; CHECK-NEXT: sunpklo z1.d, z1.s
; CHECK-NEXT: scvtf z2.d, p0/m, z2.d
-; CHECK-NEXT: scvtf z3.d, p0/m, z3.d
+; CHECK-NEXT: scvtf z5.d, p0/m, z5.d
; CHECK-NEXT: sunpklo z7.d, z7.s
-; CHECK-NEXT: sunpklo z5.d, z5.s
-; CHECK-NEXT: scvtf z4.d, p0/m, z4.d
; CHECK-NEXT: scvtf z0.d, p0/m, z0.d
; CHECK-NEXT: sunpklo z6.d, z6.s
; CHECK-NEXT: scvtf z1.d, p0/m, z1.d
-; CHECK-NEXT: scvtf z5.d, p0/m, z5.d
-; CHECK-NEXT: stp q2, q4, [x1, #64]
-; CHECK-NEXT: movprfx z2, z6
-; CHECK-NEXT: scvtf z2.d, p0/m, z6.d
-; CHECK-NEXT: stp q1, q2, [x1, #32]
-; CHECK-NEXT: stp q0, q5, [x1, #96]
-; CHECK-NEXT: movprfx z0, z7
-; CHECK-NEXT: scvtf z0.d, p0/m, z7.d
-; CHECK-NEXT: stp q3, q0, [x1]
+; CHECK-NEXT: stp q4, q2, [x1, #64]
+; CHECK-NEXT: movprfx z4, z6
+; CHECK-NEXT: scvtf z4.d, p0/m, z6.d
+; CHECK-NEXT: movprfx z2, z3
+; CHECK-NEXT: scvtf z2.d, p0/m, z3.d
+; CHECK-NEXT: movprfx z3, z7
+; CHECK-NEXT: scvtf z3.d, p0/m, z7.d
+; CHECK-NEXT: stp q2, q3, [x1]
+; CHECK-NEXT: stp q5, q0, [x1, #96]
+; CHECK-NEXT: stp q1, q4, [x1, #32]
; CHECK-NEXT: ret
%op1 = load <16 x i16>, ptr %a
%res = sitofp <16 x i16> %op1 to <16 x double>
|
|
|
|
@AtariDreams please can you work with @nikic to investigate the effect on compile time : https://llvm-compile-time-tracker.com/ |
Maybe I should find out when the returns diminish enough and set that as the limit |
✅ With the latest revision this PR passed the C/C++ code formatter. |
c217f64
to
f995e46
Compare
35846cb
to
069afb0
Compare
I don't think that is needed. |
@topperc Thoughts? |
I'm skeptical it's a good idea to remove the limit completely. The reason we have thresholds like this is that it allows us to use simple algorithms that would otherwise be O(n^2) or worse. Usually what happens is that the code appears to work fine on common benchmarks, but then someone files a bug report saying the compiler times out in specific cases. |
The previously version of this PR that just raised the limits a bit looked fine to me.
It's indeed not needed, but it's pretty common to use a cl::opt for such cutoffs, so it's easier to test different value for them. |
a32bec2
to
56b3903
Compare
Now that hardware has progressed, we can greatly increase the limit to something larger, allowing room for more optimization.
…mmand line option Pulled out of comment made on llvm#80627
…mand line option Pulled out of comment made on llvm#80627 - to simplify further investigation into visit limits. Since 10 was the limit over a decade ago, I have decided to increase it by 10-fold because that is around the number where compile time vs. benefit starts to wear off for the tests that changed codegen.
…mand line option Pulled out of comment made on llvm#80627 - to simplify further investigation into visit limits. Since 10 was the limit over a decade ago, I have decided to increase it by 10-fold because that is around the number where compile time vs. benefit starts to wear off for the tests that changed codegen.
…mand line option Pulled out of comment made on llvm#80627 - to simplify further investigation into visit limits. Since 10 was the limit over a decade ago, I have decided to increase it by 10-fold because that is around the number where compile time vs. benefit starts to wear off for the tests that changed codegen.
…mand line option Pulled out of comment made on llvm#80627 - to simplify further investigation into visit limits. Since 10 was the limit over a decade ago, I have decided to increase it by 10-fold because that is around the number where compile time vs. benefit starts to wear off for the tests that changed codegen.
Now that hardware has progressed, we can greatly increase the limit to something larger, allowing room for more optimization.