
[CodeGen] TwoAddressInstructionPass: Update default option #100046


Closed
wants to merge 1 commit into from

Conversation

AZero13
Contributor

AZero13 commented Jul 23, 2024

Pulled out of a comment made on #80627 to simplify further investigation into visit limits.

The limit of 10 dates back more than a decade, so I have increased it tenfold, to 100: that is roughly where the codegen benefit stops justifying the extra compile time for the tests whose output changed.

@llvmbot
Member

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-backend-x86

Author: AtariDreams (AtariDreams)

Changes


Patch is 2.59 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100046.diff

35 Files Affected:

  • (modified) llvm/lib/CodeGen/TwoAddressInstructionPass.cpp (+10-3)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll (+78-90)
  • (modified) llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll (+59-73)
  • (modified) llvm/test/CodeGen/ARM/copy-by-struct-i32.ll (+18-16)
  • (modified) llvm/test/CodeGen/ARM/vselect_imax.ll (+153-156)
  • (modified) llvm/test/CodeGen/Thumb2/mve-shuffle.ll (+34-34)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vld3.ll (+264-345)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vldst4.ll (+28-28)
  • (modified) llvm/test/CodeGen/Thumb2/mve-vst3.ll (+44-48)
  • (modified) llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll (+128-124)
  • (modified) llvm/test/CodeGen/X86/machine-cp.ll (+20-20)
  • (modified) llvm/test/CodeGen/X86/oddshuffles.ll (+37-37)
  • (modified) llvm/test/CodeGen/X86/pmulh.ll (+48-50)
  • (modified) llvm/test/CodeGen/X86/scmp.ll (+117-115)
  • (modified) llvm/test/CodeGen/X86/ucmp.ll (+161-176)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll (+629-636)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll (+675-687)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll (+1222-1220)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll (+1116-1127)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll (+634-655)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll (+1468-1443)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll (+1386-1403)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-3.ll (+314-300)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-4.ll (+777-789)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-5.ll (+1554-1545)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll (+1483-1496)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll (+1923-1899)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll (+1026-1041)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (+1948-1970)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-3.ll (+428-437)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll (+57-60)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll (+196-196)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll (+1509-1520)
  • (modified) llvm/test/CodeGen/X86/vselect-minmax.ll (+34-35)
  • (modified) llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll (+55-51)
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 665d57841a97b..f8b6b34e92b9f 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -80,6 +80,13 @@ EnableRescheduling("twoaddr-reschedule",
                    cl::desc("Coalesce copies by rescheduling (default=true)"),
                    cl::init(true), cl::Hidden);
 
+// Limit the number of rescheduling visits to dependent instructions.
+// FIXME: Arbitrary limit to reduce compile time cost.
+static cl::opt<unsigned> MaxVisits(
+    "twoaddr-visit-limit", cl::Hidden, cl::init(100),
+    cl::desc(
+        "Maximum number of rescheduling visits to dependent instructions (0 = no limit)"));
+
 // Limit the number of dataflow edges to traverse when evaluating the benefit
 // of commuting operands.
 static cl::opt<unsigned> MaxDataFlowEdge(
@@ -994,7 +1001,7 @@ bool TwoAddressInstructionImpl::rescheduleMIBelowKill(
     // Debug or pseudo instructions cannot be counted against the limit.
     if (OtherMI.isDebugOrPseudoInstr())
       continue;
-    if (NumVisited > 10)  // FIXME: Arbitrary limit to reduce compile time cost.
+    if (MaxVisits && NumVisited > MaxVisits)
       return false;
     ++NumVisited;
     if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() ||
@@ -1160,14 +1167,14 @@ bool TwoAddressInstructionImpl::rescheduleKillAboveMI(
     }
   }
 
-  // Check if the reschedule will not break depedencies.
+  // Check if the reschedule will not break dependencies.
   unsigned NumVisited = 0;
   for (MachineInstr &OtherMI :
        make_range(mi, MachineBasicBlock::iterator(KillMI))) {
     // Debug or pseudo instructions cannot be counted against the limit.
     if (OtherMI.isDebugOrPseudoInstr())
       continue;
-    if (NumVisited > 10)  // FIXME: Arbitrary limit to reduce compile time cost.
+    if (MaxVisits && NumVisited > MaxVisits)
       return false;
     ++NumVisited;
     if (OtherMI.hasUnmodeledSideEffects() || OtherMI.isCall() ||
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
index 25a6ea490c163..c32757f123aa8 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-extends.ll
@@ -1148,63 +1148,57 @@ define void @sext_v32i8_v32i64(ptr %in, ptr %out) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    ldp q1, q0, [x0]
 ; CHECK-NEXT:    add z0.b, z0.b, z0.b
-; CHECK-NEXT:    add z1.b, z1.b, z1.b
-; CHECK-NEXT:    mov z2.d, z0.d
-; CHECK-NEXT:    sunpklo z0.h, z0.b
-; CHECK-NEXT:    mov z3.d, z1.d
-; CHECK-NEXT:    sunpklo z1.h, z1.b
+; CHECK-NEXT:    add z2.b, z1.b, z1.b
+; CHECK-NEXT:    sunpklo z3.h, z0.b
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    sunpklo z1.h, z2.b
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT:    sunpklo z0.h, z0.b
+; CHECK-NEXT:    sunpklo z4.s, z3.h
 ; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT:    sunpklo z4.s, z0.h
-; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    sunpklo z5.s, z1.h
-; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    sunpklo z2.h, z2.b
-; CHECK-NEXT:    sunpklo z3.h, z3.b
-; CHECK-NEXT:    sunpklo z0.s, z0.h
-; CHECK-NEXT:    sunpklo z16.d, z4.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT:    sunpklo z6.s, z0.h
+; CHECK-NEXT:    sunpklo z3.s, z3.h
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    sunpklo z7.d, z4.s
 ; CHECK-NEXT:    ext z4.b, z4.b, z4.b, #8
-; CHECK-NEXT:    sunpklo z1.s, z1.h
-; CHECK-NEXT:    sunpklo z17.d, z5.s
+; CHECK-NEXT:    sunpklo z16.d, z5.s
 ; CHECK-NEXT:    ext z5.b, z5.b, z5.b, #8
-; CHECK-NEXT:    sunpklo z6.s, z2.h
-; CHECK-NEXT:    sunpklo z7.s, z3.h
+; CHECK-NEXT:    sunpklo z17.s, z2.h
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
-; CHECK-NEXT:    sunpklo z4.d, z4.s
+; CHECK-NEXT:    sunpklo z1.s, z1.h
+; CHECK-NEXT:    sunpklo z0.s, z0.h
+; CHECK-NEXT:    sunpklo z18.d, z6.s
+; CHECK-NEXT:    ext z6.b, z6.b, z6.b, #8
+; CHECK-NEXT:    sunpklo z19.d, z3.s
 ; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT:    sunpklo z19.d, z0.s
+; CHECK-NEXT:    sunpklo z4.d, z4.s
 ; CHECK-NEXT:    sunpklo z5.d, z5.s
-; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    sunpklo z2.s, z2.h
-; CHECK-NEXT:    sunpklo z18.d, z6.s
-; CHECK-NEXT:    ext z6.b, z6.b, z6.b, #8
-; CHECK-NEXT:    sunpklo z3.s, z3.h
-; CHECK-NEXT:    stp q16, q4, [x1, #128]
-; CHECK-NEXT:    mov z16.d, z7.d
-; CHECK-NEXT:    sunpklo z0.d, z0.s
-; CHECK-NEXT:    stp q17, q5, [x1]
-; CHECK-NEXT:    sunpklo z5.d, z7.s
-; CHECK-NEXT:    sunpklo z4.d, z6.s
-; CHECK-NEXT:    mov z6.d, z1.d
-; CHECK-NEXT:    ext z16.b, z16.b, z7.b, #8
-; CHECK-NEXT:    mov z7.d, z2.d
-; CHECK-NEXT:    stp q19, q0, [x1, #160]
-; CHECK-NEXT:    sunpklo z0.d, z2.s
-; CHECK-NEXT:    ext z6.b, z6.b, z1.b, #8
-; CHECK-NEXT:    sunpklo z1.d, z1.s
-; CHECK-NEXT:    stp q18, q4, [x1, #192]
-; CHECK-NEXT:    mov z4.d, z3.d
-; CHECK-NEXT:    ext z7.b, z7.b, z2.b, #8
-; CHECK-NEXT:    sunpklo z16.d, z16.s
 ; CHECK-NEXT:    sunpklo z6.d, z6.s
-; CHECK-NEXT:    ext z4.b, z4.b, z3.b, #8
-; CHECK-NEXT:    sunpklo z2.d, z7.s
 ; CHECK-NEXT:    sunpklo z3.d, z3.s
-; CHECK-NEXT:    stp q5, q16, [x1, #64]
-; CHECK-NEXT:    stp q1, q6, [x1, #32]
-; CHECK-NEXT:    sunpklo z1.d, z4.s
-; CHECK-NEXT:    stp q0, q2, [x1, #224]
-; CHECK-NEXT:    stp q3, q1, [x1, #96]
+; CHECK-NEXT:    stp q16, q5, [x1]
+; CHECK-NEXT:    sunpklo z5.d, z1.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT:    stp q7, q4, [x1, #128]
+; CHECK-NEXT:    sunpklo z4.d, z17.s
+; CHECK-NEXT:    ext z17.b, z17.b, z17.b, #8
+; CHECK-NEXT:    stp q18, q6, [x1, #192]
+; CHECK-NEXT:    sunpklo z6.d, z0.s
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    stp q19, q3, [x1, #160]
+; CHECK-NEXT:    sunpklo z3.d, z2.s
+; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT:    sunpklo z7.d, z17.s
+; CHECK-NEXT:    sunpklo z1.d, z1.s
+; CHECK-NEXT:    sunpklo z0.d, z0.s
+; CHECK-NEXT:    sunpklo z2.d, z2.s
+; CHECK-NEXT:    stp q5, q1, [x1, #32]
+; CHECK-NEXT:    stp q4, q7, [x1, #64]
+; CHECK-NEXT:    stp q3, q2, [x1, #96]
+; CHECK-NEXT:    stp q6, q0, [x1, #224]
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: sext_v32i8_v32i64:
@@ -3133,63 +3127,57 @@ define void @zext_v32i8_v32i64(ptr %in, ptr %out) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    ldp q1, q0, [x0]
 ; CHECK-NEXT:    add z0.b, z0.b, z0.b
-; CHECK-NEXT:    add z1.b, z1.b, z1.b
-; CHECK-NEXT:    mov z2.d, z0.d
-; CHECK-NEXT:    uunpklo z0.h, z0.b
-; CHECK-NEXT:    mov z3.d, z1.d
-; CHECK-NEXT:    uunpklo z1.h, z1.b
+; CHECK-NEXT:    add z2.b, z1.b, z1.b
+; CHECK-NEXT:    uunpklo z3.h, z0.b
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    uunpklo z1.h, z2.b
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT:    uunpklo z0.h, z0.b
+; CHECK-NEXT:    uunpklo z4.s, z3.h
 ; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT:    uunpklo z4.s, z0.h
-; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    uunpklo z5.s, z1.h
-; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    uunpklo z2.h, z2.b
-; CHECK-NEXT:    uunpklo z3.h, z3.b
-; CHECK-NEXT:    uunpklo z0.s, z0.h
-; CHECK-NEXT:    uunpklo z16.d, z4.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT:    uunpklo z6.s, z0.h
+; CHECK-NEXT:    uunpklo z3.s, z3.h
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    uunpklo z7.d, z4.s
 ; CHECK-NEXT:    ext z4.b, z4.b, z4.b, #8
-; CHECK-NEXT:    uunpklo z1.s, z1.h
-; CHECK-NEXT:    uunpklo z17.d, z5.s
+; CHECK-NEXT:    uunpklo z16.d, z5.s
 ; CHECK-NEXT:    ext z5.b, z5.b, z5.b, #8
-; CHECK-NEXT:    uunpklo z6.s, z2.h
-; CHECK-NEXT:    uunpklo z7.s, z3.h
+; CHECK-NEXT:    uunpklo z17.s, z2.h
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
-; CHECK-NEXT:    uunpklo z4.d, z4.s
+; CHECK-NEXT:    uunpklo z1.s, z1.h
+; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z18.d, z6.s
+; CHECK-NEXT:    ext z6.b, z6.b, z6.b, #8
+; CHECK-NEXT:    uunpklo z19.d, z3.s
 ; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
-; CHECK-NEXT:    uunpklo z19.d, z0.s
+; CHECK-NEXT:    uunpklo z4.d, z4.s
 ; CHECK-NEXT:    uunpklo z5.d, z5.s
-; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    uunpklo z2.s, z2.h
-; CHECK-NEXT:    uunpklo z18.d, z6.s
-; CHECK-NEXT:    ext z6.b, z6.b, z6.b, #8
-; CHECK-NEXT:    uunpklo z3.s, z3.h
-; CHECK-NEXT:    stp q16, q4, [x1, #128]
-; CHECK-NEXT:    mov z16.d, z7.d
-; CHECK-NEXT:    uunpklo z0.d, z0.s
-; CHECK-NEXT:    stp q17, q5, [x1]
-; CHECK-NEXT:    uunpklo z5.d, z7.s
-; CHECK-NEXT:    uunpklo z4.d, z6.s
-; CHECK-NEXT:    mov z6.d, z1.d
-; CHECK-NEXT:    ext z16.b, z16.b, z7.b, #8
-; CHECK-NEXT:    mov z7.d, z2.d
-; CHECK-NEXT:    stp q19, q0, [x1, #160]
-; CHECK-NEXT:    uunpklo z0.d, z2.s
-; CHECK-NEXT:    ext z6.b, z6.b, z1.b, #8
-; CHECK-NEXT:    uunpklo z1.d, z1.s
-; CHECK-NEXT:    stp q18, q4, [x1, #192]
-; CHECK-NEXT:    mov z4.d, z3.d
-; CHECK-NEXT:    ext z7.b, z7.b, z2.b, #8
-; CHECK-NEXT:    uunpklo z16.d, z16.s
 ; CHECK-NEXT:    uunpklo z6.d, z6.s
-; CHECK-NEXT:    ext z4.b, z4.b, z3.b, #8
-; CHECK-NEXT:    uunpklo z2.d, z7.s
 ; CHECK-NEXT:    uunpklo z3.d, z3.s
-; CHECK-NEXT:    stp q5, q16, [x1, #64]
-; CHECK-NEXT:    stp q1, q6, [x1, #32]
-; CHECK-NEXT:    uunpklo z1.d, z4.s
-; CHECK-NEXT:    stp q0, q2, [x1, #224]
-; CHECK-NEXT:    stp q3, q1, [x1, #96]
+; CHECK-NEXT:    stp q16, q5, [x1]
+; CHECK-NEXT:    uunpklo z5.d, z1.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT:    stp q7, q4, [x1, #128]
+; CHECK-NEXT:    uunpklo z4.d, z17.s
+; CHECK-NEXT:    ext z17.b, z17.b, z17.b, #8
+; CHECK-NEXT:    stp q18, q6, [x1, #192]
+; CHECK-NEXT:    uunpklo z6.d, z0.s
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    stp q19, q3, [x1, #160]
+; CHECK-NEXT:    uunpklo z3.d, z2.s
+; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
+; CHECK-NEXT:    uunpklo z7.d, z17.s
+; CHECK-NEXT:    uunpklo z1.d, z1.s
+; CHECK-NEXT:    uunpklo z0.d, z0.s
+; CHECK-NEXT:    uunpklo z2.d, z2.s
+; CHECK-NEXT:    stp q5, q1, [x1, #32]
+; CHECK-NEXT:    stp q4, q7, [x1, #64]
+; CHECK-NEXT:    stp q3, q2, [x1, #96]
+; CHECK-NEXT:    stp q6, q0, [x1, #224]
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: zext_v32i8_v32i64:
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
index afd3bb7161c15..e34af3fe4db95 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-int-to-fp.ll
@@ -567,42 +567,37 @@ define void @ucvtf_v16i16_v16f64(ptr %a, ptr %b) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    ldp q1, q0, [x0]
 ; CHECK-NEXT:    ptrue p0.d, vl2
-; CHECK-NEXT:    mov z2.d, z0.d
+; CHECK-NEXT:    uunpklo z2.s, z0.h
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    uunpklo z3.s, z1.h
 ; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    uunpklo z0.s, z0.h
+; CHECK-NEXT:    uunpklo z4.d, z2.s
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
 ; CHECK-NEXT:    uunpklo z1.s, z1.h
-; CHECK-NEXT:    mov z5.d, z3.d
-; CHECK-NEXT:    uunpklo z4.d, z0.s
+; CHECK-NEXT:    uunpklo z5.d, z3.s
+; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
+; CHECK-NEXT:    uunpklo z2.d, z2.s
+; CHECK-NEXT:    uunpklo z6.d, z0.s
 ; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT:    uunpklo z2.s, z2.h
-; CHECK-NEXT:    ext z5.b, z5.b, z3.b, #8
-; CHECK-NEXT:    mov z7.d, z1.d
+; CHECK-NEXT:    uunpklo z7.d, z1.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    uunpklo z3.d, z3.s
-; CHECK-NEXT:    uunpklo z0.d, z0.s
 ; CHECK-NEXT:    ucvtf z4.d, p0/m, z4.d
-; CHECK-NEXT:    mov z6.d, z2.d
-; CHECK-NEXT:    uunpklo z5.d, z5.s
-; CHECK-NEXT:    ext z7.b, z7.b, z1.b, #8
+; CHECK-NEXT:    ucvtf z5.d, p0/m, z5.d
+; CHECK-NEXT:    uunpklo z0.d, z0.s
+; CHECK-NEXT:    ucvtf z2.d, p0/m, z2.d
 ; CHECK-NEXT:    uunpklo z1.d, z1.s
+; CHECK-NEXT:    ucvtf z6.d, p0/m, z6.d
 ; CHECK-NEXT:    ucvtf z3.d, p0/m, z3.d
 ; CHECK-NEXT:    ucvtf z0.d, p0/m, z0.d
-; CHECK-NEXT:    ext z6.b, z6.b, z2.b, #8
-; CHECK-NEXT:    uunpklo z2.d, z2.s
-; CHECK-NEXT:    uunpklo z7.d, z7.s
-; CHECK-NEXT:    ucvtf z5.d, p0/m, z5.d
+; CHECK-NEXT:    stp q4, q2, [x1, #64]
+; CHECK-NEXT:    movprfx z2, z7
+; CHECK-NEXT:    ucvtf z2.d, p0/m, z7.d
 ; CHECK-NEXT:    ucvtf z1.d, p0/m, z1.d
-; CHECK-NEXT:    uunpklo z6.d, z6.s
-; CHECK-NEXT:    stp q4, q0, [x1, #64]
-; CHECK-NEXT:    ucvtf z2.d, p0/m, z2.d
-; CHECK-NEXT:    stp q3, q5, [x1]
-; CHECK-NEXT:    movprfx z3, z7
-; CHECK-NEXT:    ucvtf z3.d, p0/m, z7.d
-; CHECK-NEXT:    movprfx z0, z6
-; CHECK-NEXT:    ucvtf z0.d, p0/m, z6.d
-; CHECK-NEXT:    stp q1, q3, [x1, #32]
-; CHECK-NEXT:    stp q2, q0, [x1, #96]
+; CHECK-NEXT:    stp q5, q3, [x1]
+; CHECK-NEXT:    stp q6, q0, [x1, #96]
+; CHECK-NEXT:    stp q2, q1, [x1, #32]
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: ucvtf_v16i16_v16f64:
@@ -2024,42 +2019,37 @@ define void @scvtf_v16i16_v16f64(ptr %a, ptr %b) {
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    ldp q1, q0, [x0]
 ; CHECK-NEXT:    ptrue p0.d, vl2
-; CHECK-NEXT:    mov z2.d, z0.d
+; CHECK-NEXT:    sunpklo z2.s, z0.h
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
 ; CHECK-NEXT:    sunpklo z3.s, z1.h
 ; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    sunpklo z0.s, z0.h
+; CHECK-NEXT:    sunpklo z4.d, z2.s
 ; CHECK-NEXT:    ext z2.b, z2.b, z2.b, #8
 ; CHECK-NEXT:    sunpklo z1.s, z1.h
-; CHECK-NEXT:    mov z5.d, z3.d
-; CHECK-NEXT:    sunpklo z4.d, z0.s
+; CHECK-NEXT:    sunpklo z5.d, z3.s
+; CHECK-NEXT:    ext z3.b, z3.b, z3.b, #8
+; CHECK-NEXT:    sunpklo z2.d, z2.s
+; CHECK-NEXT:    sunpklo z6.d, z0.s
 ; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
-; CHECK-NEXT:    sunpklo z2.s, z2.h
-; CHECK-NEXT:    ext z5.b, z5.b, z3.b, #8
-; CHECK-NEXT:    mov z7.d, z1.d
+; CHECK-NEXT:    sunpklo z7.d, z1.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
 ; CHECK-NEXT:    sunpklo z3.d, z3.s
-; CHECK-NEXT:    sunpklo z0.d, z0.s
 ; CHECK-NEXT:    scvtf z4.d, p0/m, z4.d
-; CHECK-NEXT:    mov z6.d, z2.d
-; CHECK-NEXT:    sunpklo z5.d, z5.s
-; CHECK-NEXT:    ext z7.b, z7.b, z1.b, #8
+; CHECK-NEXT:    scvtf z5.d, p0/m, z5.d
+; CHECK-NEXT:    sunpklo z0.d, z0.s
+; CHECK-NEXT:    scvtf z2.d, p0/m, z2.d
 ; CHECK-NEXT:    sunpklo z1.d, z1.s
+; CHECK-NEXT:    scvtf z6.d, p0/m, z6.d
 ; CHECK-NEXT:    scvtf z3.d, p0/m, z3.d
 ; CHECK-NEXT:    scvtf z0.d, p0/m, z0.d
-; CHECK-NEXT:    ext z6.b, z6.b, z2.b, #8
-; CHECK-NEXT:    sunpklo z2.d, z2.s
-; CHECK-NEXT:    sunpklo z7.d, z7.s
-; CHECK-NEXT:    scvtf z5.d, p0/m, z5.d
+; CHECK-NEXT:    stp q4, q2, [x1, #64]
+; CHECK-NEXT:    movprfx z2, z7
+; CHECK-NEXT:    scvtf z2.d, p0/m, z7.d
 ; CHECK-NEXT:    scvtf z1.d, p0/m, z1.d
-; CHECK-NEXT:    sunpklo z6.d, z6.s
-; CHECK-NEXT:    stp q4, q0, [x1, #64]
-; CHECK-NEXT:    scvtf z2.d, p0/m, z2.d
-; CHECK-NEXT:    stp q3, q5, [x1]
-; CHECK-NEXT:    movprfx z3, z7
-; CHECK-NEXT:    scvtf z3.d, p0/m, z7.d
-; CHECK-NEXT:    movprfx z0, z6
-; CHECK-NEXT:    scvtf z0.d, p0/m, z6.d
-; CHECK-NEXT:    stp q1, q3, [x1, #32]
-; CHECK-NEXT:    stp q2, q0, [x1, #96]
+; CHECK-NEXT:    stp q5, q3, [x1]
+; CHECK-NEXT:    stp q6, q0, [x1, #96]
+; CHECK-NEXT:    stp q2, q1, [x1, #32]
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: scvtf_v16i16_v16f64:
@@ -2507,37 +2497,33 @@ define void @scvtf_v16i32_v16f64(ptr %a, ptr %b) {
 ; CHECK-NEXT:    ldp q1, q0, [x0, #32]
 ; CHECK-NEXT:    ptrue p0.d, vl2
 ; CHECK-NEXT:    ldp q5, q4, [x0]
-; CHECK-NEXT:    mov z2.d, z0.d
-; CHECK-NEXT:    mov z3.d, z1.d
-; CHECK-NEXT:    mov z6.d, z4.d
-; CHECK-NEXT:    mov z7.d, z5.d
-; CHECK-NEXT:    ext z2.b, z2.b, z0.b, #8
-; CHECK-NEXT:    ext z3.b, z3.b, z1.b, #8
+; CHECK-NEXT:    sunpklo z2.d, z0.s
+; CHECK-NEXT:    ext z0.b, z0.b, z0.b, #8
+; CHECK-NEXT:    sunpklo z3.d, z1.s
+; CHECK-NEXT:    ext z1.b, z1.b, z1.b, #8
+; CHECK-NEXT:    sunpklo z6.d, z4.s
+; CHECK-NEXT:    ext z4.b, z4.b, z4.b, #8
+; CHECK-NEXT:    sunpklo z7.d, z5.s
+; CHECK-NEXT:    ext z5.b, z5.b, z5.b, #8
 ; CHECK-NEXT:    sunpklo z0.d, z0.s
 ; CHECK-NEXT:    sunpklo z1.d, z1.s
-; CHECK-NEXT:    ext z6.b, z6.b, z4.b, #8
-; CHECK-NEXT:    ext z7.b, z7.b, z5.b, #8
+; CHECK-NEXT:    scvtf z2.d, p0/m, z2.d
 ; CHECK-NEXT:    sunpklo z4.d, z4.s
+; CHECK-NEXT:    scvtf z3.d, p0/m, z3.d
 ; CHECK-NEXT:    sunpklo z5.d, z5.s
-; CHECK-NEXT:    sunpklo z2.d, z2.s
-; CHECK-NEXT:    sunpklo z3.d, z3.s
+; CHECK-NEXT:    scvtf z6.d, p0/m, z6.d
 ; CHECK-NEXT:    scvtf z0.d, p0/m, z0.d
-; CHECK-NEXT:    sunpklo z6.d, z6.s
-; CHECK-NEXT:    sunpklo z7.d, z7.s
 ; CHECK-NEXT:    scvtf z1.d, p0/m, z1.d
-; CHECK-NEXT:    scvtf z4.d, p0/m, z4.d
-; CHECK-NEXT:    scvtf z2.d, p0/m, z2.d
-; CHECK-NEXT:    scvtf z3.d, p0/m, z3.d
-; CHECK-NEXT:    stp q1, q3, [x1, #64]
-; CHECK-NEXT:    movprfx z1, z7
-; CHECK-NEXT:    scvtf z1.d, p0/m, z7.d
-; CHECK-NEXT:    stp q0, q2, [x1, #96]
-; CHECK-NEXT:    movprfx z0, z6
-; CHECK-NEXT:    scvtf z0.d, p0/m, z6.d
-; CHECK-NEXT:    movprfx z2, z5
-; CHECK-NEXT:    scvtf z2.d, p0/m, z5.d
-; CHECK-NEXT:    stp q2, q1, [x1]
-; CHECK-NEXT:    stp q4, q0, [x1, #32]
+; CHECK-NEXT:    stp q2, q0, [x1, #96]
+; CHECK-NEXT:    movprfx z2, z4
+; CHECK-NEXT:    scvtf z2.d, p0/m, z4.d
+; CHECK-NEXT:    movprfx z0, z7
+; CHECK-NEXT:    scvtf z0.d, p0/m, z7.d
+; CHECK-NEXT:    stp q3, q1, [x1, #64]
+; CHECK-NEXT:    movprfx z3, z5
+; CHECK-NEXT:    scvtf z3.d, p0/m, z5.d
+; CHECK-NEXT:    stp q6, q2, [x1, #32]
+; CHECK-NEXT:    stp q0, q3, [x1]
 ; CHECK-NEXT:    ret
 ;
 ; NONEON-NOSVE-LABEL: scvtf_v16i32_v16f64:
diff --git a/llvm/test/CodeGen/ARM/copy-by-struct-i32.ll b/llvm/test/CodeGen/ARM/copy-by-struct-i32.ll
index 34aab4c04b109..8f134e0ac7f18 100644
--- a/llvm/test/CodeGen/ARM/copy-by-struct-i32.ll
+++ b/llvm/test/CodeGen/ARM/copy-by-struct-i32.ll
@@ -22,23 +22,23 @@ define arm_aapcscc void @s(ptr %q, ptr %p) {
 ; ASSEMBLY-NEXT:    ldr r2, [r1, #8]
 ; ASSEMBLY-NEXT:    ldr r3, [r1, #12]
 ; ASSEMBLY-NEXT:    strd r4, r5, [sp, #128]
-; ASSEMBLY-NEXT:    add r5, r1, #16
-; ASSEMBLY-NEXT:    mov r4, sp
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
-; ASSEMBLY-NEXT:    vld1.32 {d16}, [r5]!
-; ASSEMBLY-NEXT:    vst1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    add r4, r1, #16
+; ASSEMBLY-NEXT:    mov r5, sp
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
+; ASSEMBLY-NEXT:    vld1.32 {d16}, [r4]!
 ; ASSEMBLY-NEXT:    movw r4, #72
+; ASSEMBLY-NEXT:    vst1.32 {d16}, [r5]!
 ; ASSEMBLY-NEXT:  .LBB0_1: @ %entry
 ; ASSEMBLY-NEXT:    @ =>This Inner Loop Header: Depth=1
 ; ASSEMBLY-NEXT:    vld1.32 {d16}, [r1]!
@@ -58,3 +58,5 @@ entry:
 }
 
 declare arm_aapcscc void @r(...)
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; BEFORE-EXPAND: {{.*}}
diff --git a/llvm/test/CodeGen/ARM/vselect_imax.ll b/llvm/test/CodeGen/ARM/vselect_imax.ll
index 37f511fcc68cc..89072683fb01a 100644
--- a/llvm/test/CodeGen/ARM/vselect_imax.ll
+++ b/llvm/test/CodeGen/ARM/vselect_imax.ll
@@ -242,198 +242,195 @@ define void @func_blend20(ptr %loadaddr, ptr %loadaddr2,
                            ptr %blend, ptr %storeaddr) {
 ; CHECK-LABEL: func_blend20:
 ; CHECK:       @ %bb.0:
-; CHECK-NEXT:    .save {r4, r5, r6, r7, r8, r9, r10, lr}
-; CHECK-NEXT:    push {r4, r5, r6, r7, r8, r9, r10, lr}
+; CHECK-NEXT:    .save {r4, r5, r6, r7, r8, r9, r11, lr}
+; CHECK-NEXT:    push {r4, r5, r6, r7, r8, r9, r11, lr}
 ; CHECK-NEXT...
[truncated]

AZero13 force-pushed the instrmac branch 2 times, most recently from e829f0d to 094d235, July 23, 2024 00:58
AZero13 changed the title [CodeGen] TwoAddressInstructionPass - Control NumVisited limit via command line option to [CodeGen] TwoAddressInstructionPass: Control NumVisited limit via command line option, Jul 23, 2024
@AZero13
Contributor Author

AZero13 commented Jul 23, 2024

@RKSimon It is now ready

AZero13 force-pushed the instrmac branch 3 times, most recently from 36cd318 to 16e50a3, July 23, 2024 02:43
@AZero13
Contributor Author

AZero13 commented Jul 23, 2024

@topperc Thoughts on this?

// Limit the number of rescheduling visits to dependent instructions.
// FIXME: Arbitrary limit to reduce compile time cost.
static cl::opt<unsigned>
MaxVisits("twoaddr-visit-limit", cl::Hidden, cl::init(100),
Collaborator


For this PR, it would be better to keep the original value (10) as the default, but add test coverage for setting twoaddr-visit-limit to higher (and lower?) values. We can then change the default value in a follow-up PR.

AZero13 changed the title [CodeGen] TwoAddressInstructionPass: Control NumVisited limit via command line option to [CodeGen] TwoAddressInstructionPass: Update default option, Jul 23, 2024
AZero13 closed this Sep 10, 2024
AZero13 reopened this Mar 4, 2025
AZero13 closed this Mar 4, 2025
3 participants