[MCP] Optimize copies when src is used during backward propagation #111130


Merged: 4 commits into llvm:main from mcp_backward_users, Oct 23, 2024

Conversation

vladimirradosavljevic
Contributor

Before this patch, a redundant COPY could not be removed for the following case:

  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0

This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to:

  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1
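Conceptually, the rewrite can be modeled on a toy instruction list. The following is a hedged Python sketch of the idea only: the tuple encoding and register names are invented for illustration and this is not the MachineCopyPropagation implementation.

```python
# Toy model of the backward copy propagation described above.
# Instructions are (dest, opcode, operands) tuples; registers are strings.

def backward_propagate_copy(insts):
    """Fold `R1 = COPY R0` into the instruction defining R0, rewriting
    any intervening reads of R0 to read R1 instead."""
    for i in range(len(insts) - 1, -1, -1):
        dst, op, srcs = insts[i]
        if op != "COPY":
            continue
        src = srcs[0]
        for j in range(i - 1, -1, -1):
            if insts[j][0] == src:  # found the def of the copy source
                out = list(insts)
                d, o, s = out[j]
                out[j] = (dst, o, s)             # $R1 = OP ...
                for k in range(j + 1, i):        # replace uses of $R0 with $R1
                    dk, ok, sk = out[k]
                    out[k] = (dk, ok, tuple(dst if r == src else r
                                            for r in sk))
                del out[i]                       # the COPY is now redundant
                return out
    return list(insts)

before = [
    ("R0", "OP", ("a", "b")),
    ("R2", "USE", ("R0",)),   # read of R0 between the def and the copy
    ("R1", "COPY", ("R0",)),
]
print(backward_propagate_copy(before))  # the COPY is gone; R0's uses read R1
```

In this model, the intervening read of `R0` is exactly what used to block the optimization; tracking the source register's users lets the read be rewritten so the COPY folds away.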

github-actions bot commented Oct 4, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified. You can add reviewers using the "Reviewers" section on this page; if that does not work, you probably do not have write permissions for the repository, in which case you can instead tag reviewers by name in a comment using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR with a comment saying "Ping". The common courtesy ping rate is once a week; please remember that you are asking for valuable time from other developers.

Further questions may be answered by the LLVM GitHub User Guide, in a comment on this PR, on the LLVM Discord, or on the forums.

@llvmbot
Member

llvmbot commented Oct 4, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-regalloc
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-backend-arm

Author: Vladimir Radosavljevic (vladimirradosavljevic)

Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111130.diff

65 Files Affected:

  • (modified) llvm/lib/CodeGen/MachineCopyPropagation.cpp (+75-2)
  • (modified) llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll (+2-3)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/sdiv.ll (+30-38)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/srem.ll (+30-38)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/udiv.ll (+30-38)
  • (modified) llvm/test/CodeGen/Mips/llvm-ir/urem.ll (+30-38)
  • (modified) llvm/test/CodeGen/Mips/mcount.ll (+2-3)
  • (removed) llvm/test/CodeGen/Mips/micromips-gp-rc.ll (-18)
  • (modified) llvm/test/CodeGen/Mips/tailcall/tailcall.ll (+17-4)
  • (modified) llvm/test/CodeGen/Mips/tls.ll (+3-4)
  • (modified) llvm/test/CodeGen/X86/fp128-libcalls-strict.ll (+4-6)
  • (modified) llvm/test/CodeGen/X86/fptosi-sat-vector-128.ll (+5-8)
  • (modified) llvm/test/CodeGen/X86/matrix-multiply.ll (+15-20)
  • (modified) llvm/test/CodeGen/X86/mul-i1024.ll (+29-40)
  • (modified) llvm/test/CodeGen/X86/mul-i512.ll (+6-8)
  • (modified) llvm/test/CodeGen/X86/pr46877.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/sdiv_fix.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/shift-and.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/smulo-128-legalisation-lowering.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/sqrt-fastmath.ll (+3-4)
  • (modified) llvm/test/CodeGen/X86/umulo-128-legalisation-lowering.ll (+2-3)
  • (modified) llvm/test/CodeGen/X86/vec_smulo.ll (+6-9)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll (+42-61)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll (+11-15)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll (+172-247)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll (+37-54)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll (+222-325)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll (+89-128)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll (+27-36)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-4.ll (+100-141)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll (+73-103)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll (+70-97)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll (+248-335)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll (+164-236)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-3.ll (+6-8)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-5.ll (+126-162)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-6.ll (+132-177)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll (+112-152)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-8.ll (+80-112)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-4.ll (+17-25)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-5.ll (+49-70)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll (+73-99)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll (+159-228)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll (+193-283)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-3.ll (+37-52)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-5.ll (+17-25)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-6.ll (+131-181)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll (+345-471)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-8.ll (+57-84)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-3.ll (+39-52)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll (+96-120)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll (+49-64)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll (+244-338)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-8.ll (+146-216)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-5.ll (+8-11)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll (+136-152)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll (+268-352)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-5.ll (+38-52)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll (+60-84)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll (+107-149)
  • (modified) llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-8.ll (+41-61)
  • (modified) llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll (+28-37)
  • (modified) llvm/test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll (+18-24)
  • (modified) llvm/test/CodeGen/X86/wide-scalar-shift-legalization.ll (+16-21)
  • (modified) llvm/test/CodeGen/X86/widen-load-of-small-alloca-with-zero-upper-half.ll (+2-3)
diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 8bcc437cbfb865..8293aba823ed79 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -110,6 +110,7 @@ class CopyTracker {
   struct CopyInfo {
     MachineInstr *MI = nullptr;
     MachineInstr *LastSeenUseInCopy = nullptr;
+    SmallPtrSet<MachineInstr *, 4> SrcUsers;
     SmallVector<MCRegister, 4> DefRegs;
     bool Avail = false;
   };
@@ -224,6 +225,43 @@ class CopyTracker {
     }
   }
 
+  /// Track copy's src users, and return false if that can't be done.
+  /// We can only track if we have a COPY instruction whose source is
+  /// the same as the Reg.
+  bool trackSrcUsers(MCRegister Reg, MachineInstr &MI,
+                     const TargetRegisterInfo &TRI, const TargetInstrInfo &TII,
+                     bool UseCopyInstr) {
+    MCRegUnit RU = *TRI.regunits(Reg).begin();
+    MachineInstr *AvailCopy = findCopyDefViaUnit(RU, TRI);
+    if (!AvailCopy)
+      return false;
+
+    std::optional<DestSourcePair> CopyOperands =
+        isCopyInstr(*AvailCopy, TII, UseCopyInstr);
+    Register Src = CopyOperands->Source->getReg();
+
+    // Bail out, if the source of the copy is not the same as the Reg.
+    if (Src != Reg)
+      return false;
+
+    auto I = Copies.find(RU);
+    if (I == Copies.end())
+      return false;
+
+    I->second.SrcUsers.insert(&MI);
+    return true;
+  }
+
+  /// Return the users for a given register.
+  SmallPtrSet<MachineInstr *, 4> getSrcUsers(MCRegister Reg,
+                                             const TargetRegisterInfo &TRI) {
+    MCRegUnit RU = *TRI.regunits(Reg).begin();
+    auto I = Copies.find(RU);
+    if (I == Copies.end())
+      return {};
+    return I->second.SrcUsers;
+  }
+
   /// Add this copy's registers into the tracker's copy maps.
   void trackCopy(MachineInstr *MI, const TargetRegisterInfo &TRI,
                  const TargetInstrInfo &TII, bool UseCopyInstr) {
@@ -236,7 +274,7 @@ class CopyTracker {
 
     // Remember Def is defined by the copy.
     for (MCRegUnit Unit : TRI.regunits(Def))
-      Copies[Unit] = {MI, nullptr, {}, true};
+      Copies[Unit] = {MI, nullptr, {}, {}, true};
 
     // Remember source that's copied to Def. Once it's clobbered, then
     // it's no longer available for copy propagation.
@@ -427,6 +465,7 @@ class MachineCopyPropagation : public MachineFunctionPass {
   bool hasImplicitOverlap(const MachineInstr &MI, const MachineOperand &Use);
   bool hasOverlappingMultipleDef(const MachineInstr &MI,
                                  const MachineOperand &MODef, Register Def);
+  bool canUpdateSrcUsers(const MachineInstr &Copy, const MachineOperand &MODef);
 
   /// Candidates for deletion.
   SmallSetVector<MachineInstr *, 8> MaybeDeadCopies;
@@ -667,6 +706,26 @@ bool MachineCopyPropagation::hasOverlappingMultipleDef(
   return false;
 }
 
+/// Return true if it is safe to update the users of the source register of the
+/// copy.
+bool MachineCopyPropagation::canUpdateSrcUsers(const MachineInstr &Copy,
+                                               const MachineOperand &CopySrc) {
+  for (auto *SrcUser : Tracker.getSrcUsers(CopySrc.getReg(), *TRI)) {
+    if (hasImplicitOverlap(*SrcUser, CopySrc))
+      return false;
+
+    for (MachineOperand &MO : SrcUser->uses()) {
+      if (!MO.isReg() || MO.getReg() != CopySrc.getReg())
+        continue;
+      if (MO.isTied() || !MO.isRenamable() ||
+          !isBackwardPropagatableRegClassCopy(Copy, *SrcUser,
+                                              MO.getOperandNo()))
+        return false;
+    }
+  }
+  return true;
+}
+
 /// Look for available copies whose destination register is used by \p MI and
 /// replace the use in \p MI with the copy's source register.
 void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
@@ -1030,6 +1089,9 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI) {
     if (hasOverlappingMultipleDef(MI, MODef, Def))
       continue;
 
+    if (!canUpdateSrcUsers(*Copy, *CopyOperands->Source))
+      continue;
+
     LLVM_DEBUG(dbgs() << "MCP: Replacing " << printReg(MODef.getReg(), TRI)
                       << "\n     with " << printReg(Def, TRI) << "\n     in "
                       << MI << "     from " << *Copy);
@@ -1037,6 +1099,15 @@ void MachineCopyPropagation::propagateDefs(MachineInstr &MI) {
     MODef.setReg(Def);
     MODef.setIsRenamable(CopyOperands->Destination->isRenamable());
 
+    for (auto *SrcUser : Tracker.getSrcUsers(Src, *TRI)) {
+      for (MachineOperand &MO : SrcUser->operands()) {
+        if (!MO.isReg() || !MO.isUse() || MO.getReg() != Src)
+          continue;
+        MO.setReg(Def);
+        MO.setIsRenamable(CopyOperands->Destination->isRenamable());
+      }
+    }
+
     LLVM_DEBUG(dbgs() << "MCP: After replacement: " << MI << "\n");
     MaybeDeadCopies.insert(Copy);
     Changed = true;
@@ -1102,7 +1173,9 @@ void MachineCopyPropagation::BackwardCopyPropagateBlock(
               CopyDbgUsers[Copy].insert(&MI);
             }
           }
-        } else {
+        } else if (!Tracker.trackSrcUsers(MO.getReg().asMCReg(), MI, *TRI, *TII,
+                                          UseCopyInstr)) {
+          // If we can't track the source users, invalidate the register.
           Tracker.invalidateRegister(MO.getReg().asMCReg(), *TRI, *TII,
                                      UseCopyInstr);
         }
diff --git a/llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll b/llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll
index afd75940b45932..464808ec8861b3 100644
--- a/llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll
+++ b/llvm/test/CodeGen/ARM/umulo-128-legalisation-lowering.ll
@@ -7,12 +7,11 @@ define { i128, i8 } @muloti_test(i128 %l, i128 %r) unnamed_addr #0 {
 ; ARMV6:       @ %bb.0: @ %start
 ; ARMV6-NEXT:    push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
 ; ARMV6-NEXT:    sub sp, sp, #28
-; ARMV6-NEXT:    ldr r7, [sp, #72]
+; ARMV6-NEXT:    ldr lr, [sp, #72]
 ; ARMV6-NEXT:    mov r6, r0
 ; ARMV6-NEXT:    str r0, [sp, #8] @ 4-byte Spill
 ; ARMV6-NEXT:    ldr r4, [sp, #84]
-; ARMV6-NEXT:    umull r1, r0, r2, r7
-; ARMV6-NEXT:    mov lr, r7
+; ARMV6-NEXT:    umull r1, r0, r2, lr
 ; ARMV6-NEXT:    umull r5, r10, r4, r2
 ; ARMV6-NEXT:    str r1, [r6]
 ; ARMV6-NEXT:    ldr r6, [sp, #80]
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/sdiv.ll b/llvm/test/CodeGen/Mips/llvm-ir/sdiv.ll
index 8d548861f43936..72cead18f89fab 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/sdiv.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/sdiv.ll
@@ -388,9 +388,8 @@ define signext i64 @sdiv_i64(i64 signext %a, i64 signext %b) {
 ; MMR3-NEXT:    .cfi_def_cfa_offset 24
 ; MMR3-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
-; MMR3-NEXT:    addu $2, $2, $25
-; MMR3-NEXT:    lw $25, %call16(__divdi3)($2)
-; MMR3-NEXT:    move $gp, $2
+; MMR3-NEXT:    addu $gp, $2, $25
+; MMR3-NEXT:    lw $25, %call16(__divdi3)($gp)
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
 ; MMR3-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
@@ -405,9 +404,8 @@ define signext i64 @sdiv_i64(i64 signext %a, i64 signext %b) {
 ; MMR6-NEXT:    .cfi_def_cfa_offset 24
 ; MMR6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
-; MMR6-NEXT:    addu $2, $2, $25
-; MMR6-NEXT:    lw $25, %call16(__divdi3)($2)
-; MMR6-NEXT:    move $gp, $2
+; MMR6-NEXT:    addu $gp, $2, $25
+; MMR6-NEXT:    lw $25, %call16(__divdi3)($gp)
 ; MMR6-NEXT:    jalr $25
 ; MMR6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
 ; MMR6-NEXT:    addiu $sp, $sp, 24
@@ -549,65 +547,59 @@ define signext i128 @sdiv_i128(i128 signext %a, i128 signext %b) {
 ; MMR3:       # %bb.0: # %entry
 ; MMR3-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR3-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR3-NEXT:    addiusp -48
-; MMR3-NEXT:    .cfi_def_cfa_offset 48
-; MMR3-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR3-NEXT:    swp $16, 36($sp)
+; MMR3-NEXT:    addiusp -40
+; MMR3-NEXT:    .cfi_def_cfa_offset 40
+; MMR3-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR3-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
 ; MMR3-NEXT:    .cfi_offset 17, -8
-; MMR3-NEXT:    .cfi_offset 16, -12
-; MMR3-NEXT:    addu $16, $2, $25
+; MMR3-NEXT:    addu $gp, $2, $25
 ; MMR3-NEXT:    move $1, $7
-; MMR3-NEXT:    lw $7, 68($sp)
-; MMR3-NEXT:    lw $17, 72($sp)
-; MMR3-NEXT:    lw $3, 76($sp)
+; MMR3-NEXT:    lw $7, 60($sp)
+; MMR3-NEXT:    lw $17, 64($sp)
+; MMR3-NEXT:    lw $3, 68($sp)
 ; MMR3-NEXT:    move $2, $sp
 ; MMR3-NEXT:    sw16 $3, 28($2)
 ; MMR3-NEXT:    sw16 $17, 24($2)
 ; MMR3-NEXT:    sw16 $7, 20($2)
-; MMR3-NEXT:    lw $3, 64($sp)
+; MMR3-NEXT:    lw $3, 56($sp)
 ; MMR3-NEXT:    sw16 $3, 16($2)
-; MMR3-NEXT:    lw $25, %call16(__divti3)($16)
+; MMR3-NEXT:    lw $25, %call16(__divti3)($gp)
 ; MMR3-NEXT:    move $7, $1
-; MMR3-NEXT:    move $gp, $16
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
-; MMR3-NEXT:    lwp $16, 36($sp)
-; MMR3-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR3-NEXT:    addiusp 48
+; MMR3-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    lw $ra, 36($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    addiusp 40
 ; MMR3-NEXT:    jrc $ra
 ;
 ; MMR6-LABEL: sdiv_i128:
 ; MMR6:       # %bb.0: # %entry
 ; MMR6-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR6-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR6-NEXT:    addiu $sp, $sp, -48
-; MMR6-NEXT:    .cfi_def_cfa_offset 48
-; MMR6-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $17, 40($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $16, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    addiu $sp, $sp, -40
+; MMR6-NEXT:    .cfi_def_cfa_offset 40
+; MMR6-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
 ; MMR6-NEXT:    .cfi_offset 17, -8
-; MMR6-NEXT:    .cfi_offset 16, -12
-; MMR6-NEXT:    addu $16, $2, $25
+; MMR6-NEXT:    addu $gp, $2, $25
 ; MMR6-NEXT:    move $1, $7
-; MMR6-NEXT:    lw $7, 68($sp)
-; MMR6-NEXT:    lw $17, 72($sp)
-; MMR6-NEXT:    lw $3, 76($sp)
+; MMR6-NEXT:    lw $7, 60($sp)
+; MMR6-NEXT:    lw $17, 64($sp)
+; MMR6-NEXT:    lw $3, 68($sp)
 ; MMR6-NEXT:    move $2, $sp
 ; MMR6-NEXT:    sw16 $3, 28($2)
 ; MMR6-NEXT:    sw16 $17, 24($2)
 ; MMR6-NEXT:    sw16 $7, 20($2)
-; MMR6-NEXT:    lw $3, 64($sp)
+; MMR6-NEXT:    lw $3, 56($sp)
 ; MMR6-NEXT:    sw16 $3, 16($2)
-; MMR6-NEXT:    lw $25, %call16(__divti3)($16)
+; MMR6-NEXT:    lw $25, %call16(__divti3)($gp)
 ; MMR6-NEXT:    move $7, $1
-; MMR6-NEXT:    move $gp, $16
 ; MMR6-NEXT:    jalr $25
-; MMR6-NEXT:    lw $16, 36($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $17, 40($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    addiu $sp, $sp, 48
+; MMR6-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR6-NEXT:    lw $ra, 36($sp) # 4-byte Folded Reload
+; MMR6-NEXT:    addiu $sp, $sp, 40
 ; MMR6-NEXT:    jrc $ra
 entry:
   %r = sdiv i128 %a, %b
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/srem.ll b/llvm/test/CodeGen/Mips/llvm-ir/srem.ll
index 29cb34b8d970f1..72496fcc53a5ac 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/srem.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/srem.ll
@@ -336,9 +336,8 @@ define signext i64 @srem_i64(i64 signext %a, i64 signext %b) {
 ; MMR3-NEXT:    .cfi_def_cfa_offset 24
 ; MMR3-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
-; MMR3-NEXT:    addu $2, $2, $25
-; MMR3-NEXT:    lw $25, %call16(__moddi3)($2)
-; MMR3-NEXT:    move $gp, $2
+; MMR3-NEXT:    addu $gp, $2, $25
+; MMR3-NEXT:    lw $25, %call16(__moddi3)($gp)
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
 ; MMR3-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
@@ -353,9 +352,8 @@ define signext i64 @srem_i64(i64 signext %a, i64 signext %b) {
 ; MMR6-NEXT:    .cfi_def_cfa_offset 24
 ; MMR6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
-; MMR6-NEXT:    addu $2, $2, $25
-; MMR6-NEXT:    lw $25, %call16(__moddi3)($2)
-; MMR6-NEXT:    move $gp, $2
+; MMR6-NEXT:    addu $gp, $2, $25
+; MMR6-NEXT:    lw $25, %call16(__moddi3)($gp)
 ; MMR6-NEXT:    jalr $25
 ; MMR6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
 ; MMR6-NEXT:    addiu $sp, $sp, 24
@@ -497,65 +495,59 @@ define signext i128 @srem_i128(i128 signext %a, i128 signext %b) {
 ; MMR3:       # %bb.0: # %entry
 ; MMR3-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR3-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR3-NEXT:    addiusp -48
-; MMR3-NEXT:    .cfi_def_cfa_offset 48
-; MMR3-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR3-NEXT:    swp $16, 36($sp)
+; MMR3-NEXT:    addiusp -40
+; MMR3-NEXT:    .cfi_def_cfa_offset 40
+; MMR3-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR3-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
 ; MMR3-NEXT:    .cfi_offset 17, -8
-; MMR3-NEXT:    .cfi_offset 16, -12
-; MMR3-NEXT:    addu $16, $2, $25
+; MMR3-NEXT:    addu $gp, $2, $25
 ; MMR3-NEXT:    move $1, $7
-; MMR3-NEXT:    lw $7, 68($sp)
-; MMR3-NEXT:    lw $17, 72($sp)
-; MMR3-NEXT:    lw $3, 76($sp)
+; MMR3-NEXT:    lw $7, 60($sp)
+; MMR3-NEXT:    lw $17, 64($sp)
+; MMR3-NEXT:    lw $3, 68($sp)
 ; MMR3-NEXT:    move $2, $sp
 ; MMR3-NEXT:    sw16 $3, 28($2)
 ; MMR3-NEXT:    sw16 $17, 24($2)
 ; MMR3-NEXT:    sw16 $7, 20($2)
-; MMR3-NEXT:    lw $3, 64($sp)
+; MMR3-NEXT:    lw $3, 56($sp)
 ; MMR3-NEXT:    sw16 $3, 16($2)
-; MMR3-NEXT:    lw $25, %call16(__modti3)($16)
+; MMR3-NEXT:    lw $25, %call16(__modti3)($gp)
 ; MMR3-NEXT:    move $7, $1
-; MMR3-NEXT:    move $gp, $16
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
-; MMR3-NEXT:    lwp $16, 36($sp)
-; MMR3-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR3-NEXT:    addiusp 48
+; MMR3-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    lw $ra, 36($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    addiusp 40
 ; MMR3-NEXT:    jrc $ra
 ;
 ; MMR6-LABEL: srem_i128:
 ; MMR6:       # %bb.0: # %entry
 ; MMR6-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR6-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR6-NEXT:    addiu $sp, $sp, -48
-; MMR6-NEXT:    .cfi_def_cfa_offset 48
-; MMR6-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $17, 40($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $16, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    addiu $sp, $sp, -40
+; MMR6-NEXT:    .cfi_def_cfa_offset 40
+; MMR6-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
 ; MMR6-NEXT:    .cfi_offset 17, -8
-; MMR6-NEXT:    .cfi_offset 16, -12
-; MMR6-NEXT:    addu $16, $2, $25
+; MMR6-NEXT:    addu $gp, $2, $25
 ; MMR6-NEXT:    move $1, $7
-; MMR6-NEXT:    lw $7, 68($sp)
-; MMR6-NEXT:    lw $17, 72($sp)
-; MMR6-NEXT:    lw $3, 76($sp)
+; MMR6-NEXT:    lw $7, 60($sp)
+; MMR6-NEXT:    lw $17, 64($sp)
+; MMR6-NEXT:    lw $3, 68($sp)
 ; MMR6-NEXT:    move $2, $sp
 ; MMR6-NEXT:    sw16 $3, 28($2)
 ; MMR6-NEXT:    sw16 $17, 24($2)
 ; MMR6-NEXT:    sw16 $7, 20($2)
-; MMR6-NEXT:    lw $3, 64($sp)
+; MMR6-NEXT:    lw $3, 56($sp)
 ; MMR6-NEXT:    sw16 $3, 16($2)
-; MMR6-NEXT:    lw $25, %call16(__modti3)($16)
+; MMR6-NEXT:    lw $25, %call16(__modti3)($gp)
 ; MMR6-NEXT:    move $7, $1
-; MMR6-NEXT:    move $gp, $16
 ; MMR6-NEXT:    jalr $25
-; MMR6-NEXT:    lw $16, 36($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $17, 40($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    addiu $sp, $sp, 48
+; MMR6-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR6-NEXT:    lw $ra, 36($sp) # 4-byte Folded Reload
+; MMR6-NEXT:    addiu $sp, $sp, 40
 ; MMR6-NEXT:    jrc $ra
 entry:
   %r = srem i128 %a, %b
diff --git a/llvm/test/CodeGen/Mips/llvm-ir/udiv.ll b/llvm/test/CodeGen/Mips/llvm-ir/udiv.ll
index cc2c6614e69c8f..9451f1e9be0967 100644
--- a/llvm/test/CodeGen/Mips/llvm-ir/udiv.ll
+++ b/llvm/test/CodeGen/Mips/llvm-ir/udiv.ll
@@ -336,9 +336,8 @@ define signext i64 @udiv_i64(i64 signext %a, i64 signext %b) {
 ; MMR3-NEXT:    .cfi_def_cfa_offset 24
 ; MMR3-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
-; MMR3-NEXT:    addu $2, $2, $25
-; MMR3-NEXT:    lw $25, %call16(__udivdi3)($2)
-; MMR3-NEXT:    move $gp, $2
+; MMR3-NEXT:    addu $gp, $2, $25
+; MMR3-NEXT:    lw $25, %call16(__udivdi3)($gp)
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
 ; MMR3-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
@@ -353,9 +352,8 @@ define signext i64 @udiv_i64(i64 signext %a, i64 signext %b) {
 ; MMR6-NEXT:    .cfi_def_cfa_offset 24
 ; MMR6-NEXT:    sw $ra, 20($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
-; MMR6-NEXT:    addu $2, $2, $25
-; MMR6-NEXT:    lw $25, %call16(__udivdi3)($2)
-; MMR6-NEXT:    move $gp, $2
+; MMR6-NEXT:    addu $gp, $2, $25
+; MMR6-NEXT:    lw $25, %call16(__udivdi3)($gp)
 ; MMR6-NEXT:    jalr $25
 ; MMR6-NEXT:    lw $ra, 20($sp) # 4-byte Folded Reload
 ; MMR6-NEXT:    addiu $sp, $sp, 24
@@ -497,65 +495,59 @@ define signext i128 @udiv_i128(i128 signext %a, i128 signext %b) {
 ; MMR3:       # %bb.0: # %entry
 ; MMR3-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR3-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR3-NEXT:    addiusp -48
-; MMR3-NEXT:    .cfi_def_cfa_offset 48
-; MMR3-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR3-NEXT:    swp $16, 36($sp)
+; MMR3-NEXT:    addiusp -40
+; MMR3-NEXT:    .cfi_def_cfa_offset 40
+; MMR3-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR3-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR3-NEXT:    .cfi_offset 31, -4
 ; MMR3-NEXT:    .cfi_offset 17, -8
-; MMR3-NEXT:    .cfi_offset 16, -12
-; MMR3-NEXT:    addu $16, $2, $25
+; MMR3-NEXT:    addu $gp, $2, $25
 ; MMR3-NEXT:    move $1, $7
-; MMR3-NEXT:    lw $7, 68($sp)
-; MMR3-NEXT:    lw $17, 72($sp)
-; MMR3-NEXT:    lw $3, 76($sp)
+; MMR3-NEXT:    lw $7, 60($sp)
+; MMR3-NEXT:    lw $17, 64($sp)
+; MMR3-NEXT:    lw $3, 68($sp)
 ; MMR3-NEXT:    move $2, $sp
 ; MMR3-NEXT:    sw16 $3, 28($2)
 ; MMR3-NEXT:    sw16 $17, 24($2)
 ; MMR3-NEXT:    sw16 $7, 20($2)
-; MMR3-NEXT:    lw $3, 64($sp)
+; MMR3-NEXT:    lw $3, 56($sp)
 ; MMR3-NEXT:    sw16 $3, 16($2)
-; MMR3-NEXT:    lw $25, %call16(__udivti3)($16)
+; MMR3-NEXT:    lw $25, %call16(__udivti3)($gp)
 ; MMR3-NEXT:    move $7, $1
-; MMR3-NEXT:    move $gp, $16
 ; MMR3-NEXT:    jalr $25
 ; MMR3-NEXT:    nop
-; MMR3-NEXT:    lwp $16, 36($sp)
-; MMR3-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR3-NEXT:    addiusp 48
+; MMR3-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    lw $ra, 36($sp) # 4-byte Folded Reload
+; MMR3-NEXT:    addiusp 40
 ; MMR3-NEXT:    jrc $ra
 ;
 ; MMR6-LABEL: udiv_i128:
 ; MMR6:       # %bb.0: # %entry
 ; MMR6-NEXT:    lui $2, %hi(_gp_disp)
 ; MMR6-NEXT:    addiu $2, $2, %lo(_gp_disp)
-; MMR6-NEXT:    addiu $sp, $sp, -48
-; MMR6-NEXT:    .cfi_def_cfa_offset 48
-; MMR6-NEXT:    sw $ra, 44($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $17, 40($sp) # 4-byte Folded Spill
-; MMR6-NEXT:    sw $16, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    addiu $sp, $sp, -40
+; MMR6-NEXT:    .cfi_def_cfa_offset 40
+; MMR6-NEXT:    sw $ra, 36($sp) # 4-byte Folded Spill
+; MMR6-NEXT:    sw $17, 32($sp) # 4-byte Folded Spill
 ; MMR6-NEXT:    .cfi_offset 31, -4
 ; MMR6-NEXT:    .cfi_offset 17, -8
-; MMR6-NEXT:    .cfi_offset 16, -12
-; MMR6-NEXT:    addu $16, $2, $25
+; MMR6-NEXT:    addu $gp, $2, $25
 ; MMR6-NEXT:    move $1, $7
-; MMR6-NEXT:    lw $7, 68($sp)
-; MMR6-NEXT:    lw $17, 72($sp)
-; MMR6-NEXT:    lw $3, 76($sp)
+; MMR6-NEXT:    lw $7, 60($sp)
+; MMR6-NEXT:    lw $17, 64($sp)
+; MMR6-NEXT:    lw $3, 68($sp)
 ; MMR6-NEXT:    move $2, $sp
 ; MMR6-NEXT:    sw16 $3, 28($2)
 ; MMR6-NEXT:    sw16 $17, 24($2)
 ; MMR6-NEXT:    sw16 $7, 20($2)
-; MMR6-NEXT:    lw $3, 64($sp)
+; MMR6-NEXT:    lw $3, 56($sp)
 ; MMR6-NEXT:    sw16 $3, 16($2)
-; MMR6-NEXT:    lw $25, %call16(__udivti3)($16)
+; MMR6-NEXT:    lw $25, %call16(__udivti3)($gp)
 ; MMR6-NEXT:    move $7, $1
-; MMR6-NEXT:    move $gp, $16
 ; MMR6-NEXT:    jalr $25
-; MMR6-NEXT:    lw $16, 36($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $17, 40($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    lw $ra, 44($sp) # 4-byte Folded Reload
-; MMR6-NEXT:    addiu $sp, $sp, 48
+; MMR6-NEXT:    lw $17, 32($sp) # 4-byte Folded Reload
+; MMR6-NEXT:    lw $ra, 36($sp) # 4-byte Folded Relo...
[truncated]

@vladimirradosavljevic
Contributor Author

@qcolombet @arsenm @topperc PTAL.

@vladimirradosavljevic vladimirradosavljevic force-pushed the mcp_backward_users branch 2 times, most recently from 2519013 to a3daac1 on October 8, 2024 at 11:24
vladimirradosavljevic added a commit to matter-labs/era-compiler-llvm that referenced this pull request Oct 9, 2024
Before this patch, redundant COPY couldn't be removed
for the following case:
  $R0 = OP ...
  ... // Read of %R0
  $R1 = COPY killed $R0

This patch adds support for tracking the users of
the source register during backward propagation, so
that we can remove the redundant COPY in the above case
and optimize it to:
  $R1 = OP ...
  ... // Replace all uses of %R0 with $R1

Upstream PR:
llvm/llvm-project#111130

Signed-off-by: Vladimir Radosavljevic <[email protected]>
vladimirradosavljevic added a commit to matter-labs/era-compiler-llvm that referenced this pull request Oct 9, 2024
akiramenai pushed a commit to matter-labs/era-compiler-llvm that referenced this pull request Oct 10, 2024
@arsenm arsenm requested a review from qcolombet October 10, 2024 17:38
Contributor

@arsenm arsenm left a comment


Is one of these test changes a dedicated MIR test for this situation?

@qcolombet
Collaborator

Is one of these test changes a dedicated MIR test for this situation?

Having a quick glance at the patch I don't think it is.

Could you add a few tests that checks specifically for the new pattern?
E.g.,

r0 = ..
// one of the uses of r0 is not renamable
r1 = copy r0

and also

r0 = ..
// one of the uses of r0 does not accept r1
r1 = copy r0

and so on.

Essentially add explicit test coverage with focused mir tests.

@qcolombet
Collaborator

Note: I think you support all these cases, but I'd like to see them written down.

@vladimirradosavljevic
Contributor Author

vladimirradosavljevic commented Oct 11, 2024

@arsenm @qcolombet I added a dedicated MIR test to cover 3 cases:

  1. The normal case, where the transformation is performed.
  2. A case where a use is not renamable.
  3. A case with an implicit use.
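The bail-out conditions these cases exercise can be sketched abstractly. This is a hypothetical Python model: field names such as `tied`, `renamable`, and `implicit_overlap` mirror the intent of the checks in `canUpdateSrcUsers`, but the data layout is invented for illustration.

```python
# Abstract model of the safety check: the rewrite is only legal if every
# use of the copy's source register can actually be renamed.
# Each user is a dict describing its operands (invented layout).

def can_update_src_users(src_users):
    for user in src_users:
        if user.get("implicit_overlap"):    # case 3: implicit use of the source
            return False
        for operand in user["uses"]:
            # case 2: a tied or non-renamable use blocks the rewrite
            if operand.get("tied") or not operand.get("renamable", False):
                return False
    return True                             # case 1: transformation is allowed

ok_user = {"uses": [{"renamable": True}]}
bad_user = {"uses": [{"renamable": False}]}
print(can_update_src_users([ok_user]))
print(can_update_src_users([bad_user]))
```

If any user fails a check, the pass leaves the COPY alone rather than risk renaming a register a user cannot accept.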

@vladimirradosavljevic
Contributor Author

Ping.

Collaborator

@qcolombet qcolombet left a comment


LGTM

@vladimirradosavljevic
Contributor Author

Rebased and updated checks in midpoint-int.ll.
@qcolombet Thanks for the review! Could you please merge this PR on my behalf?

@qcolombet qcolombet merged commit 401d123 into llvm:main Oct 23, 2024
8 checks passed

@vladimirradosavljevic Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems were caused by your change specifically, since builds include changes from many authors; it is not uncommon for your change to be included in a build that fails due to someone else's changes or infrastructure issues. If your change does cause a problem, it may be reverted, or you can revert it yourself, fix it, and open a new PR to merge it again; this is a normal part of LLVM development. If you receive no reports, no action is required: your changes are working as expected, well done!

@frobtech frobtech mentioned this pull request Oct 25, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
akiramenai pushed a commit to matter-labs/era-compiler-llvm that referenced this pull request Apr 8, 2025