Skip to content

[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. #116191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

sdesmalen-arm
Copy link
Collaborator

In a situation like the following:

   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2

there are no subranges for %3 because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for .subreg.

(To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)

In a situation like the following:

   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2

there are no subranges for %3 because the entire register is copied,
but after rematerialization the subrange of the rematerialized value
must be fixed up with the appropriate subranges for `.subreg`.
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

In a situation like the following:

   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2

there are no subranges for %3 because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for .subreg.

(To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)


Full diff: https://github.com/llvm/llvm-project/pull/116191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+22)
  • (added) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+38)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 2e1f498c090d1a..073ce367af1b85 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1374,6 +1374,27 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   MachineInstr &NewMI = *std::prev(MII);
   NewMI.setDebugLoc(DL);
 
+  // In a situation like the following:
+  //
+  //    undef %2.subreg:reg = INST %1:reg         ; DefMI (rematerializable),
+  //                                              ; DefSubIdx = subreg
+  //    %3:reg = COPY %2                          ; SrcIdx = DstIdx = 0
+  //    .... = SOMEINSTR %3:reg
+  //
+  // there are no subranges for %3 so after rematerialization we need
+  // to explicitly create them. Undefined subranges are removed later on.
+  if (DstReg.isVirtual() && DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
+      MRI->shouldTrackSubRegLiveness(DstReg)) {
+    LiveInterval &DstInt = LIS->getInterval(DstReg);
+    if (!DstInt.hasSubRanges()) {
+      LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
+      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+      LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
+      DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UsedLanes, DstInt);
+      DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UnusedLanes, DstInt);
+    }
+  }
+
   // In a situation like the following:
   //     %0:subreg = instr              ; DefMI, subreg = DstIdx
   //     %1        = copy %0:subreg ; CopyMI, SrcIdx = 0
@@ -1486,6 +1507,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
         NewRC = TRI->getCommonSubClass(NewRC, DefRC);
       assert(NewRC && "subreg chosen for remat incompatible with instruction");
     }
+
     // Remap subranges to new lanemask and change register class.
     LiveInterval &DstInt = LIS->getInterval(DstReg);
     for (LiveInterval::SubRange &SR : DstInt.subranges()) {
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
new file mode 100644
index 00000000000000..b61fa4be040070
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -0,0 +1,38 @@
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
+# REQUIRES: asserts
+
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: test
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %0 [16r,32r:0) 0@16r  weight:0.000000e+00
+# CHECK-DBG: %3 [48r,112r:0) 0@48r  L0000000000000040 [48r,112r:0) 0@48r  weight:0.000000e+00
+# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  L0000000000000080 [112e,112d:0) 0@112e  L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  weight:0.000000e+00
+# CHECK-DBG: %5 [32r,112r:1)[112r,112d:0) 0@112r 1@32r  weight:0.000000e+00
+---
+name:            test
+tracksRegLiveness: true
+fixedStack:      []
+stack:
+  - { id: 0, name: '', type: default, offset: 0, size: 65, alignment: 16,
+      stack-id: default }
+body:             |
+  bb.0.entry:
+    ; CHECK-LABEL: name: test
+    ; CHECK: [[ADDXri:%[0-9]+]]:gpr64sp = ADDXri %stack.0, 0, 0
+    ; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = nuw ADDXri [[ADDXri]], 64, 0
+    ; CHECK-NEXT: undef [[MOVi32imm:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+    ; CHECK-NEXT: undef [[MOVi32imm1:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+    ; CHECK-NEXT: dead [[ADDXri1]]:gpr64common, dead early-clobber [[MOVi32imm1]]:gpr64 = MOPSMemorySetPseudo [[ADDXri1]], [[MOVi32imm1]], [[MOVi32imm]], implicit-def dead $nzcv
+    ; CHECK-NEXT: RET_ReallyLR
+    %1:gpr64sp = ADDXri %stack.0, 0, 0
+    %2:gpr64common = nuw ADDXri killed %1, 64, 0
+    %3:gpr32 = MOVi32imm 64
+    %4:gpr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32
+    %6:gpr64 = COPY %4
+    %5:gpr64common = COPY killed %2
+    dead %5:gpr64common, dead early-clobber %6:gpr64 = MOPSMemorySetPseudo %5, %6, %4, implicit-def dead $nzcv
+    RET_ReallyLR
+
+...

@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-llvm-regalloc

Author: Sander de Smalen (sdesmalen-arm)

Changes

In a situation like the following:

   undef %2.subreg = INST %1         ; DefMI (rematerializable),
                                     ; DefSubIdx = subreg
   %3 = COPY %2                      ; SrcIdx = DstIdx = 0
   .... = SOMEINSTR %3, %2

there are no subranges for %3 because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for .subreg.

(To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)


Full diff: https://github.com/llvm/llvm-project/pull/116191.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+22)
  • (added) llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir (+38)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 2e1f498c090d1a..073ce367af1b85 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1374,6 +1374,27 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
   MachineInstr &NewMI = *std::prev(MII);
   NewMI.setDebugLoc(DL);
 
+  // In a situation like the following:
+  //
+  //    undef %2.subreg:reg = INST %1:reg         ; DefMI (rematerializable),
+  //                                              ; DefSubIdx = subreg
+  //    %3:reg = COPY %2                          ; SrcIdx = DstIdx = 0
+  //    .... = SOMEINSTR %3:reg
+  //
+  // there are no subranges for %3 so after rematerialization we need
+  // to explicitly create them. Undefined subranges are removed later on.
+  if (DstReg.isVirtual() && DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
+      MRI->shouldTrackSubRegLiveness(DstReg)) {
+    LiveInterval &DstInt = LIS->getInterval(DstReg);
+    if (!DstInt.hasSubRanges()) {
+      LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
+      LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+      LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
+      DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UsedLanes, DstInt);
+      DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UnusedLanes, DstInt);
+    }
+  }
+
   // In a situation like the following:
   //     %0:subreg = instr              ; DefMI, subreg = DstIdx
   //     %1        = copy %0:subreg ; CopyMI, SrcIdx = 0
@@ -1486,6 +1507,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
         NewRC = TRI->getCommonSubClass(NewRC, DefRC);
       assert(NewRC && "subreg chosen for remat incompatible with instruction");
     }
+
     // Remap subranges to new lanemask and change register class.
     LiveInterval &DstInt = LIS->getInterval(DstReg);
     for (LiveInterval::SubRange &SR : DstInt.subranges()) {
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
new file mode 100644
index 00000000000000..b61fa4be040070
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -0,0 +1,38 @@
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
+# REQUIRES: asserts
+
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: test
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %0 [16r,32r:0) 0@16r  weight:0.000000e+00
+# CHECK-DBG: %3 [48r,112r:0) 0@48r  L0000000000000040 [48r,112r:0) 0@48r  weight:0.000000e+00
+# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  L0000000000000080 [112e,112d:0) 0@112e  L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r  weight:0.000000e+00
+# CHECK-DBG: %5 [32r,112r:1)[112r,112d:0) 0@112r 1@32r  weight:0.000000e+00
+---
+name:            test
+tracksRegLiveness: true
+fixedStack:      []
+stack:
+  - { id: 0, name: '', type: default, offset: 0, size: 65, alignment: 16,
+      stack-id: default }
+body:             |
+  bb.0.entry:
+    ; CHECK-LABEL: name: test
+    ; CHECK: [[ADDXri:%[0-9]+]]:gpr64sp = ADDXri %stack.0, 0, 0
+    ; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = nuw ADDXri [[ADDXri]], 64, 0
+    ; CHECK-NEXT: undef [[MOVi32imm:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+    ; CHECK-NEXT: undef [[MOVi32imm1:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+    ; CHECK-NEXT: dead [[ADDXri1]]:gpr64common, dead early-clobber [[MOVi32imm1]]:gpr64 = MOPSMemorySetPseudo [[ADDXri1]], [[MOVi32imm1]], [[MOVi32imm]], implicit-def dead $nzcv
+    ; CHECK-NEXT: RET_ReallyLR
+    %1:gpr64sp = ADDXri %stack.0, 0, 0
+    %2:gpr64common = nuw ADDXri killed %1, 64, 0
+    %3:gpr32 = MOVi32imm 64
+    %4:gpr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32
+    %6:gpr64 = COPY %4
+    %5:gpr64common = COPY killed %2
+    dead %5:gpr64common, dead early-clobber %6:gpr64 = MOPSMemorySetPseudo %5, %6, %4, implicit-def dead $nzcv
+    RET_ReallyLR
+
+...

@sdesmalen-arm sdesmalen-arm merged commit 3093b29 into llvm:main Nov 19, 2024
9 of 11 checks passed
vitalybuka added a commit that referenced this pull request Nov 22, 2024
…zing. (#116191)" (#117367)

To pass tests with #117307 revert.

This reverts commit 3093b29.
sdesmalen-arm added a commit that referenced this pull request Nov 28, 2024
…zing. (#116191)"

This patch can now reland after 318c69d relanded #114827.

This reverts commit 14a58a1.
sdesmalen-arm added a commit that referenced this pull request Jan 6, 2025
By moving the code a bit later, we can factor out some of the conditions as
those are now already tested, which will help when adding another fix
on top that uses `NewMI`'s subreg index (NewIdx).

This change is intended to be NFC.
sdesmalen-arm added a commit that referenced this pull request Jan 6, 2025
The code added in #116191 that updated the lanemasks for rematerialized
values, checked if DefMI's destination reg had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.
sdesmalen-arm added a commit that referenced this pull request Jan 7, 2025
By moving the code a bit later, we can factor out some of the conditions
as those are now already tested.
This will also be useful when adding another fix on top that uses
`NewMI`'s subreg index (to follow as a separate PR).

The change is intended to be NFC.
sdesmalen-arm added a commit that referenced this pull request Jan 7, 2025
The code added in #116191 that updated the lanemasks for rematerialized
values, checked if DefMI's destination reg had a subreg index.
This seems to have missed the following case:

  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32

which during rematerialization would have the following variables set:

  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we should
look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.
sdesmalen-arm added a commit that referenced this pull request Jan 7, 2025
…21780)

The code added in #116191 that updated the lanemasks for rematerialized
values checked if `DefMI`'s destination register had a subreg index.
This seems to have missed the following case:

```
  %0:gpr32 = MOVi32imm 1
  %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32
```

which during rematerialization would have the following variables set:

```
  DefMI = %0:gpr32 = MOVi32imm 1

  NewMI = %3.sub_32:gpr64 = MOVi32imm 1   (rematerialized value)
```

When checking whether the lanemasks need to be generated, considering
whether DefMI's destination has a subreg index is insufficient, we
should look at DefMI's subreg index instead.

The added tests are a bit more involved, because I was not able to
reconstruct the issue without having some control flow in the test.
These tests come from actual reproducers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants