-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. #116191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In a situation like the following: undef %2.subreg = INST %1 ; DefMI (rematerializable), ; DefSubIdx = subreg %3 = COPY %2 ; SrcIdx = DstIdx = 0 .... = SOMEINSTR %3, %2 there are no subranges for %3 because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for `.subreg`.
@llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) ChangesIn a situation like the following:
there are no subranges for (To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization) Full diff: https://github.com/llvm/llvm-project/pull/116191.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 2e1f498c090d1a..073ce367af1b85 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1374,6 +1374,27 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
MachineInstr &NewMI = *std::prev(MII);
NewMI.setDebugLoc(DL);
+ // In a situation like the following:
+ //
+ // undef %2.subreg:reg = INST %1:reg ; DefMI (rematerializable),
+ // ; DefSubIdx = subreg
+ // %3:reg = COPY %2 ; SrcIdx = DstIdx = 0
+ // .... = SOMEINSTR %3:reg
+ //
+ // there are no subranges for %3 so after rematerialization we need
+ // to explicitly create them. Undefined subranges are removed later on.
+ if (DstReg.isVirtual() && DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
+ MRI->shouldTrackSubRegLiveness(DstReg)) {
+ LiveInterval &DstInt = LIS->getInterval(DstReg);
+ if (!DstInt.hasSubRanges()) {
+ LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
+ LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+ LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
+ DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UsedLanes, DstInt);
+ DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UnusedLanes, DstInt);
+ }
+ }
+
// In a situation like the following:
// %0:subreg = instr ; DefMI, subreg = DstIdx
// %1 = copy %0:subreg ; CopyMI, SrcIdx = 0
@@ -1486,6 +1507,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
NewRC = TRI->getCommonSubClass(NewRC, DefRC);
assert(NewRC && "subreg chosen for remat incompatible with instruction");
}
+
// Remap subranges to new lanemask and change register class.
LiveInterval &DstInt = LIS->getInterval(DstReg);
for (LiveInterval::SubRange &SR : DstInt.subranges()) {
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
new file mode 100644
index 00000000000000..b61fa4be040070
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -0,0 +1,38 @@
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
+# REQUIRES: asserts
+
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: test
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %0 [16r,32r:0) 0@16r weight:0.000000e+00
+# CHECK-DBG: %3 [48r,112r:0) 0@48r L0000000000000040 [48r,112r:0) 0@48r weight:0.000000e+00
+# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r L0000000000000080 [112e,112d:0) 0@112e L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r weight:0.000000e+00
+# CHECK-DBG: %5 [32r,112r:1)[112r,112d:0) 0@112r 1@32r weight:0.000000e+00
+---
+name: test
+tracksRegLiveness: true
+fixedStack: []
+stack:
+ - { id: 0, name: '', type: default, offset: 0, size: 65, alignment: 16,
+ stack-id: default }
+body: |
+ bb.0.entry:
+ ; CHECK-LABEL: name: test
+ ; CHECK: [[ADDXri:%[0-9]+]]:gpr64sp = ADDXri %stack.0, 0, 0
+ ; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = nuw ADDXri [[ADDXri]], 64, 0
+ ; CHECK-NEXT: undef [[MOVi32imm:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+ ; CHECK-NEXT: undef [[MOVi32imm1:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+ ; CHECK-NEXT: dead [[ADDXri1]]:gpr64common, dead early-clobber [[MOVi32imm1]]:gpr64 = MOPSMemorySetPseudo [[ADDXri1]], [[MOVi32imm1]], [[MOVi32imm]], implicit-def dead $nzcv
+ ; CHECK-NEXT: RET_ReallyLR
+ %1:gpr64sp = ADDXri %stack.0, 0, 0
+ %2:gpr64common = nuw ADDXri killed %1, 64, 0
+ %3:gpr32 = MOVi32imm 64
+ %4:gpr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32
+ %6:gpr64 = COPY %4
+ %5:gpr64common = COPY killed %2
+ dead %5:gpr64common, dead early-clobber %6:gpr64 = MOPSMemorySetPseudo %5, %6, %4, implicit-def dead $nzcv
+ RET_ReallyLR
+
+...
|
@llvm/pr-subscribers-llvm-regalloc Author: Sander de Smalen (sdesmalen-arm) ChangesIn a situation like the following:
there are no subranges for (To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization) Full diff: https://github.com/llvm/llvm-project/pull/116191.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index 2e1f498c090d1a..073ce367af1b85 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1374,6 +1374,27 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
MachineInstr &NewMI = *std::prev(MII);
NewMI.setDebugLoc(DL);
+ // In a situation like the following:
+ //
+ // undef %2.subreg:reg = INST %1:reg ; DefMI (rematerializable),
+ // ; DefSubIdx = subreg
+ // %3:reg = COPY %2 ; SrcIdx = DstIdx = 0
+ // .... = SOMEINSTR %3:reg
+ //
+ // there are no subranges for %3 so after rematerialization we need
+ // to explicitly create them. Undefined subranges are removed later on.
+ if (DstReg.isVirtual() && DefSubIdx && !CP.getSrcIdx() && !CP.getDstIdx() &&
+ MRI->shouldTrackSubRegLiveness(DstReg)) {
+ LiveInterval &DstInt = LIS->getInterval(DstReg);
+ if (!DstInt.hasSubRanges()) {
+ LaneBitmask FullMask = MRI->getMaxLaneMaskForVReg(DstReg);
+ LaneBitmask UsedLanes = TRI->getSubRegIndexLaneMask(DefSubIdx);
+ LaneBitmask UnusedLanes = FullMask & ~UsedLanes;
+ DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UsedLanes, DstInt);
+ DstInt.createSubRangeFrom(LIS->getVNInfoAllocator(), UnusedLanes, DstInt);
+ }
+ }
+
// In a situation like the following:
// %0:subreg = instr ; DefMI, subreg = DstIdx
// %1 = copy %0:subreg ; CopyMI, SrcIdx = 0
@@ -1486,6 +1507,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
NewRC = TRI->getCommonSubClass(NewRC, DefRC);
assert(NewRC && "subreg chosen for remat incompatible with instruction");
}
+
// Remap subranges to new lanemask and change register class.
LiveInterval &DstInt = LIS->getInterval(DstReg);
for (LiveInterval::SubRange &SR : DstInt.subranges()) {
diff --git a/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
new file mode 100644
index 00000000000000..b61fa4be040070
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/register-coalesce-update-subranges-remat.mir
@@ -0,0 +1,38 @@
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o - -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking %s | FileCheck %s --check-prefix=CHECK
+# RUN: llc -mtriple=aarch64 -verify-machineinstrs -o /dev/null -run-pass=register-coalescer -aarch64-enable-subreg-liveness-tracking -debug-only=regalloc %s 2>&1 | FileCheck %s --check-prefix=CHECK-DBG
+# REQUIRES: asserts
+
+# CHECK-DBG: ********** REGISTER COALESCER **********
+# CHECK-DBG: ********** Function: test
+# CHECK-DBG: ********** JOINING INTERVALS ***********
+# CHECK-DBG: ********** INTERVALS **********
+# CHECK-DBG: %0 [16r,32r:0) 0@16r weight:0.000000e+00
+# CHECK-DBG: %3 [48r,112r:0) 0@48r L0000000000000040 [48r,112r:0) 0@48r weight:0.000000e+00
+# CHECK-DBG: %4 [80r,112e:1)[112e,112d:0) 0@112e 1@80r L0000000000000080 [112e,112d:0) 0@112e L0000000000000040 [80r,112e:1)[112e,112d:0) 0@112e 1@80r weight:0.000000e+00
+# CHECK-DBG: %5 [32r,112r:1)[112r,112d:0) 0@112r 1@32r weight:0.000000e+00
+---
+name: test
+tracksRegLiveness: true
+fixedStack: []
+stack:
+ - { id: 0, name: '', type: default, offset: 0, size: 65, alignment: 16,
+ stack-id: default }
+body: |
+ bb.0.entry:
+ ; CHECK-LABEL: name: test
+ ; CHECK: [[ADDXri:%[0-9]+]]:gpr64sp = ADDXri %stack.0, 0, 0
+ ; CHECK-NEXT: [[ADDXri1:%[0-9]+]]:gpr64common = nuw ADDXri [[ADDXri]], 64, 0
+ ; CHECK-NEXT: undef [[MOVi32imm:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+ ; CHECK-NEXT: undef [[MOVi32imm1:%[0-9]+]].sub_32:gpr64 = MOVi32imm 64
+ ; CHECK-NEXT: dead [[ADDXri1]]:gpr64common, dead early-clobber [[MOVi32imm1]]:gpr64 = MOPSMemorySetPseudo [[ADDXri1]], [[MOVi32imm1]], [[MOVi32imm]], implicit-def dead $nzcv
+ ; CHECK-NEXT: RET_ReallyLR
+ %1:gpr64sp = ADDXri %stack.0, 0, 0
+ %2:gpr64common = nuw ADDXri killed %1, 64, 0
+ %3:gpr32 = MOVi32imm 64
+ %4:gpr64 = SUBREG_TO_REG 0, killed %3, %subreg.sub_32
+ %6:gpr64 = COPY %4
+ %5:gpr64common = COPY killed %2
+ dead %5:gpr64common, dead early-clobber %6:gpr64 = MOPSMemorySetPseudo %5, %6, %4, implicit-def dead $nzcv
+ RET_ReallyLR
+
+...
|
By moving the code a bit later, we can factor out some of the conditions as those are now already tested, which will help when adding another fix on top that uses `NewMI`'s subreg index (NewIdx). This change is intended to be NFC.
The code added in #116191 that updated the lanemasks for rematerialized values, checked if DefMI's destination reg had a subreg index. This seems to have missed the following case: %0:gpr32 = MOVi32imm 1 %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32 which during rematerialization would have the following variables set: DefMI = %0:gpr32 = MOVi32imm 1 NewMI = %3.sub_32:gpr64 = MOVi32imm 1 (rematerialized value) When checking whether the lanemasks need to be generated, considering whether DefMI's destination has a subreg index is insufficient, we should look at DefMI's subreg index instead. The added tests are a bit more involved, because I was not able to reconstruct the issue without having some control flow in the test. These tests come from actual reproducers.
By moving the code a bit later, we can factor out some of the conditions as those are now already tested. This will also be useful when adding another fix on top that uses `NewMI`'s subreg index (to follow as a separate PR). The change is intended to be NFC.
The code added in #116191 that updated the lanemasks for rematerialized values, checked if DefMI's destination reg had a subreg index. This seems to have missed the following case: %0:gpr32 = MOVi32imm 1 %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32 which during rematerialization would have the following variables set: DefMI = %0:gpr32 = MOVi32imm 1 NewMI = %3.sub_32:gpr64 = MOVi32imm 1 (rematerialized value) When checking whether the lanemasks need to be generated, considering whether DefMI's destination has a subreg index is insufficient, we should look at DefMI's subreg index instead. The added tests are a bit more involved, because I was not able to reconstruct the issue without having some control flow in the test. These tests come from actual reproducers.
…21780) The code added in #116191 that updated the lanemasks for rematerialized values checked if `DefMI`'s destination register had a subreg index. This seems to have missed the following case: ``` %0:gpr32 = MOVi32imm 1 %1:gpr64 = SUBREG_TO_REG 0, %0:gpr32, %subreg.sub_32 ``` which during rematerialization would have the following variables set: ``` DefMI = %0:gpr32 = MOVi32imm 1 NewMI = %3.sub_32:gpr64 = MOVi32imm 1 (rematerialized value) ``` When checking whether the lanemasks need to be generated, considering whether DefMI's destination has a subreg index is insufficient, we should look at DefMI's subreg index instead. The added tests are a bit more involved, because I was not able to reconstruct the issue without having some control flow in the test. These tests come from actual reproducers.
In a situation like the following:
there are no subranges for
%3
because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for.subreg
.(To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)