[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 #132089

broxigarchen · 2025-03-19T19:58:06Z

There are V2S copies between vgpr16 and sgpr32 in true16 mode. This happens because both vgpr16 and sgpr32 are selectable for a 16-bit source in ISel.

When a V2S copy and its useMI are lowered to VALU, this patch:

1. Checks whether the generated new VALU is used by a true16 inst, and adds a subreg access if necessary.
2. Legalizes the V2S copy by replacing it with SUBREG_TO_REG.

An example MIR looks like:

%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...

This is currently lowered to:

%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...

After this patch:

%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
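
The use-side fixup in step 1 can be modeled outside of LLVM with a small toy sketch. The types `Operand` and `Inst` and the function `fixTrue16Uses` are hypothetical stand-ins, not LLVM classes. They only illustrate how a def widened to 32 bits gets a `lo16` subregister index on the uses whose operand slot expects 16 bits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model, not the LLVM API: every name here is hypothetical.
struct Operand {
  int Reg;               // virtual register number
  unsigned ExpectedBits; // width the instruction's operand slot requires
  std::string SubReg;    // "" means full-register access
};

struct Inst {
  std::string Opcode;
  bool IsTrue16;
  std::vector<Operand> Uses;
};

// After the def of Reg has been widened to a 32-bit VGPR, visit every use and
// attach a lo16 subregister index where a true16 instruction expects 16 bits.
// Returns the number of operands rewritten.
inline int fixTrue16Uses(std::vector<Inst> &Insts, int Reg) {
  assert(Reg >= 0 && "register numbers are non-negative in this model");
  int Rewritten = 0;
  for (Inst &I : Insts) {
    if (!I.IsTrue16)
      continue;
    for (Operand &Op : I.Uses) {
      if (Op.Reg == Reg && Op.ExpectedBits == 16 && Op.SubReg.empty()) {
        Op.SubReg = "lo16";
        ++Rewritten;
      }
    }
  }
  return Rewritten;
}
```

In this model, the `V_ADD_F16_t16` use of `%3` from the example would be rewritten to `%3.lo16`, while the 32-bit `V_OR_B32` use of `%2` is left alone.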

llvmbot · 2025-03-19T20:26:40Z

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

When an SGPR copy is lowered to VALU, check whether the new VALU instruction is used by a true16 instruction. Add a subreg access if necessary.

Full diff: https://github.com/llvm/llvm-project/pull/132089.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+16)
(modified) llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index fb791c8342282..5f2bd507d1767 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -7835,6 +7835,22 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
     assert(NewDstRC);
     NewDstReg = MRI.createVirtualRegister(NewDstRC);
     MRI.replaceRegWith(DstReg, NewDstReg);
+
+    // Check useMI of NewInstr. If used by a true16 instruction,
+    // add a lo16 subreg access if size mismatched
+    if (ST.useRealTrue16Insts() && NewDstRC == &AMDGPU::VGPR_32RegClass) {
+      for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+                                             E = MRI.use_end();
+           I != E; ++I) {
+        MachineInstr &UseMI = *I->getParent();
+        unsigned UseMIOpcode = UseMI.getOpcode();
+        if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+            (16 ==
+             RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+          I->setSubReg(AMDGPU::lo16);
+        }
+      }
+    }
   }
   fixImplicitOperands(*NewInstr);
   // Legalize the operands
diff --git a/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll b/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
index 5ea39997938ad..4f6b334ec0819 100644
--- a/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
@@ -699,7 +699,7 @@ define amdgpu_ps half @fneg_fadd_0_f16(half inreg %tmp2, half inreg %tmp6, <4 x
 ; GFX11-SAFE-TRUE16-NEXT:    v_cmp_ngt_f16_e32 vcc_lo, s0, v0.l
 ; GFX11-SAFE-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
 ; GFX11-SAFE-TRUE16-NEXT:    v_xor_b32_e32 v0, 0x8000, v1
-; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, v0/*Invalid register, operand has 'VS_16' register class*/, s0, vcc_lo
+; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, v0.l, s0, vcc_lo
 ; GFX11-SAFE-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-SAFE-TRUE16-NEXT:    v_cmp_nlt_f16_e32 vcc_lo, 0, v0.l
 ; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, 0x7e00, 0, vcc_lo

broxigarchen · 2025-03-19T20:52:52Z

We probably should also fix the bad copy %2:vgpr_32 = COPY %1:vgpr_16 and select t16 VALU directly.

Changing this PR to draft for now.

broxigarchen · 2025-03-20T17:23:52Z

It seems it takes more work to select t16 VALU during moveToVALU; that can be a separate patch. For now, this just fixes the bad copy %2:vgpr_32 = COPY %1:vgpr_16.

Closed #131859 and proposed this patch as an alternative fix.

broxigarchen · 2025-03-24T16:48:18Z

rebased

broxigarchen · 2025-03-25T14:10:55Z

Gentle ping!

Sisyph

There was a comment on the original PR #131859 from @arsenm that 'you should be able to get this correct from the start'. I don't think we can get it right from the start, because the core issue is putting both 16 and 32 bit values into SGPR32.

So this patch and #131859 are different ways of fixing that up. This patch is better in one way because it attaches proper subregisters, where the other one doesn't. But this patch adds more special purpose code.

I see we don't have existing uses of SUBREG_TO_REG in the AMDGPU backend. Where is that lowered to a target instruction? Do we expect passes seeing it to understand it? If so, this patch seems ok to me.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/test/CodeGen/AMDGPU/fix-sgpr-copies-f16-true16.mir

broxigarchen · 2025-03-26T15:22:00Z

Hi Joe. SUBREG_TO_REG is lowered to a COPY in a codegen pass: https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/ExpandPostRAPseudos.cpp. It is also understood by other codegen passes, such as the register allocator and the coalescer.
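
As a rough illustration of that lowering, here is a toy sketch of expanding the pseudo into a copy. It does not use the LLVM API: `SubregToReg`, `Copy`, `subRegName`, and `expandSubregToReg` are hypothetical names, and skipping the copy when source and destination half coincide is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Toy model, not the LLVM API: all names below are hypothetical.
struct SubregToReg {
  std::string Dst;    // 32-bit destination, e.g. "v1"
  std::string Src;    // 16-bit source, e.g. "v0.l"
  std::string SubIdx; // "lo16" or "hi16"
};

struct Copy {
  std::string Dst;
  std::string Src;
};

// Name the 16-bit half of a 32-bit register: ("v1", "lo16") -> "v1.l".
inline std::string subRegName(const std::string &Reg,
                              const std::string &SubIdx) {
  assert((SubIdx == "lo16" || SubIdx == "hi16") && "only halves are modeled");
  return Reg + (SubIdx == "lo16" ? ".l" : ".h");
}

// Expand the pseudo into a copy to the destination's sub-register.
// Returns false when the source already is that sub-register, in which case
// this model emits no copy (an assumption of the sketch).
inline bool expandSubregToReg(const SubregToReg &P, Copy &Out) {
  const std::string DstSub = subRegName(P.Dst, P.SubIdx);
  if (DstSub == P.Src)
    return false;
  Out.Dst = DstSub;
  Out.Src = P.Src;
  return true;
}
```

The point of the model is only that the pseudo carries enough information (destination, source, subregister index) for a later pass to materialize an ordinary copy.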

Sisyph

LGTM

broxigarchen · 2025-04-01T16:40:14Z

rebased

…LU (#133985)

This is a follow up PR from #132089. When a V2S copy and its useMI are lowered to VALU, this patch checks whether the generated new VALU is a true16 inst, and adds subreg access on all operands if necessary.

An example MIR looks like:

```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```

This is currently lowered to:

```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```

After this patch:

```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
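
The follow-up works from the other direction: it looks at the operands of the newly created true16 instruction rather than at the uses of its result. A minimal toy sketch, assuming a hypothetical `Use` record and a plain map from register number to register-class width (neither is an LLVM type):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model, not the LLVM API: all names below are hypothetical.
struct Use {
  int Reg;            // virtual register number
  unsigned SlotBits;  // width required by the operand slot
  std::string SubReg; // "" means full-register access
};

// RegBits maps each virtual register to the width of its register class.
// Any 32-bit register sitting in a 16-bit slot gets a lo16 subreg access.
inline void legalizeTrue16Operands(std::vector<Use> &Operands,
                                   const std::map<int, unsigned> &RegBits) {
  for (Use &U : Operands) {
    std::map<int, unsigned>::const_iterator It = RegBits.find(U.Reg);
    assert(It != RegBits.end() && "every operand register needs a width");
    if (U.SlotBits == 16 && It->second == 32 && U.SubReg.empty())
      U.SubReg = "lo16";
  }
}
```

In the example above, `%1` is a 32-bit register feeding a 16-bit slot of `V_FLOOR_F16_t16_e64`, so the model would turn it into `%1.lo16`; operands that are already 16 bits wide stay untouched.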

broxigarchen changed the title ~~fix true16 usemi~~ [AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 Mar 19, 2025

broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 54fa2b0 to 5139e2f Compare March 19, 2025 20:24

broxigarchen marked this pull request as ready for review March 19, 2025 20:26

broxigarchen requested a review from kosarev March 19, 2025 20:26

llvmbot added the backend:AMDGPU label Mar 19, 2025

broxigarchen requested review from arsenm, Sisyph and jayfoad March 19, 2025 20:26

broxigarchen mentioned this pull request Mar 19, 2025

[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 #131859

Closed

broxigarchen marked this pull request as draft March 19, 2025 20:33

broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 5139e2f to 5f08d52 Compare March 20, 2025 17:18

broxigarchen marked this pull request as ready for review March 20, 2025 17:36

broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 2945f24 to ee658a1 Compare March 24, 2025 16:48

Sisyph reviewed Mar 26, 2025

Sisyph approved these changes Mar 27, 2025

broxigarchen added 2 commits April 1, 2025 11:04

fix moveToVALU in true16

d78f484

update comments

2c920f5

broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 985f5ce to 2c920f5 Compare April 1, 2025 16:39

broxigarchen merged commit dd1d41f into llvm:main Apr 1, 2025
6 of 10 checks passed

broxigarchen mentioned this pull request Apr 1, 2025

[AMDGPU][True16][CodeGen] legalize operands when move16bit SALU to VALU #133985

Merged
