Skip to content

[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 #132089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

broxigarchen
Copy link
Contributor

@broxigarchen broxigarchen commented Mar 19, 2025

There are V2S copies between vpgr16 and spgr32 in true16 mode. This is caused by vgpr16 and sgpr32 both selectable by 16bit src in ISel.

When a V2S copy and its useMI are lowered to VALU, this patch check

  1. If the generated new VALU is used by a true16 inst. Add subreg access if necessary.
  2. Legalize the V2S copy by replacing it to subreg_to_reg

an example MIR looks like:

%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...

currently lowered to

%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...

after this patch

%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...

@broxigarchen broxigarchen changed the title fix true16 usemi [AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 Mar 19, 2025
@broxigarchen broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 54fa2b0 to 5139e2f Compare March 19, 2025 20:24
@broxigarchen broxigarchen marked this pull request as ready for review March 19, 2025 20:26
@broxigarchen broxigarchen requested a review from kosarev March 19, 2025 20:26
@llvmbot
Copy link
Member

llvmbot commented Mar 19, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Brox Chen (broxigarchen)

Changes

When a SGPR copy is lowered to a VALU, check if the new VALU instruction is used by a true16 instructions. Add subreg access if necessary.


Full diff: https://github.com/llvm/llvm-project/pull/132089.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+16)
  • (modified) llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index fb791c8342282..5f2bd507d1767 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -7835,6 +7835,22 @@ void SIInstrInfo::moveToVALUImpl(SIInstrWorklist &Worklist,
     assert(NewDstRC);
     NewDstReg = MRI.createVirtualRegister(NewDstRC);
     MRI.replaceRegWith(DstReg, NewDstReg);
+
+    // Check useMI of NewInstr. If used by a true16 instruction,
+    // add a lo16 subreg access if size mismatched
+    if (ST.useRealTrue16Insts() && NewDstRC == &AMDGPU::VGPR_32RegClass) {
+      for (MachineRegisterInfo::use_iterator I = MRI.use_begin(NewDstReg),
+                                             E = MRI.use_end();
+           I != E; ++I) {
+        MachineInstr &UseMI = *I->getParent();
+        unsigned UseMIOpcode = UseMI.getOpcode();
+        if (AMDGPU::isTrue16Inst(UseMIOpcode) &&
+            (16 ==
+             RI.getRegSizeInBits(*getOpRegClass(UseMI, I.getOperandNo())))) {
+          I->setSubReg(AMDGPU::lo16);
+        }
+      }
+    }
   }
   fixImplicitOperands(*NewInstr);
   // Legalize the operands
diff --git a/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll b/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
index 5ea39997938ad..4f6b334ec0819 100644
--- a/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/fneg-combines.f16.ll
@@ -699,7 +699,7 @@ define amdgpu_ps half @fneg_fadd_0_f16(half inreg %tmp2, half inreg %tmp6, <4 x
 ; GFX11-SAFE-TRUE16-NEXT:    v_cmp_ngt_f16_e32 vcc_lo, s0, v0.l
 ; GFX11-SAFE-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
 ; GFX11-SAFE-TRUE16-NEXT:    v_xor_b32_e32 v0, 0x8000, v1
-; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, v0/*Invalid register, operand has 'VS_16' register class*/, s0, vcc_lo
+; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, v0.l, s0, vcc_lo
 ; GFX11-SAFE-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-SAFE-TRUE16-NEXT:    v_cmp_nlt_f16_e32 vcc_lo, 0, v0.l
 ; GFX11-SAFE-TRUE16-NEXT:    v_cndmask_b16 v0.l, 0x7e00, 0, vcc_lo

@broxigarchen broxigarchen marked this pull request as draft March 19, 2025 20:33
@broxigarchen
Copy link
Contributor Author

broxigarchen commented Mar 19, 2025

We probably should also fix the bad copy %2:vgpr_32 = COPY %1:vgpr_16 and select t16 VALU directly

changing this PR to draft for now

@broxigarchen broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 5139e2f to 5f08d52 Compare March 20, 2025 17:18
@broxigarchen
Copy link
Contributor Author

broxigarchen commented Mar 20, 2025

It seems it takes more work to select t16 VALU during moveToVALU. Can be a seperate patch. Currently just fix the bad copy %2:vgpr_32 = COPY %1:vgpr_16.

Close this patch #131859 and proposed this as an alternative fix

@broxigarchen broxigarchen marked this pull request as ready for review March 20, 2025 17:36
@broxigarchen broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 2945f24 to ee658a1 Compare March 24, 2025 16:48
@broxigarchen
Copy link
Contributor Author

rebased

@broxigarchen
Copy link
Contributor Author

Gentle ping!

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a comment on the original PR #131859 from @arsenm that 'you should be able to get this correct from the start'. I don't think we can get it right from the start, because the core issue is putting both 16 and 32 bit values into SGPR32.

So this patch and #131859 are different ways of fixing that up. This patch is better in one way because it attaches proper subregisters, where the other one doesn't. But this patch adds more special purpose code.

I see we don't have exiting uses of SUBREG_TO_REG in the AMDGPU backend. Where is that lowered to a target instruction? Do we expect passes seeing it to understand it? If so, this patch seems ok to me.

@broxigarchen
Copy link
Contributor Author

broxigarchen commented Mar 26, 2025

There was a comment on the original PR #131859 from @arsenm that 'you should be able to get this correct from the start'. I don't think we can get it right from the start, because the core issue is putting both 16 and 32 bit values into SGPR32.

So this patch and #131859 are different ways of fixing that up. This patch is better in one way because it attaches proper subregisters, where the other one doesn't. But this patch adds more special purpose code.

I see we don't have exiting uses of SUBREG_TO_REG in the AMDGPU backend. Where is that lowered to a target instruction? Do we expect passes seeing it to understand it? If so, this patch seems ok to me.

Hi Joe. SUBREG_TO_REG is lowered to copy in a codegen pass https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/ExpandPostRAPseudos.cpp. Also it's understandable by other codegen pass such as RA and coalescer

Copy link
Contributor

@Sisyph Sisyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@broxigarchen broxigarchen force-pushed the main-merge-true16-codegen-fix-spgr-lower branch from 985f5ce to 2c920f5 Compare April 1, 2025 16:39
@broxigarchen
Copy link
Contributor Author

rebased

@broxigarchen broxigarchen merged commit dd1d41f into llvm:main Apr 1, 2025
6 of 10 checks passed
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request Apr 2, 2025
… true16 (llvm#132089)

There are V2S copies between vpgr16 and spgr32 in true16 mode. This is
caused by vgpr16 and sgpr32 both selectable by 16bit src in ISel.

When a V2S copy and its useMI are lowered to VALU,  this patch check
1. If the generated new VALU is used by a true16 inst. Add subreg access
if necessary.
2. Legalize the V2S copy by replacing it to subreg_to_reg

an example MIR looks like:
```
%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...
```
currently lowered to
```
%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...
```
after this patch
```
%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
```
broxigarchen added a commit that referenced this pull request Apr 3, 2025
…LU (#133985)

This is a follow up PR from
#132089.

When a V2S copy and its useMI are lowered to VALU,  this patch check:
If the generated new VALU is a true16 inst. Add subreg access on all
operands if necessary.

an example MIR looks like:
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```
currently lowered to
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```
after this patch
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 3, 2025
… SALU to VALU (#133985)

This is a follow up PR from
llvm/llvm-project#132089.

When a V2S copy and its useMI are lowered to VALU,  this patch check:
If the generated new VALU is a true16 inst. Add subreg access on all
operands if necessary.

an example MIR looks like:
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```
currently lowered to
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```
after this patch
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants