[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. #120588

Merged

kmitropoulou
Contributor

  • [AMDGPU] Add new test.
  • [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.

@llvmbot
Member

llvmbot commented Dec 19, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Konstantina Mitropoulou (kmitropoulou)

Changes
  • [AMDGPU] Add new test.
  • [AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions.

Full diff: https://github.com/llvm/llvm-project/pull/120588.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (+4-2)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-relaxation.ll (+1-4)
  • (added) llvm/test/CodeGen/AMDGPU/uniform_branch_with_floating_point_cond.ll (+97)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
index c0e01a020e0eb9..97d21fb80d3dac 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
@@ -2389,13 +2389,15 @@ bool AMDGPUDAGToDAGISel::isCBranchSCC(const SDNode *N) const {
   if (VT == MVT::i32)
     return true;
 
+  const auto *ST = static_cast<const GCNSubtarget *>(Subtarget);
   if (VT == MVT::i64) {
-    const auto *ST = static_cast<const GCNSubtarget *>(Subtarget);
-
     ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
     return (CC == ISD::SETEQ || CC == ISD::SETNE) && ST->hasScalarCompareEq64();
   }
 
+  if ((VT == MVT::f32 || VT == MVT::f64) && ST->hasSALUFloatInsts())
+    return true;
+
   return false;
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
index 1d984bd49756e0..ff47c865c67e65 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
@@ -297,10 +297,7 @@ define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(ptr
 ; GFX12-NEXT:    s_load_b32 s0, s[4:5], 0x2c
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    s_cmp_eq_f32 s0, 0
-; GFX12-NEXT:    s_cselect_b32 s1, -1, 0
-; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX12-NEXT:    s_and_b32 vcc_lo, exec_lo, s1
-; GFX12-NEXT:    s_cbranch_vccz .LBB2_1
+; GFX12-NEXT:    s_cbranch_scc0 .LBB2_1
 ; GFX12-NEXT:  ; %bb.3: ; %bb0
 ; GFX12-NEXT:    s_getpc_b64 s[2:3]
 ; GFX12-NEXT:  .Lpost_getpc2:
diff --git a/llvm/test/CodeGen/AMDGPU/uniform_branch_with_floating_point_cond.ll b/llvm/test/CodeGen/AMDGPU/uniform_branch_with_floating_point_cond.ll
new file mode 100644
index 00000000000000..4cf1c2af55b7e9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/uniform_branch_with_floating_point_cond.ll
@@ -0,0 +1,97 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs  -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s
+
+@external_constant = external addrspace(4) constant i32, align 4
+@const.ptr = external addrspace(4) constant ptr, align 4
+
+define void @test() {
+  ; CHECK-LABEL: name: test
+  ; CHECK: bb.0.entry:
+  ; CHECK-NEXT:   successors: %bb.1(0x30000000), %bb.3(0x50000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[SI_PC_ADD_REL_OFFSET:%[0-9]+]]:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-gotprel32-lo) @external_constant, target-flags(amdgpu-gotprel32-hi) @external_constant, implicit-def dead $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM killed [[SI_PC_ADD_REL_OFFSET]], 0, 0 :: (dereferenceable invariant load (s64) from got, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORD_IMM:%[0-9]+]]:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed [[S_LOAD_DWORDX2_IMM]], 0, 0 :: (dereferenceable invariant load (s32) from @external_constant, addrspace 4)
+  ; CHECK-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sgpr_32 = S_MOV_B32 0
+  ; CHECK-NEXT:   nofpexcept S_CMP_LG_F32 killed [[S_LOAD_DWORD_IMM]], killed [[S_MOV_B32_]], implicit-def $scc, implicit $mode
+  ; CHECK-NEXT:   S_CBRANCH_SCC1 %bb.3, implicit $scc
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1.bb1:
+  ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.4(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[SI_PC_ADD_REL_OFFSET1:%[0-9]+]]:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-gotprel32-lo) @const.ptr, target-flags(amdgpu-gotprel32-hi) @const.ptr, implicit-def dead $scc
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM1:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM killed [[SI_PC_ADD_REL_OFFSET1]], 0, 0 :: (dereferenceable invariant load (s64) from got, addrspace 4)
+  ; CHECK-NEXT:   [[S_LOAD_DWORDX2_IMM2:%[0-9]+]]:sreg_64_xexec_xnull = S_LOAD_DWORDX2_IMM killed [[S_LOAD_DWORDX2_IMM1]], 0, 0 :: (invariant load (s64) from @const.ptr, addrspace 4)
+  ; CHECK-NEXT:   [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+  ; CHECK-NEXT:   [[GLOBAL_LOAD_DWORD_SADDR:%[0-9]+]]:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR killed [[S_LOAD_DWORDX2_IMM2]], killed [[V_MOV_B32_e32_]], 0, 0, implicit $exec :: (load (s32) from %ir.0, addrspace 1)
+  ; CHECK-NEXT:   [[S_MOV_B32_1:%[0-9]+]]:sgpr_32 = S_MOV_B32 1092616192
+  ; CHECK-NEXT:   [[S_MOV_B32_2:%[0-9]+]]:sgpr_32 = S_MOV_B32 1065353216
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:sreg_32 = COPY [[GLOBAL_LOAD_DWORD_SADDR]]
+  ; CHECK-NEXT:   nofpexcept S_CMP_LT_F32 killed [[COPY]], killed [[S_MOV_B32_2]], implicit-def $scc, implicit $mode
+  ; CHECK-NEXT:   S_CBRANCH_SCC1 %bb.4, implicit $scc
+  ; CHECK-NEXT:   S_BRANCH %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2.bb2:
+  ; CHECK-NEXT:   successors: %bb.4(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[S_MOV_B32_3:%[0-9]+]]:sgpr_32 = S_MOV_B32 0
+  ; CHECK-NEXT:   S_BRANCH %bb.4
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3.Flow1:
+  ; CHECK-NEXT:   successors: %bb.7(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   S_BRANCH %bb.7
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4.bb3:
+  ; CHECK-NEXT:   successors: %bb.5(0x50000000), %bb.6(0x30000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:sgpr_32 = PHI [[S_MOV_B32_1]], %bb.1, [[S_MOV_B32_3]], %bb.2
+  ; CHECK-NEXT:   [[S_MOV_B32_4:%[0-9]+]]:sgpr_32 = S_MOV_B32 0
+  ; CHECK-NEXT:   nofpexcept S_CMP_NEQ_F32 [[PHI]], killed [[S_MOV_B32_4]], implicit-def $scc, implicit $mode
+  ; CHECK-NEXT:   S_CBRANCH_SCC1 %bb.6, implicit $scc
+  ; CHECK-NEXT:   S_BRANCH %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5.bb4:
+  ; CHECK-NEXT:   successors: %bb.6(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1082130432, implicit $exec
+  ; CHECK-NEXT:   [[DEF:%[0-9]+]]:sreg_32_xexec_hi = IMPLICIT_DEF
+  ; CHECK-NEXT:   SCRATCH_STORE_DWORD_SADDR killed [[V_MOV_B32_e32_1]], killed [[DEF]], 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into `ptr addrspace(5) undef`, addrspace 5)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.6.Flow:
+  ; CHECK-NEXT:   successors: %bb.3(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   S_BRANCH %bb.3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.7.bb5:
+  ; CHECK-NEXT:   SI_RETURN
+entry:
+  %ld1 = load float, ptr addrspace(4) @external_constant
+  %cmp1 = fcmp one float %ld1, 0.0
+  br i1 %cmp1, label %bb5, label %bb1, !amdgpu.uniform !0
+
+bb1:
+  %ptr = load ptr, ptr addrspace(4) @const.ptr
+  %ld2 = load float, ptr %ptr, align 4
+  %cmp2 = fcmp olt float %ld2, 1.0
+  %or = or i1 %cmp2, false
+  br i1 %or, label %bb3, label %bb2, !amdgpu.uniform !0
+
+bb2:
+  br label %bb3
+
+bb3:
+  %phi = phi float [ 10.0, %bb1 ], [ 0.0, %bb2 ]
+  %cmp3 = fcmp oeq float %phi, 0.0
+  br i1 %cmp3, label %bb4, label %bb5, !amdgpu.uniform !0
+
+bb4:
+  store float 4.0, ptr addrspace(5) undef, align 4
+  br label %bb5
+
+bb5:
+  ret void
+}
+
+!0 = !{}

@kmitropoulou kmitropoulou requested a review from dstutt December 19, 2024 15:07
@kmitropoulou kmitropoulou changed the title from "uniform branch floating point condition" to "[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions." on Dec 19, 2024

github-actions bot commented Dec 19, 2024

✅ With the latest revision this PR passed the undef deprecator.

@kmitropoulou kmitropoulou force-pushed the uniform_branch_floating_point_condition branch 2 times, most recently from 8d3d8d0 to 2995c8c on December 19, 2024 16:34
@kmitropoulou kmitropoulou requested a review from jayfoad December 19, 2024 16:36
}

if ((VT == MVT::f16 || VT == MVT::f32) && ST->hasSALUFloatInsts())
Contributor

Suggested change
if ((VT == MVT::f16 || VT == MVT::f32) && ST->hasSALUFloatInsts())
if ((VT == MVT::f16 || VT == MVT::f32) && Subtarget->hasSALUFloatInsts())

Contributor Author

Thank you, Jay :) I forgot to rebase the patch. I just uploaded a new version.
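
For readers following the thread, the decision being reviewed can be modeled outside LLVM. The sketch below is a standalone approximation that covers only the lines visible in the diff context plus the suggested change above. `ValueType`, `CondCode`, `SubtargetFeatures`, and `canBranchOnSCC` are hypothetical stand-ins, not LLVM classes or functions, and the real `isCBranchSCC` may perform checks that the diff does not show.

```cpp
// Hypothetical stand-ins for the value types and condition codes involved.
enum class ValueType { i32, i64, f16, f32, f64 };
enum class CondCode { EQ, NE, LT, GT };

// Hypothetical stand-in for the two subtarget queries used in the diff.
struct SubtargetFeatures {
  bool HasScalarCompareEq64;
  bool HasSALUFloatInsts;
};

// Returns true when a uniform branch on a compare of type VT can read SCC
// directly, mirroring the visible structure of the reviewed predicate.
constexpr bool canBranchOnSCC(ValueType VT, CondCode CC, SubtargetFeatures ST) {
  if (VT == ValueType::i32)
    return true;

  // 64-bit integer compares qualify only for equality tests, and only when
  // the subtarget provides the 64-bit scalar compare.
  if (VT == ValueType::i64)
    return (CC == CondCode::EQ || CC == CondCode::NE) &&
           ST.HasScalarCompareEq64;

  // The condition under review: f16/f32 compares qualify when the subtarget
  // has scalar floating-point instructions.
  if ((VT == ValueType::f16 || VT == ValueType::f32) && ST.HasSALUFloatInsts)
    return true;

  return false;
}
```

In this model an `f32` compare qualifies only when `HasSALUFloatInsts` is set, and `f64` never qualifies, which matches the `f16`/`f32` condition quoted in this thread rather than the `f32`/`f64` condition in the first posted diff.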

@kmitropoulou kmitropoulou force-pushed the uniform_branch_floating_point_condition branch from 2995c8c to 699bc18 on December 19, 2024 16:43
@kmitropoulou kmitropoulou requested a review from jayfoad December 19, 2024 16:45
@@ -0,0 +1,100 @@
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s
Contributor

Suggested change
; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s

Contributor Author

Done

@@ -0,0 +1,100 @@
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s
Contributor

Why do you need 2>&1?

Contributor Author

I removed it. Thank you :)

@kmitropoulou kmitropoulou force-pushed the uniform_branch_floating_point_condition branch from 699bc18 to 0dc2a0a on December 19, 2024 16:51
@kmitropoulou kmitropoulou requested a review from jayfoad December 19, 2024 16:52
Contributor

@jayfoad jayfoad left a comment

LGTM, thanks.

@kmitropoulou kmitropoulou merged commit d3508cc into llvm:main Dec 19, 2024
8 checks passed
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Feb 7, 2025
…lvm#120588)"

This reverts commit d3508cc.

Change-Id: Idc3b9497c81779055fe226a2705bcbe25cd70889
qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Feb 7, 2025
Local branch amd-gfx d71fa76 Revert "[AMDGPU] Emit S_CBRANCH_SCC for floating-point conditions. (llvm#120588)"
Remote branch main 0fa59c6 [llvm][Docs] Update supported hardware (llvm#121743)

Change-Id: Ic39049333a827f4a1840c385d1d6ce004af4bd64