AMDGPU: Add subtarget feature for global atomic fadd denormal support #96443

arsenm · 2024-06-23T20:18:19Z

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.
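
For readers unfamiliar with the distinction at stake, here is a minimal host-side sketch of what "flushes" means for an f32 add. It is an illustration only: `flushDenormal`, `addPreserve`, and `addFlush` are hypothetical helpers, and flushing both the inputs and the result is an assumption about how a flushing unit might behave, not a statement about any particular GPU.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: model a flush-to-zero unit by replacing a subnormal
// value with a zero of the same sign. All other values pass through.
static float flushDenormal(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Denormal-preserving add: subnormal inputs and results are kept as-is.
static float addPreserve(float A, float B) { return A + B; }

// Flushing add (assumed model): inputs and the result are flushed.
static float addFlush(float A, float B) {
  return flushDenormal(flushDenormal(A) + flushDenormal(B));
}

// Smallest positive subnormal float, used as a convenient test value.
static float smallestDenormal() {
  return std::numeric_limits<float>::denorm_min();
}
```

With the default host floating-point environment, `addPreserve(smallestDenormal(), smallestDenormal())` is a nonzero subnormal, while `addFlush` of the same inputs is zero. That difference is what the new feature bit is meant to describe for memory atomics.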

llvmbot · 2024-06-23T20:20:15Z

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.

Full diff: https://github.com/llvm/llvm-project/pull/96443.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+11-2)
(modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+7)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7ff861f5b144d..5f798b4391704 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureMemoryAtomicFaddF32DenormalSupport
+  : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support",
+  "HasAtomicMemoryAtomicFaddF32DenormalSupport",
+  "true",
+  "global/flat/buffer atomic fadd for float supports denormal handling"
+>;
+
 def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
   : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
   "HasAgentScopeFineGrainedRemoteMemoryAtomics",
@@ -1425,7 +1432,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
    FeatureKernargPreload,
    FeatureAtomicFMinFMaxF64GlobalInsts,
    FeatureAtomicFMinFMaxF64FlatInsts,
-   FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+   FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
+   FeatureMemoryAtomicFaddF32DenormalSupport
    ]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
@@ -1628,7 +1636,8 @@ def FeatureISAVersion12 : FeatureSet<
    FeatureVGPRSingleUseHintInsts,
    FeatureScalarDwordx3Loads,
    FeatureDPPSrc1SGPR,
-   FeatureMaxHardClauseLength32]>;
+   FeatureMaxHardClauseLength32,
+   FeatureMemoryAtomicFaddF32DenormalSupport]>;
 
 def FeatureISAVersion12_Generic: FeatureSet<
   !listconcat(FeatureISAVersion12.Features,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index c40efbdcf7f0b..674d84422538f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasAtomicFlatPkAdd16Insts = false;
   bool HasAtomicFaddRtnInsts = false;
   bool HasAtomicFaddNoRtnInsts = false;
+  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;
   bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false;
   bool HasAtomicBufferGlobalPkAddF16Insts = false;
   bool HasAtomicCSubNoRtnInsts = false;
@@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
 
+  /// \return true if the target's flat, global, and buffer atomic fadd for
+  /// float supports denormal handling.
+  bool hasMemoryAtomicFaddF32DenormalSupport() const {
+    return HasAtomicMemoryAtomicFaddF32DenormalSupport;
+  }
+
   /// \return true if atomic operations targeting fine-grained memory work
   /// correctly at device scope, in allocations in host or peer PCIe device
   /// memory.
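
The diff above ties a feature name to a bool field that defaults to `false` and is set to `"true"` when the feature is enabled. As a rough sketch of that mapping, assuming the usual comma-separated `+name`/`-name` feature-string form, the following mock shows the end-to-end effect. `MockSubtarget` is a hypothetical stand-in: the real `GCNSubtarget` is filled in by TableGen-generated code, not by a hand-written parser like this one.

```cpp
#include <sstream>
#include <string>

// Hypothetical, heavily simplified stand-in for GCNSubtarget.
class MockSubtarget {
  // Mirrors the field added in the diff; off unless the feature is enabled.
  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;

public:
  // Parse a comma-separated feature string such as "+foo,-bar".
  // Later entries override earlier ones.
  explicit MockSubtarget(const std::string &Features) {
    std::stringstream SS(Features);
    std::string Item;
    while (std::getline(SS, Item, ',')) {
      // Skip empty or malformed entries that lack a +/- prefix.
      if (Item.size() < 2 || (Item[0] != '+' && Item[0] != '-'))
        continue;
      if (Item.substr(1) == "memory-atomic-fadd-f32-denormal-support")
        HasAtomicMemoryAtomicFaddF32DenormalSupport = (Item[0] == '+');
    }
  }

  // Same accessor name as in the diff.
  bool hasMemoryAtomicFaddF32DenormalSupport() const {
    return HasAtomicMemoryAtomicFaddF32DenormalSupport;
  }
};
```

For example, `MockSubtarget("+memory-atomic-fadd-f32-denormal-support")` reports support, while a default-constructed feature string (`""`) does not, which matches targets whose feature sets were not extended in this patch.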

rampitec · 2024-06-24T17:39:12Z

It is worse than that. It behaves differently depending on where the atomic is executed. There is no single answer as to whether this instruction supports denorms or not.

arsenm · 2024-06-24T20:18:33Z

> It is worse than that. It behaves differently depending on where the atomic is executed. There is no single answer as to whether this instruction supports denorms or not.

That doesn't matter. The flat case that sometimes flushes is just a no. Flushing is never a guarantee; we only need to know that a flush may happen.
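
The reasoning in this reply can be sketched as a conservative boolean: if any path may flush, the target is treated as not supporting denormals. The code below is an illustration under stated assumptions. `ObservedBehavior`, `F32DenormMode`, `denormalSupport`, and `canUseHardwareAtomicFAdd` are hypothetical names, and the way a lowering would consume the feature is assumed here, since this pull request only adds the feature bit and its accessor.

```cpp
// Hypothetical description of what a target's atomic fadd may do on each
// kind of memory access. "May flush" includes "flushes only sometimes".
struct ObservedBehavior {
  bool GlobalMayFlush;
  bool FlatMayFlush;
  bool BufferMayFlush;
};

// Hypothetical view of the requested f32 denormal handling.
enum class F32DenormMode {
  Preserve,    // denormals must be handled
  FlushAllowed // flushing denormals is acceptable
};

// The feature is only true when no path may flush. A path that flushes
// only in some situations still counts as "may flush".
constexpr bool denormalSupport(const ObservedBehavior &B) {
  return !(B.GlobalMayFlush || B.FlatMayFlush || B.BufferMayFlush);
}

// Assumed consumer: the hardware instruction is acceptable if flushing is
// allowed anyway, or if the target is known never to flush.
constexpr bool canUseHardwareAtomicFAdd(bool HasDenormalSupport,
                                        F32DenormMode Mode) {
  return Mode == F32DenormMode::FlushAllowed || HasDenormalSupport;
}
```

Because flushing is never something code can rely on, a single conservative bit is enough: a false value costs at most a slower fallback, while a true value on a target that sometimes flushes would give wrong results when denormals must be preserved.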

arsenm · 2024-07-10T12:37:35Z

Merge activity

Jul 10, 8:37 AM EDT: @arsenm started a stack merge that includes this pull request via Graphite.
Jul 10, 8:45 AM EDT: Graphite rebased this pull request as part of a merge.
Jul 10, 8:47 AM EDT: @arsenm merged this pull request with Graphite.

This was referenced Jun 23, 2024

AMDGPU: Legalize v2f16 atomicrmw fadd for buffer fat pointers #95929

Merged

AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 #95930

Merged

This was referenced Jun 23, 2024

AMDGPU: Add a subtarget feature for fine-grained remote memory support #96442

Merged

AMDGPU: Add subtarget feature for memory atomic fadd f64 #96444

Merged

arsenm added the backend:AMDGPU label Jun 23, 2024 — with Graphite App

arsenm requested review from AlexVlx, jayfoad, Pierre-vh, rampitec, Sisyph, yashssh and yxsamliu June 23, 2024 20:20

arsenm marked this pull request as ready for review June 23, 2024 20:20

rampitec approved these changes Jun 24, 2024

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 2da0565 to 1a441c0 June 25, 2024 09:10

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 4594135 to 3ec4e64 June 25, 2024 09:10

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 1a441c0 to 302a99a June 25, 2024 22:32

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 3ec4e64 to 47017c2 June 25, 2024 22:32

jayfoad reviewed Jun 26, 2024

Commented on llvm/lib/Target/AMDGPU/GCNSubtarget.h (outdated, resolved)

This was referenced Jun 26, 2024

AMDGPU: Handle remote/fine-grained memory in atomicrmw fmin/fmax lowering #96759

Merged

AMDGPU: Handle new atomicrmw metadata for fadd case #96760

Merged

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 302a99a to 10c0aec June 27, 2024 07:46

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 23ec97c to b57b67e June 27, 2024 07:47

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 10c0aec to 81cc1b7 June 27, 2024 09:10

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from b57b67e to 5a62792 June 27, 2024 09:10

arsenm mentioned this pull request Jun 27, 2024

clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} #96872

Merged

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 81cc1b7 to 438d5bb June 27, 2024 14:28

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 5a62792 to 1e3c134 June 27, 2024 14:28

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 438d5bb to 53a120c June 28, 2024 12:40

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 1e3c134 to ab52788 June 28, 2024 12:41

This was referenced Jun 28, 2024

AMDGPU/GlobalISel: Legalize atomicrmw fmin/fmax #97048

Merged

AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics #97050

Merged

AMDGPU: Remove global/flat atomic fadd intrinics #97051

Merged

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 53a120c to 8a87e14 July 2, 2024 17:01

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from ab52788 to 1a5d8b8 July 2, 2024 17:01

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 8a87e14 to e0ae621 July 3, 2024 17:07

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 1a5d8b8 to 9cf93c6 July 3, 2024 17:07

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from e0ae621 to 1a6ff86 July 3, 2024 21:41

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 9cf93c6 to deebca2 July 3, 2024 21:41

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 1a6ff86 to a060a2a July 4, 2024 09:33

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from deebca2 to 573e7bc July 4, 2024 09:35

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from a060a2a to 76190f2 July 4, 2024 09:42

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 573e7bc to 5ef29a5 July 4, 2024 09:42

arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 76190f2 to 2b6a7bb July 10, 2024 12:40

Base automatically changed from users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics to main July 10, 2024 12:43

arsenm added 3 commits July 10, 2024 12:44

AMDGPU: Add subtarget feature for global atomic fadd denormal support

709e791

Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not.

Add to gfx11.

9d38cff

RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm", though I'm not sure I trust it.

Rename

43dc4f2

arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 5ef29a5 to 43dc4f2 July 10, 2024 12:45

arsenm merged commit 409815d into main Jul 10, 2024
4 of 6 checks passed

arsenm deleted the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch July 10, 2024 12:48

This was referenced Aug 2, 2024

IR/AMDGPU: Autoupgrade amdgpu-unsafe-fp-atomics attribute #101698

Merged

AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute #101699

Merged

arsenm mentioned this pull request Aug 22, 2024

AMDGPU: Remove flat/global fmin/fmax intrinsics #105642

Merged
