Skip to content

AMDGPU: Add subtarget feature for global atomic fadd denormal support #96443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jun 23, 2024

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.

@llvmbot
Copy link
Member

llvmbot commented Jun 23, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.


Full diff: https://github.com/llvm/llvm-project/pull/96443.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+11-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+7)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7ff861f5b144d..5f798b4391704 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst
   "Has flat_atomic_add_f32 instruction"
 >;
 
+def FeatureMemoryAtomicFaddF32DenormalSupport
+  : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support",
+  "HasAtomicMemoryAtomicFaddF32DenormalSupport",
+  "true",
+  "global/flat/buffer atomic fadd for float supports denormal handling"
+>;
+
 def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
   : SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
   "HasAgentScopeFineGrainedRemoteMemoryAtomics",
@@ -1425,7 +1432,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
    FeatureKernargPreload,
    FeatureAtomicFMinFMaxF64GlobalInsts,
    FeatureAtomicFMinFMaxF64FlatInsts,
-   FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+   FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
+   FeatureMemoryAtomicFaddF32DenormalSupport
    ]>;
 
 def FeatureISAVersion9_4_0 : FeatureSet<
@@ -1628,7 +1636,8 @@ def FeatureISAVersion12 : FeatureSet<
    FeatureVGPRSingleUseHintInsts,
    FeatureScalarDwordx3Loads,
    FeatureDPPSrc1SGPR,
-   FeatureMaxHardClauseLength32]>;
+   FeatureMaxHardClauseLength32,
+   FeatureMemoryAtomicFaddF32DenormalSupport]>;
 
 def FeatureISAVersion12_Generic: FeatureSet<
   !listconcat(FeatureISAVersion12.Features,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index c40efbdcf7f0b..674d84422538f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
   bool HasAtomicFlatPkAdd16Insts = false;
   bool HasAtomicFaddRtnInsts = false;
   bool HasAtomicFaddNoRtnInsts = false;
+  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;
   bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false;
   bool HasAtomicBufferGlobalPkAddF16Insts = false;
   bool HasAtomicCSubNoRtnInsts = false;
@@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
 
   bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
 
+  /// \return true if the target's flat, global, and buffer atomic fadd for
+  /// float supports denormal handling.
+  bool hasMemoryAtomicFaddF32DenormalSupport() const {
+    return HasAtomicMemoryAtomicFaddF32DenormalSupport;
+  }
+
   /// \return true if atomic operations targeting fine-grained memory work
   /// correctly at device scope, in allocations in host or peer PCIe device
   /// memory.

@arsenm arsenm marked this pull request as ready for review June 23, 2024 20:20
@rampitec
Copy link
Collaborator

It is worse than that. It behaves differently depending on where atomic is executed. There is no single answer if this instruction supports denorms or not.

@arsenm
Copy link
Contributor Author

arsenm commented Jun 24, 2024

It is worse than that. It behaves differently depending on where atomic is executed. There is no single answer if this instruction supports denorms or not.

That doesn't matter. The flat case that sometimes flushes is just a no. Flushing is never a guarantee, we only need to know a flush may happen

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 2da0565 to 1a441c0 Compare June 25, 2024 09:10
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 4594135 to 3ec4e64 Compare June 25, 2024 09:10
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 1a441c0 to 302a99a Compare June 25, 2024 22:32
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 3ec4e64 to 47017c2 Compare June 25, 2024 22:32
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 302a99a to 10c0aec Compare June 27, 2024 07:46
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 23ec97c to b57b67e Compare June 27, 2024 07:47
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 10c0aec to 81cc1b7 Compare June 27, 2024 09:10
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from b57b67e to 5a62792 Compare June 27, 2024 09:10
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 81cc1b7 to 438d5bb Compare June 27, 2024 14:28
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 5a62792 to 1e3c134 Compare June 27, 2024 14:28
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 438d5bb to 53a120c Compare June 28, 2024 12:40
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 1e3c134 to ab52788 Compare June 28, 2024 12:41
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 53a120c to 8a87e14 Compare July 2, 2024 17:01
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from ab52788 to 1a5d8b8 Compare July 2, 2024 17:01
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 8a87e14 to e0ae621 Compare July 3, 2024 17:07
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 1a5d8b8 to 9cf93c6 Compare July 3, 2024 17:07
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from e0ae621 to 1a6ff86 Compare July 3, 2024 21:41
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 9cf93c6 to deebca2 Compare July 3, 2024 21:41
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 1a6ff86 to a060a2a Compare July 4, 2024 09:33
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from deebca2 to 573e7bc Compare July 4, 2024 09:35
@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from a060a2a to 76190f2 Compare July 4, 2024 09:42
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 573e7bc to 5ef29a5 Compare July 4, 2024 09:42
Copy link
Contributor Author

arsenm commented Jul 10, 2024

Merge activity

  • Jul 10, 8:37 AM EDT: @arsenm started a stack merge that includes this pull request via Graphite.
  • Jul 10, 8:45 AM EDT: Graphite rebased this pull request as part of a merge.
  • Jul 10, 8:47 AM EDT: @arsenm merged this pull request with Graphite.

@arsenm arsenm force-pushed the users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics branch from 76190f2 to 2b6a7bb Compare July 10, 2024 12:40
Base automatically changed from users/arsenm/amdgpu-add-subtarget-feature-fine-grained-remote-memory-atomics to main July 10, 2024 12:43
arsenm added 3 commits July 10, 2024 12:44
Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.
RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm"
thought I'm not sure I trust it.
@arsenm arsenm force-pushed the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch from 5ef29a5 to 43dc4f2 Compare July 10, 2024 12:45
@arsenm arsenm merged commit 409815d into main Jul 10, 2024
4 of 6 checks passed
@arsenm arsenm deleted the users/arsenm/amdgpu-subtarget-feature-fadd-denormal-support branch July 10, 2024 12:48
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
…llvm#96443)

Not sure what the behavior for gfx90a is. The SPG says it always flushes.
The instruction documentation says it does not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants