AMDGPU: Add subtarget feature for global atomic fadd denormal support #96443
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes: Not sure what the behavior for gfx90a is. The SPG says it always flushes.
Full diff: https://github.com/llvm/llvm-project/pull/96443.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7ff861f5b144d..5f798b4391704 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -788,6 +788,13 @@ def FeatureFlatAtomicFaddF32Inst
"Has flat_atomic_add_f32 instruction"
>;
+def FeatureMemoryAtomicFaddF32DenormalSupport
+ : SubtargetFeature<"memory-atomic-fadd-f32-denormal-support",
+ "HasAtomicMemoryAtomicFaddF32DenormalSupport",
+ "true",
+ "global/flat/buffer atomic fadd for float supports denormal handling"
+>;
+
def FeatureAgentScopeFineGrainedRemoteMemoryAtomics
: SubtargetFeature<"agent-scope-fine-grained-remote-memory-atomics",
"HasAgentScopeFineGrainedRemoteMemoryAtomics",
@@ -1425,7 +1432,8 @@ def FeatureISAVersion9_4_Common : FeatureSet<
FeatureKernargPreload,
FeatureAtomicFMinFMaxF64GlobalInsts,
FeatureAtomicFMinFMaxF64FlatInsts,
- FeatureAgentScopeFineGrainedRemoteMemoryAtomics
+ FeatureAgentScopeFineGrainedRemoteMemoryAtomics,
+ FeatureMemoryAtomicFaddF32DenormalSupport
]>;
def FeatureISAVersion9_4_0 : FeatureSet<
@@ -1628,7 +1636,8 @@ def FeatureISAVersion12 : FeatureSet<
FeatureVGPRSingleUseHintInsts,
FeatureScalarDwordx3Loads,
FeatureDPPSrc1SGPR,
- FeatureMaxHardClauseLength32]>;
+ FeatureMaxHardClauseLength32,
+ FeatureMemoryAtomicFaddF32DenormalSupport]>;
def FeatureISAVersion12_Generic: FeatureSet<
!listconcat(FeatureISAVersion12.Features,
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.h b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
index c40efbdcf7f0b..674d84422538f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.h
@@ -167,6 +167,7 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool HasAtomicFlatPkAdd16Insts = false;
bool HasAtomicFaddRtnInsts = false;
bool HasAtomicFaddNoRtnInsts = false;
+ bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;
bool HasAtomicBufferGlobalPkAddF16NoRtnInsts = false;
bool HasAtomicBufferGlobalPkAddF16Insts = false;
bool HasAtomicCSubNoRtnInsts = false;
@@ -872,6 +873,12 @@ class GCNSubtarget final : public AMDGPUGenSubtargetInfo,
bool hasFlatAtomicFaddF32Inst() const { return HasFlatAtomicFaddF32Inst; }
+ /// \return true if the target's flat, global, and buffer atomic fadd for
+ /// float supports denormal handling.
+ bool hasMemoryAtomicFaddF32DenormalSupport() const {
+ return HasAtomicMemoryAtomicFaddF32DenormalSupport;
+ }
+
/// \return true if atomic operations targeting fine-grained memory work
/// correctly at device scope, in allocations in host or peer PCIe device
/// memory.
It is worse than that. It behaves differently depending on where the atomic is executed. There is no single answer as to whether this instruction supports denormals.
That doesn't matter. The flat case that sometimes flushes is just a no. Flushing is never guaranteed; we only need to know that a flush may happen.
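The "may flush means no" rule above can be sketched as a conservative lowering decision. This is only an illustration, not the code in the patch: `MockSubtarget`, `FAddLowering`, and `chooseF32AtomicFAddLowering` are hypothetical names, and only the getter name is taken from the diff. It assumes the compiler knows whether the enclosing function tolerates flushed f32 denormals.

```cpp
// Hypothetical stand-in for GCNSubtarget. Only the field and getter names
// mirror the patch; everything else here is an assumption for illustration.
struct MockSubtarget {
  bool HasAtomicMemoryAtomicFaddF32DenormalSupport = false;

  bool hasMemoryAtomicFaddF32DenormalSupport() const {
    return HasAtomicMemoryAtomicFaddF32DenormalSupport;
  }
};

// Hypothetical lowering choices for an f32 atomic fadd.
enum class FAddLowering { HardwareAtomic, CmpXchgLoop };

// Hypothetical helper: pick a lowering for an f32 atomic fadd.
// If the function already tolerates flushed denormals, the hardware
// instruction is acceptable whether or not it flushes. Otherwise the
// hardware instruction is only used when the subtarget reports that it
// handles denormals; an instruction that sometimes flushes counts as "no".
FAddLowering chooseF32AtomicFAddLowering(const MockSubtarget &ST,
                                         bool FunctionFlushesF32Denormals) {
  if (FunctionFlushesF32Denormals)
    return FAddLowering::HardwareAtomic;
  return ST.hasMemoryAtomicFaddF32DenormalSupport()
             ? FAddLowering::HardwareAtomic
             : FAddLowering::CmpXchgLoop;
}
```

With the feature left at its default of `false`, a function that needs denormals preserved gets the fallback loop. Setting the feature, as the diff does for the gfx940 family and gfx12, would allow the hardware instruction in this sketch.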
Not sure what the behavior for gfx90a is. The SPG says it always flushes. The instruction documentation says it does not.
The RDNA 3 manual says "Floating-point addition handles NAN/INF/denorm", though I'm not sure I trust it.
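For readers unfamiliar with the distinction being discussed, here is a minimal host-side sketch of what "flushing" means for a float add. It is a model, not a description of any GPU: the helper names are hypothetical, and flushing both the inputs and the result is an assumption, since real hardware may flush only one of them.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: replace a subnormal (denormal) float with a zero of
// the same sign; leave every other value unchanged.
float flushDenormalToZero(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// Hypothetical model of an add that flushes denormals on both inputs and
// on the result (an assumption; hardware behavior may differ).
float faddFlushing(float A, float B) {
  return flushDenormalToZero(flushDenormalToZero(A) + flushDenormalToZero(B));
}
```

Under this model, adding two copies of `std::numeric_limits<float>::denorm_min()` yields zero, whereas an add that handles denormals would return a nonzero subnormal. That difference is what the subtarget feature is meant to capture.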