[rocm6.4_internal_testing] [NAVI32] Skipped sdpa_2 test in test_aot_inductor for Navi32 (#1882)

iupaikov-amd · dnikolaev-amd · commit 123e7a93cbb5 · 2025-04-17T15:57:07.000Z
The test fails with the assertion error "Tensors are not close".

After testing, I can confirm that the failure is caused by eager mode execution on navi32 specifically during the test_sdpa_2 run. I cross-referenced navi31, navi32 and mi300: the AOTInductor results are identical across all three archs, and only eager mode on navi32 fails, with a 1.5% difference in tensor values from the gpu run. I assume this is due to fp16-32-16 conversions in eager mode, or to some missing navi32-specific if-statements.

A simple reproducer for checking the values from the cpu/gpu/eager/aoti runs is attached: [gfx1101_test_sdpa_2_issue_reproducer.txt](https://github.com/user-attachments/files/18676367/gfx1101_test_sdpa_2_issue_reproducer.txt)

(cherry picked from commit 896c789)
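Because the mismatch is limited to one architecture, the test is skipped only there rather than on ROCm as a whole. The sketch below illustrates how an arch-based skip of this kind could behave. It is not the implementation of `skipIfRocmArch`: the names `skip_if_arch` and `detect_gpu_arch` are hypothetical, the device query is stubbed out, and exact string matching against the arch name is an assumption. Only the `"gfx1101"` value comes from this change.

```python
import functools
import unittest

NAVI32_ARCH = "gfx1101"  # value taken from this change


def detect_gpu_arch():
    """Hypothetical stand-in for a real device query.

    Returns an architecture string, or None when no GPU is present.
    """
    return None


def skip_if_arch(*archs, detect=detect_gpu_arch):
    """Skip the decorated test when the detected arch is one of `archs`.

    Assumption: the detected string matches the listed names exactly.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            arch = detect()
            if arch is not None and arch in archs:
                raise unittest.SkipTest(f"test skipped on {arch}")
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator


class Example(unittest.TestCase):
    # Simulate running on navi32: the body never executes.
    @skip_if_arch(NAVI32_ARCH, detect=lambda: "gfx1101")
    def test_on_navi32(self):
        self.fail("should have been skipped")

    # Simulate running on a different arch: the test runs normally.
    @skip_if_arch(NAVI32_ARCH, detect=lambda: "gfx1100")
    def test_on_other_arch(self):
        self.assertTrue(True)
```

Passing the detection function as a parameter keeps the sketch runnable on machines without a GPU. A real decorator would query the device instead.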
diff --git a/test/inductor/test_aot_inductor.py b/test/inductor/test_aot_inductor.py
@@ -45,8 +45,10 @@
     IS_WINDOWS,
     parametrize,
     skipIfRocm,
+    skipIfRocmArch,
     skipIfXpu,
     TEST_WITH_ROCM,
+    NAVI32_ARCH,
 )
 from torch.testing._internal.custom_tensor import CustomTensorPlainOut
 from torch.testing._internal.inductor_utils import GPU_TYPE
@@ -1016,6 +1018,8 @@ def forward(self, q, k, v):
         )
         self.check_model(Model(), example_inputs)
 
+    # Eager mode produces incorrect tensor values for navi32 during this test
+    @skipIfRocmArch(NAVI32_ARCH)
     @unittest.skipIf(IS_FBCODE, "Not yet runnable in fbcode")
     @unittest.skipIf(not SM80OrLater, "bfloat16 only supported in sm80+")
     def test_sdpa_2(self):
diff --git a/torch/testing/_internal/common_utils.py b/torch/testing/_internal/common_utils.py
@@ -1349,6 +1349,8 @@ def printErrors(self) -> None:
 IS_ARM64 = platform.machine() in ('arm64', 'aarch64')
 IS_S390X = platform.machine() == "s390x"
 
+NAVI32_ARCH = "gfx1101"
+
 def is_navi_arch():
     if torch.cuda.is_available():
         prop = torch.cuda.get_device_properties(0)