[Offload] Implement event sync in amdgpu #149300

RossBrunton · 2025-07-17T12:47:13Z

No description provided.

llvmbot · 2025-07-17T12:47:45Z

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Ross Brunton (RossBrunton)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/149300.diff

3 Files Affected:

(modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+50-2)
(modified) offload/unittests/OffloadAPI/common/Fixtures.hpp (-3)
(modified) offload/unittests/OffloadAPI/event/olWaitEvent.cpp (-3)

diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 12c7cc62905c9..b2fd950c9d500 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -1115,6 +1115,18 @@ struct AMDGPUStreamTy {
     return Plugin::success();
   }
 
+  /// Complete pending post actions until and including the event in target
+  /// slot.
+  Error completeUntil(uint32_t TargetSlot) {
+    for (uint32_t Slot = 0; Slot <= TargetSlot; ++Slot) {
+      // Take the post action of the operation if any.
+      if (auto Err = Slots[Slot].performAction())
+        return Err;
+    }
+
+    return Plugin::success();
+  }
+
   /// Make the current stream wait on a specific operation of another stream.
   /// The idea is to make the current stream waiting on two signals: 1) the last
   /// signal of the current stream, and 2) the last signal of the other stream.
@@ -1502,6 +1514,11 @@ struct AMDGPUStreamTy {
     return complete();
   }
 
+  /// Synchronize the stream until the given event. The current thread waits
+  /// until the provided event is finalized, and it performs the pending post
+  /// actions for that and prior events.
+  Error synchronizeOn(AMDGPUEventTy &Event);
+
   /// Query the stream and complete pending post actions if operations finished.
   /// Return whether all the operations completed. This operation does not block
   /// the calling thread.
@@ -1575,6 +1592,21 @@ struct AMDGPUEventTy {
     return Stream.waitEvent(*this);
   }
 
+  Error sync() {
+    std::lock_guard<std::mutex> Lock(Mutex);
+
+    if (!RecordedStream)
+      return Plugin::error(ErrorCode::INVALID_ARGUMENT,
+                           "event does not have any recorded stream");
+
+    // No need to wait on anything, the recorded stream already finished the
+    // corresponding operation.
+    if (RecordedSlot < 0)
+      return Plugin::success();
+
+    return RecordedStream->synchronizeOn(*this);
+  }
+
 protected:
   /// The stream registered in this event.
   AMDGPUStreamTy *RecordedStream;
@@ -1630,6 +1662,22 @@ Error AMDGPUStreamTy::waitEvent(const AMDGPUEventTy &Event) {
   return waitOnStreamOperation(RecordedStream, Event.RecordedSlot);
 }
 
+Error AMDGPUStreamTy::synchronizeOn(AMDGPUEventTy &Event) {
+  std::lock_guard<std::mutex> Lock(Mutex);
+
+  // Wait until the requested slot has completed
+  if (auto Err = Slots[Event.RecordedSlot].Signal->wait(
+          StreamBusyWaitMicroseconds, &Device))
+    return Err;
+
+  // If the event is the last one in the stream, just do a full finalize
+  if (Event.RecordedSlot == last())
+    return complete();
+
+  // Otherwise, only finalize until the appropriate event
+  return completeUntil(Event.RecordedSlot);
+}
+
 struct AMDGPUStreamManagerTy final
     : GenericDeviceResourceManagerTy<AMDGPUResourceRef<AMDGPUStreamTy>> {
   using ResourceRef = AMDGPUResourceRef<AMDGPUStreamTy>;
@@ -2540,8 +2588,8 @@ struct AMDGPUDeviceTy : public GenericDeviceTy, AMDGenericDeviceTy {
 
   /// Synchronize the current thread with the event.
   Error syncEventImpl(void *EventPtr) override {
-    return Plugin::error(ErrorCode::UNIMPLEMENTED,
-                         "synchronize event not implemented");
+    AMDGPUEventTy *Event = reinterpret_cast<AMDGPUEventTy *>(EventPtr);
+    return Event->sync();
   }
 
   /// Print information about the device.
diff --git a/offload/unittests/OffloadAPI/common/Fixtures.hpp b/offload/unittests/OffloadAPI/common/Fixtures.hpp
index e5d815ecda965..546921164f691 100644
--- a/offload/unittests/OffloadAPI/common/Fixtures.hpp
+++ b/offload/unittests/OffloadAPI/common/Fixtures.hpp
@@ -171,9 +171,6 @@ struct OffloadQueueTest : OffloadDeviceTest {
 struct OffloadEventTest : OffloadQueueTest {
   void SetUp() override {
     RETURN_ON_FATAL_FAILURE(OffloadQueueTest::SetUp());
-    if (getPlatformBackend() == OL_PLATFORM_BACKEND_AMDGPU)
-      GTEST_SKIP() << "AMDGPU synchronize event not implemented";
-
     // Get an event from a memcpy. We can still use it in olGetEventInfo etc
     // after it has been waited on.
     void *Alloc;
diff --git a/offload/unittests/OffloadAPI/event/olWaitEvent.cpp b/offload/unittests/OffloadAPI/event/olWaitEvent.cpp
index 05356d4ef8d75..f80dabb4fc93f 100644
--- a/offload/unittests/OffloadAPI/event/olWaitEvent.cpp
+++ b/offload/unittests/OffloadAPI/event/olWaitEvent.cpp
@@ -14,9 +14,6 @@ using olWaitEventTest = OffloadQueueTest;
 OFFLOAD_TESTS_INSTANTIATE_DEVICE_FIXTURE(olWaitEventTest);
 
 TEST_P(olWaitEventTest, Success) {
-  if (getPlatformBackend() == OL_PLATFORM_BACKEND_AMDGPU)
-    GTEST_SKIP() << "AMDGPU synchronize event not implemented";
-
   uint32_t Src = 42;
   void *DstPtr;

RossBrunton · 2025-07-17T12:50:40Z

@jhuber6 I know you said you were going to look into this, but I had a quick look and it seems that all of the machinery was already there, and it just needed plumbing in. Hopefully this is all correct.

jhuber6 · 2025-07-17T13:26:49Z

offload/plugins-nextgen/amdgpu/src/rtl.cpp

  }

+  Error sync() {
+    std::lock_guard<std::mutex> Lock(Mutex);


Can you explain the need for these mutexes? I'm just a little concerned when we have mutexes combined with functions that handle callbacks.

I cribbed this function from wait above, which does a similar thing and also protects the event with a mutex. But looking at them both closely, I'm not sure why either need to be guarded. The only way I can see it being required is if you wait/sync on an event that is in the middle of a record... But I'm not sure if that can ever happen. If we do remove the mutex, I think it makes sense to remove it entirely and as a separate change just to make it easier to track down if it does cause issues.

For synchronizeOn, I think it is required because another thread (or callback handler) could add, synchronise, wait or finalize an event while synchronizeOn is running.

Just had a quick look. Libompt allows you to "record" events after they are constructed, which I assume can also happen while you wait/sync on them, so I think the mutex is required.

jhuber6

Yeah, also this is reminding me that we really need some unit tests that stress this stuff in parallel.

[Offload] Implement event sync in amdgpu

e4d1370

llvmbot added backend:AMDGPU offload labels Jul 17, 2025

RossBrunton requested review from callumfare and jhuber6 July 17, 2025 12:49

jhuber6 reviewed Jul 17, 2025

View reviewed changes

jhuber6 approved these changes Jul 17, 2025

View reviewed changes

RossBrunton merged commit df9a864 into llvm:main Jul 18, 2025
12 checks passed

This was referenced Jul 23, 2025

test abhinavgaba/llvm-project#2

Closed

Add dataFence plugin interface abhinavgaba/llvm-project#3

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Offload] Implement event sync in amdgpu #149300

[Offload] Implement event sync in amdgpu #149300

Uh oh!

RossBrunton commented Jul 17, 2025

Uh oh!

llvmbot commented Jul 17, 2025 •

edited

Loading

Uh oh!

RossBrunton commented Jul 17, 2025

Uh oh!

jhuber6 Jul 17, 2025

Uh oh!

RossBrunton Jul 17, 2025

Uh oh!

RossBrunton Jul 18, 2025

Uh oh!

jhuber6 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[Offload] Implement event sync in amdgpu #149300

[Offload] Implement event sync in amdgpu #149300

Uh oh!

Conversation

RossBrunton commented Jul 17, 2025

Uh oh!

llvmbot commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RossBrunton commented Jul 17, 2025

Uh oh!

jhuber6 Jul 17, 2025

Choose a reason for hiding this comment

Uh oh!

RossBrunton Jul 17, 2025

Choose a reason for hiding this comment

Uh oh!

RossBrunton Jul 18, 2025

Choose a reason for hiding this comment

Uh oh!

jhuber6 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

llvmbot commented Jul 17, 2025 •

edited

Loading