-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[Offload] Add Error Codes to PluginInterface #138258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'm not sure if this is the exact design LLVM is looking for for error codes here, but it's probably worth having something to look at and discuss as a possible solution. Basically, we have both liboffload and PluginInterface wanting to share the same error codes (to avoid converting to and from them). However, we also want these error codes to be available in the OffloadAPI.h file directly so only one file needs to be shipped to users. Further compounding this is that OffloadAPI.h is a C header, so we can't use any of the std::error_code goodness. I've added a new tablegen target such that when offload is generating OffloadAPI.h, it also generates a OffloadErrcodes.inc file inside the plugins include dir (which ensures that they always stay in sync), which is checked into the repo like generated files. This is a bit jank, so if anyone knows of a simpler way, let me know. |
@llvm/pr-subscribers-offload Author: Ross Brunton (RossBrunton) ChangesA new ErrorCode enumeration is present in PluginInterface which can This enum must be kept up to sync with liboffload's ol_errc_t enum, so Some error codes have also been shuffled around to allow for future Depends on #137339 , please ignore first commit of this MR. Patch is 26.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138258.diff 17 Files Affected:
diff --git a/offload/liboffload/API/APIDefs.td b/offload/liboffload/API/APIDefs.td
index 640932dcf8464..38508525f1d26 100644
--- a/offload/liboffload/API/APIDefs.td
+++ b/offload/liboffload/API/APIDefs.td
@@ -152,9 +152,10 @@ class Function : APIObject {
AddHandleChecksToReturns<params, returns_with_def>.returns_out>.returns_out;
}
-class Etor<string Name, string Desc> {
+class Etor<string Name, string Desc, int Value=-1> {
string name = Name;
string desc = Desc;
+ int value = Value;
string tagged_type;
}
diff --git a/offload/liboffload/API/CMakeLists.txt b/offload/liboffload/API/CMakeLists.txt
index 8fd6cb539374a..62b8803ca2a13 100644
--- a/offload/liboffload/API/CMakeLists.txt
+++ b/offload/liboffload/API/CMakeLists.txt
@@ -11,13 +11,17 @@ if (CLANG_FORMAT)
tablegen(OFFLOAD OffloadFuncs.inc -gen-func-names)
tablegen(OFFLOAD OffloadImplFuncDecls.inc -gen-impl-func-decls)
tablegen(OFFLOAD OffloadPrint.hpp -gen-print-header)
+ tablegen(OFFLOAD OffloadErrcodes.inc -gen-errcodes)
- set(OFFLOAD_GENERATED_FILES ${TABLEGEN_OUTPUT})
+ set(FILES_TO_COPY "OffloadAPI.h;OffloadEntryPoints.inc;OffloadFuncs.inc;OffloadImplFuncDecls.inc;OffloadPrint.hpp")
+ set(GEN_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../include/generated)
add_public_tablegen_target(OffloadGenerate)
add_custom_command(TARGET OffloadGenerate POST_BUILD COMMAND ${CLANG_FORMAT}
- -i ${OFFLOAD_GENERATED_FILES})
+ -i ${TABLEGEN_OUTPUT})
add_custom_command(TARGET OffloadGenerate POST_BUILD COMMAND ${CMAKE_COMMAND}
- -E copy_if_different ${OFFLOAD_GENERATED_FILES} "${CMAKE_CURRENT_SOURCE_DIR}/../include/generated")
+ -E copy_if_different ${FILES_TO_COPY} ${GEN_DIR})
+ add_custom_command(TARGET OffloadGenerate POST_BUILD COMMAND ${CMAKE_COMMAND}
+ -E copy_if_different OffloadErrcodes.inc "${LIBOFFLOAD_ROOT}/../plugins-nextgen/common/include/OffloadErrcodes.inc")
else()
message(WARNING "clang-format was not found, so the OffloadGenerate target\
will not be available. Offload will still build, but you will not be\
diff --git a/offload/liboffload/API/Common.td b/offload/liboffload/API/Common.td
index de7502b540618..12551d1eb5fd5 100644
--- a/offload/liboffload/API/Common.td
+++ b/offload/liboffload/API/Common.td
@@ -83,26 +83,30 @@ def : Typedef {
let value = "void *";
}
-def : Enum {
+def ErrorCode : Enum {
let name = "ol_errc_t";
let desc = "Defines Return/Error codes";
let etors =[
- Etor<"SUCCESS", "Success">,
- Etor<"INVALID_VALUE", "Invalid Value">,
+ Etor<"SUCCESS", "Success", 0>,
+
+ // Universal errors
+ Etor<"INVALID_NULL_POINTER", "A pointer argument is null when it should not be">,
+ Etor<"INVALID_ARGUMENT", "An argument is invalid">,
+ Etor<"OUT_OF_RESOURCES", "Out of resources">,
+ Etor<"UNSUPPORTED", "generic error code for unsupported features and enums">,
+
+ // Liboffload specific errors
+ Etor<"INVALID_VALUE", "Invalid Value", 0x1000>,
Etor<"INVALID_PLATFORM", "Invalid platform">,
Etor<"INVALID_DEVICE", "Invalid device">,
Etor<"INVALID_QUEUE", "Invalid queue">,
Etor<"INVALID_EVENT", "Invalid event">,
Etor<"INVALID_KERNEL_NAME", "Named kernel not found in the program binary">,
- Etor<"OUT_OF_RESOURCES", "Out of resources">,
- Etor<"UNSUPPORTED_FEATURE", "generic error code for unsupported features">,
- Etor<"INVALID_ARGUMENT", "generic error code for invalid arguments">,
Etor<"INVALID_NULL_HANDLE", "handle argument is not valid">,
- Etor<"INVALID_NULL_POINTER", "pointer argument may not be nullptr">,
Etor<"INVALID_SIZE", "invalid size or dimensions (e.g., must not be zero, or is out of bounds)">,
Etor<"INVALID_ENUMERATION", "enumerator argument is not valid">,
- Etor<"UNSUPPORTED_ENUMERATION", "enumerator argument is not supported by the device">,
- Etor<"UNKNOWN", "Unknown or internal error">
+
+ Etor<"UNKNOWN", "Unknown or internal error", 0x10000>
];
}
diff --git a/offload/liboffload/CMakeLists.txt b/offload/liboffload/CMakeLists.txt
index db12236ddfc7f..9927fa3c3400a 100644
--- a/offload/liboffload/CMakeLists.txt
+++ b/offload/liboffload/CMakeLists.txt
@@ -1,3 +1,5 @@
+set(LIBOFFLOAD_ROOT "${CMAKE_CURRENT_SOURCE_DIR}")
+
add_subdirectory(API)
add_llvm_library(
diff --git a/offload/liboffload/include/OffloadImpl.hpp b/offload/liboffload/include/OffloadImpl.hpp
index ec470a355309a..7d2c0c53fc85b 100644
--- a/offload/liboffload/include/OffloadImpl.hpp
+++ b/offload/liboffload/include/OffloadImpl.hpp
@@ -7,6 +7,7 @@
//===----------------------------------------------------------------------===//
#pragma once
+#include "PluginInterface.h"
#include <OffloadAPI.h>
#include <iostream>
#include <memory>
@@ -19,6 +20,7 @@
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSet.h"
+#include "llvm/Support/Error.h"
struct OffloadConfig {
bool TracingEnabled = false;
@@ -88,8 +90,28 @@ struct ol_impl_result_t {
Result = errors().emplace(std::move(Err)).first->get();
}
+ static ol_impl_result_t fromError(llvm::Error &&Error) {
+ ol_errc_t ErrCode;
+ llvm::StringRef Details;
+ llvm::handleAllErrors(std::move(Error), [&](llvm::StringError &Err) {
+ ErrCode = GetErrorCode(Err.convertToErrorCode());
+ Details = errorStrs().insert(Err.getMessage()).first->getKeyData();
+ });
+
+ return ol_impl_result_t{ErrCode, Details};
+ }
+
operator ol_result_t() { return Result; }
private:
+ static ol_errc_t GetErrorCode(std::error_code Code) {
+ if (Code.category() == llvm::omp::target::plugin::make_error_code(
+ llvm::omp::target::plugin::ErrorCode::SUCCESS)
+ .category()) {
+ return static_cast<ol_errc_t>(Code.value());
+ }
+ return OL_ERRC_UNKNOWN;
+ }
+
ol_result_t Result;
};
diff --git a/offload/liboffload/include/generated/OffloadAPI.h b/offload/liboffload/include/generated/OffloadAPI.h
index ace31c57cf2f8..13a840ce772fb 100644
--- a/offload/liboffload/include/generated/OffloadAPI.h
+++ b/offload/liboffload/include/generated/OffloadAPI.h
@@ -17,6 +17,45 @@
extern "C" {
#endif
+///////////////////////////////////////////////////////////////////////////////
+/// @brief Defines Return/Error codes
+typedef enum ol_errc_t {
+ /// Success
+ OL_ERRC_SUCCESS = 0,
+ /// A pointer argument is null when it should not be
+ OL_ERRC_INVALID_NULL_POINTER = 1,
+ /// An argument is invalid
+ OL_ERRC_INVALID_ARGUMENT = 2,
+ /// Out of resources
+ OL_ERRC_OUT_OF_RESOURCES = 3,
+ /// generic error code for unsupported features and enums
+ OL_ERRC_UNSUPPORTED = 4,
+ /// Invalid Value
+ OL_ERRC_INVALID_VALUE = 4096,
+ /// Invalid platform
+ OL_ERRC_INVALID_PLATFORM = 4097,
+ /// Invalid device
+ OL_ERRC_INVALID_DEVICE = 4098,
+ /// Invalid queue
+ OL_ERRC_INVALID_QUEUE = 4099,
+ /// Invalid event
+ OL_ERRC_INVALID_EVENT = 4100,
+ /// Named kernel not found in the program binary
+ OL_ERRC_INVALID_KERNEL_NAME = 4101,
+ /// handle argument is not valid
+ OL_ERRC_INVALID_NULL_HANDLE = 4102,
+ /// invalid size or dimensions (e.g., must not be zero, or is out of bounds)
+ OL_ERRC_INVALID_SIZE = 4103,
+ /// enumerator argument is not valid
+ OL_ERRC_INVALID_ENUMERATION = 4104,
+ /// Unknown or internal error
+ OL_ERRC_UNKNOWN = 65536,
+ /// @cond
+ OL_ERRC_FORCE_UINT32 = 0x7fffffff
+ /// @endcond
+
+} ol_errc_t;
+
///////////////////////////////////////////////////////////////////////////////
#ifndef OL_VERSION_MAJOR
/// @brief Major version of the Offload API
@@ -101,47 +140,6 @@ typedef struct ol_program_impl_t *ol_program_handle_t;
/// @brief Handle of kernel object
typedef void *ol_kernel_handle_t;
-///////////////////////////////////////////////////////////////////////////////
-/// @brief Defines Return/Error codes
-typedef enum ol_errc_t {
- /// Success
- OL_ERRC_SUCCESS = 0,
- /// Invalid Value
- OL_ERRC_INVALID_VALUE = 1,
- /// Invalid platform
- OL_ERRC_INVALID_PLATFORM = 2,
- /// Invalid device
- OL_ERRC_INVALID_DEVICE = 3,
- /// Invalid queue
- OL_ERRC_INVALID_QUEUE = 4,
- /// Invalid event
- OL_ERRC_INVALID_EVENT = 5,
- /// Named kernel not found in the program binary
- OL_ERRC_INVALID_KERNEL_NAME = 6,
- /// Out of resources
- OL_ERRC_OUT_OF_RESOURCES = 7,
- /// generic error code for unsupported features
- OL_ERRC_UNSUPPORTED_FEATURE = 8,
- /// generic error code for invalid arguments
- OL_ERRC_INVALID_ARGUMENT = 9,
- /// handle argument is not valid
- OL_ERRC_INVALID_NULL_HANDLE = 10,
- /// pointer argument may not be nullptr
- OL_ERRC_INVALID_NULL_POINTER = 11,
- /// invalid size or dimensions (e.g., must not be zero, or is out of bounds)
- OL_ERRC_INVALID_SIZE = 12,
- /// enumerator argument is not valid
- OL_ERRC_INVALID_ENUMERATION = 13,
- /// enumerator argument is not supported by the device
- OL_ERRC_UNSUPPORTED_ENUMERATION = 14,
- /// Unknown or internal error
- OL_ERRC_UNKNOWN = 15,
- /// @cond
- OL_ERRC_FORCE_UINT32 = 0x7fffffff
- /// @endcond
-
-} ol_errc_t;
-
///////////////////////////////////////////////////////////////////////////////
/// @brief Details of the error condition returned by an API call
typedef struct ol_error_struct_t {
@@ -477,7 +475,8 @@ OL_APIEXPORT ol_result_t OL_APICALL olMemFree(
/// @brief Enqueue a memcpy operation.
///
/// @details
-/// - For host pointers, use the device returned by olGetHostDevice
+/// - For host pointers, use the host device belonging to the
+/// OL_PLATFORM_BACKEND_HOST platform.
/// - If a queue is specified, at least one device must be a non-host device
/// - If a queue is not specified, the memcpy happens synchronously
///
diff --git a/offload/liboffload/include/generated/OffloadPrint.hpp b/offload/liboffload/include/generated/OffloadPrint.hpp
index 7f5e33aea6f73..e99bb2db669fb 100644
--- a/offload/liboffload/include/generated/OffloadPrint.hpp
+++ b/offload/liboffload/include/generated/OffloadPrint.hpp
@@ -49,6 +49,18 @@ inline llvm::raw_ostream &operator<<(llvm::raw_ostream &os,
case OL_ERRC_SUCCESS:
os << "OL_ERRC_SUCCESS";
break;
+ case OL_ERRC_INVALID_NULL_POINTER:
+ os << "OL_ERRC_INVALID_NULL_POINTER";
+ break;
+ case OL_ERRC_INVALID_ARGUMENT:
+ os << "OL_ERRC_INVALID_ARGUMENT";
+ break;
+ case OL_ERRC_OUT_OF_RESOURCES:
+ os << "OL_ERRC_OUT_OF_RESOURCES";
+ break;
+ case OL_ERRC_UNSUPPORTED:
+ os << "OL_ERRC_UNSUPPORTED";
+ break;
case OL_ERRC_INVALID_VALUE:
os << "OL_ERRC_INVALID_VALUE";
break;
@@ -67,30 +79,15 @@ inline llvm::raw_ostream &operator<<(llvm::raw_ostream &os,
case OL_ERRC_INVALID_KERNEL_NAME:
os << "OL_ERRC_INVALID_KERNEL_NAME";
break;
- case OL_ERRC_OUT_OF_RESOURCES:
- os << "OL_ERRC_OUT_OF_RESOURCES";
- break;
- case OL_ERRC_UNSUPPORTED_FEATURE:
- os << "OL_ERRC_UNSUPPORTED_FEATURE";
- break;
- case OL_ERRC_INVALID_ARGUMENT:
- os << "OL_ERRC_INVALID_ARGUMENT";
- break;
case OL_ERRC_INVALID_NULL_HANDLE:
os << "OL_ERRC_INVALID_NULL_HANDLE";
break;
- case OL_ERRC_INVALID_NULL_POINTER:
- os << "OL_ERRC_INVALID_NULL_POINTER";
- break;
case OL_ERRC_INVALID_SIZE:
os << "OL_ERRC_INVALID_SIZE";
break;
case OL_ERRC_INVALID_ENUMERATION:
os << "OL_ERRC_INVALID_ENUMERATION";
break;
- case OL_ERRC_UNSUPPORTED_ENUMERATION:
- os << "OL_ERRC_UNSUPPORTED_ENUMERATION";
- break;
case OL_ERRC_UNKNOWN:
os << "OL_ERRC_UNKNOWN";
break;
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index bef72a7d1851a..b50c7e0f87b7c 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -311,8 +311,7 @@ ol_impl_result_t olMemAlloc_impl(ol_device_handle_t Device,
auto Alloc =
Device->Device->dataAlloc(Size, nullptr, convertOlToPluginAllocTy(Type));
if (!Alloc)
- return {OL_ERRC_OUT_OF_RESOURCES,
- formatv("Could not create allocation on device {0}", Device).str()};
+ return ol_impl_result_t::fromError(Alloc.takeError());
*AllocationOut = *Alloc;
allocInfoMap().insert_or_assign(*Alloc, AllocInfo{Device, Type});
@@ -330,7 +329,7 @@ ol_impl_result_t olMemFree_impl(void *Address) {
auto Res =
Device->Device->dataDelete(Address, convertOlToPluginAllocTy(Type));
if (Res)
- return {OL_ERRC_OUT_OF_RESOURCES, "Could not free allocation"};
+ return ol_impl_result_t::fromError(std::move(Res));
allocInfoMap().erase(Address);
@@ -342,7 +341,7 @@ ol_impl_result_t olCreateQueue_impl(ol_device_handle_t Device,
auto CreatedQueue = std::make_unique<ol_queue_impl_t>(nullptr, Device);
auto Err = Device->Device->initAsyncInfo(&(CreatedQueue->AsyncInfo));
if (Err)
- return {OL_ERRC_UNKNOWN, "Could not initialize stream resource"};
+ return ol_impl_result_t::fromError(std::move(Err));
*Queue = CreatedQueue.release();
return OL_SUCCESS;
@@ -358,7 +357,7 @@ ol_impl_result_t olWaitQueue_impl(ol_queue_handle_t Queue) {
if (Queue->AsyncInfo->Queue) {
auto Err = Queue->Device->Device->synchronize(Queue->AsyncInfo);
if (Err)
- return {OL_ERRC_INVALID_QUEUE, "The queue failed to synchronize"};
+ return ol_impl_result_t::fromError(std::move(Err));
}
// Recreate the stream resource so the queue can be reused
@@ -366,7 +365,7 @@ ol_impl_result_t olWaitQueue_impl(ol_queue_handle_t Queue) {
// it to begin with.
auto Res = Queue->Device->Device->initAsyncInfo(&Queue->AsyncInfo);
if (Res)
- return {OL_ERRC_UNKNOWN, "Could not reinitialize the stream resource"};
+ return ol_impl_result_t::fromError(std::move(Res));
return OL_SUCCESS;
}
@@ -374,7 +373,7 @@ ol_impl_result_t olWaitQueue_impl(ol_queue_handle_t Queue) {
ol_impl_result_t olWaitEvent_impl(ol_event_handle_t Event) {
auto Res = Event->Queue->Device->Device->syncEvent(Event->EventInfo);
if (Res)
- return {OL_ERRC_INVALID_EVENT, "The event failed to synchronize"};
+ return ol_impl_result_t::fromError(std::move(Res));
return OL_SUCCESS;
}
@@ -390,13 +389,17 @@ ol_impl_result_t olDestroyEvent_impl(ol_event_handle_t Event) {
ol_event_handle_t makeEvent(ol_queue_handle_t Queue) {
auto EventImpl = std::make_unique<ol_event_impl_t>(nullptr, Queue);
auto Res = Queue->Device->Device->createEvent(&EventImpl->EventInfo);
- if (Res)
+ if (Res) {
+ llvm::consumeError(std::move(Res));
return nullptr;
+ }
Res = Queue->Device->Device->recordEvent(EventImpl->EventInfo,
Queue->AsyncInfo);
- if (Res)
+ if (Res) {
+ llvm::consumeError(std::move(Res));
return nullptr;
+ }
return EventImpl.release();
}
@@ -422,16 +425,16 @@ ol_impl_result_t olMemcpy_impl(ol_queue_handle_t Queue, void *DstPtr,
if (DstDevice == HostDevice()) {
auto Res = SrcDevice->Device->dataRetrieve(DstPtr, SrcPtr, Size, QueueImpl);
if (Res)
- return {OL_ERRC_UNKNOWN, "The data retrieve operation failed"};
+ return ol_impl_result_t::fromError(std::move(Res));
} else if (SrcDevice == HostDevice()) {
auto Res = DstDevice->Device->dataSubmit(DstPtr, SrcPtr, Size, QueueImpl);
if (Res)
- return {OL_ERRC_UNKNOWN, "The data submit operation failed"};
+ return ol_impl_result_t::fromError(std::move(Res));
} else {
auto Res = SrcDevice->Device->dataExchange(SrcPtr, *DstDevice->Device,
DstPtr, Size, QueueImpl);
if (Res)
- return {OL_ERRC_UNKNOWN, "The data exchange operation failed"};
+ return ol_impl_result_t::fromError(std::move(Res));
}
if (EventOut)
@@ -459,7 +462,7 @@ ol_impl_result_t olCreateProgram_impl(ol_device_handle_t Device,
Device->Device->loadBinary(Device->Device->Plugin, &Prog->DeviceImage);
if (!Res) {
delete Prog;
- return OL_ERRC_INVALID_VALUE;
+ return ol_impl_result_t::fromError(Res.takeError());
}
Prog->Image = *Res;
@@ -483,7 +486,7 @@ ol_impl_result_t olGetKernel_impl(ol_program_handle_t Program,
auto Err = KernelImpl->init(Device, *Program->Image);
if (Err)
- return {OL_ERRC_UNKNOWN, "Could not initialize the kernel"};
+ return ol_impl_result_t::fromError(std::move(Err));
*Kernel = &*KernelImpl;
@@ -526,7 +529,7 @@ olLaunchKernel_impl(ol_queue_handle_t Queue, ol_device_handle_t Device,
AsyncInfoWrapper.finalize(Err);
if (Err)
- return {OL_ERRC_UNKNOWN, "Could not finalize the AsyncInfoWrapper"};
+ return ol_impl_result_t::fromError(std::move(Err));
if (EventOut)
*EventOut = makeEvent(Queue);
diff --git a/offload/plugins-nextgen/common/include/OffloadErrcodes.inc b/offload/plugins-nextgen/common/include/OffloadErrcodes.inc
new file mode 100644
index 0000000000000..146a2cd0ce0bf
--- /dev/null
+++ b/offload/plugins-nextgen/common/include/OffloadErrcodes.inc
@@ -0,0 +1,37 @@
+//===- Auto-generated file, part of the LLVM/Offload project --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef OFFLOAD_ERRC
+#error Please define the macro OFFLOAD_ERRCODE(Name, Desc, Value)
+#endif
+
+// Error codes are shared between PluginInterface and liboffload.
+// To add new error codes, add them to offload/liboffload/API/Common.td and run
+// the GenerateOffload target.
+
+OFFLOAD_ERRC(SUCCESS, "Success", 0)
+OFFLOAD_ERRC(INVALID_NULL_POINTER,
+ "A pointer argument is null when it should not be", 1)
+OFFLOAD_ERRC(INVALID_ARGUMENT, "An argument is invalid", 2)
+OFFLOAD_ERRC(OUT_OF_RESOURCES, "Out of resources", 3)
+OFFLOAD_ERRC(UNSUPPORTED,
+ "generic error code for unsupported features and enums", 4)
+OFFLOAD_ERRC(INVALID_VALUE, "Invalid Value", 4096)
+OFFLOAD_ERRC(INVALID_PLATFORM, "Invalid platform", 4097)
+OFFLOAD_ERRC(INVALID_DEVICE, "Invalid device", 4098)
+OFFLOAD_ERRC(INVALID_QUEUE, "Invalid queue", 4099)
+OFFLOAD_ERRC(INVALID_EVENT, "Invalid event", 4100)
+OFFLOAD_ERRC(INVALID_KERNEL_NAME,
+ "Named kernel not found in the program binary", 4101)
+OFFLOAD_ERRC(INVALID_NULL_HANDLE, "handle argument is not valid", 4102)
+OFFLOAD_ERRC(
+ INVALID_SIZE,
+ "invalid size or dimensions (e.g., must not be zero, or is out of bounds)",
+ 4103)
+OFFLOAD_ERRC(INVALID_ENUMERATION, "enumerator argument is not valid", 4104)
+OFFLOAD_ERRC(UNKNOWN, "Unknown or internal error", 65536)
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index e54a8afdd3f4f..19fe26b10760b 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -58,6 +58,30 @@ struct GenericKernelTy;
struct GenericDeviceTy;
struct RecordReplayTy;
+enum class ErrorCode {
+#define OFFLOAD_ERRC(Name, _, Value) Name = Value,
+#include "OffloadErrcodes.inc"
+#undef OFFLOAD_ERRC
+};
+
+class OffloadErrorCategory : public std::error_category {
+ const char *name() const noexcept override { return "Offload Error"; }
+ std::string message(int ev) const override {
+ switch (static_cast<ErrorCode>(ev)) {
+#define OFFLOAD_ERRC(Name, Desc, Value) \
+ case ErrorCode::Name: \
+ return #Desc;
+#include "OffloadErrcodes.inc"
+#undef OFFLOAD_ERRC
+ }
+ }
+};
+
+inline std::error_code make_error_code(ErrorCode EC) {
+ static OffloadErrorCategory Cat{};
+ return {static_cast<int>(EC), Cat};
+}
+
/// Class that wraps the __tgt_async_info to simply its usage. In case the
/// object is constructed without a valid __tgt_async_info, the object will use
/// an internal one and will synchronize the current thread with the pending
@@ -1385,7 +1409,13 @@ static inline Error success() { return Error::success(); }
/// Create a string error.
template <typename... ArgsTy>
static Error error(const char *ErrFmt, ArgsTy... Args) {
- return createStringError(inconvertibleErrorCode(), ErrFmt, Args...);
+ return createStringError(ErrorCode::UNKNOWN, ErrFmt, Args...);
+}
+
+/// Create a strin...
[truncated]
|
@@ -477,7 +475,8 @@ OL_APIEXPORT ol_result_t OL_APICALL olMemFree( | |||
/// @brief Enqueue a memcpy operation. | |||
/// | |||
/// @details | |||
/// - For host pointers, use the device returned by olGetHostDevice | |||
/// - For host pointers, use the host device belonging to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change isn't part of this MR, but I assume a previous update to the comments didn't get regen-ed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's my bad. If we continue to check in the generated files it would be useful to have CI catch this by checking there's no diff after regenerating.
✅ With the latest revision this PR passed the C/C++ code formatter. |
offload/liboffload/API/Common.td
Outdated
Etor<"UNSUPPORTED_ENUMERATION", "enumerator argument is not supported by the device">, | ||
Etor<"UNKNOWN", "Unknown or internal error"> | ||
|
||
Etor<"UNKNOWN", "Unknown or internal error", 0x10000> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have the same enum value as invalid value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't, this is 10,000 while invalid value is 1000.
Would UINT32_MAX make more sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, missed the extra zero. Why do these need to be so separate in the first place? I really don't think there should be a distinction between these errors at all, so we'd just want 0x1000
to be a 'generic' error code and then everything after that to be something more specific both the 'plugin' implementation and the API use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design I'm considering is to partition errors into several categories:
- "Common" errors that are shared across all components.
- Errors that only make sense returned from liboffload.
- HSA specific errors.
- Cuda specific errors.
There are a number of errors that only make sense when returned from liboffload (a plugin will never return INVALID_DEVICE
since it doesn't handle device handles), which is why there is a division there. Especially since in theory these error codes could be used by things other than liboffload.
UNKNOWN
being the highest value makes sense to me, but I can't exactly articulate why so I'll just move it into the "common" group.
@callumfare HSA and Cuda also have specific features that only make sense for them and no other backend. But I agree that maybe the "NATIVE_ERROR" could just map to "UNKNOWN". It's something to probably discuss in the next MR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my head the whole point of this is to abstract over these platforms. If we're just providing hooks into the underlying implementation people can just use that instead and cut out the middle man. What's the benefit to having these specific error codes instead of just having a completely common set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that it was desirable to give backends the ability to pass backend-specific error codes to PluginInterface users. If that is not the case, I'm perfectly happy to axe the categories thing and have them all be in the same pool.
Thinking out loud here, maybe having a field for the CUResult
/hsa_status_t
in the ol_result_t
struct might be a better solution to this? But that's a future discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some concept of interop, but I don't really remember how fleshed out it is. The idea with interop is that you'd get the underlying object for that platform and then you could do you own error handling on it. I don't think that requires us to forward their error codes, since we already convert from the CUDA / HSA error codes to the LLVM error type ASAP currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just pushed a new commit that removes the "partitioning" entirely.
RossBrunton@2b4bcd3 Example of how this new error reporting works in practice (figured It'd be best to split it into a new PR once this is merged). |
Overall LGTM to me but one thing I don't like about this proposed use is that the plugin-specific error categories make it hard for runtimes and other users to handle errors from the library in a generic way. Especially |
A new ErrorCode enumeration is present in PluginInterface which can be used when returning an llvm::Error from offload and PluginInterface functions. This enum must be kept up to sync with liboffload's ol_errc_t enum, so both are automatically generated from liboffload's enum definition. Some error codes have also been shuffled around to allow for future work. Note that this patch only adds the machinery; actual error codes will be added in a future patch.
// the GenerateOffload target. | ||
|
||
OFFLOAD_ERRC(SUCCESS, "Success", 0) | ||
OFFLOAD_ERRC(UNKNOWN, "Unknown or internal error", 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we even need these string arguments. The main benefit of doing errors the way we do with the LLVM error type and a tuple is that we can create error codes that are more descriptive. I suppose they serve as a good default if someone doesn't specify, but realistically the intended string should be generated where the error is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::error_code
requires a message
function that returns a string description of the error code, so if you pull the error code out of the llvm::Error and query it directly (or just create a new one out of the blue), it will print that message. I don't know why you'd do that, but you can.
I imagine having descriptions for the error codes is also probably useful for users of the C API that just want to use the error code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess it serves as a form of comment for usage, but I'm just saying in general that these will likely be unused internally.
inline std::error_code make_error_code(ErrorCode EC) { | ||
static OffloadErrorCategory Cat{}; | ||
return {static_cast<int>(EC), Cat}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we really need std::error_code
for? I thought we were just going from LLVM error to error code + string pair in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error codes inside llvm::Error are a std::error_code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if this is really necessary, going on comments like this which do something similar to what you're doing here
llvm-project/llvm/lib/Object/Error.cpp
Line 21 in f49ee00
// FIXME: This class is only here to support the transition to llvm::Error. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The createStringError message takes a std::error_code as a parameter, which is why I added all the error_code machinery.
I also think it makes sense to use here, since it also allows us to inspect whether a given error code is an "offload" error code or some other kind of error that has snuck in (which is what GetErrorCode does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but shouldn't we be able to use https://en.cppreference.com/w/cpp/error/errc/make_error_code for the generic category? I don't think having an offload
category helps us here since it's not used for anything. The string message is going to come from the person who created the error, so we only care about the numerical part of the error code.
A new ErrorCode enumeration is present in PluginInterface which can
be used when returning an llvm::Error from offload and PluginInterface
functions.
This enum must be kept up to sync with liboffload's ol_errc_t enum, so
both are automatically generated from liboffload's enum definition.
Some error codes have also been shuffled around to allow for future
work. Note that this patch only adds the machinery; actual error codes
will be added in a future patch.
Depends on #137339 , please ignore first commit of this MR.This has been merged.