Skip to content

[SYCL] Fix depends_on handling with pi commands #5901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Sep 1, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
d622da0
Fix lost dependencies in case of depends_on usage and blocking commands
KseniyaTikhomirova Mar 28, 2022
f4b9a4f
Move empty node commands cleanup to post-enqueue cleanup (fixes clean…
KseniyaTikhomirova Mar 28, 2022
6bc9474
Update unit test for post enqueue cleanup & wait
KseniyaTikhomirova Mar 28, 2022
1238b1e
Fix lost dependencies
KseniyaTikhomirova Mar 29, 2022
03e7541
Move event skip for in-order queue to getPIEvents
KseniyaTikhomirova Mar 31, 2022
8e2ea56
Add test with kernel usage and make MockHandler common
KseniyaTikhomirova Mar 28, 2022
9bd31fb
Add unit tests
KseniyaTikhomirova Apr 1, 2022
17309be
Minor fix for event status handling (covered by tests above)
KseniyaTikhomirova Apr 1, 2022
6ff2eb6
Merge branch 'sycl' into default_dependson
KseniyaTikhomirova Apr 1, 2022
b04d724
Fix test failure
KseniyaTikhomirova Apr 1, 2022
5a82065
Fix event status handling 2
KseniyaTikhomirova Apr 1, 2022
a94e513
Limit backends for test
KseniyaTikhomirova Apr 1, 2022
78643fc
Fix tests
KseniyaTikhomirova Apr 4, 2022
8b9d95f
Fix clang-format
KseniyaTikhomirova Apr 4, 2022
2b85d95
Merge branch 'sycl' into default_dependson
KseniyaTikhomirova Aug 11, 2022
a0c087b
Fix comments
KseniyaTikhomirova Aug 11, 2022
9fe1ea7
Add missed part for comments
KseniyaTikhomirova Aug 11, 2022
23707d3
Fix clang-format
KseniyaTikhomirova Aug 11, 2022
460ff51
FIx error code
KseniyaTikhomirova Aug 12, 2022
df467f5
Update namespaces
KseniyaTikhomirova Aug 12, 2022
e4da934
Fix unittests build
KseniyaTikhomirova Aug 12, 2022
8eec835
Revert condition state
KseniyaTikhomirova Aug 16, 2022
1588b9c
Merge branch 'sycl' into default_dependson
KseniyaTikhomirova Aug 18, 2022
cbed05e
Merge branch 'sycl' into default_dependson
KseniyaTikhomirova Aug 29, 2022
7d5def9
Fix comments
KseniyaTikhomirova Aug 29, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions sycl/source/detail/event_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -342,10 +342,16 @@ event_impl::get_info<info::event::command_execution_status>() {
if (MState == HES_Discarded)
return info::event_command_status::ext_oneapi_unknown;

if (!MHostEvent && MEvent) {
return get_event_info<info::event::command_execution_status>(
this->getHandleRef(), this->getPlugin());
if (!MHostEvent) {
// Command is enqueued and PiEvent is ready
if (MEvent)
return get_event_info<info::event::command_execution_status>(
this->getHandleRef(), this->getPlugin());
// Command is blocked and not enqueued, PiEvent is not assigned yet
else if (MCommand)
return sycl::info::event_command_status::submitted;
}

return MHostEvent && MState.load() != HES_Complete
? sycl::info::event_command_status::submitted
: info::event_command_status::complete;
Expand Down
7 changes: 7 additions & 0 deletions sycl/source/detail/event_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,11 @@ class event_impl {
}
bool needsCleanupAfterWait() { return MNeedsCleanupAfterWait; }

/// Returns worker queue for command.
///
/// @return a reference to MWorkerQueue.
QueueImplPtr &getWorkerQueue() { return MWorkerQueue; };

/// Checks if an event is in a fully intialized state. Default-constructed
/// events will return true only after having initialized its native event,
/// while other events will assume that they are fully initialized at
Expand Down Expand Up @@ -243,6 +248,8 @@ class event_impl {
std::weak_ptr<queue_impl> MQueue;
const bool MIsProfilingEnabled = false;

QueueImplPtr MWorkerQueue;

/// Dependency events prepared for waiting by backend.
std::vector<EventImplPtr> MPreparedDepsEvents;
std::vector<EventImplPtr> MPreparedHostDepsEvents;
Expand Down
6 changes: 3 additions & 3 deletions sycl/source/detail/queue_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -120,9 +120,7 @@ class queue_impl {
}
if (!MHostQueue) {
const QueueOrder QOrder =
MPropList.has_property<property::queue::in_order>()
? QueueOrder::Ordered
: QueueOrder::OOO;
MIsInorder ? QueueOrder::Ordered : QueueOrder::OOO;
MQueues.push_back(createQueue(QOrder));
}
}
Expand Down Expand Up @@ -202,6 +200,8 @@ class queue_impl {
/// \return true if this queue has discard_events support.
bool has_discard_events_support() const { return MHasDiscardEventsSupport; }

bool isInOrder() const { return MIsInorder; }

/// Queries SYCL queue for information.
///
/// The return type depends on information being queried.
Expand Down
59 changes: 35 additions & 24 deletions sycl/source/detail/scheduler/commands.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -203,17 +203,34 @@ static std::string commandToName(Command::CommandType Type) {
}
#endif

static std::vector<RT::PiEvent>
getPiEvents(const std::vector<EventImplPtr> &EventImpls) {
std::vector<RT::PiEvent>
Command::getPiEvents(const std::vector<EventImplPtr> &EventImpls) const {
std::vector<RT::PiEvent> RetPiEvents;
for (auto &EventImpl : EventImpls) {
if (EventImpl->getHandleRef() != nullptr)
RetPiEvents.push_back(EventImpl->getHandleRef());
if (EventImpl->getHandleRef() == nullptr)
continue;

// Do not add redundant event dependencies for in-order queues.
// At this stage dependency is definitely pi task and need to check if
// current one is a host task. In this case we should not skip pi event due
// to different sync mechanisms for different task types on in-order queue.
const QueueImplPtr &WorkerQueue = getWorkerQueue();
if (EventImpl->getWorkerQueue() == WorkerQueue &&
WorkerQueue->isInOrder() && !isHostTask())
continue;

RetPiEvents.push_back(EventImpl->getHandleRef());
}

return RetPiEvents;
}

bool Command::isHostTask() const {
return (MType == CommandType::RUN_CG) /* host task has this type also */ &&
((static_cast<const ExecCGCommand *>(this))->getCG().getType() ==
CG::CGTYPE::CodeplayHostTask);
}

static void flushCrossQueueDeps(const std::vector<EventImplPtr> &EventImpls,
const QueueImplPtr &Queue) {
for (auto &EventImpl : EventImpls) {
Expand All @@ -240,7 +257,8 @@ class DispatchHostTask {
// sophisticated waiting mechanism to allow to utilize this thread for any
// other available job and resume once all required events are ready.
for (auto &PluginWithEvents : RequiredEventsPerPlugin) {
std::vector<RT::PiEvent> RawEvents = getPiEvents(PluginWithEvents.second);
std::vector<RT::PiEvent> RawEvents =
MThisCmd->getPiEvents(PluginWithEvents.second);
try {
PluginWithEvents.first->call<PiApiKind::piEventsWait>(RawEvents.size(),
RawEvents.data());
Expand Down Expand Up @@ -393,10 +411,12 @@ void Command::waitForEvents(QueueImplPtr Queue,
Command::Command(CommandType Type, QueueImplPtr Queue)
: MQueue(std::move(Queue)),
MEvent(std::make_shared<detail::event_impl>(MQueue)),
MWorkerQueue(MEvent->getWorkerQueue()),
MPreparedDepsEvents(MEvent->getPreparedDepsEvents()),
MPreparedHostDepsEvents(MEvent->getPreparedHostDepsEvents()),
MType(Type) {
MSubmittedQueue = MQueue;
MWorkerQueue = MQueue;
MEvent->setCommand(this);
MEvent->setContextImpl(MQueue->getContextImplPtr());
MEvent->setStateIncomplete();
Expand Down Expand Up @@ -600,12 +620,6 @@ Command *Command::processDepEvent(EventImplPtr DepEvent, const DepDesc &Dep,

Command *ConnectionCmd = nullptr;

// Do not add redundant event dependencies for in-order queues.
if (Dep.MDepCommand && Dep.MDepCommand->getWorkerQueue() == WorkerQueue &&
WorkerQueue->has_property<property::queue::in_order>() &&
getType() != CommandType::HOST_TASK)
return nullptr;

ContextImplPtr DepEventContext = DepEvent->getContextImpl();
// If contexts don't match we'll connect them using host task
if (DepEventContext != WorkerContext && !WorkerContext->is_host()) {
Expand All @@ -621,14 +635,14 @@ const ContextImplPtr &Command::getWorkerContext() const {
return MQueue->getContextImplPtr();
}

const QueueImplPtr &Command::getWorkerQueue() const { return MQueue; }
const QueueImplPtr &Command::getWorkerQueue() const {
assert(MWorkerQueue && "MWorkerQueue must not be nullptr");
return MWorkerQueue;
}

bool Command::producesPiEvent() const { return true; }

bool Command::supportsPostEnqueueCleanup() const {
// Isolated commands are cleaned up separately
return !MUsers.empty() || !MDeps.empty();
}
bool Command::supportsPostEnqueueCleanup() const { return true; }

Command *Command::addDep(DepDesc NewDep, std::vector<Command *> &ToCleanUp) {
Command *ConnectionCmd = nullptr;
Expand Down Expand Up @@ -1298,6 +1312,9 @@ MemCpyCommand::MemCpyCommand(Requirement SrcReq,
if (!MSrcQueue->is_host()) {
MEvent->setContextImpl(MSrcQueue->getContextImplPtr());
}

MWorkerQueue = MQueue->is_host() ? MSrcQueue : MQueue;

emitInstrumentationDataProxy();
}

Expand Down Expand Up @@ -1335,10 +1352,6 @@ const ContextImplPtr &MemCpyCommand::getWorkerContext() const {
return getWorkerQueue()->getContextImplPtr();
}

const QueueImplPtr &MemCpyCommand::getWorkerQueue() const {
return MQueue->is_host() ? MSrcQueue : MQueue;
}

bool MemCpyCommand::producesPiEvent() const {
// TODO remove this workaround once the batching issue is addressed in Level
// Zero plugin.
Expand Down Expand Up @@ -1481,6 +1494,8 @@ MemCpyCommandHost::MemCpyCommandHost(Requirement SrcReq,
MEvent->setContextImpl(MSrcQueue->getContextImplPtr());
}

MWorkerQueue = MQueue->is_host() ? MSrcQueue : MQueue;

emitInstrumentationDataProxy();
}

Expand Down Expand Up @@ -1518,10 +1533,6 @@ const ContextImplPtr &MemCpyCommandHost::getWorkerContext() const {
return getWorkerQueue()->getContextImplPtr();
}

const QueueImplPtr &MemCpyCommandHost::getWorkerQueue() const {
return MQueue->is_host() ? MSrcQueue : MQueue;
}

pi_int32 MemCpyCommandHost::enqueueImp() {
const QueueImplPtr &Queue = getWorkerQueue();
waitForPreparedHostEvents();
Expand Down
17 changes: 14 additions & 3 deletions sycl/source/detail/scheduler/commands.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -199,19 +199,28 @@ class Command {

/// Get the queue this command will be submitted to. Could differ from MQueue
/// for memory copy commands.
virtual const QueueImplPtr &getWorkerQueue() const;
const QueueImplPtr &getWorkerQueue() const;

/// Returns true iff the command produces a PI event on non-host devices.
virtual bool producesPiEvent() const;

/// Returns true iff this command can be freed by post enqueue cleanup.
virtual bool supportsPostEnqueueCleanup() const;

/// Collect PI events from EventImpls and filter out some of them in case of
/// in order queue
std::vector<RT::PiEvent>
getPiEvents(const std::vector<EventImplPtr> &EventImpls) const;

bool isHostTask() const;

protected:
QueueImplPtr MQueue;
QueueImplPtr MSubmittedQueue;
EventImplPtr MEvent;

QueueImplPtr &MWorkerQueue;

/// Dependency events prepared for waiting by backend.
/// See processDepEvent for details.
std::vector<EventImplPtr> &MPreparedDepsEvents;
Expand Down Expand Up @@ -252,6 +261,10 @@ class Command {
return MPreparedHostDepsEvents;
}

const std::vector<EventImplPtr> &getPreparedDepsEvents() const {
return MPreparedDepsEvents;
}

/// Contains list of dependencies(edges)
std::vector<DepDesc> MDeps;
/// Contains list of commands that depend on the command.
Expand Down Expand Up @@ -492,7 +505,6 @@ class MemCpyCommand : public Command {
const Requirement *getRequirement() const final { return &MDstReq; }
void emitInstrumentationData() final;
const ContextImplPtr &getWorkerContext() const final;
const QueueImplPtr &getWorkerQueue() const final;
bool producesPiEvent() const final;

private:
Expand All @@ -517,7 +529,6 @@ class MemCpyCommandHost : public Command {
const Requirement *getRequirement() const final { return &MDstReq; }
void emitInstrumentationData() final;
const ContextImplPtr &getWorkerContext() const final;
const QueueImplPtr &getWorkerQueue() const final;

private:
pi_int32 enqueueImp() final;
Expand Down
27 changes: 12 additions & 15 deletions sycl/source/detail/scheduler/graph_processor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,23 +58,20 @@ bool Scheduler::GraphProcessor::enqueueCommand(
return false;
}

// Recursively enqueue all the dependencies first and
// exit immediately if any of the commands cannot be enqueued.
for (DepDesc &Dep : Cmd->MDeps) {
if (!enqueueCommand(Dep.MDepCommand, EnqueueResult, ToCleanUp, Blocking))
return false;
// Recursively enqueue all the implicit + explicit backend level dependencies
// first and exit immediately if any of the commands cannot be enqueued.
for (const EventImplPtr &Event : Cmd->getPreparedDepsEvents()) {
if (Command *DepCmd = static_cast<Command *>(Event->getCommand()))
if (!enqueueCommand(DepCmd, EnqueueResult, ToCleanUp, Blocking))
return false;
}

// Asynchronous host operations (amongst dependencies of an arbitrary command)
// are not supported (see Command::processDepEvent method). This impacts
// operation of host-task feature a lot with hangs and long-runs. Hence we
// have this workaround here.
// This workaround is safe as long as the only asynchronous host operation we
// have is a host task.
// This may iterate over some of dependencies in Cmd->MDeps. Though, the
// enqueue operation is idempotent and the second call will result in no-op.
// TODO remove the workaround when proper fix for host-task dispatching is
// implemented.
// Recursively enqueue all the implicit + explicit host dependencies and
// exit immediately if any of the commands cannot be enqueued.
// Host task execution is asynchronous. In current implementation enqueue for
// this command will wait till host task completion by waitInternal call on
// MHostDepsEvents. TO FIX: implement enqueue of blocked commands on host task
// completion stage and eliminate this event waiting in enqueue.
for (const EventImplPtr &Event : Cmd->getPreparedHostDepsEvents()) {
if (Command *DepCmd = static_cast<Command *>(Event->getCommand()))
if (!enqueueCommand(DepCmd, EnqueueResult, ToCleanUp, Blocking))
Expand Down
6 changes: 0 additions & 6 deletions sycl/source/detail/scheduler/scheduler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -157,12 +157,6 @@ EventImplPtr Scheduler::addCG(std::unique_ptr<detail::CG> CommandGroup,
CleanUp();
std::rethrow_exception(std::current_exception());
}

// If there are no memory dependencies decouple and free the command.
// Though, dismiss ownership of native kernel command group as it's
// resources may be in use by backend and synchronization point here is
// at native kernel execution finish.
CleanUp();
}
}
cleanupCommands(ToCleanUp);
Expand Down
1 change: 1 addition & 0 deletions sycl/unittests/scheduler/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,5 @@ add_sycl_unittest(SchedulerTests OBJECT
LeafLimitDiffContexts.cpp
InOrderQueueSyncCheck.cpp
RunOnHostIntelCG.cpp
EnqueueWithDependsOnDeps.cpp
)
Loading