-
Notifications
You must be signed in to change notification settings - Fork 792
[SYCL] discard_write optimization #2854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
9a1f5f8
11cdf0d
939d82a
bcab2bb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -1141,21 +1141,12 @@ cl_int MemCpyCommand::enqueueImp() { | |||
|
||||
auto RawEvents = getPiEvents(EventImpls); | ||||
|
||||
// Omit copying if mode is discard one. | ||||
// TODO: Handle this at the graph building time by, for example, creating | ||||
// empty node instead of memcpy. | ||||
if (MDstReq.MAccessMode == access::mode::discard_read_write || | ||||
MDstReq.MAccessMode == access::mode::discard_write || | ||||
MSrcAllocaCmd->getMemAllocation() == MDstAllocaCmd->getMemAllocation()) { | ||||
Command::waitForEvents(Queue, EventImpls, Event); | ||||
} else { | ||||
MemoryManager::copy( | ||||
MSrcAllocaCmd->getSYCLMemObj(), MSrcAllocaCmd->getMemAllocation(), | ||||
MSrcQueue, MSrcReq.MDims, MSrcReq.MMemoryRange, MSrcReq.MAccessRange, | ||||
MSrcReq.MOffset, MSrcReq.MElemSize, MDstAllocaCmd->getMemAllocation(), | ||||
MQueue, MDstReq.MDims, MDstReq.MMemoryRange, MDstReq.MAccessRange, | ||||
MDstReq.MOffset, MDstReq.MElemSize, std::move(RawEvents), Event); | ||||
} | ||||
MemoryManager::copy( | ||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Does it mean that this unconditional copy is expected to slow down OpenCL execution currently? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe my commit comment could be clearer. There are two sides: unified memory and not. This particular line of code you are commenting on is for when we do not have unified memory. In that case, the underlying backend doesn't really matter. SYCL itself is scheduling individual mem read/writes to maintain the coherency, and this optimization will avoid the needless mem read. The comment about CUDA requires no change is in the context of when there is unified memory. Then we are scheduling paired map/unmap operations. Any optimization will have to be performed by the backend (or it's PI interface). In the case of Level Zero, this PR adds the optimization to its PI interface. In the case of CUDA, it looks like the PI interface is already performing the optimization (and microbenchmarks confirm that). For OpenCL, the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Could you double check and spot it in the CUDA PI plugin sources?
This is strange that OpenCL doesn't optimize this, are you going to follow up with OpenCL team? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's at llvm/sycl/plugins/cuda/pi_cuda.cpp Line 4120 in 6733c8b
Agreed. That is the plan. |
||||
MSrcAllocaCmd->getSYCLMemObj(), MSrcAllocaCmd->getMemAllocation(), | ||||
MSrcQueue, MSrcReq.MDims, MSrcReq.MMemoryRange, MSrcReq.MAccessRange, | ||||
MSrcReq.MOffset, MSrcReq.MElemSize, MDstAllocaCmd->getMemAllocation(), | ||||
MQueue, MDstReq.MDims, MDstReq.MMemoryRange, MDstReq.MAccessRange, | ||||
MDstReq.MOffset, MDstReq.MElemSize, std::move(RawEvents), Event); | ||||
|
||||
return CL_SUCCESS; | ||||
} | ||||
|
Uh oh!
There was an error while loading. Please reload this page.