[Feature request] Backend-specific tensors #1214
Comments
@jsubag Jacob, how do graphics drivers solve this problem? When I program CUDA/OpenCL code I don't need to allocate user-space buffers with the driver. I understand that in some cases the driver would need to copy the data. Do you know what latency this copy will introduce?
@nadavrot On GPUs there's usually a copy between system memory and GPU memory, but here there are usages that may require an additional copy. Consider the case of an application analyzing images taken from a camera on the same host machine. Typically, the captured images are written to a specific location designated by the camera driver. In the current Glow design this data will be copied to a Tensor used as an input (backed by system memory), and then the backend will copy it a second time to the GPU. Additionally, some GPUs require that the system memory being copied from be pinned and aligned. Latency numbers vary across different hardware, but if we're talking about copying a few MB in system memory on a modern computer, that shouldn't take more than a millisecond.
That's a pretty interesting use-case. I think we could avoid the camera->system copy by using the Tensor constructor that wraps an existing, unowned payload.
@bertmaher I originally introduced the unowned-payload Tensor constructor for cases like this. But of course, this constructor requires that the payload is at least in the same address space and has the same memory layout, alignment, padding, etc. There could be use-cases where this is not the case, e.g. the payload of the tensor is in a different address space (e.g. GPU memory) or has a different memory layout (e.g. padding, alignment, row vs. column major). Such use-cases may indeed need more than just this constructor.
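The owned/unowned distinction being discussed could be sketched roughly as below. This is a hypothetical, simplified class (names and layout are illustrative, not Glow's actual Tensor API): one constructor allocates its own payload, while the other wraps memory owned by someone else, such as a camera driver's buffer, avoiding the extra copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified sketch of a tensor that either owns its payload or wraps
// memory owned externally (e.g. a buffer filled by a camera driver).
class Tensor {
  uint8_t *data_{nullptr};
  size_t size_{0};
  bool owned_{true};

public:
  // Owning constructor: allocates the payload itself.
  explicit Tensor(size_t size) : data_(new uint8_t[size]), size_(size) {}

  // Unowned constructor: wraps an existing payload. The caller must
  // guarantee that the memory is in the same address space, matches the
  // expected layout/alignment, and outlives this Tensor.
  Tensor(uint8_t *payload, size_t size)
      : data_(payload), size_(size), owned_(false) {}

  ~Tensor() {
    // Only free memory that this tensor allocated itself.
    if (owned_)
      delete[] data_;
  }

  uint8_t *data() { return data_; }
  size_t size() const { return size_; }
  bool isOwned() const { return owned_; }
};
```

As the comment above notes, this only helps when the external payload already lives in host memory with the expected layout; device-resident or differently-laid-out payloads need a richer mechanism.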
@opti-mix For the cases you mention there should be a mechanism to trigger data transfers between the backend-specific address space/layout/etc. and the host memory space (maybe similar to GPU-style map/unmap).
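The GPU-style map/unmap protocol mentioned here could look something like the following sketch. The names are hypothetical; a real OpenCL backend would use calls like clEnqueueMapBuffer/clEnqueueUnmapMemObject, while here a second host vector stands in for device memory so the control flow is runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a map/unmap synchronization protocol for a
// device-resident tensor. `device_` stands in for device memory.
class DeviceTensor {
  std::vector<uint8_t> device_;   // pretend this lives on the device
  std::vector<uint8_t> staging_;  // host-visible staging buffer
  bool mapped_{false};

public:
  explicit DeviceTensor(size_t size) : device_(size), staging_(size) {}

  // map(): make the payload visible to the host (device -> host copy).
  uint8_t *map() {
    staging_ = device_;
    mapped_ = true;
    return staging_.data();
  }

  // unmap(): publish host-side writes back to the device.
  void unmap() {
    device_ = staging_;
    mapped_ = false;
  }

  bool isMapped() const { return mapped_; }
  // Read a byte directly from the "device" copy (for verification).
  uint8_t peek(size_t i) const { return device_[i]; }
};
```

The key property of this style of interface is that host access is bracketed: writes made through the pointer returned by map() only become visible on the device side after unmap().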
Backends should be able to implement their own type of tensor objects.
A good example for using this feature would be having tensors backed by OpenCL resources for the OpenCL backend (and similarly for other backends).
In order to support this, the Tensor class can be made more suitable for deriving (adding virtual qualifiers, etc.), or a Tensor interface class can be created for other implementations to derive from.
Additional features of this Tensor interface should include support for lock/unlock (map/unmap) or some other form of synchronization mechanism.
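The proposal above could be sketched as an abstract interface like the one below. This is a minimal, hypothetical illustration (the names are not from Glow's codebase): backends derive from the interface and implement lock/unlock according to where their payload actually lives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Tensor interface that backend-specific tensors could
// derive from. lock()/unlock() bracket host access to the payload.
class TensorInterface {
public:
  virtual ~TensorInterface() = default;
  virtual void *lock() = 0;        // map the payload for host access
  virtual void unlock() = 0;       // release / sync back to the backend
  virtual size_t size() const = 0; // payload size in bytes
};

// Host-memory implementation: the payload is always host-visible,
// so lock/unlock are trivial.
class HostTensor : public TensorInterface {
  std::vector<uint8_t> data_;

public:
  explicit HostTensor(size_t size) : data_(size) {}
  void *lock() override { return data_.data(); }
  void unlock() override {}
  size_t size() const override { return data_.size(); }
};
```

An OpenCL backend could implement the same interface with a cl_mem payload, mapping the buffer in lock() and unmapping it in unlock(), so that code written against TensorInterface works unchanged across backends.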