[Feature request] Backend-specific tensors #1214

Open · jsubag opened this issue Jul 4, 2018 · 6 comments


jsubag (Contributor) commented Jul 4, 2018

Backends should be able to implement their own type of tensor objects.
A good example for using this feature would be having tensors backed by OpenCL resources for the OpenCL backend (and similarly for other backends).

To support this, either the Tensor class can be made more suitable for deriving from (adding virtual qualifiers, etc.), or a Tensor interface class can be introduced for other implementations to derive from.
This Tensor interface should also support lock/unlock (map/unmap) or some other form of synchronization mechanism.
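A minimal sketch of what such a derivable interface could look like. The names here (`BackendTensor`, `HostTensor`, `map`/`unmap`) are illustrative stand-ins, not Glow's actual API:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a derivable tensor interface with virtual
// map/unmap hooks, so each backend can own its payload's storage.
class BackendTensor {
public:
  virtual ~BackendTensor() = default;
  // Map the payload into host-visible memory for reading/writing.
  virtual void *map() = 0;
  // Unmap, flushing any host writes back to the backend's storage.
  virtual void unmap() = 0;
  virtual std::size_t sizeInBytes() const = 0;
};

// Trivial host-memory implementation. An OpenCL backend would instead
// wrap a cl_mem buffer and implement map/unmap with clEnqueueMapBuffer /
// clEnqueueUnmapMemObject.
class HostTensor : public BackendTensor {
  std::vector<char> storage_;
public:
  explicit HostTensor(std::size_t bytes) : storage_(bytes) {}
  void *map() override { return storage_.data(); }
  void unmap() override {}
  std::size_t sizeInBytes() const override { return storage_.size(); }
};
```

The point of the virtual interface is that Glow's runtime would only ever touch the payload between a map/unmap pair, leaving each backend free to keep the data wherever (and in whatever layout) it wants the rest of the time.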

nadavrot (Contributor) commented Jul 4, 2018

@jsubag Jacob, how do graphics drivers solve this problem? When I write CUDA/OpenCL code I don't need to allocate user-space buffers with the driver. I understand that in some cases the driver would need to copy the data. Do you know what latency this copy will introduce?

jsubag (Contributor, Author) commented Jul 4, 2018

@nadavrot On GPUs there's usually a copy between system memory and GPU memory, but here some usages may require an additional copy.

Consider an application analyzing images taken from a camera on the same host machine. Typically, the captured images are written to a specific location designated by the camera driver. In the current Glow design this data is first copied into a Tensor used as an input (backed by system memory), and then the backend copies it a second time to the GPU.
The first copy can be removed by exposing the right mechanism, so that the application initiates only the "second" copy, from the camera output buffer to the GPU resource.
There are other mechanisms in the GPU domain, such as sharing EGL resources (potentially saving all copies), but I think those require a tighter handshake between the components and drivers.

Additionally, some GPUs require the system memory being copied from to be pinned and aligned.
If the memory backing the tensor doesn't comply with these requirements, that can incur a third copy, from the tensor's system memory to another system-memory location that does comply (this is usually a result of DMA requirements).

Latency numbers vary across hardware, but if we're talking about copying a few MB in system memory on a modern computer, that shouldn't take more than a millisecond.
However, if your inputs/outputs are larger and your workload is latency-sensitive, this can be the difference between reaching your target frame rate and missing it.
So the more copies we can save, the better :)
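A back-of-envelope check of the sub-millisecond claim; the 10 GB/s effective copy bandwidth below is an assumed round figure for a modern host, not a measurement:

```cpp
// Estimate copy latency in milliseconds from size and bandwidth.
// Assumption: effective host memcpy bandwidth around 10 GB/s.
inline double copyMillis(double bytes, double bytesPerSecond) {
  return bytes / bytesPerSecond * 1000.0;
}
```

Under that assumption, a 4 MiB frame comes out around 0.4 ms, i.e. under a millisecond, but a budget that multiplies quickly once extra copies stack up per frame.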

bertmaher (Contributor) commented

That's a pretty interesting use-case. I think we could avoid the camera->system copy by using the Tensor(void *data, TypeRef ty) constructor to make a tensor backed by the camera memory, and then bind that tensor to an input Variable.
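To illustrate the idea behind that constructor: an "unowned" tensor just views memory that some other component (here, a stand-in for the camera driver's buffer) owns, so no camera-to-system copy is needed. `TensorView` below is an illustrative stand-in, not Glow's actual Tensor class:

```cpp
#include <cstddef>

// Sketch of an unowned tensor view in the spirit of Glow's
// Tensor(void *data, TypeRef ty) constructor: the payload pointer is
// borrowed, and the owner (e.g. a camera driver) manages its lifetime.
struct TensorView {
  void *data;        // unowned payload, e.g. the camera's output buffer
  std::size_t bytes; // payload size in bytes
  TensorView(void *d, std::size_t n) : data(d), bytes(n) {}
  // Deliberately no destructor freeing `data`.
};
```

Binding such a view as a network input would let the backend copy straight from the driver's buffer to the device, skipping the intermediate host-to-host copy.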

opti-mix (Contributor) commented Jul 9, 2018

@bertmaher I originally introduced the Tensor(void *data, TypeRef ty) constructor exactly for integrating with third-party frameworks and the like, where the tensor itself is allocated and managed outside Glow.

But of course, this constructor requires that the payload is at least in the same address space and has the same memory layout, alignment, padding, etc. There could be use cases where that doesn't hold, e.g. the tensor's payload is in a different address space (GPU memory, etc.) or has a different memory layout (padding, alignment, row- vs. column-major, etc.).

Such use-cases may indeed need more than just this constructor.

jsubag (Contributor, Author) commented Jul 10, 2018

@opti-mix For the cases you mention there should be a mechanism to trigger data transfer between the backend-specific address space/layout/etc. and the host memory space (perhaps similar to GPU-style map/unmap).
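A hedged sketch of what such a transfer mechanism could look like, with the device address space simulated by a second host buffer; `DeviceTensor` and its members are hypothetical names, not Glow's API:

```cpp
#include <cstddef>
#include <vector>

// Sketch of GPU-style map/unmap for a tensor whose payload lives in a
// separate address space (simulated here with a host vector). map()
// stages device data into host-visible memory; unmap() writes host
// edits back to the device side.
class DeviceTensor {
  std::vector<char> device_;  // stands in for GPU memory
  std::vector<char> staging_; // host-visible staging buffer
public:
  explicit DeviceTensor(std::size_t bytes) : device_(bytes) {}
  void *map() {
    staging_ = device_; // device -> host transfer
    return staging_.data();
  }
  void unmap() {
    device_ = staging_; // host -> device transfer
    staging_.clear();
  }
  const std::vector<char> &deviceMemory() const { return device_; }
};
```

A real backend would replace the vector copies with driver calls (e.g. clEnqueueMapBuffer/clEnqueueUnmapMemObject in OpenCL), and could also handle layout conversion (padding, row- vs. column-major) inside map/unmap.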

nadavrot (Contributor) commented

@jsubag This issue is related to #1334
