Closed
Description
We saw that for large tensors, for example an image of shape 1 x 3 x 224 x 224 (about 602 KB, assuming float32), the CPU cycles wasted in the allocator take up a large share of the total redis-server main-thread CPU cycles. In the example below we observe 42% wasted CPU cycles.
We should investigate ways to reduce this overhead by testing whether previously allocated command argument memory can be reused. This is only possible for BLOB tensorsets...
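To make the idea concrete, here is a minimal sketch in plain C of the two paths being compared: copying the argument blob into a freshly allocated tensor buffer versus keeping a reference to the buffer the argument already owns. Everything in it (`ArgBlob`, `Tensor`, `tensor_from_copy`, `tensor_from_borrow`, and the reference-counting scheme) is hypothetical and made up for illustration; it is not the actual RedisAI or Redis code, and whether the real command argument can be retained this way is exactly what would need to be tested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a command argument that owns a blob. */
typedef struct {
    char *data;
    size_t len;
    int refcount;
} ArgBlob;

/* Hypothetical tensor: either owns its buffer or borrows the argument's. */
typedef struct {
    ArgBlob *owner; /* non-NULL when the data is borrowed from an argument */
    char *data;
    size_t len;
} Tensor;

/* Allocate a zero-filled argument blob with one reference. */
ArgBlob *arg_blob_new(size_t len) {
    ArgBlob *a = malloc(sizeof(*a));
    if (a == NULL) return NULL;
    a->data = calloc(len, 1);
    if (a->data == NULL) { free(a); return NULL; }
    a->len = len;
    a->refcount = 1;
    return a;
}

/* Drop one reference; free the blob when the last reference goes away. */
void arg_blob_release(ArgBlob *a) {
    assert(a->refcount > 0);
    if (--a->refcount == 0) {
        free(a->data);
        free(a);
    }
}

/* Copy path: one extra allocation plus a memcpy of the whole blob. */
Tensor tensor_from_copy(const ArgBlob *arg) {
    Tensor t = { NULL, malloc(arg->len), arg->len };
    if (t.data != NULL) memcpy(t.data, arg->data, arg->len);
    return t;
}

/* Reuse path: no allocation, the tensor shares the argument's buffer. */
Tensor tensor_from_borrow(ArgBlob *arg) {
    arg->refcount++;
    Tensor t = { arg, arg->data, arg->len };
    return t;
}

/* Release whatever the tensor holds: a reference or its own buffer. */
void tensor_free(Tensor *t) {
    if (t->owner != NULL) arg_blob_release(t->owner);
    else free(t->data);
    t->owner = NULL;
    t->data = NULL;
    t->len = 0;
}
```

For a 1 x 3 x 224 x 224 float32 blob, the copy path allocates and copies 602,112 bytes per tensor, while the borrow path only increments a counter. The trade-off is that the argument buffer must stay alive for as long as the tensor does.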