Teach Glow how to support custom alignment requirements for tensor dimensions #2686
Comments
For 1, do we want to use libjit for some backends other than the CPU Backend that require/prefer strides?

For 2, we could always create a new `StridedTensor` class that all backends could share. I agree it might not be great if we need to keep all unstrided Tensors around too, or copy everything over to the new format. Though I'm not sure how we avoid copying in option 1 either -- we would load Constants as non-strided and need to copy over all the data into the new layout once the backend specifies its requirements. And I think we could ditch the unstrided Tensors in either case once we create the strided versions, no?

For 3, this is similar to the way I implemented fused rowwise-quantized Tensors. I added a new …
Yes, they'd work. But we'd need to update a bunch of Interpreter kernels, and virtually all libjit and OpenCL kernels would need to get a …
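To make the scale of that change concrete, here is a hedged sketch (hypothetical names in libjit's style, not actual Glow code) of what adding stride support to a single elementwise kernel could look like; every such kernel would need a similar signature change:

```cpp
#include <cstddef>

// Today: the kernel assumes both operands and the destination are packed.
void libjit_element_add_packed(float *dst, const float *lhs, const float *rhs,
                               size_t rows, size_t cols) {
  for (size_t r = 0; r < rows; ++r)
    for (size_t c = 0; c < cols; ++c)
      dst[r * cols + c] = lhs[r * cols + c] + rhs[r * cols + c];
}

// With stride support: every tensor carries its own row stride, so padded
// rows are skipped correctly instead of being treated as data.
void libjit_element_add_strided(float *dst, size_t dstStride,
                                const float *lhs, size_t lhsStride,
                                const float *rhs, size_t rhsStride,
                                size_t rows, size_t cols) {
  for (size_t r = 0; r < rows; ++r)
    for (size_t c = 0; c < cols; ++c)
      dst[r * dstStride + c] =
          lhs[r * lhsStride + c] + rhs[r * rhsStride + c];
}
```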
Sounds reasonable.
+1.
If you are talking about types, then `StridedTensor` would become the new `Tensor`, as it is strictly a superset of the current `Tensor` class: the current `Tensor` class is just a `StridedTensor` where all alignment requirements are 1, i.e. there are no alignment requirements. But I was talking more about executing NN models.
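A minimal sketch (assumed names, not Glow's API) of why that superset claim holds: deriving strides from sizes with all alignments equal to 1 reproduces exactly the packed layout of today's `Tensor`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct StridedType {
  std::vector<size_t> sizes;   // logical dimensions
  std::vector<size_t> strides; // element step per dimension

  // Derive strides from sizes; aligns[i] is the required alignment
  // (in elements) of strides[i].
  static StridedType fromSizes(const std::vector<size_t> &sizes,
                               const std::vector<size_t> &aligns) {
    StridedType t;
    t.sizes = sizes;
    t.strides.assign(sizes.size(), 1);
    size_t stride = 1;
    for (size_t i = sizes.size(); i-- > 0;) {
      // Round this dimension's stride up to its required alignment.
      stride = (stride + aligns[i] - 1) / aligns[i] * aligns[i];
      t.strides[i] = stride;
      stride *= sizes[i];
    }
    return t;
  }
};

int main() {
  // All alignments == 1: strides {3, 1}, i.e. today's packed Tensor.
  auto packed = StridedType::fromSizes({2, 3}, {1, 1});
  assert(packed.strides[0] == 3 && packed.strides[1] == 1);

  // Rows aligned to 4 elements: the outer stride is rounded up from 3 to 4.
  auto padded = StridedType::fromSizes({2, 3}, {4, 1});
  assert(padded.strides[0] == 4 && padded.strides[1] == 1);
  return 0;
}
```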
Right, we could e.g. transform all constants at compile-time into whatever memory layout is specified by the backend.
This is the question. I guess if a backend with custom alignment requirements supports all the operations, we can convert everything into the strided versions and then ditch the original unstrided tensors. But if we need to use e.g. our current libjit or Interpreter implementations as a fallback for some operations, e.g. during the development of a new backend, we run into the problem that our existing implementations expect unstrided tensors. Thus we'd need to perform copies between strided and unstrided tensors at runtime, so that we can hand these tensors to those operations in the format they expect. Of course, if we updated all those operations to support alignments/striding as in (1), we wouldn't have this problem.
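For concreteness, a hedged sketch of those runtime copies (illustrative helpers, not existing Glow functions): before invoking an unstrided fallback kernel, the padded rows are compacted into a packed buffer, and the result is scattered back afterwards:

```cpp
#include <cstddef>
#include <cstring>

// Compact a 2-D strided buffer (rowStride >= cols) into a packed one.
void packRows(const float *strided, float *packed, size_t rows, size_t cols,
              size_t rowStride) {
  for (size_t r = 0; r < rows; ++r)
    std::memcpy(packed + r * cols, strided + r * rowStride,
                cols * sizeof(float));
}

// Inverse copy: scatter packed rows back into the padded layout.
void unpackRows(const float *packed, float *strided, size_t rows, size_t cols,
                size_t rowStride) {
  for (size_t r = 0; r < rows; ++r)
    std::memcpy(strided + r * rowStride, packed + r * cols,
                cols * sizeof(float));
}
```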
Yeah, (3) is pretty similar to (2), but it is much more dangerous IMHO, because we'd use the same unstrided `Tensor` type for everything, yet have to remember whether any given instance of this type is logically strided or not. This sounds like a recipe for disaster :-) And we cannot reuse our existing kernels on the strided versions. Thus, I think (3) is the least attractive alternative.
@opti-mix One way to view this is to consider packed tensors as the canonical representation, and aligned tensors as the lowered form. The packed version is close to the pure mathematical form. This is the form that you want to have when you are performing optimizations, such as fusing operations. Later in the compilation pipeline we can complicate things by adding padding, alignment, or even perhaps distribution across multiple banks.

There are two ways to represent the aligned tensors in the type system: 1) add first-class support to everything, like you mentioned; 2) implement the semantics with specialized operators that know what to do with the pads. LLVM, btw, is closer to (1), where the operations (loads/stores) carry the alignment information. Another example for the type-system question is what @jfix71 mentioned: we could have implemented the rowwise operators as a change to the type system, but instead decided to add custom operators and keep the tensors opaque (using the pointer terminology here).

Let's say that we want to support padded convolutions. One easy way to implement this is to manually pad some tensor by making it larger, and to create a special convolution that knows that only a certain window inside of it contains the "real" data. All of the low-level optimization passes that deal with copying memory don't need to know anything about the padding/alignment, because for them this is just memory that is copied around. Only the operations that access the data need to know what's padding and what's data. We can transform the pure graph into this form during lowering.
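A small sketch of that lowering step (hypothetical helpers, not existing Glow code): the tensor is simply enlarged so each dimension meets its alignment; memory-copy passes treat the result as ordinary memory, and only the special operations know which window holds real data:

```cpp
#include <cstddef>
#include <vector>

// Round a dimension up to the next multiple of its alignment.
size_t roundUp(size_t value, size_t align) {
  return (value + align - 1) / align * align;
}

// Enlarged dimensions for the "lowered", manually padded tensor.
std::vector<size_t> paddedDims(const std::vector<size_t> &dims,
                               const std::vector<size_t> &aligns) {
  std::vector<size_t> out(dims.size());
  for (size_t i = 0; i < dims.size(); ++i)
    out[i] = roundUp(dims[i], aligns[i]);
  return out;
}

// E.g. NHWC dims {1, 224, 224, 3} with aligns {1, 1, 1, 4} become
// {1, 224, 224, 4}; the extra channel is junk that only the padding-aware
// convolution knows to ignore.
```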
I've spent more time thinking about these issues. Here is what I came up with:

Different operations (e.g. convolutions, gemms, etc.) may require each of their inputs and outputs to use a specific memory layout, both in terms of alignments and logical ordering of dimensions (e.g. NHWC or NCHW). Moreover, these requirements may depend on the properties of the operation, e.g. the dimensions of its operands. E.g. a convolution with a small filter may need its input operands in a format different from a convolution with a big filter. Today, Glow core and custom backends implicitly hard-code this knowledge about the operations into (backend-specific) nodes and the code that works with them, i.e. into the code that transforms Glow nodes, which are in the canonical form, into backend-specific ones.

What if Glow could ask a backend, for each node, what the requirements are for each of the node's inputs and results in terms of alignments and ordering of dimensions? Glow could then automatically insert `LayoutTransform` nodes wherever the layout produced by one node does not match the layout required by its consumer.

IMHO, this approach seems to solve this common issue in a pretty clean and principled way.
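A hypothetical sketch of such a query (none of these names exist in Glow; they only illustrate the shape of the API): the backend reports per-operand requirements, and a graph pass inserts layout conversions wherever producer and consumer disagree:

```cpp
#include <cstddef>
#include <vector>

// Layout requirements for a single operand: a logical dimension order plus
// a per-dimension alignment for the strides.
struct OperandLayout {
  std::vector<unsigned> dimOrder; // e.g. {0, 3, 1, 2} to turn NHWC into NCHW
  std::vector<size_t> alignments; // required alignment of each stride
};

// Requirements for all inputs and results of one node.
struct NodeLayoutRequirements {
  std::vector<OperandLayout> inputs;
  std::vector<OperandLayout> results;
};

class Node; // stand-in for Glow's node class

// Interface a backend could implement; the answer may depend on the node's
// properties (e.g. filter size), as described above.
class BackendLayoutInfo {
public:
  virtual ~BackendLayoutInfo() = default;
  virtual NodeLayoutRequirements
  getLayoutRequirements(const Node &node) const = 0;
};
```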
Is `LayoutTransform` similar to `Transpose`?
Yes. The overall idea and purpose is very similar, but `LayoutTransform` is more general: besides reordering dimensions like `Transpose`, it could also change alignments and padding.
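A toy sketch of the difference (illustrative only): a layout transform is a copy that may both permute dimensions, like `Transpose`, and re-stride the destination for alignment; with a packed destination stride it degenerates to a plain transpose:

```cpp
#include <cstddef>

// Copy a packed rows x cols source into a transposed destination whose row
// stride may exceed `rows` to satisfy an alignment requirement.
void layoutTransform2D(const float *src, size_t rows, size_t cols,
                       float *dst, size_t dstRowStride) {
  for (size_t r = 0; r < rows; ++r)
    for (size_t c = 0; c < cols; ++c)
      dst[c * dstRowStride + r] = src[r * cols + c];
}
// dstRowStride == rows gives exactly a Transpose; a larger, aligned stride
// gives the more general layout transform.
```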
Some backends may have specific requirements on the alignment of data inside tensors, e.g. that each row of the innermost dimension be cache-line aligned. In principle, any of the dimensions may have an alignment requirement.
Glow's `Tensor` class as of today is not able to represent these requirements, because the underlying `Type` cannot express them. `Type` only stores the actual dimensions, but not the aligned dimensions.

So, how could one use Glow Tensors to enforce any requirements on alignments?
Here are some possible alternatives:
1. Glow Tensors could be extended to explicitly support custom strides: `Type` would get a dedicated `strides_` member in addition to `sizes_` (see the sketch after this list).
2. It is also possible to tweak the representation of tensors only in the specific backends that need it, without changing the `Tensor` and `Type` classes, e.g. by introducing custom backend-specific Tensor classes. This is certainly doable, but it is very likely that this logic would be re-implemented in a number of different backends.
3. One could also keep the existing `Tensor` and `Type` classes unchanged. Every time a custom-aligned tensor is needed, one would represent it e.g. as a Tensor with bigger dimensions that accommodate the required alignments and contain some junk padding elements. Such a tensor would have the desired memory layout. But it is important to note that some of the existing operations on tensors would not produce correct results when applied to such a tensor, because those operations assume that all elements contain real data and need to be processed, which is not the case due to the added junk padding elements. The content of the original tensor would be copied into the "padded" tensor before any operation that makes use of custom alignments, and copied back into the original tensor after that operation.

I'm probably missing some further alternatives here.
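As referenced in alternative (1) above, here is a minimal sketch (assumed member names, not Glow's actual `Type`) of how a `strides_` member next to `sizes_` would change element addressing and allocation size:

```cpp
#include <cstddef>
#include <vector>

struct TypeWithStrides {
  std::vector<size_t> sizes_;   // logical dimensions, as today
  std::vector<size_t> strides_; // elements between consecutive indices

  // Linear offset of an element; with packed strides this matches the
  // addressing the current Tensor class performs.
  size_t offset(const std::vector<size_t> &idx) const {
    size_t off = 0;
    for (size_t i = 0; i < idx.size(); ++i)
      off += idx[i] * strides_[i];
    return off;
  }

  // Number of elements to allocate: one past the last addressable element.
  size_t paddedElements() const {
    size_t n = 1;
    for (size_t i = 0; i < sizes_.size(); ++i)
      n += (sizes_[i] - 1) * strides_[i];
    return n;
  }
};
```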
It would be interesting to hear from those who have faced a similar issue what their experience was and how they solved this problem. Maybe we can learn from it.
Personally, I tend to think that (1) is the cleanest solution, but I'm afraid it may involve a lot of re-factoring in existing kernels.
Note: This issue is related to #1214.