Enable dispatch to tinygemm int4 and int8 kernels for unified quantized tensor
Summary:
This adds dispatch to the tinygemm int4 and int8 kernels for CUDA, although the implementation
mismatch problem for tinygemm needs to be resolved first
Test Plan:
TODO
Reviewers:
Subscribers:
Tasks:
Tags:
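The change routes quantized matmuls to device-specific kernels. A minimal sketch of that dispatch pattern is below; the kernel names (`tinygemm_int4_mm`, `tinygemm_int8_mm`) and the `quantized_mm` entry point are placeholders for illustration, not the actual tinygemm or torchao APIs, and plain nested lists stand in for tensors:

```python
# Hypothetical sketch: dispatch a quantized matmul to a kernel selected by
# (device, weight bit-width), falling back with an error when no kernel fits.
# In the real subclass this selection would happen inside the tensor
# subclass's torch dispatch override.

def tinygemm_int4_mm(x, w):
    # Placeholder for the CUDA int4 tinygemm kernel: plain matmul here.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

def tinygemm_int8_mm(x, w):
    # Placeholder for the CUDA int8 tinygemm kernel: plain matmul here.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]

# Dispatch table keyed on (device, bits). Unsupported combinations fall
# through to the NotImplementedError below (in practice, a
# dequantize-then-matmul fallback).
DISPATCH = {
    ("cuda", 4): tinygemm_int4_mm,
    ("cuda", 8): tinygemm_int8_mm,
}

def quantized_mm(x, w, device, bits):
    kernel = DISPATCH.get((device, bits))
    if kernel is None:
        raise NotImplementedError(f"no kernel for {device}/int{bits}")
    return kernel(x, w)
```

The table-lookup structure is the point: adding a new backend means registering one more `(device, bits)` entry rather than growing an if/else chain.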
torchao/quantization/subclass.py (+55 −4)
```diff
@@ -626,7 +626,7 @@ class AffineQuantizedTensor(torch.Tensor):
       shape (torch.Size): the shape for the Tensor
       quant_min (Optional[int]): minimum quantized value for the Tensor, if not specified, it will be derived from dtype of `int_data`
       quant_max (Optional[int]): maximum quantized value for the Tensor, if not specified, it will be derived from dtype of `int_data`
-      input_quant_func (Optional[Callable]): function for quantizing the input float Tensor to a quantized tensor subclass object, that takes input Tensor as input and outputs an AffineQuantizedTensor object
+      input_quant_func (Optional[Callable]): function for quantizing the input float Tensor to a quantized tensor subclass object, that takes float Tensor as input and outputs an AffineQuantizedTensor object
       dtype: dtype for external representation of the tensor, e.g. torch.float32
```