Enable dispatch to tinygemm int4 and int8 kernels for unified quantized tensor
Summary:
This adds dispatch to the tinygemm int4 and int8 kernels for CUDA. The implementation-mismatch
problem for tinygemm still needs to be resolved before this can land.
Test Plan:
TODO
Reviewers:
Subscribers:
Tasks:
Tags:
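The dispatch described in the summary could be sketched roughly as below. This is a pure-Python sketch of the selection logic only; the kernel names (`aten._weight_int4pack_mm`, `aten._weight_int8pack_mm`) and the selection criteria are assumptions for illustration, not the final implementation:

```python
def choose_tinygemm_kernel(device: str, bit_width: int, groupsize: int) -> str:
    """Pick a matmul path for an affine-quantized weight (hypothetical logic).

    Assumed: tinygemm kernels are CUDA-only, and the int4 path requires
    group-wise quantization (groupsize > 0). Anything else falls back to
    dequantize-then-matmul.
    """
    if device != "cuda":
        return "fallback_dequant_matmul"
    if bit_width == 4 and groupsize > 0:
        return "aten._weight_int4pack_mm"  # tinygemm int4 kernel (assumed name)
    if bit_width == 8:
        return "aten._weight_int8pack_mm"  # tinygemm int8 kernel (assumed name)
    return "fallback_dequant_matmul"
```
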
torchao/quantization/subclass.py (95 additions & 3 deletions)
@@ -14,7 +14,9 @@
     dynamically_quantize_per_channel,
     groupwise_affine_quantize_tensor,
     quant_int8_dynamic_per_token_linear,
+    pack_tinygemm_scales_and_zeros,
     unpack_tinygemm_scales_and_zeros,
+    groupwise_affine_quantize_tensor_from_qparams,
     choose_qparams_affine,
     quantize_affine,
     dequantize_affine,
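The newly imported pack/unpack helpers pair group-wise scales and zero points into the layout the tinygemm kernels expect. A rough pure-Python sketch of such a round-trip follows; the exact torchao layout (shapes, transposition, dtypes) is an assumption here, and only the pairing idea is illustrated:

```python
def pack_scales_and_zeros(scales, zeros):
    """Interleave per-group scales and zero points into (n_groups, out_channels, 2)
    pairs, loosely mimicking (not reproducing) pack_tinygemm_scales_and_zeros.

    scales, zeros: nested lists of shape (out_channels, n_groups).
    """
    return [
        [[scales[c][g], zeros[c][g]] for c in range(len(scales))]
        for g in range(len(scales[0]))
    ]

def unpack_scales_and_zeros(packed):
    """Recover (scales, zeros) from the packed (n_groups, out_channels, 2) layout."""
    n_groups, out_channels = len(packed), len(packed[0])
    scales = [[packed[g][c][0] for g in range(n_groups)] for c in range(out_channels)]
    zeros = [[packed[g][c][1] for g in range(n_groups)] for c in range(out_channels)]
    return scales, zeros
```
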
@@ -619,7 +621,7 @@ class AffineQuantizedTensor(torch.Tensor):
       shape (torch.Size): the shape for the Tensor
       quant_min (Optional[int]): minimum quantized value for the Tensor, if not specified, it will be derived from dtype of `int_data`
       quant_max (Optional[int]): maximum quantized value for the Tensor, if not specified, it will be derived from dtype of `int_data`
-      input_quant_func (Optional[Callable]): function for quantizing the input float Tensor to a quantized tensor subclass object, that takes input Tensor as input and outputs an AffineQuantizedTensor object
+      input_quant_func (Optional[Callable]): function for quantizing the input float Tensor to a quantized tensor subclass object, that takes float Tensor as input and outputs an AffineQuantizedTensor object
       dtype: dtype for external representation of the tensor, e.g. torch.float32
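As a concrete illustration of what such an `input_quant_func` computes, here is a minimal pure-Python affine quantize/dequantize pair. It uses scalar lists in place of Tensors and per-tensor scale/zero-point; the real function would return an `AffineQuantizedTensor` rather than a plain list:

```python
def affine_quantize(x, scale, zero_point, quant_min=-128, quant_max=127):
    """Affine mapping: q = clamp(round(x / scale) + zero_point, quant_min, quant_max)."""
    return [
        max(quant_min, min(quant_max, round(v / scale) + zero_point))
        for v in x
    ]

def affine_dequantize(q, scale, zero_point):
    """Inverse mapping: x ≈ (q - zero_point) * scale."""
    return [(qi - zero_point) * scale for qi in q]
```
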