[PyTorch] Quantizer as API #2039
base: main
Conversation
Signed-off-by: Evgeny <[email protected]>
/te-ci pytorch
About the naming inconsistencies of the parameters: I would propose to reconsider this design. Renaming would already improve the situation.
Although this introduces a new class for users to learn.
@ptrendx I propose to keep this: it is a TE-specific optimization (weight workspace caching + CUDA graph support, etc.).
Exposing a pure Python interface will be quite nice. Ideally we'd design the quantizers in such a way that a C++ or Python impl is an implementation detail, and we don't need any special logic in the modules.
One question is how to handle the C++ quantizer infrastructure, e.g. in tex.quantize. Options:
- Only use the C++ quantizer as a perf optimization. We'll need to add checks to avoid passing a pure-Python quantizer into tex functions (e.g. for norms or activations).
- Add a C++ quantizer that calls a Python function (or modify the C++ quantizer base class). The C++ quantizer has to deal with both Python tensor classes (as pybind objects) and NVTETensor, and handling NVTETensors from Python will be challenging.

I think the pure Python approach in this PR is more straightforward.
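To make the first option concrete, here is a minimal sketch of such a guard. `quantize_any` and the `is_cpp_backed` marker are hypothetical names (not part of the current TE API), and the `tex.quantize` call signature is assumed from how it is referenced in this thread:

```python
import transformer_engine_torch as tex  # TE's C++ extension; import path may differ by version


def quantize_any(tensor, quantizer):
    """Hypothetical dispatch helper: only C++-backed quantizers reach tex."""
    if getattr(quantizer, "is_cpp_backed", False):  # assumed marker attribute
        # Fast path: the quantizer wraps a C++ impl the extension can consume.
        return tex.quantize(tensor, quantizer)
    # Pure-Python quantizer: never cross into tex; call its Python impl directly.
    return quantizer.quantize(tensor)
```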
self.columnwise_usage = columnwise

class Quantizer(QuantizerBase):
I don't think the distinction between Quantizer and QuantizerBase is logical. This PR is trying to distinguish between quantizers that call tex.quantize and those that call a Python impl, but that's a quantizer-specific implementation detail. QuantizerBase is also haphazardly removing parts of the quantizer API, like the ability to construct empty tensors, PyTorch autograd support, etc.

I think the right design is not to add an unnecessary QuantizerBase class, but to decouple Quantizer from tex.quantize. We can add an abstract quantize_impl function that is called within quantize. The existing quantizers should call tex.quantize, but future quantizers could use a pure Python impl.
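For concreteness, a minimal sketch of that shape, assuming the names suggested above; the subclass names and the FP8 cast are illustrative, not the actual TE code:

```python
from abc import ABC, abstractmethod

import torch
import transformer_engine_torch as tex  # TE's C++ extension; import path may differ by version


class Quantizer(ABC):
    """Sketch: shared logic stays in quantize(), numerics go in quantize_impl()."""

    def quantize(self, tensor: torch.Tensor):
        # Shared TE logic (autograd wiring, calibration, usage flags, ...)
        # would stay here; only the numerics are delegated.
        return self.quantize_impl(tensor)

    @abstractmethod
    def quantize_impl(self, tensor: torch.Tensor):
        """Backend-specific quantization, overridden by subclasses."""


class CppBackedQuantizer(Quantizer):
    def quantize_impl(self, tensor):
        # Existing quantizers keep the fast C++ path.
        return tex.quantize(tensor, self)


class PurePythonQuantizer(Quantizer):
    def quantize_impl(self, tensor):
        # A future quantizer implemented entirely in Python, e.g. a plain
        # PyTorch cast to FP8 (requires PyTorch >= 2.1).
        return tensor.to(torch.float8_e4m3fn)
```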
I don't think this is a good reason to change the API. If we want to cut corners by not implementing things (fair enough for experimentation), we can throw NotImplementedError.
Description
Expose quantizer as an API.
Main objective: let users build custom quantizers.
Currently, Quantizer contains TE-specific logic (quantize() with autograd function, calibrate(), _get_compatible_recipe()). I propose to extract the most generic interfaces/implementations into QuantizerBase and expose it as a first-class API.

Usage example:
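The original example did not survive the page export; below is a hedged reconstruction of what a custom quantizer built on the proposed QuantizerBase might look like. The import path, the quantize() override point, and the int8 scheme are assumptions based on the PR description, not the final API:

```python
import torch

from transformer_engine.pytorch import QuantizerBase  # assumed export path


class SymmetricInt8Quantizer(QuantizerBase):
    """Toy custom quantizer: per-tensor symmetric int8, pure Python."""

    def quantize(self, tensor: torch.Tensor) -> torch.Tensor:
        # Compute a per-tensor scale from the max magnitude.
        scale = tensor.abs().amax().clamp(min=1e-12) / 127.0
        q = torch.clamp(torch.round(tensor / scale), -127, 127).to(torch.int8)
        # A real quantizer would return a tensor class that carries `scale`
        # alongside the int8 data so it can be dequantized later.
        return q


quantizer = SymmetricInt8Quantizer()
```

Such a quantizer could then be handed to TE modules that update quantized weights (e.g. via update_quantized()), as described below.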
Custom quantizers can be used in TE modules (e.g. via update_quantized()).

Type of change
Changes
Please list the changes introduced in this PR:
- Introduced QuantizerBase
- Exposed QuantizerBase as an API

Checklist: