Make quantize_ respect the usages of the quantizer #1836
Conversation
LGTM
const std::vector<size_t>& shape, DType dtype, const py::object& output = py::none(),
std::optional<at::Tensor> rowwise_data = std::nullopt) const = 0;
Somewhat orthogonal, but since we're touching Quantizer::create_tensor, we should consider removing the rowwise_data arg. It was a UB-specific option that doesn't really make sense anymore. I believe all usages have been refactored away.
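For concreteness, a rough sketch of how the trimmed declaration could read if rowwise_data were dropped; the return type, the stand-in DType enum, and the surrounding class are assumptions for illustration, not the actual Transformer Engine header:

```cpp
#include <vector>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Stand-in for Transformer Engine's dtype enum, only so the sketch is self-contained.
enum class DType { kFloat32, kBFloat16, kFloat8E4M3 };

class Quantizer {
 public:
  virtual ~Quantizer() = default;
  // Hypothetical trimmed signature: same as today, minus the rowwise_data argument.
  virtual py::object create_tensor(const std::vector<size_t>& shape, DType dtype,
                                   const py::object& output = py::none()) const = 0;
};
```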
ok, cool. I will do that - it will make the code nicer.
Actually can't do that just yet. Attention also uses this unfortunately.
if (output_tensor_type == PythonTensorType::TENSOR_BASE) {
  output.attr("_quantizer") = this->quantizer;
} else {
  output.attr("_quantizer") = python_copy(this->quantizer);
Why do we use python_copy(this->quantizer) when:
- output is not None (reused), or
- constructing a quantized tensor class (non-base),

but for the base class we use this->quantizer directly instead of python_copy(this->quantizer)?
Have we analyzed the CPU overhead of this copy?
It looks okay to me, since at least the base class doesn't require the Python copy (in my #1793 I enforced creation of the base class, and I even avoided using _a in the constructor to reduce overhead by 2x), but I'm still curious why the Python copy is needed.
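To make the distinction being asked about concrete, a small illustrative sketch (assumed semantics, not the actual TE code); attach_quantizer is a hypothetical helper, and TE's python_copy is assumed to behave roughly like Python's copy.copy:

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical helper: stores the quantizer on the output tensor either by
// reference (aliased) or as a shallow copy (decoupled snapshot).
void attach_quantizer(py::object output, py::object quantizer, bool make_copy) {
  if (make_copy) {
    // Decoupled: later changes to the module-level quantizer's usage flags
    // are no longer visible through output._quantizer.
    py::object copy_fn = py::module_::import("copy").attr("copy");
    output.attr("_quantizer") = copy_fn(quantizer);
  } else {
    // Aliased: output._quantizer is the very same Python object, so any
    // later mutation of the quantizer is reflected on the tensor.
    output.attr("_quantizer") = quantizer;
  }
}
```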
Description
This PR changes the `create_tensor` function to accept an optional `out` parameter - a tensor that can be reused. It also changes the `tex.quantize` function to always try to recreate the tensor (potentially reusing the provided `out`) rather than blindly believing that `out` is already prepared properly. This ensures that the quantizer settings are preserved and lets us drop the workaround in `get_weight_workspaces` that ignored the cached MXFP8 weight tensor when the usages are incompatible. See #1593 (comment) for more details.

Adding @guyueh1
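A minimal sketch of that flow, assuming the `create_tensor` signature shown earlier in the diff; the helper names and the overall structure are hypothetical, not the actual `tex.quantize` implementation:

```cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <ATen/ATen.h>

namespace py = pybind11;

// Assumes the Quantizer / DType declarations from the diff are in scope.
py::object quantize_sketch(const at::Tensor& input, const Quantizer& quantizer,
                           const py::object& out = py::none()) {
  std::vector<size_t> shape(input.sizes().begin(), input.sizes().end());
  DType dtype = to_te_dtype(input.scalar_type());  // hypothetical dtype conversion

  // Always ask the quantizer to (re)create the output, passing `out` so its
  // storage can be reused when it is compatible with the quantizer's current
  // usages; otherwise fresh buffers are allocated with the right settings.
  py::object output = quantizer.create_tensor(shape, dtype, out);

  // ... launch the quantization kernel writing into `output` ...
  return output;
}
```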
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: