
Conversation

@ptrendx (Member) commented May 31, 2025

Description

This PR changes the create_tensor function to accept an optional out parameter - a tensor whose storage can be reused. It also changes the tex.quantize function to always recreate the output tensor (potentially reusing the provided out) rather than assuming that out is already properly prepared. This ensures that the quantizer settings are preserved and lets us drop the workaround in get_weight_workspaces that ignored the cached MXFP8 weight tensor when its usages were incompatible. See #1593 (comment) for more details.
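To make the intended contract concrete, here is a minimal self-contained C++ sketch of the new flow. It is illustrative only: the stub Tensor and Quantizer types, the scale_settings field, and the quantize helper are assumptions standing in for the real py::object/at::Tensor-based code (the actual create_tensor signature appears in the review excerpt further down).

```cpp
#include <iostream>
#include <optional>
#include <vector>

// Stand-in types for exposition only; the real code uses py::object,
// at::Tensor, and Transformer Engine's Quantizer hierarchy.
struct Tensor {
  std::vector<size_t> shape;
  int scale_settings;  // stands in for quantizer metadata baked into the tensor
};

struct Quantizer {
  int scale_settings = 0;

  // After this PR: create_tensor takes an optional `out` that it may reuse.
  // It always (re)applies the current quantizer settings rather than
  // trusting that `out` was prepared with them.
  Tensor create_tensor(const std::vector<size_t>& shape,
                       std::optional<Tensor> out = std::nullopt) const {
    Tensor result = out.has_value() ? std::move(*out)  // reuse storage
                                    : Tensor{shape, 0};
    result.shape = shape;
    result.scale_settings = scale_settings;  // settings are always refreshed
    return result;
  }
};

// quantize always routes through create_tensor, so a stale `out`
// (e.g. a cached MXFP8 weight with incompatible usages) is fixed up
// instead of being used blindly.
Tensor quantize(const Quantizer& q, const std::vector<size_t>& shape,
                std::optional<Tensor> out = std::nullopt) {
  return q.create_tensor(shape, std::move(out));
}

int main() {
  Quantizer q;
  q.scale_settings = 42;
  Tensor cached{{16, 16}, /*stale settings=*/7};
  Tensor t = quantize(q, {16, 16}, std::move(cached));
  std::cout << t.scale_settings << "\n";  // 42: quantizer settings preserved
  return 0;
}
```

The key point mirrored here is the one from the description: out is only a reuse hint for storage, while create_tensor remains the single place where the quantizer's current settings are applied.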

Adding @guyueh1

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • create_tensor now accepts an optional out parameter for tensor reuse
  • tex.quantize always recreates the output tensor through the quantizer instead of trusting a caller-provided out
  • Dropped the get_weight_workspaces workaround that ignored the cached MXFP8 weight tensor on incompatible usages

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ptrendx requested a review from timmoon10 May 31, 2025 00:15
@ptrendx (Member, Author) commented May 31, 2025

/te-ci pytorch

timmoon10 previously approved these changes May 31, 2025

@timmoon10 (Collaborator) left a comment


LGTM

Comment on lines +100 to 101
const std::vector<size_t>& shape, DType dtype, const py::object& output = py::none(),
std::optional<at::Tensor> rowwise_data = std::nullopt) const = 0;
@timmoon10 (Collaborator)

Somewhat orthogonal, but since we're touching Quantizer::create_tensor, we should consider removing the rowwise_data arg. It was a UB-specific option that doesn't really make sense anymore. I believe all usages have been refactored away.

@ptrendx (Member, Author)

ok, cool. I will do that - it will make the code nicer.

@ptrendx (Member, Author)

Actually, I can't do that just yet - attention also uses it, unfortunately.

Signed-off-by: Przemek Tredak <[email protected]>
@ptrendx (Member, Author) commented Jun 2, 2025

/te-ci pytorch

timmoon10 previously approved these changes Jun 2, 2025
Signed-off-by: Przemek Tredak <[email protected]>
@ptrendx (Member, Author) commented Jun 6, 2025

/te-ci pytorch

Signed-off-by: Przemek Tredak <[email protected]>
ptrendx and others added 5 commits June 11, 2025 11:24
@ptrendx (Member, Author) commented Jun 12, 2025

/te-ci pytorch

Signed-off-by: Przemek Tredak <[email protected]>
@ptrendx (Member, Author) commented Jun 12, 2025

/te-ci pytorch

if (output_tensor_type == PythonTensorType::TENSOR_BASE) {
output.attr("_quantizer") = this->quantizer;
} else {
output.attr("_quantizer") = python_copy(this->quantizer);
@zhongbozhu (Collaborator) commented Jun 12, 2025

Why do we use python_copy(this->quantizer) when:

  • output is not None (reused)
  • constructing a quantized tensor class (non-base)

but for the base class we use this->quantizer directly instead of python_copy(this->quantizer)?

Have we analyzed the CPU overhead of this copy?

It looks okay to me, since at least the base class doesn't require the Python copy (in my #1793 I enforced creation of the base class, and I even avoided using _a in the constructor, which reduced overhead by 2x), but I'm still curious why the Python copy is needed.
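The thread does not record an answer, but one plausible reading (an assumption, not confirmed in this PR) is that the copy decouples the returned tensor's quantizer metadata from later mutations of the module-owned quantizer, matching the description's goal of preserving quantizer settings. A toy C++ model of that aliasing-versus-snapshot distinction, with an invented QuantizerState in place of the real Python objects:

```cpp
#include <iostream>
#include <memory>

// Toy stand-in for the quantizer object; the field name is invented.
struct QuantizerState {
  int amax_history_len = 0;
};

int main() {
  auto module_quantizer = std::make_shared<QuantizerState>();
  module_quantizer->amax_history_len = 1;

  // Base-class path (this->quantizer): the tensor aliases the same object.
  std::shared_ptr<QuantizerState> aliased = module_quantizer;

  // Non-base path (python_copy(this->quantizer)): the tensor gets a snapshot.
  auto snapshot = std::make_shared<QuantizerState>(*module_quantizer);

  module_quantizer->amax_history_len = 2;  // module settings change later

  std::cout << aliased->amax_history_len << "\n";   // 2: follows the module
  std::cout << snapshot->amax_history_len << "\n";  // 1: snapshot preserved
  return 0;
}
```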

@ptrendx (Member, Author) commented Jun 13, 2025

/te-ci pytorch

Signed-off-by: Przemek Tredak <[email protected]>
@ptrendx (Member, Author) commented Jun 13, 2025

/te-ci pytorch
