Make quantize_ respect the usages of the quantizer #1836
Conversation
LGTM
const std::vector<size_t>& shape, DType dtype, const py::object& output = py::none(),
std::optional<at::Tensor> rowwise_data = std::nullopt) const = 0;
Somewhat orthogonal, but since we're touching Quantizer::create_tensor, we should consider removing the rowwise_data arg. It was a UB-specific option that doesn't really make sense anymore. I believe all usages have been refactored away.
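For concreteness, a rough sketch of how the trimmed declaration could read if rowwise_data were dropped; the return type, the stand-in DType enum, and the surrounding class are assumptions for illustration, not the actual Transformer Engine header:

```cpp
#include <vector>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Stand-in for Transformer Engine's dtype enum, only so the sketch is self-contained.
enum class DType { kFloat32, kBFloat16, kFloat8E4M3 };

class Quantizer {
 public:
  virtual ~Quantizer() = default;
  // Hypothetical trimmed signature: same as today, minus the rowwise_data argument.
  virtual py::object create_tensor(const std::vector<size_t>& shape, DType dtype,
                                   const py::object& output = py::none()) const = 0;
};
```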
ok, cool. I will do that - it will make the code nicer.
Actually can't do that just yet. Attention also uses this unfortunately.
if (output_tensor_type == PythonTensorType::TENSOR_BASE) {
  output.attr("_quantizer") = this->quantizer;
} else {
  output.attr("_quantizer") = python_copy(this->quantizer);
Why do we use python_copy(this->quantizer) when:
- output is not None (reused), or
- constructing a quantized tensor class (non-base),

but for the base class we use this->quantizer directly instead of python_copy(this->quantizer)?
Have we analyzed the CPU overhead of this copy?
It looks okay to me, since at least the base class doesn't require the Python copy (in my #1793 I enforced creation of the base class, and I even avoided using _a in the constructor to reduce overhead by 2x), but I'm still curious why the Python copy is needed.
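To make the distinction being asked about concrete, a small illustrative sketch (assumed semantics, not the actual TE code); attach_quantizer is a hypothetical helper, and TE's python_copy is assumed to behave roughly like Python's copy.copy:

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Hypothetical helper: stores the quantizer on the output tensor either by
// reference (aliased) or as a shallow copy (decoupled snapshot).
void attach_quantizer(py::object output, py::object quantizer, bool make_copy) {
  if (make_copy) {
    // Decoupled: later changes to the module-level quantizer's usage flags
    // are no longer visible through output._quantizer.
    py::object copy_fn = py::module_::import("copy").attr("copy");
    output.attr("_quantizer") = copy_fn(quantizer);
  } else {
    // Aliased: output._quantizer is the very same Python object, so any
    // later mutation of the quantizer is reflected on the tensor.
    output.attr("_quantizer") = quantizer;
  }
}
```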
Description
This PR changes the `create_tensor` function to accept an optional `out` parameter - a tensor that can be reused. It also changes the `tex.quantize` function to always try to recreate the tensor (potentially reusing the provided `out`) rather than blindly believing that `out` is already prepared properly. This ensures that the quantizer settings are preserved and lets us drop the workaround in `get_weight_workspaces` that ignored the cached MXFP8 weight tensor when the usages are incompatible. See #1593 (comment) for more details.

Adding @guyueh1
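A minimal sketch of that flow, assuming the `create_tensor` signature shown earlier in the diff; the helper names and the overall structure are hypothetical, not the actual `tex.quantize` implementation:

```cpp
#include <vector>
#include <pybind11/pybind11.h>
#include <ATen/ATen.h>

namespace py = pybind11;

// Assumes the Quantizer / DType declarations from the diff are in scope.
py::object quantize_sketch(const at::Tensor& input, const Quantizer& quantizer,
                           const py::object& out = py::none()) {
  std::vector<size_t> shape(input.sizes().begin(), input.sizes().end());
  DType dtype = to_te_dtype(input.scalar_type());  // hypothetical dtype conversion

  // Always ask the quantizer to (re)create the output, passing `out` so its
  // storage can be reused when it is compatible with the quantizer's current
  // usages; otherwise fresh buffers are allocated with the right settings.
  py::object output = quantizer.create_tensor(shape, dtype, out);

  // ... launch the quantization kernel writing into `output` ...
  return output;
}
```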
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: