Conversation

timmoon10
Collaborator

@timmoon10 timmoon10 commented Jul 17, 2025

Description

#1865 introduced a failure in the distributed debug tests. The root cause is that we only all-gather the row-wise data for DebugQuantizedTensor:

rowwise_total = gather_along_first_dim(rowwise, process_group, False, final_quantizer)[0]

However, if the linear layer is caching its original input tensor and requantizing it in the backward pass, the correct behavior is to quantize only the column-wise data. This PR is a hacky workaround that applies the debug quantizer only to the gathered input tensor.
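The workaround can be sketched as a toy model (all classes and helpers below are hypothetical stand-ins, not the actual Transformer Engine code): gather the raw high-precision input first, then apply the debug quantizer to the gathered tensor, so both the row-wise data (forward GEMM) and the column-wise data (backward wgrad GEMM) come from one consistent quantization step.

```python
# Toy sketch of the fix. ToyQuantizer and gather_along_first_dim are
# simplified stand-ins for the Transformer Engine quantizer and the
# all-gather across the tensor-parallel process group.

class ToyQuantizer:
    def __init__(self):
        self.rowwise_usage = True
        self.columnwise_usage = True

    def set_usage(self, rowwise, columnwise):
        self.rowwise_usage = rowwise
        self.columnwise_usage = columnwise

    def __call__(self, tensor):
        # Pretend quantization: record which usages were produced.
        return {
            "data": tensor,
            "rowwise": self.rowwise_usage,
            "columnwise": self.columnwise_usage,
        }

def gather_along_first_dim(local_chunks):
    # Stand-in for the all-gather: concatenate per-rank chunks.
    return [x for chunk in local_chunks for x in chunk]

# Fixed path: gather the raw input, then quantize the gathered tensor
# with both usages, so the backward pass has column-wise data available.
quantizer = ToyQuantizer()
quantizer.set_usage(rowwise=True, columnwise=True)
local_chunks = [[1, 2], [3, 4]]  # one chunk per rank
gathered = gather_along_first_dim(local_chunks)
quantized = quantizer(gathered)
print(quantized["columnwise"])  # → True
```

In the buggy path, the locally quantized tensor was gathered with only its row-wise data, so the requantization in the backward pass had nothing column-wise to work with.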

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Modify linear backward to avoid all-gathering debug tensor with only column-wise data

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@timmoon10 timmoon10 requested a review from pggPL July 17, 2025 23:43
@timmoon10 timmoon10 force-pushed the debug-linear-original-input branch from 5387dd7 to 6f66aa4 Compare July 17, 2025 23:52
@timmoon10
Collaborator Author

/te-ci pytorch L1

@cyanguwa cyanguwa mentioned this pull request Jul 18, 2025
Signed-off-by: Tim Moon <[email protected]>
@timmoon10 timmoon10 requested a review from ksivaman July 18, 2025 21:21
 else:
-    quantizer.set_usage(rowwise=False, columnwise=True)
+    quantizer.set_usage(rowwise=True, columnwise=True)
Contributor

Hi Tim, why do we need the rowwise data here?

Collaborator Author

@timmoon10 timmoon10 Jul 21, 2025

In principle we shouldn't need it, but I was running into issues where FP8 casts were failing without it. Actually, I don't think we need it once we skip the debug quantizer case.

Contributor

I think this may be due to the modified condition. The Float8Quantizer and Float8CurrentScalingQuantizer cases don't support quantizing only the column-wise data. But if backward_input_needs_gather is False, Float8Quantizer and Float8CurrentScalingQuantizer tensors also take the else path, which causes the error. It also hurts the performance of blockwise FP8 and MXFP8. Do you need me to create a fix PR for this, or will you fix it together with the DebugTensor?
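The control-flow issue being described can be illustrated with a toy sketch (the class names mirror the discussion, but the logic and the `supports_columnwise_only` attribute are simplified stand-ins, not the real Transformer Engine code):

```python
# Toy illustration: some quantizers cannot produce only column-wise
# data, so requesting columnwise-only usage on them fails. Blockwise
# quantizers can, and skipping the row-wise data saves memory.

class Float8Quantizer:
    supports_columnwise_only = False  # plain FP8: no transpose-only cast

class Float8BlockwiseQuantizer:
    supports_columnwise_only = True   # blockwise FP8 / MXFP8

def choose_usage(quantizer, backward_input_needs_gather):
    """Pick (rowwise, columnwise) usage flags for caching the input."""
    if backward_input_needs_gather:
        # The gathered tensor is requantized later; keep both usages.
        return (True, True)
    if quantizer.supports_columnwise_only:
        # Cache only the column-wise data for the wgrad GEMM.
        return (False, True)
    # Quantizers without columnwise-only support must keep both,
    # otherwise the FP8 cast fails.
    return (True, True)

print(choose_usage(Float8Quantizer(), False))           # → (True, True)
print(choose_usage(Float8BlockwiseQuantizer(), False))  # → (False, True)
```

The complaint in the comment is that the modified condition sent quantizers without columnwise-only support down the columnwise-only branch, and forced blockwise FP8 / MXFP8 to keep row-wise data it doesn't need.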

@pggPL
Collaborator

pggPL commented Jul 21, 2025

I think this line is the source of the error:

out_obj.rowwise_gemm_tensor = out_obj.rowwise_gemm_tensor

The rowwise/columnwise tensors in DebugQuantizedTensor are the tensors used in the GEMMs, so they can both be the same Float8Tensor object, for example. And update_usage() currently does nothing for debug tensors.

@timmoon10
Collaborator Author

@pggPL I tried changing

out_obj.rowwise_gemm_tensor = out_obj.rowwise_gemm_tensor

to

out_obj.columnwise_gemm_tensor = out_obj.rowwise_gemm_tensor 

However, the error reappeared when I applied the debug quantizer to the local input tensor.

For now, I think we should merge this as a quick bugfix and we can fix the edge cases for the debug tensor later.

ksivaman
ksivaman previously approved these changes Jul 21, 2025
Member

@ksivaman ksivaman left a comment


LGTM

@timmoon10
Collaborator Author

/te-ci pytorch L1

FP8 does not support transpose-only cast.

Signed-off-by: Tim Moon <[email protected]>
@timmoon10 timmoon10 merged commit 315b47d into NVIDIA:main Jul 22, 2025
12 checks passed
KshitijLakhani pushed a commit that referenced this pull request Jul 22, 2025
…ug quantizer (#1963)

* Debug linear layer when saving original input and using debug quantizer

Signed-off-by: Tim Moon <[email protected]>

* Workaround bugs with quantizing with only column-wise usage

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused imports

Signed-off-by: Tim Moon <[email protected]>

* Avoid unnecessary row-wise data

Signed-off-by: Tim Moon <[email protected]>

* Workaround bugs with quantizing with only column-wise usage

FP8 does not support transpose-only cast.

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4 participants