-
Notifications
You must be signed in to change notification settings - Fork 511
[PyTorch Debug] Fix issue with negative underflow. #2107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
/te-ci pytorch |
Signed-off-by: Pawel Gadzinski <[email protected]>
/te-ci pytorch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue with negative underflow percentage calculations in PyTorch FP8 debugging functionality. The problem was that the code only counted 0
values but missed -0
values (represented as 128
in FP8 format). Additionally, it corrects a missing subtraction in the underflow percentage computation.
Key changes:
- Updated underflow detection to include both
0
and-0
values usingtorch.isin
with[0, 128]
- Fixed missing
- (x == 0).sum()
in percentage calculation - Updated tests to use random tensors and adjusted tolerance values
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
transformer_engine/debug/features/utils/stats_computation.py | Updated underflow detection logic to handle both positive and negative zero values |
tests/pytorch/debug/test_log.py | Modified test to use random tensors and updated MSE tolerance |
tests/pytorch/debug/test_api_features.py | Fixed test to use dequantized tensor for underflow calculation |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Pawel Gadzinski <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
a590f94
to
7464c48
Compare
for more information, see https://pre-commit.ci
Signed-off-by: Pawel Gadzinski <[email protected]>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/te-ci pytorch |
for more information, see https://pre-commit.ci [PyTorch Debug] Fix issue with negative underflow% stat. (NVIDIA#2107) * fix underflows log issue Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Pawel Gadzinski <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix underflows log issue Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Pawel Gadzinski <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
The issue with negative percentage of underflow was observed. The reason is that we count only
0
infp8_tensor._data
tensor, but we need to take into account also-0
, represented by128
(10000000
in binary, only sign bit is 1).This PR also fixes issue with computation of percentage of underflows on one device - I forgot to add
- (x == 0).sum()
.Type of change
Checklist: